SpectralMoE

Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts (CVPR 2026)

Paper web page: Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts.

SpectralMoE Framework

SpectralMoE is inserted as a lightweight plugin into each layer of frozen VFMs (vision foundation models) and DFMs (depth foundation models). At its core is a dual-gated MoE mechanism: two gating networks independently route visual and depth feature tokens to specialized experts, enabling fine-grained, spatially adaptive adjustments that overcome the limitations of global, homogeneous methods. Following this expert-based refinement, a Cross-Attention Fusion Module adaptively injects the robust spatial-structural information from the adjusted depth features into the visual features. This fusion mitigates the semantic ambiguity caused by spectral shifts and significantly enhances the model's cross-domain generalization.
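The dual-gated routing and fusion described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (top-1 gating, a shared expert pool, single-head cross-attention); the shapes, gate design, and function names are illustrative and not the repository's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def top1_moe(tokens, gate_w, expert_ws):
    """Route each token to its highest-scoring expert (top-1 gating)."""
    scores = softmax(tokens @ gate_w)          # (N, E) routing probabilities
    choice = scores.argmax(axis=-1)            # chosen expert index per token
    out = np.empty_like(tokens)
    for e, w in enumerate(expert_ws):
        mask = choice == e
        # scale each expert's output by its gate probability
        out[mask] = (tokens[mask] @ w) * scores[mask, e:e+1]
    return out

def cross_attention(q_feats, kv_feats):
    """Single-head cross-attention: visual tokens attend to depth tokens,
    with a residual connection injecting depth structure into visual features."""
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(q_feats.shape[-1]))
    return q_feats + attn @ kv_feats

d, n_experts = 16, 6
vis = rng.normal(size=(64, d))   # visual feature tokens from the frozen VFM
dep = rng.normal(size=(64, d))   # depth feature tokens from the frozen DFM

# Dual gating: visual and depth tokens are routed independently,
# each by its own gate, to the expert pool.
gate_v = rng.normal(size=(d, n_experts))
gate_d = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]

vis_refined = top1_moe(vis, gate_v, experts)
dep_refined = top1_moe(dep, gate_d, experts)
fused = cross_attention(vis_refined, dep_refined)
```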

Citation:

Please cite us if our project is helpful to you!

@misc{chen2026localpreciserefinementdualgated,
      title={Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts}, 
      author={Xi Chen and Maojun Zhang and Yu Liu and Shen Yan},
      year={2026},
      eprint={2603.13352},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2603.13352}, 
}

Cross-Sensor and Cross-Geospatial Generalization Tasks

Our experiments establish cross-sensor and cross-geospatial generalization tasks based on GF-2 MSIs from the Five-Billion-Pixels dataset. For the cross-sensor task, the GF-2 MSIs serve as the source domain, while MSIs from GF-1, PlanetScope, and Sentinel-2 form the target domains. For the cross-geospatial task, we partition the GF-2 MSIs of the Five-Billion-Pixels dataset into geographically disjoint source and target domains. In the geographical-distribution figure, subfigure (a) presents the domain distribution for the cross-sensor task: locations of the source domain (GF-2 imagery) are marked by blue solid circles, and those of the target domains (PlanetScope, GF-1, and Sentinel-2 imagery) by red circles. Subfigure (b) illustrates the domain distribution for the cross-geospatial task, with blue solid circles representing the source domain (GF-2 imagery from various regions) and red solid circles denoting the target domain (GF-2 imagery from designated cities).

Visualization

  • Qualitative results for the cross-sensor multispectral DGSS task. Comparative visualization of land cover classification from DSTC, frozen RSFM + Mask2Former decoder (SoftCon, Galileo, SenPaMAE, Copernicus, DOFA), frozen VFM + Mask2Former decoder (CLIP, SAM, EVA02, DINOv2, DINOv3), FM-based DG semantic segmentation methods (SET, FADA, Rein, DepthForge), and our proposed SpectralMoE. Input MSIs and corresponding ground-truth maps are also shown for reference. SpectralMoE exhibits superior accuracy in challenging cross-sensor scenarios. Please zoom in to the white box region to see more details.

  • Quantitative Results for Cross-Sensor Multispectral DGSS Task

| Setting | mIoU | Config | Checkpoint |
| --- | --- | --- | --- |
| Cross-sensor task | 66.19 | config | checkpoint |

Environment Setup

conda create -n SpectralMoE python=3.10
conda activate SpectralMoE
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -U openmim
mim install mmengine==0.10.7
mim install mmcv==2.0.0
mim install mmdet==3.3.0
mim install mmsegmentation==1.2.2
pip install xformers==0.0.20
pip install pillow==11.1.0
pip install numpy==1.26.3
pip install timm==0.4.12
pip install einops==0.8.0
pip install ftfy==6.3.1
pip install matplotlib==3.10.0
pip install prettytable==3.12.0
pip install GDAL==3.6.1
pip install future tensorboard

Data Processing

The data folder structure should look like this:

data
├── GID
│   ├── source_dir
│   │   ├── image
│   │   ├── label
│   ├── target_dir
│   │   ├── image
│   │   ├── label
├── Potsdam2Vaihingen
│   ├── Potsdam
│   │   ├── image
│   │   ├── label
│   ├── Vaihingen
│   │   ├── image
│   │   ├── label
├── ...
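Before running the conversion scripts below, it can help to verify that your folders match this layout. The following is a small hypothetical helper; the `DATA_ROOT` path and function name are our own, not part of the repository.

```python
from pathlib import Path

# Root of the GID (Five-Billion-Pixels) data; adjust to your environment.
DATA_ROOT = Path("data/GID")

def check_layout(root):
    """Return the expected sub-directories that are missing under `root`.

    An empty list means the folder structure matches the layout shown above.
    """
    expected = [root / split / sub
                for split in ("source_dir", "target_dir")
                for sub in ("image", "label")]
    return [p for p in expected if not p.is_dir()]

# Example: print(check_layout(DATA_ROOT)) lists any missing directories.
```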

Constructing Cross-Sensor Generalization Tasks

  • Remove padding: This script is designed to remove the "padding" from your GID (Five-Billion-Pixels dataset) images and their corresponding labels. GID images come with extra borders around the actual data, and this script helps you crop them out consistently.

    python tools/convert_datasets/remove_padding.py

    Note: You need to specify your input data paths and the output data paths.

  • Crop GID images and labels: Crop GF-2 multispectral images and labels to a size of 512x512.

    python tools/convert_datasets/cut_GID.py
    python tools/convert_datasets/cut_GID_label.py

    Note: You need to specify your input data paths and the output data paths.
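The 512x512 cropping step amounts to tiling each large scene into fixed-size patches. Below is a minimal sketch of one such strategy (non-overlapping tiles, with the ragged right/bottom edges dropped); the repository's scripts may instead pad or overlap, so treat this only as an illustration.

```python
import numpy as np

def tile_512(img, size=512):
    """Split an image array of shape (H, W, C) into non-overlapping
    size x size tiles, dropping the ragged right/bottom edges."""
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            tiles.append(img[y:y + size, x:x + size])
    return tiles

# Example: a 1200 x 1100 patch with 4 spectral bands (e.g. GF-2 MSI)
img = np.zeros((1200, 1100, 4), dtype=np.uint16)
tiles = tile_512(img)
# 2 rows x 2 columns of full 512 x 512 tiles fit in a 1200 x 1100 image
```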

  • Rename: Unify the names of labels with their corresponding GF-2 multispectral images.

    python tools/convert_datasets/rename_gid_label.py
    

    Note: You need to specify your input data paths.

  • Target Domain Data Processing: Extract multispectral images and labels from the annotated regions of five megacities in China to serve as target domain data.

    python tools/convert_datasets/GID_target_process.py

    Note: You need to specify your input data paths and the output data paths.

    Then, crop the target domain multispectral images and their corresponding labels to a size of 512x512.

    python tools/convert_datasets/cut_GID.py
    python tools/convert_datasets/cut_GID_label.py

    Note: You need to specify your input data paths and the output data paths.

Evaluation

  • First, download the DINOv3 and PromptDA pre-trained weights, then convert the DINOv3 Large and PromptDA Large model weights.

    python tools/convert_models/convert_dinov3_depthmoe.py checkpoints/DINOv3/lvd1689m/dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth checkpoints/promptda_vitl.ckpt checkpoints/dinov3_converted_depthmoe.pth

  • Then, perform inference using the trained model.

    python tools/test.py configs/dinov3/depthmoe_dinov3_mask2former_512x512_bs1x8_gid_k1_Ne6_r16.py work_dirs/iter_70884.pth --backbone checkpoints/dinov3_converted_depthmoe.pth

  • Finally, visualize the land cover classification results.

    python tools/visualizesegmentationmap.py

    Note: You need to specify your input data paths and the output data paths.

Training

  • First, download the DINOv3 and PromptDA pre-trained weights, then convert the DINOv3 Large and PromptDA Large model weights.

    python tools/convert_models/convert_dinov3_depthmoe.py checkpoints/DINOv3/lvd1689m/dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth checkpoints/promptda_vitl.ckpt checkpoints/dinov3_converted_depthmoe.pth

  • Then, begin training.

    PORT=12345 CUDA_VISIBLE_DEVICES=0,1 bash tools/dist_train.sh configs/dinov3/depthmoe_dinov3_mask2former_512x512_bs1x8_gid_k1_Ne6_r16.py 2

Acknowledgment

Our implementation is mainly based on the following repositories. Thanks to their authors.
