SpectralMoE

Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts (CVPR 2026)

Paper web page: Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts.

SpectralMoE Framework

SpectralMoE is inserted as a lightweight plugin into each layer of frozen VFMs (vision foundation models) and DFMs (depth foundation models). At its core is a dual-gated MoE mechanism: two gating networks independently route visual and depth feature tokens to specialized experts, enabling fine-grained, spatially adaptive adjustments that overcome the limitations of global, homogeneous methods. Following this expert-based refinement, a Cross-Attention Fusion Module adaptively injects the robust spatial-structural information from the adjusted depth features into the visual features. This fusion mitigates the semantic ambiguity caused by spectral shifts and significantly enhances the model's cross-domain generalization.
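The dual-gated routing and fusion described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (top-1 gating, a shared expert pool, single-head cross-attention); the shapes, gate design, and function names are illustrative and not the repository's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def top1_moe(tokens, gate_w, expert_ws):
    """Route each token to its highest-scoring expert (top-1 gating)."""
    scores = softmax(tokens @ gate_w)          # (N, E) routing probabilities
    choice = scores.argmax(axis=-1)            # chosen expert index per token
    out = np.empty_like(tokens)
    for e, w in enumerate(expert_ws):
        mask = choice == e
        # scale each expert's output by its gate probability
        out[mask] = (tokens[mask] @ w) * scores[mask, e:e+1]
    return out

def cross_attention(q_feats, kv_feats):
    """Single-head cross-attention: visual tokens attend to depth tokens,
    with a residual connection injecting depth structure into visual features."""
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(q_feats.shape[-1]))
    return q_feats + attn @ kv_feats

d, n_experts = 16, 6
vis = rng.normal(size=(64, d))   # visual feature tokens from the frozen VFM
dep = rng.normal(size=(64, d))   # depth feature tokens from the frozen DFM

# Dual gating: visual and depth tokens are routed independently,
# each by its own gate, to the expert pool.
gate_v = rng.normal(size=(d, n_experts))
gate_d = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]

vis_refined = top1_moe(vis, gate_v, experts)
dep_refined = top1_moe(dep, gate_d, experts)
fused = cross_attention(vis_refined, dep_refined)
```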

Citation:

Please cite us if our project is helpful to you!

@misc{chen2026localpreciserefinementdualgated,
      title={Local Precise Refinement: A Dual-Gated Mixture-of-Experts for Enhancing Foundation Model Generalization against Spectral Shifts}, 
      author={Xi Chen and Maojun Zhang and Yu Liu and Shen Yan},
      year={2026},
      eprint={2603.13352},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2603.13352}, 
}

Cross-Sensor and Cross-Geospatial Generalization Tasks

Our experiments establish cross-sensor and cross-geospatial generalization tasks based on GF-2 MSIs from the Five-Billion-Pixels dataset. For the cross-sensor task, the GF-2 MSIs serve as the source domain, while MSIs from GF-1, PlanetScope, and Sentinel-2 form the target domains. For the cross-geospatial task, we partition the GF-2 MSIs of the Five-Billion-Pixels dataset into geographically disjoint source and target domains. In the geographical-distribution figure, subfigure (a) presents the domain distribution for the cross-sensor task: locations of the source domain (GF-2 imagery) are marked by blue solid circles, and those of the target domains (PlanetScope, GF-1, and Sentinel-2 imagery) by red circles. Subfigure (b) illustrates the domain distribution for the cross-geospatial task, with blue solid circles representing the source domain (GF-2 imagery from various regions) and red solid circles denoting the target domain (GF-2 imagery from designated cities).

Visualization

  • Qualitative results for the cross-sensor multispectral DGSS task. Comparative visualization of land cover classification from DSTC, frozen RSFM + Mask2Former decoder (SoftCon, Galileo, SenPaMAE, Copernicus, DOFA), frozen VFM + Mask2Former decoder (CLIP, SAM, EVA02, DINOv2, DINOv3), FM-based DG semantic segmentation methods (SET, FADA, Rein, DepthForge), and our proposed SpectralMoE. Input MSIs and corresponding ground-truth maps are also shown for reference. SpectralMoE exhibits superior accuracy in challenging cross-sensor scenarios. Please zoom in to the white box region to see more details.

  • Quantitative Results for Cross-Sensor Multispectral DGSS Task

| Setting | mIoU | Config | Checkpoint |
| --- | --- | --- | --- |
| Cross-sensor task | 66.19 | config | checkpoint |

Environment Setup

conda create -n SpectralMoE python=3.10
conda activate SpectralMoE
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -U openmim
mim install mmengine==0.10.7
mim install mmcv==2.0.0
mim install mmdet==3.3.0
mim install mmsegmentation==1.2.2
pip install xformers==0.0.20
pip install pillow==11.1.0
pip install numpy==1.26.3
pip install timm==0.4.12
pip install einops==0.8.0
pip install ftfy==6.3.1
pip install matplotlib==3.10.0
pip install prettytable==3.12.0
pip install GDAL==3.6.1
pip install future tensorboard

Data Processing

The data folder structure should look like this:

data
├── GID
│   ├── source_dir
│   │   ├── image
│   │   ├── label
│   ├── target_dir
│   │   ├── image
│   │   ├── label
├── Potsdam2Vaihingen
│   ├── Potsdam
│   │   ├── image
│   │   ├── label
│   ├── Vaihingen
│   │   ├── image
│   │   ├── label
├── ...
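Before running the conversion scripts below, it can help to verify that your folders match this layout. The following is a small hypothetical helper; the `DATA_ROOT` path and function name are our own, not part of the repository.

```python
from pathlib import Path

# Root of the GID (Five-Billion-Pixels) data; adjust to your environment.
DATA_ROOT = Path("data/GID")

def check_layout(root):
    """Return the expected sub-directories that are missing under `root`.

    An empty list means the folder structure matches the layout shown above.
    """
    expected = [root / split / sub
                for split in ("source_dir", "target_dir")
                for sub in ("image", "label")]
    return [p for p in expected if not p.is_dir()]

# Example: print(check_layout(DATA_ROOT)) lists any missing directories.
```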

Constructing Cross-Sensor Generalization Tasks

  • Remove padding: This script is designed to remove the "padding" from your GID (Five-Billion-Pixels dataset) images and their corresponding labels. GID images come with extra borders around the actual data, and this script helps you crop them out consistently.

    python tools/convert_datasets/remove_padding.py

    Note: You need to specify your input data paths and the output data paths.

  • Crop GID images and labels: Crop GF-2 multispectral images and labels to a size of 512x512.

    python tools/convert_datasets/cut_GID.py
    python tools/convert_datasets/cut_GID_label.py

    Note: You need to specify your input data paths and the output data paths.
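The 512x512 cropping step amounts to tiling each large scene into fixed-size patches. Below is a minimal sketch of one such strategy (non-overlapping tiles, with the ragged right/bottom edges dropped); the repository's scripts may instead pad or overlap, so treat this only as an illustration.

```python
import numpy as np

def tile_512(img, size=512):
    """Split an image array of shape (H, W, C) into non-overlapping
    size x size tiles, dropping the ragged right/bottom edges."""
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            tiles.append(img[y:y + size, x:x + size])
    return tiles

# Example: a 1200 x 1100 patch with 4 spectral bands (e.g. GF-2 MSI)
img = np.zeros((1200, 1100, 4), dtype=np.uint16)
tiles = tile_512(img)
# 2 rows x 2 columns of full 512 x 512 tiles fit in a 1200 x 1100 image
```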

  • Rename: Unify the names of labels with their corresponding GF-2 multispectral images.

    python tools/convert_datasets/rename_gid_label.py
    

    Note: You need to specify your input data paths.

  • Target Domain Data Processing: Extract multispectral images and labels from the annotated regions of five megacities in China to serve as target domain data.

    python tools/convert_datasets/GID_target_process.py

    Note: You need to specify your input data paths and the output data paths.

    Then, crop the target domain multispectral images and their corresponding labels to a size of 512x512.

    python tools/convert_datasets/cut_GID.py
    python tools/convert_datasets/cut_GID_label.py

    Note: You need to specify your input data paths and the output data paths.

Evaluation

  • First, download the DINOv3 and PromptDA pre-trained weights, then convert the DINOv3 Large and PromptDA Large model weights.

    python tools/convert_models/convert_dinov3_depthmoe.py checkpoints/DINOv3/lvd1689m/dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth checkpoints/promptda_vitl.ckpt checkpoints/dinov3_converted_depthmoe.pth

  • Then, perform inference using the trained model.

    python tools/test.py configs/dinov3/depthmoe_dinov3_mask2former_512x512_bs1x8_gid_k1_Ne6_r16.py work_dirs/iter_70884.pth --backbone checkpoints/dinov3_converted_depthmoe.pth

  • Finally, visualize the land cover classification results.

    python tools/visualizesegmentationmap.py

    Note: You need to specify your input data paths and the output data paths.

Training

  • First, download the DINOv3 and PromptDA pre-trained weights, then convert the DINOv3 Large and PromptDA Large model weights.

    python tools/convert_models/convert_dinov3_depthmoe.py checkpoints/DINOv3/lvd1689m/dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth checkpoints/promptda_vitl.ckpt checkpoints/dinov3_converted_depthmoe.pth

  • Then, begin training.

    PORT=12345 CUDA_VISIBLE_DEVICES=0,1 bash tools/dist_train.sh configs/dinov3/depthmoe_dinov3_mask2former_512x512_bs1x8_gid_k1_Ne6_r16.py 2

Acknowledgment

Our implementation is mainly based on the following repositories. Thanks to their authors.
