Yimin Wei1,2*, Aoran Xiao2*, Hongruixuan Chen1,2, Junshi Xia2, Naoto Yokoya1,2 †
1 The University of Tokyo, 2 RIKEN AIP
* Equal contribution, † Corresponding author
Mar 20th, 2026: The arXiv paper of MM-OVSeg is now online. If you are interested in the details of MM-OVSeg, do not hesitate to take a look!
Notice ☀️☀️: MM-OVSeg was accepted to the CVPR 2026 conference on February 21, 2026! Related data and benchmark suites will be released soon!
- Release Datasets for CVPR version (Feb 22, 2026)
- Release Train/Evaluation code for CVPR version
- Release pre-trained weights for CVPR version
Open-vocabulary segmentation enables pixel-level recognition from an open set of textual categories, allowing generalization beyond fixed classes. Despite great potential in remote sensing, progress in this area remains largely limited to clear-sky optical data and struggles under cloudy or haze-contaminated conditions. We present MM-OVSeg, a multimodal Optical–SAR fusion framework for resilient open-vocabulary segmentation under adverse weather conditions. MM-OVSeg leverages the complementary strengths of the two modalities—optical imagery provides rich spectral semantics, while synthetic aperture radar (SAR) offers cloud-penetrating structural cues. To address the cross-modal domain gap and the limited dense prediction capability of current vision–language models, we propose two key designs: a cross-modal unification process for multi-sensor representation alignment, and a dual-encoder fusion module that integrates hierarchical features from multiple vision foundation models for text-aligned multimodal segmentation. Extensive experiments demonstrate that MM-OVSeg achieves superior robustness and generalization across diverse cloud conditions.
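To make the idea above concrete, here is a minimal, self-contained sketch of multimodal open-vocabulary decoding: per-pixel optical and SAR features are merged by a gate, then each fused pixel feature is matched against text embeddings by cosine similarity. All function names, the gating rule, and the toy shapes are illustrative assumptions, not the actual MM-OVSeg implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Normalize vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def fuse_and_segment(opt_feat, sar_feat, text_emb):
    """Toy open-vocabulary segmentation over fused optical/SAR features.

    opt_feat, sar_feat: (H, W, C) per-pixel features from two encoders.
    text_emb:           (K, C) embeddings of K category names.
    Returns an (H, W) label map of the best-matching category per pixel.
    """
    # Illustrative gated fusion: a sigmoid gate decides, per pixel, how much
    # to trust the optical branch versus the SAR branch.
    gate = 1.0 / (1.0 + np.exp(-(opt_feat - sar_feat).mean(axis=-1, keepdims=True)))
    fused = gate * opt_feat + (1.0 - gate) * sar_feat

    # Per-pixel cosine similarity against each text embedding, then argmax
    # over categories: the "open-vocabulary" decoding step.
    logits = l2_normalize(fused) @ l2_normalize(text_emb).T  # (H, W, K)
    return logits.argmax(axis=-1)                            # (H, W)

# Tiny synthetic example: 4x4 feature maps, 3 text categories.
rng = np.random.default_rng(0)
opt = rng.normal(size=(4, 4, 8))
sar = rng.normal(size=(4, 4, 8))
texts = rng.normal(size=(3, 8))
labels = fuse_and_segment(opt, sar, texts)
print(labels.shape)  # (4, 4)
```

Because the category set only enters through `text_emb`, swapping in embeddings of new class names changes the label space without retraining the encoders, which is what distinguishes open-vocabulary segmentation from fixed-class segmentation.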
# 1. git clone this repository
git clone https://github.com/Jimmyxichen/MM-OVSeg.git
cd MM-OVSeg
# 2. create new anaconda env
conda create -n MMOVSeg python=3.8
conda activate MMOVSeg
# 3. install torch and dependencies
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
# Optional: install a newer version of PyTorch to use the DINOv3 model as the backbone. In our case, the versions are PyTorch 2.5.0, Python 3.10, CUDA 12.6, and cuDNN 9.3.0.
# Exact dependency versions are not strict; in general, you only need to pay attention to PyTorch and Detectron2.
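Since exact dependency versions are flexible, a quick stdlib-only check like the one below (not part of MM-OVSeg; the helper names are ours) can confirm that an installed version string, e.g. from `torch.__version__`, meets a chosen minimum, ignoring local build suffixes such as `+cu118`.

```python
def parse_version(v):
    # Keep only the leading numeric components: "2.3.0+cu118" -> (2, 3, 0).
    parts = []
    for p in v.split("+")[0].split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def meets_minimum(installed, minimum):
    # Tuple comparison gives the usual lexicographic version ordering.
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("2.5.0", "2.3.0"))        # True
print(meets_minimum("2.3.0+cu118", "2.5.0"))  # False
```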
We include the following multimodal RS dataset configurations under diverse weather and domain conditions in this repo:
- clear-sky weather: PIE-RGB-SAR-clean
- synthetic cloud cover with varying opacity (thin vs. thick vs. varied): PIE-RGB-SAR-cloud (varied cloud), DDHR-SK (varied cloud), OpenEarthMap-SAR (OEM-thin & OEM-thick)
- cross-domain generalization: DDHR-CH (varied cloud)
We provide the above processed datasets for your convenience. Download them from here.
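After downloading, the optical and SAR tiles need to be paired before use. The snippet below sketches one way to do this by matching filename stems; the `rgb/` and `sar/` folder names and `.png` extension are assumptions for illustration, and the released datasets may use a different layout.

```python
from pathlib import Path
import tempfile

def pair_rgb_sar(root):
    """Pair optical and SAR tiles that share a filename stem.

    Assumes a hypothetical layout:
        root/rgb/<tile>.png
        root/sar/<tile>.png
    """
    root = Path(root)
    rgb = {p.stem: p for p in (root / "rgb").glob("*.png")}
    sar = {p.stem: p for p in (root / "sar").glob("*.png")}
    # Keep only tiles present in both modalities, in a stable order.
    common = sorted(rgb.keys() & sar.keys())
    return [(rgb[s], sar[s]) for s in common]

# Demo with a temporary directory standing in for a downloaded dataset.
tmp = Path(tempfile.mkdtemp())
(tmp / "rgb").mkdir()
(tmp / "sar").mkdir()
for name in ["t0001", "t0002"]:
    (tmp / "rgb" / f"{name}.png").touch()
    (tmp / "sar" / f"{name}.png").touch()
(tmp / "rgb" / "t0003.png").touch()  # unmatched optical tile is skipped

pairs = pair_rgb_sar(tmp)
print(len(pairs))  # 2
```

Pairing by stem rather than by directory order avoids silent misalignment when one modality has extra or missing tiles.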
The authors would also like to give special thanks to GSNet, DINOv3 and SegEarth-OV.
For any questions, please feel free to open an issue or contact us.