SLGNet 🎮: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection
Xiantai Xiang1,2,3 Guangyao Zhou1,2 Zixiao Wen1,2 Wenshuai Li1,2 Ben Niu1,2 Feng Wang1,2 Lijia Huang1,2,3 Qiantong Wang1,2 Yuhan Liu1,2,3 Zongxu Pan4 Yuxin Hu1,2,3
2 Key Laboratory of Target Cognition and Application Technology, CAS;
3 University of Chinese Academy of Sciences;
4 Xi'an Jiaotong University
This repository is the official implementation of the paper: SLGNet: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection.
SLGNet is a parameter-efficient framework designed to transfer the capabilities of frozen RGB foundation models (e.g., DINOv2) to multimodal RGB-IR object detection. By synergizing a Structure-Aware Adapter and Language-Guided Modulation (LGM), SLGNet addresses structural degradation in Vision Transformers while endowing the model with robust environmental awareness.
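The "affine transformation" idea behind LGM can be illustrated with a minimal, FiLM-style sketch. This is a plain-Python toy under our own assumptions, not SLGNet's actual implementation: we assume a structured caption is already encoded into per-channel scale (`gamma`) and shift (`beta`) parameters that recalibrate the visual features.

```python
# Toy sketch of language-guided affine modulation (FiLM-style).
# gamma/beta would be predicted from a caption embedding in practice;
# here they are hard-coded for illustration.

def affine_modulate(features, gamma, beta):
    """Per-channel affine modulation: out[c][i] = gamma[c] * features[c][i] + beta[c]."""
    return [
        [gamma[c] * v + beta[c] for v in channel]
        for c, channel in enumerate(features)
    ]

# Toy visual features: 2 channels, 3 spatial positions each.
features = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
# Hypothetical parameters derived from a structured caption
# (e.g. "Environment: night; Thermal: high contrast").
gamma = [0.5, 2.0]
beta = [1.0, -1.0]

print(affine_modulate(features, gamma, beta))
# [[1.5, 2.0, 2.5], [7.0, 9.0, 11.0]]
```

In a real network the modulated features would feed the next transformer block; the point is only that a lightweight, caption-conditioned scale-and-shift can steer a frozen backbone.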
Figure 1: Overview of the SLGNet framework. The architecture synergizes a frozen ViT backbone with a Structure-Aware Adapter for geometric localization and LGM for environmental adaptation.
- Parameter-Efficient Tuning: Effectively adapts frozen foundation models while reducing trainable parameters by approximately 87% compared to full fine-tuning.
- Structure-Aware Adapter (SA-Adapter): Extracts hierarchical structural priors (edges/contours) from both modalities to compensate for spatial detail loss in ViTs.
- Language-Guided Modulation (LGM): Exploits VLM-driven structured captions (Environment, Scene, Objects, Thermal) to dynamically recalibrate visual features via affine transformations.
- State-of-the-art Performance: Establishes new SOTA results on LLVIP, FLIR, KAIST, and Drone Vehicle benchmarks.
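To make the "structural priors (edges/contours)" bullet concrete, here is a toy sketch of extracting an edge-magnitude map with Sobel filters, the kind of low-level cue a structure-aware adapter could inject. This is an illustrative stand-in, not the SA-Adapter's actual code.

```python
# Toy edge-prior extraction with 3x3 Sobel filters (valid convolution).

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def conv3x3(img, kernel, y, x):
    """3x3 correlation centered at (y, x)."""
    return sum(kernel[dy][dx] * img[y + dy - 1][x + dx - 1]
               for dy in range(3) for dx in range(3))

def edge_magnitude(img):
    """Gradient-magnitude map over interior pixels only."""
    h, w = len(img), len(img[0])
    return [
        [(conv3x3(img, SOBEL_X, y, x) ** 2 +
          conv3x3(img, SOBEL_Y, y, x) ** 2) ** 0.5
         for x in range(1, w - 1)]
        for y in range(1, h - 1)
    ]

# Toy image with a vertical intensity step between columns 2 and 3.
img = [[0, 0, 0, 10, 10]] * 4
print(edge_magnitude(img))
# [[0.0, 40.0, 40.0], [0.0, 40.0, 40.0]] -- response concentrates at the step
```

The same operator applied to both RGB and IR inputs yields complementary contour maps, which is the intuition behind fusing structural priors from both modalities.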
| Status | Task Description |
|---|---|
| 🔜 | Release model checkpoints |
| 🔜 | Training and evaluation scripts |
| 🔜 | Open-source model code |
ℹ️ Note: All the resources mentioned above will be made publicly available upon the official publication of our research paper.
| Methods | Modality | LLVIP | FLIR | Drone Vehicle | Trainable Params |
|---|---|---|---|---|---|
| Full Fine-tuning | RGB+IR | 54.5 | 82.2 | 75.3 | 96.0M |
| UniRGB-IR | RGB+IR | 96.1 | 81.4 | - | 8.9M |
| COFNet | RGB+IR | 97.7 | 83.6 | - | 90.2M |
| SLGNet (Ours) | RGB+IR | 98.3 | 85.8 | 80.7 | 12.1M |
Detailed results and comparisons are available in the technical paper.
Feel free to contact us via email (xiangxiantai@gmail.com) or by opening a GitHub issue.
If you find our paper or benchmark helpful for your research, please consider citing it. Thank you very much!
@misc{xiang2026slgnetsynergizingstructuralpriors,
title={SLGNet: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection},
author={Xiantai Xiang and Guangyao Zhou and Zixiao Wen and Wenshuai Li and Ben Niu and Feng Wang and Lijia Huang and Qiantong Wang and Yuhan Liu and Zongxu Pan and Yuxin Hu},
year={2026},
eprint={2601.02249},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.02249},
}
