Dong-Guw Lee¹* Tai Hyoung Rhee¹* Hyunsoo Jang¹
Young-Sik Shin² Ukcheol Shin³ Ayoung Kim¹†
¹Seoul National University ²Kyungpook National University ³KENTECH
*Equal Contribution †Corresponding Author
CVPR 2026
- ⚡ (2026-04-03): TherA & R2T2 dataset repository released
🔥 TherA: Thermal-Aware VLM-based Controllable RGB→TIR Translation
- 🧠 Thermal-Aware VLM Conditioning
- 🎯 Dual-Level Controllability
- 🌡 Physically Plausible Synthesis
- 🏆 SoTA Performance & Strong Zero-Shot Generalization
📦 R2T2: 100k+ Aligned RGB–TIR–Text Dataset
- 🖼 112,970 Aligned Triplets: RGB image + TIR image + Canonical thermal schema
- 🏙 Scene Diversity: Driving, CCTV, Aerial, Ego-view
- 🌗 Temporal Diversity: Day/Night, Diurnal transitions
- 🌦 Environmental Diversity: Weather, season, illumination variability
- 🧱 Material & Object-Level Annotation with Structured Canonicalization
- 📚 Compiled from 9 aligned RGB-TIR datasets with additional pseudo-aligned pairs
| Category | Dataset | Train | Test | Extra | Scene | Weather | Season | Location | Avg. Resolution |
|---|---|---|---|---|---|---|---|---|---|
| Outdoor | MS² | 93,746 | 18,896 | 84,358 | Campus, Urban, Residential | Clear, Cloudy, Rainy | Summer | Korea | 544 × 191 |
| | STHeReO | 46,437 | 9,745 | – | Campus, Suburban | Clear | Summer | Korea | 601 × 245 |
| | ViViD | 35,796 | 14,597 | – | Campus | Clear, Cloudy | Spring | Korea | 629 × 497 |
| | NSAVP | 65,333 | 78,823 | – | Urban, Suburban | Clear, Cloudy | Summer | Korea | 640 × 512 |
| | CAMEL (Outdoor) | 8,581 | 4,482 | – | Campus, Road, Urban | Clear, Cloudy, Snow | Spring, Fall, Winter | USA | 404 × 230 |
| | TRI2I | 19,768 | 11,913 | – | Campus, Road | Clear, Cloudy | Spring, Summer | USA | 229 × 228 |
| | METU-VisTIR | 33 | 1,052 | – | Campus | Clear | – | Turkey | 632 × 497 |
| | MIRAGE Outdoor | 269,694 | 139,508 | 84,358 | | | | | |
| Indoor | Trimodal | 4,550 | 2,653 | – | Room | – | – | Austria | 640 × 480 |
| | MultiSpectralMotion | 11,575 | 5,777 | 3,647 | Room | – | – | China | 640 × 480 |
| | OdomBeyondVision | 20,904 | 4,372 | – | Room | – | – | UK | 328 × 249 |
| | CAMEL (Indoor) | 221 | 117 | – | Hall | – | – | USA | 404 × 230 |
| | MIRAGE Indoor | 37,250 | 12,919 | 3,647 | | | | | |
| Total | MIRAGE | 306,944 | 152,427 | 88,005 | | | | | |
| | MIRAGE Raw | 278,341 | 130,491 | 88,005 | | | | | |
- MIRAGE Raw denotes the subset of pairs that provide both 8-bit and raw 14-bit TIR images.
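When working with the raw 14-bit TIR frames, an 8-bit view is often needed for visualization or for side-by-side comparison with the 8-bit images. The sketch below uses per-frame percentile clipping followed by linear rescaling; this is a common choice, not necessarily the conversion used to produce the 8-bit TIR images in the dataset. It assumes the raw frame is already loaded as a `uint16` NumPy array (how the raw files are stored is not specified here), and the helper `raw14_to_8bit` and its percentile defaults are hypothetical.

```python
import numpy as np


def raw14_to_8bit(raw: np.ndarray, low_pct: float = 1.0, high_pct: float = 99.0) -> np.ndarray:
    """Linearly rescale a raw 14-bit TIR frame to uint8.

    Hypothetical helper (not the dataset's official conversion): clips the
    frame to its own [low_pct, high_pct] percentiles to suppress hot/cold
    outliers, then maps that range linearly onto [0, 255].
    """
    raw = raw.astype(np.float32)
    lo, hi = np.percentile(raw, [low_pct, high_pct])
    if hi <= lo:  # flat frame: avoid division by zero
        return np.zeros(raw.shape, dtype=np.uint8)
    scaled = (np.clip(raw, lo, hi) - lo) / (hi - lo)  # now in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)
```

Per-frame percentiles maximize contrast in each frame but make intensities incomparable across frames; fixing `lo` and `hi` once per sequence (or using the full 0 to 16383 range) keeps them comparable at the cost of contrast.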
Refer to the link below to download the dataset.
```
R2T2
├── {$DATASET_NAME}
|   └── {$SEQUENCE_NAME}
|       ├── RGB
|       |   ├── 1.jpg
|       |   └── ...
|       └── TIR
|           ├── 1.jpg
|           └── ...
├── ...
├── ViVID
|   ├── img_campus_day1
|   |   ├── RGB
|   |   |   ├── 000001.png
|   |   |   └── ...
|   |   └── TIR
|   |       ├── 000001.png
|   |       └── ...
|   ├── ...
├── ...
```
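The layout above suggests that, inside each sequence folder, an RGB frame and the TIR frame with the same file name form an aligned pair. A minimal Python sketch under that assumption; the names `iter_pairs`, `Pair`, `_frame_key`, and the accepted extensions are hypothetical and not part of an official loader.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator, NamedTuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: formats seen in the tree


class Pair(NamedTuple):
    dataset: str
    sequence: str
    rgb: Path
    tir: Path


def _frame_key(stem: str) -> tuple[int, int, str]:
    # Numeric stems ('1', '000010') sort by value; other stems sort after, by name.
    return (0, int(stem), "") if stem.isdigit() else (1, 0, stem)


def _images_by_stem(folder: Path) -> dict[str, Path]:
    """Map file stem (e.g. '000001') to its image path in one modality folder."""
    return {p.stem: p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES}


def iter_pairs(root: str | Path) -> Iterator[Pair]:
    """Yield RGB/TIR pairs from <root>/<dataset>/<sequence>/{RGB,TIR}/.

    Hypothetical pairing rule: files sharing a stem in RGB/ and TIR/ are
    treated as aligned; files without a counterpart are skipped.
    """
    root = Path(root)
    for dataset_dir in sorted(d for d in root.iterdir() if d.is_dir()):
        for seq_dir in sorted(d for d in dataset_dir.iterdir() if d.is_dir()):
            rgb_dir, tir_dir = seq_dir / "RGB", seq_dir / "TIR"
            if not (rgb_dir.is_dir() and tir_dir.is_dir()):
                continue  # not a sequence folder
            rgb, tir = _images_by_stem(rgb_dir), _images_by_stem(tir_dir)
            for stem in sorted(rgb.keys() & tir.keys(), key=_frame_key):
                yield Pair(dataset_dir.name, seq_dir.name, rgb[stem], tir[stem])
```

For example, `for p in iter_pairs("R2T2"): print(p.dataset, p.sequence, p.rgb.name)` walks every sequence in frame order. Numeric sorting matters for unpadded names such as `1.jpg`, where plain string sorting would place `10.jpg` before `2.jpg`.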
If you find our work useful, please cite:
```bibtex
@inproceedings{lee2025thera,
  title={TherA: Thermal-Aware Visual-Language Prompting for Controllable RGB-to-Thermal Infrared Translation},
  author={Lee, Dong-Guw and Rhee, TaiHyoung and Jang, Hyunsoo and Shin, Young-Sik and Shin, UkCheol and Kim, Ayoung},
  booktitle={CVPR},
  year={2026}
}
```


