Repository for managing, analyzing, and using the Sentinel2Cap dataset, which contains captions for remote sensing images, both manually annotated and automatically generated with Qwen3-VL-8B-Instruct.
├── scripts/ # Training, inference, and utility scripts
├── script_dataset/ # Dataset analysis and statistics scripts
├── Sentinel2Cap.zip # 12k manually annotated captions
├── Sentinel2Cap.parquet # Structured dataset metadata
├── Qwen3-VL-8B-Instruct_... # Outputs from two studies (different prompts)
├── install_flash_attn.sh # Flash Attention installation script
├── pyproject.toml # Project dependencies
├── .python-version
└── .gitignore
Sentinel2Cap.zip contains 12,000 manually annotated captions associated with Sentinel-2 RGB images, Sentinel-2 multi-spectral images, and Sentinel-1 SAR images with a pseudo-RGB representation.
Sentinel2Cap.parquet contains structured metadata for each dataset sample, with the following columns:
- key
- image_index
- number_of_classes
- number_of_classes_30
- file_name
- path_to_S2 → path to Sentinel-2 image
- path_to_SM → path to reference maps
- set → train / val / test
- used
- month
- occurrences
- s1_name → associated Sentinel-1 image name
Example record:
key: N9999_R037_T29SNB_16_20
image_index: 431416
number_of_classes: 12
number_of_classes_30: 11
file_name: S2B_MSIL2A_20180326T112109_...
path_to_S2: BigEarthNet-S2/S2B_MSIL2A_20180326T112109_...
path_to_SM: Reference_Maps/S2B_MSIL2A_20180326T112109_...
set: train
used: True
month: march
occurrences: 3
s1_name: S1A_IW_GRDH_1SDV_20180327T064326_29SNB_16_20
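As a minimal sketch of how this metadata might be read, the snippet below loads the parquet file into plain dictionaries and keeps the rows of one split. It assumes that pandas and a parquet engine such as pyarrow are installed, and that `set` and `used` hold values like those in the example record. `load_records` and `select_split` are illustrative helper names, not functions shipped with this repository.

```python
def load_records(parquet_path):
    """Read the metadata file into a list of dicts, one per sample."""
    # Imported lazily so the filtering helper below works without pandas.
    import pandas as pd  # assumption: pandas plus a parquet engine (e.g. pyarrow)

    return pd.read_parquet(parquet_path).to_dict(orient="records")


def select_split(records, split, only_used=True):
    """Keep records whose `set` column equals `split` (train / val / test)."""
    selected = []
    for rec in records:
        if rec["set"] != split:
            continue
        # Optionally drop samples whose `used` flag is false.
        if only_used and not rec["used"]:
            continue
        selected.append(rec)
    return selected
```

A possible call would be `select_split(load_records("Sentinel2Cap.parquet"), "train")`, which returns only the training rows flagged as used.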
The Qwen3-VL-8B-Instruct_... outputs come from two studies performed using the same model:
- Model: Qwen3-VL-8B-Instruct
- Main difference: prompting strategies used for caption generation
Use cases:
- comparison of prompting strategies
- qualitative and quantitative analysis of generated captions
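For the quantitative side of such a comparison, a rough starting point is to compare simple length and vocabulary statistics between the two sets of generated captions. The sketch below assumes the captions of each study have already been extracted into two lists of strings. The layout of the output files is not described here, so the loading step is left out, and `caption_stats` and `compare_studies` are hypothetical helpers.

```python
def caption_stats(captions):
    """Return basic length and vocabulary statistics for a list of captions."""
    # Whitespace tokenization on lowercased text; a deliberate simplification.
    token_lists = [caption.lower().split() for caption in captions]
    n_tokens = sum(len(tokens) for tokens in token_lists)
    vocab = {tok for tokens in token_lists for tok in tokens}
    return {
        "num_captions": len(captions),
        "mean_words": n_tokens / len(captions) if captions else 0.0,
        "vocab_size": len(vocab),
    }


def compare_studies(captions_a, captions_b):
    """Put the statistics of the two prompting strategies side by side."""
    return {
        "study_a": caption_stats(captions_a),
        "study_b": caption_stats(captions_b),
    }
```

These numbers only describe surface properties of the text; they say nothing about whether a caption is correct for its image.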
Install dependencies:
pip install -e .

Dataset Preparation
Make sure that paths in the .parquet file are correctly set:
- path_to_S2 → Sentinel-2 images
- path_to_SM → reference maps

Training / Inference
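Before launching training or inference, it can help to verify that the `path_to_S2` and `path_to_SM` entries resolve on the local machine. The sketch below assumes the stored paths are relative to a single data root, as the example record suggests, and that the metadata rows are available as dictionaries. `find_missing_paths` and `data_root` are hypothetical names.

```python
from pathlib import Path


def find_missing_paths(records, data_root, columns=("path_to_S2", "path_to_SM")):
    """Return (key, column, resolved_path) for every path that does not exist."""
    root = Path(data_root)
    missing = []
    for rec in records:
        for col in columns:
            # Joining with an absolute path simply yields that absolute path.
            resolved = root / rec[col]
            if not resolved.exists():
                missing.append((rec["key"], col, str(resolved)))
    return missing
```

An empty result means every listed path was found under `data_root`; otherwise the returned tuples show which samples and columns need fixing.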
Main scripts are located in:
scripts/
Examples:
python scripts/train.py
python scripts/inference.py

Dataset Analysis
Scripts available in:
script_dataset/
Useful for:
- class distribution analysis
- temporal distribution
- caption analysis
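To illustrate the kind of statistics involved, here is a small sketch that counts samples per value of a metadata column and per (split, month) pair. It is not taken from the scripts in this repository. It assumes the metadata rows are available as dictionaries keyed by column name, for example via pandas `read_parquet(...).to_dict(orient="records")`, and the function names are illustrative.

```python
from collections import Counter


def column_distribution(records, column):
    """Count how often each value of `column` occurs across the records."""
    return Counter(rec[column] for rec in records)


def split_by_month(records):
    """Count samples per (set, month) pair to inspect temporal balance."""
    return Counter((rec["set"], rec["month"]) for rec in records)
```

Calling `column_distribution(records, "number_of_classes")` gives a histogram of how many classes appear per sample, while `split_by_month(records)` shows whether some months are missing from a split.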
The dataset combines information from:
- Sentinel-2 (RGB imagery)
- Sentinel-2 (multi-spectral imagery)
- Sentinel-1 (SAR imagery)
- reference land cover maps
Manually annotated captions can be used as:
- ground truth
- benchmark for generative models
Copyright (c) 2026 Tosato Lucrezia MIT License for Sentinel2Cap dataset
Copyright (c) 2026 Tosato Lucrezia, Gianluca Lombardi CC BY 4.0 License for the code
- Lucrezia Tosato: ltosato (at) sarmap.ch
- Gianluca Lombardi: gianluca.lombardi.fr (at) gmail.com
- Ronny Hansch: rww.haensch (at) gmail.com
The paper is under review; for the moment, please use the following citation: xxx