# KuroSiwo Flood Mapping Dataset

## Intro

KuroSiwo [(Bountos et al. 2024)](https://proceedings.neurips.cc/paper_files/paper/2024/hash/43612b0662cb6a4986edf859fd6ebafe-Abstract-Datasets_and_Benchmarks_Track.html) is a global, multi-temporal satellite dataset for rapid flood detection and monitoring with Synthetic Aperture Radar (SAR) imagery. It combines SAR Ground Range Detected (GRD) products with minimally preprocessed Single Look Complex (SLC) data, so researchers can use both amplitude and phase information when developing flood mapping algorithms and downstream applications.
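
Each SLC pixel is a complex number, so amplitude and phase follow directly from its magnitude and argument. The minimal NumPy sketch below uses a synthetic array, and `slc_to_amplitude_phase` is a hypothetical helper; it does not reflect how KuroSiwo or GeoBenchV2 actually store or load SLC products.

```python
import numpy as np


def slc_to_amplitude_phase(slc: np.ndarray, eps: float = 1e-10):
    """Hypothetical helper: split complex SLC pixels into amplitude, phase and dB intensity."""
    amplitude = np.abs(slc)  # |z|
    phase = np.angle(slc)  # arg(z) in radians, range (-pi, pi]
    # Backscatter power |z|^2 in decibels; eps avoids log10(0) for empty pixels
    intensity_db = 10.0 * np.log10(amplitude**2 + eps)
    return amplitude, phase, intensity_db


# Synthetic 2x2 "SLC" patch for illustration only
slc = np.array([[1 + 0j, 0 + 1j],
                [-1 + 0j, 3 + 4j]], dtype=np.complex64)
amp, phase, db = slc_to_amplitude_phase(slc)
```

GRD products keep only the amplitude (intensity) part of this information, which is why the SLC data is the source of phase.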

## Dataset Characteristics

- **Modalities**: 
  - Sentinel-1 SAR imagery (VV and VH polarizations)
  - Digital Elevation Model (DEM) auxiliary data
  - Slope auxiliary data derived from DEM
- **Spatial Resolution**: 10m ground sample distance
- **Temporal Resolution**: Multi-temporal observations (pre-event, event, post-event)
- **Bands** (polarizations):
  - SAR VV polarization
  - SAR VH polarization
- **Image Dimensions**: Variable patch sizes (typically 256x256 to 512x512 pixels)
- **Labels**: 4-class flood segmentation masks
  - Class 0: No-Water
  - Class 1: Permanent Water
  - Class 2: Flood Water 
  - Class 3: Invalid/No-Data
- **Geographic Distribution**: Multiple Areas of Interest (AOIs) across different flood-prone regions
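
The label convention above suggests treating class 3 as "ignore" when computing statistics or losses. A minimal NumPy sketch, assuming masks arrive as integer arrays containing only these four values; `class_fractions` is a hypothetical helper, not part of GeoBenchV2.

```python
import numpy as np

# Class IDs as listed above
NO_WATER, PERMANENT_WATER, FLOOD, INVALID = 0, 1, 2, 3
CLASS_NAMES = {NO_WATER: "No-Water", PERMANENT_WATER: "Permanent Water", FLOOD: "Flood Water"}


def class_fractions(mask: np.ndarray) -> dict:
    """Fraction of each valid class among pixels that are not Invalid/No-Data."""
    valid = mask[mask != INVALID]  # 1-D array of valid pixel labels
    if valid.size == 0:
        return {name: 0.0 for name in CLASS_NAMES.values()}
    counts = np.bincount(valid, minlength=3)[:3]
    return {CLASS_NAMES[c]: counts[c] / valid.size for c in CLASS_NAMES}


# Toy 2x4 mask: two Invalid pixels are excluded from the denominator
mask = np.array([[0, 0, 2, 3],
                 [1, 2, 2, 3]])
fr = class_fractions(mask)
```

In a PyTorch training loop the same convention maps onto the `ignore_index` argument of `torch.nn.CrossEntropyLoss` (e.g. `ignore_index=3`), assuming the datamodule does not already remap the invalid class.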

## Dataset Setup and Initialization

```python
from pathlib import Path
from geobench_v2.datamodules import GeoBenchKuroSiwoDataModule

# Setup paths
PROJECT_ROOT = Path("../../")

# Initialize datamodule
datamodule = GeoBenchKuroSiwoDataModule(
    img_size=256,
    batch_size=8,
    num_workers=4,
    root=PROJECT_ROOT / "data" / "kuro_siwo",
    download=True,
)
datamodule.setup("fit")   # builds the train and validation datasets
datamodule.setup("test")  # builds the test dataset

print("Kuro-Siwo datamodule initialized successfully!")
print(f"Training samples: {len(datamodule.train_dataset)}")
print(f"Validation samples: {len(datamodule.val_dataset)}")
print(f"Test samples: {len(datamodule.test_dataset)}")
```
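
With `batch_size=8` and the subsampled split sizes listed under Dataset Subsampling, the number of batches per epoch follows from simple division. The sketch assumes the loaders neither repeat nor oversample data; whether GeoBenchV2's loaders set `drop_last` is not stated here, but with these sizes both choices give the same counts. `batches_per_epoch` is a hypothetical helper.

```python
import math

# Split sizes from the "Dataset Subsampling" step; batch_size from the cell above
split_sizes = {"train": 4000, "val": 1000, "test": 2000}
batch_size = 8


def batches_per_epoch(n_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a loader yields per pass over n_samples."""
    return n_samples // batch_size if drop_last else math.ceil(n_samples / batch_size)


steps = {split: batches_per_epoch(n, batch_size) for split, n in split_sizes.items()}
print(steps)  # {'train': 500, 'val': 125, 'test': 250}
```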

## Geographic Distribution Visualization

```python
geo_fig = datamodule.visualize_geospatial_distribution()
```

## Sample Data Visualization

```python
fig, batch = datamodule.visualize_batch()
```
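
`visualize_batch` returns a ready-made figure. For custom inspection outside the datamodule, a common way to render dual-polarization SAR is a false-color composite with VV, VH and their ratio as the three channels. The sketch below is not GeoBenchV2's implementation: it assumes VV and VH are already in decibels (the on-disk scaling of the patches is not stated here), uses synthetic inputs, and the helper names are hypothetical.

```python
import numpy as np


def percentile_stretch(x: np.ndarray, lo: float = 2.0, hi: float = 98.0) -> np.ndarray:
    """Clip x to its [lo, hi] percentiles and rescale to [0, 1]."""
    p_lo, p_hi = np.percentile(x, [lo, hi])
    if p_hi <= p_lo:  # constant channel: nothing to stretch
        return np.zeros_like(x, dtype=np.float64)
    return np.clip((x - p_lo) / (p_hi - p_lo), 0.0, 1.0)


def sar_false_color(vv_db: np.ndarray, vh_db: np.ndarray) -> np.ndarray:
    """H x W x 3 composite: R=VV, G=VH, B=VV-VH (a dB difference is a linear ratio)."""
    ratio_db = vv_db - vh_db
    return np.stack([percentile_stretch(c) for c in (vv_db, vh_db, ratio_db)], axis=-1)


# Synthetic dB backscatter, roughly in a typical land range
rng = np.random.default_rng(0)
vv = rng.normal(-12, 3, size=(64, 64))
vh = rng.normal(-19, 3, size=(64, 64))
rgb = sar_false_color(vv, vh)
print(rgb.shape)  # (64, 64, 3)
```

The result can be passed directly to `matplotlib.pyplot.imshow`. In such composites, open water usually appears dark, because smooth water surfaces reflect the radar pulse away from the sensor.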

## GeoBenchV2 Processing Pipeline

### Preprocessing Steps

1. **Split Generation**:
   - Use the train/validation/test splits defined by the original dataset

2. **Dataset Subsampling**:
   - The final version contains:
     - 4,000 training samples
     - 1,000 validation samples
     - 2,000 test samples
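
The two steps above amount to keeping the original split assignment and subsampling within each split, so no sample moves between train, validation and test. A minimal sketch, assuming uniform random sampling with a fixed seed; the source does not say how GeoBenchV2 selects the retained samples (it may, for example, stratify by AOI or class balance). `subsample_split` is hypothetical and the ID lists are synthetic placeholders, not real KuroSiwo split sizes.

```python
import random

TARGET_SIZES = {"train": 4000, "val": 1000, "test": 2000}


def subsample_split(ids: list, n: int, seed: int = 0) -> list:
    """Reproducibly draw up to n ids without replacement from one split."""
    if len(ids) <= n:
        return list(ids)
    rng = random.Random(seed)
    return sorted(rng.sample(ids, n))


# Synthetic, disjoint ID ranges standing in for the original splits
original = {
    "train": list(range(0, 20000)),
    "val": list(range(20000, 23000)),
    "test": list(range(23000, 28000)),
}
subsets = {split: subsample_split(ids, TARGET_SIZES[split]) for split, ids in original.items()}
print({k: len(v) for k, v in subsets.items()})  # {'train': 4000, 'val': 1000, 'test': 2000}
```

Sampling within each split, rather than pooling and re-splitting, preserves whatever separation the original splits enforce between events or regions.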

## References

1. Bountos, N.I., Sdraka, M., Zavras, A., Karavias, A., Karasante, I., Herekakis, T., Thanasou, A., Michail, D. and Papoutsis, I., 2024. Kuro Siwo: 33 billion $m^2$ under the water. A global multi-temporal satellite dataset for rapid flood mapping. Advances in Neural Information Processing Systems, 37, pp.38105-38121. https://proceedings.neurips.cc/paper_files/paper/2024/hash/43612b0662cb6a4986edf859fd6ebafe-Abstract-Datasets_and_Benchmarks_Track.html