# MMFlood Dataset

## Intro

MMFlood is a multi-modal dataset for flood detection and monitoring from satellite imagery. It combines optical and SAR observations; because SAR penetrates cloud cover, floods can still be mapped when optical imagery is obscured. The dataset targets rapid flood mapping for disaster response, risk assessment, and emergency management.

## Dataset Characteristics

- **Modalities**:
  - Sentinel-1 SAR imagery (C-band)
  - Sentinel-2 optical imagery
- **Spatial Resolution**: 10 m (harmonized across sensors)
- **Temporal Resolution**: Pre- and post-flood event pairs
- **Spectral Bands**:
  - S1: VV and VH polarizations
  - S2: 10 bands (B02, B03, B04, B05, B06, B07, B08, B8A, B11, B12)
- **Image Dimensions**: 512 × 512 pixels per patch
- **Labels**: Binary flood segmentation masks
  - Flooded vs. non-flooded areas
  - Permanent water bodies excluded from the flood class
- **Geographic Distribution**: Global coverage of major flood events
- **Temporal Coverage**: Multi-year flood events (2016–2021)
- **Event Types**: River floods, coastal floods, flash floods
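
To make the band layout above concrete, the following is a minimal sketch of how the listed bands could be indexed once stacked channel-wise. The band names come from the list above; the `BANDS` dictionary, the helper names, and the channel-first stacking order are illustrative assumptions, not part of the GeoBenchV2 API.

```python
# Band names copied from the characteristics list above. The dictionary
# layout and helper names are illustrative assumptions, not GeoBenchV2 API.
BANDS = {
    "s1": ["VV", "VH"],
    "s2": ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"],
}
PATCH_SIZE = 512  # pixels per side, as listed under Image Dimensions


def band_indices(modality, names):
    """Return the channel positions of `names` within a modality's band list."""
    order = BANDS[modality]
    return [order.index(name) for name in names]


def stacked_shape(modalities):
    """Shape (channels, height, width) if the modalities were stacked channel-wise."""
    channels = sum(len(BANDS[m]) for m in modalities)
    return (channels, PATCH_SIZE, PATCH_SIZE)


# Sentinel-2 true color uses B04 (red), B03 (green), B02 (blue).
rgb = band_indices("s2", ["B04", "B03", "B02"])
print(rgb)
print(stacked_shape(["s1", "s2"]))
```

Looking up indices by band name rather than hard-coding positions keeps plotting and band-selection code valid if the band order changes.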

## Dataset Setup and Initialization

In [None]:
from pathlib import Path
from geobench_v2.datamodules import GeoBenchMMFloodDataModule

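# Set up paths (relative to the notebook's working directory)
PROJECT_ROOT = Path("../../")

# Initialize the datamodule
datamodule = GeoBenchMMFloodDataModule(
    img_size=512,  # matches the 512x512 patch size
    batch_size=4,
    num_workers=4,
    root=PROJECT_ROOT / "data" / "mmflood",
    download=True,
)

# Prepare the train/validation datasets, then the test dataset
datamodule.setup("fit")
datamodule.setup("test")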

print("MMFlood datamodule initialized successfully!")
print(f"Training samples: {len(datamodule.train_dataset)}")
print(f"Validation samples: {len(datamodule.val_dataset)}")
print(f"Test samples: {len(datamodule.test_dataset)}")

## Geographic Distribution Visualization

The flood events in MMFlood are spread across the globe and cover diverse hydrological and climatic conditions. The call below plots where the samples are located:

In [None]:
geo_fig = datamodule.visualize_geospatial_distribution()

## Sample Data Visualization

Each batch pairs multi-modal satellite imagery with its flood segmentation masks. The call below plots one batch and returns both the figure and the batch:

In [None]:
fig, batch = datamodule.visualize_batch()

## GeoBenchV2 Processing Pipeline

### Preprocessing Steps

1. **Multi-Modal Data Fusion**:
   - Co-registered Sentinel-1 SAR and Sentinel-2 optical imagery
   - Applied geometric correction and temporal alignment
   - Harmonized spatial resolution to 10m common grid

2. **Flood Event Processing**:
   - Identified pre-flood and post-flood image pairs
   - Applied change detection algorithms for flood mapping
   - Integrated permanent water body masks for accurate flood delineation

3. **Quality Control and Filtering**:
   - Filtered images with excessive cloud cover in optical data
   - Applied temporal consistency checks for flood persistence
   - Maintained diversity across different flood types and magnitudes

4. **Split Generation**:
   - Applied event-based splitting to prevent temporal data leakage
   - Used geographic clustering for spatial independence
   - Maintained flood type diversity across train/validation/test splits
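
The event-based splitting in step 4 can be sketched as follows. This is a simplified illustration that assigns whole events to splits with a stable hash of the event ID; it is not the procedure GeoBenchV2 actually uses, and it does not reproduce the geographic clustering or flood-type balancing. The function names, split fractions, and event identifiers are hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_split(event_id, val_frac=0.1, test_frac=0.2):
    """Map a flood event ID to a split using a stable hash of the ID."""
    digest = hashlib.sha256(event_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_patches(patches):
    """Group (patch_id, event_id) pairs by split, keeping each event together."""
    splits = defaultdict(list)
    for patch_id, event_id in patches:
        splits[assign_split(event_id)].append(patch_id)
    return dict(splits)


# Hypothetical patch and event identifiers, for illustration only.
patches = [
    (f"{event}_{i}", event)
    for event in ("EVT-A", "EVT-B", "EVT-C")
    for i in range(3)
]
splits = split_patches(patches)
print(splits)
```

Because the split is decided per event rather than per patch, all patches from one flood event land in the same split, which is the property that prevents leakage between training and evaluation data.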

### Label Processing
- **Binary Flood Segmentation**: Each pixel is labeled flooded or non-flooded, giving flood boundaries usable for emergency response
- **Multi-Modal Validation**: Flood maps validated using both SAR and optical observations
- **Expert Validation**: Flood extent validated by hydrological and disaster management experts
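
As a rough illustration of what a binary flood label with permanent water excluded means, the sketch below marks a pixel as flooded only if it is water after the event and not permanent water. The toy rasters and function names are made up for this example, and plain Python lists are used to keep it dependency-free; real masks would normally be array-based.

```python
def flood_mask(post_water, permanent_water):
    """Flooded = water in the post-event map that is not permanent water."""
    return [
        [int(post and not perm) for post, perm in zip(post_row, perm_row)]
        for post_row, perm_row in zip(post_water, permanent_water)
    ]


def flooded_fraction(mask):
    """Share of pixels labeled as flooded."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


# Toy 3x4 rasters (1 = water), made up for illustration.
post_water = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
permanent_water = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

mask = flood_mask(post_water, permanent_water)
print(mask)
print(flooded_fraction(mask))
```

In this toy case the two permanent-water pixels in the first column are dropped from the label, so only newly inundated pixels count as flooded.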
