# SpaceNet2 Dataset

## Intro

SpaceNet2 [(Van Etten et al. 2018)](https://arxiv.org/abs/1807.01232) is a building footprint extraction dataset built on very high-resolution optical satellite imagery of four cities around the world. Each image is paired with building footprint annotations, which supports automated building detection and mapping for urban planning, disaster response, and infrastructure monitoring. The cities differ widely in architectural style and urban development pattern, which is what makes accurate building extraction across them challenging.

## Dataset Characteristics

- **Modalities**: 
  - Very high-resolution satellite imagery
- **Spatial Resolution**: 0.3 m (pan-sharpened)
- **Temporal Resolution**: Single acquisition per location
- **Spectral Bands**:
  - RGB: 3 channels (Red, Green, Blue)
- **Image Dimensions**: 512 × 512 pixels per patch
- **Labels**:
  - Binary segmentation (building vs. non-building)
- **Geographic Distribution**: Global urban areas (Las Vegas, Paris, Shanghai, Khartoum)
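
To make the scale of a patch concrete, the short calculation below converts the listed patch size and spatial resolution into a ground footprint. It uses only the figures from the list above; the variable names are illustrative.

```python
# Ground footprint of one patch, derived from the characteristics listed above.
PATCH_SIZE_PX = 512  # pixels per side
GSD_M = 0.3          # metres per pixel (pan-sharpened)

side_m = PATCH_SIZE_PX * GSD_M   # length of one patch edge in metres
area_ha = side_m**2 / 10_000     # 1 hectare = 10,000 square metres

print(f"Patch side: {side_m:.1f} m")
print(f"Patch area: {area_ha:.2f} ha")
```

Each patch therefore covers roughly a 150 m by 150 m block, enough to contain many individual buildings in dense urban areas.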

## Dataset Setup and Initialization

```python
from pathlib import Path
from geobench_v2.datamodules import GeoBenchSpaceNet2DataModule

# Setup paths
PROJECT_ROOT = Path("../../")

# Initialize datamodule
datamodule = GeoBenchSpaceNet2DataModule(
    img_size=512,
    batch_size=4,
    num_workers=4,
    root=PROJECT_ROOT / "data" / "spacenet2",
    download=True,
)
datamodule.setup("fit")
datamodule.setup("test")

print("SpaceNet2 datamodule initialized successfully!")
print(f"Training samples: {len(datamodule.train_dataset)}")
print(f"Validation samples: {len(datamodule.val_dataset)}")
print(f"Test samples: {len(datamodule.test_dataset)}")
```

## Geographic Distribution Visualization

The SpaceNet2 dataset covers diverse global urban areas, representing various architectural styles and urban development patterns:

```python
geo_fig = datamodule.visualize_geospatial_distribution()
```

## Sample Data Visualization

The dataset provides very high-resolution satellite imagery with precise building footprint segmentation for urban analysis:

```python
fig, batch = datamodule.visualize_batch()
```
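
Besides the figure, `visualize_batch()` returns the batch itself, and it can help to check what it contains before training. The helper below is a minimal sketch that assumes the batch is a dictionary-like object whose tensor entries expose `shape` and `dtype`; that structure is an assumption, not something this notebook confirms. `describe_batch`, the `_FakeTensor` stand-in, and the `"image"` key are all hypothetical names used only for illustration.

```python
from typing import Any, Mapping


def describe_batch(batch: Mapping[str, Any]) -> dict[str, str]:
    """Summarise each batch entry as a short string (hypothetical helper)."""
    summary = {}
    for key, value in batch.items():
        shape = getattr(value, "shape", None)
        if shape is not None:
            # Tensor-like entry: report its shape and dtype.
            dtype = getattr(value, "dtype", "unknown dtype")
            summary[key] = f"shape={tuple(shape)}, dtype={dtype}"
        else:
            # Anything else (strings, lists, ...): report only the type name.
            summary[key] = type(value).__name__
    return summary


class _FakeTensor:
    """Stand-in for a tensor so the sketch runs without the dataset."""

    shape = (4, 3, 512, 512)
    dtype = "float32"


# The "image" key is an assumed name, not taken from the datamodule.
demo = describe_batch({"image": _FakeTensor(), "note": "demo"})
for key, description in demo.items():
    print(f"{key}: {description}")
```

With the real datamodule, the same call would be `describe_batch(batch)`, provided the returned batch behaves like a mapping.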

## GeoBenchV2 Processing Pipeline

### Preprocessing Steps

1. **Split Generation**:
   - For each of the four areas of interest (AOIs), we use a checkerboard-style split: a grid is overlaid on the AOI's extent, and each sample is assigned to the grid cell it falls in. Each grid cell is then assigned to train, val, or test so that the overall distribution is roughly 70/10/20.

2. **Dataset Subsampling**:
   - The final version consists of:
     - 4,000 training samples
     - 1,000 validation samples
     - 2,000 test samples
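
The checkerboard idea from the split generation step can be sketched in a few lines. This is an illustration under stated assumptions, not the GeoBenchV2 implementation: samples are represented by hypothetical `(x, y)` centroid coordinates, the cell size is arbitrary, and cells are assigned to splits by a seeded random shuffle, whereas the text above only states that the assignment yields roughly 70/10/20.

```python
import random
from collections import Counter


def checkerboard_split(points, cell_size, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign each (x, y) point to 'train', 'val' or 'test' by grid cell."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)

    # Map every sample to the (column, row) index of the cell containing it.
    cells = [
        (int((x - min_x) // cell_size), int((y - min_y) // cell_size))
        for x, y in points
    ]

    # Shuffle the occupied cells (assumed strategy), then cut the shuffled
    # list at the target fractions so whole cells go to a single split.
    unique_cells = sorted(set(cells))
    random.Random(seed).shuffle(unique_cells)
    n_cells = len(unique_cells)
    n_train = round(fractions[0] * n_cells)
    n_val = round(fractions[1] * n_cells)
    n_test = n_cells - n_train - n_val
    names = ["train"] * n_train + ["val"] * n_val + ["test"] * n_test
    cell_to_split = dict(zip(unique_cells, names))

    return [cell_to_split[cell] for cell in cells]


# Synthetic AOI: a 100 x 100 lattice of sample centroids, 10-unit cells.
points = [(x + 0.5, y + 0.5) for x in range(100) for y in range(100)]
splits = checkerboard_split(points, cell_size=10)
print(Counter(splits))
```

Because whole cells are assigned to a single split, neighbouring samples tend to stay together, which limits spatial leakage between train and test compared with a per-sample random split.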

## References

1. Van Etten, A., Lindenbaum, D. and Bacastow, T.M., 2018. SpaceNet: A remote sensing dataset and challenge series. arXiv preprint arXiv:1807.01232.