# TreeSatAI Time Series Dataset

## Intro

TreeSatAI Time Series is a dataset for tree species classification from multi-temporal Sentinel-2 satellite imagery. It pairs optical time series with forest inventory annotations, which supports automated tree species identification and forest monitoring. Its main use is large-scale species mapping for biodiversity assessment, forest management, and ecological monitoring.

## Dataset Characteristics

- **Modalities**: 
  - Multi-temporal Sentinel-2 optical imagery
- **Spatial Resolution**: 10m (the native 20m bands B05, B06, B07, B8A, B11, and B12 are resampled to 10m)
- **Temporal Resolution**: Monthly time series across growing seasons
- **Spectral Bands**: 
  - S2: 10 bands (B02, B03, B04, B05, B06, B07, B08, B8A, B11, B12)
- **Image Dimensions**: 32x32 pixels per patch (forest stand level)
- **Labels**: Tree species classification
  - 13 major European tree species
  - Including deciduous and coniferous species
- **Geographic Distribution**: Lower Saxony, Germany
- **Temporal Coverage**: Multi-year time series (2018-2020)
- **Forest Types**: Diverse European forest ecosystems
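
The characteristics above imply that each sample is a stack of 10-band, 32x32 patches over time. As a minimal sketch, assuming a `(T, C, H, W)` axis order, 12 time steps, and reflectance values between 0 and 1 (none of which is confirmed here for the actual tensors), the following computes a mean NDVI curve from the B08 and B04 bands on synthetic data. The helper name `ndvi_series` is hypothetical.

```python
import numpy as np

# Band order as listed above; whether stored arrays follow this order is an assumption.
S2_BANDS = ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]


def ndvi_series(ts: np.ndarray, bands=S2_BANDS, eps: float = 1e-6) -> np.ndarray:
    """Return the patch-mean NDVI per time step for a (T, C, H, W) array."""
    red = ts[:, bands.index("B04")].astype(np.float64)
    nir = ts[:, bands.index("B08")].astype(np.float64)
    ndvi = (nir - red) / (nir + red + eps)  # eps avoids division by zero
    return ndvi.mean(axis=(1, 2))


# Synthetic stand-in for one sample: 12 time steps, 10 bands, 32x32 pixels.
rng = np.random.default_rng(0)
patch_ts = rng.uniform(0.01, 0.6, size=(12, len(S2_BANDS), 32, 32))

curve = ndvi_series(patch_ts)
print(curve.shape)  # (12,)
```

On real samples, a curve like this is a quick way to check that the seasonal signal the time series is meant to capture is actually present.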

## Dataset Setup and Initialization

```python
from pathlib import Path

from geobench_v2.datamodules import GeoBenchTreeSatAIDataModule

# Setup paths
PROJECT_ROOT = Path("../../")

# Initialize datamodule
datamodule = GeoBenchTreeSatAIDataModule(
    img_size=32,
    batch_size=32,
    num_workers=4,
    root=PROJECT_ROOT / "data" / "treesatai",
    download=True,
)
datamodule.setup("fit")
datamodule.setup("test")

print("TreeSatAI datamodule initialized successfully!")
print(f"Training samples: {len(datamodule.train_dataset)}")
print(f"Validation samples: {len(datamodule.val_dataset)}")
print(f"Test samples: {len(datamodule.test_dataset)}")
```
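
Once the datamodule is set up, it can help to check what a batch contains before training. The helper below is a sketch rather than part of GeoBenchV2: `describe_batch` is a hypothetical name, and the stand-in batch uses assumed keys (`image`, `label`, `ids`) and an assumed `(B, T, C, H, W)` layout, since the real batch structure is not documented here.

```python
import numpy as np


def describe_batch(batch: dict) -> dict:
    """Map each key to its array shape, or to the value's type name if it has none."""
    summary = {}
    for key, value in batch.items():
        shape = getattr(value, "shape", None)  # works for NumPy arrays and torch tensors
        summary[key] = tuple(shape) if shape is not None else type(value).__name__
    return summary


# Stand-in batch: keys, axis order, and the 12 time steps are hypothetical.
fake_batch = {
    "image": np.zeros((32, 12, 10, 32, 32), dtype=np.float32),
    "label": np.zeros(32, dtype=np.int64),
    "ids": ["a", "b"],
}
summary = describe_batch(fake_batch)
print(summary["image"])  # (32, 12, 10, 32, 32)

# With the real datamodule, assuming it exposes the usual Lightning dataloaders:
# batch = next(iter(datamodule.train_dataloader()))
# print(describe_batch(batch))
```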

## Geographic Distribution Visualization

The following cell plots the geographic distribution of the TreeSatAI samples, which are located in Lower Saxony, Germany:

```python
geo_fig = datamodule.visualize_geospatial_distribution()
```

## Sample Data Visualization

The following cell plots a batch of multi-temporal Sentinel-2 samples together with their tree species labels:

```python
fig, batch = datamodule.visualize_batch()
```

## GeoBenchV2 Processing Pipeline

### Preprocessing Steps

1. **Multi-Temporal Forest Analysis**:
   - Created monthly time series from Sentinel-2 acquisitions across growing seasons
   - Applied cloud masking and temporal interpolation for data gaps
   - Optimized temporal sampling for phenological discrimination

2. **Forest Inventory Integration**:
   - Aligned satellite imagery with detailed forest inventory data
   - Applied spatial aggregation to forest stand level (32x32 pixels)
   - Integrated ground truth from national forest inventory programs

3. **Quality Control and Filtering**:
   - Removed forest stands with mixed species or unclear boundaries
   - Applied temporal completeness filtering for reliable species classification
   - Maintained representation across all major European tree species

4. **Split Generation**:
   - Applied geographic clustering so that spatially autocorrelated samples do not end up in different splits
   - Used forest region-based splitting for ecological independence
   - Maintained species diversity and forest type representation across splits
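
Two of the steps above, temporal interpolation of masked observations and spatially grouped splitting, can be sketched as follows. This is not the GeoBenchV2 implementation. `fill_gaps` and `block_split` are hypothetical names, the interpolation is plain linear interpolation on a single 1-D series, and the "geographic clustering" is replaced here by a simpler fixed-grid block split. The 0.25-degree cell size and the split fractions are arbitrary assumptions.

```python
import hashlib
import math

import numpy as np


def fill_gaps(values: np.ndarray, valid: np.ndarray) -> np.ndarray:
    """Linearly interpolate a 1-D time series over steps flagged invalid (e.g. cloudy)."""
    if not valid.any():
        return values.copy()  # nothing to interpolate from
    steps = np.arange(len(values))
    # np.interp holds the first/last valid value constant outside the valid range.
    return np.interp(steps, steps[valid], values[valid])


def block_split(lon: float, lat: float, cell_deg: float = 0.25,
                val_frac: float = 0.1, test_frac: float = 0.2) -> str:
    """Assign a sample to a split based on the grid cell that contains it."""
    cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
    # Hash the cell index so the assignment is deterministic and roughly uniform.
    digest = hashlib.md5(f"{cell[0]}_{cell[1]}".encode()).hexdigest()
    u = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


# One band value over five dates; dates 1 and 2 are masked as cloudy.
series = np.array([0.30, 0.0, 0.0, 0.60, 0.55])
clear = np.array([True, False, False, True, True])
filled = fill_gaps(series, clear)
print(np.round(filled, 2))

# Two arbitrary points inside the same grid cell always share a split.
same_split = block_split(9.71, 52.37) == block_split(9.74, 52.30)
print(same_split)  # True
```

Because every sample in a grid cell lands in the same split, neighbouring patches cannot leak between training and evaluation, at the cost of split sizes that only approximate the requested fractions.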

### Label Processing
- **13-Class Species Classification**: Comprehensive European tree species identification
- **Phenological Validation**: Species classification validated using seasonal patterns
- **Forest Inventory Cross-Reference**: Labels validated against national forest inventory data

## References

1. Ahlswede, S., Schulz, C., Gava, C., Helber, P., Bischke, B., Förster, M., ... & Kleinschmit, B. (2023). TreeSatAI Benchmark Archive: A multi-sensor, multi-label dataset for tree species classification in remote sensing. *Earth System Science Data*, 15(2), 681-695.

2. TreeSatAI Dataset: https://huggingface.co/datasets/IGNF/TreeSatAI-Time-Series

3. Forest Species Classification: Immitzer, M., Atzberger, C., & Koukal, T. (2012). Tree species classification with random forest using very high spatial resolution 8-band WorldView-2 satellite data. *Remote Sensing*, 4(9), 2661-2693.

4. Multi-Temporal Forest Monitoring: Pasquarella, V. J., Holden, C. E., Kaufman, L., & Woodcock, C. E. (2016). From imagery to ecology: leveraging time series of all available Landsat observations to map and monitor ecosystem state and dynamics. *Remote Sensing in Ecology and Conservation*, 2(3), 152-170.