# GFM Data Download Example

This notebook demonstrates how to download Global Flood Monitoring (GFM) flood data for Pakistan using the custom `GFMDownloader` module.

## Setup

First, let's import the necessary libraries and our custom GFM downloader.

In [None]:
import os

from ds_flood_gfm.gfm_downloader import GFMDownloader

## Initialize the GFM Downloader

Create a downloader instance that connects to the EODC STAC API.

In [None]:
# Initialize the downloader
downloader = GFMDownloader()
print("✅ Connected to EODC STAC API")

## Explore the GFM Collection

Let's get information about the Global Flood Monitoring collection.

In [None]:
# Get collection information
collection_info = downloader.get_collection_info()

print(f"Collection Title: {collection_info.get('title', 'N/A')}")
print(f"\nDescription: {collection_info.get('description', 'N/A')[:200]}...")

# Check temporal and spatial extent
if 'extent' in collection_info:
    extent = collection_info['extent']
    if 'temporal' in extent:
        temporal = extent['temporal']['interval'][0]
        print(f"\nTemporal Coverage: {temporal[0]} to {temporal[1] if temporal[1] else 'ongoing'}")
    
    if 'spatial' in extent:
        spatial = extent['spatial']['bbox'][0]
        print(f"Spatial Coverage: {spatial} (global)")
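The extent handling above relies on the STAC collection layout, where `extent.temporal.interval` and `extent.spatial.bbox` are each a list of lists. As a minimal sketch of the same lookup without the nested `if` checks, the hypothetical helper `summarize_extent` below works on a hand-written dictionary; the sample values are illustrative and are not the real GFM metadata.

```python
def summarize_extent(collection_info):
    """Return (start, end, bbox) from a STAC-style collection dict.

    Any part that is missing from the dictionary comes back as None.
    """
    extent = collection_info.get("extent", {})
    intervals = extent.get("temporal", {}).get("interval") or [[None, None]]
    bboxes = extent.get("spatial", {}).get("bbox") or [None]
    start, end = intervals[0]
    return start, end, bboxes[0]


# Illustrative metadata only (assumption, not fetched from the API)
sample = {
    "extent": {
        "temporal": {"interval": [["2015-01-01T00:00:00Z", None]]},
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
    }
}

start, end, bbox = summarize_extent(sample)
print(f"Temporal Coverage: {start} to {end or 'ongoing'}")
print(f"Spatial Coverage: {bbox}")
```

An open-ended interval uses `None` as its end, which is why the code falls back to the label "ongoing".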

## Search for Pakistan Flood Data

Let's search for flood data over Pakistan during the major 2022 flood event.

In [None]:
# Search for Pakistan floods during the 2022 monsoon season
items = downloader.search_pakistan_floods(
    start_date="2022-09-15",
    end_date="2022-09-16", 
    limit=10
)

print(f"Found {len(items)} GFM items for Pakistan floods")

# Display summary
downloader.print_item_summary(items)
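The internals of `search_pakistan_floods` are not shown in this notebook. As a rough sketch of what such a search might send to a STAC API, the hypothetical function `build_search_params` below assembles the standard `collections`, `bbox`, `datetime`, and `limit` parameters. The bounding box is an approximate outline of Pakistan and the collection ID `"GFM"` is an assumption; neither is taken from the downloader module.

```python
from datetime import date

# Approximate bounding box for Pakistan as (west, south, east, north).
# These coordinates are an assumption for illustration.
PAKISTAN_BBOX = (60.8, 23.6, 77.9, 37.1)


def build_search_params(start_date, end_date, limit=10, collection="GFM"):
    """Build a dictionary of STAC search parameters for a date range."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    return {
        "collections": [collection],  # collection ID is an assumption
        "bbox": list(PAKISTAN_BBOX),
        "datetime": f"{start.isoformat()}/{end.isoformat()}",
        "limit": limit,
    }


params = build_search_params("2022-09-15", "2022-09-16")
print(params)
```

Validating the dates before building the request catches a swapped start and end early, rather than returning an empty result set.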

## Examine Available Assets

Each GFM item contains multiple assets with different flood extent products from various algorithms.

In [None]:
# Examine the first item's assets
if len(items) > 0:
    first_item = items[0]
    print(f"Assets available for {first_item.id}:")
    print("=" * 60)
    
    for asset_name, asset in first_item.assets.items():
        print(f"\n{asset_name}:")
        print(f"  - Title: {asset.title}")
        print(f"  - Roles: {asset.roles}")
        print(f"  - Media Type: {asset.media_type}")
        
        # Highlight key flood extent assets
        if 'flood_extent' in asset_name:
            print("  🌊 FLOOD EXTENT DATA")
else:
    print("No items found for the specified search criteria.")
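The substring check used above to flag flood extent assets can also be used to pick asset names before downloading. A small sketch, using a hypothetical helper `select_assets` and a made-up list of names (only `ensemble_flood_extent` appears in this notebook; the others are placeholders):

```python
def select_assets(asset_names, keyword="flood_extent"):
    """Return the asset names that contain the keyword, sorted alphabetically."""
    return sorted(name for name in asset_names if keyword in name)


# Placeholder names for illustration; real items may expose different keys
example_names = [
    "ensemble_flood_extent",
    "example_likelihood",
    "example_flood_extent",
    "thumbnail",
]

print(select_assets(example_names))
```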

## Download Flood Data

Let's download the ensemble flood extent data for analysis.

In [None]:
# Download ensemble flood extent data
if len(items) > 0:
    print("Downloading ensemble flood extent data...")
    
    downloaded_files = downloader.download_item_assets(
        items[:1],  # Download first item
        download_dir="../data/gfm",
        asset_types=["ensemble_flood_extent"],
        create_subdirs=True
    )
    
    print("\nDownload completed!")
    print("Files downloaded:")
    
    for item_id, files in downloaded_files.items():
        print(f"\n{item_id}:")
        for file_path in files:
            file_size = os.path.getsize(file_path) / (1024*1024)  # Size in MB
            print(f"  - {os.path.basename(file_path)} ({file_size:.1f} MB)")
else:
    print("No data to download.")

## Examine Downloaded Data

Let's look at the structure of the downloaded flood data.

In [None]:
import rasterio
import numpy as np

if len(items) > 0 and downloaded_files:
    # Get the first downloaded TIF file
    item_id = list(downloaded_files.keys())[0]
    tif_files = [f for f in downloaded_files[item_id] if f.endswith('.tif')]
    
    if tif_files:
        flood_file = tif_files[0]
        
        print(f"Examining: {os.path.basename(flood_file)}")
        print("=" * 50)
        
        with rasterio.open(flood_file) as src:
            # Basic properties
            print(f"Shape: {src.shape}")
            print(f"CRS: {src.crs}")
            print(f"Bounds: {src.bounds}")
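            # Pixel size is reported in the units of the CRS (metres for projected grids)
            print(f"Resolution: {src.res[0]:g} x {src.res[1]:g} (CRS units)")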
            print(f"Data type: {src.dtypes[0]}")
            print(f"Nodata value: {src.nodata}")
            
            # Read a 1000 x 1000 pixel window from the top-left corner of band 1
            data_sample = src.read(1, window=((0, 1000), (0, 1000)))
            unique_values = np.unique(data_sample)
            
            print(f"\nSample data values: {unique_values}")
            print("Value meanings:")
            print("  0 = No flood")
            print("  1 = Flood detected (ensemble agreement)")
            print("  255 = Background/nodata")
            
            # Count pixels
            for val in unique_values:
                count = np.sum(data_sample == val)
                pct = 100 * count / data_sample.size
                print(f"  Value {val}: {count:,} pixels ({pct:.1f}%)")
    else:
        print("No TIF files found in downloaded data")
else:
    print("No data was downloaded to examine")
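The pixel counts printed above can be turned into a flooded area once the pixel size is known. The sketch below uses a synthetic array rather than a downloaded file, and the hypothetical helper `flood_area_km2` assumes square pixels in a projected CRS with metre units. The 20 m pixel size is an assumption for the example; in practice it would come from `src.res`.

```python
import numpy as np


def flood_area_km2(data, pixel_size_m, flood_value=1, nodata=255):
    """Return (flooded area in km^2, flooded fraction of valid pixels)."""
    data = np.asarray(data)
    flooded = int(np.count_nonzero(data == flood_value))
    valid = int(np.count_nonzero(data != nodata))
    area_km2 = flooded * pixel_size_m ** 2 / 1e6
    fraction = flooded / valid if valid else 0.0
    return area_km2, fraction


# Synthetic 100 x 100 tile: 1,000 flooded pixels and 1,000 nodata pixels
tile = np.zeros((100, 100), dtype=np.uint8)
tile[:10, :] = 1
tile[90:, :] = 255

area, fraction = flood_area_km2(tile, pixel_size_m=20)
print(f"Flooded area: {area:.2f} km^2 ({100 * fraction:.1f}% of valid pixels)")
```

Excluding nodata pixels from the denominator keeps the flooded fraction from being diluted by areas outside the sensor footprint.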

## Next Steps

Now that you have downloaded real GFM flood data, you can:

1. **Analyze flood patterns** using the GeoTIFF data
2. **Calculate affected population** by overlaying with population data
3. **Create visualizations** of flood extent and impact
4. **Run time series analysis** by downloading multiple dates

See the `gfm_affected_population_demo.ipynb` notebook for a complete example of calculating affected population using real data.
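As a rough illustration of step 2, the overlay reduces to summing population counts over flooded pixels once both rasters share the same grid. The sketch below uses tiny hand-made arrays and a hypothetical helper `affected_population`; it assumes the population raster has already been resampled to match the flood raster, which is the harder part in practice and is not shown here.

```python
import numpy as np


def affected_population(flood, population, flood_value=1):
    """Sum population counts over pixels where the flood raster equals flood_value."""
    flood = np.asarray(flood)
    population = np.asarray(population, dtype=float)
    if flood.shape != population.shape:
        raise ValueError("flood and population rasters must share the same grid")
    return float(population[flood == flood_value].sum())


# Hand-made 2 x 2 example: 1 = flood, 0 = no flood, 255 = nodata
flood = np.array([[0, 1], [1, 255]], dtype=np.uint8)
population = np.array([[10.0, 20.0], [5.0, 40.0]])

print(f"Affected population: {affected_population(flood, population):.0f}")
```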

In [None]:
print("📚 AVAILABLE ANALYSIS OPTIONS:")
print("")
print("🌊 Flood Analysis:")
print("   - Load GeoTIFF files with rasterio")
print("   - Calculate flood statistics (area, severity)")
print("   - Create flood extent visualizations")
print("")
print("👥 Population Impact:")
print("   - Overlay with GHSL population data")
print("   - Calculate affected population")
print("   - Generate impact reports")
print("")
print("⏰ Temporal Analysis:")
print("   - Download multiple dates")
print("   - Track flood evolution")
print("   - Create time series animations")
print("")
print("🗺️ Spatial Analysis:")
print("   - Integration with GIS systems")
print("   - Overlay with infrastructure data")
print("   - Regional impact assessment")