# 🛰️ Function 5: STAC Integration

## Building the `search_and_load_stac_data` Function

**Learning Objectives:**
- Understand STAC (SpatioTemporal Asset Catalog) concepts and standards
- Learn to search for satellite imagery using STAC APIs
- Master loading and processing remote sensing data
- Implement cloud-based geospatial data workflows
- Handle authentication and rate limiting in STAC requests

**Professional Context:**
STAC gives data providers a common way to describe and publish geospatial holdings, so satellite imagery, aerial photos, and other remote sensing datasets can be searched through one consistent API style. As a GIS professional, you'll regularly use STAC APIs to find this data. This function reflects the cloud-based geospatial workflows that are becoming an industry standard.

## 🎯 Function Overview

**Function Signature:**
```python
def search_and_load_stac_data(stac_url, collection_id, bbox, datetime_range, 
                             asset_key='visual', max_items=10, 
                             cloud_cover_max=20):
    """
    Search and load geospatial data using STAC (SpatioTemporal Asset Catalog) API.
    
    Parameters:
    -----------
    stac_url : str
        URL of the STAC API endpoint
    collection_id : str
        ID of the STAC collection to search
    bbox : list
        Bounding box [min_lon, min_lat, max_lon, max_lat]
    datetime_range : str
        Date range in ISO format (e.g., '2023-01-01/2023-12-31')
    asset_key : str, optional
        Asset key to load (default: 'visual')
    max_items : int, optional
        Maximum number of items to return (default: 10)
    cloud_cover_max : float, optional
        Maximum cloud cover percentage (default: 20)
    
    Returns:
    --------
    dict
        Dictionary containing search results, loaded data, and metadata
    """
```

**Key Capabilities:**
- 🔍 Search STAC catalogs with spatial and temporal filters
- ☁️ Filter by cloud cover using the `eo:cloud_cover` property
- 📊 Load the chosen raster asset from each matching item
- 🗺️ Record each dataset's CRS, bounds, and raster profile
- 📈 Generate metadata summaries and statistics
- 🔄 Report connection, search, and loading failures without crashing

## 🏗️ Implementation Strategy

Our STAC integration function will follow this workflow:

### Step 1: STAC Client Setup
```python
# Initialize STAC client with API endpoint
from pystac_client import Client
import rasterio
from rasterio.merge import merge
import numpy as np

# Connect to STAC API
catalog = Client.open(stac_url)
```

### Step 2: Search Configuration
```python
# Configure search parameters
search = catalog.search(
    collections=[collection_id],
    bbox=bbox,
    datetime=datetime_range,
    max_items=max_items,
    query={"eo:cloud_cover": {"lt": cloud_cover_max}}
)
```
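For orientation, the keyword arguments above correspond roughly to the JSON body of a STAC API `POST /search` request. The sketch below builds such a body with the standard library only. `build_search_body` and its `page_size` parameter are hypothetical names, not part of pystac-client, and the `query` filter assumes the server supports the Query extension. Note that `limit` in the STAC API is a page size, whereas pystac-client's `max_items` caps the total number of items returned across pages.

```python
import json

def build_search_body(collection_id, bbox, datetime_range,
                      cloud_cover_max=None, page_size=10):
    """Hypothetical helper: assemble a STAC API item-search request body."""
    body = {
        "collections": [collection_id],
        "bbox": list(bbox),
        "datetime": datetime_range,
        "limit": page_size,  # page size, not a cap on total results
    }
    if cloud_cover_max is not None:
        # Query extension syntax; servers without it may reject this field
        body["query"] = {"eo:cloud_cover": {"lt": cloud_cover_max}}
    return body

body = build_search_body("sentinel-2-l2a", [-122.5, 37.7, -122.3, 37.9],
                         "2023-07-01/2023-07-31", cloud_cover_max=15)
print(json.dumps(body, indent=2))
```

Passing `cloud_cover_max=None` leaves the `query` key out entirely, which is useful for collections that carry no cloud cover property.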

### Step 3: Data Loading and Processing
```python
# Collect the matching items, then read each item's asset into memory
items = list(search.items())
raster_arrays = []
for item in items:
    asset_url = item.assets[asset_key].href
    with rasterio.open(asset_url) as src:
        raster_arrays.append(src.read())
```
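Reading every pixel with `src.read()` can use a lot of memory for full-resolution scenes. One option, sketched below, is a decimated read: rasterio's `read` accepts an `out_shape` argument and resamples to that shape while reading. `decimated_shape` and `read_decimated` are hypothetical helper names, and the factor of 4 is only an illustrative default.

```python
def decimated_shape(band_count, height, width, factor=4):
    """Return the (bands, rows, cols) shape after shrinking by `factor`."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return (band_count, max(1, height // factor), max(1, width // factor))

def read_decimated(asset_url, factor=4):
    """Read a raster at reduced resolution to limit memory use."""
    import rasterio  # imported lazily so the shape helper works without it
    from rasterio.enums import Resampling

    with rasterio.open(asset_url) as src:
        out_shape = decimated_shape(src.count, src.height, src.width, factor)
        # The data is resampled to out_shape during the read
        return src.read(out_shape=out_shape, resampling=Resampling.average)

# A 3-band 10980 x 10980 scene shrinks to 2745 x 2745 per band
print(decimated_shape(3, 10980, 10980))  # (3, 2745, 2745)
```

With a factor of 4 the array holds one sixteenth of the original pixels, at the cost of spatial detail.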

## 🚀 Hands-On Example: Building the Function

Let's build the complete STAC integration function:

```python
def search_and_load_stac_data(stac_url, collection_id, bbox, datetime_range, 
                             asset_key='visual', max_items=10, 
                             cloud_cover_max=20):
    """
    Search and load geospatial data using STAC API.
    """
    import rasterio
    import numpy as np
    from pystac_client import Client
    import warnings
    warnings.filterwarnings('ignore')
    
    print(f"🛰️ Starting STAC data search...")
    print(f"   📍 STAC URL: {stac_url}")
    print(f"   📁 Collection: {collection_id}")
    print(f"   📅 Date Range: {datetime_range}")
    
    # Step 1: Initialize STAC client
    try:
        catalog = Client.open(stac_url)
        print(f"✅ Connected to STAC catalog")
    except Exception as e:
        print(f"❌ Failed to connect to STAC catalog: {e}")
        return {'error': f'Connection failed: {e}'}
    
    # Step 2: Configure and execute search
    try:
        search_params = {
            'collections': [collection_id],
            'bbox': bbox,
            'datetime': datetime_range,
            'max_items': max_items
        }
        
        if cloud_cover_max is not None:
            search_params['query'] = {"eo:cloud_cover": {"lt": cloud_cover_max}}
        
        search = catalog.search(**search_params)
        items = list(search.items())
        
        print(f"🔍 Found {len(items)} items matching criteria")
        
        if not items:
            return {
                'items_found': 0,
                'message': 'No items found matching search criteria',
                'search_params': search_params
            }
            
    except Exception as e:
        print(f"❌ Search failed: {e}")
        return {'error': f'Search failed: {e}'}
    
    # Step 3: Process items and collect metadata
    items_metadata = []
    loaded_data = []
    
    for i, item in enumerate(items[:max_items]):
        try:
            # Extract item metadata
            metadata = {
                'id': item.id,
                'datetime': str(item.datetime) if item.datetime else 'Unknown',
                'cloud_cover': item.properties.get('eo:cloud_cover', 'Unknown'),
                'assets': list(item.assets.keys()),
                'bbox': item.bbox if hasattr(item, 'bbox') else 'Unknown'
            }
            items_metadata.append(metadata)
            
            # Load specified asset if available
            if asset_key in item.assets:
                asset_url = item.assets[asset_key].href
                print(f"📥 Loading asset {i+1}/{min(len(items), max_items)}: {asset_key}")
                
                try:
                    with rasterio.open(asset_url) as src:
                        # Single-band assets load as a 2D array;
                        # multi-band assets load as (bands, rows, cols)
                        if src.count == 1:
                            data = src.read(1)
                        else:
                            data = src.read()
                        
                        profile = src.profile.copy()
                        
                        loaded_data.append({
                            'item_id': item.id,
                            'data': data,
                            'profile': profile,
                            'bounds': src.bounds,
                            'crs': str(src.crs),
                            'shape': data.shape,
                            'dtype': str(data.dtype)
                        })
                        
                except Exception as load_error:
                    print(f"⚠️ Failed to load asset for {item.id}: {load_error}")
            else:
                print(f"⚠️ Asset '{asset_key}' not found for {item.id}")
                print(f"   Available assets: {list(item.assets.keys())}")
                
        except Exception as item_error:
            print(f"❌ Error processing item {getattr(item, 'id', 'unknown')}: {item_error}")
            continue
    
    # Step 4: Calculate summary statistics
    cloud_covers = [item['cloud_cover'] for item in items_metadata 
                   if isinstance(item['cloud_cover'], (int, float))]
    
    datetimes = [item['datetime'] for item in items_metadata 
                if item['datetime'] != 'Unknown']
    
    summary_stats = {
        'total_items_found': len(items),
        'items_processed': len(items_metadata),
        'items_loaded': len(loaded_data),
        'date_range_actual': {
            'start': min(datetimes) if datetimes else None,
            'end': max(datetimes) if datetimes else None
        },
        'cloud_cover_stats': {
            'min': min(cloud_covers) if cloud_covers else None,
            'max': max(cloud_covers) if cloud_covers else None,
            'avg': np.mean(cloud_covers) if cloud_covers else None
        },
        'data_stats': {
            'total_datasets': len(loaded_data),
            'total_size_mb': sum([data['data'].nbytes / (1024*1024) for data in loaded_data]) if loaded_data else 0
        }
    }
    
    # Step 5: Return comprehensive results
    results = {
        'search_successful': True,
        'items_metadata': items_metadata,
        'loaded_data': loaded_data,
        'summary_stats': summary_stats,
        'search_parameters': {
            'stac_url': stac_url,
            'collection_id': collection_id,
            'bbox': bbox,
            'datetime_range': datetime_range,
            'asset_key': asset_key,
            'max_items': max_items,
            'cloud_cover_max': cloud_cover_max
        }
    }
    
    print(f"\n🎉 STAC search and load complete!")
    print(f"   📊 Found: {len(items)} items")
    print(f"   💾 Loaded: {len(loaded_data)} datasets")
    print(f"   📏 Total data: {summary_stats['data_stats']['total_size_mb']:.2f} MB")
    
    return results
```

## 🧪 Test the Function

Let's test our STAC integration function with a real example:

```python
# Test with Microsoft Planetary Computer STAC
# NOTE: This requires internet connectivity and pystac-client package

# Example STAC search parameters
test_params = {
    'stac_url': "https://planetarycomputer.microsoft.com/api/stac/v1",
    'collection_id': "sentinel-2-l2a",
    'bbox': [-122.5, 37.7, -122.3, 37.9],  # San Francisco Bay Area
    'datetime_range': "2023-07-01/2023-07-31",  # July 2023
    'asset_key': 'rendered_preview',  # Use preview for faster loading
    'max_items': 3,
    'cloud_cover_max': 15
}

print("🚀 Testing STAC integration function...\n")

# Run the search
try:
    results = search_and_load_stac_data(**test_params)
    
    # Display results
    if 'error' in results:
        print(f"❌ Error: {results['error']}")
    elif results.get('items_found') == 0:
        # Only the empty-result return value carries an 'items_found' key
        print(f"⚠️ No items found: {results.get('message', 'Unknown reason')}")
    else:
        print(f"\n📋 Search Results Summary:")
        print(f"   Items found: {results['summary_stats']['total_items_found']}")
        print(f"   Items loaded: {results['summary_stats']['items_loaded']}")
        print(f"   Total data size: {results['summary_stats']['data_stats']['total_size_mb']:.2f} MB")
        
        # Compare against None so an average of 0.0% is still printed
        if results['summary_stats']['cloud_cover_stats']['avg'] is not None:
            print(f"   Average cloud cover: {results['summary_stats']['cloud_cover_stats']['avg']:.1f}%")
        
        if results['items_metadata']:
            print(f"\n📅 Items found:")
            for item in results['items_metadata'][:3]:  # Show first 3
                print(f"   - ID: {item['id']}")
                print(f"     Date: {item['datetime']}")
                print(f"     Cloud cover: {item['cloud_cover']}%")
                print(f"     Assets: {len(item['assets'])} available")
        
        if results['loaded_data']:
            print(f"\n💾 Loaded datasets:")
            for data in results['loaded_data']:
                print(f"   - {data['item_id']}: {data['shape']} {data['dtype']}")
                print(f"     CRS: {data['crs']}")
                print(f"     Bounds: {data['bounds']}")

except Exception as e:
    print(f"❌ Test failed: {e}")
    print("\n💡 This might be due to:")
    print("   - Missing pystac-client package (pip install pystac-client)")
    print("   - No internet connectivity")
    print("   - STAC service temporarily unavailable")
```
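Once a search succeeds, the `items_metadata` list is convenient for choosing which scene to work with. As a minimal sketch, the hypothetical helper below picks the entry with the lowest numeric cloud cover and skips entries where the value is the `'Unknown'` placeholder. The sample entries are invented and only mimic the shape of the function's output.

```python
def least_cloudy(items_metadata):
    """Return the metadata entry with the lowest numeric cloud cover, or None."""
    numeric = [m for m in items_metadata
               if isinstance(m.get("cloud_cover"), (int, float))]
    return min(numeric, key=lambda m: m["cloud_cover"]) if numeric else None

# Sample entries shaped like the 'items_metadata' output (values invented)
sample = [
    {"id": "scene-a", "cloud_cover": 12.5},
    {"id": "scene-b", "cloud_cover": 0.8},
    {"id": "scene-c", "cloud_cover": "Unknown"},
]
print(least_cloudy(sample)["id"])  # scene-b
```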

## 💡 Understanding STAC Concepts

### What is STAC?
STAC (SpatioTemporal Asset Catalog) is a specification that provides a common language to describe geospatial information. It enables discovery and access to satellite imagery and other geospatial data.

### Key STAC Components:
- **Catalog**: Root container organizing collections
- **Collection**: Group of related items (e.g., all Landsat 8 imagery)
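- **Item**: A GeoJSON Feature describing one spatiotemporal unit (e.g., a single satellite scene) and listing its assets
- **Asset**: A file attached to an item (e.g., a GeoTIFF band, a thumbnail, or a metadata file)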
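To make the hierarchy concrete, the sketch below writes a pared-down STAC Item as a plain Python dictionary and looks up one of its assets. The IDs and URLs are invented for illustration, and a real Item carries more fields (such as `links` and `stac_version`).

```python
# A pared-down Item: a GeoJSON Feature with STAC-specific fields.
# All IDs and hrefs here are made up for illustration.
item = {
    "type": "Feature",
    "id": "example-scene-001",
    "collection": "example-collection",  # the Collection this Item belongs to
    "bbox": [-122.5, 37.7, -122.3, 37.9],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-122.5, 37.7], [-122.3, 37.7], [-122.3, 37.9],
                         [-122.5, 37.9], [-122.5, 37.7]]],
    },
    "properties": {"datetime": "2023-07-15T18:30:00Z", "eo:cloud_cover": 4.2},
    "assets": {
        "visual": {"href": "https://example.com/scene-001/visual.tif",
                   "type": "image/tiff; application=geotiff"},
        "thumbnail": {"href": "https://example.com/scene-001/thumb.png",
                      "type": "image/png"},
    },
}

def asset_href(item, asset_key):
    """Return the href for `asset_key`, or None if the Item lacks that asset."""
    asset = item["assets"].get(asset_key)
    return asset["href"] if asset else None

print(asset_href(item, "visual"))  # https://example.com/scene-001/visual.tif
print(asset_href(item, "nir"))     # None
```

When items come from pystac-client, the same information is exposed as attributes (`item.id`, `item.properties`, `item.assets[key].href`), which is how the function above reads it.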

### Common STAC Endpoints:
- **Microsoft Planetary Computer**: `https://planetarycomputer.microsoft.com/api/stac/v1`
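- **Earth Search (Element 84, hosted on AWS)**: `https://earth-search.aws.element84.com/v1`
- **Google Earth Engine**: `https://earthengine-stac.storage.googleapis.com/catalog/catalog.json` (a static catalog that can be browsed but does not offer the search API used by `catalog.search`)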

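### Popular Collections:
Collection IDs vary between catalogs; the IDs below are those used by Microsoft Planetary Computer.
- `landsat-c2-l2`: Landsat Collection 2 Level 2
- `sentinel-2-l2a`: Sentinel-2 Level 2A
- `naip`: National Agriculture Imagery Program
- MODIS satellite imagery, published under product-specific IDs (for example `modis-13Q1-061`) rather than a single `modis` collection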

## 🎯 Your Task: Implement and Test

**Requirements:**
1. **Implement the function** using the version shown above as your starting point
2. **Add error handling** for network issues and missing dependencies
3. **Test with different collections** (try Landsat, Sentinel-2, NAIP)
4. **Optimize for memory usage** when loading large datasets
5. **Add data visualization** capabilities (optional enhancement)

**Key Implementation Points:**
- Use proper exception handling for API calls
- Validate input parameters before making requests
- Handle cases where assets are not available
- Provide informative progress messages
- Return structured results with metadata
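As one way to approach the input validation point, the sketch below checks `bbox` and `datetime_range` before any request is made. `validate_search_inputs` is a hypothetical helper, and it deliberately accepts only the simple `YYYY-MM-DD/YYYY-MM-DD` form used in this lesson. STAC itself also allows full timestamps and open-ended ranges, so treat this as a starting point rather than a complete check.

```python
from datetime import datetime

def validate_search_inputs(bbox, datetime_range):
    """Hypothetical pre-flight check; raises ValueError on bad input."""
    if len(bbox) != 4:
        raise ValueError("bbox must be [min_lon, min_lat, max_lon, max_lat]")
    min_lon, min_lat, max_lon, max_lat = bbox
    # min_lon > max_lon is not rejected: such boxes can cross the antimeridian
    if not (-180 <= min_lon <= 180 and -180 <= max_lon <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat <= max_lat <= 90")

    # Only the simple 'YYYY-MM-DD/YYYY-MM-DD' form is checked in this sketch
    parts = datetime_range.split("/")
    if len(parts) != 2:
        raise ValueError("datetime_range must look like 'start/end'")
    # strptime raises ValueError itself when a date is malformed
    start, end = (datetime.strptime(p, "%Y-%m-%d") for p in parts)
    if start > end:
        raise ValueError("datetime_range start must not be after end")
    return True

print(validate_search_inputs([-122.5, 37.7, -122.3, 37.9], "2023-07-01/2023-07-31"))
```

Calling a check like this at the top of the function lets bad input fail fast with a clear message instead of surfacing later as an opaque API error.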

**Testing Strategy:**
```python
# Test different scenarios:
# 1. Valid search with results
# 2. Search with no results
# 3. Invalid STAC URL
# 4. Missing asset keys
# 5. Network connectivity issues
```
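Scenarios 2 and 4 can be exercised without a network connection by substituting lightweight stand-ins for STAC items. The sketch below is one possible approach: `collect_asset_hrefs` is a hypothetical helper that mirrors the asset lookup in the main loop, and `make_stub_item` builds objects exposing only the `id` and `assets` attributes that the lookup reads.

```python
from types import SimpleNamespace

def collect_asset_hrefs(items, asset_key):
    """Hypothetical helper mirroring the asset lookup in the main loop."""
    found, missing = {}, []
    for item in items:
        if asset_key in item.assets:
            found[item.id] = item.assets[asset_key].href
        else:
            missing.append(item.id)
    return found, missing

def make_stub_item(item_id, asset_keys):
    """Build a stand-in for a STAC item with only the attributes we use."""
    assets = {key: SimpleNamespace(href=f"https://example.com/{item_id}/{key}.tif")
              for key in asset_keys}
    return SimpleNamespace(id=item_id, assets=assets)

# Scenario 2: a search that returns nothing
assert collect_asset_hrefs([], "visual") == ({}, [])

# Scenario 4: one item has the requested asset, one does not
items = [make_stub_item("a", ["visual"]), make_stub_item("b", ["thumbnail"])]
found, missing = collect_asset_hrefs(items, "visual")
assert found == {"a": "https://example.com/a/visual.tif"}
assert missing == ["b"]
```

Scenarios 1, 3, and 5 still need either a live endpoint or a mocked client, since they depend on how the connection itself behaves.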

## 📚 Additional Resources

### STAC Learning Materials:
- [STAC Specification](https://stacspec.org/)
- [PySTAC Client Documentation](https://pystac-client.readthedocs.io/)
- [Microsoft Planetary Computer](https://planetarycomputer.microsoft.com/)
- [STAC Index](https://stacindex.org/) - Discover STAC catalogs

### Required Dependencies:
```bash
pip install pystac-client rasterio numpy requests
```

### Professional Applications:
- **Environmental Monitoring**: Track deforestation, urban growth, natural disasters
- **Agriculture**: Crop monitoring, yield prediction, irrigation management
- **Climate Science**: Long-term change analysis, temperature monitoring
- **Disaster Response**: Rapid assessment using before/after imagery
- **Urban Planning**: Land use analysis, infrastructure monitoring

## 🔧 Testing Your Implementation

Run the official tests to verify your function works correctly:

```bash
cd /workspaces/your-repo
python -m pytest tests/test_advanced_rasterio_analysis.py::test_search_and_load_stac_data -v
```

### Additional Testing Notes:
- STAC functions require internet connectivity
- Install required dependencies: `pip install pystac-client`
- Test with different STAC catalogs and collections
- Handle rate limiting and API errors gracefully
- Consider implementing caching for repeated requests
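For the rate limiting note, a common pattern is to retry a failed call after an exponentially growing delay. The sketch below uses only the standard library. `call_with_retries` is a hypothetical helper, and it retries on any exception for simplicity, whereas a production version would likely retry only on transient failures such as timeouts or HTTP 429 responses.

```python
import time

def call_with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Demonstration with a stand-in that fails twice before succeeding
calls = {"count": 0}

def flaky_search():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return ["item-1", "item-2"]

delays = []  # record the delays instead of actually sleeping
result = call_with_retries(flaky_search, sleep=delays.append)
print(result, delays)  # ['item-1', 'item-2'] [1.0, 2.0]
```

In the function above, the search could be wrapped as `call_with_retries(lambda: list(search.items()))`, assuming the search object can safely be iterated again after a failure.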

### Manual Test Cases:
1. **Basic functionality**: Search and load data successfully
2. **Error handling**: Test with invalid URLs and parameters
3. **Edge cases**: Empty results, missing assets, large datasets
4. **Performance**: Monitor memory usage and loading times

## 🚀 Next Steps

Congratulations! You've completed all 5 advanced rasterio analysis functions:

1. ✅ **Topographic Metrics** - Calculate terrain derivatives from elevation data
2. ✅ **Vegetation Indices** - Analyze multispectral imagery for vegetation health
3. ✅ **Spatial Sampling** - Extract point and zonal statistics from rasters
4. ✅ **Cloud Optimized GeoTIFF** - Create modern, efficient raster formats
5. ✅ **STAC Integration** - Search and access cloud-based satellite imagery

### Master Level Skills Achieved:
- 🛰️ **Remote Sensing Workflows** - Professional satellite data processing
- ☁️ **Cloud-Native Geospatial** - Modern data access patterns
- 📊 **Advanced Analytics** - Terrain analysis and vegetation monitoring
- 🗄️ **Data Management** - Efficient storage and access patterns
- 🔗 **API Integration** - Working with modern geospatial services

### Professional Applications:
These functions represent real-world workflows used in:
- Environmental consulting and monitoring
- Agricultural technology and precision farming
- Urban planning and smart city initiatives
- Climate research and earth system science
- Disaster response and emergency management

**You're now equipped with advanced rasterio skills that are highly valued in the geospatial industry! 🎉**