# 📊 Function 3: Perform Spatial Sampling

## Building the `perform_spatial_sampling` Function

**Learning Objectives:**
- Extract point values and zonal statistics from raster datasets
- Implement efficient sampling strategies for large datasets
- Handle different sampling geometries (points, polygons, lines)
- Calculate statistical summaries for spatial regions
- Optimize memory usage for large-scale sampling operations
- Generate structured outputs for further analysis

**Professional Context:**
Spatial sampling is essential for extracting quantitative information from raster datasets. Professionals use these techniques for:
- Environmental monitoring and field validation
- Precision agriculture and crop monitoring
- Urban planning and site analysis
- Ecological research and habitat assessment
- Quality control and ground-truthing operations

## 🎯 Function Overview

**Function Signature:**
```python
def perform_spatial_sampling(raster_path, vector_path, method='point', 
                           statistics=['mean'], output_path=None, 
                           band_selection=None):
    """
    Extract values from raster datasets using spatial sampling methods.
    
    Parameters:
    -----------
    raster_path : str
        Path to the input raster file
    vector_path : str
        Path to the vector file containing sampling locations/areas
    method : str, default 'point'
        Sampling method: 'point', 'zonal', 'line'
    statistics : list, default ['mean']
        Statistics to calculate: 'mean', 'min', 'max', 'std', 'count', 'sum'
    output_path : str, optional
        Path to save results as a vector file with the extracted values
    band_selection : list, optional
        Specific bands to sample (1-based). If None, samples all bands
    
    Returns:
    --------
    dict
        Dictionary containing sampling results and metadata
    """
```
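
Because `method`, `statistics`, and `band_selection` each accept only a limited set of values, it can help to validate them before any data is read. The sketch below is one possible approach, not a required part of the solution. The helper name `validate_sampling_args` and its `band_count` argument are hypothetical; the allowed values are taken from the docstring above.

```python
VALID_METHODS = {"point", "zonal", "line"}
VALID_STATISTICS = {"mean", "min", "max", "std", "count", "sum"}


def validate_sampling_args(method, statistics, band_selection, band_count):
    """Hypothetical helper: check the arguments and return the bands to sample."""
    if method not in VALID_METHODS:
        raise ValueError(
            f"Unknown method {method!r}; expected one of {sorted(VALID_METHODS)}"
        )

    unknown = set(statistics) - VALID_STATISTICS
    if unknown:
        raise ValueError(f"Unsupported statistics: {sorted(unknown)}")

    # band_selection is 1-based; None means "all bands"
    if band_selection is None:
        return list(range(1, band_count + 1))

    out_of_range = [b for b in band_selection if not 1 <= b <= band_count]
    if out_of_range:
        raise ValueError(f"Bands {out_of_range} are outside 1..{band_count}")
    return list(band_selection)


print(validate_sampling_args("zonal", ["mean", "std"], None, band_count=3))  # [1, 2, 3]
```

In a real implementation, `band_count` would typically come from the opened raster (for example `src.count` on a rasterio dataset).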

## 📚 Spatial Sampling Fundamentals

### Sampling Methods

| Method | Description | Use Case | Output |
|--------|-------------|----------|--------|
| **Point** | Extract values at specific coordinates | Field measurements, validation | Single values per point |
| **Zonal** | Calculate statistics within polygons | Administrative units, management zones | Statistical summaries |
| **Line** | Sample along linear features | Transects, roads, rivers | Profile values |
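
The hands-on examples in this notebook cover point and zonal sampling only. For the line method, one common approach is to densify each line into evenly spaced points and then reuse point sampling on them. The sketch below shows only the densification step with plain NumPy. The helper `points_along_line` is hypothetical, and distances are measured in the units of the coordinate system, so degrees for EPSG:4326.

```python
import numpy as np


def points_along_line(vertices, n_samples):
    """Return distances and (x, y) points evenly spaced along a polyline."""
    vertices = np.asarray(vertices, dtype=float)
    # Length of each segment, then cumulative distance at every vertex
    segment_lengths = np.hypot(*np.diff(vertices, axis=0).T)
    cumulative = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    # Target distances from the start of the line to its end
    targets = np.linspace(0.0, cumulative[-1], n_samples)
    # Interpolate x and y separately as functions of distance along the line
    xs = np.interp(targets, cumulative, vertices[:, 0])
    ys = np.interp(targets, cumulative, vertices[:, 1])
    coords = [(float(x), float(y)) for x, y in zip(xs, ys)]
    return targets, coords


# Illustrative transect with one bend
distances, coords = points_along_line([(0, 0), (4, 0), (4, 4)], n_samples=5)
for d, (x, y) in zip(distances, coords):
    print(f"distance={d:.1f}  x={x:.1f}  y={y:.1f}")
```

The resulting `coords` list has the same shape as the coordinate list used for point sampling, so it can be passed to the same sampling call to build a profile of values against `distances`. If the lines are already shapely geometries, `LineString.interpolate` is an alternative to the NumPy interpolation used here.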

### Implementation Strategy
```python
import rasterio
import geopandas as gpd
import numpy as np
from rasterio.mask import mask
from rasterio.sample import sample_gen
```

## 💻 Hands-On Examples

In [None]:
import rasterio
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point, Polygon
from rasterio.transform import from_bounds
from rasterio.sample import sample_gen
import tempfile
import os

# Example 1: Create sample data
def create_sample_data():
    """Create sample raster and vector data for demonstration"""
    
    np.random.seed(42)
    
    # Create sample raster
    rows, cols = 50, 50
    data = 100 + 50 * np.random.random((rows, cols))
    
    # Create temporary files
    temp_dir = tempfile.mkdtemp()
    raster_path = os.path.join(temp_dir, 'sample.tif')
    
    # Define spatial reference
    bounds = (-120, 35, -110, 45)
    transform = from_bounds(*bounds, cols, rows)
    
    # Write raster
    with rasterio.open(
        raster_path, 'w',
        driver='GTiff', height=rows, width=cols,
        count=1, dtype='float32', crs='EPSG:4326',  # match the float32 array written below
        transform=transform
    ) as dst:
        dst.write(data.astype(np.float32), 1)
    
    print(f"Created sample raster: {raster_path}")
    return raster_path, data, bounds

# Create sample data
sample_raster, raster_data, raster_bounds = create_sample_data()

# Visualize
plt.figure(figsize=(8, 6))
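# imshow expects extent as (left, right, bottom, top), not (min_x, min_y, max_x, max_y)
min_x, min_y, max_x, max_y = raster_bounds
plt.imshow(raster_data, cmap='viridis', origin='upper', extent=(min_x, max_x, min_y, max_y))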
plt.colorbar(label='Values')
plt.title('Sample Raster Data')
plt.show()

In [None]:
# Example 2: Point sampling
def example_point_sampling():
    """Demonstrate point-based sampling"""
    
    print("\n=== POINT SAMPLING EXAMPLE ===")
    
    # Create sample points
    min_x, min_y, max_x, max_y = raster_bounds
    n_points = 10
    
    point_coords = [
        (np.random.uniform(min_x, max_x), np.random.uniform(min_y, max_y))
        for _ in range(n_points)
    ]
    
    # Create GeoDataFrame
    points_gdf = gpd.GeoDataFrame({
        'id': range(n_points),
        'geometry': [Point(x, y) for x, y in point_coords]
    }, crs='EPSG:4326')
    
    # Sample raster values
    with rasterio.open(sample_raster) as src:
        coords = [(point.x, point.y) for point in points_gdf.geometry]
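        # Each sample is an array holding one value per requested band
        sampled_values = list(sample_gen(src, coords, indexes=1))
        # Index directly: testing `if val` would turn a legitimate 0 into NaN
        values = [float(val[0]) for val in sampled_values]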
        points_gdf['sampled_value'] = values
    
    print(f"Sampled {len(points_gdf)} points")
    print(f"Mean value: {np.nanmean(values):.2f}")
    print(f"Value range: {np.nanmin(values):.2f} to {np.nanmax(values):.2f}")
    
    return points_gdf

# Run point sampling
sample_points = example_point_sampling()
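
Conceptually, point sampling converts each coordinate to a row and column index through the raster's georeferencing and reads the pixel that contains the point. The sketch below mimics that lookup with plain NumPy for a north-up raster described by its bounds. It illustrates the idea and is not rasterio's implementation; the helper `sample_nearest` is hypothetical. It also makes the out-of-bounds case explicit by returning NaN, which is a choice you may want to make in your own function.

```python
import numpy as np


def sample_nearest(data, bounds, coords):
    """Look up the pixel containing each (x, y); NaN for points outside the grid.

    Assumes a north-up raster with bounds = (min_x, min_y, max_x, max_y).
    """
    rows, cols = data.shape
    min_x, min_y, max_x, max_y = bounds
    pixel_width = (max_x - min_x) / cols
    pixel_height = (max_y - min_y) / rows

    values = []
    for x, y in coords:
        col = int(np.floor((x - min_x) / pixel_width))
        row = int(np.floor((max_y - y) / pixel_height))  # row 0 is the top edge
        inside = 0 <= row < rows and 0 <= col < cols
        values.append(float(data[row, col]) if inside else np.nan)
    return values


grid = np.arange(16, dtype=float).reshape(4, 4)
print(sample_nearest(grid, (0, 0, 4, 4), [(0.5, 3.5), (3.5, 0.5), (10, 10)]))
# [0.0, 15.0, nan]
```

The explicit bounds check matters because a negative index would otherwise wrap around silently in NumPy and return a value from the opposite edge of the array.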

In [None]:
# Example 3: Zonal statistics
def example_zonal_sampling():
    """Demonstrate zonal statistics calculation"""
    
    print("\n=== ZONAL SAMPLING EXAMPLE ===")
    
    # Create sample zones
    min_x, min_y, max_x, max_y = raster_bounds
    mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
    
    zones = [
        Polygon([(min_x, mid_y), (mid_x, mid_y), (mid_x, max_y), (min_x, max_y)]),
        Polygon([(mid_x, mid_y), (max_x, mid_y), (max_x, max_y), (mid_x, max_y)]),
        Polygon([(min_x, min_y), (mid_x, min_y), (mid_x, mid_y), (min_x, mid_y)]),
        Polygon([(mid_x, min_y), (max_x, min_y), (max_x, mid_y), (mid_x, mid_y)])
    ]
    
    zones_gdf = gpd.GeoDataFrame({
        'zone_id': ['NW', 'NE', 'SW', 'SE'],
        'geometry': zones
    }, crs='EPSG:4326')
    
    # Calculate zonal statistics
    from rasterio.mask import mask
    
    zone_stats = []
    
    with rasterio.open(sample_raster) as src:
        for idx, zone in zones_gdf.iterrows():
            try:
                masked_data, _ = mask(src, [zone.geometry], crop=True, filled=False)
                valid_values = masked_data[0].compressed()
                
                if len(valid_values) > 0:
                    stats = {
                        'zone_id': zone['zone_id'],
                        'mean': float(valid_values.mean()),
                        'min': float(valid_values.min()),
                        'max': float(valid_values.max()),
                        'std': float(valid_values.std()),
                        'count': int(len(valid_values))
                    }
                else:
                    stats = {
                        'zone_id': zone['zone_id'],
                        'mean': np.nan, 'min': np.nan, 'max': np.nan,
                        'std': np.nan, 'count': 0
                    }
                
                zone_stats.append(stats)
                
            except Exception as e:
                print(f"Error processing zone {zone['zone_id']}: {e}")
    
    print("Zone Statistics:")
    for stats in zone_stats:
        print(f"  {stats['zone_id']}: mean={stats['mean']:.2f}, count={stats['count']}")
    
    return zones_gdf, zone_stats

# Run zonal sampling
zones, zonal_results = example_zonal_sampling()
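
The example above hard-codes five statistics, while the function signature accepts a `statistics` list that may also include `'sum'`. One way to support an arbitrary list is a lookup table from statistic name to NumPy function. This is a sketch under that assumption; `STAT_FUNCS` and `summarize` are hypothetical names, and the input is the compressed (valid-only) pixel array for one zone.

```python
import numpy as np

# Map each statistic name from the docstring to a function of a 1-D array
STAT_FUNCS = {
    "mean": np.mean,
    "min": np.min,
    "max": np.max,
    "std": np.std,
    "sum": np.sum,
    "count": np.size,
}


def summarize(valid_values, statistics):
    """Hypothetical helper: compute the requested statistics for one zone."""
    if valid_values.size == 0:
        # Empty zone: count is 0, every other statistic is undefined
        return {name: (0 if name == "count" else np.nan) for name in statistics}
    return {
        name: int(STAT_FUNCS[name](valid_values)) if name == "count"
        else float(STAT_FUNCS[name](valid_values))
        for name in statistics
    }


# Masked array standing in for the output of a zonal mask (last pixel is masked)
pixels = np.ma.masked_array([1.0, 2.0, 3.0, 99.0], mask=[False, False, False, True])
print(summarize(pixels.compressed(), ["mean", "sum", "count"]))
# {'mean': 2.0, 'sum': 6.0, 'count': 3}
```

Converting to built-in `float` and `int` keeps the results easy to serialize, which matches the casts used in the zonal example.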

## 🎯 Your Implementation Task

Now implement the `perform_spatial_sampling` function in `src/advanced_rasterio_analysis.py`.

### Requirements Checklist:
- [ ] Load and validate raster and vector data
- [ ] Ensure coordinate system compatibility
- [ ] Implement point sampling using rasterio.sample
- [ ] Implement zonal statistics using rasterio.mask
- [ ] Handle multiple bands and statistics
- [ ] Generate structured output dictionary
- [ ] Support optional output file saving
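
For the structured output item, the test cell in this notebook only checks that the return value is a dictionary with a `'results'` key holding one entry per feature. Everything else in the sketch below is an assumption: the `'metadata'` key, its fields, the record layout, and the helper name `build_result` are hypothetical, and the record values are made up for illustration.

```python
def build_result(method, raster_path, vector_path, bands, statistics, records):
    """Hypothetical helper: wrap per-feature records with run metadata."""
    return {
        "results": records,  # one dict per sampled feature
        "metadata": {
            "method": method,
            "raster_path": raster_path,
            "vector_path": vector_path,
            "bands": list(bands),
            "statistics": list(statistics),
            "feature_count": len(records),
        },
    }


# Illustrative records only; real values come from the sampling step
records = [
    {"feature_id": 0, "band_1": 123.4},
    {"feature_id": 1, "band_1": 131.9},
]
result = build_result("point", "sample.tif", "test_points.gpkg", [1], ["mean"], records)
print(result["metadata"]["feature_count"])  # 2
```

Keeping the per-feature records separate from the metadata makes it straightforward to load `result["results"]` into a table later without filtering out run-level fields.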

In [None]:
# Test your implementation
import sys
sys.path.append('../src')

try:
    from advanced_rasterio_analysis import perform_spatial_sampling
    
    # Create test vector data
    temp_dir = tempfile.mkdtemp()
    vector_path = os.path.join(temp_dir, 'test_points.gpkg')
    sample_points.to_file(vector_path, driver='GPKG')
    
    print("Testing perform_spatial_sampling function...")
    result = perform_spatial_sampling(
        raster_path=sample_raster,
        vector_path=vector_path,
        method='point',
        statistics=['mean', 'min', 'max']
    )
    
    if isinstance(result, dict) and 'results' in result:
        print("✓ Test passed! Function works correctly.")
        print(f"  Processed {len(result['results'])} features")
    else:
        print("✗ Test failed! Unexpected result format.")
        
except ImportError:
    print("Function not implemented yet. Complete implementation in src/advanced_rasterio_analysis.py")
except Exception as e:
    print(f"✗ Test failed with error: {e}")

## 🧪 Testing Your Function

Test your implementation:

```bash
cd /workspaces/your-repo
python -m pytest tests/test_advanced_rasterio_analysis.py::test_perform_spatial_sampling -v
```

## 🚀 Next Steps

After completing this function:
1. Move to Function 4: `04_cloud_optimized_geotiff.ipynb`
2. Build on spatial sampling for advanced analysis workflows

**Goal:** Master spatial sampling, an essential skill for quantitative geospatial analysis!