# 📐 Function 2: Calculate Basic Spatial Metrics

## Building the `calculate_basic_spatial_metrics` Function

**Learning Objectives:**
- Master geometric calculations with GeoPandas
- Understand coordinate system impacts on measurements
- Learn to calculate areas, perimeters, and centroids
- Handle different geometry types (points, lines, polygons)
- Return structured spatial metrics for analysis

**Professional Context:**
Spatial metrics are fundamental to GIS analysis. Whether you're calculating building footprints, forest areas, or road lengths, accurate geometric measurements drive decision-making in environmental science, urban planning, and resource management.

## 🎯 Function Overview

**Function Signature:**
```python
def calculate_basic_spatial_metrics(gdf, area_units='km2', length_units='km'):
    """
    Calculate basic spatial metrics (area, perimeter, centroid) for geometries.
    
    Parameters:
    -----------
    gdf : gpd.GeoDataFrame
        Input geodataframe with geometries
    area_units : str, optional
        Units for area calculation ('km2', 'm2', 'ha')
        Default: 'km2'
    length_units : str, optional
        Units for length/perimeter calculation ('km', 'm')
        Default: 'km'
    
    Returns:
    --------
    dict
        Dictionary containing calculated metrics and enhanced GeoDataFrame
    """
```

**Key Capabilities:**
- 📏 Calculate accurate areas for polygon features
- 📐 Calculate perimeters for polygons and lengths for lines
- 🎯 Generate centroid coordinates for all geometry types
- 🌍 Handle coordinate system conversions for accuracy
- 📊 Provide comprehensive summary statistics
- 📋 Return enhanced GeoDataFrame with new metric columns

## 🏗️ Implementation Strategy

Our spatial metrics function will follow this workflow:

### Step 1: Input Validation
```python
# Validate input GeoDataFrame
if not isinstance(gdf, gpd.GeoDataFrame):
    raise TypeError("Input must be a GeoDataFrame")
    
if gdf.empty:
    raise ValueError("GeoDataFrame is empty")
```

### Step 2: CRS Management
```python
# Store original CRS and convert to appropriate projected system
original_crs = gdf.crs
if gdf.crs.is_geographic:
    # Convert to equal-area projection for accurate measurements
    gdf_projected = gdf.to_crs('ESRI:54009')  # World Mollweide
else:
    # Already projected: measure in the existing CRS
    gdf_projected = gdf.copy()
```

### Step 3: Metric Calculations
```python
# Calculate areas, lengths, and centroids in the projected CRS (meters)
gdf_metrics = gdf_projected.copy()
gdf_metrics['area_m2'] = gdf_metrics.geometry.area        # 0 for points and lines
gdf_metrics['perimeter_m'] = gdf_metrics.geometry.length  # perimeter for polygons, length for lines
gdf_metrics['centroid'] = gdf_metrics.geometry.centroid
```
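To make the three properties less of a black box, here is a minimal pure-Python sketch of what area, perimeter, and centroid mean for a simple planar polygon, using the shoelace formula. The helper name `polygon_metrics` is hypothetical and is not part of GeoPandas; it assumes a non-degenerate polygon without holes, given as a list of `(x, y)` vertices in a projected CRS.

```python
import math

def polygon_metrics(coords):
    """Return (area, perimeter, centroid) for a simple polygon.

    coords: list of (x, y) vertices, not repeating the first vertex at the end.
    Assumes a non-degenerate polygon (non-zero area) without holes.
    """
    n = len(coords)
    twice_area = 0.0
    perimeter = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0     # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    signed_area = twice_area / 2.0
    # The signed area keeps the centroid correct for either vertex order
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), perimeter, centroid

# A 4 x 3 rectangle in projected (meter) coordinates
rect = [(0, 0), (4, 0), (4, 3), (0, 3)]
area, perimeter, centroid = polygon_metrics(rect)
print(area, perimeter, centroid)
```

In practice you would rely on `.area`, `.length`, and `.centroid`, which also handle holes and multi-part geometries; the sketch only shows the underlying arithmetic.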

## 🚀 Hands-On Example: Building the Function

Let's build the complete spatial metrics function step by step:

```python
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import Point, LineString, Polygon
import warnings
warnings.filterwarnings('ignore')

def calculate_basic_spatial_metrics(gdf, area_units='km2', length_units='km'):
    """
    Calculate basic spatial metrics (area, perimeter, centroid) for geometries.
    """

    # Step 1: Input validation (runs before anything touches gdf attributes)
    if not isinstance(gdf, gpd.GeoDataFrame):
        raise TypeError("Input must be a GeoDataFrame")

    if gdf.empty:
        raise ValueError("GeoDataFrame is empty")

    if area_units not in ('km2', 'm2', 'ha'):
        raise ValueError(f"Unsupported area_units: {area_units!r}")

    if length_units not in ('km', 'm'):
        raise ValueError(f"Unsupported length_units: {length_units!r}")

    print("📐 Starting spatial metrics calculation...")
    print(f"   📊 Input: {len(gdf)} features")
    print(f"   🌍 CRS: {gdf.crs}")
    print(f"   📏 Units: {area_units} (area), {length_units} (length)")

    if gdf.geometry.isna().any():
        print("⚠️ Warning: Found null geometries, removing them")
        # Filter on the active geometry column, whatever its name
        gdf = gdf[gdf.geometry.notna()]
        if gdf.empty:
            raise ValueError("GeoDataFrame contains only null geometries")

    # Step 2: CRS management for accurate measurements
    original_crs = gdf.crs

    if gdf.crs is None:
        print("⚠️ Warning: No CRS defined, assuming EPSG:4326")
        gdf = gdf.set_crs('EPSG:4326')
        original_crs = gdf.crs

    # Convert to projected CRS for accurate area/length calculations
    if gdf.crs.is_geographic:
        # Use World Mollweide equal-area projection
        projected_crs = 'ESRI:54009'
        gdf_projected = gdf.to_crs(projected_crs)
        print(f"🔄 Converted to projected CRS: {projected_crs}")
    else:
        gdf_projected = gdf.copy()
        projected_crs = gdf.crs
        print(f"✅ Already in projected CRS: {projected_crs}")

    # Step 3: Create working copy for calculations
    gdf_metrics = gdf_projected.copy()

    # Step 4: Calculate geometry type distribution
    geom_types = gdf_metrics.geometry.geom_type.value_counts()
    print(f"📊 Geometry types: {dict(geom_types)}")

    # Step 5: Calculate centroids for all geometries
    print("🎯 Calculating centroids...")
    gdf_metrics['centroid'] = gdf_metrics.geometry.centroid
    gdf_metrics['centroid_x'] = gdf_metrics['centroid'].x
    gdf_metrics['centroid_y'] = gdf_metrics['centroid'].y

    # Step 6: Calculate areas (for polygons)
    print("📏 Calculating areas...")
    # Initialize area column
    gdf_metrics['area_m2'] = 0.0

    # Calculate areas only for polygons/multipolygons
    polygon_mask = gdf_metrics.geometry.geom_type.isin(['Polygon', 'MultiPolygon'])
    if polygon_mask.any():
        gdf_metrics.loc[polygon_mask, 'area_m2'] = gdf_metrics.loc[polygon_mask, 'geometry'].area

    # Convert areas to requested units
    if area_units == 'km2':
        gdf_metrics['area'] = gdf_metrics['area_m2'] / 1_000_000
        area_unit_label = 'km²'
    elif area_units == 'ha':
        gdf_metrics['area'] = gdf_metrics['area_m2'] / 10_000
        area_unit_label = 'ha'
    else:  # m2
        gdf_metrics['area'] = gdf_metrics['area_m2']
        area_unit_label = 'm²'

    # Step 7: Calculate perimeters/lengths
    print("📐 Calculating perimeters/lengths...")
    gdf_metrics['length_m'] = gdf_metrics.geometry.length

    # Convert lengths to requested units
    if length_units == 'km':
        gdf_metrics['length'] = gdf_metrics['length_m'] / 1_000
        length_unit_label = 'km'
    else:  # m
        gdf_metrics['length'] = gdf_metrics['length_m']
        length_unit_label = 'm'

    # Step 8: Calculate summary statistics
    total_features = len(gdf_metrics)
    total_area = gdf_metrics['area'].sum()
    total_length = gdf_metrics['length'].sum()

    # Area statistics (for polygons only)
    polygon_areas = gdf_metrics[polygon_mask]['area'] if polygon_mask.any() else pd.Series([], dtype=float)
    area_stats = {
        'total': total_area,
        'mean': polygon_areas.mean() if len(polygon_areas) > 0 else 0,
        'median': polygon_areas.median() if len(polygon_areas) > 0 else 0,
        'min': polygon_areas.min() if len(polygon_areas) > 0 else 0,
        'max': polygon_areas.max() if len(polygon_areas) > 0 else 0,
        'std': polygon_areas.std() if len(polygon_areas) > 0 else 0,
        'count': len(polygon_areas)
    }

    # Length statistics (for all geometries)
    length_stats = {
        'total': total_length,
        'mean': gdf_metrics['length'].mean(),
        'median': gdf_metrics['length'].median(),
        'min': gdf_metrics['length'].min(),
        'max': gdf_metrics['length'].max(),
        'std': gdf_metrics['length'].std()
    }

    # Step 9: Convert centroids back to original CRS
    centroids_original = gdf_metrics['centroid'].to_crs(original_crs)
    # to_crs() in Step 10 only reprojects the active geometry column,
    # so store the reprojected centroids explicitly
    gdf_metrics['centroid'] = centroids_original
    gdf_metrics['centroid_lon'] = centroids_original.x
    gdf_metrics['centroid_lat'] = centroids_original.y

    # Step 10: Convert result back to original CRS
    gdf_result = gdf_metrics.to_crs(original_crs)

    # Clean up working columns
    columns_to_keep = [col for col in gdf_result.columns
                      if col not in ['area_m2', 'length_m', 'centroid_x', 'centroid_y']]
    gdf_result = gdf_result[columns_to_keep]

    # Step 11: Compile results
    results = {
        'gdf_with_metrics': gdf_result,
        'summary_stats': {
            'total_features': total_features,
            'geometry_types': dict(geom_types),
            'area_stats': area_stats,
            'length_stats': length_stats
        },
        'units': {
            'area': area_unit_label,
            'length': length_unit_label
        },
        'crs_info': {
            'original_crs': str(original_crs),
            'calculation_crs': str(projected_crs)
        }
    }

    # Print summary
    print("\n🎉 Spatial metrics calculation complete!")
    print(f"   📊 Total features: {total_features}")
    print(f"   📏 Total area: {total_area:.2f} {area_unit_label}")
    print(f"   📐 Total length: {total_length:.2f} {length_unit_label}")
    print("   🎯 Centroids calculated for all features")

    return results
```

## 🧪 Test the Function

Let's test our spatial metrics function with sample data:

```python
# Create sample test data with different geometry types
print("🏗️ Creating sample test data...\n")

# Sample polygons (representing parks)
parks = [
    Polygon([(-122.4, 37.8), (-122.3, 37.8), (-122.3, 37.7), (-122.4, 37.7)]),  # Golden Gate Park area
    Polygon([(-122.2, 37.9), (-122.1, 37.9), (-122.1, 37.8), (-122.2, 37.8)]),  # Berkeley area
    Polygon([(-122.5, 37.6), (-122.4, 37.6), (-122.4, 37.5), (-122.5, 37.5)])   # San Mateo area
]

# Sample lines (representing roads)
roads = [
    LineString([(-122.4, 37.8), (-122.3, 37.7), (-122.2, 37.6)]),  # Highway 101
    LineString([(-122.1, 37.9), (-122.3, 37.8), (-122.5, 37.7)])   # Bay Bridge area
]

# Sample points (representing landmarks)
landmarks = [
    Point(-122.4194, 37.7749),  # San Francisco
    Point(-122.2711, 37.8044),  # Oakland
    Point(-122.0839, 37.4220)   # Palo Alto
]

# Combine all geometries
all_geometries = parks + roads + landmarks
geometry_names = ['Golden Gate Park', 'Berkeley Park', 'San Mateo Park',
                 'Highway 101', 'Bay Bridge Route',
                 'San Francisco', 'Oakland', 'Palo Alto']
geometry_types = ['Park', 'Park', 'Park', 'Road', 'Road', 'Landmark', 'Landmark', 'Landmark']

# Create GeoDataFrame
test_gdf = gpd.GeoDataFrame({
    'name': geometry_names,
    'type': geometry_types,
    'geometry': all_geometries
}, crs='EPSG:4326')

print(f"📊 Created test dataset:")
print(f"   Features: {len(test_gdf)}")
print(f"   Types: {test_gdf['type'].value_counts().to_dict()}")
print(f"   CRS: {test_gdf.crs}")

# Display the test data
print(f"\n📋 Test data preview:")
for idx, row in test_gdf.iterrows():
    geom_type = row.geometry.geom_type
    print(f"   {row['name']} ({row['type']}) - {geom_type}")
```

```python
# Test the function
print("🚀 Testing spatial metrics calculation...\n")

# Run the calculation
results = calculate_basic_spatial_metrics(
    gdf=test_gdf,
    area_units='km2',
    length_units='km'
)

# Display results
print(f"\n📊 Results Summary:")
print(f"   Total features processed: {results['summary_stats']['total_features']}")
print(f"   Geometry types: {results['summary_stats']['geometry_types']}")
print(f"   Units: {results['units']['area']} (area), {results['units']['length']} (length)")

# Area statistics (for polygons)
area_stats = results['summary_stats']['area_stats']
if area_stats['count'] > 0:
    print(f"\n📏 Area Statistics ({results['units']['area']}):")
    print(f"   Total: {area_stats['total']:.4f}")
    print(f"   Mean: {area_stats['mean']:.4f}")
    print(f"   Min: {area_stats['min']:.4f}")
    print(f"   Max: {area_stats['max']:.4f}")
    print(f"   Features with area: {area_stats['count']}")

# Length statistics
length_stats = results['summary_stats']['length_stats']
print(f"\n📐 Length/Perimeter Statistics ({results['units']['length']}):")
print(f"   Total: {length_stats['total']:.4f}")
print(f"   Mean: {length_stats['mean']:.4f}")
print(f"   Min: {length_stats['min']:.4f}")
print(f"   Max: {length_stats['max']:.4f}")

# Show enhanced GeoDataFrame
enhanced_gdf = results['gdf_with_metrics']
print(f"\n📋 Enhanced GeoDataFrame columns:")
print(f"   New columns added: {[col for col in enhanced_gdf.columns if col not in test_gdf.columns]}")

# Display sample metrics for each feature
print(f"\n🎯 Individual Feature Metrics:")
for idx, row in enhanced_gdf.iterrows():
    area_str = f"{row['area']:.4f} {results['units']['area']}" if row['area'] > 0 else "N/A"
    length_str = f"{row['length']:.4f} {results['units']['length']}"
    centroid_str = f"({row['centroid_lon']:.4f}, {row['centroid_lat']:.4f})"

    print(f"   {row['name']}:")
    print(f"     Area: {area_str}")
    print(f"     Length/Perimeter: {length_str}")
    print(f"     Centroid: {centroid_str}")
```

## 💡 Understanding Spatial Metrics

### Key Concepts:

**Area Calculations:**
- Only meaningful for polygon geometries
- Requires projected coordinate system for accuracy
- Common units: square meters (m²), hectares (ha), square kilometers (km²)

**Length/Perimeter Calculations:**
- **Lines**: Total length of the line
- **Polygons**: Perimeter (boundary length)
- **Points**: Always 0 (no length dimension)

**Centroids:**
- Geometric center of each feature
- Useful for labeling, proximity analysis, and spatial joining
- Always calculated regardless of geometry type
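For a point, the centroid is the point itself. For a line, it is the length-weighted average of the segment midpoints, which need not lie on the line. The following is a minimal sketch of that rule in plain Python; the helper name `line_centroid` is hypothetical and only illustrates the arithmetic behind `.centroid` for a single LineString with non-zero length.

```python
import math

def line_centroid(coords):
    """Return ((cx, cy), total_length) for a polyline given as (x, y) vertices.

    Each segment contributes its midpoint, weighted by the segment length.
    Assumes at least two vertices and a non-zero total length.
    """
    total = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        total += seg
        cx += seg * (x0 + x1) / 2.0
        cy += seg * (y0 + y1) / 2.0
    return (cx / total, cy / total), total

# An L-shaped road: 4 units east, then 2 units north
centroid, length = line_centroid([(0, 0), (4, 0), (4, 2)])
print(centroid, length)
```

Because the centroid of a bent line can fall off the line, it works for labeling but should not be treated as a location on the feature.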

### Coordinate System Considerations:
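- **Geographic CRS (lat/lon)**: Coordinates are in degrees, so `.area` and `.length` return square degrees and degrees, which are not usable as real-world measurements
- **Projected CRS**: Coordinates are in linear units (usually meters), so measurements are meaningful
- **Equal-area projections**: Best for area calculations (e.g., Mollweide, Albers); they preserve area but distort distances, so lengths and perimeters measured in them are approximate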
- **Conformal projections**: Best for shape preservation (e.g., UTM, State Plane)
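To see why degrees are not a usable unit of measurement, the sketch below estimates how much ground 0.1° covers at different latitudes, and the approximate area of a 0.1° by 0.1° box like the first sample park. It assumes a spherical Earth with the mean radius 6371.0088 km, so the numbers are rough sanity checks rather than survey-grade values; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation

def lon_span_km(delta_lon_deg, lat_deg):
    """East-west ground distance covered by delta_lon_deg at a given latitude."""
    return math.radians(delta_lon_deg) * EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))

def lat_span_km(delta_lat_deg):
    """North-south ground distance covered by delta_lat_deg (same at any latitude)."""
    return math.radians(delta_lat_deg) * EARTH_RADIUS_KM

def lonlat_box_area_km2(lon_min, lat_min, lon_max, lat_max):
    """Area of a box bounded by meridians and parallels on a sphere."""
    d_lon = math.radians(lon_max - lon_min)
    d_sin = math.sin(math.radians(lat_max)) - math.sin(math.radians(lat_min))
    return EARTH_RADIUS_KM ** 2 * d_lon * d_sin

print(f"0.1 deg of longitude at the equator: {lon_span_km(0.1, 0.0):.2f} km")
print(f"0.1 deg of longitude at 37.75 N:     {lon_span_km(0.1, 37.75):.2f} km")
print(f"0.1 deg of latitude:                 {lat_span_km(0.1):.2f} km")
print(f"Box area: {lonlat_box_area_km2(-122.4, 37.7, -122.3, 37.8):.1f} km2")
```

The same 0.1° of longitude covers noticeably less ground near San Francisco than at the equator, which is why the function reprojects before measuring. An estimate like this is also a quick plausibility check on the areas the function reports.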

## 🎯 Your Task: Implement and Test

**Requirements:**
1. **Implement the function** exactly as shown above
2. **Handle different geometry types** (Point, LineString, Polygon)
3. **Manage coordinate systems** properly for accurate measurements
4. **Support multiple unit systems** (m/km for length, m²/ha/km² for area)
5. **Calculate comprehensive statistics** including min, max, mean, median
6. **Return enhanced GeoDataFrame** with new metric columns

**Key Implementation Points:**
- Validate input parameters and handle edge cases
- Use appropriate projected CRS for calculations
- Handle null geometries gracefully
- Provide clear progress messages
- Return the output data in the original CRS

**Testing Strategy:**
```python
# Test different scenarios:
# 1. Mixed geometry types
# 2. Different unit combinations
# 3. Geographic vs projected CRS
# 4. Empty or null geometries
# 5. Large datasets for performance
```
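Scenario 2 (different unit combinations) is the easiest to check in isolation. The sketch below factors the unit conversion into two small helpers and asserts the expected values. The names `convert_area` and `convert_length` are hypothetical and are not part of the function above, which does the conversion inline.

```python
# Divisors that turn square meters / meters into the requested unit
AREA_FACTORS = {'m2': 1.0, 'ha': 10_000.0, 'km2': 1_000_000.0}
LENGTH_FACTORS = {'m': 1.0, 'km': 1_000.0}

def convert_area(area_m2, units):
    """Convert an area in square meters to 'm2', 'ha', or 'km2'."""
    if units not in AREA_FACTORS:
        raise ValueError(f"Unsupported area units: {units!r}")
    return area_m2 / AREA_FACTORS[units]

def convert_length(length_m, units):
    """Convert a length in meters to 'm' or 'km'."""
    if units not in LENGTH_FACTORS:
        raise ValueError(f"Unsupported length units: {units!r}")
    return length_m / LENGTH_FACTORS[units]

def test_unit_combinations():
    assert convert_area(2_500_000, 'km2') == 2.5
    assert convert_area(2_500_000, 'ha') == 250.0
    assert convert_area(2_500_000, 'm2') == 2_500_000
    assert convert_length(1_500, 'km') == 1.5
    assert convert_length(1_500, 'm') == 1_500

def test_unknown_units_rejected():
    for bad_call in (lambda: convert_area(1.0, 'acres'),
                     lambda: convert_length(1.0, 'miles')):
        try:
            bad_call()
        except ValueError:
            continue
        raise AssertionError("expected ValueError for unsupported units")

test_unit_combinations()
test_unknown_units_rejected()
print("unit conversion checks passed")
```

The same pattern extends to the other scenarios: build a small GeoDataFrame with known geometry, call the function, and assert on the returned statistics.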

## 🔧 Testing Your Implementation

Run the official tests to verify your function works correctly:

```bash
cd /workspaces/your-repo
python -m pytest tests/test_geopandas_analysis.py::test_calculate_basic_spatial_metrics -v
```

### Additional Testing Ideas:
```python
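# Test with real data
import geopandas as gpd

# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0.
# On newer versions, download the Natural Earth countries layer yourself
# and pass its path to gpd.read_file() instead.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
usa = world[world.name == 'United States of America']

# Calculate metrics for the selected country
results = calculate_basic_spatial_metrics(usa, area_units='km2')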
```

## 📚 Professional Applications

### Real-World Use Cases:

**Environmental Analysis:**
- Forest patch areas and edge-to-area ratios
- Watershed boundaries and drainage lengths
- Habitat fragmentation metrics

**Urban Planning:**
- Building footprint areas and perimeters
- Park and green space measurements
- Street network length calculations

**Agriculture:**
- Field areas for crop yield calculations
- Irrigation system length requirements
- Property boundary measurements

**Emergency Management:**
- Evacuation zone areas and populations
- Emergency service coverage calculations
- Infrastructure impact assessments
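As one concrete illustration of the use cases above, the edge-to-area ratio mentioned under Environmental Analysis can be derived directly from a perimeter and an area. The sketch below assumes both values are already in meters and square meters; the helper name `edge_to_area_ratio` is hypothetical.

```python
def edge_to_area_ratio(perimeter_m, area_m2):
    """Return perimeter divided by area (units: 1/m). Higher means more edge per unit area."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    return perimeter_m / area_m2

# Two patches with the same 1 ha area but different shapes
compact_patch = edge_to_area_ratio(perimeter_m=400.0, area_m2=10_000.0)     # 100 m x 100 m square
elongated_patch = edge_to_area_ratio(perimeter_m=2_020.0, area_m2=10_000.0)  # 1000 m x 10 m strip

print(f"Compact patch:   {compact_patch:.3f} per m")
print(f"Elongated patch: {elongated_patch:.3f} per m")
```

The elongated strip has about five times as much edge per unit area as the square, which is the kind of difference fragmentation studies look for.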

### Industry Standards:
- **Accuracy requirements** vary by application (survey-grade vs. planning-level)
- **Unit conventions** depend on region and domain
- **Coordinate systems** should match local standards
- **Metadata documentation** is critical for reproducibility

## 🚀 Next Steps

Once this function works and passes the tests, move on to:
- **Function 3**: `create_spatial_buffer_analysis()` - Learn to create buffer zones and analyze proximity relationships

**Completing this function gives you the core spatial measurement skills you'll use in professional GIS work!**

## 🎓 Real-World Applications

The spatial metrics you're calculating are used for:
- **Environmental analysis**: Watershed areas, habitat patch sizes
- **Urban planning**: Building footprints, park areas, street lengths
- **Agriculture**: Field areas, irrigation perimeters
- **Emergency management**: Coverage areas, evacuation zones
- **Natural resource management**: Forest areas, coastline lengths

**Good luck! 🍀**