# 🗺️ Standardizing Coordinate Reference Systems (CRS)

**GIST 604B - Python GeoPandas Introduction**  
**Notebook 4: CRS Transformation and Standardization**

---

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:
- Understand different types of coordinate reference systems
- Transform spatial data between different CRS
- Select appropriate CRS for different analysis needs
- Handle missing or undefined CRS in datasets
- Implement intelligent auto-selection of CRS
- Implement the `standardize_crs()` function

## 🌍 Why CRS Standardization Matters

A Coordinate Reference System (CRS) tells software how the coordinate numbers in your geometries correspond to locations on the Earth:
- **Geographic CRS** - Angular latitude/longitude coordinates in degrees (e.g., WGS84/EPSG:4326)
- **Projected CRS** - Planar x/y coordinates in meters or feet (e.g., UTM, State Plane)
- **Analysis requirements** - Distance, area, and buffer operations need a projected CRS with linear units; web maps expect Web Mercator (EPSG:3857)
- **Data integration** - Datasets must share the same CRS before they can be overlaid, joined, or compared

**Getting CRS wrong can cause analysis failures or completely incorrect results!**
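You can check this distinction in code. pyproj reports whether an EPSG code describes a geographic or projected CRS, and which units its axes use. Below is a minimal sketch; `describe_crs` is a hypothetical helper name chosen for illustration.

```python
from pyproj import CRS

def describe_crs(code):
    """Hypothetical helper: return (name, kind, unit) for a CRS specification."""
    crs = CRS.from_user_input(code)
    if crs.is_geographic:
        kind = "geographic"
    elif crs.is_projected:
        kind = "projected"
    else:
        kind = "other"
    # For EPSG:4326 the first axis is latitude (authority axis order);
    # for these codes both axes share the same unit anyway
    unit = crs.axis_info[0].unit_name
    return crs.name, kind, unit

for code in ["EPSG:4326", "EPSG:3857", "EPSG:32610"]:
    name, kind, unit = describe_crs(code)
    print(f"{code}: {name} -> {kind}, units = {unit}")
```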

In [None]:
# Import necessary libraries
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point, Polygon, LineString
from pyproj import CRS
import pyproj
import warnings
warnings.filterwarnings('ignore')  # Note: this also hides GeoPandas' geographic-CRS measurement warnings

print("📦 Libraries loaded successfully!")
print(f"🐼 GeoPandas version: {gpd.__version__}")
print(f"🌍 PyProj version: {pyproj.__version__}")

## 🗺️ Understanding Common CRS Types

Let's explore the most common coordinate reference systems and when to use them:

In [None]:
# Create sample data in different CRS for demonstration
print("🌍 Common Coordinate Reference Systems:\n")

# 1. WGS84 (EPSG:4326) - Most common geographic CRS
wgs84_points = [
    Point(-122.419, 37.775),  # San Francisco
    Point(-74.006, 40.713),   # New York
    Point(-87.623, 41.881),   # Chicago
]

wgs84_gdf = gpd.GeoDataFrame({
    'city': ['San Francisco', 'New York', 'Chicago'],
    'population': [883305, 8336817, 2693976],
    'geometry': wgs84_points
}, crs='EPSG:4326')

print(f"📍 1. WGS84 (EPSG:4326) - Geographic Coordinates")
print(f"   Purpose: Global positioning, web mapping, GPS data")
print(f"   Units: Degrees (latitude/longitude)")
print(f"   Sample coordinates: {wgs84_points[0].x:.3f}°, {wgs84_points[0].y:.3f}°")
print(f"   CRS Info: {wgs84_gdf.crs}\n")

# 2. Web Mercator (EPSG:3857) - Web mapping standard
web_mercator_gdf = wgs84_gdf.to_crs('EPSG:3857')
web_merc_point = web_mercator_gdf.geometry.iloc[0]

print(f"🌐 2. Web Mercator (EPSG:3857) - Projected Coordinates")
print(f"   Purpose: Web mapping (Google Maps, OpenStreetMap)")
print(f"   Units: Meters")
print(f"   Sample coordinates: {web_merc_point.x:,.0f}m, {web_merc_point.y:,.0f}m")
print(f"   CRS Info: {web_mercator_gdf.crs}\n")

# 3. UTM Zone 33N (EPSG:32633) covers 12°E to 18°E in the northern hemisphere
# Create European sample data (Berlin lies in zone 33; Munich, at 11.6°E, is just west of it in zone 32)
europe_wgs84 = gpd.GeoDataFrame({
    'city': ['Berlin', 'Munich'], 
    'geometry': [Point(13.405, 52.520), Point(11.582, 48.135)]
}, crs='EPSG:4326')

utm_gdf = europe_wgs84.to_crs('EPSG:32633')  # UTM Zone 33N
utm_point = utm_gdf.geometry.iloc[0]

print(f"📐 3. UTM Zone 33N (EPSG:32633) - High-Precision Projected")
print(f"   Purpose: Precise measurements, surveying, regional analysis")
print(f"   Units: Meters")
print(f"   Sample coordinates: {utm_point.x:,.0f}m, {utm_point.y:,.0f}m")
print(f"   CRS Info: {utm_gdf.crs}")
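Each UTM zone is designed for a 6°-wide strip of longitude only. pyproj exposes the area of use of every EPSG definition, so you can check whether your data falls inside it. The sketch below tests the two cities from the cell above against EPSG:32633; `inside_area_of_use` is a hypothetical helper.

```python
from pyproj import CRS

utm33 = CRS.from_epsg(32633)
aou = utm33.area_of_use  # bounding box (in degrees) where this CRS is meant to be used
print(f"{utm33.name}: lon {aou.west} to {aou.east}, lat {aou.south} to {aou.north}")

def inside_area_of_use(crs, lon, lat):
    """Hypothetical helper: True if (lon, lat) lies inside the CRS's area of use."""
    a = crs.area_of_use
    return a.west <= lon <= a.east and a.south <= lat <= a.north

for city, lon, lat in [("Berlin", 13.405, 52.520), ("Munich", 11.582, 48.135)]:
    print(f"{city}: inside zone 33N area of use? {inside_area_of_use(utm33, lon, lat)}")
```

Projecting Munich into zone 33N still works, but it sits outside the zone's intended strip, so distortion is somewhat larger than it would be in zone 32N (EPSG:32632).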

## 🔄 CRS Transformation Examples

Let's see how coordinates change when we transform between different CRS:

In [None]:
# Demonstrate coordinate transformations
print("🔄 Coordinate Transformation Examples:\n")

# Start with a point in San Francisco
sf_point = Point(-122.419, 37.775)
original_gdf = gpd.GeoDataFrame({'name': ['San Francisco']}, 
                               geometry=[sf_point], crs='EPSG:4326')

print(f"🏁 Original (WGS84): {sf_point.x:.6f}°, {sf_point.y:.6f}°")

# Transform to different CRS and show coordinate changes
transformations = [
    ('EPSG:3857', 'Web Mercator'),
    ('EPSG:32610', 'UTM Zone 10N (California)'),
    ('EPSG:2227', 'California State Plane Zone III'),
]

for crs_code, crs_name in transformations:
    transformed = original_gdf.to_crs(crs_code)
    point = transformed.geometry.iloc[0]
    
    print(f"➡️  {crs_name} ({crs_code}):")
    print(f"    X: {point.x:,.2f}, Y: {point.y:,.2f}")
    
    # Show units
    crs_obj = CRS.from_epsg(int(crs_code.split(':')[1]))
    axis_info = crs_obj.axis_info
    units = axis_info[0].unit_name if axis_info else 'unknown'
    print(f"    Units: {units}\n")

print("💡 Notice how coordinates change dramatically between CRS types!")
print("   Geographic: Small decimal degrees")
print("   Projected: Large numbers in meters/feet")
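Under the hood, `to_crs()` relies on pyproj to transform coordinates. You can also call `pyproj.Transformer` directly for individual points. Passing `always_xy=True` keeps the (longitude, latitude) order that shapely geometries use, whatever axis order the EPSG definition declares.

The sketch below also checks the result against the spherical formulas that define Web Mercator: x = Rλ and y = R ln tan(π/4 + φ/2), with R = 6,378,137 m.

```python
import math
from pyproj import Transformer

lon, lat = -122.419, 37.775  # San Francisco

# always_xy=True: inputs and outputs are (x=lon/easting, y=lat/northing)
to_merc = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
to_wgs84 = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)

x, y = to_merc.transform(lon, lat)
print(f"Web Mercator: x={x:,.2f} m, y={y:,.2f} m")

# Web Mercator applies spherical Mercator formulas with the WGS84 semi-major axis
R = 6378137.0
x_formula = R * math.radians(lon)
y_formula = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
print(f"Formula:      x={x_formula:,.2f} m, y={y_formula:,.2f} m")

# Transforming back should recover the original coordinates
lon_back, lat_back = to_wgs84.transform(x, y)
print(f"Round trip:   {lon_back:.9f}, {lat_back:.9f}")
```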

## 📏 Impact of CRS on Distance Calculations

Different CRS give very different results for the same measurement operations:

In [None]:
# Create two points for distance calculation
point1 = Point(-122.419, 37.775)  # San Francisco
point2 = Point(-122.389, 37.795)  # About 3.5 km to the northeast

test_gdf = gpd.GeoDataFrame({
    'name': ['Point 1', 'Point 2'],
    'geometry': [point1, point2]
}, crs='EPSG:4326')

print("📏 Distance Calculation Comparison:\n")

# Calculate distances in different CRS
distance_tests = [
    ('EPSG:4326', 'WGS84 (Geographic)', 'degrees'),
    ('EPSG:3857', 'Web Mercator', 'meters'),
    ('EPSG:32610', 'UTM Zone 10N', 'meters'),
]

for crs_code, crs_name, units in distance_tests:
    gdf_proj = test_gdf.to_crs(crs_code)
    
    # Planar (straight-line) distance between the two points, in the CRS's units
    distance = gdf_proj.geometry.iloc[0].distance(gdf_proj.geometry.iloc[1])
    
    print(f"🎯 {crs_name} ({crs_code}):")
    
    if units == 'degrees':
        print(f"   Distance: {distance:.6f} degrees")
        print(f"   ⚠️  Degrees are not a consistent length unit - NOT meaningful!")
    else:
        print(f"   Distance: {distance:.0f} {units} ({distance/1000:.2f} km)")
        if crs_code == 'EPSG:3857':
            print(f"   ⚠️  Web Mercator stretches distances by ~1/cos(latitude) (about 26% here)")
        else:
            print(f"   ✅ UTM distances are accurate within the zone")
    print()

print("🔑 Key Insight: Use a local projected CRS (e.g., UTM) for distance/area calculations!")
print("   Geographic CRS (degrees) give meaningless distances; Web Mercator inflates them.")
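To see which of the planar numbers above you can trust, compare them with the geodesic distance on the WGS84 ellipsoid, which pyproj computes with `Geod.inv()`. The sketch below uses only pyproj and the same two points.

```python
import math
from pyproj import Geod, Transformer

lon1, lat1 = -122.419, 37.775
lon2, lat2 = -122.389, 37.795

# Geodesic distance on the WGS84 ellipsoid (the ground-truth reference here)
geod = Geod(ellps="WGS84")
_, _, geodesic_m = geod.inv(lon1, lat1, lon2, lat2)

def planar_distance(target_crs):
    """Project both points, then take the straight-line (Cartesian) distance."""
    t = Transformer.from_crs("EPSG:4326", target_crs, always_xy=True)
    x1, y1 = t.transform(lon1, lat1)
    x2, y2 = t.transform(lon2, lat2)
    return math.hypot(x2 - x1, y2 - y1)

utm_m = planar_distance("EPSG:32610")
merc_m = planar_distance("EPSG:3857")

print(f"Geodesic:     {geodesic_m:,.1f} m")
print(f"UTM 10N:      {utm_m:,.1f} m ({utm_m / geodesic_m - 1:+.3%})")
print(f"Web Mercator: {merc_m:,.1f} m ({merc_m / geodesic_m - 1:+.3%})")
```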

## 🤖 Intelligent CRS Selection

How do we automatically choose the best CRS for a dataset? Let's explore the logic:

In [None]:
def demonstrate_crs_selection(gdf, dataset_name):
    """
    Demonstrate intelligent CRS selection logic.
    This shows the decision-making process for choosing appropriate CRS.
    """
    print(f"🤖 CRS Selection for {dataset_name}:")
    print("-" * 50)
    
    # Step 1: Check current CRS
    if gdf.crs is None:
        print("❌ No CRS defined - assuming WGS84")
        gdf = gdf.set_crs('EPSG:4326')
    else:
        print(f"✅ Current CRS: {gdf.crs}")
    
    # Step 2: Analyze spatial extent
    bounds = gdf.total_bounds  # minx, miny, maxx, maxy
    minx, miny, maxx, maxy = bounds
    
    # Convert to geographic if needed for analysis
    if not gdf.crs.is_geographic:
        geo_gdf = gdf.to_crs('EPSG:4326')
        geo_bounds = geo_gdf.total_bounds
        minx, miny, maxx, maxy = geo_bounds
    
    print(f"📍 Geographic extent:")
    print(f"   Longitude: {minx:.3f}° to {maxx:.3f}° (width: {maxx-minx:.3f}°)")
    print(f"   Latitude: {miny:.3f}° to {maxy:.3f}° (height: {maxy-miny:.3f}°)")
    
    # Step 3: Make CRS recommendation
    width = maxx - minx
    height = maxy - miny
    center_lon = (minx + maxx) / 2
    center_lat = (miny + maxy) / 2
    
    print(f"\n🎯 CRS Recommendation Logic:")
    
    if width > 60 or height > 30:
        # Global or very large area
        recommended_crs = 'EPSG:4326'
        reason = "Large geographic extent - use WGS84 for global data"
    elif width > 6 or height > 6:
        # Continental scale - Web Mercator might work
        recommended_crs = 'EPSG:3857'
        reason = "Continental scale - Web Mercator for visualization"
    else:
        # Regional scale - UTM would be best
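        # Simple UTM zone calculation (ignores the Norway/Svalbard special zones);
        # clamp to 60 so a longitude of exactly 180° doesn't yield a nonexistent zone 61
        utm_zone = min(int((center_lon + 180) / 6) + 1, 60)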
        if center_lat >= 0:
            utm_epsg = 32600 + utm_zone  # Northern hemisphere
            hemisphere = "N"
        else:
            utm_epsg = 32700 + utm_zone  # Southern hemisphere  
            hemisphere = "S"
        
        recommended_crs = f'EPSG:{utm_epsg}'
        reason = f"Regional data - UTM Zone {utm_zone}{hemisphere} for accurate measurements"
    
    print(f"   📋 Decision: {recommended_crs}")
    print(f"   💭 Reasoning: {reason}")
    
    return recommended_crs

# Test with different datasets
test_datasets = [
    # Global data
    (gpd.GeoDataFrame({
        'name': ['New York', 'London', 'Tokyo'],
        'geometry': [Point(-74, 41), Point(0, 51), Point(139, 35)]
    }, crs='EPSG:4326'), "Global Cities"),
    
    # Regional data (California)
    (gpd.GeoDataFrame({
        'name': ['San Francisco', 'Los Angeles'],
        'geometry': [Point(-122.4, 37.8), Point(-118.2, 34.1)]
    }, crs='EPSG:4326'), "California Cities"),
    
    # Local data (San Francisco Bay Area)
    (gpd.GeoDataFrame({
        'name': ['San Francisco', 'Oakland', 'San Jose'], 
        'geometry': [Point(-122.4, 37.8), Point(-122.3, 37.8), Point(-121.9, 37.3)]
    }, crs='EPSG:4326'), "Bay Area Cities"),
]

recommendations = []
for gdf, name in test_datasets:
    rec_crs = demonstrate_crs_selection(gdf, name)
    recommendations.append((name, rec_crs))
    print("\n" + "="*60 + "\n")

print("📊 Summary of Recommendations:")
for name, crs in recommendations:
    print(f"   {name}: {crs}")
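The UTM zone arithmetic above divides the 360° of longitude into 60 zones, each 6° wide and numbered eastward from 180°W. The EPSG codes are 326NN for the northern hemisphere and 327NN for the southern.

Below is a standalone sketch of that formula; `utm_epsg_for` is a hypothetical name. It clamps the zone to 60 and, like the code above, ignores the special zones around Norway and Svalbard. It then compares the result with GeoPandas' built-in `estimate_utm_crs()`, which is available in GeoPandas 0.9 and later.

```python
import geopandas as gpd
from shapely.geometry import Point

def utm_epsg_for(lon, lat):
    """Hypothetical helper: WGS84 UTM EPSG code for a lon/lat in degrees."""
    zone = min(int((lon + 180) / 6) + 1, 60)  # zones 1..60, 6° wide, starting at 180°W
    return (32600 if lat >= 0 else 32700) + zone  # 326xx = north, 327xx = south

print(utm_epsg_for(-122.4, 37.8))   # San Francisco
print(utm_epsg_for(13.405, 52.52))  # Berlin
print(utm_epsg_for(151.2, -33.9))   # Sydney

bay_area = gpd.GeoDataFrame(
    {"name": ["San Francisco", "Oakland", "San Jose"]},
    geometry=[Point(-122.4, 37.8), Point(-122.3, 37.8), Point(-121.9, 37.3)],
    crs="EPSG:4326",
)
# GeoPandas picks a UTM CRS based on the data's geographic bounds
print(bay_area.estimate_utm_crs())
```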

## 🛠️ Building Your standardize_crs() Function

Now let's implement the complete function that handles CRS standardization:

In [None]:
def standardize_crs_demo(gdf, target_crs=None):
    """
    Transform spatial data to a standardized coordinate reference system.
    
    Copy this function, renamed to standardize_crs(), to your src/spatial_basics.py file,
    together with its auto_select_crs() helper.
    
    Args:
        gdf (gpd.GeoDataFrame): Input spatial dataset to reproject
        target_crs (Union[str, int, None]): Target CRS specification:
            - None: Auto-select appropriate CRS based on data extent
            - int: EPSG code (e.g., 4326 for WGS84, 3857 for Web Mercator)
            - str: CRS string (e.g., 'EPSG:4326', '+proj=utm +zone=33 +datum=WGS84')
    
    Returns:
        gpd.GeoDataFrame: Dataset reprojected to target CRS
    """
    # Step 1: Handle missing CRS
    if gdf.crs is None:
        print("⚠️  Warning: No CRS defined. Assuming WGS84 (EPSG:4326)")
        gdf = gdf.set_crs('EPSG:4326')
    
    # Step 2: Handle target_crs parameter
    if target_crs is None:
        # Auto-select appropriate CRS
        target_crs = auto_select_crs(gdf)
        print(f"🤖 Auto-selected CRS: {target_crs}")
    
    # Convert target_crs to string format if it's an integer
    if isinstance(target_crs, int):
        target_crs = f'EPSG:{target_crs}'
    
    # Step 3: Check if transformation is needed
    source_crs = gdf.crs
    
    # Compare CRS objects directly: pyproj parses the target spec (EPSG code,
    # PROJ string, WKT, ...) and returns False if the spec cannot be parsed
    if source_crs.equals(target_crs):
        print(f"✅ No transformation needed - already in {target_crs}")
        return gdf.copy()
    
    # Step 4: Perform the transformation
    print(f"🔄 Transforming from {source_crs} to {target_crs}")
    
    try:
        transformed_gdf = gdf.to_crs(target_crs)
        
        # Step 5: Validate the result
        if len(transformed_gdf) != len(gdf):
            raise ValueError("Feature count changed during transformation")
        
        # Check if geometries are still valid
        invalid_count = (~transformed_gdf.geometry.is_valid).sum()
        if invalid_count > 0:
            print(f"⚠️  Warning: {invalid_count} geometries became invalid during transformation")
        
        print(f"✅ Transformation completed successfully")
        return transformed_gdf
        
    except Exception as e:
        print(f"❌ Transformation failed: {e}")
        raise ValueError(f"Unable to transform to CRS {target_crs}: {e}") from e


def auto_select_crs(gdf):
    """
    Helper function to automatically select appropriate CRS based on data extent.
    """
    # Convert to geographic CRS for extent analysis if needed
    if not gdf.crs.is_geographic:
        geo_gdf = gdf.to_crs('EPSG:4326')
    else:
        geo_gdf = gdf
    
    # Get geographic bounds
    bounds = geo_gdf.total_bounds  # minx, miny, maxx, maxy
    minx, miny, maxx, maxy = bounds
    
    width = maxx - minx
    height = maxy - miny
    center_lon = (minx + maxx) / 2
    center_lat = (miny + maxy) / 2
    
    # Selection logic
    if width > 60 or height > 30:
        # Global or very large area - use WGS84
        return 'EPSG:4326'
    elif width > 6 or height > 6:
        # Continental scale - Web Mercator for visualization (it distorts distance and area)
        return 'EPSG:3857'
    else:
        # Regional scale - select the UTM zone containing the center (clamped to zones 1-60)
        utm_zone = min(int((center_lon + 180) / 6) + 1, 60)
        if center_lat >= 0:
            utm_epsg = 32600 + utm_zone  # Northern hemisphere
        else:
            utm_epsg = 32700 + utm_zone  # Southern hemisphere
        
        return f'EPSG:{utm_epsg}'

print("✅ Function implementation completed!")
print("📋 Copy both functions to your src/spatial_basics.py file (rename standardize_crs_demo to standardize_crs)")
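Step 1 of the function relies on the difference between `set_crs()` and `to_crs()`:

- `set_crs()` only attaches a CRS label to the existing numbers.
- `to_crs()` actually recomputes the coordinates.

A wrong label fails silently, which is why GeoPandas refuses to overwrite an existing CRS unless you pass `allow_override=True`. A short sketch:

```python
import geopandas as gpd
from shapely.geometry import Point

no_crs = gpd.GeoDataFrame({"id": [1]}, geometry=[Point(-120, 45)])
print(no_crs.crs)  # None: the numbers have no defined meaning yet

labeled = no_crs.set_crs("EPSG:4326")  # label only: coordinates unchanged
moved = labeled.to_crs("EPSG:3857")    # transform: coordinates recomputed
print(labeled.geometry.iloc[0], moved.geometry.iloc[0])

# Re-labeling a dataset that already has a different CRS requires an explicit override
try:
    labeled.set_crs("EPSG:3857")
    override_required = False
except ValueError as err:
    override_required = True
    print(f"ValueError: {err}")

mislabeled = labeled.set_crs("EPSG:3857", allow_override=True)
print(mislabeled.geometry.iloc[0])  # same numbers, now (wrongly) interpreted as meters
```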

## 🧪 Testing Your Implementation

Let's test our standardize_crs function with different scenarios:

In [None]:
print("🧪 Testing standardize_crs() Function\n")
print("=" * 50)

# Test 1: Explicit CRS transformation
print("\n🧪 Test 1: Explicit CRS Transformation")
print("-" * 40)

# Create test data in WGS84
test_data = gpd.GeoDataFrame({
    'id': [1, 2, 3],
    'name': ['Point A', 'Point B', 'Point C'],
    'geometry': [
        Point(-122.419, 37.775),  # San Francisco
        Point(-122.389, 37.795),  # San Francisco waterfront
        Point(-122.272, 37.805)   # Oakland
    ]
}, crs='EPSG:4326')

# Transform to Web Mercator
result1 = standardize_crs_demo(test_data, target_crs=3857)
print(f"Original CRS: {test_data.crs}")
print(f"Result CRS: {result1.crs}")
print(f"Sample coordinates: {result1.geometry.iloc[0].x:.0f}, {result1.geometry.iloc[0].y:.0f}")

# Test 2: String CRS specification
print("\n🧪 Test 2: String CRS Specification")
print("-" * 40)

result2 = standardize_crs_demo(test_data, target_crs='EPSG:32610')
print(f"String target: 'EPSG:32610'")
print(f"Result CRS: {result2.crs}")

# Test 3: Auto-selection
print("\n🧪 Test 3: Automatic CRS Selection")
print("-" * 40)

result3 = standardize_crs_demo(test_data, target_crs=None)
print(f"Auto-selected CRS: {result3.crs}")

# Test 4: No transformation needed
print("\n🧪 Test 4: No Transformation Needed")
print("-" * 40)

result4 = standardize_crs_demo(test_data, target_crs=4326)
print(f"Same CRS requested: {result4.crs}")

# Test 5: Missing CRS handling
print("\n🧪 Test 5: Missing CRS Handling")
print("-" * 40)

no_crs_data = gpd.GeoDataFrame({
    'id': [1, 2],
    'geometry': [Point(-120, 45), Point(-118, 46)]
})

result5 = standardize_crs_demo(no_crs_data, target_crs=3857)
print(f"Handled missing CRS, result: {result5.crs}")

print("\n🎉 All tests completed successfully!")
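The cell above prints its results but never checks them. The same scenarios can be written as pytest-style assertions. This sketch has two pieces of scaffolding you would replace in your project:

- `standardize_crs` here is a minimal hypothetical stand-in, included only so the block runs on its own. In your project, import your real function from `src/spatial_basics.py` instead.
- `make_points` is likewise a hypothetical fixture.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point

def standardize_crs(gdf, target_crs):
    """Hypothetical stand-in: replace with the real function from src/spatial_basics.py."""
    if gdf.crs is None:
        gdf = gdf.set_crs("EPSG:4326")
    return gdf.to_crs(target_crs)

def make_points(crs="EPSG:4326"):
    return gpd.GeoDataFrame(
        {"id": [1, 2, 3]},
        geometry=[Point(-122.419, 37.775), Point(-122.389, 37.795), Point(-122.272, 37.805)],
        crs=crs,
    )

def test_int_and_string_targets_agree():
    a = standardize_crs(make_points(), 3857)
    b = standardize_crs(make_points(), "EPSG:3857")
    assert a.crs == b.crs
    assert np.allclose(a.geometry.x, b.geometry.x) and np.allclose(a.geometry.y, b.geometry.y)

def test_round_trip_and_feature_count():
    original = make_points()
    back = standardize_crs(standardize_crs(original, 32610), 4326)
    assert len(back) == len(original)
    assert np.allclose(back.geometry.x, original.geometry.x, atol=1e-8)
    assert np.allclose(back.geometry.y, original.geometry.y, atol=1e-8)

def test_missing_crs_assumed_wgs84():
    result = standardize_crs(make_points(crs=None), 3857)
    expected = make_points().to_crs(3857)
    assert result.crs.to_epsg() == 3857
    assert np.allclose(result.geometry.x, expected.geometry.x)

def test_input_not_modified():
    original = make_points()
    standardize_crs(original, 3857)
    assert original.crs.to_epsg() == 4326

for test in [test_int_and_string_targets_agree, test_round_trip_and_feature_count,
             test_missing_crs_assumed_wgs84, test_input_not_modified]:
    test()
    print(f"✅ {test.__name__}")
```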

## 🌍 Real-World CRS Selection Guidelines

Here are professional guidelines for choosing the right CRS for your projects:

In [None]:
print("🌍 Professional CRS Selection Guidelines:\n")

guidelines = {
    "📍 Web Mapping & Visualization": [
        "EPSG:3857 (Web Mercator) - Standard for web maps",
        "EPSG:4326 (WGS84) - GPS data, global datasets",
        "Good for: Interactive maps, web applications"
    ],
    
    "📏 Distance & Area Calculations": [
        "UTM zones - Accurate for regional analysis within a single 6°-wide zone",
        "State Plane - Accurate for US state-level work",
        "Local projections - Best for specific countries/regions",
        "Never use: Geographic CRS (degrees) or Web Mercator for measurements"
    ],
    
    "🗺️ Data Integration": [
        "Use common CRS for all datasets in analysis",
        "Transform to analysis CRS, not visualization CRS",
        "Document CRS choices for reproducibility"
    ],
    
    "⚡ Performance Considerations": [
        "Minimize transformations - transform once, use many times",
        "Store data in analysis CRS when possible",
        "Web Mercator for display, a local projected CRS (UTM, State Plane) for analysis"
    ]
}

for category, items in guidelines.items():
    print(f"{category}:")
    for item in items:
        if item.startswith("Good for:") or item.startswith("Never use:"):
            print(f"   💡 {item}")
        else:
            print(f"   • {item}")
    print()
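You can check the measurement guidelines with a polygon of known size. The sketch below builds a 1 km × 1 km square in UTM Zone 10N, anchored at an arbitrarily chosen San Francisco corner. It then reprojects the square and compares the planar `.area` values.

GeoPandas normally emits a warning for the geographic case, unless warnings are filtered as in the first cell.

```python
import geopandas as gpd
from shapely.geometry import box

# Start from a San Francisco lon/lat and project it into UTM Zone 10N (meters)
corner = gpd.GeoSeries(gpd.points_from_xy([-122.419], [37.775]), crs="EPSG:4326").to_crs("EPSG:32610")
x0, y0 = corner.x.iloc[0], corner.y.iloc[0]

# A 1,000 m x 1,000 m square, i.e. exactly 1 km² in UTM coordinates
square = gpd.GeoSeries([box(x0, y0, x0 + 1000, y0 + 1000)], crs="EPSG:32610")

areas = {}
for crs in ["EPSG:32610", "EPSG:3857", "EPSG:4326"]:
    # .area is planar and reported in squared CRS units
    areas[crs] = square.to_crs(crs).area.iloc[0]
    print(f"{crs}: area = {areas[crs]:,.6f} squared CRS units")

print(f"Web Mercator / UTM area ratio: {areas['EPSG:3857'] / areas['EPSG:32610']:.2f}")
```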

## 🎯 Key Takeaways

After completing this notebook, you should understand:

✅ **CRS Types** - Geographic vs. projected coordinate systems and their uses  
✅ **Transformation** - How to convert between different coordinate systems  
✅ **CRS Selection** - Choosing appropriate CRS for different analysis needs  
✅ **Auto-selection** - Intelligent CRS selection based on data characteristics  
✅ **Best practices** - Professional guidelines for CRS management  
✅ **Performance** - Minimizing transformations for efficient workflows  

## 📚 Next Steps

1. **Implement** your `standardize_crs()` function in `src/spatial_basics.py`
2. **Test** your implementation with `uv run pytest tests/ -k "standardize_crs" -v`
3. **Complete** the full GeoPandas assignment by running all tests
4. **Move on** to advanced GeoPandas topics or other assignments

---

*Coordinate Reference Systems are fundamental to spatial analysis. Understanding when and how to transform between CRS is essential for accurate, reliable results. Always choose your CRS thoughtfully!* 🌟