# Function 7: Validate Coordinate Data 📍

**🤖 AI-Enhanced Learning: Advanced Data Quality Assessment**

In this notebook, you'll learn how to build the `validate_coordinate_data()` function using AI assistance. This function checks coordinate columns for missing values, out-of-range values, duplicates, and precision problems, so you can judge whether the data is suitable for GIS analysis.

## 🎯 What This Function Does
- Validates latitude and longitude ranges
- Checks for missing coordinate values
- Identifies potential precision issues
- Detects duplicate coordinate pairs
- Generates quality scores and recommendations

## 🤖 AI Learning Objectives
By the end of this notebook, you will:
1. **Use Copilot CHAT** to understand coordinate validation concepts
2. **Use AGENT mode** to implement boolean indexing and filtering  
3. **Use EDIT mode** to add error handling and validation logic
4. **Learn about** lambda functions and list comprehensions
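
As a preview of the last objective, here is a minimal sketch of a list comprehension and a lambda applied to coordinates. The sample values and the names `coords`, `valid_pairs`, and `sorted_by_lat` are illustrative assumptions, not part of the course code.

```python
# Hypothetical (latitude, longitude) pairs, for illustration only
coords = [(33.4484, -112.0740), (91.0, -112.0740), (-45.0, 200.0)]

# List comprehension: keep only the pairs where both values are in range
valid_pairs = [
    (lat, lon)
    for lat, lon in coords
    if -90 <= lat <= 90 and -180 <= lon <= 180
]

# Lambda: a small anonymous function, used here as the sort key (latitude)
sorted_by_lat = sorted(coords, key=lambda pair: pair[0])

print(valid_pairs)
print(sorted_by_lat)
```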

## 🔧 Function Signature
```python
def validate_coordinate_data(df, lat_column='latitude', lon_column='longitude',
                             lat_bounds=(-90, 90), lon_bounds=(-180, 180)):
    """
    Validate coordinate columns and summarize data quality.

    Args:
        df (pandas.DataFrame): DataFrame containing coordinate data
        lat_column (str): Name of latitude column (default: 'latitude')
        lon_column (str): Name of longitude column (default: 'longitude')
        lat_bounds (tuple): Valid latitude range (default: (-90, 90))
        lon_bounds (tuple): Valid longitude range (default: (-180, 180))
    
    Returns:
        dict: Validation results with counts, issues, and recommendations
    """
```

## 🚀 Let's Start Building This Function!


In [None]:
# Let's start by importing the libraries we need
import pandas as pd
import numpy as np
from pathlib import Path

# Load some sample data to work with
stations_file = Path('../data/weather_stations.csv')
if stations_file.exists():
    df = pd.read_csv(stations_file)
    print(f"✅ Loaded sample data: {len(df)} weather stations")
    print(f"📋 Columns: {list(df.columns)}")
    print(f"\n🔍 First 3 rows:")
    print(df.head(3))
else:
    print("⚠️ Sample data not found - we'll create some test data")
    # Create sample data for demonstration
    df = pd.DataFrame({
        'station_id': ['STN_001', 'STN_002', 'STN_003', 'STN_004', 'STN_005'],
        'latitude': [33.4484, 91.0000, 33.4484, -200.0, np.nan],  # Mix of valid/invalid values
        'longitude': [-112.0740, -112.0740, -112.0740, -112.0740, -112.0740],
        'elevation_m': [331, 335, 331, 340, 350]
    })
    print("✅ Created test data with coordinate validation challenges")
    
print(f"\n📊 Data types:")
print(df.dtypes)


## 🤖 Step 1: Use Copilot CHAT to Learn About Coordinate Validation

**Try this with Copilot Chat:**

> **💬 Ask Copilot:** "How do I check if latitude values are within valid range (-90 to 90 degrees) in pandas?"

> **💬 Ask Copilot:** "What are common coordinate data quality issues I should check for in GIS data?"

> **💬 Ask Copilot:** "How can I identify duplicate coordinate pairs in a pandas DataFrame?"

**📚 Key Concepts to Learn:**
- Valid coordinate ranges (latitude: -90 to 90, longitude: -180 to 180)
- Boolean indexing: `df[condition]`
- Missing data detection: `pd.isnull()` vs `pd.notnull()`
- Duplicate detection: `df.duplicated()`
- Precision analysis with string methods
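
The concepts above can be tried out on a tiny DataFrame before writing the full function. This is a minimal sketch: the `sample` data and the variable names are assumptions made for illustration, and the decimal count relies on how Python prints floats, so it is only a rough indicator.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    'latitude': [33.4484, 91.0, 33.4484, np.nan],
    'longitude': [-112.0740, -112.0740, -112.0740, -112.0740],
})

# Boolean indexing: a boolean Series selects the matching rows
lat_ok = sample['latitude'].between(-90, 90)  # NaN evaluates to False
in_range_rows = sample[lat_ok]

# Missing data detection: True where either coordinate is missing
missing_mask = sample['latitude'].isnull() | sample['longitude'].isnull()

# Duplicate detection: marks repeats after the first occurrence of a pair
duplicate_mask = sample.duplicated(subset=['latitude', 'longitude'])

# Precision analysis with string methods: digits after the decimal point.
# Note that a whole number such as 91.0 still reports one digit ("0").
decimals = (
    sample['latitude'].dropna().astype(str).str.split('.').str[1].str.len()
)

print(f"In range: {len(in_range_rows)}, missing: {missing_mask.sum()}, "
      f"duplicates: {duplicate_mask.sum()}")
print(decimals)
```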

Let's start implementing the basic validation logic!


In [None]:
# 🤖 Use Copilot AGENT mode to help implement coordinate validation!

def validate_coordinate_data_example(df, lat_column='latitude', lon_column='longitude',
                                   lat_bounds=(-90, 90), lon_bounds=(-180, 180)):
    """
    🤖 Try typing this function signature and let Copilot suggest the implementation!
    """
    
    # 🤖 AGENT MODE: Start typing these comments and let Copilot complete the code
    
    # Initialize validation results dictionary
    validation = {
        'total_records': len(df),
        'missing_coordinates': 0,
        'invalid_latitude': 0,
        'invalid_longitude': 0,
        'duplicate_coordinates': 0,
        'precision_issues': 0,
        'valid_coordinates': 0,
        'quality_score': 0.0,
        'recommendations': []
    }
    
    # Check if required columns exist, and report only the ones that are missing
    missing_columns = [col for col in (lat_column, lon_column) if col not in df.columns]
    if missing_columns:
        validation['recommendations'].append(f"Missing required coordinate columns: {', '.join(missing_columns)}")
        return validation
    
    # 🤖 Ask Copilot to help with missing coordinate detection
    missing_coords = df[lat_column].isnull() | df[lon_column].isnull()
    validation['missing_coordinates'] = missing_coords.sum()
    
    # 🤖 Use AGENT mode for latitude validation
    invalid_lat = (df[lat_column] < lat_bounds[0]) | (df[lat_column] > lat_bounds[1])
    validation['invalid_latitude'] = invalid_lat.sum()
    
    # 🤖 Use AGENT mode for longitude validation  
    invalid_lon = (df[lon_column] < lon_bounds[0]) | (df[lon_column] > lon_bounds[1])
    validation['invalid_longitude'] = invalid_lon.sum()
    
    # 🤖 Ask Copilot about duplicate detection
    # Rows with missing coordinates are already counted above, so exclude them here
    duplicates = df.duplicated(subset=[lat_column, lon_column]) & ~missing_coords
    validation['duplicate_coordinates'] = duplicates.sum()
    
    # Calculate valid coordinates (rows that are not missing, out of range, or duplicated)
    valid_coords = ~(missing_coords | invalid_lat | invalid_lon | duplicates)
    validation['valid_coordinates'] = valid_coords.sum()
    
    # Calculate quality score (percentage of valid coordinates); 0.0 for an empty DataFrame
    total = validation['total_records']
    validation['quality_score'] = (validation['valid_coordinates'] / total) * 100 if total else 0.0
    
    return validation

# Test the function
print("🧪 Testing coordinate validation...")
result = validate_coordinate_data_example(df)

print(f"📊 Validation Results:")
for key, value in result.items():
    print(f"  {key}: {value}")


## 🤖 Step 2: Use EDIT Mode to Add Advanced Features

**Try these AI prompts with Copilot EDIT mode:**

> **✏️ EDIT prompt:** "Add precision analysis to check for coordinates with excessive decimal places"

> **✏️ EDIT prompt:** "Add quality recommendations based on validation results"

> **✏️ EDIT prompt:** "Add error handling for edge cases like empty DataFrames"
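
If you want something to compare Copilot's suggestion against, here is one possible sketch of the precision check from the first prompt. The helper name `count_decimal_places` and the `MAX_DECIMALS = 6` threshold are assumptions for illustration (six decimal places of a degree is roughly 0.1 m at the equator); the course tests may expect a different rule.

```python
import pandas as pd

MAX_DECIMALS = 6  # Assumed threshold, not taken from the course tests


def count_decimal_places(value):
    """Hypothetical helper: count significant digits after the decimal point."""
    # Format with a fixed number of decimals, then drop trailing zeros
    text = f"{value:.10f}".rstrip('0')
    return len(text.split('.')[1])


# Hypothetical latitude values, for illustration only
sample_lat = pd.Series([33.4484, 91.0, 33.123456789, None])

# Skip missing values, then count decimals for each remaining value
decimals = sample_lat.dropna().map(count_decimal_places)

# Flag values that carry more decimals than the assumed threshold
excessive = decimals > MAX_DECIMALS
precision_issues = int(excessive.sum())

print(f"Values with more than {MAX_DECIMALS} decimals: {precision_issues}")
```

Fixed-width formatting is used instead of `str(value)` so that very small or very large floats are not rendered in scientific notation.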

**🎯 Your Task:** Now implement the full `validate_coordinate_data()` function in `src/pandas_basics.py`

**Key Implementation Points:**
1. Create comprehensive validation results dictionary
2. Check for missing columns with proper error handling
3. Use boolean indexing for range validation
4. Detect duplicates and precision issues
5. Generate quality score and actionable recommendations
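
Points 1 and 5 can be sketched as a small helper that turns the counts in the results dictionary into messages. The function name `build_recommendations` and the message wording are assumptions for illustration; the dictionary keys match the `validation` dictionary used earlier in this notebook.

```python
def build_recommendations(validation):
    """Hypothetical helper: turn validation counts into readable recommendations."""
    # Edge case: an empty DataFrame has nothing to validate
    if validation.get('total_records', 0) == 0:
        return ["DataFrame is empty: load data before validating coordinates"]

    # Illustrative message templates, keyed by the count they describe
    templates = {
        'missing_coordinates': "Fill in or drop {n} record(s) with missing coordinates",
        'invalid_latitude': "Review {n} record(s) with latitude outside the valid range",
        'invalid_longitude': "Review {n} record(s) with longitude outside the valid range",
        'duplicate_coordinates': "Check {n} duplicate coordinate pair(s)",
        'precision_issues': "Round {n} value(s) with excessive decimal places",
    }

    # List comprehension: one message for each check that found a problem
    return [
        template.format(n=validation[key])
        for key, template in templates.items()
        if validation.get(key, 0) > 0
    ]


# Hypothetical validation results, for illustration only
example = {
    'total_records': 5,
    'missing_coordinates': 1,
    'invalid_latitude': 2,
    'invalid_longitude': 0,
    'duplicate_coordinates': 1,
    'precision_issues': 0,
}
for message in build_recommendations(example):
    print(f"  - {message}")
```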

**🧪 Test Your Function:**
```bash
uv run pytest tests/test_pandas_basics.py::test_validate_coordinate_data -v
```
