# Function 1: Load and Explore Spatial Data 🗺️

**Welcome to your first GeoPandas function!**

In this notebook, you'll build the `load_and_explore_spatial_data()` function step by step. The function opens a spatial dataset (shapefile, GeoJSON, etc.) and summarizes its geographic properties so you know what you are working with before any analysis.

## 🎯 What This Function Does
- Loads spatial data files (shapefiles, GeoJSON, etc.) into a GeoDataFrame
- Shows you key spatial information (CRS, geometry types, bounds)
- Displays the first few features so you can see the data structure
- Checks for spatial data quality issues
- Handles errors gracefully

## 🔧 Function Signature
```python
def load_and_explore_spatial_data(file_path):
    """
    Args:
        file_path (str): Path to spatial file (e.g., 'data/cities.geojson')
    
    Returns:
        geopandas.GeoDataFrame: The loaded spatial dataset, or None if loading failed
    """
```

## 🚀 Step 1: Import Required Libraries

First, let's import the libraries we need for spatial analysis:

In [None]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
import warnings

# Suppress UserWarnings (including GeoPandas CRS warnings) for cleaner output.
# Note: this hides every UserWarning, so remove it when debugging.
warnings.filterwarnings("ignore", category=UserWarning)

print(f"✅ GeoPandas version: {gpd.__version__}")
print(f"📊 Pandas version: {pd.__version__}")
print("🗺️ Ready to work with spatial data!")
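The filter above stays active for the rest of the session. As a minimal sketch of a narrower alternative, using only the standard library, `warnings.catch_warnings()` limits the suppression to a single block. The `noisy_operation` function is a hypothetical stand-in for any library call that emits a `UserWarning`.

```python
import warnings


def noisy_operation():
    """Hypothetical stand-in for a library call that emits a UserWarning."""
    warnings.warn("Geometry is in a geographic CRS.", UserWarning)
    return "result"


# Warnings are silenced only inside this block; the previous
# filter settings are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    value = noisy_operation()

print(value)
```

Scoping the filter this way keeps unrelated warnings visible elsewhere in the notebook.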

## 📁 Step 2: Understanding Spatial File Formats

Before we load data, let's see what spatial files we have available:

In [None]:
# Let's see what spatial files we have in the data directory
data_dir = Path('../data')

if data_dir.exists():
    print("📂 Files in data directory:")
    for file in sorted(data_dir.iterdir()):
        if file.is_file():
            print(f"   {file.name}")
else:
    print(f"❌ Directory not found: {data_dir}")
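The listing above prints every file, spatial or not. A minimal sketch of narrowing it to likely spatial files by extension follows. The `find_spatial_files` helper and the `SPATIAL_EXTENSIONS` mapping are hypothetical names introduced for illustration, and the mapping covers only three common formats. A shapefile consists of several sidecar files (`.dbf`, `.shx`, `.prj`), but only the `.shp` path is passed to `gpd.read_file()`, so only that extension is listed.

```python
from pathlib import Path

# Assumed extension-to-format mapping; extend it for the formats you use.
SPATIAL_EXTENSIONS = {
    ".shp": "ESRI Shapefile",
    ".geojson": "GeoJSON",
    ".gpkg": "GeoPackage",
}


def find_spatial_files(directory):
    """Return (path, format name) pairs for recognized spatial files."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return [
        (path, SPATIAL_EXTENSIONS[path.suffix.lower()])
        for path in sorted(directory.iterdir())
        if path.is_file() and path.suffix.lower() in SPATIAL_EXTENSIONS
    ]


for path, fmt in find_spatial_files("../data"):
    print(f"{path.name}: {fmt}")
```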

## 🗺️ Step 3: Loading Your First Spatial Dataset

Now let's load a spatial dataset and see what happens:

In [None]:
# Build a small in-memory dataset so the notebook runs
# even when no spatial files are available on disk
from shapely.geometry import Point

# Create a sample GeoDataFrame for demonstration
sample_data = {
    'city_name': ['San Francisco', 'Los Angeles', 'San Diego'],
    'population': [883305, 3979576, 1423851],
    'geometry': [
        Point(-122.4194, 37.7749),
        Point(-118.2437, 34.0522),
        Point(-117.1611, 32.7157)
    ]
}

gdf = gpd.GeoDataFrame(sample_data, crs='EPSG:4326')
print("✅ Sample spatial dataset created for demonstration")
print(f"📊 Dataset type: {type(gdf)}")
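Point data often arrives as a plain table with longitude and latitude columns rather than as ready-made geometries. A minimal sketch of that route, assuming hypothetical column names `longitude` and `latitude`, uses `gpd.points_from_xy()` to build the geometry column:

```python
import geopandas as gpd
import pandas as pd

# Plain attribute table with coordinate columns (illustrative values).
cities = pd.DataFrame({
    "city_name": ["San Francisco", "Los Angeles", "San Diego"],
    "longitude": [-122.4194, -118.2437, -117.1611],
    "latitude": [37.7749, 34.0522, 32.7157],
})

# points_from_xy takes x (longitude) first, then y (latitude).
cities_gdf = gpd.GeoDataFrame(
    cities,
    geometry=gpd.points_from_xy(cities["longitude"], cities["latitude"]),
    crs="EPSG:4326",
)

print(cities_gdf.geom_type.unique())
print(cities_gdf.crs)
```

Swapping the x and y arguments is a common mistake; it produces valid points in the wrong place rather than an error.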

## 🔍 Step 4: Exploring Basic Spatial Properties

When you first load spatial data, you want to understand:
- How many features does it have?
- What coordinate reference system (CRS) is it using?
- What geometry types are present?
- What is the spatial extent (bounds)?

In [None]:
# Show basic information about the spatial dataset
print("📊 BASIC SPATIAL INFORMATION")
print("=" * 40)

# Shape (number of features and attributes)
print(f"📏 Shape: {gdf.shape} - {gdf.shape[0]} features and {gdf.shape[1]} columns")

# Coordinate Reference System (CRS)
print(f"🌍 CRS: {gdf.crs}")
if gdf.crs is None:
    print("   ⚠️  Warning: No CRS defined!")
elif gdf.crs.is_geographic:
    print("   📍 Geographic coordinates (latitude/longitude in degrees)")
else:
    print("   📐 Projected coordinates (units depend on the CRS, often meters)")

# Geometry types
geometry_types = gdf.geom_type.value_counts()
print(f"\n🔺 Geometry types:")
for geom_type, count in geometry_types.items():
    print(f"   {geom_type}: {count} features")

# Spatial bounds (extent)
bounds = gdf.total_bounds
print(f"\n📐 Spatial bounds:")
print(f"   West:  {bounds[0]:.6f}° (longitude)")
print(f"   South: {bounds[1]:.6f}° (latitude)")
print(f"   East:  {bounds[2]:.6f}° (longitude)")
print(f"   North: {bounds[3]:.6f}° (latitude)")
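A CRS can be classified from the properties of the CRS object itself rather than from its string form. The sketch below assumes the `crs` attribute is a pyproj `CRS` object, which exposes `is_geographic` and `is_projected`. The `describe_crs` helper is a hypothetical name introduced for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point


def describe_crs(gdf):
    """Return 'missing', 'geographic', 'projected', or 'other' for a GeoDataFrame's CRS."""
    if gdf.crs is None:
        return "missing"
    if gdf.crs.is_geographic:
        return "geographic"
    if gdf.crs.is_projected:
        return "projected"
    return "other"


wgs84 = gpd.GeoDataFrame(geometry=[Point(-122.4194, 37.7749)], crs="EPSG:4326")
nad83 = gpd.GeoDataFrame(geometry=[Point(-122.4194, 37.7749)], crs="EPSG:4269")
no_crs = gpd.GeoDataFrame(geometry=[Point(-122.4194, 37.7749)])

print(describe_crs(wgs84))                     # WGS84 latitude/longitude
print(describe_crs(nad83))                     # NAD83, also latitude/longitude
print(describe_crs(wgs84.to_crs(epsg=3857)))   # reprojected to Web Mercator
print(describe_crs(no_crs))                    # no CRS assigned
```

Checking the property rather than a specific EPSG code means other degree-based systems such as NAD83 are not mistaken for projected data.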

## 👀 Step 5: Looking at the Actual Spatial Data

Numbers are helpful, but you need to see the actual features and attributes:

In [None]:
# Show column names and data types
print("📋 COLUMNS AND DATA TYPES")
print("=" * 40)

for i, (col, dtype) in enumerate(gdf.dtypes.items()):
    if col == 'geometry':
        print(f"   {i+1}. {col}: {dtype} (spatial geometry column) 🗺️")
    else:
        print(f"   {i+1}. {col}: {dtype}")

print(f"\n👀 FIRST 3 FEATURES")
print("=" * 40)

# Display first few rows
display(gdf.head(3))
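The geometry column is not always named `geometry`; some datasets use names such as `geom`. A minimal sketch of finding the active geometry column by asking the GeoDataFrame, rather than hard-coding the name, is shown below. The `places` dataset is illustrative, and the rename simulates a file with a differently named column.

```python
import geopandas as gpd
from shapely.geometry import Point

places = gpd.GeoDataFrame(
    {"city_name": ["San Francisco"], "population": [883305]},
    geometry=[Point(-122.4194, 37.7749)],
    crs="EPSG:4326",
)

# Simulate a dataset whose geometry column has a different name.
places = places.rename_geometry("geom")

# The active geometry column is available regardless of its name.
geometry_column = places.geometry.name
attribute_columns = [col for col in places.columns if col != geometry_column]

print(f"Geometry column: {geometry_column}")
print(f"Attribute columns: {attribute_columns}")
```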

## 🧐 Step 6: Spatial Data Quality Check

Real-world spatial data often has problems. Let's check for common issues:

In [None]:
print("🔍 SPATIAL DATA QUALITY CHECK")
print("=" * 40)

# Check for missing values in attributes
print("1. Missing attribute values:")
missing_attrs = gdf.isnull().sum()
if missing_attrs.sum() > 0:
    print("   ⚠️  Found missing values:")
    for col, count in missing_attrs[missing_attrs > 0].items():
        print(f"      {col}: {count} missing values")
else:
    print("   ✅ No missing attribute values")

# Check for empty geometries
print("\n2. Empty geometries:")
empty_geoms = gdf.geometry.is_empty.sum()
if empty_geoms > 0:
    print(f"   ⚠️  Found {empty_geoms} empty geometries")
else:
    print("   ✅ No empty geometries")

# Check for invalid geometries
print("\n3. Invalid geometries:")
invalid_geoms = (~gdf.geometry.is_valid).sum()
if invalid_geoms > 0:
    print(f"   ⚠️  Found {invalid_geoms} invalid geometries")
else:
    print("   ✅ All geometries are valid")

# Check CRS
print("\n4. Coordinate Reference System:")
if gdf.crs is None:
    print("   ❌ No CRS defined - this will cause problems!")
else:
    print(f"   ✅ CRS is defined: {gdf.crs}")
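The sample dataset is clean, so the checks above never fire. The sketch below builds a deliberately messy dataset (illustrative values only) with one self-intersecting polygon, one empty geometry, and one missing geometry, then counts each problem. It filters out missing geometries before counting invalid ones, on the assumption that only geometries that actually exist should be reported as invalid.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon
from shapely.validation import explain_validity

# A "bowtie" polygon: its edges cross, so it is not a valid polygon.
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])

messy = gpd.GeoDataFrame(
    {"name": ["ok", "bowtie", "empty", "missing"]},
    geometry=[Point(1, 1), bowtie, Point(), None],
    crs="EPSG:4326",
)

geoms = messy.geometry
report = {
    "missing": int(geoms.isna().sum()),
    "empty": int(geoms.is_empty.sum()),
    # Only count geometries that exist and fail validation.
    "invalid": int((geoms.notna() & ~geoms.is_valid).sum()),
}

print(report)
print(explain_validity(bowtie))  # human-readable reason for the failure
```

Keeping the three counts separate makes it clearer which repair applies: dropping rows, filling in geometries, or fixing shapes.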

## 🗺️ Step 7: Quick Visualization

A picture is worth a thousand words! Let's create a simple map:

In [None]:
# Create a simple map of the spatial data
print("🗺️ CREATING SIMPLE MAP")
print("=" * 40)

fig, ax = plt.subplots(1, 1, figsize=(10, 8))

# Create map based on geometry type
main_geom_type = gdf.geom_type.value_counts().index[0]

if main_geom_type == 'Point':
    gdf.plot(ax=ax, color='red', markersize=100, alpha=0.7)
    map_type = "Point locations"
else:
    gdf.plot(ax=ax, color='lightblue', edgecolor='black', alpha=0.7)
    map_type = "Spatial features"

# Improve map appearance
ax.set_title(f'Spatial Data Overview\n{len(gdf)} {map_type}', 
            fontsize=14, fontweight='bold')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"✅ Map created successfully showing {len(gdf)} features")
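A single-color map shows where features are but not how they differ. As a hedged sketch, the `column` argument of `GeoDataFrame.plot()` colors features by an attribute, and `ax.annotate()` adds labels. The dataset is recreated here so the block runs on its own; the colormap and marker size are arbitrary choices.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

cities = gpd.GeoDataFrame(
    {
        "city_name": ["San Francisco", "Los Angeles", "San Diego"],
        "population": [883305, 3979576, 1423851],
    },
    geometry=[
        Point(-122.4194, 37.7749),
        Point(-118.2437, 34.0522),
        Point(-117.1611, 32.7157),
    ],
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(8, 6))

# Color each point by population and add a colorbar as the legend.
cities.plot(ax=ax, column="population", cmap="viridis", markersize=120, legend=True)

# Label each point with its city name, offset slightly from the marker.
for name, point in zip(cities["city_name"], cities.geometry):
    ax.annotate(name, xy=(point.x, point.y), xytext=(5, 5), textcoords="offset points")

ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Cities colored by population")
plt.show()
```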

## 🏗️ Step 8: Building the Complete Function

Now let's put everything together into a reusable function. This is what you'll implement in `src/spatial_analysis.py`:

In [None]:
from typing import Optional


def load_and_explore_spatial_data(file_path: str) -> Optional[gpd.GeoDataFrame]:
    """
    Load a spatial dataset and display comprehensive information about it.
    
    Args:
        file_path (str): Path to spatial data file
        
    Returns:
        gpd.GeoDataFrame: The loaded spatial dataset, or None if loading failed
    """
    
    print("=" * 60)
    print("LOADING AND EXPLORING SPATIAL DATA")
    print("=" * 60)
    
    # Step 1: Check if file exists
    if not Path(file_path).exists():
        print(f"❌ ERROR: File not found: {file_path}")
        return None
    
    print(f"📁 Loading spatial data from: {file_path}")
    
    # Step 2: Load the spatial file
    try:
        gdf = gpd.read_file(file_path)
        print("✅ Spatial file loaded successfully!")
    except Exception as e:
        print(f"❌ ERROR loading spatial file: {e}")
        return None
    
    # Step 3: Display basic spatial information
    print("\n📊 SPATIAL DATASET OVERVIEW")
    print(f"Shape: {gdf.shape} - {gdf.shape[0]} features and {gdf.shape[1]} columns")
    print(f"CRS: {gdf.crs}")
    
    if gdf.crs is None:
        print("   ⚠️  Warning: No CRS defined!")
    elif gdf.crs.is_geographic:
        print("   📍 Geographic coordinates (degrees)")
    else:
        print("   📐 Projected coordinates")
    
    # Geometry types
    geometry_counts = gdf.geom_type.value_counts()
    print(f"\n🔺 Geometry types:")
    for geom_type, count in geometry_counts.items():
        print(f"   {geom_type}: {count} features")
    
    # Spatial bounds
    bounds = gdf.total_bounds
    print(f"\n📐 Spatial bounds:")
    print(f"   Min X: {bounds[0]:.6f}, Min Y: {bounds[1]:.6f}")
    print(f"   Max X: {bounds[2]:.6f}, Max Y: {bounds[3]:.6f}")
    
    # Step 4: Show first few features
    print(f"\n👀 FIRST 3 FEATURES:")
    print(gdf.head(3))
    
    # Step 5: Check data quality
    print(f"\n🔍 DATA QUALITY CHECK:")
    
    # Missing values
    missing = gdf.isnull().sum()
    if missing.sum() > 0:
        print("Missing values found:")
        print(missing[missing > 0])
    else:
        print("✅ No missing values")
    
    # Empty geometries
    empty_geoms = gdf.geometry.is_empty.sum()
    if empty_geoms > 0:
        print(f"⚠️  Found {empty_geoms} empty geometries")
    else:
        print("✅ No empty geometries")
    
    # Invalid geometries
    invalid_geoms = (~gdf.geometry.is_valid).sum()
    if invalid_geoms > 0:
        print(f"⚠️  Found {invalid_geoms} invalid geometries")
    else:
        print("✅ All geometries are valid")
    
    print(f"\n🎉 Spatial data exploration complete!")
    return gdf

## ✨ Step 9: Test Your Function

Let's test our complete function:

In [None]:
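# Save the sample data to disk so we can test file loading
test_file = '../data/test_cities.geojson'
Path(test_file).parent.mkdir(parents=True, exist_ok=True)  # create ../data if needed
gdf.to_file(test_file, driver='GeoJSON')  # set the driver explicitly
print(f"💾 Saved test file: {test_file}")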

print("\n🧪 TESTING COMPLETE FUNCTION\n")
result = load_and_explore_spatial_data(test_file)

In [None]:
# Test error handling with non-existent file
print("\n🧪 TESTING ERROR HANDLING\n")
result = load_and_explore_spatial_data('nonexistent_file.geojson')

## 🎯 Your Assignment Task

Now that you understand how this function works:

1. **Go to `src/spatial_analysis.py`**
2. **Find the `load_and_explore_spatial_data()` function**
3. **Replace the TODO comments with your implementation**
4. **Test your function with pytest**:

```bash
# Test just this function
uv run pytest tests/test_spatial_analysis.py::test_load_and_explore_spatial_data -v

# Test all functions
uv run pytest tests/ -v
```

## 🔑 Key Learning Points

- **`gpd.read_file()`** loads spatial files (shapefile, GeoJSON, etc.) into GeoDataFrames
- **`.crs`** shows the coordinate reference system - crucial for spatial analysis!
- **`.total_bounds`** gives you the spatial extent (bounding box)
- **`.geom_type`** shows what geometry types are in your data
- **Always check for data quality issues** - empty geometries, missing CRS, invalid data
- **Error handling makes your spatial code robust and user-friendly**
- **Geographic vs. projected coordinates** - geographic CRSs use degrees, while projected CRSs use linear units such as meters, which distance and area calculations need
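To make the geographic versus projected distinction concrete, the sketch below measures the straight-line distance between two cities twice: once in EPSG:4326, where the result is in degrees, and once after reprojecting to EPSG:3310 (NAD83 / California Albers), where it is in meters. The choice of EPSG:3310 is an assumption that suits these California points; other regions need a different projected CRS.

```python
import geopandas as gpd
from shapely.geometry import Point

cities = gpd.GeoSeries(
    [Point(-122.4194, 37.7749), Point(-118.2437, 34.0522)],
    index=["San Francisco", "Los Angeles"],
    crs="EPSG:4326",
)

# In a geographic CRS the planar distance comes out in degrees,
# which is not a usable real-world distance.
distance_degrees = cities.iloc[0].distance(cities.iloc[1])

# Reproject to a CRS with meter units before measuring.
projected = cities.to_crs(epsg=3310)
distance_meters = projected.iloc[0].distance(projected.iloc[1])

print(f"Distance in EPSG:4326: {distance_degrees:.3f} degrees")
print(f"Distance in EPSG:3310: {distance_meters / 1000:.1f} km")
```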

## 🚀 Next Steps

Once this function works and passes the tests, move on to:
- **Function 2**: `calculate_basic_spatial_metrics()` - Learn to calculate areas, perimeters, and centroids
- **Function 3**: `create_spatial_buffer_analysis()` - Learn to create buffer zones and analyze proximity

**Good luck! 🍀**