# 📁 Function 1: Load Spatial Datasets

## Building the `load_spatial_dataset` Function

**Learning Objectives:**
- Understand different spatial data formats and their characteristics
- Learn to load spatial data using GeoPandas
- Handle common file loading issues and encoding problems
- Validate spatial data after loading
- Work with various coordinate reference systems
- Implement robust error handling for spatial data operations

**Professional Context:**
Loading spatial data is the foundation of all GIS workflows. Professionals need to:
- Handle diverse data formats (Shapefile, GeoJSON, GeoPackage, etc.)
- Deal with encoding issues and corrupted files
- Validate data integrity and spatial properties
- Ensure coordinate reference systems are properly handled
- Provide meaningful error messages for debugging

## 🎯 Function Overview

**Function Signature:**
```python
def load_spatial_dataset(file_path, expected_crs=None, encoding='utf-8'):
    """
    Load a spatial dataset from various formats with robust error handling.
    
    Parameters:
    -----------
    file_path : str
        Path to the spatial data file
    expected_crs : str or int, optional
        Expected coordinate reference system (EPSG code or proj string)
    encoding : str, default 'utf-8'
        Text encoding for the file
    
    Returns:
    --------
    gpd.GeoDataFrame
        Loaded spatial dataset
    
    Raises:
    -------
    FileNotFoundError
        If the file doesn't exist
    ValueError
        If the file format is not supported or data is invalid
    """
```

## 📚 Spatial Data Formats

### Common Spatial Vector Formats

| Format | Extension | Description | Pros | Cons |
|--------|-----------|-------------|------|------|
| **Shapefile** | `.shp` | ESRI format, multiple files | Universal support | Multiple sidecar files, field names limited to 10 characters, 2 GB size limit |
| **GeoJSON** | `.geojson` | JSON-based format | Human readable, web-friendly | Verbose text, so files grow large |
| **GeoPackage** | `.gpkg` | SQLite-based format | Single file, supports raster+vector | Newer format, less supported by older tools |
| **KML/KMZ** | `.kml`, `.kmz` | Google Earth format | 3D support, styling | Limited attribute support |

### GeoPandas Reading Capabilities
```python
import geopandas as gpd

# GeoPandas can read many formats automatically:
gdf = gpd.read_file('data.shp')        # Shapefile
gdf = gpd.read_file('data.geojson')    # GeoJSON
gdf = gpd.read_file('data.gpkg')       # GeoPackage
gdf = gpd.read_file('data.kml')        # KML (needs a GDAL build with a KML driver)
```

## 🔧 Implementation Strategy

### Step 1: File Validation
```python
import os
from pathlib import Path

# Check if file exists
if not os.path.exists(file_path):
    raise FileNotFoundError(f"Spatial data file not found: {file_path}")

# Get file extension
file_extension = Path(file_path).suffix.lower()
```
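Step 1 extracts the extension but does not yet use it. One way to act on it is to reject unknown formats early, which matches the `ValueError` promised in the docstring. The following is a minimal sketch, not a definitive implementation: the helper name `detect_format` and the `SUPPORTED_EXTENSIONS` mapping are hypothetical, and the mapping only covers the formats from the table above that map to a single file.

```python
from pathlib import Path

# Hypothetical mapping for illustration; extend it to match your project.
SUPPORTED_EXTENSIONS = {
    ".shp": "Shapefile",
    ".geojson": "GeoJSON",
    ".gpkg": "GeoPackage",
    ".kml": "KML",
}


def detect_format(file_path):
    """Return a format name for file_path, or raise ValueError if unknown."""
    # Lower-case the suffix so 'ROADS.SHP' and 'roads.shp' behave the same
    extension = Path(file_path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        supported = ", ".join(sorted(SUPPORTED_EXTENSIONS))
        raise ValueError(
            f"Unsupported file format '{extension}'. Supported: {supported}"
        )
    return SUPPORTED_EXTENSIONS[extension]


print(detect_format("data/Roads.SHP"))
```

An allow-list like this gives a clearer message than whatever the underlying reader would raise, at the cost of rejecting formats GeoPandas could actually read.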

### Step 2: Data Loading with Error Handling
```python
try:
    gdf = gpd.read_file(file_path, encoding=encoding)
except UnicodeDecodeError:
    # Try other common encodings. 'iso-8859-1' is an alias of 'latin1', so it
    # is listed once, and last, because latin1 accepts every byte value.
    for enc in ['cp1252', 'latin1']:
        try:
            gdf = gpd.read_file(file_path, encoding=enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("Could not decode file with any common encoding")
except Exception as e:
    # Chain the original exception so the traceback stays useful
    raise ValueError(f"Error loading spatial data: {e}") from e
```

## 💻 Hands-On Examples

```python
import geopandas as gpd
import pandas as pd
import os
from pathlib import Path

# Example 1: Loading sample data
def example_load_data():
    """Demonstrate loading spatial data"""
    try:
        # Use built-in sample data.
        # Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0.
        # On newer versions, pass read_file a path to a local copy of the data.
        world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
        print("✓ Successfully loaded world data")
        print(f"  Shape: {world.shape}")
        print(f"  Columns: {list(world.columns)}")
        print(f"  CRS: {world.crs}")
        return world
    except Exception as e:
        print(f"✗ Error loading data: {e}")
        return None

# Run example
sample_data = example_load_data()
```

```python
# Example 2: Robust loading function
def robust_load_example(file_path, encoding='utf-8'):
    """Example implementation with error handling"""
    # File validation
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")

    # Attempt to load data
    try:
        gdf = gpd.read_file(file_path, encoding=encoding)
        print(f"✓ Loaded with {encoding} encoding")
    except UnicodeDecodeError:
        print(f"✗ Failed with {encoding}, trying alternatives...")
        # Try alternative encodings. 'iso-8859-1' is an alias of 'latin1', so it
        # is listed once, and last, because latin1 accepts every byte value.
        for alt_encoding in ['cp1252', 'latin1']:
            try:
                gdf = gpd.read_file(file_path, encoding=alt_encoding)
                print(f"✓ Success with {alt_encoding} encoding")
                break
            except UnicodeDecodeError:
                continue
        else:
            raise ValueError("Could not decode file with any encoding")
    except Exception as e:
        # Chain the original exception so the traceback stays useful
        raise ValueError(f"Error loading spatial data: {e}") from e

    # Validate loaded data
    if gdf.empty:
        raise ValueError("Dataset is empty")

    if gdf.geometry.isna().all():
        raise ValueError("No valid geometries found")

    # Report success
    print("✓ Dataset validation passed:")
    print(f"  Features: {len(gdf)}")
    print(f"  Columns: {len(gdf.columns)}")
    print(f"  CRS: {gdf.crs or 'Not defined'}")

    return gdf

# Test the function
# Note: gpd.datasets was removed in GeoPandas 1.0; use a local file path there.
try:
    test_path = gpd.datasets.get_path('naturalearth_lowres')
    result = robust_load_example(test_path)
    print("\n✓ Function test successful!")
except Exception as e:
    print(f"\n✗ Function test failed: {e}")
```

## 🔍 Common Issues and Solutions

### Issue 1: Encoding Problems
**Problem:** `UnicodeDecodeError` when loading files with special characters

**Solution:** Try multiple encodings (utf-8, latin1, cp1252)
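The fallback idea can be illustrated on raw bytes, independent of GeoPandas. This is a minimal sketch under stated assumptions: `decode_with_fallback` is a hypothetical helper, not a GeoPandas function, and the candidate order is one reasonable choice. It puts `latin1` last because that encoding maps every byte to a character and so never fails, which would hide a better match if it were tried first.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin1")):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one
            continue
    raise ValueError("Could not decode bytes with any candidate encoding")


# 'é' stored as cp1252 is the single byte 0xE9, which is not valid UTF-8,
# so the helper falls through to the second candidate.
text, used = decode_with_fallback("café".encode("cp1252"))
print(text, used)
```

A successful decode only means the bytes are legal in that encoding, not that the text is correct, so it is worth inspecting a few attribute values after a fallback load.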

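### Issue 2: Missing Shapefile Components
**Problem:** Shapefile missing required components (.shx, .dbf)

**Solution:** Before loading, check that the `.shx` and `.dbf` files sit next to the `.shp`. A missing `.prj` does not block loading, but it leaves the CRS undefined.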
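The component check can be done with the standard library before GeoPandas is involved at all. A minimal sketch, assuming a hypothetical helper named `missing_shapefile_components` and treating `.shx` and `.dbf` as the required sidecar files named above:

```python
from pathlib import Path

# Sidecar files that must accompany the .shp file
REQUIRED_SIDECARS = (".shx", ".dbf")


def missing_shapefile_components(shp_path):
    """Return the required sidecar extensions that are absent next to shp_path."""
    shp = Path(shp_path)
    missing = []
    for ext in REQUIRED_SIDECARS:
        # Accept lower- and upper-case extensions (e.g. roads.SHX)
        candidates = (shp.with_suffix(ext), shp.with_suffix(ext.upper()))
        if not any(candidate.exists() for candidate in candidates):
            missing.append(ext)
    return missing


# Example usage with a hypothetical path
missing = missing_shapefile_components("data/roads.shp")
if missing:
    print(f"Shapefile is missing components: {', '.join(missing)}")
```

Returning the list rather than raising lets the caller decide whether to raise a `ValueError` or only warn.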

### Issue 3: CRS Issues
**Problem:** Missing or incorrect coordinate reference system

**Solution:** Check `gdf.crs` after loading, and warn when it is `None` or differs from `expected_crs`
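One possible shape for the CRS check is sketched below. It is an illustration under assumptions, not the required solution: `check_crs` is a hypothetical helper, and warning instead of raising is a design choice you may want to reverse. It relies on `pyproj.CRS.from_user_input`, which accepts both EPSG integers and strings, so the `expected_crs` parameter can take either form.

```python
import warnings

import geopandas as gpd
from pyproj import CRS
from shapely.geometry import Point


def check_crs(gdf, expected_crs=None):
    """Return True if gdf has a CRS that matches expected_crs (when given)."""
    if gdf.crs is None:
        warnings.warn("Dataset has no CRS defined", UserWarning)
        return False
    if expected_crs is not None:
        # Normalise 4326, "EPSG:4326", or a proj string to one comparable object
        expected = CRS.from_user_input(expected_crs)
        if gdf.crs != expected:
            warnings.warn(
                f"CRS mismatch: found {gdf.crs.to_string()}, "
                f"expected {expected.to_string()}",
                UserWarning,
            )
            return False
    return True


# Small in-memory dataset so the example does not depend on any file
points = gpd.GeoDataFrame({"name": ["origin"]}, geometry=[Point(0, 0)], crs="EPSG:4326")
print(check_crs(points, expected_crs=4326))
```

Comparing `CRS` objects instead of raw strings avoids false mismatches between equivalent spellings such as `4326` and `"EPSG:4326"`.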

## 🎯 Your Implementation Task

Now implement the `load_spatial_dataset` function in `src/geopandas_fundamentals.py`.

### Requirements Checklist:
- [ ] File existence validation
- [ ] Robust data loading with error handling
- [ ] Multiple encoding support
- [ ] Data validation (empty check, geometry check)
- [ ] CRS validation (optional parameter)
- [ ] Informative error messages
- [ ] Return loaded GeoDataFrame

```python
# Test your implementation
import sys
import os
import geopandas as gpd

sys.path.append('../src')

try:
    from geopandas_fundamentals import load_spatial_dataset

    # Test with sample data
    # Note: gpd.datasets was removed in GeoPandas 1.0; use a local file path there.
    test_file = gpd.datasets.get_path('naturalearth_lowres')

    print("Testing load_spatial_dataset function...")
    result = load_spatial_dataset(test_file)

    if isinstance(result, gpd.GeoDataFrame) and not result.empty:
        print("✓ Test passed! Function works correctly.")
        print(f"  Loaded {len(result)} features")
        print(f"  Columns: {list(result.columns)}")
        print(f"  CRS: {result.crs}")
    else:
        print("✗ Test failed! Function did not return expected result.")

except ImportError:
    print("Function not implemented yet. Complete the implementation in src/geopandas_fundamentals.py")
except Exception as e:
    print(f"✗ Test failed with error: {e}")
```

## 🧪 Testing Your Function

Once implemented, test your function thoroughly:

```bash
# Run the specific test for this function
cd /workspaces/your-repo
python -m pytest tests/test_geopandas_fundamentals.py::test_load_spatial_dataset -v
```

### Test Cases Your Function Should Handle:
1. ✅ **Valid file loading** - Successfully load common spatial formats
2. ✅ **File not found** - Raise appropriate error for missing files
3. ✅ **Encoding issues** - Handle different text encodings gracefully
4. ✅ **Empty datasets** - Detect and handle empty spatial data
5. ✅ **Invalid geometries** - Identify datasets with no valid geometry
6. ✅ **CRS validation** - Check coordinate reference systems when specified

## 🚀 Next Steps

After successfully implementing and testing this function:

1. **Move to Function 2:** `02_function_explore_spatial_properties.ipynb`
2. **Build on this foundation:** The spatial data you load here will be used in subsequent functions
3. **Professional development:** Consider how this function could be extended for production use

---

**🎯 Goal:** Master the fundamentals of spatial data loading - the first step in any GIS workflow!

**Next:** Once your tests pass, continue to `02_function_explore_spatial_properties.ipynb`