# Living Data Science: Geospatial AI in the Black Hills

Welcome to Day Three! Today you'll discover how AI and data science help us understand and protect the places we love. We'll work with real geographic data from the Black Hills region and build tools that connect technology with environmental stewardship.

**Learning Objectives:**
- Understand what makes data "geospatial" and why location matters
- Work with real environmental and tourism data from the Black Hills
- Visualize geographic patterns and tell stories with data
- Build AI models that help with conservation and planning
- Connect technology to cultural values of land stewardship

**Time**: Two 90-minute sessions

**Sessions**: 3A (Introduction to Geospatial Data) + 3B (AI for Geographic Problems)

* * * * *

## Session 3A: Introduction to Geospatial Data

### What Makes Data "Geospatial"?

Geospatial data is any information that includes a location component. Every piece of geospatial data answers the question: **"Where?"**

Examples of geospatial data:
- 📍 Where was this photo taken?
- 🌡️ Where was this temperature measured?
- 🦌 Where was this animal spotted?
- 🔥 Where did this wildfire occur?
- 🏔️ Where is this hiking trail?

**Why does location matter?**
- Environmental patterns change with geography
- Cultural and ecological knowledge is place-based
- Conservation decisions depend on spatial context
- Climate and weather vary by location
- Human activities have geographic patterns

In [None]:
# Let's start by importing the tools we'll need
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

print("🌍 Welcome to Geospatial Data Science!")
print("📊 Ready to explore the Black Hills through data...")

### Understanding Coordinates: The Language of Location

Every location on Earth can be described using two numbers:
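- **Latitude**: How far north or south of the equator, from -90° (South Pole) through 0° (equator) to +90° (North Pole)
- **Longitude**: How far east or west of the Prime Meridian, from -180° to +180°; negative values are west

For the Black Hills region:
- Latitude: approximately 43° to 45° North (43 to 45 in decimal degrees)
- Longitude: approximately 103° to 104° West (-103 to -104 in decimal degrees)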
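
The sign convention is easy to trip over, so here is a minimal sketch of how signed decimal degrees map to hemisphere letters, plus a rough check for whether a point falls inside the region. The helper names `format_coordinate` and `in_black_hills` and the bounding box are illustrative assumptions based on the approximate ranges above, not an official boundary.

```python
def format_coordinate(lat, lon):
    """Render signed decimal degrees with hemisphere letters."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{abs(lat):.4f}°{ns}, {abs(lon):.4f}°{ew}"

# Approximate bounding box (assumption: taken from the ranges quoted above)
BLACK_HILLS_BOUNDS = {
    "lat_min": 43.0, "lat_max": 45.0,
    "lon_min": -104.0, "lon_max": -103.0,
}

def in_black_hills(lat, lon, bounds=BLACK_HILLS_BOUNDS):
    """Return True if the point lies inside the rectangular bounding box."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])

print(format_coordinate(43.8791, -103.4591))  # Mount Rushmore
print(in_black_hills(43.8791, -103.4591))     # inside the box
print(in_black_hills(44.98, -93.27))          # Minneapolis: similar latitude, far to the east
```

Notice that the longitude is stored as a negative number but displayed as a positive number with a "W". Mixing the two styles (for example, "-103°W") is a common source of mapping mistakes.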

In [None]:
# Famous Black Hills locations with their coordinates
black_hills_landmarks = {
    "Mount Rushmore": {"lat": 43.8791, "lon": -103.4591, "type": "Monument"},
    "Crazy Horse Memorial": {"lat": 43.8367, "lon": -103.6244, "type": "Monument"},
    "Black Elk Peak (formerly Harney Peak)": {"lat": 43.8658, "lon": -103.5324, "type": "Summit"},
    "Deadwood": {"lat": 44.3766, "lon": -103.7291, "type": "Historic Town"},
    "Custer State Park": {"lat": 43.7311, "lon": -103.4164, "type": "State Park"},
    "Spearfish Canyon": {"lat": 44.4873, "lon": -103.7782, "type": "Canyon"},
    "Wind Cave National Park": {"lat": 43.5578, "lon": -103.4067, "type": "National Park"},
    "Jewel Cave National Monument": {"lat": 43.7288, "lon": -103.8284, "type": "National Monument"}
}

# Convert to a DataFrame for easier analysis
landmarks_df = pd.DataFrame.from_dict(black_hills_landmarks, orient='index')
landmarks_df.index.name = 'Location'
landmarks_df = landmarks_df.reset_index()

print("🏔️ BLACK HILLS LANDMARKS")
print("=" * 50)
print(landmarks_df)

# Calculate the center point of our region
center_lat = landmarks_df['lat'].mean()
center_lon = landmarks_df['lon'].mean()
# Longitude is negative (west), so print its absolute value next to the "W"
print(f"\n📍 Geographic center of our data: {center_lat:.4f}°N, {abs(center_lon):.4f}°W")
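
Once you have coordinates, a natural next question is "how far apart are these places?" The sketch below uses the haversine formula, a standard way to estimate straight-line distance on a sphere. The function name `haversine_km` is our own, and treating Earth as a perfect sphere with a 6,371 km radius is a simplifying assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Coordinates copied from the landmarks dictionary above
rushmore = (43.8791, -103.4591)
deadwood = (44.3766, -103.7291)

distance = haversine_km(*rushmore, *deadwood)
print(f"Mount Rushmore to Deadwood: about {distance:.0f} km in a straight line")
```

This is distance "as the crow flies." Road or trail distance will be longer because paths follow the terrain.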

### Your First Geospatial Visualization

Let's create our first map showing Black Hills landmarks:

In [None]:
# Create a simple map of our landmarks
plt.figure(figsize=(12, 8))

# Plot different types of locations with different colors
location_colors = {
    'Monument': 'red',
    'Summit': 'brown', 
    'Historic Town': 'blue',
    'State Park': 'green',
    'Canyon': 'orange',
    'National Park': 'purple',
    'National Monument': 'pink'
}

# Plot each location type
for location_type in landmarks_df['type'].unique():
    subset = landmarks_df[landmarks_df['type'] == location_type]
    plt.scatter(subset['lon'], subset['lat'], 
               c=location_colors[location_type], 
               label=location_type, 
               s=100, alpha=0.7)

# Add labels for each point
for idx, row in landmarks_df.iterrows():
    plt.annotate(row['Location'], 
                (row['lon'], row['lat']), 
                xytext=(5, 5), textcoords='offset points',
                fontsize=8, alpha=0.8)

# Customize the map
plt.xlabel('Longitude (degrees, negative = West)', fontsize=12)
plt.ylabel('Latitude (°N)', fontsize=12)
# Plain-text title: matplotlib's default font has no emoji glyphs
plt.title('Black Hills Landmarks', fontsize=16, fontweight='bold')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)

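# Make the map look more geographic: at this latitude a degree of longitude
# covers less ground than a degree of latitude, so stretch the y-axis to match
plt.gca().set_aspect(1 / np.cos(np.radians(landmarks_df['lat'].mean())), adjustable='box')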
plt.tight_layout()
plt.show()

print("🎯 What patterns do you notice in the data?")
print("💭 How are different types of locations distributed?")
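
One thing to keep in mind when plotting latitude and longitude directly: a degree of longitude covers less ground the farther you are from the equator. The sketch below estimates that shrinkage and the axis aspect ratio that compensates for it. The helper names and the rounded 111.32 km per degree of latitude are assumptions for illustration.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def km_per_degree_lon(lat_deg):
    """Approximate ground distance covered by one degree of longitude."""
    return KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))

def map_aspect(lat_deg):
    """Aspect ratio (y-unit / x-unit) that keeps a lat/lon plot roughly to scale."""
    return 1.0 / math.cos(math.radians(lat_deg))

for lat in (0, 44, 60):
    print(f"At {lat}° latitude: 1° of longitude is about "
          f"{km_per_degree_lon(lat):.1f} km (aspect {map_aspect(lat):.2f})")
```

To try it on a plot, pass the result to matplotlib, for example `plt.gca().set_aspect(map_aspect(44.0))`. For a small region like the Black Hills this simple correction is usually close enough; larger areas call for a proper map projection.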

### Working with Environmental Data

Now let's work with simulated environmental monitoring data. The values are randomly generated, but they represent the kinds of measurements scientists collect in the Black Hills:

In [None]:
# Generate simulated (not measured) environmental monitoring data for the Black Hills
np.random.seed(42)  # For reproducible results

# Create monitoring stations throughout the Black Hills
n_stations = 25
monitoring_data = []

for i in range(n_stations):
    # Random locations within Black Hills region
    lat = np.random.uniform(43.5, 44.5)
    lon = np.random.uniform(-104.0, -103.0)
    
    # Elevation affects temperature (higher = cooler)
    elevation = np.random.uniform(3000, 7000)  # feet above sea level
    
    # Temperature drops with elevation: about 3°F per 1,000 ft in this simulation
    base_temp = 65 - (elevation - 3000) * 0.003  # simplified lapse rate
    summer_temp = base_temp + np.random.normal(0, 5)
    winter_temp = base_temp - 30 + np.random.normal(0, 8)
    
    # Precipitation rises with elevation in this simulation, plus random noise
    annual_precip = 15 + (elevation - 3000) * 0.002 + np.random.normal(0, 3)
    
    # Air quality (PM2.5 levels) - generally good in this region
    air_quality = np.random.uniform(5, 25)  # μg/m³
    
    # Wildlife sightings (simulation)
    wildlife_count = np.random.poisson(8)
    
    monitoring_data.append({
        'station_id': f'BH_{i+1:03d}',
        'latitude': lat,
        'longitude': lon,
        'elevation_ft': elevation,
        'summer_temp_f': summer_temp,
        'winter_temp_f': winter_temp,
        'annual_precip_in': max(0, annual_precip),
        'air_quality_pm25': air_quality,
        'wildlife_sightings': wildlife_count
    })

# Convert to DataFrame
env_data = pd.DataFrame(monitoring_data)

print("🌲 ENVIRONMENTAL MONITORING DATA")
print("=" * 40)
print(f"📊 Data shape: {env_data.shape[0]} monitoring stations, {env_data.shape[1]} variables")
print("\n📋 First 5 stations:")
print(env_data.head().round(2))

print("\n📈 Data Summary:")
print(env_data.describe().round(2))
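
Can we recover the elevation-temperature relationship from the data alone? Here is a standalone sketch that regenerates summer temperatures with the same formula as the simulation and fits a straight line with `np.polyfit`. It uses 500 stations instead of 25 (an assumption made so the fit is stable) and its own random seed, so the numbers will not match `env_data` exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # more stations than the notebook's 25, for a steadier estimate

# Same recipe as the simulation: 3°F cooler per 1,000 ft, plus noise
elevation_ft = rng.uniform(3000, 7000, size=n)
summer_temp_f = 65 - (elevation_ft - 3000) * 0.003 + rng.normal(0, 5, size=n)

# Fit temperature = slope * elevation + intercept
slope, intercept = np.polyfit(elevation_ft, summer_temp_f, deg=1)
r = np.corrcoef(elevation_ft, summer_temp_f)[0, 1]

print(f"Fitted change: {slope * 1000:.2f} °F per 1,000 ft")
print(f"Correlation between elevation and summer temperature: {r:.2f}")
```

The fitted slope should land near the -3°F per 1,000 ft built into the simulation. With only 25 stations the estimate can wander quite a bit, which is a useful reminder that small datasets give noisy answers.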

### Save Our Data for Session 3B

Let's save our work so we can continue building on it:

In [None]:
# Save our datasets for use in the next session
env_data.to_csv('black_hills_environmental_data.csv', index=False)
landmarks_df.to_csv('black_hills_landmarks.csv', index=False)

print("💾 Data saved successfully!")
print("\n📁 Files created:")
print("• black_hills_environmental_data.csv - Environmental monitoring data")
print("• black_hills_landmarks.csv - Black Hills landmarks and locations")

print("\n🎯 What you've learned in Session 3A:")
print("✅ What makes data 'geospatial' and why location matters")
print("✅ How to work with latitude and longitude coordinates")
print("✅ Creating your first geographic visualizations")
print("✅ Understanding environmental data patterns")
print("✅ Connecting programming to environmental science")

print("\n🔮 Coming up in Session 3B: AI for Geographic Problems")
print("• Building machine learning models with geographic data")
print("• Predicting environmental patterns")
print("• Creating tools for conservation planning")
print("• Ethics of geospatial AI")

print("\n🌟 Great work! You're thinking like a geospatial data scientist! 🗺️")
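
Before moving on, it can be worth checking that a saved file loads back the way you expect. This is a small round-trip sketch using a made-up two-station table and a temporary folder (both are assumptions for illustration, so it does not touch the files saved above).

```python
import os
import tempfile

import pandas as pd

# Tiny illustrative table (hypothetical values, not the notebook's data)
sample = pd.DataFrame({
    "station_id": ["BH_001", "BH_002"],
    "latitude": [43.87, 44.12],
    "longitude": [-103.46, -103.73],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_stations.csv")
    sample.to_csv(path, index=False)   # index=False avoids an extra unnamed column
    reloaded = pd.read_csv(path)

# Raises an error if the reloaded table differs from the original
pd.testing.assert_frame_equal(sample, reloaded, check_dtype=False)
print(reloaded.dtypes)
```

In Session 3B, the same `pd.read_csv` call pointed at `black_hills_environmental_data.csv` should bring the monitoring data back, assuming the notebook runs from the same folder.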