# Session 3B: AI for Geographic Problems

Welcome to Session 3B! Now that you understand geospatial data, let's use artificial intelligence to solve real geographic problems. We'll build machine learning models that can predict environmental patterns and help with conservation planning.

**Learning Objectives:**
- Build your first machine learning model using geographic data
- Predict environmental patterns with AI
- Create practical tools for conservation and tourism planning
- Understand the ethics of geospatial AI
- Connect AI to cultural values and environmental stewardship

**Time**: 90 minutes

* * * * *

## Introduction to AI for Environmental Science

Artificial Intelligence can help us:
- **Predict** where wildlife might be found
- **Identify** areas at risk for environmental problems
- **Plan** conservation efforts more effectively
- **Monitor** environmental changes over time
- **Optimize** visitor experiences while protecting nature

Today we'll build models that predict environmental conditions in the Black Hills using the data we collected in Session 3A.

In [None]:
# Import our tools for machine learning
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score, r2_score
import warnings
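# Hide library warnings to keep the output tidy (note: this also hides warnings that may be useful)
warnings.filterwarnings('ignore')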

print("🤖 Welcome to Geospatial AI!")
print("🌲 Ready to build environmental prediction models...")

## Loading Our Black Hills Data

Let's load the environmental data we created in Session 3A:

In [None]:
# Load our environmental monitoring data
try:
    env_data = pd.read_csv('black_hills_environmental_data.csv')
    landmarks_df = pd.read_csv('black_hills_landmarks.csv')
    print("✅ Data loaded successfully!")
except FileNotFoundError:
    print("⚠️ Data files not found. Using generated sample data instead (run Session 3A to create the real files).")
    # Create sample data for demonstration
    np.random.seed(42)
    n_stations = 25
    
    monitoring_data = []
    for i in range(n_stations):
        lat = np.random.uniform(43.5, 44.5)
        lon = np.random.uniform(-104.0, -103.0)
        elevation = np.random.uniform(3000, 7000)
        base_temp = 65 - (elevation - 3000) * 0.003
        summer_temp = base_temp + np.random.normal(0, 5)
        winter_temp = base_temp - 30 + np.random.normal(0, 8)
        annual_precip = 15 + (elevation - 3000) * 0.002 + np.random.normal(0, 3)
        air_quality = np.random.uniform(5, 25)
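        # Sightings are drawn independently of the other columns, so the sample data has no real signal to learn
        wildlife_count = np.random.poisson(8)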
        
        monitoring_data.append({
            'station_id': f'BH_{i+1:03d}',
            'latitude': lat,
            'longitude': lon,
            'elevation_ft': elevation,
            'summer_temp_f': summer_temp,
            'winter_temp_f': winter_temp,
            'annual_precip_in': max(0, annual_precip),
            'air_quality_pm25': air_quality,
            'wildlife_sightings': wildlife_count
        })
    
    env_data = pd.DataFrame(monitoring_data)
    print("✅ Sample data created for demonstration.")

print(f"\n📊 Environmental Data: {env_data.shape[0]} stations, {env_data.shape[1]} variables")
print(env_data.head())
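
Before training on a loaded table, it can help to confirm that it has the columns the later cells expect. The sketch below is one possible check, not part of the Session 3A materials: `check_station_data` and `REQUIRED_COLUMNS` are hypothetical names, and the column list is assumed from the sample-data generator above rather than from the real CSV.

```python
import pandas as pd

# Columns the later cells rely on (assumed from the sample-data generator)
REQUIRED_COLUMNS = [
    'latitude', 'longitude', 'elevation_ft', 'summer_temp_f',
    'annual_precip_in', 'air_quality_pm25', 'wildlife_sightings',
]

def check_station_data(df):
    """Return a list of readable problems found in a station table (hypothetical helper)."""
    problems = []
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"Missing columns: {missing}")
    # Count empty cells only in the required columns that are actually present
    present = [col for col in REQUIRED_COLUMNS if col in df.columns]
    n_null = int(df[present].isna().sum().sum())
    if n_null > 0:
        problems.append(f"{n_null} missing values in required columns")
    return problems

# Tiny made-up table for illustration: one column absent, one empty cell
demo = pd.DataFrame({
    'latitude': [43.88, 43.73],
    'longitude': [-103.46, -103.42],
    'elevation_ft': [5700, 4200],
    'summer_temp_f': [72.0, None],
    'annual_precip_in': [18.0, 16.0],
    'air_quality_pm25': [12.0, 10.0],
})
problems = check_station_data(demo)
for problem in problems:
    print("⚠️", problem)
```

An empty list means the table passed both checks; in the notebook you could call `check_station_data(env_data)` right after loading.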

## Project 1: Predicting Wildlife Sightings with AI

Let's build a machine learning model that predicts where wildlife is most likely to be spotted based on environmental conditions.

In [None]:
# Prepare data for machine learning
print("🦌 WILDLIFE PREDICTION PROJECT")
print("=" * 35)

# Features: environmental conditions that might affect wildlife
feature_columns = ['latitude', 'longitude', 'elevation_ft', 
                  'summer_temp_f', 'annual_precip_in', 'air_quality_pm25']

# Target: what we want to predict
target_column = 'wildlife_sightings'

# Extract features and target
X = env_data[feature_columns]
y = env_data[target_column]

print(f"Features (X): {X.shape}")
print(f"Target (y): {y.shape}")
print(f"\nWe're predicting: {target_column}")
print(f"Using these environmental factors: {feature_columns}")

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print(f"\n📚 Training data: {X_train.shape[0]} stations")
print(f"🧪 Testing data: {X_test.shape[0]} stations")

In [None]:
# Build and train our wildlife prediction model
print("🤖 Training AI model...")

# Create a Random Forest model (ensemble of decision trees)
wildlife_model = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model
wildlife_model.fit(X_train, y_train)

# Make predictions
y_pred = wildlife_model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("✅ Model trained successfully!")
print(f"\n📈 Model Performance:")
print(f"Mean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.3f} (1.0 is perfect; 0 or below is no better than always predicting the average)")

# Show feature importance
feature_importance = pd.DataFrame({
    'feature': feature_columns,
    'importance': wildlife_model.feature_importances_
}).sort_values('importance', ascending=False)

print(f"\n🎯 Most Important Factors for Wildlife Sightings:")
for idx, row in feature_importance.iterrows():
    print(f"  {row['feature']}: {row['importance']:.3f}")
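
With only a handful of test stations, a single train/test split can give a noisy score. One common way to get a steadier picture is cross-validation compared against a simple baseline. The sketch below uses made-up stand-in data (the `demo` table is an assumption, not the Session 3A data) and scikit-learn's `DummyRegressor`, which always predicts the training average.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Stand-in data for illustration: 25 stations, sightings unrelated to the features
rng = np.random.default_rng(42)
n = 25
demo = pd.DataFrame({
    'elevation_ft': rng.uniform(3000, 7000, n),
    'summer_temp_f': rng.uniform(50, 70, n),
    'wildlife_sightings': rng.poisson(8, n),
})
X_demo = demo[['elevation_ft', 'summer_temp_f']]
y_demo = demo['wildlife_sightings']

# 5 folds: each station is used for testing exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)

forest_scores = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=42),
    X_demo, y_demo, cv=cv, scoring='r2'
)
baseline_scores = cross_val_score(
    DummyRegressor(strategy='mean'),
    X_demo, y_demo, cv=cv, scoring='r2'
)

print(f"Random Forest R² per fold: {np.round(forest_scores, 2)}")
print(f"Baseline R² per fold:      {np.round(baseline_scores, 2)}")
print(f"Random Forest mean R²: {forest_scores.mean():.2f}")
print(f"Baseline mean R²:      {baseline_scores.mean():.2f}")
```

If the forest does not clearly beat the baseline, the features probably carry little information about the target, which is what we would expect when sightings are generated at random.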

In [None]:
# Visualize our model's predictions
plt.figure(figsize=(12, 5))

# Plot 1: Actual vs Predicted
plt.subplot(1, 2, 1)
plt.scatter(y_test, y_pred, alpha=0.7)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('Actual Wildlife Sightings')
plt.ylabel('Predicted Wildlife Sightings')
plt.title('🎯 Model Accuracy: Actual vs Predicted')
plt.grid(True, alpha=0.3)

# Plot 2: Feature Importance
plt.subplot(1, 2, 2)
plt.barh(feature_importance['feature'], feature_importance['importance'])
plt.xlabel('Importance Score')
plt.title('🔍 Feature Importance')
plt.tight_layout()

plt.show()

print("💡 Interpretation:")
print("• Points close to the red line = good predictions")
print("• Feature importance shows which factors matter most")
print("• Higher importance = the model relied on that factor more (this does not prove cause and effect)")
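
The `feature_importances_` scores above are computed from the training data. A complementary check is permutation importance, which shuffles one feature at a time on held-out data and measures how much the score drops. The sketch below is an illustration on made-up data: the `demo` table, the `random_noise` column, and the relationship between elevation and the target are all assumptions chosen so the effect is easy to see.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Made-up data: the target depends on elevation, while random_noise is irrelevant
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    'elevation_ft': rng.uniform(3000, 7000, n),
    'random_noise': rng.normal(0, 1, n),
})
y_demo = 0.002 * demo['elevation_ft'] + rng.normal(0, 0.5, n)

X_tr, X_te, y_tr, y_te = train_test_split(demo, y_demo, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature 20 times on the test set and record the average drop in R²
result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)

for name, mean, std in zip(demo.columns, result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

A feature whose shuffled score barely changes is one the model could do without on new data.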

## Project 2: Creating a Wildlife Hotspot Prediction Tool

Let's use our trained model to create a practical tool for predicting wildlife activity:

In [None]:
def predict_wildlife_activity(latitude, longitude, elevation, temperature, precipitation, air_quality):
    """
    Predict the expected number of wildlife sightings for a given location and conditions.

    Returns a tuple: (predicted sightings, activity category, emoji).
    """
    # Create input data in the same format as training data
    input_data = pd.DataFrame({
        'latitude': [latitude],
        'longitude': [longitude],
        'elevation_ft': [elevation],
        'summer_temp_f': [temperature],
        'annual_precip_in': [precipitation],
        'air_quality_pm25': [air_quality]
    })
    
    # Make prediction
    prediction = wildlife_model.predict(input_data)[0]
    
    # Convert the predicted count to an activity category
    if prediction >= 10:
        category = "Very High"
        emoji = "🦌🦌🦌"
    elif prediction >= 8:
        category = "High"
        emoji = "🦌🦌"
    elif prediction >= 6:
        category = "Moderate"
        emoji = "🦌"
    else:
        category = "Low"
        emoji = "👀"
    
    return prediction, category, emoji

# Test our prediction tool with different Black Hills locations
print("🔮 WILDLIFE HOTSPOT PREDICTIONS")
print("=" * 40)

test_locations = [
    {"name": "Near Mount Rushmore", "lat": 43.88, "lon": -103.46, "elev": 5700, "temp": 72, "precip": 18, "air": 12},
    {"name": "Custer State Park", "lat": 43.73, "lon": -103.42, "elev": 4200, "temp": 75, "precip": 16, "air": 10},
    {"name": "Higher Elevation Area", "lat": 43.87, "lon": -103.53, "elev": 6800, "temp": 68, "precip": 22, "air": 8},
    {"name": "Lower Valley Location", "lat": 44.10, "lon": -103.65, "elev": 3500, "temp": 78, "precip": 14, "air": 15}
]

for location in test_locations:
    prediction, category, emoji = predict_wildlife_activity(
        location["lat"], location["lon"], location["elev"], 
        location["temp"], location["precip"], location["air"]
    )
    
    print(f"\n📍 {location['name']}")
    print(f"   Coordinates: {location['lat']}, {location['lon']}")
    print(f"   Elevation: {location['elev']} ft")
    print(f"   🎯 Predicted sightings: {prediction:.1f}")
    print(f"   📊 Activity level: {category} {emoji}")

print("\n💡 This tool could help:")
print("• Wildlife photographers plan their trips")
print("• Park rangers focus monitoring efforts")
print("• Tourists increase their chances of wildlife viewing")
print("• Researchers identify high-priority study areas")
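
Predicting one location at a time works for a few sites. To look for hotspots across a whole region, one option is to predict over a grid of coordinates in a single call. The sketch below is a minimal illustration under stated assumptions: it trains a small stand-in model (`grid_model`) on made-up stations, uses only three features, and holds elevation fixed at 5000 ft across the grid, which a real map would replace with actual elevation values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Made-up stations for illustration only
rng = np.random.default_rng(0)
n = 60
stations = pd.DataFrame({
    'latitude': rng.uniform(43.5, 44.5, n),
    'longitude': rng.uniform(-104.0, -103.0, n),
    'elevation_ft': rng.uniform(3000, 7000, n),
})
stations['wildlife_sightings'] = rng.poisson(8, n)

features = ['latitude', 'longitude', 'elevation_ft']
grid_model = RandomForestRegressor(n_estimators=50, random_state=0)
grid_model.fit(stations[features], stations['wildlife_sightings'])

# Build a 20 x 30 grid of latitude/longitude points over the region
lats = np.linspace(43.5, 44.5, 20)
lons = np.linspace(-104.0, -103.0, 30)
lon_grid, lat_grid = np.meshgrid(lons, lats)  # both have shape (20, 30)

grid_points = pd.DataFrame({
    'latitude': lat_grid.ravel(),
    'longitude': lon_grid.ravel(),
    'elevation_ft': 5000.0,  # assumption: constant elevation everywhere
})

# One predict call for all 600 points, reshaped back into the grid layout
grid_pred = grid_model.predict(grid_points[features]).reshape(lat_grid.shape)

best_row, best_col = np.unravel_index(grid_pred.argmax(), grid_pred.shape)
print(f"Grid shape: {grid_pred.shape}")
print(f"Highest predicted activity near {lats[best_row]:.2f}, {lons[best_col]:.2f}")
```

The resulting array can be drawn as a heat map, for example with `plt.pcolormesh(lon_grid, lat_grid, grid_pred)`.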

## Project 3: Environmental Risk Classification

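Let's build a classification model that identifies areas at environmental risk. We don't have labeled risk data, so we first assign each station a risk label with simple threshold rules, then train a classifier to reproduce those labels from the station features. Because the labels come from the same features the model sees, the accuracy score tells us how well the model recovers our rules, not how well it detects real-world risk: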

In [None]:
# Create environmental risk categories based on multiple factors
def categorize_environmental_risk(row):
    """
    Classify environmental risk from summer temperature, air quality, and annual precipitation.
    """
    risk_score = 0
    
    # Temperature risk (extreme heat)
    if row['summer_temp_f'] > 85:
        risk_score += 2
    elif row['summer_temp_f'] > 80:
        risk_score += 1
    
    # Air quality risk
    if row['air_quality_pm25'] > 20:
        risk_score += 2
    elif row['air_quality_pm25'] > 15:
        risk_score += 1
    
    # Precipitation risk (drought conditions)
    if row['annual_precip_in'] < 12:
        risk_score += 2
    elif row['annual_precip_in'] < 15:
        risk_score += 1
    
    # Classify based on total risk score
    if risk_score >= 4:
        return 'High Risk'
    elif risk_score >= 2:
        return 'Moderate Risk'
    else:
        return 'Low Risk'

# Apply risk categorization
env_data['risk_category'] = env_data.apply(categorize_environmental_risk, axis=1)

print("🚨 ENVIRONMENTAL RISK ANALYSIS")
print("=" * 35)

risk_counts = env_data['risk_category'].value_counts()
print("Risk Distribution:")
for risk_level, count in risk_counts.items():
    percentage = (count / len(env_data)) * 100
    print(f"  {risk_level}: {count} stations ({percentage:.1f}%)")

# Build classification model
print("\n🤖 Training Risk Classification Model...")

# Prepare features and target for classification
X_risk = env_data[feature_columns]
y_risk = env_data['risk_category']

# Split data
X_train_risk, X_test_risk, y_train_risk, y_test_risk = train_test_split(
    X_risk, y_risk, test_size=0.3, random_state=42
)

# Train classifier
risk_model = RandomForestClassifier(n_estimators=100, random_state=42)
risk_model.fit(X_train_risk, y_train_risk)

# Make predictions
y_pred_risk = risk_model.predict(X_test_risk)

# Evaluate
accuracy = accuracy_score(y_test_risk, y_pred_risk)
print(f"✅ Classification Accuracy: {accuracy:.3f} ({accuracy*100:.1f}%)")
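
A single accuracy number can hide which risk levels the model confuses. A confusion matrix and a per-class report give more detail. The sketch below is an illustration on made-up data with a simplified two-factor rule (both are assumptions, not the rule used above). It also passes `stratify=` to `train_test_split` so each risk level appears in both the training and testing sets.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Made-up stations with a simplified risk rule, for illustration only
rng = np.random.default_rng(1)
n = 300
demo = pd.DataFrame({
    'summer_temp_f': rng.uniform(60, 90, n),
    'air_quality_pm25': rng.uniform(5, 25, n),
})
score = (demo['summer_temp_f'] > 80).astype(int) + (demo['air_quality_pm25'] > 15).astype(int)
demo['risk_category'] = score.map({0: 'Low Risk', 1: 'Moderate Risk', 2: 'High Risk'})

feature_cols = ['summer_temp_f', 'air_quality_pm25']
X_tr, X_te, y_tr, y_te = train_test_split(
    demo[feature_cols], demo['risk_category'],
    test_size=0.3, random_state=42,
    stratify=demo['risk_category'],  # keep class proportions similar in both sets
)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

labels = ['Low Risk', 'Moderate Risk', 'High Risk']
cm = confusion_matrix(y_te, y_hat, labels=labels)

# Rows are the true labels, columns are the predicted labels
print(pd.DataFrame(cm, index=[f"true {l}" for l in labels],
                   columns=[f"pred {l}" for l in labels]))
print(classification_report(y_te, y_hat, labels=labels, zero_division=0))
```

Counts on the diagonal are correct predictions; anything off the diagonal shows which pair of risk levels the model mixes up.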

In [None]:
# Visualize environmental risk across the Black Hills region
plt.figure(figsize=(12, 8))

# Color code by risk level
risk_colors = {'High Risk': 'red', 'Moderate Risk': 'orange', 'Low Risk': 'green'}

for risk_level in env_data['risk_category'].unique():
    subset = env_data[env_data['risk_category'] == risk_level]
    plt.scatter(subset['longitude'], subset['latitude'], 
               c=risk_colors[risk_level], label=risk_level, 
               s=100, alpha=0.7, edgecolors='black', linewidth=0.5)

plt.xlabel('Longitude', fontsize=12)
plt.ylabel('Latitude', fontsize=12)
plt.title('🗺️ Environmental Risk Assessment Map', fontsize=16, fontweight='bold')
plt.legend(title='Risk Level')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("🎯 Applications of Risk Classification:")
print("• Prioritize environmental monitoring efforts")
print("• Plan conservation interventions")
print("• Alert visitors to potential air quality issues")
print("• Guide sustainable tourism development")
print("• Support climate adaptation planning")

## Project 4: Creating a Conservation Planning Tool

Let's combine our models to create a comprehensive tool for conservation planning:

In [None]:
def conservation_assessment(latitude, longitude, elevation, temperature, precipitation, air_quality):
    """
    Comprehensive conservation assessment for a location.
    """
    print(f"🌲 CONSERVATION ASSESSMENT")
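    # Longitude is negative in the western hemisphere, so show its absolute value next to the °W label
    print(f"📍 Location: {latitude:.4f}°N, {abs(longitude):.4f}°W")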
    print(f"🏔️ Elevation: {elevation} ft")
    print("=" * 50)
    
    # Wildlife prediction
    wildlife_pred, wildlife_cat, wildlife_emoji = predict_wildlife_activity(
        latitude, longitude, elevation, temperature, precipitation, air_quality
    )
    
    print(f"🦌 Wildlife Activity: {wildlife_cat} {wildlife_emoji}")
    print(f"   Expected sightings: {wildlife_pred:.1f}")
    
    # Environmental risk
    risk_input = pd.DataFrame({
        'latitude': [latitude],
        'longitude': [longitude],
        'elevation_ft': [elevation],
        'summer_temp_f': [temperature],
        'annual_precip_in': [precipitation],
        'air_quality_pm25': [air_quality]
    })
    
    risk_prediction = risk_model.predict(risk_input)[0]
    risk_emoji = {'High Risk': '🚨', 'Moderate Risk': '⚠️', 'Low Risk': '✅'}[risk_prediction]
    
    print(f"🌍 Environmental Risk: {risk_prediction} {risk_emoji}")
    
    # Conservation recommendations
    print(f"\n💡 Conservation Recommendations:")
    
    if wildlife_pred >= 8 and risk_prediction == 'Low Risk':
        print("   🌟 PRIORITY CONSERVATION AREA")
        print("   • High wildlife value with low environmental risk")
        print("   • Ideal for wildlife viewing programs")
        print("   • Consider habitat enhancement projects")
    
    elif wildlife_pred >= 8 and risk_prediction != 'Low Risk':
        print("   🛡️ PROTECTION NEEDED")
        print("   • High wildlife value but environmental concerns")
        print("   • Implement protective measures")
        print("   • Monitor environmental conditions closely")
    
    elif wildlife_pred < 6 and risk_prediction == 'Low Risk':
        print("   🌱 RESTORATION POTENTIAL")
        print("   • Good environmental conditions")
        print("   • Consider habitat restoration to increase wildlife")
        print("   • Suitable for visitor facilities")
    
    else:
        print("   📊 BASELINE MONITORING")
        print("   • Establish monitoring protocols")
        print("   • Assess restoration potential")
        print("   • Consider environmental mitigation")
    
    print("\n" + "=" * 50)

# Test the conservation tool with different scenarios
print("🔍 CONSERVATION PLANNING SCENARIOS")
print("\n")

# Scenario 1: High elevation, pristine area
conservation_assessment(43.87, -103.53, 6500, 68, 20, 8)

print("\n")

# Scenario 2: Lower elevation, moderate conditions
conservation_assessment(43.75, -103.45, 4200, 76, 16, 14)

print("\n")

# Scenario 3: Valley location with concerns
conservation_assessment(44.15, -103.70, 3800, 82, 12, 22)

## Ethics and Cultural Considerations in Geospatial AI

As we build these powerful tools, we must consider their ethical implications and cultural sensitivity:

### Important Ethical Considerations:

**🏛️ Indigenous Data Sovereignty**
- The Black Hills (Paha Sapa) are sacred to the Lakota people
- Traditional ecological knowledge should be respected and protected
- Indigenous communities should control data about their lands
- AI models should incorporate traditional perspectives on land stewardship

**🌍 Environmental Justice**
- Ensure AI tools don't discriminate against certain communities
- Consider who benefits from environmental predictions
- Protect vulnerable ecosystems and communities
- Balance tourism development with conservation

**🔒 Privacy and Security**
- Protect sensitive location data
- Consider security implications of environmental monitoring
- Respect private land rights
- Ensure data is used for beneficial purposes
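
One simple way to reduce the precision of shared location data is to round coordinates before publishing them. The sketch below is only an illustration with made-up sites; rounding to one decimal place (roughly 11 km in latitude) is an assumed choice, and the right level of detail for any real dataset should be decided together with the communities and land managers involved.

```python
import pandas as pd

# Made-up sighting records for illustration only
sightings = pd.DataFrame({
    'site': ['A', 'B'],
    'latitude': [43.8791, 43.7312],
    'longitude': [-103.4591, -103.4178],
})

# Keep the precise table private; share a copy with coarsened coordinates
public = sightings.copy()
public[['latitude', 'longitude']] = public[['latitude', 'longitude']].round(1)

print(public)
```

Rounding alone does not guarantee privacy, but it keeps exact positions out of the shared file.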

**📊 Model Limitations**
- AI models are simplified representations of complex systems
- Predictions should inform, not replace, human judgment
- Models may not capture all cultural and ecological factors
- Continuous monitoring and model updates are essential
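
One way to make a model's limits visible is to show how much its individual trees disagree. A Random Forest prediction is the average of many trees, so the spread of the per-tree predictions gives a rough sense of how settled that average is. The sketch below uses made-up data and a stand-in model (`forest` and `new_site` are hypothetical names); the resulting range is an informal indicator, not a calibrated confidence interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up data: 40 stations with elevation as the only feature
rng = np.random.default_rng(7)
X_demo = rng.uniform(3000, 7000, size=(40, 1))
y_demo = rng.poisson(8, size=40)

forest = RandomForestRegressor(n_estimators=200, random_state=7).fit(X_demo, y_demo)

# A hypothetical new site at 5200 ft
new_site = np.array([[5200.0]])

# Ask every tree in the forest for its own prediction
per_tree = np.array([tree.predict(new_site)[0] for tree in forest.estimators_])

mean_pred = per_tree.mean()  # matches forest.predict(new_site)[0]
low, high = np.percentile(per_tree, [10, 90])

print(f"Predicted sightings: {mean_pred:.1f}")
print(f"Middle 80% of tree predictions: {low:.1f} to {high:.1f}")
```

A wide range is a signal to treat the prediction with extra caution and to lean more on field knowledge.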

**🤝 Community Engagement**
- Include local communities in AI development
- Validate models with traditional knowledge holders
- Ensure benefits reach the communities being studied
- Maintain transparency about model capabilities and limitations

## Reflection and Next Steps

**What you've accomplished today:**
- ✅ Built machine learning models with geographic data from the Black Hills region (or sample data standing in for it)
- ✅ Created tools for wildlife prediction and environmental monitoring
- ✅ Developed a comprehensive conservation planning system
- ✅ Considered ethical implications of geospatial AI
- ✅ Connected technology to environmental stewardship values

**Key takeaways:**
- AI can help us understand and protect natural environments
- Geographic data adds crucial context to environmental analysis
- Machine learning models can predict patterns and identify risks
- Technology tools must be developed with cultural sensitivity
- Conservation requires balancing multiple factors and perspectives

In [None]:
# Final project summary
print("🎯 YOUR GEOSPATIAL AI TOOLKIT")
print("=" * 35)
print("Models you've built:")
print("🦌 Wildlife Prediction Model - Predicts animal activity based on environmental conditions")
print("🚨 Environmental Risk Classifier - Identifies areas needing protection or monitoring")
print("🌲 Conservation Planning Tool - Combines multiple factors for land management decisions")

print("\n🛠️ Skills you've developed:")
skills = [
    "Working with geographic coordinates and spatial data",
    "Training machine learning models for prediction",
    "Evaluating model performance and interpreting results",
    "Creating practical tools from AI models",
    "Visualizing geographic patterns and predictions",
    "Considering ethical implications of AI applications",
    "Connecting technology to environmental conservation"
]

for i, skill in enumerate(skills, 1):
    print(f"   {i}. {skill}")

print("\n🚀 Ways to extend your work:")
extensions = [
    "Add real-time weather data integration",
    "Include seasonal variations in predictions",
    "Build web applications for public use",
    "Collaborate with local conservation organizations",
    "Incorporate traditional ecological knowledge",
    "Develop mobile apps for field researchers",
    "Create educational tools for environmental awareness"
]

for extension in extensions:
    print(f"• {extension}")

print("\n🌟 Congratulations! You're now a geospatial AI practitioner! 🗺️🤖")
print("You've learned to use technology in service of environmental protection and cultural values.")