# 2C - Climate Data Analysis for the Black Hills Region

In this notebook, we'll explore real climate data for the Black Hills region of South Dakota, focusing on frost dates and temperature patterns that are crucial for agriculture and land management.

## Learning Objectives
- Work with real climate data from NOAA
- Use pandas for data manipulation
- Create visualizations with matplotlib
- Apply basic machine learning with scikit-learn
- Understand climate patterns relevant to Lakota lands

In [ ]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

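# Set up plotting style
# 'seaborn-v0_8' is only available in matplotlib >= 3.6; keep the default style otherwise
if 'seaborn-v0_8' in plt.style.available:
    plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 8)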

## Section 1: Black Hills Climate Data

We'll start by generating a synthetic dataset modeled on historical climate patterns for three Black Hills cities: Rapid City, Spearfish, and Hot Springs.
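
The cell that follows draws each city-year from a simple additive model. Written out, it looks roughly like this. This is only a restatement of the parameters in the code, not a fitted climate model. The symbols are introduced here for illustration: $h_c$ is the elevation of city $c$ in meters, $y$ is the year, and each $\varepsilon$ is an independent normal draw.

$$
\begin{aligned}
T_{c,y} &= 46.5 - 0.006\,(h_c - 1000) + 0.02\,(y - 1990) + \varepsilon_T, & \varepsilon_T &\sim \mathcal{N}(0,\ 2.5^2) \\
S_{c,y} &= 135 - 0.5 \cdot 0.02\,(y - 1990) + \varepsilon_S, & \varepsilon_S &\sim \mathcal{N}(0,\ 8^2) \\
F_{c,y} &= 270 + 0.3 \cdot 0.02\,(y - 1990) + \varepsilon_F, & \varepsilon_F &\sim \mathcal{N}(0,\ 12^2) \\
G_{c,y} &= F_{c,y} - S_{c,y} \\
P_{c,y} &= 18.5 + 0.02\,(h_c - 1000) + \varepsilon_P, & \varepsilon_P &\sim \mathcal{N}(0,\ 3.2^2)
\end{aligned}
$$

Here $T$ is the average temperature (°F), $S$ and $F$ are the last spring and first fall frost as day of year, $G$ is the growing season length in days (before conversion to whole days), and $P$ is annual precipitation (inches). Over the 33 years from 1990 to 2023 the built-in trend adds only $0.02 \times 33 = 0.66$ °F, which is small next to the 2.5 °F year-to-year standard deviation. The expected frost dates shift by well under one day, so any trend estimated from this data will be noisy.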

In [ ]:
# Simulate realistic climate data for the Black Hills region.
# Baseline values are chosen to resemble typical 1991-2020 NOAA climate normals;
# the values generated below are synthetic, not station observations.

years = list(range(1990, 2024))
np.random.seed(42)  # For reproducible results

# Create climate data for three Black Hills cities
cities = ['Rapid City', 'Spearfish', 'Hot Springs']
elevations = [1033, 1123, 1161]  # meters above sea level

climate_data = []

for i, city in enumerate(cities):
    # Baseline temperature (°F), cooler with elevation: a simplified lapse rate
    # of 0.006 °F per meter above 1000 m
    base_temp = 46.5 - (elevations[i] - 1000) * 0.006
    
    for year in years:
        # Add climate variability and warming trend
        temp_trend = (year - 1990) * 0.02  # ~0.02°F per year warming
        annual_var = np.random.normal(0, 2.5)  # Year-to-year variability
        
        avg_temp = base_temp + temp_trend + annual_var
        
        # Frost dates (day of year), rounded to whole days
        # Spring frost: typically mid-May (day 135) with variation
        spring_frost = int(round(135 + np.random.normal(0, 8) - temp_trend * 0.5))
        # Fall frost: typically late September (day 270) with variation
        fall_frost = int(round(270 + np.random.normal(0, 12) + temp_trend * 0.3))

        # Growing season length, computed from the rounded frost dates so that
        # it always equals First_Fall_Frost_DOY - Last_Spring_Frost_DOY
        growing_season = fall_frost - spring_frost
        
        # Precipitation (inches) - Black Hills receives more than surrounding plains
        base_precip = 18.5 + (elevations[i] - 1000) * 0.02
        precipitation = base_precip + np.random.normal(0, 3.2)
        
        climate_data.append({
            'City': city,
            'Year': year,
            'Elevation_m': elevations[i],
            'Average_Temp_F': round(avg_temp, 1),
            'Last_Spring_Frost_DOY': int(spring_frost),
            'First_Fall_Frost_DOY': int(fall_frost),
            'Growing_Season_Days': int(growing_season),
            'Annual_Precipitation_in': round(precipitation, 1)
        })

# Convert to DataFrame
df = pd.DataFrame(climate_data)

print("Black Hills Climate Data Sample:")
print(df.head(10))
print(f"\nDataset shape: {df.shape}")
print(f"Years covered: {df.Year.min()} - {df.Year.max()}")
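
The frost columns are stored as day of year (DOY), which is convenient for arithmetic but hard to read. Here is a minimal sketch of converting a DOY back to a calendar date. The helper name `doy_to_date` and the sample values are hypothetical and not part of the dataset above.

```python
from datetime import date, timedelta


def doy_to_date(year, day_of_year):
    """Convert a 1-based day of year to a calendar date (hypothetical helper)."""
    # int() guards against NumPy integer types, which timedelta may reject
    return date(int(year), 1, 1) + timedelta(days=int(day_of_year) - 1)


# Illustrative (year, day-of-year) pairs; 135 and 270 are the baseline
# spring and fall frost days used in the simulation above
examples = [(2019, 135), (2020, 135), (2019, 270), (2020, 270)]

for year, doy in examples:
    print(f"{year}, day {doy}: {doy_to_date(year, doy):%B %d}")
```

The same day of year falls one calendar day earlier in a leap year such as 2020, which is one reason to compare frost timing by DOY rather than by date. To apply the helper to the DataFrame, one option is `df['Last_Spring_Frost_Date'] = [doy_to_date(y, d) for y, d in zip(df['Year'], df['Last_Spring_Frost_DOY'])]`.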