# First Analysis: Linking Climate to Yield (1981 Test Case)

**Goal:** Run a proof-of-concept analysis that links our newly downloaded climate data to the corresponding crop yield data for 1981, using January temperatures as the test month.

**Methodology:**
1.  Load the 1981 maize yield data.
2.  Load the January 1981 temperature data.
3.  Process the hourly temperature data into a single metric for the month (average temperature).
4.  Align the high-resolution climate grid to the lower-resolution yield grid.
5.  Create a scatter plot to visualize the relationship between temperature and yield for each grid cell.

In [None]:
# Cell 1: Load Data
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# --- Load Maize Yield Data for 1981 ---
YIELD_PATH = '../data/maize/yield_1981.nc4'
ds_yield = xr.open_dataset(YIELD_PATH)
print("Yield data for 1981 loaded.")

# --- Load Climate Data for Jan 1981 ---
CLIMATE_PATH = '../data/climate_raw/api_test/era5_land_usa_1981_01_temp_robust.grib'
# Use the cfgrib engine, which we know works
ds_climate = xr.open_dataset(CLIMATE_PATH, engine='cfgrib')
print("Climate data for Jan 1981 loaded.")

## Data Processing and Alignment

We need to make the two datasets compatible. First we reduce the hourly temperatures to a single monthly mean, then we interpolate the climate grid onto the yield grid so that every yield cell has a matching temperature value.
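The two steps can be tried on synthetic data before touching the real files. The sketch below is illustrative only: the grids, the temperature field, and the `lat`/`lon` names on the coarse grid are all assumptions, not properties of our actual datasets. It also shows why dimension names matter: `interp_like` only interpolates along dimensions whose names match in both objects, so the ERA5-style `latitude`/`longitude` names are renamed first.

```python
import numpy as np
import xarray as xr

# Synthetic hourly 2 m temperature in Kelvin on a fine 0.1-degree grid.
# All values are invented; only the dimension names mimic an ERA5-style file.
fine_lat = np.linspace(40.0, 42.0, 21)
fine_lon = np.linspace(-95.0, -93.0, 21)
hours = np.arange(24)

# A smooth spatial gradient plus a diurnal cycle that averages to zero.
base = 280.0 - 0.5 * (fine_lat[:, None] - 40.0) + 0.2 * (fine_lon[None, :] + 95.0)
diurnal = 3.0 * np.sin(2 * np.pi * hours / 24.0)
t2m = xr.DataArray(
    base[None, :, :] + diurnal[:, None, None],
    coords={"time": hours, "latitude": fine_lat, "longitude": fine_lon},
    dims=("time", "latitude", "longitude"),
    name="t2m",
)

# Hypothetical coarse yield grid: 0.5-degree cell centres named lat/lon.
coarse = xr.Dataset(
    coords={
        "lat": np.array([40.25, 40.75, 41.25, 41.75]),
        "lon": np.array([-94.75, -94.25, -93.75, -93.25]),
    }
)

# Step 1: collapse the time axis and convert Kelvin to Celsius.
mean_c = t2m.mean(dim="time") - 273.15

# Step 2: rename so the dimension names match, then interpolate linearly
# onto the coarse grid.
mean_c = mean_c.rename({"latitude": "lat", "longitude": "lon"})
aligned = mean_c.interp_like(coarse)

print(aligned.sizes)
```

Note that linear interpolation samples the fine field at the coarse cell centres; it does not average all fine cells that fall inside a coarse cell. For a smooth monthly mean the difference is usually small, but it is worth keeping in mind when the field varies sharply.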

In [None]:
# Cell 2: Process and Align

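# 1. Process Climate Data: Calculate the mean temperature for the month and convert to Celsius
# Depending on the file, cfgrib may expose the hourly axis as 'time', 'step', or 'valid_time',
# so we average over every dimension that is not spatial (print(ds_climate) shows the layout).
# We subtract 273.15 to convert from Kelvin to Celsius.
time_dims = [d for d in ds_climate['t2m'].dims if d not in ('latitude', 'longitude')]
avg_temp_jan = ds_climate['t2m'].mean(dim=time_dims) - 273.15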

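# 2. Align Grids: Interpolate the high-res temperature map onto the low-res yield grid
# interp_like only interpolates along dimensions whose names match in both objects.
# cfgrib names the spatial dimensions 'latitude'/'longitude'; if the yield file uses
# 'lat'/'lon' instead (check with print(ds_yield)), rename before interpolating.
print("Aligning climate grid to yield grid...")
if 'lat' in ds_yield.dims and 'lon' in ds_yield.dims:
    avg_temp_jan = avg_temp_jan.rename({'latitude': 'lat', 'longitude': 'lon'})
avg_temp_aligned = avg_temp_jan.interp_like(ds_yield)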

# 3. Combine into a single dataset
# We'll rename the yield variable for clarity
analysis_ds = xr.Dataset({
    'yield': ds_yield['var'],
    'avg_temp': avg_temp_aligned
})

print("\nData processed and aligned successfully.")
print(analysis_ds)

## First Scatter Plot: Yield vs. Temperature

Now we can plot the yield of each grid cell against the average January temperature of that same grid cell. This is the first step toward building a vulnerability curve.
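One possible way to move from a raw scatter toward a curve is to bin the grid cells by temperature and look at the mean yield per bin. The sketch below uses an invented table in place of `df_clean`; the 5 °C bin width and the linear toy relationship are assumptions chosen only to keep the example easy to check.

```python
import numpy as np
import pandas as pd

# Invented (temperature, yield) pairs standing in for df_clean.
temps = np.arange(-10.0, 10.0, 0.5)
toy = pd.DataFrame({"avg_temp": temps, "yield": 5.0 + 0.1 * temps})

# Assign each grid cell to a 5 degree C temperature bin, closed on the left.
edges = [-10, -5, 0, 5, 10]
toy["temp_bin"] = pd.cut(toy["avg_temp"], bins=edges, right=False)

# Mean yield and number of cells per bin: a crude first look at the curve.
binned = toy.groupby("temp_bin", observed=True)["yield"].agg(["mean", "count"])

# Pearson correlation as a one-number summary of the linear relationship.
r = toy["avg_temp"].corr(toy["yield"])

print(binned)
print(f"Pearson r = {r:.3f}")
```

The per-bin counts matter as much as the means: a bin with only a handful of cells gives an unreliable average, so sparse bins should be read with caution.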

In [None]:
# Cell 3: Create the Scatter Plot

# Convert the xarray Dataset to a pandas DataFrame, which is ideal for this kind of plotting.
# This turns our 2D maps into a list of paired (temperature, yield) values.
df = analysis_ds.to_dataframe()

# Drop any rows that have missing data (e.g., ocean grid cells)
df_clean = df.dropna()

print(f"Plotting {len(df_clean)} data points...")

# Create the scatter plot using seaborn for a nice look
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_clean, x='avg_temp', y='yield', alpha=0.5, s=10) # s=10 makes points smaller

plt.title('Maize Yield vs. Average January Temperature (US Midwest, 1981)')
plt.xlabel('Average January Temperature (°C)')
plt.ylabel('Maize Yield (tonnes per hectare)')
plt.grid(True)
plt.show()