In [3]:
# === Rainfall × CropType Interaction Regression ===
import pandas as pd
import statsmodels.formula.api as smf

# Load data
df = pd.read_csv('enhanced_canadian_farm_production_dataset.csv')

# Keep relevant columns and drop missing values
df_sub = df[['average_yield_kg_per_hectare', 'avg_rainfall_mm', 'crop_type']].dropna()

# Fit linear regression with interaction term
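# With treatment coding, avg_rainfall_mm alone is the rainfall slope for the reference crop;
# each avg_rainfall_mm:C(crop_type)[T.<crop>] term is that crop's slope offset from the reference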
model = smf.ols('average_yield_kg_per_hectare ~ avg_rainfall_mm * C(crop_type)', data=df_sub).fit()

# Display regression summary
print(model.summary())

# Extract coefficients related to rainfall (main + interactions)
rainfall_effects = model.params.filter(like='avg_rainfall_mm')

# Create a readable table
rainfall_effects = rainfall_effects.reset_index()
rainfall_effects.columns = ['term', 'coef']
rainfall_effects['crop_type'] = rainfall_effects['term'].apply(
    lambda x: x.split(':')[-1].replace('C(crop_type)[T.', '').replace(']', '') if ':' in x else 'Baseline'
)

# Sort by coefficient (sensitivity to rainfall)
rainfall_effects = rainfall_effects.sort_values('coef', ascending=False)

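print("\n=== Estimated sensitivity of yield to rainfall by crop ===")
# 'Baseline' is the reference crop's slope; every other row is an offset from it (kg/ha per mm)
print(rainfall_effects[['crop_type', 'coef']].round(3).to_string(index=False))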


                                 OLS Regression Results                                 
Dep. Variable:     average_yield_kg_per_hectare   R-squared:                       0.804
Model:                                      OLS   Adj. R-squared:                  0.804
Method:                           Least Squares   F-statistic:                 2.154e+05
Date:                          Sun, 26 Oct 2025   Prob (F-statistic):               0.00
Time:                                  18:01:53   Log-Likelihood:            -7.5963e+06
No. Observations:                       1000000   AIC:                         1.519e+07
Df Residuals:                            999980   BIC:                         1.519e+07
Df Model:                                    19                                         
Covariance Type:                      nonrobust                                         
                                                  coef    std err          t      P>|t|      [0.025      0.975]

## Methodology: Interaction-Based Linear Regression for Rainfall Resilience

### Objective
This analysis examines whether certain crop types demonstrate greater **resilience** (yield stability) under conditions of **reduced rainfall**.

---

### Method Overview
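A **linear regression with an interaction** between **rainfall (continuous variable)** and **crop type (categorical variable)** was fitted. With crop type treatment-coded against a reference crop, the model is:

$$
\text{Yield} = \beta_0 + \beta_1 \,\text{Rainfall} + \sum_{c} \beta_{2,c}\, D_c + \sum_{c} \beta_{3,c}\, (\text{Rainfall} \times D_c) + \varepsilon
$$

where $D_c$ is 1 for crop $c$ and 0 otherwise, and the sums run over every crop except the reference.

- The main rainfall coefficient (`β₁`) is the **rainfall slope for the reference crop**, not an average across all crops.  
- Each interaction coefficient (`β₃,c`, one per non-reference crop) is the **difference** between that crop's rainfall slope and the reference slope, so the crop's own slope is `β₁ + β₃,c`.  
- Positive interaction coefficients → yield responds more steeply to rainfall than the reference crop (lower drought resilience).  
- Negative interaction coefficients → yield responds less steeply than the reference crop (higher resilience); yield still falls with less rain as long as `β₁ + β₃,c` stays positive.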
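
To make the coding concrete, here is a minimal standard-library sketch. It relies on the fact that a model with crop-specific intercepts and crop-specific slopes gives the same fitted line per crop as a separate simple regression on each crop, so the interaction coefficient can be recovered as a difference of per-crop slopes. The toy data, the `ols_slope` helper, and the choice of Barley as reference are all assumptions made for illustration; none of the numbers come from the dataset.

```python
# Hypothetical toy data: (rainfall_mm, yield_kg_per_ha) pairs per crop.
# The points lie exactly on a line so the slopes are easy to verify by hand.
toy = {
    "Barley": [(300, 2495.0), (400, 2660.0), (500, 2825.0)],
    "Canola": [(300, 1354.0), (400, 1472.0), (500, 1590.0)],
    "Corn":   [(300, 5987.0), (400, 6316.0), (500, 6645.0)],
}


def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


reference = "Barley"  # assumed reference level (first in sorted order)
slopes = {crop: ols_slope(pts) for crop, pts in toy.items()}

# beta_1 is the reference crop's slope; each offset plays the role of an
# interaction coefficient for a non-reference crop.
beta_1 = slopes[reference]
offsets = {crop: s - beta_1 for crop, s in slopes.items() if crop != reference}

for crop, off in offsets.items():
    print(f"{crop}: offset {off:+.2f}, own slope {beta_1 + off:.2f} kg/ha per mm")
```

In this toy setup Canola has a negative offset but a positive slope of its own, which is the distinction the bullets above draw. Only the point estimates match between the two approaches; the pooled model's standard errors differ from those of separate per-crop fits.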

---

### Results Interpretation

| Crop Type | Interaction coefficient (slope offset vs. reference crop, kg/ha per mm) | Interpretation |
|------------|----------------------------------|----------------|
| **Canola** | −0.47 | Flattest response relative to the reference (highest resilience) |
| **Rye** | −0.46 | Nearly as flat as Canola |
| **Oats** | −0.33 | Moderately less sensitive than the reference |
| **Wheat** | +0.34 | Slightly more rainfall-dependent than the reference |
| **Soybeans** | +0.45 | Moderately more sensitive |
| **Tomatoes** | +0.69 | Clearly more sensitive to rainfall change |
| **Potatoes** | +0.97 | Sensitive; yield drops notably with less rain |
| **Sugar Beets** | +1.02 | Strong rainfall dependency |
| **Corn** | +1.64 | Most sensitive; slope roughly double the reference |

**Baseline effect:**  
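Rainfall baseline coefficient = **+1.65** kg/ha per mm: for the reference crop (likely *Barley*, since treatment coding takes the first level in sorted order), each additional 1 mm of rainfall is associated with about 1.65 kg/ha more yield. The coefficients in the table are offsets to be added to this slope, not stand-alone slopes. The summary's `Df Model` of 19 is consistent with ten crop types (1 rainfall slope + 9 crop intercept shifts + 9 interactions): the nine crops in the table plus one reference crop.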
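
The per-crop slopes implied by these numbers can be worked out by adding each offset to the baseline. The sketch below does that arithmetic with the rounded values reported above, so results are approximate. The 50 mm rainfall deficit is a hypothetical scenario, the variable names are illustrative, and the predicted losses assume the fitted linear relationship holds over that range.

```python
# Values copied from the results table above (kg/ha per mm of rainfall).
baseline_slope = 1.65  # reference crop (assumed to be Barley)
interaction_offsets = {
    "Canola": -0.47, "Rye": -0.46, "Oats": -0.33, "Wheat": 0.34,
    "Soybeans": 0.45, "Tomatoes": 0.69, "Potatoes": 0.97,
    "Sugar Beets": 1.02, "Corn": 1.64,
}

# Implied slope for each crop = baseline slope + that crop's interaction offset.
implied_slopes = {
    crop: round(baseline_slope + offset, 2)
    for crop, offset in interaction_offsets.items()
}
implied_slopes["Reference crop"] = baseline_slope

# Hypothetical scenario: rainfall 50 mm below normal.
rain_deficit_mm = 50
for crop, slope in sorted(implied_slopes.items(), key=lambda kv: kv[1]):
    predicted_loss = slope * rain_deficit_mm
    print(f"{crop:<15} slope {slope:.2f}  predicted loss {predicted_loss:.1f} kg/ha")
```

Every implied slope is positive, so the model predicts lower yield under reduced rainfall for all ten crops; the negative offsets mark crops that lose yield more slowly, not crops that are unaffected.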

---

### Insights
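- **Canola, Rye, and Oats** have the flattest rainfall slopes, an early sign of **resilience** to lower precipitation. Their implied slopes (baseline plus offset, about 1.2 to 1.3 kg/ha per mm) are still positive, so yield is expected to fall with less rain, only more slowly than for the other crops.  
- **Corn, Sugar Beets, and Potatoes** are the most **rainfall-dependent**, suggesting the need for irrigation or moisture-preserving practices.  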
- The interaction regression quantitatively captures yield sensitivity differences across species and helps identify crops that maintain stable output under climatic stress.

---

**Summary:**  
> “Interaction regression reveals distinct rainfall–yield sensitivities across crops, providing early indicators of drought resilience within a unified statistical framework.”

