# **Hypothesis Validation**

## Objectives

This analysis tests three evidence-based hypotheses about climate and energy relationships. The primary objective is to determine, using statistical tests on empirical data supported by visualisation, whether each hypothesis is supported or should be rejected.
Specifically, the analysis investigates whether **higher renewable energy adoption and improved energy efficiency are associated with lower CO₂ emissions per capita**, and whether **a tipping point exists around 30% renewable energy share beyond which emission reductions accelerate**. These insights contribute to a clearer understanding of decarbonisation pathways and support data-informed climate policy decisions.



---

In [4]:
# import libraries
import numpy as np
import pandas as pd 
import seaborn as sns 
import plotly.express as px 
import matplotlib.pyplot as plt 
import statsmodels.api as sm
import scipy.stats as stats
from statsmodels.formula.api import ols
from scipy.stats import spearmanr

In [5]:
# load the dataset
df = pd.read_csv('../data/cleaned/enhanced_energy_features.csv')
df.head()

| Unnamed: 0 | country | year | elec_access_pct | clean_fuels_access_pct | renew_cap_kw_pc | climate_finance_usd | renewables_share_pct | fossil_elec_twh | nuclear_elec_twh | renew_elec_twh | ... | region_miss | subregion_miss | co2_per_capita_t | log_co2_per_capita_t | log_renewables_share_pct | log_energy_intensity_mj_usd | log_gdp_pc_usd | above_30_pct | year_offset | renewables_3yr_avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2000 | 1.613591 | 6.2 | 9.22 | 20000.0 | 44.99 | 0.16 | 0.0 | 0.31 | ... | 0 | 0 | 0.037754 | 0.037059 | 3.828424 | 0.970779 | 5.195324 | 1 | 0 | 44.99 |
| 1 | Afghanistan | 2001 | 4.074574 | 7.2 | 8.86 | 130000.0 | 45.6 | 0.09 | 0.0 | 0.5 | ... | 0 | 0 | 0.035988 | 0.035356 | 3.841601 | 1.007958 | 5.195324 | 1 | 1 | 45.295 |
| 2 | Afghanistan | 2002 | 9.409158 | 8.2 | 8.47 | 3950000.0 | 37.83 | 0.13 | 0.0 | 0.56 | ... | 0 | 0 | 0.04818 | 0.047055 | 3.659193 | 0.875469 | 5.195324 | 1 | 2 | 42.806667 |
| 3 | Afghanistan | 2003 | 14.738506 | 9.5 | 8.09 | 25970000.0 | 36.66 | 0.31 | 0.0 | 0.63 | ... | 0 | 0 | 0.053666 | 0.052276 | 3.628599 | 0.875469 | 5.255847 | 1 | 3 | 40.03 |
| 4 | Afghanistan | 2004 | 20.064968 | 10.9 | 7.75 | 0.0 | 44.24 | 0.33 | 0.0 | 0.56 | ... | 0 | 0 | 0.043717 | 0.042788 | 3.811982 | 0.788457 | 5.358387 | 1 | 4 | 39.576667 |


**Hypothesis 1: Using Spearman Correlation and OLS Regression**

This analysis tests whether higher renewable energy adoption is associated with lower CO₂ emissions per capita.  
Spearman correlation is used for robustness to non-normality, and OLS regression is applied for interpretability.

- **Null hypothesis (H₀):** There is no association between the share of renewables and CO₂ emissions per capita.
- **Alternative hypothesis (H₁):** A higher share of renewables is associated with lower CO₂ emissions per capita.
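
To illustrate why a rank-based measure is paired with the linear model, the following sketch compares Spearman and Pearson coefficients on synthetic data with a monotonic but curved relationship. The exponential decay shape, the noise level, and all variable names are assumptions chosen for illustration; none of them are taken from the dataset.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data (assumed shape, not the real dataset): emissions decay
# exponentially with renewables share, with multiplicative noise.
rng = np.random.default_rng(0)
share = rng.uniform(0, 100, size=500)
emissions = 10 * np.exp(-0.05 * share) * rng.lognormal(0, 0.3, size=500)

rho, _ = spearmanr(share, emissions)  # rank-based: sensitive to any monotonic trend
r, _ = pearsonr(share, emissions)     # linear: understates a curved trend

print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

On data like this the Spearman coefficient is larger in magnitude than the Pearson coefficient, because ranks are unaffected by the curvature.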

In [None]:
# Validate H1: Higher renewables share is associated with lower CO₂ per capita

# Spearman correlation:
h1_data = df[['renewables_share_pct', 'co2_per_capita_t']].dropna()
corr, pval = spearmanr(h1_data['renewables_share_pct'], h1_data['co2_per_capita_t'])

# OLS Regression:
h1_model = ols('co2_per_capita_t ~ renewables_share_pct', data=h1_data).fit()
h1_summary = h1_model.summary() 

corr, pval, h1_summary



(-0.8050152741281514,
 0.0,
 <class 'statsmodels.iolib.summary.Summary'>
 """
                             OLS Regression Results                            
 Dep. Variable:       co2_per_capita_t   R-squared:                       0.319
 Model:                            OLS   Adj. R-squared:                  0.318
 Method:                 Least Squares   F-statistic:                     1596.
 Date:                Sat, 19 Jul 2025   Prob (F-statistic):          9.38e-287
 Time:                        10:11:57   Log-Likelihood:                -10279.
 No. Observations:                3417   AIC:                         2.056e+04
 Df Residuals:                    3415   BIC:                         2.057e+04
 Df Model:                           1                                         
 Covariance Type:            nonrobust                                         
                            coef    std err          t      P>|t|      [0.025      0.975]
 -------------------------------


#### **Result Interpretation**


**1. Result from Spearman Correlation:**

- **Spearman correlation (ρ):** –0.805  
- **p-value:** < 0.001 (displayed as 0.0 because the value is too small to represent in floating point)


A Spearman correlation of –0.805 indicates a very strong negative monotonic relationship between renewables share and CO₂ emissions per capita.

- As the percentage of renewables increases, CO₂ per person tends to decrease.
- The p-value confirms this relationship is statistically significant (p < 0.001).
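
A p-value displayed as exactly 0.0 is a floating-point artefact, not a true zero. As a rough check, the sketch below recomputes the test statistic from the reported ρ and the sample size using the Student's t approximation for Spearman's ρ. The sample size is taken from the OLS summary, which was fitted on the same `h1_data`; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

rho = -0.8050152741281514  # Spearman rho from the output above
n = 3417                   # number of observations, from the OLS summary

# t approximation for a rank correlation: t = rho * sqrt((n - 2) / (1 - rho^2))
t_stat = rho * np.sqrt((n - 2) / (1 - rho**2))

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"t = {t_stat:.1f}, p = {p_value}")
```

With |t| close to 79, the tail probability is far below the smallest positive double-precision number, so it is conventionally reported as p < 0.001.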



**2. Key Results from OLS Linear Regression Summary:**

| Metric                        | Value   |
|-------------------------------|---------|
| Intercept                     | 8.4363  |
| Coef. (Renewables Share)      | –0.1117 |
| p-value (coefficient)         | < 0.001 |
| R-squared                     | 0.319   |
| F-statistic                   | 1596    |
| No. of Observations           | 3417    |

<br>

- The coefficient for `renewables_share_pct` is **–0.1117**  
  This means that **each 1 percentage point increase in renewable energy share** is associated with **CO₂ per capita that is approximately 0.11 tonnes lower** on average.
- The **p-value is < 0.001**, so the coefficient is statistically significant.
- The **R² = 0.319** indicates that **approximately 32%** of the variation in CO₂ emissions per capita is explained by renewables share alone. This is substantial for a single-variable model, but it corresponds to a linear correlation of about –0.56, noticeably weaker than the Spearman ρ of –0.805, which suggests the relationship is monotonic but not strictly linear.
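
The fitted line can be evaluated by hand to see what the coefficients imply. The sketch below uses only the intercept and slope from the table above; the helper name `predict_co2` and the chosen share values are hypothetical and serve only as an illustration.

```python
# Coefficients reported in the OLS results table above
INTERCEPT = 8.4363
SLOPE = -0.1117  # tonnes CO2 per capita per percentage point of renewables


def predict_co2(renewables_share_pct: float) -> float:
    """Predicted CO2 per capita (tonnes) from the fitted line (hypothetical helper)."""
    return INTERCEPT + SLOPE * renewables_share_pct


for share in (0, 30, 60):
    print(f"{share:>3}% renewables -> {predict_co2(share):.2f} t per capita")

# Renewables share at which the fitted line crosses zero emissions
zero_crossing = -INTERCEPT / SLOPE
print(f"Line reaches zero at about {zero_crossing:.1f}% renewables")
```

Beyond roughly 75.5% renewables the line predicts negative emissions, which is not physically meaningful. This is one sign that a straight line is only an approximation over part of the observed range.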


**Conclusion for H1**

Both the Spearman correlation and the OLS regression lead to rejecting H₀ in favour of H₁:

**Countries with higher renewable energy shares tend to have significantly lower CO₂ emissions per capita.**

