### Modelling Renewable Energy Adoption and Its Impact on National CO₂ Emissions

**Joshua Quartey**\
DATA 605\
Lecture L01

#### SECTION A: INTRODUCTION
Global $CO_2$ emissions continue to drive climate change and pose urgent challenges for economies worldwide. As countries invest in renewable energy sources such as solar, wind, hydro, and biofuels, understanding the real-world impact of that shift on national emissions is critical for shaping effective policy and guiding private investment. This project combines time-series data on the share of primary energy from renewables with annual $CO_2$ emissions, economic scale (GDP), population, and energy-intensity metrics to quantify how renewable adoption influences emissions trajectories. By building both cross-sectional regressions and time-series forecasts, it explores the effect of renewable energy sources on emissions and offers actionable insights for policymakers and energy strategists committed to clean energy.


##### Main Objective
To quantify the relationship between the share of renewable energy in a country’s primary energy mix and its total CO₂ emissions—controlling for economic size, population, and energy efficiency—and to build a predictive time-series model for future emission trends.

#### SECTION B: ANALYSIS QUESTIONS

##### Exploratory Data Analysis (EDA) Questions
1. Global Trends (2000–2023):\
	How have the share of renewables, CO₂ emissions, and energy intensity evolved globally since 2000?

2. Emissions Normalized by Scale:\
	How does CO₂ per unit GDP vary with renewable adoption?


#####  Statistical Analysis Questions & Hypotheses

- RQ1: Controlling for GDP and population, is a higher renewable share associated with lower total CO₂ emissions?\
   $H_0$: Renewable energy share has no effect on $CO_2$.\
   $H_a$: Renewable energy share has a negative effect on $CO_2$.

- RQ2: Does energy intensity change the strength of the renewables to emissions relationship?\
   $H_0$: There is no interaction between energy intensity and renewable energy share.\
   $H_a$: Energy intensity positively affects the emissions-reducing effect of renewables.

- RQ3: Is there a significant negative correlation between renewable share and $CO_2$ per capita?\
   $H_0$: There is no significant linear relationship (Pearson $\rho = 0$)\
   $H_a$: There is a significant linear relationship (Pearson $\rho \neq 0$)

- RQ4: How well does including renewable share as an exogenous variable in an ARIMA model perform in forecasting of annual $CO_2$ emissions?
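
The hypotheses for RQ1 to RQ3 above could be tested along the following lines. This is a minimal sketch, not the code used for the reported results: the data frame is synthetic (random numbers generated only so the example runs), the column names `co2`, `gdp`, `population`, `energy_intensity`, and `co2_per_capita` are assumed, and including GDP and population as controls in the RQ2 model is also an assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Synthetic placeholder data (NOT the project data), so the sketch is runnable.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "renewables_pct": rng.uniform(0, 60, n),
    "gdp": rng.uniform(1, 100, n),
    "population": rng.uniform(1, 100, n),
    "energy_intensity": rng.uniform(2, 10, n),
})
df["co2"] = (50 + 2.0 * df["gdp"] + 1.5 * df["population"]
             - 0.8 * df["renewables_pct"] + rng.normal(0, 5, n))
df["co2_per_capita"] = df["co2"] / df["population"]

# RQ1: OLS of total CO2 on renewable share, controlling for GDP and population.
m1 = smf.ols("co2 ~ renewables_pct + gdp + population", data=df).fit()
coef = m1.params["renewables_pct"]
p_two_sided = m1.pvalues["renewables_pct"]
# H_a is directional (negative effect), so convert to a one-sided p-value.
p_one_sided = p_two_sided / 2 if coef < 0 else 1 - p_two_sided / 2

# RQ2: "a * b" in the formula expands to a + b + a:b (main effects + interaction).
m2 = smf.ols(
    "co2 ~ renewables_pct * energy_intensity + gdp + population", data=df
).fit()
interaction = m2.params["renewables_pct:energy_intensity"]

# RQ3: Pearson correlation between renewable share and CO2 per capita.
r, p_corr = stats.pearsonr(df["renewables_pct"], df["co2_per_capita"])

print(f"RQ1 coef={coef:.3f}, one-sided p={p_one_sided:.3g}, R2={m1.rsquared:.3f}")
print(f"RQ2 interaction={interaction:.3f}, p={m2.pvalues['renewables_pct:energy_intensity']:.3g}")
print(f"RQ3 r={r:.3f}, p={p_corr:.3g}")
```

The formula interface keeps the model specification close to the wording of each research question, which makes it easy to check that the controls named in the hypothesis are actually in the model.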


#### SECTION C: DATA SOURCING AND JUSTIFICATION 
The following datasets were selected for this analysis because together they capture the key supply‐side, demand‐side, and contextual drivers of national CO₂ emissions.  The renewables share data measures the pace of clean‐energy adoption. The $CO_2$ emissions dataset provides the data on the outcome of interest. The GDP and population datasets provide the data on the two indicators that will be used to control for economic scale and demographic differences that influence energy demand. The energy intensity data reflects how efficiently each economy converts energy into output.  Merging these datasets makes it possible to isolate the net effect of renewables over time while accounting for size, growth, and efficiency factors that would otherwise confound our estimates.

| Dataset                                     | Key Fields                       | Years     | Why                                                 |
| ------------------------------------------- | -------------------------------- | --------- | --------------------------------------------------- |
| Our World in Data – Share of primary energy consumption that comes from renewables [link](https://ourworldindata.org/grapher/renewable-share-energy?tab=table&time=1980..latest#explore-the-data)  | Entity, Year, Renewables %       | 1965–2023 | Primary predictor—renewable adoption over time.     |
| Our World in Data – Annual CO₂ Emissions (tonnes) [link](https://ourworldindata.org/grapher/annual-co2-emissions-per-country?tab=table&time=1980..latest) | Entity, Year, CO₂ total          | 1949–2023 | Outcome to analyze and forecast.                    |
| World Bank – GDP (current US\$) [link](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD)            | Country, Year, GDP               | 1960–2023 | Controls for economic scale.                        |
| World Bank – Population (total) [link](https://data.worldbank.org/indicator/SP.POP.TOTL)            | Country, Year, Population        | 1970–2023 | Enables per-capita normalizations.                  |
| World Bank – Energy intensity level of primary energy (MJ/\$2017 PPP GDP) [link](https://data.worldbank.org/indicator/EG.EGY.PRIM.PP.KD?view=chart) | Entity, Year, Energy intensity   | 2000–2022 | Captures efficiency—key moderator of energy use.    |

The datasets were merged by country and year and filtered to the period from 2000 to 2022.
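
As an illustration of the merge-and-filter step, here is a small sketch in pandas. The helper name `merge_country_year`, the key column names `country` and `year`, and the use of an inner join are assumptions made for this example; the toy tables contain made-up values only to show the expected shape.

```python
from functools import reduce

import pandas as pd


def merge_country_year(frames, start=2000, end=2022):
    """Join long-format tables on (country, year) and keep years start..end."""
    merged = reduce(
        # Inner join (an assumption): keep only country-years present in every table.
        lambda left, right: left.merge(right, on=["country", "year"], how="inner"),
        frames,
    )
    in_window = merged["year"].between(start, end)  # inclusive on both ends
    return merged.loc[in_window].sort_values(["country", "year"]).reset_index(drop=True)


# Made-up rows, only to demonstrate the call.
renewables = pd.DataFrame(
    {"country": ["A", "A", "B"], "year": [1999, 2000, 2000], "renewables_pct": [5.0, 6.0, 20.0]}
)
co2 = pd.DataFrame(
    {"country": ["A", "A", "B"], "year": [1999, 2000, 2000], "co2": [100.0, 110.0, 50.0]}
)

panel = merge_country_year([renewables, co2])
print(panel)
```

An inner join drops any country-year missing from one source; an outer join followed by explicit gap handling would be the alternative if coverage matters more than completeness.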

#### SECTION D: DATA CLEANING
The five datasets were downloaded from OWID and the World Bank, and each was processed as follows:
- Country names were standardized to be consistent across sources (e.g. recoding “Bahamas, The” to “Bahamas” and “United States” to “United States of America”), and aggregates (e.g. “World,” “High-income countries,” regional groupings) were filtered out.
- Each table was reshaped into country–year rows so that the `(country, year)` keys match across all sources.
- Missing values in the GDP, population, CO₂, and energy intensity time series were filled using linear interpolation (with forward- and backward-fill).
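
The cleaning steps listed above can be sketched roughly as follows. This assumes a wide input table with one column per year and a `country` column; the function name `clean_wide_table`, the `NAME_FIXES` and `AGGREGATES` lookups (which contain only the examples named in this section), and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Only the recodings and aggregates mentioned in the text; a real run would need more.
NAME_FIXES = {"Bahamas, The": "Bahamas", "United States": "United States of America"}
AGGREGATES = {"World", "High-income countries"}


def clean_wide_table(wide, value_name):
    """Wide (one column per year) -> tidy (country, year, value) with gaps filled."""
    long = wide.melt(id_vars="country", var_name="year", value_name=value_name)
    long["year"] = long["year"].astype(int)
    long["country"] = long["country"].replace(NAME_FIXES)
    long = long[~long["country"].isin(AGGREGATES)]
    long = long.sort_values(["country", "year"]).reset_index(drop=True)
    # Fill gaps within each country's own series: linear interpolation first,
    # then forward- and backward-fill for anything left at the edges.
    long[value_name] = long.groupby("country")[value_name].transform(
        lambda s: s.interpolate(method="linear").ffill().bfill()
    )
    return long


# Made-up values, only to demonstrate the call.
wide_gdp = pd.DataFrame(
    {
        "country": ["Bahamas, The", "World"],
        "2000": [1.0, 10.0],
        "2001": [np.nan, 11.0],
        "2002": [3.0, 12.0],
    }
)
gdp = clean_wide_table(wide_gdp, "gdp")
print(gdp)
```

Grouping by country before interpolating matters: without it, a gap at the start of one country's series could be filled from the end of the previous country's series.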

<figure style="text-align: center;">
	<img title="a title" alt="Alt text" src="data before cleaning co2.jpg">
	<figcaption><strong>Figure:</strong> CO2 emissions data pre-cleaning.</figcaption>
</figure>

<figure style="text-align: center;">
	<img title="a title" alt="Alt text" src="data before cleaning energy intensity.jpg">
	<figcaption><strong>Figure:</strong> Energy intensity data pre-cleaning.</figcaption>
</figure>

<figure style="text-align: center;">
	<img title="a title" alt="Alt text" src="data before cleaning gdp.jpg">
	<figcaption><strong>Figure:</strong> GDP data pre-cleaning.</figcaption>
</figure>

<figure style="text-align: center;">
	<img title="a title" alt="Alt text" src="data before cleaning population.jpg">
	<figcaption><strong>Figure:</strong> Population data pre-cleaning.</figcaption>
</figure>

Finally, all cleaned tables were merged into one and derived metrics ($CO_2$ per capita, $CO_2$ intensity) were calculated for exploratory analysis and modeling.


Transforming the data in this way was essential for ensuring consistency and accuracy in the estimates and for minimizing the loss of data between datasets during the join. Metrics such as $CO_2$ per capita and $CO_2$ intensity provide a way to account for differences in population and economic scale.
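
A sketch of how the two derived metrics might be computed is shown below. It assumes $CO_2$ per capita is emissions divided by population and $CO_2$ intensity is emissions divided by GDP (the "per unit GDP" measure from the EDA questions); the column names and the single example row are hypothetical.

```python
import pandas as pd


def add_derived_metrics(panel):
    """Return a copy of the merged panel with per-capita and per-GDP emissions."""
    out = panel.copy()
    out["co2_per_capita"] = out["co2"] / out["population"]  # tonnes per person
    out["co2_intensity"] = out["co2"] / out["gdp"]          # tonnes per US$ of GDP
    return out


# Made-up row, only to demonstrate the calculation.
example = pd.DataFrame(
    {
        "country": ["A"],
        "year": [2000],
        "co2": [1_000_000.0],
        "population": [500_000.0],
        "gdp": [2_000_000_000.0],
    }
)
derived = add_derived_metrics(example)
print(derived[["co2_per_capita", "co2_intensity"]])
```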

<figure style="text-align: center;">
	<img title="a title" alt="Alt text" src="cleaned data 1.png">
	<figcaption><strong>Figure:</strong> Merged data after cleaning.</figcaption>
</figure>

<figure style="text-align: center;">
	<img title="a title" alt="Alt text" src="cleaned data 2.png">
	<figcaption><strong>Figure:</strong> Merged data after cleaning cont.</figcaption>
</figure>

<figure style="text-align: center;">
	<img title="a title" alt="Alt text" src="cleaned data 3.png">
	<figcaption><strong>Figure:</strong> Merged data after cleaning cont.</figcaption>
</figure>

#### SECTION E: VISUALIZATIONS AND FINDINGS 
##### Exploratory Data Analysis (EDA) Questions
1. Global Trends (2000–2023):
	How have the share of renewables, $CO_2$ emissions, and energy intensity evolved globally since 2000?\
	Since 2000, global $CO_2$ emissions have generally increased, peaking around 2022 despite some short-term dips. In contrast, global energy intensity (energy use per unit of GDP) has steadily declined, which could be due to improvements in energy efficiency. Meanwhile, the share of renewables in primary energy has shown a consistent upward trend, accelerating after 2010. These results suggest a global shift toward cleaner energy and more efficient energy use, although emissions remain high.
	<figure style="text-align: center;">
		<img title="a title" alt="Alt text" src="line Global Average Annual CO₂ Emissions.png">
	</figure>

	<figure style="text-align: center;">
		<img title="a title" alt="Alt text" src="line Global Average Energy Intensity.png">
	</figure>

	<figure style="text-align: center;">
		<img title="a title" alt="Alt text" src="line Global Average Renewables Share (% Primary Energy).png">
	</figure>


2. Emissions Normalized by Scale:
	How does $CO_2$ per unit GDP vary with renewable adoption?\
	While absolute $CO_2$ emissions have risen, the decreasing energy intensity and rising renewables share imply that emissions per unit of GDP may be stabilizing or improving. Increased renewable adoption seems to be correlated with lower $CO_2$ intensity, indicating progress in decarbonizing economic growth.
	<figure style="text-align: center;">
		<img title="a title" alt="Alt text" src="twinx Global CO2 Intensity vs Renewable Share.png">
	</figure>
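
The global series discussed in the two EDA questions could be produced with a per-year aggregation like the one below. This is a sketch that assumes an unweighted mean across countries (suggested by the "Global Average" figure titles, but not confirmed in the text); the function name, column names, and values are hypothetical.

```python
import pandas as pd


def global_yearly_means(panel, columns):
    """Average the chosen columns across countries for each year."""
    return panel.groupby("year")[columns].mean()


# Made-up panel with two countries and two years.
toy_panel = pd.DataFrame(
    {
        "country": ["A", "B", "A", "B"],
        "year": [2000, 2000, 2001, 2001],
        "co2": [10.0, 30.0, 20.0, 40.0],
        "renewables_pct": [10.0, 20.0, 12.0, 22.0],
    }
)
trends = global_yearly_means(toy_panel, ["co2", "renewables_pct"])
print(trends)
```

An unweighted mean gives every country equal influence; weighting by population or GDP, or summing emissions instead, would answer a slightly different question about global totals.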

#####  Statistical Analysis Questions & Hypotheses
- RQ1: Controlling for GDP and population, is a higher renewable share associated with lower total $CO_2$ emissions?\
   $H_0$: Renewable energy share has no effect on $CO_2$.\
   $H_a$: Renewable energy share has a negative effect on $CO_2$.

   Relationship Between Renewables and Emissions\
   The model explains 85.5% of the variation in annual $CO_2$ emissions across country–year observations ($R^2 = 0.855$). Controlling for economic scale (GDP) and population, the estimated coefficient on `renewables_pct` is $-3.485 \times 10^6$ ($p < 0.001$). This means a one-percentage-point increase in the share of primary energy from renewables is associated with about 3.5 million fewer tonnes of $CO_2$ emissions on average, holding GDP and population constant. The coefficients on the other predictors indicate that higher GDP and a larger population each predict greater total emissions ($p < 0.001$).
   
   The high R² indicates that renewable share, GDP, and population are strong predictors of national $CO_2$ output.
   <figure style="text-align: center;">
   		<img title="a title" alt="Alt text" src="RQ relationship emissions renewables.jpg">
	</figure>

- RQ2: Does energy intensity change the strength of the renewables to emissions relationship?\
   $H_0$: There is no interaction between energy intensity and renewable energy share.\
   $H_a$: Energy intensity positively affects the emissions-reducing effect of renewables.
   
   Interaction of Renewables and Energy Intensity\
   In this model, the overall fit improves slightly ($R^2 = 0.865$), and all key terms are statistically significant. The main effect of renewables share is $-5.44 \times 10^6$ ($p = 0.016$), indicating that at the baseline energy-intensity level, a one-percentage-point increase in renewables share is associated with 5.4 million fewer tonnes of $CO_2$. The coefficient on energy intensity is $7.86 \times 10^6$ ($p < 0.001$), which indicates that more energy-intensive economies emit substantially more $CO_2$. The positive interaction term (`renewables_pct` × `energy_intensity` $= 1.11 \times 10^6$, $p = 0.027$) indicates that the emissions-reducing effect of renewables is weaker in economies with higher energy intensity.
   <figure style="text-align: center;">
   		<img title="a title" alt="Alt text" src="RQ relationship intensity renewables.jpg">
	</figure>

- RQ3: Is there a significant negative correlation between renewable share and $CO_2$ per capita?\
   $H_0$: There is no significant linear relationship (Pearson $\rho = 0$)\
   $H_a$: There is a significant linear relationship (Pearson $\rho \neq 0$)
   
   <figure style="text-align: center;">
   		<img title="a title" alt="Alt text" src="corrr matrix Correlation Matrix of Key Variables.png">
	</figure>

	The Pearson correlation between renewable-energy share and $CO_2$ emissions per capita is $r = -0.190$ ($p \approx 1.3 \times 10^{-14}$). This indicates a weak but highly statistically significant negative relationship: countries with a higher proportion of primary energy coming from renewables tend to have slightly lower $CO_2$ emissions per person. The small p-value ($p < 0.001$) means we can confidently reject the null hypothesis of zero correlation in favor of the alternative that the association is non-zero. The magnitude of $r$ ($-0.19$) also implies that renewables share alone explains only about 3.6% of the variability in $CO_2$ per capita ($r^2 \approx 0.036$), which is a small effect.

- RQ4: How well does including renewable share as an exogenous variable in an ARIMA model perform in forecasting annual $CO_2$ emissions?\
	Forecast Accuracy & Exogenous Effect of Renewables\
	Incorporating renewable share as an exogenous regressor in a SARIMAX$(1,1,1)$ model yields a significant negative coefficient of $-4.44 \times 10^6$ ($p < 0.001$), which indicates that a one-percentage-point increase in renewables share corresponds to about 4.4 million fewer tonnes of $CO_2$, all other factors held constant. On the test set, the model achieves an RMSE of approximately 13.6 million tonnes. The absolute errors are sizable, and the model explains only about 17% of out-of-sample variance ($R^2 = 0.167$). This suggests that while renewables share is a meaningful driver, other factors (e.g., shifts in economic structure, policy changes, etc.) also play major roles in emission fluctuations.

	The AR(1) coefficient (0.932) and MA(1) coefficient (−0.914) remain highly significant ($p < 0.001$), which indicates that past emission levels strongly inform current values.  The diagnostic tests (Ljung–Box $Q = 0.32$, $p = 0.57$; Jarque–Bera $p = 0.54$) indicate that there is no serious residual autocorrelation or non‐normality.  Overall, the SARIMAX model shows that renewable adoption has a downward effect on emissions but its moderate performance highlights the need to incorporate additional variables or more flexible modeling approaches for more accurate predictions.

	<figure style="text-align: center;">
   		<img title="a title" alt="Alt text" src="ARIMAX Forecast vs Actual CO₂ Emissions.png">
	</figure>
 

### PART 2 FOCUS: SOCIAL MEDIA AND WEB ANALYTICS