#### Exploring Global Renewable Energy Trends & Drivers  
This project investigates how countries adopt renewable energy and what factors influence that adoption.  
The three datasets used are:

**IRENA** (Renewable energy generation & capacity)  
**OWID** (CO₂, energy use, GDP)  
**WGI** (Governance indicators)  

**Continental/Regional Groupings**
- Africa → All African countries included in IRENA data.
- Americas → North, Central, and South America combined.
- Asia → Asian countries, typically including Middle East.
- Europe → European countries.
- Oceania → Australia, New Zealand, and Pacific island nations.

**What are the key factors that influence how countries adopt renewable energy over time?**
**Sub-Questions (RQs)**
##### **RQ1:**  How does renewable adoption vary by region and technology?
##### **RQ2:**  Are governance indicators correlated with higher renewable adoption?
##### **RQ3:**  Which renewable technologies are driving growth? 
##### **RQ4:**  Does renewable adoption help reduce CO₂ emissions over time?

In [4]:
import pandas as pd
import sys
import os
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from pathlib import Path
from scipy.stats import skew # type: ignore
from IPython.display import display, HTML
import warnings

# Suppress unnecessary warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)

In [2]:
## Project setup: configure paths and imports for accessing modules and data files
import sys
from pathlib import Path

# Set project root
project_root = Path.cwd().parent

# Add project_root and project_scripts to sys.path for imports
project_scripts = project_root / "project_scripts"
for p in [project_root, project_scripts]:
    if str(p) not in sys.path:
        sys.path.insert(0, str(p))

In [54]:
# Project Setup and Imports 
from project_scripts import project_path_setup
from project_scripts.data_handler import DataHandler
import utils

# Project paths (from project_path_setup.py)
project_root = project_path_setup.project_root
project_scripts = project_path_setup.project_scripts

# Data directories (relative paths from project_root)
raw_dir = project_root / "data" / "raw"
clean_dir = project_root / "data" / "cleaned"
final_dir = project_root / "data" / "final"
sqlite_dir = project_root / "data" / "sqlite"

# Ensure directories exist
for d in [clean_dir, final_dir, sqlite_dir]:
    d.mkdir(parents=True, exist_ok=True)

In [8]:
#helper functions to run sql and show/save plots
def run(query: str):
    return utils.run_sql(query)

def show_and_save(fig, folder: str, filename: str):
    utils.save_plot(fig, folder, filename)
    fig.show()
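
The helpers above delegate to `utils.run_sql`, whose implementation is not shown in this notebook. The sketch below shows one way such a helper could work, assuming the project data lives in a SQLite database and results are returned as pandas DataFrames. The name `run_sql_sketch`, its parameters, and the in-memory demo rows are hypothetical and are not taken from the project's `utils` module.

```python
import sqlite3
import pandas as pd

def run_sql_sketch(query: str, db_path: str = ":memory:", conn=None) -> pd.DataFrame:
    """Run a query and return a DataFrame (hypothetical stand-in for utils.run_sql)."""
    own_conn = conn is None
    if own_conn:
        conn = sqlite3.connect(db_path)
    try:
        return pd.read_sql_query(query, conn)
    finally:
        # Only close connections that this function opened itself
        if own_conn:
            conn.close()

# Tiny in-memory demo with made-up rows (not project data)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE irena_energy (iso TEXT, year TEXT, electricity_generation_gwh REAL)"
)
demo_conn.executemany(
    "INSERT INTO irena_energy VALUES (?, ?, ?)",
    [("AAA", "2001-01-01", 10.0), ("AAA", "2001-01-01", 5.0), ("BBB", "2001-01-01", 2.5)],
)
demo_df = run_sql_sketch(
    "SELECT iso, SUM(electricity_generation_gwh) AS total "
    "FROM irena_energy GROUP BY iso ORDER BY iso",
    conn=demo_conn,
)
print(demo_df)
```

Passing an existing connection keeps the demo self-contained; a real helper would more likely open the database file under `sqlite_dir`.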


##### Question: How has renewable energy generation evolved globally over time?

In [60]:
#RQ1 Renewable adoption by region & technology
rq1_sql = """
SELECT 
    c.region,
    i.group_technology, 
    SUM(i.electricity_generation_gwh) AS total_generation
FROM irena_energy i
JOIN country c ON i.iso = c.iso
WHERE c.region IS NOT NULL
    AND c.region NOT IN ('Multilateral', 'Unspecified countries')
    AND i.group_technology NOT IN ('Fossil fuels', 'Nuclear','Other non-renewable energy','Pumped storage')
GROUP BY c.region, i.group_technology
ORDER BY total_generation DESC;
"""
rq1_df = run(rq1_sql)
rq1_df.head()

# Stacked Bar Plot for RQ1


fig = px.bar(
    rq1_df,
    x="region",
    y="total_generation",
    color="group_technology",
    barmode="stack",   # <-- stacked instead of grouped
    title="Renewable Electricity Generation by Region and Technology",
    labels={"total_generation": "Electricity Generation (GWh)"}
)
fig.update_yaxes(type="log")
show_and_save(fig, "rq1", "rq1_renewable_by_region_tech")
#fig.show()

The plot shows renewable electricity generation by region and technology, with the y-axis on a logarithmic scale. Asia and the Americas have the highest renewable electricity generation. Europe comes close and generates about 80% of electricity from renewable energy sources. 
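
Because stacked segments are hard to compare by eye on a log axis, a small table of regional shares can back up the ranking described above. The sketch below assumes a frame with the same columns as `rq1_df`; the numbers are made up for illustration and `regional_share` is a hypothetical helper.

```python
import pandas as pd

# Made-up numbers in the same shape as rq1_df (not project results)
toy = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe", "Europe"],
    "group_technology": ["Hydropower", "Solar energy", "Hydropower", "Wind energy"],
    "total_generation": [600.0, 200.0, 150.0, 50.0],
})

def regional_share(df: pd.DataFrame) -> pd.Series:
    """Percent of total generation contributed by each region, largest first."""
    by_region = df.groupby("region")["total_generation"].sum()
    return (by_region / by_region.sum() * 100).sort_values(ascending=False)

shares = regional_share(toy)
print(shares.round(1))
```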

In [65]:
# RQ1b Renewable adoption by region & technology over the years
rq1b_sql = """
SELECT 
    c.region,
    e.group_technology,
    e.year,
    SUM(e.electricity_generation_gwh) AS total_generation
FROM irena_energy e
JOIN country c ON e.iso = c.iso
WHERE e.group_technology NOT IN ('Fossil fuels', 'Nuclear','Other non-renewable energy','Pumped storage')
    AND c.region NOT IN ('Multilateral', 'Unspecified countries')
    AND e.year BETWEEN '2001-01-01' AND '2023-01-01'
GROUP BY c.region, e.group_technology, e.year
ORDER BY c.region, e.group_technology, e.year;
"""

rq1_df = utils.run_sql(rq1b_sql)

# Interactive stacked area chart, one facet per region
fig_rq1 = px.area(
    rq1_df,
    x="year",
    y="total_generation",
    color="group_technology",
    facet_col="region",
    facet_col_wrap=3,  # Wrap facets into multiple rows
    title="Renewable Energy Generation by Technology and Region over years"
)
#utils.save_plot(fig_rq1, folder="rq1", filename="renewable_by_region_tech")
fig_rq1.update_yaxes(type="log")  # apply the log scale to this figure, not the earlier RQ1 bar chart
fig_rq1.show()


There has been a clear growth in renewable energy generation over the years. The Americas and Europe show steady and consistent increases, while Asia experiences a sharp surge around 2015. Hydropower remains the dominant renewable technology across regions. Oceania and Africa show relatively limited progress, which may reflect the need for greater infrastructure development and industrial capacity in these regions.
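
One way to put a number on "steady" versus "sharp" growth is the compound annual growth rate (CAGR) between two years. The function below is a minimal sketch; `cagr` is a hypothetical helper and the inputs are illustrative, not values from the IRENA data.

```python
def cagr(start_value: float, end_value: float, n_years: int) -> float:
    """Compound annual growth rate as a fraction (0.07 means 7% per year)."""
    if start_value <= 0 or n_years <= 0:
        raise ValueError("start_value and n_years must be positive")
    return (end_value / start_value) ** (1.0 / n_years) - 1.0

# Illustrative numbers only: a series that doubles over 10 years
rate = cagr(100.0, 200.0, 10)
print(f"{rate:.4f}")
```

Computing this per region for the periods before and after 2015 would show whether the surge in Asia is also visible in the growth rates.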

In [67]:
# Pivot WGI indicators: long -> wide
# Need to pivot the WGI table so that each indicator becomes a column. 
# This allows easy merging with renewable energy and CO₂ data, and provides meaningful columns 
# for analysis and visualization. Full names for each indicator are included.
wgi_sql = """
SELECT 
    iso,
    year,
    MAX(CASE WHEN indicator='cc' THEN estimate END) AS control_of_corruption,
    MAX(CASE WHEN indicator='ge' THEN estimate END) AS government_effectiveness,
    MAX(CASE WHEN indicator='pv' THEN estimate END) AS political_stability,
    MAX(CASE WHEN indicator='rl' THEN estimate END) AS rule_of_law,
    MAX(CASE WHEN indicator='rq' THEN estimate END) AS regulatory_quality,
    MAX(CASE WHEN indicator='va' THEN estimate END) AS voice_accountability
FROM wgi_governance
GROUP BY iso, year
ORDER BY iso, year;
"""

wgi_df = utils.run_sql(wgi_sql)
wgi_df.head(3)


Unnamed: 0,iso,year,control_of_corruption,government_effectiveness,political_stability,rule_of_law,regulatory_quality,voice_accountability
0,ABW,2000-01-01,..,..,..,..,..,..
1,ABW,2002-01-01,..,..,..,..,..,..
2,ABW,2003-01-01,..,..,..,..,..,..
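
The `..` entries in the output above are placeholders for missing estimates, so they need to be converted to NaN before any numeric work. The sketch below shows the same long-to-wide pivot in pandas on made-up rows; only two indicators are included and the values are not real WGI scores.

```python
import pandas as pd

# Made-up long-format rows mimicking wgi_governance (not real WGI values)
long_df = pd.DataFrame({
    "iso": ["AAA", "AAA", "BBB", "BBB"],
    "year": ["2002-01-01"] * 4,
    "indicator": ["cc", "ge", "cc", "ge"],
    "estimate": ["0.5", "..", "-1.2", "0.3"],
})

# ".." marks a missing estimate; coerce it to NaN before pivoting
long_df["estimate"] = pd.to_numeric(long_df["estimate"], errors="coerce")

wide_df = (
    long_df.pivot_table(index=["iso", "year"], columns="indicator",
                        values="estimate", aggfunc="max")
    .rename(columns={"cc": "control_of_corruption", "ge": "government_effectiveness"})
    .reset_index()
)
print(wide_df)
```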


Is governance quality associated with higher renewable adoption?

In [71]:
#RQ Governance vs renewable adoption
#This explores whether governance indicators are correlated with renewable adoption.

rq2_sql = """
SELECT 
    e.iso,
    c.country,
    e.year,
    SUM(e.electricity_generation_gwh) AS total_generation,
    g.control_of_corruption,
    g.government_effectiveness,
    g.political_stability,
    g.rule_of_law,
    g.regulatory_quality,
    g.voice_accountability
FROM irena_energy e
JOIN country c ON e.iso = c.iso
JOIN (
    SELECT 
        iso,
        year,
        MAX(CASE WHEN indicator='cc' THEN estimate END) AS control_of_corruption,
        MAX(CASE WHEN indicator='ge' THEN estimate END) AS government_effectiveness,
        MAX(CASE WHEN indicator='pv' THEN estimate END) AS political_stability,
        MAX(CASE WHEN indicator='rl' THEN estimate END) AS rule_of_law,
        MAX(CASE WHEN indicator='rq' THEN estimate END) AS regulatory_quality,
        MAX(CASE WHEN indicator='va' THEN estimate END) AS voice_accountability
    FROM wgi_governance
    GROUP BY iso, year
) g ON c.iso = g.iso AND e.year = g.year
GROUP BY e.iso, c.country, e.year;
"""

rq2_df = utils.run_sql(rq2_sql)

# Convert relevant columns to numeric, coercing errors to NaN
rq2_df['government_effectiveness'] = pd.to_numeric(rq2_df['government_effectiveness'], errors='coerce')
rq2_df['total_generation'] = pd.to_numeric(rq2_df['total_generation'], errors='coerce')

# Drop rows with NaNs in these columns to avoid plotting issues
rq2_df_clean = rq2_df.dropna(subset=['government_effectiveness', 'total_generation'])

# Scatter plot with an OLS trendline: total_generation vs. government effectiveness
# (trendline="ols" requires the statsmodels package)
fig_rq2 = px.scatter(
    rq2_df_clean,
    x="government_effectiveness",
    y="total_generation",
    size="total_generation",
    color="total_generation",
    hover_name="country",
    trendline="ols",
    title="Government Effectiveness vs Renewable Energy Generation"
)
fig_rq2.update_yaxes(type="log")
show_and_save(fig_rq2, "rq2", "rq2_gov_vs_renewable")


Governance indicators range from negative to positive values, where negative scores represent below-average governance quality. The scatter plot shows a positive relationship between governance quality and renewable energy adoption, suggesting governance plays a supporting but not exclusive role.
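
A scatter plot shows the direction of a relationship, and a correlation coefficient gives a rough measure of its strength. Since generation is plotted on a log axis, the sketch below correlates the governance score with log10 of generation. `log_pearson` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
import pandas as pd

def log_pearson(df: pd.DataFrame, x: str, y: str) -> float:
    """Pearson correlation between x and log10(y), ignoring missing or non-positive y."""
    clean = df[[x, y]].dropna()
    clean = clean[clean[y] > 0]
    return float(np.corrcoef(clean[x], np.log10(clean[y]))[0, 1])

# Synthetic data (not project results): generation grows roughly tenfold per unit of score
toy = pd.DataFrame({
    "government_effectiveness": [-1.0, -0.5, 0.0, 0.5, 1.0, None],
    "total_generation": [10.0, 31.6, 100.0, 316.0, 1000.0, 50.0],
})
r = log_pearson(toy, "government_effectiveness", "total_generation")
print(round(r, 3))
```

A coefficient near zero on the real data would support the "supporting but not exclusive" reading; a coefficient near one would suggest a much tighter link.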

Which technologies drive global trends

In [76]:
#RQ3 Regional Tech Share
rq3_sql = """
WITH tech_totals AS (
    SELECT
        c.region,
        i.year,
        i.technology,
        SUM(i.electricity_generation_gwh) AS tech_generation
    FROM irena_energy i
    JOIN country c ON i.iso = c.iso
    GROUP BY c.region, i.year, i.technology
),
region_totals AS (
    SELECT region, year, SUM(tech_generation) AS total_generation
    FROM tech_totals GROUP BY region, year
)
SELECT
    t.region,
    t.year,
    t.technology,
    t.tech_generation,
    (t.tech_generation * 1.0 / r.total_generation) * 100 AS tech_share_pct
FROM tech_totals t
JOIN region_totals r 
    ON t.region = r.region AND t.year = r.year
ORDER BY t.region, t.year, tech_share_pct DESC;
"""
rq3_df = run(rq3_sql)
rq3_df.head()

#Shows which technologies dominate each region.
fig_rq3 = px.treemap(
    rq3_df,
    path=["region", "technology"],
    values="tech_share_pct",
    title="Technology Share of Renewable Generation",
    color="tech_share_pct",
    color_continuous_scale="Blues",
)
#fig_rq3.update_yaxes(type="log")
show_and_save(fig_rq3, "rq3", "rq3_treemap")
#fig.show()


This chart illustrates the technology share, both renewable and non-renewable, within each region. Each technology is represented as a sub-block. The size of a block reflects the proportion of regional electricity generation contributed by that technology, while the color shading further emphasizes the magnitude of its share. Coal and peat account for the majority of electricity generation across all regions.
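
The query returns one row per region, year, and technology, so one way to keep block sizes readable as percentages is to compute the shares for a single year. The sketch below does that in pandas on made-up rows shaped like the `tech_totals` CTE; `share_for_year` is a hypothetical helper.

```python
import pandas as pd

# Made-up rows in the shape of the tech_totals CTE (not project data)
toy = pd.DataFrame({
    "region": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "year": ["2020-01-01"] * 5,
    "technology": ["Coal and peat", "Solar", "Wind", "Wind", "Hydro"],
    "tech_generation": [50.0, 30.0, 20.0, 60.0, 40.0],
})

def share_for_year(df: pd.DataFrame, year: str) -> pd.DataFrame:
    """Technology share (%) within each region for one year."""
    one_year = df[df["year"] == year].copy()
    region_total = one_year.groupby("region")["tech_generation"].transform("sum")
    one_year["tech_share_pct"] = one_year["tech_generation"] / region_total * 100
    return one_year

shares_2020 = share_for_year(toy, "2020-01-01")
print(shares_2020[["region", "technology", "tech_share_pct"]])
```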

Which technology drives adoption trends?

In [81]:
#RQ Which technology drives adoption trends?
#Aggregates generation by technology to identify which sources drive overall adoption.

rq3a_sql = """
SELECT 
    group_technology,
    year,
    SUM(electricity_generation_gwh) AS total_generation
FROM irena_energy
WHERE re_or_non_re = 'Total Renewable'
AND group_technology NOT IN ('Fossil fuels', 'Nuclear','Other non-renewable energy','Pumped storage')
AND year between '2001-01-01' AND '2023-01-01'
GROUP BY group_technology, year
ORDER BY year, group_technology;
"""

rq3a_df = utils.run_sql(rq3a_sql)

# Get full grid of years × technologies
years = rq3a_df['year'].unique()
technologies = rq3a_df['group_technology'].unique()
full_index = pd.MultiIndex.from_product([technologies, years], names=['group_technology', 'year'])

# Reindex and fill missing values with 0
rq3a_df = rq3a_df.set_index(['group_technology', 'year']).reindex(full_index, fill_value=0).reset_index()

# Apply log1p transformation to handle very low/zero values
rq3a_df['log_total_generation'] = np.log1p(rq3a_df['total_generation'])
#Line chart clearly shows growth trends for solar, wind, hydro, biomass. 
fig_rq3a = px.line(
    rq3a_df,
    x="year",
    #y="total_generation",
    y="log_total_generation",
    color="group_technology",
    title="Global Renewable Energy Generation Trends by Technology"
)
show_and_save(fig_rq3a, folder="rq3a", filename="tech_trends")
#fig_rq3a.show()    

From 2001 to 2013, the ranking of technologies in terms of electricity generation remained consistent, with Hydropower producing the most and Other Renewable energies the least. However, from 2015 onwards, Solar, Hydropower, and Wind have increased their generation, surpassing Bioenergy, which had been the leading contributor prior to 2015.

How renewable adoption relates to CO₂ emissions

In [87]:
#RQ Renewable adoption vs CO₂ emissions by country
rq4a_sql = """
SELECT
    c.country,
    i.year,
    SUM(i.electricity_generation_gwh) AS renewable_gwh,
    o.co2,
    o.co2_per_capita,
    o.co2_growth_prct
FROM irena_energy i
JOIN country c ON i.iso = c.iso
JOIN owid_co2 o 
    ON i.iso = o.iso AND i.year = o.year
WHERE i.re_or_non_re = 'Total Renewable'  -- count renewable generation only
    AND i.year BETWEEN '2001-01-01' AND '2023-01-01'
GROUP BY c.country, i.year
ORDER BY i.year, c.country;
"""
rq4a_df = run(rq4a_sql)
rq4a_df.head()

#Shows renewable growth vs CO₂ emissions for a selected country.
country = "China"

sub = rq4a_df[rq4a_df["country"] == country]

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=sub["year"], y=sub["renewable_gwh"], name="Renewable GWh", yaxis="y1"
))
fig.add_trace(go.Scatter(
    x=sub["year"], y=sub["co2"], name="CO₂ Emissions (Mt)", yaxis="y2"
))

fig.update_layout(
    title=f"Renewables vs CO₂ Emissions — {country}",
    yaxis=dict(title="Renewable Generation (GWh)"),
    yaxis2=dict(title="CO₂ (Mt)", overlaying="y", side="right"),
)

show_and_save(fig, "rq4a", f"rq4a_dual_axis_{country}")


We observe that CO₂ emissions remain high despite the increase in renewable electricity generation. This suggests that while renewable energy adoption is growing, it has not yet been sufficient to significantly reduce overall emissions, likely due to continued reliance on fossil fuels in other sectors or regions.
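
Dual-axis charts can exaggerate or hide differences depending on how each axis is scaled. An alternative is to index both series to the first year (first value = 100) so they share one axis. The sketch below uses a hypothetical `index_to_base` helper and illustrative numbers, not values for any country.

```python
def index_to_base(values: list[float], base: float = 100.0) -> list[float]:
    """Rescale a series so that its first value equals `base`."""
    if not values or values[0] == 0:
        raise ValueError("series must start with a non-zero value")
    return [v / values[0] * base for v in values]

# Illustrative numbers only (not project results)
renewable_gwh = [200.0, 300.0, 500.0]
co2_mt = [4000.0, 5000.0, 6000.0]

renewable_idx = index_to_base(renewable_gwh)
co2_idx = index_to_base(co2_mt)
print(renewable_idx)
print(co2_idx)
```

With indexed series, a reader can see directly whether renewables are growing faster than emissions, even when emissions are still rising.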

In [89]:
#RQ Renewable adoption vs CO₂ emissions
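# Compares global renewable electricity generation with global CO₂ emissions, year by year.
# Generation is aggregated per country and year before joining, so each country's CO₂ value
# is counted once per year rather than once per technology row
# (assuming owid_co2 holds one row per country and year).
rq4_sql = """
WITH re AS (
    SELECT iso, year, SUM(electricity_generation_gwh) AS re_gwh
    FROM irena_energy
    WHERE re_or_non_re = 'Total Renewable'
    GROUP BY iso, year
)
SELECT 
    re.year,
    SUM(re.re_gwh) AS total_re,
    SUM(o.co2) AS total_co2
FROM re
JOIN owid_co2 o
    ON re.iso = o.iso AND re.year = o.year
WHERE re.year BETWEEN '2001-01-01' AND '2023-01-01'
GROUP BY re.year
ORDER BY re.year;
"""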

rq4_df = utils.run_sql(rq4_sql)
# Line chart of both totals over time. The two series use different units (GWh vs Mt), so compare trends, not levels.
fig_rq4 = px.line(
    rq4_df,
    x="year",
    y=["total_re", "total_co2"],
    title="Global Renewable Electricity Generation vs CO₂ Emissions",
    labels={"value": "GWh / CO₂ (Mt)", "variable": "Metric"}
)
utils.save_plot(fig_rq4, folder="rq4", filename="renewable_vs_co2")
fig_rq4.show()


The relationship between renewable electricity generation and CO₂ emissions suggests that higher renewable adoption is associated with lower or slower-growing emissions. However, the relationship is not perfectly linear, indicating that other factors such as economic structure and energy demand also influence emissions levels.
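
Two series that both trend upward will correlate strongly in levels whether or not they are linked, so a common check is to correlate year-over-year changes instead. The sketch below contrasts the two on synthetic yearly totals; the column names follow `rq4_df`, and the numbers are not project results.

```python
import pandas as pd

# Synthetic yearly totals (not project results): both series trend upward
toy = pd.DataFrame({
    "total_re": [100.0, 120.0, 130.0, 160.0, 170.0],
    "total_co2": [50.0, 52.0, 55.0, 56.0, 59.0],
})

# Correlation of the raw levels
level_corr = toy["total_re"].corr(toy["total_co2"])

# Correlation of yearly percentage changes
changes = toy.pct_change().dropna()
change_corr = changes.corr().loc["total_re", "total_co2"]

print(round(level_corr, 3), round(change_corr, 3))
```

In this toy data the levels are strongly positively correlated while the yearly changes are negatively correlated, which shows how the two views can disagree.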

In [92]:
# RQ5 — Fastest Growing Regions
rq5_sql = """
WITH region_year AS (
    SELECT
        c.region,
        i.year,
        SUM(i.electricity_generation_gwh) AS total_gwh
    FROM irena_energy i
    JOIN country c ON i.iso = c.iso
    GROUP BY c.region, i.year
)
SELECT
    region,
    year,
    total_gwh,
    LAG(total_gwh) OVER (PARTITION BY region ORDER BY year) AS prev_gwh,
    ROUND(
        (total_gwh - LAG(total_gwh) OVER (PARTITION BY region ORDER BY year)) * 100.0
        / LAG(total_gwh) OVER (PARTITION BY region ORDER BY year), 2
    ) AS yoy_growth_pct
FROM region_year
ORDER BY region, year;
"""
rq5_df = run(rq5_sql)

# Line chart of year-over-year growth by region
fig = px.line(
    rq5_df,
    x="year",
    y="yoy_growth_pct",
    color="region",
    markers=True,
    title="Year-over-Year Growth Rate of Renewables by Region"
)
fig.update_yaxes(type="log")
show_and_save(fig, "rq5", "rq5_yoy_growth")
#fig.show()


This shows steady growth across all regions. Asia appears to be the leading region in adopting and expanding renewable energy. A logarithmic scale was used to better separate overlapping trend lines; note that a log axis cannot display zero or negative growth rates, so any years of decline are not drawn.
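
The year-over-year calculation done with `LAG` in the query can be cross-checked in pandas. The sketch below uses made-up regional totals that include one year of decline, which produces a negative growth rate.

```python
import pandas as pd

# Made-up regional totals (not project data)
toy = pd.DataFrame({
    "region": ["Asia", "Asia", "Asia", "Europe", "Europe", "Europe"],
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "total_gwh": [100.0, 150.0, 120.0, 200.0, 210.0, 231.0],
})

toy = toy.sort_values(["region", "year"])
# pct_change within each region mirrors LAG(...) OVER (PARTITION BY region ORDER BY year)
toy["yoy_growth_pct"] = (toy.groupby("region")["total_gwh"].pct_change() * 100).round(2)
print(toy)
```

The first year of each region has no previous value, so its growth rate is NaN, matching the NULL that `LAG` returns in SQL.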

In [99]:
#RQ Governance + Renewable vs CO₂
#Combines governance effectiveness, renewable capacity, and CO₂ emissions to explore multi-factor influence.

rq6_sql = """
SELECT 
    d.year,
    AVG(d.sdg_7b1_capacity_per_capita) AS avg_capacity,
    AVG(d.co2_per_capita) AS avg_co2,
    AVG(g.government_effectiveness) AS avg_gov_eff
FROM derived_metrics d
JOIN country c ON c.iso = d.iso
JOIN (
    SELECT 
        iso,
        year,
        MAX(CASE WHEN indicator='ge' THEN estimate END) AS government_effectiveness
    FROM wgi_governance
    GROUP BY iso, year
) g ON d.iso = g.iso AND d.year = g.year
GROUP BY d.year
ORDER BY d.year;
"""

rq6_df = utils.run_sql(rq6_sql)

# Shift governance scores into a new, strictly positive column, because bubble sizes cannot be negative.
rq6_df['size_gov_eff'] = rq6_df['avg_gov_eff'] - rq6_df['avg_gov_eff'].min() + 0.1

#Bubble size represents governance strength — bigger bubbles indicate stronger governance.
fig_rq6 = px.scatter(
    rq6_df,
    x="avg_capacity",
    y="avg_co2",
    size="size_gov_eff",
    color="avg_gov_eff",
    hover_name="year",
    trendline="ols",
    title="Governance & Renewable Capacity vs CO₂ per Capita"
)
show_and_save(fig_rq6, folder="rq6", filename="gov_renewable_vs_co2")
#fig_rq6.show()


This plot shows the relationship between government effectiveness, renewable capacity per capita, and CO₂ emissions per capita, with one point per year. The trend line suggests that stronger governance is associated with increased adoption of renewable energy, although governance is not the only factor. However, CO₂ emissions are still rising, likely driven by higher production and energy demand. This indicates that while governance can support renewable growth, broader structural and economic factors also play a crucial role in influencing emissions.
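
To separate the contribution of capacity from that of governance, one option is a small multiple regression of CO₂ per capita on both variables. The sketch below uses ordinary least squares via `numpy.linalg.lstsq`. It is not the method used in this notebook, `fit_linear` is a hypothetical helper, and the data is synthetic with known coefficients.

```python
import numpy as np

def fit_linear(capacity, gov_eff, co2):
    """Ordinary least squares for co2 = b0 + b1*capacity + b2*gov_eff."""
    X = np.column_stack([np.ones(len(capacity)), capacity, gov_eff])
    coef, *_ = np.linalg.lstsq(X, np.asarray(co2, dtype=float), rcond=None)
    return coef  # [intercept, capacity coefficient, governance coefficient]

# Synthetic data generated from known coefficients (not project results)
capacity = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
gov_eff = np.array([-0.5, 0.2, -0.1, 0.4, 0.0])
co2 = 5.0 - 0.02 * capacity + 1.5 * gov_eff

b0, b1, b2 = fit_linear(capacity, gov_eff, co2)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

With only one averaged point per year, as in `rq6_df`, such a fit would rest on very few observations, so country-level rows would be a better input.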

In [101]:
#Top Countries by Renewable Electricity Generation
rq6a_sql = """
SELECT
    c.country,
    c.region,
    SUM(i.electricity_generation_gwh) AS total_renewables
FROM irena_energy i
JOIN country c ON i.iso = c.iso
WHERE i.re_or_non_re = 'Total Renewable'
GROUP BY c.country, c.region
HAVING total_renewables > 0
ORDER BY total_renewables DESC
LIMIT 20
"""
rq6a_df = utils.run_sql(rq6a_sql)

# Sort dataframe globally (important)
df_sorted = rq6a_df.sort_values("total_renewables", ascending=False)

fig6a = px.bar(
    df_sorted,
    x="country",
    y="total_renewables",
    color="region",
    title="Top 20 Countries by Total Renewable Electricity Generation",
    labels={"total_renewables": "Total Renewable Generation (GWh)"},
    category_orders={
        "country": df_sorted["country"].tolist()  # enforce global order
    }
)
show_and_save(fig6a, folder="rq6a", filename="top_20_renewable_countries")

How renewable generation evolves globally over time

In [100]:
## Renewable Electricity Over Time
rq7_sql = """
SELECT 
    i.iso,
    c.country,
    i.year,
    SUM(i.electricity_generation_gwh) AS total_generation
FROM irena_energy i
JOIN country c ON i.iso = c.iso
GROUP BY i.iso, c.country, i.year
ORDER BY i.year, i.iso  -- keep animation frames in chronological order
"""
rq7_df = utils.run_sql(rq7_sql)

fig_rq7 = px.choropleth(
    rq7_df,
    locations="iso",
    color="total_generation",
    hover_name="country",
    animation_frame="year",
    color_continuous_scale="YlGn",
    title="Global Renewable Electricity Generation by Country (Over Time)",
    labels={"total_generation": "Electricity Generation (GWh)"}
)
show_and_save(fig_rq7, folder="rq7", filename="renewable_elec_trend")


This is an encouraging plot, showing a clear increase in renewable electricity generation over time. This upward trend is a positive sign for climate action, indicating that the transition to cleaner energy sources is gaining momentum. It suggests progress toward reducing reliance on fossil fuels, although continued efforts are needed to achieve meaningful climate impact.