## INTRODUCTION

> In the face of growing environmental concerns and the depletion of fossil fuel reserves, the transition to renewable energy sources has become a global imperative. Renewable energy, derived from continuously replenished natural sources such as solar, wind, hydro, and geothermal, offers a sustainable and cleaner alternative to traditional energy systems.
>
> This project focuses on analyzing renewable energy consumption trends across various countries from 2000 to 2022. By examining patterns linked to economic development, technological adoption, and energy sector investments, the study aims to uncover key drivers of renewable energy use. The project also forecasts energy consumption for the years 2023 and 2024 and evaluates progress toward global sustainability goals.

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
import plotly.colors
import wbdata
import datetime
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [2]:
energy_data = pd.read_csv("../Data/AnalysisData.csv")

In [3]:
energy_data = energy_data.drop(columns='Unnamed: 0')

## DATA SOURCES
> To support our analysis, we collected data from four main sources:

| Data                                                   | Data Type             | Source                                                                                                                                               | Time Range   |
|--------------------------------------------------------|------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|
| Energy Consumption Data                                | CSV                    | Kaggle World Energy Consumption Data. [Link](https://www.kaggle.com/datasets/pralabhpoudel/world-energy-consumption/data)                           | 2000 - 2022  |
| GNI Information and Socio-Economic Status for each Country | wbdata Python Module    | World Bank's Data API (wbdata)                                                                                                                        | 2000 - 2022  |
| GDP Information for each Country (Test Data)           | Ninjas API             | Country wise GDP data API. [Link](https://www.api-ninjas.com/api/gdp)                                                                                | 2023 - 2024  |
| Population Information for each Country (Test Data)    | Ninjas API             | Country wise population data API. [Link](https://www.api-ninjas.com/api/population)                                                                  | 2023 - 2024  |


## DATA PROCESSING
> The key data processing steps we took into consideration are:
> 
> `Data Retrieval` - The data was taken from the sources above. The resulting dataframe combines the CSV file, World Bank data, and data pulled via API.
>
> `Data Cleaning` – We cleaned the dataset by standardizing column names for consistency and addressing missing values through interpolation or by dropping incomplete rows where appropriate.
>
> `Transformation` – We standardized time formats and created new columns such as GNI, Socio-Economic Status, and per capita energy consumption.
>
> `Merging Datasets` – Using consistent country codes (ISO) and years, we merged all datasets into a single dataframe aligned by country and time.
>
> `Test Data Integration` – We added 2023–2024 population, GNI and GDP data, ensuring it matched the structure of our main dataset for accurate prediction modeling.
>
> The first five rows of our processed data are displayed below.

In [4]:
pd.set_option('display.max_columns',100)
energy_data.head()

Unnamed: 0,country,year,iso_code,gdp,population,biofuel_consumption,biofuel_electricity,coal_consumption,coal_electricity,electricity_demand,electricity_generation,electricity_share_energy,fossil_electricity,fossil_fuel_consumption,gas_consumption,gas_electricity,hydro_consumption,hydro_electricity,low_carbon_consumption,low_carbon_electricity,nuclear_consumption,nuclear_electricity,oil_consumption,oil_electricity,other_renewable_consumption,other_renewable_electricity,other_renewable_exc_biofuel_electricity,per_capita_electricity,primary_energy_consumption,renewables_consumption,renewables_electricity,solar_consumption,solar_electricity,wind_consumption,wind_electricity,GNI,Socio-Economic Status
0,Afghanistan,2000,AFG,11283790000.0,19542986.0,0.0,0.0,0.0,0.0,0.57,0.47,0.0,0.16,0.0,0.0,0.0,0.0,0.31,0.0,0.31,0.0,0.0,0.0,0.16,0.0,0.0,0.0,24.05,5.914,0.0,0.31,0.0,0.0,0.0,0.0,,
1,Afghanistan,2001,AFG,11021270000.0,19688634.0,0.0,0.0,0.0,0.0,0.69,0.59,0.0,0.09,0.0,0.0,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.09,0.0,0.0,0.0,29.967,4.664,0.0,0.5,0.0,0.0,0.0,0.0,,
2,Afghanistan,2002,AFG,18804870000.0,21000258.0,0.0,0.0,0.0,0.0,0.79,0.69,0.0,0.13,0.0,0.0,0.0,0.0,0.56,0.0,0.56,0.0,0.0,0.0,0.13,0.0,0.0,0.0,32.857,4.428,0.0,0.56,0.0,0.0,0.0,0.0,180.0,Low-income
3,Afghanistan,2003,AFG,21074340000.0,22645136.0,0.0,0.0,0.0,0.0,1.04,0.94,0.0,0.31,0.0,0.0,0.0,0.0,0.63,0.0,0.63,0.0,0.0,0.0,0.31,0.0,0.0,0.0,41.51,5.208,0.0,0.63,0.0,0.0,0.0,0.0,190.0,Low-income
4,Afghanistan,2004,AFG,22332570000.0,23553554.0,0.0,0.0,0.0,0.0,0.99,0.89,0.0,0.33,0.0,0.0,0.0,0.0,0.56,0.0,0.56,0.0,0.0,0.0,0.33,0.0,0.0,0.0,37.786,4.81,0.0,0.56,0.0,0.0,0.0,0.0,210.0,Low-income
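
> The cleaning, transformation, and merging steps described above can be sketched as follows. This is a minimal illustration on toy data, not the exact pipeline used for this project: the input column names, the `energy_per_capita` column, and the choice of a left join are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the real inputs (values are illustrative, not from the datasets)
energy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "ISO Code": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Year": [2000, 2001, 2002, 2000, 2001, 2002],
    "Population": [100.0, 110.0, 120.0, 50.0, 50.0, 50.0],
    "Primary Energy Consumption": [1.0, None, 3.0, 2.0, 2.5, None],
})
gni = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "BBB"],
    "year": [2001, 2002, 2000],
    "GNI": [180.0, 190.0, 900.0],
})

# Data cleaning: standardize column names to snake_case
energy.columns = energy.columns.str.strip().str.lower().str.replace(" ", "_")

# Data cleaning: interpolate gaps inside each country's own time series
energy = energy.sort_values(["iso_code", "year"])
energy["primary_energy_consumption"] = (
    energy.groupby("iso_code")["primary_energy_consumption"]
    .transform(lambda s: s.interpolate(limit_area="inside"))
)

# Rows that are still incomplete (e.g. a trailing gap) are dropped
energy = energy.dropna(subset=["primary_energy_consumption"]).reset_index(drop=True)

# Transformation: derive a per capita column (hypothetical column name)
energy["energy_per_capita"] = energy["primary_energy_consumption"] / energy["population"]

# Merging: align on ISO code and year; a left join keeps rows that have no GNI value
merged = energy.merge(gni, on=["iso_code", "year"], how="left")
print(merged)
```

> Interpolating within each country group keeps one country's values from leaking into another's series, and the left join explains why early rows can carry an empty `GNI`.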


## DATA SUMMARY STATISTICS

> The dataframe below shows the summary statistics for our energy consumption data.

In [5]:
energy_data.describe()

Unnamed: 0,year,gdp,population,biofuel_consumption,biofuel_electricity,coal_consumption,coal_electricity,electricity_demand,electricity_generation,electricity_share_energy,fossil_electricity,fossil_fuel_consumption,gas_consumption,gas_electricity,hydro_consumption,hydro_electricity,low_carbon_consumption,low_carbon_electricity,nuclear_consumption,nuclear_electricity,oil_consumption,oil_electricity,other_renewable_consumption,other_renewable_electricity,other_renewable_exc_biofuel_electricity,per_capita_electricity,primary_energy_consumption,renewables_consumption,renewables_electricity,solar_consumption,solar_electricity,wind_consumption,wind_electricity,GNI
count,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,4906.0,2183.0
mean,2010.779861,335665000000.0,32929750.0,3.076598,1.715073,183.751491,38.836549,100.146439,100.36985,5.374279,65.460005,543.3077,146.87327,21.799378,41.230227,16.245405,94.935919,34.783664,33.241138,12.190916,216.639179,4.824078,6.068109,2.077266,0.338628,3513.539697,662.788365,61.711266,22.592748,3.265791,1.230526,8.102476,3.061769,12308.163078
std,6.492154,1375298000000.0,130788800.0,23.917948,7.825211,1409.748517,276.279668,471.307293,470.801025,7.521168,329.807788,2512.49669,602.690993,91.052231,200.219805,73.711358,431.681586,152.595062,183.403453,66.445481,866.102241,14.911602,27.223656,8.701784,1.773539,5091.6099,2902.209565,304.850686,107.73416,30.902472,11.686146,64.380731,24.586354,17963.929909
min,2000.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,120.0
25%,2005.0,0.0,742355.5,0.0,0.0,0.0,0.0,0.6,0.49,0.0,0.18,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,295.438,5.2755,0.0,0.01,0.0,0.0,0.0,0.0,1325.0
50%,2011.0,19260430000.0,5812102.0,0.0,0.0,0.0,0.0,6.86,6.425,0.0,2.17,0.0,0.0,0.0,0.0,0.48,0.0,0.98,0.0,0.0,0.0,0.43,0.0,0.0,0.0,1825.2315,46.337,0.0,0.96,0.0,0.0,0.0,0.0,4320.0
75%,2016.0,142020300000.0,21492200.0,0.0,0.25,10.774,2.49,41.495,42.6125,12.845,23.84,211.4545,49.6495,9.985,5.23675,6.31,20.05025,11.655,0.0,0.0,108.8825,3.22,0.21925,0.37,0.0,4970.3345,300.666,12.122,8.83,0.003,0.02,0.01575,0.06,14880.0
max,2022.0,18151620000000.0,1425894000.0,433.866,172.13,24559.486,5421.19,8821.43,8839.13,33.141,5710.28,36318.707,8812.123,1689.46,3471.19,1321.71,8137.857,3128.85,2303.296,809.41,11214.078,195.62,563.745,176.629,19.16,56030.785,44275.914,7092.607,2711.22,1115.113,420.35,1988.45,800.52,120990.0


In [6]:
#Data Frame to get the total renewable energy consumed
df_generation = energy_data.groupby('year')[['biofuel_consumption','hydro_consumption', 'solar_consumption', 
                                            'wind_consumption','other_renewable_consumption']].sum()
df_generation = df_generation.reset_index()
df_plot = pd.melt(df_generation, id_vars=['year'], 
                value_vars=['biofuel_consumption','hydro_consumption', 
                            'solar_consumption', 'wind_consumption','other_renewable_consumption'],
                var_name='Energy Type', value_name='Energy Consumption')

## VISUALIZATION

### Renewable Energy Consumption Growth 
> In this section, we analyze the consumption trends of key renewable energy sources (biofuels, solar, wind, and hydro) over time. The visualization shows a marked surge in the adoption of most renewable energy sources beginning around 2015. This upward trend points to a shift in global energy strategies, as countries increasingly prioritized cleaner and more sustainable alternatives in response to climate change concerns, policy changes, and technological advancements.

In [7]:
# Plot to Analyze Renewable Energy Consumption Growth

df_plot = df_plot.sort_values(['year', 'Energy Type'], ignore_index=True)
N_UNIQUE_Energy_type = df_plot['Energy Type'].nunique()
custom_colors = {
    "biofuel_consumption": "#24221e",
    "hydro_consumption": "#858181",
    "solar_consumption": "#3f9fbf",
    "wind_consumption": "#f051c6",
    "other_renewable_consumption": "#f57842",
}
df_indexed = pd.DataFrame()
for index in np.arange(start=0,
                    stop=len(df_plot)+1,
                    step=N_UNIQUE_Energy_type):
    df_slicing = df_plot.iloc[:index].copy()
    df_slicing['frame'] = (index//N_UNIQUE_Energy_type)
    df_indexed = pd.concat([df_indexed, df_slicing])

#Scatter Plot
scatter_plot = px.scatter(
    df_indexed,
    x='year',
    y='Energy Consumption',
    color='Energy Type',
    animation_frame='frame',
    color_discrete_map=custom_colors
)

for frame in scatter_plot.frames:
    for data in frame.data:
        data.update(mode='markers',
                    showlegend=True,
                    opacity=1)
        data['x'] = np.take(data['x'], [-1])
        data['y'] = np.take(data['y'], [-1])
        
# Line Plot
line_plot = px.line(
    df_indexed,
    x='year',
    y='Energy Consumption',
    color='Energy Type',
    animation_frame='frame',
    width=1000,
    height=500,
    color_discrete_map=custom_colors
)

line_plot.update_traces(showlegend=False)
for frame in line_plot.frames:
    for data in frame.data:
        data.update(mode='lines', opacity=0.8, showlegend=False)

# Combining Scatter and Line Plot
combined_plot = go.Figure(
    data=line_plot.data + scatter_plot.data,
    frames=[
        # Distinct loop names avoid shadowing the line_plot and scatter_plot figures
        go.Frame(data=line_frame.data + scatter_frame.data, name=scatter_frame.name)
        for line_frame, scatter_frame in zip(line_plot.frames, scatter_plot.frames)
    ],
    layout=line_plot.layout
)

combined_plot.update_yaxes(
    gridcolor='#7a98cf',
    griddash='dot',
    gridwidth=.5,
    linewidth=2,
    tickwidth=2,
    title_font=dict(size=16)
)

combined_plot.update_xaxes(
    title_font=dict(size=16),
    linewidth=2,
    tickwidth=2
)

combined_plot.update_traces(
    line=dict(width=5),
    marker=dict(size=25))

combined_plot.update_layout(
    font=dict(family="Rockwell", size=12),
    yaxis=dict(tickfont=dict(size=10)),
    xaxis=dict(tickfont=dict(size=10)),
    showlegend=True,
    legend=dict(
        title='Energy group'
        ),
    title={'text':"<b>Renewable Energy Consumed Over Time</b>", 'x': 0.5},
    yaxis_title="Amount of Energy Consumed",
    xaxis_title="Year",
    yaxis_showgrid=True,
    xaxis_range=[df_indexed['year'].min(),
                df_indexed['year'].max()],
    yaxis_range=[df_indexed['Energy Consumption'].min(),
                 df_indexed['Energy Consumption'].max() * 1.1],
    title_x=0.5
)
# adjust speed of animation
combined_plot['layout'].pop("sliders")
combined_plot.layout.updatemenus[0].buttons[0]['args'][1]['frame']['duration'] = 140
combined_plot.layout.updatemenus[0].buttons[0]['args'][1]['transition']['duration'] = 60
# 'redraw' is a frame option, not a transition option
combined_plot.layout.updatemenus[0].buttons[0]['args'][1]['frame']['redraw'] = True
combined_plot.show()

### Renewable Energy Consumption World Map 
> A choropleth map was generated to analyze the temporal progress of renewable energy consumption across various countries. While North America initially held the largest share, the map reveals China's increasing dominance, eventually becoming the global leader in renewable energy consumption.

In [8]:
# Choropleth map for Renewable Energy Consumption

fig = px.choropleth(
    energy_data,
    locations="iso_code",  
    color="renewables_consumption",  
    hover_name="country",  
    animation_frame="year",  
    color_continuous_scale="Greens",  
    projection="natural earth",
    width=800,
    height=500
)  

fig.update_layout(title="<b>Renewable Energy Consumption Growth</b>", geo=dict(showcoastlines=True))

fig.show()
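
> The change in leadership visible on the map can also be checked numerically by picking the largest consumer in each year. The sketch below uses toy values and placeholder country names, and it assumes the dataframe holds one row per country and year.

```python
import pandas as pd

# Toy values for illustration only; the real check would use energy_data
toy = pd.DataFrame({
    "country": ["X", "Y", "X", "Y"],
    "year": [2000, 2000, 2022, 2022],
    "renewables_consumption": [500.0, 200.0, 900.0, 2500.0],
})

# Row label of the largest consumer within each year
top_idx = toy.groupby("year")["renewables_consumption"].idxmax()

# One row per year with the leading country and its consumption
leaders = toy.loc[top_idx, ["year", "country", "renewables_consumption"]].reset_index(drop=True)
print(leaders)
```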

In [9]:
# Data Frame to get the total non renewable energy consumed 
non_renewable = energy_data.groupby('year')[['coal_consumption', 'fossil_fuel_consumption', 'gas_consumption', 'nuclear_consumption', 'oil_consumption']].sum()
non_renewable = non_renewable.reset_index()

non_renewable = non_renewable.melt(id_vars=['year'], value_vars=['coal_consumption', 'fossil_fuel_consumption', 'gas_consumption', 'nuclear_consumption', 'oil_consumption'],
                                   var_name='Energy Type', value_name='Energy Consumption')

non_renewable['Energy Consumption'] = non_renewable['Energy Consumption'].round(4)

### Non-Renewable Energy Consumption Plots Analysis

> Following this, comparable plots were created to examine the trends in non-renewable energy consumption across the years. Although there is a noticeable dip in non-renewable energy consumption in 2020, the overall trend is a relatively steady increase.

In [10]:
non_renewable = non_renewable.sort_values(by=['Energy Type', 'year'])

#Create the "Start" frame (all values = 0)
start_frame = non_renewable.copy()
start_frame['Energy Consumption'] = 0
start_frame['animated_window'] = 'Start'

#Building frames for line animation
frames = [start_frame]

animated_window = non_renewable['year'].unique()

for year in animated_window:
    df_until_year = non_renewable[non_renewable['year'] <= year].copy()
    df_until_year['animated_window'] = str(year)
    frames.append(df_until_year)

# Combine all frames into one DataFrame
animated_df = pd.concat(frames)

#Plot animated line chart
fig = px.line(
    animated_df,
    x="year",
    y="Energy Consumption",
    color="Energy Type",
    hover_name="Energy Type",
    animation_frame="animated_window",
    range_x=[2000, 2023],
    range_y=[1000, 150000],
    color_discrete_map={
        'coal_consumption': '#24221e',
        'fossil_fuel_consumption': '#858181',
        'gas_consumption': '#3f9fbf',
        'nuclear_consumption': '#f051c6',
        'oil_consumption': '#f57842',
    },
    markers=True,
)

fig.update_traces(mode="markers+lines")

fig.update_layout(
    font=dict(family="Rockwell", size=12),
    width=900,
    height=600,
    title={'text':"<b>Total Non Renewable Energy Consumed Over Time</b>", 'x': 0.5},
    yaxis_title="Amount of Energy Consumed",
    xaxis_title="Year",
)

#Slow down animation speed
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 300
fig.show()
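
> A dip such as the one in 2020 can be located by computing the year-over-year percentage change of the yearly totals. The sketch below uses toy totals rather than the actual figures, and the `yoy_change_pct` column name is an assumption made for the example.

```python
import pandas as pd

# Toy yearly totals for illustration only (not taken from the dataset)
totals = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021],
    "Energy Consumption": [100.0, 102.0, 96.9, 101.745],
})

totals = totals.sort_values("year")

# Percentage change relative to the previous year
totals["yoy_change_pct"] = totals["Energy Consumption"].pct_change() * 100

# Years in which consumption fell compared with the year before
dips = totals.loc[totals["yoy_change_pct"] < 0, "year"].tolist()
print(dips)
```

> For the long-format `non_renewable` dataframe, the same idea would apply per energy type, for example with `non_renewable.groupby('Energy Type')['Energy Consumption'].pct_change()` after sorting by year.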

In [11]:
# fossil_fuel_consumption already aggregates coal, gas, and oil, so only nuclear is added to avoid double counting
energy_data['non_renewable_consumption'] = energy_data[['fossil_fuel_consumption', 'nuclear_consumption']].sum(axis=1)

In [12]:
# Choropleth map for Non-Renewable Energy Consumption
fig = px.choropleth(
    energy_data,
    locations="iso_code",  
    color="non_renewable_consumption",  
    hover_name="country",  
    animation_frame="year",  
    color_continuous_scale="Reds",  
    projection="natural earth",
)  

fig.update_layout(title="<b>Non Renewable Energy Consumption Growth</b>", geo=dict(showcoastlines=True))

fig.show()

In [13]:
comparison_data = energy_data.groupby(['country', 'year'])[['renewables_consumption', 'non_renewable_consumption']].sum().reset_index()

top_5_countries = comparison_data.groupby('country')['renewables_consumption'].mean().nlargest(5).index.tolist()

comparison_data = comparison_data[comparison_data['country'].isin(top_5_countries)].reset_index(drop=True)

### Energy Consumption Comparison Over Time

> This analysis examines energy consumption trends over the years for the five countries with the highest average renewable energy consumption. While the United States led in 2000, China surpassed it by 2011 and maintained its dominant position through 2022. Notably, renewable energy consumption has increased, especially in China. However, non-renewable sources still remain the dominant energy resource for all five countries, highlighting both the advancement of clean energy and the persistent dependence on fossil fuels in major economies.


In [14]:
melted_data = comparison_data.melt(
    id_vars=['country', 'year'],
    value_vars=['renewables_consumption', 'non_renewable_consumption'],
    var_name='energy_type',
    value_name='consumption'
)

energy_type_map = {'renewables_consumption':'Renewable',
                    'non_renewable_consumption':'Non Renewable'}

melted_data['energy_type'] = melted_data['energy_type'].map(energy_type_map)

fig1 = px.bar(
    melted_data,
    x='country',
    y='consumption',
    animation_frame='year',
    color='energy_type',
    barmode='group',
    title="Energy Consumption Comparison Over Time",
    color_discrete_map = {
    'Renewable': 'green',
    'Non Renewable': 'gray',
    },
    height=550,  
    width=900  
)

fig1.update_layout(
    xaxis_title="Country",
    yaxis_title="Energy Consumption (TWh)",
    xaxis={'categoryorder': 'total descending'},
)

fig1.update_traces(
    # %{fullData.name} resolves to the trace name (Renewable / Non Renewable)
    hovertemplate='Country: %{x}<br>Consumption: %{y} TWh<br>Type: %{fullData.name}<extra></extra>'
)

fig1.update_yaxes(separatethousands=True)

# Smooth animation
fig1.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 500

fig1.show()
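
> The statement that non-renewable sources remain dominant can be expressed as a renewable share of total primary energy. The sketch below uses toy values and placeholder country names. It assumes `renewables_consumption` and `primary_energy_consumption` are reported in the same unit, and `renewable_share_pct` is a hypothetical column name.

```python
import pandas as pd

# Toy values for illustration only; the real check would use energy_data
toy = pd.DataFrame({
    "country": ["X", "X", "Y", "Y"],
    "year": [2000, 2022, 2000, 2022],
    "renewables_consumption": [50.0, 150.0, 20.0, 400.0],
    "primary_energy_consumption": [1000.0, 1200.0, 500.0, 2000.0],
})

# Treat a zero total as missing so the division cannot produce infinities
total = toy["primary_energy_consumption"].replace(0, float("nan"))

# Renewable share of primary energy, in percent
toy["renewable_share_pct"] = 100 * toy["renewables_consumption"] / total

# Countries as rows, years as columns
share_table = toy.pivot(index="country", columns="year", values="renewable_share_pct")
print(share_table)
```

> A share below 50% in every cell would be consistent with non-renewable sources still dominating, even where renewable consumption has grown quickly.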


In [15]:
predicted_data = pd.read_csv("../Data/PredictedRenewableConsumption.csv")

In [16]:
predicted_data = predicted_data.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'])

## PREDICTIVE ANALYSIS

### Linear Regression Model
> • Our goal is to predict renewable energy consumption for the years 2023 and 2024.
>
> • Key predictors used in the model include Year, ISO Code, GDP, and GNI.
>
> • The model achieved an R-squared value of 0.734, meaning it explains roughly 73% of the variance in renewable energy consumption.

### Random Forest Model
> • The same set of predictors was used to model renewable energy consumption with a Random Forest algorithm.
>
> • The model showed improved performance, with an R² score of 0.801, reflecting a reliable level of predictive accuracy.
>
> Because the Random Forest model performed better, with the higher R² value of 0.801, we used it to predict renewable energy consumption for 2023 and 2024. The first five rows of the predicted data are shown below.

In [17]:
predicted_data.head()

Unnamed: 0,country,year,population,gdp per capita nominal,gdp per capita ppp,iso_code,GNI,predicted_renewables_consumption
0,Afghanistan,2023,41454761,410.933,2173.745,AFG,380.0,0.14076
1,Albania,2024,2791765,9598.191,21376.586,ALB,,0.0
2,Albania,2023,2811655,8299.278,20018.303,ALB,7680.0,0.0
3,Algeria,2024,46814308,5579.128,17718.286,DZA,,7.33403
4,Algeria,2023,46164219,5221.813,16900.136,DZA,4950.0,10.16934
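
> The model comparison described above can be sketched as follows. This is a minimal illustration on synthetic data and does not reproduce the reported scores of 0.734 and 0.801. The one-hot encoding of the ISO code, the 80/20 split, and the hyperparameters are assumptions, since the training code is not part of this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "year": rng.integers(2000, 2023, size=n),
    "iso_code": rng.choice(["AAA", "BBB", "CCC", "DDD"], size=n),
    "gdp": rng.uniform(1, 1000, size=n),  # toy GDP, in billions
    "GNI": rng.uniform(500, 50000, size=n),
})

# Hypothetical target that grows with GDP and time, plus noise
df["renewables_consumption"] = (
    df["gdp"] / 10 + 2.0 * (df["year"] - 2000) + rng.normal(0, 5, size=n)
)

# Assumption: the categorical ISO code is one-hot encoded
X = pd.get_dummies(df[["year", "iso_code", "gdp", "GNI"]], columns=["iso_code"], dtype=float)
y = df["renewables_consumption"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Fit each model and score it on the held-out rows
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R² = {scores[name]:.3f}")
```

> Scoring both models on the same held-out rows keeps the R² values comparable. Which model wins depends on the data: the synthetic target here is linear by construction, so the ranking need not match the one reported above.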


In [18]:
#Total renewable consumption for the year 2023-2024
comparison_data = predicted_data.groupby(['year'])[['predicted_renewables_consumption']].sum().reset_index().rename(columns={'predicted_renewables_consumption':'renewables_consumption'})

In [19]:
energy_data_values = energy_data.groupby(['year'])[['renewables_consumption']].sum().reset_index()

In [20]:
comparison_data = pd.concat([energy_data_values,comparison_data],ignore_index=True)

### Renewable Energy Consumption (2000–2024)

> The time trend plot combines observed values for 2000 to 2022 with our predicted values for 2023 and 2024. It shows a period of rapid renewable energy consumption growth from 2000 to 2022. The flattening of the curve in 2023 and 2024, however, is likely due to missing data for a few countries. Therefore, we cannot confirm that the apparent dip in these later years reflects an actual decline in consumption.

In [21]:
# Plot using Plotly
fig = px.line(comparison_data, x="year", y="renewables_consumption",
              title="Renewable Energy Consumption (2000–2024)",
              labels={"renewables_consumption": "Consumption", "year": "Year"},
              )

# Show plot
fig.update_xaxes(dtick=1)
fig.show()
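
> One way to check whether the flattening is a coverage artifact is to count how many countries contribute to each yearly total and to list the countries that drop out of the forecast years. The sketch below uses toy frames in place of `energy_data` and `predicted_data`. The names `coverage` and `missing` are hypothetical.

```python
import pandas as pd

# Toy frames for illustration only; in the notebook these would be
# energy_data and predicted_data
observed = pd.DataFrame({
    "iso_code": ["AAA", "BBB", "CCC"],
    "year": [2022, 2022, 2022],
    "renewables_consumption": [10.0, 20.0, 30.0],
})
predicted = pd.DataFrame({
    "iso_code": ["AAA", "BBB", "AAA"],
    "year": [2023, 2023, 2024],
    "predicted_renewables_consumption": [11.0, 21.0, 12.0],
})

# Number of distinct countries contributing to each yearly total
coverage = pd.concat([
    observed.groupby("year")["iso_code"].nunique(),
    predicted.groupby("year")["iso_code"].nunique(),
]).rename("n_countries")

# Countries present in the last observed year but absent from each forecast year
base = set(observed.loc[observed["year"] == 2022, "iso_code"])
missing = {
    year: sorted(base - set(group["iso_code"]))
    for year, group in predicted.groupby("year")
}

print(coverage)
print(missing)
```

> If the country count falls between 2022 and the forecast years, yearly sums are not directly comparable, and restricting every year to the countries present throughout would give a fairer trend line.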

## CONCLUSION

> In conclusion, analyzing renewable energy consumption trends across countries over time provides crucial insights into how effectively nations are transitioning toward clean energy sources. This kind of analysis supports governments in setting informed targets and regulations that align with global sustainability goals. Additionally, by leveraging historical trends, we can use predictive modeling to forecast future energy demands, helping governments make smarter and more strategic energy planning decisions for the coming years.