In [1]:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
#!pip install kaggle
!pip install jupyterlab-hide-code
from IPython.display import HTML
df = pd.read_csv('C:/Users/N/DSPT11/EDA project - Energy Analysis - Sepide/Global Energy Data/global-data-on-sustainable-energy.csv')



In [2]:
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

In [74]:
df.head()

Unnamed: 0,Entity,Year,Access to electricity (% of population),Access to clean fuels for cooking,Renewable-electricity-generating-capacity-per-capita,Financial flows to developing countries (US $),Renewable energy share in the total final energy consumption (%),Electricity from fossil fuels (TWh),Electricity from nuclear (TWh),Electricity from renewables (TWh),...,Energy intensity level of primary energy (MJ/$2017 PPP GDP),Value_co2_emissions_kt_by_country,Renewables (% equivalent primary energy),gdp_growth,gdp_per_capita,Density\n(P/Km2),Land Area(Km2),Latitude,Longitude,country_code
0,Afghanistan,2000,1.613591,6.2,9.22,20000.0,44.99,0.16,0.0,0.31,...,1.64,760.0,,,,60,652230.0,33.93911,67.709953,AFG
1,Afghanistan,2001,4.074574,7.2,8.86,130000.0,45.6,0.09,0.0,0.5,...,1.74,730.0,,,,60,652230.0,33.93911,67.709953,AFG
2,Afghanistan,2002,9.409158,8.2,8.47,3950000.0,37.83,0.13,0.0,0.56,...,1.4,1029.999971,,,179.426579,60,652230.0,33.93911,67.709953,AFG
3,Afghanistan,2003,14.738506,9.5,8.09,25970000.0,36.66,0.31,0.0,0.63,...,1.4,1220.000029,,8.832278,190.683814,60,652230.0,33.93911,67.709953,AFG
4,Afghanistan,2004,20.064968,10.9,7.75,0.0,44.24,0.33,0.0,0.56,...,1.2,1029.999971,,1.414118,211.382074,60,652230.0,33.93911,67.709953,AFG


In [73]:
df.shape

(3648, 22)

In [75]:
df.isna().sum();

In [6]:
# dropping the row with NaN values in the Latitude or Longitude columns
df.dropna(subset=['Latitude', 'Longitude'], inplace=True)

In [76]:
df.shape;

Checked for NaN values and dropped one row, belonging to a country that had only a single observation (in 2000) and no latitude/longitude.
Filled the missing values for financial flows to developing countries with zero. For most countries the available values did not follow a specific pattern, so treating a missing value as zero makes more sense than imputing an arbitrary number.
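Both cleaning decisions above can be checked with a couple of quick queries before applying them. This is a minimal sketch: the helper names `missing_coordinate_rows` and `missing_share_by_entity` are hypothetical, and the small `toy` DataFrame stands in for `df` (same column names, made-up values).

```python
import pandas as pd

def missing_coordinate_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Rows with no Latitude or no Longitude (the rows dropped above)."""
    mask = frame['Latitude'].isna() | frame['Longitude'].isna()
    return frame.loc[mask, ['Entity', 'Year']]

def missing_share_by_entity(frame: pd.DataFrame, column: str) -> pd.Series:
    """Fraction of years with a missing value in `column`, per country, highest first."""
    return (
        frame[column].isna()
        .groupby(frame['Entity'])
        .mean()
        .sort_values(ascending=False)
    )

# Hypothetical toy data with the same column names as df
toy = pd.DataFrame({
    'Entity': ['A', 'A', 'B', 'B'],
    'Year': [2000, 2001, 2000, 2001],
    'Latitude': [1.0, 1.0, None, 2.0],
    'Longitude': [1.0, 1.0, None, 2.0],
    'Financial flows to developing countries (US $)': [None, 5.0, None, None],
})
print(missing_coordinate_rows(toy))
print(missing_share_by_entity(toy, 'Financial flows to developing countries (US $)'))
```

On the real data, `missing_share_by_entity(df, 'Financial flows to developing countries (US $)')` shows how many countries are affected by the zero fill and how heavily.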

In [9]:
df['Financial flows to developing countries (US $)'] = df['Financial flows to developing countries (US $)'].fillna(0)

In [10]:
df.isna().sum()

Entity                                                                 0
Year                                                                   0
Access to electricity (% of population)                                9
Access to clean fuels for cooking                                    168
Renewable-electricity-generating-capacity-per-capita                 931
Financial flows to developing countries (US $)                         0
Renewable energy share in the total final energy consumption (%)     194
Electricity from fossil fuels (TWh)                                   21
Electricity from nuclear (TWh)                                       126
Electricity from renewables (TWh)                                     21
Low-carbon electricity (% electricity)                                42
Primary energy consumption per capita (kWh/person)                     0
Energy intensity level of primary energy (MJ/$2017 PPP GDP)          206
Value_co2_emissions_kt_by_country                  

In [77]:
# which rows are empty for access to electricity 
nan_electricity = df[df['Access to electricity (% of population)'].isna()]['Entity']

I checked the dataset and found that the countries with missing values in the "Access to electricity" column had values close to 1% in their nearest non-missing observations (the previous or following year).
Therefore I filled the missing values with 1%.
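The neighbouring-year check described above can be reproduced with a forward and backward fill within each country. A minimal sketch, assuming the same column names as `df`; the function `neighbour_values` and the `toy` data are hypothetical.

```python
import pandas as pd

COL = 'Access to electricity (% of population)'

def neighbour_values(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """For each row where `column` is missing, show the previous and next
    observed values for the same country (rows ordered by Year)."""
    ordered = frame.sort_values(['Entity', 'Year'])
    grouped = ordered.groupby('Entity')[column]
    result = ordered[['Entity', 'Year']].copy()
    result['previous_observed'] = grouped.ffill()  # last value seen before the gap
    result['next_observed'] = grouped.bfill()      # first value seen after the gap
    return result[ordered[column].isna()]

# Hypothetical toy data: country X is missing its 2001 value
toy = pd.DataFrame({
    'Entity': ['X', 'X', 'X', 'Y', 'Y'],
    'Year': [2000, 2001, 2002, 2000, 2001],
    COL: [0.9, None, 1.2, 50.0, 55.0],
})
print(neighbour_values(toy, COL))
```

A per-country interpolation, e.g. `df.groupby('Entity')[COL].transform(lambda s: s.interpolate(limit_direction='both'))`, would be an alternative to a constant fill; whether it is better here depends on how many consecutive years are missing.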

In [12]:
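# Substitute NaN values in 'Access to electricity (% of population)' with 1
# (assign the result back: fillna(inplace=True) on a selected column may not modify df in recent pandas)
df['Access to electricity (% of population)'] = df['Access to electricity (% of population)'].fillna(1)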

In [78]:
nan_clean_electricity_cooking = df[df['Access to clean fuels for cooking'].isna()]['Entity']

After checking the NaN values in the 'Access to clean fuels for cooking' column, I found that these countries all have access to electricity above 97%, and that for other countries with electricity access at that level, access to clean fuels for cooking is about the same, close to 100%. So I filled the missing values with 100%.
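The reasoning above compares two groups of rows, which can be summarised side by side. A minimal sketch, assuming the column names of `df`; the helper functions, the 97% default threshold parameter, and the `toy` data are hypothetical.

```python
import pandas as pd

FUEL = 'Access to clean fuels for cooking'
ELEC = 'Access to electricity (% of population)'

def electricity_access_where_missing(frame, missing_col, reference_col):
    """Summary statistics of `reference_col` over rows where `missing_col` is NaN."""
    return frame.loc[frame[missing_col].isna(), reference_col].describe()

def fuel_access_among_high_electricity(frame, threshold=97.0):
    """Summary of clean-fuel access among rows (with a known value) whose
    electricity access is above `threshold` percent."""
    high = frame[(frame[ELEC] > threshold) & frame[FUEL].notna()]
    return high[FUEL].describe()

# Hypothetical toy data
toy = pd.DataFrame({
    'Entity': ['A', 'B', 'C', 'D'],
    ELEC: [99.5, 98.0, 100.0, 40.0],
    FUEL: [None, None, 99.0, 20.0],
})
print(electricity_access_where_missing(toy, FUEL, ELEC))
print(fuel_access_among_high_electricity(toy))
```

If the first summary's minimum is above the threshold and the second summary's mean is close to 100, the 100% fill is consistent with the rest of the data.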

In [16]:
# Substitute the NaN values for access to clean fuels for cooking with 100, as per observations in other countries
df['Access to clean fuels for cooking'] = df['Access to clean fuels for cooking'].fillna(100)

In [79]:
nan_renewable_share = df[df['Renewable energy share in the total final energy consumption (%)'].isna()]['Entity']

In [80]:
nan_fossil_electricity = df[df['Electricity from fossil fuels (TWh)'].isna()]['Entity']

Then I added a new column to the dataset using the Python pycountry library; it contains the ISO 3166-1 alpha-3 code for each country in the dataset. In the end, however, I did not use it.

In [68]:
import pycountry

# Function to get country code using pycountry
def get_country_code(country_name):
    try:
        return pycountry.countries.lookup(country_name).alpha_3
    except LookupError:
        # If the country name is not found in pycountry's list
        return None

# Map the 'Entity' column to country codes
df['country_code'] = df['Entity'].apply(get_country_code)

# Save the result to a new CSV file
df.to_csv('global-data-on-sustainable-energy_with_codes.csv', index=False)

# Print the DataFrame to see the result
#print(df.head());
#df.shape;
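Because `get_country_code` returns `None` whenever pycountry does not recognise a name, it is worth listing the names that failed before relying on the codes (for example as ISO-3 map locations). A minimal sketch; the function `unmatched_entities` and the `toy` data (including the made-up entity "Atlantis") are hypothetical.

```python
import pandas as pd

def unmatched_entities(frame: pd.DataFrame, code_col: str = 'country_code') -> list:
    """Sorted entity names for which no ISO 3166-1 alpha-3 code was found."""
    return sorted(frame.loc[frame[code_col].isna(), 'Entity'].unique())

# Hypothetical toy data
toy = pd.DataFrame({
    'Entity': ['Afghanistan', 'Afghanistan', 'Atlantis'],
    'country_code': ['AFG', 'AFG', None],
})
print(unmatched_entities(toy))
```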

Then, using this cleaned dataset, I plotted several features on world maps, each with a slider to move through the years:

In [92]:
# Function to plot one feature on a world map, with a slider to pick the year
def plot_world_map(column_name):
    fig = go.Figure()
    years = list(range(2000, 2021))
    # Red-yellow-green colour scale shared by every year's trace
    custom_colorscale = [
        [0.0, "red"],    # Color for the lowest value
        [0.5, "yellow"], # Color for the midpoint
        [1.0, "green"]   # Color for the highest value
    ]
    for year in years:
        # Filter the data for the current year
        filtered_df = df[df['Year'] == year]
        # Create a choropleth trace for the current year
        trace = go.Choropleth(
            locations=filtered_df['Entity'],
            z=filtered_df[column_name],
            locationmode='country names',
            colorscale=custom_colorscale,
            colorbar=dict(title=column_name),
            # Use the same colour range for every year so the maps are comparable
            zmin=df[column_name].min(),
            zmax=df[column_name].max(),
            visible=False  # Set the trace to invisible initially
        )

        # Add the trace to the figure
        fig.add_trace(trace)

    # Set the first trace to visible
    fig.data[0].visible = True

    # Create one slider step per year; each step shows only that year's trace
    steps = []
    for i, year in enumerate(years):
        visible = [False] * len(fig.data)  # Set all traces to invisible...
        visible[i] = True                  # ...except the current year's trace
        step = dict(
            method='update',
            # First dict updates the traces, second dict updates the layout
            args=[{'visible': visible},
                  {'title.text': f'{column_name} Map - {year}'}],
            label=str(year)  # Set the label for each step
        )
        steps.append(step)

    # Create the slider
    sliders = [dict(
        active=0,
        steps=steps,
        currentvalue={"prefix": "Year: ", "font": {"size": 14}},  # Increase font size for slider label
    )]

    # Update the layout: title, map projection, slider, figure size, fonts and margins
    fig.update_layout(
        title_text=f'{column_name} from 2000 to 2020',  # Set the initial title
        title_font_size=24,  # Increase title font size
        title_x=0.5,  # Center the title
        geo=dict(
            showframe=True,
            showcoastlines=False,
            projection_type='natural earth'
        ),
        sliders=sliders,
        height=500,  # Set the height of the figure in pixels
        width=1000,  # Set the width of the figure in pixels
        font=dict(family='Arial', size=12),  # Customize font family and size for the whole figure
        margin=dict(t=80, l=50, r=50, b=50),  # Add margin for better layout spacing
    )

    # Show the figure
    fig.show()

## Access to clean fuels for cooking
This is an important metric because traditional cooking methods in many parts of the world involve burning biomass (wood, animal dung, crop waste) or coal, which are not clean fuels.

Clean fuels include:

- Liquefied petroleum gas (LPG)
- Natural gas
- Biogas
- Alcohol fuels (ethanol, methanol)
- Electricity (clean at the point of use; its overall emissions depend on how it is generated)

## Low-carbon electricity
Low-carbon electricity refers to electricity generated with substantially lower carbon dioxide (CO2) emissions than typical fossil-fuel-based power generation. Examples:

- Hydropower
- Wind
- Solar
- Nuclear
- Geothermal
- Biomass
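Using the generation columns already in the dataset, a low-carbon share can be approximated by counting nuclear and renewables as low-carbon and fossil fuels as the rest. This is a sketch under the assumption that these three columns together cover all generation; it is not necessarily how the dataset's 'Low-carbon electricity (% electricity)' column was compiled, so treat it as a cross-check only. The function name and `toy` values are hypothetical.

```python
import pandas as pd

FOSSIL = 'Electricity from fossil fuels (TWh)'
NUCLEAR = 'Electricity from nuclear (TWh)'
RENEW = 'Electricity from renewables (TWh)'

def low_carbon_share(frame: pd.DataFrame) -> pd.Series:
    """Percentage of generation from nuclear + renewables.
    Assumes fossil + nuclear + renewables = total generation;
    rows with a missing input (or zero total) give NaN/inf."""
    low_carbon = frame[NUCLEAR] + frame[RENEW]
    total = frame[FOSSIL] + frame[NUCLEAR] + frame[RENEW]
    return 100 * low_carbon / total

# Hypothetical toy data (TWh)
toy = pd.DataFrame({
    FOSSIL: [60.0, 25.0],
    NUCLEAR: [10.0, 0.0],
    RENEW: [30.0, 75.0],
})
print(low_carbon_share(toy))
```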
## GDP and GDP per capita
GDP (Gross Domestic Product) is the total market value of all final goods and services produced within a country's borders over a specific period, usually a year.

GDP per capita is GDP divided by the population, i.e. the average economic output per person. It is used as an indicator of the standard of living and is often used to compare countries or regions. A higher GDP per capita typically suggests a higher standard of living, although it does not account for how wealth is distributed within a country.
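The definition and its main caveat can be shown with a little arithmetic. All figures below are hypothetical: the first part divides GDP by population, the second shows two income distributions with the same mean (the per-capita figure) but very different medians, which is the distribution effect that GDP per capita hides.

```python
import statistics

def gdp_per_capita(gdp_usd: float, population: int) -> float:
    """GDP divided by population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return gdp_usd / population

# Hypothetical country: $500 billion GDP, 25 million people
print(gdp_per_capita(500e9, 25_000_000))  # → 20000.0

# Two hypothetical five-person economies with the same average income
equal_incomes = [20_000] * 5
unequal_incomes = [2_000, 3_000, 5_000, 10_000, 80_000]
print(statistics.mean(equal_incomes), statistics.mean(unequal_incomes))      # → 20000 20000
print(statistics.median(equal_incomes), statistics.median(unequal_incomes))  # → 20000 5000
```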

In [93]:
select_col = [
 'Access to electricity (% of population)',
 'Access to clean fuels for cooking',
 #'Renewable energy share in the total final energy consumption (%)',
 #'Electricity from fossil fuels (TWh)',
 #'Electricity from nuclear (TWh)',
 #'Electricity from renewables (TWh)',
 'Low-carbon electricity (% electricity)',
 #'Primary energy consumption per capita (kWh/person)',
 #'Energy intensity level of primary energy (MJ/$2017 PPP GDP)',
 'Value_co2_emissions_kt_by_country',
 #'Renewables (% equivalent primary energy)',
 #'Renewable-electricity-generating-capacity-per-capita',
 'Financial flows to developing countries (US $)',
 #'gdp_growth',
 'gdp_per_capita']

for i in select_col:
    column_name = i
    print(column_name)
    plot_world_map(column_name)

Access to electricity (% of population)


Access to clean fuels for cooking


Low-carbon electricity (% electricity)


Value_co2_emissions_kt_by_country


Financial flows to developing countries (US $)


gdp_per_capita


In [94]:
import plotly.express as px
# Filter the DataFrame for the years 2000-2020
df_filtered = df[(df['Year'] >= 2000) & (df['Year'] <= 2020)]
# Mean GDP per capita by country over the period
gdp_mean_by_country = df_filtered.groupby('Entity')['gdp_per_capita'].mean()
# Sort the series to find the top 20 countries
top_countries = gdp_mean_by_country.sort_values(ascending=False).head(20)
# Create the bar chart using Plotly Express
fig = px.bar(x=top_countries.index, y=top_countries.values,
             title="Top 20 Countries by Mean GDP per Capita (2000-2020)",
             labels={'x': 'Country', 'y': 'Mean GDP per capita', 'color': 'Mean GDP per capita'},
             color=top_countries.values,
             color_continuous_scale=px.colors.sequential.Viridis)

# Show the plot
fig.show()


In [82]:
# Filter the DataFrame for the years 2000-2020, ignoring rows without GDP data
df_filtered = df[(df['Year'] >= 2000) & (df['Year'] <= 2020)].dropna(subset=['gdp_per_capita'])
# Mean GDP per capita by country over the period
gdp_mean_by_country = df_filtered.groupby('Entity')['gdp_per_capita'].mean()
# Sort the series to find the bottom 20 countries
bottom_countries = gdp_mean_by_country.sort_values(ascending=True).head(20)
# Create the bar chart using Plotly Express
fig = px.bar(x=bottom_countries.index, y=bottom_countries.values,
             title="Bottom 20 Countries by Mean GDP per Capita (2000-2020)",
             labels={'x': 'Country', 'y': 'Mean GDP per capita', 'color': 'Mean GDP per capita'},
             color=bottom_countries.values,
             color_continuous_scale=px.colors.sequential.Viridis)

# Show the plot
#fig.show()

In [85]:
import pandas as pd
import plotly.express as px

# Filter the DataFrame for the years 2000-2020
df_filtered = df[(df['Year'] >= 2000) & (df['Year'] <= 2020)]

# Sum the annual CO2 emissions (kt) by country over the period
co2_sum_by_country = df_filtered.groupby('Entity')['Value_co2_emissions_kt_by_country'].sum()

# Sort the series to find the top 20 countries
top_countries = co2_sum_by_country.sort_values(ascending=False).head(20)

# Create the bar chart using Plotly Express
fig = px.bar(x=top_countries.index, y=top_countries.values,
             title="Top 20 Countries by Total CO2 Emissions (2000-2020)",
             labels={'x': 'Country', 'y': 'Total CO2 emissions (kt)', 'color': 'Total CO2 emissions (kt)'},
             color=top_countries.values,
             color_continuous_scale=px.colors.sequential.Viridis)

# Show the plot
fig.show()


In [88]:
# Filter the DataFrame for the years 2000-2020
df_filtered = df[(df['Year'] >= 2000) & (df['Year'] <= 2020)]

# Sum the financial flows received by each country over the period
flows_sum_by_country = df_filtered.groupby('Entity')['Financial flows to developing countries (US $)'].sum()

# Sort the series to find the top 20 countries
top_countries = flows_sum_by_country.sort_values(ascending=False).head(20)

# Create the bar chart using Plotly Express
fig = px.bar(x=top_countries.index, y=top_countries.values,
             title="Top 20 Countries by Total Financial Flows Received (US $) (2000-2020)",
             labels={'x': 'Country', 'y': 'Total financial flows (US $)', 'color': 'Total financial flows (US $)'},
             color=top_countries.values,
             color_continuous_scale=px.colors.sequential.Viridis)

# Show the plot
fig.show()


Primary energy consumption per capita (kWh/person) gives a broad picture of energy use, which is important for understanding the energy needs of a population, the efficiency of energy use, and the potential environmental impact. Because it is normalised by population, it allows energy use to be compared across countries or regions irrespective of population size. Higher values indicate greater energy use, which is typically associated with more industrial activity and higher living standards, but may also imply a greater environmental impact if the energy comes from fossil fuels.
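The per-capita normalisation can be illustrated with two hypothetical countries whose total primary energy use differs twenty-fold but whose use per person is identical. The function name and all figures are made up for illustration; the only fixed fact used is that 1 TWh = 10^9 kWh.

```python
KWH_PER_TWH = 1e9  # 1 TWh = 1,000,000,000 kWh

def energy_per_capita_kwh(primary_energy_twh: float, population: int) -> float:
    """Primary energy use per person, in kWh/person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return primary_energy_twh * KWH_PER_TWH / population

# Hypothetical countries: 1,000 TWh for 20 million people vs 50 TWh for 1 million people
large = energy_per_capita_kwh(1000, 20_000_000)
small = energy_per_capita_kwh(50, 1_000_000)
print(large, small)  # → 50000.0 50000.0
```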

 "Energy intensity level of primary energy (MJ/$2017 PPP GDP)" is an
indicator of the energy efficiency of a country's economy.
It shows how many megajoules of primary energy are used to produce each dollar of GDP (in constant 2017 US dollars, adjusted for purchasing power parity).
A lower number indicates that a country produces more output with less energy, which often signifies a more energy-efficient, advanced economy.
Conversely, a higher number suggests less efficient use of energy relative to economic output, often seen in economies that rely more on energy-intensive industries or have outdated technology and infrastructure.
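Energy intensity is primary energy divided by GDP, and since population cancels out it can also be written as energy per person divided by GDP per person (using 1 kWh = 3.6 MJ). The sketch below uses that identity with hypothetical figures. Note the assumption: the GDP per capita must be PPP-adjusted, and the dataset's own `gdp_per_capita` column may not be, so this is for intuition rather than a way to reproduce the published indicator. The function name is hypothetical.

```python
MJ_PER_KWH = 3.6  # 1 kWh = 3.6 MJ

def energy_intensity_mj_per_dollar(energy_per_capita_kwh: float,
                                   gdp_per_capita_ppp: float) -> float:
    """Megajoules of primary energy per dollar of (PPP) GDP.
    Population cancels: (E / P) / (GDP / P) = E / GDP."""
    return energy_per_capita_kwh * MJ_PER_KWH / gdp_per_capita_ppp

# Two hypothetical economies with the same energy use per person (50,000 kWh)
efficient = energy_intensity_mj_per_dollar(50_000, 45_000)  # higher output per person
intensive = energy_intensity_mj_per_dollar(50_000, 15_000)  # lower output per person
print(efficient, intensive)  # → 4.0 12.0
```

The economy producing three times as much per person with the same energy has one third of the energy intensity, which is the "more with less energy" reading described above.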

GDP per capita gives a rough measure of the average economic prosperity of the people in a country. It is often used as an indicator of the standard of living, with higher GDP per capita suggesting higher living standards. However, it does not account for how income is distributed among the population: a high GDP per capita does not necessarily mean that wealth is spread evenly across society.