# A Visual Exploration of GDP and Energy Supply

In [32]:
# ensure the visualizations render properly across VSCode, Jupyter Book, etc.
# https://plotly.com/python/renderers/

import plotly.io as pio

pio.renderers.default = "notebook_connected+plotly_mimetype"

When we think about what makes a country wealthy, and what powers its daily life and industries, two indicators immediately come to mind: GDP per capita and energy supply per capita. GDP per capita tells us how much economic value a country generates for each person. Energy supply per capita tells us how much energy each person “has” behind their lifestyle, infrastructure, and the economy as a whole.

When thinking about the relationship between these two variables, my initial assumption is that they should be positively related. Richer countries tend to use more energy, and greater energy availability can, in turn, support industrial activity and technological progress—both of which drive economic growth. But this is only a rough, intuitive guess. To know whether this pattern truly holds, I need real data rather than assumptions.

To explore this question, I selected two datasets that contain GDP per capita and energy supply per capita, extracted the relevant variables, and merged them into a single table. With this combined dataset, I then used visualizations to examine whether the actual data supports my initial intuition or reveals a different story altogether.

In [33]:
""" First, import the packages needed to load and process the datasets, 
    as well as the packages required for visualization"""
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# First Dataset: Energy supply per capita

The data on total energy supply per capita comes from the Energy Institute’s dataset: https://www.energyinst.org/statistical-review/resources-and-data-downloads

In [34]:
# Import the dataset that includes total energy supply per capita data
energy_supply_per_capita=pd.read_excel("Narrow format.xlsx")
energy_supply_per_capita


Unnamed: 0,Country,Year,ISO3166_alpha3,ISO3166_numeric,Region,SubRegion,OPEC,EU,OECD,CIS,Var,Value
0,Algeria,1965,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,biogeo_ej,0.000000
1,Algeria,1965,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,co2_combust_mtco2,5.568753
2,Algeria,1965,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,co2_combust_pc,0.007229
3,Algeria,1965,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,co2_combust_per_ej,4.242345
4,Algeria,1965,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,coalcons_ej,0.002931
...,...,...,...,...,...,...,...,...,...,...,...,...
293229,Zimbabwe,2023,ZWE,716.0,Africa,Eastern Africa,0.0,0.0,0.0,0.0,lithium_kt,14.900000
293230,Zimbabwe,2024,ZWE,716.0,Africa,Eastern Africa,0.0,0.0,0.0,0.0,coalprod_ej,0.156511
293231,Zimbabwe,2024,ZWE,716.0,Africa,Eastern Africa,0.0,0.0,0.0,0.0,coalprod_mt,5.797470
293232,Zimbabwe,2024,ZWE,716.0,Africa,Eastern Africa,0.0,0.0,0.0,0.0,lithium_kt,22.000000


In [35]:
"""According to the dataset dictionary, 
   in the column titled “Var,” the entries labeled "tes_gj_pc" refer to total energy supply per capita. 
   Therefore, we use "tes_gj_pc" as the identifier 
   to select all observations corresponding to total energy supply per capita."""
energy_supply_per_capita=energy_supply_per_capita[energy_supply_per_capita["Var"]=="tes_gj_pc"]
energy_supply_per_capita

Unnamed: 0,Country,Year,ISO3166_alpha3,ISO3166_numeric,Region,SubRegion,OPEC,EU,OECD,CIS,Var,Value
28,Algeria,1965,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,7.001426
58,Algeria,1966,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,8.331651
88,Algeria,1967,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,7.673287
118,Algeria,1968,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,7.992000
148,Algeria,1969,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,8.771010
...,...,...,...,...,...,...,...,...,...,...,...,...
292699,Vietnam,2020,VNM,704.0,Asia Pacific,Asia Pacific,0.0,0.0,0.0,0.0,tes_gj_pc,40.047866
292757,Vietnam,2021,VNM,704.0,Asia Pacific,Asia Pacific,0.0,0.0,0.0,0.0,tes_gj_pc,38.151021
292815,Vietnam,2022,VNM,704.0,Asia Pacific,Asia Pacific,0.0,0.0,0.0,0.0,tes_gj_pc,39.123704
292873,Vietnam,2023,VNM,704.0,Asia Pacific,Asia Pacific,0.0,0.0,0.0,0.0,tes_gj_pc,42.597552


In [36]:
# Check the number of countries recorded in the dataset
number_of_countries=energy_supply_per_capita["Country"].nunique()
number_of_countries

105

In [37]:
# Check whether the values we are plotting are stored as floats or ints rather than as objects.
energy_supply_per_capita.dtypes

Country             object
Year                 int64
ISO3166_alpha3      object
ISO3166_numeric    float64
Region              object
SubRegion           object
OPEC               float64
EU                 float64
OECD               float64
CIS                float64
Var                 object
Value              float64
dtype: object

In [38]:
# First, plot all 105 countries in a single graph to get an initial sense of the overall pattern.
fig = px.line(
    energy_supply_per_capita,
    x="Year",
    y="Value",
    color="Country",
    title="Total Energy Supply per Capita",
    labels={"Value": "Energy Supply per Capita (GJ)"}
)
fig.show()

From the chart, we can see that most countries have an energy supply per capita below 400 GJ per person. A few countries stand out with exceptionally high values that clearly deviate from the global majority. In addition, because the figure includes many countries and thus many overlapping lines, the overall trend is difficult to interpret directly.

Therefore, in the sections below, I focus on analyzing the countries with the most extreme values and comparing patterns across different groups of countries.

First, we focus on the most recent data—specifically the values for 2024—to examine the current global landscape of energy supply per capita. 

In [39]:
# Keep only the observations for the year 2024
supply_2024=energy_supply_per_capita[energy_supply_per_capita["Year"]==2024]
supply_2024=supply_2024[["Country","Value"]]
supply_2024

Unnamed: 0,Country,Value
2491,Algeria,58.376220
5883,Argentina,71.635142
9088,Australia,204.773316
11343,Austria,122.691568
13353,Azerbaijan,70.544209
...,...,...
284153,US,265.851326
285341,USSR,0.000000
287409,Uzbekistan,66.816470
290140,Venezuela,73.865693


In [40]:
# Plot a histogram of energy supply per capita in 2024 to see the distribution across countries
fig = px.histogram(supply_2024, x="Value")
fig.show()

Conclusions drawn from the histogram above:

Most countries' energy supply per capita falls within the range of 0 to 225 gigajoules per person. The distribution is strongly right-skewed, with the majority of countries clustered at low to moderate energy supply levels, and only a small number exhibiting extremely high values. The long right tail is likely driven by a handful of energy-rich or energy-intensive economies, including fossil-fuel exporters and high-latitude industrial nations with substantial heating and power demands.

Overall, the variation in energy supply across countries is substantial, with a few clear outliers that display exceptionally high levels of per-capita energy supply.
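The right skew described above can also be checked numerically. The sketch below is only an illustration on a small made-up sample (the values are invented, not taken from the dataset): when the mean sits well above the median and the skewness is positive, the distribution has a long right tail.

```python
import pandas as pd

# Hypothetical per-capita values (GJ), invented for illustration only.
sample = pd.Series([5, 20, 35, 50, 60, 75, 90, 120, 200, 780], name="Value")

summary = {
    "mean": sample.mean(),
    "median": sample.median(),
    "skewness": sample.skew(),  # a positive value indicates a longer right tail
}
print(summary)
```

On the real data, the same three numbers could be computed from `supply_2024["Value"]`.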

Let’s take a closer look at which countries exhibit exceptionally high levels of per-capita energy supply.

In [41]:
supply_2024_sorted = supply_2024.sort_values(by="Value", ascending=False)
supply_2024_sorted.head(5)

Unnamed: 0,Country,Value
70256,Iceland,787.648188
173021,Qatar,768.497595
183652,Singapore,649.191172
276609,United Arab Emirates,496.511503
98351,Kuwait,382.95144


The table above shows the five countries with the highest total energy supply per capita in 2024: Iceland, Qatar, Singapore, the United Arab Emirates, and Kuwait.

The countries with the highest per-capita energy supply all have energy-intensive industries (oil and gas extraction, refining, LNG processing, or aluminum smelting), very small populations that magnify per-capita values, or both.

In addition, many of these economies are located in extreme climates—either very cold (Iceland) or very hot (Qatar, UAE, Kuwait, Singapore)—which further increases energy demand for heating or cooling.

Next, we rank all countries based on their average energy supply per capita over the entire period and identify the top 5% and bottom 5% groups. We then examine the long-term trends of these two sets of countries to compare how their energy supply has evolved over time.

In [42]:
# Calculate the mean value for each country over the whole period
avg_supply = energy_supply_per_capita.groupby("Country")["Value"].mean()
avg_supply

Country
Algeria                  37.162208
Argentina                62.547454
Australia               208.428584
Austria                 127.035174
Azerbaijan               49.448155
                           ...    
United Arab Emirates    446.716748
United Kingdom          146.846557
Uzbekistan               48.832140
Venezuela                89.220198
Vietnam                  12.961076
Name: Value, Length: 105, dtype: float64

In [43]:
# Select the bottom 5% of countries:
# Step 1: find the 5th-percentile value (the threshold)
bottom_cutoff = avg_supply.quantile(0.05)
bottom_cutoff

np.float64(8.257472962384258)

In [44]:
# Step 2: find the countries whose average is below that threshold
bottom_countries = avg_supply[avg_supply < bottom_cutoff]
bottom_countries

Country
Bangladesh              3.879342
Other Africa            5.112582
Sri Lanka               8.058516
Total Eastern Africa    3.487266
Total Middle Africa     3.628547
Total Western Africa    4.782068
Name: Value, dtype: float64
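
Several entries in this list ("Other Africa", "Total Eastern Africa", and so on) look like regional aggregates rather than individual countries. If the ranking should cover countries only, one option is to drop such rows first. The sketch below assumes that aggregates can be recognized by the name prefixes "Other " and "Total "; that rule is my assumption, not something stated in the dataset dictionary, and the numbers are rounded or invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical averages (GJ per person); names mimic the dataset's labels.
avg_supply_demo = pd.Series(
    {
        "Bangladesh": 3.9,
        "Sri Lanka": 8.1,
        "Other Africa": 5.1,
        "Total Eastern Africa": 3.5,
        "Iceland": 397.4,
        "Qatar": 667.5,
    },
    name="Value",
)

# Assumption: aggregate rows are recognizable by these name prefixes.
AGGREGATE_PREFIXES = ("Other ", "Total ")

is_aggregate = np.array(
    [name.startswith(AGGREGATE_PREFIXES) for name in avg_supply_demo.index]
)
countries_only = avg_supply_demo[~is_aggregate]
print(countries_only)
```

The quantile thresholds could then be computed on `countries_only` instead of the full series.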

In [45]:
# Build a DataFrame containing only the bottom 5% of countries so we can plot them
bottom_countries_supply = energy_supply_per_capita[energy_supply_per_capita["Country"].isin(bottom_countries.index)]

# draw a line plot
fig = px.line(
    bottom_countries_supply,
    x="Year",
    y="Value",
    color="Country",
    title="Total energy supply per capita over time by country, 5th percentile",
    labels={
        "Value": "Energy Supply per Capita (GJ)"},
)
fig.show()

In [46]:
# Select the top 5% countries
top_cutoff = avg_supply.quantile(0.95)
top_countries = avg_supply[avg_supply > top_cutoff]
top_countries

Country
Iceland                 397.419589
Kuwait                  351.007027
Luxembourg              347.610424
Qatar                   667.463563
Singapore               355.806392
United Arab Emirates    446.716748
Name: Value, dtype: float64

In [47]:
# Build a DataFrame containing only the top 5% of countries so we can plot them
top_countries_supply = energy_supply_per_capita[energy_supply_per_capita["Country"].isin(top_countries.index)]

# draw a line plot of them
fig = px.line(
    top_countries_supply,
    x="Year",
    y="Value",
    color="Country",
    title="Total energy supply per capita over time by country, 95th percentile",
    labels={
        "Value": "Energy Supply per Capita (GJ)"},
)
fig.show()

By comparing the long-term trends in energy supply per capita for the bottom 5% of countries and the top 5% from 1965 to 2024, we observe that although energy supply has generally increased worldwide, countries with high per-capita energy supply exhibit much more pronounced and volatile fluctuations over time. In contrast, countries with low per-capita energy supply show a steady, gradual upward trend with far fewer dramatic changes.

This may be because high-energy-supply countries respond strongly to global commodity cycles, industrial fluctuations, and resource-sector shocks, whereas low-energy-supply countries move gradually with development and electrification.
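The volatility comparison above is based on reading the line charts. One rough way to put a number on it is the standard deviation of year-over-year percentage changes per country. The sketch below uses two invented series ("A" and "B" are placeholders, not real countries) and is only one of several possible volatility measures.

```python
import pandas as pd

# Hypothetical series, invented for illustration only.
demo = pd.DataFrame(
    {
        "Country": ["A"] * 5 + ["B"] * 5,
        "Year": list(range(2000, 2005)) * 2,
        "Value": [400.0, 520.0, 380.0, 610.0, 450.0, 4.0, 4.2, 4.4, 4.6, 4.8],
    }
)

demo = demo.sort_values(["Country", "Year"])
# Year-over-year relative change within each country.
demo["yoy_change"] = demo.groupby("Country")["Value"].pct_change()
# Standard deviation of those changes as a simple volatility measure.
volatility = demo.groupby("Country")["yoy_change"].std()
print(volatility)
```

Using relative rather than absolute changes keeps the comparison fair between countries whose levels differ by two orders of magnitude.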

# Second Dataset: GDP per capita

The GDP per capita data comes from the World Bank's World Development Indicators: https://datacatalog.worldbank.org/search/dataset/0037712/World-Development-Indicators

In [48]:
# Import the dataset that includes GDP per capita data
GDP_per_capita=pd.read_csv("WDICSV.csv")
GDP_per_capita.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Caribbean small states,CSS,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,,,,,,,...,96.71966,96.729311,96.759553,96.790254,96.818111,96.791763,96.7503,96.691523,,
1,Caribbean small states,CSS,"Access to electricity, urban (% of urban popul...",EG.ELC.ACCS.UR.ZS,,,,,,,...,98.75298,98.931607,99.048892,99.196899,99.370056,99.455285,99.504589,99.653604,99.631428,
2,Caribbean small states,CSS,Account ownership at a financial institution o...,FX.OWN.TOTL.OL.ZS,,,,,,,...,,,,,,,,,,
3,Caribbean small states,CSS,"Adjusted net enrollment rate, primary (% of pr...",SE.PRM.TENR,,,,,,,...,90.92441,90.48512,89.39624,88.92917,,,,,,
4,Caribbean small states,CSS,Adjusted net national income (constant 2015 US$),NY.ADJ.NNTY.KD,,,,,,,...,51421940000.0,,,,,,,,,


In [49]:
# Keep only the rows in the dataset that correspond to GDP per capita.
GDP_per_capita=GDP_per_capita[GDP_per_capita["Indicator Name"]=="GDP per capita (current US$)"]
GDP_per_capita

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
67,Caribbean small states,CSS,GDP per capita (current US$),NY.GDP.PCAP.CD,455.490442,486.571130,507.847334,530.733581,555.235937,584.032637,...,14402.472578,13297.744350,13637.120721,13815.326515,13938.202320,11925.757395,13833.309930,17355.836658,17582.263680,20173.956475
220,Early-demographic dividend,EAR,GDP per capita (current US$),NY.GDP.PCAP.CD,149.209963,158.356881,156.136437,167.994096,187.777251,198.677696,...,3219.688272,3269.984328,3505.106215,3515.766217,3543.323907,3250.749747,3765.510798,4113.000849,4313.648042,4510.252682
373,East Asia & Pacific (IDA & IBRD countries),TEA,GDP per capita (current US$),NY.GDP.PCAP.CD,91.380257,80.554314,72.520993,76.466923,86.194441,98.010308,...,6664.487165,6758.740155,7323.510232,8136.368818,8379.406869,8485.420258,10086.478516,10220.360504,10239.674334,10513.975566
526,Fragile and conflict affected situations,FCS,GDP per capita (current US$),NY.GDP.PCAP.CD,139.231374,142.317146,153.891190,172.060720,152.305627,163.674750,...,1815.532080,1635.507031,1706.443269,1823.312303,1893.882690,1685.097951,1757.356402,1897.949114,1722.021979,1571.800432
679,Heavily indebted poor countries (HIPC),HPC,GDP per capita (current US$),NY.GDP.PCAP.CD,132.984028,133.971715,142.244444,163.185416,148.031531,163.301341,...,917.999802,900.498036,939.745532,977.345901,984.570662,968.074022,1049.999875,1111.391858,1140.196070,1202.769794
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35563,Virgin Islands (U.S.),VIR,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,34007.352941,35324.974887,35365.069304,36663.208755,38633.529892,39787.374165,42571.077737,44320.909186,,
35716,West Bank and Gaza,PSE,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,3272.154324,3527.613824,3620.360487,3562.330943,3656.858271,3233.568638,3678.635657,3799.955270,3455.028529,2592.305912
35869,"Yemen, Rep.",YEM,GDP per capita (current US$),NY.GDP.PCAP.CD,,,,,,,...,1362.173812,975.359417,811.165970,633.887202,,,,,,
36022,Zambia,ZMB,GDP per capita (current US$),NY.GDP.PCAP.CD,221.559849,209.693206,202.281031,203.219451,229.979246,287.425476,...,1295.877887,1239.085279,1483.465773,1463.899979,1258.986198,951.644317,1127.160779,1447.123101,1330.727806,1235.084665


In [50]:
# Reshape the table from a wide format to a long format 
# so that Plotly can more easily read, interpret, and visualize the data
GDP_per_capita = (
    GDP_per_capita.drop(columns=["Indicator Name", "Indicator Code"])
    .melt(
        id_vars=["Country Name", "Country Code"],
        var_name="Year",
        value_name="GDP per Capita",
    )
    .dropna()
)

GDP_per_capita

Unnamed: 0,Country Name,Country Code,Year,GDP per Capita
0,Caribbean small states,CSS,1960,455.490442
1,Early-demographic dividend,EAR,1960,149.209963
2,East Asia & Pacific (IDA & IBRD countries),TEA,1960,91.380257
3,Fragile and conflict affected situations,FCS,1960,139.231374
4,Heavily indebted poor countries (HIPC),HPC,1960,132.984028
...,...,...,...,...
15397,Vanuatu,VUT,2024,3542.810716
15399,Viet Nam,VNM,2024,4717.290287
15401,West Bank and Gaza,PSE,2024,2592.305912
15403,Zambia,ZMB,2024,1235.084665
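
To make the wide-to-long step easier to follow, here is the same `melt` pattern on a tiny made-up table (the countries "A" and "B" and their values are invented). It also shows why `Year` ends up as text: the year values come from column labels, which are strings.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year.
wide = pd.DataFrame(
    {
        "Country Name": ["A", "B"],
        "Country Code": ["AAA", "BBB"],
        "1960": [100.0, None],
        "1961": [110.0, 50.0],
    }
)

long_df = wide.melt(
    id_vars=["Country Name", "Country Code"],
    var_name="Year",
    value_name="GDP per Capita",
).dropna(subset=["GDP per Capita"])

# Column labels were strings, so "Year" is converted before plotting.
long_df["Year"] = long_df["Year"].astype(int)
print(long_df)
```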


In [51]:
# Check whether “Year” and “GDP per capita” are in the appropriate formats for plotting.
GDP_per_capita.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12832 entries, 0 to 15404
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    12832 non-null  object 
 1   Country Code    12832 non-null  object 
 2   Year            12832 non-null  object 
 3   GDP per Capita  12832 non-null  float64
dtypes: float64(1), object(3)
memory usage: 501.2+ KB


In [52]:
""" From the results above, we can see that the Year variable is stored as an object, 
    which may cause issues when plotting. 
    Therefore, we need to convert this variable to an integer format."""
GDP_per_capita["Year"] = GDP_per_capita["Year"].astype(int)

GDP per capita is a widely used and well-studied indicator, and it is commonly visualized as a time-series to examine how a country’s level of economic development changes over time. In addition to this conventional approach, I wanted to explore the data from a different perspective.

I divided the timeline into roughly 10-year intervals and plotted the global distribution of GDP per capita for the years 1960, 1970, 1980, 1990, 2000, 2010, 2020, and 2024. By comparing these distribution plots across different decades, we can observe how the global income distribution — and the gap between high-income and low-income countries — has evolved over time.

In [53]:
# Select several years and make graph of GDP per capita distribution in these years
selected_years = [1960, 1970, 1980, 1990, 2000, 2010, 2020, 2024]
GDP_per_capita_subset = GDP_per_capita[GDP_per_capita["Year"].isin(selected_years)]
fig = px.histogram(
    GDP_per_capita_subset,
    x="GDP per Capita",
    facet_col="Year",                
    facet_col_wrap=3,                
    nbins=50,                       
    title="Global GDP per Capita Distribution (Selected Years)",
    labels={"GDP per Capita": "GDP per Capita (USD)"}  # key must match the column name exactly
)

fig.update_layout(
    height=500,
    width=1000,
)
fig.show()


As the histograms show, global GDP per capita has increased substantially from 1960 to 2024, but the growth has been highly uneven across countries. The right tail of the distribution becomes longer over time, indicating that high-income countries have pulled further ahead, while many low-income countries remain clustered at the lower end of the distribution.

The rise of a sizable middle-income group is evident. Meanwhile, a few countries emerge as extreme high-income outliers in recent decades. Overall, the distributional patterns suggest that although the world has become wealthier, income inequality between countries has widened.
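A longer right tail shows that the absolute gap has grown. To check whether the relative gap has grown as well, one simple option is the ratio of the 90th to the 10th percentile of GDP per capita in each year. The sketch below uses invented numbers and is only meant to show the calculation, not to report results from the World Bank data.

```python
import pandas as pd

# Hypothetical GDP per capita values for two years, invented for illustration.
demo = pd.DataFrame(
    {
        "Year": [1960] * 5 + [2020] * 5,
        "GDP per Capita": [100, 200, 300, 500, 1000, 500, 2000, 8000, 30000, 60000],
    }
)

grouped = demo.groupby("Year")["GDP per Capita"]
spread = pd.DataFrame(
    {"p10": grouped.quantile(0.10), "p90": grouped.quantile(0.90)}
)
# A larger ratio means a wider relative gap between rich and poor countries.
spread["p90_p10_ratio"] = spread["p90"] / spread["p10"]
print(spread)
```

On the real data, the regional and income-group aggregates in `Country Name` would probably need to be removed first so that only individual countries enter the percentiles.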

# Merge two datasets together and analyze

In [54]:
# First, check whether the merge keys in the two datasets follow the same naming and formatting conventions. 
# If they do not match, the data needs to be cleaned before merging.
energy_supply_per_capita["Country"].unique()

array(['Algeria', 'Argentina', 'Australia', 'Austria', 'Azerbaijan',
       'Bangladesh', 'Belarus', 'Belgium', 'Brazil', 'Bulgaria', 'Canada',
       'Chile', 'China', 'China Hong Kong SAR', 'Colombia', 'Croatia',
       'Cyprus', 'Czech Republic', 'Denmark', 'Ecuador', 'Egypt',
       'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary',
       'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland',
       'Israel', 'Italy', 'Japan', 'Kazakhstan', 'Kuwait', 'Latvia',
       'Lithuania', 'Luxembourg', 'Malaysia', 'Mexico', 'Morocco',
       'Netherlands', 'New Zealand', 'North Macedonia', 'Norway', 'Oman',
       'Other Africa', 'Other Asia Pacific', 'Other Caribbean',
       'Other CIS', 'Other Europe', 'Other Middle East',
       'Other Northern Africa', 'Other S. & Cent. America',
       'Other South America', 'Other Southern Africa', 'Pakistan', 'Peru',
       'Philippines', 'Poland', 'Portugal', 'Qatar', 'Romania',
       'Russian Federation', 'Saudi Arabia', 'Singa

In [55]:
GDP_per_capita["Country Name"].unique()

array(['Caribbean small states', 'Early-demographic dividend',
       'East Asia & Pacific (IDA & IBRD countries)',
       'Fragile and conflict affected situations',
       'Heavily indebted poor countries (HIPC)', 'High income',
       'IBRD only', 'Late-demographic dividend',
       'Latin America & Caribbean',
       'Latin America & Caribbean (excluding high income)',
       'Latin America & the Caribbean (IDA & IBRD countries)',
       'Least developed countries: UN classification',
       'Low & middle income', 'Lower middle income', 'North America',
       'Post-demographic dividend', 'Pre-demographic dividend',
       'Upper middle income', 'Algeria', 'Argentina', 'Australia',
       'Austria', 'Bahamas, The', 'Bangladesh', 'Barbados', 'Belgium',
       'Belize', 'Benin', 'Bermuda', 'Bolivia', 'Botswana', 'Brazil',
       'Burkina Faso', 'Burundi', 'Cameroon', 'Canada',
       'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia',
       'Congo, Dem. Rep.', 'Congo, 

We found that a few country names in the energy-supply dataset are written as abbreviations. These abbreviations need to be replaced with their full country names before merging the data.
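Comparing two long arrays by eye is error-prone, so a set difference can list the names that have no exact match. The outputs above already hint at more than one mismatch: the energy data uses "Vietnam" while the World Bank data uses "Viet Nam". The sketch below works on short made-up lists, and the `rename_map` dictionary is a hypothetical example rather than a complete mapping.

```python
import pandas as pd

# Short illustrative lists; the real ones come from the two .unique() calls.
energy_names = pd.Series(["Algeria", "US", "Vietnam", "Other Africa"])
gdp_names = pd.Series(["Algeria", "United States", "Viet Nam"])

# Names in the energy data with no exact match in the GDP data.
unmatched = sorted(set(energy_names) - set(gdp_names))
print(unmatched)

# Hypothetical mapping used to harmonize the names before merging.
rename_map = {"US": "United States", "Vietnam": "Viet Nam"}
harmonized = energy_names.replace(rename_map)
still_unmatched = sorted(set(harmonized) - set(gdp_names))
print(still_unmatched)
```

Whatever remains after harmonizing (here, the aggregate "Other Africa") has no counterpart in the other dataset and would be dropped by an inner merge.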

In [56]:
"""We need to replace all the "US" in energy dataset with "United States",
   so that the country names match in both datasets before merging.
   Using .assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
   when the DataFrame was created by filtering another one."""

energy_supply_per_capita = energy_supply_per_capita.assign(
    Country=energy_supply_per_capita["Country"].replace("US", "United States")
)



In [57]:
# Merge Energy supply dataset and GDP dataset using the key of country name and year
merged_data = pd.merge(
    left=energy_supply_per_capita, 
    right=GDP_per_capita, 
    left_on=["Country","Year"], 
    right_on=["Country Name","Year"])
merged_data

Unnamed: 0,Country,Year,ISO3166_alpha3,ISO3166_numeric,Region,SubRegion,OPEC,EU,OECD,CIS,Var,Value,Country Name,Country Code,GDP per Capita
0,Algeria,1965,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,7.001426,Algeria,DZA,253.622060
1,Algeria,1966,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,8.331651,Algeria,DZA,241.448970
2,Algeria,1967,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,7.673287,Algeria,DZA,261.792442
3,Algeria,1968,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,7.992000,Algeria,DZA,292.436036
4,Algeria,1969,DZA,12.0,Africa,Northern Africa,1.0,0.0,0.0,0.0,tes_gj_pc,8.771010,Algeria,DZA,315.914656
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3721,Uzbekistan,2020,UZB,860.0,CIS,CIS,0.0,0.0,0.0,1.0,tes_gj_pc,56.988444,Uzbekistan,UZB,1978.280519
3722,Uzbekistan,2021,UZB,860.0,CIS,CIS,0.0,0.0,0.0,1.0,tes_gj_pc,59.685478,Uzbekistan,UZB,2258.519641
3723,Uzbekistan,2022,UZB,860.0,CIS,CIS,0.0,0.0,0.0,1.0,tes_gj_pc,60.099621,Uzbekistan,UZB,2578.666894
3724,Uzbekistan,2023,UZB,860.0,CIS,CIS,0.0,0.0,0.0,1.0,tes_gj_pc,59.396427,Uzbekistan,UZB,2878.968793
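
Because both datasets carry a three-letter country code (`ISO3166_alpha3` in the energy data, `Country Code` in the World Bank data), merging on the code is an alternative that does not depend on spelling. The sketch below uses two tiny made-up tables. I am assuming that the codes agree between the sources (for example, that the US is coded "USA" in both), which would need checking. `indicator=True` and `validate` are standard `pd.merge` options that help audit the result.

```python
import pandas as pd

# Hypothetical rows; values are invented, column names follow the notebook.
energy_demo = pd.DataFrame(
    {
        "Country": ["Vietnam", "US"],
        "ISO3166_alpha3": ["VNM", "USA"],  # "USA" is an assumption
        "Year": [2023, 2023],
        "Value": [42.6, 270.0],
    }
)
gdp_demo = pd.DataFrame(
    {
        "Country Name": ["Viet Nam", "United States"],
        "Country Code": ["VNM", "USA"],
        "Year": [2023, 2023],
        "GDP per Capita": [4300.0, 82000.0],
    }
)

merged_demo = pd.merge(
    energy_demo,
    gdp_demo,
    left_on=["ISO3166_alpha3", "Year"],
    right_on=["Country Code", "Year"],
    how="left",             # keep every energy row, matched or not
    indicator=True,         # adds a "_merge" column: "both" or "left_only"
    validate="one_to_one",  # raise if a key is duplicated on either side
)
print(merged_demo["_merge"].value_counts())
```

Rows marked `left_only` are energy observations with no GDP match, which makes silent losses in the merge visible.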


In [58]:
# Create a scatter plot using all countries and all years 
# Examine the overall relationship between the two variables
fig = px.scatter(
    merged_data,
    x="GDP per Capita",
    y="Value",
    color="Region",              
    title="GDP per Capita vs Energy Supply per Capita (Colored by Region)",
    labels={
        "GDP per Capita": "GDP per Capita (USD)",
        "Value": "Energy Supply per Capita (GJ)",
        "Region": "Region"
    }
)

fig.show()


Conclusion from the scatter plot above:

Overall, energy supply per capita and GDP per capita show a broadly positive relationship, but the points in the scatter plot are widely dispersed, and many clear exceptions to this pattern exist. For example, in the upper part of the chart, there are several countries whose GDP per capita is not particularly high, yet their energy supply per capita is extremely elevated. Based on the color coding, these points correspond mainly to Middle Eastern countries, where abundant fossil-fuel resources lead to unusually high levels of energy availability and consumption.

In contrast, some countries exhibit high GDP per capita while maintaining relatively low levels of energy supply per capita. These countries are primarily in Europe. Thanks to decades of industrial development, technological advancement, and strong energy-efficiency policies, many European economies generate substantial economic output with comparatively modest energy use.

Most other regions generally follow the expected positive correlation: wealthier countries tend to have greater per-capita energy supply, while lower-income regions cluster at lower energy levels. This mix of patterns highlights not only the global link between income and energy availability but also the structural and regional differences that shape how countries convert energy into economic growth.
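The "broadly positive relationship" can be summarized with a correlation coefficient. Since both variables span several orders of magnitude, I would compute it on log-transformed values. The sketch below uses invented numbers, so the resulting coefficient says nothing about the real data. Rows with zero values are dropped first because the logarithm of zero is undefined.

```python
import numpy as np
import pandas as pd

# Hypothetical country-year observations, invented for illustration.
demo = pd.DataFrame(
    {
        "GDP per Capita": [500, 1500, 4000, 12000, 40000, 80000],
        "Value": [5, 15, 40, 90, 150, 300],
    }
)

# Keep strictly positive rows only, since log10(0) is undefined.
demo = demo[(demo > 0).all(axis=1)]

log_gdp = np.log10(demo["GDP per Capita"])
log_energy = np.log10(demo["Value"])
r = log_gdp.corr(log_energy)  # Pearson correlation by default
print(round(r, 3))
```

On the merged data, the same calculation could be repeated per region with `groupby("Region")` to see where the relationship is tighter or looser.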

Next, we select one representative country from each of three regions and examine how GDP per capita and energy supply per capita have evolved over time.
The selected countries are:

Middle East: Qatar

North America: United States
 
Asia Pacific: China 

In [59]:
"""To get a clearer sense of how energy supply and GDP move together over time, 
   I decided to plot both variables on the same chart using a dual-axis line plot. 
   This way, each country gets two lines—one for energy and one for GDP,
   so we can easily see whether they rise together, diverge, or behave in completely different ways.

   Since I needed to do this for several countries, 
   writing the same chunk of code over and over would have been painful.
   So instead, I wrapped everything into a function. 
   Now I can generate the same style of plot for any country with a single line of code. 
   Much cleaner, much smarter, and definitely much kinder to future-me."""

def plot_relationship(country_name):
    # First, filter the big merged dataset so we keep only the rows for this one country.
    # This gives us a clean, country-specific slice to plot.
    data_country = merged_data[merged_data["Country"] == country_name]

    # Create a figure that supports 2 y-axes.
    # This allows us to plot energy supply on one axis and GDP on the other without scaling issues.
    fig = make_subplots(specs=[[{"secondary_y": True}]])

    # Add the energy supply line (this will go on the left y-axis).
    fig.add_trace(
        go.Scatter(
            x=data_country["Year"],
            y=data_country["Value"],
            name="Energy supply per person over time",   # Legend label
        ),
        secondary_y=False,  # Tell Plotly this line belongs to the main y-axis
    )

    # Add the GDP line (this will go on the right y-axis).
    # Plotting it separately helps us compare trends without forcing them onto the same scale.
    fig.add_trace(
        go.Scatter(
            x=data_country["Year"],
            y=data_country["GDP per Capita"],
            name="GDP per Capita over time",
        ),
        secondary_y=True,  # This line belongs to the second y-axis
    )

    # Give the entire figure a clear title so the reader knows what country we're looking at.
    fig.update_layout(title_text=f"Energy supply per person vs. GDP per Capita, {country_name}")

    # Label the x-axis (horizontal axis).
    fig.update_xaxes(title_text="Year")

    # Label both y-axes so readers understand which line belongs to which variable.
    fig.update_yaxes(title_text="Energy Supply per Capita (GJ)", secondary_y=False)
    fig.update_yaxes(title_text="GDP per Capita (USD)", secondary_y=True)

    # Finally, display the figure.
    fig.show()


In [60]:
# Plot for Qatar
plot_relationship("Qatar")

Qatar’s per-capita energy supply and GDP both show large swings over time, even though the overall long-term trend for both variables is upward. This volatility makes sense once you remember that Qatar is a highly resource-dependent economy. Its economic performance is tightly linked to the energy sector—when energy production or exports fluctuate, GDP tends to move right along with it.

In [61]:
# Plot for US
plot_relationship("United States")

The United States shows a completely different pattern from Qatar. While GDP per capita rises steadily over time, energy supply per capita actually fluctuates and gradually declines. This suggests that the U.S. has managed to grow its economy without increasing per-person energy consumption—an indication of improved efficiency, technological advancement, and a shift toward a more service-oriented economy.
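A common way to express this kind of decoupling is energy intensity: energy supplied per unit of GDP. The sketch below uses invented numbers that only loosely imitate the pattern described above. Note that the GDP series in this notebook is in current US$, so part of any measured decline would reflect inflation rather than efficiency; a constant-price GDP series would be a better basis for a real analysis.

```python
import pandas as pd

# Hypothetical values, invented for illustration only.
demo = pd.DataFrame(
    {
        "Year": [1990, 2000, 2010, 2020],
        "Value": [320.0, 330.0, 300.0, 270.0],                    # GJ per person
        "GDP per Capita": [24000.0, 36000.0, 48000.0, 64000.0],  # current US$
    }
)

# 1 GJ = 1000 MJ, so this is MJ of energy supplied per US$ of GDP.
demo["energy_intensity_mj_per_usd"] = demo["Value"] * 1000 / demo["GDP per Capita"]
print(demo[["Year", "energy_intensity_mj_per_usd"]])
```

Because both inputs are per-capita figures, the population terms cancel and the ratio equals total energy supply divided by total GDP.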

In [62]:
# Plot for China
plot_relationship("China")

China’s chart looks very different from the other two. Both GDP per capita and energy supply per capita explode upward after the early 2000s. GDP growth accelerates rapidly, and energy consumption follows almost the same trajectory.

# Conclusion
There is no single universal relationship between energy supply and GDP.
Instead, the relationship depends heavily on a country’s economic structure, natural resources, and stage of development.

# Key Takeaways:
1. Most countries have low to moderate energy supply per person, while a few resource-rich or climate-extreme countries form a long right tail with exceptionally high values.

2. High-energy-supply countries are far more volatile, driven by commodity cycles and resource-sector fluctuations, whereas low-energy-supply countries show slow, steady growth.

3. Global GDP per capita has risen but unevenly. The distribution stretches over time, with high-income countries pulling ahead and inequality widening.

4. Energy supply and GDP generally move together, but the relationship varies by development model:

    - Qatar links tightly to energy exports,

    - The U.S. grows GDP with stable or declining energy use,

    - China grows both rapidly during industrialization.

5. No single pattern fits all countries. Energy use reflects each nation’s resources, climate, economic structure, and stage of development.