# Project 2

## Do Countries that Spend More on Education Use More Electricity per Capita?

By Abi Insani



### Introduction

**Dataset:**

I use two World Bank indicators:
- **Electricity Consumption per capita** (kWh per person)
- **Government Expenditure on Education** (% of GDP)

I am focusing on **G20 Countries**, which gives a mix of advanced and emerging economies.


**Research Question**

*Among G20 countries, is higher government spending on education (as a share of GDP) associated with higher electricity consumption per capita?*



**Hypothesis**

G20 countries with higher education spending (% of GDP) tend to have **higher electricity consumption per capita**, because more investment in education is associated with higher income, more industrial activity, and greater access to electricity.

*Note: The project does **not** try to establish causality; it only checks whether a relationship is visible in the data.*


### I. Data Sources and Preparation

For this project, I use two World Bank indicators to explore whether higher education spending (% of GDP) is associated with higher electricity consumption per capita across G20 countries. The datasets come in the standard World Bank Excel format, which requires light preprocessing (e.g., skipping metadata rows, selecting country-year values). I focus only on G20 members and select a single reference year (such as 2019) based on the most complete overlapping data across both indicators. All cleaning and merging steps are done in Python with pandas.
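One way to pick the reference year is to count, for each year, how many countries have a value in both indicators, and then take the year with the highest count. The sketch below illustrates the idea on two tiny made-up wide tables; the numbers and the names `elec`, `edu`, `coverage` and `reference_year` are illustrative assumptions, not taken from the World Bank files.

```python
import numpy as np
import pandas as pd

# Toy wide tables indexed by country code; all values are made up.
elec = pd.DataFrame(
    {
        "2018": [3000.0, np.nan, 5000.0],
        "2019": [3100.0, 900.0, 5100.0],
        "2020": [np.nan, 950.0, np.nan],
    },
    index=["ARG", "IND", "FRA"],
)
edu = pd.DataFrame(
    {
        "2018": [4.9, 4.4, np.nan],
        "2019": [4.8, 4.3, 5.4],
        "2020": [5.3, np.nan, 5.5],
    },
    index=["ARG", "IND", "FRA"],
)

# A country-year counts only if both indicators are present.
both_present = elec.notna() & edu.notna()

# Number of countries with complete data in each year.
coverage = both_present.sum()

# Year with the most complete overlap (ties resolve to the earliest column).
reference_year = coverage.idxmax()
print(coverage.to_dict(), reference_year)
```

With real data, the same count can be restricted to recent years so that the chosen year is both complete and reasonably current.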

*Datasets Used:*

**1. Electric power consumption (kWh per capita)** as **ELEC_CONS**

**2. Government expenditure on education, total (% of GDP)** as **EXP_EDU**


**Preparation Steps**

1. Load both Excel files (the first 3 rows of a World Bank file hold metadata, so we skip them when reading).

2. Filter the data to G20 country codes only.

3. Convert both datasets into panel (long) format by vertical stacking.

4. Keep only the relevant columns (country name + country code + selected year + electricity consumption per capita + education expenditure).

5. Clean column names and merge the datasets using country codes.
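As a rough illustration of the last two steps, the sketch below merges two small single-year tables on the country code. The values and the table names `elec_2019` and `edu_2019` are made up for the example; only the column names follow the ones used later in this notebook.

```python
import pandas as pd

# Toy single-year tables; all values are made up for illustration.
elec_2019 = pd.DataFrame({
    "Country Name": ["Argentina", "India"],
    "Country Code": ["ARG", "IND"],
    "electricity_per_capita": [3100.0, 900.0],
})
edu_2019 = pd.DataFrame({
    "Country Code": ["ARG", "IND", "FRA"],
    "education_pct_gdp": [4.8, 4.3, 5.4],
})

# An inner join on the country code keeps only countries present in both tables.
# validate="one_to_one" raises an error if a code is duplicated on either side.
merged = elec_2019.merge(
    edu_2019, on="Country Code", how="inner", validate="one_to_one"
)
print(merged)
```

Merging on the code rather than the name avoids mismatches caused by different spellings of the same country.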

Let's begin! First I import the packages I need: pandas to analyze the data and plotly to draw the charts.

In [1]:
import pandas as pd
import plotly.express as px

### Step 1: Load both Excel files

Now we read the Electric power consumption (kWh per capita) as **elec_cons**:

In [49]:
elec_cons = pd.read_excel("ELEC_CONS_PER_CAPITA.xls", skiprows=3)
elec_cons.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Aruba,ABW,Electric power consumption (kWh per capita),EG.USE.ELEC.KH.PC,,,,,,,...,,,,,,,,,,
1,Africa Eastern and Southern,AFE,Electric power consumption (kWh per capita),EG.USE.ELEC.KH.PC,,,,,,,...,582.708405,568.703452,566.073368,568.141299,548.496602,512.766661,514.341833,501.466616,,
2,Afghanistan,AFG,Electric power consumption (kWh per capita),EG.USE.ELEC.KH.PC,,,,,,,...,,,,,,,,,,
3,Africa Western and Central,AFW,Electric power consumption (kWh per capita),EG.USE.ELEC.KH.PC,,,,,,,...,201.73466,215.380351,179.972422,182.920554,188.36169,193.378593,200.861531,203.999368,,
4,Angola,AGO,Electric power consumption (kWh per capita),EG.USE.ELEC.KH.PC,,,,,,,...,306.167407,331.6649,315.199297,370.736573,410.864566,437.653351,392.355835,392.507047,,


Now we read the Government expenditure on education, total (% of GDP) as **exp_edu**:

In [50]:
exp_edu = pd.read_excel("GOV_EXP_ED_PERC_GDP.xls", skiprows=3)
exp_edu.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
0,Aruba,ABW,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,5.88827,5.49136,4.45582,4.548764,4.435037,,3.618558,,,
1,Africa Eastern and Southern,AFE,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,4.737919,4.692,4.43051,4.73975,4.51141,4.090565,4.368379,3.697668,3.962293,
2,Afghanistan,AFG,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,3.2558,4.54397,4.34319,,,,,,,
3,Africa Western and Central,AFW,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,3.13883,2.615035,3.29663,3.051252,3.047399,3.398741,3.096926,2.891687,3.21562,
4,Angola,AGO,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,3.486896,2.754937,2.466879,2.183513,2.073064,2.667447,2.297197,2.385359,2.512737,
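To see what `skiprows=3` does, here is a minimal sketch on an in-memory text file. It uses `pd.read_csv` instead of `pd.read_excel` so that it runs without the original `.xls` files, and the three preamble lines are made-up placeholders rather than the real World Bank metadata.

```python
import io

import pandas as pd

# Made-up text imitating a file with three preamble lines above the header row.
raw = (
    "Data Source,World Development Indicators\n"
    "Last Updated Date,(placeholder)\n"
    "Notes,(placeholder)\n"
    "Country Name,Country Code,2019\n"
    "Argentina,ARG,4.77\n"
)

# skiprows=3 drops the first three lines, so the fourth line becomes the header.
df = pd.read_csv(io.StringIO(raw), skiprows=3)
print(df.columns.tolist())
```

Without `skiprows`, the preamble would be read as the header and the real column names would end up inside the data.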


### Step 2: Filter the data to G20 country codes only

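Next, we filter both datasets to the G20 members. The filter matches on the country names as the World Bank spells them (for example "Russian Federation", "Korea, Rep." and "Turkiye"). The G20 members used here are:
- Argentina
- Australia
- Brazil
- Canada
- China
- France
- Germany
- India
- Indonesia
- Italy
- Japan
- Mexico
- Russia
- Saudi Arabia
- South Africa
- South Korea
- Turkey
- United Kingdom
- United States
- European Union (as a group)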

We filter both datasets and preview the result. A notebook cell only renders its last expression, so the table shown below is the filtered **exp_edu**:


In [51]:
g20_list = [
    "Argentina",
    "Australia",
    "Brazil",
    "Canada",
    "China",
    "France",
    "Germany",
    "India",
    "Indonesia",
    "Italy",
    "Japan",
    "Mexico",
    "Russian Federation",
    "Saudi Arabia",
    "South Africa",
    "Korea, Rep.",
    "Turkiye",
    "United Kingdom",
    "United States",
    "European Union",
]

elec_cons = elec_cons[elec_cons["Country Name"].isin(g20_list)]
exp_edu = exp_edu[exp_edu["Country Name"].isin(g20_list)]

elec_cons.head()
exp_edu.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024
9,Argentina,ARG,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,5.77611,5.54549,5.45432,4.87774,4.77165,5.2769,4.65393,4.79263,5.89754,
13,Australia,AUS,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,,,,,,5.38625,5.34109,5.05916,,
29,Brazil,BRA,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,6.24106,6.31404,6.32048,6.08851,5.96347,5.7715,5.49698,5.61923,,
35,Canada,CAN,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,4.73938,4.81642,4.95997,4.88898,4.77293,4.88795,4.7486,4.88022,,
40,China,CHN,"Government expenditure on education, total (% ...",SE.XPD.TOTL.GD.ZS,,,,,,,...,,,,,,,,,4.00128,
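Because the filter depends on exact name spellings, it is worth checking that every name in the list matched a row. The sketch below shows one way to do that on a toy table, and also how filtering by the 3-letter country code sidesteps the spelling issue. The table `countries` and the lists `wanted_names` and `wanted_codes` are hypothetical stand-ins, not the notebook's own variables.

```python
import pandas as pd

# Toy stand-in for a loaded World Bank table (name and code columns only).
countries = pd.DataFrame({
    "Country Name": ["Argentina", "Korea, Rep.", "Aruba"],
    "Country Code": ["ARG", "KOR", "ABW"],
})

# Hypothetical short list; "South Korea" is deliberately not the World Bank spelling.
wanted_names = ["Argentina", "South Korea"]

# Names in the list that matched no row in the table.
unmatched = sorted(set(wanted_names) - set(countries["Country Name"]))
print(unmatched)

# Filtering by 3-letter code does not depend on how the name is spelled.
wanted_codes = ["ARG", "KOR"]
by_code = countries[countries["Country Code"].isin(wanted_codes)]
print(by_code)
```

An empty `unmatched` list means every requested name was found; anything left in it points to a spelling that needs correcting.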


### Step 3: Convert both datasets into long format

In this step, we reshape each dataset so that it has **one row per country per year**, using the pandas **melt** function:


In [56]:
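# Melt the freshly loaded, filtered wide tables (re-run Steps 1 and 2 first if
# elec_cons or exp_edu were reassigned since then).
# Only the year columns are stacked, so "Indicator Name" and "Indicator Code"
# do not end up in the Year column.
elec_year_cols = [c for c in elec_cons.columns if str(c).isdigit()]
edu_year_cols = [c for c in exp_edu.columns if str(c).isdigit()]

# Melt elec_cons
elec_cons_long = elec_cons.melt(
    id_vars=["Country Name", "Country Code"],
    value_vars=elec_year_cols,
    var_name="Year",
    value_name="electricity_per_capita",
)
# Melt exp_edu
exp_edu_long = exp_edu.melt(
    id_vars=["Country Name", "Country Code"],
    value_vars=edu_year_cols,
    var_name="Year",
    value_name="education_pct_gdp",
)

*Note: the recorded run of this cell ended with `ValueError: value_name (electricity_per_capita) cannot match an element in the DataFrame columns.` pandas raises this error when the table passed to `melt` already has a column with that name. The likely cause is that **elec_cons** had already been reshaped or reassigned by an earlier run of the notebook, so the cell should be run on the wide tables produced by Steps 1 and 2.*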
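The reshaping in this step can be tried on a small made-up table first. The sketch below assumes the same wide layout as the World Bank files (identifier columns followed by one column per year); the values and the names `wide`, `year_cols` and `long` are illustrative only. Passing only the year columns as `value_vars` keeps text columns such as "Indicator Name" out of the result.

```python
import pandas as pd

# Toy wide table in the same layout as the World Bank files; values are made up.
wide = pd.DataFrame({
    "Country Name": ["Argentina", "India"],
    "Country Code": ["ARG", "IND"],
    "Indicator Name": ["demo", "demo"],
    "2018": [3000.0, 850.0],
    "2019": [3100.0, 900.0],
})

# Year columns are the ones whose labels consist only of digits.
year_cols = [c for c in wide.columns if str(c).isdigit()]

# Stack the year columns: one row per country per year.
long = wide.melt(
    id_vars=["Country Name", "Country Code"],
    value_vars=year_cols,
    var_name="Year",
    value_name="electricity_per_capita",
)

# Store the year as an integer so it can be filtered and sorted numerically.
long["Year"] = long["Year"].astype(int)
print(long)
```

Each country now appears once per year, which is the shape needed to select a single reference year and merge the two indicators.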