# Table of Contents
1. Introduction
2. Data manipulation, data cleaning
3. Analysis
4. Conclusion
5. Appendix

# Exploring the Correlation Between Government Expenditures and Literacy Rate: The Role of Socio-Economic and Educational Factors

## Introduction

Education is a cornerstone of societal development and individual empowerment, profoundly influencing a nation's economic growth, social progress, and cultural enrichment. Among the various metrics of educational attainment, literacy rates stand out as a critical indicator, reflecting the ability of a population to engage effectively with written information and thus participate fully in modern society.

This study aims to explore the correlation between government expenditures on education and literacy rates, emphasizing the adult population (ages 25-64). Government expenditure on education, typically expressed as a percentage of GDP, is a crucial factor that can significantly impact the quality, accessibility, and breadth of educational opportunities available to a nation's citizens.

However, the relationship between government spending and literacy is not straightforward and can be influenced by various socio-economic and educational factors. To gain a more nuanced understanding, this analysis incorporates several control variables (Z variables), including:

- Student-Teacher Ratio: This reflects the average number of students per teacher at a given level of education and is an essential indicator of educational quality.
- GDP per Capita: A standard measure of a country's economic performance and living standards, which can significantly impact educational funding and literacy.
- Completion Rate: The rate at which students complete a given level of education, which can influence overall literacy levels.
- Average Years of Schooling: This represents the average number of years of education received by people aged 25 and above and is a direct measure of educational attainment.

By controlling for these variables, the study seeks to isolate the effect of government educational expenditure on literacy rates. This approach acknowledges the complex interplay between economic capabilities, educational policies, and societal values in shaping literacy outcomes.
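One way to write down the idea of controlling for these variables is a linear specification such as the one below. This is a sketch under an assumption: the text does not state a functional form, and the symbols ($\beta_k$, $\varepsilon_i$, and the variable labels) are introduced here only for illustration.

$$
\text{LiteracyRate}_i = \beta_0 + \beta_1\,\text{GovExp}_i + \beta_2\,\text{STRatio}_i + \beta_3\,\text{GDPpc}_i + \beta_4\,\text{Completion}_i + \beta_5\,\text{Schooling}_i + \varepsilon_i
$$

Here $i$ indexes countries, $\beta_1$ is the association between education spending and adult literacy holding the four control variables fixed, and $\varepsilon_i$ is the error term.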

Through data analysis and statistical modeling, this project aims to provide insights into how government spending on education correlates with literacy rates and how much of that association remains once other socio-economic and educational factors are controlled for. The findings could offer valuable guidance for policymakers and educators in prioritizing investments and strategies to enhance literacy and, by extension, socio-economic development.

## Data

The dataset initially comprised 47 indicators across 272 countries or regions, spanning 20 years, resulting in a total of 12,784 rows. The indicators encompass a range of educational, economic, and demographic metrics. To address the challenge of numerous missing values in yearly data, we calculated the average of each indicator across the 20-year period for each country. This approach not only mitigated the issue of missing data but also provided a more consolidated and long-term perspective of each indicator at the country level.
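The averaging step described above can be sketched as follows. This is a minimal illustration on toy data, not the project's actual preprocessing code: the raw layout (one row per country-indicator pair, one column per year), the `Series` column name, the year labels, and all values are assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw data: one row per (country, indicator), one column per year.
# The "Series" column name and the year labels are hypothetical.
raw = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B"],
    "Series": ["lit_rate", "gov_exp", "lit_rate", "gov_exp"],
    "2000": [90.0, 4.0, np.nan, 3.0],
    "2001": [np.nan, 5.0, np.nan, np.nan],
    "2002": [92.0, np.nan, np.nan, 5.0],
})

year_cols = ["2000", "2001", "2002"]

# Row-wise mean over the years that were reported; NaN values are skipped,
# so a single reported year is enough to produce an average.
raw["period_mean"] = raw[year_cols].mean(axis=1, skipna=True)

# Reshape so that each country is a row and each indicator is a column.
averaged = raw.pivot(
    index="Country Name", columns="Series", values="period_mean"
).reset_index()
averaged.columns.name = None

print(averaged)
```

An indicator with no reported year at all (country B's `lit_rate` in the toy data) still comes out as missing, which is why averaging reduces the missing-data problem without eliminating it.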

After the cleaning and transformation process, our dataset was refined to encompass 13 indicators for 60 countries. This reduction in the number of indicators and countries was a necessary step to ensure data quality and reliability, focusing on the most relevant and consistently reported metrics.
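A reduction of this kind is what one would expect, at least in part, when rows with any missing indicator are dropped. The sketch below uses a hypothetical frame `toy` with made-up values to show how missing values can be counted per column before incomplete rows are removed; it illustrates the technique and does not reproduce the project's numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical country-level averages; NaN marks an indicator that was never reported.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "lit_rate": [91.0, np.nan, 75.0, 60.0],
    "gov_exp": [4.5, 4.0, np.nan, 3.2],
    "gdp_capita": [12000.0, 8000.0, 3000.0, 1500.0],
})

# Count missing values per column to see which indicators drive the sample loss.
missing_per_column = toy.isna().sum()

# Keep only countries with a value for every indicator (listwise deletion).
complete = toy.dropna().reset_index(drop=True)

print(missing_per_column)
print(f"{len(toy)} countries before, {len(complete)} after dropping incomplete rows")
```

Checking the per-column counts first makes it easier to judge whether a sparsely reported indicator is worth the countries it costs.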

For our analysis, we've selected key indicators from the World Bank data, grouped into two main categories:

**Educational Indicators:**
- Literacy Rate, Population 25-64 Years, Both Sexes (%) (UIS.LR.AG25T64): Reflects the adult literacy level, that is, the share of adults who can read and write a short, simple statement about their everyday life with understanding.
- Student-Teacher Ratio in Primary Education: Measured as the pupil-qualified teacher ratio (headcount basis); indicates the average number of pupils per qualified teacher, a proxy for educational quality.
- Completion Rate, Primary Education, Both Sexes (%): Measures the effectiveness of primary education systems in retaining students.
- Average Years of Total Schooling, Age 25+, Total: Represents the average educational attainment of the adult population.

**Economic Indicators:**
- Government Expenditure on Education as % of GDP (SE.XPD.TOTL.GD.ZS): Shows the government's financial commitment to education relative to the country's overall economic output.
- GDP per Capita (current US$): A key measure of economic performance and standard of living.

These indicators collectively provide insights into the relationship between government education spending, economic conditions, and educational outcomes, particularly literacy rates. The educational indicators focus on the quality and effectiveness of education systems, while the economic indicators reflect national priorities and capabilities concerning education.



In [1]:
# import libs
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from stargazer.stargazer import Stargazer
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [2]:
# import data
df = pd.read_csv('https://raw.githubusercontent.com/artyomashigov/da_2_term_project/main/literacy_rate_raw_data.csv')

In [3]:
# changing column names
df.rename(columns={
    'Country Name': 'country',
    'Barro-Lee: Average years of total schooling, age 25+, female': 'avg_ed_years_fem',
    'Barro-Lee: Average years of total schooling, age 25+, total': 'avg_ed_years',
    'Completion rate, primary education, both sexes (%)': 'prim_comp_rate',
    'Completion rate, primary education, female (%)': 'prim_comp_rate_fem',
    'Completion rate, primary education, male (%)': 'prim_comp_rate_male',
    'GDP per capita (current US$)': 'gdp_capita',
    'GDP per capita, PPP (current international $)': 'gdp_capita_ppp',
    'Government expenditure on education as % of GDP (%)': 'gov_exp',
    'Government expenditure on primary education as % of GDP (%)': 'gov_exp_prim',
    'Government expenditure on primary education, US$ (millions)': 'gov_exp_prim_mn_usd',
    'Literacy rate, population 25-64 years, both sexes (%)': 'lit_rate',
    'Literacy rate, population 25-64 years, female (%)': 'lit_rate_fem',
    'Literacy rate, population 25-64 years, male (%)': 'lit_rate_male',
    'Population growth (annual %)': 'pop_growth',
    'Pupil-qualified teacher ratio in primary education (headcount basis)': 'stu_teach_ratio'
}, inplace=True)


In [4]:
# drop the two primary-education expenditure columns
df = df.drop(columns=['gov_exp_prim', 'gov_exp_prim_mn_usd'])
# drop rows (countries) with any missing value
df = df.dropna()
df = df.reset_index(drop=True)
# export clean data
df.to_csv('literacy_rate_cleaned_data.csv')

## Analysis