In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

In [3]:
df = pd.read_csv('life expectancy.csv')
df.head()

Unnamed: 0,Country Name,Country Code,Region,IncomeGroup,Year,Life Expectancy World Bank,Prevelance of Undernourishment,CO2,Health Expenditure %,Education Expenditure %,Unemployment,Corruption,Sanitation,Injuries,Communicable,NonCommunicable
0,Afghanistan,AFG,South Asia,Low income,2001,56.308,47.8,730.0,,,10.809,,,2179727.1,9689193.7,5795426.38
1,Angola,AGO,Sub-Saharan Africa,Lower middle income,2001,47.059,67.5,15960.0,4.483516,,4.004,,,1392080.71,11190210.53,2663516.34
2,Albania,ALB,Europe & Central Asia,Upper middle income,2001,74.288,4.9,3230.0,7.139524,3.4587,18.575001,,40.520895,117081.67,140894.78,532324.75
3,Andorra,AND,Europe & Central Asia,High income,2001,,,520.0,5.865939,,,,21.78866,1697.99,695.56,13636.64
4,United Arab Emirates,ARE,Middle East & North Africa,High income,2001,74.544,2.8,97200.0,2.48437,,2.493,,,144678.14,65271.91,481740.7


In [4]:
df.describe()

Unnamed: 0,Year,Life Expectancy World Bank,Prevelance of Undernourishment,CO2,Health Expenditure %,Education Expenditure %,Unemployment,Corruption,Sanitation,Injuries,Communicable,NonCommunicable
count,3306.0,3118.0,2622.0,3154.0,3126.0,2216.0,3002.0,975.0,2059.0,3306.0,3306.0,3306.0
mean,2010.0,69.748362,10.663654,157492.4,6.364059,4.589014,7.89076,2.860513,52.738785,1318219.0,4686289.0,7392488.0
std,5.478054,9.408154,11.285897,772641.5,2.842844,2.119165,6.270832,0.621343,30.126762,5214068.0,18437270.0,29326880.0
min,2001.0,40.369,2.5,10.0,1.263576,0.85032,0.1,1.0,2.377647,430.49,330.16,2481.82
25%,2005.0,63.642,2.5,2002.5,4.205443,3.136118,3.733,2.5,24.746007,62456.88,57764.75,318475.8
50%,2010.0,72.1685,6.2,10205.0,5.892352,4.371465,5.92,3.0,49.317481,245691.0,314769.3,1350146.0
75%,2015.0,76.809,14.775,58772.5,8.119166,5.519825,10.0975,3.25,80.278847,846559.1,2831636.0,3918468.0
max,2019.0,84.356341,70.9,10707220.0,24.23068,23.27,37.25,4.5,100.000004,55636760.0,268564600.0,324637800.0
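
The summary above reports 3306 rows with Year running from 2001 to 2019, that is, 19 distinct years. A quick way to check whether the data forms a balanced country-year panel is sketched below. The helper name `panel_shape` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def panel_shape(frame, unit_col="Country Name", time_col="Year"):
    """Return (n_units, n_periods, is_balanced) for a long-format panel."""
    n_units = frame[unit_col].nunique()
    n_periods = frame[time_col].nunique()
    # Balanced: every unit appears exactly once in every period.
    no_dupes = not frame.duplicated([unit_col, time_col]).any()
    is_balanced = no_dupes and len(frame) == n_units * n_periods
    return n_units, n_periods, is_balanced

# Toy frame with made-up values, standing in for df
toy = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B"],
    "Year": [2001, 2002, 2001, 2002],
})
print(panel_shape(toy))
```

Calling `panel_shape(df)` on the loaded data would show whether each country has one row per year; 174 countries times 19 years would account for exactly 3306 rows.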


In [5]:
# Count missing values per column
df.isnull().sum()

Country Name                         0
Country Code                         0
Region                               0
IncomeGroup                          0
Year                                 0
Life Expectancy World Bank         188
Prevelance of Undernourishment     684
CO2                                152
Health Expenditure %               180
Education Expenditure %           1090
Unemployment                       304
Corruption                        2331
Sanitation                        1247
Injuries                             0
Communicable                         0
NonCommunicable                      0
dtype: int64
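
Raw counts are easier to judge as a share of the rows. For example, Corruption is missing in 2331 of 3306 rows, roughly 70%. The sketch below converts counts to percentages; `missing_share` and the toy frame are hypothetical names and values used only for illustration.

```python
import pandas as pd

def missing_share(frame):
    """Percentage of missing values per column, largest first."""
    return (frame.isnull().mean() * 100).round(1).sort_values(ascending=False)

# Toy frame with made-up values, standing in for df
toy = pd.DataFrame({
    "Corruption": [None, None, 3.0, None],
    "Unemployment": [5.0, None, 4.0, 6.0],
    "Injuries": [1.0, 2.0, 3.0, 4.0],
})
print(missing_share(toy))
```

Running `missing_share(df)` on the loaded data would rank the columns by how incomplete they are, which helps when deciding which variables can be used without heavy imputation.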

Research Questions
The existing literature on life expectancy is rich, but we found little concrete research that uses
panel data techniques to study a comprehensive set of socio-economic and environmental determinants
for Sub-Saharan African (SSA) countries. This paper tries to address that gap. We aim to better
understand the factors affecting life expectancy in the SSA region, to support more efficient
policy-making and a better allocation of funds and resources toward addressing low life expectancy
in Sub-Saharan Africa. To that end, we attempt to answer the following questions:

What is the impact of expenditure on health and education (% of GDP) on life expectancy?
How do the prevalence of undernourishment and communicable disease affect life expectancy?
Do factors like corruption and the unemployment rate impact life expectancy? If so, by how much?
Does an increase in CO2 emissions decrease life expectancy? Is the effect significant?
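
The text mentions panel data techniques without naming a specific estimator. As one possible illustration, the sketch below implements a fixed-effects (within) estimator: each variable is demeaned by country, then ordinary least squares is run on the demeaned data. The choice of estimator, the helper name `within_estimator`, and the toy panel are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

def within_estimator(frame, unit_col, y_col, x_cols):
    """Fixed-effects (within) slopes: OLS on unit-demeaned data."""
    cols = [y_col] + list(x_cols)
    data = frame.dropna(subset=cols)
    # Subtract each unit's own mean to remove time-invariant unit effects.
    demeaned = data[cols] - data.groupby(unit_col)[cols].transform("mean")
    X = demeaned[list(x_cols)].to_numpy()
    y = demeaned[y_col].to_numpy()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(beta, index=list(x_cols))

# Toy panel with made-up numbers: y = country effect + 2 * x
toy = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B", "B"],
    "x": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})
effects = toy["Country Name"].map({"A": 10.0, "B": 50.0})
toy["y"] = effects + 2.0 * toy["x"]
print(within_estimator(toy, "Country Name", "y", ["x"]))
```

On the real data, one might first keep only rows where `Region` equals 'Sub-Saharan Africa' and then pass 'Life Expectancy World Bank' as `y_col` with the expenditure columns as `x_cols`. This sketch returns slopes only; standard errors and significance tests would need a dedicated statistics package.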
Data
Main sources of data - World Bank Open Data & Our World in Data

Country - 174 countries - list

Country Code - 3-letter code

Region - region of the world country is located in

IncomeGroup - country's income class

Year - 2001-2019 (both included)

Life expectancy - data

Prevalence of Undernourishment (% of the population) - Prevalence of undernourishment is the
percentage of the population whose habitual food consumption is insufficient to provide the dietary
energy levels that are required to maintain a normally active and healthy life

Carbon dioxide emissions (kiloton) - Carbon dioxide emissions are those stemming from the burning
of fossil fuels and the manufacture of cement. They include carbon dioxide produced during the
consumption of solid, liquid, and gas fuels and gas flaring

Health Expenditure (% of GDP) - Level of current health expenditure expressed as a percentage of GDP. Estimates of current health expenditures include healthcare goods and services consumed during each year. This indicator does not include capital health expenditures such as buildings,
machinery, IT, and stocks of vaccines for emergencies or outbreaks

Education Expenditure (% of GDP) - General government expenditure on education (current,
capital, and transfers) is expressed as a percentage of GDP. It includes expenditures funded by
transfers from international sources to the government. General government usually refers to local,
regional, and central governments.

Unemployment (% total labor force) - Unemployment refers to the % share of the labor force that
is without work but available for and seeking employment

Corruption (CPIA rating) - Transparency, accountability, and corruption in the public sector assess
the extent to which the executive can be held accountable for its use of funds and for the results
of its actions by the electorate and by the legislature and judiciary, and the extent to which public employees within the executive are required to account for administrative decisions, use of resources,
and results obtained.

Sanitation

Disability-Adjusted Life Years (DALYs) due to Injuries - One DALY represents
the loss of the equivalent of one year of full health. DALYs for an injury or health
condition are the sum of the years of life lost due to premature mortality (YLLs) and the years
lived with a disability (YLDs) due to prevalent cases of the disease in a population

Disability-Adjusted Life Years (DALYs) due to Communicable diseases - One DALY represents
the loss of the equivalent of one year of full health. DALYs for a communicable disease or health
condition are the sum of the years of life lost due to premature mortality (YLLs) and the years
lived with a disability (YLDs) due to prevalent cases of the disease in a population

Disability-Adjusted Life Years (DALYs) due to Non-Communicable diseases - One DALY represents
the loss of the equivalent of one year of full health. DALYs for a non-communicable disease or health
condition are the sum of the years of life lost due to premature mortality (YLLs) and the years
lived with a disability (YLDs) due to prevalent cases of the disease in a population
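
Because the three DALY columns are absolute counts, they scale with population and are hard to compare across countries directly. One option, sketched below, is to express each cause group as a share of the three columns combined. Treating the sum of Injuries, Communicable, and NonCommunicable as the total is an assumption of this example, and `daly_shares` is a hypothetical helper name.

```python
import pandas as pd

DALY_COLS = ["Injuries", "Communicable", "NonCommunicable"]

def daly_shares(frame):
    """Each cause group's share of the summed DALYs in a row."""
    total = frame[DALY_COLS].sum(axis=1)
    return frame[DALY_COLS].div(total, axis=0)

# Values copied from the Afghanistan 2001 row shown by df.head()
row = pd.DataFrame({
    "Injuries": [2179727.1],
    "Communicable": [9689193.7],
    "NonCommunicable": [5795426.38],
})
print(daly_shares(row).round(3))
```

For that row, communicable diseases account for the largest of the three shares. Applying `daly_shares(df)` to the full frame would give comparable proportions for every country and year.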