About Dataset
This dataset simulates a set of key economic, social, and environmental indicators for 20 countries over the period from 2010 to 2019.
The dataset is designed to reflect typical World Bank metrics, which are used for analysis, policy-making, and forecasting. It includes the following variables:

Country Name: The country for which the data is recorded.
Year: The specific year of the observation (from 2010 to 2019).
GDP (USD): Gross Domestic Product in billions of US dollars, indicating the economic output of a country.
Population: The total population of the country in millions.
Life Expectancy (in years): The average life expectancy at birth for the country’s population.
Unemployment Rate (%): The percentage of the total labor force that is unemployed but actively seeking employment.
CO2 Emissions (metric tons per capita): The per capita carbon dioxide emissions, reflecting environmental impact.
Access to Electricity (% of population): The percentage of the population with access to electricity, representing infrastructure development.
Country:
Description: Name of the country for which the data is recorded.
Data Type: String
Example: "United States", "India", "Brazil"

Year:
Description: The year in which the data is observed.
Data Type: Integer
Range: 2010 to 2019
Example: 2012, 2015

GDP (USD):
Description: The Gross Domestic Product of the country in billions of US dollars, indicating the economic output.
Data Type: Float (billions of USD)
Example: 14200.56 (represents 14,200.56 billion USD)

Population:
Description: The total population of the country in millions.
Data Type: Float (millions of people)
Example: 331.42 (represents 331.42 million people)

Life Expectancy (in years):
Description: The average number of years a newborn is expected to live, assuming that current mortality rates remain constant throughout their life.
Data Type: Float (years)
Range: Typically between 50 and 85 years
Example: 78.5 years

Unemployment Rate (%):
Description: The percentage of the total labor force that is unemployed but actively seeking employment.
Data Type: Float (percentage)
Range: Typically between 2% and 25%
Example: 6.25%

CO2 Emissions (metric tons per capita):
Description: The amount of carbon dioxide emissions per person in the country, measured in metric tons.
Data Type: Float (metric tons)
Range: Typically between 0.5 and 20 metric tons per capita
Example: 4.32 metric tons per capita

Access to Electricity (%):
Description: The percentage of the population with access to electricity.
Data Type: Float (percentage)
Range: Typically between 50% and 100%
Example: 95.7%
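
The typical ranges in the data dictionary can be turned into a quick sanity check. The following is a minimal sketch, not part of the dataset: it assumes short column names such as `gdp` and `co2`, builds a hypothetical one-row sample from the example values listed above, and uses an illustrative helper named `out_of_range`.

```python
import pandas as pd

# Hypothetical one-row sample built from the example values in the data
# dictionary; the column names are assumed short forms of the variables.
sample = pd.DataFrame({
    "country": ["United States"],
    "year": [2015],
    "gdp": [14200.56],             # billions of USD
    "population": [331.42],        # millions of people
    "life_expectancy": [78.5],     # years
    "unemployment_rate": [6.25],   # percent
    "co2": [4.32],                 # metric tons per capita
    "access_electricity": [95.7],  # percent of population
})

# "Typical" ranges as stated in the data dictionary (inclusive bounds).
expected_ranges = {
    "year": (2010, 2019),
    "life_expectancy": (50, 85),
    "unemployment_rate": (2, 25),
    "co2": (0.5, 20),
    "access_electricity": (50, 100),
}


def out_of_range(df, ranges):
    """Count, per column, the values outside [low, high].

    Missing values are counted as out of range because
    Series.between returns False for NaN.
    """
    return {
        col: int((~df[col].between(low, high)).sum())
        for col, (low, high) in ranges.items()
    }


violations = out_of_range(sample, expected_ranges)
print(violations)
```

Because the ranges are described as typical rather than strict, a non-zero count is a prompt to inspect the rows, not proof of an error.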

In [51]:
import pandas as pd
import janitor  # pyjanitor: importing it adds the clean_names() method to DataFrames
import re
import plotly.express as px

# Data import

In [52]:
#with open("../data/world_bank_dataset.csv", 'r') as file:
#    colnames = ['country', 'year', 'gdp', 'population', 'life_expectancy', 'unemployment_rate', 'co2', 'access_electricity']
#    data_word_bank = pd.read_csv(file, names=colnames,header=0)

# pd.read_excel accepts a path directly, so no explicit open() is needed
data_word_dev = pd.read_excel("../data/World_Development.xlsx")

data_word_dev.head()

Unnamed: 0,Country Name,Country Code,Time,Time Code,Access to electricity (% of population) [EG.ELC.ACCS.ZS],Adjusted savings: education expenditure (% of GNI) [NY.ADJ.AEDU.GN.ZS],"Automated teller machines (ATMs) (per 100,000 adults) [FB.ATM.TOTL.P5]","Birth rate, crude (per 1,000 people) [SP.DYN.CBRT.IN]","Claims on central government, etc. (% GDP) [FS.AST.CGOV.GD.ZS]","Commercial bank branches (per 100,000 adults) [FB.CBK.BRCH.P5]",...,Surface area (sq. km) [AG.SRF.TOTL.K2],"Survival to age 65, female (% of cohort) [SP.DYN.TO65.FE.ZS]","Survival to age 65, male (% of cohort) [SP.DYN.TO65.MA.ZS]",Trade (% of GDP) [NE.TRD.GNFS.ZS],UHC service coverage index [SH.UHC.SRVS.CV.XD],"Unemployment, female (% of female labor force) (modeled ILO estimate) [SL.UEM.TOTL.FE.ZS]","Unemployment, male (% of male labor force) (modeled ILO estimate) [SL.UEM.TOTL.MA.ZS]","Unemployment, total (% of total labor force) (modeled ILO estimate) [SL.UEM.TOTL.ZS]",Urban population (% of total population) [SP.URB.TOTL.IN.ZS],Voice and Accountability: Estimate [VA.EST]
0,Afghanistan,AFG,1960,YR1960,,,,50.34,5.016529,,...,,21.190631,17.743034,11.157027,,,,,8.401,
1,Afghanistan,AFG,1961,YR1961,,,,50.443,9.388664,,...,652860.0,21.819029,18.350272,12.55061,,,,,8.684,
2,Afghanistan,AFG,1962,YR1962,,,,50.57,12.093496,,...,652860.0,22.397017,18.878496,14.227644,,,,,8.976,
3,Afghanistan,AFG,1963,YR1963,,,,50.703,10.857987,,...,652860.0,22.973204,19.3959,26.035511,,,,,9.276,
4,Afghanistan,AFG,1964,YR1964,,,,50.831,13.899999,,...,652860.0,23.546267,19.941843,26.944448,,,,,9.586,


In [53]:
data_word_dev.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16960 entries, 0 to 16959
Data columns (total 63 columns):
 #   Column                                                                                                        Non-Null Count  Dtype  
---  ------                                                                                                        --------------  -----  
 0   Country Name                                                                                                  16960 non-null  object 
 1   Country Code                                                                                                  16960 non-null  object 
 2   Time                                                                                                          16960 non-null  int64  
 3   Time Code                                                                                                     16960 non-null  object 
 4   Access to electricity (% of population) [EG.ELC.ACCS

In [54]:
data_word_dev.describe()

Unnamed: 0,Time,Access to electricity (% of population) [EG.ELC.ACCS.ZS],Adjusted savings: education expenditure (% of GNI) [NY.ADJ.AEDU.GN.ZS],"Automated teller machines (ATMs) (per 100,000 adults) [FB.ATM.TOTL.P5]","Birth rate, crude (per 1,000 people) [SP.DYN.CBRT.IN]","Claims on central government, etc. (% GDP) [FS.AST.CGOV.GD.ZS]","Commercial bank branches (per 100,000 adults) [FB.CBK.BRCH.P5]","Compulsory education, duration (years) [SE.COM.DURS]",Control of Corruption: Estimate [CC.EST],"Death rate, crude (per 1,000 people) [SP.DYN.CDRT.IN]",...,Surface area (sq. km) [AG.SRF.TOTL.K2],"Survival to age 65, female (% of cohort) [SP.DYN.TO65.FE.ZS]","Survival to age 65, male (% of cohort) [SP.DYN.TO65.MA.ZS]",Trade (% of GDP) [NE.TRD.GNFS.ZS],UHC service coverage index [SH.UHC.SRVS.CV.XD],"Unemployment, female (% of female labor force) (modeled ILO estimate) [SL.UEM.TOTL.FE.ZS]","Unemployment, male (% of male labor force) (modeled ILO estimate) [SL.UEM.TOTL.MA.ZS]","Unemployment, total (% of total labor force) (modeled ILO estimate) [SL.UEM.TOTL.ZS]",Urban population (% of total population) [SP.URB.TOTL.IN.ZS],Voice and Accountability: Estimate [VA.EST]
count,16960.0,7348.0,12883.0,3812.0,16300.0,10853.0,3988.0,6071.0,4783.0,16282.0,...,15932.0,16695.0,16695.0,10698.0,1428.0,7752.0,7752.0,7752.0,16569.0,4850.0
mean,1991.5,80.758176,4.041467,42.080247,28.043431,8.631726,16.746426,9.436831,-0.024874,10.455614,...,5394758.0,69.582992,59.627697,72.502845,59.551821,9.159337,7.353507,7.919577,50.236006,-0.021062
std,18.473498,28.745226,2.460375,44.334681,12.876307,18.330783,19.170935,2.187316,1.000047,5.334836,...,15561520.0,17.170751,16.344252,51.003434,18.914353,6.95168,5.102,5.515419,24.771559,0.998757
min,1960.0,0.533899,0.247919,0.0,4.4,-192.52904,0.04,0.0,-1.936706,0.795,...,2.027,0.046306,0.000668,0.020999,11.0,0.147,0.045,0.1,2.077,-2.313395
25%,1975.75,68.435427,2.7,8.535,16.3,0.916163,4.95875,8.5,-0.791694,6.965136,...,21040.0,57.33748,47.880048,41.257562,45.0,4.31075,3.98775,4.152,29.973978,-0.850328
50%,1991.5,98.290939,3.743548,31.415,26.818501,6.973541,11.8,9.0,-0.253887,9.178311,...,207600.0,73.822139,61.516574,60.398081,63.0,6.847948,6.004749,6.388657,48.781,0.020768
75%,2007.25,100.0,4.816466,58.3375,39.57925,14.210991,22.68,11.0,0.666176,12.375,...,1267000.0,83.394009,71.925137,91.44186,76.0,12.36525,9.444972,10.4105,69.524,0.884255
max,2023.0,100.0,68.15293,324.17,58.121,272.697203,242.78,17.0,2.459118,103.534,...,140486900.0,96.676574,94.688532,863.195099,91.0,44.638,36.963,38.8,100.0,1.800992
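
The `count` row shows that coverage differs a lot between indicators: 16,960 rows in total, but only 7,348 non-null values for access to electricity and 1,428 for the UHC service coverage index. A minimal sketch of ranking columns by their share of non-missing values follows. The small frame is a hypothetical sample that copies a few values from the `head()` output and shortens the column labels. On the full data the same expression could be applied to `data_word_dev`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample with values copied from the head() output;
# column labels are shortened here for readability.
sample = pd.DataFrame({
    "Country Name": ["Afghanistan"] * 3,
    "Time": [1960, 1961, 1962],
    "Access to electricity": [np.nan, np.nan, np.nan],
    "Surface area": [np.nan, 652860.0, 652860.0],
    "Urban population": [8.401, 8.684, 8.976],
})

# Share of non-missing values per column, sparsest columns first.
coverage = sample.notna().mean().sort_values()
print(coverage)
```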


In [55]:

# Preview only: the result is not assigned, so data_word_dev keeps its original column names
data_word_dev.clean_names()

Unnamed: 0,country_name,country_code,time,time_code,access_to_electricity_%_of_population_[eg_elc_accs_zs],adjusted_savings_education_expenditure_%_of_gni_[ny_adj_aedu_gn_zs],automated_teller_machines_atms_per_100_000_adults_[fb_atm_totl_p5],birth_rate_crude_per_1_000_people_[sp_dyn_cbrt_in],claims_on_central_government_etc_%_gdp_[fs_ast_cgov_gd_zs],commercial_bank_branches_per_100_000_adults_[fb_cbk_brch_p5],...,surface_area_sq_km_[ag_srf_totl_k2],survival_to_age_65_female_%_of_cohort_[sp_dyn_to65_fe_zs],survival_to_age_65_male_%_of_cohort_[sp_dyn_to65_ma_zs],trade_%_of_gdp_[ne_trd_gnfs_zs],uhc_service_coverage_index_[sh_uhc_srvs_cv_xd],unemployment_female_%_of_female_labor_force_modeled_ilo_estimate_[sl_uem_totl_fe_zs],unemployment_male_%_of_male_labor_force_modeled_ilo_estimate_[sl_uem_totl_ma_zs],unemployment_total_%_of_total_labor_force_modeled_ilo_estimate_[sl_uem_totl_zs],urban_population_%_of_total_population_[sp_urb_totl_in_zs],voice_and_accountability_estimate_[va_est]
0,Afghanistan,AFG,1960,YR1960,,,,50.340000,5.016529,,...,,21.190631,17.743034,11.157027,,,,,8.401000,
1,Afghanistan,AFG,1961,YR1961,,,,50.443000,9.388664,,...,6.528600e+05,21.819029,18.350272,12.550610,,,,,8.684000,
2,Afghanistan,AFG,1962,YR1962,,,,50.570000,12.093496,,...,6.528600e+05,22.397017,18.878496,14.227644,,,,,8.976000,
3,Afghanistan,AFG,1963,YR1963,,,,50.703000,10.857987,,...,6.528600e+05,22.973204,19.395900,26.035511,,,,,9.276000,
4,Afghanistan,AFG,1964,YR1964,,,,50.831000,13.899999,,...,6.528600e+05,23.546267,19.941843,26.944448,,,,,9.586000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16955,World,WLD,2019,YR2019,90.193562,3.864939,40.73,17.817692,32.671524,10.995,...,1.347256e+08,81.938009,73.343691,56.497579,68.0,5.681180,5.532497,5.591542,55.627911,
16956,World,WLD,2020,YR2020,90.482703,3.832505,41.24,17.226614,40.189099,10.590,...,1.347320e+08,80.770399,71.668728,52.433936,,6.602767,6.603577,6.603279,56.061754,
16957,World,WLD,2021,YR2021,91.414096,3.824791,39.49,16.942325,41.642027,11.180,...,1.404869e+08,79.028974,69.531014,56.811601,68.0,6.210113,5.967561,6.064105,56.476512,
16958,World,WLD,2022,YR2022,,,,16.649626,32.473212,,...,,79.980073,70.860986,62.563549,,5.512472,5.106076,5.267477,56.899080,


## Clean column names

In [56]:
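# Keep only the text before the first "(", "[" or ":" in each header, dropping units and indicator codes
data_word_dev.columns = [re.split(r'\(|\[|\:', col)[0].strip() for col in data_word_dev.columns]
# Convert the shortened names to snake_case with pyjanitor
data_word_dev = data_word_dev.clean_names()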
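
Cutting each header at the first `(`, `[` or `:` can map distinct indicators to the same short name: the resulting column index lists `foreign_direct_investment_net_inflows` twice. The sketch below shows one possible way to detect such collisions and to keep the names unique by appending the bracketed indicator code. It is an illustration under assumptions: the two foreign direct investment headers are assumed to be the ones behind the duplicate, and `short_name` and `unique_name` are hypothetical helpers.

```python
import re
from collections import Counter

# Sample headers. The first two appear in the notebook output; the two
# foreign direct investment headers are assumed for illustration.
raw_columns = [
    "Access to electricity (% of population) [EG.ELC.ACCS.ZS]",
    "Adjusted savings: education expenditure (% of GNI) [NY.ADJ.AEDU.GN.ZS]",
    "Foreign direct investment, net inflows (% of GDP) [BX.KLT.DINV.WD.GD.ZS]",
    "Foreign direct investment, net inflows (BoP, current US$) [BX.KLT.DINV.CD.WD]",
]


def short_name(col):
    """Same rule as the notebook: text before the first '(', '[' or ':'."""
    return re.split(r"[(\[:]", col)[0].strip()


short = [short_name(c) for c in raw_columns]
counts = Counter(short)
duplicates = sorted(name for name, n in counts.items() if n > 1)
print(duplicates)


def unique_name(col):
    """Append the bracketed indicator code when the short name collides."""
    base = short_name(col)
    code = re.search(r"\[([^\]]+)\]\s*$", col)
    if counts[base] > 1 and code:
        return f"{base} {code.group(1)}"
    return base


unique = [unique_name(c) for c in raw_columns]
print(unique)
```

The de-duplicated labels could then be assigned to `data_word_dev.columns` before calling `clean_names()`, so that selecting a column by name returns a single Series rather than a two-column DataFrame.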

In [57]:
data_word_dev.columns

Index(['country_name', 'country_code', 'time', 'time_code',
       'access_to_electricity', 'adjusted_savings',
       'automated_teller_machines', 'birth_rate_crude',
       'claims_on_central_government_etc_', 'commercial_bank_branches',
       'compulsory_education_duration', 'control_of_corruption',
       'death_rate_crude', 'domestic_general_government_health_expenditure',
       'domestic_general_government_health_expenditure_per_capita',
       'domestic_general_government_health_expenditure_per_capita_ppp',
       'employment_to_population_ratio_15+_female',
       'employment_to_population_ratio_15+_male',
       'employment_to_population_ratio_15+_total',
       'exports_of_goods_and_services',
       'external_balance_on_goods_and_services',
       'fixed_broadband_subscriptions',
       'foreign_direct_investment_net_inflows',
       'foreign_direct_investment_net_inflows', 'gdp', 'gdp_growth',
       'gdp_per_capita', 'gdp_per_capita_growth', 'gdp_per_capita_ppp',
       