# <span style="font-size: 1em">Exploring the Drivers of Modern Slavery</span><span style="font-size: 0.8em"> Assignment</span>
<h3>Practicum I Python 2022-2023</h3>
<h5>M.Sc. In Business Analytics (Part Time) 2022-2024 at Athens University of Economics and Business (A.U.E.B.)</h5>

---

> Student: Panagiotis G. Vaidomarkakis<br />
> Student I.D.: p2822203<br />
> Tutor: Panos Louridas, Associate Professor<br />
> Due Date: 01/05/2023

## Table Of Contents:
* [Importing Libraries](#first-bullet)
* [$1^{st}$ Question : Data Preprocessing](#q1)
* [$2^{nd}$ Question : Slavery Estimation Using All Features](#q2)
* [$3^{rd}$ Question : Slavery Estimation with Theory-based Features](#q3)
* [$4^{th}$ Question : Slavery Estimation with PCA-derived Features](#q4)

## Importing Libraries <a class="anchor" id="first-bullet"></a>
In the following lines, we import all the necessary libraries so that the rest of the notebook can run. <br> First, we check whether the machine running this Jupyter Notebook has every required library installed; any library that is missing is installed automatically with `pip` so that it can be imported afterwards:

In [1]:
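import importlib
import subprocess
import sys

def install_library(pip_name, module_name=None):
    # The pip name can differ from the import name (e.g. scikit-learn -> sklearn)
    module_name = module_name or pip_name
    try:
        importlib.import_module(module_name)
        print(f'{pip_name} is already installed.')
    except ImportError:
        print(f'{pip_name} is not installed. Installing now...')
        # Call pip through the running interpreter so the package lands in this environment
        subprocess.call([sys.executable, '-m', 'pip', 'install', pip_name])

# pip name -> import name; math belongs to the standard library, so it always imports
libraries = {'pandas': 'pandas', 'numpy': 'numpy', 'scikit-learn': 'sklearn',
             'matplotlib': 'matplotlib', 'math': 'math', 'openpyxl': 'openpyxl',
             'statsmodels': 'statsmodels'}

for pip_name, module_name in libraries.items():
    install_library(pip_name, module_name)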

pandas is already installed.
numpy is already installed.
scikit-learn is not installed. Installing now...
matplotlib is already installed.
math is already installed.
openpyxl is already installed.
statsmodels is not installed. Installing now...
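
A lighter way to run the same check is to ask Python whether a module can be found, without importing it. The following is a minimal sketch, not part of the original notebook: the helper `is_importable` and the mapping `PIP_TO_IMPORT` are hypothetical names introduced here for illustration.

```python
import importlib.util

# Hypothetical mapping for packages whose pip name differs from their import name.
PIP_TO_IMPORT = {'scikit-learn': 'sklearn'}

def is_importable(pip_name):
    """Return True if the module behind `pip_name` can be located."""
    module_name = PIP_TO_IMPORT.get(pip_name, pip_name)
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None

print(is_importable('json'))                           # standard-library module
print(is_importable('surely_not_a_real_package_xyz'))  # made-up name
```

Because nothing is imported, the check is cheap and has no side effects. The mapping is what prevents a package such as `scikit-learn` from being reported as missing merely because its import name is `sklearn`.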


In [2]:
import pandas as pd
import numpy as np
from collections import Counter
import sklearn as sk
import re
import matplotlib.pyplot as plt
import math
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.ensemble import RandomForestRegressor
import statsmodels.api as sm
from sklearn.decomposition import PCA

## $1^{st}$ Question : Data Preprocessing<a class="anchor" id="q1"></a>
We load the *training dataset* from `training.csv` into a DataFrame and lift the pandas column display limit so that every feature is visible.<br>


In [3]:
pd.set_option('display.max_columns', None)
training_df = pd.read_csv('training.csv')
training_df

Unnamed: 0,Country,Data_year,Region,KOF_Globalis,Work_rightCIRI,Trade_open,FDI,VDEM_Libdem,GDPpc_2016,Armedcon,Asia,Subsah_Africa,Americas,Europe,M_East_N_Africa,Russia_Eurasia,Pol_terror,SLAVERY,SDGI_2016,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill_2018,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,AIDS_death_2016,AIDS_death_2018,AIDS_Orph_2016,AIDS_Orph_2018,Physrights_indx_2011,Extrajud_kill_2011,Pol_impris_2011,Torture_2011,Polrights_indx_2011,Free_assem_2011,Freemv_foreign_2011,Freemv_dom_2011,Free_speech_2011,Free_polit_2011,Relig_freeCIRI_2011,Work_rightCIRI_2011,Econ_right_F_2011,Pol_right_F_2011,Indep_judic_2011,Rape_prev_2018,Rape_report_2015,Rape_enclave_2015,Rape_compl_2018,Phys_secF_2014,Phys_secF_2019,Gender_equal_2015,Hum_traff_2019
0,Afghanistan,2018,ASIA,38.57,1,55.92,0.48,0.24,570,1,1,0,0,0,0,0,5.0,2.22,36.50,,2.0206,40.9,26.799999,9.5,50,396,35.5,3.575,189.0,91.099998,66,46.990051,,9.27439,57.509158,22.612086,,27.700001,31.0400,31.900000,55.299999,,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,,6.39,,27.820000,31.248380,0.000000,0.12,0.425262,,,,,,6.250000,0.862703,0.835789,11.0,,6.5,76,,33.716116,37.4,0.134000,1,1,0.014739,1,0.77,7.928766,1,4,2,0,0,0,0.566547,0,1,-2.359148,0,12500.0,0.3,51.5,0.4,500.0,500.0,4700.0,5400.0,4,0,2,0,4,1,0,0,1,1,0,1,0,2,0,4.0,4.0,2.0,17.0,4.0,4.0,2.0,3.0
1,Argentina,2018,AMERICAS,63.02,1,26.12,0.59,0.61,11970,0,0,0,1,0,0,0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67,52,6.3,6.574,24.0,12.500000,94,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42.0,49.36,14.705882,12.585822,0.861084,32.0,2.336129,5.5,147,2.876021,42.929606,99.5,0.005612,0,1,0.016886,0,0.26,0.000000,1,4,4,0,1,1,0.954939,0,0,-3.221412,1,74900.0,5.4,96.3,,1700.0,1700.0,30000.0,31000.0,5,1,2,0,12,2,2,2,1,2,2,1,2,3,1,1.0,4.0,0.0,9.0,3.0,2.0,4.0,1.0
2,Armenia,2018,RUSSIA AND EURASIA,67.09,1,75.92,3.21,0.23,3770,0,0,0,0,0,0,1,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62,25,7.4,4.350,45.0,14.100000,93,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,,,,,,10.526316,0.593045,0.841606,35.0,3.401135,1.8,164,3.958415,83.951898,99.6,0.005065,0,0,0.053967,1,0.13,0.000000,1,4,3,0,1,1,0.872840,0,0,-1.867683,0,4600.0,0.6,99.1,14.3,200.0,200.0,,,4,1,1,0,6,1,1,1,1,1,0,1,1,2,0,0.0,3.0,0.0,9.0,4.0,3.0,0.0,3.0
3,Bangladesh,2016,ASIA,45.54,0,37.95,1.05,0.16,1330,1,1,0,0,0,0,0,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61,176,23.3,4.694,227.0,37.599998,89,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2.0,2.36,10.526316,3.526961,0.768760,25.0,2.852616,2.7,42,3.458803,80.262334,30.5,0.047270,0,1,0.052241,1,0.05,3.871201,1,4,2,0,1,0,0.541176,0,1,-2.852130,0,140000.0,0.2,66.7,2.1,500.0,1000.0,3800.0,4500.0,2,0,1,0,7,1,2,1,1,1,1,0,1,2,0,1.0,4.0,1.0,12.0,4.0,4.0,4.0,3.0
4,Bolivia,2016,AMERICAS,57.74,1,56.40,1.54,0.40,3070,0,0,0,1,0,0,0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59,206,19.6,5.890,120.0,38.400002,94,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,,,,,,26.785714,5.260437,0.868154,34.0,3.185076,12.1,140,3.439012,43.898827,75.8,0.004741,0,0,0.027534,0,0.74,0.000000,1,4,4,0,1,1,0.786002,0,1,-2.344236,1,13500.0,4.3,88.9,4.8,1000.0,1000.0,20000.0,19000.0,5,1,1,1,11,1,2,2,1,2,2,1,1,3,0,3.0,3.0,0.0,10.0,3.0,3.0,0.0,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,2018,MIDDLE EAST AND NORTH AFRICA,64.86,1,91.44,1.48,0.66,3690,0,0,0,0,0,1,0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66,62,8.2,4.739,33.0,14.000000,98,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58.0,17.82,1.388889,5.817336,0.972199,38.0,3.484549,2.2,199,4.351138,61.795845,99.2,0.004354,0,0,0.009617,0,0.04,0.000000,1,4,3,0,1,1,0.800101,0,0,-1.737197,1,25000.0,1.2,58.3,,100.0,100.0,,,2,0,0,0,7,1,2,0,1,1,0,2,1,2,0,4.0,4.0,2.0,15.0,3.0,4.0,4.0,2.0
66,Uganda,2018,SUBSAHARAN AFRICA,52.66,0,47.22,2.60,0.28,630,1,0,1,0,0,0,0,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50,343,18.7,3.931,161.0,54.599998,78,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,,,,,,26.666667,5.663806,0.755702,25.0,3.567911,10.7,97,3.891180,46.035296,29.9,0.030420,0,1,0.025691,0,0.93,5.463832,1,4,3,0,1,1,0.579077,0,1,-2.605993,0,44300.0,85.0,69.4,,26000.0,23000.0,1000000.0,950000.0,2,1,0,0,7,0,2,2,1,1,1,0,0,3,1,1.0,3.0,2.0,12.0,3.0,3.0,3.0,2.0
67,Ukraine,2018,RUSSIA AND EURASIA,70.60,0,104.81,3.69,0.22,2310,1,0,0,0,0,0,1,4.0,0.67,66.39,0.000000,4.4008,3.7,,0.3,63,24,5.5,4.681,94.0,9.000000,76,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90.0,32.54,1.459854,5.702895,0.943358,27.0,2.866647,4.3,305,2.947390,44.265651,99.8,0.016890,0,0,0.013978,1,0.47,4.143135,1,3,3,0,1,1,0.684342,0,1,-1.712700,0,86600.0,5.2,93.9,3.1,8300.0,6100.0,56000.0,61000.0,4,1,1,0,10,1,2,2,1,1,2,1,2,2,0,1.0,3.0,2.0,12.0,3.0,4.0,0.0,2.0
68,Vietnam,2016,ASIA,64.27,0,184.69,6.14,0.20,2100,0,1,0,0,0,0,0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66,54,11.4,5.360,140.0,21.700001,95,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3.0,0.45,2.777778,9.076540,0.746622,31.0,3.407291,3.3,145,3.909754,60.553253,95.0,0.014510,0,1,0.048466,1,0.24,0.000000,1,0,2,0,0,0,0.559768,0,0,-0.991591,0,71900.0,3.6,85.4,2.1,5300.0,4700.0,94000.0,94000.0,3,1,0,0,1,0,0,1,0,0,0,0,1,2,0,,2.0,0.0,,3.0,3.0,2.0,3.0
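
The `Unnamed: 0` column in the output above looks like a row index that was written out when the CSV was saved; that is an assumption, since the file itself is not shown here. If it holds, the column can be absorbed into the index at load time. The sketch below uses a small in-memory CSV (`toy_csv`, a stand-in for `training.csv`) with two rows copied from the table:

```python
import io
import pandas as pd

# Toy stand-in for training.csv; the empty first header mimics a saved index.
toy_csv = io.StringIO(
    ",Country,Data_year,SLAVERY\n"
    "0,Afghanistan,2018,2.22\n"
    "1,Argentina,2018,0.13\n"
)

# index_col=0 uses the first column as the index instead of creating 'Unnamed: 0'.
toy_df = pd.read_csv(toy_csv, index_col=0)
print(toy_df.columns.tolist())
```

Loading the data this way would keep a meaningless counter out of the feature set in the later modelling steps.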


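First, we drop the six regional dummy variables (`Asia`, `Subsah_Africa`, `Americas`, `Europe`, `M_East_N_Africa`, `Russia_Eurasia`), which duplicate the information already held in the `Region` column.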

In [4]:
training_df.drop(['Asia','Subsah_Africa','Americas','Europe','M_East_N_Africa','Russia_Eurasia'],axis=1,inplace=True)
training_df

Unnamed: 0,Country,Data_year,Region,KOF_Globalis,Work_rightCIRI,Trade_open,FDI,VDEM_Libdem,GDPpc_2016,Armedcon,Pol_terror,SLAVERY,SDGI_2016,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill_2018,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,AIDS_death_2016,AIDS_death_2018,AIDS_Orph_2016,AIDS_Orph_2018,Physrights_indx_2011,Extrajud_kill_2011,Pol_impris_2011,Torture_2011,Polrights_indx_2011,Free_assem_2011,Freemv_foreign_2011,Freemv_dom_2011,Free_speech_2011,Free_polit_2011,Relig_freeCIRI_2011,Work_rightCIRI_2011,Econ_right_F_2011,Pol_right_F_2011,Indep_judic_2011,Rape_prev_2018,Rape_report_2015,Rape_enclave_2015,Rape_compl_2018,Phys_secF_2014,Phys_secF_2019,Gender_equal_2015,Hum_traff_2019
0,Afghanistan,2018,ASIA,38.57,1,55.92,0.48,0.24,570,1,5.0,2.22,36.50,,2.0206,40.9,26.799999,9.5,50,396,35.5,3.575,189.0,91.099998,66,46.990051,,9.27439,57.509158,22.612086,,27.700001,31.0400,31.900000,55.299999,,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,,6.39,,27.820000,31.248380,0.000000,0.12,0.425262,,,,,,6.250000,0.862703,0.835789,11.0,,6.5,76,,33.716116,37.4,0.134000,1,1,0.014739,1,0.77,7.928766,1,4,2,0,0,0,0.566547,0,1,-2.359148,0,12500.0,0.3,51.5,0.4,500.0,500.0,4700.0,5400.0,4,0,2,0,4,1,0,0,1,1,0,1,0,2,0,4.0,4.0,2.0,17.0,4.0,4.0,2.0,3.0
1,Argentina,2018,AMERICAS,63.02,1,26.12,0.59,0.61,11970,0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67,52,6.3,6.574,24.0,12.500000,94,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42.0,49.36,14.705882,12.585822,0.861084,32.0,2.336129,5.5,147,2.876021,42.929606,99.5,0.005612,0,1,0.016886,0,0.26,0.000000,1,4,4,0,1,1,0.954939,0,0,-3.221412,1,74900.0,5.4,96.3,,1700.0,1700.0,30000.0,31000.0,5,1,2,0,12,2,2,2,1,2,2,1,2,3,1,1.0,4.0,0.0,9.0,3.0,2.0,4.0,1.0
2,Armenia,2018,RUSSIA AND EURASIA,67.09,1,75.92,3.21,0.23,3770,0,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62,25,7.4,4.350,45.0,14.100000,93,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,,,,,,10.526316,0.593045,0.841606,35.0,3.401135,1.8,164,3.958415,83.951898,99.6,0.005065,0,0,0.053967,1,0.13,0.000000,1,4,3,0,1,1,0.872840,0,0,-1.867683,0,4600.0,0.6,99.1,14.3,200.0,200.0,,,4,1,1,0,6,1,1,1,1,1,0,1,1,2,0,0.0,3.0,0.0,9.0,4.0,3.0,0.0,3.0
3,Bangladesh,2016,ASIA,45.54,0,37.95,1.05,0.16,1330,1,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61,176,23.3,4.694,227.0,37.599998,89,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2.0,2.36,10.526316,3.526961,0.768760,25.0,2.852616,2.7,42,3.458803,80.262334,30.5,0.047270,0,1,0.052241,1,0.05,3.871201,1,4,2,0,1,0,0.541176,0,1,-2.852130,0,140000.0,0.2,66.7,2.1,500.0,1000.0,3800.0,4500.0,2,0,1,0,7,1,2,1,1,1,1,0,1,2,0,1.0,4.0,1.0,12.0,4.0,4.0,4.0,3.0
4,Bolivia,2016,AMERICAS,57.74,1,56.40,1.54,0.40,3070,0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59,206,19.6,5.890,120.0,38.400002,94,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,,,,,,26.785714,5.260437,0.868154,34.0,3.185076,12.1,140,3.439012,43.898827,75.8,0.004741,0,0,0.027534,0,0.74,0.000000,1,4,4,0,1,1,0.786002,0,1,-2.344236,1,13500.0,4.3,88.9,4.8,1000.0,1000.0,20000.0,19000.0,5,1,1,1,11,1,2,2,1,2,2,1,1,3,0,3.0,3.0,0.0,10.0,3.0,3.0,0.0,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,2018,MIDDLE EAST AND NORTH AFRICA,64.86,1,91.44,1.48,0.66,3690,0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66,62,8.2,4.739,33.0,14.000000,98,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58.0,17.82,1.388889,5.817336,0.972199,38.0,3.484549,2.2,199,4.351138,61.795845,99.2,0.004354,0,0,0.009617,0,0.04,0.000000,1,4,3,0,1,1,0.800101,0,0,-1.737197,1,25000.0,1.2,58.3,,100.0,100.0,,,2,0,0,0,7,1,2,0,1,1,0,2,1,2,0,4.0,4.0,2.0,15.0,3.0,4.0,4.0,2.0
66,Uganda,2018,SUBSAHARAN AFRICA,52.66,0,47.22,2.60,0.28,630,1,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50,343,18.7,3.931,161.0,54.599998,78,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,,,,,,26.666667,5.663806,0.755702,25.0,3.567911,10.7,97,3.891180,46.035296,29.9,0.030420,0,1,0.025691,0,0.93,5.463832,1,4,3,0,1,1,0.579077,0,1,-2.605993,0,44300.0,85.0,69.4,,26000.0,23000.0,1000000.0,950000.0,2,1,0,0,7,0,2,2,1,1,1,0,0,3,1,1.0,3.0,2.0,12.0,3.0,3.0,3.0,2.0
67,Ukraine,2018,RUSSIA AND EURASIA,70.60,0,104.81,3.69,0.22,2310,1,4.0,0.67,66.39,0.000000,4.4008,3.7,,0.3,63,24,5.5,4.681,94.0,9.000000,76,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90.0,32.54,1.459854,5.702895,0.943358,27.0,2.866647,4.3,305,2.947390,44.265651,99.8,0.016890,0,0,0.013978,1,0.47,4.143135,1,3,3,0,1,1,0.684342,0,1,-1.712700,0,86600.0,5.2,93.9,3.1,8300.0,6100.0,56000.0,61000.0,4,1,1,0,10,1,2,2,1,1,2,1,2,2,0,1.0,3.0,2.0,12.0,3.0,4.0,0.0,2.0
68,Vietnam,2016,ASIA,64.27,0,184.69,6.14,0.20,2100,0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66,54,11.4,5.360,140.0,21.700001,95,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3.0,0.45,2.777778,9.076540,0.746622,31.0,3.407291,3.3,145,3.909754,60.553253,95.0,0.014510,0,1,0.048466,1,0.24,0.000000,1,0,2,0,0,0,0.559768,0,0,-0.991591,0,71900.0,3.6,85.4,2.1,5300.0,4700.0,94000.0,94000.0,3,1,0,0,1,0,0,1,0,0,0,0,1,2,0,,2.0,0.0,,3.0,3.0,2.0,3.0
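
Before dropping such columns, it can be worth confirming that they really are redundant. The following is a minimal sketch on a toy frame, not the author's code; the mapping `dummy_to_region` is a hypothetical helper that pairs each dummy column with the `Region` label it is assumed to encode.

```python
import pandas as pd

# Toy frame with the same layout as the real data: one label column plus dummies.
toy_df = pd.DataFrame({
    'Region': ['ASIA', 'AMERICAS', 'ASIA'],
    'Asia': [1, 0, 1],
    'Americas': [0, 1, 0],
})

# Hypothetical mapping from each dummy column to the Region label it encodes.
dummy_to_region = {'Asia': 'ASIA', 'Americas': 'AMERICAS'}

# A dummy is redundant if it equals the indicator rebuilt from Region.
redundant = all(
    (toy_df[dummy] == (toy_df['Region'] == label).astype(int)).all()
    for dummy, label in dummy_to_region.items()
)
print(redundant)

# Only drop the dummies once the check has passed.
if redundant:
    toy_df = toy_df.drop(columns=list(dummy_to_region))
print(toy_df.columns.tolist())
```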


Next, we find the features that are recorded for more than one year, that is, columns that share the same name and differ only in their four-digit year suffix.

In [5]:
year_cols = [col for col in training_df.columns if re.match(r'^.*_\d{4}$', col)]

# Remove year from each string
lst = [re.sub(r'_\d+$', '', s) for s in year_cols]

# Count occurrences of each string
counts = Counter(lst)

# Keep only the duplicates
prefixes = [s for s in counts if counts[s] > 1]
print(prefixes)

['AIDS_death', 'AIDS_Orph', 'Phys_secF']
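
To see why only these three names come back, the same idea can be run on a handful of column names taken from the table above. This is an illustrative sketch; the variable names (`cols`, `stems`, `repeated`) are chosen here and do not appear in the notebook.

```python
import re
from collections import Counter

# A few column names from the dataset, with and without a year suffix.
cols = ['GDPpc_2016', 'AIDS_death_2016', 'AIDS_death_2018',
        'Phys_secF_2014', 'Phys_secF_2019', 'Torture_2011', 'Poverty']

# Capture everything before a trailing underscore and four-digit year.
year_suffix = re.compile(r'^(.+)_(\d{4})$')
stems = []
for name in cols:
    match = year_suffix.match(name)
    if match:
        stems.append(match.group(1))

# Keep only the stems that occur with more than one year.
repeated = [stem for stem, n in Counter(stems).items() if n > 1]
print(repeated)
```

Columns such as `GDPpc_2016` or `Torture_2011` carry a year suffix but exist for a single year only, so their stem is counted once and filtered out.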


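For each of these features, we keep only the value from the year closest to the row's `Data_year`: columns dated 2016 or earlier fill the rows whose `Data_year` is 2016, and later columns fill the rows whose `Data_year` is 2018. The result is stored in a new column named after the feature without the year suffix.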

In [6]:
for prefix in prefixes:
    # Only the year-suffixed source columns, e.g. AIDS_death_2016 (not the merged column itself)
    for col in [c for c in training_df.columns if re.fullmatch(re.escape(prefix) + r'_\d{4}', c)]:
        year = int(col.rsplit('_', 1)[-1])  # extract the year
        # Columns dated 2016 or earlier serve the 2016 rows; later ones serve the 2018 rows
        target_year = 2016 if year <= 2016 else 2018
        rows = training_df['Data_year'] == target_year
        training_df.loc[rows, prefix] = training_df.loc[rows, col]

# The year-suffixed source columns are still present here; they are dropped in the next step.

training_df

Unnamed: 0,Country,Data_year,Region,KOF_Globalis,Work_rightCIRI,Trade_open,FDI,VDEM_Libdem,GDPpc_2016,Armedcon,Pol_terror,SLAVERY,SDGI_2016,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill_2018,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,AIDS_death_2016,AIDS_death_2018,AIDS_Orph_2016,AIDS_Orph_2018,Physrights_indx_2011,Extrajud_kill_2011,Pol_impris_2011,Torture_2011,Polrights_indx_2011,Free_assem_2011,Freemv_foreign_2011,Freemv_dom_2011,Free_speech_2011,Free_polit_2011,Relig_freeCIRI_2011,Work_rightCIRI_2011,Econ_right_F_2011,Pol_right_F_2011,Indep_judic_2011,Rape_prev_2018,Rape_report_2015,Rape_enclave_2015,Rape_compl_2018,Phys_secF_2014,Phys_secF_2019,Gender_equal_2015,Hum_traff_2019,AIDS_death,AIDS_Orph,Phys_secF
0,Afghanistan,2018,ASIA,38.57,1,55.92,0.48,0.24,570,1,5.0,2.22,36.50,,2.0206,40.9,26.799999,9.5,50,396,35.5,3.575,189.0,91.099998,66,46.990051,,9.27439,57.509158,22.612086,,27.700001,31.0400,31.900000,55.299999,,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,,6.39,,27.820000,31.248380,0.000000,0.12,0.425262,,,,,,6.250000,0.862703,0.835789,11.0,,6.5,76,,33.716116,37.4,0.134000,1,1,0.014739,1,0.77,7.928766,1,4,2,0,0,0,0.566547,0,1,-2.359148,0,12500.0,0.3,51.5,0.4,500.0,500.0,4700.0,5400.0,4,0,2,0,4,1,0,0,1,1,0,1,0,2,0,4.0,4.0,2.0,17.0,4.0,4.0,2.0,3.0,500.0,5400.0,4.0
1,Argentina,2018,AMERICAS,63.02,1,26.12,0.59,0.61,11970,0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67,52,6.3,6.574,24.0,12.500000,94,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42.0,49.36,14.705882,12.585822,0.861084,32.0,2.336129,5.5,147,2.876021,42.929606,99.5,0.005612,0,1,0.016886,0,0.26,0.000000,1,4,4,0,1,1,0.954939,0,0,-3.221412,1,74900.0,5.4,96.3,,1700.0,1700.0,30000.0,31000.0,5,1,2,0,12,2,2,2,1,2,2,1,2,3,1,1.0,4.0,0.0,9.0,3.0,2.0,4.0,1.0,1700.0,31000.0,2.0
2,Armenia,2018,RUSSIA AND EURASIA,67.09,1,75.92,3.21,0.23,3770,0,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62,25,7.4,4.350,45.0,14.100000,93,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,,,,,,10.526316,0.593045,0.841606,35.0,3.401135,1.8,164,3.958415,83.951898,99.6,0.005065,0,0,0.053967,1,0.13,0.000000,1,4,3,0,1,1,0.872840,0,0,-1.867683,0,4600.0,0.6,99.1,14.3,200.0,200.0,,,4,1,1,0,6,1,1,1,1,1,0,1,1,2,0,0.0,3.0,0.0,9.0,4.0,3.0,0.0,3.0,200.0,,3.0
3,Bangladesh,2016,ASIA,45.54,0,37.95,1.05,0.16,1330,1,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61,176,23.3,4.694,227.0,37.599998,89,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2.0,2.36,10.526316,3.526961,0.768760,25.0,2.852616,2.7,42,3.458803,80.262334,30.5,0.047270,0,1,0.052241,1,0.05,3.871201,1,4,2,0,1,0,0.541176,0,1,-2.852130,0,140000.0,0.2,66.7,2.1,500.0,1000.0,3800.0,4500.0,2,0,1,0,7,1,2,1,1,1,1,0,1,2,0,1.0,4.0,1.0,12.0,4.0,4.0,4.0,3.0,500.0,3800.0,4.0
4,Bolivia,2016,AMERICAS,57.74,1,56.40,1.54,0.40,3070,0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59,206,19.6,5.890,120.0,38.400002,94,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,,,,,,26.785714,5.260437,0.868154,34.0,3.185076,12.1,140,3.439012,43.898827,75.8,0.004741,0,0,0.027534,0,0.74,0.000000,1,4,4,0,1,1,0.786002,0,1,-2.344236,1,13500.0,4.3,88.9,4.8,1000.0,1000.0,20000.0,19000.0,5,1,1,1,11,1,2,2,1,2,2,1,1,3,0,3.0,3.0,0.0,10.0,3.0,3.0,0.0,3.0,1000.0,20000.0,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,2018,MIDDLE EAST AND NORTH AFRICA,64.86,1,91.44,1.48,0.66,3690,0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66,62,8.2,4.739,33.0,14.000000,98,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58.0,17.82,1.388889,5.817336,0.972199,38.0,3.484549,2.2,199,4.351138,61.795845,99.2,0.004354,0,0,0.009617,0,0.04,0.000000,1,4,3,0,1,1,0.800101,0,0,-1.737197,1,25000.0,1.2,58.3,,100.0,100.0,,,2,0,0,0,7,1,2,0,1,1,0,2,1,2,0,4.0,4.0,2.0,15.0,3.0,4.0,4.0,2.0,100.0,,4.0
66,Uganda,2018,SUBSAHARAN AFRICA,52.66,0,47.22,2.60,0.28,630,1,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50,343,18.7,3.931,161.0,54.599998,78,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,,,,,,26.666667,5.663806,0.755702,25.0,3.567911,10.7,97,3.891180,46.035296,29.9,0.030420,0,1,0.025691,0,0.93,5.463832,1,4,3,0,1,1,0.579077,0,1,-2.605993,0,44300.0,85.0,69.4,,26000.0,23000.0,1000000.0,950000.0,2,1,0,0,7,0,2,2,1,1,1,0,0,3,1,1.0,3.0,2.0,12.0,3.0,3.0,3.0,2.0,23000.0,950000.0,3.0
67,Ukraine,2018,RUSSIA AND EURASIA,70.60,0,104.81,3.69,0.22,2310,1,4.0,0.67,66.39,0.000000,4.4008,3.7,,0.3,63,24,5.5,4.681,94.0,9.000000,76,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90.0,32.54,1.459854,5.702895,0.943358,27.0,2.866647,4.3,305,2.947390,44.265651,99.8,0.016890,0,0,0.013978,1,0.47,4.143135,1,3,3,0,1,1,0.684342,0,1,-1.712700,0,86600.0,5.2,93.9,3.1,8300.0,6100.0,56000.0,61000.0,4,1,1,0,10,1,2,2,1,1,2,1,2,2,0,1.0,3.0,2.0,12.0,3.0,4.0,0.0,2.0,6100.0,61000.0,4.0
68,Vietnam,2016,ASIA,64.27,0,184.69,6.14,0.20,2100,0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66,54,11.4,5.360,140.0,21.700001,95,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3.0,0.45,2.777778,9.076540,0.746622,31.0,3.407291,3.3,145,3.909754,60.553253,95.0,0.014510,0,1,0.048466,1,0.24,0.000000,1,0,2,0,0,0,0.559768,0,0,-0.991591,0,71900.0,3.6,85.4,2.1,5300.0,4700.0,94000.0,94000.0,3,1,0,0,1,0,0,1,0,0,0,0,1,2,0,,2.0,0.0,,3.0,3.0,2.0,3.0,5300.0,94000.0,3.0
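
The same year-matching rule can be written without a loop for a single feature. The sketch below is an alternative illustration rather than the method used above; it relies on `Series.where` and on two rows whose values are copied from the table (Bangladesh with `Data_year` 2016, Uganda with `Data_year` 2018).

```python
import pandas as pd

toy_df = pd.DataFrame({
    'Country': ['Bangladesh', 'Uganda'],
    'Data_year': [2016, 2018],
    'AIDS_death_2016': [500.0, 26000.0],
    'AIDS_death_2018': [1000.0, 23000.0],
})

# Keep the 2016 value where Data_year is 2016, otherwise fall back to the 2018 value.
is_2016 = toy_df['Data_year'] == 2016
toy_df['AIDS_death'] = toy_df['AIDS_death_2016'].where(is_2016, toy_df['AIDS_death_2018'])
print(toy_df['AIDS_death'].tolist())
```

The merged values (500.0 for Bangladesh, 23000.0 for Uganda) agree with the `AIDS_death` column in the output above.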


Now that we have created the new columns, we remove the old year-specific ones.
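
Since a regular expression decides which columns are removed, it can be tried on a few names first to confirm that it selects only the year-specific duplicates. The following is a small sketch with the three prefixes found earlier; `names` and `to_drop` are illustrative variables, and `re.escape` is added here as a precaution that the notebook itself does not use.

```python
import re

prefixes = ['AIDS_death', 'AIDS_Orph', 'Phys_secF']

# A prefix, an underscore and exactly four digits, anchored at both ends.
pattern = re.compile(r'^(' + '|'.join(map(re.escape, prefixes)) + r')_\d{4}$')

names = ['AIDS_death_2016', 'Phys_secF_2019', 'Physrights_indx_2011',
         'AIDS_death', 'GDPpc_2016']
to_drop = [name for name in names if pattern.match(name)]
print(to_drop)
```

The merged column `AIDS_death` and unrelated year-suffixed columns such as `Physrights_indx_2011` and `GDPpc_2016` are left alone, which is the behaviour wanted from the drop.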

In [7]:
pattern = r'^(' + '|'.join(prefixes) + r')_\d{4}$'
date_cols = [col for col in training_df.columns if re.match(pattern, col)]
training_df.drop(columns=date_cols, inplace=True)
training_df

Unnamed: 0,Country,Data_year,Region,KOF_Globalis,Work_rightCIRI,Trade_open,FDI,VDEM_Libdem,GDPpc_2016,Armedcon,Pol_terror,SLAVERY,SDGI_2016,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill_2018,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx_2011,Extrajud_kill_2011,Pol_impris_2011,Torture_2011,Polrights_indx_2011,Free_assem_2011,Freemv_foreign_2011,Freemv_dom_2011,Free_speech_2011,Free_polit_2011,Relig_freeCIRI_2011,Work_rightCIRI_2011,Econ_right_F_2011,Pol_right_F_2011,Indep_judic_2011,Rape_prev_2018,Rape_report_2015,Rape_enclave_2015,Rape_compl_2018,Gender_equal_2015,Hum_traff_2019,AIDS_death,AIDS_Orph,Phys_secF
0,Afghanistan,2018,ASIA,38.57,1,55.92,0.48,0.24,570,1,5.0,2.22,36.50,,2.0206,40.9,26.799999,9.5,50,396,35.5,3.575,189.0,91.099998,66,46.990051,,9.27439,57.509158,22.612086,,27.700001,31.0400,31.900000,55.299999,,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,,6.39,,27.820000,31.248380,0.000000,0.12,0.425262,,,,,,6.250000,0.862703,0.835789,11.0,,6.5,76,,33.716116,37.4,0.134000,1,1,0.014739,1,0.77,7.928766,1,4,2,0,0,0,0.566547,0,1,-2.359148,0,12500.0,0.3,51.5,0.4,4,0,2,0,4,1,0,0,1,1,0,1,0,2,0,4.0,4.0,2.0,17.0,2.0,3.0,500.0,5400.0,4.0
1,Argentina,2018,AMERICAS,63.02,1,26.12,0.59,0.61,11970,0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67,52,6.3,6.574,24.0,12.500000,94,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42.0,49.36,14.705882,12.585822,0.861084,32.0,2.336129,5.5,147,2.876021,42.929606,99.5,0.005612,0,1,0.016886,0,0.26,0.000000,1,4,4,0,1,1,0.954939,0,0,-3.221412,1,74900.0,5.4,96.3,,5,1,2,0,12,2,2,2,1,2,2,1,2,3,1,1.0,4.0,0.0,9.0,4.0,1.0,1700.0,31000.0,2.0
2,Armenia,2018,RUSSIA AND EURASIA,67.09,1,75.92,3.21,0.23,3770,0,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62,25,7.4,4.350,45.0,14.100000,93,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,,,,,,10.526316,0.593045,0.841606,35.0,3.401135,1.8,164,3.958415,83.951898,99.6,0.005065,0,0,0.053967,1,0.13,0.000000,1,4,3,0,1,1,0.872840,0,0,-1.867683,0,4600.0,0.6,99.1,14.3,4,1,1,0,6,1,1,1,1,1,0,1,1,2,0,0.0,3.0,0.0,9.0,0.0,3.0,200.0,,3.0
3,Bangladesh,2016,ASIA,45.54,0,37.95,1.05,0.16,1330,1,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61,176,23.3,4.694,227.0,37.599998,89,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2.0,2.36,10.526316,3.526961,0.768760,25.0,2.852616,2.7,42,3.458803,80.262334,30.5,0.047270,0,1,0.052241,1,0.05,3.871201,1,4,2,0,1,0,0.541176,0,1,-2.852130,0,140000.0,0.2,66.7,2.1,2,0,1,0,7,1,2,1,1,1,1,0,1,2,0,1.0,4.0,1.0,12.0,4.0,3.0,500.0,3800.0,4.0
4,Bolivia,2016,AMERICAS,57.74,1,56.40,1.54,0.40,3070,0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59,206,19.6,5.890,120.0,38.400002,94,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,,,,,,26.785714,5.260437,0.868154,34.0,3.185076,12.1,140,3.439012,43.898827,75.8,0.004741,0,0,0.027534,0,0.74,0.000000,1,4,4,0,1,1,0.786002,0,1,-2.344236,1,13500.0,4.3,88.9,4.8,5,1,1,1,11,1,2,2,1,2,2,1,1,3,0,3.0,3.0,0.0,10.0,0.0,3.0,1000.0,20000.0,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,2018,MIDDLE EAST AND NORTH AFRICA,64.86,1,91.44,1.48,0.66,3690,0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66,62,8.2,4.739,33.0,14.000000,98,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58.0,17.82,1.388889,5.817336,0.972199,38.0,3.484549,2.2,199,4.351138,61.795845,99.2,0.004354,0,0,0.009617,0,0.04,0.000000,1,4,3,0,1,1,0.800101,0,0,-1.737197,1,25000.0,1.2,58.3,,2,0,0,0,7,1,2,0,1,1,0,2,1,2,0,4.0,4.0,2.0,15.0,4.0,2.0,100.0,,4.0
66,Uganda,2018,SUBSAHARAN AFRICA,52.66,0,47.22,2.60,0.28,630,1,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50,343,18.7,3.931,161.0,54.599998,78,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,,,,,,26.666667,5.663806,0.755702,25.0,3.567911,10.7,97,3.891180,46.035296,29.9,0.030420,0,1,0.025691,0,0.93,5.463832,1,4,3,0,1,1,0.579077,0,1,-2.605993,0,44300.0,85.0,69.4,,2,1,0,0,7,0,2,2,1,1,1,0,0,3,1,1.0,3.0,2.0,12.0,3.0,2.0,23000.0,950000.0,3.0
67,Ukraine,2018,RUSSIA AND EURASIA,70.60,0,104.81,3.69,0.22,2310,1,4.0,0.67,66.39,0.000000,4.4008,3.7,,0.3,63,24,5.5,4.681,94.0,9.000000,76,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90.0,32.54,1.459854,5.702895,0.943358,27.0,2.866647,4.3,305,2.947390,44.265651,99.8,0.016890,0,0,0.013978,1,0.47,4.143135,1,3,3,0,1,1,0.684342,0,1,-1.712700,0,86600.0,5.2,93.9,3.1,4,1,1,0,10,1,2,2,1,1,2,1,2,2,0,1.0,3.0,2.0,12.0,0.0,2.0,6100.0,61000.0,4.0
68,Vietnam,2016,ASIA,64.27,0,184.69,6.14,0.20,2100,0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66,54,11.4,5.360,140.0,21.700001,95,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3.0,0.45,2.777778,9.076540,0.746622,31.0,3.407291,3.3,145,3.909754,60.553253,95.0,0.014510,0,1,0.048466,1,0.24,0.000000,1,0,2,0,0,0,0.559768,0,0,-0.991591,0,71900.0,3.6,85.4,2.1,3,1,0,0,1,0,0,1,0,0,0,0,1,2,0,,2.0,0.0,,2.0,3.0,5300.0,94000.0,3.0


Next, we apply the same logic to *Work_rightCIRI* and *Work_rightCIRI_2011*. Assuming that *Work_rightCIRI* holds more recent data than *Work_rightCIRI_2011*, rows with *Data_year = 2016* take their value from *Work_rightCIRI_2011* and rows with *Data_year = 2018* take it from *Work_rightCIRI*.

In [8]:
# For 2016 rows take the 2011 value; for all other rows keep Work_rightCIRI
training_df['Work_rightCIRI1'] = training_df.apply(
    lambda row: row['Work_rightCIRI_2011'] if row['Data_year'] == 2016 else row['Work_rightCIRI'],
    axis=1
)

training_df.drop(['Work_rightCIRI_2011','Work_rightCIRI'], axis=1, inplace=True)
training_df.rename(columns={"Work_rightCIRI1": "Work_rightCIRI"}, inplace=True)
training_df

Unnamed: 0,Country,Data_year,Region,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc_2016,Armedcon,Pol_terror,SLAVERY,SDGI_2016,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill_2018,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx_2011,Extrajud_kill_2011,Pol_impris_2011,Torture_2011,Polrights_indx_2011,Free_assem_2011,Freemv_foreign_2011,Freemv_dom_2011,Free_speech_2011,Free_polit_2011,Relig_freeCIRI_2011,Econ_right_F_2011,Pol_right_F_2011,Indep_judic_2011,Rape_prev_2018,Rape_report_2015,Rape_enclave_2015,Rape_compl_2018,Gender_equal_2015,Hum_traff_2019,AIDS_death,AIDS_Orph,Phys_secF,Work_rightCIRI
0,Afghanistan,2018,ASIA,38.57,55.92,0.48,0.24,570,1,5.0,2.22,36.50,,2.0206,40.9,26.799999,9.5,50,396,35.5,3.575,189.0,91.099998,66,46.990051,,9.27439,57.509158,22.612086,,27.700001,31.0400,31.900000,55.299999,,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,,6.39,,27.820000,31.248380,0.000000,0.12,0.425262,,,,,,6.250000,0.862703,0.835789,11.0,,6.5,76,,33.716116,37.4,0.134000,1,1,0.014739,1,0.77,7.928766,1,4,2,0,0,0,0.566547,0,1,-2.359148,0,12500.0,0.3,51.5,0.4,4,0,2,0,4,1,0,0,1,1,0,0,2,0,4.0,4.0,2.0,17.0,2.0,3.0,500.0,5400.0,4.0,1
1,Argentina,2018,AMERICAS,63.02,26.12,0.59,0.61,11970,0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67,52,6.3,6.574,24.0,12.500000,94,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42.0,49.36,14.705882,12.585822,0.861084,32.0,2.336129,5.5,147,2.876021,42.929606,99.5,0.005612,0,1,0.016886,0,0.26,0.000000,1,4,4,0,1,1,0.954939,0,0,-3.221412,1,74900.0,5.4,96.3,,5,1,2,0,12,2,2,2,1,2,2,2,3,1,1.0,4.0,0.0,9.0,4.0,1.0,1700.0,31000.0,2.0,1
2,Armenia,2018,RUSSIA AND EURASIA,67.09,75.92,3.21,0.23,3770,0,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62,25,7.4,4.350,45.0,14.100000,93,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,,,,,,10.526316,0.593045,0.841606,35.0,3.401135,1.8,164,3.958415,83.951898,99.6,0.005065,0,0,0.053967,1,0.13,0.000000,1,4,3,0,1,1,0.872840,0,0,-1.867683,0,4600.0,0.6,99.1,14.3,4,1,1,0,6,1,1,1,1,1,0,1,2,0,0.0,3.0,0.0,9.0,0.0,3.0,200.0,,3.0,1
3,Bangladesh,2016,ASIA,45.54,37.95,1.05,0.16,1330,1,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61,176,23.3,4.694,227.0,37.599998,89,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2.0,2.36,10.526316,3.526961,0.768760,25.0,2.852616,2.7,42,3.458803,80.262334,30.5,0.047270,0,1,0.052241,1,0.05,3.871201,1,4,2,0,1,0,0.541176,0,1,-2.852130,0,140000.0,0.2,66.7,2.1,2,0,1,0,7,1,2,1,1,1,1,1,2,0,1.0,4.0,1.0,12.0,4.0,3.0,500.0,3800.0,4.0,0
4,Bolivia,2016,AMERICAS,57.74,56.40,1.54,0.40,3070,0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59,206,19.6,5.890,120.0,38.400002,94,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,,,,,,26.785714,5.260437,0.868154,34.0,3.185076,12.1,140,3.439012,43.898827,75.8,0.004741,0,0,0.027534,0,0.74,0.000000,1,4,4,0,1,1,0.786002,0,1,-2.344236,1,13500.0,4.3,88.9,4.8,5,1,1,1,11,1,2,2,1,2,2,1,3,0,3.0,3.0,0.0,10.0,0.0,3.0,1000.0,20000.0,3.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,2018,MIDDLE EAST AND NORTH AFRICA,64.86,91.44,1.48,0.66,3690,0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66,62,8.2,4.739,33.0,14.000000,98,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58.0,17.82,1.388889,5.817336,0.972199,38.0,3.484549,2.2,199,4.351138,61.795845,99.2,0.004354,0,0,0.009617,0,0.04,0.000000,1,4,3,0,1,1,0.800101,0,0,-1.737197,1,25000.0,1.2,58.3,,2,0,0,0,7,1,2,0,1,1,0,1,2,0,4.0,4.0,2.0,15.0,4.0,2.0,100.0,,4.0,1
66,Uganda,2018,SUBSAHARAN AFRICA,52.66,47.22,2.60,0.28,630,1,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50,343,18.7,3.931,161.0,54.599998,78,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,,,,,,26.666667,5.663806,0.755702,25.0,3.567911,10.7,97,3.891180,46.035296,29.9,0.030420,0,1,0.025691,0,0.93,5.463832,1,4,3,0,1,1,0.579077,0,1,-2.605993,0,44300.0,85.0,69.4,,2,1,0,0,7,0,2,2,1,1,1,0,3,1,1.0,3.0,2.0,12.0,3.0,2.0,23000.0,950000.0,3.0,0
67,Ukraine,2018,RUSSIA AND EURASIA,70.60,104.81,3.69,0.22,2310,1,4.0,0.67,66.39,0.000000,4.4008,3.7,,0.3,63,24,5.5,4.681,94.0,9.000000,76,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90.0,32.54,1.459854,5.702895,0.943358,27.0,2.866647,4.3,305,2.947390,44.265651,99.8,0.016890,0,0,0.013978,1,0.47,4.143135,1,3,3,0,1,1,0.684342,0,1,-1.712700,0,86600.0,5.2,93.9,3.1,4,1,1,0,10,1,2,2,1,1,2,2,2,0,1.0,3.0,2.0,12.0,0.0,2.0,6100.0,61000.0,4.0,0
68,Vietnam,2016,ASIA,64.27,184.69,6.14,0.20,2100,0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66,54,11.4,5.360,140.0,21.700001,95,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3.0,0.45,2.777778,9.076540,0.746622,31.0,3.407291,3.3,145,3.909754,60.553253,95.0,0.014510,0,1,0.048466,1,0.24,0.000000,1,0,2,0,0,0,0.559768,0,0,-0.991591,0,71900.0,3.6,85.4,2.1,3,1,0,0,1,0,0,1,0,0,0,1,2,0,,2.0,0.0,,2.0,3.0,5300.0,94000.0,3.0,0
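
The year-dependent selection above can also be written without a row-wise `apply`. The following is a minimal sketch on a toy frame, not the notebook's own cell: the values are made up and the column name `Work_rightCIRI_merged` is a hypothetical placeholder.

```python
import numpy as np
import pandas as pd

# Toy frame; the values are invented for illustration only
toy = pd.DataFrame({
    "Data_year": [2016, 2018, 2016],
    "Work_rightCIRI_2011": [0, 1, 2],
    "Work_rightCIRI": [1, 2, 0],
})

# Take the 2011 column for 2016 rows and the undated column otherwise
toy["Work_rightCIRI_merged"] = np.where(
    toy["Data_year"] == 2016,
    toy["Work_rightCIRI_2011"],
    toy["Work_rightCIRI"],
)

merged = toy["Work_rightCIRI_merged"].tolist()
print(merged)
```

`np.where` evaluates the condition for the whole column at once, which is usually faster than `apply(..., axis=1)` and arguably easier to read.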


Now that we have dealt with these issues, we need to run some final checks before proceeding to imputation. First, we remove the year suffixes from the column names.

In [9]:
# regex=True is required so that the pattern is treated as a regular expression
training_df.columns = training_df.columns.str.replace(r'_\d+$', '', regex=True)
training_df


Unnamed: 0,Country,Data_year,Region,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc,Armedcon,Pol_terror,SLAVERY,SDGI,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx,Extrajud_kill,Pol_impris,Torture,Polrights_indx,Free_assem,Freemv_foreign,Freemv_dom,Free_speech,Free_polit,Relig_freeCIRI,Econ_right_F,Pol_right_F,Indep_judic,Rape_prev,Rape_report,Rape_enclave,Rape_compl,Gender_equal,Hum_traff,AIDS_death,AIDS_Orph,Phys_secF,Work_rightCIRI
0,Afghanistan,2018,ASIA,38.57,55.92,0.48,0.24,570,1,5.0,2.22,36.50,,2.0206,40.9,26.799999,9.5,50,396,35.5,3.575,189.0,91.099998,66,46.990051,,9.27439,57.509158,22.612086,,27.700001,31.0400,31.900000,55.299999,,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,,6.39,,27.820000,31.248380,0.000000,0.12,0.425262,,,,,,6.250000,0.862703,0.835789,11.0,,6.5,76,,33.716116,37.4,0.134000,1,1,0.014739,1,0.77,7.928766,1,4,2,0,0,0,0.566547,0,1,-2.359148,0,12500.0,0.3,51.5,0.4,4,0,2,0,4,1,0,0,1,1,0,0,2,0,4.0,4.0,2.0,17.0,2.0,3.0,500.0,5400.0,4.0,1
1,Argentina,2018,AMERICAS,63.02,26.12,0.59,0.61,11970,0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67,52,6.3,6.574,24.0,12.500000,94,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42.0,49.36,14.705882,12.585822,0.861084,32.0,2.336129,5.5,147,2.876021,42.929606,99.5,0.005612,0,1,0.016886,0,0.26,0.000000,1,4,4,0,1,1,0.954939,0,0,-3.221412,1,74900.0,5.4,96.3,,5,1,2,0,12,2,2,2,1,2,2,2,3,1,1.0,4.0,0.0,9.0,4.0,1.0,1700.0,31000.0,2.0,1
2,Armenia,2018,RUSSIA AND EURASIA,67.09,75.92,3.21,0.23,3770,0,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62,25,7.4,4.350,45.0,14.100000,93,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,,,,,,10.526316,0.593045,0.841606,35.0,3.401135,1.8,164,3.958415,83.951898,99.6,0.005065,0,0,0.053967,1,0.13,0.000000,1,4,3,0,1,1,0.872840,0,0,-1.867683,0,4600.0,0.6,99.1,14.3,4,1,1,0,6,1,1,1,1,1,0,1,2,0,0.0,3.0,0.0,9.0,0.0,3.0,200.0,,3.0,1
3,Bangladesh,2016,ASIA,45.54,37.95,1.05,0.16,1330,1,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61,176,23.3,4.694,227.0,37.599998,89,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2.0,2.36,10.526316,3.526961,0.768760,25.0,2.852616,2.7,42,3.458803,80.262334,30.5,0.047270,0,1,0.052241,1,0.05,3.871201,1,4,2,0,1,0,0.541176,0,1,-2.852130,0,140000.0,0.2,66.7,2.1,2,0,1,0,7,1,2,1,1,1,1,1,2,0,1.0,4.0,1.0,12.0,4.0,3.0,500.0,3800.0,4.0,0
4,Bolivia,2016,AMERICAS,57.74,56.40,1.54,0.40,3070,0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59,206,19.6,5.890,120.0,38.400002,94,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,,,,,,26.785714,5.260437,0.868154,34.0,3.185076,12.1,140,3.439012,43.898827,75.8,0.004741,0,0,0.027534,0,0.74,0.000000,1,4,4,0,1,1,0.786002,0,1,-2.344236,1,13500.0,4.3,88.9,4.8,5,1,1,1,11,1,2,2,1,2,2,1,3,0,3.0,3.0,0.0,10.0,0.0,3.0,1000.0,20000.0,3.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,2018,MIDDLE EAST AND NORTH AFRICA,64.86,91.44,1.48,0.66,3690,0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66,62,8.2,4.739,33.0,14.000000,98,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58.0,17.82,1.388889,5.817336,0.972199,38.0,3.484549,2.2,199,4.351138,61.795845,99.2,0.004354,0,0,0.009617,0,0.04,0.000000,1,4,3,0,1,1,0.800101,0,0,-1.737197,1,25000.0,1.2,58.3,,2,0,0,0,7,1,2,0,1,1,0,1,2,0,4.0,4.0,2.0,15.0,4.0,2.0,100.0,,4.0,1
66,Uganda,2018,SUBSAHARAN AFRICA,52.66,47.22,2.60,0.28,630,1,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50,343,18.7,3.931,161.0,54.599998,78,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,,,,,,26.666667,5.663806,0.755702,25.0,3.567911,10.7,97,3.891180,46.035296,29.9,0.030420,0,1,0.025691,0,0.93,5.463832,1,4,3,0,1,1,0.579077,0,1,-2.605993,0,44300.0,85.0,69.4,,2,1,0,0,7,0,2,2,1,1,1,0,3,1,1.0,3.0,2.0,12.0,3.0,2.0,23000.0,950000.0,3.0,0
67,Ukraine,2018,RUSSIA AND EURASIA,70.60,104.81,3.69,0.22,2310,1,4.0,0.67,66.39,0.000000,4.4008,3.7,,0.3,63,24,5.5,4.681,94.0,9.000000,76,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90.0,32.54,1.459854,5.702895,0.943358,27.0,2.866647,4.3,305,2.947390,44.265651,99.8,0.016890,0,0,0.013978,1,0.47,4.143135,1,3,3,0,1,1,0.684342,0,1,-1.712700,0,86600.0,5.2,93.9,3.1,4,1,1,0,10,1,2,2,1,1,2,2,2,0,1.0,3.0,2.0,12.0,0.0,2.0,6100.0,61000.0,4.0,0
68,Vietnam,2016,ASIA,64.27,184.69,6.14,0.20,2100,0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66,54,11.4,5.360,140.0,21.700001,95,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3.0,0.45,2.777778,9.076540,0.746622,31.0,3.407291,3.3,145,3.909754,60.553253,95.0,0.014510,0,1,0.048466,1,0.24,0.000000,1,0,2,0,0,0,0.559768,0,0,-0.991591,0,71900.0,3.6,85.4,2.1,3,1,0,0,1,0,0,1,0,0,0,1,2,0,,2.0,0.0,,2.0,3.0,5300.0,94000.0,3.0,0
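
To see what the pattern `_\d+$` does and does not touch, here is a small sketch on a handful of column names taken from the table above. It also shows, as a hedged illustration, why the `regex` argument matters: with `regex=False` the pattern is treated as a literal string and nothing is replaced.

```python
import pandas as pd

cols = pd.Index(["GDPpc_2016", "Risk_masskill_2018", "Literacy_15_24yrs", "SLAVERY"])

# Regular-expression mode: strip an underscore followed by digits at the end of the name
cleaned_list = cols.str.replace(r"_\d+$", "", regex=True).tolist()

# Literal mode: the text "_\d+$" never occurs in a name, so nothing changes
literal_list = cols.str.replace(r"_\d+$", "", regex=False).tolist()

print(cleaned_list)
print(literal_list)
```

Because the pattern is anchored with `$`, a name such as `Literacy_15_24yrs`, where the digits are not at the end, is left unchanged.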


Next, if a country has two rows, one for *2016* and one for *2018*, and <b>one of the two</b> has missing values, we fill them from the other row.

In [10]:
# restrict to the two survey years and group the rows by country
mask = training_df['Data_year'].isin([2016, 2018])
grouped = training_df[mask].groupby('Country')

# columns in which missing values may be filled from the country's other row
fill_cols = [c for c in training_df.columns if c not in ['Country', 'Data_year']]

# fill missing values with values from the other row of the same country
for country, group in grouped:
    group = group.copy()  # work on a copy to avoid chained-assignment problems
    group[fill_cols] = group[fill_cols].ffill().bfill()
    training_df.update(group)
training_df['Data_year'] = training_df['Data_year'].astype(int)
training_df

Unnamed: 0,Country,Data_year,Region,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc,Armedcon,Pol_terror,SLAVERY,SDGI,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx,Extrajud_kill,Pol_impris,Torture,Polrights_indx,Free_assem,Freemv_foreign,Freemv_dom,Free_speech,Free_polit,Relig_freeCIRI,Econ_right_F,Pol_right_F,Indep_judic,Rape_prev,Rape_report,Rape_enclave,Rape_compl,Gender_equal,Hum_traff,AIDS_death,AIDS_Orph,Phys_secF,Work_rightCIRI
0,Afghanistan,2018,ASIA,38.57,55.92,0.48,0.24,570.0,1.0,5.0,2.22,36.50,,2.0206,40.9,26.799999,9.5,50.0,396.0,35.5,3.575,189.0,91.099998,66.0,46.990051,,9.27439,57.509158,22.612086,,27.700001,31.0400,31.900000,55.299999,,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,,6.39,,27.820000,31.248380,0.000000,0.12,0.425262,,,,,,6.250000,0.862703,0.835789,11.0,,6.5,76.0,,33.716116,37.4,0.134000,1.0,1.0,0.014739,1.0,0.77,7.928766,1.0,4.0,2.0,0.0,0.0,0.0,0.566547,0.0,1.0,-2.359148,0.0,12500.0,0.3,51.5,0.4,4.0,0.0,2.0,0.0,4.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,4.0,4.0,2.0,17.0,2.0,3.0,500.0,5400.0,4.0,1.0
1,Argentina,2018,AMERICAS,63.02,26.12,0.59,0.61,11970.0,0.0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67.0,52.0,6.3,6.574,24.0,12.500000,94.0,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42.0,49.36,14.705882,12.585822,0.861084,32.0,2.336129,5.5,147.0,2.876021,42.929606,99.5,0.005612,0.0,1.0,0.016886,0.0,0.26,0.000000,1.0,4.0,4.0,0.0,1.0,1.0,0.954939,0.0,0.0,-3.221412,1.0,74900.0,5.4,96.3,,5.0,1.0,2.0,0.0,12.0,2.0,2.0,2.0,1.0,2.0,2.0,2.0,3.0,1.0,1.0,4.0,0.0,9.0,4.0,1.0,1700.0,31000.0,2.0,1.0
2,Armenia,2018,RUSSIA AND EURASIA,67.09,75.92,3.21,0.23,3770.0,0.0,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62.0,25.0,7.4,4.350,45.0,14.100000,93.0,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,,,,,,10.526316,0.593045,0.841606,35.0,3.401135,1.8,164.0,3.958415,83.951898,99.6,0.005065,0.0,0.0,0.053967,1.0,0.13,0.000000,1.0,4.0,3.0,0.0,1.0,1.0,0.872840,0.0,0.0,-1.867683,0.0,4600.0,0.6,99.1,14.3,4.0,1.0,1.0,0.0,6.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,2.0,0.0,0.0,3.0,0.0,9.0,0.0,3.0,200.0,,3.0,1.0
3,Bangladesh,2016,ASIA,45.54,37.95,1.05,0.16,1330.0,1.0,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61.0,176.0,23.3,4.694,227.0,37.599998,89.0,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2.0,2.36,10.526316,3.526961,0.768760,25.0,2.852616,2.7,42.0,3.458803,80.262334,30.5,0.047270,0.0,1.0,0.052241,1.0,0.05,3.871201,1.0,4.0,2.0,0.0,1.0,0.0,0.541176,0.0,1.0,-2.852130,0.0,140000.0,0.2,66.7,2.1,2.0,0.0,1.0,0.0,7.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,0.0,1.0,4.0,1.0,12.0,4.0,3.0,500.0,3800.0,4.0,0.0
4,Bolivia,2016,AMERICAS,57.74,56.40,1.54,0.40,3070.0,0.0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59.0,206.0,19.6,5.890,120.0,38.400002,94.0,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,,,,,,26.785714,5.260437,0.868154,34.0,3.185076,12.1,140.0,3.439012,43.898827,75.8,0.004741,0.0,0.0,0.027534,0.0,0.74,0.000000,1.0,4.0,4.0,0.0,1.0,1.0,0.786002,0.0,1.0,-2.344236,1.0,13500.0,4.3,88.9,4.8,5.0,1.0,1.0,1.0,11.0,1.0,2.0,2.0,1.0,2.0,2.0,1.0,3.0,0.0,3.0,3.0,0.0,10.0,0.0,3.0,1000.0,20000.0,3.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,2018,MIDDLE EAST AND NORTH AFRICA,64.86,91.44,1.48,0.66,3690.0,0.0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66.0,62.0,8.2,4.739,33.0,14.000000,98.0,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58.0,17.82,1.388889,5.817336,0.972199,38.0,3.484549,2.2,199.0,4.351138,61.795845,99.2,0.004354,0.0,0.0,0.009617,0.0,0.04,0.000000,1.0,4.0,3.0,0.0,1.0,1.0,0.800101,0.0,0.0,-1.737197,1.0,25000.0,1.2,58.3,,2.0,0.0,0.0,0.0,7.0,1.0,2.0,0.0,1.0,1.0,0.0,1.0,2.0,0.0,4.0,4.0,2.0,15.0,4.0,2.0,100.0,,4.0,1.0
66,Uganda,2018,SUBSAHARAN AFRICA,52.66,47.22,2.60,0.28,630.0,1.0,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50.0,343.0,18.7,3.931,161.0,54.599998,78.0,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,,,,,,26.666667,5.663806,0.755702,25.0,3.567911,10.7,97.0,3.891180,46.035296,29.9,0.030420,0.0,1.0,0.025691,0.0,0.93,5.463832,1.0,4.0,3.0,0.0,1.0,1.0,0.579077,0.0,1.0,-2.605993,0.0,44300.0,85.0,69.4,,2.0,1.0,0.0,0.0,7.0,0.0,2.0,2.0,1.0,1.0,1.0,0.0,3.0,1.0,1.0,3.0,2.0,12.0,3.0,2.0,23000.0,950000.0,3.0,0.0
67,Ukraine,2018,RUSSIA AND EURASIA,70.60,104.81,3.69,0.22,2310.0,1.0,4.0,0.67,66.39,0.000000,4.4008,3.7,,0.3,63.0,24.0,5.5,4.681,94.0,9.000000,76.0,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90.0,32.54,1.459854,5.702895,0.943358,27.0,2.866647,4.3,305.0,2.947390,44.265651,99.8,0.016890,0.0,0.0,0.013978,1.0,0.47,4.143135,1.0,3.0,3.0,0.0,1.0,1.0,0.684342,0.0,1.0,-1.712700,0.0,86600.0,5.2,93.9,3.1,4.0,1.0,1.0,0.0,10.0,1.0,2.0,2.0,1.0,1.0,2.0,2.0,2.0,0.0,1.0,3.0,2.0,12.0,0.0,2.0,6100.0,61000.0,4.0,0.0
68,Vietnam,2016,ASIA,64.27,184.69,6.14,0.20,2100.0,0.0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66.0,54.0,11.4,5.360,140.0,21.700001,95.0,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3.0,0.45,2.777778,9.076540,0.746622,31.0,3.407291,3.3,145.0,3.909754,60.553253,95.0,0.014510,0.0,1.0,0.048466,1.0,0.24,0.000000,1.0,0.0,2.0,0.0,0.0,0.0,0.559768,0.0,0.0,-0.991591,0.0,71900.0,3.6,85.4,2.1,3.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.0,0.0,,2.0,0.0,,2.0,3.0,5300.0,94000.0,3.0,0.0
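
The within-country fill can also be expressed without an explicit loop. The sketch below uses a toy frame with made-up values and a single value column. It is an alternative formulation under those assumptions, not the notebook's own cell.

```python
import numpy as np
import pandas as pd

# Toy frame: countries A and B have two rows each, C has only one
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C"],
    "Data_year": [2016, 2018, 2016, 2018, 2018],
    "Poverty": [1.5, np.nan, np.nan, 4.0, np.nan],
})

value_cols = ["Poverty"]

# Within each country, fill forward and then backward, so each row borrows from the other year
toy[value_cols] = toy.groupby("Country")[value_cols].transform(
    lambda s: s.ffill().bfill()
)
print(toy)
```

A country with a single row, such as `C` here, has no other row to borrow from, so its missing value stays missing and is left for the imputation step.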


Now we run a final check. For each row, we calculate the percentage of missing values; if it exceeds 50%, we delete that row. We then apply the same check to the columns.
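
As a minimal sketch of this rule, the snippet below computes the percentage of missing values per row and per column on a toy frame and keeps only those at or below the threshold. The frame, the name `threshold`, and the choice to evaluate both percentages on the unfiltered frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, np.nan],
})
threshold = 50.0  # percent; rows or columns above this would be dropped

# isna() gives True/False, so the mean is the share of missing cells
row_missing_pct = toy.isna().mean(axis=1) * 100
col_missing_pct = toy.isna().mean(axis=0) * 100

# Keep rows and columns whose missing share does not exceed the threshold
kept = toy.loc[row_missing_pct <= threshold, col_missing_pct <= threshold]
print(kept)
```

In this toy frame, column `a` sits at exactly 50% and is kept, because the rule only removes entries with more than 50% missing.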

In [11]:
# count the missing values in each (Country, Data_year) row
missing_data_row = training_df.groupby(['Country', 'Data_year']).apply(lambda x: x.isnull().sum().sum())
# express the count as a percentage of the number of columns
missing_data_row_perc = round((missing_data_row / len(training_df.columns))*100, 2)
# keep only the rows that have at least one missing value
missing_data_row = missing_data_row[missing_data_row > 0].reset_index()
missing_data_row.columns = ['Country', 'Data_year', 'Missing_data']
missing_data_row['Missing_perc'] = missing_data_row_perc[missing_data_row_perc > 0].values
missing_data_row = missing_data_row[['Country', 'Data_year', 'Missing_data', 'Missing_perc']]

missing_data_row.sort_values(by=['Missing_perc'],ascending=False, inplace=True)
print(missing_data_row)

                             Country  Data_year  Missing_data  Missing_perc
14  Democratic Republic of the Congo       2018            17         15.45
0                        Afghanistan       2018            13         11.82
23                           Hungary       2018            12         10.91
22                           Hungary       2016            12         10.91
38                           Myanmar       2016            11         10.00
..                               ...        ...           ...           ...
11                             Chile       2018             1          0.91
12                          Colombia       2018             1          0.91
35                            Mexico       2018             1          0.91
1                          Argentina       2018             1          0.91
37                           Morocco       2018             1          0.91

[64 rows x 4 columns]


In [12]:
# count the missing values in each column
missing_data_col = training_df.isnull().sum().reset_index()
missing_data_col.columns = ['Column', 'Missing_data']
# express the count as a percentage of the number of rows
missing_data_col['Missing_perc'] = round(missing_data_col['Missing_data'] / len(training_df) * 100, 2)
# keep only the columns that have at least one missing value
missing_data_col = missing_data_col[missing_data_col['Missing_data'] > 0][['Column', 'Missing_data', 'Missing_perc']]

missing_data_col.sort_values(by=['Missing_perc'],ascending=False, inplace=True)
print(missing_data_col)

                Column  Missing_data  Missing_perc
85     Sexwrk_Syphilis            30         42.86
53    Fish_overexploit            26         37.14
51       Ocean_protect            25         35.71
107          AIDS_Orph            22         31.43
49        Ocean_biodiv            16         22.86
52     Ocean_fisheries            16         22.86
50         Ocean_clean            16         22.86
82         Sexwrk_size            11         15.71
106         AIDS_death            10         14.29
29            M_school             9         12.86
84       Sexwrk_condom             8         11.43
83          Sexwrk_HIV             7         10.00
16         Wasting_u5s             7         10.00
14        Stunting_u5s             7         10.00
39           Child_lab             6          8.57
44          Inequality             6          8.57
103         Rape_compl             5          7.14
100          Rape_prev             5          7.14
12             Poverty         

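As the output above shows, no row or column exceeds the 50% threshold: the most incomplete row (Democratic Republic of the Congo, 2018) is missing 15.45% of its values, and the most incomplete column (*Sexwrk_Syphilis*) is missing 42.86%. We therefore keep all rows and columns and proceed to imputation with *Decision Trees*.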
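
As a minimal sketch of tree-based iterative imputation, the snippet below fills two gaps in a toy frame. The columns `x` and `y`, their values, the small `max_iter`, and the `random_state` arguments are assumptions added to keep the sketch short and reproducible. The toy index is deliberately non-contiguous to show why the original index is passed back to the imputed frame before concatenating.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor

toy = pd.DataFrame(
    {
        "Country": ["A", "B", "C", "D", "E"],
        "x": [1.0, 2.0, np.nan, 4.0, 5.0],
        "y": [2.0, np.nan, 6.0, 8.0, 10.0],
    },
    index=[0, 1, 2, 4, 5],  # non-contiguous on purpose
)

# The imputer works on numeric columns only
numeric = toy.select_dtypes(include="number")

imputer = IterativeImputer(
    estimator=DecisionTreeRegressor(random_state=0),
    max_iter=10,
    random_state=0,
)

# Reuse the original index so that the rows line up in the concat
imputed = pd.DataFrame(
    imputer.fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,
)
result = pd.concat([toy[["Country"]], imputed], axis=1)
print(result)
```

Without `index=numeric.index`, the imputed frame would receive a fresh `0..n-1` index, and `pd.concat(..., axis=1)` would align on labels and introduce rows of missing values wherever the two indexes differ.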

In [13]:
# keep only the numeric columns, since the imputer cannot handle text
numeric_df = training_df.select_dtypes(include='number')

# impute missing values iteratively, modelling each column with a decision tree
imputer = IterativeImputer(estimator=DecisionTreeRegressor(), max_iter=10000)
# reuse the original index so that the rows line up in the concat below
imputed_df = pd.DataFrame(imputer.fit_transform(numeric_df), columns=numeric_df.columns, index=numeric_df.index)

# concatenate the non-numeric columns with the imputed numeric columns
imputed_training_df = pd.concat([training_df[['Country', 'Region']], imputed_df], axis=1)
imputed_training_df



Unnamed: 0,Country,Region,Data_year,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc,Armedcon,Pol_terror,SLAVERY,SDGI,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx,Extrajud_kill,Pol_impris,Torture,Polrights_indx,Free_assem,Freemv_foreign,Freemv_dom,Free_speech,Free_polit,Relig_freeCIRI,Econ_right_F,Pol_right_F,Indep_judic,Rape_prev,Rape_report,Rape_enclave,Rape_compl,Gender_equal,Hum_traff,AIDS_death,AIDS_Orph,Phys_secF,Work_rightCIRI
0,Afghanistan,ASIA,2018.0,38.57,55.92,0.48,0.24,570.0,1.0,5.0,2.22,36.50,10.910000,2.0206,40.9,26.799999,9.5,50.0,396.0,35.5,3.575,189.0,91.099998,66.0,46.990051,71.85668,9.27439,57.509158,22.612086,31.198909,27.700001,31.0400,31.900000,55.299999,2.888889,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,3.140831,6.39,0.0,27.820000,31.248380,0.000000,0.12,0.425262,67.27,46.50,0.000000,2.0,13.57,6.250000,0.862703,0.835789,11.0,3.567911,6.5,76.0,3.764837,33.716116,37.4,0.134000,1.0,1.0,0.014739,1.0,0.77,7.928766,1.0,4.0,2.0,0.0,0.0,0.0,0.566547,0.0,1.0,-2.359148,0.0,12500.0,0.3,51.5,0.4,4.0,0.0,2.0,0.0,4.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,4.0,4.0,2.0,17.0,2.0,3.0,500.0,5400.0,4.0,1.0
1,Argentina,AMERICAS,2018.0,63.02,26.12,0.59,0.61,11970.0,0.0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67.0,52.0,6.3,6.574,24.0,12.500000,94.0,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42.0,49.36,14.705882,12.585822,0.861084,32.0,2.336129,5.5,147.0,2.876021,42.929606,99.5,0.005612,0.0,1.0,0.016886,0.0,0.26,0.000000,1.0,4.0,4.0,0.0,1.0,1.0,0.954939,0.0,0.0,-3.221412,1.0,74900.0,5.4,96.3,1.3,5.0,1.0,2.0,0.0,12.0,2.0,2.0,2.0,1.0,2.0,2.0,2.0,3.0,1.0,1.0,4.0,0.0,9.0,4.0,1.0,1700.0,31000.0,2.0,1.0
2,Armenia,RUSSIA AND EURASIA,2018.0,67.09,75.92,3.21,0.23,3770.0,0.0,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62.0,25.0,7.4,4.350,45.0,14.100000,93.0,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,87.28,71.58,0.000000,67.0,63.52,10.526316,0.593045,0.841606,35.0,3.401135,1.8,164.0,3.958415,83.951898,99.6,0.005065,0.0,0.0,0.053967,1.0,0.13,0.000000,1.0,4.0,3.0,0.0,1.0,1.0,0.872840,0.0,0.0,-1.867683,0.0,4600.0,0.6,99.1,14.3,4.0,1.0,1.0,0.0,6.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,2.0,0.0,0.0,3.0,0.0,9.0,0.0,3.0,200.0,1000.0,3.0,1.0
3,Bangladesh,ASIA,2016.0,45.54,37.95,1.05,0.16,1330.0,1.0,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61.0,176.0,23.3,4.694,227.0,37.599998,89.0,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2.0,2.36,10.526316,3.526961,0.768760,25.0,2.852616,2.7,42.0,3.458803,80.262334,30.5,0.047270,0.0,1.0,0.052241,1.0,0.05,3.871201,1.0,4.0,2.0,0.0,1.0,0.0,0.541176,0.0,1.0,-2.852130,0.0,140000.0,0.2,66.7,2.1,2.0,0.0,1.0,0.0,7.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,0.0,1.0,4.0,1.0,12.0,4.0,3.0,500.0,3800.0,4.0,0.0
4,Bolivia,AMERICAS,2016.0,57.74,56.40,1.54,0.40,3070.0,0.0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59.0,206.0,19.6,5.890,120.0,38.400002,94.0,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,71.21,50.15,0.000000,64.0,45.76,26.785714,5.260437,0.868154,34.0,3.185076,12.1,140.0,3.439012,43.898827,75.8,0.004741,0.0,0.0,0.027534,0.0,0.74,0.000000,1.0,4.0,4.0,0.0,1.0,1.0,0.786002,0.0,1.0,-2.344236,1.0,13500.0,4.3,88.9,4.8,5.0,1.0,1.0,1.0,11.0,1.0,2.0,2.0,1.0,2.0,2.0,1.0,3.0,0.0,3.0,3.0,0.0,10.0,0.0,3.0,1000.0,20000.0,3.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,MIDDLE EAST AND NORTH AFRICA,2018.0,64.86,91.44,1.48,0.66,3690.0,0.0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66.0,62.0,8.2,4.739,33.0,14.000000,98.0,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58.0,17.82,1.388889,5.817336,0.972199,38.0,3.484549,2.2,199.0,4.351138,61.795845,99.2,0.004354,0.0,0.0,0.009617,0.0,0.04,0.000000,1.0,4.0,3.0,0.0,1.0,1.0,0.800101,0.0,0.0,-1.737197,1.0,25000.0,1.2,58.3,14.3,2.0,0.0,0.0,0.0,7.0,1.0,2.0,0.0,1.0,1.0,0.0,1.0,2.0,0.0,4.0,4.0,2.0,15.0,4.0,2.0,100.0,8100.0,4.0,1.0
66,Uganda,SUBSAHARAN AFRICA,2018.0,52.66,47.22,2.60,0.28,630.0,1.0,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50.0,343.0,18.7,3.931,161.0,54.599998,78.0,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,2.888889,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,79.20,56.92,0.000000,30.0,13.57,26.666667,5.663806,0.755702,25.0,3.567911,10.7,97.0,3.891180,46.035296,29.9,0.030420,0.0,1.0,0.025691,0.0,0.93,5.463832,1.0,4.0,3.0,0.0,1.0,1.0,0.579077,0.0,1.0,-2.605993,0.0,44300.0,85.0,69.4,0.6,2.0,1.0,0.0,0.0,7.0,0.0,2.0,2.0,1.0,1.0,1.0,0.0,3.0,1.0,1.0,3.0,2.0,12.0,3.0,2.0,23000.0,950000.0,3.0,0.0
67,Ukraine,RUSSIA AND EURASIA,2018.0,70.60,104.81,3.69,0.22,2310.0,1.0,4.0,0.67,66.39,0.000000,4.4008,3.7,1.170000,0.3,63.0,24.0,5.5,4.681,94.0,9.000000,76.0,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90.0,32.54,1.459854,5.702895,0.943358,27.0,2.866647,4.3,305.0,2.947390,44.265651,99.8,0.016890,0.0,0.0,0.013978,1.0,0.47,4.143135,1.0,3.0,3.0,0.0,1.0,1.0,0.684342,0.0,1.0,-1.712700,0.0,86600.0,5.2,93.9,3.1,4.0,1.0,1.0,0.0,10.0,1.0,2.0,2.0,1.0,1.0,2.0,2.0,2.0,0.0,1.0,3.0,2.0,12.0,0.0,2.0,6100.0,61000.0,4.0,0.0
68,Vietnam,ASIA,2016.0,64.27,184.69,6.14,0.20,2100.0,0.0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66.0,54.0,11.4,5.360,140.0,21.700001,95.0,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3.0,0.45,2.777778,9.076540,0.746622,31.0,3.407291,3.3,145.0,3.909754,60.553253,95.0,0.014510,0.0,1.0,0.048466,1.0,0.24,0.000000,1.0,0.0,2.0,0.0,0.0,0.0,0.559768,0.0,0.0,-0.991591,0.0,71900.0,3.6,85.4,2.1,3.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,2.0,0.0,11.0,2.0,3.0,5300.0,94000.0,3.0,0.0


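The imputation took almost 40 minutes because of the large number of iterations, so we first save the imputed, unscaled data to a file. We then rescale the features with `MinMaxScaler` and save the scaled data as well, so that either version can be retrieved later without rerunning the imputation.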

In [14]:
imputed_training_df.to_excel('imputed_unscaled_training.xlsx', index=False)

In [15]:
# scale every numeric feature to [0, 1]; the target (SLAVERY) and Data_year stay as they are
cols_to_normalize = imputed_training_df.select_dtypes(include='number').columns.drop(['SLAVERY', 'Data_year'])

imputed_training_df[cols_to_normalize] = MinMaxScaler().fit_transform(imputed_training_df[cols_to_normalize])
imputed_training_df

Unnamed: 0,Country,Region,Data_year,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc,Armedcon,Pol_terror,SLAVERY,SDGI,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx,Extrajud_kill,Pol_impris,Torture,Polrights_indx,Free_assem,Freemv_foreign,Freemv_dom,Free_speech,Free_polit,Relig_freeCIRI,Econ_right_F,Pol_right_F,Indep_judic,Rape_prev,Rape_report,Rape_enclave,Rape_compl,Gender_equal,Hum_traff,AIDS_death,AIDS_Orph,Phys_secF,Work_rightCIRI
0,Afghanistan,ASIA,2018.0,0.000000,0.121572,0.406324,0.194030,0.004849,1.0,1.00,2.22,0.114657,0.141358,0.239843,0.846320,0.490714,0.436019,0.18750,0.484587,0.775281,0.000000,0.222329,0.833176,0.352941,0.000000,0.220364,0.146244,0.655584,0.000000,0.121232,0.502935,0.244864,0.199765,0.060924,0.445285,0.368071,0.109631,0.225746,0.001839,0.219149,0.296855,0.233757,0.053692,0.000000,0.059137,0.287684,0.000000,0.250,0.029766,0.000000,0.057648,0.000000,0.010638,0.146550,0.092593,0.006239,0.634798,0.000000,0.359124,0.069845,0.103371,0.339327,0.000000,0.359263,0.972799,1.0,1.0,0.188840,1.0,0.820225,1.000000,1.0,1.00,0.50,0.0,0.0,0.0,0.491216,0.0,1.0,0.361693,0.0,0.008706,0.003529,0.436702,0.016327,0.500,0.0,1.0,0.0,0.333333,0.5,0.0,0.0,0.5,0.5,0.0,0.0,0.5,0.0,1.00,1.0,1.0,1.0,0.333333,0.666667,0.005413,0.003387,1.0,0.5
1,Argentina,AMERICAS,2018.0,0.535832,0.018650,0.407511,0.746269,0.225950,0.0,0.25,0.13,0.781910,0.000000,0.614533,0.138528,0.073330,0.042654,0.71875,0.060419,0.119101,0.830288,0.023390,0.092366,0.901961,0.985634,0.872015,1.000000,0.212390,0.865606,0.570649,0.677104,0.033574,0.957697,0.981092,0.173410,0.997783,1.000000,0.467704,0.320172,0.093617,0.201560,0.197044,0.783479,0.215003,0.503197,0.980227,0.117881,0.075,0.358160,0.711502,0.464002,0.176471,0.436170,0.533636,0.217865,0.122888,0.695511,0.283784,0.000000,0.058758,0.262921,0.104765,0.157257,0.994882,0.028950,0.0,1.0,0.217296,0.0,0.247191,0.000000,1.0,1.00,1.00,0.0,1.0,1.0,0.979599,0.0,0.0,0.068583,1.0,0.053236,0.063529,0.957027,0.053061,0.625,0.5,1.0,0.0,1.000000,1.0,1.0,1.0,0.5,1.0,1.0,1.0,1.0,0.5,0.25,1.0,0.0,0.2,0.666667,0.000000,0.021651,0.023095,0.0,0.5
2,Armenia,RUSSIA AND EURASIA,2018.0,0.625027,0.190647,0.435787,0.179104,0.066912,0.0,0.50,0.34,0.750880,0.031614,0.388528,0.411255,0.088646,0.184834,0.56250,0.027127,0.143820,0.214563,0.048710,0.107446,0.882353,0.996365,0.560082,0.445839,0.676455,0.860320,0.712650,0.170254,0.298757,0.876616,1.000000,0.064213,1.000000,0.849861,0.552045,0.313745,0.082979,0.516489,0.510507,0.553191,0.207636,0.151838,0.996547,0.226646,0.075,0.128709,0.764616,0.700231,0.000000,0.702128,0.686783,0.155945,0.003556,0.648759,0.324324,0.310501,0.017738,0.301124,0.390413,0.857428,0.995906,0.024929,0.0,0.0,0.708861,1.0,0.101124,0.000000,1.0,1.00,0.75,0.0,1.0,1.0,0.876364,0.0,0.0,0.528756,0.0,0.003069,0.007059,0.989547,0.583673,0.500,0.5,0.5,0.0,0.500000,0.5,0.5,0.5,0.5,0.5,0.0,0.5,0.5,0.0,0.00,0.5,0.0,0.2,0.000000,0.666667,0.001353,0.000000,0.5,0.5
3,Bangladesh,ASIA,2016.0,0.152750,0.059508,0.412476,0.074627,0.019589,1.0,0.75,0.95,0.288952,0.565561,0.592475,0.742424,0.291595,0.663507,0.53125,0.213317,0.501124,0.309801,0.268146,0.328935,0.803922,0.621570,0.765825,0.215217,0.223684,0.659472,0.282589,0.352250,0.022677,0.537015,0.724790,0.141545,0.552106,0.047337,0.518092,0.047622,0.272340,0.128867,0.164575,0.093867,0.012726,0.173681,0.298899,0.000000,0.600,0.025539,0.905617,0.148347,0.333333,0.010638,0.025308,0.155945,0.032749,0.473914,0.189189,0.150581,0.027716,0.026966,0.258563,0.794455,0.288639,0.335201,0.0,1.0,0.685980,1.0,0.011236,0.488248,1.0,1.00,0.50,0.0,1.0,0.0,0.459313,0.0,1.0,0.194113,0.0,0.099693,0.002353,0.613240,0.085714,0.250,0.0,0.5,0.0,0.583333,0.5,1.0,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.25,1.0,0.5,0.5,0.666667,0.666667,0.005413,0.002156,1.0,0.0
4,Bolivia,AMERICAS,2016.0,0.420118,0.123230,0.417764,0.432836,0.053336,0.0,0.25,0.44,0.576144,0.099767,0.227632,0.352814,0.282022,0.061611,0.46875,0.250308,0.417978,0.640919,0.139137,0.336475,0.901961,0.981059,0.490348,0.528036,0.548398,0.694434,0.696551,1.000000,0.002454,0.415981,0.789916,0.315639,0.894678,0.733015,0.465240,0.185897,0.561702,0.102508,0.273237,0.462078,0.093101,0.817528,0.959978,0.113363,0.225,0.122981,0.150554,0.151166,0.000000,0.670213,0.494700,0.396825,0.049998,0.712479,0.310811,0.247509,0.131929,0.247191,0.253340,0.173799,0.752303,0.022547,0.0,0.0,0.358453,0.0,0.786517,0.000000,1.0,1.00,1.00,0.0,1.0,1.0,0.767170,0.0,1.0,0.366761,1.0,0.009420,0.050588,0.871080,0.195918,0.625,0.5,0.5,0.5,0.916667,0.5,1.0,1.0,0.5,1.0,1.0,0.5,1.0,0.0,0.75,0.5,0.0,0.3,0.000000,0.666667,0.012179,0.014627,0.5,0.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,MIDDLE EAST AND NORTH AFRICA,2018.0,0.576156,0.244249,0.417116,0.820896,0.065361,0.0,0.50,0.22,0.743178,0.025784,0.212034,0.179654,0.073330,0.118483,0.68750,0.072750,0.161798,0.322259,0.034242,0.106503,0.980392,0.949151,0.964584,0.672812,0.261992,0.594337,0.190932,0.573386,0.550443,0.901293,0.951681,0.169694,1.000000,1.000000,0.266716,0.124217,0.044681,0.468029,0.359571,0.551439,0.206966,0.278636,0.943357,0.278706,0.125,0.186723,0.764616,0.736869,0.000000,0.606383,0.192516,0.020576,0.055539,0.962210,0.364865,0.334820,0.022173,0.379775,0.494054,0.479267,0.991812,0.019702,0.0,0.0,0.120936,0.0,0.000000,0.000000,1.0,1.00,0.75,0.0,1.0,1.0,0.784899,0.0,0.0,0.573112,1.0,0.017626,0.014118,0.515679,0.583673,0.250,0.0,0.0,0.0,0.583333,0.5,1.0,0.0,0.5,0.5,0.0,0.5,0.5,0.0,1.00,1.0,1.0,0.8,0.666667,0.333333,0.000000,0.005466,1.0,0.5
66,Uganda,SUBSAHARAN AFRICA,2018.0,0.308788,0.091524,0.429204,0.253731,0.006012,1.0,0.50,0.76,0.271347,0.430682,0.239651,0.701299,0.465824,0.189573,0.18750,0.419236,0.397753,0.098560,0.188570,0.489161,0.588235,0.762476,0.763993,0.194921,0.642704,0.542011,0.949100,0.645793,0.007955,0.049354,0.558824,0.445285,0.092711,0.000000,0.101009,0.020433,0.346808,0.100970,0.327824,0.195369,0.049565,0.498135,0.206582,0.005620,0.150,0.004809,0.455865,0.324622,0.000000,0.308511,0.146550,0.395062,0.054012,0.442572,0.189189,0.359124,0.116408,0.150562,0.372669,0.210265,0.282497,0.211327,0.0,1.0,0.334020,0.0,1.000000,0.689115,1.0,1.00,0.75,0.0,1.0,1.0,0.506972,0.0,1.0,0.277783,0.0,0.031399,1.000000,0.644599,0.024490,0.250,0.5,0.0,0.0,0.583333,0.0,1.0,1.0,0.5,0.5,0.5,0.0,1.0,0.5,0.25,0.5,1.0,0.5,0.500000,0.333333,0.309878,0.730562,0.5,0.0
67,Ukraine,RUSSIA AND EURASIA,2018.0,0.701950,0.290426,0.440967,0.164179,0.038596,1.0,0.75,0.67,0.772447,0.000000,0.591736,0.041126,0.000000,0.000000,0.59375,0.025894,0.101124,0.306202,0.107789,0.059378,0.549020,0.995707,0.928070,0.723782,0.331428,0.853649,0.766706,0.197652,0.066519,0.951821,0.920168,0.183475,1.000000,1.000000,0.091286,0.510723,0.051064,0.306440,0.390996,0.516896,0.036169,0.000533,0.858503,0.147516,0.050,0.493136,0.824226,0.700231,0.050000,0.946809,0.351720,0.021627,0.054401,0.892986,0.216216,0.154672,0.045455,0.617978,0.123599,0.180060,0.997953,0.111861,0.0,0.0,0.178759,1.0,0.483146,0.522545,1.0,0.75,0.75,0.0,1.0,1.0,0.639337,0.0,1.0,0.581439,0.0,0.061586,0.061176,0.929152,0.126531,0.500,0.5,0.5,0.0,0.833333,0.5,1.0,1.0,0.5,0.5,1.0,1.0,0.5,0.0,0.25,0.5,1.0,0.5,0.000000,0.333333,0.081191,0.046189,1.0,0.0
68,Vietnam,ASIA,2016.0,0.563226,0.566312,0.467408,0.134328,0.034523,0.0,0.50,0.15,0.579445,0.041850,0.765686,0.380952,0.188206,0.255924,0.68750,0.062885,0.233708,0.494186,0.163251,0.179076,0.921569,0.945137,0.943629,0.404707,0.166034,0.738428,0.871947,0.436399,0.072745,0.741481,0.949580,0.124309,0.988914,0.430898,0.563932,0.125545,0.348936,0.050872,0.310750,0.578348,0.125921,0.265583,0.599109,0.001380,0.550,0.152507,0.457394,0.324622,0.076923,0.021277,0.004651,0.041152,0.087970,0.420777,0.270270,0.312296,0.034368,0.258427,0.377571,0.458058,0.948823,0.094364,0.0,1.0,0.635938,1.0,0.224719,0.000000,1.0,0.00,0.50,0.0,0.0,0.0,0.482691,0.0,0.0,0.826566,0.0,0.051095,0.042353,0.830430,0.085714,0.375,0.5,0.0,0.0,0.083333,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.5,0.0,0.25,0.0,0.0,0.4,0.333333,0.666667,0.070365,0.071594,0.5,0.0
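The scaling step maps each feature column to the range [0, 1] through $(x - x_{min}) / (x_{max} - x_{min})$. The sketch below, which assumes two small hypothetical frames (`toy_train` and `toy_new`, with made-up values), checks this formula against `MinMaxScaler` and shows one possible way to reuse the fitted scaler on rows that were not part of the fit. Whether new data should be scaled this way in this assignment is an assumption, not something stated above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy frames (values are made up for illustration).
toy_train = pd.DataFrame(
    {
        "SLAVERY": [2.0, 0.5, 1.0],
        "GDPpc": [500.0, 1500.0, 2500.0],
        "Poverty": [40.0, 10.0, 0.0],
    }
)
toy_new = pd.DataFrame({"SLAVERY": [0.7], "GDPpc": [3000.0], "Poverty": [20.0]})

# Scale the features only; the target column is left untouched.
feature_cols = toy_train.columns.drop("SLAVERY")
scaler = MinMaxScaler().fit(toy_train[feature_cols])

scaled_train = toy_train.copy()
scaled_train[feature_cols] = scaler.transform(toy_train[feature_cols])

# The same result computed by hand, column by column.
col_min = toy_train[feature_cols].min()
col_max = toy_train[feature_cols].max()
manual = (toy_train[feature_cols] - col_min) / (col_max - col_min)
print(np.allclose(scaled_train[feature_cols], manual))

# Reusing the fitted scaler: new rows are mapped with the training min and max,
# so values outside the training range fall outside [0, 1].
scaled_new = toy_new.copy()
scaled_new[feature_cols] = scaler.transform(toy_new[feature_cols])
print(scaled_new)
```

Keeping the fitted `scaler` object around, rather than calling `fit_transform` again on new data, ensures that both sets of rows are expressed on the same scale.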


In [16]:
imputed_training_df.to_excel('imputed_training.xlsx', index=False)

## $2^{nd}$ Question : Slavery Estimation Using All Features <a class="anchor" id="q2"></a>

In [17]:
imputed_training_df = pd.read_excel('imputed_training.xlsx')
imputed_training_df

Unnamed: 0,Country,Region,Data_year,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc,Armedcon,Pol_terror,SLAVERY,SDGI,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx,Extrajud_kill,Pol_impris,Torture,Polrights_indx,Free_assem,Freemv_foreign,Freemv_dom,Free_speech,Free_polit,Relig_freeCIRI,Econ_right_F,Pol_right_F,Indep_judic,Rape_prev,Rape_report,Rape_enclave,Rape_compl,Gender_equal,Hum_traff,AIDS_death,AIDS_Orph,Phys_secF,Work_rightCIRI
0,Afghanistan,ASIA,2018,0.000000,0.121572,0.406324,0.194030,0.004849,1,1.00,2.22,0.114657,0.141358,0.239843,0.846320,0.490714,0.436019,0.18750,0.484587,0.775281,0.000000,0.222329,0.833176,0.352941,0.000000,0.220364,0.146244,0.655584,0.000000,0.121232,0.502935,0.244864,0.199765,0.060924,0.445285,0.368071,0.109631,0.225746,0.001839,0.219149,0.296855,0.233757,0.053692,0.000000,0.059137,0.287684,0.000000,0.250,0.029766,0.000000,0.057648,0.000000,0.010638,0.146550,0.092593,0.006239,0.634798,0.000000,0.359124,0.069845,0.103371,0.339327,0.000000,0.359263,0.972799,1,1,0.188840,1,0.820225,1.000000,1,1.00,0.50,0,0,0,0.491216,0,1,0.361693,0,0.008706,0.003529,0.436702,0.016327,0.500,0.0,1.0,0.0,0.333333,0.5,0.0,0.0,0.5,0.5,0.0,0.0,0.5,0.0,1.00,1.0,1.0,1.0,0.333333,0.666667,0.005413,0.003387,1.0,0.5
1,Argentina,AMERICAS,2018,0.535832,0.018650,0.407511,0.746269,0.225950,0,0.25,0.13,0.781910,0.000000,0.614533,0.138528,0.073330,0.042654,0.71875,0.060419,0.119101,0.830288,0.023390,0.092366,0.901961,0.985634,0.872015,1.000000,0.212390,0.865606,0.570649,0.677104,0.033574,0.957697,0.981092,0.173410,0.997783,1.000000,0.467704,0.320172,0.093617,0.201560,0.197044,0.783479,0.215003,0.503197,0.980227,0.117881,0.075,0.358160,0.711502,0.464002,0.176471,0.436170,0.533636,0.217865,0.122888,0.695511,0.283784,0.000000,0.058758,0.262921,0.104765,0.157257,0.994882,0.028950,0,1,0.217296,0,0.247191,0.000000,1,1.00,1.00,0,1,1,0.979599,0,0,0.068583,1,0.053236,0.063529,0.957027,0.053061,0.625,0.5,1.0,0.0,1.000000,1.0,1.0,1.0,0.5,1.0,1.0,1.0,1.0,0.5,0.25,1.0,0.0,0.2,0.666667,0.000000,0.021651,0.023095,0.0,0.5
2,Armenia,RUSSIA AND EURASIA,2018,0.625027,0.190647,0.435787,0.179104,0.066912,0,0.50,0.34,0.750880,0.031614,0.388528,0.411255,0.088646,0.184834,0.56250,0.027127,0.143820,0.214563,0.048710,0.107446,0.882353,0.996365,0.560082,0.445839,0.676455,0.860320,0.712650,0.170254,0.298757,0.876616,1.000000,0.064213,1.000000,0.849861,0.552045,0.313745,0.082979,0.516489,0.510507,0.553191,0.207636,0.151838,0.996547,0.226646,0.075,0.128709,0.764616,0.700231,0.000000,0.702128,0.686783,0.155945,0.003556,0.648759,0.324324,0.310501,0.017738,0.301124,0.390413,0.857428,0.995906,0.024929,0,0,0.708861,1,0.101124,0.000000,1,1.00,0.75,0,1,1,0.876364,0,0,0.528756,0,0.003069,0.007059,0.989547,0.583673,0.500,0.5,0.5,0.0,0.500000,0.5,0.5,0.5,0.5,0.5,0.0,0.5,0.5,0.0,0.00,0.5,0.0,0.2,0.000000,0.666667,0.001353,0.000000,0.5,0.5
3,Bangladesh,ASIA,2016,0.152750,0.059508,0.412476,0.074627,0.019589,1,0.75,0.95,0.288952,0.565561,0.592475,0.742424,0.291595,0.663507,0.53125,0.213317,0.501124,0.309801,0.268146,0.328935,0.803922,0.621570,0.765825,0.215217,0.223684,0.659472,0.282589,0.352250,0.022677,0.537015,0.724790,0.141545,0.552106,0.047337,0.518092,0.047622,0.272340,0.128867,0.164575,0.093867,0.012726,0.173681,0.298899,0.000000,0.600,0.025539,0.905617,0.148347,0.333333,0.010638,0.025308,0.155945,0.032749,0.473914,0.189189,0.150581,0.027716,0.026966,0.258563,0.794455,0.288639,0.335201,0,1,0.685980,1,0.011236,0.488248,1,1.00,0.50,0,1,0,0.459313,0,1,0.194113,0,0.099693,0.002353,0.613240,0.085714,0.250,0.0,0.5,0.0,0.583333,0.5,1.0,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.25,1.0,0.5,0.5,0.666667,0.666667,0.005413,0.002156,1.0,0.0
4,Bolivia,AMERICAS,2016,0.420118,0.123230,0.417764,0.432836,0.053336,0,0.25,0.44,0.576144,0.099767,0.227632,0.352814,0.282022,0.061611,0.46875,0.250308,0.417978,0.640919,0.139137,0.336475,0.901961,0.981059,0.490348,0.528036,0.548398,0.694434,0.696551,1.000000,0.002454,0.415981,0.789916,0.315639,0.894678,0.733015,0.465240,0.185897,0.561702,0.102508,0.273237,0.462078,0.093101,0.817528,0.959978,0.113363,0.225,0.122981,0.150554,0.151166,0.000000,0.670213,0.494700,0.396825,0.049998,0.712479,0.310811,0.247509,0.131929,0.247191,0.253340,0.173799,0.752303,0.022547,0,0,0.358453,0,0.786517,0.000000,1,1.00,1.00,0,1,1,0.767170,0,1,0.366761,1,0.009420,0.050588,0.871080,0.195918,0.625,0.5,0.5,0.5,0.916667,0.5,1.0,1.0,0.5,1.0,1.0,0.5,1.0,0.0,0.75,0.5,0.0,0.3,0.000000,0.666667,0.012179,0.014627,0.5,0.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,MIDDLE EAST AND NORTH AFRICA,2018,0.576156,0.244249,0.417116,0.820896,0.065361,0,0.50,0.22,0.743178,0.025784,0.212034,0.179654,0.073330,0.118483,0.68750,0.072750,0.161798,0.322259,0.034242,0.106503,0.980392,0.949151,0.964584,0.672812,0.261992,0.594337,0.190932,0.573386,0.550443,0.901293,0.951681,0.169694,1.000000,1.000000,0.266716,0.124217,0.044681,0.468029,0.359571,0.551439,0.206966,0.278636,0.943357,0.278706,0.125,0.186723,0.764616,0.736869,0.000000,0.606383,0.192516,0.020576,0.055539,0.962210,0.364865,0.334820,0.022173,0.379775,0.494054,0.479267,0.991812,0.019702,0,0,0.120936,0,0.000000,0.000000,1,1.00,0.75,0,1,1,0.784899,0,0,0.573112,1,0.017626,0.014118,0.515679,0.583673,0.250,0.0,0.0,0.0,0.583333,0.5,1.0,0.0,0.5,0.5,0.0,0.5,0.5,0.0,1.00,1.0,1.0,0.8,0.666667,0.333333,0.000000,0.005466,1.0,0.5
66,Uganda,SUBSAHARAN AFRICA,2018,0.308788,0.091524,0.429204,0.253731,0.006012,1,0.50,0.76,0.271347,0.430682,0.239651,0.701299,0.465824,0.189573,0.18750,0.419236,0.397753,0.098560,0.188570,0.489161,0.588235,0.762476,0.763993,0.194921,0.642704,0.542011,0.949100,0.645793,0.007955,0.049354,0.558824,0.445285,0.092711,0.000000,0.101009,0.020433,0.346808,0.100970,0.327824,0.195369,0.049565,0.498135,0.206582,0.005620,0.150,0.004809,0.455865,0.324622,0.000000,0.308511,0.146550,0.395062,0.054012,0.442572,0.189189,0.359124,0.116408,0.150562,0.372669,0.210265,0.282497,0.211327,0,1,0.334020,0,1.000000,0.689115,1,1.00,0.75,0,1,1,0.506972,0,1,0.277783,0,0.031399,1.000000,0.644599,0.024490,0.250,0.5,0.0,0.0,0.583333,0.0,1.0,1.0,0.5,0.5,0.5,0.0,1.0,0.5,0.25,0.5,1.0,0.5,0.500000,0.333333,0.309878,0.730562,0.5,0.0
67,Ukraine,RUSSIA AND EURASIA,2018,0.701950,0.290426,0.440967,0.164179,0.038596,1,0.75,0.67,0.772447,0.000000,0.591736,0.041126,0.000000,0.000000,0.59375,0.025894,0.101124,0.306202,0.107789,0.059378,0.549020,0.995707,0.928070,0.723782,0.331428,0.853649,0.766706,0.197652,0.066519,0.951821,0.920168,0.183475,1.000000,1.000000,0.091286,0.510723,0.051064,0.306440,0.390996,0.516896,0.036169,0.000533,0.858503,0.147516,0.050,0.493136,0.824226,0.700231,0.050000,0.946809,0.351720,0.021627,0.054401,0.892986,0.216216,0.154672,0.045455,0.617978,0.123599,0.180060,0.997953,0.111861,0,0,0.178759,1,0.483146,0.522545,1,0.75,0.75,0,1,1,0.639337,0,1,0.581439,0,0.061586,0.061176,0.929152,0.126531,0.500,0.5,0.5,0.0,0.833333,0.5,1.0,1.0,0.5,0.5,1.0,1.0,0.5,0.0,0.25,0.5,1.0,0.5,0.000000,0.333333,0.081191,0.046189,1.0,0.0
68,Vietnam,ASIA,2016,0.563226,0.566312,0.467408,0.134328,0.034523,0,0.50,0.15,0.579445,0.041850,0.765686,0.380952,0.188206,0.255924,0.68750,0.062885,0.233708,0.494186,0.163251,0.179076,0.921569,0.945137,0.943629,0.404707,0.166034,0.738428,0.871947,0.436399,0.072745,0.741481,0.949580,0.124309,0.988914,0.430898,0.563932,0.125545,0.348936,0.050872,0.310750,0.578348,0.125921,0.265583,0.599109,0.001380,0.550,0.152507,0.457394,0.324622,0.076923,0.021277,0.004651,0.041152,0.087970,0.420777,0.270270,0.312296,0.034368,0.258427,0.377571,0.458058,0.948823,0.094364,0,1,0.635938,1,0.224719,0.000000,1,0.00,0.50,0,0,0,0.482691,0,0,0.826566,0,0.051095,0.042353,0.830430,0.085714,0.375,0.5,0.0,0.0,0.083333,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.5,0.0,0.25,0.0,0.0,0.4,0.333333,0.666667,0.070365,0.071594,0.5,0.0


In [18]:
oos_df = pd.read_csv('OOS_Data.csv')
oos_df

Unnamed: 0,Country,Data_year,SLAVERY,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,AIDS_death,AIDS_Orph,Physrights_indx,Extrajud_kill,Pol_impris,Torture,Polrights_indx,Free_assem,Freemv_foreign,Freemv_dom,Free_speech,Free_polit,Relig_freeCIRI,Work_rightCIRI,Econ_right_F,Pol_right_F,Indep_judic,Rape_prev,Rape_report,Rape_enclave,Rape_compl,Phys_secF,Gender_equal,Pop_dens,Urban_population,Rural_population,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc,Pol_terror,Armedcon
0,Afghanistan,2016,1.13,34.60,2.0206,40.9,26.799999,9.5,50,396,35.5,3.575,189.0,91.099998,66,46.990051,85.400000,9.27439,57.509158,22.612086,87.528344,27.700001,31.040,31.900000,55.299999,1.378323,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,2.000000,6.39,13.5,27.820000,31.248380,0.000000,0.12,0.425262,89.74,43.54,3.508772,35,35.030000,6.250000,0.862703,0.835789,11,2.796261,6.5,76,2.500000,33.716116,37.4,1,1,0.014739,1,0.770,7.928766,1,4,2,0,0,0,0.566547,0,1,-2.359148,0,12500,0.3,51.50,0.4,500,4700.0,4,0,2,0,4,1,0,0,1,1,0,1,0,2,0,4,4,2,17,4,2,54.197114,8670939,25985093,38.48,47.66,0.892198,0.231,547.228110,5.0,1
1,Albania,2016,0.29,1.06,4.8926,23.1,4.900000,9.4,65,29,6.2,4.959,19.0,14.000000,98,98.791190,91.249350,11.82000,75.924076,91.974349,71.682848,20.700001,4.341,93.199997,95.099998,0.514286,100.000000,61.41861,-1.614397,35.025264,5.100000,17.289093,3.932861,60.10,28.2,34.509998,86.202107,3.357200,0.25,1.607038,90.11,67.39,40.000000,62,17.869804,37.500000,5.045726,0.854346,36,3.567322,5.0,158,3.048397,61.408330,98.6,0,1,0.024703,0,0.220,0.000000,1,4,4,0,1,1,0.899040,0,0,-2.001445,1,3700,0.0,76.67,7.3,200,2200.0,6,1,2,1,9,2,1,2,1,1,1,1,1,2,0,1,3,0,8,3,1,104.967190,1680247,1195854,66.91,74.81,8.304178,0.467,4124.108907,2.0,0
2,Albania,2018,0.69,0.40,4.8926,23.1,4.900000,9.4,65,29,6.2,4.959,19.0,13.500000,98,98.791190,91.249350,11.82000,75.924076,96.900000,71.682848,20.700001,4.341,93.199997,95.099998,0.514286,100.000000,61.41861,-1.614397,35.025264,5.100000,13.900000,4.300000,66.40,57.6,34.509998,86.202107,3.357200,0.20,1.607038,90.11,67.39,40.000000,62,17.869804,37.500000,5.045726,0.854346,38,3.567322,5.0,158,3.600000,60.000000,98.6,0,1,0.024703,0,0.220,0.000000,1,4,4,0,1,1,0.899040,0,0,-2.001445,1,1700,0.0,76.67,0.7,100,4200.0,6,1,2,1,9,2,1,2,1,1,1,1,1,2,0,1,3,0,8,3,1,104.612263,1706345,1167112,67.48,77.08,8.855371,0.428,5268.848504,2.0,0
3,Algeria,2016,0.63,0.30,1.3784,11.7,5.000000,4.1,62,140,15.5,5.605,78.0,25.500000,95,91.779641,97.303940,13.96534,28.954424,62.500000,22.507553,31.600000,66.920,87.599998,83.599998,1.901503,100.000000,95.00000,-2.106337,7.288244,5.000000,10.473392,3.376276,18.09,0.0,43.560001,82.414714,34.639917,0.05,3.316038,82.52,60.67,20.000000,77,19.970000,27.118644,6.845082,0.904063,36,3.274635,0.7,162,3.748118,52.770917,99.4,0,1,0.007576,0,0.340,4.110874,1,3,2,0,1,1,0.736499,0,1,-2.277159,0,54000,3.5,65.30,9.2,200,1400.0,4,1,1,1,3,0,1,1,0,1,0,0,1,2,1,1,2,0,9,4,5,17.025957,29016679,11589373,56.73,55.93,1.091525,0.129,3946.421445,2.5,1
4,Algeria,2018,0.27,0.30,1.3784,11.7,4.600000,4.1,62,140,15.6,5.605,78.0,25.200000,95,91.779641,97.303940,13.96534,28.954424,77.600000,22.507553,31.600000,66.920,87.599998,83.599998,1.901503,100.000000,95.00000,-2.106337,7.288244,5.000000,10.000000,3.500000,42.90,65.7,43.560001,82.414714,34.639917,0.10,3.316038,82.52,60.67,20.000000,77,19.970000,27.118644,6.845082,0.904063,33,3.274635,0.7,162,3.800000,58.000000,99.6,0,1,0.007576,0,0.340,4.110874,1,3,2,0,1,1,0.736499,0,1,-2.277159,0,54000,3.5,65.30,9.2,200,1500.0,4,1,1,1,3,0,1,1,0,1,0,0,1,2,1,1,2,0,9,3,5,17.730075,29770548,11547594,56.78,57.96,0.978948,0.139,4114.715061,3.0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
358,Samoa,2018,1.09,0.00,1.2059,46.5,3.200000,4.3,65,51,9.2,4.297,68.0,17.300000,85,98.714789,92.097321,15.09423,75.097108,83.902084,86.951074,0.000000,37.860,93.000000,90.000000,2.567086,97.900000,67.32469,-1.353223,34.711929,3.300000,8.200000,3.500000,29.40,22.5,36.209999,71.828471,5.250645,0.00,1.103999,67.30,58.60,58.333333,76,42.412233,61.111111,4.280834,0.817721,32,3.881608,6.1,422,4.184283,61.795845,58.6,0,0,0.024703,0,0.630,0.000000,1,4,2,0,1,0,0.401608,0,1,-1.736349,0,2200,7.7,84.40,4.1,100,7500.0,3,1,1,1,11,1,1,2,0,2,0,1,1,2,1,1,3,0,8,3,0,69.303887,36247,160193,52.52,80.39,2.230980,0.405,4183.407935,1.0,0
359,Yemen,2016,1.13,86.00,0.9627,46.5,26.100000,16.3,55,385,22.1,4.077,48.0,41.900002,75,87.414620,98.463520,9.15499,57.322176,34.383202,14.903130,0.000000,168.600,53.299999,54.900002,2.811765,48.406712,66.76764,-15.240292,4.896862,22.700001,15.942108,2.200000,22.55,0.2,37.689999,73.606072,0.487950,0.04,0.919968,83.50,57.87,27.272727,69,11.190000,28.571429,0.000000,0.882363,18,2.796261,4.8,55,2.800000,58.443890,17.1,0,0,-0.041667,1,0.135,7.748460,1,3,0,1,0,0,0.191787,1,1,-2.907491,0,54000,0.0,34.88,3.0,500,3000.0,0,0,0,0,2,0,1,0,0,1,0,0,0,1,0,0,4,2,13,4,4,51.457867,9763156,17821057,48.78,37.42,1.152626,0.040,1139.870568,5.0,1
360,Yemen,2018,0.31,86.00,0.9627,46.8,28.800000,16.2,55,385,26.8,4.077,48.0,55.300000,75,87.414620,93.088640,9.15499,57.322176,45.200000,14.903130,0.000000,168.600,53.299999,54.900002,2.811765,72.000000,66.76764,-15.240292,4.896862,22.700000,13.800000,2.200000,24.60,5.7,37.689999,60.871443,0.487950,0.00,0.919968,83.50,57.87,27.272727,69,11.190000,28.571429,0.000000,0.882363,16,2.852292,4.8,55,2.800000,52.000000,30.7,0,0,-0.041667,1,0.135,7.748460,1,3,0,1,0,0,0.191787,1,1,-2.907491,0,54000,0.0,34.88,9.1,500,4000.0,0,0,0,0,2,0,1,0,0,1,0,0,0,1,0,0,4,2,13,4,4,53.977853,10174671,18075749,49.60,47.00,0.046095,0.040,944.408499,5.0,1
361,Zambia,2016,0.67,64.43,2.7554,40.0,47.799999,6.3,50,224,21.4,5.129,406.0,64.000000,85,64.049637,91.395590,13.50000,36.321070,80.219780,86.951074,12.700000,1.500,43.900002,65.400002,0.255639,22.062559,17.16012,-3.123845,9.585456,40.599998,10.658291,3.581099,17.34,0.7,57.490002,36.166429,4.200000,0.20,0.212450,79.20,58.60,11.538462,88,12.450000,23.255814,4.256581,0.879240,38,3.986408,10.7,119,4.549865,36.419808,14.0,0,0,0.026586,0,0.780,0.000000,1,4,4,0,1,1,0.738819,0,0,-2.093788,1,18000,48.8,78.50,19.0,17000,530000.0,4,1,1,0,8,1,1,2,1,1,1,1,0,2,1,1,3,1,9,3,2,22.012009,7041054,9550336,57.42,73.96,6.156355,0.304,1280.578447,3.0,0


In [19]:
print(oos_df.columns.to_list())

['Country', 'Data_year', 'SLAVERY', 'Poverty', 'Cereal_yield', 'Stunting_u5s', 'Undernourish', 'Wasting_u5s', 'Life_expect', 'Maternal_mort', 'Neonatal_mort', 'Wellbeing', 'Tuberculosis', 'Infant_mort', 'Infant_vaccines', 'Literacy_15_24yrs', 'Primary_school', 'Yrs_of_school', 'Lack_contraception', 'F_school', 'M_school', 'F_parliam', 'Freshwater', 'Sanitation', 'Water_acc', 'Co2_fuel', 'Electric_acc', 'Fuel_acc', 'Growth_rate', 'ATMs', 'Child_lab', 'Unemploy', 'Infrastruct', 'Internet_use', 'Broadband', 'Inequality', 'Piped_water', 'Treated_waste', 'Climate_chg_vuln', 'Co2_energy_pc', 'Ocean_biodiv', 'Ocean_clean', 'Ocean_protect', 'Ocean_fisheries', 'Fish_overexploit', 'Terrestrial_protect', 'Forest_change', 'Species_survival', 'CPI', 'Gov_efficien', 'Homicides', 'Prison_pop', 'Property_rights', 'Safe_night', 'Regist_birth', 'Masskill_ongo', 'Masskill_ever', 'GDPpc_growth', 'Minority_rule', 'Ethnic_fract', 'Battle_deaths', 'Pol_cand_restr', 'Party_ban', 'Relig_freeMK', 'Polkill_apprv

First, we will make three prediction models for 2016 and after that, three prediction models for 2018.
### 2016

Because `oos_df` and `imputed_training_df` differ by one column, we keep only their common columns and train the models on those. We then select the 2016 rows, using the imputed training data as the training set and the out-of-sample data as the validation set.

In [20]:
target = 'SLAVERY'
com_cols = list(set(imputed_training_df.columns) & set(oos_df.columns))
imputed_training_com = imputed_training_df[com_cols]
oos_com = oos_df[com_cols]
X_train = imputed_training_com[imputed_training_com['Data_year'] == 2016].drop(target, axis=1)
y_train = imputed_training_com[imputed_training_com['Data_year'] == 2016][target]
X_val = oos_com[oos_com['Data_year'] == 2016].drop(target, axis=1)
y_val = oos_com[oos_com['Data_year'] == 2016][target]
#X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=21)
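
The intersection above goes through a Python `set`, whose iteration order is not guaranteed to match the column order of either frame. A minimal sketch of an order-preserving alternative follows; `train_demo` and `oos_demo` are hypothetical stand-ins for `imputed_training_df` and `oos_df`, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-ins for imputed_training_df and oos_df.
train_demo = pd.DataFrame(
    {"Country": ["A"], "Data_year": [2016], "SLAVERY": [0.5], "GDPpc": [0.1], "Extra": [1]}
)
oos_demo = pd.DataFrame(
    {"Country": ["B"], "Data_year": [2016], "SLAVERY": [0.7], "Region": ["X"], "GDPpc": [0.3]}
)

# Keep the training frame's column order and drop anything the OOS frame lacks.
oos_columns = set(oos_demo.columns)
ordered_common = [col for col in train_demo.columns if col in oos_columns]
print(ordered_common)
```

A stable column order makes it easier to map model coefficients back to variable names later on.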

We use *OneHotEncoder* for categorical variables.

In [21]:
encoder = OneHotEncoder(handle_unknown='ignore')
X_train_encoded = encoder.fit_transform(X_train[['Country']])
X_val_encoded = encoder.transform(X_val[['Country']])

X_train_final = np.concatenate((X_train.drop(['Country'], axis=1).values, X_train_encoded.toarray()), axis=1)
X_val_final = np.concatenate((X_val.drop(['Country'], axis=1).values, X_val_encoded.toarray()), axis=1)
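
To illustrate what `handle_unknown='ignore'` does when the validation set contains a country that was not seen during fitting, here is a small sketch on toy arrays. The variable names (`train_countries`, `val_countries`, `demo_encoder`) are assumptions made for the example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data: "Zambia" appears only in the validation array.
train_countries = np.array([["Algeria"], ["Samoa"], ["Yemen"]])
val_countries = np.array([["Yemen"], ["Zambia"]])

demo_encoder = OneHotEncoder(handle_unknown="ignore")
train_onehot = demo_encoder.fit_transform(train_countries).toarray()
val_onehot = demo_encoder.transform(val_countries).toarray()

print(demo_encoder.categories_)  # categories learned from the training array
print(val_onehot)                # the unseen country becomes an all-zero row
```

An unseen country therefore contributes no information through the one-hot block; the model has to rely on the numeric columns alone for that row.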

Let's now fit our first model *(Linear)*.

In [22]:
# Fit the model
lr_model = LinearRegression().fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_lr, r2_lr = mean_absolute_error(y_val, lr_model.predict(X_val_final)), r2_score(y_val, lr_model.predict(X_val_final))
print('Linear Regression 2016:\nMAE:', mae_lr, '\nR^2:', r2_lr)

Linear Regression 2016:
MAE: 2299.8925403683984 
R^2: -52233101.8113071
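
A negative $R^2$ can look surprising, so the following sketch computes it by hand as $R^2 = 1 - SS_{res}/SS_{tot}$. The helper `r2_by_hand` and the toy arrays are assumptions for illustration only.

```python
import numpy as np

def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed on the evaluation data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_demo = np.array([0.2, 0.4, 0.6, 0.8])
mean_prediction = np.full_like(y_demo, y_demo.mean())   # always predict the mean
wild_prediction = np.array([100.0, -50.0, 30.0, -80.0])  # far off the target scale

print(r2_by_hand(y_demo, y_demo))           # perfect predictions
print(r2_by_hand(y_demo, mean_prediction))  # baseline: predicting the mean
print(r2_by_hand(y_demo, wild_prediction))  # much worse than the baseline
```

Any model whose squared error exceeds that of the constant mean prediction gets a negative score, and predictions far outside the target's range push it to very large negative values.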


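With an *MAE* in the thousands and a hugely negative *$R^2$*, this model is unusable, so we won't analyze it further. Let's move on to *LASSO* feature selection and then fit *Linear* again on the selected features: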

In [23]:
# perform LASSO regression for variable selection
lasso_model = Lasso(alpha=0.06)
lasso_model.fit(X_train_final, y_train)
selected_features = lasso_model.coef_ != 0
X_train_lasso = X_train_final[:, selected_features]
X_val_lasso = X_val_final[:, selected_features]

# fit the linear regression model on the selected features
lr_model_lasso = LinearRegression()
lr_model_lasso.fit(X_train_lasso, y_train)

# make predictions
y_pred_lr_lasso = lr_model_lasso.predict(X_val_lasso)

# evaluate the model
mae_lr_lasso = mean_absolute_error(y_val, y_pred_lr_lasso)
r2_lr_lasso = r2_score(y_val, y_pred_lr_lasso)
print('Linear Regression with LASSO 2016:')
print(f'MAE: {mae_lr_lasso}\nR^2: {r2_lr_lasso}')

Linear Regression with LASSO 2016:
MAE: 0.47597232795915473
R^2: -4.932299835627163
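
The `alpha` used above is set by hand. As a hedged alternative, the penalty could be chosen by cross-validation on the training data with scikit-learn's `LassoCV`. The sketch below runs on synthetic data (`X_demo`, `y_demo` are assumptions) and is not the procedure used in this notebook.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data: only columns 0 and 3 carry signal.
rng = np.random.default_rng(21)
X_demo = rng.normal(size=(60, 8))
y_demo = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 3] + rng.normal(scale=0.1, size=60)

# Pick alpha by 5-fold cross-validation on the training data only.
lasso_cv = LassoCV(cv=5, random_state=21).fit(X_demo, y_demo)
selected_mask = lasso_cv.coef_ != 0

print("chosen alpha:", lasso_cv.alpha_)
print("selected column indices:", np.flatnonzero(selected_mask))
```

Choosing `alpha` this way keeps the validation set untouched until the final evaluation.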


The *MAE* is much better, but *$R^2$* is still negative, so we don't have predictive power yet.<br>
Below are the intercept and the non-zero coefficients of the *LASSO* model:

In [24]:
feature_names = oos_df.columns.to_list()
# Inspect the selected features
selected_features = np.where(lasso_model.coef_ != 0)[0]
# Print the corresponding feature names
selected_feature_names = [feature_names[i] for i in selected_features if i < len(feature_names)]
print("Selected feature names:", selected_feature_names)
print('Lasso parameters:')
print('Intercept:',lasso_model.intercept_)
for feature, coef in zip(feature_names, lasso_model.coef_):
    if coef!=0.0:
        print(feature, ':', coef)

Selected feature names: ['ATMs', 'Rape_report']
Lasso parameters:
Intercept: 0.6351810713795726
ATMs : -0.046629642416153645
Rape_report : -0.014987779148010139
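
The names printed above are looked up by position in `oos_df.columns`, while the columns of `X_train_final` follow the order of `com_cols` (minus the target and `Country`) followed by the one-hot columns, so the two orders need not agree. A minimal sketch of building a name list that matches the design matrix is given below. All names and coefficients in it are hypothetical; in the notebook the numeric names would come from `X_train.drop(['Country'], axis=1).columns` and, in recent scikit-learn versions, the one-hot names from `encoder.get_feature_names_out()`.

```python
import numpy as np

# Hypothetical names, in the same order in which the design matrix is concatenated.
numeric_names = ["Data_year", "ATMs", "GDPpc"]
onehot_names = ["Country_Algeria", "Country_Yemen"]
design_names = numeric_names + onehot_names

# Hypothetical coefficient vector with one entry per design-matrix column.
coef_demo = np.array([0.0, -0.05, 0.0, 0.0, 0.3])
assert len(design_names) == len(coef_demo)

selected = {name: float(c) for name, c in zip(design_names, coef_demo) if c != 0}
print(selected)
```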


Let's move to *Decision Trees* now.

In [25]:
# Fit the model
dt_model = DecisionTreeRegressor(random_state=21).fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_dt, r2_dt = mean_absolute_error(y_val, dt_model.predict(X_val_final)), r2_score(y_val, dt_model.predict(X_val_final))
print('Decision Tree 2016:')
print(f'MAE: {mae_dt}\nR^2: {r2_dt}\n10 Cross Validation Scores: {cross_val_score(dt_model, X_val_final, y_val, cv=10)}')

Decision Tree 2016:
MAE: 0.43905759162303665
R^2: -0.06394681278881853
10 Cross Validation Scores: [ -7.36722973  -9.19925786   0.33112282  -1.10985192  -0.27273013
 -11.81209881 -11.9843608   -7.52301075   0.4672161   -0.13315881]
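
In the cell above, `cross_val_score` receives the validation arrays, so each fold refits the tree on validation rows, and the reported values are $R^2$ scores (the default scorer for regressors). A hedged sketch of the more common setup, cross-validating on the training data and reporting MAE, is shown on synthetic data below (`X_demo`, `y_demo` are assumptions).

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic training data standing in for X_train_final / y_train.
rng = np.random.default_rng(21)
X_demo = rng.uniform(size=(80, 5))
y_demo = X_demo[:, 0] + 0.1 * rng.normal(size=80)

folds = KFold(n_splits=5, shuffle=True, random_state=21)
neg_mae = cross_val_score(
    DecisionTreeRegressor(random_state=21),
    X_demo,
    y_demo,
    cv=folds,
    scoring="neg_mean_absolute_error",  # scikit-learn reports MAE negated
)
cv_mae = -neg_mae
print("MAE per fold:", cv_mae)
print("mean MAE:", cv_mae.mean())
```

With very small folds, per-fold $R^2$ is unstable because the variance of the target within a fold can be close to zero; MAE is less sensitive to this.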


Lastly, we have a *Random Forest* algorithm.

In [26]:
# Fit the model
rf_model = RandomForestRegressor(random_state=21).fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_rf, r2_rf = mean_absolute_error(y_val, rf_model.predict(X_val_final)), r2_score(y_val, rf_model.predict(X_val_final))
print('Random Forest 2016:')
print(f'MAE: {mae_rf}\nR^2: {r2_rf}\n10 Cross Validation Scores: {cross_val_score(rf_model, X_val_final, y_val, cv=10)}')

Random Forest 2016:
MAE: 0.3604743455497382
R^2: 0.017037619228432055
10 Cross Validation Scores: [ 6.11847153e-01  8.31119078e-03 -1.00155284e+01 -1.24300363e-01
  7.08002086e-02 -1.70735623e+00 -7.72349917e+00 -2.52799166e+00
  7.95135910e-01  5.21667720e-02]


Below, we will print the best model for **2016**:

In [27]:
# choose the model with the lowest MAE on the validation set
mae_vals = [mae_lr, mae_lr_lasso, mae_dt, mae_rf]
r2_vals = [ r2_lr, r2_lr_lasso, r2_dt, r2_rf]
best_model_idx = np.argmin(mae_vals)
best_model = [lr_model, lr_model_lasso, dt_model, rf_model][best_model_idx]
print(f'Best model for 2016: {best_model}, MAE: {mae_vals[best_model_idx]}, R^2: {r2_vals[best_model_idx]}')

Best model for 2016: RandomForestRegressor(random_state=21), MAE: 0.3604743455497382, R^2: 0.017037619228432055
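
When two criteria should both count, index positions cannot simply be combined with bitwise operators (for instance, `1 & 2` evaluates to `0`). One possible approach, sketched below under the assumption that both metrics matter equally, is to rank the models on each metric and add the ranks. The scores are the 2016 values printed above, rounded.

```python
import numpy as np

labels = ["Linear", "LASSO + Linear", "Decision Tree", "Random Forest"]
mae_demo = [2299.89, 0.476, 0.439, 0.360]   # lower is better
r2_demo = [-5.2e7, -4.93, -0.064, 0.017]    # higher is better

best_by_mae = int(np.argmin(mae_demo))
best_by_r2 = int(np.argmax(r2_demo))

# Rank 0 is best on each metric; the smallest rank sum wins.
mae_rank = np.argsort(np.argsort(mae_demo))
r2_rank = np.argsort(np.argsort(-np.array(r2_demo)))
combined = int(np.argmin(mae_rank + r2_rank))

print(labels[best_by_mae], labels[best_by_r2], labels[combined])
```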


Now, we need to do the same procedure for 2018 as well.
### 2018

As in 2016, we use only the columns common to `oos_df` and `imputed_training_df`, this time keeping the 2018 rows: the imputed training data is the training set and the out-of-sample data is the validation set.

In [28]:
X_train = imputed_training_com[imputed_training_com['Data_year'] == 2018].drop(target, axis=1)
y_train = imputed_training_com[imputed_training_com['Data_year'] == 2018][target]
X_val = oos_com[oos_com['Data_year'] == 2018].drop(target, axis=1)
y_val = oos_com[oos_com['Data_year'] == 2018][target]
#X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=21)
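
Since the same split is repeated for each year, it could be wrapped in a small helper. The function `split_by_year` and the toy frames below are hypothetical and only illustrate the idea.

```python
import pandas as pd

def split_by_year(train_df, oos_df, year, target="SLAVERY", year_col="Data_year"):
    """Return X_train, y_train, X_val, y_val for one survey year."""
    train_rows = train_df[train_df[year_col] == year]
    val_rows = oos_df[oos_df[year_col] == year]
    return (
        train_rows.drop(columns=target),
        train_rows[target],
        val_rows.drop(columns=target),
        val_rows[target],
    )

# Toy frames standing in for imputed_training_com and oos_com.
train_demo = pd.DataFrame(
    {"Country": ["A", "B", "C"], "Data_year": [2016, 2018, 2018],
     "GDPpc": [0.1, 0.2, 0.3], "SLAVERY": [0.5, 0.6, 0.7]}
)
oos_demo = pd.DataFrame(
    {"Country": ["D", "E"], "Data_year": [2018, 2016],
     "GDPpc": [0.4, 0.5], "SLAVERY": [0.8, 0.9]}
)

X_tr, y_tr, X_va, y_va = split_by_year(train_demo, oos_demo, 2018)
print(len(X_tr), len(X_va))
```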

We use *OneHotEncoder* for categorical variables.

In [29]:
encoder = OneHotEncoder(handle_unknown='ignore')
X_train_encoded = encoder.fit_transform(X_train[['Country']])
X_val_encoded = encoder.transform(X_val[['Country']])

X_train_final = np.concatenate((X_train.drop(['Country'], axis=1).values, X_train_encoded.toarray()), axis=1)
X_val_final = np.concatenate((X_val.drop(['Country'], axis=1).values, X_val_encoded.toarray()), axis=1)

Let's now fit our first model *(Linear)*.

In [30]:
# Fit the model
lr_model = LinearRegression().fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_lr, r2_lr = mean_absolute_error(y_val, lr_model.predict(X_val_final)), r2_score(y_val, lr_model.predict(X_val_final))
print('Linear Regression 2018:\nMAE:', mae_lr, '\nR^2:', r2_lr)

Linear Regression 2018:
MAE: 4496.550551497466 
R^2: -101354378.67347732
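
One plausible contributor to results like this is that, after one-hot encoding the countries, the design matrix has more columns than training rows, so ordinary least squares can fit the training targets exactly without generalizing. The sketch below illustrates that effect on pure-noise synthetic data; the sizes and variable names are assumptions, and it does not reproduce the notebook's numbers.

```python
import numpy as np

rng = np.random.default_rng(21)
n_train, n_val, n_features = 10, 10, 30  # more features than training rows

X_train_demo = rng.normal(size=(n_train, n_features))
X_val_demo = rng.normal(size=(n_val, n_features))
y_train_demo = rng.normal(size=n_train)  # targets are pure noise
y_val_demo = rng.normal(size=n_val)

# Minimum-norm least-squares solution of the underdetermined system.
coef, *_ = np.linalg.lstsq(X_train_demo, y_train_demo, rcond=None)

train_mae = np.mean(np.abs(X_train_demo @ coef - y_train_demo))
val_mae = np.mean(np.abs(X_val_demo @ coef - y_val_demo))
print("train MAE:", train_mae)
print("validation MAE:", val_mae)
```

The training error is numerically zero even though there is nothing to learn, which is why regularization or feature selection is needed in this regime.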


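As in 2016, the *MAE* is in the thousands and *$R^2$* is hugely negative, so this model is unusable and we won't analyze it further. Let's move on to *LASSO* feature selection and then fit *Linear* again on the selected features: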

In [31]:
# perform LASSO regression for variable selection
lasso_model = Lasso(alpha=0.004)
lasso_model.fit(X_train_final, y_train)
selected_features = lasso_model.coef_ != 0.0
X_train_lasso = X_train_final[:, selected_features]
X_val_lasso = X_val_final[:, selected_features]

# fit the linear regression model on the selected features
lr_model_lasso = LinearRegression()
lr_model_lasso.fit(X_train_lasso, y_train)

# evaluate the model
mae_lr_lasso = mean_absolute_error(y_val, lr_model_lasso.predict(X_val_lasso))
r2_lr_lasso = r2_score(y_val, lr_model_lasso.predict(X_val_lasso))
print('Linear Regression with LASSO 2018:')
print(f'MAE: {mae_lr_lasso}\nR^2: {r2_lr_lasso}')

Linear Regression with LASSO 2018:
MAE: 50.71036157005082
R^2: -2318.320793766741


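The *MAE* is much lower than for plain *Linear* (about 50.7 versus 4496.6), but *$R^2$* is still strongly negative (about -2318), so this model has no predictive power on the 2018 validation data.<br>
Below are the intercept and the non-zero coefficients of the *LASSO* model. The names are looked up by position in `oos_df.columns`, whose order differs from the columns of `X_train_final`, so labels such as `Country` and `SLAVERY` (neither of which is a numeric input to the model) should not be read literally: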

In [32]:
feature_names = oos_df.columns.to_list()
# Inspect the selected features
selected_features = np.where(lasso_model.coef_ != 0.0)[0]
# Print the corresponding feature names
selected_feature_names = [feature_names[i] for i in selected_features if i < len(feature_names)]
print("Selected feature names:", selected_feature_names)
print('Lasso parameters:')
print('Intercept:',lasso_model.intercept_)
for feature, coef in zip(feature_names, lasso_model.coef_):
    if coef!=0.0:
        print(feature, ':', coef)

Selected feature names: ['Country', 'SLAVERY', 'Stunting_u5s', 'Maternal_mort', 'Infant_mort', 'Literacy_15_24yrs', 'Sanitation', 'Electric_acc', 'Fuel_acc', 'Child_lab', 'Internet_use', 'Broadband', 'CPI', 'Regist_birth', 'Ethnic_fract', 'Soc_powerdist', 'Sexwrk_Syphilis', 'Extrajud_kill', 'Econ_right_F', 'Pol_right_F', 'Rape_enclave', 'FDI', 'Armedcon']
Lasso parameters:
Intercept: 1.3553923798293044
Country : -0.24558316452910736
SLAVERY : 0.04509027454647682
Stunting_u5s : 0.031154198248103736
Maternal_mort : -0.3908345347695219
Infant_mort : 0.06283340492045408
Literacy_15_24yrs : 0.0891291161463455
Sanitation : -0.00763401942283349
Electric_acc : 0.22724725840528154
Fuel_acc : 0.14612880248259924
Child_lab : -0.04547194756187166
Internet_use : -0.25020148337847753
Broadband : -0.3235121143028767
CPI : 0.7562430188870312
Regist_birth : -0.05359505561379676
Ethnic_fract : -0.02310438417120644
Soc_powerdist : -0.2569747554321673
Sexwrk_Syphilis : 0.04442181623893902
Extrajud_kill : 

Let's move to *Decision Trees* now.

In [33]:
# Fit the model
dt_model = DecisionTreeRegressor(random_state=21).fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_dt, r2_dt = mean_absolute_error(y_val, dt_model.predict(X_val_final)), r2_score(y_val, dt_model.predict(X_val_final))
print('Decision Tree 2018:')
print(f'MAE: {mae_dt}\nR^2: {r2_dt}\n10 Cross Validation Scores: {cross_val_score(dt_model, X_val_final, y_val, cv=10)}')

Decision Tree 2018:
MAE: 1.7449999999999999
R^2: -1.9603735106551623
10 Cross Validation Scores: [ 2.13132328e-01 -1.88174687e-01 -9.18369331e-02 -4.80570309e-01
 -5.66618851e-03  7.29743703e-03 -1.81665615e+00 -9.46012664e+00
 -1.70759593e+01 -7.36929021e+00]


Lastly, we have a *Random Forest* algorithm.

In [34]:
# Fit the model
rf_model = RandomForestRegressor(random_state=21).fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_rf, r2_rf = mean_absolute_error(y_val, rf_model.predict(X_val_final)), r2_score(y_val, rf_model.predict(X_val_final))
print('Random Forest 2018:')
print(f'MAE: {mae_rf}\nR^2: {r2_rf}\n10 Cross Validation Scores: {cross_val_score(rf_model, X_val_final, y_val, cv=10)}')

Random Forest 2018:
MAE: 0.6435622093023259
R^2: -0.0628052969054167
10 Cross Validation Scores: [ 0.31986712  0.58842843  0.59923473  0.03694561  0.56017334 -0.16650703
  0.28311677 -0.00348982 -0.46298449 -1.58888195]


Below, we will print the best model for **2018**:

In [35]:
# choose the model with the lowest MAE on the validation set
mae_vals = [mae_lr, mae_lr_lasso, mae_dt, mae_rf]
r2_vals = [ r2_lr, r2_lr_lasso, r2_dt, r2_rf]
best_model_idx = np.argmin(mae_vals)
best_model = [lr_model, lr_model_lasso, dt_model, rf_model][best_model_idx]
print(f'Best model for 2018: {best_model}, MAE: {mae_vals[best_model_idx]}, R^2: {r2_vals[best_model_idx]}')

Best model for 2018: RandomForestRegressor(random_state=21), MAE: 0.6435622093023259, R^2: -0.0628052969054167
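
To compare the two years side by side, the metrics printed above can be collected in one table. The sketch below copies the reported values (rounded to four decimals) into a DataFrame; the frame and column names are assumptions.

```python
import pandas as pd

# MAE and R^2 as printed in the cells above, rounded.
results = pd.DataFrame(
    [
        (2016, "Linear", 2299.8925, -52233101.81),
        (2016, "LASSO + Linear", 0.4760, -4.9323),
        (2016, "Decision Tree", 0.4391, -0.0639),
        (2016, "Random Forest", 0.3605, 0.0170),
        (2018, "Linear", 4496.5506, -101354378.67),
        (2018, "LASSO + Linear", 50.7104, -2318.3208),
        (2018, "Decision Tree", 1.7450, -1.9604),
        (2018, "Random Forest", 0.6436, -0.0628),
    ],
    columns=["year", "model", "mae", "r2"],
)

# Row with the lowest MAE within each year.
best_rows = results.groupby("year")["mae"].idxmin()
best_per_year = results.loc[best_rows, ["year", "model", "mae", "r2"]]
print(best_per_year.to_string(index=False))
```

Random Forest has the lowest MAE in both years, but its $R^2$ is close to zero in 2016 and slightly negative in 2018, so even the best model barely improves on predicting the mean.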


## $3^{rd}$ Question : Slavery Estimation with Theory-based Features<a class="anchor" id="q3"></a>

In [36]:
imputed_training_com_q3 = imputed_training_com.copy()
imputed_training_com_q3

Unnamed: 0,Free_discuss,Ocean_fisheries,Work_rightCIRI,Ethnic_fract,Poverty,Freemv_dom,Neonatal_mort,Free_polit,Trade_openMK,Water_acc,AIDS_Orph,M_school,Polkill_apprvd,Species_survival,Relig_freeCIRI,Free_assem,Electric_acc,Treated_waste,Pol_right_F,Fish_overexploit,Rape_report,Extrajud_kill,Pol_terror,Regist_birth,Trade_open,Primary_school,Battle_deaths,Masskill_ongo,Broadband,Indep_judic,Fuel_acc,ATMs,CPI,Relig_freeMK,Internet_use,Maternal_mort,VDEM_Libdem,Terrestrial_protect,Unemploy,Pol_impris,Wasting_u5s,Democ,Literacy_15_24yrs,Sexwrk_HIV,Homicides,Sexwrk_condom,Stunting_u5s,Pol_cand_restr,Forest_change,Ocean_protect,Growth_rate,Co2_energy_pc,Masskill_ever,F_parliam,Gender_equal,Child_lab,Undernourish,Ocean_clean,Infant_mort,Rape_enclave,Rape_prev,Country,Wellbeing,Infrastruct,Rape_compl,Torture,Social_ineq,Co2_fuel,Prison_pop,Free_speech,Inequality,Party_ban,Tuberculosis,AIDS_death,Property_rights,Armedcon,Minority_rule,Data_year,Gov_efficien,Infant_vaccines,SLAVERY,Econ_right_F,F_school,Freemv_M,Sanitation,Lack_contraception,KOF_Globalis,FDI,GDPpc,Physrights_indx,Sexwrk_Syphilis,Life_expect,Climate_chg_vuln,Freemv_foreign,GDPpc_growth,Freshwater,Freemv_F,Safe_night,Yrs_of_school,Polrights_indx,Piped_water,Soc_powerdist,Ocean_biodiv,Sexwrk_size,Cereal_yield,Phys_secF
0,0.491216,0.010638,0.5,0.820225,0.141358,0.0,0.775281,0.5,0.361693,0.060924,0.003387,0.121232,0,0.634798,0.0,0.5,0.368071,0.000000,0.5,0.146550,1.0,0.0,1.00,0.359263,0.121572,0.220364,1.000000,1,0.000000,0.0,0.109631,0.001839,0.000000,0.50,0.053692,0.484587,0.194030,0.092593,0.296855,1.0,0.436019,0,0.000000,0.003529,0.069845,0.436702,0.846320,1,0.006239,0.000000,0.225746,0.029766,1,0.502935,0.333333,0.219149,0.490714,0.057648,0.833176,1.0,1.00,Afghanistan,0.000000,0.233757,1.0,0.0,0,0.445285,0.103371,0.5,0.059137,1.00,0.222329,0.005413,0.339327,1,1,2018,0.359124,0.352941,2.22,0.0,0.000000,0,0.199765,0.655584,0.000000,0.406324,0.004849,0.500,0.016327,0.18750,0.250,0.0,0.188840,0.244864,0,0.000000,0.146244,0.333333,0.287684,1,0.000000,0.008706,0.239843,1.0
1,0.979599,0.436170,0.5,0.247191,0.000000,1.0,0.119101,1.0,0.068583,0.981092,0.023095,0.570649,0,0.695511,1.0,1.0,0.997783,0.117881,1.0,0.533636,1.0,0.5,0.25,0.994882,0.018650,0.872015,0.000000,0,0.215003,0.5,1.000000,0.320172,0.283784,1.00,0.783479,0.060419,0.746269,0.217865,0.201560,1.0,0.042654,1,0.985634,0.063529,0.058758,0.957027,0.138528,1,0.122888,0.176471,0.467704,0.358160,1,0.677104,0.666667,0.093617,0.073330,0.464002,0.092366,0.0,0.25,Argentina,0.830288,0.197044,0.2,0.0,0,0.173410,0.262921,0.5,0.503197,1.00,0.023390,0.021651,0.104765,0,0,2018,0.000000,0.901961,0.13,1.0,0.865606,1,0.957697,0.212390,0.535832,0.407511,0.225950,0.625,0.053061,0.71875,0.075,1.0,0.217296,0.033574,1,0.157257,1.000000,1.000000,0.980227,0,0.711502,0.053236,0.614533,0.0
2,0.876364,0.702128,0.5,0.101124,0.031614,0.5,0.143820,0.5,0.528756,1.000000,0.000000,0.712650,0,0.648759,0.0,0.5,1.000000,0.226646,0.5,0.686783,0.5,0.5,0.50,0.995906,0.190647,0.560082,0.000000,0,0.207636,0.0,0.849861,0.313745,0.324324,0.75,0.553191,0.027127,0.179104,0.155945,0.516489,0.5,0.184834,0,0.996365,0.007059,0.017738,0.989547,0.411255,1,0.003556,0.000000,0.552045,0.128709,0,0.170254,0.000000,0.082979,0.088646,0.700231,0.107446,0.0,0.00,Armenia,0.214563,0.510507,0.2,0.0,0,0.064213,0.301124,0.5,0.151838,1.00,0.048710,0.001353,0.390413,0,1,2018,0.310501,0.882353,0.34,0.5,0.860320,1,0.876616,0.676455,0.625027,0.435787,0.066912,0.500,0.583673,0.56250,0.075,0.5,0.708861,0.298757,1,0.857428,0.445839,0.500000,0.996547,0,0.764616,0.003069,0.388528,0.5
3,0.459313,0.010638,0.0,0.011236,0.565561,0.5,0.501124,0.5,0.194113,0.724790,0.002156,0.282589,0,0.473914,0.5,0.5,0.552106,0.000000,0.5,0.025308,1.0,0.0,0.75,0.288639,0.059508,0.765825,0.488248,0,0.012726,0.0,0.047337,0.047622,0.189189,0.50,0.093867,0.213317,0.074627,0.155945,0.128867,0.5,0.663507,0,0.621570,0.002353,0.027716,0.613240,0.742424,1,0.032749,0.333333,0.518092,0.025539,1,0.352250,0.666667,0.272340,0.291595,0.148347,0.328935,0.5,0.25,Bangladesh,0.309801,0.164575,0.5,0.0,0,0.141545,0.026966,0.5,0.173681,1.00,0.268146,0.005413,0.258563,1,1,2016,0.150581,0.803922,0.95,0.5,0.659472,1,0.537015,0.223684,0.152750,0.412476,0.019589,0.250,0.085714,0.53125,0.600,1.0,0.685980,0.022677,0,0.794455,0.215217,0.583333,0.298899,1,0.905617,0.099693,0.592475,1.0
4,0.767170,0.670213,0.5,0.786517,0.099767,1.0,0.417978,1.0,0.366761,0.789916,0.014627,0.696551,0,0.712479,1.0,0.5,0.894678,0.113363,1.0,0.494700,0.5,0.5,0.25,0.752303,0.123230,0.490348,0.000000,0,0.093101,0.0,0.733015,0.185897,0.310811,1.00,0.462078,0.250308,0.432836,0.396825,0.102508,0.5,0.061611,1,0.981059,0.050588,0.131929,0.871080,0.352814,1,0.049998,0.000000,0.465240,0.122981,0,1.000000,0.000000,0.561702,0.282022,0.151166,0.336475,0.0,0.75,Bolivia,0.640919,0.273237,0.3,0.5,0,0.315639,0.247191,0.5,0.817528,1.00,0.139137,0.012179,0.253340,0,0,2016,0.247509,0.901961,0.44,0.5,0.694434,1,0.415981,0.548398,0.420118,0.417764,0.053336,0.625,0.195918,0.46875,0.225,1.0,0.358453,0.002454,1,0.173799,0.528036,0.916667,0.959978,1,0.150554,0.009420,0.227632,0.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,0.784899,0.606383,0.5,0.000000,0.025784,0.0,0.161798,0.5,0.573112,0.951681,0.005466,0.190932,0,0.962210,0.0,0.5,1.000000,0.278706,0.5,0.192516,1.0,0.0,0.50,0.991812,0.244249,0.964584,0.000000,0,0.206966,0.0,1.000000,0.124217,0.364865,0.75,0.551439,0.072750,0.820896,0.020576,0.468029,0.0,0.118483,1,0.949151,0.014118,0.022173,0.515679,0.179654,1,0.055539,0.000000,0.266716,0.186723,0,0.573386,0.666667,0.044681,0.073330,0.736869,0.106503,1.0,1.00,Tunisia,0.322259,0.359571,0.8,0.0,0,0.169694,0.379775,0.5,0.278636,1.00,0.034242,0.000000,0.494054,0,0,2018,0.334820,0.980392,0.22,0.5,0.594337,1,0.901293,0.261992,0.576156,0.417116,0.065361,0.250,0.583673,0.68750,0.125,1.0,0.120936,0.550443,1,0.479267,0.672812,0.583333,0.943357,0,0.764616,0.017626,0.212034,1.0
66,0.506972,0.308511,0.0,1.000000,0.430682,1.0,0.397753,0.5,0.277783,0.558824,0.730562,0.949100,0,0.442572,0.5,0.0,0.092711,0.005620,1.0,0.146550,0.5,0.5,0.50,0.282497,0.091524,0.763993,0.689115,0,0.049565,0.5,0.000000,0.020433,0.189189,0.75,0.195369,0.419236,0.253731,0.395062,0.100970,0.0,0.189573,0,0.762476,1.000000,0.116408,0.644599,0.701299,1,0.054012,0.000000,0.101009,0.004809,1,0.645793,0.500000,0.346808,0.465824,0.324622,0.489161,1.0,0.25,Uganda,0.098560,0.327824,0.5,0.0,0,0.445285,0.150562,0.5,0.498135,1.00,0.188570,0.309878,0.372669,1,0,2018,0.359124,0.588235,0.76,0.0,0.542011,1,0.049354,0.642704,0.308788,0.429204,0.006012,0.250,0.024490,0.18750,0.150,1.0,0.334020,0.007955,1,0.210265,0.194921,0.583333,0.206582,1,0.455865,0.031399,0.239651,0.5
67,0.639337,0.946809,0.0,0.483146,0.000000,1.0,0.101124,0.5,0.581439,0.920168,0.046189,0.766706,0,0.892986,1.0,0.5,1.000000,0.147516,0.5,0.351720,0.5,0.5,0.75,0.997953,0.290426,0.928070,0.522545,0,0.036169,0.0,1.000000,0.510723,0.216216,0.75,0.516896,0.025894,0.164179,0.021627,0.306440,0.5,0.000000,0,0.995707,0.061176,0.045455,0.929152,0.041126,1,0.054401,0.050000,0.091286,0.493136,0,0.197652,0.000000,0.051064,0.000000,0.700231,0.059378,1.0,0.25,Ukraine,0.306202,0.390996,0.5,0.0,0,0.183475,0.617978,0.5,0.000533,0.75,0.107789,0.081191,0.123599,1,1,2018,0.154672,0.549020,0.67,1.0,0.853649,1,0.951821,0.331428,0.701950,0.440967,0.038596,0.500,0.126531,0.59375,0.050,1.0,0.178759,0.066519,1,0.180060,0.723782,0.833333,0.858503,1,0.824226,0.061586,0.591736,1.0
68,0.482691,0.021277,0.0,0.224719,0.041850,0.5,0.233708,0.0,0.826566,0.949580,0.071594,0.871947,0,0.420777,0.0,0.0,0.988914,0.001380,0.5,0.004651,0.0,0.5,0.50,0.948823,0.566312,0.943629,0.000000,0,0.125921,0.0,0.430898,0.125545,0.270270,0.50,0.578348,0.062885,0.134328,0.041152,0.050872,0.0,0.255924,0,0.945137,0.042353,0.034368,0.830430,0.380952,1,0.087970,0.076923,0.563932,0.152507,1,0.436399,0.333333,0.348936,0.188206,0.324622,0.179076,0.0,0.25,Vietnam,0.494186,0.310750,0.4,0.0,0,0.124309,0.258427,0.0,0.265583,0.00,0.163251,0.070365,0.377571,0,1,2016,0.312296,0.921569,0.15,0.5,0.738428,0,0.741481,0.166034,0.563226,0.467408,0.034523,0.375,0.085714,0.68750,0.550,0.0,0.635938,0.072745,0,0.458058,0.404707,0.083333,0.599109,0,0.457394,0.051095,0.765686,0.5


In the paper's GitHub repository we found the CSV file below, which the researchers used: [Variable_descriptions.csv](https://github.com/ml-slavery/ml-slavery/blob/main/Data/Meta_Data/Variable_descriptions.csv)

In [37]:
Variable_selected = pd.read_csv("https://raw.githubusercontent.com/ml-slavery/ml-slavery/main/Data/Meta_Data/Variable_descriptions.csv")
Variable_selected

Unnamed: 0,Variable Name,Short Variable Description,Source Extracted from,Cited Original Source,Theory Selected?
0,AIDS_death,Number of AIDS-related deaths,UNAIDS,UNAIDS,N
1,AIDS_Orph,AIDS orphans (0-17),UNAIDS,UNAIDS,Y
2,Armedcon,Is there the presence of any type of armed con...,Silverman and Landman (2019),Uppsala Conflict Data Project (UCDP),Y
3,ATMs,"Automated teller machines (per 100,000)",UN's SDGs dataset (2018),IMF Financial Access Survey (2015),Y
4,Battle_deaths,Battle-related deaths (log of battle related d...,Early Warning Project,Peace Research Institute Oslo (PRIO) and Uppsa...,N
...,...,...,...,...,...
101,Wasting_u5s,"Prevalence of wasting, under- 5s (%)",UN's SDGs dataset (2018),"UNICEF, WHO & WB (2015)",Y
102,Water_acc,Access to improved water (%),UN's SDGs dataset (2018),WHO and UNICEF (2016),N
103,Wellbeing,Subjective wellbeing (0-10),UN's SDGs dataset (2018),Helliwel et al. (2015),N
104,Work_rightCIRI,Workers rights (0-2): a minimum age for the em...,Silverman and Landman (2019),The Cingranelli-Richards (CIRI) Human Rights D...,Y


We keep only the variable names with `Theory Selected?` equal to `Y` and create a new DataFrame from them.

In [38]:
selected_var = Variable_selected.loc[Variable_selected['Theory Selected?'] == 'Y', 'Variable Name']
selected_var = selected_var.reset_index(drop=True).rename("variable_selected")
selected_var.index += 1
print(selected_var.size)
selected_var_df = pd.DataFrame({'variable_selected': selected_var})
selected_var_df

34


Unnamed: 0,variable_selected
1,AIDS_Orph
2,Armedcon
3,ATMs
4,Broadband
5,Child_lab
6,Climate_chg_vuln
7,CPI
8,Democ
9,F_school
10,Free_discuss


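We also want to keep `Country`, `Region`, `Data_year` and `SLAVERY`. Only columns that also appear in `com_cols` survive the intersection, so `Region` is not in the resulting list.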

In [39]:
col_to_keep = ['Country', 'Region', 'Data_year', 'SLAVERY']
common_cols = list(set(com_cols) & set([val for row in selected_var_df.values for val in row] + col_to_keep))
common_cols

['Broadband',
 'Free_discuss',
 'Freemv_M',
 'KOF_Globalis',
 'ATMs',
 'CPI',
 'Work_rightCIRI',
 'GDPpc',
 'Internet_use',
 'Maternal_mort',
 'Poverty',
 'Infrastruct',
 'Unemploy',
 'Sexwrk_Syphilis',
 'Climate_chg_vuln',
 'Wasting_u5s',
 'Democ',
 'Literacy_15_24yrs',
 'Neonatal_mort',
 'Freemv_F',
 'AIDS_Orph',
 'Sexwrk_condom',
 'Stunting_u5s',
 'Rape_report',
 'Soc_powerdist',
 'Armedcon',
 'Minority_rule',
 'Trade_open',
 'Phys_secF',
 'Data_year',
 'SLAVERY',
 'Gender_equal',
 'Child_lab',
 'F_school',
 'Undernourish',
 'Rape_enclave',
 'Country']
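
A set intersection silently discards any requested column that is not available. A small sketch of an alternative that keeps the requested order and reports what was dropped is given below; the two lists are hypothetical stand-ins for `com_cols` and for the theory-selected variables plus `col_to_keep`.

```python
# Hypothetical stand-ins for the available and the requested column names.
available_cols = ["Country", "Data_year", "SLAVERY", "ATMs", "CPI", "Democ"]
requested_cols = ["Country", "Region", "Data_year", "SLAVERY", "ATMs", "Democ", "AIDS_death"]

available = set(available_cols)
kept_cols = [col for col in requested_cols if col in available]
missing_cols = [col for col in requested_cols if col not in available]

print("kept:", kept_cols)
print("not available:", missing_cols)
```

Printing the missing names makes it explicit which theory-based variables could not be used.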

In the step below, we keep only the common columns from `imputed_training_com_q3`, and we do the same for the OOS dataset.

In [40]:
imputed_training_com_q3 = imputed_training_com_q3[common_cols]
oos_com_q3 = oos_df[common_cols]
imputed_training_com_q3

Unnamed: 0,Broadband,Free_discuss,Freemv_M,KOF_Globalis,ATMs,CPI,Work_rightCIRI,GDPpc,Internet_use,Maternal_mort,Poverty,Infrastruct,Unemploy,Sexwrk_Syphilis,Climate_chg_vuln,Wasting_u5s,Democ,Literacy_15_24yrs,Neonatal_mort,Freemv_F,AIDS_Orph,Sexwrk_condom,Stunting_u5s,Rape_report,Soc_powerdist,Armedcon,Minority_rule,Trade_open,Phys_secF,Data_year,SLAVERY,Gender_equal,Child_lab,F_school,Undernourish,Rape_enclave,Country
0,0.000000,0.491216,0,0.000000,0.001839,0.000000,0.5,0.004849,0.053692,0.484587,0.141358,0.233757,0.296855,0.016327,0.250,0.436019,0,0.000000,0.775281,0,0.003387,0.436702,0.846320,1.0,1,1,1,0.121572,1.0,2018,2.22,0.333333,0.219149,0.000000,0.490714,1.0,Afghanistan
1,0.215003,0.979599,1,0.535832,0.320172,0.283784,0.5,0.225950,0.783479,0.060419,0.000000,0.197044,0.201560,0.053061,0.075,0.042654,1,0.985634,0.119101,1,0.023095,0.957027,0.138528,1.0,0,0,0,0.018650,0.0,2018,0.13,0.666667,0.093617,0.865606,0.073330,0.0,Argentina
2,0.207636,0.876364,1,0.625027,0.313745,0.324324,0.5,0.066912,0.553191,0.027127,0.031614,0.510507,0.516489,0.583673,0.075,0.184834,0,0.996365,0.143820,1,0.000000,0.989547,0.411255,0.5,0,0,1,0.190647,0.5,2018,0.34,0.000000,0.082979,0.860320,0.088646,0.0,Armenia
3,0.012726,0.459313,1,0.152750,0.047622,0.189189,0.0,0.019589,0.093867,0.213317,0.565561,0.164575,0.128867,0.085714,0.600,0.663507,0,0.621570,0.501124,0,0.002156,0.613240,0.742424,1.0,1,1,1,0.059508,1.0,2016,0.95,0.666667,0.272340,0.659472,0.291595,0.5,Bangladesh
4,0.093101,0.767170,1,0.420118,0.185897,0.310811,0.5,0.053336,0.462078,0.250308,0.099767,0.273237,0.102508,0.195918,0.225,0.061611,1,0.981059,0.417978,1,0.014627,0.871080,0.352814,0.5,1,0,0,0.123230,0.5,2016,0.44,0.000000,0.561702,0.694434,0.282022,0.0,Bolivia
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,0.206966,0.784899,1,0.576156,0.124217,0.364865,0.5,0.065361,0.551439,0.072750,0.025784,0.359571,0.468029,0.583673,0.125,0.118483,1,0.949151,0.161798,1,0.005466,0.515679,0.179654,1.0,0,0,0,0.244249,1.0,2018,0.22,0.666667,0.044681,0.594337,0.073330,1.0,Tunisia
66,0.049565,0.506972,1,0.308788,0.020433,0.189189,0.0,0.006012,0.195369,0.419236,0.430682,0.327824,0.100970,0.024490,0.150,0.189573,0,0.762476,0.397753,1,0.730562,0.644599,0.701299,0.5,1,1,0,0.091524,0.5,2018,0.76,0.500000,0.346808,0.542011,0.465824,1.0,Uganda
67,0.036169,0.639337,1,0.701950,0.510723,0.216216,0.0,0.038596,0.516896,0.025894,0.000000,0.390996,0.306440,0.126531,0.050,0.000000,0,0.995707,0.101124,1,0.046189,0.929152,0.041126,0.5,1,1,1,0.290426,1.0,2018,0.67,0.000000,0.051064,0.853649,0.000000,1.0,Ukraine
68,0.125921,0.482691,0,0.563226,0.125545,0.270270,0.0,0.034523,0.578348,0.062885,0.041850,0.310750,0.050872,0.085714,0.550,0.255924,0,0.945137,0.233708,0,0.071594,0.830430,0.380952,0.0,0,0,1,0.566312,0.5,2016,0.15,0.333333,0.348936,0.738428,0.188206,0.0,Vietnam


Now, we will repeat the three prediction models from *Q2* for *2016* and after that, three prediction models for *2018*.
### 2016

In [41]:
X_train = imputed_training_com_q3[imputed_training_com_q3['Data_year'] == 2016].drop(target, axis=1)
y_train = imputed_training_com_q3[imputed_training_com_q3['Data_year'] == 2016][target]
X_val = oos_com_q3[oos_com_q3['Data_year'] == 2016].drop(target, axis=1)
y_val = oos_com_q3[oos_com_q3['Data_year'] == 2016][target]
#X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=21)

We use *OneHotEncoder* for categorical variables.

In [42]:
encoder = OneHotEncoder(handle_unknown='ignore')
X_train_encoded = encoder.fit_transform(X_train[['Country']])
X_val_encoded = encoder.transform(X_val[['Country']])

X_train_final = np.concatenate((X_train.drop(['Country'], axis=1).values, X_train_encoded.toarray()), axis=1)
X_val_final = np.concatenate((X_val.drop(['Country'], axis=1).values, X_val_encoded.toarray()), axis=1)
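
The encode-then-concatenate step is repeated in every section. As a hedged alternative, scikit-learn's `ColumnTransformer` and `Pipeline` can bundle the encoding with the model so that training and validation data always pass through the same transformation. The toy frames and the reduced `n_estimators` below are assumptions chosen to keep the example small.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data: country "Z" appears only in the validation frame.
train_demo = pd.DataFrame(
    {"Country": ["A", "B", "C", "D"], "ATMs": [0.1, 0.4, 0.2, 0.9], "CPI": [0.3, 0.5, 0.1, 0.8]}
)
y_demo = [0.9, 0.4, 0.7, 0.1]
val_demo = pd.DataFrame({"Country": ["B", "Z"], "ATMs": [0.3, 0.6], "CPI": [0.4, 0.7]})

preprocess = ColumnTransformer(
    [("country", OneHotEncoder(handle_unknown="ignore"), ["Country"])],
    remainder="passthrough",  # numeric columns are appended after the one-hot block
    sparse_threshold=0,       # always return a dense array
)
pipeline = Pipeline(
    [("prep", preprocess), ("model", RandomForestRegressor(n_estimators=20, random_state=21))]
)
pipeline.fit(train_demo, y_demo)
val_pred = pipeline.predict(val_demo)
print(val_pred)
```

Note that the column order differs from the manual concatenation above: here the one-hot columns come first and the numeric columns last.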

Let's now fit our first model *(Linear)*.

In [43]:
# Fit the model
lr_model = LinearRegression().fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_lr, r2_lr = mean_absolute_error(y_val, lr_model.predict(X_val_final)), r2_score(y_val, lr_model.predict(X_val_final))
print('Linear Regression 2016:\nMAE:', mae_lr, '\nR^2:', r2_lr)

Linear Regression 2016:
MAE: 16024.693490781687 
R^2: -2744990654.1467505


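With an *MAE* in the tens of thousands and a hugely negative *$R^2$*, this model is unusable, so we won't analyze it further. Let's move on to *LASSO* feature selection and then fit *Linear* again on the selected features: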

In [44]:
# perform LASSO regression for variable selection
lasso_model = Lasso(alpha=0.061)
lasso_model.fit(X_train_final, y_train)
selected_features = lasso_model.coef_ != 0.0
X_train_lasso = X_train_final[:, selected_features]
X_val_lasso = X_val_final[:, selected_features]

# fit the linear regression model on the selected features
lr_model_lasso = LinearRegression()
lr_model_lasso.fit(X_train_lasso, y_train)

# evaluate the model
mae_lr_lasso = mean_absolute_error(y_val, lr_model_lasso.predict(X_val_lasso))
r2_lr_lasso = r2_score(y_val, lr_model_lasso.predict(X_val_lasso))
print('Linear Regression with LASSO 2016:')
print(f'MAE: {mae_lr_lasso}\nR^2: {r2_lr_lasso}')

Linear Regression with LASSO 2016:
MAE: 0.33324668208937047
R^2: 0.07448072845949805


The *MAE* is much better and *$R^2$* is now positive, so we start to gain some predictive power.<br>
Below are the intercept and the non-zero coefficients of the *LASSO* model:

In [45]:
feature_names = oos_com_q3.columns.to_list()
# Inspect the selected features
selected_features = np.where(lasso_model.coef_ != 0.0)[0]
# Print the corresponding feature names
selected_feature_names = [feature_names[i] for i in selected_features if i < len(feature_names)]
print("Selected feature names:", selected_feature_names)
print('Lasso parameters:')
print('Intercept:',lasso_model.intercept_)
for feature, coef in zip(feature_names, lasso_model.coef_):
    if coef!=0.0:
        print(feature, ':', coef)

Selected feature names: ['Freemv_F', 'Minority_rule']
Lasso parameters:
Intercept: 0.6243023256461165
Freemv_F : -0.023953488469174725
Minority_rule : 0.006569767435388358


Let's move to *Decision Trees* now.

In [46]:
# Fit the model
dt_model = DecisionTreeRegressor(random_state=21).fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_dt, r2_dt = mean_absolute_error(y_val, dt_model.predict(X_val_final)), r2_score(y_val, dt_model.predict(X_val_final))
print('Decision Tree 2016:')
print(f'MAE: {mae_dt}\nR^2: {r2_dt}\n10 Cross Validation Scores: {cross_val_score(dt_model, X_val_final, y_val, cv=10)}')

Decision Tree 2016:
MAE: 0.3866492146596858
R^2: -0.09975608862772378
10 Cross Validation Scores: [ 3.27876810e-01 -9.08413253e+00 -1.28299142e+00 -1.09269572e-01
 -1.23193396e-01  6.40486955e-02 -9.83090093e+01 -3.91190889e-02
  3.00829204e-01  1.13113997e-01]
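
Note that `cross_val_score` clones and refits the estimator on folds of whatever data it receives, so the scores above describe trees trained on slices of the validation set rather than the fitted `dt_model`. The default score for a regressor is $R^2$. A sketch of cross-validating on training data with *MAE* instead is shown below; the synthetic data and the choice of 5 folds are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(21)
X_demo = rng.normal(size=(60, 5))                        # synthetic features
y_demo = X_demo[:, 0] + rng.normal(scale=0.1, size=60)   # synthetic target

cv = KFold(n_splits=5, shuffle=True, random_state=21)

# scikit-learn maximises scores, so MAE is exposed as its negative
neg_mae = cross_val_score(
    DecisionTreeRegressor(random_state=21), X_demo, y_demo,
    cv=cv, scoring="neg_mean_absolute_error",
)
mae_per_fold = -neg_mae
print(mae_per_fold.mean())
```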


Lastly, we have a *Random Forest* algorithm.

In [47]:
# Fit the model
rf_model = RandomForestRegressor(random_state=21).fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_rf, r2_rf = mean_absolute_error(y_val, rf_model.predict(X_val_final)), r2_score(y_val, rf_model.predict(X_val_final))
print('Random Forest 2016:')
print(f'MAE: {mae_rf}\nR^2: {r2_rf}\n10 Cross Validation Scores: {cross_val_score(rf_model, X_val_final, y_val, cv=10)}')

Random Forest 2016:
MAE: 0.33654240837696336
R^2: 0.025394461018713343
10 Cross Validation Scores: [  0.42775136  -0.6732188   -0.5448716    0.03142776  -0.04361044
  -2.28194064 -11.58298948  -0.07596269   0.40458395   0.11154217]


Below, we print the model with the lowest validation *MAE* among those using the authors' variables for **2016**:

In [48]:
# choose the model with the lowest MAE on the validation set
mae_vals = [mae_lr, mae_lr_lasso, mae_dt, mae_rf]
r2_vals = [ r2_lr, r2_lr_lasso, r2_dt, r2_rf]
best_model_idx = np.argmin(mae_vals)
best_model = [lr_model, lr_model_lasso, dt_model, rf_model][best_model_idx]
print(f'Best model for 2016: {best_model}, MAE: {mae_vals[best_model_idx]}, R^2: {r2_vals[best_model_idx]}')

Best model for 2016: LinearRegression(), MAE: 0.33324668208937047, R^2: 0.07448072845949805
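
The printed winner reads `LinearRegression()` even though the matching *MAE* belongs to the LASSO-selected model, because both candidates are `LinearRegression` objects. Keeping a label next to each score removes the ambiguity. The sketch below reuses the 2016 values printed above; the dictionary layout and the labels are assumptions, not the notebook's own code.

```python
# (MAE, R^2) pairs copied from the 2016 outputs above
results_2016 = {
    "Linear": (16024.693490781687, -2744990654.1467505),
    "Linear + LASSO selection": (0.33324668208937047, 0.07448072845949805),
    "Decision Tree": (0.3866492146596858, -0.09975608862772378),
    "Random Forest": (0.33654240837696336, 0.025394461018713343),
}

# Pick the label whose MAE is smallest
best_name = min(results_2016, key=lambda name: results_2016[name][0])
best_mae, best_r2 = results_2016[best_name]
print(f"Best model for 2016: {best_name}, MAE: {best_mae}, R^2: {best_r2}")
```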


Now we repeat the same procedure for 2018.
### 2018

In [49]:
X_train = imputed_training_com_q3[imputed_training_com_q3['Data_year'] == 2018].drop(target, axis=1)
y_train = imputed_training_com_q3[imputed_training_com_q3['Data_year'] == 2018][target]
X_val = oos_com_q3[oos_com_q3['Data_year'] == 2018].drop(target, axis=1)
y_val = oos_com_q3[oos_com_q3['Data_year'] == 2018][target]
#X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=21)

We use *OneHotEncoder* for categorical variables.

In [50]:
encoder = OneHotEncoder(handle_unknown='ignore')
X_train_encoded = encoder.fit_transform(X_train[['Country']])
X_val_encoded = encoder.transform(X_val[['Country']])

X_train_final = np.concatenate((X_train.drop(['Country'], axis=1).values, X_train_encoded.toarray()), axis=1)
X_val_final = np.concatenate((X_val.drop(['Country'], axis=1).values, X_val_encoded.toarray()), axis=1)

Let's now fit our first model *(Linear)*.

In [51]:
# Fit the model
lr_model = LinearRegression().fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_lr, r2_lr = mean_absolute_error(y_val, lr_model.predict(X_val_final)), r2_score(y_val, lr_model.predict(X_val_final))
print('Linear Regression 2018:\nMAE:', mae_lr, '\nR^2:', r2_lr)

Linear Regression 2018:
MAE: 7110.177554475 
R^2: -293512816.8673965


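With an *MAE* of about 7,110 and a hugely negative *$R^2$*, this model is unusable, so we won't analyze it further. Let's move to *LASSO* feature selection and then refit the *Linear* model on the selected features: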

In [52]:
# perform LASSO regression for variable selection
lasso_model = Lasso(alpha=0.01)
lasso_model.fit(X_train_final, y_train)
selected_features = lasso_model.coef_ != 0.0
X_train_lasso = X_train_final[:, selected_features]
X_val_lasso = X_val_final[:, selected_features]

# fit the linear regression model on the selected features
lr_model_lasso = LinearRegression()
lr_model_lasso.fit(X_train_lasso, y_train)

# evaluate the model
mae_lr_lasso = mean_absolute_error(y_val, lr_model_lasso.predict(X_val_lasso))
r2_lr_lasso = r2_score(y_val, lr_model_lasso.predict(X_val_lasso))
print('Linear Regression with LASSO 2018:')
print(f'MAE: {mae_lr_lasso}\nR^2: {r2_lr_lasso}')

Linear Regression with LASSO 2018:
MAE: 40.524782766181396
R^2: -1440.550802624187


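The *MAE* drops sharply (from about 7,110 to about 40.5), but *$R^2$* remains strongly negative (about -1440), so this model still has no predictive power on the validation set.<br>
Below are the features kept by *LASSO*, together with the *LASSO* intercept and non-zero coefficients: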

In [53]:
feature_names = oos_com_q3.columns.to_list()
# Inspect the selected features
selected_features = np.where(lasso_model.coef_ != 0.0)[0]
# Print the corresponding feature names
selected_feature_names = [feature_names[i] for i in selected_features if i < len(feature_names)]
print("Selected feature names:", selected_feature_names)
print('Lasso parameters:')
print('Intercept:',lasso_model.intercept_)
for feature, coef in zip(feature_names, lasso_model.coef_):
    if coef!=0.0:
        print(feature, ':', coef)

Selected feature names: ['Free_discuss', 'Work_rightCIRI', 'Internet_use', 'Climate_chg_vuln', 'Literacy_15_24yrs', 'Neonatal_mort', 'Rape_report', 'Armedcon', 'Minority_rule', 'Phys_secF', 'Rape_enclave']
Lasso parameters:
Intercept: 0.6004955495412365
Free_discuss : -0.3604976542382206
Work_rightCIRI : 0.045150713327221086
Internet_use : -0.16569022975483477
Climate_chg_vuln : 0.09309161437527515
Literacy_15_24yrs : -0.015863552646290986
Neonatal_mort : 0.7170151192239336
Rape_report : -0.04901162088328785
Armedcon : 0.05658340635884571
Minority_rule : 0.1818972301362467
Phys_secF : -0.03410570780164882
Rape_enclave : 0.5983832980685262
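
The penalties `alpha=0.061` (2016) and `alpha=0.01` (2018) are fixed by hand. One common alternative is to let cross-validation on the training data choose the penalty, for example with `LassoCV`. The sketch below runs on synthetic data, so the value it selects says nothing about the slavery dataset; the grid of candidate penalties is also an assumption.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(21)
X_demo = rng.normal(size=(80, 10))   # synthetic features
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 3] + rng.normal(scale=0.5, size=80)

# Try a grid of penalties with 5-fold CV on the training data only
lasso_cv = LassoCV(alphas=np.logspace(-3, 0, 30), cv=5, random_state=21)
lasso_cv.fit(X_demo, y_demo)

chosen_alpha = lasso_cv.alpha_
kept = np.flatnonzero(lasso_cv.coef_)   # indices of non-zero coefficients
print(chosen_alpha, kept)
```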


Let's move to *Decision Trees* now.

In [54]:
# Fit the model
dt_model = DecisionTreeRegressor(random_state=21).fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_dt, r2_dt = mean_absolute_error(y_val, dt_model.predict(X_val_final)), r2_score(y_val, dt_model.predict(X_val_final))
print('Decision Tree 2018:')
print(f'MAE: {mae_dt}\nR^2: {r2_dt}\n10 Cross Validation Scores: {cross_val_score(dt_model, X_val_final, y_val, cv=10)}')

Decision Tree 2018:
MAE: 1.5088953488372094
R^2: -1.3544695137112934
10 Cross Validation Scores: [-1.73930427e+00  3.85701339e-02 -5.47021762e-03 -6.44668704e+00
 -8.88908573e-02 -1.46857227e+00 -8.68642815e+00 -3.74116486e-01
 -1.30933963e+01 -8.21917133e+00]


Lastly, we have a *Random Forest* algorithm.

In [55]:
# Fit the model
rf_model = RandomForestRegressor(random_state=21).fit(X_train_final, y_train)

# Make predictions and evaluate the model
mae_rf, r2_rf = mean_absolute_error(y_val, rf_model.predict(X_val_final)), r2_score(y_val, rf_model.predict(X_val_final))
print('Random Forest 2018:')
print(f'MAE: {mae_rf}\nR^2: {r2_rf}\n10 Cross Validation Scores: {cross_val_score(rf_model, X_val_final, y_val, cv=10)}')

Random Forest 2018:
MAE: 0.6660436046511632
R^2: -0.07098623671765192
10 Cross Validation Scores: [ 0.06260791  0.61774043  0.31099127  0.06453773  0.49087902 -0.26921112
  0.16589943  0.10811022 -4.72824976 -0.46054094]


Below, we print the model with the lowest validation *MAE* among those using the authors' variables for **2018**:

In [56]:
# choose the model with the lowest MAE on the validation set
mae_vals = [mae_lr, mae_lr_lasso, mae_dt, mae_rf]
r2_vals = [ r2_lr, r2_lr_lasso, r2_dt, r2_rf]
best_model_idx = np.argmin(mae_vals)
best_model = [lr_model, lr_model_lasso, dt_model, rf_model][best_model_idx]
print(f'Best model for 2018: {best_model}, MAE: {mae_vals[best_model_idx]}, R^2: {r2_vals[best_model_idx]}')

Best model for 2018: RandomForestRegressor(random_state=21), MAE: 0.6660436046511632, R^2: -0.07098623671765192
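
Since the best 2018 model still has a negative $R^2$, it does worse on the validation set, in squared error, than always predicting the validation mean. A mean-only baseline makes such comparisons explicit. The sketch below uses `DummyRegressor` on synthetic data; all values are invented.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(21)
X_tr, X_va = rng.normal(size=(40, 3)), rng.normal(size=(15, 3))      # synthetic features
y_tr, y_va = rng.uniform(0, 2, size=40), rng.uniform(0, 2, size=15)  # synthetic targets

# Baseline that ignores X and always predicts the training mean
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
pred = baseline.predict(X_va)

mae_base = mean_absolute_error(y_va, pred)
r2_base = r2_score(y_va, pred)   # never above 0 for a constant prediction
print(mae_base, r2_base)
```

A candidate model is only worth keeping if its validation *MAE* is clearly below that of such a baseline.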


## $4^{th}$ Question : Slavery Estimation with PCA-derived Features<a class="anchor" id="q4"></a>

In [57]:
imputed_training_df = pd.read_excel('imputed_unscaled_training.xlsx')
#imputed_training_df = pd.read_excel('imputed_training.xlsx')
imputed_training_df

Unnamed: 0,Country,Region,Data_year,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc,Armedcon,Pol_terror,SLAVERY,SDGI,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx,Extrajud_kill,Pol_impris,Torture,Polrights_indx,Free_assem,Freemv_foreign,Freemv_dom,Free_speech,Free_polit,Relig_freeCIRI,Econ_right_F,Pol_right_F,Indep_judic,Rape_prev,Rape_report,Rape_enclave,Rape_compl,Gender_equal,Hum_traff,AIDS_death,AIDS_Orph,Phys_secF,Work_rightCIRI
0,Afghanistan,ASIA,2018,38.57,55.92,0.48,0.24,570,1,5.0,2.22,36.50,10.910000,2.0206,40.9,26.799999,9.5,50,396,35.5,3.575,189.0,91.099998,66,46.990051,71.85668,9.27439,57.509158,22.612086,31.198909,27.700001,31.0400,31.900000,55.299999,2.888889,43.000000,14.86678,-3.583961,0.802082,10.300000,9.581312,3.140831,6.39,0.0,27.820000,31.248380,0.000000,0.12,0.425262,67.27,46.50,0.000000,2,13.57,6.250000,0.862703,0.835789,11,3.567911,6.5,76,3.764837,33.716116,37.4,0.134000,1,1,0.014739,1,0.77,7.928766,1,4,2,0,0,0,0.566547,0,1,-2.359148,0,12500,0.3,51.5,0.4,4,0,2,0,4,1,0,0,1,1,0,0,2,0,4,4,2,17,2,3,500,5400,4,1
1,Argentina,AMERICAS,2018,63.02,26.12,0.59,0.61,11970,0,2.0,0.13,66.82,0.000000,4.5550,8.2,5.000000,1.2,67,52,6.3,6.574,24.0,12.500000,94,99.238410,95.38000,17.94733,24.840764,100.105597,65.840221,36.599998,4.3020,96.400002,99.099998,1.309626,99.800003,95.00000,-1.071438,59.449954,4.400000,6.666788,2.985268,64.70,32.1,44.490002,98.091536,11.746875,0.05,4.562048,85.89,62.36,17.647059,42,49.36,14.705882,12.585822,0.861084,32,2.336129,5.5,147,2.876021,42.929606,99.5,0.005612,0,1,0.016886,0,0.26,0.000000,1,4,4,0,1,1,0.954939,0,0,-3.221412,1,74900,5.4,96.3,1.3,5,1,2,0,12,2,2,2,1,2,2,2,3,1,1,4,0,9,4,1,1700,31000,2,1
2,Armenia,RUSSIA AND EURASIA,2018,67.09,75.92,3.21,0.23,3770,0,3.0,0.34,65.41,2.440000,3.0263,20.8,5.800000,4.2,62,25,7.4,4.350,45.0,14.100000,93,99.807284,84.11982,12.31785,59.047619,99.632353,76.785713,10.700000,37.8600,89.500000,100.000000,0.675325,100.000000,81.48745,-0.195630,58.265829,3.900000,16.298620,4.313498,46.30,31.0,31.299999,99.666683,22.585273,0.05,1.671657,87.28,71.58,0.000000,67,63.52,10.526316,0.593045,0.841606,35,3.401135,1.8,164,3.958415,83.951898,99.6,0.005065,0,0,0.053967,1,0.13,0.000000,1,4,3,0,1,1,0.872840,0,0,-1.867683,0,4600,0.6,99.1,14.3,4,1,1,0,6,1,1,1,1,1,0,1,2,0,0,3,0,9,0,3,200,1000,3,1
3,Bangladesh,ASIA,2016,45.54,37.95,1.05,0.16,1330,1,4.0,0.95,44.42,43.650002,4.4058,36.1,16.400000,14.3,61,176,23.3,4.694,227.0,37.599998,89,79.939430,91.54676,9.97506,25.673250,81.651376,43.636364,20.000000,2.9230,60.599998,86.900002,1.124528,59.599998,9.26030,-0.548201,9.236862,12.800000,4.443554,2.847685,9.60,1.9,32.119999,32.330873,0.000000,0.26,0.372017,90.97,50.04,33.333333,2,2.36,10.526316,3.526961,0.768760,25,2.852616,2.7,42,3.458803,80.262334,30.5,0.047270,0,1,0.052241,1,0.05,3.871201,1,4,2,0,1,0,0.541176,0,1,-2.852130,0,140000,0.2,66.7,2.1,2,0,1,0,7,1,2,1,1,1,1,1,2,0,1,4,1,12,4,3,500,3800,4,0
4,Bolivia,AMERICAS,2016,57.74,56.40,1.54,0.40,3070,0,2.0,0.44,57.47,7.700000,1.9380,18.1,15.900000,1.6,59,206,19.6,5.890,120.0,38.400002,94,98.995885,81.60256,13.15285,49.608355,84.781398,75.544797,53.099998,0.3638,50.299999,90.000000,2.135802,90.500000,70.97138,-1.097022,34.711929,26.400000,3.637385,3.308116,39.02,13.9,56.290001,96.137165,11.296667,0.11,1.599499,71.21,50.15,0.000000,64,45.76,26.785714,5.260437,0.868154,34,3.185076,12.1,140,3.439012,43.898827,75.8,0.004741,0,0,0.027534,0,0.74,0.000000,1,4,4,0,1,1,0.786002,0,1,-2.344236,1,13500,4.3,88.9,4.8,5,1,1,1,11,1,2,2,1,2,2,1,3,0,3,3,0,10,0,3,1000,20000,3,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,MIDDLE EAST AND NORTH AFRICA,2018,64.86,91.44,1.48,0.66,3690,0,3.0,0.22,65.06,1.990000,1.8325,10.1,5.000000,2.8,66,62,8.2,4.739,33.0,14.000000,98,97.304445,98.72154,14.62357,28.496959,75.820182,36.571429,31.299999,69.7100,91.599998,97.699997,1.288043,100.000000,95.00000,-3.158522,23.348354,2.100000,14.816524,3.673942,46.16,30.9,36.060001,94.532943,27.773102,0.07,2.402456,87.28,73.01,0.000000,58,17.82,1.388889,5.817336,0.972199,38,3.484549,2.2,199,4.351138,61.795845,99.2,0.004354,0,0,0.009617,0,0.04,0.000000,1,4,3,0,1,1,0.800101,0,0,-1.737197,1,25000,1.2,58.3,14.3,2,0,0,0,7,1,2,0,1,1,0,1,2,0,4,4,2,15,4,2,100,8100,4,1
66,Uganda,SUBSAHARAN AFRICA,2018,52.66,47.22,2.60,0.28,630,1,3.0,0.76,43.62,33.240002,2.0193,34.2,25.500000,4.3,50,343,18.7,3.931,161.0,54.599998,78,87.408829,91.48062,9.76888,56.559767,71.135647,95.011345,35.000000,1.0600,19.100000,79.000000,2.888889,18.162560,5.00000,-4.879242,4.227783,16.299999,3.590334,3.539417,17.71,7.4,44.299999,23.420564,0.560000,0.08,0.110887,79.20,56.92,0.000000,30,13.57,26.666667,5.663806,0.755702,25,3.567911,10.7,97,3.891180,46.035296,29.9,0.030420,0,1,0.025691,0,0.93,5.463832,1,4,3,0,1,1,0.579077,0,1,-2.605993,0,44300,85.0,69.4,0.6,2,1,0,0,7,0,2,2,1,1,1,0,3,1,1,3,2,12,3,2,23000,950000,3,0
67,Ukraine,RUSSIA AND EURASIA,2018,70.60,104.81,3.69,0.22,2310,1,4.0,0.67,66.39,0.000000,4.4008,3.7,1.170000,0.3,63,24,5.5,4.681,94.0,9.000000,76,99.772410,97.40348,15.14135,33.615222,99.035088,80.952375,12.100000,8.4710,95.900002,96.199997,1.368095,100.000000,95.00000,-4.980211,94.555988,2.400000,9.874470,3.807098,43.40,5.4,25.620001,86.342962,14.700000,0.04,6.262352,88.84,71.58,5.000000,90,32.54,1.459854,5.702895,0.943358,27,2.866647,4.3,305,2.947390,44.265651,99.8,0.016890,0,0,0.013978,1,0.47,4.143135,1,3,3,0,1,1,0.684342,0,1,-1.712700,0,86600,5.2,93.9,3.1,4,1,1,0,10,1,2,2,1,1,2,2,2,0,1,3,2,12,0,2,6100,61000,4,0
68,Vietnam,ASIA,2016,64.27,184.69,6.14,0.20,2100,0,3.0,0.15,57.62,3.230000,5.5774,19.4,11.000000,5.7,66,54,11.4,5.360,140.0,21.700001,95,97.091673,97.96511,11.90000,21.423774,88.719899,89.064399,24.299999,9.2590,78.000000,97.599998,1.024409,99.000000,43.78080,-0.072191,23.593043,16.400000,2.058134,3.467071,48.31,18.8,35.570000,61.306621,0.137500,0.24,1.971434,79.24,56.92,7.692308,3,0.45,2.777778,9.076540,0.746622,31,3.407291,3.3,145,3.909754,60.553253,95.0,0.014510,0,1,0.048466,1,0.24,0.000000,1,0,2,0,0,0,0.559768,0,0,-0.991591,0,71900,3.6,85.4,2.1,3,1,0,0,1,0,0,1,0,0,0,1,2,0,1,2,0,11,2,3,5300,94000,3,0


We now standardize the numeric features with *StandardScaler*, leaving `SLAVERY` and `Data_year` unscaled.

In [58]:
cols_to_normalize = imputed_training_df.select_dtypes(include='number').columns.drop(['SLAVERY', 'Data_year'])

imputed_training_df[cols_to_normalize] = StandardScaler().fit_transform(imputed_training_df[cols_to_normalize])
imputed_training_df

Unnamed: 0,Country,Region,Data_year,KOF_Globalis,Trade_open,FDI,VDEM_Libdem,GDPpc,Armedcon,Pol_terror,SLAVERY,SDGI,Poverty,Cereal_yield,Stunting_u5s,Undernourish,Wasting_u5s,Life_expect,Maternal_mort,Neonatal_mort,Wellbeing,Tuberculosis,Infant_mort,Infant_vaccines,Literacy_15_24yrs,Primary_school,Yrs_of_school,Lack_contraception,F_school,M_school,F_parliam,Freshwater,Sanitation,Water_acc,Co2_fuel,Electric_acc,Fuel_acc,Growth_rate,ATMs,Child_lab,Unemploy,Infrastruct,Internet_use,Broadband,Inequality,Piped_water,Treated_waste,Climate_chg_vuln,Co2_energy_pc,Ocean_biodiv,Ocean_clean,Ocean_protect,Ocean_fisheries,Fish_overexploit,Terrestrial_protect,Forest_change,Species_survival,CPI,Gov_efficien,Homicides,Prison_pop,Property_rights,Safe_night,Regist_birth,Risk_masskill,Masskill_ongo,Masskill_ever,GDPpc_growth,Minority_rule,Ethnic_fract,Battle_deaths,Pol_cand_restr,Party_ban,Relig_freeMK,Polkill_apprvd,Freemv_M,Freemv_F,Free_discuss,Social_ineq,Soc_powerdist,Trade_openMK,Democ,Sexwrk_size,Sexwrk_HIV,Sexwrk_condom,Sexwrk_Syphilis,Physrights_indx,Extrajud_kill,Pol_impris,Torture,Polrights_indx,Free_assem,Freemv_foreign,Freemv_dom,Free_speech,Free_polit,Relig_freeCIRI,Econ_right_F,Pol_right_F,Indep_judic,Rape_prev,Rape_report,Rape_enclave,Rape_compl,Gender_equal,Hum_traff,AIDS_death,AIDS_Orph,Phys_secF,Work_rightCIRI
0,Afghanistan,ASIA,2018,-2.025640,-0.440650,-0.383947,-0.855273,-0.689245,1.384437,1.915094,2.22,-1.722946,-0.156427,-0.755378,1.341877,1.624495,0.677430,-1.634282,1.133409,1.780640,-1.743157,0.124480,2.018781,-1.522106,-3.071282,-2.218026,-1.280252,1.209906,-2.956825,-1.799977,0.682268,0.631153,-1.427107,-2.486692,1.371566,-1.291700,-1.204003,-1.208591,-0.987079,-0.305687,0.251542,-0.610840,-1.250160,-0.955028,-1.326172,-1.273525,-0.737934,0.052756,-0.776957,-2.099434,-1.437682,-0.685755,-1.930101,-0.652495,-0.680598,-0.474261,-0.013841,-1.935207,0.348834,-0.280271,-0.781813,-0.305255,-1.798694,-1.443546,4.347250,1.914854,0.699544,-0.712798,1.300887,1.282136,2.657150,0.359211,0.375419,-1.302224,-0.171499,-2.315953,-1.300887,-0.842957,-0.27735,0.792406,-0.310678,-1.384437,-0.434928,-0.497240,-1.604535,-0.831016,0.380613,-1.165998,1.453459,-0.423999,-0.935647,0.149175,-1.851183,-2.045220,0.473050,-0.138866,-1.194045,-1.752870,-0.168073,-0.849662,2.160692,0.838054,1.217718,2.897194,-0.245605,0.965834,-0.507181,-0.520459,0.919265,0.733799
1,Argentina,AMERICAS,2018,0.116540,-1.059232,-0.373575,1.108519,0.938218,-0.722315,-1.031204,0.13,1.035057,-0.791059,0.861868,-1.016504,-0.664075,-1.108354,1.011698,-0.593308,-0.854862,1.725938,-0.865091,-0.782847,0.684497,0.702496,0.606319,2.325878,-0.667453,0.854666,-0.113320,1.467223,-0.488322,1.142437,0.785636,-0.252582,0.795234,1.193368,0.074948,0.596492,-0.800974,-0.201855,-0.807396,1.219209,0.292108,0.416105,0.945677,-0.228460,-0.707556,0.577338,0.510815,0.277169,0.138427,-0.463936,0.787350,-0.205354,0.241982,0.230849,-0.417195,-1.934508,-0.353999,-0.073148,-1.623700,-1.135280,0.805753,-0.508758,-0.522233,0.699544,-0.597982,-0.768706,-0.824273,-0.621867,0.359211,0.375419,0.948533,-0.171499,0.431788,0.768706,1.233465,-0.27735,-1.261980,-1.681527,0.722315,-0.187576,-0.182765,0.790156,-0.651229,0.832189,0.317999,1.453459,-0.423999,1.372557,1.454455,0.689656,0.876523,0.473050,1.249793,1.127709,1.909843,2.184951,0.472034,-0.360115,0.838054,-1.150067,-1.031204,0.900551,-1.738502,-0.429650,-0.432135,-1.878498,0.733799
2,Armenia,RUSSIA AND EURASIA,2018,0.473132,-0.025494,-0.126536,-0.908349,-0.232413,-0.722315,-0.049105,0.34,0.906799,-0.649125,-0.113622,-0.107770,-0.580091,-0.462890,0.233469,-0.728835,-0.755579,-0.846675,-0.739145,-0.725816,0.605690,0.743585,-0.745643,-0.014809,1.298316,0.831390,0.419607,-0.817084,0.916695,0.867556,0.852876,-0.904911,0.802582,0.789109,0.522360,0.564519,-0.842948,1.296517,0.870840,0.439988,0.249372,-0.962457,0.997973,0.241613,-0.707556,-0.368914,0.705672,1.274075,-0.685755,0.452417,1.357012,-0.440257,-0.490736,0.042426,-0.200336,0.039683,-0.626792,0.096532,-0.018108,1.818514,0.809375,-0.529447,-0.522233,-1.429503,1.385382,1.300887,-1.361200,-0.621867,0.359211,0.375419,-0.176845,-0.171499,0.431788,0.768706,0.794548,-0.27735,-1.261980,0.470665,-1.384437,-0.466244,-0.478742,0.939825,1.945695,0.380613,0.317999,0.227650,-0.423999,-0.358596,0.149175,-0.580763,-0.584349,0.473050,-0.138866,-1.194045,0.078487,-0.168073,-0.849662,-1.200384,-0.705730,-1.150067,-1.031204,-1.391761,0.965834,-0.526564,-0.535640,-0.479616,0.733799
3,Bangladesh,ASIA,2016,-1.414965,-0.813667,-0.330202,-1.279877,-0.580747,1.384437,0.932995,0.95,-1.002518,1.748052,0.766661,0.995692,0.532700,1.710173,0.077823,0.029113,0.679506,-0.448753,0.352382,0.111821,0.290461,-0.691425,0.146078,-0.988920,-0.619612,-0.052998,-1.194407,0.003150,-0.546059,-0.283760,-0.125834,-0.442941,-0.681786,-1.371733,0.342247,-0.759329,-0.095819,-0.547711,-0.981233,-1.114220,-0.881210,-0.876755,-1.237586,-0.737934,1.573381,-0.794388,1.222956,-1.054922,0.871034,-1.930101,-1.103478,-0.440257,-0.311484,-0.662242,-0.923199,-0.977100,-0.560437,-1.121174,-0.759219,1.552849,-1.693468,1.066869,-0.522233,0.699544,1.293064,1.300887,-1.691617,0.979105,0.359211,0.375419,-1.302224,-0.171499,0.431788,-1.300887,-0.978595,-0.27735,0.792406,-1.094433,-1.384437,0.070480,-0.503407,-0.792050,-0.491418,-0.522537,-1.165998,0.227650,-0.423999,-0.070070,0.149175,0.689656,-0.584349,0.473050,-0.138866,-0.033168,0.078487,-0.168073,-0.849662,-0.360115,0.838054,0.033826,0.441945,0.900551,0.965834,-0.507181,-0.525979,0.919265,-1.100699
4,Bolivia,AMERICAS,2016,-0.346066,-0.430686,-0.284000,-0.006066,-0.332345,-0.722315,-1.031204,0.44,0.184551,-0.343152,-0.808086,-0.302499,0.480210,-1.022292,-0.233469,0.179699,0.345555,0.934720,-0.289340,0.140336,0.684497,0.684979,-1.047880,0.332377,0.755869,0.100951,0.359188,2.922477,-0.653208,-0.694090,0.105769,0.597076,0.453535,0.474496,0.061878,-0.071468,1.045859,-0.673122,-0.399472,0.131687,-0.414990,1.649390,0.880792,-0.247986,-0.055860,-0.392537,-1.547104,-1.043029,-0.685755,0.342454,0.642521,0.473566,-0.205574,0.299233,-0.272623,-0.360823,0.132605,-0.143016,-0.788575,-1.065492,-0.052675,-0.541702,-0.522233,-1.429503,-0.028444,-0.768706,1.158230,-0.621867,0.359211,0.375419,0.948533,-0.171499,0.431788,0.768706,0.330296,-0.27735,0.792406,-0.286972,0.722315,-0.430964,-0.250593,0.394605,0.047943,0.832189,0.317999,0.227650,1.430997,1.084032,0.149175,0.689656,0.876523,0.473050,1.249793,1.127709,0.078487,2.184951,-0.849662,1.320423,-0.705730,-1.150067,-0.540155,-1.391761,0.965834,-0.474877,-0.470087,-0.479616,0.733799
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Tunisia,MIDDLE EAST AND NORTH AFRICA,2018,0.277751,0.296668,-0.289657,1.373896,-0.243834,-0.722315,-0.049105,0.22,0.874962,-0.675301,-0.875408,-0.879473,-0.664075,-0.764107,0.856052,-0.543113,-0.683374,-0.396700,-0.811114,-0.729380,0.999726,0.562811,1.007523,0.943889,-0.457342,-0.339803,-1.538393,0.999778,2.250201,0.951215,0.681041,-0.274778,0.802582,1.193368,-0.991252,-0.378300,-0.994052,1.065955,0.062753,0.434059,0.245487,-0.464963,0.827532,0.466615,-0.490324,-0.129666,0.705672,1.428692,-0.685755,0.122529,-0.481516,-0.953806,-0.171550,1.305714,0.016523,0.194306,-0.597301,0.445874,0.564446,0.223176,0.794887,-0.556339,-0.522233,-1.429503,-0.986775,-0.768706,-1.732919,-0.621867,0.359211,0.375419,-0.176845,-0.171499,0.431788,0.768706,0.405672,-0.27735,-1.261980,0.678114,0.722315,-0.385378,-0.441745,-1.241055,1.945695,-0.522537,-1.165998,-0.998158,-0.423999,-0.070070,0.149175,0.689656,-2.045220,0.473050,-0.138866,-1.194045,0.078487,-0.168073,-0.849662,2.160692,0.838054,1.217718,1.915094,0.900551,-0.386334,-0.533025,-0.511144,0.919265,0.733799
66,Uganda,SUBSAHARAN AFRICA,2018,-0.791148,-0.621243,-0.184053,-0.642971,-0.680679,1.384437,-0.049105,0.76,-1.075289,1.142505,-0.756207,0.858661,1.488020,-0.441375,-1.634282,0.867374,0.264324,-1.331354,-0.043447,0.717771,-0.576419,-0.151927,0.138137,-1.074648,1.155347,-0.570211,1.306998,1.326108,-0.624059,-1.937032,-0.716049,1.371566,-2.204272,-1.499190,-1.870293,-0.894581,0.197995,-0.680442,-0.107221,-0.770770,-0.667526,0.396246,-1.533410,-0.713647,-0.381708,-0.879877,-0.427024,-0.311027,-0.685755,-0.903786,-0.652495,0.466875,-0.180930,-0.788561,-0.923199,0.348834,0.029386,-0.572208,-0.117841,-0.911656,-1.715200,0.429553,-0.522233,0.699544,-0.127023,-0.768706,1.942970,1.637753,0.359211,0.375419,-0.176845,-0.171499,0.431788,0.768706,-0.775967,-0.27735,0.792406,-0.703119,-1.384437,-0.308874,4.725510,-0.647727,-0.791063,-0.522537,0.317999,-0.998158,-0.423999,-0.070070,-1.156105,0.689656,0.876523,0.473050,-0.138866,-0.033168,-1.752870,2.184951,0.472034,-0.360115,-0.705730,1.217718,0.441945,0.327473,-0.386334,0.946523,2.738578,-0.479616,-1.100699
67,Ukraine,RUSSIA AND EURASIA,2018,0.780660,0.574199,-0.081277,-0.961424,-0.440843,1.384437,0.932995,0.67,0.995943,-0.791059,0.763471,-1.341052,-1.066150,-1.301993,0.389115,-0.733855,-0.927068,-0.463791,-0.445273,-0.907601,-0.734033,0.741066,0.849270,1.159177,-0.163210,0.802014,0.622478,-0.693608,-0.313773,1.122518,0.568975,-0.192451,0.802582,1.193368,-1.921874,1.544402,-0.968868,0.297147,0.230997,0.317176,-0.745229,-1.556106,0.555623,-0.100380,-0.816172,1.133981,0.924361,1.274075,-0.452237,1.295461,0.110675,-0.949818,-0.178542,1.026722,-0.778627,-0.951091,-0.442473,1.503882,-1.517834,-1.039079,0.816619,-0.082191,-0.522233,-1.429503,-0.753472,1.300887,0.043072,1.091566,0.359211,-0.875978,-0.176845,-0.171499,0.431788,0.768706,-0.213201,-0.27735,0.792406,0.717060,-1.384437,-0.141197,-0.195098,0.661869,-0.291655,0.380613,0.317999,0.227650,-0.423999,0.795506,0.149175,0.689656,0.876523,0.473050,-0.138866,1.127709,1.909843,-0.168073,-0.849662,-0.360115,-0.705730,1.217718,0.441945,-1.391761,-0.386334,-0.145370,-0.328629,0.919265,-1.100699
68,Vietnam,ASIA,2016,0.226058,2.232333,0.149732,-1.067575,-0.470822,-0.722315,-0.049105,0.15,0.198196,-0.603171,1.514280,-0.208741,-0.034194,-0.140158,0.856052,-0.583269,-0.394552,0.321643,-0.169392,-0.454921,0.763304,0.547443,0.916702,-0.188547,-0.863817,0.294665,1.017446,0.382398,-0.280781,0.409420,0.673570,-0.545905,0.765840,-0.338973,0.585420,-0.371693,0.206390,-0.918798,-0.198631,0.525109,-0.224618,-0.516175,-0.275587,-0.731971,1.356149,-0.270774,-0.421417,-0.311027,-0.326496,-1.893447,-1.180318,-0.875747,0.027577,-0.876399,-0.489481,0.051094,-0.516200,-0.093110,-0.090290,0.133704,0.642760,-0.172210,-0.522233,0.699544,1.091154,1.300887,-0.906877,-0.621867,0.359211,-4.630170,-1.302224,-0.171499,-2.315953,-1.300887,-0.879200,-0.27735,-1.261980,1.863496,-1.384437,-0.199468,-0.293757,0.207520,-0.491418,-0.070962,0.317999,-0.998158,-0.423999,-1.801224,-1.156105,-1.851183,-0.584349,-1.596543,-1.527525,-1.194045,0.078487,-0.168073,-0.849662,-0.360115,-2.249514,-1.150067,-0.049105,-0.245605,0.965834,-0.197058,-0.214774,-0.479616,-1.100699


### 2016
Let's start with the PCA for 2016 and then move on to 2018. Before fitting, we one-hot encode the categorical columns and split the data into training and validation sets.

In [59]:
X_train = imputed_training_df[imputed_training_df['Data_year'] == 2016].drop(target, axis=1)
y_train = imputed_training_df[imputed_training_df['Data_year'] == 2016][target]
encoder = OneHotEncoder(handle_unknown='ignore')
X_train_encoded = encoder.fit_transform(X_train[['Country','Region']])
X_train_final = np.concatenate((X_train.drop(['Country','Region'], axis=1).values, X_train_encoded.toarray()), axis=1)
X_train, X_val, y_train, y_val = train_test_split(X_train_final, y_train, test_size=0.25, random_state=21)

Let's fit the PCA now.

In [60]:
n_components = 6
pca16 = PCA(n_components=n_components)
X_train_pca = pca16.fit_transform(X_train)
X_train_pca

array([[-0.02251408,  0.85561239, -2.84352155, -1.83708888,  1.00309246,
         1.72182797],
       [ 1.39761631, -3.39337999, -3.03542207, -1.1865572 , -0.29544466,
         2.40615277],
       [-4.03579815, -2.67876314, -0.53573493, -1.60578172, -0.93946485,
         1.09719094],
       [ 4.35627178, -3.64293463, -1.58545936,  8.06637999,  1.33050973,
         1.06192702],
       [-7.00929747,  0.80490847,  0.96550887, -1.8650857 ,  1.6873279 ,
         1.32751966],
       [ 0.70997091, -6.61497153,  1.18860652,  1.30199189,  4.84278838,
        -0.76415083],
       [-4.46173585, -0.063319  ,  9.87498504,  3.8478863 , -3.80986548,
         1.20927974],
       [-5.82258311, -1.61233693, -2.52935709,  1.35543537,  2.17720885,
        -3.84191466],
       [ 2.62850357,  1.53306987, -0.99653452, -2.14067636, -3.23968404,
        -3.26667341],
       [-2.66993968, -0.81046362, -3.13844551, -2.35907772, -2.27163672,
         3.40292974],
       [-6.62620189,  0.70019814, -2.13904395,  1.

Let's print *explained_variance_ratio_* to see the share of the total variance explained by each component, followed by the component loadings.

In [61]:
print(pca16.explained_variance_ratio_)
print(pca16.components_)

[0.24717838 0.13628206 0.09624032 0.08480302 0.06099973 0.05370516]
[[ 2.87644480e-17  1.36647612e-01  7.77690628e-02  9.77642006e-02
   1.06360265e-01  8.17266379e-02 -1.29791215e-01 -1.08178805e-01
   1.49892704e-01 -1.04070597e-01  6.31136410e-02 -1.23528646e-01
  -7.67188216e-02 -9.93874366e-02  6.70701918e-02 -1.26179172e-01
  -1.66853048e-01  5.33633566e-02 -1.15332856e-02 -1.40339207e-01
   1.21140366e-01  1.60579606e-01  1.07755638e-01  1.76335670e-01
  -1.27628878e-01  1.29034389e-01  6.32364179e-02 -2.91243418e-02
  -4.14243915e-02  1.00300552e-01  1.33816246e-01  5.42348204e-02
   1.02827907e-01  1.45643726e-01 -2.42470910e-02  1.59480290e-01
  -1.10918410e-01  4.02774225e-02  1.17739804e-01  1.33494598e-01
   1.31424435e-01  8.38470570e-02  1.55660672e-01  8.25380884e-02
  -1.21277907e-01  1.23668052e-01  8.96149921e-02  5.57813148e-02
   7.91482115e-03  7.46551088e-02 -3.55036907e-03 -2.44798219e-03
  -3.27994644e-02  2.73514963e-02  9.23855731e-02  3.67887615e-02
   3.939
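
Summing the ratios shows how much of the total variance the six components retain together. The sketch below uses the `explained_variance_ratio_` values printed above for `pca16`; the variable names are chosen only for this illustration.

```python
import numpy as np

# explained_variance_ratio_ as printed for pca16
ratios = np.array([0.24717838, 0.13628206, 0.09624032,
                   0.08480302, 0.06099973, 0.05370516])

cumulative = np.cumsum(ratios)   # running total after each component
print(cumulative.round(4))
print(f"Variance retained by 6 components: {cumulative[-1]:.1%}")
```

The six components together keep about 68% of the variance, so roughly a third of it is discarded.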

In [62]:
# Get feature names
feature_names = [ 'Data_year', 'KOF_Globalis', 'Trade_open', 'FDI', 'VDEM_Libdem', 'GDPpc', 'Armedcon', 'Pol_terror', 'SDGI', 'Poverty', 'Cereal_yield', 'Stunting_u5s', 'Undernourish', 'Wasting_u5s', 'Life_expect', 'Maternal_mort', 'Neonatal_mort', 'Wellbeing', 'Tuberculosis', 'Infant_mort', 'Infant_vaccines', 'Literacy_15_24yrs', 'Primary_school', 'Yrs_of_school', 'Lack_contraception', 'F_school', 'M_school', 'F_parliam', 'Freshwater', 'Sanitation', 'Water_acc', 'Co2_fuel', 'Electric_acc', 'Fuel_acc', 'Growth_rate', 'ATMs', 'Child_lab', 'Unemploy', 'Infrastruct', 'Internet_use', 'Broadband', 'Inequality', 'Piped_water', 'Treated_waste', 'Climate_chg_vuln', 'Co2_energy_pc', 'Ocean_biodiv', 'Ocean_clean', 'Ocean_protect', 'Ocean_fisheries', 'Fish_overexploit', 'Terrestrial_protect', 'Forest_change', 'Species_survival', 'CPI', 'Gov_efficien', 'Homicides', 'Prison_pop', 'Property_rights', 'Safe_night', 'Regist_birth', 'Risk_masskill', 'Masskill_ongo', 'Masskill_ever', 'GDPpc_growth', 'Minority_rule', 'Ethnic_fract', 'Battle_deaths', 'Pol_cand_restr', 'Party_ban', 'Relig_freeMK', 'Polkill_apprvd', 'Freemv_M', 'Freemv_F', 'Free_discuss', 'Social_ineq', 'Soc_powerdist', 'Trade_openMK', 'Democ', 'Sexwrk_size', 'Sexwrk_HIV', 'Sexwrk_condom', 'Sexwrk_Syphilis', 'Physrights_indx', 'Extrajud_kill', 'Pol_impris', 'Torture', 'Polrights_indx', 'Free_assem', 'Freemv_foreign', 'Freemv_dom', 'Free_speech', 'Free_polit', 'Relig_freeCIRI', 'Econ_right_F', 'Pol_right_F', 'Indep_judic', 'Rape_prev', 'Rape_report', 'Rape_enclave', 'Rape_compl', 'Gender_equal', 'Hum_traff', 'AIDS_death', 'AIDS_Orph', 'Phys_secF', 'Work_rightCIRI','Country', 'Region']

# Loop over each principal component
for i in range(n_components):
    # Get loadings for this component
    loadings = pca16.components_[i]
    # Sort loadings by absolute value
    loadings_sorted = sorted(zip(loadings, feature_names), key=lambda x: abs(x[0]), reverse=True)
    # Print the top 4 contributing features for this component
    print(f"Principal Component {i}:")
    for loading, feature_name in loadings_sorted[:4]:
        print(f"  {feature_name}: {loading:.3f}")

Principal Component 0:
  Yrs_of_school: 0.176
  Neonatal_mort: -0.167
  Literacy_15_24yrs: 0.161
  ATMs: 0.159
Principal Component 1:
  AIDS_Orph: 0.247
  AIDS_death: 0.242
  Inequality: 0.194
  Infant_vaccines: -0.184
Principal Component 2:
  Forest_change: 0.335
  Unemploy: 0.252
  Trade_openMK: 0.219
  Rape_enclave: -0.203
Principal Component 3:
  ATMs: 0.261
  Freemv_foreign: -0.255
  Co2_energy_pc: 0.206
  Prison_pop: 0.204
Principal Component 4:
  Sexwrk_size: -0.261
  Tuberculosis: 0.251
  Party_ban: -0.220
  Free_polit: -0.197
Principal Component 5:
  Pol_right_F: 0.312
  Social_ineq: -0.275
  Wasting_u5s: 0.223
  F_parliam: 0.192


Above, we can see the four features with the largest absolute loadings for each of the 6 principal components.<br>
Below, we evaluate the PCA features on the validation set using a Linear Regression model.

In [63]:
X_test_pca = pca16.transform(X_val)
# Fit a linear regression model on the PCA-transformed validation set
# (note: the same rows are used for fitting and for scoring below)
model = LinearRegression()
model.fit(X_test_pca, y_val)

# Compute MAE and R^2 on those same validation rows, plus cross-validation scores
mae, r2 = mean_absolute_error(y_val, model.predict(X_test_pca)), model.score(X_test_pca, y_val)
print('Linear Model 2016:')
print(f'MAE: {mae}\nR^2: {r2}\n3 Cross Validation Scores: {cross_val_score(model, X_test_pca, y_val, cv=3)}')

Linear Model 2016:
MAE: 7.533656238527848e-17
R^2: 1.0
3 Cross Validation Scores: [-4.41811224  0.83724123  0.55994531]


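An $R^2$ of 1.0 together with an MAE of practically zero shows that we have clearly overfit: the model is fitted and scored on the same validation rows, and the cross-validation scores (-4.42, 0.84, 0.56) vary widely between folds.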
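
The cell above fits and scores the regression on the same transformed validation rows. A minimal sketch, on synthetic data, of the more usual arrangement: fit both the PCA and the regression on the training rows only, then score on the held-out rows. The data, shapes, and variable names (`X_tr`, `X_va`, `train_r2`, `val_r2`) are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the imputed features and the target
rng = np.random.default_rng(21)
X = rng.normal(size=(80, 20))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=80)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=21)

pca = PCA(n_components=6)
X_tr_pca = pca.fit_transform(X_tr)  # projection learned on training rows only
X_va_pca = pca.transform(X_va)      # same projection applied to validation rows

# The regression is fitted on the training rows, never on the validation rows
model = LinearRegression().fit(X_tr_pca, y_tr)

train_r2 = model.score(X_tr_pca, y_tr)
val_r2 = model.score(X_va_pca, y_va)
val_mae = mean_absolute_error(y_va, model.predict(X_va_pca))
print(f'Train R^2: {train_r2}\nValidation R^2: {val_r2}\nValidation MAE: {val_mae}')
```

Comparing the training and validation $R^2$ gives a direct view of how much the fit degrades on unseen rows.
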
### 2018
Let's do the same for 2018.

In [64]:
# Keep only the 2018 rows and separate the features from the target
X_train = imputed_training_df[imputed_training_df['Data_year'] == 2018].drop(target, axis=1)
y_train = imputed_training_df[imputed_training_df['Data_year'] == 2018][target]
# One-hot encode the categorical columns
encoder = OneHotEncoder(handle_unknown='ignore')
X_train_encoded = encoder.fit_transform(X_train[['Country','Region']])
# Append the one-hot columns to the numeric features
X_train_final = np.concatenate((X_train.drop(['Country','Region'], axis=1).values, X_train_encoded.toarray()), axis=1)
# Hold out 25% of the rows for validation
X_train, X_val, y_train, y_val = train_test_split(X_train_final, y_train, test_size=0.25, random_state=21)

Let's fit the PCA now.

In [65]:
n_components = 6
pca18 = PCA(n_components=n_components)
X_train_pca = pca18.fit_transform(X_train)
X_train_pca

array([[ 1.00934050e+00,  3.60327731e+00,  8.13546509e-01,
         4.11161595e+00,  4.03329976e+00, -2.27479809e+00],
       [ 1.45643162e+00,  4.15697735e-01,  9.32320949e-01,
        -2.91102091e+00,  6.67271975e-01, -3.35738494e+00],
       [ 6.75373518e-01,  2.56123586e+00, -5.65579145e-01,
        -1.92023270e+00,  1.51601854e+00, -4.18147476e+00],
       [ 6.66943018e+00, -3.76586249e+00,  1.77393561e+00,
        -1.70282631e+00,  2.93283465e+00,  2.22759482e+00],
       [ 7.69020821e+00,  3.36124741e+00, -3.65017178e-02,
        -2.07969347e+00, -2.70498317e+00, -1.82778604e+00],
       [-2.08612875e+00,  9.28131010e-01, -3.62474375e+00,
         1.78281448e+00,  2.13839949e+00,  3.83666948e+00],
       [-3.32414810e+00, -6.15868109e-01, -3.33441901e+00,
        -1.03665409e+00,  5.81030809e-01,  1.09955003e+00],
       [ 7.40735297e+00,  2.90543491e+00, -7.57643305e-01,
        -6.06171307e-01, -2.82451194e-01, -3.46491483e+00],
       [-5.17240417e+00, -2.35270342e+00, -4.196

Let's print `explained_variance_ratio_` to see how much of the variance is explained by each of the components that were found, followed by the component loadings in `components_`.

In [66]:
print(pca18.explained_variance_ratio_)
print(pca18.components_)

[0.2953193  0.10630624 0.06580693 0.05722253 0.04544696 0.04230237]
[[ 1.83314871e-17 -1.60992467e-01 -9.60237908e-02 -1.04670241e-02
  -1.14518185e-01 -1.54464633e-01  8.71339935e-02  1.30862902e-01
  -1.75443210e-01  1.45319719e-01 -9.22839098e-02  1.40037235e-01
   1.20670994e-01  7.43039829e-02 -1.57489013e-01  1.33285010e-01
   1.39703714e-01 -1.14998902e-01  7.90093648e-02  1.48282441e-01
  -9.64790625e-02 -1.13214533e-01 -6.98215459e-02 -1.56804700e-01
   8.92814545e-02 -9.48481584e-02 -1.61669490e-02 -2.55714859e-02
   2.42390334e-02 -1.29390506e-01 -1.21481196e-01 -7.18388003e-03
  -1.39728921e-01 -1.36435638e-01 -1.03846697e-01 -1.04398453e-01
   1.42845564e-01 -7.34820695e-03 -1.25460293e-01 -1.58430292e-01
  -1.46003454e-01  5.41400443e-02 -1.39841154e-01 -1.37854932e-01
   8.32814711e-02 -1.21562823e-01 -8.20620461e-02 -5.91525564e-02
  -6.14487455e-02 -8.74632834e-02 -1.05955145e-01 -1.44972292e-02
   2.84697593e-02 -5.64669476e-02 -1.50954110e-01 -5.60775807e-02
   4.185
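
The individual ratios printed above are easier to judge as a running total. A minimal sketch, reusing the six values from the printed output, that accumulates them with `np.cumsum`; the variable names `ratios` and `cumulative` are assumptions for illustration.

```python
import numpy as np

# Ratios copied from the printed output of pca18.explained_variance_ratio_
ratios = np.array([0.2953193, 0.10630624, 0.06580693,
                   0.05722253, 0.04544696, 0.04230237])

# Running total of the variance explained by the first k components
cumulative = np.cumsum(ratios)

for i, (ratio, total) in enumerate(zip(ratios, cumulative)):
    print(f'PC{i}: {ratio:.3f} (cumulative {total:.3f})')
```

The six components together account for roughly 61% of the variance, so close to two fifths of it is not captured by this projection.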

In [67]:
# Get feature names
feature_names = [ 'Data_year', 'KOF_Globalis', 'Trade_open', 'FDI', 'VDEM_Libdem', 'GDPpc', 'Armedcon', 'Pol_terror', 'SDGI', 'Poverty', 'Cereal_yield', 'Stunting_u5s', 'Undernourish', 'Wasting_u5s', 'Life_expect', 'Maternal_mort', 'Neonatal_mort', 'Wellbeing', 'Tuberculosis', 'Infant_mort', 'Infant_vaccines', 'Literacy_15_24yrs', 'Primary_school', 'Yrs_of_school', 'Lack_contraception', 'F_school', 'M_school', 'F_parliam', 'Freshwater', 'Sanitation', 'Water_acc', 'Co2_fuel', 'Electric_acc', 'Fuel_acc', 'Growth_rate', 'ATMs', 'Child_lab', 'Unemploy', 'Infrastruct', 'Internet_use', 'Broadband', 'Inequality', 'Piped_water', 'Treated_waste', 'Climate_chg_vuln', 'Co2_energy_pc', 'Ocean_biodiv', 'Ocean_clean', 'Ocean_protect', 'Ocean_fisheries', 'Fish_overexploit', 'Terrestrial_protect', 'Forest_change', 'Species_survival', 'CPI', 'Gov_efficien', 'Homicides', 'Prison_pop', 'Property_rights', 'Safe_night', 'Regist_birth', 'Risk_masskill', 'Masskill_ongo', 'Masskill_ever', 'GDPpc_growth', 'Minority_rule', 'Ethnic_fract', 'Battle_deaths', 'Pol_cand_restr', 'Party_ban', 'Relig_freeMK', 'Polkill_apprvd', 'Freemv_M', 'Freemv_F', 'Free_discuss', 'Social_ineq', 'Soc_powerdist', 'Trade_openMK', 'Democ', 'Sexwrk_size', 'Sexwrk_HIV', 'Sexwrk_condom', 'Sexwrk_Syphilis', 'Physrights_indx', 'Extrajud_kill', 'Pol_impris', 'Torture', 'Polrights_indx', 'Free_assem', 'Freemv_foreign', 'Freemv_dom', 'Free_speech', 'Free_polit', 'Relig_freeCIRI', 'Econ_right_F', 'Pol_right_F', 'Indep_judic', 'Rape_prev', 'Rape_report', 'Rape_enclave', 'Rape_compl', 'Gender_equal', 'Hum_traff', 'AIDS_death', 'AIDS_Orph', 'Phys_secF', 'Work_rightCIRI','Country', 'Region']

# Loop over each principal component
for i in range(n_components):
    # Get loadings for this component
    loadings = pca18.components_[i]
    # Sort loadings by absolute value
    loadings_sorted = sorted(zip(loadings, feature_names), key=lambda x: abs(x[0]), reverse=True)
    # Print the top 4 contributing features for this component
    print(f"Principal Component {i}:")
    for loading, feature_name in loadings_sorted[:4]:
        print(f"  {feature_name}: {loading:.3f}")

Principal Component 0:
  SDGI: -0.175
  KOF_Globalis: -0.161
  Internet_use: -0.158
  Life_expect: -0.157
Principal Component 1:
  Freshwater: 0.226
  Safe_night: 0.201
  Polrights_indx: -0.184
  Free_polit: -0.181
Principal Component 2:
  Gov_efficien: 0.320
  Property_rights: 0.275
  Torture: 0.267
  GDPpc: 0.265
Principal Component 3:
  Forest_change: 0.242
  Hum_traff: 0.241
  Democ: -0.217
  Trade_open: 0.207
Principal Component 4:
  Unemploy: -0.270
  Freshwater: -0.242
  Party_ban: -0.233
  Polkill_apprvd: -0.219
Principal Component 5:
  Growth_rate: -0.316
  Polkill_apprvd: 0.307
  Wasting_u5s: -0.243
  Democ: -0.204


Above, we can see the four features with the largest absolute loadings for each of the 6 principal components.<br>
Below, we evaluate the PCA features on the validation set using a Linear Regression model.

In [68]:
X_test_pca = pca18.transform(X_val)
# Fit a linear regression model on the PCA-transformed validation set
# (note: the same rows are used for fitting and for scoring below)
model = LinearRegression()
model.fit(X_test_pca, y_val)

# Compute MAE and R^2 on those same validation rows, plus cross-validation scores
mae, r2 = mean_absolute_error(y_val, model.predict(X_test_pca)), model.score(X_test_pca, y_val)
print('Linear Model 2018:')
print(f'MAE: {mae}\nR^2: {r2}\n6 Cross Validation Scores: {cross_val_score(model, X_test_pca, y_val, cv=6)}')

Linear Model 2018:
MAE: 0.23738910114882952
R^2: 0.7284241852623536
6 Cross Validation Scores: [-1.37619054e+01 -4.45943257e+01 -2.78729249e+04 -8.18000482e+02
 -1.02670601e+01 -8.88357646e-01]


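With an $R^2$ of about 0.73 and an MAE of about 0.24, the last model looks like the best one so far for 2018. These figures should be read with caution, though: the model is fitted and scored on the same validation rows, and all six cross-validation scores are negative.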
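
The cross-validation call above runs on validation rows that were already projected by a PCA fitted beforehand. A minimal sketch, on synthetic data, of wrapping the PCA and the regression in a scikit-learn pipeline so that both steps are refitted inside every fold; the data, shapes, and the names `pipeline`, `r2_scores`, and `mae_scores` are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data standing in for the encoded 2018 features and the target
rng = np.random.default_rng(21)
X = rng.normal(size=(120, 20))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=120)

# PCA and regression are refitted on the training part of every fold
pipeline = make_pipeline(PCA(n_components=6), LinearRegression())
cv = KFold(n_splits=6, shuffle=True, random_state=21)

r2_scores = cross_val_score(pipeline, X, y, cv=cv)
# scikit-learn reports MAE as a negated score, so flip the sign back
mae_scores = -cross_val_score(pipeline, X, y, cv=cv, scoring='neg_mean_absolute_error')

print(f'R^2 per fold: {r2_scores}')
print(f'MAE per fold: {mae_scores}')
```

Running the folds over the full set of rows, rather than over the small validation split, also gives each fold more data to fit and to score on.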