# MIT COLLABORATION PROJECT

## Introduction
Out-of-school children rate (SDG 4.1.4) – Percentage of children or young people in the official age range for a given level of education who are not attending pre-primary, primary, secondary, or higher levels of education.

#### Unit of measure: Percentage
#### Time frame for survey
Household survey data from the past 10 years are used for the calculation of adjusted net attendance rate. For countries with multiple years of data, the most recent dataset is used.

### Glossary - the database contains the following
**ISO:** Three-letter alphabetical codes from International Standard ISO 3166-1, assigned by the International Organization for Standardization (ISO). The latest version is available online at http://www.iso.org/iso/home/standards/country_codes.htm. (column A)
**Countries and areas:**    The UNICEF Global databases contain a set of 202 countries and Kosovo under UNSC res. 1244* as reported on through the State of the World's Children Statistical Annex 2017 (column B)
    
**Data Source:**    Short name for data source, followed by the year(s) in which the data collection (e.g., survey interviews) took place (column P)
**Time period:**    Represents the year(s) in which the data collection (e.g. survey interviews) took place. (column Q)
    
**Region, Sub-region:** UNICEF regions (column C) and UNICEF sub-regions (column D)

- EAP: East Asia and the Pacific
- ECA: Europe and Central Asia
- EECA: Eastern Europe and Central Asia
- ESA: Eastern and Southern Africa
- LAC: Latin America and the Caribbean
- MENA: Middle East and North Africa
- NA: North America
- SA: South Asia
- SSA: Sub-Saharan Africa
- WCA: West and Central Africa

**Development regions:**    Economies are currently divided into four income groupings: low, lower-middle, upper-middle, and high. Income is measured using **gross national income (GNI) per capita, in U.S. dollars**, converted from local currency using the World Bank Atlas method (column E).
    
**Regional Aggregations**
Regional aggregates with less than 50% of the corresponding school-aged population coverage have been suppressed.

\* All references to Kosovo in this dataset should be understood to be in the context of United Nations Security Council resolution 1244 (1999).

In [47]:
# Import the libraries used in this project
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import style
%matplotlib inline

In [48]:
# reading the data set
outSchoolRate = pd.read_csv("merged.csv", encoding='ISO-8859-1')
outSchoolRate.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,37.0,47.0,28.0,42.0,19.0,...,43.0,63.0,45.0,70.0,64.0,64.0,54.0,40.0,DHS 2015,2015.0
1,ALB,Albania,ECA,EECA,More Developed,2.0,2.0,3.0,4.0,1.0,...,12.0,16.0,9.0,27.0,11.0,11.0,5.0,5.0,DHS 2017-18,2018.0
2,DZA,Algeria,MENA,MENA,Less Developed,2.0,2.0,2.0,2.0,1.0,...,27.0,27.0,20.0,38.0,27.0,22.0,17.0,10.0,MICS 2019,2020.0
3,AND,Andorra,ECA,WE,More Developed,,,,,,...,,,,,,,,,,
4,AGO,Angola,SSA,ESA,Least Developed,22.0,22.0,21.0,35.0,14.0,...,21.0,53.0,19.0,58.0,50.0,27.0,17.0,9.0,DHS 2015-16,2016.0


In [49]:
outSchoolRate.columns

Index(['ISO3', 'Country', 'Region ', 'Sub-Region', 'Development Region ',
       'P_Total', 'Femal_P', 'Male_P', 'P_residence_Rural',
       'P_residence_Uraban ', 'P_Wealth_Porrest', 'P_wealth_Second',
       'P_wealth_Middle', 'P_wealth_Fourth', 'P_wealth_Richest',
       'P_Data_Source', 'P_Time_Period', 'LS_Total', 'Female_LS', 'Male_lS',
       'LS_residence_Rural', 'LS_residence_Uraban ', 'LS_Wealth_Porrest',
       'LS_wealth_Second', 'LS_wealth_Middle', 'LS_wealth_Fourth',
       'LS_wealth_Richest', 'LS_Data_Source', 'LS_Time_Period', 'US_Total',
       'Female_US', 'Male_US', 'USresidence_Rural', 'USresidence_Uraban ',
       'US_Wealth_Porrest', 'US_wealth_second', 'US_wealth_Middle',
       'US_wealth_Fourth', 'US_wealth_Richest', 'US_Data_Source',
       'US_Time_Period'],
      dtype='object')
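
Several of the column names listed above carry a trailing space (for example `'Region '`, `'Development Region '` and `'P_residence_Uraban '`), which is why the later cells index them with that space included. The following is a minimal sketch, on a made-up toy frame, of how such labels could be normalised with `str.strip`. It is not applied in this notebook, so the cells below keep the original names.

```python
import pandas as pd

# Toy frame reusing three of the column names listed above (values are made up).
toy = pd.DataFrame({
    "ISO3": ["AAA"],
    "Region ": ["ECA"],
    "Development Region ": ["More Developed"],
})

# Strip leading and trailing whitespace from every column label.
toy_clean = toy.rename(columns=str.strip)

print(list(toy_clean.columns))
```

If this step were adopted, every later reference such as `outSchoolRateDF['Region ']` would need to drop the trailing space as well.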

## Data cleaning

In [50]:
# Work on a copy so that the original data frame stays untouched
# if any of the cleaning steps go wrong
outSchoolRateDF = outSchoolRate.copy()

In [51]:
# Check the data types of the columns
outSchoolRateDF.dtypes

ISO3                     object
Country                  object
Region                   object
Sub-Region               object
Development Region       object
P_Total                 float64
Femal_P                 float64
Male_P                  float64
P_residence_Rural       float64
P_residence_Uraban      float64
P_Wealth_Porrest        float64
P_wealth_Second         float64
P_wealth_Middle         float64
P_wealth_Fourth         float64
P_wealth_Richest        float64
P_Data_Source            object
P_Time_Period           float64
LS_Total                float64
Female_LS               float64
Male_lS                 float64
LS_residence_Rural      float64
LS_residence_Uraban     float64
LS_Wealth_Porrest       float64
LS_wealth_Second        float64
LS_wealth_Middle        float64
LS_wealth_Fourth        float64
LS_wealth_Richest       float64
LS_Data_Source           object
LS_Time_Period          float64
US_Total                float64
Female_US               float64
Male_US 

In [52]:
outSchoolRateDF.shape

(203, 41)

In [53]:
# Check which columns contain null values
outSchoolRateDF.isnull().any()

ISO3                    False
Country                 False
Region                   True
Sub-Region               True
Development Region       True
P_Total                  True
Femal_P                  True
Male_P                   True
P_residence_Rural        True
P_residence_Uraban       True
P_Wealth_Porrest         True
P_wealth_Second          True
P_wealth_Middle          True
P_wealth_Fourth          True
P_wealth_Richest         True
P_Data_Source            True
P_Time_Period            True
LS_Total                 True
Female_LS                True
Male_lS                  True
LS_residence_Rural       True
LS_residence_Uraban      True
LS_Wealth_Porrest        True
LS_wealth_Second         True
LS_wealth_Middle         True
LS_wealth_Fourth         True
LS_wealth_Richest        True
LS_Data_Source           True
LS_Time_Period           True
US_Total                 True
Female_US                True
Male_US                  True
USresidence_Rural        True
USresidenc

In [54]:
# Count the null values in each column
outSchoolRateDF.isnull().sum()

ISO3                     0
Country                  0
Region                   2
Sub-Region               3
Development Region       1
P_Total                 85
Femal_P                 85
Male_P                  85
P_residence_Rural       89
P_residence_Uraban      89
P_Wealth_Porrest        95
P_wealth_Second         95
P_wealth_Middle         95
P_wealth_Fourth         95
P_wealth_Richest        95
P_Data_Source           85
P_Time_Period           85
LS_Total                87
Female_LS               87
Male_lS                 87
LS_residence_Rural      90
LS_residence_Uraban     90
LS_Wealth_Porrest       96
LS_wealth_Second        96
LS_wealth_Middle        96
LS_wealth_Fourth        96
LS_wealth_Richest       96
LS_Data_Source          87
LS_Time_Period          87
US_Total                89
Female_US               89
Male_US                 89
USresidence_Rural       91
USresidence_Uraban      91
US_Wealth_Porrest       97
US_wealth_second        97
US_wealth_Middle        97
U

In [56]:
# Region, Sub-Region and Development Region contain a few missing values.
# These are string columns, so the missing labels can be assigned directly.
# First, list the rows with a missing Region.
outSchoolRateDF[outSchoolRateDF['Region '].isna()]

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
33,CAN,Canada,,,More Developed,,,,,,...,,,,,,,,,,
194,USA,United States,,,More Developed,,,,,,...,,,,,,,,,,


In [59]:
outSchoolRateDF[outSchoolRateDF['Development Region '].isna()]

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
94,XKX,Kosovo under UNSC res. 1244*,ECA,,,2.0,2.0,2.0,1.0,3.0,...,11.0,9.0,11.0,23.0,9.0,6.0,4.0,2.0,MICS 2019-20,2020.0


In [62]:
outSchoolRateDF[outSchoolRateDF['Sub-Region'].isna()]

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
33,CAN,Canada,,,More Developed,,,,,,...,,,,,,,,,,
94,XKX,Kosovo under UNSC res. 1244*,ECA,,,2.0,2.0,2.0,1.0,3.0,...,11.0,9.0,11.0,23.0,9.0,6.0,4.0,2.0,MICS 2019-20,2020.0
194,USA,United States,,,More Developed,,,,,,...,,,,,,,,,,


In [81]:
outSchoolRateDF.columns

Index(['ISO3', 'Country', 'Region ', 'Sub-Region', 'Development Region ',
       'P_Total', 'Femal_P', 'Male_P', 'P_residence_Rural',
       'P_residence_Uraban ', 'P_Wealth_Porrest', 'P_wealth_Second',
       'P_wealth_Middle', 'P_wealth_Fourth', 'P_wealth_Richest',
       'P_Data_Source', 'P_Time_Period', 'LS_Total', 'Female_LS', 'Male_lS',
       'LS_residence_Rural', 'LS_residence_Uraban ', 'LS_Wealth_Porrest',
       'LS_wealth_Second', 'LS_wealth_Middle', 'LS_wealth_Fourth',
       'LS_wealth_Richest', 'LS_Data_Source', 'LS_Time_Period', 'US_Total',
       'Female_US', 'Male_US', 'USresidence_Rural', 'USresidence_Uraban ',
       'US_Wealth_Porrest', 'US_wealth_second', 'US_wealth_Middle',
       'US_wealth_Fourth', 'US_wealth_Richest', 'US_Data_Source',
       'US_Time_Period'],
      dtype='object')

In [71]:
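# Fill in the missing labels.
# Rows 33 (Canada) and 194 (United States): 'NA' matches the glossary code for North America.
outSchoolRateDF.loc[33, 'Region '] = 'NA'
outSchoolRateDF.loc[33, 'Sub-Region'] = 'NA'
outSchoolRateDF.loc[194,'Region '] = 'NA'
outSchoolRateDF.loc[194,'Sub-Region'] = 'NA'
# Row 94 (Kosovo): here 'NA' is only a placeholder for the unknown labels.
outSchoolRateDF.loc[94, 'Sub-Region'] = 'NA'
outSchoolRateDF.loc[94, 'Development Region '] = 'NA'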
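
One caveat about using the literal string `'NA'` as a label: by default `pandas.read_csv` treats `NA` as a missing-value marker, so these labels would turn back into `NaN` if the cleaned frame were written to CSV and reloaded. This may also be why Canada and the United States arrived with an empty Region, although the raw file is not shown here, so that is only a guess. The sketch below illustrates the round trip on a made-up two-row frame with an in-memory buffer; the toy data and variable names are assumptions.

```python
import io
import pandas as pd

# Toy frame: one row labelled with the string 'NA', one with 'ECA'.
toy = pd.DataFrame({"ISO3": ["CAN", "ALB"], "Region ": ["NA", "ECA"]})

# Write the frame to an in-memory CSV.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default parsing: the string 'NA' is read back as a missing value.
buffer.seek(0)
default_read = pd.read_csv(buffer)

# keep_default_na=False keeps 'NA' as a literal string.
buffer.seek(0)
literal_read = pd.read_csv(buffer, keep_default_na=False)

print(default_read["Region "].isna().sum())
print(literal_read["Region "].tolist())
```

Note that `keep_default_na=False` also stops empty cells from being read as `NaN`, so it changes how genuinely missing values are parsed.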

In [82]:
# Recheck the null counts after filling in the labels
outSchoolRateDF.isna().sum()

ISO3                     0
Country                  0
Region                   0
Sub-Region               0
Development Region       0
P_Total                 85
Femal_P                 85
Male_P                  85
P_residence_Rural       89
P_residence_Uraban      89
P_Wealth_Porrest        95
P_wealth_Second         95
P_wealth_Middle         95
P_wealth_Fourth         95
P_wealth_Richest        95
P_Data_Source           85
P_Time_Period           85
LS_Total                87
Female_LS               87
Male_lS                 87
LS_residence_Rural      90
LS_residence_Uraban     90
LS_Wealth_Porrest       96
LS_wealth_Second        96
LS_wealth_Middle        96
LS_wealth_Fourth        96
LS_wealth_Richest       96
LS_Data_Source          87
LS_Time_Period          87
US_Total                89
Female_US               89
Male_US                 89
USresidence_Rural       91
USresidence_Uraban      91
US_Wealth_Porrest       97
US_wealth_second        97
US_wealth_Middle        97
U

In [75]:
print(outSchoolRateDF.loc[33])
print(outSchoolRateDF.loc[194])
print(outSchoolRateDF.loc[94])

ISO3                               CAN
Country                         Canada
Region                              NA
Sub-Region                          NA
Development Region      More Developed
P_Total                            NaN
Femal_P                            NaN
Male_P                             NaN
P_residence_Rural                  NaN
P_residence_Uraban                 NaN
P_Wealth_Porrest                   NaN
P_wealth_Second                    NaN
P_wealth_Middle                    NaN
P_wealth_Fourth                    NaN
P_wealth_Richest                   NaN
P_Data_Source                      NaN
P_Time_Period                      NaN
LS_Total                           NaN
Female_LS                          NaN
Male_lS                            NaN
LS_residence_Rural                 NaN
LS_residence_Uraban                NaN
LS_Wealth_Porrest                  NaN
LS_wealth_Second                   NaN
LS_wealth_Middle                   NaN
LS_wealth_Fourth         

In [83]:
# Summarise the dataset: column types and non-null counts
outSchoolRateDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 203 entries, 0 to 202
Data columns (total 41 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   ISO3                  203 non-null    object 
 1   Country               203 non-null    object 
 2   Region                203 non-null    object 
 3   Sub-Region            203 non-null    object 
 4   Development Region    203 non-null    object 
 5   P_Total               118 non-null    float64
 6   Femal_P               118 non-null    float64
 7   Male_P                118 non-null    float64
 8   P_residence_Rural     114 non-null    float64
 9   P_residence_Uraban    114 non-null    float64
 10  P_Wealth_Porrest      108 non-null    float64
 11  P_wealth_Second       108 non-null    float64
 12  P_wealth_Middle       108 non-null    float64
 13  P_wealth_Fourth       108 non-null    float64
 14  P_wealth_Richest      108 non-null    float64
 15  P_Data_Source         1

In [77]:
# Descriptive statistics for the numeric columns
outSchoolRateDF.describe()

Unnamed: 0,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,P_Wealth_Porrest,P_wealth_Second,P_wealth_Middle,P_wealth_Fourth,P_wealth_Richest,...,Female_US,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Time_Period
count,118.0,118.0,118.0,114.0,114.0,108.0,108.0,108.0,108.0,108.0,...,114.0,114.0,112.0,112.0,106.0,106.0,106.0,106.0,106.0,114.0
mean,9.940678,10.110169,9.838983,12.122807,5.929825,17.481481,12.796296,9.694444,7.055556,3.851852,...,30.307018,28.245614,35.401786,22.928571,46.349057,36.839623,31.622642,24.783019,16.198113,2016.298246
std,13.583104,14.394196,12.942985,16.288407,7.854562,20.77943,17.616241,14.774605,10.958574,6.246965,...,22.60438,18.078766,22.734448,15.152507,24.875707,24.973943,22.179979,19.911712,14.991058,2.71974
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2010.0
25%,2.0,2.0,2.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,...,13.0,15.0,17.75,12.0,28.0,17.25,14.25,11.0,5.0,2014.25
50%,4.0,3.0,4.0,4.5,3.0,8.0,4.0,3.0,3.0,2.0,...,25.0,25.0,31.5,19.5,42.0,32.0,25.0,18.0,10.5,2017.0
75%,14.75,13.75,14.0,19.0,6.75,32.0,20.25,12.5,7.0,4.0,...,45.75,42.0,54.25,32.25,68.0,58.5,44.5,36.75,24.0,2018.0
max,72.0,75.0,70.0,78.0,56.0,86.0,84.0,77.0,65.0,44.0,...,91.0,79.0,95.0,70.0,99.0,97.0,97.0,92.0,63.0,2020.0


Before imputing, we need to establish how many values are missing and whether they are missing at random. In this dataset the problem is concentrated in rows where every indicator is missing. The gaps are also not random: in the per-group counts later in this notebook, 41 of the 50 more developed countries lack primary-level data, against 5 of the 47 least developed countries. Dropping those rows entirely would therefore bias the dataset.
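
As a rough illustration of that check, the sketch below computes, for each development group, the share of rows in which every indicator is missing. It runs on a made-up four-row frame; only the column names are taken from the dataset, and `indicator_cols` is a hypothetical shortlist rather than the full set of indicator columns.

```python
import numpy as np
import pandas as pd

# Toy data: values are invented for illustration only.
toy = pd.DataFrame({
    "Development Region ": ["Least Developed", "Least Developed",
                            "More Developed", "More Developed"],
    "P_Total": [37.0, 22.0, np.nan, 2.0],
    "LS_Total": [40.0, np.nan, np.nan, 1.0],
})

indicator_cols = ["P_Total", "LS_Total"]

# True for rows in which every indicator is missing.
all_missing = toy[indicator_cols].isna().all(axis=1)

# Share of fully missing rows within each development group.
share_missing = all_missing.groupby(toy["Development Region "]).mean()

print(share_missing)
```

A large difference between groups in this share is a sign that the values are not missing at random.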

However, the dataset spans least developed to more developed countries, whose out-of-school rates differ widely. Filling every gap in a column with a single column-wide mean or median would assign the same value to countries in very different circumstances and would bias the results.

Definition: Stratified imputation is the process of imputing missing values within each stratum independently, after the data has been divided into strata (e.g., by region, income group, or development status). This reduces bias and helps account for differences in out-of-school rates between groups of countries.
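
A minimal sketch of the idea, assuming the development group is the stratum and the median is the fill value. It uses `groupby(...).transform("median")` on a made-up six-row frame. The notebook itself takes a different but comparable route, splitting the data into one frame per group and filling each separately.

```python
import numpy as np
import pandas as pd

# Toy data: two strata with clearly different levels and one gap each.
toy = pd.DataFrame({
    "Development Region ": ["Least Developed"] * 3 + ["More Developed"] * 3,
    "P_Total": [10.0, 30.0, np.nan, 1.0, 3.0, np.nan],
})

# Median of each stratum, broadcast back to the original row positions.
stratum_median = toy.groupby("Development Region ")["P_Total"].transform("median")

# Fill each gap with the median of its own stratum.
toy["P_Total_imputed"] = toy["P_Total"].fillna(stratum_median)

print(toy)
```

In this toy example the gaps are filled with 20.0 and 2.0 respectively, whereas a single column-wide median would have assigned 6.5 to both rows.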

In [86]:
# List the distinct development-region labels in the dataset
outSchoolRateDF['Development Region '].unique()

array(['Least Developed', 'More Developed', 'Less Developed',
       'Not Classified', 'NA'], dtype=object)

In [87]:
outSchoolRateDF[outSchoolRateDF['Development Region '] == 'NA']

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
94,XKX,Kosovo under UNSC res. 1244*,ECA,,,2.0,2.0,2.0,1.0,3.0,...,11.0,9.0,11.0,23.0,9.0,6.0,4.0,2.0,MICS 2019-20,2020.0


In [88]:
# Split the countries into one data frame per development group

least_developed_countries = outSchoolRateDF[outSchoolRateDF['Development Region '] == 'Least Developed'].reset_index(drop = True)
less_developed_countries = outSchoolRateDF[outSchoolRateDF['Development Region '] == 'Less Developed'].reset_index(drop = True)
more_developed_countries = outSchoolRateDF[outSchoolRateDF['Development Region '] == 'More Developed'].reset_index(drop = True)
unclassified_countries = outSchoolRateDF[outSchoolRateDF['Development Region '] == 'Not Classified'].reset_index(drop = True)

In [89]:
least_developed_countries.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,37.0,47.0,28.0,42.0,19.0,...,43.0,63.0,45.0,70.0,64.0,64.0,54.0,40.0,DHS 2015,2015.0
1,AGO,Angola,SSA,ESA,Least Developed,22.0,22.0,21.0,35.0,14.0,...,21.0,53.0,19.0,58.0,50.0,27.0,17.0,9.0,DHS 2015-16,2016.0
2,BGD,Bangladesh,SA,SA,Least Developed,6.0,5.0,8.0,6.0,6.0,...,37.0,32.0,30.0,45.0,35.0,29.0,28.0,19.0,MICS 2019,2019.0
3,BEN,Benin,SSA,WCA,Least Developed,32.0,35.0,28.0,38.0,21.0,...,50.0,65.0,50.0,82.0,73.0,61.0,48.0,36.0,DHS 2017-18,2018.0
4,BTN,Bhutan,SA,SA,Least Developed,8.0,7.0,9.0,10.0,3.0,...,39.0,44.0,27.0,55.0,53.0,42.0,27.0,24.0,MICS 2010,2010.0


In [90]:
less_developed_countries.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,DZA,Algeria,MENA,MENA,Less Developed,2.0,2.0,2.0,2.0,1.0,...,27.0,27.0,20.0,38.0,27.0,22.0,17.0,10.0,MICS 2019,2020.0
1,ATG,Antigua and Barbuda,LAC,LAC,Less Developed,,,,,,...,,,,,,,,,,
2,ARG,Argentina,LAC,LAC,Less Developed,0.0,0.0,0.0,,,...,8.0,,,14.0,7.0,7.0,2.0,0.0,MICS 2019-20,2020.0
3,ARM,Armenia,ECA,EECA,Less Developed,5.0,4.0,5.0,6.0,3.0,...,9.0,9.0,5.0,12.0,6.0,10.0,5.0,2.0,DHS 2015-16,2016.0
4,AZE,Azerbaijan,ECA,EECA,Less Developed,,,,,,...,,,,,,,,,,


In [91]:
more_developed_countries.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,ALB,Albania,ECA,EECA,More Developed,2.0,2.0,3.0,4.0,1.0,...,12.0,16.0,9.0,27.0,11.0,11.0,5.0,5.0,DHS 2017-18,2018.0
1,AND,Andorra,ECA,WE,More Developed,,,,,,...,,,,,,,,,,
2,AUS,Australia,EAP,EAP,More Developed,,,,,,...,,,,,,,,,,
3,AUT,Austria,ECA,WE,More Developed,,,,,,...,,,,,,,,,,
4,BLR,Belarus,ECA,EECA,More Developed,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,1.0,2.0,0.0,0.0,2.0,0.0,MICS 2019,2020.0


In [92]:
unclassified_countries.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,AIA,Anguilla,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
1,VGB,British Virgin Islands,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
2,VAT,Holy See,ECA,WE,Not Classified,,,,,,...,,,,,,,,,,
3,MSR,Montserrat,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
4,TKL,Tokelau,EAP,EAP,Not Classified,,,,,,...,,,,,,,,,,


In [93]:
# Count the countries in each stratum
print('No. of least developed countries: ', least_developed_countries.shape[0])
print('No. of less developed countries: ', less_developed_countries.shape[0])
print('No. of more developed countries: ', more_developed_countries.shape[0])
print('No. of unclassified countries: ', unclassified_countries.shape[0])
print("Total: ", least_developed_countries.shape[0]+less_developed_countries.shape[0]+more_developed_countries.shape[0]+unclassified_countries.shape[0])

No. of least developed countries:  47
No. of less developed countries:  99
No. of more developed countries:  50
No. of unclassified countries:  6
Total:  202
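
The total printed above is 202, one fewer than the 203 rows in the data frame. The row whose Development Region was set to `'NA'` (Kosovo) matches none of the four filters, so it is left out of every stratum. The sketch below shows one way to detect such rows, on a made-up four-row frame; the `strata` list simply repeats the four labels used in the filters.

```python
import pandas as pd

# Toy data: country "C" carries a label outside the four strata.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Development Region ": ["Least Developed", "More Developed", "NA", "Less Developed"],
})

strata = ["Least Developed", "Less Developed", "More Developed", "Not Classified"]

# Rows whose label is not one of the four strata are silently dropped by the filters.
unassigned = toy[~toy["Development Region "].isin(strata)]

print(toy["Development Region "].value_counts())
print(unassigned["Country"].tolist())
```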


In [94]:
# Count the missing values per column in least_developed_countries
least_developed_countries.isna().sum()

ISO3                    0
Country                 0
Region                  0
Sub-Region              0
Development Region      0
P_Total                 5
Femal_P                 5
Male_P                  5
P_residence_Rural       5
P_residence_Uraban      5
P_Wealth_Porrest        5
P_wealth_Second         5
P_wealth_Middle         5
P_wealth_Fourth         5
P_wealth_Richest        5
P_Data_Source           5
P_Time_Period           5
LS_Total                5
Female_LS               5
Male_lS                 5
LS_residence_Rural      5
LS_residence_Uraban     5
LS_Wealth_Porrest       5
LS_wealth_Second        5
LS_wealth_Middle        5
LS_wealth_Fourth        5
LS_wealth_Richest       5
LS_Data_Source          5
LS_Time_Period          5
US_Total                5
Female_US               5
Male_US                 5
USresidence_Rural       5
USresidence_Uraban      5
US_Wealth_Porrest       5
US_wealth_second        5
US_wealth_Middle        5
US_wealth_Fourth        5
US_wealth_Ri

In [95]:
# Median of each numeric column in least_developed_countries
least_developed_countries.median(numeric_only=True)

P_Total                   17.5
Femal_P                   17.5
Male_P                    18.5
P_residence_Rural         23.5
P_residence_Uraban         9.0
P_Wealth_Porrest          33.0
P_wealth_Second           22.5
P_wealth_Middle           14.5
P_wealth_Fourth           10.0
P_wealth_Richest           5.0
P_Time_Period           2016.0
LS_Total                  23.0
Female_LS                 25.5
Male_lS                   20.5
LS_residence_Rural        28.5
LS_residence_Uraban       16.5
LS_Wealth_Porrest         38.5
LS_wealth_Second          29.0
LS_wealth_Middle          22.5
LS_wealth_Fourth          17.0
LS_wealth_Richest         12.0
LS_Time_Period          2016.0
US_Total                  51.0
Female_US                 54.0
Male_US                   43.5
USresidence_Rural         58.0
USresidence_Uraban        35.5
US_Wealth_Porrest         69.0
US_wealth_second          59.5
US_wealth_Middle          53.5
US_wealth_Fourth          44.0
US_wealth_Richest         24.0
US_Time_

In [96]:
# median imputation 
least_developed_countries = least_developed_countries.fillna(least_developed_countries.median(numeric_only=True))

In [115]:
# check the imputation
least_developed_countries.isnull().sum()

ISO3                    0
Country                 0
Region                  0
Sub-Region              0
Development Region      0
P_Total                 0
Femal_P                 0
Male_P                  0
P_residence_Rural       0
P_residence_Uraban      0
P_Wealth_Porrest        0
P_wealth_Second         0
P_wealth_Middle         0
P_wealth_Fourth         0
P_wealth_Richest        0
P_Data_Source           5
P_Time_Period           0
LS_Total                0
Female_LS               0
Male_lS                 0
LS_residence_Rural      0
LS_residence_Uraban     0
LS_Wealth_Porrest       0
LS_wealth_Second        0
LS_wealth_Middle        0
LS_wealth_Fourth        0
LS_wealth_Richest       0
LS_Data_Source          5
LS_Time_Period          0
US_Total                0
Female_US               0
Male_US                 0
USresidence_Rural       0
USresidence_Uraban      0
US_Wealth_Porrest       0
US_wealth_second        0
US_wealth_Middle        0
US_wealth_Fourth        0
US_wealth_Ri
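
The check above still reports 5 missing values in `P_Data_Source` and `LS_Data_Source`. This is expected: `median(numeric_only=True)` returns a value only for numeric columns, and `fillna` with a Series fills each column by matching label, so string columns are left untouched. The sketch below reproduces this on a made-up three-row frame.

```python
import numpy as np
import pandas as pd

# Toy data: one numeric column and one string column, each with a gap.
toy = pd.DataFrame({
    "P_Total": [10.0, np.nan, 30.0],
    "P_Data_Source": ["DHS 2015", None, "MICS 2019"],
})

# The median Series has an entry for P_Total only.
medians = toy.median(numeric_only=True)

# fillna matches the Series index against the column names.
filled = toy.fillna(medians)

print(filled.isna().sum())
```

The `*_Time_Period` columns are numeric as well, so they also receive a median year, even for countries that have no survey on record.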

In [98]:
# Count the missing values per column in less_developed_countries
less_developed_countries.isna().sum()

ISO3                     0
Country                  0
Region                   0
Sub-Region               0
Development Region       0
P_Total                 34
Femal_P                 34
Male_P                  34
P_residence_Rural       37
P_residence_Uraban      37
P_Wealth_Porrest        43
P_wealth_Second         43
P_wealth_Middle         43
P_wealth_Fourth         43
P_wealth_Richest        43
P_Data_Source           34
P_Time_Period           34
LS_Total                36
Female_LS               36
Male_lS                 36
LS_residence_Rural      38
LS_residence_Uraban     38
LS_Wealth_Porrest       44
LS_wealth_Second        44
LS_wealth_Middle        44
LS_wealth_Fourth        44
LS_wealth_Richest       44
LS_Data_Source          36
LS_Time_Period          36
US_Total                37
Female_US               37
Male_US                 37
USresidence_Rural       39
USresidence_Uraban      39
US_Wealth_Porrest       45
US_wealth_second        45
US_wealth_Middle        45
U

In [99]:
# Median of each numeric column in less_developed_countries
less_developed_countries.median(numeric_only=True)

P_Total                    2.0
Femal_P                    2.0
Male_P                     3.0
P_residence_Rural          2.0
P_residence_Uraban         2.0
P_Wealth_Porrest           3.5
P_wealth_Second            2.0
P_wealth_Middle            2.0
P_wealth_Fourth            2.0
P_wealth_Richest           1.0
P_Time_Period           2017.0
LS_Total                   4.0
Female_LS                  4.0
Male_lS                    5.0
LS_residence_Rural         6.0
LS_residence_Uraban        4.0
LS_Wealth_Porrest          9.0
LS_wealth_Second           5.0
LS_wealth_Middle           4.0
LS_wealth_Fourth           2.0
LS_wealth_Richest          2.0
LS_Time_Period          2017.0
US_Total                  20.0
Female_US                 16.5
Male_US                   21.0
USresidence_Rural         26.0
USresidence_Uraban        16.5
US_Wealth_Porrest         32.5
US_wealth_second          25.5
US_wealth_Middle          21.0
US_wealth_Fourth          13.0
US_wealth_Richest          8.0
US_Time_

In [100]:
# median imputation 
less_developed_countries = less_developed_countries.fillna(less_developed_countries.median(numeric_only=True))

In [101]:
# check the imputation
less_developed_countries.head(5)

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,DZA,Algeria,MENA,MENA,Less Developed,2.0,2.0,2.0,2.0,1.0,...,27.0,27.0,20.0,38.0,27.0,22.0,17.0,10.0,MICS 2019,2020.0
1,ATG,Antigua and Barbuda,LAC,LAC,Less Developed,2.0,2.0,3.0,2.0,2.0,...,21.0,26.0,16.5,32.5,25.5,21.0,13.0,8.0,,2017.0
2,ARG,Argentina,LAC,LAC,Less Developed,0.0,0.0,0.0,2.0,2.0,...,8.0,26.0,16.5,14.0,7.0,7.0,2.0,0.0,MICS 2019-20,2020.0
3,ARM,Armenia,ECA,EECA,Less Developed,5.0,4.0,5.0,6.0,3.0,...,9.0,9.0,5.0,12.0,6.0,10.0,5.0,2.0,DHS 2015-16,2016.0
4,AZE,Azerbaijan,ECA,EECA,Less Developed,2.0,2.0,3.0,2.0,2.0,...,21.0,26.0,16.5,32.5,25.5,21.0,13.0,8.0,,2017.0


In [117]:
less_developed_countries.isnull().sum()

ISO3                     0
Country                  0
Region                   0
Sub-Region               0
Development Region       0
P_Total                  0
Femal_P                  0
Male_P                   0
P_residence_Rural        0
P_residence_Uraban       0
P_Wealth_Porrest         0
P_wealth_Second          0
P_wealth_Middle          0
P_wealth_Fourth          0
P_wealth_Richest         0
P_Data_Source           34
P_Time_Period            0
LS_Total                 0
Female_LS                0
Male_lS                  0
LS_residence_Rural       0
LS_residence_Uraban      0
LS_Wealth_Porrest        0
LS_wealth_Second         0
LS_wealth_Middle         0
LS_wealth_Fourth         0
LS_wealth_Richest        0
LS_Data_Source          36
LS_Time_Period           0
US_Total                 0
Female_US                0
Male_US                  0
USresidence_Rural        0
USresidence_Uraban       0
US_Wealth_Porrest        0
US_wealth_second         0
US_wealth_Middle         0
U

In [102]:
# Count the missing values per column in more_developed_countries
more_developed_countries.isna().sum()

ISO3                     0
Country                  0
Region                   0
Sub-Region               0
Development Region       0
P_Total                 41
Femal_P                 41
Male_P                  41
P_residence_Rural       42
P_residence_Uraban      42
P_Wealth_Porrest        42
P_wealth_Second         42
P_wealth_Middle         42
P_wealth_Fourth         42
P_wealth_Richest        42
P_Data_Source           41
P_Time_Period           41
LS_Total                41
Female_LS               41
Male_lS                 41
LS_residence_Rural      42
LS_residence_Uraban     42
LS_Wealth_Porrest       42
LS_wealth_Second        42
LS_wealth_Middle        42
LS_wealth_Fourth        42
LS_wealth_Richest       42
LS_Data_Source          41
LS_Time_Period          41
US_Total                42
Female_US               42
Male_US                 42
USresidence_Rural       42
USresidence_Uraban      42
US_Wealth_Porrest       42
US_wealth_second        42
US_wealth_Middle        42
U

In [103]:
# Median of each numeric column in more_developed_countries
more_developed_countries.median(numeric_only=True)

P_Total                    2.0
Femal_P                    2.0
Male_P                     2.0
P_residence_Rural          2.0
P_residence_Uraban         1.0
P_Wealth_Porrest           2.5
P_wealth_Second            1.5
P_wealth_Middle            1.5
P_wealth_Fourth            1.0
P_wealth_Richest           0.5
P_Time_Period           2018.0
LS_Total                   1.0
Female_LS                  1.0
Male_lS                    1.0
LS_residence_Rural         0.5
LS_residence_Uraban        1.0
LS_Wealth_Porrest          4.0
LS_wealth_Second           0.5
LS_wealth_Middle           0.5
LS_wealth_Fourth           0.0
LS_wealth_Richest          0.0
LS_Time_Period          2018.0
US_Total                   7.5
Female_US                  6.0
Male_US                    8.5
USresidence_Rural          9.0
USresidence_Uraban         6.0
US_Wealth_Porrest         21.5
US_wealth_second           6.5
US_wealth_Middle           7.0
US_wealth_Fourth           2.5
US_wealth_Richest          2.0
US_Time_

In [104]:
# median imputation 
more_developed_countries = more_developed_countries.fillna(more_developed_countries.median(numeric_only=True))

In [105]:
# check the imputation
more_developed_countries.head(5)

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,ALB,Albania,ECA,EECA,More Developed,2.0,2.0,3.0,4.0,1.0,...,12.0,16.0,9.0,27.0,11.0,11.0,5.0,5.0,DHS 2017-18,2018.0
1,AND,Andorra,ECA,WE,More Developed,2.0,2.0,2.0,2.0,1.0,...,8.5,9.0,6.0,21.5,6.5,7.0,2.5,2.0,,2018.0
2,AUS,Australia,EAP,EAP,More Developed,2.0,2.0,2.0,2.0,1.0,...,8.5,9.0,6.0,21.5,6.5,7.0,2.5,2.0,,2018.0
3,AUT,Austria,ECA,WE,More Developed,2.0,2.0,2.0,2.0,1.0,...,8.5,9.0,6.0,21.5,6.5,7.0,2.5,2.0,,2018.0
4,BLR,Belarus,ECA,EECA,More Developed,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,1.0,2.0,0.0,0.0,2.0,0.0,MICS 2019,2020.0
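
The imputation step can be reproduced on a small frame to see why the numeric columns end up with no missing values while the `*_Data_Source` columns keep theirs. This is a minimal sketch: the frame `toy` and its values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one development-region subset
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "P_Total": [2.0, np.nan, 0.0, 4.0],
    "P_Data_Source": ["DHS 2017-18", None, "MICS 2019", None],
})

# Medians are computed over the numeric columns only
medians = toy.median(numeric_only=True)

# fillna with a Series fills only the columns whose labels appear in its index
imputed = toy.fillna(medians)

print(imputed)
print(imputed.isna().sum())
```

Because `P_Data_Source` is not numeric, it has no entry in `medians` and its missing values are left untouched, which matches the 41 nulls still reported for the data-source columns after imputation.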


In [119]:
more_developed_countries.isnull().sum()

ISO3                     0
Country                  0
Region                   0
Sub-Region               0
Development Region       0
P_Total                  0
Femal_P                  0
Male_P                   0
P_residence_Rural        0
P_residence_Uraban       0
P_Wealth_Porrest         0
P_wealth_Second          0
P_wealth_Middle          0
P_wealth_Fourth          0
P_wealth_Richest         0
P_Data_Source           41
P_Time_Period            0
LS_Total                 0
Female_LS                0
Male_lS                  0
LS_residence_Rural       0
LS_residence_Uraban      0
LS_Wealth_Porrest        0
LS_wealth_Second         0
LS_wealth_Middle         0
LS_wealth_Fourth         0
LS_wealth_Richest        0
LS_Data_Source          41
LS_Time_Period           0
US_Total                 0
Female_US                0
Male_US                  0
USresidence_Rural        0
USresidence_Uraban       0
US_Wealth_Porrest        0
US_wealth_second         0
US_wealth_Middle         0
U

In [106]:
unclassified_countries

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,AIA,Anguilla,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
1,VGB,British Virgin Islands,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
2,VAT,Holy See,ECA,WE,Not Classified,,,,,,...,,,,,,,,,,
3,MSR,Montserrat,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
4,TKL,Tokelau,EAP,EAP,Not Classified,,,,,,...,,,,,,,,,,
5,TCA,Turks and Caicos Islands,LAC,LAC,Not Classified,2.0,1.0,4.0,0.0,2.0,...,8.0,0.0,11.0,39.0,3.0,21.0,16.0,1.0,MICS 2019-20,2020.0


In [108]:
# concatenate the four subsets back into a single dataframe
conc_df = pd.concat(
    [least_developed_countries, less_developed_countries, more_developed_countries, unclassified_countries],
    axis=0,
    ignore_index=True,
)

In [109]:
conc_df.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,37.0,47.0,28.0,42.0,19.0,...,43.0,63.0,45.0,70.0,64.0,64.0,54.0,40.0,DHS 2015,2015.0
1,AGO,Angola,SSA,ESA,Least Developed,22.0,22.0,21.0,35.0,14.0,...,21.0,53.0,19.0,58.0,50.0,27.0,17.0,9.0,DHS 2015-16,2016.0
2,BGD,Bangladesh,SA,SA,Least Developed,6.0,5.0,8.0,6.0,6.0,...,37.0,32.0,30.0,45.0,35.0,29.0,28.0,19.0,MICS 2019,2019.0
3,BEN,Benin,SSA,WCA,Least Developed,32.0,35.0,28.0,38.0,21.0,...,50.0,65.0,50.0,82.0,73.0,61.0,48.0,36.0,DHS 2017-18,2018.0
4,BTN,Bhutan,SA,SA,Least Developed,8.0,7.0,9.0,10.0,3.0,...,39.0,44.0,27.0,55.0,53.0,42.0,27.0,24.0,MICS 2010,2010.0


In [110]:
# show the last 6 rows (the unclassified countries, which were appended last)
conc_df.tail(6)

Unnamed: 0,ISO3,Country,Region,Sub-Region,Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,...,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
196,AIA,Anguilla,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
197,VGB,British Virgin Islands,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
198,VAT,Holy See,ECA,WE,Not Classified,,,,,,...,,,,,,,,,,
199,MSR,Montserrat,LAC,LAC,Not Classified,,,,,,...,,,,,,,,,,
200,TKL,Tokelau,EAP,EAP,Not Classified,,,,,,...,,,,,,,,,,
201,TCA,Turks and Caicos Islands,LAC,LAC,Not Classified,2.0,1.0,4.0,0.0,2.0,...,8.0,0.0,11.0,39.0,3.0,21.0,16.0,1.0,MICS 2019-20,2020.0


In [114]:
conc_df.isnull().sum()

ISO3                     0
Country                  0
Region                   0
Sub-Region               0
Development Region       0
P_Total                  5
Femal_P                  5
Male_P                   5
P_residence_Rural        5
P_residence_Uraban       5
P_Wealth_Porrest         5
P_wealth_Second          5
P_wealth_Middle          5
P_wealth_Fourth          5
P_wealth_Richest         5
P_Data_Source           85
P_Time_Period            5
LS_Total                 5
Female_LS                5
Male_lS                  5
LS_residence_Rural       5
LS_residence_Uraban      5
LS_Wealth_Porrest        5
LS_wealth_Second         5
LS_wealth_Middle         5
LS_wealth_Fourth         5
LS_wealth_Richest        5
LS_Data_Source          87
LS_Time_Period           5
US_Total                 5
Female_US                5
Male_US                  5
USresidence_Rural        5
USresidence_Uraban       5
US_Wealth_Porrest        5
US_wealth_second         5
US_wealth_Middle         5
U

In [111]:
conc_df.shape

(202, 41)
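
A quick way to confirm that no rows were lost or duplicated during concatenation is to compare the combined length with the sum of the parts. The sketch below uses two small hypothetical frames (`least` and `more`) in place of the four subsets. The same checks should apply to `conc_df`, assuming all subsets share the same columns.

```python
import pandas as pd

# Hypothetical subsets standing in for the per-region frames
least = pd.DataFrame({"Country": ["X", "Y"], "P_Total": [37.0, 22.0]})
more = pd.DataFrame({"Country": ["Z"], "P_Total": [2.0]})
parts = [least, more]

combined = pd.concat(parts, axis=0, ignore_index=True)

# The row count must equal the sum of the parts
assert len(combined) == sum(len(p) for p in parts)
# The columns are unchanged when every part has the same labels
assert list(combined.columns) == list(least.columns)
# ignore_index=True yields a fresh 0..n-1 index
assert combined.index.tolist() == list(range(len(combined)))
```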

In [112]:
# sort the dataframe alphabetically by country name and reset the index
conc_df = conc_df.sort_values(by=['Country'], ignore_index=True)

In [113]:
conc_df.to_csv('OutofSchoolRate.csv', index = False)

In [120]:
conc_df.columns

Index(['ISO3', 'Country', 'Region ', 'Sub-Region', 'Development Region ',
       'P_Total', 'Femal_P', 'Male_P', 'P_residence_Rural',
       'P_residence_Uraban ', 'P_Wealth_Porrest', 'P_wealth_Second',
       'P_wealth_Middle', 'P_wealth_Fourth', 'P_wealth_Richest',
       'P_Data_Source', 'P_Time_Period', 'LS_Total', 'Female_LS', 'Male_lS',
       'LS_residence_Rural', 'LS_residence_Uraban ', 'LS_Wealth_Porrest',
       'LS_wealth_Second', 'LS_wealth_Middle', 'LS_wealth_Fourth',
       'LS_wealth_Richest', 'LS_Data_Source', 'LS_Time_Period', 'US_Total',
       'Female_US', 'Male_US', 'USresidence_Rural', 'USresidence_Uraban ',
       'US_Wealth_Porrest', 'US_wealth_second', 'US_wealth_Middle',
       'US_wealth_Fourth', 'US_wealth_Richest', 'US_Data_Source',
       'US_Time_Period'],
      dtype='object')

In [121]:
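# prefix each indicator with OOR (out-of-school rate) and spell out the education level;
# keys must match the existing labels exactly, including the trailing space in the '..._Uraban ' columns
new_column_names = {'P_Total': 'OOR_Primary_Total',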
                    'Femal_P':'OOR_Primary_Femal',
                    'Male_P':'OOR_Primary_Male',
                    'P_residence_Rural':'OOR_Primary_residence_Rural',
                    'P_residence_Uraban ':'OOR_Primary_residence_Uraban',
                    'P_Wealth_Porrest':'OOR_Primary_Wealth_Porrest',
                    'P_wealth_Second':'OOR_Primary_Wealth_Second',
                    'P_wealth_Middle':'OOR_Primary_Wealth_Middle',
                    'P_wealth_Fourth':'OOR_Primary_Wealth_Fourth',
                    'P_wealth_Richest':'OOR_Primary_Wealth_Richest',
                    'LS_Total':'OOR_LowerSecondary_Total',
                    'Female_LS':'OOR_LowerSecondary_Female',
                    'Male_lS':'OOR_LowerSecondary_Male',
                    'LS_residence_Rural':'OOR_LowerSecondary__residence_Rural',
                    'LS_residence_Uraban ':'OOR_LowerSecondary__residence_Uraban',
                    'LS_Wealth_Porrest':'OOR_LowerSecondary__Wealth_Porrest',
                    'LS_wealth_Second':'OOR_LowerSecondary__Wealth_Second',
                    'LS_wealth_Middle':'OOR_LowerSecondary__Wealth_Middle',
                    'LS_wealth_Fourth':'OOR_LowerSecondary__Wealth_Fourth',
                    'LS_wealth_Richest':'OOR_LowerSecondary__Wealth_Richest',
                    'US_Total':'OOR_UpperSecondary_Total',
                    'Female_US':'OOR_UpperSecondary_Female',
                    'Male_US':'OOR_UpperSecondary_Male',
                    'USresidence_Rural':'OOR_UpperSecondary_residence_Rural',
                    'USresidence_Uraban ':'OOR_UpperSecondary_residence_Urban',
                    'US_Wealth_Porrest':'OOR_UpperSecondary_Wealth_Porrest',
                    'US_wealth_second':'OOR_UpperSecondary_Wealth_Second',
                    'US_wealth_Middle':'OOR_UpperSecondary_Wealth_Middle',
                    'US_wealth_Fourth':'OOR_UpperSecondary_Wealth_Fourth',
                    'US_wealth_Richest':'OOR_UpperSecondary_Wealth_Richest'}
conc_df.rename(columns=new_column_names, inplace=True)

In [122]:
conc_df.columns

Index(['ISO3', 'Country', 'Region ', 'Sub-Region', 'Development Region ',
       'OOR_Primary_Total', 'OOR_Primary_Femal', 'OOR_Primary_Male',
       'OOR_Primary_residence_Rural', 'OOR_Primary_residence_Uraban',
       'OOR_Primary_Wealth_Porrest', 'OOR_Primary_Wealth_Second',
       'OOR_Primary_Wealth_Middle', 'OOR_Primary_Wealth_Fourth',
       'OOR_Primary_Wealth_Richest', 'P_Data_Source', 'P_Time_Period',
       'OOR_LowerSecondary_Total', 'OOR_LowerSecondary_Female',
       'OOR_LowerSecondary_Male', 'OOR_LowerSecondary__residence_Rural',
       'OOR_LowerSecondary__residence_Uraban',
       'OOR_LowerSecondary__Wealth_Porrest',
       'OOR_LowerSecondary__Wealth_Second',
       'OOR_LowerSecondary__Wealth_Middle',
       'OOR_LowerSecondary__Wealth_Fourth',
       'OOR_LowerSecondary__Wealth_Richest', 'LS_Data_Source',
       'LS_Time_Period', 'OOR_UpperSecondary_Total',
       'OOR_UpperSecondary_Female', 'OOR_UpperSecondary_Male',
       'OOR_UpperSecondary_residence_Rural',
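
The renamed labels still carry a few spelling slips (`Femal`, `Uraban`, `Porrest`), double underscores in the lower-secondary names, and trailing spaces in `'Region '` and `'Development Region '`. One possible follow-up, shown here only as a sketch, is to pass a cleaning function to `rename`. The helper `tidy` and the `FIXES` table are hypothetical and not part of the notebook, and applying them would change the column names used in the later cells.

```python
import re

import pandas as pd

# Hypothetical empty frame carrying a few of the notebook's column labels
df = pd.DataFrame(columns=[
    "Region ",
    "OOR_Primary_Femal",
    "OOR_LowerSecondary_Female",
    "OOR_Primary_residence_Uraban",
    "OOR_Primary_Wealth_Porrest",
    "OOR_LowerSecondary__residence_Rural",
])

# Spelling corrections; the lookahead keeps an already correct "Female" intact
FIXES = {r"Uraban": "Urban", r"Porrest": "Poorest", r"Femal(?!e)": "Female"}

def tidy(name: str) -> str:
    """Strip surrounding whitespace, collapse double underscores, fix spelling."""
    name = name.strip().replace("__", "_")
    for pattern, replacement in FIXES.items():
        name = re.sub(pattern, replacement, name)
    return name

# rename accepts a callable that is applied to every column label
tidied = df.rename(columns=tidy)
print(list(tidied.columns))
```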


In [123]:
OOR_DF = conc_df.copy()

In [124]:
# drop the data-source and time-period columns, which are not needed in the cleaned dataset
columns_to_drop = ['P_Data_Source', 'P_Time_Period',
                   'LS_Data_Source', 'LS_Time_Period',
                   'US_Data_Source', 'US_Time_Period']
OOR_DF.drop(columns=columns_to_drop, inplace=True)

In [125]:
OOR_DF.columns

Index(['ISO3', 'Country', 'Region ', 'Sub-Region', 'Development Region ',
       'OOR_Primary_Total', 'OOR_Primary_Femal', 'OOR_Primary_Male',
       'OOR_Primary_residence_Rural', 'OOR_Primary_residence_Uraban',
       'OOR_Primary_Wealth_Porrest', 'OOR_Primary_Wealth_Second',
       'OOR_Primary_Wealth_Middle', 'OOR_Primary_Wealth_Fourth',
       'OOR_Primary_Wealth_Richest', 'OOR_LowerSecondary_Total',
       'OOR_LowerSecondary_Female', 'OOR_LowerSecondary_Male',
       'OOR_LowerSecondary__residence_Rural',
       'OOR_LowerSecondary__residence_Uraban',
       'OOR_LowerSecondary__Wealth_Porrest',
       'OOR_LowerSecondary__Wealth_Second',
       'OOR_LowerSecondary__Wealth_Middle',
       'OOR_LowerSecondary__Wealth_Fourth',
       'OOR_LowerSecondary__Wealth_Richest', 'OOR_UpperSecondary_Total',
       'OOR_UpperSecondary_Female', 'OOR_UpperSecondary_Male',
       'OOR_UpperSecondary_residence_Rural',
       'OOR_UpperSecondary_residence_Urban',
       'OOR_UpperSecondary_Wealt
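
As an alternative to listing the six metadata columns by hand, they could be selected by suffix. This is a sketch on a hypothetical empty frame, not the notebook's method, and it assumes every metadata column ends in `_Data_Source` or `_Time_Period`.

```python
import pandas as pd

# Hypothetical frame with a subset of the notebook's column labels
df = pd.DataFrame(columns=[
    "ISO3", "OOR_Primary_Total", "P_Data_Source", "P_Time_Period",
    "OOR_UpperSecondary_Total", "US_Data_Source", "US_Time_Period",
])

# str.endswith accepts a tuple of suffixes
meta_cols = [c for c in df.columns if c.endswith(("_Data_Source", "_Time_Period"))]

# drop returns a new frame; the original is left unchanged
trimmed = df.drop(columns=meta_cols)
print(list(trimmed.columns))
```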

In [126]:
OOR_DF.isnull().sum()

ISO3                                    0
Country                                 0
Region                                  0
Sub-Region                              0
Development Region                      0
OOR_Primary_Total                       5
OOR_Primary_Femal                       5
OOR_Primary_Male                        5
OOR_Primary_residence_Rural             5
OOR_Primary_residence_Uraban            5
OOR_Primary_Wealth_Porrest              5
OOR_Primary_Wealth_Second               5
OOR_Primary_Wealth_Middle               5
OOR_Primary_Wealth_Fourth               5
OOR_Primary_Wealth_Richest              5
OOR_LowerSecondary_Total                5
OOR_LowerSecondary_Female               5
OOR_LowerSecondary_Male                 5
OOR_LowerSecondary__residence_Rural     5
OOR_LowerSecondary__residence_Uraban    5
OOR_LowerSecondary__Wealth_Porrest      5
OOR_LowerSecondary__Wealth_Second       5
OOR_LowerSecondary__Wealth_Middle       5
OOR_LowerSecondary__Wealth_Fourth 

In [127]:
OOR_DF.to_csv('OutofSchoolRate_CleanedDF.csv', index = False)
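
One point to watch when the exported CSV is read back: the glossary uses `NA` as the region code for North America, and `pd.read_csv` treats the string `NA` as a missing value by default. The sketch below shows the effect on a hypothetical two-row frame written to a throwaway temporary file, together with one way to keep the code as text while still parsing empty fields as missing.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical rows; "NA" stands for the North America region code
frame = pd.DataFrame({
    "Country": ["Country X", "Country Y"],
    "Region": ["NA", "LAC"],
    "OOR_Primary_Total": [2.0, np.nan],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "roundtrip_check.csv"  # throwaway file name
    frame.to_csv(path, index=False)

    # Default parsing turns the string "NA" into a missing value
    naive = pd.read_csv(path)

    # Only empty fields count as missing here, so "NA" survives as text
    careful = pd.read_csv(path, keep_default_na=False, na_values=[""])

print(naive["Region"].isna().sum())
print(careful["Region"].tolist())
print(careful["OOR_Primary_Total"].isna().sum())
```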