# MIT COLLABORATION PROJECT


## Introduction
Out-of-school children rate (SDG4.1.4) – Percentage of children or young people in the official age range for a given level of education who are not attending either pre-primary, primary, secondary, or higher levels of education. 
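
As a minimal sketch of the definition above, the rate can be computed from two counts. The function name and the sample numbers are hypothetical and chosen only for illustration; they do not come from the dataset.

```python
def out_of_school_rate(not_attending: int, official_age_population: int) -> float:
    """Percentage of children in the official age range who attend no level of education."""
    if official_age_population <= 0:
        raise ValueError("official_age_population must be positive")
    return 100.0 * not_attending / official_age_population


# Hypothetical example: 400 of 1,000 children of lower-secondary age are not attending
rate = out_of_school_rate(400, 1000)
print(rate)  # 40.0
```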

#### Unit of measure:	Percentage
#### Time frame for survey	
Household survey data from the past 10 years are used for the calculation of adjusted net attendance rate. For countries with multiple years of data, the most recent dataset is used.
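
The "most recent dataset" rule could be applied along the following lines. This is only a sketch: the `surveys` frame, its column names, and its values are invented for illustration and are not part of the UNICEF database.

```python
import pandas as pd

# Hypothetical survey records; column names and values are made up for illustration
surveys = pd.DataFrame({
    "ISO3": ["AAA", "AAA", "BBB"],
    "Data_Source": ["DHS 2012", "MICS 2018", "DHS 2015"],
    "Time_Period": [2012, 2018, 2015],
    "Total": [30.0, 25.0, 12.0],
})

# Sort by year, then keep the last (most recent) row for each country
latest = (
    surveys.sort_values("Time_Period")
    .drop_duplicates(subset="ISO3", keep="last")
    .sort_values("ISO3")
    .reset_index(drop=True)
)
print(latest)
```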


### Glossary - the database contains the following
**ISO**	Three-letter alphabetical codes from International Standard ISO 3166-1, assigned by the International Organization for Standardization (ISO). The latest version is available online at http://www.iso.org/iso/home/standards/country_codes.htm. (column A)
**Countries and areas:**	The UNICEF Global databases contain a set of 202 countries and Kosovo under UNSC res. 1244* as reported on through the State of the World's Children Statistical Annex 2017 (column B)
	
**Data Source:**	Short name for data source, followed by the year(s) in which the data collection (e.g., survey interviews) took place (column P)
**Time period:**	Represents the year(s) in which the data collection (e.g. survey interviews) took place. (column Q)
	
**Region, Sub-region**	UNICEF regions (column C) and UNICEF Sub-regions (column D)
EAP	East Asia and the Pacific
ECA	Europe and Central Asia
EECA	Eastern Europe and Central Asia
ESA	Eastern and Southern Africa
LAC	Latin America and the Caribbean
MENA	Middle East and North Africa
NA	North America
SA	South Asia
SSA	Sub-Saharan Africa
WCA	West and Central Africa
**Development regions:**	Economies are currently divided into four income groupings: low, lower-middle, upper-middle, and high. Income is measured using **gross national income (GNI) per capita, in U.S. dollars**, converted from local currency using the World Bank Atlas method (column E).
	
**Regional Aggregations**	
Regional aggregates with less than 50% of the corresponding school-aged population coverage have been suppressed	
		
* All references to Kosovo in this dataset should be understood to be in the context of United Nations Security Council resolution 1244 (1999). 	
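
The suppression rule for regional aggregates described above can be sketched as follows. The region names, rates, and coverage figures are invented for illustration, and `Coverage_pct` is a hypothetical column that is not present in the files used below.

```python
import numpy as np
import pandas as pd

# Hypothetical regional aggregates; names and numbers are invented for illustration
regions = pd.DataFrame({
    "Region": ["R1", "R2", "R3"],
    "Rate": [18.0, 7.0, 31.0],
    "Coverage_pct": [72.0, 49.9, 50.0],  # share of school-aged population covered
})

# Suppress (set to NaN) any aggregate whose population coverage is below 50%
regions["Rate_published"] = regions["Rate"].where(regions["Coverage_pct"] >= 50.0, np.nan)
print(regions)
```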


In [4]:
# Import the libraries used in this project
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import style
%matplotlib inline

## Lower Secondary Level

In [42]:
# reading the data set
Lower_Sec_level = pd.read_csv("Lower secondary_level1.csv", encoding='ISO-8859-1')
Lower_Sec_level.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,LS_Development Region,LS_Total,Female_LS,Male_lS,LS_residence_Rural,LS_residence_Uraban,LS_Wealth_Porrest,LS_wealth_Second,LS_wealth_Middle,LS_wealth_Fourth,LS_wealth_Richest,LS_Data_Source,LS_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,40.0,56.0,25.0,46.0,23.0,50.0,48.0,49.0,33.0,21.0,DHS 2015,2015.0
1,ALB,Albania,ECA,EECA,More Developed,2.0,2.0,2.0,2.0,2.0,4.0,2.0,2.0,1.0,1.0,DHS 2017-18,2018.0
2,DZA,Algeria,MENA,MENA,Less Developed,4.0,4.0,4.0,6.0,3.0,9.0,4.0,3.0,2.0,1.0,MICS 2019,2020.0
3,AND,Andorra,ECA,WE,More Developed,,,,,,,,,,,,
4,AGO,Angola,SSA,ESA,Least Developed,15.0,17.0,14.0,28.0,10.0,31.0,26.0,16.0,6.0,4.0,DHS 2015-16,2016.0


In [43]:
Lower_Sec_level.dtypes

ISO3                       object
Country                    object
Region                     object
Sub-Region                 object
LS_Development Region      object
LS_Total                  float64
Female_LS                 float64
Male_lS                   float64
LS_residence_Rural        float64
LS_residence_Uraban       float64
LS_Wealth_Porrest         float64
LS_wealth_Second          float64
LS_wealth_Middle          float64
LS_wealth_Fourth          float64
LS_wealth_Richest         float64
LS_Data_Source             object
LS_Time_Period            float64
dtype: object

In [44]:
#checking for null/missing values
Lower_Sec_level.isnull().any().any()

True

In [46]:
Lower_Sec_level.isnull().any()

ISO3                      False
Country                   False
Region                     True
Sub-Region                 True
LS_Development Region      True
LS_Total                   True
Female_LS                  True
Male_lS                    True
LS_residence_Rural         True
LS_residence_Uraban        True
LS_Wealth_Porrest          True
LS_wealth_Second           True
LS_wealth_Middle           True
LS_wealth_Fourth           True
LS_wealth_Richest          True
LS_Data_Source             True
LS_Time_Period             True
dtype: bool

In [47]:
Lower_Sec_level.isnull().sum()

ISO3                       0
Country                    0
Region                     2
Sub-Region                 3
LS_Development Region      1
LS_Total                  87
Female_LS                 87
Male_lS                   87
LS_residence_Rural        90
LS_residence_Uraban       90
LS_Wealth_Porrest         96
LS_wealth_Second          96
LS_wealth_Middle          96
LS_wealth_Fourth          96
LS_wealth_Richest         96
LS_Data_Source            87
LS_Time_Period            87
dtype: int64

In [72]:
Lower_Sec_level.isnull().sum() / (len(Lower_Sec_level))*100

ISO3                       0.000000
Country                    0.000000
Region                     0.985222
Sub-Region                 1.477833
LS_Development Region      0.492611
LS_Total                  42.857143
Female_LS                 42.857143
Male_lS                   42.857143
LS_residence_Rural        44.334975
LS_residence_Uraban       44.334975
LS_Wealth_Porrest         47.290640
LS_wealth_Second          47.290640
LS_wealth_Middle          47.290640
LS_wealth_Fourth          47.290640
LS_wealth_Richest         47.290640
LS_Data_Source            42.857143
LS_Time_Period            42.857143
dtype: float64
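
The percentages above can also be obtained by taking the mean of the boolean missing-value mask, which avoids dividing by the row count by hand. The sketch below uses a small stand-in frame (`demo` and its values are hypothetical) to show that both forms agree.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; the values are invented for illustration
demo = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Total": [1.0, np.nan, 3.0, np.nan],
})

# Form used in the notebook: missing count divided by number of rows
pct_by_sum = demo.isnull().sum() / len(demo) * 100

# Equivalent form: the mean of a True/False mask is the share of True values
pct_by_mean = demo.isnull().mean() * 100
print(pct_by_mean)
```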

In [49]:
Lower_Sec_level_clean = Lower_Sec_level.copy()

In [50]:
Lower_Sec_level.columns

Index(['ISO3', 'Country', 'Region ', 'Sub-Region', 'LS_Development Region ',
       'LS_Total', 'Female_LS', 'Male_lS', 'LS_residence_Rural',
       'LS_residence_Uraban ', 'LS_Wealth_Porrest', 'LS_wealth_Second',
       'LS_wealth_Middle', 'LS_wealth_Fourth', 'LS_wealth_Richest',
       'LS_Data_Source', 'LS_Time_Period'],
      dtype='object')
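
The column index above shows trailing spaces in several names ('Region ', 'LS_Development Region ', 'LS_residence_Uraban '), which is why the filtering cells have to spell the name with the space included. One optional way to remove that source of errors is to strip whitespace from the column names right after loading. The sketch below does this on a small stand-in frame (`demo` is hypothetical); the rest of this notebook keeps the original names, so its cells would need the space removed if this step were applied.

```python
import pandas as pd

# Stand-in frame that reproduces the trailing-space problem
demo = pd.DataFrame({
    "ISO3": ["AFG"],
    "Region ": ["SA"],
    "LS_Development Region ": ["Least Developed"],
})

# Remove leading and trailing whitespace from every column name
demo.columns = demo.columns.str.strip()
print(list(demo.columns))  # ['ISO3', 'Region', 'LS_Development Region']
```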

In [51]:
Lower_Sec_level["LS_Development Region "].unique()

array(['Least Developed', 'More Developed', 'Less Developed',
       'Not Classified', nan], dtype=object)

In [52]:

developing_countries = Lower_Sec_level[Lower_Sec_level['LS_Development Region '].isin(['Least Developed', 'Less Developed'])]

In [53]:
len(developing_countries)

146

In [54]:
# Keep only the countries classified as 'Least Developed'
developing_countries1 = Lower_Sec_level[Lower_Sec_level['LS_Development Region '] == 'Least Developed']


In [55]:
len(developing_countries1)

47

In [56]:
developing_countries1.isnull().any()

ISO3                      False
Country                   False
Region                    False
Sub-Region                False
LS_Development Region     False
LS_Total                   True
Female_LS                  True
Male_lS                    True
LS_residence_Rural         True
LS_residence_Uraban        True
LS_Wealth_Porrest          True
LS_wealth_Second           True
LS_wealth_Middle           True
LS_wealth_Fourth           True
LS_wealth_Richest          True
LS_Data_Source             True
LS_Time_Period             True
dtype: bool

In [57]:
developing_countries1.isnull().sum()

ISO3                      0
Country                   0
Region                    0
Sub-Region                0
LS_Development Region     0
LS_Total                  5
Female_LS                 5
Male_lS                   5
LS_residence_Rural        5
LS_residence_Uraban       5
LS_Wealth_Porrest         5
LS_wealth_Second          5
LS_wealth_Middle          5
LS_wealth_Fourth          5
LS_wealth_Richest         5
LS_Data_Source            5
LS_Time_Period            5
dtype: int64

In [58]:
developing_countries1["Country"].unique()

array(['Afghanistan', 'Angola', 'Bangladesh', 'Benin', 'Bhutan',
       'Burkina Faso', 'Burundi', 'Cambodia', 'Central African Republic',
       'Chad', 'Comoros', 'Democratic Republic of the Congo', 'Djibouti',
       'Eritrea', 'Ethiopia', 'Gambia', 'Guinea', 'Guinea-Bissau',
       'Haiti', 'Kiribati', "Lao People's Democratic Republic", 'Lesotho',
       'Liberia', 'Madagascar', 'Malawi', 'Mali', 'Mauritania',
       'Mozambique', 'Myanmar', 'Nepal', 'Niger', 'Rwanda',
       'Sao Tome and Principe', 'Senegal', 'Sierra Leone',
       'Solomon Islands', 'Somalia', 'South Sudan', 'Sudan',
       'Timor-Leste', 'Togo', 'Tuvalu', 'Uganda',
       'United Republic of Tanzania', 'Vanuatu', 'Yemen', 'Zambia'],
      dtype=object)

In [59]:
developing_countries.isnull().sum()

ISO3                       0
Country                    0
Region                     0
Sub-Region                 0
LS_Development Region      0
LS_Total                  41
Female_LS                 41
Male_lS                   41
LS_residence_Rural        43
LS_residence_Uraban       43
LS_Wealth_Porrest         49
LS_wealth_Second          49
LS_wealth_Middle          49
LS_wealth_Fourth          49
LS_wealth_Richest         49
LS_Data_Source            41
LS_Time_Period            41
dtype: int64

In [60]:
developing_countries["Country"].unique()

array(['Afghanistan', 'Algeria', 'Angola', 'Antigua and Barbuda',
       'Argentina', 'Armenia', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belize', 'Benin', 'Bhutan',
       'Bolivia (Plurinational State of)', 'Botswana', 'Brazil',
       'Brunei Darussalam', 'Burkina Faso', 'Burundi', 'Cabo Verde',
       'Cambodia', 'Cameroon', 'Central African Republic', 'Chad',
       'Chile', 'China', 'Colombia', 'Comoros', 'Congo', 'Cook Islands',
       'Costa Rica', 'Cuba', "Côte d'Ivoire",
       "Democratic People's Republic of Korea",
       'Democratic Republic of the Congo', 'Djibouti', 'Dominica',
       'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
       'Equatorial Guinea', 'Eritrea', 'Eswatini', 'Ethiopia', 'Fiji',
       'Gabon', 'Gambia', 'Georgia', 'Ghana', 'Grenada', 'Guatemala',
       'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'India',
       'Indonesia', 'Iran (Islamic Republic of)', 'Iraq', 'Jamaica',
       'Jordan', 'Kaza

In [61]:
# Select the rows of developing_countries1 that have at least one missing value
rows_with_missing_values = developing_countries1[developing_countries1.isnull().any(axis=1)]

# Display the rows with missing values
rows_with_missing_values


Unnamed: 0,ISO3,Country,Region,Sub-Region,LS_Development Region,LS_Total,Female_LS,Male_lS,LS_residence_Rural,LS_residence_Uraban,LS_Wealth_Porrest,LS_wealth_Second,LS_wealth_Middle,LS_wealth_Fourth,LS_wealth_Richest,LS_Data_Source,LS_Time_Period
51,DJI,Djibouti,SSA,ESA,Least Developed,,,,,,,,,,,,
58,ERI,Eritrea,SSA,ESA,Least Developed,,,,,,,,,,,,
165,SLB,Solomon Islands,EAP,EAP,Least Developed,,,,,,,,,,,,
166,SOM,Somalia,SSA,ESA,Least Developed,,,,,,,,,,,,
197,VUT,Vanuatu,EAP,EAP,Least Developed,,,,,,,,,,,,


In [62]:
developing_countries_without_na = developing_countries1.dropna()

In [63]:
developing_countries_without_na.isnull().any().any()

False

In [64]:
developing_countries_without_na["Country"].unique()

array(['Afghanistan', 'Angola', 'Bangladesh', 'Benin', 'Bhutan',
       'Burkina Faso', 'Burundi', 'Cambodia', 'Central African Republic',
       'Chad', 'Comoros', 'Democratic Republic of the Congo', 'Ethiopia',
       'Gambia', 'Guinea', 'Guinea-Bissau', 'Haiti', 'Kiribati',
       "Lao People's Democratic Republic", 'Lesotho', 'Liberia',
       'Madagascar', 'Malawi', 'Mali', 'Mauritania', 'Mozambique',
       'Myanmar', 'Nepal', 'Niger', 'Rwanda', 'Sao Tome and Principe',
       'Senegal', 'Sierra Leone', 'South Sudan', 'Sudan', 'Timor-Leste',
       'Togo', 'Tuvalu', 'Uganda', 'United Republic of Tanzania', 'Yemen',
       'Zambia'], dtype=object)

In [110]:
developing_countries_without_na_reset = developing_countries_without_na.reset_index(drop=True)

In [111]:
developing_countries_without_na_reset

Unnamed: 0,ISO3,Country,Region,Sub-Region,LS_Development Region,LS_Total,Female_LS,Male_lS,LS_residence_Rural,LS_residence_Uraban,LS_Wealth_Porrest,LS_wealth_Second,LS_wealth_Middle,LS_wealth_Fourth,LS_wealth_Richest,LS_Data_Source,LS_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,40.0,56.0,25.0,46.0,23.0,50.0,48.0,49.0,33.0,21.0,DHS 2015,2015.0
1,AGO,Angola,SSA,ESA,Least Developed,15.0,17.0,14.0,28.0,10.0,31.0,26.0,16.0,6.0,4.0,DHS 2015-16,2016.0
2,BGD,Bangladesh,SA,SA,Least Developed,13.0,8.0,18.0,13.0,12.0,19.0,15.0,11.0,11.0,7.0,MICS 2019,2019.0
3,BEN,Benin,SSA,WCA,Least Developed,41.0,47.0,36.0,48.0,31.0,65.0,52.0,42.0,31.0,18.0,DHS 2017-18,2018.0
4,BTN,Bhutan,SA,SA,Least Developed,19.0,18.0,20.0,23.0,10.0,35.0,25.0,18.0,10.0,11.0,MICS 2010,2010.0
5,BFA,Burkina Faso,SSA,WCA,Least Developed,57.0,59.0,56.0,65.0,31.0,77.0,67.0,62.0,49.0,32.0,DHS 2010,2010.0
6,BDI,Burundi,SSA,ESA,Least Developed,29.0,28.0,31.0,31.0,20.0,49.0,35.0,26.0,20.0,22.0,DHS 2016-17,2017.0
7,KHM,Cambodia,EAP,EAP,Least Developed,25.0,25.0,26.0,26.0,20.0,40.0,29.0,24.0,17.0,15.0,DHS 2014,2014.0
8,CAF,Central African Republic,SSA,WCA,Least Developed,24.0,30.0,18.0,33.0,11.0,40.0,35.0,28.0,16.0,9.0,MICS 2018-19,2019.0
9,TCD,Chad,SSA,WCA,Least Developed,53.0,58.0,47.0,57.0,33.0,77.0,70.0,57.0,36.0,27.0,MICS 2019,2019.0


In [113]:
developing_countries_without_na_reset.to_csv('LowerSecodndaryCleanedDevelopingCountriesOnly.csv')

## Primary Level

In [66]:
# reading the data set
prim_level = pd.read_csv("Primary level.csv", encoding='ISO-8859-1')
prim_level.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,P_Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,P_Wealth_Porrest,P_wealth_Second,P_wealth_Middle,P_wealth_Fourth,P_wealth_Richest,P_Data_Source,P_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,37.0,47.0,28.0,42.0,19.0,42.0,47.0,46.0,32.0,16.0,DHS 2015,2015.0
1,ALB,Albania,ECA,EECA,More Developed,2.0,2.0,3.0,4.0,1.0,4.0,3.0,2.0,2.0,1.0,DHS 2017-18,2018.0
2,DZA,Algeria,MENA,MENA,Less Developed,2.0,2.0,2.0,2.0,1.0,4.0,1.0,2.0,1.0,1.0,MICS 2019,2020.0
3,AND,Andorra,ECA,WE,More Developed,,,,,,,,,,,,
4,AGO,Angola,SSA,ESA,Least Developed,22.0,22.0,21.0,35.0,14.0,39.0,33.0,19.0,12.0,5.0,DHS 2015-16,2016.0


In [67]:
prim_level.dtypes

ISO3                      object
Country                   object
Region                    object
Sub-Region                object
P_Development Region      object
P_Total                  float64
Femal_P                  float64
Male_P                   float64
P_residence_Rural        float64
P_residence_Uraban       float64
P_Wealth_Porrest         float64
P_wealth_Second          float64
P_wealth_Middle          float64
P_wealth_Fourth          float64
P_wealth_Richest         float64
P_Data_Source             object
P_Time_Period            float64
dtype: object

In [68]:
prim_level.isnull().any().any()

True

In [69]:
prim_level.isnull().any()

ISO3                     False
Country                  False
Region                    True
Sub-Region                True
P_Development Region      True
P_Total                   True
Femal_P                   True
Male_P                    True
P_residence_Rural         True
P_residence_Uraban        True
P_Wealth_Porrest          True
P_wealth_Second           True
P_wealth_Middle           True
P_wealth_Fourth           True
P_wealth_Richest          True
P_Data_Source             True
P_Time_Period             True
dtype: bool

In [70]:
prim_level.isnull().sum()

ISO3                      0
Country                   0
Region                    2
Sub-Region                3
P_Development Region      1
P_Total                  85
Femal_P                  85
Male_P                   85
P_residence_Rural        89
P_residence_Uraban       89
P_Wealth_Porrest         95
P_wealth_Second          95
P_wealth_Middle          95
P_wealth_Fourth          95
P_wealth_Richest         95
P_Data_Source            85
P_Time_Period            85
dtype: int64

In [71]:
prim_level.isnull().sum() / (len(prim_level))*100

ISO3                      0.000000
Country                   0.000000
Region                    0.985222
Sub-Region                1.477833
P_Development Region      0.492611
P_Total                  41.871921
Femal_P                  41.871921
Male_P                   41.871921
P_residence_Rural        43.842365
P_residence_Uraban       43.842365
P_Wealth_Porrest         46.798030
P_wealth_Second          46.798030
P_wealth_Middle          46.798030
P_wealth_Fourth          46.798030
P_wealth_Richest         46.798030
P_Data_Source            41.871921
P_Time_Period            41.871921
dtype: float64

In [73]:
developing_countriesP = prim_level[prim_level['P_Development Region '] == 'Least Developed']

In [77]:
developing_countriesP.isnull().any().any()

True

In [78]:
developing_countriesP.isnull().sum()

ISO3                     0
Country                  0
Region                   0
Sub-Region               0
P_Development Region     0
P_Total                  5
Femal_P                  5
Male_P                   5
P_residence_Rural        5
P_residence_Uraban       5
P_Wealth_Porrest         5
P_wealth_Second          5
P_wealth_Middle          5
P_wealth_Fourth          5
P_wealth_Richest         5
P_Data_Source            5
P_Time_Period            5
dtype: int64

In [79]:
rows_with_missing_valuesP = developing_countriesP[developing_countriesP.isnull().any(axis=1)]

# Display the rows with missing values
rows_with_missing_valuesP

Unnamed: 0,ISO3,Country,Region,Sub-Region,P_Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,P_Wealth_Porrest,P_wealth_Second,P_wealth_Middle,P_wealth_Fourth,P_wealth_Richest,P_Data_Source,P_Time_Period
51,DJI,Djibouti,SSA,ESA,Least Developed,,,,,,,,,,,,
58,ERI,Eritrea,SSA,ESA,Least Developed,,,,,,,,,,,,
165,SLB,Solomon Islands,EAP,EAP,Least Developed,,,,,,,,,,,,
166,SOM,Somalia,SSA,ESA,Least Developed,,,,,,,,,,,,
197,VUT,Vanuatu,EAP,EAP,Least Developed,,,,,,,,,,,,


In [80]:
developing_countriesP_without_na = developing_countriesP.dropna()

In [81]:
developing_countriesP_without_na.isnull().any().any()

False

In [82]:
developing_countriesP_without_na.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,P_Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,P_Wealth_Porrest,P_wealth_Second,P_wealth_Middle,P_wealth_Fourth,P_wealth_Richest,P_Data_Source,P_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,37.0,47.0,28.0,42.0,19.0,42.0,47.0,46.0,32.0,16.0,DHS 2015,2015.0
4,AGO,Angola,SSA,ESA,Least Developed,22.0,22.0,21.0,35.0,14.0,39.0,33.0,19.0,12.0,5.0,DHS 2015-16,2016.0
14,BGD,Bangladesh,SA,SA,Least Developed,6.0,5.0,8.0,6.0,6.0,9.0,7.0,5.0,6.0,4.0,MICS 2019,2019.0
19,BEN,Benin,SSA,WCA,Least Developed,32.0,35.0,28.0,38.0,21.0,59.0,39.0,30.0,15.0,7.0,DHS 2017-18,2018.0
20,BTN,Bhutan,SA,SA,Least Developed,8.0,7.0,9.0,10.0,3.0,15.0,11.0,6.0,4.0,3.0,MICS 2010,2010.0


In [114]:
developing_countriesP_without_na_reset = developing_countriesP_without_na.reset_index(drop=True)

In [116]:
developing_countriesP_without_na_reset.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,P_Development Region,P_Total,Femal_P,Male_P,P_residence_Rural,P_residence_Uraban,P_Wealth_Porrest,P_wealth_Second,P_wealth_Middle,P_wealth_Fourth,P_wealth_Richest,P_Data_Source,P_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,37.0,47.0,28.0,42.0,19.0,42.0,47.0,46.0,32.0,16.0,DHS 2015,2015.0
1,AGO,Angola,SSA,ESA,Least Developed,22.0,22.0,21.0,35.0,14.0,39.0,33.0,19.0,12.0,5.0,DHS 2015-16,2016.0
2,BGD,Bangladesh,SA,SA,Least Developed,6.0,5.0,8.0,6.0,6.0,9.0,7.0,5.0,6.0,4.0,MICS 2019,2019.0
3,BEN,Benin,SSA,WCA,Least Developed,32.0,35.0,28.0,38.0,21.0,59.0,39.0,30.0,15.0,7.0,DHS 2017-18,2018.0
4,BTN,Bhutan,SA,SA,Least Developed,8.0,7.0,9.0,10.0,3.0,15.0,11.0,6.0,4.0,3.0,MICS 2010,2010.0


In [117]:
developing_countriesP_without_na_reset.to_csv('PrimaryCleanedDevelopingCountriesOnly.csv')

## Upper Secondary Level

In [84]:
# reading the data set
UpSec_level = pd.read_csv("Upper secondary_level.csv", encoding='ISO-8859-1')
UpSec_level.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,US_Development Region,US_Total,Female_US,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,58.0,73.0,43.0,63.0,45.0,70.0,64.0,64.0,54.0,40.0,DHS 2015,2015.0
1,ALB,Albania,ECA,EECA,More Developed,12.0,12.0,12.0,16.0,9.0,27.0,11.0,11.0,5.0,5.0,DHS 2017-18,2018.0
2,DZA,Algeria,MENA,MENA,Less Developed,23.0,18.0,27.0,27.0,20.0,38.0,27.0,22.0,17.0,10.0,MICS 2019,2020.0
3,AND,Andorra,ECA,WE,More Developed,,,,,,,,,,,,
4,AGO,Angola,SSA,ESA,Least Developed,29.0,35.0,21.0,53.0,19.0,58.0,50.0,27.0,17.0,9.0,DHS 2015-16,2016.0


In [87]:
UpSec_level.shape

(203, 17)

In [89]:
UpSec_level.dtypes

ISO3                       object
Country                    object
Region                     object
Sub-Region                 object
US_Development Region      object
US_Total                  float64
Female_US                 float64
Male_US                   float64
USresidence_Rural         float64
USresidence_Uraban        float64
US_Wealth_Porrest         float64
US_wealth_second          float64
US_wealth_Middle          float64
US_wealth_Fourth          float64
US_wealth_Richest         float64
US_Data_Source             object
US_Time_Period            float64
dtype: object

In [90]:
UpSec_level.isnull().any().any()

True

In [91]:
UpSec_level.isnull().sum()

ISO3                       0
Country                    0
Region                     2
Sub-Region                 3
US_Development Region      1
US_Total                  89
Female_US                 89
Male_US                   89
USresidence_Rural         91
USresidence_Uraban        91
US_Wealth_Porrest         97
US_wealth_second          97
US_wealth_Middle          97
US_wealth_Fourth          97
US_wealth_Richest         97
US_Data_Source            89
US_Time_Period            89
dtype: int64

In [93]:
UpSec_level["US_Development Region "].unique()

array(['Least Developed', 'More Developed', 'Less Developed',
       'Not Classified', nan], dtype=object)

In [92]:
UpSec_level.isnull().sum() / len(UpSec_level) * 100

ISO3                       0.000000
Country                    0.000000
Region                     0.985222
Sub-Region                 1.477833
US_Development Region      0.492611
US_Total                  43.842365
Female_US                 43.842365
Male_US                   43.842365
USresidence_Rural         44.827586
USresidence_Uraban        44.827586
US_Wealth_Porrest         47.783251
US_wealth_second          47.783251
US_wealth_Middle          47.783251
US_wealth_Fourth          47.783251
US_wealth_Richest         47.783251
US_Data_Source            43.842365
US_Time_Period            43.842365
dtype: float64

In [94]:
developing_countriesUpSec = UpSec_level[UpSec_level['US_Development Region '] == 'Least Developed']

In [97]:
developing_countriesUpSec["Country"].unique()

array(['Afghanistan', 'Angola', 'Bangladesh', 'Benin', 'Bhutan',
       'Burkina Faso', 'Burundi', 'Cambodia', 'Central African Republic',
       'Chad', 'Comoros', 'Democratic Republic of the Congo', 'Djibouti',
       'Eritrea', 'Ethiopia', 'Gambia', 'Guinea', 'Guinea-Bissau',
       'Haiti', 'Kiribati', "Lao People's Democratic Republic", 'Lesotho',
       'Liberia', 'Madagascar', 'Malawi', 'Mali', 'Mauritania',
       'Mozambique', 'Myanmar', 'Nepal', 'Niger', 'Rwanda',
       'Sao Tome and Principe', 'Senegal', 'Sierra Leone',
       'Solomon Islands', 'Somalia', 'South Sudan', 'Sudan',
       'Timor-Leste', 'Togo', 'Tuvalu', 'Uganda',
       'United Republic of Tanzania', 'Vanuatu', 'Yemen', 'Zambia'],
      dtype=object)

In [98]:
developing_countriesUpSec.shape

(47, 17)

In [100]:
developing_countriesUpSec.isnull().any().any()

True

In [101]:
developing_countriesUpSec.isnull().sum()

ISO3                      0
Country                   0
Region                    0
Sub-Region                0
US_Development Region     0
US_Total                  5
Female_US                 5
Male_US                   5
USresidence_Rural         5
USresidence_Uraban        5
US_Wealth_Porrest         5
US_wealth_second          5
US_wealth_Middle          5
US_wealth_Fourth          5
US_wealth_Richest         5
US_Data_Source            5
US_Time_Period            5
dtype: int64

In [102]:
rows_with_missing_valuesUpSec = developing_countriesUpSec[developing_countriesUpSec.isnull().any(axis=1)]

# Display the rows with missing values
rows_with_missing_valuesUpSec

Unnamed: 0,ISO3,Country,Region,Sub-Region,US_Development Region,US_Total,Female_US,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
51,DJI,Djibouti,SSA,ESA,Least Developed,,,,,,,,,,,,
58,ERI,Eritrea,SSA,ESA,Least Developed,,,,,,,,,,,,
165,SLB,Solomon Islands,EAP,EAP,Least Developed,,,,,,,,,,,,
166,SOM,Somalia,SSA,ESA,Least Developed,,,,,,,,,,,,
197,VUT,Vanuatu,EAP,EAP,Least Developed,,,,,,,,,,,,


In [103]:
developing_countriesUpSec_without_na = developing_countriesUpSec.dropna()

In [104]:
developing_countriesUpSec_without_na.isnull().any().any()

False

In [106]:
developing_countriesUpSec_without_na.shape

(42, 17)

In [118]:
developing_countriesUpSec_without_na_reset = developing_countriesUpSec_without_na.reset_index(drop=True)

In [119]:
developing_countriesUpSec_without_na_reset.head()

Unnamed: 0,ISO3,Country,Region,Sub-Region,US_Development Region,US_Total,Female_US,Male_US,USresidence_Rural,USresidence_Uraban,US_Wealth_Porrest,US_wealth_second,US_wealth_Middle,US_wealth_Fourth,US_wealth_Richest,US_Data_Source,US_Time_Period
0,AFG,Afghanistan,SA,SA,Least Developed,58.0,73.0,43.0,63.0,45.0,70.0,64.0,64.0,54.0,40.0,DHS 2015,2015.0
1,AGO,Angola,SSA,ESA,Least Developed,29.0,35.0,21.0,53.0,19.0,58.0,50.0,27.0,17.0,9.0,DHS 2015-16,2016.0
2,BGD,Bangladesh,SA,SA,Least Developed,31.0,26.0,37.0,32.0,30.0,45.0,35.0,29.0,28.0,19.0,MICS 2019,2019.0
3,BEN,Benin,SSA,WCA,Least Developed,58.0,66.0,50.0,65.0,50.0,82.0,73.0,61.0,48.0,36.0,DHS 2017-18,2018.0
4,BTN,Bhutan,SA,SA,Least Developed,39.0,40.0,39.0,44.0,27.0,55.0,53.0,42.0,27.0,24.0,MICS 2010,2010.0


In [120]:
developing_countriesUpSec_without_na_reset.to_csv('UperSecondaryCleanedDevelopingCountriesOnly.csv')
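
The three sections above repeat the same steps for each education level: keep the 'Least Developed' rows, drop rows with missing values, and reset the index. A small helper could capture that pattern. This is a sketch, not the code used to produce the files above; `clean_least_developed` and the `demo` frame are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd


def clean_least_developed(df: pd.DataFrame, region_col: str) -> pd.DataFrame:
    """Keep 'Least Developed' rows, drop rows with any missing value, renumber the index."""
    subset = df[df[region_col] == "Least Developed"]
    return subset.dropna().reset_index(drop=True)


# Small stand-in frame (values invented); the notebook would pass e.g. UpSec_level
demo = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "US_Development Region ": ["Least Developed", "More Developed", "Least Developed"],
    "US_Total": [58.0, 12.0, np.nan],
})
cleaned = clean_least_developed(demo, "US_Development Region ")
print(cleaned)
```

In the notebook this would be called once per level, for example `clean_least_developed(prim_level, 'P_Development Region ')`. When saving the result, passing `index=False` to `to_csv` leaves the row index out of the file, which avoids an extra unnamed column when the file is read back.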