# <b> EDA on Entire World Education Dataset</b>

### <b>About Dataset</b>

This curated dataset gives a broad view of education worldwide and shows how education outcomes vary across countries and regions. It covers out-of-school rates, completion rates, student proficiency in reading and math, youth literacy rates, birth rates, gross enrollment in primary and tertiary education, and unemployment rates. It is a useful resource for researchers, educators, and policymakers who want to understand and improve education systems around the world.

### <b>Variables:</b>

- **Countries and areas:** Name of the country or area.

- **Latitude:** Latitude of the country or area, in decimal degrees.

- **Longitude:** Longitude of the country or area, in decimal degrees.

- **OOSR_Pre0Primary_Age_Male:** Out-of-school rate for pre-primary age males.

- **OOSR_Pre0Primary_Age_Female:** Out-of-school rate for pre-primary age females.

- **OOSR_Primary_Age_Male:** Out-of-school rate for primary age males.

- **OOSR_Primary_Age_Female:** Out-of-school rate for primary age females.

- **OOSR_Lower_Secondary_Age_Male:** Out-of-school rate for lower secondary age males.

- **OOSR_Lower_Secondary_Age_Female:** Out-of-school rate for lower secondary age females.

- **OOSR_Upper_Secondary_Age_Male:** Out-of-school rate for upper secondary age males.

- **OOSR_Upper_Secondary_Age_Female:** Out-of-school rate for upper secondary age females.

- **Completion_Rate_Primary_Male:** Completion rate for primary education among males.

- **Completion_Rate_Primary_Female:** Completion rate for primary education among females.

- **Completion_Rate_Lower_Secondary_Male:** Completion rate for lower secondary education among males.

- **Completion_Rate_Lower_Secondary_Female:** Completion rate for lower secondary education among females.

- **Completion_Rate_Upper_Secondary_Male:** Completion rate for upper secondary education among males.

- **Completion_Rate_Upper_Secondary_Female:** Completion rate for upper secondary education among females.

- **Grade_2_3_Proficiency_Reading:** Proficiency in reading for grade 2-3 students.

- **Grade_2_3_Proficiency_Math:** Proficiency in math for grade 2-3 students.

- **Primary_End_Proficiency_Reading:** Proficiency in reading at the end of primary education.

- **Primary_End_Proficiency_Math:** Proficiency in math at the end of primary education.

- **Lower_Secondary_End_Proficiency_Reading:** Proficiency in reading at the end of lower secondary education.

- **Lower_Secondary_End_Proficiency_Math:** Proficiency in math at the end of lower secondary education.

- **Youth_15_24_Literacy_Rate_Male:** Literacy rate among male youths aged 15-24.

- **Youth_15_24_Literacy_Rate_Female:** Literacy rate among female youths aged 15-24.

- **Birth_Rate:** Birth rate in the respective countries/areas.

- **Gross_Primary_Education_Enrollment:** Gross enrollment in primary education.

- **Gross_Tertiary_Education_Enrollment:** Gross enrollment in tertiary education.

- **Unemployment_Rate:** Unemployment rate in the respective countries/areas.
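Many of the variables above come in male/female pairs that share a common stem, for example `OOSR_Primary_Age_Male` and `OOSR_Primary_Age_Female`. The sketch below pairs them up by suffix so that gender gaps could later be computed column by column. The helper name `sex_pairs` is hypothetical, and it assumes the exact `_Male`/`_Female` suffixes used in this dataset's column names.

```python
def sex_pairs(columns, male_suffix="_Male", female_suffix="_Female"):
    """Return (stem, male_column, female_column) for every stem present with both suffixes."""
    available = set(columns)
    pairs = []
    for col in columns:
        if col.endswith(male_suffix):
            stem = col[: -len(male_suffix)]
            female_col = stem + female_suffix
            if female_col in available:
                pairs.append((stem, col, female_col))
    return pairs


# A few column names taken from the variable list above.
columns = [
    "Countries and areas",
    "OOSR_Primary_Age_Male",
    "OOSR_Primary_Age_Female",
    "Completion_Rate_Primary_Male",
    "Completion_Rate_Primary_Female",
    "Birth_Rate",
]

for stem, male_col, female_col in sex_pairs(columns):
    print(f"{stem}: {male_col} vs {female_col}")
```

Once the column names are lower-cased later in the notebook, the same helper could be called with `"_male"` and `"_female"` as the suffix arguments.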

# Importing Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

# Loading dataset

```python
world_education_data = pd.read_csv('./dataset/global_education_data.csv')
```

# Data identification and cleaning

```python
world_education_data
```

| | Countries and areas | Latitude | Longitude | OOSR_Pre0Primary_Age_Male | OOSR_Pre0Primary_Age_Female | OOSR_Primary_Age_Male | OOSR_Primary_Age_Female | OOSR_Lower_Secondary_Age_Male | OOSR_Lower_Secondary_Age_Female | OOSR_Upper_Secondary_Age_Male | ... | Primary_End_Proficiency_Reading | Primary_End_Proficiency_Math | Lower_Secondary_End_Proficiency_Reading | Lower_Secondary_End_Proficiency_Math | Youth_15_24_Literacy_Rate_Male | Youth_15_24_Literacy_Rate_Female | Birth_Rate | Gross_Primary_Education_Enrollment | Gross_Tertiary_Education_Enrollment | Unemployment_Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 33.939110 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | ... | 13 | 11 | 0 | 0 | 74 | 56 | 32.49 | 104.0 | 9.7 | 11.12 |
| 1 | Albania | 41.153332 | 20.168331 | 4 | 2 | 6 | 3 | 6 | 1 | 21 | ... | 0 | 0 | 48 | 58 | 99 | 100 | 11.78 | 107.0 | 55.0 | 12.33 |
| 2 | Algeria | 28.033886 | 1.659626 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 21 | 19 | 98 | 97 | 24.28 | 109.9 | 51.4 | 11.70 |
| 3 | Andorra | 42.506285 | 1.521801 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 7.20 | 106.4 | 0.0 | 0.00 |
| 4 | Angola | 11.202692 | 17.873887 | 31 | 39 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 40.73 | 113.5 | 9.3 | 6.89 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 197 | Venezuela | 6.423750 | 66.589730 | 14 | 14 | 10 | 10 | 15 | 13 | 28 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 17.88 | 97.2 | 79.3 | 8.80 |
| 198 | Vietnam | 14.058324 | 108.277199 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 55 | 51 | 86 | 81 | 98 | 98 | 16.75 | 110.6 | 28.5 | 2.01 |
| 199 | Yemen | 15.552727 | 48.516388 | 96 | 96 | 10 | 21 | 23 | 34 | 46 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 30.45 | 93.6 | 10.2 | 12.91 |
| 200 | Zambia | 13.133897 | 27.849332 | 0 | 0 | 17 | 13 | 0 | 0 | 0 | ... | 0 | 0 | 5 | 2 | 93 | 92 | 36.19 | 98.7 | 4.1 | 11.43 |
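The preview already hints at a data-quality issue in the coordinates: Angola and Zambia lie south of the equator and Venezuela lies west of Greenwich, yet all three have positive latitude and longitude here. A quick check is to count negative values in the coordinate columns. The sketch below runs on three rows copied from the preview (a stand-in DataFrame, assuming the raw `Latitude`/`Longitude` column names); if the full dataset also has no negative coordinates, the signs were most likely dropped and the values should not be used for hemisphere-aware mapping without correction.

```python
import pandas as pd

# Three rows copied from the preview above (raw CSV column names assumed).
sample = pd.DataFrame(
    {
        "Countries and areas": ["Angola", "Venezuela", "Zambia"],
        "Latitude": [11.202692, 6.423750, 13.133897],
        "Longitude": [17.873887, 66.589730, 27.849332],
    }
)

# Count negative values in each coordinate column.
negative_counts = (sample[["Latitude", "Longitude"]] < 0).sum()
print(negative_counts)

# Angola and Zambia lie south of the equator and Venezuela west of Greenwich,
# so at least one negative value would be expected if signs were preserved.
if (negative_counts == 0).all():
    print("No negative coordinates found: hemisphere signs may have been dropped.")
```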


```python
world_education_data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 202 entries, 0 to 201
Data columns (total 29 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   Countries and areas                      202 non-null    object 
 1   Latitude                                 202 non-null    float64
 2   Longitude                                202 non-null    float64
 3   OOSR_Pre0Primary_Age_Male                202 non-null    int64  
 4   OOSR_Pre0Primary_Age_Female              202 non-null    int64  
 5   OOSR_Primary_Age_Male                    202 non-null    int64  
 6   OOSR_Primary_Age_Female                  202 non-null    int64  
 7   OOSR_Lower_Secondary_Age_Male            202 non-null    int64  
 8   OOSR_Lower_Secondary_Age_Female          202 non-null    int64  
 9   OOSR_Upper_Secondary_Age_Male            202 non-null    int64  
 10  OOSR_Upper_Secondary_Age_Female          202 non-n
```

```python
# Convert all column names to lower case
world_education_data.columns = world_education_data.columns.str.lower()

world_education_data.head()
```

| | countries and areas | latitude | longitude | oosr_pre0primary_age_male | oosr_pre0primary_age_female | oosr_primary_age_male | oosr_primary_age_female | oosr_lower_secondary_age_male | oosr_lower_secondary_age_female | oosr_upper_secondary_age_male | ... | primary_end_proficiency_reading | primary_end_proficiency_math | lower_secondary_end_proficiency_reading | lower_secondary_end_proficiency_math | youth_15_24_literacy_rate_male | youth_15_24_literacy_rate_female | birth_rate | gross_primary_education_enrollment | gross_tertiary_education_enrollment | unemployment_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | ... | 13 | 11 | 0 | 0 | 74 | 56 | 32.49 | 104.0 | 9.7 | 11.12 |
| 1 | Albania | 41.153332 | 20.168331 | 4 | 2 | 6 | 3 | 6 | 1 | 21 | ... | 0 | 0 | 48 | 58 | 99 | 100 | 11.78 | 107.0 | 55.0 | 12.33 |
| 2 | Algeria | 28.033886 | 1.659626 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 21 | 19 | 98 | 97 | 24.28 | 109.9 | 51.4 | 11.7 |
| 3 | Andorra | 42.506285 | 1.521801 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 7.2 | 106.4 | 0.0 | 0.0 |
| 4 | Angola | 11.202692 | 17.873887 | 31 | 39 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 40.73 | 113.5 | 9.3 | 6.89 |


```python
# Rename the country column to a shorter name
world_education_data.rename(columns={'countries and areas': 'country_name'}, inplace=True)
```
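By default, `DataFrame.rename` silently ignores keys that do not match any column, so a typo in `'countries and areas'` would leave the column unchanged without any warning. Passing `errors='raise'` makes pandas raise a `KeyError` instead. A minimal sketch on a toy frame (the data is illustrative only):

```python
import pandas as pd

toy = pd.DataFrame({"countries and areas": ["Albania"], "birth_rate": [11.78]})

# A key that matches an existing column is renamed as usual.
renamed = toy.rename(columns={"countries and areas": "country_name"}, errors="raise")
print(list(renamed.columns))

# A misspelled key raises KeyError instead of being silently ignored.
typo_detected = False
try:
    toy.rename(columns={"country and areas": "country_name"}, errors="raise")
except KeyError as exc:
    typo_detected = True
    print("rename failed:", exc)
```

In this notebook the same guard would read `world_education_data.rename(columns={'countries and areas': 'country_name'}, errors='raise', inplace=True)`.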

```python
world_education_data.columns
```

```
Index(['country_name', 'latitude ', 'longitude', 'oosr_pre0primary_age_male',
       'oosr_pre0primary_age_female', 'oosr_primary_age_male',
       'oosr_primary_age_female', 'oosr_lower_secondary_age_male',
       'oosr_lower_secondary_age_female', 'oosr_upper_secondary_age_male',
       'oosr_upper_secondary_age_female', 'completion_rate_primary_male',
       'completion_rate_primary_female',
       'completion_rate_lower_secondary_male',
       'completion_rate_lower_secondary_female',
       'completion_rate_upper_secondary_male',
       'completion_rate_upper_secondary_female',
       'grade_2_3_proficiency_reading', 'grade_2_3_proficiency_math',
       'primary_end_proficiency_reading', 'primary_end_proficiency_math',
       'lower_secondary_end_proficiency_reading',
       'lower_secondary_end_proficiency_math',
       'youth_15_24_literacy_rate_male', 'youth_15_24_literacy_rate_female',
       'birth_rate', 'gross_primary_education_enrollment',
       'gross_tertiary_education_
```
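The column listing above shows `'latitude '` with a trailing space, so `world_education_data['latitude']` would raise a `KeyError`. One way to guard against stray whitespace is to list the names that differ from their stripped form and then strip every column name in one vectorized call. A small sketch on a toy frame that reproduces the trailing space:

```python
import pandas as pd

# Empty toy frame whose column names mimic the listing above.
toy = pd.DataFrame(columns=["country_name", "latitude ", "longitude"])

# Names whose stripped form differs contain leading or trailing whitespace.
padded = [col for col in toy.columns if col != col.strip()]
print("columns with stray whitespace:", padded)

# Strip whitespace from every column name at once.
toy.columns = toy.columns.str.strip()
print(list(toy.columns))
```

Applied to the real data, this would be `world_education_data.columns = world_education_data.columns.str.strip()`.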

```python
world_education_data.isna().sum()
```

```
country_name                               0
latitude                                   0
longitude                                  0
oosr_pre0primary_age_male                  0
oosr_pre0primary_age_female                0
oosr_primary_age_male                      0
oosr_primary_age_female                    0
oosr_lower_secondary_age_male              0
oosr_lower_secondary_age_female            0
oosr_upper_secondary_age_male              0
oosr_upper_secondary_age_female            0
completion_rate_primary_male               0
completion_rate_primary_female             0
completion_rate_lower_secondary_male       0
completion_rate_lower_secondary_female     0
completion_rate_upper_secondary_male       0
completion_rate_upper_secondary_female     0
grade_2_3_proficiency_reading              0
grade_2_3_proficiency_math                 0
primary_end_proficiency_reading            0
primary_end_proficiency_math               0
lower_secondary_end_proficiency_reading    0
lower_seco
```
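`isna()` reports no missing values, but that does not mean the data is complete. The preview shows Andorra and Angola with youth literacy rates of 0, and Andorra with a tertiary enrollment and unemployment rate of 0.0, which look like placeholders for unavailable data rather than measured values. Whether every 0 is a placeholder is an assumption that has to be judged column by column; the sketch below converts 0 to `NaN` only in an explicitly chosen list of columns. The stand-in rows are copied from the preview, and the column list is illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in rows copied from the preview (lower-cased column names).
sample = pd.DataFrame(
    {
        "country_name": ["Albania", "Andorra", "Angola"],
        "youth_15_24_literacy_rate_male": [99, 0, 0],
        "youth_15_24_literacy_rate_female": [100, 0, 0],
        "birth_rate": [11.78, 7.20, 40.73],
    }
)

# Columns where 0 is treated as "not reported" (an assumption, not a documented rule).
zero_means_missing = [
    "youth_15_24_literacy_rate_male",
    "youth_15_24_literacy_rate_female",
]

# Share of zeros per column before cleaning.
print((sample[zero_means_missing] == 0).mean())

# Replace 0 with NaN only in the chosen columns; other columns are untouched.
cleaned = sample.copy()
cleaned[zero_means_missing] = cleaned[zero_means_missing].replace(0, np.nan)
print(cleaned.isna().sum())
```

Keeping the list explicit matters because a 0 can be a genuine value in some columns (an out-of-school rate of 0, for instance), so a blanket replacement across the whole frame would discard real data.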

```python
world_education_data.describe()
```

| | latitude | longitude | oosr_pre0primary_age_male | oosr_pre0primary_age_female | oosr_primary_age_male | oosr_primary_age_female | oosr_lower_secondary_age_male | oosr_lower_secondary_age_female | oosr_upper_secondary_age_male | oosr_upper_secondary_age_female | ... | primary_end_proficiency_reading | primary_end_proficiency_math | lower_secondary_end_proficiency_reading | lower_secondary_end_proficiency_math | youth_15_24_literacy_rate_male | youth_15_24_literacy_rate_female | birth_rate | gross_primary_education_enrollment | gross_tertiary_education_enrollment | unemployment_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | ... | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 | 202.0 |
| mean | 25.081422 | 55.166928 | 19.658416 | 19.282178 | 5.282178 | 5.569307 | 8.707921 | 8.831683 | 20.292079 | 19.975248 | ... | 10.717822 | 10.376238 | 25.787129 | 24.450495 | 35.80198 | 35.084158 | 18.91401 | 94.942574 | 34.392574 | 6.0 |
| std | 16.813639 | 45.976287 | 25.007604 | 25.171147 | 9.396442 | 10.383092 | 13.258203 | 14.724717 | 21.485592 | 23.140376 | ... | 24.866101 | 22.484423 | 33.181384 | 31.965467 | 45.535186 | 45.249643 | 10.828184 | 29.769338 | 29.978206 | 5.273136 |
| min | 0.023559 | 0.824782 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 11.685062 | 18.665678 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.25 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.355 | 97.2 | 9.0 | 2.3025 |
| 50% | 21.207861 | 43.518091 | 9.0 | 7.0 | 1.0 | 1.0 | 2.0 | 2.0 | 15.0 | 12.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.55 | 101.85 | 24.85 | 4.585 |
| 75% | 39.901792 | 77.684945 | 31.0 | 30.0 | 6.0 | 6.75 | 12.75 | 10.75 | 32.75 | 30.0 | ... | 0.0 | 0.0 | 56.75 | 50.75 | 94.0 | 96.75 | 27.6925 | 107.3 | 59.975 | 8.655 |
| max | 64.963051 | 178.065032 | 96.0 | 96.0 | 58.0 | 67.0 | 61.0 | 70.0 | 84.0 | 89.0 | ... | 99.0 | 89.0 | 89.0 | 94.0 | 100.0 | 100.0 | 46.08 | 142.5 | 136.6 | 28.18 |
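The summary statistics double as a quick sanity check on value ranges. Most rate columns should fall between 0 and 100, but gross enrollment ratios can legitimately exceed 100 because they count all enrolled students, regardless of age, against the population of official school age; that explains the maxima of 142.5 and 136.6 here. The sketch below flags out-of-range columns in a `describe()`-style frame while exempting names that start with `gross_`. That exemption rule is an assumption based on this dataset's naming, and the function name `out_of_range` is hypothetical.

```python
import pandas as pd


def out_of_range(stats: pd.DataFrame, low: float = 0.0, high: float = 100.0) -> list:
    """List columns whose min or max in a describe()-style frame falls outside [low, high].

    Columns starting with "gross_" are skipped, because gross enrollment
    ratios can exceed 100 by construction.
    """
    flagged = []
    for col in stats.columns:
        if col.startswith("gross_"):
            continue
        if stats.loc["min", col] < low or stats.loc["max", col] > high:
            flagged.append(col)
    return flagged


# min/max values copied from the summary above for a few columns.
stats = pd.DataFrame(
    {
        "youth_15_24_literacy_rate_male": [0.0, 100.0],
        "gross_primary_education_enrollment": [0.0, 142.5],
        "gross_tertiary_education_enrollment": [0.0, 136.6],
        "unemployment_rate": [0.0, 28.18],
    },
    index=["min", "max"],
)

print(out_of_range(stats))
```

On the full data this could be called as `out_of_range(world_education_data.describe().iloc[:, 2:])`, assuming latitude and longitude remain the first two numeric columns as in the output above, since coordinates are not rates.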


```python
world_education_data[world_education_data['country_name'] == "Pakistan"]
```

| | country_name | latitude | longitude | oosr_pre0primary_age_male | oosr_pre0primary_age_female | oosr_primary_age_male | oosr_primary_age_female | oosr_lower_secondary_age_male | oosr_lower_secondary_age_female | oosr_upper_secondary_age_male | ... | primary_end_proficiency_reading | primary_end_proficiency_math | lower_secondary_end_proficiency_reading | lower_secondary_end_proficiency_math | youth_15_24_literacy_rate_male | youth_15_24_literacy_rate_female | birth_rate | gross_primary_education_enrollment | gross_tertiary_education_enrollment | unemployment_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 135 | Pakistan | 30.375321 | 69.345116 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 28.25 | 94.3 | 9.0 | 4.45 |
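Because the row is wide, the display elides the middle columns with `...`. Setting `country_name` as the index and selecting one country with `.loc` returns a Series instead, which pandas prints vertically so every column is visible. A sketch on a toy frame holding a few of Albania's and Pakistan's values from the outputs above:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "country_name": ["Albania", "Pakistan"],
        "oosr_pre0primary_age_female": [2, 14],
        "birth_rate": [11.78, 28.25],
        "gross_primary_education_enrollment": [107.0, 94.3],
    }
)

# Index by country once, then look rows up by label.
by_country = toy.set_index("country_name")

# Selecting a single label returns a Series with one entry per column.
pakistan = by_country.loc["Pakistan"]
print(pakistan)
```

On the real data this would be `world_education_data.set_index('country_name').loc['Pakistan']`, assuming each country name appears only once; a duplicated name would return a DataFrame instead of a Series.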
