# <u>**ACADEMY**</u> : Educational Systems Data Analysis

## Author
> Mohamed Ali EL HAMECH 

- @MasterCodeDevelop (Github)
- E-mail : master.code.develop@gmail.com
- Affiliation : ACADEMY #OpenClassRooms
<p><img align="left" src="https://github-readme-stats.vercel.app/api/top-langs?username=mastercodedevelop&show_icons=true&locale=en&layout=compact" alt="mastercodedevelop" /></p>

## Introduction
> In the rapidly evolving landscape of online education, ACADEMY, a growing EdTech start-up, offers high-quality online training content for high school and university students. As part of its strategic vision, ACADEMY is exploring international expansion, targeting markets with a robust educational framework and a potential client base for its services.

_This analysis has two objectives. First, it examines global educational data from the World Bank to assess the potential of various countries as prospective markets for ACADEMY. The dataset, produced by the World Bank's "EdStats All Indicator Query", contains over 4,000 international indicators covering access to education, graduation rates, educators, and educational expenditures._

_Second, it aims to give ACADEMY's decision-makers a clear, data-driven narrative for planning the company's international expansion. By evaluating the quality of the dataset, understanding its breadth and depth, and extracting relevant insights, we hope to offer a roadmap that aligns with ACADEMY's mission and vision._

## Import Necessary Libraries
> Before you can work with the data, you need to import the necessary libraries.

In [38]:
import pandas as pd
import numpy as np

## Load the Data
> Importing the dataset into the environment for analysis and exploration.

In [39]:
data = pd.read_csv('./data/EdStatsData.csv')
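
A quick sanity check right after loading can catch structural quirks in the file. The sketch below uses a tiny in-memory CSV with made-up values (not the real file) to show how pandas handles rows that end in a trailing comma: it creates an extra, entirely empty column named `Unnamed: N`. Whether this is what produces the `Unnamed: 69` column seen in this dataset is an assumption and has not been checked against the raw file.

```python
import io

import pandas as pd

# Hypothetical miniature CSV; every line ends with a trailing comma.
sample_csv = io.StringIO(
    "Country Name,Country Code,1970,1971,\n"
    "Arab World,ARB,54.8,54.9,\n"
    "Zimbabwe,ZWE,,,\n"
)

sample = pd.read_csv(sample_csv)

# pandas names the header-less fifth field "Unnamed: 4" and fills it with NaN.
print(sample.columns.tolist())

# Dropping columns that contain no data at all removes the artifact.
cleaned = sample.dropna(axis=1, how="all")
print(cleaned.columns.tolist())
```

Using `dropna(axis=1, how="all")` only removes columns with no values at all, so sparsely populated year columns are kept.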

## Initial Data Exploration
> Diving into the dataset to understand its structure, content, and characteristics.

### Preview of the First and Last Rows of the Dataset
> A look at the start and end of the dataset, giving a quick glimpse of its structure.

In [40]:
print("First 5 rows of the dataset:")
data.head()

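First 5 rows of the dataset:

| | Country Name | Country Code | Indicator Name | Indicator Code | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | ... | 2060 | 2065 | 2070 | 2075 | 2080 | 2085 | 2090 | 2095 | 2100 | Unnamed: 69 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Arab World | ARB | Adjusted net enrolment rate, lower secondary, ... | UIS.NERA.2 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Arab World | ARB | Adjusted net enrolment rate, lower secondary, ... | UIS.NERA.2.F | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Arab World | ARB | Adjusted net enrolment rate, lower secondary, ... | UIS.NERA.2.GPI | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Arab World | ARB | Adjusted net enrolment rate, lower secondary, ... | UIS.NERA.2.M | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Arab World | ARB | Adjusted net enrolment rate, primary, both sex... | SE.PRM.TENR | 54.822121 | 54.894138 | 56.209438 | 57.267109 | 57.991138 | 59.36554 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |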

In [41]:
print("Last 5 rows of the dataset:")
print(data.tail())

Last 5 rows of the dataset:
       Country Name Country Code  \
886925     Zimbabwe          ZWE   
886926     Zimbabwe          ZWE   
886927     Zimbabwe          ZWE   
886928     Zimbabwe          ZWE   
886929     Zimbabwe          ZWE   

                                           Indicator Name  \
886925  Youth illiterate population, 15-24 years, male...   
886926  Youth literacy rate, population 15-24 years, b...   
886927  Youth literacy rate, population 15-24 years, f...   
886928  Youth literacy rate, population 15-24 years, g...   
886929  Youth literacy rate, population 15-24 years, m...   

              Indicator Code  1970  1971  1972  1973  1974  1975  ...  2060  \
886925      UIS.LP.AG15T24.M   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   
886926     SE.ADT.1524.LT.ZS   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   
886927  SE.ADT.1524.LT.FE.ZS   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   
886928  SE.ADT.1524.LT.FM.ZS   NaN   NaN   NaN   NaN   NaN   NaN  ...   
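
The previews suggest a wide layout: one row per country and indicator, four identifier columns, and one column per year. A minimal sketch of separating the two groups follows, using a made-up two-row frame rather than the real data. The names `toy`, `id_cols`, and `year_cols` are illustrative and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# Made-up frame mimicking the layout of the previews above.
toy = pd.DataFrame({
    "Country Name": ["Arab World", "Zimbabwe"],
    "Country Code": ["ARB", "ZWE"],
    "Indicator Name": ["Indicator A", "Indicator B"],
    "Indicator Code": ["IND.A", "IND.B"],
    "1970": [54.8, np.nan],
    "2100": [np.nan, np.nan],
})

# Year columns are the ones whose name consists of digits only.
year_cols = [col for col in toy.columns if col.isdigit()]
id_cols = [col for col in toy.columns if col not in year_cols]

print("Shape (rows, columns):", toy.shape)
print("Identifier columns:", id_cols)
print("Year columns:", year_cols)
```

On the real data, a column such as `Unnamed: 69` would fall into `id_cols` under this rule, so it may need separate handling.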

### General information about the DataFrame
> This includes each column's data type, the number of non-null values, etc.

In [42]:
# info() prints the summary itself and returns None, so no print() is needed
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 886930 entries, 0 to 886929
Data columns (total 70 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   Country Name    886930 non-null  object 
 1   Country Code    886930 non-null  object 
 2   Indicator Name  886930 non-null  object 
 3   Indicator Code  886930 non-null  object 
 4   1970            72288 non-null   float64
 5   1971            35537 non-null   float64
 6   1972            35619 non-null   float64
 7   1973            35545 non-null   float64
 8   1974            35730 non-null   float64
 9   1975            87306 non-null   float64
 10  1976            37483 non-null   float64
 11  1977            37574 non-null   float64
 12  1978            37576 non-null   float64
 13  1979            36809 non-null   float64
 14  1980            89122 non-null   float64
 15  1981            38777 non-null   float64
 16  1982            37511 non-null   float64
 17  1983      
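
`DataFrame.info()` writes its report directly to standard output and returns `None`. When the report is needed as text, for example to save it or search it, it can be redirected through the `buf` parameter. The sketch below shows this on a made-up two-row frame; `toy`, `buffer`, and `report` are illustrative names.

```python
import io

import pandas as pd

# Made-up frame with one missing value in the "1970" column.
toy = pd.DataFrame({"Country Code": ["ARB", "ZWE"], "1970": [54.8, None]})

buffer = io.StringIO()
result = toy.info(buf=buffer)  # the report goes to the buffer, not the screen
report = buffer.getvalue()

print(result)  # None: info() has no return value
print(report)
```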

### Missing Values & Percentages
> Overview of missing data per column, presented as counts and percentages.

In [43]:
# Count the missing values in each column
missing_values = data.isnull().sum()

# Calculate the percentage of missing values in each column
missing_percentages = data.isnull().mean() * 100

# Combine the counts and percentages into one DataFrame
missing_data = pd.DataFrame({'Missing Values': missing_values, 'Missing Percentages': missing_percentages})

# Keep only columns with at least one missing value, then sort by percentage in descending order
missing_data = missing_data[missing_data['Missing Values'] != 0].sort_values(by='Missing Percentages', ascending=False)

print(missing_data)

             Missing Values  Missing Percentages
Unnamed: 69          886930           100.000000
2017                 886787            99.983877
2016                 870470            98.144160
1971                 851393            95.993258
1973                 851385            95.992356
...                     ...                  ...
2011                 740918            83.537370
2012                 739666            83.396209
2000                 710254            80.080051
2005                 702822            79.242105
2010                 644488            72.665036

[66 rows x 2 columns]
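
The same computation can be wrapped in a small helper and checked by hand on a frame whose missing rates are obvious. This is a sketch on made-up data; the function name `summarize_missing` and the frame `toy` are hypothetical and not part of the original notebook.

```python
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    summary = pd.DataFrame({
        "Missing Values": df.isnull().sum(),
        "Missing Percentages": df.isnull().mean() * 100,
    })
    # Keep only columns with at least one missing value.
    summary = summary[summary["Missing Values"] != 0]
    return summary.sort_values(by="Missing Percentages", ascending=False)


# Made-up frame: "a" is 75% empty, "b" is 25% empty, "c" is complete.
toy = pd.DataFrame({
    "a": [1.0, None, None, None],
    "b": [1.0, 2.0, None, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

toy_summary = summarize_missing(toy)
print(toy_summary)
```

The complete column `c` is filtered out, which mirrors the real output: 66 of the 70 columns are listed because the four identifier columns have no missing values.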
