<a href="https://colab.research.google.com/github/Ishani-Patel/data690_fall2022/blob/main/data690_world_dev/Individual_Project_Part_B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Individual Project Part B - Ishani Mayur Patel

# Better Education !=  Better Employment

#### What is it that you are investigating/exploring/analyzing (provide sufficient background information)?
A higher level of education does not always lead to a better job. Education proceeds in stages: primary comes first, followed by secondary and then tertiary. Many children who start primary school drop out before they reach secondary school. Education is one of the foundations of a country's growth. In this study, I analyze education statistics from several nations.

The following are the names of the nations:
- India
- China
- New Zealand
- Australia
- Bangladesh
- United States

[UN List of least developed countries](https://unctad.org/topic/least-developed-countries/list)

#### Why is it important to you and/or to others?
As a citizen of a developing country that faces these issues, I believe it is crucial to identify the main reasons students abandon their education partway through. The obstacles differ by gender. If the causes can be identified and addressed, the whole system could improve. Better education could lead to more employment, and more jobs could in turn lead to more development.

#### What questions do you have in mind and would like to answer?
Some of the questions that I have are as follows:
- What is the male-to-female ratio in primary and secondary education?
- How many primary students continue on to secondary school?
- What is the employment ratio for each gender?
- Is there a link between education and employment?
- Is there a link between employment and a country's development?
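
The first question can be made concrete with a small calculation. The sketch below is only an illustration: it uses four rows copied from the 2010 sample printed later in this notebook, and the column name `female_to_male_ratio` is a hypothetical helper, not part of the dataset.

```python
import pandas as pd

# Rows copied from the 2010 sample output shown later in this notebook
sample = pd.DataFrame({
    "Country Name": ["Aruba", "Angola", "Albania", "United Arab Emirates"],
    "SE.PRM.CMPL.FE.ZS": [92.45283, 29.60625, 87.725, 84.58097],
    "SE.PRM.CMPL.MA.ZS": [96.56539, 39.33934, 87.6419, 79.45609],
})

# Hypothetical helper column: female rate divided by male rate.
# A value below 1 means the female rate is lower than the male rate.
sample["female_to_male_ratio"] = (
    sample["SE.PRM.CMPL.FE.ZS"] / sample["SE.PRM.CMPL.MA.ZS"]
)

print(sample[["Country Name", "female_to_male_ratio"]].round(3))
```

A ratio close to 1 suggests parity between the two rates; the same division could be applied to the full dataset once it is loaded.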

#### Where do you get the data and charts to help answer your questions (give references/credits)?  
Data can be obtained from the official websites of individual countries. Where a website offers no download option, a web scraper can be used to gather the data. The World Development Explorer is another source. Sites such as Kaggle, Google Datasets, and Amazon Datasets also provide historical data.


#### What process/steps do you use to analyze the situation/issue?
Project steps include:
- Gather information from a variety of sources.
- Analyze the facts in light of the provided hypothesis.
- Test whether the hypothesis holds.
- Compare statistics between genders.
- Compare statistics from different nations.
- Determine the connection between the data's characteristics (Primary Education, Secondary Education, Employment Ratio, Development Related Data)
- Determine how each characteristic impacts the others.
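
One possible way to examine the connection between characteristics is a pairwise correlation matrix. The sketch below assumes Pearson correlation is an acceptable measure and uses only three rows copied from the 2010 sample printed later in this notebook, so the numbers illustrate the mechanics rather than any real finding.

```python
import pandas as pd

# Rows copied from the 2010 sample output (countries with all three values present)
sample = pd.DataFrame({
    "Country Name": ["Afghanistan", "Angola", "Albania"],
    "SE.PRM.ENRR": [100.071709, 105.781036, 93.490471],
    "SE.SEC.ENRR": [50.567249, 26.25922, 88.103889],
    "SL.UEM.TOTL.ZS": [11.352, 9.43, 14.09],
})

# Pairwise Pearson correlation between the numeric indicators
corr = sample[["SE.PRM.ENRR", "SE.SEC.ENRR", "SL.UEM.TOTL.ZS"]].corr()

print(corr.round(2))
```

Run on the cleaned full dataset instead of this tiny sample, the same call would give a first indication of which indicators move together.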

#### Import necessary Libraries

In [1]:
import pandas as pd
import plotly.express as px

### Importing the data

In [2]:
# Importing the data
final_df = pd.read_csv("https://raw.githubusercontent.com/Ishani-Patel/data690_fall2022/main/data690_world_dev/data/final%20data%20wide.csv")

In [3]:
final_df.columns

Index(['Unnamed: 0', 'Year', 'Country Code', 'Country Name', 'Region',
       'Income Group', 'Lending Type', 'SE.PRM.CMPL.FE.ZS',
       'SE.PRM.CMPL.MA.ZS', 'SE.PRM.CMPL.ZS', 'SE.PRM.ENRR', 'SE.SEC.ENRR',
       'SE.TER.ENRR', 'SE.TOT.ENRR', 'SL.UEM.TOTL.ZS'],
      dtype='object')

### Indicator Description:

1. 'Year'
2. 'Country Code'
3. 'Country Name'
4. 'Region'
5. 'Income Group'
6. 'Lending Type'
7. 'SE.PRM.CMPL.FE.ZS' : Gross graduation ratio, primary, female (%)
8. 'SE.PRM.CMPL.MA.ZS' : Gross graduation ratio, primary, male (%)
9. 'SE.PRM.CMPL.ZS' : Gross graduation ratio, primary, total (%)
10. 'SE.PRM.ENRR' : School enrollment, primary (% gross)
11. 'SE.SEC.ENRR' : School enrollment, secondary (% gross)
12. 'SE.TER.ENRR' : School enrollment, tertiary (% gross)
13. 'SE.TOT.ENRR' : Gross Enrollment Ratio, primary to tertiary, both sexes (%)
14. 'SL.UEM.TOTL.ZS' : Unemployment, total (% of total labor force) (modeled ILO estimate) 
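
Indicator codes are hard to read in chart legends and axis labels. A minimal sketch of one option is to keep a lookup from code to a short label and rename the columns before plotting. The short labels below are hypothetical names chosen for illustration; the values are Albania's 2010 figures from the sample printed later in this notebook.

```python
import pandas as pd

# Hypothetical short labels for some of the indicator codes listed above
INDICATOR_LABELS = {
    "SE.PRM.ENRR": "primary_enrollment",
    "SE.SEC.ENRR": "secondary_enrollment",
    "SE.TER.ENRR": "tertiary_enrollment",
    "SE.TOT.ENRR": "total_enrollment",
    "SL.UEM.TOTL.ZS": "unemployment",
}

# One-row demo frame (Albania, 2010)
demo = pd.DataFrame({"SE.PRM.ENRR": [93.490471], "SL.UEM.TOTL.ZS": [14.09]})

# rename() only touches columns that appear in the mapping
renamed = demo.rename(columns=INDICATOR_LABELS)

print(list(renamed.columns))
```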

In [4]:
final_df.head()

| | Unnamed: 0 | Year | Country Code | Country Name | Region | Income Group | Lending Type | SE.PRM.CMPL.FE.ZS | SE.PRM.CMPL.MA.ZS | SE.PRM.CMPL.ZS | SE.PRM.ENRR | SE.SEC.ENRR | SE.TER.ENRR | SE.TOT.ENRR | SL.UEM.TOTL.ZS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2010 | ABW | Aruba | Latin America & Caribbean | High income | Not classified | 92.45283 | 96.56539 | 94.52969 | 113.794296 | 95.82354 | 37.34573 | 86.5538 | NaN |
| 1 | 1 | 2010 | AFG | Afghanistan | South Asia | Low income | IDA | NaN | NaN | NaN | 100.071709 | 50.567249 | NaN | NaN | 11.352 |
| 2 | 2 | 2010 | AGO | Angola | Sub-Saharan Africa | Lower middle income | IBRD | 29.60625 | 39.33934 | 34.43853 | 105.781036 | 26.25922 | NaN | 54.82558 | 9.43 |
| 3 | 3 | 2010 | ALB | Albania | Europe & Central Asia | Upper middle income | IBRD | 87.725 | 87.6419 | 87.68161 | 93.490471 | 88.103889 | 44.549252 | 76.49879 | 14.09 |
| 4 | 4 | 2010 | ARE | United Arab Emirates | Middle East & North Africa | High income | Not classified | 84.58097 | 79.45609 | 81.87095 | NaN | NaN | NaN | NaN | 2.481 |


In [5]:
final_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2199 entries, 0 to 2198
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Unnamed: 0         2199 non-null   int64  
 1   Year               2199 non-null   int64  
 2   Country Code       2199 non-null   object 
 3   Country Name       2199 non-null   object 
 4   Region             2199 non-null   object 
 5   Income Group       2199 non-null   object 
 6   Lending Type       2199 non-null   object 
 7   SE.PRM.CMPL.FE.ZS  851 non-null    float64
 8   SE.PRM.CMPL.MA.ZS  851 non-null    float64
 9   SE.PRM.CMPL.ZS     872 non-null    float64
 10  SE.PRM.ENRR        1620 non-null   float64
 11  SE.SEC.ENRR        1382 non-null   float64
 12  SE.TER.ENRR        1389 non-null   float64
 13  SE.TOT.ENRR        962 non-null    float64
 14  SL.UEM.TOTL.ZS     2057 non-null   float64
dtypes: float64(8), int64(2), object(5)
memory usage: 257.8+ KB


### Cleaning Data

In [6]:
final_df.drop(columns=['Unnamed: 0'], inplace = True)

In [7]:
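numeric_cols = final_df.select_dtypes('float64').columns  # indicator columns only; text columns have no mean
final_df1 = final_df.groupby(['Year', 'Region'])[numeric_cols].transform(lambda x: x.fillna(x.mean()))  # fill nulls with the mean of the same year and region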



In [8]:
final_df.update(final_df1)
final_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2199 entries, 0 to 2198
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Year               2199 non-null   int64  
 1   Country Code       2199 non-null   object 
 2   Country Name       2199 non-null   object 
 3   Region             2199 non-null   object 
 4   Income Group       2199 non-null   object 
 5   Lending Type       2199 non-null   object 
 6   SE.PRM.CMPL.FE.ZS  1896 non-null   float64
 7   SE.PRM.CMPL.MA.ZS  1896 non-null   float64
 8   SE.PRM.CMPL.ZS     1896 non-null   float64
 9   SE.PRM.ENRR        2197 non-null   float64
 10  SE.SEC.ENRR        2197 non-null   float64
 11  SE.TER.ENRR        2197 non-null   float64
 12  SE.TOT.ENRR        1996 non-null   float64
 13  SL.UEM.TOTL.ZS     2199 non-null   float64
dtypes: float64(8), int64(1), object(5)
memory usage: 240.6+ KB
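
Some columns still contain nulls after the group-mean fill. A likely explanation, shown here on made-up toy data rather than the project file, is that a (Year, Region) group with no observed value at all has no mean to fill with, so its rows stay empty.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Year": [2010, 2010, 2010, 2010],
    "Region": ["South Asia", "South Asia", "South Asia", "Europe"],
    "value": [10.0, None, 30.0, None],
})

# Same pattern as the cleaning step: fill each gap with its group's mean
filled = toy.groupby(["Year", "Region"])["value"].transform(
    lambda x: x.fillna(x.mean())
)

# The South Asia gap becomes 20.0; the Europe row has no group mean and stays NaN
print(filled.tolist())
```

Rows like the last one are the ones a later `dropna` call removes.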


In [9]:
final_df.dropna(inplace = True)

In [10]:
final_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1896 entries, 0 to 1997
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Year               1896 non-null   int64  
 1   Country Code       1896 non-null   object 
 2   Country Name       1896 non-null   object 
 3   Region             1896 non-null   object 
 4   Income Group       1896 non-null   object 
 5   Lending Type       1896 non-null   object 
 6   SE.PRM.CMPL.FE.ZS  1896 non-null   float64
 7   SE.PRM.CMPL.MA.ZS  1896 non-null   float64
 8   SE.PRM.CMPL.ZS     1896 non-null   float64
 9   SE.PRM.ENRR        1896 non-null   float64
 10  SE.SEC.ENRR        1896 non-null   float64
 11  SE.TER.ENRR        1896 non-null   float64
 12  SE.TOT.ENRR        1896 non-null   float64
 13  SL.UEM.TOTL.ZS     1896 non-null   float64
dtypes: float64(8), int64(1), object(5)
memory usage: 222.2+ KB


- Creating New DataFrame for the selected Countries

In [11]:
df_countries = final_df[final_df['Country Name'].isin(['Australia','India','New Zealand','United States','Afghanistan','Angola','Benin','Bangladesh','Bhutan','China'])]
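
Because `isin` silently ignores names that do not match any row, a quick check of the requested names against the data can catch spelling mistakes. The sketch below uses a toy frame standing in for `final_df`, and the misspelled name is deliberate and hypothetical.

```python
import pandas as pd

# Toy frame standing in for final_df (illustrative values only)
toy = pd.DataFrame({"Country Name": ["India", "China", "Bangladesh"]})

# The last name is deliberately misspelled for the demonstration
wanted = ["India", "China", "Bangaldesh"]

# Names that were requested but do not appear in the data
missing = sorted(set(wanted) - set(toy["Country Name"]))

print(missing)  # ['Bangaldesh']
```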

### Data Visualization 

In [12]:
# School Enrollment Primary (Time-Series Analysis)

fig = px.line(
    df_countries,
    x="Year",
    y="SE.PRM.ENRR",
    title="School Enrollment Primary over time", 
    color="Country Name",
    template="plotly_dark"
)

fig.update_layout(showlegend=True)

fig.show()

In [13]:
# School Enrollment Secondary (Time-Series Analysis)

fig = px.line(
    df_countries,
    x="Year",
    y="SE.SEC.ENRR",
    title="School Enrollment Secondary over time", 
    color="Country Name",
    template="plotly_dark"
)

fig.update_layout(showlegend=True)

fig.show()

In [14]:
# School Enrollment Tertiary (Time-Series Analysis)

fig = px.line(
    df_countries,
    x="Year",
    y="SE.TER.ENRR",
    title="School Enrollment Tertiary over time", 
    color="Country Name",
    template="plotly_dark",
)

fig.update_layout(showlegend=True)

fig.show()

In [15]:
# Gross Enrollment Ratio from Primary to Tertiary both Sexes (Time-Series Analysis)

fig = px.line(
    df_countries,
    x="Year",
    y="SE.TOT.ENRR",
    title="Gross Enrollment Ratio from Primary to Tertiary both Sexes over time", 
    color="Country Name",
    template="plotly_dark",
)

fig.update_layout(showlegend=True)

fig.show()

In [16]:
# Gross Graduation Ratio for Primary both Sexes (Time-Series Analysis)

fig = px.line(
    df_countries,
    x="Year",
    y="SE.PRM.CMPL.ZS",
    title="Gross Graduation Ratio for Primary both Sexes over time", 
    color="Country Name",
    template="plotly_dark",
)

fig.update_layout(showlegend=True)

fig.show()

In [17]:
# Gross Graduation Ratio for Primary Male (Time-Series Analysis)

fig = px.line(
    df_countries,
    x="Year",
    y="SE.PRM.CMPL.MA.ZS",
    title="Gross Graduation Ratio for Primary Male over time", 
    color="Country Name",
    template="plotly_dark",
)

fig.update_layout(showlegend=True)

fig.show()

In [18]:
# Gross Graduation Ratio for Primary Female (Time-Series Analysis)

fig = px.line(
    df_countries,
    x="Year",
    y="SE.PRM.CMPL.FE.ZS",
    title="Gross Graduation Ratio for Primary Female over time", 
    color="Country Name",
    template="plotly_dark",
)

fig.update_layout(showlegend=True)

fig.show()
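
The line charts above differ only in the `y` column. One alternative, offered as a sketch rather than as this notebook's method, is to reshape the data to long form so that every indicator sits in a single column. The two rows below are copied from the 2010 sample output; the names `Indicator` and `Value` are hypothetical.

```python
import pandas as pd

# Two rows copied from the 2010 sample output earlier in this notebook
wide = pd.DataFrame({
    "Year": [2010, 2010],
    "Country Name": ["Angola", "Albania"],
    "SE.PRM.ENRR": [105.781036, 93.490471],
    "SE.SEC.ENRR": [26.25922, 88.103889],
})

# One row per (year, country, indicator) instead of one column per indicator
long = wide.melt(
    id_vars=["Year", "Country Name"],
    value_vars=["SE.PRM.ENRR", "SE.SEC.ENRR"],
    var_name="Indicator",
    value_name="Value",
)

print(long)
```

A frame in this shape could then be passed to `px.line` with `y="Value"` and `facet_col="Indicator"` to draw all indicators from a single cell.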

In [19]:
# final_df2=final_df1[final_df1['Country Name'].isin(['Australia','India','New Zealand','United States','Benin','China'])]

In [20]:
# 2016 Gross Enrollment Ratio Primary to Tertiary (Bar-Chart Analysis)

fig = px.bar(
    df_countries.query('Year == 2016'), 
    x="Country Name", 
    y="SE.TOT.ENRR", 
    title="2016 Gross Enrollment Ratio Primary to Tertiary (Bar-Chart Analysis)", 
    color="Country Name", 
    template="plotly_dark"
)

fig.update_layout(showlegend=False)

fig.show()

In [21]:
# Unemployment vs. Gross Enrollment Ratio, all countries, 2010 (Scatter Plot Analysis)

fig = px.scatter(
    final_df.query('Year == 2010'),
    x='SL.UEM.TOTL.ZS',
    y='SE.TOT.ENRR', 
    title="Education VS Unemployment", 
    color="Country Name",
    template="plotly_dark" 
)
fig.update_layout(showlegend=True)

fig.show()

## Thank You