<a href="https://colab.research.google.com/github/akukudala/world_development_explorer_final/blob/main/wdx_final/wdx_analysis_partB_final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* Name: Akshitha Kukudala
* Date: 4/08/2022

# Analyzing education statistics for the female population (age 15-19) over the years 2010 to 2015
## Indicators to consider
* No Education
* Completed Primary Education
* Completed Secondary Education

The primary aim of this project is to examine female primary and secondary education in the East Asia & Pacific region.
Many governments publish statistics on enrollment and completion that show how well their education systems are working and improving.
Monitoring progress toward national and global education targets helps allocate resources more efficiently and make quality learning opportunities accessible to all.
Better-quality education leads to an empowered citizenry and a more productive labor force. We also compare the region with the other regions of the world, using data from World Development Explorer.
Financial background is often cited as a major reason why many women and girls are left without an education, so we also group the countries of the region by income group to see how education levels relate to income.

- **Data Source:** World Development Explorer ([worlddev.xyz](https://))
- **Regions Analyzed:** East Asia & Pacific
- **Years considered:** 2010-2015

In [1]:
import pandas as pd
import plotly.express as px


In [2]:

csv_file = "https://raw.githubusercontent.com/akukudala/world_development_explorer/main/wdi_data_3_indicators.csv"  

In [3]:
df = pd.read_csv(csv_file, index_col=0)

df.sample(5)

Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
190,2010,3.4,BAR.NOED.1519.FE.ZS,DEU,Germany,Europe & Central Asia,High income,Not classified
321,2010,11.39,BAR.SEC.CMPT.1519.FE.ZS,CZE,Czech Republic,Europe & Central Asia,High income,Not classified
304,2010,19.74,BAR.SEC.CMPT.1519.FE.ZS,BRN,Brunei Darussalam,East Asia & Pacific,High income,Not classified
49,2010,34.93,BAR.PRM.CMPT.1519.FE.ZS,GTM,Guatemala,Latin America & Caribbean,Upper middle income,IBRD
203,2010,18.53,BAR.NOED.1519.FE.ZS,IRQ,Iraq,Middle East & North Africa,Upper middle income,IBRD
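
Before filtering, it can help to confirm that the loaded table has the columns and indicator codes the rest of the notebook relies on. The sketch below is one possible check and is not part of the original analysis. It re-parses the five sample rows shown above from an in-memory string, so it runs without network access. The names `EXPECTED_COLUMNS`, `INDICATOR_LABELS` and `sample_df` are hypothetical.

```python
import io

import pandas as pd

# The five sample rows displayed above, written back out as CSV text.
csv_text = """,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
190,2010,3.4,BAR.NOED.1519.FE.ZS,DEU,Germany,Europe & Central Asia,High income,Not classified
321,2010,11.39,BAR.SEC.CMPT.1519.FE.ZS,CZE,Czech Republic,Europe & Central Asia,High income,Not classified
304,2010,19.74,BAR.SEC.CMPT.1519.FE.ZS,BRN,Brunei Darussalam,East Asia & Pacific,High income,Not classified
49,2010,34.93,BAR.PRM.CMPT.1519.FE.ZS,GTM,Guatemala,Latin America & Caribbean,Upper middle income,IBRD
203,2010,18.53,BAR.NOED.1519.FE.ZS,IRQ,Iraq,Middle East & North Africa,Upper middle income,IBRD
"""

# Columns used by the later cells (hypothetical constant name).
EXPECTED_COLUMNS = ["Year", "value", "indicator", "Country Name", "Region", "Income Group"]

# The three indicator codes used in this notebook, with short labels.
INDICATOR_LABELS = {
    "BAR.NOED.1519.FE.ZS": "No education",
    "BAR.PRM.CMPT.1519.FE.ZS": "Completed primary",
    "BAR.SEC.CMPT.1519.FE.ZS": "Completed secondary",
}

sample_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Columns the analysis needs but the table lacks (should be empty).
missing_columns = [c for c in EXPECTED_COLUMNS if c not in sample_df.columns]

# Indicator codes in the data that the notebook does not handle (should be empty).
unknown_indicators = set(sample_df["indicator"]) - set(INDICATOR_LABELS)

# The values are percentages, so they should lie between 0 and 100.
values_in_range = bool(sample_df["value"].between(0, 100).all())

print(missing_columns, unknown_indicators, values_in_range)
```

The same three checks could be applied to the full `df` right after `pd.read_csv(csv_file, index_col=0)`.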


The analysis aims to answer the following questions.

* Which countries lead in no education, completed primary education and completed secondary education, respectively?
* How does the financial situation affect female education?
* How does East Asia & Pacific compare with other regions?
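
The first question amounts to ranking countries by `value` within one indicator. The sketch below shows one way to do that with `DataFrame.nlargest`. It is only an illustration: the data are a handful of East Asia & Pacific rows that appear in this notebook's sample outputs, not the full data set, and the helper name `top_countries` is hypothetical.

```python
import pandas as pd

# A few East Asia & Pacific rows taken from the sample outputs in this notebook.
rows = [
    ("BAR.NOED.1519.FE.ZS", "Singapore", 2.02),
    ("BAR.NOED.1519.FE.ZS", "Papua New Guinea", 40.67),
    ("BAR.NOED.1519.FE.ZS", "Fiji", 0.32),
    ("BAR.NOED.1519.FE.ZS", "Korea, Rep.", 0.05),
    ("BAR.NOED.1519.FE.ZS", "Australia", 0.1),
    ("BAR.SEC.CMPT.1519.FE.ZS", "Indonesia", 16.73),
    ("BAR.SEC.CMPT.1519.FE.ZS", "Macao SAR, China", 34.39),
    ("BAR.SEC.CMPT.1519.FE.ZS", "Vietnam", 11.31),
    ("BAR.SEC.CMPT.1519.FE.ZS", "China", 45.25),
    ("BAR.SEC.CMPT.1519.FE.ZS", "Papua New Guinea", 5.6),
]
sample = pd.DataFrame(rows, columns=["indicator", "Country Name", "value"])


def top_countries(frame, indicator, n=3):
    """Return the n countries with the highest value for one indicator."""
    subset = frame[frame["indicator"] == indicator]
    return subset.nlargest(n, "value")[["Country Name", "value"]].reset_index(drop=True)


top_noed = top_countries(sample, "BAR.NOED.1519.FE.ZS")
top_sec = top_countries(sample, "BAR.SEC.CMPT.1519.FE.ZS")
print(top_noed)
print(top_sec)
```

Because only a few rows are included here, the ranking reflects this small sample and not the whole region.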

In [4]:
df_noed = df.query("indicator == 'BAR.NOED.1519.FE.ZS'").query("Region == 'East Asia & Pacific'")
df_noed = df_noed.sort_values(by= "value", ascending= False)
df_noed.sample(5)



Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
259,2010,2.02,BAR.NOED.1519.FE.ZS,SGP,Singapore,East Asia & Pacific,High income,Not classified
245,2010,40.67,BAR.NOED.1519.FE.ZS,PNG,Papua New Guinea,East Asia & Pacific,Lower middle income,Blend
185,2010,0.32,BAR.NOED.1519.FE.ZS,FJI,Fiji,East Asia & Pacific,Upper middle income,Blend
212,2010,0.05,BAR.NOED.1519.FE.ZS,KOR,"Korea, Rep.",East Asia & Pacific,High income,Not classified
149,2010,0.1,BAR.NOED.1519.FE.ZS,AUS,Australia,East Asia & Pacific,High income,Not classified


In [5]:
import plotly

fig = px.bar(
    data_frame= df_noed,
    x= "Country Name",
    labels={"value":"2010 Barro-Lee: Percentage of Female population age 15-19 with no education"},
    y= "value",
    color= "Country Name",
    height= 700,
    template=list(plotly.io.templates.keys())[5],
    title= " Percentage of Female population age 15-19 with no education in East Asia & Pacific "
)
fig = fig.update_layout(showlegend= False)


## Female population (age 15-19) over the years 2010 to 2015 with No Education
First, let us look at the percentage of the female population with no education in each country.

In [6]:
fig.show()

In [7]:
df_noed.shape

(21, 8)

In [8]:

df_pcmpt = df.query("indicator == 'BAR.PRM.CMPT.1519.FE.ZS'").query("Region == 'East Asia & Pacific'")
df_pcmpt = df_pcmpt.sort_values(by= "value", ascending= False)
df_pcmpt.sample(5)

Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
19,2010,39.93,BAR.PRM.CMPT.1519.FE.ZS,KHM,Cambodia,East Asia & Pacific,Lower middle income,IDA
71,2010,25.56,BAR.PRM.CMPT.1519.FE.ZS,LAO,Lao PDR,East Asia & Pacific,Lower middle income,IDA
95,2010,0.06,BAR.PRM.CMPT.1519.FE.ZS,NZL,New Zealand,East Asia & Pacific,High income,Not classified
64,2010,1.89,BAR.PRM.CMPT.1519.FE.ZS,JPN,Japan,East Asia & Pacific,High income,Not classified
115,2010,1.55,BAR.PRM.CMPT.1519.FE.ZS,SGP,Singapore,East Asia & Pacific,High income,Not classified


In [9]:
fig = px.bar(
    data_frame= df_pcmpt,
    x= "Country Name",
    labels={"value":"2010 Barro-Lee: Percentage of Female population age 15-19 with primary schooling, Completed Primary"},
    y= "value",
    color= "Country Name",
    height= 700,
    template=list(plotly.io.templates.keys())[5],
    title= " Percentage of Female population age 15-19 with Primary education in East Asia & Pacific "
)

fig = fig.update_layout(showlegend= False)

## Female population (age 15-19) over the years 2010 to 2015 with Primary Education Completion
Percentage of the female population in each country that has completed primary education.

In [10]:
fig.show()

In [11]:
df_pcmpt.shape

(21, 8)

In [12]:
df_scmpt = df.query("indicator == 'BAR.SEC.CMPT.1519.FE.ZS'").query("Region == 'East Asia & Pacific'")
df_scmpt = df_scmpt.sort_values(by= "value", ascending= False)
df_scmpt.sample(5)

Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
345,2010,16.73,BAR.SEC.CMPT.1519.FE.ZS,IDN,Indonesia,East Asia & Pacific,Lower middle income,IBRD
366,2010,34.39,BAR.SEC.CMPT.1519.FE.ZS,MAC,"Macao SAR, China",East Asia & Pacific,High income,Not classified
428,2010,11.31,BAR.SEC.CMPT.1519.FE.ZS,VNM,Vietnam,East Asia & Pacific,Lower middle income,IBRD
312,2010,45.25,BAR.SEC.CMPT.1519.FE.ZS,CHN,China,East Asia & Pacific,Upper middle income,IBRD
389,2010,5.6,BAR.SEC.CMPT.1519.FE.ZS,PNG,Papua New Guinea,East Asia & Pacific,Lower middle income,Blend


In [13]:
fig = px.bar(
    data_frame= df_scmpt,
    x= "Country Name",
    labels={"value":"2010 Barro-Lee: Percentage of Female population age 15-19 with secondary schooling, Completed Secondary"},
    y= "value",
    color= "Country Name",
    height= 700,
    template=list(plotly.io.templates.keys())[5],
    title= " Percentage of Female population age 15-19 with Secondary education in East Asia & Pacific",
    
)

fig = fig.update_layout(showlegend= False)

## Female population (age 15-19) over the years 2010 to 2015 with Secondary Education Completion

Percentage of the female population in each country that has completed secondary education.

In [14]:
fig.show()

The three charts above show which countries lead in no education, primary completion and secondary completion. Next, we group the countries by the three income groups present in the region and compare the mean percentages, to see how financial status relates to female education.
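
The per-income-group means for several indicators can also be computed in one step with `DataFrame.pivot_table`, as a compact alternative to repeating a `groupby` per indicator. The sketch below is an illustration under stated assumptions: it uses only a few East Asia & Pacific rows shown in this notebook's sample outputs, and the names `sample` and `means_by_income` are hypothetical.

```python
import pandas as pd

# A few East Asia & Pacific rows taken from the sample outputs in this notebook.
rows = [
    ("BAR.NOED.1519.FE.ZS", "Singapore", "High income", 2.02),
    ("BAR.NOED.1519.FE.ZS", "Papua New Guinea", "Lower middle income", 40.67),
    ("BAR.NOED.1519.FE.ZS", "Fiji", "Upper middle income", 0.32),
    ("BAR.NOED.1519.FE.ZS", "Korea, Rep.", "High income", 0.05),
    ("BAR.NOED.1519.FE.ZS", "Australia", "High income", 0.1),
    ("BAR.SEC.CMPT.1519.FE.ZS", "Indonesia", "Lower middle income", 16.73),
    ("BAR.SEC.CMPT.1519.FE.ZS", "Macao SAR, China", "High income", 34.39),
    ("BAR.SEC.CMPT.1519.FE.ZS", "Vietnam", "Lower middle income", 11.31),
    ("BAR.SEC.CMPT.1519.FE.ZS", "China", "Upper middle income", 45.25),
    ("BAR.SEC.CMPT.1519.FE.ZS", "Papua New Guinea", "Lower middle income", 5.6),
]
sample = pd.DataFrame(rows, columns=["indicator", "Country Name", "Income Group", "value"])

# One row per income group, one column per indicator, each cell a mean percentage.
means_by_income = sample.pivot_table(
    index="Income Group",
    columns="indicator",
    values="value",
    aggfunc="mean",
)
print(means_by_income.round(2))
```

Applied to the full `df` filtered to East Asia & Pacific, the same call would put all three indicators side by side in a single table.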

In [15]:
df_noed2 = df_noed.groupby(['Income Group']).agg(
    Hours_Mean = ('value', 'mean'))

df_noed2

Income Group,Hours_Mean
High income,3.5025
Lower middle income,16.99625
Upper middle income,1.042


In [16]:
dfnoed3 = df_noed2.reset_index()

In [17]:
dfnoed3

Unnamed: 0,Income Group,Hours_Mean
0,High income,3.5025
1,Lower middle income,16.99625
2,Upper middle income,1.042


In [18]:
fig = px.pie(
    dfnoed3, values='Hours_Mean', names="Income Group",
  template=list(plotly.io.templates.keys())[5]
)
# Show the label and percentage inside each slice
fig.update_traces(textposition='inside', textinfo='label+percent', hoverinfo="label+name")



indicator_dict = dict(
  font=dict(color="White",size=14),
  x=0.5,
  y=0,
  opacity=0.5,
  showarrow=False,
  text="Percentage of Female population age 15-19 with No education in East Asia & Pacific grouped by Income",
  textangle=0,
  xanchor='center',
  yanchor="top",
  xref="paper",
  yref="paper"
)

fig = fig.add_annotation(indicator_dict) 

## No Education grouping by Incomes
Lower middle income countries have by far the highest mean percentage of females with no education (about 17%). More surprisingly, high income countries come second (about 3.5%), ahead of upper middle income countries (about 1%).
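
One way to probe a surprising group mean is to look at the median and the group size next to it, since a single country with a high value can pull the mean up. The sketch below illustrates the idea on the five no-education rows shown in this notebook's sample output. It does not reproduce the full groups, and the names `noed_sample` and `summary` are hypothetical.

```python
import pandas as pd

# The five no-education rows shown in the sample output of cell 4.
noed_sample = pd.DataFrame(
    {
        "Country Name": ["Singapore", "Papua New Guinea", "Fiji", "Korea, Rep.", "Australia"],
        "Income Group": [
            "High income",
            "Lower middle income",
            "Upper middle income",
            "High income",
            "High income",
        ],
        "value": [2.02, 40.67, 0.32, 0.05, 0.1],
    }
)

# Mean, median and number of countries per income group.
summary = noed_sample.groupby("Income Group")["value"].agg(["mean", "median", "count"])
print(summary.round(3))
```

In this small sample the high income mean sits well above the median because of one country, which is the kind of pattern worth checking for in the full data.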

In [19]:
fig.show()

In [20]:
df_pcmpt2 = df_pcmpt.groupby(['Income Group']).agg(
    Hours_Mean = ('value', 'mean'))

df_pcmpt2

Income Group,Hours_Mean
High income,6.38375
Lower middle income,27.67375
Upper middle income,5.946


In [21]:
df_pcmpt3=df_pcmpt2.reset_index()

In [22]:
fig = px.pie(
    df_pcmpt3, values='Hours_Mean', names="Income Group",
  template=list(plotly.io.templates.keys())[5]
)
# Show the label and percentage inside each slice
fig.update_traces(textposition='inside', textinfo='label+percent', hoverinfo="label+name")


indicator_dict = dict(
  font=dict(color="White",size=14),
  x=0.5,
  y=0,
  opacity=0.5,
  showarrow=False,
  text="Percentage of Female population age 15-19 with Primary education in East Asia & Pacific grouped by Income",
  textangle=0,
  xanchor='center',
  yanchor="top",
  xref="paper",
  yref="paper"
)

fig = fig.add_annotation(indicator_dict) 

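## Primary Education grouping by Incomes
Lower middle income countries also have the highest mean percentage for completed primary education (about 27.7%, compared with about 6.4% for high income and 5.9% for upper middle income). Despite lower incomes, these countries appear to give importance to primary education for the female population.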

In [23]:
fig.show()

In [24]:
df_scmpt2 = df_scmpt.groupby(['Income Group']).agg(
    Hours_Mean = ('value', 'mean'))

df_scmpt2

Income Group,Hours_Mean
High income,53.225
Lower middle income,15.38875
Upper middle income,52.342


In [25]:
df_scmpt3=df_scmpt2.reset_index()

## Secondary Education grouping by Incomes

This chart suggests that completion of secondary education depends strongly on financial status. High income and upper middle income countries have the highest mean percentages of the female population that has completed secondary education (about 53% and 52%), compared with about 15% for lower middle income countries.

In [26]:
fig = px.pie(
    df_scmpt3, values='Hours_Mean', names="Income Group",
  template=list(plotly.io.templates.keys())[5]
)
# Show the label and percentage inside each slice
fig.update_traces(textposition='inside', textinfo='label+percent', hoverinfo="label+name")


indicator_dict = dict(
  font=dict(color="White",size=14),
  x=0.5,
  y=0,
  opacity=0.5,
  showarrow=False,
  text="Percentage of Female population age 15-19 with Secondary education in East Asia & Pacific grouped by Income",
  textangle=0,
  xanchor='center',
  yanchor="top",
  xref="paper",
  yref="paper"
)

fig = fig.add_annotation(indicator_dict) 

In [27]:
fig.show()

In [28]:
df_region2 = df.groupby(['Region']).agg(
    Hours_Mean = ('value', 'mean'))

df_region2

Region,Hours_Mean
East Asia & Pacific,20.349365
Europe & Central Asia,11.913333
Latin America & Caribbean,14.284933
Middle East & North Africa,16.325294
North America,13.353333
South Asia,20.247619
Sub-Saharan Africa,20.32625


In [29]:
df_region3 = df_region2.reset_index()

In [30]:
fig = px.pie(
    df_region3, values='Hours_Mean', names="Region",
  template=list(plotly.io.templates.keys())[5]
)
# Show the label and percentage inside each slice
fig.update_traces(textposition='inside', textinfo='label+percent', hoverinfo="label+name")


indicator_dict = dict(
  font=dict(color="White",size=14),
  x=0.5,
  y=0,
  opacity=0.5,
  showarrow=False,
  text="Comparison between the regions in percentages of female population (age 15-19) education stats",
  textangle=0,
  xanchor='center',
  yanchor="top",
  xref="paper",
  yref="paper"
)

fig = fig.add_annotation(indicator_dict) 

This chart compares the regions on the mean percentage of the female population (age 15-19) across the three education indicators over the years 2010 to 2015. Because the mean pools the no-education, primary and secondary indicators, it does not isolate secondary completion. East Asia & Pacific has the highest regional mean (about 20.35), closely followed by Sub-Saharan Africa (about 20.33) and South Asia (about 20.25); together these three regions make up just over half (about 52%) of the pie. North America (about 13.35) and Europe & Central Asia (about 11.91) have the lowest means.

In [31]:
fig.show()
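
A pie chart scales its inputs so that the slices sum to 100%, so each slice in the chart above is a region's mean divided by the sum of the seven regional means. The sketch below reproduces that arithmetic from the table printed by cell 28, as a way to check any statement about combined shares. The names `region_means`, `shares` and `top_three` are hypothetical.

```python
import pandas as pd

# Regional means copied from the output of cell 28.
region_means = pd.Series(
    {
        "East Asia & Pacific": 20.349365,
        "Europe & Central Asia": 11.913333,
        "Latin America & Caribbean": 14.284933,
        "Middle East & North Africa": 16.325294,
        "North America": 13.353333,
        "South Asia": 20.247619,
        "Sub-Saharan Africa": 20.32625,
    }
)

# Each slice of the pie is the region's mean divided by the sum of all means.
shares = region_means / region_means.sum()

# The three largest slices and their combined share of the pie.
top_three = shares.nlargest(3)
combined_top_three = top_three.sum()

print(shares.round(3))
print(top_three.index.tolist(), round(combined_top_three, 3))
```

These shares describe slices of a chart built from pooled means, so they should not be read as shares of the world's female population.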