<a href="https://colab.research.google.com/github/AkiraNom/data-analysis-notebook/blob/main/GenderGapIndex_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
import pandas as pd
import plotly.express as px
import glob

# Project - Gender Gap Index vs Fertility rate
<b>Analysis concept / Objective</b>: <br>
 <p> [NHK](https://www3.nhk.or.jp/news/special/news_seminar/jiji/jiji133/) reported that the gender gap index and the fertility rate are positively correlated.</p>
 <img src='https://www3.nhk.or.jp/news/special/news_seminar/assets/images/post/2023/03/20230330-syoushika3-03-2-716x565.jpg' alt='NHK_fig' title="GenderGapIndex_fig">

 <p>However, the graph appears to be selectively cropped to support the claim that Japan's low gender gap index score goes hand in hand with its low fertility rate.</p>


<p>
<b>Data source</b>: <br>

&nbsp;&nbsp;&nbsp; * Gender Gap Index : [World Economic Forum, Global Gender Gap Index 2023](https://www.weforum.org/publications/global-gender-gap-report-2023/)

&nbsp;&nbsp;&nbsp; * Fertility rate 2023:  [The World Factbook 2023](https://www.cia.gov/the-world-factbook/field/total-fertility-rate/country-comparison/) <br>

&nbsp;&nbsp;&nbsp; * Fertility rate time-series : [Fertility rate in OECD countries](https://data.oecd.org/pop/fertility-rates.htm) <br>

Data accessed: March 3, 2024

</p>

In [49]:
file_paths = glob.glob('GenderGapIndex*.csv')

In [50]:
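# start from the country/region table and left-join each sub-index file onto it
df = pd.read_csv('CountryGroup.csv')
for path in file_paths:
    _df = pd.read_csv(path)

    # the second-to-last underscore-separated token of the path names the index
    file_name = path.split('_')[-2]
    print(file_name)

    # suffix Rank/Score with the index name so the columns stay distinct after merging
    _df = _df.rename(columns={'Rank': f'Rank_{file_name}',
                              'Score': f'Score_{file_name}'})

    df = pd.merge(df, _df, how='left', on='Country')

df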

GenderGapIndex
EducationAttainment
EconomicParticipation
PoliticalEmpowerment
HealthSurvival


Unnamed: 0,Country,Region,Rank_GenderGapIndex,Score_GenderGapIndex,Rank_EducationAttainment,Score_EducationAttainment,Rank_EconomicParticipation,Score_EconomicParticipation,Rank_PoliticalEmpowerment,Score_PoliticalEmpowerment,Rank_HealthSurvival,Score_HealthSurvival
0,Afghanistan,Southern Asia,146,0.405,146.0,0.482,146.0,0.188,146.0,0.000,141.0,0.952
1,Albania,Europe,17,0.791,33.0,0.999,18.0,0.786,28.0,0.419,133.0,0.960
2,Algeria,Middle East and North Africa,144,0.573,116.0,0.951,145.0,0.317,135.0,0.065,137.0,0.958
3,Angola,Sub-Saaharan Africa,118,0.656,142.0,0.738,107.0,0.605,46.0,0.305,44.0,0.976
4,Argentina,Latin America and the Carribean,36,0.762,1.0,1.000,95.0,0.644,26.0,0.429,41.0,0.977
...,...,...,...,...,...,...,...,...,...,...,...,...
141,Uruguay,Latin America and the Carribean,67,0.714,1.0,1.000,47.0,0.726,94.0,0.152,1.0,0.980
142,Vanuatu,East Asia and the Pacific,108,0.678,74.0,0.991,35.0,0.742,145.0,0.006,65.0,0.971
143,Viet Nam,East Asia and the Pacific,72,0.711,89.0,0.985,31.0,0.749,89.0,0.166,144.0,0.946
144,Zambia,Sub-Saaharan Africa,85,0.699,101.0,0.979,40.0,0.734,119.0,0.102,1.0,0.980


In [20]:
df.to_csv('GenderGapIndex_2023_merged.csv')

In [21]:
df_fertility = pd.read_csv('./data/Total_fertility_rate_2023.csv')
df_fertility.head()

Unnamed: 0,name,slug,children born/woman,date_of_information,ranking,region
0,Niger,niger,6.73,2023,1,Africa
1,Angola,angola,5.76,2023,2,Africa
2,"Congo, Democratic Republic of the",congo-democratic-republic-of-the,5.56,2023,3,Africa
3,Mali,mali,5.45,2023,4,Africa
4,Benin,benin,5.39,2023,5,Africa


In [51]:
# list countries in the gender gap data that have no matching name in the fertility data
fertility_names = set(df_fertility['name'])
missing_countries = [c for c in sorted(df['Country']) if c not in fertility_names]

missing_countries

['Brunei Darussalam', 'Myanmar']
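
The check above only looks in one direction: names in the gender gap data that are absent from the fertility data. As a minimal sketch, an outer merge with `indicator=True` reports unmatched names on both sides at once. The two small frames below are made-up stand-ins for `df` and `df_fertility`; only the column names `Country` and `name` come from this notebook, and the spellings in the toy fertility frame are illustrative.

```python
import pandas as pd

# Toy stand-ins for df and df_fertility (values are illustrative only)
gender_gap = pd.DataFrame({'Country': ['Japan', 'Myanmar', 'Brunei Darussalam']})
fertility = pd.DataFrame({'name': ['Japan', 'Burma', 'Brunei']})

# An outer merge keeps every row from both sides; the _merge column records
# whether a row was found in the left frame, the right frame, or both
check = pd.merge(gender_gap, fertility, how='outer',
                 left_on='Country', right_on='name', indicator=True)

only_in_gender_gap = check.loc[check['_merge'] == 'left_only', 'Country'].tolist()
only_in_fertility = check.loc[check['_merge'] == 'right_only', 'name'].tolist()

print(sorted(only_in_gender_gap))
print(sorted(only_in_fertility))
```

Seeing both lists side by side makes it easier to spot pairs that refer to the same country under different spellings, which could then be renamed before merging instead of dropped.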

In [52]:
# merge the two datasets, keeping every country in the gender gap data
df = pd.merge(df, df_fertility, how='left', left_on='Country',right_on='name')

In [53]:
df.isna().sum()

Country                        0
Region                         0
Rank_GenderGapIndex            0
Score_GenderGapIndex           0
Rank_EducationAttainment       1
Score_EducationAttainment      1
Rank_EconomicParticipation     3
Score_EconomicParticipation    3
Rank_PoliticalEmpowerment      5
Score_PoliticalEmpowerment     5
Rank_HealthSurvival            3
Score_HealthSurvival           3
name                           2
slug                           2
children born/woman            2
date_of_information            2
ranking                        2
region                         2
dtype: int64

In [54]:
# drop rows with NaN in the 'children born/woman' column,
# i.e. Brunei Darussalam and Myanmar, which had no match in the fertility data
df.dropna(subset=['children born/woman'], inplace=True)

In [55]:
df.isna().sum()

Country                        0
Region                         0
Rank_GenderGapIndex            0
Score_GenderGapIndex           0
Rank_EducationAttainment       1
Score_EducationAttainment      1
Rank_EconomicParticipation     3
Score_EconomicParticipation    3
Rank_PoliticalEmpowerment      5
Score_PoliticalEmpowerment     5
Rank_HealthSurvival            3
Score_HealthSurvival           3
name                           0
slug                           0
children born/woman            0
date_of_information            0
ranking                        0
region                         0
dtype: int64

In [56]:
# add a boolean column flagging G7 members
G7_list = ['Canada','France','Germany','Italy','Japan','United Kingdom','United States of America']
df['G7'] = df['Country'].isin(G7_list)
display(df.groupby(['G7']).count().iloc[:,0])

# add a boolean column flagging developed countries
# (here: every country in the Europe and North America regions plus five others)
europe = df[df['Region']=='Europe']['Country'].tolist()
north_america = df[df['Region']=='North America']['Country'].tolist()
developed_countries_list = europe + north_america + ['Japan','Korea','New Zealand','Australia','Israel']
df['Developed countries'] = df['Country'].isin(developed_countries_list)
display(df.groupby(['Developed countries']).count().iloc[:,0])


G7
False    137
True       7
Name: Country, dtype: int64

Developed countries
False    101
True      43
Name: Country, dtype: int64

In [57]:
df.head()

Unnamed: 0,Country,Region,Rank_GenderGapIndex,Score_GenderGapIndex,Rank_EducationAttainment,Score_EducationAttainment,Rank_EconomicParticipation,Score_EconomicParticipation,Rank_PoliticalEmpowerment,Score_PoliticalEmpowerment,Rank_HealthSurvival,Score_HealthSurvival,name,slug,children born/woman,date_of_information,ranking,region,G7,Developed countries
0,Afghanistan,Southern Asia,146,0.405,146.0,0.482,146.0,0.188,146.0,0.0,141.0,0.952,Afghanistan,afghanistan,4.53,2023.0,16.0,South Asia,False,False
1,Albania,Europe,17,0.791,33.0,0.999,18.0,0.786,28.0,0.419,133.0,0.96,Albania,albania,1.55,2023.0,193.0,Europe,False,True
2,Algeria,Middle East and North Africa,144,0.573,116.0,0.951,145.0,0.317,135.0,0.065,137.0,0.958,Algeria,algeria,2.97,2023.0,49.0,Africa,False,False
3,Angola,Sub-Saaharan Africa,118,0.656,142.0,0.738,107.0,0.605,46.0,0.305,44.0,0.976,Angola,angola,5.76,2023.0,2.0,Africa,False,False
4,Argentina,Latin America and the Carribean,36,0.762,1.0,1.0,95.0,0.644,26.0,0.429,41.0,0.977,Argentina,argentina,2.17,2023.0,91.0,South America,False,False


In [125]:
X = 'Score_GenderGapIndex'
Y = 'children born/woman'
fig = px.scatter(data_frame=df,
                 x=X,
                 y=Y,
                 hover_name='Country',
                 color='G7'
                 )
fig.update_layout(title = '<b>Gender Gap Index vs Fertility Rate between G7 and non-G7 Countries</b>',
                  xaxis = dict(title='<b>Gender gap index</b>'),
                  yaxis = dict(title='<b>Fertility rate (children/woman)</b>'),
                  font = dict(size=14))

fig.update_xaxes(range=[0, 1])
# fig.update_yaxes(range=[1,2.5])
fig.update_traces(marker=dict(size=11))

## add footer
source_note = f'<b>Source: <a href="https://www.cia.gov/the-world-factbook/field/total-fertility-rate/country-comparison/">The world factbook 2023</a> ,<a href="https://www.weforum.org/publications/global-gender-gap-report-2023/">Global Gender Gap Report 2023 (WEF)</a></b>'

fig.add_annotation(
        showarrow=False,
        text=source_note,
        font=dict(color='black',size=13),
        xref='x domain',
        x=0.0,
        yref='y domain',
        y=-0.2
        )
# label Japan
country = 'Japan'
fig.add_annotation(
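    # take the single matching value explicitly; float() on a Series is deprecated in recent pandas
    x=float(df.loc[df['Country']==country, X].iloc[0]),
    y=float(df.loc[df['Country']==country, Y].iloc[0]),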
    text=country,
    showarrow=True,
    xanchor="left",
    ax=-100,
    ay=-75,
    font=dict(
    color="black",
    size=13
    ),
    arrowcolor="black",
    arrowsize=1,
    arrowwidth=2,
    arrowhead=1
)
fig.show()

In [124]:
X = 'Score_GenderGapIndex'
Y = 'children born/woman'
hover_name = 'Country'
color = 'Developed countries'

fig = px.scatter(data_frame=df,
                 x=X,
                 y=Y,
                 hover_name=hover_name,
                 color=color
                 )
fig.update_layout(title = '<b>Gender Gap Index vs Fertility Rate between Developed and Developing Countries</b>',
                  xaxis = dict(title='<b>Gender gap index</b>'),
                  yaxis = dict(title='<b>Fertility rate (children/woman)</b>'),
                  font = dict(size=14))

fig.update_xaxes(range=[0, 1])
fig.update_traces(marker=dict(size=11))

source_note = f'<b>Source: <a href="https://www.cia.gov/the-world-factbook/field/total-fertility-rate/country-comparison/">The world factbook 2023</a> ,<a href="https://www.weforum.org/publications/global-gender-gap-report-2023/">Global Gender Gap Report 2023 (WEF)</a></b>'

fig.add_annotation(
        showarrow=False,
        text=source_note,
        font=dict(color='black',size=13),
        xref='x domain',
        x=0.0,
        yref='y domain',
        y=-0.2
        )

# label Japan
country = 'Japan'
fig.add_annotation(
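    # take the single matching value explicitly; float() on a Series is deprecated in recent pandas
    x=float(df.loc[df['Country']==country, X].iloc[0]),
    y=float(df.loc[df['Country']==country, Y].iloc[0]),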
    text=country,
    showarrow=True,
    xanchor="left",
    ax=-100,
    ay=-75,
    font=dict(
    color="black",
    size=13
    ),
    arrowcolor="black",
    arrowsize=1,
    arrowwidth=2,
    arrowhead=1
)

fig.show()

## Correlation coefficient

In [136]:
X = 'Score_GenderGapIndex'
Y = 'children born/woman'

# Pearson correlation within the G7 subset
df_temp = df[df['G7']]
correlation_coef_g7 = df_temp[X].corr(df_temp[Y])
# Pearson correlation within the developed-countries subset
df_temp = df[df['Developed countries']]
correlation_coef_developed_countries = df_temp[X].corr(df_temp[Y])
print(f'Correlation coefficient between Gender Gap Index and Fertility rate \n \
        G7 countries: {correlation_coef_g7:.2f} \n \
        Developed countries: {correlation_coef_developed_countries:.2f}')


Correlation coefficient between Gender Gap Index and Fertility rate 
         G7 countries: 0.49 
         Developed countries: 0.21
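
Because the G7 subset contains only seven data points, it may help to report the sample size next to each coefficient. The sketch below assumes a frame shaped like `df`, with the two numeric columns and a boolean flag column. The helper name `subset_correlation` and the toy numbers are hypothetical and not part of the original analysis. `Series.corr` computes the Pearson coefficient by default.

```python
import pandas as pd

def subset_correlation(frame, flag, x, y):
    """Return (n, Pearson r) for the rows where the boolean column `flag` is True."""
    subset = frame.loc[frame[flag], [x, y]].dropna()
    return len(subset), subset[x].corr(subset[y])

# Toy data (illustrative values only, not the real index or fertility figures)
toy = pd.DataFrame({
    'Score_GenderGapIndex': [0.60, 0.70, 0.80, 0.90, 0.65],
    'children born/woman': [1.2, 1.4, 1.6, 1.8, 3.0],
    'G7': [True, True, True, True, False],
})

n, r = subset_correlation(toy, 'G7', 'Score_GenderGapIndex', 'children born/woman')
print(f'n = {n}, r = {r:.2f}')
```

With very small `n`, a single country can move the coefficient substantially, so the value is best read alongside the scatter plot rather than on its own.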


## Transition of fertility rate in OECD countries

In [87]:
# transition of fertility rate in OECD countries
df_transition = pd.read_csv('Fertility_rate_transition.csv')
df_transition.head()

Unnamed: 0,Country,1970,1971,1972,1973,1974,1975,1976,1977,1978,...,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Argentina,3.08,3.11,3.15,3.2,3.25,3.3,3.34,3.36,3.36,...,2.33,2.32,2.31,2.3,2.24,2.17,2.04,1.99,1.91,1.89
1,Australia,2.86,2.95,2.74,2.49,2.32,2.15,2.06,2.01,1.95,...,1.93,1.88,1.79,1.79,1.79,1.74,1.74,1.67,1.59,1.7
2,Austria,2.29,2.2,2.08,1.94,1.91,1.83,1.69,1.63,1.6,...,1.44,1.44,1.46,1.49,1.53,1.52,1.48,1.46,1.44,1.48
3,Belgium,2.25,2.21,2.09,1.95,1.83,1.74,1.73,1.71,1.69,...,1.8,1.76,1.74,1.7,1.68,1.65,1.62,1.6,1.55,1.6
4,Brazil,4.97,4.84,4.71,4.6,4.5,4.42,4.34,4.27,4.2,...,1.77,1.75,1.77,1.78,1.71,1.74,1.75,1.7,1.65,1.64


In [95]:
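# reshape from wide (one column per year) to long (one row per country and year)
df_transition = (df_transition.set_index(keys='Country')
                              .stack()
                              .reset_index()
                              .rename(columns={'level_1':'Year',0:'Fertility rate'})
                              # year labels come from the CSV header as strings;
                              # convert them so comparisons such as Year == 2010 work
                              .astype({'Year': int}))
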

In [98]:
df_transition.head()

Unnamed: 0,Country,Year,Fertility rate
0,Argentina,1970,3.08
1,Argentina,1971,3.11
2,Argentina,1972,3.15
3,Argentina,1973,3.2
4,Argentina,1974,3.25
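
The same wide-to-long reshape can also be written with `DataFrame.melt`, which names the new columns directly. This is a sketch on a made-up two-country frame, not the OECD file. Year labels that come from CSV headers are strings, so the example converts them to integers before any numeric comparison.

```python
import pandas as pd

# Toy wide table: one row per country, one column per year (illustrative values)
wide = pd.DataFrame({
    'Country': ['A', 'B'],
    '2009': [1.5, 2.0],
    '2010': [1.4, 1.9],
})

# melt produces one row per (Country, Year) pair
long_df = wide.melt(id_vars='Country', var_name='Year', value_name='Fertility rate')

# Column labels were strings, so convert Year before comparing with integers
long_df['Year'] = long_df['Year'].astype(int)

print(long_df[long_df['Year'] == 2010])
```

Unlike `stack`, `melt` keeps rows whose value is missing, so a `dropna` on the value column would be needed to reproduce the `stack` result exactly.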


In [110]:
# flag G7 rows; names must match the OECD spelling exactly to be counted
df_transition['G7'] = df_transition['Country'].isin(G7_list)
df_transition.groupby('G7').count()

Unnamed: 0_level_0,Country,Year,Fertility rate
G7,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
False,2469,2469,2469
True,312,312,312
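
The count of 312 rows for `G7 == True` is worth a second look. If each matched country contributes the 52 annual values from 1970 to 2021, 312 rows correspond to six countries rather than seven, which would suggest that one name in `G7_list` has no exact match in the OECD table. This is only an inference, since `stack` also drops missing values. A minimal sketch for listing unmatched names, using a made-up frame in place of `df_transition` (the spelling 'United States' below is an assumption for illustration, not taken from the OECD file):

```python
import pandas as pd

G7_list = ['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States of America']

# Toy stand-in for df_transition; the country spellings are hypothetical
toy_transition = pd.DataFrame({
    'Country': ['Canada', 'France', 'Germany', 'Italy', 'Japan',
                'United Kingdom', 'United States', 'OECD - Average'],
})

# Names requested in G7_list that never appear in the Country column
available = set(toy_transition['Country'])
unmatched = [name for name in G7_list if name not in available]
print(unmatched)
```

If a name does turn up as unmatched, renaming it in one of the two sources before calling `isin` would bring the missing country into the G7 lines of the chart.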


In [123]:
_df = df_transition[(df_transition['G7']==True)|
                    (df_transition['Country']=='European Union')|
                    (df_transition['Country']=='OECD - Average')
                    ]
X='Year'
Y='Fertility rate'
color='Country'

fig = px.line(_df,
              x=X,
              y=Y,
              color=color
              )
fig.update_layout(title='<b>Fertility rate in OECD countries</b>')

# add japan label
country = 'Japan'
fig.add_annotation(
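    # take the single matching value explicitly; int()/float() on a Series is deprecated in recent pandas
    x=int(df_transition.loc[(df_transition['Country']==country)&
                            (df_transition['Year']==2010), X].iloc[0]),
    y=float(df_transition.loc[(df_transition['Country']==country)&
                              (df_transition['Year']==2010), Y].iloc[0]),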
    text=country,
    showarrow=True,
    xanchor="left",
    ax=-100,
    ay=-75,
    font=dict(
    color="black",
    size=13
    ),
    arrowcolor="black",
    arrowsize=1,
    arrowwidth=2,
    arrowhead=1
)

country = 'OECD - Average'
fig.add_annotation(
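    # take the single matching value explicitly; int()/float() on a Series is deprecated in recent pandas
    x=int(df_transition.loc[(df_transition['Country']==country)&
                            (df_transition['Year']==2010), X].iloc[0]),
    y=float(df_transition.loc[(df_transition['Country']==country)&
                              (df_transition['Year']==2010), Y].iloc[0]),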
    text=country,
    showarrow=True,
    xanchor="left",
    ax=-100,
    ay=-75,
    font=dict(
    color="black",
    size=13
    ),
    arrowcolor="black",
    arrowsize=1,
    arrowwidth=2,
    arrowhead=1
)

source_note = f'<b>Source: <a href="https://data.oecd.org/pop/fertility-rates.htm">OECD data</a></b>'

fig.add_annotation(
        showarrow=False,
        text=source_note,
        font=dict(color='black',size=13),
        xref='x domain',
        x=0.0,
        yref='y domain',
        y=-0.2
        )

fig.show()

# Conclusion

<p>Fertility rates have generally declined across the G7 and other developed countries. According to the OECD, assuming no net migration and unchanged mortality, a total fertility rate of 2.1 children per woman is needed to keep a country's population stable. All of the developed countries examined here except Israel have a fertility rate below 2.1, regardless of their gender gap index. The downward trend among developed countries has been especially visible since 2010. Furthermore, the correlation coefficients between the gender gap index and the fertility rate are below 0.5 (0.49 for the G7 countries and 0.21 for the developed countries), which indicates a weak correlation. The NHK claim that the fertility rate and the gender gap index are positively correlated is therefore misleading.</p>
