# Gender inequality across OECD countries
In this assignment, we conduct a descriptive analysis of the development in the gender gap across a selection of OECD countries. In addition, we investigate potential causes of differences in the gender gap by illustrating differences in net child care costs, access to formal child care and length of paternity leave. The goal is to gain a deeper understanding of the potential drivers of the norm effect that we incorporated in assignment one to make the household division of labor model fit the data better.

# Code structure

- a. Import packages and load data
- b. Gender gap
- c. Net child care costs
- d. Parental leave, formal child care and gender gap



# a. Import packages and load data

In [None]:
# Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
import plotly.graph_objects as go

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

In [2]:
# Load data
df_gg = pd.read_csv('Gender gap.csv')
df_pl = pd.read_csv('Parental leave.csv')
df_f = pd.read_csv('Family.csv')
df_nc = pd.read_csv('netchildcare costs.csv')

# b. Gender gap
1) We read the OECD data set on the gender gap across countries, drop irrelevant columns and rename the remaining ones.
2) We clean the data set.
3) We make an interactive plot that illustrates the development in the gender gap over time for a selection of OECD countries.
4) We create a data set with only 2020 observations for later use.
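Steps 1 to 4 repeat almost verbatim for each OECD file in this notebook. Below is a minimal sketch of the drop, rename, filter and sort pattern as a single function. It assumes a long-format table that has `Country` and `Year` columns after renaming. The helper name `clean_oecd` and the toy rows are hypothetical (made-up numbers, not OECD data). The sketch also matches the series code exactly with `==` rather than with `str.contains`, which is an assumption about how the codes are stored.

```python
import pandas as pd


def clean_oecd(df, drop_cols, rename_map, code_col, code):
    """Hypothetical helper: drop, rename, keep one series code, sort.

    Assumes the table has 'Country' and 'Year' columns after renaming.
    """
    out = df.drop(columns=drop_cols).rename(columns=rename_map)
    # Exact match on the series code (str.contains would also accept longer codes)
    out = out.loc[out[code_col] == code].drop(columns=code_col)
    return out.sort_values(['Country', 'Year']).reset_index(drop=True)


# Toy long-format table (made-up numbers, for illustration only)
raw = pd.DataFrame({
    'SERIES': ['GWG5', 'GWG9', 'GWG5', 'GWG5'],
    'Country': ['Sweden', 'Sweden', 'Denmark', 'Sweden'],
    'Time': [2021, 2020, 2020, 2020],
    'Value': [7.0, 15.0, 5.0, 7.4],
    'Flags': [None, None, None, None],
})

clean = clean_oecd(raw, drop_cols=['Flags'],
                   rename_map={'SERIES': 'Type', 'Value': 'Gender_gap', 'Time': 'Year'},
                   code_col='Type', code='GWG5')
print(clean)
```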


In [3]:
# Gender gap dataset

# Drop columns that we do not need
drop_these = ['COUNTRY','SEX','Sex','Series','TIME','Unit','Unit Code','PowerCode Code','PowerCode','Reference Period Code','Reference Period','Flag Codes','Flags']
df_gg.drop(drop_these, axis=1, inplace=True)

#Rename columns
df_gg = df_gg.rename(columns = {"SERIES" : "Type", "Value": "Gender_gap", "Time": "Year"})

In [4]:
# Keep only the gender gap at the median (series code GWG5)
I = df_gg['Type'].str.contains('GWG5')
df_gg = df_gg.loc[I].drop(columns='Type')

# Sort by country and year and reset the index
df_gg = df_gg.sort_values(['Country', 'Year']).reset_index(drop=True)

In [42]:
# Country specific dataset
countries = ['Denmark', 'United States', 'Sweden','Germany','United Kingdom','OECD countries']
df_c_g = df_gg[df_gg['Country'].isin(countries)]
df_c_g = df_c_g.round(1)

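The `isin` filter silently skips any name that does not occur in the data. The OECD aggregate is also labelled differently across files in this notebook: 'OECD countries' in the gender gap data and 'OECD - Total' in the net childcare data. Below is a small sketch of a check that reports unmatched names. `select_countries` and the toy table are hypothetical.

```python
import pandas as pd


def select_countries(df, countries, col='Country'):
    """Hypothetical helper: keep rows for `countries` and report names not found."""
    missing = sorted(set(countries) - set(df[col]))
    if missing:
        print('Not found in data:', missing)
    return df[df[col].isin(countries)].copy(), missing


# Toy data: the aggregate is labelled 'OECD - Total' here, not 'OECD countries'
toy = pd.DataFrame({'Country': ['Denmark', 'Sweden', 'OECD - Total'],
                    'Year': [2020, 2020, 2020],
                    'Cost': [10.0, 5.0, 12.0]})

subset, missing = select_countries(toy, ['Denmark', 'Sweden', 'OECD countries'])
```

Here the call prints `Not found in data: ['OECD countries']` and keeps only the Danish and Swedish rows.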
In [43]:
# Time series plot of the gender gap
import plotly.graph_objects as go

fig = go.Figure()

# One line per country; the OECD average is drawn as a dashed black line.
# The loop variable is named `group` so it does not overwrite df_c_g.
for country, group in df_c_g.groupby('Country'):
    if country == 'OECD countries':
        fig.add_trace(go.Scatter(x=group['Year'], y=group['Gender_gap'], name=country, line=dict(color='black', dash='dash')))
    else:
        fig.add_trace(go.Scatter(x=group['Year'], y=group['Gender_gap'], name=country))

fig.update_layout(title='Development in Gender Gap over time',
                  xaxis_title='Year',
                  yaxis_title='Pct.')

fig.show()

The gender gap has generally decreased over time in all the selected OECD countries. On average across the OECD, the gender gap fell from around 16 pct. in 2005 to approximately 12 pct. in 2021. The gender gap is generally lower in Denmark and Sweden and notably higher in the United States and the United Kingdom. In Germany, the gender gap is above the OECD average throughout the entire period.
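The paragraph above compares the start and end levels of each series. Below is a sketch of how such changes can be computed per country in percentage points. It assumes a tidy table with `Country`, `Year` and `Gender_gap` columns like `df_gg`, and the toy values are made up. Because a country's series need not start and end in the same years as the others, the sketch reports the years alongside the levels.

```python
import pandas as pd

# Toy long-format series (made-up values, for illustration only)
df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Year': [2005, 2013, 2021, 2005, 2021],
    'Gender_gap': [16.0, 14.0, 12.0, 20.0, 18.5],
})

# First and last available observation per country (rows sorted by year first)
ordered = df.sort_values(['Country', 'Year'])
summary = ordered.groupby('Country').agg(
    first_year=('Year', 'first'), first_gap=('Gender_gap', 'first'),
    last_year=('Year', 'last'), last_gap=('Gender_gap', 'last'),
)
# Change in percentage points between the first and last observation
summary['change_pp'] = summary['last_gap'] - summary['first_gap']
print(summary)
```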

In [7]:
# 2020 cross-section for later use
df_gg_2020 = df_gg.loc[df_gg['Year']==2020]
df_gg_2020 = df_gg_2020.sort_values(['Country','Year'])
df_gg_2020.reset_index(inplace = True, drop = True) # Drop old index too
df_gg_2020.head()

|   | Country | Year | Gender_gap |
|---|---|---|---|
| 0 | Australia | 2020 | 10.533333 |
| 1 | Austria | 2020 | 12.38154 |
| 2 | Belgium | 2020 | 1.173184 |
| 3 | Bulgaria | 2020 | 2.554688 |
| 4 | Canada | 2020 | 16.112532 |

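The same three-step 2020 slice (filter on `Year`, sort, reset the index) is repeated below for the parental leave and formal child care data. The hypothetical helper `cross_section` sketches how this step could be written once. The toy rows are made up.

```python
import pandas as pd


def cross_section(df, year, col='Year'):
    """Hypothetical helper: rows for one year, sorted by country, fresh index."""
    return (df.loc[df[col] == year]
              .sort_values('Country')
              .reset_index(drop=True))


# Toy data (made-up values, for illustration only)
toy = pd.DataFrame({'Country': ['Sweden', 'Denmark', 'Sweden', 'Denmark'],
                    'Year': [2019, 2020, 2020, 2019],
                    'Father_leave': [14.0, 2.0, 14.3, 2.0]})

toy_2020 = cross_section(toy, 2020)
```

In the notebook, the calls would read, for example, `cross_section(df_pl, 2020)`.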

# c. Net child care costs
1) We read the OECD data set on net child care costs across countries, drop irrelevant columns and rename the remaining ones.
2) We clean the data set.
3) We make an interactive plot that illustrates the development in net child care costs over time for a selection of OECD countries.
4) We create a data set with only 2020 observations for later use.

In our inaugural project, we added a disutility of working in the labor market for women and interpreted it as a norm. We did this based on the assumption that part of home production is related to caring for children. According to Kleven et al. (2019), having children affects only female labor supply: the child penalty strikes women, while having children is essentially a non-event for fathers. One reason for this could be that a lack of access to child care forces women to stay at home (or at least to work less). We investigate this hypothesis by plotting net child care costs across our selection of OECD countries, since, all else equal, child care costs are higher when access to formal child care is low.
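One way to make the hypothesis concrete is to check whether, in a 2020 cross-section, countries with higher net childcare costs also tend to have larger gender gaps. Below is a minimal sketch with made-up numbers. It assumes columns named `Gender_gap` and `Cost`, as in this notebook. The Spearman coefficient is computed as the Pearson correlation of ranks, so no SciPy dependency is needed. With only a handful of countries, such a correlation is purely descriptive and says nothing about causality.

```python
import pandas as pd

# Toy 2020 cross-section (made-up numbers, for illustration only)
cs = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Gender_gap': [5.0, 8.0, 12.0, 17.0],
    'Cost': [10.0, 12.0, 20.0, 30.0],
})

# Pearson (linear) correlation across countries
pearson = cs['Gender_gap'].corr(cs['Cost'])
# Spearman (rank) correlation: Pearson correlation of the ranks
spearman = cs['Gender_gap'].rank().corr(cs['Cost'].rank())
print(round(pearson, 2), round(spearman, 2))
```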

In [8]:
# Net child care costs 
# Drop columns that we do not need
drop_these = ['LOCATION','Type of indicator','Net childcares cost by item','Family type','Earnings of the first adult','HBTOPUPS','TIME','Flag Codes','Flags']
df_nc.drop(drop_these, axis=1, inplace=True)

#Rename columns
df_nc = df_nc.rename(columns = {"TYPE" : "Type", "Value": "Cost", "Time": "Year"})

In [10]:
# Get one observation per year per country:
# net childcare costs as a percentage of net household income for a two-earner family
# (earnings code '67AW', which in OECD tax-benefit data denotes 67% of the average wage),
# including social assistance and housing benefits
I = ((df_nc['Type'] == 1)
     & (df_nc['FAMILY'] == '2EARNERC2C_67AW')
     & (df_nc['Include social assistance benefits'] == 'Yes')
     & (df_nc['Include housing benefits'] == 'Yes')
     & (df_nc['COMPONENTS'] == 5)
     & (df_nc['EARNINGS'] == '67AW'))
df_nc = df_nc.loc[I].drop(columns=['Type', 'FAMILY'])

# Sort by country and year and reset the index
df_nc = df_nc.sort_values(['Country', 'Year']).reset_index(drop=True)

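The filter in the cell above chains six equality conditions with `&`. Below is a sketch of the same idea with the conditions kept in a dictionary. It adds a check that the result really has one observation per country and year, as the cell's comment intends. The column names mirror the cell above, but the rows are made up.

```python
import pandas as pd

# Toy version of the net childcare table (made-up rows, for illustration only)
df = pd.DataFrame({
    'Country': ['Denmark', 'Denmark', 'Sweden'],
    'Year': [2020, 2020, 2020],
    'Type': [1, 2, 1],
    'EARNINGS': ['67AW', '67AW', '67AW'],
    'Cost': [11.0, 9.0, 5.0],
})

# One equality condition per column; a row is kept only if all of them hold
conditions = {'Type': 1, 'EARNINGS': '67AW'}
mask = pd.Series(True, index=df.index)
for col, value in conditions.items():
    mask &= df[col] == value

filtered = df.loc[mask].reset_index(drop=True)

# The filter should leave exactly one row per country and year
one_per_country_year = not filtered.duplicated(['Country', 'Year']).any()
```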

In [44]:
# Keep only relevant countries
countries_2 = ['Denmark', 'United States', 'Sweden', 'Germany','United Kingdom','OECD - Total']
df_c_nc = df_nc[df_nc['Country'].isin(countries_2)]
df_c_nc = df_c_nc.round(1)

In [45]:
fig = go.Figure()

# One line per country; the OECD total is drawn as a dashed black line.
# The loop variable is named `group` so it does not overwrite df_c_nc.
for country, group in df_c_nc.groupby('Country'):
    if country == 'OECD - Total':
        fig.add_trace(go.Scatter(x=group['Year'], y=group['Cost'], name=country, line=dict(color='black', dash='dash')))
    else:
        fig.add_trace(go.Scatter(x=group['Year'], y=group['Cost'], name=country))

fig.update_layout(title='Net costs for a household with two children using childcare',
                  xaxis_title='Year',
                  yaxis_title='Pct. of household income')

fig.show()

# d. Parental leave, formal child care and gender gap

In [13]:
# Parental leave dataset 
drop_these = ['COU','Indicator','SEX','Sex','AGE','Age Group','TIME','Unit','Unit Code','PowerCode Code','PowerCode','Reference Period Code','Reference Period','Flag Codes','Flags']
df_pl.drop(drop_these, axis=1, inplace=True)

#Rename columns
df_pl = df_pl.rename(columns = {"IND" : "Type", "Value": "Father_leave", "Time": "Year"})

In [14]:
# Keep only the EMP18_PAT series (stored as Father_leave)
I = df_pl['Type'].str.contains('EMP18_PAT')
df_pl = df_pl.loc[I].drop(columns='Type')

# Sort by country and year and reset the index
df_pl = df_pl.sort_values(['Country', 'Year']).reset_index(drop=True)

In [15]:
# 2020 cross-section for later use
df_pl_2020 = df_pl.loc[df_pl['Year']==2020]
df_pl_2020 = df_pl_2020.sort_values(['Country','Year'])
df_pl_2020.reset_index(inplace = True, drop = True) # Drop old index too
df_pl_2020.head()

|   | Country | Year | Father_leave |
|---|---|---|---|
| 0 | Australia | 2020 | 2.0 |
| 1 | Austria | 2020 | 13.0 |
| 2 | Belgium | 2020 | 19.3 |
| 3 | Canada | 2020 | 5.0 |
| 4 | Chile | 2020 | 1.0 |


In [16]:
# Family dataset (source of the formal child care series FAM13)
drop_these = ['COU','Indicator','SEX','Sex','YEAR','Unit','Unit Code','PowerCode Code','PowerCode','Reference Period Code','Reference Period','Flag Codes','Flags']
df_f.drop(drop_these, axis=1, inplace=True)

In [17]:
# Keep only the FAM13 series (formal child care)
I = df_f['IND'].str.contains('FAM13')
df_fam13 = df_f.loc[I].rename(columns={"IND": "Type", "Value": "Formal_child_care"})
df_fam13 = df_fam13.drop(columns='Type')

# Sort by country and year and reset the index
df_fam13 = df_fam13.sort_values(['Country', 'Year']).reset_index(drop=True)

In [18]:
# 2020 cross-section for later use
df_fam13_2020 = df_fam13.loc[df_fam13['Year']==2020]
df_fam13_2020 = df_fam13_2020.sort_values(['Country','Year'])
df_fam13_2020.reset_index(inplace = True, drop = True) # Drop old index too
df_fam13_2020.head()

|   | Country | Year | Formal_child_care |
|---|---|---|---|
| 0 | Australia | 2020 | 44.9 |
| 1 | Austria | 2020 | 20.2 |
| 2 | Belgium | 2020 | 56.9 |
| 3 | Brazil | 2020 | 21.1 |
| 4 | Bulgaria | 2020 | 15.0 |


In [29]:
df_merged = pd.merge(df_gg_2020, df_pl_2020, on=['Country','Year'], how='outer')
df_merged = pd.merge(df_merged, df_fam13_2020, on=['Country','Year'], how='outer')

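With `how='outer'`, the merged table keeps every country that appears in at least one of the three 2020 tables and leaves `NaN` where a source has no observation. pandas' `merge` also accepts `indicator=True`, which adds a `_merge` column recording whether each row came from both frames or from only one of them. Below is a toy sketch with made-up country names and values.

```python
import pandas as pd

# Toy 2020 tables (made-up countries and values, for illustration only)
gg = pd.DataFrame({'Country': ['A', 'B', 'C'], 'Year': 2020,
                   'Gender_gap': [5.0, 7.4, 2.6]})
pl = pd.DataFrame({'Country': ['A', 'B', 'D'], 'Year': 2020,
                   'Father_leave': [2.0, 14.3, 1.0]})

# '_merge' is 'both', 'left_only' or 'right_only' for each row
merged = pd.merge(gg, pl, on=['Country', 'Year'], how='outer', indicator=True)
origin = merged.set_index('Country')['_merge'].astype(str).to_dict()
print(origin)
```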
In [36]:
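df_merged = df_merged.round(1)
# Note: fillna(0) also turns missing Father_leave and Formal_child_care values into zeros,
# so 0.0 in those columns can mean "no observation" rather than a true zero
df_merged.fillna(0, inplace=True)
# Drop countries without a gender gap observation (coded as 0 after fillna)
df_merged = df_merged[df_merged['Gender_gap'] != 0]
df_merged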

|   | Country | Year | Gender_gap | Father_leave | Formal_child_care |
|---|---|---|---|---|---|
| 0 | Australia | 2020 | 10.5 | 2.0 | 44.9 |
| 1 | Austria | 2020 | 12.4 | 13.0 | 20.2 |
| 2 | Belgium | 2020 | 1.2 | 19.3 | 56.9 |
| 3 | Bulgaria | 2020 | 2.6 | 0.0 | 15.0 |
| 4 | Canada | 2020 | 16.1 | 5.0 | 0.0 |
| 5 | Chile | 2020 | 8.6 | 1.0 | 19.5 |
| 6 | Colombia | 2020 | 2.7 | 1.6 | 29.8 |
| 7 | Costa Rica | 2020 | 3.0 | 0.0 | 0.0 |
| 8 | Cyprus | 2020 | 16.6 | 0.0 | 22.3 |
| 9 | Czech Republic | 2020 | 12.4 | 1.0 | 5.9 |

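Because of `fillna(0)`, a 0.0 in `Father_leave` or `Formal_child_care` can mean either a true zero or a missing observation, and such zeros pull averages and scatter points toward the origin. Below is a toy sketch with made-up values that contrasts the two treatments. `dropna(subset=...)` drops only the rows without a gender gap and leaves the remaining gaps as `NaN`, which pandas skips in `mean()`.

```python
import numpy as np
import pandas as pd

# Toy merged table with missing values (made-up numbers, for illustration only)
df = pd.DataFrame({'Country': ['A', 'B', 'C'],
                   'Gender_gap': [10.0, np.nan, 3.0],
                   'Formal_child_care': [40.0, 20.0, np.nan]})

# Drop rows without the outcome, but keep other gaps as NaN (not 0)
kept = df.dropna(subset=['Gender_gap'])
print(kept['Formal_child_care'].mean())    # NaN is skipped: 40.0

# Filling with 0 instead treats "no data" as "no formal child care"
filled = df.fillna(0)
filled = filled[filled['Gender_gap'] != 0]
print(filled['Formal_child_care'].mean())  # (40.0 + 0.0) / 2 = 20.0
```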

In [25]:
df_c_2020 = df_merged[df_merged['Country'].isin(countries)]
df_c_2020 = df_c_2020.round(1)
df_c_2020.fillna(0, inplace=True)
df_c_2020 = df_c_2020.drop(df_c_2020[df_c_2020['Country'] == 'OECD countries'].index)
df_c_2020

|   | Country | Year | Gender_gap | Father_leave | Formal_child_care |
|---|---|---|---|---|---|
| 10 | Denmark | 2020 | 5.0 | 2.0 | 55.3 |
| 14 | Finland | 2020 | 16.0 | 9.0 | 37.0 |
| 16 | Germany | 2020 | 14.2 | 8.7 | 39.2 |
| 32 | Sweden | 2020 | 7.4 | 14.3 | 47.6 |
| 34 | United Kingdom | 2020 | 12.0 | 2.0 | 0.0 |
| 35 | United States | 2020 | 17.7 | 0.0 | 0.0 |


In [39]:
highlighted_countries = ['Denmark', 'Sweden', 'Finland', 'Germany','United Kingdom','United States']

# Create a color map for the markers
marker_colors = df_merged['Country'].apply(lambda country: 'red' if country in highlighted_countries else 'blue')

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_merged['Father_leave'],
    y=df_merged['Formal_child_care'],
    mode='markers',
    marker=dict(
        size=df_merged['Gender_gap'],
        sizemode='area',
        sizeref=0.1,
        sizemin=5,
        color=marker_colors,  # Set the marker color based on the color map
        opacity=0.7,  # Set the opacity to make the markers semi-transparent
        line=dict(width=0.5, color='white'),  # Add a white border around the markers
    ),
    text=df_merged['Country'],
))

# Set the plot layout (marker area is proportional to the gender gap)
fig.update_layout(
    title='Paternal leave and formal child care by country',
    xaxis_title='Paternal leave (Weeks)',
    yaxis_title='Formal child care (Pct.)',
)

# Display the interactive plot
fig.show()

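The bubble chart passes the gender gap directly as the marker size with a fixed `sizeref=0.1`. For `sizemode='area'`, the Plotly documentation on bubble charts suggests deriving `sizeref` from the data as `2 * max(size) / desired_max_size**2`. Below is a NumPy-only sketch of that computation. It assumes a desired maximum marker size of 40 px (an arbitrary choice) and clips any negative value to zero, since marker sizes cannot be negative. The gap values are made up.

```python
import numpy as np

# Made-up gender gaps; one negative value to show the clipping
gaps = np.array([1.2, 5.0, 17.7, -0.5])

# Marker sizes cannot be negative, so clip at zero before using them as sizes
sizes = np.clip(gaps, 0, None)

# Data-driven scaling for sizemode='area':
# sizeref = 2 * max(size) / (desired maximum marker size in px) ** 2
desired_max_px = 40
sizeref = 2.0 * sizes.max() / desired_max_px ** 2
```

The resulting `sizes` and `sizeref` could then be passed as `marker=dict(size=sizes, sizemode='area', sizeref=sizeref, ...)` in the scatter trace. The largest bubble then stays at roughly the chosen pixel size, whatever the range of the data.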
# Conclusion