# Wages and divorce rates

In this project we analyze Danish men's and women's wages from 2013 to 2021, as well as the divorce rate in Denmark from 2011 to 2022. 
We use data from Statistikbanken (Statistics Denmark). 

Imports and set magics:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
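# Apply the style first; otherwise it overrides the rcParams set below
# (matplotlib >= 3.6 names this style 'seaborn-v0_8-whitegrid')
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams.update({"axes.grid":True,"grid.color":"black","grid.alpha":0.25,"grid.linestyle":"--"})
plt.rcParams.update({'font.size': 14})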

import ipywidgets as widgets
import seaborn as sns

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject


# Read and clean data

We import the three datasets from Excel files. 
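
The files go through the same steps: drop the empty leading columns, name the label column, prefix the year columns, and reshape from wide to long. A minimal sketch of how these steps could be collected in one helper follows. The function name `wide_to_tidy`, its parameters, and the numbers in `raw` are assumptions made for illustration and are not part of the project code.

```python
import pandas as pd

def wide_to_tidy(raw, prefix, id_col="age_intervals", n_empty_cols=5):
    """Hypothetical helper: clean a wide table and reshape it to long format."""
    df = raw.copy()
    # Drop the empty leading columns ('Unnamed: 0' ... 'Unnamed: n-1')
    df = df.drop(columns=[f"Unnamed: {k}" for k in range(n_empty_cols)])
    # The next unnamed column holds the row labels
    df = df.rename(columns={f"Unnamed: {n_empty_cols}": id_col})
    # Prefix every year column so wide_to_long can find the stub name
    year_cols = [c for c in df.columns if c != id_col]
    df = df.rename(columns={c: f"{prefix}{c}" for c in year_cols})
    return pd.wide_to_long(df, stubnames=prefix, i=id_col, j="year").reset_index()

# Made-up numbers, only to show the layout before and after
raw = pd.DataFrame({
    "Unnamed: 0": [None, None],
    "Unnamed: 1": [None, None],
    "Unnamed: 2": [None, None],
    "Unnamed: 3": [None, None],
    "Unnamed: 4": [None, None],
    "Unnamed: 5": ["Alder i alt", "Under 20 år"],
    "2013": [100.0, 10.0],
    "2014": [110.0, 11.0],
})
tidy = wide_to_tidy(raw, prefix="wm")
print(tidy)
```

With `n_empty_cols=0` and `id_col="div_rate"`, the same sketch would cover a table whose labels sit in `Unnamed: 0`, as in the divorce data.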

**Dataset for men's wages:**

In [None]:
#Importing data
filename = 'wm.xlsx'
pd.read_excel(filename).head(7)

wm = pd.read_excel(filename, skiprows=2)


In [None]:
#Removing the first columns
drop_these = ['Unnamed: ' + str(num) for num in range(5)]
wm.drop(drop_these, axis=1, inplace=True)

In [None]:
wm.rename(columns = {'Unnamed: 5':'age_intervals'}, inplace=True)

#Renaming the columns
col_dict = {}
for i in range(2013, 2021+1): 
    col_dict[str(i)] = f'wm{i}' 
col_dict
wm.rename(columns = col_dict, inplace=True)

wm

In [None]:
#Changing from wide to long
wm_long = pd.wide_to_long(wm, stubnames='wm', i='age_intervals', j='year')
wm_long.head(10)

**Dataset for women's wages:**

In [None]:
#import data
filename = 'ww.xlsx'
pd.read_excel(filename).head(7)

ww = pd.read_excel(filename, skiprows=2)

In [None]:
#Removing the first columns
drop_these = ['Unnamed: ' + str(num) for num in range(5)] 
ww.drop(drop_these, axis=1, inplace=True)
print(drop_these)

In [None]:
ww.rename(columns = {'Unnamed: 5':'age_intervals'}, inplace=True)

# Renaming the columns
col_dict = {}
for i in range(2013, 2021+1): 
    col_dict[str(i)] = f'ww{i}' 
col_dict
ww.rename(columns = col_dict, inplace=True)


In [None]:
#Changing from wide to long
ww_long = pd.wide_to_long(ww, stubnames='ww', i='age_intervals', j='year')
ww_long.head(10)

**Dataset for the divorce rate:**

In [None]:
#Importing data
filename = 'div.xlsx'
pd.read_excel(filename).head(7)

div = pd.read_excel(filename, skiprows=2)

div

In [None]:
div.rename(columns = {'Unnamed: 0':'div_rate'}, inplace=True)

#Renaming the columns
col_dict = {}
for i in range(2011, 2022+1): 
    col_dict[str(i)] = f'div{i}' 
col_dict
div.rename(columns = col_dict, inplace=True)

In [None]:
#Changing from wide to long
div_long = pd.wide_to_long(div, stubnames='div', i='div_rate', j='year')
div_long.head(10)

## Explore each data set

To analyse our data sets further, we create interactive plots of men's and women's wages, respectively, from 2013 to 2021. A dropdown menu makes it possible to select different age intervals.

In [None]:
#Resetting index
wm_long = wm_long.reset_index()
wm_long.loc[wm_long.age_intervals == 'Alder i alt', :]

**Interactive plot for men's wages**

In [None]:
# Defining a function to construct the interactive plot
def plot_m(df, age_intervals): 
    I = df['age_intervals'] == age_intervals
    ax=df.loc[I,:].plot(x='year', y='wm', style='-o', legend=False)
    ax.xaxis.set_ticks(np.arange(2013, 2022, 1))
    ax.set_ylabel('Wage in million DKK')
    ax.set_title('Interactive plot for different age groups for men\'s wage')

In [None]:
#Plotting men's wages
widgets.interact(plot_m, 
    df = widgets.fixed(wm_long),
    age_intervals = widgets.Dropdown(description='Age groups', 
                                    options=wm_long.age_intervals.unique(), 
                                    value='Alder i alt')
);

The interactive plot above shows that most age groups follow an almost linear increasing trend. However, wages for the age group "under 20 years" stagnate from 2015 to 2016, and the curve for the age group "60 or above" is also quite flat in the same period. 
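
The stagnation read off the plot could also be checked numerically by computing year-over-year growth within each age group. The sketch below assumes a frame in the same long layout as `wm_long` (columns `age_intervals`, `year`, `wm`); the frame `demo`, the column `wm_growth_pct` and all numbers are made up for illustration.

```python
import pandas as pd

# Made-up numbers in the same long layout as wm_long
demo = pd.DataFrame({
    "age_intervals": ["A"] * 3 + ["B"] * 3,
    "year": [2014, 2015, 2016] * 2,
    "wm": [100.0, 110.0, 110.0, 200.0, 210.0, 231.0],
})

# Sort so that each group's rows are in chronological order
demo = demo.sort_values(["age_intervals", "year"]).reset_index(drop=True)

# Percentage change from the previous year, computed separately per age group
demo["wm_growth_pct"] = demo.groupby("age_intervals")["wm"].pct_change() * 100
print(demo)
```

A growth rate close to zero for a given year and age group would correspond to the flat segments seen in the plot.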

**Interactive plot for women's wages**

In [None]:
#Resetting index
ww_long = ww_long.reset_index()
ww_long.loc[ww_long.age_intervals == 'Alder i alt', :]

In [None]:
#Defining a function to construct the interactive plot
def plot_w(df, age_intervals): 
    I = df['age_intervals'] == age_intervals
    ax=df.loc[I,:].plot(x='year', y='ww', style='-o', legend=False)
    ax.xaxis.set_ticks(np.arange(2013, 2022, 1))
    ax.set_ylabel('Wage in million DKK')
    ax.set_title('Interactive plot for different age groups for women\'s wage')

In [None]:
#Plotting women's wages
widgets.interact(plot_w, 
    df = widgets.fixed(ww_long),
    age_intervals = widgets.Dropdown(description='Age groups', 
                                    options=ww_long.age_intervals.unique(), 
                                    value='Alder i alt')
); 

The figure above shows that wages increase for all age groups over the period as a whole. For the age group "under 20 years", however, women's wages drop from 2015 to 2016.

# Merge data sets

Now we combine our loaded data sets, starting off with combining the two data sets for men's and women's wages.

In [None]:
#Merging the two wage data sets
mergedw = pd.merge(ww_long, wm_long, how='left', on=['year', 'age_intervals'])
mergedw.head(10)

In [None]:
#Merging the combined wage data with the divorce rates
mergeda = pd.merge(mergedw, div_long, how='left', on=['year'])
mergeda.head(7)

The years 2011, 2012 and 2022 do not appear in the merged data set: because we use a left join on the wage data, only the years with wage information (2013 to 2021) are kept.
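
To illustrate why a left join drops those years, the sketch below compares a left and an outer merge on small made-up frames. The frames `wages` and `div` and their values are hypothetical; `indicator=True` and `validate` are standard `pd.merge` options.

```python
import pandas as pd

# Made-up wage rows (several age groups per year) and divorce rows (one per year)
wages = pd.DataFrame({
    "year": [2013, 2013, 2014],
    "age_intervals": ["A", "B", "A"],
    "ww": [1.0, 2.0, 3.0],
})
div = pd.DataFrame({
    "year": [2012, 2013, 2014, 2022],
    "div": [40.0, 41.0, 42.0, 43.0],
})

# Left join: keeps every wage row and only the matching divorce years.
# validate checks that the right-hand frame has at most one row per year.
left = pd.merge(wages, div, how="left", on="year", validate="many_to_one")

# Outer join with an indicator column shows which years exist only on the right
outer = pd.merge(wages, div, how="outer", on="year", indicator=True)
only_in_div = sorted(outer.loc[outer["_merge"] == "right_only", "year"])

print(sorted(left["year"].unique()))
print(only_in_div)
```

Here the years that exist only in the divorce data are exactly the ones missing from the left-joined result.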

# Analysis

We create a summary table of men's and women's wages by age interval, showing the mean, standard deviation, minimum, maximum and the three quartiles (25%, 50% and 75%).

In [None]:
#Creating table over summary statistics
#Selecting several columns requires a list; tuple-style indexing fails in pandas 2.x
mergeda.groupby(['age_intervals'])[['wm', 'ww']].describe().head(11)

The table presented reveals that, among men, the age group "under 20 years" has the lowest mean wage while the age group "45-49 years" has the highest. However, the maximum wage value for men is observed in the age group "50-54 years".

For women, the lowest and highest mean wages are found in the same two age groups ("under 20 years" and "45-49 years"), but here the age group "45-49 years" also has the highest maximum wage.

Overall, men's mean wages exceed women's mean wages across all age groups except for individuals "under 20 years" old.
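
The comparison of mean wages across age groups could be made explicit by computing the difference directly. The sketch below uses a made-up frame with the same columns as the merged wage data (`age_intervals`, `year`, `wm`, `ww`); the column name `gap` and all numbers are assumptions for illustration.

```python
import pandas as pd

# Made-up numbers with the same columns as the merged wage data
demo = pd.DataFrame({
    "age_intervals": ["A", "A", "B", "B"],
    "year": [2013, 2014, 2013, 2014],
    "wm": [10.0, 12.0, 30.0, 34.0],
    "ww": [11.0, 13.0, 25.0, 27.0],
})

# Mean wage per age group for men and women
means = demo.groupby("age_intervals")[["wm", "ww"]].mean()

# Positive gap: men's mean wage is higher; negative gap: women's is higher
means["gap"] = means["wm"] - means["ww"]
print(means)
```

Age groups with a negative `gap` would be the ones where women's mean wage exceeds men's.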

In [None]:
#Creating a figure
fig = plt.figure()
ax = fig.add_subplot(1,1,1)

#Grouping by years and choosing to look at the wages for men and women only
mergeda.groupby('year')[['wm', 'ww']].mean().plot(ax=ax,style='-o')

#Adding labels and titles
ax.xaxis.set_ticks(np.arange(2013, 2022, 1))
ax.set_ylabel('Wage in million DKK')
ax.set_title('Wage development for men and women');

#Adding a second y-axis that shares the x-axis with the first
ax2=ax.twinx()

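#Creating the plot with one divorce-rate value per year (mergeda repeats it for every age interval)
div_by_year = mergeda.drop_duplicates(subset='year')
ax2.plot(div_by_year['year'], div_by_year["div"],color="brown",marker="o",label='Divorce Rate')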
ax2.set_ylabel("Divorce rate in %")
ax2.legend(loc='center left', bbox_to_anchor=(1.15, 0.63))
ax.legend(loc='center left', bbox_to_anchor=(1.15, 0.75))


# Conclusion

Our analysis shows that men in general have higher wages than women in Denmark. However, there does not appear to be a clear correlation between wages and the divorce rate. We observe that the divorce rate declined significantly in 2019, most likely due to a change in the divorce rules that was implemented in April 2019.
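
The statement about correlation is based on visual inspection of the figure. One way it could be checked is by computing Pearson correlations between the yearly mean wages and the divorce rate, as in the sketch below. The frame `demo` is made up and only mirrors the column names of the merged data (`year`, `wm`, `ww`, `div`); with only a handful of yearly observations, any such coefficient should be read with caution.

```python
import pandas as pd

# Made-up numbers: two age groups per year, one divorce rate per year
demo = pd.DataFrame({
    "year": [2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016],
    "wm":   [8.0, 12.0, 18.0, 22.0, 28.0, 32.0, 38.0, 42.0],
    "ww":   [4.0, 6.0, 9.0, 11.0, 14.0, 16.0, 19.0, 21.0],
    "div":  [3.0, 3.0, 1.0, 1.0, 4.0, 4.0, 2.0, 2.0],
})

# Collapse to one row per year before correlating
yearly = demo.groupby("year")[["wm", "ww", "div"]].mean()

# Pairwise Pearson correlation between the yearly series
corr = yearly.corr()
print(corr)
```

A coefficient near zero in the `div` row would be consistent with the conclusion above, while a value near 1 or -1 would contradict it.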