## Prerequisites and Packages:
First, if you have not read the README, please do so: the interactive map we produce requires geopandas and several dependencies.
<br> NB: To view fig. 2 (the interactive map) you will have to run the code yourself, because the notebook is too large to upload with the figure displayed inline. See the README for instructions, or in short: `conda install -c conda-forge geopandas`

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import ipywidgets as widgets
import matplotlib.patches as mpatches
import json
# Packages needed to create the interactive figure (approach adapted from examples found online).
from bokeh.io import curdoc, output_file, output_notebook, show
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import brewer


## Prepare Data for Manipulation

### Income Data

Data is obtained from Danmarks Statistik (DST) ref. no. INDKP101.
We have extracted mean income for women and men in each municipality from 1987 to 2017:

In [3]:
filnavn = 'Indkomster i DK.xlsx'
Data = pd.read_excel(filnavn,skiprows=2)

In [4]:
# Drop the unnamed columns that are not needed
drop_noname = ['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2']
Data.drop(drop_noname, axis=1, inplace=True)

# Rename the column 'Unnamed: 3' to 'Kommune'
Data.rename(columns={'Unnamed: 3': 'Kommune'}, inplace=True)

# Column names that are plain numbers are bad practice, so prefix each year with an 'e'
myDict = {}
for i in range(1987, 2018):
    myDict[str(i)] = f'e{i}'

Data.rename(columns=myDict, inplace=True)
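The renaming step above can also be written with a dict comprehension. The sketch below uses a small made-up frame (`toy` and its numbers are illustrative, not taken from the DST file) and assumes the year headers are read as strings, as the `str(i)` keys in the cell above imply.

```python
import pandas as pd

# Toy frame standing in for the income table; values are made up
toy = pd.DataFrame({"Kommune": ["A", "B"], "1987": [1.0, 2.0], "1988": [3.0, 4.0]})

# Map every year label to an 'e'-prefixed column name
year_map = {str(year): f"e{year}" for year in range(1987, 2018)}

# rename() silently ignores keys that do not match any column
toy = toy.rename(columns=year_map)
print(list(toy.columns))
```

Keys for years that are absent from the frame are simply skipped, so the same mapping can be reused on tables covering fewer years.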

In [5]:
F = Data['Kommune'].str.contains('Landsdel') # Mark rows that record parts of the country (Landsdel) rather than municipalities

In [6]:
# Remove the rows for parts of the country, keeping only municipalities
for val in ['Landsdel']:
    F = Data.Kommune.str.contains(val)
    Data = Data.loc[~F]
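As a minimal illustration of the filtering above, the sketch below applies the same boolean mask to a made-up frame. The row labels are invented for the example; only the `'Landsdel'` marker comes from the notebook.

```python
import pandas as pd

# Made-up rows: one aggregate row and two municipality rows
toy = pd.DataFrame({
    "Kommune": ["Landsdel Eksempel", "København", "Frederiksberg"],
    "e2017": [1, 2, 3],
})

# True for rows whose label contains the aggregate marker
is_region = toy["Kommune"].str.contains("Landsdel")

# '~' negates the mask, so only municipality rows are kept
municipalities = toy.loc[~is_region].reset_index(drop=True)
print(municipalities)
```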

__Check that the data is intact:__ the same number of municipalities should be observed in every year, and the maxima and minima should be 'reasonable', etc.
We see 196 rows, which corresponds to 2 x 98 (the number of municipalities): the first 98 rows are for men, the next 98 for women.

In [7]:
Data.describe()

Unnamed: 0,e1987,e1988,e1989,e1990,e1991,e1992,e1993,e1994,e1995,e1996,...,e2008,e2009,e2010,e2011,e2012,e2013,e2014,e2015,e2016,e2017
count,196.0,196.0,196.0,196.0,196.0,196.0,196.0,196.0,196.0,196.0,...,196.0,196.0,196.0,196.0,196.0,196.0,196.0,196.0,196.0,196.0
mean,79186.132653,83324.857143,88139.153061,91767.734694,96274.780612,99317.969388,102630.091837,109878.209184,115144.423469,119700.285714,...,178403.056122,180825.163265,197023.47449,201273.173469,206485.362245,212083.714286,216271.69898,221923.392857,225162.040816,231627.591837
std,15177.657559,14644.357423,14578.458852,14828.282831,15229.730789,15042.608499,16169.309746,17309.010719,18941.375129,20319.552329,...,29731.18054,26441.288704,34417.453514,36754.715171,38541.912139,40846.314654,42923.935925,46414.247785,47409.158349,48995.639891
min,55691.0,60592.0,64299.0,70351.0,75354.0,78456.0,81182.0,86519.0,90238.0,93487.0,...,140593.0,145592.0,155976.0,156279.0,160736.0,164238.0,167574.0,169527.0,171198.0,173612.0
25%,65302.5,69905.75,74796.5,78216.0,82189.5,85563.0,88744.0,94368.75,98045.25,101839.0,...,157741.75,162900.75,173721.0,176610.75,180907.25,183945.25,187559.75,191413.25,195132.75,198566.5
50%,78219.5,82029.5,86546.0,90471.0,96309.0,98480.0,101863.0,109792.5,115177.5,119305.5,...,173163.0,174907.5,189593.0,195664.5,200470.0,206668.5,209265.5,214829.0,217113.5,224818.0
75%,86831.25,90360.25,96511.75,99681.0,104597.0,107969.5,110753.0,119772.0,125825.25,130800.75,...,189845.5,189887.75,207891.25,211729.0,217730.25,225407.0,227666.25,235305.25,238415.75,245649.75
max,145475.0,148305.0,144270.0,155342.0,159076.0,160987.0,190397.0,177585.0,203830.0,213937.0,...,337239.0,338866.0,396307.0,425414.0,439329.0,449012.0,467785.0,515346.0,519931.0,533813.0
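The visual check of `describe()` above could be complemented by explicit assertions. The helper below is a hypothetical sketch (the name `check_income_panel` is ours, not part of the notebook) and is demonstrated on a made-up frame; for the real table one would pass `n_municipalities=98`.

```python
import pandas as pd

def check_income_panel(df, n_municipalities):
    """Raise AssertionError if the stacked men/women table looks incomplete."""
    counts = df["Kommune"].value_counts()
    assert len(df) == 2 * n_municipalities, "unexpected number of rows"
    assert len(counts) == n_municipalities, "unexpected number of municipalities"
    assert (counts == 2).all(), "each municipality should appear once per sex"
    assert not df.isna().any().any(), "missing values present"
    return True

# Made-up example: two municipalities, men first and women second
toy = pd.DataFrame({"Kommune": ["A", "B", "A", "B"], "e2017": [10.0, 12.0, 8.0, 9.0]})
ok = check_income_panel(toy, n_municipalities=2)
```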


In [8]:
Data_men = Data.iloc[0:98, :]  # Rows 0-97: income observations for men
Data_women = Data.iloc[98:196, :]  # Rows 98-195: income observations for women
Data_men_reind = Data_men.set_index('Kommune')  # Index by municipality
Data_women_reind = Data_women.set_index('Kommune')  # Index by municipality
collist = Data_men_reind.columns.values  # Store column names
indexlist = Data_men_reind.index.values  # Store the municipality index

Calculate the male wage premium in a municipality for a given year (the percentage by which men earn more than women on average):

In [9]:
dif = np.zeros((len(Data_men_reind), len(collist)))  # Pre-allocate an array of zeros
for y in range(len(collist)):
    for x in range(len(Data_men_reind)):
        # Percentage wage premium for men. Rows are matched by position,
        # so both frames must list the municipalities in the same order.
        dif[x, y] = ((Data_men_reind.iloc[x, y] - Data_women_reind.iloc[x, y]) / Data_women_reind.iloc[x, y]) * 100

Data_dif = pd.DataFrame(data=dif, index=indexlist, columns=collist)  # Wide dataframe of the male wage premium by year and municipality
Data_dif = Data_dif.reset_index()
Data_dif.rename(columns={'index': 'Kommune'}, inplace=True)
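The double loop above can be replaced by a single vectorized expression, since pandas aligns two frames on their index and column labels. The sketch below uses made-up numbers and assumes both frames share the same municipality index and year columns; unlike the loop, it matches rows by label rather than by position.

```python
import pandas as pd

# Made-up incomes for two municipalities and two years
men = pd.DataFrame({"e2016": [200.0, 300.0], "e2017": [220.0, 330.0]}, index=["A", "B"])
women = pd.DataFrame({"e2016": [160.0, 250.0], "e2017": [200.0, 300.0]}, index=["A", "B"])

# Percentage by which men's income exceeds women's, element by element
premium = (men - women) / women * 100

# Turn the municipality index back into an ordinary column
premium = premium.rename_axis("Kommune").reset_index()
print(premium)
```

For municipality A in 2016 this gives (200 - 160) / 160 * 100 = 25 percent.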

### Municipal Election Data

Data is obtained from Danmarks Statistik (DST) ref. no. VALGK3.
We extracted the number of women and men who ran for municipal offices in every municipality for each party in the elections of 2005, 2009, 2013 and 2017.

In [10]:
filnavn = 'VALGK3.xlsx'
Data_valg = pd.read_excel(filnavn,skiprows=2)

In [11]:
Data_valg=Data_valg.rename(columns={"Unnamed: 0": "Køn", "Unnamed: 1": "Parti", "Unnamed: 2": "Kommune"})

In [12]:
Data_valg = Data_valg.ffill()  # Forward-fill missing values (e.g. labels left blank in the spreadsheet)
Data_valg.sort_values(by=['Kommune', 'Køn'], inplace=True)  # Sorts in place; returns None, so do not assign the result
Data_valg = Data_valg.groupby(['Kommune', 'Køn'], as_index=False).agg({"2005": "sum", "2009": "sum", "2013": "sum", "2017": "sum"})  # Sum over parties within each municipality and sex


Christiansøindex = Data_valg[(Data_valg['Kommune']=="Christiansø")].index # Mark and remove the pseudo-municipality of Christiansø
Data_valg.drop(Christiansøindex, inplace=True)
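To make the forward-fill and aggregation step concrete, here is a small sketch on made-up data. The layout (labels present only on the first row of each block) and the labels `Mænd`/`Kvinder` are assumptions about how the spreadsheet is structured, not something the notebook shows.

```python
import pandas as pd

# Made-up extract: sex and party labels appear only at the start of each block
raw = pd.DataFrame({
    "Køn": ["Mænd", None, None, None, "Kvinder", None, None, None],
    "Parti": ["A", None, "B", None, "A", None, "B", None],
    "Kommune": ["X", "Y", "X", "Y", "X", "Y", "X", "Y"],
    "2017": [5, 3, 4, 2, 2, 1, 3, 2],
})

# Propagate the last seen label downwards into the blank cells
filled = raw.ffill()

# Sum candidates over parties within each municipality and sex
totals = filled.groupby(["Kommune", "Køn"], as_index=False)["2017"].sum()
print(totals)
```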

Get the observations for men and women onto the same row, then reduce to one row per municipality:

In [13]:
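# Within each municipality, shift() copies the first row's value onto the second row.
# This assumes the women's row sorts before the men's row, so that "_k" holds the women's count.
Data_valg["2005_k"] = Data_valg.groupby("Kommune")["2005"].shift()
Data_valg["2009_k"] = Data_valg.groupby("Kommune")["2009"].shift()
Data_valg["2013_k"] = Data_valg.groupby("Kommune")["2013"].shift()
Data_valg["2017_k"] = Data_valg.groupby("Kommune")["2017"].shift()
Naindex = Data_valg[Data_valg["2005_k"].isna()].index  # First row of each municipality has nothing shifted onto it
Data_valg.drop(Naindex, inplace=True)  # Reduces the data to one row per municipality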
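An alternative to the shift-and-drop approach above is to pivot the sex labels into columns, which does not depend on row order. This is a sketch on made-up totals; the labels `Kvinder` and `Mænd` are assumed, and the share computed at the end is only there to show how the pivoted layout would be used.

```python
import pandas as pd

# Made-up candidate totals per municipality and sex
totals = pd.DataFrame({
    "Kommune": ["X", "X", "Y", "Y"],
    "Køn": ["Kvinder", "Mænd", "Kvinder", "Mænd"],
    "2017": [5, 9, 3, 5],
})

# One row per municipality, one column per sex
wide = totals.pivot(index="Kommune", columns="Køn", values="2017")

# Women as a percentage of all candidates
share_women = wide["Kvinder"] / (wide["Kvinder"] + wide["Mænd"]) * 100
print(share_women)
```

Because the columns are addressed by label, a change in sort order cannot silently swap the men's and women's counts.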

Calculate the percentage of all candidates in a given municipality who are women:

In [14]:
# Women's share of all candidates, per municipality, for each election year
Data_valg["2005_ratio"] = Data_valg["2005_k"] / (Data_valg["2005"] + Data_valg["2005_k"]) * 100
Data_valg["2009_ratio"] = Data_valg["2009_k"] / (Data_valg["2009"] + Data_valg["2009_k"]) * 100
Data_valg["2013_ratio"] = Data_valg["2013_k"] / (Data_valg["2013"] + Data_valg["2013_k"]) * 100
Data_valg["2017_ratio"] = Data_valg["2017_k"] / (Data_valg["2017"] + Data_valg["2017_k"]) * 100

# Keep only the municipality name and the ratios
Data_valg = Data_valg.drop(columns=["Køn", "2005", "2009", "2013", "2017", "2005_k", "2009_k", "2013_k", "2017_k"])
Data_valg = Data_valg.rename(columns={"2005_ratio": "2005", "2009_ratio": "2009", "2013_ratio": "2013", "2017_ratio": "2017"})
# Prefix the year columns with an 'e', as for the income data
myDict = {}
for i in range(2005, 2018):
    myDict[str(i)] = f'e{i}'

Data_valg.rename(columns=myDict, inplace=True) # We promise not to rename columns again.. No really we won't!

### Geodata:
Geodata for municipality borders is available through Kortforsyningen (registration is needed).

In [15]:
shapefile = 'KOMMUNE.shp'
#Read the shapefile using Geopandas. We only need the columns FEAT_ID, KOMNAVN and geometry.
gdf = gpd.read_file(shapefile)[['FEAT_ID','KOMNAVN','geometry']]
#Rename the columns.
gdf.columns = ['ID', 'Kommune', 'geometry']

## Figures

### 1. Income across time in a chosen municipality. 

In [16]:
#Plot income for men and women over time in the municipality chosen from the drop-down menu.
def plot_e(Data_men_tall, Data_women_tall, Kommune): 
    """Plot the income of men and women across time for one municipality.
    Args:
    Data_men_tall: Dataframe of income in long format (for men in our example)
    Data_women_tall: Same, for women in our example
    Kommune: Name of the municipality to plot (chosen in the drop-down menu).
    
    Output:
    Plot of income for the chosen municipality.
    """
    
    I = Data_men_tall['Kommune'] == Kommune
    I2 = Data_women_tall['Kommune'] == Kommune
    ax = Data_men_tall.loc[I, :].plot(x='år', y='e', style='-', legend=True, label='Avg. Income Men')
    Data_women_tall.loc[I2, :].plot(ax=ax, x='år', y='e', style='-', label='Avg. Income Women')
    
widgets.interact(plot_e, 
    Data_men_tall = widgets.fixed(Data_men_tall),Data_women_tall = widgets.fixed(Data_women_tall),
    Kommune = widgets.Dropdown(description='Kommune', options=Data_men_tall.Kommune.unique(), value='København')
); 


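NameError: name 'Data_men_tall' is not defined

This error arises because `Data_men_tall` and `Data_women_tall` are only created in the first code cell of section 2 below; run that cell before this one.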

Note that income is nominal. Scrolling through different municipalities, two commonalities are apparent:
<br> 1) A wage drop is generally found around 2009, which coincides with the onset of the recession. Because wages are nominal we would expect them to increase over time; even where they do, real wages may still have decreased.
<br> 2) The gender wage gap is present in all years for all municipalities.
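The distinction between nominal and real wages in point 1 can be illustrated with a short calculation. All numbers below are hypothetical (they are not DST figures), and the price index is an assumed consumer price index with 2008 as base year.

```python
# Hypothetical nominal incomes (kr.) and an assumed price index (2008 = 100)
nominal = {2008: 200_000, 2009: 202_000}
price_index = {2008: 100.0, 2009: 102.0}

# Real income expressed in 2008 prices: deflate by the price index
real = {year: nominal[year] / price_index[year] * 100 for year in nominal}

for year in sorted(real):
    print(year, nominal[year], round(real[year]))
```

In this made-up case nominal income rises by 1 percent while prices rise by 2 percent, so real income falls even though the nominal series increases.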

### 2. Map illustrating differences in wages, (gender) pay gaps and representation across Danish municipalities in 2017

Convert our three dataframes for absolute income, pay gap (%) and municipal board gender composition to long format for 2017 only, and merge them:

In [None]:
##Absolute Income: 

# 1. Reshape the men's and women's tables to long format (one row per municipality and year)
Data_men_tall = pd.wide_to_long(Data_men, stubnames='e', i='Kommune', j='år')
Data_men_tall = Data_men_tall.reset_index()
Data_women_tall = pd.wide_to_long(Data_women, stubnames='e',i='Kommune', j='år')
Data_women_tall = Data_women_tall.reset_index()
# 2.extract 2017 
Data_men_tall_2017 = Data_men_tall[Data_men_tall['år']==2017]
Data_women_tall_2017 = Data_women_tall[Data_women_tall['år']==2017]

# 3. Merge again
Indkomst = pd.merge(Data_men_tall_2017,Data_women_tall_2017,on='Kommune')
## Pay gap:
Data_dif_tall = pd.melt(Data_dif, id_vars=['Kommune'], var_name='år', value_name='Wage')
Data_dif_tall_2017 = Data_dif_tall[Data_dif_tall['år']=='e2017']
Data_dif_tall_2017 = Data_dif_tall_2017.reset_index()

## Candidates in municipal elections:
Data_valg_tall = pd.melt(Data_valg, id_vars=['Kommune'], var_name='år',value_name='andel')
Data_valg_tall_2017 = Data_valg_tall[Data_valg_tall['år']=='e2017']
Data_valg_tall_2017 = Data_valg_tall_2017.reset_index()

## Merging the above dataframes into one:
result = pd.merge(Data_dif_tall_2017,Data_valg_tall_2017,on='Kommune')
result = pd.merge(result,Indkomst, on='Kommune')
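The column names `e_x` and `e_y` used later in the hover tool come from this reshape-and-merge step: `pd.merge` appends `_x` and `_y` to columns that exist in both inputs. The sketch below reproduces this on made-up data; the helper `to_long_2017` is a hypothetical name introduced only for the example.

```python
import pandas as pd

# Made-up wide income tables for men and women
men = pd.DataFrame({"Kommune": ["A", "B"], "e2016": [200, 300], "e2017": [220, 330]})
women = pd.DataFrame({"Kommune": ["A", "B"], "e2016": [160, 250], "e2017": [200, 300]})

def to_long_2017(wide):
    """Reshape e<year> columns to long format and keep only 2017."""
    long = pd.wide_to_long(wide, stubnames="e", i="Kommune", j="år").reset_index()
    return long[long["år"] == 2017]

# Overlapping columns get the default suffixes: _x (left, men) and _y (right, women)
income = pd.merge(to_long_2017(men), to_long_2017(women), on="Kommune")
print(list(income.columns))
```

Passing `suffixes=('_men', '_women')` to `pd.merge` would give more descriptive names, at the cost of having to update the hover tool fields accordingly.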


Plot this data on the map:

In [None]:
#Merge the result dataframe with the GeoDataFrame on the municipality name.
merged = gdf.merge(result, on='Kommune')
#Convert the merged data to GeoJSON, which can describe points, lines
#and polygons and is the format read by Bokeh's GeoJSONDataSource.
merged_json = json.loads(merged.to_json())
#Serialize back to a JSON string.
json_data = json.dumps(merged_json)

#Input GeoJSON source that contains features for plotting.
geosource = GeoJSONDataSource(geojson = json_data)
#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
#Reverse color order so that dark blue marks the largest male wage premium.
palette = palette[::-1]
#Use LinearColorMapper that linearly maps numbers in a range, into a sequence of colors.
color_mapper = LinearColorMapper(palette = palette, low = 10, high = 30)
#Create color bar. 
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=4,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal')
color_bar.title="Male Wage Premium, %"
#Create hovertool, with the different variables and descriptions.
hover = HoverTool(tooltips = [ ('Municipality','@Kommune'),('Women running for city board, pct.', '@andel %'),
                              ('Wage premium for men, pct.','@Wage %'),('Income men','@e_x kr.'),('Income women','@e_y kr.')])

#Create the figure object. Note: Bokeh 3.x renamed plot_height/plot_width to height/width.
p = figure(title = 'Income in Denmark 2017', plot_height = 730 , plot_width = 950, toolbar_location = None, tools=[hover])
#Might not need all of these commands
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

#Remove the geo-coordinates on the x and y.axis.
p.xaxis.major_label_text_color = None  
p.yaxis.major_label_text_color = None 
p.xaxis.major_label_text_font_size = '0pt'  
p.yaxis.major_label_text_font_size = '0pt'

#Add patch renderer to figure, and specify by which variable the map should be colored in relation to.
p.patches('xs','ys', source = geosource,fill_color = {'field' :'Wage', 'transform' : color_mapper},
          line_color = 'black', line_width = 0.25, fill_alpha = 1)
#Specify figure layout.
p.add_layout(color_bar, 'below')

#Display figure inline in Jupyter Notebook.
output_notebook()
#Display figure.
show(p)
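The colour scale in the cell above spans 10 to 30 percent with 8 colours. The NumPy sketch below mimics linear binning over that range to show which palette slot a given wage premium would land in. It is not Bokeh's implementation; it assumes that values outside the range are clamped to the end colours, which is our understanding of the mapper's default behaviour. The function name `linear_bin` is hypothetical.

```python
import numpy as np

def linear_bin(values, low, high, n_colors):
    """Palette index per value under linear binning, clamped to the end bins."""
    values = np.asarray(values, dtype=float)
    scaled = (values - low) / (high - low) * n_colors
    return np.clip(np.floor(scaled), 0, n_colors - 1).astype(int)

# 67 is the Gentofte premium mentioned in section 3; the other values are illustrative
idx = linear_bin([5, 10, 20, 29.9, 30, 67], low=10, high=30, n_colors=8)
print(idx)
```

Under this assumption a premium of 67 percent is drawn in the same colour as one of 30 percent, so the map alone cannot distinguish the extreme municipalities from those just at the top of the scale.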


The map shows larger wage gaps in south-western Jutland and in a few select municipalities on Zealand. Hovering over these municipalities shows that female representation in municipal boards varies considerably between them.


### 3. Scatterplot of municipal wage gap against municipal gender representation in 2017

One hypothesis is that women select into public-sector jobs more readily when gender pay gaps are large, since public-sector wages are regulated more tightly than private-sector wages, where employers can more easily influence (and potentially bias) pay.

In [None]:
#create scatterplot of the income differences against fraction of women running for municipal board

ax = result.plot.scatter(x='andel' , y='Wage',c='DarkBlue')
plt.title('Correlation between wage gap and women running for municipal board')
plt.xlabel('Share of women running for municipal board')
plt.ylabel('Wage gap')
plt.show()
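To put a number on the relationship shown in the scatterplot, one could compute a correlation coefficient and a fitted line. The sketch below does this on made-up data with the same column names as `result`; it says nothing about the actual 2017 figures.

```python
import numpy as np
import pandas as pd

# Made-up shares of female candidates (andel) and wage premiums (Wage)
toy = pd.DataFrame({"andel": [25.0, 30.0, 35.0, 40.0], "Wage": [20.0, 18.0, 22.0, 19.0]})

# Pearson correlation between the two columns
r = toy["andel"].corr(toy["Wage"])

# Least-squares line Wage = slope * andel + intercept
slope, intercept = np.polyfit(toy["andel"], toy["Wage"], deg=1)
print(round(r, 3), round(slope, 3))
```

With only 98 municipalities and a few extreme values, a rank-based measure such as `toy["andel"].corr(toy["Wage"], method="spearman")` may be less sensitive to outliers.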

There is no clear support for the hypothesis that a higher wage gap correlates with more female candidates for municipal boards.
The plot reveals notable outliers in the wage gap, namely Gentofte (67%), Hørsholm (65.8%) and Rudersdal (57%). These are three small and very rich municipalities in Northern Zealand that most likely differ from the rest in a number of ways. Possibly very rich men select into these municipalities, which may allow their spouses to reduce their formal labor supply and instead produce in the household or in other informal markets (e.g. the husband's firm, charity etc.)*.

*This is a prejudice, with no factual support presented or sought.