In [1]:
import numpy as np
import pandas as pd
import folium as fol
import matplotlib.pyplot as plt
import json
import re
%matplotlib inline

# 1. European Unemployment Rates

The data comes from the Eurostat website. The selected dataset contains the annual unemployment rate for European countries from 1995 to 2016 and can be found at http://ec.europa.eu/eurostat/web/products-datasets/-/tipsun20.

In [2]:
""" Read Europe unemployment rates from the Eurostat website
    Url: http://ec.europa.eu/eurostat/tgm/table.do?tab=table&init=1&plugin=1&pcode=tipsun20&language=en
"""

europe_unemployment_file = "data/tipsun20.tsv"
with open("topojson/europe.topojson.json") as f:
    europe_topo = json.load(f)


# Read from tsv file
eu_data = pd.read_csv(europe_unemployment_file, sep='\t', header=0, na_values=": ")

# Use age and country code as indexes (coming from 1st column)
new_cols = eu_data[eu_data.columns[0]].str.split(',', n=3, expand=True)
eu_data['CC'] = new_cols[3]
eu_data.index = [new_cols[1], new_cols[3]]
eu_data.index.names = ["Age", "CC"]

# Drop the no-longer-needed column
eu_data.drop(eu_data.columns[0], axis=1, inplace=True)

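def try_to_int(col_name):
    # Year headers become ints; other headers are left unchanged
    try:
        return int(col_name)
    except (TypeError, ValueError):
        return col_name

def try_to_float(cell):
    # NaN never compares equal to itself, so test with pd.isnull
    if pd.isnull(cell):
        return cell
    try:
        # Strip flag letters such as "b" or "i" before converting
        return float(re.sub(r'[a-z ]+', "", cell))
    except (TypeError, ValueError):
        return cell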

# Convert the years as ints
eu_data.columns = eu_data.columns.map(try_to_int)
years = [ c for c in eu_data.columns if isinstance(c, int) ]

# Remove the "i" (or "b") present in certain cells
eu_data[years] = eu_data[years].applymap(try_to_float)
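
The cleaning step above strips flag letters (such as `b` or `i`) from cells before converting them to numbers. Here is a minimal sketch of the same idea on a made-up series. It assumes that flags are lowercase letters following the value and that `:` marks a missing value; the name `strip_flag` is ours, not part of the notebook.

```python
import re

import numpy as np
import pandas as pd

# Made-up cells in the style of a flagged TSV export (not real data).
raw = pd.Series(["7.5 ", "10.2 b", "4.1 i", ": ", np.nan])

def strip_flag(cell):
    """Return the numeric part of a cell, or NaN when nothing numeric is left."""
    if pd.isnull(cell):
        return np.nan
    cleaned = re.sub(r"[a-z: ]+", "", str(cell))
    return float(cleaned) if cleaned else np.nan

clean = raw.map(strip_flag)
```

Returning NaN for empty cells keeps the column numeric, so a later `notnull()` filter can drop countries without data.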

On the next map, the opacity of each layer is set to 1, so the last layer enabled is the one displayed. This makes it easy to compare the age categories of our source dataset (<25, 25-74, everyone). We used a quantile classification to get a better feel for the distribution of the rates. The quantile scale is computed from the category covering everyone and shared by all layers, so shades are directly comparable: if a country appears lighter in the 25-74 layer than in the <25 layer, the 25-74 group suffers less from unemployment there.

One use case where our map shines is youth unemployment: hide the 25-74 layer and toggle the <25 layer on and off. Young people clearly suffer a much higher unemployment rate in nearly every European country we have data for.
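
The quantile classification described above can be sketched as follows. The rates are made-up placeholder values, not Eurostat figures; only the quantile levels are taken from the notebook. Using `pd.cut` to assign classes is our illustration of what a shared scale implies, not necessarily how folium bins values internally.

```python
import pandas as pd

# Placeholder unemployment rates (percent) for the reference category; not real data.
total_rates = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0])

# Same quantile levels as the map: half of the countries fall in the first class,
# the remaining classes split the upper half more finely.
levels = [0, 0.5, 0.75, 0.85, 0.9, 1]
qscale = total_rates.quantile(levels).tolist()

# Assign values of another (also made-up) category to classes of the shared scale.
youth_rates = pd.Series([5.0, 9.0, 21.0])
classes = pd.cut(youth_rates, bins=qscale, labels=False, include_lowest=True)
```

Because the bins come from the reference category only, a value from another category landing in a higher class means it is high relative to the overall distribution.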

In [3]:
prague_coordinates = [50.0595854, 14.3255415] # around the center of europe
m = fol.Map(
    location=prague_coordinates,
    tiles='Mapbox Bright',
    zoom_start=4
)

age_categories = [('Y_LT25', "people less than 25yo"), 
                 ('Y25-74', "people between 25 and 74yo"), 
                 ('TOTAL', 'everyone')]

for year in years[-1:]: # Only display last year, but could extend to multiple years
    year_data = eu_data[eu_data[year].notnull()] # Strip countries where we don't have data for this year
    qscale = year_data.loc["TOTAL"][year].quantile([0, 0.5, 0.75, 0.85, 0.9, 1]).tolist()
    
    for age, description in age_categories:
        
        m.choropleth(
            geo_data = europe_topo, topojson = 'objects.europe', data = year_data.loc[age],
            legend_name = 'Unemployment rate for {} in {}'.format(description, year),
            threshold_scale = qscale, name = "{}, {}".format(description, year), 
            key_on = 'feature.id', columns = ['CC', year],
            fill_color = 'YlOrRd', fill_opacity = 1, line_opacity = 1, line_color = 'black'
        )

fol.LayerControl().add_to(m)

m

Switzerland's unemployment rate of about 3% (unfortunately not in our dataset, but found at https://tradingeconomics.com/switzerland/unemployment-rate) is really low compared to the rest of Europe. But is it really that low? As the next section shows, that is not the end of the story.

# 2. Swiss Job Seekers

We retrieved the latest unemployment statistics, for September 2017, from the Amstat website. Since the Swiss unemployment rate counts both unemployed people and people seeking a new job, we rename "Unemployment rate" to "Jobseeker rate". We then compute the real "Unemployment rate" with a scaling rule: the jobseeker rate multiplied by the ratio of unemployed people to job seekers. We keep the same scales for both maps to simplify comparison of the values.
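
As a sketch of the scaling rule, here is the computation for Zürich, using the figures from the table printed by the loading cell. The function name is ours.

```python
def unemployment_rate(jobseeker_rate, unemployed, jobseekers):
    """Scale the job-seeker rate down to the share of registered unemployed."""
    return jobseeker_rate * unemployed / jobseekers

# Zürich, September 2017: rate 3.3 %, 27225 unemployed out of 34156 job seekers.
zh_rate = unemployment_rate(3.3, 27225, 34156)
print(round(zh_rate, 2))  # 2.63
```

This assumes both rates share the same denominator (the active population), so only the numerator needs rescaling.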

We also extract the canton abbreviations from the TopoJSON so that `geo_data` and `data` share a common key.
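
A minimal sketch of that extraction, on a hand-written stand-in for the TopoJSON structure (the real file holds one geometry per canton, plus arcs and a transform that are omitted here):

```python
# Stand-in for the parsed TopoJSON file; only the fields needed for the key are kept.
topo = {
    "type": "Topology",
    "objects": {
        "cantons": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "id": "ZH", "arcs": []},
                {"type": "Polygon", "id": "BE", "arcs": []},
            ],
        }
    },
}

# Collect the ids in file order; these are the values matched by key_on='feature.id'.
canton_ids = [geometry["id"] for geometry in topo["objects"]["cantons"]["geometries"]]
```

Assigning these ids to the spreadsheet rows by position only works if the spreadsheet lists the cantons in the same order as the geometries, which is an assumption worth checking.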

For display, we provide the option to select the scale. Note that both layers are selected by default.

In [4]:
with open('topojson/ch-cantons.topojson.json') as f:
    swiss_topo = json.load(f)
cantons_abbr = list(map(lambda c: c['id'], swiss_topo['objects']['cantons']['geometries']))

jobseeker = pd.read_excel('data/ch-unemployment-rate.xlsx', skiprows = 1)
jobseeker = jobseeker.drop(['Mesures', 'Taux de chômage.1', 'Coefficients de variation.1', 'Chômeurs inscrits.1', 'Demandeurs d\'emploi.1'], axis = 1)
jobseeker = jobseeker.drop(26) # Drop total
jobseeker.columns = ['Canton', 'Jobseeker rate', 'Variation coefficient', 'Unemployed', 'Jobseeker']
jobseeker['Canton'] = cantons_abbr
jobseeker['Unemployment rate'] = jobseeker['Unemployed'] * jobseeker['Jobseeker rate'] / jobseeker['Jobseeker']
jobseeker

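| | Canton | Jobseeker rate | Variation coefficient | Unemployed | Jobseeker | Unemployment rate |
|---|---|---|---|---|---|---|
| 0 | ZH | 3.3 | A | 27225 | 34156 | 2.630358 |
| 1 | BE | 2.4 | A | 13658 | 18385 | 1.782932 |
| 2 | LU | 1.7 | A | 3885 | 6756 | 0.977575 |
| 3 | UR | 0.6 | C | 112 | 257 | 0.261479 |
| 4 | SZ | 1.7 | A | 1455 | 2229 | 1.10969 |
| 5 | OW | 0.7 | B | 153 | 319 | 0.335737 |
| 6 | NW | 1.0 | B | 248 | 436 | 0.568807 |
| 7 | GL | 1.8 | B | 416 | 713 | 1.05021 |
| 8 | ZG | 2.3 | B | 1543 | 2615 | 1.357132 |
| 9 | FR | 2.7 | A | 4466 | 7837 | 1.538624 |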


In [5]:
swiss_map = fol.Map(location = [46.8, 8.33], tiles = 'Mapbox Bright', zoom_start = 7)

linscale = np.linspace(jobseeker['Jobseeker rate'].min(), jobseeker['Jobseeker rate'].max(), 6).tolist()
swiss_map.choropleth(
    geo_data = swiss_topo, topojson = "objects.cantons",
    legend_name = 'Jobseeker rate of september 2017 (linear)', name = 'linear scale',
    data = jobseeker, columns = ('Canton', 'Jobseeker rate'), key_on = 'feature.id',
    fill_color = 'YlOrRd', threshold_scale = linscale)

qscale = jobseeker['Jobseeker rate'].quantile([0, 0.5, 0.75, 0.85, 0.9, 0.99]).tolist()
swiss_map.choropleth(
    geo_data = swiss_topo, topojson = "objects.cantons",
    legend_name = 'Jobseeker rate of september 2017 (quantiles)', name = 'quantiles',
    data = jobseeker, columns = ('Canton', 'Jobseeker rate'), key_on = 'feature.id',
    fill_color = 'YlOrRd', threshold_scale = qscale)

fol.LayerControl(collapsed=False).add_to(swiss_map)

swiss_map

This second map ignores people who are employed but seeking a new job. Unemployment is much lower than announced, but its distribution across the cantons stays the same.

In [6]:
swiss_map = fol.Map(location = [46.8, 8.33], tiles = 'Mapbox Bright', zoom_start = 7)
swiss_map.choropleth(
    geo_data = swiss_topo, topojson = "objects.cantons",
    legend_name = 'Unemployment rate of september 2017 (linear)', name = 'linear scale',
    data = jobseeker, columns = ('Canton', 'Unemployment rate'), key_on = 'feature.id',
    fill_color = 'YlOrRd', threshold_scale = linscale)

swiss_map.choropleth(
    geo_data = swiss_topo, topojson = "objects.cantons",
    legend_name = 'Unemployment rate of september 2017 (quantiles)', name = 'quantiles',
    data = jobseeker, columns = ('Canton', 'Unemployment rate'), key_on = 'feature.id',
    fill_color = 'YlOrRd', threshold_scale = qscale)

fol.LayerControl(collapsed=False).add_to(swiss_map)

swiss_map

# Swiss/Foreigner Unemployment

We selected the same September 2017 statistics as before, this time choosing "Nationalities" under "Other attributes". We then simply group the rows by nationality to plot the results. We again compute a single scale shared by the Swiss and foreigner layers so the data can be compared more easily. We also provide a second map that uses a quantile scale instead of a linear one.

In [7]:
from itertools import repeat

unemployed = pd.read_excel('data/ch-foreigner-unemployment-rate.xlsx', skiprows = 1)
unemployed = unemployed.drop(['Mesures', 'Taux de chômage.1', 'Coefficients de variation.1', 'Chômeurs inscrits.1', 'Demandeurs d\'emploi.1'], axis = 1)
unemployed = unemployed.drop(26) # Drop total
unemployed.columns = ['Canton', 'Nationality', 'Jobseeker rate', 'Variation coefficient', 'Unemployed', 'Jobseeker']
unemployed['Canton'] = [c for canton in cantons_abbr for c in repeat(canton, 2)]
unemployed.at[52, 'Nationality'] = 'Etrangers'  # DataFrame.set_value was removed in pandas 1.0
unemployed

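| | Canton | Nationality | Jobseeker rate | Variation coefficient | Unemployed | Jobseeker |
|---|---|---|---|---|---|---|
| 0 | ZH | Etrangers | 5.3 | A | 12111 | 15384 |
| 1 | ZH | Suisses | 2.5 | A | 15114 | 18772 |
| 2 | BE | Etrangers | 5.5 | A | 4900 | 6859 |
| 3 | BE | Suisses | 1.8 | A | 8758 | 11526 |
| 4 | LU | Etrangers | 3.9 | B | 1593 | 2902 |
| 5 | LU | Suisses | 1.3 | A | 2292 | 3854 |
| 6 | UR | Etrangers | 2.1 | D | 53 | 130 |
| 7 | UR | Suisses | 0.4 | C | 59 | 127 |
| 8 | SZ | Etrangers | 3.4 | C | 617 | 1027 |
| 9 | SZ | Suisses | 1.2 | B | 838 | 1202 |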


## Swiss/Foreigner unemployment rate (linear scale)

In [8]:
swiss_map = fol.Map(location = [46.8, 8.33], tiles = 'Mapbox Bright', zoom_start = 7)

linscale = np.linspace(unemployed['Jobseeker rate'].min(), unemployed['Jobseeker rate'].max(), 6).tolist()
for group, data in unemployed.groupby('Nationality'):
    swiss_map.choropleth(
        geo_data = swiss_topo, topojson = "objects.cantons",
        legend_name = 'Unemployment rate of september 2017 ({})'.format(group), name = group,
        data = data, columns = ('Canton', 'Jobseeker rate'), key_on = 'feature.id',
        fill_color = 'YlOrRd', threshold_scale = linscale)
   
fol.LayerControl(collapsed=False).add_to(swiss_map)

swiss_map

## Swiss/Foreigner unemployment rate (quantile scale)

In [9]:
swiss_map = fol.Map(location = [46.8, 8.33], tiles = 'Mapbox Bright', zoom_start = 7)

qscale = unemployed['Jobseeker rate'].quantile([0, 0.5, 0.75, 0.85, 0.9, 0.99]).tolist()
for group, data in unemployed.groupby('Nationality'):
    swiss_map.choropleth(
        geo_data = swiss_topo, topojson = "objects.cantons",
        legend_name = 'Unemployment rate of september 2017 ({})'.format(group), name = group,
        data = data, columns = ('Canton', 'Jobseeker rate'), key_on = 'feature.id',
        fill_color = 'YlOrRd', threshold_scale = qscale)
   
fol.LayerControl(collapsed=False).add_to(swiss_map)

swiss_map

## Difference in unemployment

To better understand the gap between Swiss and foreign unemployment, we plot the difference on the following map. Valais and Grisons are the cantons where foreign unemployment exceeds that of Swiss nationals by the largest margin. We suspect discrimination may play a part, although the data shown here cannot confirm it.

Genève and Tessin show a large difference too. Both are known for receiving many workers from neighbouring countries and now prioritize Swiss people over foreigners, which may explain the gap.
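
The difference is the foreign rate minus the Swiss rate in each canton. As an alternative sketch to subtracting two arrays by position, pivoting the table aligns the two groups by canton label, which stays correct even if the rows are reordered. The sample reuses the first rows of the table shown earlier.

```python
import pandas as pd

# First rows of the nationality table (job-seeker rate in percent).
sample = pd.DataFrame({
    "Canton": ["ZH", "ZH", "BE", "BE", "LU", "LU"],
    "Nationality": ["Etrangers", "Suisses"] * 3,
    "Jobseeker rate": [5.3, 2.5, 5.5, 1.8, 3.9, 1.3],
})

# One row per canton, one column per nationality, so rows are aligned by label.
wide = sample.pivot(index="Canton", columns="Nationality", values="Jobseeker rate")
wide["Delta"] = wide["Etrangers"] - wide["Suisses"]

# Back to a two-column frame usable as choropleth data.
deltas = wide["Delta"].reset_index()
```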

In [10]:
groups = dict()
for group, data in unemployed.groupby('Nationality'):
    groups[group] = data
    
deltas = np.array(groups['Etrangers']['Jobseeker rate']) - np.array(groups['Suisses']['Jobseeker rate'])
deltas = pd.DataFrame({ 'Canton': cantons_abbr, 'Delta': deltas.tolist()})

swiss_map = fol.Map(location = [46.8, 8.33], tiles = 'Mapbox Bright', zoom_start = 7)
swiss_map.choropleth(
    geo_data = swiss_topo, topojson = "objects.cantons",
    legend_name = 'Difference between foreign and swiss unemployment',
    data = deltas, columns = ('Canton', 'Delta'), key_on = 'feature.id',
    fill_color = 'YlOrRd')
swiss_map

# Bonus

Regarding the Röstigraben: with the quantile scale of the jobseeker rate (the first map), we clearly see that the Latin cantons have a higher jobseeker rate than the German-speaking ones. The Latin cantons average around 4.3%, whereas the German-speaking ones are closer to 2.7%.
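
A sketch of how such regional averages could be computed. It uses only the ten cantons visible in the table printed earlier, so it does not reproduce the figures quoted above. The assignment of cantons to a language region is our assumption, and the mean is unweighted.

```python
import pandas as pd

# First ten cantons from the job-seeker table (rate in percent).
rates = pd.DataFrame({
    "Canton": ["ZH", "BE", "LU", "UR", "SZ", "OW", "NW", "GL", "ZG", "FR"],
    "Jobseeker rate": [3.3, 2.4, 1.7, 0.6, 1.7, 0.7, 1.0, 1.8, 2.3, 2.7],
})

# Assumed grouping by majority language; bilingual cantons (BE, FR, VS) are
# assigned to their majority language, which is a simplification.
latin_cantons = {"FR", "VD", "GE", "NE", "JU", "VS", "TI"}
rates["Region"] = rates["Canton"].map(lambda c: "Latin" if c in latin_cantons else "German")

# Unweighted mean of the cantonal rates per region.
region_means = rates.groupby("Region")["Jobseeker rate"].mean()
```

Weighting each canton by its number of job seekers would give a population-level rate rather than an average of cantonal rates; which one is more appropriate depends on the question asked.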