## Exercise 2

Go to the [amstat](https://www.amstat.ch) website to find a dataset that includes the unemployment rates in Switzerland at a recent date.

   > *HINT* Go to the `details` tab to find the raw data you need. If you do not speak French, German or Italian, think of using free translation services to navigate your way through. 

   Use this data to build another Choropleth map, this time showing the unemployment rate at the level of Swiss cantons. Again, try to make the map as expressive as possible, and comment on the trends you observe.

   The Swiss Confederation defines the rates you have just plotted as the number of people looking for a job divided by the size of the active population (scaled by 100). This is surely a valid choice, but as we discussed one could argue for a different categorization.

   Copy the map you have just created, but this time don't count in your statistics people who already have a job and are looking for a new one. How do your observations change? You can repeat this with different choices of categories to see how selecting different metrics can lead to different interpretations of the same data.


# Used libraries

In [2]:
import os
import pandas as pd
import folium
import numpy as np
import jenkspy

from IPython.display import display

# Understanding the data

To solve the task we have to build two Choropleth maps presenting two different interpretations of the unemployment rate:
* The first map should show the rate of all people registered as jobseekers in each canton

* The second map should exclude from the statistics the people who are already employed

Amstat provides the rate of unemployment defined as number of **chômeurs inscrits** divided by the **personnes actives**, as can be seen in the image below captured from the [definitions section of the amstat site](https://www.amstat.ch/v2/definition.jsp?lang=fr) (mouse hover for English translation - courtesy of Google Translate)

![](images/taux_de_chomage.png "Number of registered unemployed at the reference day (last day of the month) divided by the number of active persons, multiplied by 100. The number of active persons is recorded each year by the Federal Statistical Office as part of the Structural Survey ( census of the population). Since January 1, 2014, it has risen to 4'493'249 according to the three-year pooling of data collected in the framework of the 2012-2014 structural surveys. Active persons used by SECO also includes diplomats and international civil servants domiciled in Switzerland.")

Looking further into the definition of **chômeurs inscrits**, we find that it represents the number of jobseekers who are unemployed. Again, courtesy of Google Translate, mouse over for the English translation.

![](images/chomeurs_inscrits.png "Persons registered at regional employment agencies, who are unemployed and immediately available for placement. It does not matter whether they are receiving unemployment benefits or not.")


Consulting the definitions on amstat, we find that **personnes actives** (active population) consists of the employed and unemployed individuals. Here we encounter the first limitation of the dataset: we do not know more details about the active population, for example its age range.

![](images/personnes_actives.png "Employed persons (at least one hour per week) or unemployed.The unemployment rate is calculated by taking the number of active persons as the denominator. Breakdown by regions, cantons, nationalities, age groups and by sex, the number of active persons influences various tables of SECO's labor market statistics. Exception: For the economic branches, the unemployment rate is not calculated on the basis of the number of active persons, but of the number of active persons employed.
Since 2010, the Federal Statistical Office (FSO) has been counting the number of active persons per year in the framework of the Structural Survey on the active life of the population.
The completion of an annual structural survey makes it possible to cumulate the results over a period of several years (pooling). The advantage of this way of proceeding is to have wider data
in the field of active persons. Since 1 January 2014, SECO has no longer calculated the unemployment rate on the basis of the number of active persons dating from 2010, but on the
of their number determined in the context of pooling over the period 2012 to 2014 based on data from the structural survey.
Using the sampling method allows a more regular adjustment of the denominator of the unemployment rate than the method previously used based on the population census
(exhaustive survey carried out every ten years). The number of active persons on which SECO is based also includes diplomats and international civil servants residing in Switzerland.
(Before 31 December 1999, the unemployment rate was calculated on the basis of the number of persons engaged in gainful employment of at least six hours per week, which is no longer available).")

We can then remark that the published unemployment rate is exactly the quantity needed for the second map: it takes into consideration only jobseekers who currently do not have a job.

For the first map, amstat does not provide the necessary rates, so we compute them from the published unemployment rate and the jobseeker counts to which we have access.

Again, from the definitions section, we can see that **Demandeurs d'emploi** represents the number of jobseekers, regardless of whether they are employed or not. This is why, in the next section, we select this indicator when exporting the data.

![](images/demandeus_d_emploi_inscrits.png "All job seekers, unemployed and non-unemployed, who are registered with regional employment agencies and are looking for work.") 

# Getting the data

As indicated in the statement, we head over to [amstat (fr)](https://www.amstat.ch) to get the data. 

In the Details section we are presented with multiple options, of which **Chômeurs et demandeurs d'emploi** (unemployed and jobseekers) is the one of interest here. Within it, the section **Taux de chômage** (unemployment rate) gives access to the data we want, together with several indicators that can be selected for the export. Among these indicators, the following are of interest to us:

> **Taux de chômage** (unemployment rate), computed as **chômeurs inscrits** **/** **personnes actives**, scaled by 100. This is the key statistic for the second map
>
> **Demandeurs d'emploi** (jobseekers), necessary for computing the rate for the first map
>
> **Chômeurs inscrits** (unemployed jobseekers), necessary for computing the rate for the first map as described below

We choose to get the unemployment data only for **September 2017**, as we want to focus on the current status of Switzerland. The limitation of this decision is that we have no yearly context in which to spot, for example, seasonal trends, so we cannot make general statements about the unemployment rate from this data; our observations hold only for that month.

Several indicators can be selected when choosing what data to export. For the purposes of our task, the selection looks like this:

![](images/indicators_1.png)

<a id='computing_data_first_map'></a>
## Computing data for the first map

By definition,

>taux de chômage = chômeurs inscrits / personnes actives * 100

For the first map, we want the following new rate

>new taux de chômage = demandeurs d'emploi / personnes actives * 100

which can be obtained from the first one, without knowing personnes actives, by substituting personnes actives = chômeurs inscrits * 100 / taux de chômage:

>new taux de chômage = (demandeurs d'emploi)\*(taux de chômage) / (chômeurs inscrits)
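The substitution behind this formula can be checked numerically. The sketch below uses the Zurich figures that appear in the exported table later in this notebook (rate 3.3, 27225 chômeurs inscrits, 34156 demandeurs d'emploi). The variable names are ours and only illustrate the arithmetic. Since the published rate is rounded to one decimal, both the implied active population and the derived rate are approximations.

```python
# Zurich, September 2017 (values taken from the amstat export)
taux_de_chomage = 3.3        # chômeurs inscrits / personnes actives * 100
chomeurs_inscrits = 27225
demandeurs_emploi = 34156

# Active population implied by the published (rounded) rate
personnes_actives = chomeurs_inscrits / taux_de_chomage * 100

# Jobseeker rate computed directly from the implied denominator ...
direct = demandeurs_emploi / personnes_actives * 100
# ... and via the shortcut formula, which never needs the denominator
shortcut = demandeurs_emploi * taux_de_chomage / chomeurs_inscrits

print(round(direct, 2), round(shortcut, 2))
```

Both expressions give the same value up to floating point error, which is why the notebook can work with the shortcut alone.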

# Working with the data


There are multiple things we can notice when glancing at the data:
* the first row contains additional information about some of the columns, so we will want to incorporate this information in the column names

* the column Mois seems to have only NaN values, so we will want to drop it as it brings no information 

In [3]:
data = pd.read_excel('2_1 Taux de chômage Septembre.xlsx')

In [4]:
data.head()

Unnamed: 0,Canton,Mois,Septembre 2017,Septembre 2017.1,Septembre 2017.2,Septembre 2017.3,Septembre 2017.4,Total,Total.1,Total.2,Total.3,Total.4
0,,Mesures,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs
1,Zurich,,3.3,A,27225,34156,6931,3.3,A,27225,34156,6931
2,Berne,,2.4,A,13658,18385,4727,2.4,A,13658,18385,4727
3,Lucerne,,1.7,A,3885,6756,2871,1.7,A,3885,6756,2871
4,Uri,,0.6,C,112,257,145,0.6,C,112,257,145


## Inspecting the data

As presumed, apart from the first row (which does not contain actual unemployment data), all values in the column *Mois* are NaN, so we decide to drop it.

In [5]:
data[1:]['Mois'].any()

False

In [6]:
data = data.drop('Mois', axis=1)
data.head()

Unnamed: 0,Canton,Septembre 2017,Septembre 2017.1,Septembre 2017.2,Septembre 2017.3,Septembre 2017.4,Total,Total.1,Total.2,Total.3,Total.4
0,,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs
1,Zurich,3.3,A,27225,34156,6931,3.3,A,27225,34156,6931
2,Berne,2.4,A,13658,18385,4727,2.4,A,13658,18385,4727
3,Lucerne,1.7,A,3885,6756,2871,1.7,A,3885,6756,2871
4,Uri,0.6,C,112,257,145,0.6,C,112,257,145


## Creating meaningful columns 

Since the first row holds descriptive information (as exported by the amstat site), we transform the dataframe by creating hierarchical columns where the first level is the month and the second level is the additional information from the first row.

For this purpose, we create a separate DataFrame with two columns: the month and the corresponding information. We will use this dataframe to create a MultiIndex which will be set as columns.

We use a regular expression to extract the month and the year from the column names or the word *Total* for the last columns.

**Note**:
A problematic case is the Canton column, where the first row has no additional information. We solve this by setting this column as the index, processing the dataframe, and afterwards resetting the index so that Canton is again a column, as needed for the Choropleth map.
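To make the regular expression step concrete, here is a minimal sketch using only the standard `re` module on an abridged list of column names. The lists `raw_columns` and `first_row` are shortened, illustrative copies of what the export contains, not the full header.

```python
import re

# Column names as pandas reads them from the export (duplicates get a .N suffix)
raw_columns = ['Septembre 2017', 'Septembre 2017.1', 'Septembre 2017.2',
               'Total', 'Total.1']
# Matching labels found in the first data row (abridged)
first_row = ['Taux de chômage', 'Coefficients de variation', 'Chômeurs inscrits',
             'Taux de chômage', 'Coefficients de variation']

# "<month> <year>" or the literal word Total; the .N suffix is left out of the match
pattern = re.compile(r'(.* \d+|Total)')

months = [pattern.match(name).group(1) for name in raw_columns]

# Pairs of (first level, second level) for the hierarchical columns
column_tuples = list(zip(months, first_row))
print(column_tuples)
```

Pairs of this shape are what the two arrays passed to `pd.MultiIndex.from_arrays` encode in the cell below.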

In [10]:
canton_indexed_data = data.set_index('Canton')
canton_indexed_data.head()

Unnamed: 0_level_0,Septembre 2017,Septembre 2017.1,Septembre 2017.2,Septembre 2017.3,Septembre 2017.4,Total,Total.1,Total.2,Total.3,Total.4
Canton,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs
Zurich,3.3,A,27225,34156,6931,3.3,A,27225,34156,6931
Berne,2.4,A,13658,18385,4727,2.4,A,13658,18385,4727
Lucerne,1.7,A,3885,6756,2871,1.7,A,3885,6756,2871
Uri,0.6,C,112,257,145,0.6,C,112,257,145


In [11]:
columns_info = canton_indexed_data.iloc[0].values
canton_indexed_data.drop(canton_indexed_data.index[0], inplace=True)

columns_months = canton_indexed_data.columns.str.extract(r'(.* \d+|Total)', expand=True).values.reshape(columns_info.shape)

canton_indexed_data.columns = pd.MultiIndex.from_arrays((columns_months.tolist(), columns_info.tolist()))

display(canton_indexed_data.head())
display(canton_indexed_data.tail())

Unnamed: 0_level_0,Septembre 2017,Septembre 2017,Septembre 2017,Septembre 2017,Septembre 2017,Total,Total,Total,Total,Total
Unnamed: 0_level_1,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs
Canton,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2
Zurich,3.3,A,27225,34156,6931,3.3,A,27225,34156,6931
Berne,2.4,A,13658,18385,4727,2.4,A,13658,18385,4727
Lucerne,1.7,A,3885,6756,2871,1.7,A,3885,6756,2871
Uri,0.6,C,112,257,145,0.6,C,112,257,145
Schwyz,1.7,A,1455,2229,774,1.7,A,1455,2229,774


Unnamed: 0_level_0,Septembre 2017,Septembre 2017,Septembre 2017,Septembre 2017,Septembre 2017,Total,Total,Total,Total,Total
Unnamed: 0_level_1,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs
Canton,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2
Valais,2.8,A,4816,8027,3211,2.8,A,4816,8027,3211
Neuchâtel,5.1,A,4738,6350,1612,5.1,A,4738,6350,1612
Genève,5.2,A,12234,15497,3263,5.2,A,12234,15497,3263
Jura,4.4,B,1619,2375,756,4.4,B,1619,2375,756
Total,3.0,A,133169,193624,60455,3.0,A,133169,193624,60455


We have the DataFrame columns properly formatted. We now reset the index to have the cantons as a column.

Additionally, we skip the last line in the dataframe as it aggregates the information about the cantons, information which we are not using in our choropleth map.

In [12]:
cantons_data = canton_indexed_data.reset_index()[:-1]
cantons_data.head()

Unnamed: 0_level_0,Canton,Septembre 2017,Septembre 2017,Septembre 2017,Septembre 2017,Septembre 2017,Total,Total,Total,Total,Total
Unnamed: 0_level_1,Unnamed: 1_level_1,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs
0,Zurich,3.3,A,27225,34156,6931,3.3,A,27225,34156,6931
1,Berne,2.4,A,13658,18385,4727,2.4,A,13658,18385,4727
2,Lucerne,1.7,A,3885,6756,2871,1.7,A,3885,6756,2871
3,Uri,0.6,C,112,257,145,0.6,C,112,257,145
4,Schwyz,1.7,A,1455,2229,774,1.7,A,1455,2229,774


## Data values types

By calling the info method on the loaded data, we notice that where we would expect the values to be numbers, they are actually objects. This means we have to parse them to numbers in order to use them with the Choropleth map. 

In [13]:
cantons_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 11 columns):
(Canton, )                                            26 non-null object
(Septembre 2017, Taux de chômage)                     26 non-null object
(Septembre 2017, Coefficients de variation)           26 non-null object
(Septembre 2017, Chômeurs inscrits)                   26 non-null object
(Septembre 2017, Demandeurs d'emploi)                 26 non-null object
(Septembre 2017, Demandeurs d'emploi non chômeurs)    26 non-null object
(Total, Taux de chômage)                              26 non-null object
(Total, Coefficients de variation)                    26 non-null object
(Total, Chômeurs inscrits)                            26 non-null object
(Total, Demandeurs d'emploi)                          26 non-null object
(Total, Demandeurs d'emploi non chômeurs)             26 non-null object
dtypes: object(11)
memory usage: 2.3+ KB


We convert only the values corresponding to September, as they are the ones we are working with.

In [23]:
cantons_data[('Septembre 2017', 'Taux de chômage')] = cantons_data[('Septembre 2017', 'Taux de chômage')].astype(float)
cantons_data[('Septembre 2017', 'Demandeurs d\'emploi')] = cantons_data[('Septembre 2017', 'Demandeurs d\'emploi')].astype(float)
cantons_data[('Septembre 2017', 'Chômeurs inscrits')] = cantons_data[('Septembre 2017', 'Chômeurs inscrits')].astype(float)
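`astype(float)` works here because every value parses cleanly; it raises an error as soon as one value cannot be parsed. As a hedged alternative, not what this notebook uses, `pd.to_numeric` with `errors='coerce'` turns unparsable entries into NaN so they can be inspected afterwards. The series below contains made-up values for illustration.

```python
import pandas as pd

# Made-up object-typed values, one of which cannot be parsed as a number
raw = pd.Series(['3.3', '2.4', 'n/a'], dtype=object)

# Unparsable entries become NaN instead of raising an error
parsed = pd.to_numeric(raw, errors='coerce')

print(parsed.dtype)
print(parsed.isna().sum())
```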

## Creating the Choropleth map

To create the Choropleth map we will use the choropleth method of the folium.Map class. One thing to notice here is that, in the method's definition, the geo_data parameter expects a GeoJSON geometry definition. Inspecting the documentation, we can see that the choropleth method also accepts TopoJSON data in this way:

>TopoJSONs can be passed as "geo_data", but the "topojson" keyword must also be passed with the reference to the topojson objects to convert.

Therefore, we create the json object from the topojson file in order to pass it as the geo_data parameter in the choropleth method.

In [24]:
import json

topo_path = os.path.join('topojson', 'ch-cantons.topojson.json')

with open(topo_path) as json_file:
    topo_data = json.load(json_file)

In order to bind data in the Choropleth map we must make sure that the canton names in the dataset match exactly the canton names in the TopoJson.

With this in mind, we create a dataframe to compare side-by-side the canton names from both sources in the following way:

* We extract the names from the TopoJson which are identified in the Json tree with the following path: objects -> cantons -> geometries -> properties -> name 

* From the amstat dataset, we extract the column containing the cantons' names

In [25]:
topo_cantons_names = []

for geometry in topo_data['objects']['cantons']['geometries']:
    topo_cantons_names.append(geometry['properties']['name'])

We have to display the entire dataframe below in order to make sure the values in the two columns are properly aligned, as we will want to use one of the two lists of names.

In [26]:
pd.DataFrame(topo_cantons_names, cantons_data['Canton'])

Unnamed: 0_level_0,0
Canton,Unnamed: 1_level_1
Zurich,Zürich
Berne,Bern/Berne
Lucerne,Luzern
Uri,Uri
Schwyz,Schwyz
Obwald,Obwalden
Nidwald,Nidwalden
Glaris,Glarus
Zoug,Zug
Fribourg,Fribourg


With a little bit of help from Wikipedia, we remark that the values in each row refer to the same canton, but in different languages. Knowing this, we can replace the values in the *Canton* column of the amstat dataset with the ones extracted from the TopoJson.

We do this so that we have no matching problems when we bind the data to the Choropleth map.

In [27]:
cantons_data['Canton'] = topo_cantons_names
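The positional assignment above relies on both lists being in the same order, which is what the side-by-side comparison verified. A more defensive sketch, shown only as an alternative, maps names explicitly so that a reordered or missing canton is caught. The dictionary `fr_to_topo` is a hypothetical helper containing just the pairs visible in the comparison table.

```python
# French names used by amstat -> names used in the TopoJSON
# (pairs copied from the comparison table above; abridged)
fr_to_topo = {
    'Zurich': 'Zürich',
    'Berne': 'Bern/Berne',
    'Lucerne': 'Luzern',
    'Uri': 'Uri',
    'Schwyz': 'Schwyz',
}

# Deliberately not in TopoJSON order, to show that order no longer matters
amstat_names = ['Uri', 'Zurich', 'Berne']

# A canton missing from the mapping raises KeyError instead of being silently mislabeled
topo_names = [fr_to_topo[name] for name in amstat_names]
print(topo_names)
```

With a complete mapping, the same dictionary could be applied to a pandas column through `Series.map`.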

In [28]:
cantons_data.head()

Unnamed: 0_level_0,Canton,Septembre 2017,Septembre 2017,Septembre 2017,Septembre 2017,Septembre 2017,Total,Total,Total,Total,Total
Unnamed: 0_level_1,Unnamed: 1_level_1,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs,Taux de chômage,Coefficients de variation,Chômeurs inscrits,Demandeurs d'emploi,Demandeurs d'emploi non chômeurs
0,Zürich,3.3,A,27225.0,34156.0,6931,3.3,A,27225,34156,6931
1,Bern/Berne,2.4,A,13658.0,18385.0,4727,2.4,A,13658,18385,4727
2,Luzern,1.7,A,3885.0,6756.0,2871,1.7,A,3885,6756,2871
3,Uri,0.6,C,112.0,257.0,145,0.6,C,112,257,145
4,Schwyz,1.7,A,1455.0,2229.0,774,1.7,A,1455,2229,774


## Infer the data for the first map

As explained in the section [Computing data for the first map](#computing_data_first_map), we will now create a new column holding a rate that counts all jobseekers.

In [29]:
cantons_data[('Septembre 2017', 'Taux demandeurs d emploi')] = \
    cantons_data[('Septembre 2017', 'Demandeurs d\'emploi')] * \
    cantons_data[('Septembre 2017', 'Taux de chômage')] /\
    cantons_data[('Septembre 2017', 'Chômeurs inscrits')]

# Exploring visualization differences

We can argue that the unemployment rate can be expressed as the jobseekers divided by the active population. 

This might seem reasonable because the employed people that register as jobseekers might not be content with their current place of work and could quit or might expect to be fired sooner or later.

We now plot this rate in a choropleth map where the classes are chosen to be fixed in order to compare with the next visualizations.

In [30]:
swiss_coord = [46.8827, 8.2178]
legend_scale=[0,2,4.1,6,10]

In [31]:
def add_choropleth_layer(m, column, fill_color, layer_name,
                         legend_name, scale='quantile', nb_classes=3,
                         fixed_thresholds=None):
    '''
    Utility function that allows an easier manipulation of the threshold_scale for the choropleth layer.
    Parameters:
    m: folium map to which the layer is added
    column: specifies the column in the cantons_data that is bound to the map
    fill_color: passed to the choropleth method
    layer_name: passed to the choropleth method as the name of the layer
    legend_name: passed to the choropleth method
    scale: specifies the scale of the choropleth map. Options:
            - quantile: the data is classified in nb_classes of the same approximate size
            - jenks: the data is classified in nb_classes using the Natural Breaks (Jenks) classification
            - fixed: the fixed_thresholds parameter is used to specify the scale
    nb_classes: number of classes for the quantile and jenks scales
    fixed_thresholds: list of class boundaries, required when scale is 'fixed'
    '''
    if scale == 'quantile':
        _, bins = pd.qcut(cantons_data[column], nb_classes, retbins=True)
        bins = list(bins)
    elif scale == 'jenks':
        bins = jenkspy.jenks_breaks(cantons_data[column], nb_classes)
    elif scale == 'fixed':
        if fixed_thresholds is None:
            raise ValueError('Missing fixed thresholds in add choropleth layer')
        bins = fixed_thresholds
    else:
        # Without this branch an unknown scale would fail later with a NameError on bins
        raise ValueError('Unknown scale: {}'.format(scale))
            
    m.choropleth( 
        geo_data=topo_data,
        topojson='objects.cantons',
        data=cantons_data,
        name=layer_name,
        columns=['Canton',column],
        key_on='feature.properties.name',
        fill_color=fill_color,
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name=legend_name,
        threshold_scale=bins
    )

We would like to show additional information about each canton on the map, such as its name, the actual value or some confidence measure. To do this we use popups, and to add them to the map we implement two helper functions:
* *create_canton_topos*: returns one TopoJSON per canton, used to attach a click popup to each canton
* *add_popups*: builds the popup content and binds it to the canton and the map

In [32]:
import copy

def create_canton_topos(originalTopoJSON):
    canton = []
    for geometry in originalTopoJSON["objects"]["cantons"]["geometries"]:
        tmp_topo = copy.deepcopy(originalTopoJSON)
        tmp_topo["objects"]["cantons"]["geometries"]=[geometry]
        
        canton.append(tmp_topo)
              
    return canton

cantons_topo = create_canton_topos(topo_data)

In [33]:
import branca 

def add_range_coeff_var(coeff):
    '''
    Adds explicit values for the variation coefficients
    '''
    if coeff == 'A':
        return ' '.join([coeff, '(0 to 1%)'])
    if coeff == 'B':
        return ' '.join([coeff, '(1.1 to 2%)'])
    if coeff == 'C':
        return ' '.join([coeff, '(2.1 to 5%)'])
    if coeff == 'D':
        return ' '.join([coeff, '(5.1 to 10%)'])
    return coeff

def add_popups(m, rate_name, column):
    '''
    Adds popups to each canton for map m.
    rate_name specifies what is the unemployment rate
    column specifies the column whose values will be shown in the popup 
    '''
    for canton in cantons_topo:
        canton_name = canton["objects"]["cantons"]["geometries"][0]['properties']['name']
        coeff_column = ('Septembre 2017','Coefficients de variation')
        canton_data = cantons_data[cantons_data['Canton'] == canton_name]
    
        html = '''
            <h3>{}</h3>
            <p> {}: {} </p>
            <p> Coefficient of variation: {}</p>
            
        '''.format(canton_name, 
                   rate_name,
                   canton_data[column].values[0], 
                   add_range_coeff_var(canton_data[coeff_column].values[0]))
        
        iframe = branca.element.IFrame(html=html, width=300, height=150)
        popup = folium.Popup(iframe, max_width=2650)
        
        tj = folium.TopoJson(canton, 
                   'objects.cantons',
                   name=canton_name)
        
        tj.add_child(popup)
        tj.add_to(m)
        

We can now easily create the map corresponding to the unemployment rate computed taking into consideration jobseekers, employed or unemployed.

In [34]:
m_jobseekers = folium.Map(
    location= swiss_coord,
    tiles='Mapbox Bright',
    zoom_start=8
)

add_choropleth_layer(m=m_jobseekers, 
                     column=('Septembre 2017', 'Taux demandeurs d emploi'),
                     fill_color='YlOrRd',
                     layer_name='jobseekers',
                     legend_name='% of jobseekers out of active population in month september',
                     scale='fixed',
                     fixed_thresholds=legend_scale
                    )
column = ('Septembre 2017', 'Taux demandeurs d emploi') 
add_popups(m_jobseekers, 
           'Jobseekers rate for month September',
           column)

m_jobseekers.save('swiss_jobseekers_september_2017.html')

In [35]:
from IPython.display import IFrame
IFrame(src="swiss_jobseekers_september_2017.html",width=900,height=800)

An interpretation that is just as valid, but which could be considered more rigorous and on point, is to take into consideration only the people who are unemployed. This is more intuitive given that we are looking at the unemployment rate.

We proceed to plot this rate with the same classes as in the previous plot.

In [36]:
m_unemployed = folium.Map(
    location= swiss_coord,
    tiles='Mapbox Bright',
    zoom_start=8
)

add_choropleth_layer(m=m_unemployed,
                     column=('Septembre 2017', 'Taux de chômage'),
                     fill_color='YlOrRd',
                     layer_name='unemployed jobseeker',
                     legend_name='% of unemployed jobseekers out of active population in month september',
                     scale='fixed',
                     fixed_thresholds=legend_scale
                    )


column = ('Septembre 2017', 'Taux de chômage') 
add_popups(m_unemployed, 
           'Unemployed jobseekers rate for month September',
           column)

m_unemployed.save('swiss_unemployed_jobseekers_september_2017.html')

In [37]:
from IPython.display import IFrame
IFrame(src="swiss_unemployed_jobseekers_september_2017.html",width=900,height=800)

## Importance of scale

For a better visualization we would want both unemployment rates plotted on the same map. 

One way to show the differences is to use the same scale, given that both maps present an unemployment rate. The colors in the choropleth map then correspond to the same classes in both visualizations, making it easy to see the differences that come from the way the rate is computed.

But there is a perception problem with a fixed scale for two different ways of computing the rate: since one of them counts more people (not only unemployed jobseekers) while the denominator stays the same, its values are shifted towards the upper end of the scale, giving the impression of many extreme values.

An important decision here is the choice of thresholds for the fixed scale, as it drastically changes the message sent by the map. We adopt the scale used in the amstat visualization, [0,2,4.1], which we extend with two more values: [0,2,4.1,6,10]
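The shift described above can be illustrated without drawing a map. The sketch below assigns a class index to each rate using the fixed thresholds and the standard `bisect` module. The helper `class_index` is ours, and treating each class as a half-open interval is an assumption; folium may handle values that fall exactly on a boundary differently. The canton figures are taken from the exported table.

```python
from bisect import bisect_right

legend_scale = [0, 2, 4.1, 6, 10]

def class_index(rate, thresholds=legend_scale):
    """Return the 0-based class whose interval [lower, upper) contains rate."""
    return bisect_right(thresholds, rate) - 1

# canton: (taux de chômage, chômeurs inscrits, demandeurs d'emploi)
cantons = {
    'Zurich': (3.3, 27225, 34156),
    'Uri': (0.6, 112, 257),
    'Genève': (5.2, 12234, 15497),
}

# For each canton: (class of the unemployment rate, class of the jobseeker rate)
classes = {}
for name, (rate, unemployed, jobseekers) in cantons.items():
    jobseeker_rate = jobseekers * rate / unemployed
    classes[name] = (class_index(rate), class_index(jobseeker_rate))

print(classes)
```

In this sample, Zurich and Genève each move up one class when all jobseekers are counted, while Uri stays in the lowest class.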

To better view this, we create two choropleth layers and we add a layer control to the map

In [40]:
m = folium.Map(
    location= swiss_coord,
    tiles='Mapbox Bright',
    zoom_start=8
)

add_choropleth_layer(m=m,
                     column=('Septembre 2017', 'Taux de chômage'),
                     fill_color='YlOrRd',
                     layer_name='unemployed jobseeker',
                     legend_name='% of unemployed jobseekers out of active population in month september',
                     scale='fixed',
                     fixed_thresholds=legend_scale
                    )

add_choropleth_layer(m=m, 
                     column=('Septembre 2017', 'Taux demandeurs d emploi'),
                     fill_color='YlOrRd',
                     layer_name='all jobseekers',
                     legend_name='% of jobseekers out of active population in month september',
                     scale='fixed',
                     fixed_thresholds=legend_scale
                    )

folium.LayerControl().add_to(m)

m.save('swiss_2_choropleths_fixed_scale.html')

In [41]:
from IPython.display import IFrame
IFrame(src="swiss_2_choropleths_fixed_scale.html",width=900,height=800)

We can observe the difference between the two ways of computing the unemployment rate by adding or removing the choropleth layer corresponding to each of them. Clearly, the messages sent by the two visualizations differ in this setup: the rate that also counts the employed jobseekers has larger values, which are mapped to more intense colors, provoking a stronger reaction in the viewer.

Another way of plotting the results is to keep the colors relative to the method used to compute the rate. For this, we use the Natural Breaks (Jenks) classification, because it chooses the classes so that the variation within each class is as small as possible. This aligns with the intuition that cantons colored the same are similar with regard to unemployment rate.

As we can see below, with this scale the visual impact differs little between the two maps. The strong drawback is that the viewer has to check the legend to realise that the same colors correspond to different values, which is not something one would normally expect.

In [42]:
m = folium.Map(
    location= swiss_coord,
    tiles='Mapbox Bright',
    zoom_start=8
)

add_choropleth_layer(m=m,
                     column=('Septembre 2017', 'Taux de chômage'),
                     fill_color='YlOrRd',
                     layer_name='unemployed jobseeker',
                     legend_name='% of unemployed jobseekers out of active population in month september',
                     scale='jenks',
                     nb_classes=3
                    )

add_choropleth_layer(m=m, 
                     column=('Septembre 2017', 'Taux demandeurs d emploi'),
                     fill_color='YlOrRd',
                     layer_name='jobseekers',
                     legend_name='% of jobseekers out of active population in month september',
                     scale='jenks',
                     nb_classes=3
                    )

folium.LayerControl().add_to(m)

m.save('swiss_2_choropleths_jenks_scale.html')

In [43]:
from IPython.display import IFrame
IFrame(src="swiss_2_choropleths_jenks_scale.html",width=900,height=800)

An interesting detail when looking at a choropleth map in general is which elements sit at the extremes of the legend. 
In the above map, when we also take the employed jobseekers into consideration, the canton of Geneva is no longer the one with the highest rate; Neuchâtel is, which puts the two cantons in very different perspectives.
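The swap between the two cantons can be reproduced from the exported figures. This is a small check restricted to the two cantons named above, using the same shortcut formula as the notebook; the variable names are ours.

```python
# canton: (taux de chômage, chômeurs inscrits, demandeurs d'emploi), from the amstat export
rows = {
    'Genève': (5.2, 12234, 15497),
    'Neuchâtel': (5.1, 4738, 6350),
}

# Rate counting all jobseekers, employed or not
jobseeker_rate = {
    name: jobseekers * rate / unemployed
    for name, (rate, unemployed, jobseekers) in rows.items()
}

top_unemployment = max(rows, key=lambda name: rows[name][0])
top_jobseekers = max(jobseeker_rate, key=jobseeker_rate.get)

print(top_unemployment, top_jobseekers)
```

Neuchâtel has proportionally more jobseekers who are already employed than Geneva, which is enough to reverse the order once they are counted.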

In conclusion, this is a clear example of how a visualization can steer the viewer's perspective towards some desired interpretation by controlling the amount of detail made available.