# 03 - Interactive Viz

In this homework we will be exploring interactive visualization, which is a key ingredient of many successful data visualizations (especially when it comes to infographics).

Please read the notebook from [this URL](http://nbviewer.jupyter.org/github/nunomota/ada2017-hw/blob/master/Homework_02/Homework_02.ipynb).

### Content:
* [Task 1](#t1)
* [Task 2](#t2)
* [Task 3](#t3)
* [Task 4](#t4)

# Imports

In [1]:
import os
import pandas as pd
import numpy as np
import json
import folium

# Constants definition

In [2]:
DATA_PATH = './data/'
TOPOJSON_PATH = './topojson/'
EUROSTAT_FILE = 'lfsq_urgaed_sheets.xls'
AMSTAT_FILE = '2_1 Tasso di disoccupazione.xlsx'
 
COORDINATE_EUROPE = [54.5260, 15.2551]
COORDINATE_SWITZERLAND = [46.8182, 8.2275]

# Task 1 <a name="t1"></a>

On the Eurostat website we found a dataset with recent European unemployment rates. We chose the dataset containing the quarterly unemployment rates, so that we could get the most recent data. We also made sure that the chosen dataset included the rates for Switzerland.

We downloaded a dataset containing both the IDs and the name for each country, since this allows us to compare the IDs from the dataset and the TopoJSON file and verify that all the countries will be matched correctly.

#### Read European data
We first read the `.xls` file containing the unemployment rates per country.

In [3]:
# Read Europe unemployment data
def read_europe_df(sheet):
    # Read xls file
    # Note: recent pandas versions name this argument `sheet_name` (formerly `sheetname`)
    df = pd.read_excel('{dp}{f}'.format(dp=DATA_PATH, f=EUROSTAT_FILE),
        sheet_name=sheet, skiprows=[12,13,14,15,16,17], header=10, na_values=':')

    # Rename country IDs and names columns
    df.rename(columns={'GEO': 'id', 'GEO(L)/TIME': 'name'}, inplace=True)

    # We need to replace the IDs for UK and Greece so that they match TopoJSON IDs
    return df.replace('UK', 'GB').replace('EL', 'GR')

europe_df = read_europe_df('Data5')

The following is a sample of our dataframe:

In [4]:
europe_df.head()

Unnamed: 0,id,name,2015Q1,2015Q2,2015Q3,2015Q4,2016Q1,2016Q2,2016Q3,2016Q4,2017Q1,2017Q2
0,BE,Belgium,8.8,8.4,8.2,8.7,8.3,7.9,7.9,7.2,7.7,7.0
1,BG,Bulgaria,10.6,9.9,8.3,7.9,8.6,8.1,7.0,6.7,6.9,6.3
2,CZ,Czech Republic,6.0,4.9,4.8,4.5,4.4,3.9,4.0,3.6,3.5,3.0
3,DK,Denmark,6.6,6.0,6.2,5.8,6.3,6.0,6.3,6.1,6.5,5.5
4,DE,Germany (until 1990 former territory of the FRG),5.0,4.7,4.4,4.5,4.5,4.2,4.0,3.8,4.1,3.8


#### Obtain recent unemployment rate

We will average the rates for the last 4 quarters, that is:
* 2016, Quarter 3
* 2016, Quarter 4
* 2017, Quarter 1
* 2017, Quarter 2

Averaging over a full year gives us the most recent information while smoothing out possible seasonal effects on unemployment.

In [5]:
europe_df['recent'] = europe_df[['2016Q3', '2016Q4', '2017Q1', '2017Q2']].mean(axis=1)
recent_europe_df = europe_df[['id', 'recent']]
recent_europe_df.head()

Unnamed: 0,id,recent
0,BE,7.45
1,BG,6.725
2,CZ,3.525
3,DK,6.1
4,DE,3.925
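
One detail worth keeping in mind: the file is read with `na_values=':'`, so a missing quarter becomes `NaN`, and `mean(axis=1)` skips `NaN` by default. The sketch below illustrates the effect on a toy dataframe. The `XX` row is a made-up country with invented values, and the `recent_strict` column is only an illustrative alternative, not something the notebook uses.

```python
import numpy as np
import pandas as pd

# 'BE' uses the values shown above; 'XX' is a hypothetical country
# with one missing quarter, added only for illustration
toy_df = pd.DataFrame({
    'id': ['BE', 'XX'],
    '2016Q3': [7.9, 5.0],
    '2016Q4': [7.2, np.nan],
    '2017Q1': [7.7, 6.0],
    '2017Q2': [7.0, 7.0],
})
quarters = ['2016Q3', '2016Q4', '2017Q1', '2017Q2']

# Default behaviour: NaN is skipped, so 'XX' is averaged over 3 quarters only
toy_df['recent'] = toy_df[quarters].mean(axis=1)

# Stricter alternative: the result is NaN if any quarter is missing
toy_df['recent_strict'] = toy_df[quarters].mean(axis=1, skipna=False)

print(toy_df[['id', 'recent', 'recent_strict']])
```

With the default, a country with gaps still gets a value, but that value no longer covers a full year, which weakens the seasonal argument for that country.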


#### Remove countries from TopoJSON file

To create the map of Europe, we first read and modify the TopoJSON file for Europe. We remove from it all the countries for which we have no data, so that they are not plotted on the map.

In [6]:
# Load TopoJSON file for Europe
europe_topo_path = r'{tp}europe.topojson.json'.format(tp=TOPOJSON_PATH)
with open(europe_topo_path) as f:
    geo_json = json.load(f)

# We keep only the countries we have information on
known_ids = set(europe_df['id'])
geo_countries = [c for c in geo_json['objects']['europe']['geometries']
                 if c['id'] in known_ids]

# We substitute the list of countries
geo_json['objects']['europe']['geometries'] = geo_countries
geo_data = json.dumps(geo_json)
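
Filtering silently drops anything that does not match, so it can help to list the unmatched IDs on both sides before plotting. The following is a minimal sketch with hypothetical ID sets standing in for `europe_df['id']` and the IDs found in the TopoJSON geometries. The codes `XK` and `RU` are placeholders chosen only to produce a mismatch.

```python
# Hypothetical stand-ins: in the notebook these would come from
# europe_df['id'] and from the 'id' field of each TopoJSON geometry
data_ids = {'BE', 'BG', 'CZ', 'GB', 'GR', 'XK'}
topo_ids = {'BE', 'BG', 'CZ', 'GB', 'GR', 'RU'}

# Set differences show what would be lost on each side
missing_in_topo = sorted(data_ids - topo_ids)
missing_in_data = sorted(topo_ids - data_ids)

print('In data but not in TopoJSON:', missing_in_topo)
print('In TopoJSON but not in data:', missing_in_data)
```

IDs that appear only in the data are the ones to inspect, since they may need a replacement like the `UK`/`GB` and `EL`/`GR` ones applied earlier.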

#### Create map

Lastly, we create the map for Europe.

TODO: choose and explain colors

TODO: Choose best background! I plotted all of them to compare them...

TODO: split the intervals into data classes and explain why those

TODO: Add markers including the name of the country and the actual rate? (*interactions you could add in order to make the visualization intuitive and expressive*) (Example of implementation in next cell, maybe put the markers in the capital city of the country?)

TODO: Compare Switzerland's unemployment rate to that of the rest of Europe (it's the main goal of this task)

TODO: Decide how to compute the values for the plot

* (Change and) Explain colors used [url given in assignment](https://carto.com/academy/courses/intermediate-design/choose-colors-1/)
* Export map
* ? Explain intervals considered... [url given in assignment](http://gisgeography.com/choropleth-maps-data-classification/)
* Highlight Switzerland somehow...

Suggestions:
* ? Add Interactivity (eg choose sex, age...)
* ? Set one color scale for countries with less unemployment than Switzerland, and another for countries with more

Useful URLs:
* [Many examples](http://nbviewer.jupyter.org/github/python-visualization/folium/tree/master/examples/) (very good)
    * [Markers](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/ContinuousWorld.ipynb), [Markers2](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/FeatureGroup.ipynb), [Marker cluster](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/MarkerCluster.ipynb), [Popups](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/Popups.ipynb)
    * [LayerControl and others](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/Features.ipynb)
    * [Timestamp (and others?)](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/HeatMapWithTime.ipynb), [Time slider](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/TimeSliderChoropleth.ipynb)
    * [Many plugins](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/Plugins.ipynb)
    * [Tiles](http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/TilesExample.ipynb)
* [Colormap](https://nbviewer.jupyter.org/github/python-visualization/folium/blob/v0.2.0/examples/Colormaps.ipynb), [Colors](https://github.com/python-visualization/folium/blob/v0.2.0/folium/utilities.py#L104)
* [Folium doc](https://media.readthedocs.org/pdf/folium/latest/folium.pdf) (not very good)
* [Pretty example](http://andrewgaidus.com/leaflet_webmaps_python/)
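
For the open question of how to split the rates into data classes, two common candidates are quantile classes and equal intervals. The sketch below computes both sets of class edges with NumPy from the five sample rates shown earlier. The choice of five classes is an assumption made for illustration, not a decision taken in this notebook.

```python
import numpy as np

# Sample of averaged rates taken from the 'recent' column shown earlier
rates = np.array([7.45, 6.725, 3.525, 6.1, 3.925])

# Quantile classes: six edges give five classes holding
# roughly the same number of countries each
quantile_scale = np.percentile(rates, [0, 20, 40, 60, 80, 100]).tolist()

# Equal-interval classes: six evenly spaced edges over the same range
equal_scale = np.linspace(rates.min(), rates.max(), 6).tolist()

print('Quantile edges:      ', quantile_scale)
print('Equal-interval edges:', equal_scale)
```

Either list could then be passed as `threshold_scale`, in the same way as `common_scale` in Task 3. Quantiles tend to spread colors evenly across countries, while equal intervals keep color differences proportional to rate differences.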

In [7]:
# Create a new map
map_europe = folium.Map(COORDINATE_EUROPE, tiles=None, zoom_start=3)

# Add tile layers
folium.TileLayer('stamentoner').add_to(map_europe)
folium.TileLayer('OpenStreetMap').add_to(map_europe)
folium.TileLayer('stamenterrain').add_to(map_europe)
folium.TileLayer('cartodbdark_matter').add_to(map_europe)
folium.TileLayer('cartodbpositron').add_to(map_europe)

# Plot total unemployment
map_europe.choropleth(geo_data=json.loads(geo_data), name='Total Population',
    data=recent_europe_df, columns=['id', 'recent'], key_on='feature.id', fill_opacity=1,
    fill_color='YlOrRd', topojson='objects.europe')

# Add markers with information on unemployment per country
#feature_group = FeatureGroup(name='Some icons')
#Marker(location=[45.3288, -121.6625],
       #popup='Mt. Hood Meadows').add_to(feature_group)

#Marker(location=[45.3311, -121.7113],
       #popup='Timberline Lodge').add_to(feature_group)

#feature_group.add_to(m)

# Add LayerControl
folium.LayerControl(collapsed=False).add_to(map_europe)

map_europe

# Task 2 <a name="t2"></a>
On the amstat website we found a dataset including the unemployment rates in Switzerland for the last 12 months.

#### Read data
We read the unemployment rates for the last 12 months. In this case we create a function that returns a dataframe containing the average (*Totale*) for those months, as well as a column with the IDs for each canton.

In [8]:
# Load TopoJSON file for the Swiss cantons
swiss_topo_path = r'{tp}ch-cantons.topojson.json'.format(tp=TOPOJSON_PATH)
with open(swiss_topo_path) as f:
    geo_json = json.load(f)

In [9]:
def get_clean_swiss_df(file, headers, dictionary, ids):
    # Read xls file containing Swiss unemployment data
    df = pd.read_excel('{dp}{f}'.format(dp=DATA_PATH, f=file), header=headers)
    df.drop(('Metriche', 'mese'), axis=1, inplace=True)

    # We will create a dataframe in English with the variables we need
    clean_df = pd.DataFrame()

    # For each variable, we get the 'Totale' column and store it under
    # its English name (taken from the `dictionary` argument)
    variables_list = df.columns.get_level_values(0).unique()
    for v in variables_list:
        clean_df[dictionary[v]] = df[(v, 'Totale')]

    # Drop the 'Totale' row, which is not a canton
    clean_df.drop('Totale', inplace=True)

    # Add 'id' column to the dataframe
    clean_df['id'] = ids
    
    return clean_df

The IDs were taken directly from the TopoJSON file, since we manually verified that the cantons are in the same order in both the TopoJSON file and the data files we obtained from amstat. We translate the names of the variables to English using a manually defined dictionary.

In [10]:
# Manually define dictionary with translations to English
italian_english_variables = {
    "Tasso di disoccupazione": 'Unemployment rate',
    "Disoccupati registrati": 'Unemployed',
    "Persone in cerca d'impiego": 'Looking for job',
    "Persone in cerca d'impiego non disoccupate": 'Looking for job not unemployed'
}

# Get the 'id' for each canton directly from the JSON file
cantons_id = []
for c in geo_json['objects']['cantons']['geometries']:
    cantons_id.append(c['id'])

switzerland_df = get_clean_swiss_df(file=AMSTAT_FILE, headers=[2,3],
    dictionary=italian_english_variables, ids=cantons_id)

The following is a sample of the resulting dataframe:

In [11]:
switzerland_df.head()

Unnamed: 0,Unemployment rate,Unemployed,Looking for job,Looking for job not unemployed,id
Zurigo,3.6,355658,440479,84821,ZH
Berna,2.7,181433,238627,57194,BE
Lucerna,1.9,51925,87769,35844,LU
Uri,1.1,2430,4314,1884,UR
Svitto,1.8,18765,28363,9598,SZ
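
Because the IDs are assigned purely by position, a quick spot check against a few known name/ID pairs can catch a misaligned file. This is a minimal sketch: `find_mismatches` is a hypothetical helper, and the lists below only reproduce the five rows shown in the sample above.

```python
def find_mismatches(names, ids, expected):
    """Return {name: (assigned_id, expected_id)} for each known pair that disagrees."""
    assigned = dict(zip(names, ids))
    return {name: (assigned.get(name), code)
            for name, code in expected.items()
            if assigned.get(name) != code}

# Name/ID pairs as they appear in the sample above
expected_pairs = {'Zurigo': 'ZH', 'Berna': 'BE', 'Lucerna': 'LU',
                  'Uri': 'UR', 'Svitto': 'SZ'}

# Stand-ins for the dataframe index and the ID list read from the TopoJSON file
canton_names = ['Zurigo', 'Berna', 'Lucerna', 'Uri', 'Svitto']
topo_ids = ['ZH', 'BE', 'LU', 'UR', 'SZ']

mismatches = find_mismatches(canton_names, topo_ids, expected_pairs)
print(mismatches)
```

An empty dictionary means the checked pairs agree. It does not prove that every canton is aligned, only the ones listed in `expected_pairs`.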


#### Add data on people looking for a job
We are told that the Swiss Confederation defines the `Unemployment rate` as the number of people looking for a job divided by the size of the active population (scaled by 100). This value can therefore be regarded as the `Looking for job rate`. We now add the `Unemployed looking for job rate` and the `Not unemployed looking for job rate` (i.e. people who already have a job and are looking for a new one). Each is obtained by splitting the overall rate in proportion to the size of the corresponding group.

In [12]:
switzerland_df['Unemployed looking for job rate'] = (switzerland_df['Unemployed']
    * switzerland_df['Unemployment rate'] / switzerland_df['Looking for job'])

switzerland_df['Not unemployed looking for job rate'] = (switzerland_df['Looking for job not unemployed']
    * switzerland_df['Unemployment rate'] / switzerland_df['Looking for job'])

switzerland_df.rename(columns={italian_english_variables['Tasso di disoccupazione']: 'Looking for job rate'},
    inplace=True)

switzerland_df.head()

Unnamed: 0,Looking for job rate,Unemployed,Looking for job,Looking for job not unemployed,id,Unemployed looking for job rate,Not unemployed looking for job rate
Zurigo,3.6,355658,440479,84821,ZH,2.906765,0.693235
Berna,2.7,181433,238627,57194,BE,2.052865,0.647135
Lucerna,1.9,51925,87769,35844,LU,1.124059,0.775941
Uri,1.1,2430,4314,1884,UR,0.619611,0.480389
Svitto,1.8,18765,28363,9598,SZ,1.190882,0.609118
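
As a sanity check, the split can be reproduced by hand for one canton. The sketch below redoes the computation for Zurigo using the numbers from the table above. The two partial rates should add up to the overall rate, since the two groups add up to all people looking for a job.

```python
# Zurigo values from the table above
rate = 3.6                # 'Looking for job rate', in percent
unemployed = 355658       # looking for a job and unemployed
not_unemployed = 84821    # looking for a job but not unemployed
looking = 440479          # all people looking for a job

# Split the overall rate in proportion to each group's share
unemployed_rate = unemployed * rate / looking
not_unemployed_rate = not_unemployed * rate / looking

print(round(unemployed_rate, 6), round(not_unemployed_rate, 6))
```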


#### Create map

In [13]:
# Create a new map
map_swiss = folium.Map(COORDINATE_SWITZERLAND,
    tiles='cartodbpositron', zoom_start=8)

# Plot people looking for a job
map_swiss.choropleth(geo_data=geo_json, name='Looking for job',
    data=switzerland_df, columns=['id', 'Looking for job rate'], key_on='feature.id', fill_opacity=1,
    fill_color='YlOrRd', topojson='objects.cantons')

map_swiss.choropleth(geo_data=geo_json, name='Looking for job, unemployed',
    data=switzerland_df, columns=['id', 'Unemployed looking for job rate'], key_on='feature.id', fill_opacity=1,
    fill_color='YlOrRd', topojson='objects.cantons')

map_swiss.choropleth(geo_data=geo_json, name='Looking for job, not unemployed',
    data=switzerland_df, columns=['id', 'Not unemployed looking for job rate'], key_on='feature.id', fill_opacity=1,
    fill_color='YlOrRd', topojson='objects.cantons')

# Add markers with information on unemployment per canton
#feature_group = FeatureGroup(name='Some icons')
#Marker(location=[45.3288, -121.6625],
       #popup='Mt. Hood Meadows').add_to(feature_group)

#Marker(location=[45.3311, -121.7113],
       #popup='Timberline Lodge').add_to(feature_group)

#feature_group.add_to(m)

# Add LayerControl
folium.LayerControl(collapsed=False).add_to(map_swiss)

map_swiss

# Task 3 <a name="t3"></a>
Use the amstat website again to find a dataset that includes the unemployment rates in Switzerland at a recent date, this time making a distinction between Swiss and foreign workers.

The Economic Secretary (SECO) releases a monthly report on the state of the employment market. In the latest report (September 2017), it is noted that there is a discrepancy between the unemployment rates for foreign (5.1%) and Swiss (2.2%) workers.

Show the difference in unemployment rates between the two categories in each canton on a choropleth map (hint: the easy way is to show two separate maps, but can you think of something better?). Where are the differences most visible? Why do you think that is?

#### Read data
We use the previously defined function to read the unemployment rates for Swiss and foreign workers.

In [14]:
switzerland_foreign_df = get_clean_swiss_df(file='foreign {f}'.format(f=AMSTAT_FILE), headers=[5,6],
    dictionary=italian_english_variables, ids=cantons_id)
switzerland_foreign_df.head()

Unnamed: 0,Unemployment rate,Unemployed,Looking for job,Looking for job not unemployed,id
Zurigo,5.9,162352,202705,40353,ZH
Berna,6.5,68923,92480,23557,BE
Lucerna,4.5,22072,38473,16401,LU
Uri,4.4,1324,2213,889,UR
Svitto,3.9,8427,13497,5070,SZ


In [15]:
switzerland_swiss_df = get_clean_swiss_df(file='swiss {f}'.format(f=AMSTAT_FILE), headers=[5,6],
    dictionary=italian_english_variables, ids=cantons_id)
switzerland_swiss_df.head()

Unnamed: 0,Unemployment rate,Unemployed,Looking for job,Looking for job not unemployed,id
Zurigo,2.7,193306,237774,44468,ZH
Berna,2.0,112510,146147,33637,BE
Lucerna,1.4,29853,49296,19443,LU
Uri,0.6,1106,2101,995,UR
Svitto,1.2,10338,14866,4528,SZ


#### Create map

In [16]:
# Create a new map
map_swiss = folium.Map(COORDINATE_SWITZERLAND,
    tiles='cartodbpositron', zoom_start=8)

# Plot unemployment rates for foreign and Swiss workers
map_swiss.choropleth(geo_data=geo_json, name='Foreign',
    data=switzerland_foreign_df, columns=['id', 'Unemployment rate'], key_on='feature.id', fill_opacity=1,
    fill_color='YlOrRd', topojson='objects.cantons')

map_swiss.choropleth(geo_data=geo_json, name='Swiss',
    data=switzerland_swiss_df, columns=['id', 'Unemployment rate'], key_on='feature.id', fill_opacity=1,
    fill_color='YlOrRd', topojson='objects.cantons')

# Add markers with information on unemployment per canton
#feature_group = FeatureGroup(name='Some icons')
#Marker(location=[45.3288, -121.6625],
       #popup='Mt. Hood Meadows').add_to(feature_group)

#Marker(location=[45.3311, -121.7113],
       #popup='Timberline Lodge').add_to(feature_group)

#feature_group.add_to(m)

# Add LayerControl
folium.LayerControl(collapsed=False).add_to(map_swiss)

map_swiss

It is interesting to compare both on the same scale:

In [17]:
# Create common scale
minimum = min(min(switzerland_swiss_df['Unemployment rate']), min(switzerland_foreign_df['Unemployment rate']))
maximum = max(max(switzerland_swiss_df['Unemployment rate']), max(switzerland_foreign_df['Unemployment rate']))
common_scale = list(np.linspace(minimum, maximum, 6))

# Create a new map
map_swiss = folium.Map(COORDINATE_SWITZERLAND,
    tiles='cartodbpositron', zoom_start=8)

# Plot unemployment rates for foreign and Swiss workers on the common scale
map_swiss.choropleth(geo_data=geo_json, name='Foreign',
    data=switzerland_foreign_df, columns=['id', 'Unemployment rate'], key_on='feature.id', fill_opacity=1,
    fill_color='YlOrRd', topojson='objects.cantons', threshold_scale=common_scale)

map_swiss.choropleth(geo_data=geo_json, name='Swiss',
    data=switzerland_swiss_df, columns=['id', 'Unemployment rate'], key_on='feature.id', fill_opacity=1,
    fill_color='YlOrRd', topojson='objects.cantons', threshold_scale=common_scale)

# Add markers with information on unemployment per canton
#feature_group = FeatureGroup(name='Some icons')
#Marker(location=[45.3288, -121.6625],
       #popup='Mt. Hood Meadows').add_to(feature_group)

#Marker(location=[45.3311, -121.7113],
       #popup='Timberline Lodge').add_to(feature_group)

#feature_group.add_to(m)

# Add LayerControl
folium.LayerControl(collapsed=False).add_to(map_swiss)

map_swiss
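
The task hints that there may be something better than two separate maps. One option, sketched here as an assumption rather than a settled choice, is to compute a single per-canton difference (foreign minus Swiss) that could be plotted as one choropleth layer. The two small dataframes below only reproduce the five sample rows shown above, and `diff_df` and the `difference` column are illustrative names.

```python
import pandas as pd

# Sample rates copied from the two tables above
foreign = pd.DataFrame({'id': ['ZH', 'BE', 'LU', 'UR', 'SZ'],
                        'Unemployment rate': [5.9, 6.5, 4.5, 4.4, 3.9]})
swiss = pd.DataFrame({'id': ['ZH', 'BE', 'LU', 'UR', 'SZ'],
                      'Unemployment rate': [2.7, 2.0, 1.4, 0.6, 1.2]})

# Join on the canton ID so the result does not depend on row order
diff_df = foreign.merge(swiss, on='id', suffixes=('_foreign', '_swiss'))

# Difference in percentage points between foreign and Swiss workers
diff_df['difference'] = (diff_df['Unemployment rate_foreign']
                         - diff_df['Unemployment rate_swiss'])

print(diff_df[['id', 'difference']])
```

A dataframe like this could be passed to the choropleth call with `columns=['id', 'difference']`, giving one map that shows directly where the gap is largest.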


Now let's refine the analysis by adding the differences between age groups. As you may have guessed it is nearly impossible to plot so many variables on a map. Make a bar plot, which is a better suited visualization tool for this type of multivariate data.

# Task 4 <a name="t4"></a>

**BONUS**: using the map you have just built, and the geographical information contained in it, could you give a rough estimate of the difference in unemployment rates between the areas divided by the Röstigraben?