## Task 3 - Swiss and Foreign workers statistics - discrepant or logical?

---
<div class="alert alert-block alert-info">
<b>Task</b>: 
<ol>
    <li>Use the <a href="https://www.amstat.ch">amstat</a> website again to find a dataset that includes the unemployment rates in Switzerland at a recent date, this time making a distinction between <i>Swiss</i> and <i>foreign</i> workers.
    </li>
    <li>
The State Secretariat for Economic Affairs (SECO) releases <a href="https://www.seco.admin.ch/seco/fr/home/Arbeit/Arbeitslosenversicherung/arbeitslosenzahlen.html">a monthly report</a> on the state of the employment market. In the latest report (September 2017), it is noted that there is a discrepancy between the unemployment rates for <i>foreign</i> (5.1%) and <i>Swiss</i> (2.2%) workers.
    </li>
    <li>
Show the difference in unemployment rates between the two categories in each canton on a Choropleth map (<i>hint</i>: the easy way is to show two separate maps, but can you think of something better?). Where are the differences most visible? Why do you think that is?
    </li>
    <li>Now let's refine the analysis by adding the differences between age groups. As you may have guessed it is nearly impossible to plot so many variables on a map. Make a bar plot, which is a better suited visualization tool for this type of multivariate data.
    </li>
</ol>
</div>

---

### Data retrieval and exploration

One of the main challenges of this task is finding appropriate data and navigating the website. Since we do not speak any of the national languages, we relied on translation services. This may affect the quality of the data we obtain for several reasons:
* we might not understand the semantics of data
* we might misinterpret the explanations
* we might miss the data that is obviously present

In this part we mostly use the latest data, for September 2017. We could aggregate the data into mean values, but we think that averaging over several months would not bring new information unless done for a specific purpose, for example to compare seasonal means (spring, summer, fall, winter). We load a year's worth of data and try to use most of it in several approaches, such as interactive selection of the month. Unfortunately, interactivity through widgets proved hard to make portable, as described in detail later.

#### Navigating the Amstat website

For this task we collect the unemployment data by nationality with cantonal distribution. Raw data is available under *Details*, where the needed data can be selected by a set of criteria.

![amstat landing page](images/amstat_landing.png)

As we need the unemployment rate, we navigate to the specific page, where we specify the criteria for selecting the data.

##### Unemployment rate per nationality 

We extracted the unemployment rate *for each canton*, aggregated by *nationality*, which gives the rates separately for Swiss and foreign nationals.

We exported the data as an **Excel** spreadsheet with a time span of one year: from October 2016 until September 2017. We rely mostly on the data from September 2017, but with well parametrized visualizations, displaying a different month only requires changing a single function parameter. We opted for an Excel spreadsheet instead of a CSV file because of its better encoding support; since Pandas supports both formats, we chose the one that was easier to preprocess.

Data is available in: `data/Unemployment_Rate_Nationality-1year.xlsx`.

##### Unemployment rate per age

Similarly, we exported the unemployment rate for each canton based on the *age* criterion instead of nationality. The time span is again one year, so that further analysis is possible without collecting the data again. The selected age criterion aggregates the data into 3 age intervals:
- 15-24 years
- 25-49 years
- 50+ years

Another option would be to use 5-year intervals, which provide finer granularity. For this analysis the 3 general groups seem more suitable: they roughly correspond to *young*, *mid-age* and *senior* workers and give clearer insights into these population groups.

Data is available in: `data/Unemployment_Rate_Age-1year.xlsx`.

##### Unemployment rate per age and nationality

In the previous two data collection steps we collected data that is disjoint in terms of criteria: **either age or nationality**. This is a limitation of the specific platform and page, since we could select only one criterion:

![one criterion](images/amstat_one_criterion.png)

One solution would be to infer such data from the total cantonal population or from other datasets available on *amstat*. We opted not to infer it, since we do not have a complete understanding of the data and the methodologies used. Some descriptions exist, but meaning is commonly lost when using translation services. Another possible issue would be using a wrong methodology or different data, such as an external source for the cantonal population, which may differ from the figures used in the data provided. **It was imperative to find the data on amstat!**

While navigating the website, we found that a possible misconfiguration of the page or of its security measures allows us to access the [server data](https://www.amstat.ch/MicroStrategy/servlet/mstrWeb). This way we can see the data and options present on the website, as well as options that are not present there (or that we did not manage to find)! We present our findings along with screenshots for easier comprehension:

###### Server landing page

We believe it might be a misconfiguration of the website that allows us to see this page (Test and Production server data). We chose to explore the Production data. Luckily, the data is read-only and mostly similar to the content that is usable from the website.

![server 0](images/amstat_server.png)

###### Production data

Next we select the general search data.
![server 1](images/amstat_server1.png)

###### Detailed search

We proceed by going into the detailed search to try to find the option for getting the data with more general criteria.
![server 2](images/amstat_server2.png)

###### Monthly data on unemployment

We select the option to see the monthly data reports. We explored other options as well, but this one yielded the final result.
![server 3](images/amstat_server3.png)

###### Monthly data on unemployment rate

Luckily, we can find the data on the unemployment rate! Exploring the form further, we see that it differs from the previous form, where we could choose only one criterion. Here we can set up our filter with more detail and finer granularity.

![server 4](images/amstat_server4.png)

###### Profit!

The most important feature here is that we could select multiple criteria! To get the data we select both **age AND nationality**. We finally export the data as an Excel spreadsheet to use in the final part of the analysis.

![server 5](images/amstat_server5.png)

Raw data which aggregates unemployment rates per nationality and age is available at: `data/Unemployment_Rate-Age+Nationality.xlsx`.

---

### Final obtained data - short revision:

We have obtained the following data from the *amstat* page:
* `data/Unemployment_Rate_Nationality-1year.xlsx`: unemployment rate, per canton, by nationality in the period Oct 2016-Sep 2017
* `data/Unemployment_Rate_Age-1year.xlsx`: unemployment rate, per canton, by age in the period Oct 2016-Sep 2017
* `data/Unemployment_Rate-Age+Nationality.xlsx`: unemployment rate, per canton, by nationality and age in the period Oct 2016-Sep 2017

Since all of this data comes from a single source that uses a single methodology, we avoid the errors that manual inference, calculations with external data or unverified methodologies could introduce.

---

### Visualizing the findings - let's map!

Now that we have obtained the data, an intuitive way to present geographically split data, in this case cantonal data, is to use a map. A map gives the audience a better understanding of the spatial distribution of the data and lets us draw further conclusions based on geopolitical factors.

We will go through the visualization process step by step, explain the difficulties we have encountered as well as try to provide context for the actual data and its analysis.

In [1]:
import pandas as pd
import folium
import json
import seaborn as sns
import branca
import bokeh
import datetime

from bokeh.embed import file_html
from bokeh.resources import CDN
from bokeh.charts import Donut, show, output_file, Scatter, Bar
from bokeh.sampledata.olympics2014 import data
from IPython.display import IFrame, HTML

import vincent
vincent.core.initialize_notebook()

import matplotlib.pyplot as pl
%matplotlib inline

The bokeh.charts API has moved to a separate 'bkcharts' package.

This compatibility shim will remain until Bokeh 1.0 is released.
After that, if you want to use this API you will have to install
the bkcharts package explicitly.

  warn(message)


We will use *Pandas* for data manipulation; *Folium* and *Branca* for creating maps; *Seaborn*, *Bokeh*, *Vincent* and *Matplotlib* for making graphs. 

**To preserve compatibility with different versions, especially with Folium, we use `bokeh.charts` as is, without additional installations, hence the warning message above.**

We define the paths to the previously obtained files.

In [2]:
UNEMPLOYMENT_RATE_BY_NATIONALITY = 'data/Unemployment_Rate_Nationality-1year.xlsx'
UNEMPLOYMENT_RATE_BY_AGE = 'data/Unemployment_Rate_Age-1year.xlsx'
UNEMPLOYMENT_RATE_COMBINED = 'data/Unemployment_Rate-Age+Nationality.xlsx'

### Data processing

After obtaining the necessary files from *amstat*, we need to process them into a usable `DataFrame`. We write a generic function for processing the data: in simple terms, we always need to drop certain columns, rename some columns to more suitable names and drop the rows that do not hold cantonal values.

In [3]:
def get_dataframe_rate(path, cols_to_drop, rename_pair, new_index, drop_last=True):
    '''
    Load and clean the .xlsx data on unemployment rates in a generalized manner.
    Returns the dataframe ready for use and analysis, with pruned data.

    Parameters:
    path -- path to the Excel file containing the necessary data
    cols_to_drop -- list of column names to drop
    rename_pair -- dictionary mapping old column names to new column names
    new_index -- list of column names for the new index (currently unused, see the comment below)
    drop_last -- boolean, whether to drop the last row, which is usually a 'Total' value and not a cantonal value

    Returns:
    df -- processed dataframe
    '''
    df = pd.read_excel(path, convert_float=False)
    df.drop(cols_to_drop, axis=1, inplace=True) # drop the unnecessary data
    if drop_last: # drop the last row, usually representing the total
        df.drop([0,len(df)-1], axis=0, inplace=True) # drop header text and total values, we need cantonal values only
    else:
        df.drop([0], axis=0, inplace=True) # no total values present, we drop only header text
    df.rename(columns=rename_pair, inplace=True)
    #df.set_index(new_index, inplace=True) # We opted not to set the multiindex, since for the map we would need to reset it
    
    return df

Using the previously defined generic function, we define shortcut functions that process each file into a `DataFrame` in the proper manner.

In [4]:
def get_dataframe_rate_nationality(path=UNEMPLOYMENT_RATE_BY_NATIONALITY):
    '''Shortcut to get the pruned dataframe for the unemployment rate by nationality.'''
    return get_dataframe_rate(path, ['Mois'], {"Nationalité":'Nationality'},['Canton', 'Nationality'])

def get_dataframe_rate_age(path=UNEMPLOYMENT_RATE_BY_AGE):
    '''Shortcut to get the pruned dataframe for the unemployment rate by age.'''
    return get_dataframe_rate(path, ['Mois', 'Unnamed: 2'], {"Classes d'âge 15-24, 25-49, 50 ans et plus":'Age category'}, ['Canton', 'Age category'])

def get_dataframe_rate_combined(path=UNEMPLOYMENT_RATE_COMBINED):
    '''Shortcut to get the pruned dataframe for the unemployment rate by both age and nationality.'''
    return get_dataframe_rate(path, ['Unnamed: 3', 'Monat'], {"Altersklassen 15-24, 25-49, 50 und mehr":'Age category', 
                                                              "Nationalität":'Nationality', "Kanton":'Canton'},
                                       ['Canton', 'Nationality', 'Age category'], drop_last=False)

Finally we instantiate the necessary `DataFrames`:
* u_rate_nationality - unemployment rate per nationality (foreign or Swiss)
* u_rate_age - unemployment rate per age (15-24, 25-49, 50+)
* u_rate_combined - unemployment rate with both nationality and age distribution per such group

In [5]:
u_rate_nationality = get_dataframe_rate_nationality()
u_rate_age = get_dataframe_rate_age()
u_rate_combined = get_dataframe_rate_combined()

As we have obtained all the necessary data, we are ready to explore the geographical data through visualization on maps!

## Let us explore through visualization

Since the data is distributed geographically over cantons, the best way to visualize it is to place it on a map. The position of each canton and (geo)political factors, such as the neighboring countries or certain geographical features, may prove valuable in the analysis.

#### GeoJSON vs TopoJSON
We are provided with TopoJSON files. TopoJSON is an extension of GeoJSON that encodes topology. In GeoJSON, each object to be drawn on the map (e.g. a line or a polygon) lists its own coordinates separately. TopoJSON keeps the same paradigm but adds a compression strategy: instead of repeating coordinates for each object, shared line segments (*arcs*) are stored once in a common array and every object refers to them by index. This avoids repetition, for instance along the border shared by two cantons, and makes the file smaller than the equivalent GeoJSON.

Nevertheless, *Folium* uses GeoJSON as a primary source for displaying data and certain preprocessing steps are necessary to achieve the same result!
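To make the arc idea concrete, here is a minimal sketch of how a polygon ring can be rebuilt from shared arcs. The two-square `toy` topology and the helper names `decode_arc` and `ring_coordinates` are hypothetical and only illustrate the format; the sketch assumes a quantized topology, i.e. one that carries a `transform` and delta-encoded arcs.

```python
# Hypothetical toy topology: two unit squares sharing one edge (arc 0).
toy = {
    "type": "Topology",
    "transform": {"scale": [1, 1], "translate": [0, 0]},
    "arcs": [
        [[1, 0], [0, 1]],                    # shared edge: (1,0) -> (1,1)
        [[1, 1], [-1, 0], [0, -1], [1, 0]],  # rest of the left square
        [[1, 0], [1, 0], [0, 1], [-1, 0]],   # rest of the right square
    ],
    "objects": {"squares": {"type": "GeometryCollection", "geometries": [
        {"type": "Polygon", "id": "left", "arcs": [[0, 1]]},
        {"type": "Polygon", "id": "right", "arcs": [[2, -1]]},  # -1 means arc 0 reversed
    ]}},
}


def decode_arc(topology, index):
    """Return absolute coordinates of one arc; a negative index means reversed."""
    arc = topology["arcs"][~index if index < 0 else index]
    scale = topology["transform"]["scale"]
    translate = topology["transform"]["translate"]
    x = y = 0
    points = []
    for dx, dy in arc:
        # positions are delta-encoded: accumulate, then apply the transform
        x, y = x + dx, y + dy
        points.append([x * scale[0] + translate[0], y * scale[1] + translate[1]])
    return points[::-1] if index < 0 else points


def ring_coordinates(topology, ring):
    """Stitch the arcs of one ring, dropping the duplicated joint points."""
    coords = []
    for index in ring:
        points = decode_arc(topology, index)
        coords.extend(points[1:] if coords else points)
    return coords


geometries = toy["objects"]["squares"]["geometries"]
left = ring_coordinates(toy, geometries[0]["arcs"][0])
right = ring_coordinates(toy, geometries[1]["arcs"][0])
```

Both squares reuse arc 0, once forwards and once reversed, which is where the size saving over plain GeoJSON comes from.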

We load the Swiss TopoJSON object, representing the outline of each canton:

In [6]:
with open('topojson/ch-cantons.topojson.json') as topo_file: # close the file once it is read
    ch_topo = json.load(topo_file)

#### Unifying TopoJSON and statistical data identifier

Before we show data on the map, we need to have corresponding identifiers both in statistical data and in the TopoJSON data. One logical option was to use the canton abbreviations. Since in our data both German and French names occur, depending on the downloaded data (the file combining nationality and age was available only in German), we need to establish a mapping between such names and the canton code.

First, we extract the canton code (`id`) from the TopoJSON file:

In [7]:
canton_id = [canton['id'] for canton in ch_topo['objects']['cantons']['geometries']]

Next we build a dictionary from the obtained canton codes and the full canton names present in the statistical data. It is important to note that this pairing relies on the cantons being listed in the same order in the statistical data and in the TopoJSON file, which is the case here.

In [8]:
canton_id_name_fr = zip(canton_id, u_rate_nationality.reset_index()['Canton'].drop_duplicates())
canton_id_name_de = zip(canton_id, u_rate_combined.reset_index()['Canton'].drop_duplicates())

Finally we establish a dictionary suitable for replacing the values in column named *Canton* in the `DataFrame` with canton codes. We will use this dictionary in the `replace` function of each `DataFrame`.

In [9]:
cantons_pairs_fr = {'Canton':{pair[1]: pair[0] for pair in canton_id_name_fr}}
cantons_pairs_de = {'Canton':{pair[1]: pair[0] for pair in canton_id_name_de}}

For a sanity check, we manually inspect one such dictionary to see if data pairings are correct:

In [10]:
cantons_pairs_de

{'Canton': {'Aargau': 'AG',
  'Appenzell Ausserrhoden': 'AR',
  'Appenzell Innerrhoden': 'AI',
  'Basel-Landschaft': 'BL',
  'Basel-Stadt': 'BS',
  'Bern': 'BE',
  'Freiburg': 'FR',
  'Genf': 'GE',
  'Glarus': 'GL',
  'Graubünden': 'GR',
  'Jura': 'JU',
  'Luzern': 'LU',
  'Neuenburg': 'NE',
  'Nidwalden': 'NW',
  'Obwalden': 'OW',
  'Schaffhausen': 'SH',
  'Schwyz': 'SZ',
  'Solothurn': 'SO',
  'St. Gallen': 'SG',
  'Tessin': 'TI',
  'Thurgau': 'TG',
  'Uri': 'UR',
  'Waadt': 'VD',
  'Wallis': 'VS',
  'Zug': 'ZG',
  'Zürich': 'ZH'}}
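Because the pairing is built purely by position, a few structural checks can complement the manual inspection. The sketch below is an assumption about how such a check could look; `build_code_map` is a hypothetical helper, and it only catches mismatched lengths and duplicates, not a wrong but consistent ordering.

```python
def build_code_map(codes, names):
    """Pair canton names with codes by position, refusing suspicious input."""
    codes, names = list(codes), list(names)
    if len(codes) != len(names):
        raise ValueError("expected as many names as codes, got %d and %d"
                         % (len(names), len(codes)))
    if len(set(codes)) != len(codes) or len(set(names)) != len(names):
        raise ValueError("duplicate code or name found")
    return dict(zip(names, codes))


# Two real pairs from the dictionary above, used here only as sample input.
mapping = build_code_map(["AG", "ZH"], ["Aargau", "Zürich"])
```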

We define a helper function to replace the full canton names with their code:

In [11]:
def replace_canton_with_id(dataframe, canton_pairs):
    replaced_df = dataframe.reset_index().replace(to_replace=canton_pairs)
    return replaced_df

By using the previously defined dictionary and function, we perform the final modification of the data and replace the mentioned values.

In [12]:
u_rate_age = replace_canton_with_id(u_rate_age, cantons_pairs_fr)
u_rate_nationality = replace_canton_with_id(u_rate_nationality, cantons_pairs_fr)
u_rate_combined = replace_canton_with_id(u_rate_combined, cantons_pairs_de)

### Let's map!

As mentioned before, we use the *Folium* package, version *0.5.0*. Folium is a Python wrapper around the very popular JavaScript library *Leaflet* for maps. Unfortunately, compatibility varies between versions and the documentation is sometimes misleading, so additional care had to be taken when implementing the more exotic aspects of the visualization, as shown later on.

To start, we show a map of Switzerland (*m_switzerland*). We set the zoom level to an appropriate value and adjust the longitude and latitude values so that Switzerland fits nicely in the window. We decided to use the `Mapbox Bright` tiles since they keep only the most important geographical features, which matters because the **choropleth** overlay will be the main feature later on.

In [13]:
m_switzerland = folium.Map([46.8,8.3], tiles='Mapbox Bright', zoom_start=8)
m_switzerland

### Displaying data on Swiss and foreigners unemployment rate

As mentioned, we could use two separate maps, but it would be more difficult to notice some differences with respect to the geographical distribution. We will use two separate layers, one for Swiss and one for foreign citizens and the respective unemployment rate. 

Since the scale of the data differs, we have chosen a different color scheme for each layer so that the difference is displayed more intuitively. The opacity is set to 0.5 for both layers because we use different color schemes. Had we used the same fill color for both, it would have been appropriate to adjust `fill_opacity` manually to suit the scale, in order to achieve a proper fill when both layers are displayed.

In [14]:
# we add a layer for the Swiss unemployment rate
m_switzerland.choropleth(geo_data=ch_topo,
                         name="Suisses", 
                         topojson='objects.cantons',
                         data = u_rate_nationality[u_rate_nationality.Nationality=='Suisses'],
                         columns = ['Canton','Septembre 2017'],
                         key_on='feature.id',
                         fill_color='YlGnBu', 
                         fill_opacity=0.5, 
                         line_opacity=0.4,
                         legend_name="Unemployment Rate (%) - Swiss")

# we add a layer for the foreigners unemployment rate
m_switzerland.choropleth(geo_data=ch_topo,
                         name="Etrangers", 
                         topojson='objects.cantons',
                         data = u_rate_nationality[u_rate_nationality.Nationality=='Etrangers'],
                         columns = ['Canton','Septembre 2017'],
                         key_on='feature.id',
                         fill_color='YlOrRd', 
                         fill_opacity=0.5, 
                         line_opacity=0.4,
                         legend_name="Unemployment Rate (%) - Foreigners")

# we add layer control to be able to control the display of the layers
folium.LayerControl().add_to(m_switzerland)
m_switzerland

#m_switzerland.save("map-unemployment.html") # Optionally save the map for later display as html

We can notice a trend of higher unemployment among both foreign and Swiss nationals in the western parts of the country, bordering France. The rates are significantly lower in the eastern parts of Switzerland, especially in the cantons bordering both Italy and Austria.

Observed separately, the foreign and Swiss unemployment rates follow a similar trend across cantons, which is also visible when both layers are displayed.
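A single layer showing the gap between the two rates is another option. The sketch below computes that gap with a pivot. The toy frame, its canton codes `AA` and `BB` and its values are made up, and the sketch assumes one row per canton and nationality, as in `u_rate_nationality`.

```python
import pandas as pd

# Made-up sample in the same long format as u_rate_nationality.
rates = pd.DataFrame({
    "Canton": ["AA", "AA", "BB", "BB"],
    "Nationality": ["Suisses", "Etrangers", "Suisses", "Etrangers"],
    "Septembre 2017": [2.0, 5.0, 1.5, 3.0],
})

# One row per canton, one column per nationality.
wide = rates.pivot(index="Canton", columns="Nationality", values="Septembre 2017")

# Gap in percentage points: foreign rate minus Swiss rate.
wide["Gap"] = wide["Etrangers"] - wide["Suisses"]
gap = wide["Gap"].reset_index()
```

A frame shaped like `gap` could then presumably be passed to a choropleth with `columns=['Canton', 'Gap']`, giving one map instead of two layers.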

This raises the question of why unemployment is higher among foreigners. Some insight may come from the methodology used to calculate the percentage.

>The unemployment rate is defined as the ratio of number of *unemployed, registered jobseekers* divided by the number of *total active population*.

When splitting this by foreign and Swiss nationals, we see that there are more *unemployed, registered jobseekers* of foreign nationality. However, the percentage is calculated with respect to the foreign active population only. Therefore, with a smaller population and more unemployed people, the rate for foreigners is higher. Switzerland's employment policy, under which Swiss workers are more protected and preferred over foreign workers, may also contribute to this result. There might be other internal policies or explanations we are not aware of, since we are not Swiss.
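A small worked example of the definition may help. The head counts below are hypothetical, chosen only so that the results match the national rates quoted in the task (2.2% and 5.1%); they are not amstat figures.

```python
def unemployment_rate(unemployed, active_population):
    """Rate in percent: registered unemployed divided by the active population."""
    return 100.0 * unemployed / active_population


# Hypothetical head counts, not real data.
swiss_rate = unemployment_rate(2200, 100000)
foreign_rate = unemployment_rate(2550, 50000)

print(round(swiss_rate, 1), round(foreign_rate, 1))
```

With a similar number of unemployed people, halving the active population more than doubles the rate, so each group's rate has to be read against its own denominator.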

This conclusion is further supported by the consistent correlation of unemployment by nationality across cantons.

#### [Failed] Attempt on interactive overview of data

Since we have displayed the data only for September 2017, we are interested in being able to display monthly data for different dates. There is limited support for sliders for *Folium*, mainly reserved for *Heatmaps*. We have tried implementing a slider widget.

In [15]:
import ipywidgets as widgets

In [16]:
# We add a mapping between English month abbreviations and French month names
english2french_month = {
    'Oct':'Octobre',
    'Nov':'Novembre',
    'Dec':'Décembre',
    'Jan':'Janvier',
    'Feb':'Février',
    'Mar':'Mars',
    'Apr':'Avril',
    'May':'Mai',
    'Jun':'Juin',
    'Jul':'Juillet',
    'Aug':'Août',
    'Sep':'Septembre'}

In [17]:
def show_data(date):
    '''
    Show the choropleth map of the foreign and Swiss unemployment rates for the given month and year.

    date -- datetime.date whose month and year select the data column (e.g. datetime.date(2017, 9, 1) for Sep-2017)
    '''
    # we parse the given date
    month = date.strftime('%b')
    year = date.strftime('%Y')
    # and convert the date for display
    str_date = english2french_month[month]+' '+year
    
    # we construct a new map
    m_switzerland = folium.Map([46.8,8.3], tiles='Mapbox Bright', zoom_start=8)
    
    # we add the layers representing the Swiss and foreign unemployment rates
    m_switzerland.choropleth(geo_data=ch_topo, 
                         name="Suisses", 
                         topojson='objects.cantons',
                         data = u_rate_nationality[u_rate_nationality.Nationality=='Suisses'],
                         columns = ['Canton',str_date],
                         key_on='feature.id',
                         fill_color='YlGnBu', 
                         fill_opacity=0.5, 
                         line_opacity=0.2,
                         legend_name="Unemployment Rate (%) - Swiss"
                        )

    m_switzerland.choropleth(geo_data=ch_topo, 
                         name="Etrangers", 
                         topojson='objects.cantons',
                         data = u_rate_nationality[u_rate_nationality.Nationality=='Etrangers'],
                         columns = ['Canton',str_date],
                         key_on='feature.id',
                         fill_color='YlOrRd', 
                         fill_opacity=0.5, 
                         line_opacity=0.2,
                         legend_name="Unemployment Rate (%) - Foreign"
                        )


    folium.LayerControl().add_to(m_switzerland)
    
    # display the content on function call
    display(HTML('<h3>'+str_date+'</h3>'))
    display(m_switzerland)

In [18]:
# we define a wrapper for calling the show_data function from the widget
def f(x):
    show_data(x)

In [19]:
# we define a list of dates for which we have data (October 2016 - September 2017)
dates = [datetime.date(2016,i,1) for i in range(10,13)]
dates += [datetime.date(2017,i,1) for i in range(1,10)]

In [20]:
# we convert the possible dates to string format suitable for the widget
options = [(i.strftime('%b-%Y'), i) for i in dates]

We opted to try using the selection slider to select the desired month.

In [21]:
w = widgets.SelectionSlider(
    options=options,
    description='Select month',
    disabled = False,
    readout = True,
    continuous_update=False
)

In [22]:
widgets.interact(f, x=w)

<function __main__.f>

**Unfortunately, interactive widgets do not display well on GitHub or nbviewer. A screenshot is provided to show the idea behind the functionality when the notebook is running locally:**

![interactive](images/interactive.png)

---

This way we would be able to browse through historical data to try to see different trends.
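The month lookup in `show_data` relies on `strftime('%b')`, whose output depends on the active locale. A minimal sketch of a locale-independent way to build the same column labels follows; `FRENCH_MONTHS` and `month_labels` are hypothetical names, and the labels assume the `'<French month> <year>'` column format used above.

```python
import datetime

FRENCH_MONTHS = ["Janvier", "Février", "Mars", "Avril", "Mai", "Juin",
                 "Juillet", "Août", "Septembre", "Octobre", "Novembre", "Décembre"]


def month_labels(start, count):
    """Return `count` consecutive column labels, starting at the month of `start`."""
    labels = []
    year, month = start.year, start.month
    for _ in range(count):
        labels.append("{} {}".format(FRENCH_MONTHS[month - 1], year))
        month += 1
        if month > 12:  # roll over into the next year
            month, year = 1, year + 1
    return labels


labels = month_labels(datetime.date(2016, 10, 1), 12)
```

Such a list could also drive a loop that saves one static HTML map per month, which would sidestep the widget rendering problem at the cost of interactivity.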


### Adding age data to the map

As mentioned earlier, we have managed to find the dataset containing the unemployment rate by nationality additionally split across 3 different age groups. Since this data is simply too much to show on a map, we will show the cantonal data as a popup.

Making a separate popup for each canton is not too difficult with GeoJSON. For TopoJSON we need to devise a method to extract a single object (canton) manually. The TopoJSON [documentation](https://github.com/topojson/topojson/wiki/Introduction) has proven invaluable at this point.

In [23]:
import copy

# a function to extract TopoJSON feature for each canton. Returns a list of TopoJSON objects.
def create_canton_topos(originalTopoJSON):
    canton = []
    for geometry in originalTopoJSON["objects"]["cantons"]["geometries"]:
        tmp_topo = copy.deepcopy(originalTopoJSON)
        tmp_topo["objects"]["cantons"]["geometries"]=[geometry]
        
        canton.append(tmp_topo)
        
        
    return canton

We separate the TopoJSON into 26 TopoJSON descriptors, one per canton:

In [24]:
canton_topos = create_canton_topos(ch_topo)
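To show what the split produces, here is a sketch on a hypothetical two-geometry topology. Unlike the deep copy above, this variant (`split_topology`, an illustrative name) shares the `arcs` array between the parts and keys the result by geometry `id`; it is an alternative under those assumptions, not the method used in this notebook.

```python
import copy

# Hypothetical miniature topology with two geometries.
toy = {
    "type": "Topology",
    "arcs": [[[0, 0], [1, 0]], [[1, 0], [1, 1]]],
    "objects": {"cantons": {"type": "GeometryCollection", "geometries": [
        {"type": "LineString", "id": "AA", "arcs": [0]},
        {"type": "LineString", "id": "BB", "arcs": [1]},
    ]}},
}


def split_topology(topology, object_name):
    """Return {geometry id: topology containing only that geometry}."""
    parts = {}
    collection = topology["objects"][object_name]
    for geometry in collection["geometries"]:
        part = copy.copy(topology)  # shallow copy: the arcs list is shared
        part["objects"] = {object_name: dict(collection, geometries=[geometry])}
        parts[geometry["id"]] = part
    return parts


parts = split_topology(toy, "cantons")
```

Either way, every descriptor still refers to the full `arcs` array, so serializing all 26 of them repeats the whole geometry 26 times.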

We define generic functions for creating *Vincent* (Vega) pie charts of *Age* and *Nationality*. These functions are not used in the final run, where a **grouped bar chart** is used instead. We keep the implementations, since switching the visualization only requires invoking one of the functions with the desired canton and date.

In [26]:
# returns a dictionary object for displaying on Folium map - a pie chart of unemployment rate by age
def make_pie_age(canton, date):
    d = {}
    cnt = 0
    keys = ['15-24','25-49','50+']
    
    for el in u_rate_age[u_rate_age.Canton==canton][date]:
        d[keys[cnt]] = el
        cnt += 1
    
    pie = vincent.Pie(d, width=100, height=100)
    pie.legend('Age range')
    pie_json = pie.to_json()
    pie_dict = json.loads(pie_json)
    
    return pie_dict

# returns a dictionary object for displaying on Folium map - a pie chart of unemployment rate by nationality
def make_pie_nationality(canton, date):
    d = {}
    cnt = 0
    keys = ['Foreign','Swiss']
    
    for el in u_rate_nationality[u_rate_nationality.Canton==canton][date]:
        d[keys[cnt]] = el
        cnt += 1
        
    pie = vincent.Pie(d, width=100, height=100)
    pie.legend('Nationality')
    pie_json = pie.to_json()
    pie_dict = json.loads(pie_json)
    
    return pie_dict

Since there was no easy way to display multiple *Vincent* graphs in a single `Popup`, we decided to use one **grouped bar chart** that combines the data. The function is generic in the sense that it generates a graph for the selected canton and date.

In [25]:
# A function which returns a Vincent object for displaying a grouped barchart, data is selected based on canton and date
def make_grouped_bar_combined(canton, date):
    foreigners = {}
    swiss = {}
        
    cnt = 0
    keys = ['15-24','25-49','50+']
    
    df_canton = u_rate_combined[u_rate_combined.Canton==canton].replace('...',-1)
    df_foreign = df_canton[df_canton.Nationality=='Ausländer']
    df_swiss = df_canton[df_canton.Nationality=='Schweizer']
  
    for el in df_foreign[date]:
        if(el!=-1):
            foreigners[keys[cnt]] = el
        cnt += 1
        
    cnt = 0
    for el in df_swiss[date]:
        if(el!=-1):
            swiss[keys[cnt]] = el
        cnt += 1
        
    data = [foreigners, swiss]
    index = ['Foreigners', 'Swiss']
    
    bar = vincent.GroupedBar(pd.DataFrame(data, index=index))
    bar.legend(title='Unemployment rate by nationality and age')
    bar.axis_titles(x='Nationality', y='Unemployment rate')
    bar.common_axis_properties(title_size=10)
    bar.width = 250
    bar.height = 200
    
    bar_json = bar.to_json()
    bar_dict = json.loads(bar_json)
    #bar.display()
    return bar_dict
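The function above assigns age labels by row position and skips the `'...'` placeholder. A minimal sketch of the same grouping keyed by explicit labels is shown below; the records, the age labels and the helper name `rates_by_group` are hypothetical, and plain dictionaries stand in for the `DataFrame` rows.

```python
MISSING = "..."  # placeholder used in the export for unavailable values


def rates_by_group(records, date):
    """Group rates as {nationality: {age category: rate}}, skipping missing values."""
    grouped = {}
    for row in records:
        value = row[date]
        if value == MISSING:
            continue  # leave the bar out instead of inventing a number
        grouped.setdefault(row["Nationality"], {})[row["Age category"]] = float(value)
    return grouped


# Made-up rows for one canton.
records = [
    {"Nationality": "Ausländer", "Age category": "15-24", "September 2017": 4.0},
    {"Nationality": "Ausländer", "Age category": "50+", "September 2017": "..."},
    {"Nationality": "Schweizer", "Age category": "15-24", "September 2017": 2.5},
]
grouped = rates_by_group(records, "September 2017")
```

Keying by label rather than by counter keeps the result correct even if the rows arrive in a different order.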

#### Displaying a map with popup data

We now generate and display the map, with the age distribution of the unemployment rate per nationality shown in a popup. Clicking on a canton shows its data for September 2017. Since the generated map is too big for the notebook to render as an object, we save the map to an HTML file and then load it for display.

In [27]:
# we create an empty map
m_switzerland = folium.Map([46.8,8.3], tiles='Mapbox Bright', zoom_start=8)

# we iterate through every separate cantonal TopoJSON
for canton in canton_topos:
    
    name = canton["objects"]["cantons"]["geometries"][0]['id']
    
    tj = folium.TopoJson(canton, 
               'objects.cantons',
               name=name)
    
    # we create a grouped barchart for the current canton
    v_bar_combined = folium.Vega(make_grouped_bar_combined(name,'September 2017'), width=500, height=250)
    
    # we create and add a popup for each canton
    popup = folium.Popup(max_width=500)
    popup.add_child(v_bar_combined)
    
    tj.add_child(popup)
    tj.add_to(m_switzerland)

In [28]:
# we save the map to an html file
m_switzerland.save("map-vincent.html")

A map exported this way is a standalone HTML page, so we load it in an IFrame to embed it in the notebook. Do click on the cantons to see the detailed distribution of age per canton!

#### Interactive map - click on the canton to see the age distribution and nationality 

When data is not available for a certain age group, the bar is shown as 0. This is a feature of the data that we cannot change. By using popups we have managed to preserve the valuable geographical representation while adding extra information without overwhelming the user.

In [29]:
IFrame(src="map-vincent.html",width=900,height=800)

### We are not satisfied yet!

We would still like to see 3 parameters in the popup:
* total age ratios as a piechart
* total nationality ratio as a piechart
* unemployment rate grouped by age and nationality as a barchart

We had to go deeper into the darker realms of coding to manage this. We use Bokeh to create the graphs, merge all of them into one HTML document and show them in a single popup.

In [30]:
# we define a generic function for creating a bokeh piechart of the unemployment rate by nationality.
# canton and date are passed as parameters, and a bokeh Donut (piechart) is returned
def bokeh_pie_nationality(canton, date):
    df_canton = u_rate_nationality[u_rate_nationality.Canton==canton]
    
    d = Donut(df_canton, values=date, label=['Nationality'], text_font_size='12pt', hover_text='Unemployment by nationality',
             height = 220, width=220)
    
    d.toolbar.disabled = True
    d.toolbar.logo = None
    d.toolbar_location = None
    d.title.text = "Unemployment (%) by nationality"
    
    return d

In [31]:
# we define a generic function for creating a bokeh piechart of the unemployment rate by age.
# canton and date are passed as parameters, and a bokeh Donut (piechart) is returned
def bokeh_pie_age(canton, date):
    df_canton = u_rate_age[u_rate_age.Canton==canton]
    
    d = Donut(df_canton.replace({'Age category': {1.0: '15-24', 2.0: '25-49', 3.0:'50+'}}), values=date, label=['Age category'], text_font_size='12pt', hover_text='Unemployment by age',
             height = 220, width=220)
    
    d.toolbar.disabled = True
    d.toolbar.logo = None
    d.toolbar_location = None
    d.title.text = "Unemployment (%) by age"
    
    return d

In [32]:
# we define a generic function for creating a bokeh grouped barchart of the unemployment rate by age and nationality.
# canton and date are passed as parameters, and a bokeh Bar (barchart) is returned
def bokeh_bar_combined(canton, date):
    df_canton = u_rate_combined[u_rate_combined.Canton==canton].replace('...',0)
    
    d = bokeh.charts.Bar(df_canton.replace({'Age category': {1.0: '15-24', 2.0: '25-49', 3.0:'50+'}}), values=date,
                         label=['Age category'], group=['Nationality'], legend='top_right')
    
    d.toolbar.disabled = True
    d.toolbar.logo = None
    d.toolbar_location = None
    d.title.text = "Unemployment % by age category, nationality"
    d.yaxis.axis_label = "Unemployment rate [%]"
    d.axis.axis_label_text_font_size = '12pt'
    d.title.align = 'center'
    d.title.text_font_size = '12pt'
    d.xaxis.major_label_text_font_size = '12pt'
    d.height = 350
    d.width = 440
    
    return d
    

### The magic starts here!

Since the documentation and the compatibility are very limited for this specific use case, it took a lot of trial and error. Finally, we managed to create the desired map!

In [34]:
# we create an empty map
m_switzerland = folium.Map([46.8,8.3], tiles='Mapbox Bright', zoom_start=8)

cnt = 0

# we iterate through each canton TopoJSON object to add a popup
for canton in canton_topos:
    
    cnt += 1
    
    name = canton["objects"]["cantons"]["geometries"][0]['id']
    
    tj = folium.TopoJson(canton, 
               'objects.cantons',
               name=name)
    
    # we generate a bokeh piechart for age, and then extract the HTML data
    v_pie_age = bokeh_pie_age(name, 'Septembre 2017')
    html_age = file_html(v_pie_age, CDN, 'age'+name+str(cnt))
    
    # we generate a bokeh piechart for nationality, and then extract the HTML data
    v_pie_nationality = bokeh_pie_nationality(name, 'Septembre 2017')
    html_nationality = file_html(v_pie_nationality, CDN, 'nationality'+name+str(cnt))
    
    # we generate a bokeh barchart for combined criteria, and then extract the HTML data
    v_bar_grouped = bokeh_bar_combined(name, 'September 2017')
    html_grouped = file_html(v_bar_grouped, CDN, 'grouped'+name+str(cnt))
    
    # we make an IFrame of every such generated HTML, for future integration in the popup
    age = branca.element.IFrame(html=html_age, width=100, height=100)
    nationality = branca.element.IFrame(html=html_nationality, width=100, height=100)
    grouped = branca.element.IFrame(html=html_grouped, width=200, height=100)
    
    # finally, we make an HTML snippet which combines all the previously generated HTML (IFrame) files
    combined_html = '<p style="font-family: Verdana; text-align: center;"> Statistics for canton '+name+'</p>'\
    +'<figure style="width:90; max-width:90; max-height:90; float:left;">'+html_age+'</figure>'\
    +'<figure style="width:90; max-width:90; max-height:90; float:right;">'+html_nationality+'</figure>'\
    +'<figure style="width:95; max-width:95; max-height:95; float:left; padding-top:100;">'+html_grouped+'</figure>'
    
    # we take such HTML and integrate it again as an IFrame to put in the popup
    combined = branca.element.IFrame(html=combined_html, width=450, height=380)
    
    # we create a popup with desired HTML elements
    popup = folium.Popup(combined, max_width=450)
    
    tj.add_child(popup)
    tj.add_to(m_switzerland)

# we save the map to an html file
m_switzerland.save("map-full.html")

Because the additional HTML elements make the generated map object very large, we need to save it to a file first and then load that file into the notebook. Otherwise Jupyter has trouble rendering the Python object because of its size.

**Do click on a canton to see more detailed data**

In [35]:
IFrame(src="map-full.html",width=900, height=800)

It took quite an effort to make this kind of popup work, but it provides the most relevant information in a concise and intuitive manner!

#### References and manuals

If not mentioned in the text, the complete list of manuals and documentation we used is provided here. Thanks to these resources, it was possible to create the last map:

* https://python-visualization.github.io/folium/quickstart.html
* https://github.com/wrobstory/vincent
* https://altair-viz.github.io/
* http://nbviewer.jupyter.org/gist/BibMartin/4b9784461d2fa0d89353
* http://jeffpaine.github.io/geojson-topojson/
* http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/TimeSliderChoropleth.ipynb
* http://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/Popups.ipynb
* http://nbviewer.jupyter.org/github/python-visualization/folium/tree/master/examples/
* http://bokeh.pydata.org/en/0.11.0/docs/user_guide/charts.html