# 03 - Interactive Viz

## Assignment

1. Go to the eurostat website and try to find a dataset that includes the European unemployment rates at a recent date.

   Use this data to build a Choropleth map which shows the unemployment rate in Europe at a country level. Think about the colors you use, how you decided to split the intervals into data classes, or which interactions you could add in order to make the visualization intuitive and expressive. Compare Switzerland's unemployment rate to that of the rest of Europe.

2. Go to the amstat website to find a dataset that includes the unemployment rates in Switzerland at a recent date.

   HINT: Go to the details tab to find the raw data you need. If you do not speak French, German or Italian, think of using free translation services to navigate your way through.

   Use this data to build another Choropleth map, this time showing the unemployment rate at the level of Swiss cantons. Again, try to make the map as expressive as possible, and comment on the trends you observe.

   The Swiss Confederation defines the rates you have just plotted as the number of people looking for a job divided by the size of the active population (scaled by 100). This is surely a valid choice, but as we discussed one could argue for a different categorization.

   Copy the map you have just created, but this time don't count in your statistics people who already have a job and are looking for a new one. How do your observations change? You can repeat this with different choices of categories to see how selecting different metrics can lead to different interpretations of the same data.

3. Use the amstat website again to find a dataset that includes the unemployment rates in Switzerland at a recent date, this time making a distinction between Swiss and foreign workers.

   The State Secretariat for Economic Affairs (SECO) releases a monthly report on the state of the employment market. In the latest report (September 2017), it is noted that there is a discrepancy between the unemployment rates for foreign (5.1%) and Swiss (2.2%) workers.

   Show the difference in unemployment rates between the two categories in each canton on a Choropleth map (hint: the easy way is to show two separate maps, but can you think of something better?). Where are the differences most visible? Why do you think that is?

   Now let's refine the analysis by adding the differences between age groups. As you may have guessed, it is nearly impossible to plot so many variables on a map. Make a bar plot, which is a better suited visualization tool for this type of multivariate data.

4. BONUS: using the map you have just built, and the geographical information contained in it, could you give a rough estimate of the difference in unemployment rates between the areas divided by the Röstigraben?

## 1. European Unemployment Rate
---
<div class="alert alert-block alert-info">
Go to the eurostat website and try to find a dataset that includes the European unemployment rates at a recent date. <br><br>

Use this data to build a Choropleth map which shows the unemployment rate in Europe at a country level. Think about the colors you use, how you decided to split the intervals into data classes or which interactions you could add in order to make the visualization intuitive and expressive. Compare Switzerland's unemployment rate to that of the rest of Europe.
</div>

First, we import the libraries we use: the usual ones, such as *pandas* and *numpy*, as well as *folium* for generating the required maps and *jenkspy* for classifying the data.

In [1]:
import pandas as pd
import numpy as np
import jenkspy
import branca
import folium
import os
import json
import copy
from IPython.display import IFrame
%matplotlib inline
import matplotlib.pyplot as plt

Finding the data on the *eurostat* website is straightforward, and metadata files are available that explain the exact definitions of *employed* and *unemployed* persons, along with the other terms used in the datasets. <br>

At the same time, having so much information available makes it hard to decide which dataset to use. This is the case for the first exercise, where the following branches are available under the Employment and Unemployment domain <a href="http://ec.europa.eu/eurostat/cache/metadata/EN/employ_esms.htm">[source]</a>:
- 'LFS main indicators' consists of a selection of the most important monthly, quarterly and annual  labour market indicators, most of them based on EU-LFS.
- 'LFS series - detailed quarterly survey results' and 'LFS series - detailed annual survey results' is a more comprehensive selection of data from the EU-LFS.
- 'LFS series - specific topics' report annual regional data (NUTS III) and annual data on households (both households demographics and labour market results by household type).
- 'LFS series - adhoc modules' report annual results for some EU-LFS adhoc modules. Other adhoc module results are published in other domains where they better fit (e.g. in education statistics or health statistics).
---

Looking more carefully at the four branches and the data they contain, we realize that only the first two could be helpful in our case. The decision between them is not obvious, but we decided to continue with the data in the *LFS main indicators* branch. The main reason is that the *LFS series - detailed quarterly survey results* branch contains the raw data collected every three months from surveys conducted in each country, whereas the data in *LFS main indicators* is computed from that raw survey data and then adjusted and enriched, as presented <a href="http://ec.europa.eu/eurostat/cache/metadata/de/une_esms.htm#stat_pres1496733880223">here</a>. We therefore consider the *LFS main indicators* data to be more carefully curated and likely closer to reality than the raw quarterly results. We downloaded the dataset of unemployment rates by sex and age and import it into a pandas DataFrame below:

In [2]:
UNEMPLOYMENT_MONTHLY_FILE = os.path.join('.', 'data', 'une_rt_m.tsv')
UNEMPLOYMENT_QUARTERLY_FILE = os.path.join('.', 'data', 'une_rt_q.tsv')
UNEMPLOYMENT_YEARLY_FILE = os.path.join('.', 'data', 'une_rt_a.tsv')


unemployed = pd.read_csv(UNEMPLOYMENT_MONTHLY_FILE, sep='\t')
unemployed.head()

Unnamed: 0,"s_adj,age,unit,sex,geo\time",2017M09,2017M08,2017M07,2017M06,2017M05,2017M04,2017M03,2017M02,2017M01,...,1983M10,1983M09,1983M08,1983M07,1983M06,1983M05,1983M04,1983M03,1983M02,1983M01
0,"NSA,TOTAL,PC_ACT,F,AT",5.1,5.2,4.4,4.9,5.1,4.7,5.4,4.9,5.3,...,:,:,:,:,:,:,:,:,:,:
1,"NSA,TOTAL,PC_ACT,F,BE",7.2,7.7,7.6,6.8,6.9,7.4,8.0,8.2,7.8 b,...,:,:,:,:,:,:,:,:,:,:
2,"NSA,TOTAL,PC_ACT,F,BG",5.3,5.3,5.3,5.3,5.6,6.2,6.8,7.1,7.0,...,:,:,:,:,:,:,:,:,:,:
3,"NSA,TOTAL,PC_ACT,F,CY",10.7,12.2,12.4,11.4,10.7,11.4,13.1,13.9,14.0,...,:,:,:,:,:,:,:,:,:,:
4,"NSA,TOTAL,PC_ACT,F,CZ",3.5,3.5,3.4,3.5,3.5,4.2,4.4,4.3,4.2,...,:,:,:,:,:,:,:,:,:,:


Looking at the resulting DataFrame, we can see that it contains monthly data going back to January 1983, with each month represented as a column. Because the exercise explicitly asks for the unemployment rates at a recent date, we keep only the last 12 months in the DataFrame, as below:

In [3]:
columns_to_drop = list(range(13, len(unemployed.columns))) 
# drop all the columns with index bigger or equal to 13, i.e. keep the first 13 columns
unemployed = unemployed.drop(unemployed.columns[columns_to_drop], axis=1)

In [4]:
unemployed.head()

Unnamed: 0,"s_adj,age,unit,sex,geo\time",2017M09,2017M08,2017M07,2017M06,2017M05,2017M04,2017M03,2017M02,2017M01,2016M12,2016M11,2016M10
0,"NSA,TOTAL,PC_ACT,F,AT",5.1,5.2,4.4,4.9,5.1,4.7,5.4,4.9,5.3,4.9,5.5,5.4
1,"NSA,TOTAL,PC_ACT,F,BE",7.2,7.7,7.6,6.8,6.9,7.4,8.0,8.2,7.8 b,6.8,6.5,6.6
2,"NSA,TOTAL,PC_ACT,F,BG",5.3,5.3,5.3,5.3,5.6,6.2,6.8,7.1,7.0,6.6,6.5,6.4
3,"NSA,TOTAL,PC_ACT,F,CY",10.7,12.2,12.4,11.4,10.7,11.4,13.1,13.9,14.0,14.2,14.4,12.3
4,"NSA,TOTAL,PC_ACT,F,CZ",3.5,3.5,3.4,3.5,3.5,4.2,4.4,4.3,4.2,4.2,4.2,4.3


Furthermore, if we look at the first column, we realize it is composed of multiple fields, which are described below:

1. s_adj: seasonal adjustment, which has the following possible values:
    * NSA = Unadjusted data (i.e. neither seasonally adjusted nor calendar adjusted data).
    * SA = Seasonally adjusted data, not calendar adjusted data.
    * TC = Trend cycle data.
    
2. age: age of the people in the category, which has the following possible values:
    * TOTAL = Everyone is included in the category.
    * Y_LT25 = Less than 25 years.
    * Y25-74 = From 25 to 74 years.
    
3. unit: unit of measure, which has the following possible values:
    * THS_PER = Thousand persons
    * PC_ACT = Percentage of active population
 
4. sex: sex of the people in the category, which has the following possible values:
    * T = Total (everyone is included)
    * M = Males
    * F = Females
    
5. geo: the country or area in the category. The possible values are the country codes of the European countries, as well as codes for aggregates such as the European Union or the Euro Area.
---
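
To make the structure of the compound first column concrete, here is a minimal sketch that splits one key into its five named fields. The names `SeriesKey` and `parse_series_key` are hypothetical helpers introduced for illustration only; they are not part of the notebook.

```python
from typing import NamedTuple


class SeriesKey(NamedTuple):
    """The five comma-separated fields of the first column."""
    s_adj: str  # seasonal adjustment, e.g. "NSA" or "SA"
    age: str    # age group, e.g. "TOTAL"
    unit: str   # unit of measure, e.g. "PC_ACT"
    sex: str    # "T", "M" or "F"
    geo: str    # country or area code, e.g. "AT"


def parse_series_key(raw: str) -> SeriesKey:
    """Split a compound key such as 'NSA,TOTAL,PC_ACT,F,AT' into its fields."""
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {raw!r}")
    return SeriesKey(*parts)


key = parse_series_key("NSA,TOTAL,PC_ACT,F,AT")
print(key.geo, key.unit)
```

Working with named fields makes a filter such as "seasonally adjusted, total, percentage of active population" readable without counting characters.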

As motivated above, we are interested only in the seasonally adjusted data, without distinguishing by age or sex. We also want the unemployment *rate* and not the total number of unemployed people. Thus, we keep only the rows whose first column value starts with the string *SA,TOTAL,PC_ACT,T,*. From the first column we keep only the country code, because all the other information is then the same for every row. The resulting table is constructed below:

In [5]:
KEY_COLUMN = "s_adj,age,unit,sex,geo\\time"
KEY_PREFIX = "SA,TOTAL,PC_ACT,T,"

select_rows = unemployed[KEY_COLUMN].str.startswith(KEY_PREFIX)
unemployed = unemployed[select_rows].copy() # select only specified rows; copy so later assignments do not act on a view

unemployed = unemployed.rename(columns={KEY_COLUMN : "country_code"}) # rename first column
unemployed["country_code"] = unemployed["country_code"].str[len(KEY_PREFIX):]
# delete the prefix (18 characters) from the first column, i.e. keep only the country code

unemployed = unemployed.reset_index(drop=True) # reset index from 0

In [6]:
unemployed.head(10)

Unnamed: 0,country_code,2017M09,2017M08,2017M07,2017M06,2017M05,2017M04,2017M03,2017M02,2017M01,2016M12,2016M11,2016M10
0,AT,5.6,5.5,5.4,5.3,5.4,5.6,5.7,5.8,5.7,5.7,5.8,5.9
1,BE,7.1,7.3,7.3,7.2,7.3,7.4,7.6,7.7,7.6 b,7.2,7.2,7.2
2,BG,6.1,6.2,6.1,6.1,6.1,6.2,6.4,6.6,6.6,6.7,6.8,7.0
3,CY,10.3,10.6,10.7,10.9,11.3,11.6,12.1,12.5,12.7,13.2,13.4,13.1
4,CZ,2.7,2.8,2.8,2.9,3.0,3.3,3.2,3.3,3.3,3.5,3.7,3.7
5,DE,3.6,3.6,3.7,3.7,3.8,3.8,3.9,3.9,3.9,3.9,3.9,4.0
6,DK,5.7,5.7,5.8,5.7,5.7,5.7,5.9,6.1,6.1,6.1,6.3,6.5
7,EA,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.6,9.7,9.8
8,EA18,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.7,9.8,9.8
9,EA19,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.6,9.7,9.8


Let us look at the data types of each column:

In [7]:
unemployed.dtypes

country_code    object
2017M09         object
2017M08         object
2017M07         object
2017M06         object
2017M05         object
2017M04         object
2017M03         object
2017M02         object
2017M01         object
2016M12         object
2016M11         object
2016M10         object
dtype: object

We want to convert all columns except country_code to numeric values, after replacing the missing data, marked with ":", with NaN. We also change the unemployment rate for Belgium in January 2017 from *7.6 b* to *7.6*; the trailing *b* is a flag attached to the value, not part of the number.

In [8]:
unemployed = unemployed.replace(": ", np.nan) # replace the missing-value marker ": " with NaN
unemployed.at[1, '2017M01 '] = 7.6 # Belgium, January 2017: replace "7.6 b" (set_value was removed from pandas)

for column in unemployed.columns[1:]:
    unemployed[column] = pd.to_numeric(unemployed[column])
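
Patching a single cell by hand works for this extract, but it does not scale if other flagged values appear. As a hedged alternative, the sketch below parses one raw cell at a time. The helper name `parse_cell` is hypothetical, and treating any trailing lowercase letters as flags is an assumption that goes beyond the single `b` flag seen in the data above.

```python
import math
import re

# A number, optionally followed by whitespace and lowercase flag letters (e.g. "7.6 b").
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)\s*[a-z]*$")


def parse_cell(cell) -> float:
    """Convert one raw cell to a float, mapping ':' (missing) to NaN."""
    text = str(cell).strip()
    if text.startswith(":"):  # ":" marks a missing observation
        return math.nan
    match = _VALUE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised cell value: {cell!r}")
    return float(match.group(1))


print(parse_cell("7.6 b"), parse_cell("5.1 "), parse_cell(": "))
```

Such a function could be applied column by column with `Series.map`, which would remove the need to know in advance which row carries a flag.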

In [9]:
unemployed.head(10)

Unnamed: 0,country_code,2017M09,2017M08,2017M07,2017M06,2017M05,2017M04,2017M03,2017M02,2017M01,2016M12,2016M11,2016M10
0,AT,5.6,5.5,5.4,5.3,5.4,5.6,5.7,5.8,5.7,5.7,5.8,5.9
1,BE,7.1,7.3,7.3,7.2,7.3,7.4,7.6,7.7,7.6,7.2,7.2,7.2
2,BG,6.1,6.2,6.1,6.1,6.1,6.2,6.4,6.6,6.6,6.7,6.8,7.0
3,CY,10.3,10.6,10.7,10.9,11.3,11.6,12.1,12.5,12.7,13.2,13.4,13.1
4,CZ,2.7,2.8,2.8,2.9,3.0,3.3,3.2,3.3,3.3,3.5,3.7,3.7
5,DE,3.6,3.6,3.7,3.7,3.8,3.8,3.9,3.9,3.9,3.9,3.9,4.0
6,DK,5.7,5.7,5.8,5.7,5.7,5.7,5.9,6.1,6.1,6.1,6.3,6.5
7,EA,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.6,9.7,9.8
8,EA18,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.7,9.8,9.8
9,EA19,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.6,9.7,9.8


We also set *country_code* as the index and rename the columns to the integers 1-12, corresponding to the month number:

In [10]:
unemployed = unemployed.set_index('country_code')

renamed_columns_dict = {old : int(old[5:-1]) for old in unemployed.columns}
unemployed = unemployed.rename(columns=renamed_columns_dict)
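
Reducing each column label to a bare month number discards the year, which then has to be reconstructed by hand when building titles. A minimal sketch of an alternative that keeps both parts is shown here; `parse_period` is a hypothetical helper, and the label format (four-digit year, `M`, two-digit month, optional surrounding whitespace) is inferred from the column names above.

```python
import re

_PERIOD_RE = re.compile(r"^(\d{4})M(\d{2})$")


def parse_period(label: str) -> tuple:
    """Turn a column label such as '2017M09 ' into a (year, month) pair."""
    match = _PERIOD_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a monthly period label: {label!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {label!r}")
    return year, month


print(parse_period("2017M09 "))
print(parse_period("2016M12"))
```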

In [11]:
unemployed.head()

country_code,9,8,7,6,5,4,3,2,1,12,11,10
AT,5.6,5.5,5.4,5.3,5.4,5.6,5.7,5.8,5.7,5.7,5.8,5.9
BE,7.1,7.3,7.3,7.2,7.3,7.4,7.6,7.7,7.6,7.2,7.2,7.2
BG,6.1,6.2,6.1,6.1,6.1,6.2,6.4,6.6,6.6,6.7,6.8,7.0
CY,10.3,10.6,10.7,10.9,11.3,11.6,12.1,12.5,12.7,13.2,13.4,13.1
CZ,2.7,2.8,2.8,2.9,3.0,3.3,3.2,3.3,3.3,3.5,3.7,3.7


We must also explain what these numbers represent. As presented in the metadata quoted above, the table is populated with the monthly unemployment rate, which is calculated as the number of unemployed people as a percentage of the labour force. The *labour force* is the total number of people employed and unemployed.

The concepts of employed and unemployed persons are defined as follows:

* **Unemployed persons** are all persons 15 to 74 years of age (16 to 74 years in ES, IT and the UK) who were not employed during the reference week, had actively sought work during the past four weeks and were ready to begin working immediately or within two weeks. 

The duration of unemployment is defined as the duration of a search for a job or as the length of the period since the last job was held (if this period is shorter than the duration of search for a job).

* **Employed persons** are all persons who worked at least one hour for pay or profit during the reference week or were temporarily absent from such work. For the unemployment rate, only persons from 15 to 74 years of age are used.
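
The definitions above boil down to a simple ratio. The following sketch spells it out; the function name and the head counts in the example are made up for illustration and do not come from the dataset.

```python
def unemployment_rate(unemployed: float, employed: float) -> float:
    """Unemployed persons as a percentage of the labour force."""
    labour_force = employed + unemployed  # labour force = employed + unemployed
    if labour_force <= 0:
        raise ValueError("the labour force must be positive")
    return 100.0 * unemployed / labour_force


# Illustrative numbers only: 50 unemployed and 950 employed persons.
print(unemployment_rate(50, 950))
```

Note that people outside the labour force (neither employed nor actively seeking work) do not enter the denominator at all.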

We can now proceed to constructing the unemployment map for the European countries. The first inconvenience we encounter is that some of the countries in the topojson are not in the European Union and therefore do not appear in the *unemployed* table. If we try to create the Choropleth map with the current DataFrame, the code will throw an error. To solve that problem, the countries that are not in the DataFrame must be treated as having only NaN values for the unemployment rate. We start by comparing the country codes in the two sources:

In [12]:
TOPOGRAPHY_FILE = os.path.join(".", "topojson", "europe.topojson.json")

with open(TOPOGRAPHY_FILE) as topography_file: # close the file once it has been parsed
    topojson = json.load(topography_file)
country_codes_topojson = [item['id'] for item in topojson["objects"]["europe"]["geometries"]]
# country_codes_topojson represents the codes of the country in the topojson

set(unemployed.index).difference(set(country_codes_topojson))

{'EA', 'EA18', 'EA19', 'EL', 'EU25', 'EU27', 'EU28', 'JP', 'UK', 'US'}

As we can see in the previous output, some codes appear in the DataFrame but not in the topojson simply because they are not European country codes: *EA, EA18, EA19, EU25, EU27, EU28, JP, US*. At the same time, two European countries seem to appear in the *unemployed* DataFrame but not in the topojson. On closer inspection, *EL* is the country code for Greece, which is denoted by *GR* in the topojson, and *UK* is the country code for the United Kingdom, which is denoted by *GB* in the topojson. Therefore, we modify the country codes for those two countries in the DataFrame below:

In [13]:
index_list = unemployed.index.tolist()

greece_index = index_list.index('EL')
index_list[greece_index] = 'GR'

uk_index = index_list.index('UK')
index_list[uk_index] = 'GB'

unemployed.index = index_list
unemployed['country_code'] = unemployed.index

set(unemployed.index).difference(set(country_codes_topojson))

{'EA', 'EA18', 'EA19', 'EU25', 'EU27', 'EU28', 'JP', 'US'}
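
The same remapping can also be written as a single `rename` call with a dictionary, which keeps the code mapping in one place. This is a sketch on a toy DataFrame with made-up values; the name `EUROSTAT_TO_TOPOJSON` is introduced here for illustration.

```python
import pandas as pd

# The two codes that differ between the dataset and the topojson.
EUROSTAT_TO_TOPOJSON = {"EL": "GR", "UK": "GB"}

# Toy data: one month column, made-up rates.
toy = pd.DataFrame({7: [20.0, 4.0, 3.5]}, index=["EL", "UK", "DE"])

toy = toy.rename(index=EUROSTAT_TO_TOPOJSON)  # labels not in the mapping are left untouched
toy["country_code"] = toy.index

print(list(toy.index))
```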

Next, to construct the Choropleth map, we need to choose a data classification method, i.e. group the unemployment rates into a small number of classes, such as 5 or 6, and then pick suitable colors to represent them. As explained <a href="http://gisgeography.com/choropleth-maps-data-classification/">here</a>, one of the best ways of creating classes is Jenks Natural Breaks. The goal of this method is to construct classes that minimize the variance within each class while keeping the class means as far apart as possible. For example, in our case the outliers (such as Greece) need to be placed in a separate class, to show the large difference in unemployment compared to the other countries.
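
To illustrate the idea behind natural breaks, here is a small pure-Python sketch that finds the split of sorted values into classes with the smallest total within-class sum of squared deviations, using dynamic programming. It is an illustration of the principle only and not the implementation used by jenkspy; the function name `natural_breaks` and the convention of returning the minimum followed by the largest value of each class are assumptions made for this example.

```python
def natural_breaks(values, n_classes):
    """Return [min, max of class 1, ..., max of class n] for the optimal split."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums let us compute the squared deviation of any slice in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for value in data:
        prefix.append(prefix[-1] + value)
        prefix_sq.append(prefix_sq[-1] + value * value)

    def ssd(start, stop):
        """Sum of squared deviations from the mean of data[start:stop]."""
        total = prefix[stop] - prefix[start]
        return (prefix_sq[stop] - prefix_sq[start]) - total * total / (stop - start)

    inf = float("inf")
    # cost[k][j]: best total deviation when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # the last class is data[i:j]
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i

    # Walk back through the recorded split points, collecting class maxima.
    edges, j = [], n
    for k in range(n_classes, 0, -1):
        edges.append(data[j - 1])
        j = split[k][j]
    edges.append(data[0])
    return edges[::-1]


print(natural_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3))
```

On this toy input the three obvious groups are recovered, and the isolated high values end up in their own class, which is the behaviour we want for outliers.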

---
To use Jenks Natural Breaks, we install a library called jenkspy with the command:
* conda install -c conda-forge jenkspy

Then, we look at the output of the algorithm for 5 and for 6 classes, for July 2017. We choose July 2017 because there are no missing values for that month in the initial DataFrame:

In [14]:
jenks_breaks_5 = jenkspy.jenks_breaks(unemployed[7].dropna(), nb_class=5)
jenks_breaks_6 = jenkspy.jenks_breaks(unemployed[7].dropna(), nb_class=6)

jenks_breaks_5, jenks_breaks_6

([2.8, 4.3, 6.5, 9.0, 11.3, 21.0], [2.8, 4.3, 6.5, 9.0, 11.3, 16.9, 21.0])

We can see that the two lists differ only at the very end, where the longer list contains one additional intermediate edge. We decide to use the Jenks breaks with 6 classes, because we want a more granular distinction for countries with unemployment between 11.3% and 21.0%. With 5 classes, the last bin would span almost 10 percentage points, which we consider too wide. With 6 classes, the outliers get a color of their own.

In [15]:
bins_edges = jenks_breaks_6

Since we decided to use 6 classes, we need 6 colors to represent them. For that, we use the <a href="http://colorbrewer2.org/#type=sequential&scheme=YlOrBr&n=6">colorbrewer</a> website and choose a yellow-brown scheme. We also have to mark the European countries that are not in the European Union, and the EU countries with missing data, with a distinct color that signals that the data is missing.
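
How a rate is assigned to one of the six classes can be sketched with a binary search over the class edges. The edges below are the six-class breaks computed above; the hex codes are assumed to correspond to the six-class YlOrBr scheme, and treating each inner edge as the inclusive upper bound of its class is an assumption of this sketch, not necessarily how folium bins the values.

```python
from bisect import bisect_left

# Class edges from the six-class Jenks output for July 2017.
EDGES = [2.8, 4.3, 6.5, 9.0, 11.3, 16.9, 21.0]
# Six colors, light to dark (assumed to match the YlOrBr scheme).
PALETTE = ["#ffffd4", "#fee391", "#fec44f", "#fe9929", "#d95f0e", "#993404"]


def class_index(rate, edges=EDGES):
    """Return the 0-based class of a rate; inner edges are inclusive upper bounds."""
    if not edges[0] <= rate <= edges[-1]:
        raise ValueError(f"rate {rate} lies outside [{edges[0]}, {edges[-1]}]")
    return bisect_left(edges[1:-1], rate)


def color_for(rate):
    """Look up the fill color for a rate."""
    return PALETTE[class_index(rate)]


print(class_index(5.4), color_for(21.0))
```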

To distinguish easily between missing and existing data, we split the given topojson into two: one containing only the countries that appear in the DataFrame AND have data in a specific column, and the other containing only the countries that do not appear in the DataFrame OR have missing data in that column. The functions that perform the split are written below:

In [16]:
def country_in_dataframe_and_not_missing_value(json_entry, month):
    '''
    Function that returns True iff the country defined by the json entry appears in the unemployed DataFrame and has no missing
    value for the specified month.
    '''
    country_codes = set(unemployed.index)
    
    if json_entry["id"] not in country_codes:
        return False
    
    return not pd.isnull(unemployed.loc[json_entry["id"], month])

In [17]:
def construct_topojson_filter(filter_function):
    '''
    Function that computes a topojson from the initial one, only with the countries that pass the filtering function
    '''
    
    with open(TOPOGRAPHY_FILE) as topography_file:
        topojson_filtered = json.load(topography_file)
    
    # filter the freshly loaded copy instead of relying on the global topojson
    geometries = topojson_filtered["objects"]["europe"]["geometries"]
    topojson_filtered["objects"]["europe"]["geometries"] = list(filter(filter_function, geometries))
    
    return topojson_filtered
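
The two complementary filters can also be expressed as one pass that partitions the geometries. The sketch below works on a toy list of geometries and a toy lookup of rates; `partition_geometries` and both toy structures are hypothetical and only mimic the `id` field used in the topojson above.

```python
def partition_geometries(geometries, has_data):
    """Split geometries into (with data, without data) in a single pass."""
    with_data, without_data = [], []
    for geometry in geometries:
        target = with_data if has_data(geometry) else without_data
        target.append(geometry)
    return with_data, without_data


# Toy inputs: AT has a rate, GR has a missing rate, CH is not in the table at all.
toy_geometries = [{"id": "AT"}, {"id": "CH"}, {"id": "GR"}]
toy_rates = {"AT": 5.4, "GR": None}

with_data, without_data = partition_geometries(
    toy_geometries,
    lambda geometry: toy_rates.get(geometry["id"]) is not None,
)
print([g["id"] for g in with_data], [g["id"] for g in without_data])
```

Because every geometry lands in exactly one of the two lists, the two resulting layers cannot overlap or leave a country out.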

In [18]:
month_dictionary_english = {1: 'January ',
                           2: "February ", 
                           3: "March ",
                           4: "April ", 
                           5: "May ", 
                           6: "June ",
                           7: "July ",
                           8: "August ", 
                           9: "September ",
                           10: "October ", 
                           11: "November ",
                           12: "December "}

Next, we write a function that builds the Choropleth map for a specific month given as a parameter. We first split the topojson into two topojsons, using the functions described above. Then we group the data into 6 classes using Jenks Natural Breaks and build the Choropleth map from this data with a color scheme from the *colorbrewer* website. Finally, for all the countries that have missing data or are not in the European Union, we draw a new color on top of the map that contrasts strongly with the colors used for the unemployment rates, so that the user can easily tell missing data from available data. The function is written below:

In [19]:
def create_choropleth_for_month(month, m=None):
    if month > 9:
        year = '2016'
    else:
        year = '2017'        
    
    bins_edges = jenkspy.jenks_breaks(list(unemployed[month].dropna()), nb_class=6)[:-1] # create the bins, without the rightmost edge
    
    if m is None:
        m = folium.Map(location=[55, 10], tiles='Mapbox Bright', zoom_start=3.4)
    topojson_common = construct_topojson_filter(lambda x: country_in_dataframe_and_not_missing_value(x, month))
    topojson_distinct = construct_topojson_filter(lambda x: not country_in_dataframe_and_not_missing_value(x, month))

    # Construct Choropleth map with the countries in the DataFrame that don't have missing data for the month
    m.choropleth(
    geo_data=topojson_common,
    data=unemployed,
    columns=['country_code', month],
    key_on='feature.id',
    fill_color='YlOrBr',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Unemployment Rate for ' + month_dictionary_english[month] + year + ' (%)',
    topojson='objects.europe',
    threshold_scale=bins_edges,
    highlight=True)
    
    # for all the European countries that are not in the DataFrame OR have missing data for the specified month, fill them with purple
    folium.TopoJson(
    data=topojson_distinct,
    object_path='objects.europe',
    style_function=lambda feature: {
        'fillColor': '#7a0177', # hex colors need the leading '#'
        'fillOpacity' : 0.8, # Leaflet style keys are camelCase
        'color' : 'black',
        'weight' : 0.2}
    ).add_to(m)
    
    return m

Then, let's take a look at the actual unemployment DataFrame, to decide which month will be taken into consideration:

In [20]:
unemployed

Unnamed: 0,9,8,7,6,5,4,3,2,1,12,11,10,country_code
AT,5.6,5.5,5.4,5.3,5.4,5.6,5.7,5.8,5.7,5.7,5.8,5.9,AT
BE,7.1,7.3,7.3,7.2,7.3,7.4,7.6,7.7,7.6,7.2,7.2,7.2,BE
BG,6.1,6.2,6.1,6.1,6.1,6.2,6.4,6.6,6.6,6.7,6.8,7.0,BG
CY,10.3,10.6,10.7,10.9,11.3,11.6,12.1,12.5,12.7,13.2,13.4,13.1,CY
CZ,2.7,2.8,2.8,2.9,3.0,3.3,3.2,3.3,3.3,3.5,3.7,3.7,CZ
DE,3.6,3.6,3.7,3.7,3.8,3.8,3.9,3.9,3.9,3.9,3.9,4.0,DE
DK,5.7,5.7,5.8,5.7,5.7,5.7,5.9,6.1,6.1,6.1,6.3,6.5,DK
EA,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.6,9.7,9.8,EA
EA18,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.7,9.8,9.8,EA18
EA19,8.9,9.0,9.0,9.1,9.2,9.2,9.4,9.5,9.6,9.6,9.7,9.8,EA19


We can see that for September 2017 and August 2017 we have missing values for some countries, most importantly for Greece, which is the country with the highest unemployment rate in recent months. Therefore, we decide to show the unemployment map for July 2017, because this is the most recent month for which we have data for all the European countries:
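
The choice of month can also be made programmatically. The sketch below scans the columns from most recent to oldest and returns the first one without missing values. It uses a toy DataFrame with made-up gaps, assumes the columns are already ordered from newest to oldest as in the table above, and `latest_complete_column` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def latest_complete_column(frame):
    """Return the first column (in column order) that has no missing values."""
    for column in frame.columns:
        if frame[column].notna().all():
            return column
    raise ValueError("every column has at least one missing value")


# Toy data: months 9 and 8 are missing for one country, month 7 is complete.
toy = pd.DataFrame(
    {9: [5.6, np.nan], 8: [5.5, np.nan], 7: [5.4, 21.0]},
    index=["AT", "GR"],
)
print(latest_complete_column(toy))
```

On the real table, rows for non-European entries would have to be dropped first so that their gaps do not influence the result.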

In [21]:
EUROPE_UNEMPLOYMENT_RATE = os.path.join('.', 'out', 'europe_unemployment_rate.html')

m = create_choropleth_for_month(7)
m.save(EUROPE_UNEMPLOYMENT_RATE)

IFrame(src=EUROPE_UNEMPLOYMENT_RATE,width=900, height=600)

In order to make the map more interactive, we can enable popups when we click inside a country, which will display the actual unemployment rate for the specific country in the analyzed month. The code that enables the feature is written below:

In [22]:
def construct_dict_country_code_to_country_name():
    result = {}
    topojson = json.load(open(TOPOGRAPHY_FILE))
    for item in topojson["objects"]["europe"]["geometries"]:
        result[item["id"]] = item["properties"]["NAME"]
    
    return result

In [23]:
country_code_to_country_name = construct_dict_country_code_to_country_name()

In [24]:
def construct_popup_text(month, country_code, rate):
    '''
    Function that returns the text to be written in a specific country area.
    '''
    country = country_code_to_country_name[country_code]
    
    if month > 9:
        year = '2016'
    else:
        year = '2017'
        
    html = "<h3>" + country + " unemployment rate in " + month_dictionary_english[month] + year + " is: " # month names already end with a space
    html += str(rate) + "%.</h3><br>"
    
    return html

To enable the popups described above, we need a small trick: for each country, we create a topojson containing only that country (that is, we delete all other countries from the topojson), assign a popup to it, and add it to the previously constructed map. The functions are implemented below:

In [25]:
def construct_countries_topojsons():
    result = []
    
    topojson = json.load(open(TOPOGRAPHY_FILE))
    
    for item in topojson["objects"]["europe"]["geometries"]:
        copy_topojson = copy.deepcopy(topojson)
        copy_topojson["objects"]["europe"]["geometries"] = [item]
        
        result.append((item["id"], copy_topojson)) # pair (country, country_topojson)
        
    return result

In [26]:
def construct_popups_topojsons(month):
    '''
    Function that creates all the topojsons with popups for a specific month.
    '''
    
    eu_countries = set(unemployed.index)
    result = []
    
    countries_topojsons = construct_countries_topojsons()
    for country_code, country_topojson_dict in countries_topojsons:
        if (not (country_code in eu_countries)) or np.isnan(unemployed.loc[country_code, month]):
            rate = 'unknown'
        else :
            rate = unemployed.loc[country_code, month]
        
        popup_html = construct_popup_text(month, country_code, rate)
        popup = folium.Popup(html=popup_html, max_width=500)
        
        country_topojson = folium.TopoJson(
            country_topojson_dict, 
            'objects.europe',
            name=country_code,
            # near-invisible overlay that only carries the popup; Leaflet style keys are camelCase
            style_function=lambda feature:{'fillOpacity':0.01, 'opacity':0.01, 'weight':0.01, 'color':'white'})
        
        country_topojson.add_child(popup)
        
        result.append(country_topojson)
        
    return result

In [27]:
def create_choropleth_for_month_interactive(month, m=None):
    if m is None:
        m = folium.Map(location=[55, 10], tiles='Mapbox Bright', zoom_start=3.4)
    
    m = create_choropleth_for_month(month, m)
    
    countries_topojsons_popups = construct_popups_topojsons(month)
    for country_topojson in countries_topojsons_popups:
        country_topojson.add_to(m)
        
    return m

In [29]:
EUROPE_UNEMPLOYMENT_RATE_INTERACTIVE = os.path.join('.', 'out', 'europe_unemployment_rate_interactive.html')

# m = create_choropleth_for_month_interactive(7)
# m.save(EUROPE_UNEMPLOYMENT_RATE_INTERACTIVE)

IFrame(src=EUROPE_UNEMPLOYMENT_RATE_INTERACTIVE,width=900, height=600)

Looking at the unemployment rate in Europe in July 2017, the map confirms what most of us probably expected. Countries such as the Czech Republic (2.8%, the lowest value in our data), Germany (3.7%) and Norway are at the low end, while Spain and Greece have the highest rates, at 16.9% and 21.0% respectively.

## Exercise 2
