# Table of Contents

### [Task 1. European Unemployment Rates Analysis - eurostat dataset](#1)
- [1.1. Pick a DataSet to analyse](#11)

- [1.2. Analyse Data and divide it into classes of interest](#12)

- [1.3. Build Choropleth Map](#13)

### [Task 2. Swiss Unemployment Rates Analysis - amstat dataset](#2)
- [2.1. Pick a DataSet to analyse](#21)

- [2.2. Analyse Data and divide it into classes of interest](#22)

- [2.3. Build Choropleth Map](#23)

- [2.4. Change categorization method](#24)

- [2.5. Build Choropleth Map for new categorization method](#25)

### [Task 3. Swiss Unemployment Rates Analysis (including Swiss and foreign workers distinction) - amstat dataset](#3)
- [3.1. Pick a DataSet to analyse](#31)

## Imports

**We start by importing some libraries that will be helpful during this homework.**

In [89]:
import os
import pandas as pd
import json
import folium
import numpy as np
import matplotlib.pyplot as plt

## Task 1. European Unemployment Rates Analysis - *eurostat* dataset <a class="anchor" id="1"></a>

### Assignment Instructions
Go to the Eurostat website and try to find a dataset that includes the European unemployment rates at a recent date.

Use this data to build a `Choropleth map` which shows the unemployment rate in Europe at a country level. Think about the colors you use, how you decided to split the intervals into data classes or which interactions you could add in order to make the visualization intuitive and expressive. Compare Switzerland's unemployment rate to that of the rest of Europe.

### 1.1. Pick a Dataset <a class="anchor" id="11"></a>

In this task, we are asked to pick a dataset that includes European unemployment rates at a recent date.
For this reason, we pick the Eurostat table `tepsr_wc170`, which, as the preview below shows, reports yearly unemployment rates from 2005 to 2016.

Note that this file can be downloaded in either `.csv` or `.tsv` format. We choose the latter, since TSV is simple for Python to parse and easy for humans to read. Let's load the selected file:

In [90]:
unemployment_file = r'tepsr_wc170.tsv'

unemployment_df = pd.read_csv(unemployment_file, delimiter='\t|,', engine='python', encoding="utf-8-sig")
unemployment_df.head()

Unnamed: 0,unit,sex,age,isced11,geo\time,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
0,PC,T,Y15-24,TOTAL,AT,11.0 b,9.8,9.4 b,8.5,10.7,9.5,8.9,9.4,9.7,10.3,10.6,11.2
1,PC,T,Y15-24,TOTAL,BE,21.5 b,20.5,18.8,18.0,21.9,22.4,18.7 b,19.8,23.7,23.2,22.1,20.1
2,PC,T,Y15-24,TOTAL,BG,22.3 b,19.5,15.1,12.7 b,16.2,21.9 b,25.0 b,28.1,28.4,23.8,21.6,17.2
3,PC,T,Y15-24,TOTAL,CH,8.8 b,7.7,7.1,7.0,8.5,8.2 b,7.7,8.3,8.7,8.6,8.8,8.6
4,PC,T,Y15-24,TOTAL,CY,13.9 b,10.0,10.2,9.0,13.8 b,16.6,22.4,27.7,38.9,36.0,32.8,29.1


We strip the surrounding whitespace from the column headers:

In [91]:
unemployment_df.columns = unemployment_df.columns.str.strip()
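The regular-expression delimiter is presumably needed because Eurostat `.tsv` files join the dimension columns of each row with commas and separate only the yearly values with tabs. The sketch below reproduces the two steps above (loading and header stripping) on a small in-memory sample. The sample text is an assumption about the raw layout; its values are copied from the preview above.

```python
import io
import pandas as pd

# Assumed raw layout: comma-joined dimensions, then tab-separated yearly values.
# The year headers carry a trailing space to mimic the unstripped column names.
sample = (
    "unit,sex,age,isced11,geo\\time\t2015 \t2016 \n"
    "PC,T,Y15-24,TOTAL,AT\t10.6\t11.2\n"
    "PC,T,Y15-24,TOTAL,CH\t8.8\t8.6\n"
)

# A regex separator (tab OR comma) requires the Python parsing engine
sample_df = pd.read_csv(io.StringIO(sample), delimiter='\t|,', engine='python')
print(list(sample_df.columns))   # year headers still end with a space

sample_df.columns = sample_df.columns.str.strip()
print(list(sample_df.columns))   # clean headers
```

Without the stripping step, a lookup such as `sample_df['2016']` would fail, because the column would actually be named `'2016 '`.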

Let's look at the values each feature can take:

In [92]:
print("unit values: {}".format(np.unique(unemployment_df['unit'])))
print("sex values: {}".format(np.unique(unemployment_df['sex'])))
print("age values: {}".format(np.unique(unemployment_df['age'])))
print("isced11 values: {}".format(np.unique(unemployment_df['isced11'])))

unit values: ['PC']
sex values: ['T']
age values: ['Y15-24' 'Y15-74' 'Y20-64' 'Y25-29' 'Y25-54' 'Y55-64']
isced11 values: ['TOTAL']


The feature `isced11`, which may not be intuitive, is the education level according to the `International Standard Classification of Education` (ISCED 2011). In this dataset, only the total over all levels is reported.

Furthermore, the feature `unit` is the unit of measure, which can be a `percentage` or a `total`. Here it is always a percentage (`PC`).

Note also that `sex` only takes the value `T` (total), i.e. M + F.

Let's proceed with filtering the desired data.

### 1.2. Analyse Data and divide it into classes of interest <a class="anchor" id="12"></a>

As we can see above, several filters can be applied to our data frame. Taking this into account, we create a function for the filter we need most often, which keeps one age group and a chosen subset of columns:

In [93]:
def filter_df(unemployment_df, age, features):
    '''Keep the rows of one age group and only the columns listed in `features`.'''
    unemployment_df = unemployment_df.loc[(unemployment_df['age']==age)][features]
    return unemployment_df

**Parameters Selection Criteria:** When selecting which classes to represent in our dataframe, several approaches could be taken. The following choices seem the most meaningful to us:
- Age : Y15-74
- Sex : Total (M & F)
- Measurement Unit : In percentage of active population

In [94]:
#Define list with time interval to analyse
time_interval = ['2011', '2012', '2013', '2014', '2015', '2016']

#Define list with features to present in filtered data frame
features = ['geo\\time', 'age']
features.extend(time_interval)

unemployment_df = filter_df(unemployment_df, 'Y15-74', features)
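To illustrate what the filter returns, here is a minimal sketch on a toy frame. The `Y15-24` values are copied from the preview above; the `Y15-74` values are not shown in the preview, so they are left as `NaN`, and the example filters on `Y15-24` instead. The helper is repeated so the snippet runs on its own.

```python
import pandas as pd

def filter_df(unemployment_df, age, features):
    '''Keep the rows of one age group and only the columns listed in `features`.'''
    return unemployment_df.loc[unemployment_df['age'] == age, features]

# Toy frame: two age groups for two countries (unknown values left as NaN)
toy_df = pd.DataFrame({
    'unit': ['PC'] * 4,
    'geo\\time': ['AT', 'AT', 'CH', 'CH'],
    'age': ['Y15-24', 'Y15-74', 'Y15-24', 'Y15-74'],
    '2016': [11.2, float('nan'), 8.6, float('nan')],
})

# One row per country remains, restricted to the requested columns
filtered = filter_df(toy_df, 'Y15-24', ['geo\\time', 'age', '2016'])
print(filtered)
```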

**Data Cleaning**

In the original dataset, we find rows that refer to groups of countries rather than to single countries, such as:
- EU28: European Union (28 countries)
- EU27: European Union (27 countries)
- EA19: Euro area (19 countries)
- EA18: Euro area (18 countries)

These rows play no role in our analysis, so we drop them:

In [95]:
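def delete_groupRows(df, group_codes):
    '''Drop the rows whose geo code is a group of countries (e.g. EU28).'''
    # .copy() so that later column assignments do not raise SettingWithCopyWarning
    df = df[~df['geo\\time'].isin(group_codes)].copy()
    return df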

In [96]:
group_codes = ['EU28', 'EU27', 'EA19', 'EA18']
unemployment_df = delete_groupRows(unemployment_df, group_codes)
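The row removal relies on a negated boolean mask. A short sketch of the mechanism on a toy column of codes (the toy frame is for illustration only):

```python
import pandas as pd

toy_df = pd.DataFrame({'geo\\time': ['AT', 'EU28', 'CH', 'EA19']})
group_codes = ['EU28', 'EU27', 'EA19', 'EA18']

is_group = toy_df['geo\\time'].isin(group_codes)   # True for aggregate rows
countries_only = toy_df[~is_group]                  # ~ flips the mask

print(countries_only['geo\\time'].tolist())         # ['AT', 'CH']
```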

**Numerical Conversion**

We convert the year columns to numeric values; entries that cannot be parsed become `NaN`:

In [97]:
for year in time_interval:
    unemployment_df[year] = pd.to_numeric(unemployment_df[year], errors = 'coerce')
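One caveat worth checking: some raw cells carry a flag after the number (for example `11.0 b` in the preview above), and `errors='coerce'` turns such cells into `NaN` rather than `11.0`. The sketch below shows one possible way to keep the number, by extracting it with a regular expression before converting. The `:` entry is an assumption about how missing values appear in the raw file.

```python
import pandas as pd

# Toy column imitating raw cells; ':' is assumed to mark a missing value
raw = pd.Series(['11.0 b', '9.8', '9.4 b', ':'])

# Direct conversion: flagged cells are lost
direct = pd.to_numeric(raw, errors='coerce')

# Extract the leading number first, then convert
number_only = raw.str.extract(r'(-?\d+(?:\.\d+)?)', expand=False)
cleaned = pd.to_numeric(number_only, errors='coerce')

print(pd.DataFrame({'raw': raw, 'direct': direct, 'cleaned': cleaned}))
```

Whether flagged values should be kept is a modelling decision; the sketch only shows that the information need not be discarded by the parser.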

### 1.3. Build Choropleth Map <a class="anchor" id="13"></a>

To build a map with the information above, we need to associate the collected data with their locations.

For this, we start by loading the given `.json` file (in TopoJSON format), which contains the geometries of European countries (liberal definition of EU):

In [98]:
# Load the TopoJSON file; the with-block closes the file afterwards
europe_geo_path = r'./topojson/europe.topojson.json'
with open(europe_geo_path) as f:
    europe_json_data = json.load(f)

We need to make sure that every country code in our dataframe has a match in the JSON file:

In [99]:
#codes in df:
codes_df = np.unique(unemployment_df['geo\\time'])

#codes in JSON
codes_json = []
countries_JSON = europe_json_data['objects']['europe']['geometries']
for country in countries_JSON:
    codes_json.append(country['id'])

Let's *manually* correct the ones that have no correspondence:

In [100]:
# Codes in the data frame with no match in the JSON file
no_correspondence = [code for code in unemployment_df['geo\\time'].astype(str) if  (code not in codes_json)]

In [101]:
# Update Greece and UK code
unemployment_df = unemployment_df.replace('EL', 'GR')
unemployment_df = unemployment_df.replace('UK', 'GB')
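The check and the correction can also be expressed with sets and a single mapping. This is a sketch on toy code lists, not the full data; `code_fixes` is a name introduced here for illustration.

```python
import pandas as pd

codes_df = pd.Series(['AT', 'EL', 'UK', 'CH'])    # toy sample of data-frame codes
codes_json = ['AT', 'GR', 'GB', 'CH', 'NO']       # toy sample of JSON ids

# Codes present in the data frame but absent from the JSON file
missing = sorted(set(codes_df) - set(codes_json))
print(missing)   # ['EL', 'UK']

# Apply both corrections in one call, then check again
code_fixes = {'EL': 'GR', 'UK': 'GB'}
fixed = codes_df.replace(code_fixes)
still_missing = sorted(set(fixed) - set(codes_json))
print(still_missing)   # []
```

Re-running the set difference after the replacement confirms that no unmatched code is left.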

We still need to make sure we don't draw the countries whose data is missing in the dataframe, to prevent visual confusion. Therefore, we just use JSON data for the countries we have in our dataframe:

In [102]:
correspondence_JSON = [code_json in unemployment_df['geo\\time'].tolist() for code_json in codes_json]
europe_json_data['objects']['europe']['geometries'] = np.array(
    europe_json_data['objects']['europe']['geometries'])[correspondence_JSON].tolist()
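The same filtering can be written with a plain list comprehension, which avoids converting the list of geometry dictionaries to a NumPy array and back. A sketch with toy stand-ins for the geometries (the ids are illustrative):

```python
# Toy stand-ins for the TopoJSON geometries and for the codes that have data
geometries = [{'id': 'AT'}, {'id': 'CH'}, {'id': 'RU'}, {'id': 'GR'}]
codes_with_data = {'AT', 'CH', 'GR'}

# Keep only the geometries whose id appears in the data
kept = [geom for geom in geometries if geom['id'] in codes_with_data]

print([geom['id'] for geom in kept])   # ['AT', 'CH', 'GR']
```

Using a set for `codes_with_data` makes each membership test constant-time, although for a few dozen countries the difference is negligible.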

Now that every code in the dataframe has a match in the JSON file, we can draw the map. For this, we create the following function:

In [107]:
def draw_Europe_map(df, date):
    '''Draw a choropleth of the unemployment rate in column `date` (a year, as a string).'''
    europe_map = folium.Map(location=[46.519962, 6.633597], zoom_start=3)
    europe_map.choropleth(
        geo_data=europe_json_data, topojson='objects.europe',
        data=df, columns=['geo\\time', date],
        key_on='feature.id',
        fill_color='YlOrRd', fill_opacity=0.7, line_opacity=0.2, line_weight=2,
        legend_name='Unemployment rate in Europe',
    )  # threshold_scale=perc (custom class breaks) is left disabled

    return europe_map
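The disabled `threshold_scale=perc` argument hints at custom class breaks. One possible choice, sketched below, is quantile breaks, so that each colour class holds roughly the same number of countries. This is an assumption about what `perc` was meant to contain, and the rates used are the `Y15-24` values for 2016 from the preview above, not the filtered `Y15-74` data.

```python
import numpy as np

# 2016 rates from the preview (AT, BE, BG, CH, CY) plus one missing value
rates = np.array([11.2, 20.1, 17.2, 8.6, 29.1, np.nan])

# Six break points delimit five classes; nanpercentile ignores the NaN
breaks = np.nanpercentile(rates, [0, 20, 40, 60, 80, 100])
print(breaks)
```

Quantile breaks keep a few extreme countries from pushing everyone else into the lowest colour class, at the cost of classes with unequal widths.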

We begin by drawing a map representing the unemployment rate for 2015:

In [110]:
europe_map1 = draw_Europe_map(unemployment_df, '2015')
europe_map1 

The map above shows how the unemployment rate is distributed across Europe. Greece has a high unemployment rate compared with its neighbors, and Spain has the second-highest rate.

We can also see that Switzerland is among the countries with the lowest unemployment rate.

In order to evaluate how this rate has evolved over time, we plot a similar map for 2016:

In [111]:
europe_map2 = draw_Europe_map(unemployment_df, '2016')
europe_map2

**Comparing the two years**


Comparing both maps above, we infer that Switzerland kept its unemployment rate low. However, the same did not happen in Sweden, which saw its unemployment rate rise compared with the previous year.
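Differences between two maps are hard to judge by eye, so a year-over-year change column can back up the comparison. A minimal sketch, using the `Y15-24` rows from the preview at the top of this task (the maps themselves use `Y15-74`, so the numbers here only illustrate the computation):

```python
import pandas as pd

# 2015 and 2016 values copied from the Y15-24 preview rows
toy_df = pd.DataFrame({
    'geo\\time': ['AT', 'BE', 'BG', 'CH', 'CY'],
    '2015': [10.6, 22.1, 21.6, 8.8, 32.8],
    '2016': [11.2, 20.1, 17.2, 8.6, 29.1],
})

# Positive values mean the rate rose from 2015 to 2016
toy_df['change'] = (toy_df['2016'] - toy_df['2015']).round(1)
print(toy_df.sort_values('change', ascending=False))
```

The `change` column could itself be mapped with a diverging colour scale to show at a glance where the rate rose or fell.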

## Task 2. Swiss Unemployment Rates Analysis - *amstat* dataset <a class="anchor" id="2"></a>

### Assignment Instructions

Go to the amstat website to find a dataset that includes the unemployment rates in Switzerland at a recent date.

The Swiss Confederation defines the rates you have just plotted as the number of people looking for a job divided by the size of the active population (scaled by 100). This is surely a valid choice, but as we discussed one could argue for a different categorization.

Copy the map you have just created, but this time don't count in your statistics people who already have a job and are looking for a new one. How do your observations change ? You can repeat this with different choices of categories to see how selecting different metrics can lead to different interpretations of the same data.

### 2.1. Pick a DataSet to analyse <a class="anchor" id="21"></a>



### 2.2.  <a class="anchor" id="23"></a>


### 2.3.  <a class="anchor" id="24"></a>


### 2.4. <a class="anchor" id="25"></a>


## Task 3. Swiss Unemployment Rates Analysis (including Swiss and foreign workers distinction) - amstat dataset <a class="anchor" id="3"></a>

### Assignment Instructions

Use the amstat website again to find a dataset that includes the unemployment rates in Switzerland at a recent date, this time making a distinction between Swiss and foreign workers.

The State Secretariat for Economic Affairs (SECO) releases a monthly report on the state of the employment market. In the latest report (September 2017), it is noted that there is a discrepancy between the unemployment rates for foreign (5.1%) and Swiss (2.2%) workers.

Show the difference in unemployment rates between the two categories in each canton on a Choropleth map (hint: the easy way is to show two separate maps, but can you think of something better?). Where are the differences most visible? Why do you think that is?

Now let's refine the analysis by adding the differences between age groups. As you may have guessed it is nearly impossible to plot so many variables on a map. Make a bar plot, which is a better suited visualization tool for this type of multivariate data.

### 3.1. Pick a DataSet <a class="anchor" id="31"></a>
