# Bonus Problem

This problem consists of two parts: first, determining which language area of Switzerland each university belongs to; second, visualizing the aggregated funding for each area.

## Data Extraction
At a high level, we extract the data as follows. If a university is in a canton that belongs to a single language area (e.g. VD is entirely French-speaking), we use the predefined area for that canton.
However, if the canton is ambiguous (e.g. FR is both French- and German-speaking), we pass the name of the university to the Yandex translation API to detect the language of the name, and use that language as the area. This approach works well except for two universities, which we fix manually.
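The decision rule above can be sketched as a single function. This is a minimal sketch, not the notebook's code: `CANTON_LANG` is a hypothetical excerpt of the canton-to-language mapping, `AMBIGUOUS` and the two manual corrections in `OVERRIDES` are taken from the cells below, and `detect` is a hypothetical callback standing in for the Yandex call.

```python
from typing import Callable, Dict

# Hypothetical excerpt of the single-language mapping (the notebook reads a CSV).
CANTON_LANG: Dict[str, str] = {"ZH": "de", "BS": "de", "VD": "fr", "GE": "fr", "TI": "it"}

# Cantons whose language cannot be inferred from the canton alone.
AMBIGUOUS = {"BE", "FR", "GR", "VS"}

# Manual corrections for names the detector gets wrong, keyed by university name.
OVERRIDES: Dict[str, str] = {
    "Physikal.-Meteorolog. Observatorium Davos - PMOD": "de",
    "AO Research Institute - AORI": "de",
}


def area_language(university: str, canton: str, detect: Callable[[str], str]) -> str:
    """Return the language code of the area in which a university is located."""
    if canton in AMBIGUOUS:
        # Multilingual canton: use a manual fix if there is one, else guess from the name.
        if university in OVERRIDES:
            return OVERRIDES[university]
        return detect(university)
    # Single-language canton: the fixed mapping decides (KeyError if unmapped).
    return CANTON_LANG[canton]
```

For example, `area_language("ETH Zürich - ETHZ", "ZH", detect)` never calls `detect`, because ZH is not ambiguous.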

First, we import the necessary libraries.

```python
import json
import os
import pandas as pd
import requests
```

We read the CSV file, created in part 1 of the homework, that associates each university with its canton:

```python
df_uni_cantons = pd.read_csv("data/uni-cantons.csv")[["University", "Canton"]]
df_uni_cantons.head()
```

|   | University | Canton |
|---|---|---|
| 0 | Zürcher Fachhochschule (ohne PH) - ZFH | ZH |
| 1 | Forschungsanstalten Agroscope - AGS | ZH |
| 2 | Physikal.-Meteorolog. Observatorium Davos - PMOD | GR |
| 3 | Swiss Institute of Bioinformatics - SIB | VD |
| 4 | Weitere Institute - FINST | BS |


The following function uses the Yandex translation API to detect the language of a given text; in our case, the text is the name of a university:

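```python
yandexkey = "REMOVED FOR PRIVACY REASONS!"

def getLang(uni_name):
    """Return the language code that Yandex detects for a university name."""
    ylink = "https://translate.yandex.net/api/v1.5/tr.json/detect"
    # Passing the query through `params` lets requests URL-encode umlauts, spaces, etc.
    params = {"key": yandexkey, "text": uni_name, "hint": "fr,de,it,ro"}
    r = requests.get(ylink, params=params)
    r.raise_for_status()  # fail loudly if the API returns an HTTP error
    return r.json()["lang"]
```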
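University names contain spaces, umlauts and accents, so the `text` value has to be percent-encoded before it is placed in the query string. A minimal standard-library sketch of building such a URL; the endpoint and parameter names are copied from `getLang`, while `build_detect_url` and the key `MY_KEY` are hypothetical. `requests` performs this encoding itself when the values are passed through its `params` argument.

```python
from urllib.parse import urlencode

DETECT_ENDPOINT = "https://translate.yandex.net/api/v1.5/tr.json/detect"


def build_detect_url(key: str, text: str, hint: str = "fr,de,it,ro") -> str:
    """Build a detect URL with every query value percent-encoded (hypothetical helper)."""
    query = urlencode({"key": key, "text": text, "hint": hint})
    return f"{DETECT_ENDPOINT}?{query}"


url = build_detect_url("MY_KEY", "Universität Bern - BE")
```

Here `urlencode` turns `ä` into `%C3%A4` (its UTF-8 bytes), spaces into `+` and commas into `%2C`, so the `text` parameter becomes `Universit%C3%A4t+Bern+-+BE`.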

Next, we add a column to the data frame that holds the detected language of each university name:

```python
df_uni_cantons_lang = df_uni_cantons.copy()
df_uni_cantons_lang["Lang"] = [getLang(a) for a in df_uni_cantons["University"]]
df_uni_cantons_lang.head()
```

|   | University | Canton | Lang |
|---|---|---|---|
| 0 | Zürcher Fachhochschule (ohne PH) - ZFH | ZH | de |
| 1 | Forschungsanstalten Agroscope - AGS | ZH | de |
| 2 | Physikal.-Meteorolog. Observatorium Davos - PMOD | GR | fr |
| 3 | Swiss Institute of Bioinformatics - SIB | VD | en |
| 4 | Weitere Institute - FINST | BS | fr |
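The list comprehension above sends one HTTP request per row. If the same name could appear more than once, a sketch that queries each distinct name only once and maps the answers back to all rows might look as follows; `fake_detect` is a hypothetical offline stand-in for `getLang`, and the three-row frame is made up.

```python
import pandas as pd

calls = []  # records every name sent to the detector


def fake_detect(name: str) -> str:
    """Hypothetical stand-in for getLang: 'fr' for French-looking names, else 'de'."""
    calls.append(name)
    return "fr" if "Université" in name or "Haute école" in name else "de"


df = pd.DataFrame({"University": ["Universität Bern - BE",
                                  "Université de Fribourg - FR",
                                  "Universität Bern - BE"]})

# Query each distinct name once, then broadcast the answers to every row.
lang_by_name = {name: fake_detect(name) for name in df["University"].unique()}
df["Lang"] = df["University"].map(lang_by_name)
```

With the duplicate Bern row, the detector is called twice instead of three times.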


As the output shows, the detection is not very reliable: the Davos observatory (PMOD) is labeled French and the Swiss Institute of Bioinformatics English. However, we do not need the detected language for universities located in cantons with a single spoken language.

We therefore keep only the universities located in the ambiguous cantons:

```python
amb_canton = pd.DataFrame(["BE", "FR", "GR", "VS"], columns=["Canton"])
df_uni_cantons_amb = pd.merge(amb_canton, df_uni_cantons_lang)
df_uni_cantons_amb
```

|   | Canton | University | Lang |
|---|---|---|---|
| 0 | BE | Universität Bern - BE | de |
| 1 | BE | Berner Fachhochschule - BFH | de |
| 2 | BE | Pädagogische Hochschule Bern - PHBern | de |
| 3 | BE | Eidg. Hochschulinstitut für Berufsbildung - EHB | de |
| 4 | FR | Université de Fribourg - FR | fr |
| 5 | FR | Haute école pédagogique fribourgeoise - HEPFR | fr |
| 6 | GR | Physikal.-Meteorolog. Observatorium Davos - PMOD | fr |
| 7 | GR | AO Research Institute - AORI | fr |
| 8 | GR | Allergie- und Asthmaforschung - SIAF | de |
| 9 | GR | Institut für Kulturforschung Graubünden - IKG | de |
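`pd.merge` without an `on` argument joins on the columns the two frames share (here `Canton`) and defaults to an inner join, so the merge above acts as a filter. A sketch on a made-up toy frame showing that it keeps the same universities as a boolean `isin` mask:

```python
import pandas as pd

unis = pd.DataFrame({
    "University": ["ETH Zürich - ETHZ", "Universität Bern - BE", "Université de Fribourg - FR"],
    "Canton": ["ZH", "BE", "FR"],
})
ambiguous = pd.DataFrame(["BE", "FR", "GR", "VS"], columns=["Canton"])

# Inner join on the shared "Canton" column keeps only rows in ambiguous cantons
# and returns a fresh 0..n-1 index.
via_merge = pd.merge(ambiguous, unis)

# Equivalent boolean filter; it keeps the original row order and index labels.
via_isin = unis[unis["Canton"].isin(ambiguous["Canton"])]
```

The fresh index produced by the merge is what makes the row labels 6 and 7 in the next cell refer to PMOD and AORI.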


The API gets two of these cases wrong: PMOD and AORI (rows 6 and 7) are labeled French, although Graubünden has no French-speaking area. We correct them manually:

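```python
# Rows 6 (PMOD) and 7 (AORI) were detected as French; set them to German.
# .loc avoids chained assignment (df[col][row] = ...), which triggers
# SettingWithCopyWarning and does not update the frame under copy-on-write.
df_uni_cantons_amb.loc[[6, 7], "Lang"] = "de"
df_uni_cantons_amb
```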

|   | Canton | University | Lang |
|---|---|---|---|
| 0 | BE | Universität Bern - BE | de |
| 1 | BE | Berner Fachhochschule - BFH | de |
| 2 | BE | Pädagogische Hochschule Bern - PHBern | de |
| 3 | BE | Eidg. Hochschulinstitut für Berufsbildung - EHB | de |
| 4 | FR | Université de Fribourg - FR | fr |
| 5 | FR | Haute école pédagogique fribourgeoise - HEPFR | fr |
| 6 | GR | Physikal.-Meteorolog. Observatorium Davos - PMOD | de |
| 7 | GR | AO Research Institute - AORI | de |
| 8 | GR | Allergie- und Asthmaforschung - SIAF | de |
| 9 | GR | Institut für Kulturforschung Graubünden - IKG | de |
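The cell above corrects rows by their labels 6 and 7, which silently targets the wrong rows if the input order ever changes. A sketch of an alternative that selects the rows by university name with a boolean mask; the three-row toy frame copies rows 6 to 8 of the table above, and the `corrections` dict is a hypothetical structure.

```python
import pandas as pd

df = pd.DataFrame({
    "Canton": ["GR", "GR", "GR"],
    "University": ["Physikal.-Meteorolog. Observatorium Davos - PMOD",
                   "AO Research Institute - AORI",
                   "Allergie- und Asthmaforschung - SIAF"],
    "Lang": ["fr", "fr", "de"],
})

# Manual corrections keyed by name rather than by row position.
corrections = {
    "Physikal.-Meteorolog. Observatorium Davos - PMOD": "de",
    "AO Research Institute - AORI": "de",
}

# Select the rows to fix by name, then write the corrected language via .loc.
mask = df["University"].isin(list(corrections))
df.loc[mask, "Lang"] = df.loc[mask, "University"].map(corrections)
```

Rows not listed in `corrections` (here SIAF) keep their detected language.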


As mentioned before, each of the remaining cantons lies in an area with a single spoken language, so we use a CSV file that maps these cantons to their language:

```python
canton_lang = pd.read_csv("data/canton_lang_man.csv")
canton_lang
```

|   | Canton | Lang |
|---|---|---|
| 0 | ZH | de |
| 1 | LU | de |
| 2 | UR | de |
| 3 | SZ | de |
| 4 | OW | de |
| 5 | NW | de |
| 6 | GL | de |
| 7 | ZG | de |
| 8 | SO | de |
| 9 | BS | de |


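Now we join the university/canton data frame with this mapping to add a column with the language spoken in each university's area. Since `pd.merge` performs an inner join, universities in the ambiguous cantons, which the mapping leaves out, are dropped here; they are covered by the data frame built above.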

```python
df_uni_cantons_auto = pd.merge(df_uni_cantons, canton_lang)
df_uni_cantons_auto.head()
```

|   | University | Canton | Lang |
|---|---|---|---|
| 0 | Zürcher Fachhochschule (ohne PH) - ZFH | ZH | de |
| 1 | Forschungsanstalten Agroscope - AGS | ZH | de |
| 2 | ETH Zürich - ETHZ | ZH | de |
| 3 | Universität Zürich - ZH | ZH | de |
| 4 | Eidg. Forschungsanstalt für Wald,Schnee,Land -... | ZH | de |
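Because the inner join drops rows without a match, a university whose canton is in neither the ambiguous list nor the mapping file would vanish without warning. A sketch of a check using `indicator=True`, which adds a `_merge` column recording whether each row matched; the toy frames are made up. In the real data, any `left_only` row outside BE, FR, GR and VS would point to a gap in the mapping.

```python
import pandas as pd

unis = pd.DataFrame({
    "University": ["ETH Zürich - ETHZ", "Universität Basel - BS", "Université de Genève - GE"],
    "Canton": ["ZH", "BS", "GE"],
})
canton_lang = pd.DataFrame({"Canton": ["ZH", "BS"], "Lang": ["de", "de"]})

# A left join keeps every university; "_merge" is "left_only" when no language matched.
checked = pd.merge(unis, canton_lang, on="Canton", how="left", indicator=True)
unmapped = checked.loc[checked["_merge"] == "left_only", "University"].tolist()
```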


We then concatenate the two data frames to obtain one that contains every university together with the language spoken in the area where it is located:

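```python
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
# The two frames order their columns differently; sort=True sorts the
# columns alphabetically, giving the column order shown below.
df_uni_lang = pd.concat([df_uni_cantons_amb, df_uni_cantons_auto], ignore_index=True, sort=True)
df_uni_lang.head()
```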

|   | Canton | Lang | University |
|---|---|---|---|
| 0 | BE | de | Universität Bern - BE |
| 1 | BE | de | Berner Fachhochschule - BFH |
| 2 | BE | de | Pädagogische Hochschule Bern - PHBern |
| 3 | BE | de | Eidg. Hochschulinstitut für Berufsbildung - EHB |
| 4 | FR | fr | Université de Fribourg - FR |


Finally, we write the result to a CSV file:

```python
df_uni_lang.to_csv("data/uni_lang.csv", index=False)
```
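Passing `index=False` keeps the row index out of the file; otherwise, reading the file back yields an extra, unnamed first column that pandas labels `Unnamed: 0`. A small in-memory round trip with a one-row toy frame illustrates the difference:

```python
import io

import pandas as pd

df = pd.DataFrame({"Canton": ["BE"], "Lang": ["de"], "University": ["Universität Bern - BE"]})

with_index = io.StringIO()
df.to_csv(with_index)  # writes the row index as a first column with an empty header
without_index = io.StringIO()
df.to_csv(without_index, index=False)  # writes only the data columns

cols_with = list(pd.read_csv(io.StringIO(with_index.getvalue())).columns)
cols_without = list(pd.read_csv(io.StringIO(without_index.getvalue())).columns)
```

`cols_with` starts with `Unnamed: 0`, while `cols_without` contains only `Canton`, `Lang` and `University`.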