# Interactive visualization homework overview
In this homework, we want to make an interactive visualization of the grants received from the SNSF in each canton. The data is the P3 data given on the [SNSF website](http://p3.snf.ch/), called P3_GrantExport.csv
To do so, 
* we first load the data with pandas;
* we only keep the columns of interest (University name and amount of money received for each project);
* and we only keep the rows of interest (corresponding to Swiss universities, that is any non-nan "University" entry is valid). 
* Then, we have to map the universities to their corresponding cantons using [Geonames Full Text Search API in JSON](http://www.geonames.org/export/web-services.html) together with some manual tuning. 
* We finally visualize the results thanks to folium on the map of Switzerland, using a cloropleth map.

## Import librarires and load data

In [1]:
import numpy as np
import pandas as pd
import folium
import requests

In [2]:
geo = r'ch-cantons.topojson.json' #Geolocalization of the cantons
grants_csv = r'P3_GrantExport.csv' #P3 data
grants_df = pd.read_csv(grants_csv,delimiter=';') #Read it as a csv file with delimiter ;

In [3]:
grants_df.head()

Unnamed: 0,"﻿""Project Number""",Project Title,Project Title English,Responsible Applicant,Funding Instrument,Funding Instrument Hierarchy,Institution,University,Discipline Number,Discipline Name,Discipline Name Hierarchy,Start Date,End Date,Approved Amount,Keywords
0,1,Schlussband (Bd. VI) der Jacob Burckhardt-Biog...,,Kaegi Werner,Project funding (Div. I-III),Project funding,,Nicht zuteilbar - NA,10302,Swiss history,Human and Social Sciences;Theology & religious...,01.10.1975,30.09.1976,11619.0,
1,4,Batterie de tests à l'usage des enseignants po...,,Massarenti Léonard,Project funding (Div. I-III),Project funding,Faculté de Psychologie et des Sciences de l'Ed...,Université de Genève - GE,10104,Educational science and Pedagogy,"Human and Social Sciences;Psychology, educatio...",01.10.1975,30.09.1976,41022.0,
2,5,"Kritische Erstausgabe der ""Evidentiae contra D...",,Kommission für das Corpus philosophorum medii ...,Project funding (Div. I-III),Project funding,Kommission für das Corpus philosophorum medii ...,"NPO (Biblioth., Museen, Verwalt.) - NPO",10101,Philosophy,Human and Social Sciences;Linguistics and lite...,01.03.1976,28.02.1985,79732.0,
3,6,Katalog der datierten Handschriften in der Sch...,,Burckhardt Max,Project funding (Div. I-III),Project funding,Abt. Handschriften und Alte Drucke Bibliothek ...,Universität Basel - BS,10302,Swiss history,Human and Social Sciences;Theology & religious...,01.10.1975,30.09.1976,52627.0,
4,7,Wissenschaftliche Mitarbeit am Thesaurus Lingu...,,Schweiz. Thesauruskommission,Project funding (Div. I-III),Project funding,Schweiz. Thesauruskommission,"NPO (Biblioth., Museen, Verwalt.) - NPO",10303,Ancient history and Classical studies,Human and Social Sciences;Theology & religious...,01.01.1976,30.04.1978,120042.0,


## Choose data of interest
We are only interested in the 'University' and 'Approved Amount' fields, so that we only keep then. Moreover, we can note that some entries contain 'Nicht zuteilbar - NA' to tell us that no information has been given. We thus set them to nan values. In the documentation of the P3 dataset, some more information has been given: the nan entries in the 'University'/'Institution' fields correspond to non-Swiss university partnerships, so that the values can easily been thrown away without consequence on what we want to analyse.

Choose rows and replace 'Nicht zuteilbar - NA' and 'data not included in P3' by nan.

In [12]:
grants_uni_df = grants_df[['Institution', 'University','Approved Amount']].replace('Nicht zuteilbar - NA', np.nan)
grants_uni_df = grants_uni_df[['Institution', 'University','Approved Amount']].replace('data not included in P3', np.nan)
grants_uni_df.head()

Unnamed: 0,Institution,University,Approved Amount
0,,,11619.0
1,Faculté de Psychologie et des Sciences de l'Ed...,Université de Genève - GE,41022.0
2,Kommission für das Corpus philosophorum medii ...,"NPO (Biblioth., Museen, Verwalt.) - NPO",79732.0
3,Abt. Handschriften und Alte Drucke Bibliothek ...,Universität Basel - BS,52627.0
4,Schweiz. Thesauruskommission,"NPO (Biblioth., Museen, Verwalt.) - NPO",120042.0


Check how many null-entries there are.

In [13]:
null_uni = grants_uni_df[grants_uni_df['University'].isnull()].shape[0]
null_inst = grants_uni_df[grants_uni_df['Institution'].isnull()].shape[0]
null_amount = grants_uni_df[grants_uni_df['Approved Amount'].isnull()].shape[0]
print(null_inst)
print(null_uni)
print(null_amount)

5138
15576
10910


Drop null entries.

In [14]:
grants_uni_CH_df = grants_uni_df.dropna()
grants_uni_CH_df.head()

Unnamed: 0,Institution,University,Approved Amount
1,Faculté de Psychologie et des Sciences de l'Ed...,Université de Genève - GE,41022.0
2,Kommission für das Corpus philosophorum medii ...,"NPO (Biblioth., Museen, Verwalt.) - NPO",79732.0
3,Abt. Handschriften und Alte Drucke Bibliothek ...,Universität Basel - BS,52627.0
4,Schweiz. Thesauruskommission,"NPO (Biblioth., Museen, Verwalt.) - NPO",120042.0
5,"Séminaire de politique économique, d'économie ...",Université de Fribourg - FR,53009.0


In [15]:
grants_uni_CH_df.describe()

Unnamed: 0,Institution,University,Approved Amount
count,47048,47048,47048.0
unique,5204,76,34404.0
top,Institut des sciences et ingénierie chimiques ...,Universität Zürich - ZH,10000.0
freq,411,6520,548.0


In [18]:
grants_uni_CH_df = grants_uni_CH_df.rename(columns={'Approved Amount':'Amount'})
grants_uni_CH_df.Amount = pd.to_numeric(grants_uni_CH_df.Amount)
grants_uni_CH_df.Amount.describe()

count    4.704800e+04
mean     2.685646e+05
std      3.263471e+05
min      0.000000e+00
25%      9.303575e+04
50%      1.912360e+05
75%      3.350000e+05
max      1.548775e+07
Name: Amount, dtype: float64

## Mapping from University to Canton

In [22]:
username = 'ochanon'
url='http://api.geonames.org/postalCodeSearchJSON?'
parameters={'username':username,'placename':'CH','maxRows':1,'operator':'OR'}
r=requests.get(url,params=parameters)
df=grants_uni_CH_df

In [23]:
df_final=pd.DataFrame({'Canton':[]})
not_found_list=[]
for block in df[['Institution','University']].itertuples(index=False):
    nan1=str(block[0])
    nan2=str(block[1])
    
    # By default take only the university.
    # The first time a value is added it is checked for differences if adding also the institution.
    if nan1=='nan':
        if nan2=='nan':
            raise('Bad preprocessing - double nan')
        query_string=block[1]
    elif nan2=='nan':
        query_string=block[0]
    else:
        query_string=block[0]+", "+block[1]
    
    
    # List of checks if already present in the dictionary:
    # 1- institution + university 
    # 2- university 
    # 3- query to geonames
    try:
        canton=correspondencies_dictionary[query_string]
        df2=pd.DataFrame({'Canton':[canton]})
        df_final=df_final.append(df2)
    except:
        try:
            query_string_university=str(block[1])
            canton=correspondencies_dictionary[query_string_university]
            df2=pd.DataFrame({'Canton':[canton]})
            df_final=df_final.append(df2)
        except:
            try:
                params['placename']=query_string
                r=requests.get(url,params=params)
                df1=pd.read_json(r.text,orient='records')
                canton=df1.postalCodes[0]

                if nan2!='nan':
                    query_string_short=block[1]
                    r=requests.get(url,params=params)
                    df1=pd.read_json(r.text,orient='records')
                    canton2=df1.postalCodes[0]
                    if canton2==canton:
                        query_string=query_string_short
                    else:
                        print(canton2,canton)

                if canton['countryCode']!='CH':
    #                 if nan2!='nan':
    #                     print(2140358234)
    #                 print(query_string)
    #                 print(canton['countryCode'])
                    continue
                df2=pd.DataFrame(canton,columns=['adminCode1'],index=['adminCode1'])
                df2=df2.rename(columns={'adminCode1':'Canton'})
                df_final=df_final.append(df2)
                correspondencies_dictionary[query_string]=df2.Canton[0]
            except:
                print(query_string)
                not_found_list.append(query_string)

Faculté de Psychologie et des Sciences de l'Education Université de Genève, Université de Genève - GE
Kommission für das Corpus philosophorum medii aevi der SGG, NPO (Biblioth., Museen, Verwalt.) - NPO
Abt. Handschriften und Alte Drucke Bibliothek der Universität Basel, Universität Basel - BS
Schweiz. Thesauruskommission, NPO (Biblioth., Museen, Verwalt.) - NPO
Séminaire de politique économique, d'économie internationale et d'économie régionale, Université de Fribourg - FR
Institut für ökumenische Studien Université de Fribourg, Université de Fribourg - FR
Ostasiatisches Seminar Universität Zürich, Universität Zürich - ZH
Laboratoire de Didactique et Epistémologie des Sciences Université de Genève, Université de Genève - GE
Klinische Psychologie und Psychotherapie Institut für Psychologie Universität Bern, Université de Fribourg - FR
Schweizerische Rechtsquellen c/o Universität Zürich / RWI, NPO (Biblioth., Museen, Verwalt.) - NPO
Département de Sociologie Faculté des Sciences de la So

## Interactive visualization using Folium

In [None]:
map = folium.Map(location=[46.8, 8], zoom_start=8)
map.choropleth(geo_path=geo, data=None,
             columns=['Canton', 'Amount'],
             key_on='feature.id',
             fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Amount of grants (CHF)')

In [None]:
map