Build a Choropleth map which shows intuitively (i.e., use colors wisely) how much grant money goes to each Swiss canton.

In [147]:
import pandas as pd
import numpy as np
import json
import geopy
from geopy.geocoders import geonames

In [106]:
p3_grant_export_data = pd.read_csv("P3_GrantExport.csv", sep=";")
p3_grant_export_data

Unnamed: 0,"﻿""Project Number""",Project Title,Project Title English,Responsible Applicant,Funding Instrument,Funding Instrument Hierarchy,Institution,University,Discipline Number,Discipline Name,Discipline Name Hierarchy,Start Date,End Date,Approved Amount,Keywords
0,1,Schlussband (Bd. VI) der Jacob Burckhardt-Biog...,,Kaegi Werner,Project funding (Div. I-III),Project funding,,Nicht zuteilbar - NA,10302,Swiss history,Human and Social Sciences;Theology & religious...,01.10.1975,30.09.1976,11619.00,
1,4,Batterie de tests à l'usage des enseignants po...,,Massarenti Léonard,Project funding (Div. I-III),Project funding,Faculté de Psychologie et des Sciences de l'Ed...,Université de Genève - GE,10104,Educational science and Pedagogy,"Human and Social Sciences;Psychology, educatio...",01.10.1975,30.09.1976,41022.00,
2,5,"Kritische Erstausgabe der ""Evidentiae contra D...",,Kommission für das Corpus philosophorum medii ...,Project funding (Div. I-III),Project funding,Kommission für das Corpus philosophorum medii ...,"NPO (Biblioth., Museen, Verwalt.) - NPO",10101,Philosophy,Human and Social Sciences;Linguistics and lite...,01.03.1976,28.02.1985,79732.00,
3,6,Katalog der datierten Handschriften in der Sch...,,Burckhardt Max,Project funding (Div. I-III),Project funding,Abt. Handschriften und Alte Drucke Bibliothek ...,Universität Basel - BS,10302,Swiss history,Human and Social Sciences;Theology & religious...,01.10.1975,30.09.1976,52627.00,
4,7,Wissenschaftliche Mitarbeit am Thesaurus Lingu...,,Schweiz. Thesauruskommission,Project funding (Div. I-III),Project funding,Schweiz. Thesauruskommission,"NPO (Biblioth., Museen, Verwalt.) - NPO",10303,Ancient history and Classical studies,Human and Social Sciences;Theology & religious...,01.01.1976,30.04.1978,120042.00,
5,8,Die schweizerische Wirtschaftspolitik seit dem...,,Kleinewefers Henner,Project funding (Div. I-III),Project funding,"Séminaire de politique économique, d'économie ...",Université de Fribourg - FR,10203,Economics,"Human and Social Sciences;Economics, law",01.01.1976,31.12.1978,53009.00,
6,9,Theologische Forschungen zur Oekumene (Studien...,,Stirnimann Heinrich,Project funding (Div. I-III),Project funding,Institut für ökumenische Studien Université de...,Université de Fribourg - FR,10102,"Religious sciences, Theology",Human and Social Sciences;Theology & religious...,01.01.1976,31.12.1976,25403.00,
7,10,Konfuzianische Kulturwerte in der sozialen Ent...,,Deuchler Martina,Project funding (Div. I-III),Project funding,Ostasiatisches Seminar Universität Zürich,Universität Zürich - ZH,10301,History in general,Human and Social Sciences;Theology & religious...,01.10.1975,31.03.1977,47100.00,
8,11,Edizione degli scritti di Aurelio de' Giorgi B...,,Stäuble Antonio,Project funding (Div. I-III),Project funding,,Université de Lausanne - LA,10502,Romance languages and literature,Human and Social Sciences;Linguistics and lite...,01.10.1975,31.03.1977,25814.00,
9,13,La construction de nouveautés au sein des morp...,,Piaget Jean,Project funding (Div. I-III),Project funding,Laboratoire de Didactique et Epistémologie des...,Université de Genève - GE,10105,Psychology,"Human and Social Sciences;Psychology, educatio...",01.10.1975,30.09.1978,360000.00,


In [107]:
p3_grant_export_data.size

959535

In [108]:
# Keep only the rows that state how much money was granted (the amount starts with a digit)
p3_grant_export_data = p3_grant_export_data[p3_grant_export_data['Approved Amount'].apply(lambda x: str(x)[:1].isdigit())]

In [109]:
# .size counts cells (rows x columns), not rows: 163,650 fewer cells over 15 columns means 10,910 rows were removed
p3_grant_export_data.size

795885
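
Since `DataFrame.size` counts cells rather than rows, `len()` or `.shape` gives a more direct row count. A minimal sketch on a toy frame (the values are made up for illustration) shows the difference:

```python
import pandas as pd

# Toy frame, made up for illustration: 4 rows x 3 columns.
toy = pd.DataFrame({
    "University": ["A", "B", "C", "D"],
    "Approved Amount": ["100.00", "not available", "250.50", "75.00"],
    "Keywords": ["x", "y", "z", "w"],
})

# Same kind of filter as in the cell above: keep amounts starting with a digit.
kept = toy[toy["Approved Amount"].apply(lambda x: x[0].isdigit())]

cells_removed = toy.size - kept.size   # difference in number of cells
rows_removed = len(toy) - len(kept)    # difference in number of rows
```

Here one row disappears, but `.size` drops by three because the frame has three columns.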

In [110]:
# Drop the columns we don't need: the project number (first column) and the descriptive fields
p3_grant_export_data = p3_grant_export_data.drop(p3_grant_export_data.columns[[0]], axis = 1)
p3_grant_export_data = p3_grant_export_data.drop(['Project Title', 'Project Title English', 'Responsible Applicant', 'Discipline Number', 'Discipline Name', 'Discipline Name Hierarchy', 'Keywords'], axis=1)
p3_grant_export_data.size

371413

First, we will locate projects by university name.
We will ignore every project for which no university is given: we assume that such a project is probably outside Switzerland.
Given more time, a better solution would also take the institution's location into account.

In [111]:
# Removing rows in which University is not mentioned
p3_grant_export_data = p3_grant_export_data.dropna(subset=['University'])
p3_grant_export_data.size

356146

In [112]:
p3_grant_export_data

Unnamed: 0,Funding Instrument,Funding Instrument Hierarchy,Institution,University,Start Date,End Date,Approved Amount
0,Project funding (Div. I-III),Project funding,,Nicht zuteilbar - NA,01.10.1975,30.09.1976,11619.00
1,Project funding (Div. I-III),Project funding,Faculté de Psychologie et des Sciences de l'Ed...,Université de Genève - GE,01.10.1975,30.09.1976,41022.00
2,Project funding (Div. I-III),Project funding,Kommission für das Corpus philosophorum medii ...,"NPO (Biblioth., Museen, Verwalt.) - NPO",01.03.1976,28.02.1985,79732.00
3,Project funding (Div. I-III),Project funding,Abt. Handschriften und Alte Drucke Bibliothek ...,Universität Basel - BS,01.10.1975,30.09.1976,52627.00
4,Project funding (Div. I-III),Project funding,Schweiz. Thesauruskommission,"NPO (Biblioth., Museen, Verwalt.) - NPO",01.01.1976,30.04.1978,120042.00
5,Project funding (Div. I-III),Project funding,"Séminaire de politique économique, d'économie ...",Université de Fribourg - FR,01.01.1976,31.12.1978,53009.00
6,Project funding (Div. I-III),Project funding,Institut für ökumenische Studien Université de...,Université de Fribourg - FR,01.01.1976,31.12.1976,25403.00
7,Project funding (Div. I-III),Project funding,Ostasiatisches Seminar Universität Zürich,Universität Zürich - ZH,01.10.1975,31.03.1977,47100.00
8,Project funding (Div. I-III),Project funding,,Université de Lausanne - LA,01.10.1975,31.03.1977,25814.00
9,Project funding (Div. I-III),Project funding,Laboratoire de Didactique et Epistémologie des...,Université de Genève - GE,01.10.1975,30.09.1978,360000.00


In [104]:
# We also delete every row whose University is "Nicht zuteilbar - NA" ("not assignable"), i.e. no university is given.
p3_grant_export_data = p3_grant_export_data[~p3_grant_export_data.University.str.contains('Nicht zuteilbar - NA', regex=False)]
p3_grant_export_data.head()

Unnamed: 0,Funding Instrument,Funding Instrument Hierarchy,Institution,University,Start Date,End Date,Approved Amount
1,Project funding (Div. I-III),Project funding,Faculté de Psychologie et des Sciences de l'Ed...,Université de Genève - GE,01.10.1975,30.09.1976,41022.0
2,Project funding (Div. I-III),Project funding,Kommission für das Corpus philosophorum medii ...,"NPO (Biblioth., Museen, Verwalt.) - NPO",01.03.1976,28.02.1985,79732.0
3,Project funding (Div. I-III),Project funding,Abt. Handschriften und Alte Drucke Bibliothek ...,Universität Basel - BS,01.10.1975,30.09.1976,52627.0
4,Project funding (Div. I-III),Project funding,Schweiz. Thesauruskommission,"NPO (Biblioth., Museen, Verwalt.) - NPO",01.01.1976,30.04.1978,120042.0
5,Project funding (Div. I-III),Project funding,"Séminaire de politique économique, d'économie ...",Université de Fribourg - FR,01.01.1976,31.12.1978,53009.0


In [119]:
# The 'Approved Amount' column holds strings instead of numbers
# (.iloc[0] takes the first remaining row, whatever its index label is)
type(p3_grant_export_data['Approved Amount'].iloc[0])

str

In [127]:
# Let's convert this column to floats so that we can do arithmetic on it
p3_grant_export_data['Approved Amount'] = p3_grant_export_data['Approved Amount'].astype(float)

In [128]:
# Now we definitely have numbers in the 'Approved Amount' column!
type(p3_grant_export_data['Approved Amount'].iloc[0])

numpy.float64
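
An alternative sketch, not what the notebook does: `pd.to_numeric` with `errors="coerce"` turns non-numeric entries into NaN, so the filtering and the conversion can happen in one pass. The sample values below are made up for illustration.

```python
import pandas as pd

# Made-up sample mixing numeric strings and a non-numeric placeholder.
amounts = pd.Series(["11619.00", "not available", "41022.00"],
                    name="Approved Amount")

# Non-numeric strings become NaN instead of raising an error.
numeric = pd.to_numeric(amounts, errors="coerce")

# Drop the entries that could not be parsed; the result is float64.
clean = numeric.dropna()
total = clean.sum()
```

This avoids inspecting the first character of each string, at the cost of silently discarding anything that does not parse as a number.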

Now, time to locate the universities...
For this, we are going to use geopy, a Python client for several popular geocoding web services.

In [137]:
# Read the GeoNames credentials from a local JSON file
with open('geonames_login.json') as f:
    login = json.load(f)
geonames_login = login['login']
geonames_password = login['password']

We want to locate every university, then add the corresponding canton as a new column of the dataframe we were working with before.

In [171]:
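# Pass the username by keyword: the positional order of the arguments differs between geopy versions
geolocator = geopy.geocoders.GeoNames(username=geonames_login)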
test = geolocator.geocode("University of Geneva")
test

Location(University of Geneva, GE, CH, (46.19954, 6.14239, 0.0))
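
The canton code appears as the second component of the address above. As a sketch, assuming geopy keeps the raw GeoNames record in `Location.raw` with an `adminCode1` field (worth checking against your geopy version), a small hypothetical helper `canton_of` could extract it and fall back to parsing the address string:

```python
from types import SimpleNamespace


def canton_of(location):
    """Return the canton code (e.g. 'GE') of a geocoding result, or None.

    `canton_of` is a hypothetical helper, not part of geopy.
    """
    if location is None:          # geocode() returns None when nothing is found
        return None
    raw = getattr(location, "raw", None) or {}
    code = raw.get("adminCode1")  # assumed GeoNames field holding the canton code
    if code:
        return code
    # Fallback: parse "name, canton, country" from the address string.
    parts = [p.strip() for p in location.address.split(",")]
    return parts[-2] if len(parts) >= 3 else None


# Stand-in object mimicking the result shown above (no network call needed).
fake = SimpleNamespace(address="University of Geneva, GE, CH",
                       raw={"adminCode1": "GE", "countryCode": "CH"})
canton = canton_of(fake)
```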

We'll build a list of the cantons corresponding to the universities, then add it to our dataframe as a new column, so that each row is linked to a canton.
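
Once a university-to-canton lookup exists, attaching it and totalling the grants per canton could look like the following sketch. The lookup is a made-up placeholder, the rows are a tiny illustrative sample, and the `Canton` column name is an assumption.

```python
import pandas as pd

# Tiny illustrative sample standing in for the cleaned grant data.
grants = pd.DataFrame({
    "University": ["Université de Genève - GE", "Universität Zürich - ZH",
                   "Université de Genève - GE"],
    "Approved Amount": [41022.0, 47100.0, 360000.0],
})

# Hypothetical lookup that would be built from the geocoding results.
university_to_canton = {
    "Université de Genève - GE": "GE",
    "Universität Zürich - ZH": "ZH",
}

# Series.map looks each university up in the dict (NaN when it is missing).
grants["Canton"] = grants["University"].map(university_to_canton)

# Total granted per canton: the quantity a choropleth map would colour by.
amount_by_canton = grants.groupby("Canton")["Approved Amount"].sum()
```

Mapping through a dictionary keeps the number of geocoding requests tied to the number of distinct universities rather than to the number of rows.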

In [182]:
# Let's count the number of distinct universities we have
p3_grant_export_data['University'].nunique()

77

In [None]:
# So we will have to make about 77 requests to GeoNames, which isn't that many!
# We'll create a dataframe that links each university to a canton.

In [206]:
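# For now this Series holds the number of distinct institutions per university;
# its index lists the universities we need to geocode.
university_canton_df = p3_grant_export_data.groupby('University').Institution.nunique()
university_canton_df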

University
AO Research Institute - AORI                            2
Allergie- und Asthmaforschung - SIAF                    2
Berner Fachhochschule - BFH                            42
Biotechnologie Institut Thurgau - BITG                  1
Centre de rech. sur l'environnement alpin - CREALP      1
EPF Lausanne - EPFL                                   523
ETH Zürich - ETHZ                                     552
Eidg. Anstalt für Wasserversorgung - EAWAG             31
Eidg. Forschungsanstalt für Wald,Schnee,Land - WSL     17
Eidg. Hochschulinstitut für Berufsbildung - EHB         5
Eidg. Material und Prüfungsanstalt - EMPA              29
Ente Ospedaliero Cantonale - EOC                        9
Fachhochschule Kalaidos - FHKD                          3
Fachhochschule Nordwestschweiz (ohne PH) - FHNW        65
Fachhochschule Ostschweiz - FHO                        12
Facoltà di Teologia di Lugano - FTL                     2
Fernfachhochschule Schweiz (Mitglied SUPSI) - FFHS      1
Fir

In [177]:
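# Caution: this loops over every row rather than over the distinct universities,
# so it sends one request per project instead of about 77.
cantons_table = []
for i in p3_grant_export_data['University']:
    canton = geolocator.geocode(i)
    cantons_table.append(canton)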

GeocoderTimedOut: Service timed out
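
One way around the timeout above, sketched under assumptions: geocode each distinct university once and retry after a pause when a request fails. The helper `geocode_unique` and its parameters are hypothetical. It accepts any `geocode` callable, such as `geolocator.geocode`, and with geopy one would likely pass `retry_on=(geopy.exc.GeocoderTimedOut,)`.

```python
import time


def geocode_unique(names, geocode, retry_on=(Exception,), retries=3, pause=1.0):
    """Geocode each distinct name once, retrying on the given exceptions.

    Hypothetical helper: returns a dict name -> result (None if all tries fail).
    """
    results = {}
    for name in dict.fromkeys(names):      # de-duplicate, keep first-seen order
        results[name] = None
        for _ in range(retries):
            try:
                results[name] = geocode(name)
                break
            except retry_on:
                time.sleep(pause)          # wait before the next attempt
    return results


# Offline demonstration with a fake geocoder that fails once per name.
calls = []

def flaky_geocode(name):
    calls.append(name)
    if calls.count(name) == 1:
        raise TimeoutError("simulated timeout")
    return name.upper()

found = geocode_unique(["a", "b", "a"], flaky_geocode, pause=0.0)
```

Passing the geocoder in as a callable keeps the helper testable without network access. Raising the geocoder's own timeout setting may also help.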

In [None]:
cantons_table