# Homework 3
In this homework, we analyze the grants awarded by the Swiss National Science Foundation (SNSF) to Swiss universities.<br>
We map each university to its Swiss canton, disregarding institutions and entities outside Switzerland. Then, we calculate the total grant amount awarded to each canton by the SNSF and present the result on a choropleth map.

## Data Acquisition and Wrangling
We start by importing the required modules.

In [None]:
import pandas as pd
import requests
import folium
import geocoder
import json
import googlemaps
import geopandas

We downloaded the SNSF data once from the website and stored it locally in pickle format. We now read it back from the pickle files.

In [None]:
pubs = pd.read_pickle("data/P3_PublicationExport")
ppl = pd.read_pickle("data/P3_PersonExport")
grants = pd.read_pickle("data/P3_GrantExport")

We first look at the grants DataFrame.

In [None]:
grants.head()

We realise that we only need two columns from this DataFrame, namely University and Approved Amount.

In [None]:
grant_useful = grants[["University", "Approved Amount"]]
grant_useful = grant_useful.rename(columns= {'Approved Amount': 'Amount'})
grant_useful.head()

Some of the values for University and Amount are not known, so we remove these from consideration.

In [None]:
unknown_uni = grant_useful.University.isnull()
unknown_amount = grant_useful.Amount.isin(['data not included in P3'])

In [None]:
def goodDataIndex(uni, amt):
    ans = []
    for i in range(len(uni)):
        ans.append(not(uni[i] or amt[i]))
    return ans

In [None]:
pick = goodDataIndex(unknown_uni, unknown_amount)

In [None]:
grant_final = grant_useful[pick]
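
As a hedged alternative, the same filter can be written without an explicit loop by combining the two boolean Series directly. The sketch below uses a small made-up frame (`toy`) in place of `grant_useful`; only the column names and the 'data not included in P3' placeholder come from the notebook.

```python
import pandas as pd

# Toy stand-in for grant_useful (values are illustrative, not SNSF data).
toy = pd.DataFrame({
    "University": ["Uni A", None, "Uni B", "Uni C"],
    "Amount": ["1000.00", "500.00", "data not included in P3", "250.50"],
})

unknown_uni = toy["University"].isnull()
unknown_amount = toy["Amount"].eq("data not included in P3")

# Keep rows where neither the university nor the amount is unknown.
toy_clean = toy[~(unknown_uni | unknown_amount)]
print(toy_clean)
```

Because the mask is aligned on the index rather than on position, this form should also work when the DataFrame does not have a default integer index.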

From the clean DataFrame of grants, we get the universities present in the data.

In [None]:
all_unis = grant_final.University.dropna().unique()

Then, we read a csv file containing the names of Swiss cantons, their short names, and their language.

In [None]:
cantons = pd.read_csv("data/cantons.csv")
cantons.drop(['Canton'], axis=1, inplace=True)
cantons.index.name = 'Canton'
cantons

In [None]:
cantons_short = cantons['ABBR']
cantons_short

## University to Canton Mapping

We also tried two other methods to generate the mappings, but they performed worse than the Google Maps API method shown here. The other methods are shown in the appendix.

### Main Method: Google Maps API

We use the Google Maps API to get the mapping. This method returned the best results, finding mappings for 17 universities out of 79.<br>
Note that the API key has been removed from the code below.

In [None]:
import googlemaps

# API key removed; supply your own key to run this cell
gmaps = googlemaps.Client(key='')

geocode_result = gmaps.geocode('University of Neuchatel', region='ch')
geocode_result

In [None]:
def get_canton_from_geodata(geodata):
    addr = geodata[0]['address_components']
    for i in range(len(addr)):
        if 'administrative_area_level_1' in addr[i]['types']:
            return addr[i]['short_name']
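
To illustrate what this helper extracts, here is a minimal sketch that runs it on a hand-written, heavily trimmed result. The `sample_result` structure is hypothetical and only mimics the `address_components` layout of a geocoding response; it is not actual API output. The function is restated with an explicit `return None` for the case where no first-level administrative area is present.

```python
def get_canton_from_geodata(geodata):
    """Return the short name of the first-level administrative area, or None."""
    for component in geodata[0]["address_components"]:
        if "administrative_area_level_1" in component["types"]:
            return component["short_name"]
    return None


# Hypothetical, trimmed response for illustration only.
sample_result = [{
    "address_components": [
        {"long_name": "Neuchâtel", "short_name": "Neuchâtel",
         "types": ["locality", "political"]},
        {"long_name": "Neuchâtel", "short_name": "NE",
         "types": ["administrative_area_level_1", "political"]},
        {"long_name": "Switzerland", "short_name": "CH",
         "types": ["country", "political"]},
    ]
}]

canton = get_canton_from_geodata(sample_result)
print(canton)
```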

In [None]:
gmaps = googlemaps.Client(key='')

# Look up every university; record 'NotFound' when the API returns no result
uni_dict3 = {}
for uni in all_unis:
    geocode_result = gmaps.geocode(uni, region='ch')
    if len(geocode_result) > 0:
        uni_dict3[uni] = get_canton_from_geodata(geocode_result)
    else:
        uni_dict3[uni] = 'NotFound'

In [None]:
uni_canton_df3 = pd.DataFrame.from_dict(uni_dict3,orient="index")
uni_canton_df3.columns.names = ['University']
uni_canton_df3 = uni_canton_df3.rename(columns={0:'Canton'})
uni_canton_df3.tail()

In [None]:
uni_canton_df3.Canton.value_counts()

In [None]:
uni_canton_df3.to_csv('data/uni_to_cantons3.csv',index_label='University')

### Fixing the missing values manually

Finally, we manually fill in the missing values, store the result in a CSV file, and read it here.<br>
In the end, we have mappings for 75 universities out of the original 79.

In [None]:
uni_canton_final = pd.read_csv('data/uni_to_cantons4.csv')
uni_canton_final = uni_canton_final.set_index('University')
uni_canton_final.Canton.value_counts()
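
The notebook does not show how the manual corrections were applied. One possible way, sketched here under that assumption, is to keep the corrections in a dictionary and overlay them on the automatic mapping. The names `auto`, `manual`, and the university labels are hypothetical.

```python
import pandas as pd

# Illustrative automatic mapping; 'NotFound' marks failed lookups as in the notebook.
auto = pd.Series({"Uni A": "ZH", "Uni B": "NotFound", "Uni C": "NotFound"},
                 name="Canton")
auto.index.name = "University"

# Hypothetical manual corrections, keyed by university name.
manual = {"Uni B": "VD"}

# Overlay the corrections on a copy so the automatic result stays untouched.
merged = auto.copy()
for uni, canton in manual.items():
    merged[uni] = canton

# Universities that remain unmapped after the manual pass.
still_missing = merged[merged == "NotFound"].index.tolist()
print(merged)
print("Still unmapped:", still_missing)
```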

## Amounts per canton

Here, we sum the grant amounts of all universities in each canton.<br>
We store the result in a DataFrame and also export it as a CSV file.

In [None]:
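def integrate_cantons(grants):
    # reindex yields NaN for universities without a canton mapping,
    # whereas .loc raises a KeyError on missing labels in recent pandas
    return uni_canton_final.reindex(grants.University)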

In [None]:
all_cantons = integrate_cantons(grant_final)

In [None]:
grant_final = grant_final.set_index('University')

In [None]:
grant_by_cantons = pd.concat([grant_final, all_cantons],axis=1)
grant_by_cantons = grant_by_cantons.reset_index()
grant_by_cantons = grant_by_cantons.set_index('Canton')

In [None]:
grant_by_cantons.head()

In [None]:
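def amounts_by_canton(grant_by_cantons):
    canton_amounts = {}
    for canton in cantons_short:
        try:
            # List indexer always returns a Series, even for a single grant
            this_canton_amounts = grant_by_cantons.loc[[canton], 'Amount']
            this_canton_sum = pd.to_numeric(this_canton_amounts).sum()
            # Store the total in millions of francs
            canton_amounts[canton] = this_canton_sum/10**6
        except KeyError:
            # No grants recorded for this canton
            canton_amounts[canton] = 0
    return canton_amounts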

In [None]:
cantons_amounts = amounts_by_canton(grant_by_cantons)
amounts_by_canton_df = pd.DataFrame.from_dict(cantons_amounts,orient='index')
amounts_by_canton_df.columns.name = 'Canton'
amounts_by_canton_df = amounts_by_canton_df.rename(columns ={0:'Amount'})

In [None]:
amounts_by_canton_df.sort_values(by='Amount')

In [None]:
amounts_by_canton_df.to_csv('data/amounts.csv',index_label='Canton')
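
As a hedged alternative to the loop above, the per-canton totals can also be computed with a merge followed by a `groupby`. This is a sketch on made-up data, not the notebook's actual pipeline: `toy_grants`, `toy_mapping`, and `toy_cantons` are hypothetical stand-ins for `grant_final`, `uni_canton_final`, and `cantons_short`.

```python
import pandas as pd

# Toy grants and mapping (illustrative values only).
toy_grants = pd.DataFrame({
    "University": ["Uni A", "Uni B", "Uni A", "Uni C"],
    "Amount": ["1000000.00", "2500000.00", "500000.00", "750000.00"],
})
toy_mapping = pd.DataFrame({
    "University": ["Uni A", "Uni B"],
    "Canton": ["ZH", "VD"],
})
toy_cantons = ["ZH", "VD", "GE"]

# Inner merge drops grants whose university has no canton mapping.
merged = toy_grants.merge(toy_mapping, on="University", how="inner")
merged["Amount"] = pd.to_numeric(merged["Amount"])

# Sum per canton, convert to millions, and give cantons without grants a total of 0.
totals = (merged.groupby("Canton")["Amount"].sum() / 10**6).reindex(
    toy_cantons, fill_value=0
)
print(totals)
```

The `reindex` step plays the role of the `except` branch in `amounts_by_canton`: cantons that never appear in the grants still get an entry.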

## Coordinates for each canton

Here, we get the coordinates of each Canton using Geocoder.<br>
We store that in a DataFrame.

In [None]:
def cantons_coordinates(all_cantons, coordinates):
    for canton in cantons.index:
        g = geocoder.google(cantons.loc[canton]['Name'],region='ch',timeout=15)
        # g.latlng may be empty or None when the lookup fails
        if g.latlng:
            coordinates[cantons.loc[canton]['ABBR']] = g.latlng
    return coordinates

In [None]:
coordinates_of_cantons = {}
while len(coordinates_of_cantons) < 26:
    coordinates_of_cantons = cantons_coordinates(all_cantons, coordinates_of_cantons)
    print(len(coordinates_of_cantons))
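
The `while` loop above keeps querying until all 26 cantons are found, so it would never terminate if one lookup failed permanently. A minimal sketch of a bounded variant follows, which retries only the failed names. The helper `collect_coordinates`, the `max_rounds` parameter, and `fake_lookup` are hypothetical; `fake_lookup` stands in for the real geocoding call so the example runs offline.

```python
def collect_coordinates(names, lookup, max_rounds=5):
    """Query `lookup` for each name, retrying only failures, for at most `max_rounds`."""
    found = {}
    for _ in range(max_rounds):
        pending = [n for n in names if n not in found]
        if not pending:
            break
        for name in pending:
            latlng = lookup(name)
            if latlng:  # treats None and [] as a failed lookup
                found[name] = latlng
    return found


# Hypothetical stand-in for a geocoding call: fails on the first attempt for "VD".
attempts = {}

def fake_lookup(name):
    attempts[name] = attempts.get(name, 0) + 1
    if name == "VD" and attempts[name] == 1:
        return None
    return [46.0, 7.0]  # placeholder coordinates, not real canton locations


coords = collect_coordinates(["ZH", "VD"], fake_lookup)
print(coords)
```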

In [None]:
coordinates_of_cantons

In [None]:
coordiantes_of_cantons_df = pd.DataFrame.from_dict(coordinates_of_cantons,orient='index')
coordiantes_of_cantons_df.columns.name = 'Canton'
coordiantes_of_cantons_df = coordiantes_of_cantons_df.rename(columns ={0:'Latitude',1:'Longitude'})

In [None]:
coordiantes_of_cantons_df

## Visualizing the Data

Here, we show a Choropleth map of the sum of grants awarded to each canton.

In [None]:
amount_data = pd.read_csv('data/amounts.csv')
amount_data

In [None]:
swiss = geocoder.google('Switzerland',timeout=15)

In [None]:
topo_path = r'ch-cantons.topojson.json'

The map below is saved to the cantons.html file for easier viewing.

In [None]:
swiss_map = folium.Map(location=swiss.latlng, zoom_start=7,tiles='cartodbpositron')
swiss_map.choropleth(
    geo_path=topo_path,topojson='objects.cantons',
    fill_color='red',
    fill_opacity=0.3,
    line_weight=2,
)
swiss_map.save('cantons.html')
swiss_map

We now use the data of the grants that was collected earlier to create the Choropleth map.<br>
This is saved in chloro.html

In [None]:
swiss_map = folium.Map(location=swiss.latlng, zoom_start=7,tiles='cartodbpositron')
swiss_map.choropleth(geo_path=topo_path,topojson='objects.cantons',data=amount_data,
             columns=['Canton', 'Amount'],
             key_on='feature.id',
             fill_color='YlGn', fill_opacity=0.7, line_opacity=0.2,
             legend_name='Amount',
            threshold_scale=[10,100,500,1000,2000])
swiss_map.save('chloro.html')
swiss_map

Lastly, we add a marker to each canton. Clicking a marker opens a popup showing the canton's short name and the total grant amount its universities received (in millions of francs).<br>
This is saved in popup.html

In [None]:
for canton in coordiantes_of_cantons_df.index:
    lat = coordiantes_of_cantons_df['Latitude'][canton]
    lng = coordiantes_of_cantons_df['Longitude'][canton]
    folium.Marker([lat, lng], popup = canton + ': ' + \
                  str(round(amounts_by_canton_df['Amount'][canton], 2)) + 'M CHF').add_to(swiss_map)
swiss_map.save('popup.html')
swiss_map
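
The popup text could also be built with a small formatting helper, sketched below. The function name `popup_label` is hypothetical. Unlike `str(round(x, 2))`, the `:.2f` format always shows two decimals, so this is a slight variation on the label used above rather than an exact equivalent.

```python
def popup_label(canton, amount_millions):
    """Format a marker label such as 'ZH: 1234.50M CHF'."""
    return f"{canton}: {amount_millions:.2f}M CHF"


print(popup_label("ZH", 1234.5))
```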

## Appendix

### Alternative Method 1: Geocoder

We use the following geocoder method to search for the Canton of each university.

In [None]:
g = geocoder.reverse("ETHZ",timeout=15)
g.province_long

We do this for all universities in our data and store the results in a dictionary.

In [None]:
uni_dict = {}
for uni in all_unis:
    g = geocoder.reverse(uni,timeout=60)
    uni_dict[uni] = g.province
    #print(uni,g.province)

We then convert this into a DataFrame that is more easily read.

In [None]:
uni_canton_df = pd.DataFrame.from_dict(uni_dict,orient="index")
uni_canton_df.columns.names = ['University']
uni_canton_df = uni_canton_df.rename(columns={0:'Canton'})
uni_canton_df.fillna("NotFound", inplace=True)
uni_canton_df.tail()

This method found mappings for 17 universities out of the total 79.

In [None]:
uni_canton_df.Canton.value_counts()

We store the results obtained in a csv file.

In [None]:
uni_canton_df.to_csv('data/uni_to_cantons.csv',index_label='University')

### Alternative Method 2: Geonames API

We use the GeoNames API to do the same thing. Note that the username has been removed here.

In [None]:
_USERNAME = ''

uni_dict2 = {}
for uni in all_unis:
    # Let requests URL-encode the university name and the other query parameters
    params = {'formatted': 'true', 'q': uni, 'country': 'ch', 'username': _USERNAME}
    r = requests.get('http://api.geonames.org/searchJSON', params=params)
    geodata = r.json()
    if geodata['totalResultsCount'] != 0:
        uni_dict2[uni] = geodata['geonames'][0]['adminCode1']
    else:
        uni_dict2[uni] = 'NotFound'

In [None]:
uni_canton_df2 = pd.DataFrame.from_dict(uni_dict2,orient="index")
uni_canton_df2.columns.names = ['University']
uni_canton_df2 = uni_canton_df2.rename(columns={0:'Canton'})
uni_canton_df2.tail()

This method only finds mappings for 5 universities out of the total 79.

In [None]:
uni_canton_df2.Canton.value_counts()

In [None]:
uni_canton_df2.to_csv('data/uni_to_cantons2.csv',index_label='University')