## Geospatial visualization

### Georeferencing

To build a spatial visualization, the coordinates (longitude and latitude) of each university must first be retrieved. The visualizations are based on the same file used for the [cluster analysis](./cluster.ipynb): they show the average value over the reference years (a.y. 2015/2016 - 2018/2019) for the observations present in all four datasets. These coincide with the observations taken in 2016, the year whose file contains the fewest data.

In [3]:
import pandas as pd

myDf = pd.read_csv("../../data/output/averages.csv")
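
The file read above already contains the averages. As a rough illustration of how "average over the reference years, restricted to the observations present in all four datasets" could be computed, here is a minimal sketch in plain Python. The function name `average_common`, the university names and all numeric values are hypothetical and are not taken from the project data.

```python
from statistics import mean


def average_common(yearly):
    """Average each university's value over the years, keeping only
    the universities that appear in every yearly dataset."""
    # Universities present in all datasets (intersection of the keys)
    common = set.intersection(*(set(values) for values in yearly.values()))
    return {
        uni: mean(values[uni] for values in yearly.values())
        for uni in sorted(common)
    }


# Hypothetical values, one dict per academic year
yearly = {
    "2015/2016": {"Bologna": 10.0, "Pisa": 4.0, "Lecce": 2.0},
    "2016/2017": {"Bologna": 12.0, "Pisa": 6.0},
    "2017/2018": {"Bologna": 14.0, "Pisa": 8.0, "Lecce": 3.0},
    "2018/2019": {"Bologna": 16.0, "Pisa": 6.0},
}

averages = average_common(yearly)
print(averages)
```

In this toy example "Lecce" is dropped because it is missing from two of the four years, which mirrors the way the smallest dataset determines which observations survive.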

Georeferencing is implemented with the library [GeoPy](https://geopy.readthedocs.io/en/stable/). Every university mentioned in the source dataset is associated with its latitude and longitude, which are added as new columns of the dataframe. A few hard-coded corrections are needed to disambiguate or better define some locations (see the <i>elif</i> statements). The result is saved in the CSV file that will be used for the heatmap visualization. <br>
The reference script for this first section is available [here](../geoinfo.py). 

In [4]:
from geopy.geocoders import Nominatim

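# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="GetLoc")

latList = []
lonList = []

for idx, row in myDf.iterrows():
    # Manual corrections for names that need disambiguation
    if row["uni"] == "Casamassima - G.Degennaro":
        uniname = "Casamassima"
    elif row["uni"] == "Rozzano (MI) Humanitas University":
        uniname = "Rozzano"
    elif row["uni"] == "Salento":
        uniname = "Lecce"
    elif row["uni"] == "Sannio":
        uniname = "Benevento"
    else:
        uniname = row["uni"]
    getLoc = geolocator.geocode(uniname)
    # geocode() returns None when no match is found: fail with a clear message
    if getLoc is None:
        raise ValueError("No geocoding result for {0!r}".format(uniname))
    latList.append(getLoc.latitude)
    lonList.append(getLoc.longitude)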

myDf["lat"] = latList
myDf["lon"] = lonList
myDf.to_csv("../../data/output/geocordinatesuni.csv", index=False)
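
The lookup logic above can also be written as a small helper that receives the geocoding function as an argument, which makes it possible to try it without network access. The following is only a sketch under stated assumptions: `geocode_names`, `fake_geocode`, `FAKE_INDEX` and the coordinates in it are hypothetical and approximate, and are not part of GeoPy or of the reference script.

```python
from collections import namedtuple


def geocode_names(names, geocode, overrides=None):
    """Return {name: (lat, lon)}; (None, None) when the lookup fails.

    `geocode` is any callable that takes a query string and returns an
    object with `latitude` and `longitude` attributes, or None when
    nothing is found (the behaviour of GeoPy's Nominatim.geocode).
    """
    overrides = overrides or {}
    cache = {}   # query string -> lookup result, to avoid repeated requests
    coords = {}
    for name in names:
        query = overrides.get(name, name)
        if query not in cache:
            cache[query] = geocode(query)
        hit = cache[query]
        coords[name] = (hit.latitude, hit.longitude) if hit is not None else (None, None)
    return coords


# Hypothetical stand-in for a real geocoder, with approximate coordinates
Location = namedtuple("Location", ["latitude", "longitude"])
FAKE_INDEX = {
    "Lecce": Location(40.35, 18.17),
    "Benevento": Location(41.13, 14.78),
}


def fake_geocode(query):
    return FAKE_INDEX.get(query)


coords = geocode_names(
    ["Salento", "Sannio", "Atlantis"],
    fake_geocode,
    overrides={"Salento": "Lecce", "Sannio": "Benevento"},
)
print(coords)
```

With GeoPy, the callable passed in would be `Nominatim(user_agent="GetLoc").geocode`, possibly wrapped in `geopy.extra.rate_limiter.RateLimiter(..., min_delay_seconds=1)` to space out the requests sent to the public Nominatim service.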

### Heatmap visualization

Heatmaps are created with the Python library [folium](https://python-visualization.github.io/folium/), which makes it possible to create geographical visualizations and export them as HTML through <code>folium.Map.save(...)</code>. <br>
The data are plotted on the maps through the <code>HeatMap</code> plugin of the folium library.

In [5]:
import pandas as pd
import folium
from folium import plugins
from folium.plugins import HeatMap

Visualizations are handled by the function <code>showmap</code>: starting from a dataframe and an input parameter (i.e. the name of the column containing the value to display), it creates the map, converts the data into a heatmap and adds a marker for every observation. The <code>folium.LayerControl</code> allows the heatmap and the markers to be toggled on and off.

In [6]:
def showmap(myDf, param):
    # Human-readable label used for the heatmap layer and the marker popups
    if param == "relative_scholarship":
        label = "Scholarship"
    elif param == "paidfee":
        label = "Fee"
    elif param == "perc_intern":
        label = "International students (%)"
    else:
        # Fall back to the column name so that label is always defined
        label = param

    # HeatMap expects a list of [latitude, longitude, weight] triples
    lats_longs_weight = list(map(list, zip(myDf["lat"], myDf["lon"], myDf[param].round(2))))
    map_obj = folium.Map(location=[43, 11], zoom_start=6)
    HeatMap(lats_longs_weight, name=label).add_to(map_obj)

    # Markers live in their own group so that they can be toggled separately
    group1 = folium.FeatureGroup(name='Markers')
    map_obj.add_child(group1)

    for idx, row in myDf.iterrows():
        textmarker = "{0}\n\n{1}: {2}".format(row["uni"], label, round(row[param], 2))
        folium.Marker(
            [row["lat"], row["lon"]],
            popup=textmarker,
            icon=folium.Icon(color='red', icon='graduation-cap', prefix='fa')
        ).add_to(group1)

    # add layer control to map (allows layers to be turned on or off)
    folium.LayerControl(collapsed=False).add_to(map_obj)

    return map_obj
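
The function above passes every row to the heatmap as a `[latitude, longitude, weight]` triple. If the input could contain missing coordinates or values, it is probably safer to filter those rows out first, since NaN entries are likely to cause errors when the heatmap is built. The sketch below shows one possible way to do this in plain Python; `heat_triples` and the sample rows are hypothetical and not part of the notebook.

```python
import math


def heat_triples(rows, param):
    """Build [lat, lon, weight] triples, skipping rows with missing values."""
    triples = []
    for row in rows:
        values = (row.get("lat"), row.get("lon"), row.get(param))
        # Skip the row if any of the three values is None or NaN
        if any(v is None or math.isnan(v) for v in values):
            continue
        lat, lon, weight = values
        triples.append([lat, lon, round(weight, 2)])
    return triples


# Hypothetical rows: only the first one is complete
rows = [
    {"uni": "A", "lat": 44.5, "lon": 11.3, "paidfee": 1234.567},
    {"uni": "B", "lat": 40.4, "lon": 18.2, "paidfee": float("nan")},
    {"uni": "C", "lat": None, "lon": 12.5, "paidfee": 900.0},
]

triples = heat_triples(rows, "paidfee")
print(triples)
```

Starting from a dataframe, the rows could be obtained with `myDf.to_dict("records")`, and the resulting list could then be passed to `HeatMap` in place of `lats_longs_weight`.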



As stated above, the visualization shows the average value over the reference years (a.y. 2015/2016 - 2018/2019) for the observations present in all four datasets. The CSV written in the first part of the code is read back; the data are then plotted and saved as HTML, so that the maps can be linked from the main webpage through <code>@href</code> attributes. <br>
The first map shows the percentage of international students:

In [7]:
geoDf = pd.read_csv("../../../initinere/data/output/geocordinatesuni.csv")

# Build the map once, save it as HTML, then display it in the notebook
map_perc = showmap(geoDf, "perc_intern")
map_perc.save("../../../initinere/assets/leafletgeomap/heat_map_perc.html")
map_perc

The second map shows the amount of fees paid, in euros:

In [8]:
# Build the map once, save it as HTML, then display it in the notebook
map_fee = showmap(geoDf, "paidfee")
map_fee.save("../../../initinere/assets/leafletgeomap/heat_map_paidfee.html")
map_fee

The third map shows the amount of funded scholarships per student, in euros:

In [9]:
# Build the map once, save it as HTML, then display it in the notebook
map_dsu = showmap(geoDf, "relative_scholarship")
map_dsu.save("../../../initinere/assets/leafletgeomap/heat_map_dsu.html")
map_dsu