# 06-Students

Given all the schools in Trentino, this notebook collects the number of students per school and per municipality and relates it to the resident population, broken down by gender and age. Finally, an interactive map with several layers is created, offering insights into:

* the number of students per municipality;
* the total population per municipality;
* the number of schools per municipality;
* the ratio students/population;
* the ratio students/population under 20 years old;
* the mean of students per school per municipality.

Student data can be scraped from the aprilascuola API using each institute's provincial code. They are also available in a dataset kindly shared by the Department of Education and Culture of the Province of Trento after a few meetings, which contains more recent data than the API.


In [1]:
# Libraries
from geojson import FeatureCollection
import os
import geojson
import numpy as np
import requests
import geopandas as gpd
import folium
import pandas as pd

# Ignoring all warnings
import warnings
warnings.filterwarnings('ignore')


## ISTAT population data preprocessing

Let's start by loading the ISTAT population data downloaded from the following [link](http://dati.istat.it/Index.aspx?QueryId=19101), restricted to the Province of Trento at 1 January 2021. This gives the population per age for each municipality in Trentino:

In [2]:
# Importing Trentino Population's Data
df = pd.read_csv(
    "../data/population/ISTAT_Trentino_population.csv", dtype="str")


In [3]:
df

Unnamed: 0,ITTER107,Territorio,TIPO_DATO15,Tipo di indicatore demografico,SEXISTAT1,Sesso,ETA1,Età,STATCIV2,Stato civile,TIME,Seleziona periodo,Value,Flag Codes,Flags
0,ITD2,Provincia Autonoma Trento,JAN,popolazione al 1º gennaio,1,maschi,Y0,0 anni,1,nubile/celibe,2021,2021,2098,,
1,ITD2,Provincia Autonoma Trento,JAN,popolazione al 1º gennaio,1,maschi,Y0,0 anni,99,totale,2021,2021,2098,,
2,ITD2,Provincia Autonoma Trento,JAN,popolazione al 1º gennaio,2,femmine,Y0,0 anni,1,nubile/celibe,2021,2021,1946,,
3,ITD2,Provincia Autonoma Trento,JAN,popolazione al 1º gennaio,2,femmine,Y0,0 anni,99,totale,2021,2021,1946,,
4,ITD2,Provincia Autonoma Trento,JAN,popolazione al 1º gennaio,9,totale,Y0,0 anni,1,nubile/celibe,2021,2021,4044,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
359851,022253,Novella,JAN,popolazione al 1º gennaio,9,totale,TOTAL,totale,2,coniugata/o,2021,2021,1703,,
359852,022254,Ville di Fiemme,JAN,popolazione al 1º gennaio,1,maschi,TOTAL,totale,3,divorziata/o,2021,2021,36,,
359853,022254,Ville di Fiemme,JAN,popolazione al 1º gennaio,2,femmine,TOTAL,totale,2,coniugata/o,2021,2021,571,,
359854,022254,Ville di Fiemme,JAN,popolazione al 1º gennaio,9,totale,TOTAL,totale,1,nubile/celibe,2021,2021,1235,,


Since we are interested in municipality-level data, we can drop the regional and provincial aggregates and the rows that aggregate over all ages (`ETA1 == "TOTAL"`), keeping only the rows with the total over marital status (`Stato civile == "totale"`).

In [4]:
# Keep municipality rows with a specific age and total marital status,
# removing the region/province aggregates and the all-ages rows
df = df[(df['ITTER107'] != "ITD20") & (df['ITTER107'] != "ITD2") &
        (df['ETA1'] != "TOTAL") & (df['Stato civile'] == "totale")].copy()
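The effect of this filter can be illustrated on a tiny, made-up extract with the same column names as the ISTAT export (the codes and values below are illustrative, not taken from the real file): only the municipality row with a specific age and the total marital status survives.

```python
import pandas as pd

# Hypothetical rows mimicking the ISTAT export (values are illustrative only)
raw = pd.DataFrame({
    "ITTER107": ["ITD2", "ITD20", "022205", "022205", "022205"],
    "ETA1": ["Y0", "Y0", "Y0", "TOTAL", "Y0"],
    "Stato civile": ["totale", "totale", "totale", "totale", "nubile/celibe"],
    "Value": ["10", "10", "5", "50", "3"],
})

# Keep municipality rows only, with a specific age and total marital status
mask = (
    ~raw["ITTER107"].isin(["ITD2", "ITD20"])
    & (raw["ETA1"] != "TOTAL")
    & (raw["Stato civile"] == "totale")
)
filtered = raw[mask].copy()
print(filtered)
```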

In [5]:
df

Unnamed: 0,ITTER107,Territorio,TIPO_DATO15,Tipo di indicatore demografico,SEXISTAT1,Sesso,ETA1,Età,STATCIV2,Stato civile,TIME,Seleziona periodo,Value,Flag Codes,Flags
13,022205,Trento,JAN,popolazione al 1º gennaio,1,maschi,Y0,0 anni,99,totale,2021,2021,483,,
15,022205,Trento,JAN,popolazione al 1º gennaio,2,femmine,Y0,0 anni,99,totale,2021,2021,417,,
17,022205,Trento,JAN,popolazione al 1º gennaio,9,totale,Y0,0 anni,99,totale,2021,2021,900,,
19,022001,Ala,JAN,popolazione al 1º gennaio,1,maschi,Y0,0 anni,99,totale,2021,2021,27,,
21,022001,Ala,JAN,popolazione al 1º gennaio,2,femmine,Y0,0 anni,99,totale,2021,2021,29,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
354343,022254,Ville di Fiemme,JAN,popolazione al 1º gennaio,2,femmine,Y92,92 anni,99,totale,2021,2021,1,,
354344,022252,Borgo d'Anaunia,JAN,popolazione al 1º gennaio,1,maschi,Y94,94 anni,99,totale,2021,2021,1,,
354345,022253,Novella,JAN,popolazione al 1º gennaio,9,totale,Y94,94 anni,99,totale,2021,2021,8,,
354346,022254,Ville di Fiemme,JAN,popolazione al 1º gennaio,2,femmine,Y94,94 anni,99,totale,2021,2021,1,,


Next, we drop the unneeded columns and rename the remaining ones to *Id, Comune, Anni* and *Popolazione*. Since ages are stored as strings of the form "x anni", we split on the space and keep only the integer, storing it in the column *Anni*. We also convert the population column to integer and the municipality names to title case (lowercase except for the first letter of each word).

In [6]:
# Removing unnecessary columns
df.drop(['Flag Codes', 'Flags',
         'Seleziona periodo', "TIPO_DATO15", 'STATCIV2', 'Stato civile',
         "Tipo di indicatore demografico", "TIME", "SEXISTAT1", "ETA1"], axis=1, inplace=True)

In [7]:
# Renaming columns
df.rename(columns={
    'ITTER107': 'Id',
    'Territorio': 'Comune',
    'Età': 'Anni',
    'Value': 'Popolazione'
}, inplace=True)

# Converting years and population to int
df['Anni'] = [int(x.split(" ")[0]) for x in df['Anni']]
df['Popolazione'] = df['Popolazione'].astype("int32")

# Converting Municipality to Title
df['Comune'] = [x.title() for x in df['Comune']]
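The two string conversions can be checked in isolation on a few hypothetical values in the ISTAT format. Note that `str.title` also capitalizes the letter after an apostrophe, which explains names such as *Borgo D'Anaunia* in the output below.

```python
import pandas as pd

# Hypothetical values in the same format as the ISTAT "Età" and "Territorio" columns
ages = pd.Series(["0 anni", "17 anni", "94 anni"])
names = pd.Series(["VILLE DI FIEMME", "borgo d'anaunia"])

# Keep only the leading integer of "x anni"
ages_int = ages.str.split(" ").str[0].astype("int32")

# Title case: lowercase except the first letter of each word
names_title = names.str.title()
print(ages_int.tolist(), names_title.tolist())
```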

As can be seen, we get a dataframe with the ID of each municipality, its name, the gender, the age and the corresponding population.

In [8]:
df

Unnamed: 0,Id,Comune,Sesso,Anni,Popolazione
13,022205,Trento,maschi,0,483
15,022205,Trento,femmine,0,417
17,022205,Trento,totale,0,900
19,022001,Ala,maschi,0,27
21,022001,Ala,femmine,0,29
...,...,...,...,...,...
354343,022254,Ville Di Fiemme,femmine,92,1
354344,022252,Borgo D'Anaunia,maschi,94,1
354345,022253,Novella,totale,94,8
354346,022254,Ville Di Fiemme,femmine,94,1


We save the result in the data directory as `trentino_pop_per_age.csv`:

In [9]:
# Saving the dataframe as csv
df.to_csv("../data/population/trentino_pop_per_age.csv", index=False)

## Merging schools and ISTAT data

With the ISTAT population data ready, we can merge the schools' aggregated data with it. First, however, we need to adjust some municipality names to match the ISTAT ones and drop the information that is not needed for this task:

In [10]:
# Reading school files
schools = gpd.read_file("../data/Trentino/schools/schools.geojson")

# Adjusting some municipalities names to match with ISTAT geojson
to_replace = {
    "Baselga Di Pine'": 'Baselga Di Pinè',
    "Campitello Di Fassa - Ciampedel": "Campitello Di Fassa",
    "Canazei - Cianacei": "Canazei",
    "Fiave'": "Fiavè",
    "Fierozzo - Vlarötz": "Fierozzo",
    "Male'": "Malé",
    'Moena - Moena': "Moena",
    "Rovere' Della Luna": "Roverè Della Luna",
    'San Giovanni Di Fassa - Sen Jan': "San Giovanni Di Fassa",
    'Contá': "Contà",
    "Luserna - Lusérn": "Luserna",
    "Panchia'": "Panchià",
    "Ruffre' - Mendola": "Ruffrè-Mendola",
    "Soraga - Soraga": "Soraga Di Fassa"
}
# A dict passed to replace applies all substitutions in one call
schools.replace(to_replace, inplace=True)
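Before merging, it can be useful to list the municipality names that still have no ISTAT counterpart. The sketch below uses hypothetical inputs (a few school names and a small reference set standing in for the ISTAT list) and a subset of `to_replace`; it also shows that `replace` accepts the whole mapping in a single call.

```python
import pandas as pd

# Hypothetical inputs: school municipality names and the ISTAT reference names
school_names = pd.Series(["Fiave'", "Trento", "Canazei - Cianacei"])
istat_names = {"Fiavè", "Trento", "Canazei"}

# Same mapping style as `to_replace` above (subset, for illustration)
to_replace = {"Fiave'": "Fiavè", "Canazei - Cianacei": "Canazei"}

# A dict passed to Series.replace applies all substitutions in one call
fixed = school_names.replace(to_replace)

# Names that still do not match the ISTAT list (should be empty)
unmatched = sorted(set(fixed) - istat_names)
print(unmatched)
```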

In [11]:

# Save file with changes
schools.to_file("../data/Trentino/schools/schools.geojson")

# Drop useless columns for this task
schools.drop(['Id Istituto', 'Telefono', 'Fax', 'Email istituto',
             'Email segreteria', 'Sito web'], axis=1, inplace=True)

## Scraping the number of students from aprilascuola

As mentioned above, we can scrape the number of students of each school from the aprilascuola API. Unfortunately, there is no documentation or instructions on how to use it, but a school's information can be retrieved from a [specific URL](https://aprilascuola.provincia.tn.it/sei//api/istituzioneScolastica/istituzioni/0220227103), given its provincial code.

Starting from the schools dataset, we can create a students dataframe containing all schools with an ID usable with the aprilascuola API (i.e. the schools whose student data we can scrape).

*Note that only just over half of the schools have an ID, since primary schools are not covered by the aprilascuola project.*

In [12]:
# First option: Scraping from Aprilascuola Project
students = schools[['Nome', 'lat', 'lon', 'Tipo Istituto',
                    'Gestione', 'Comune', 'geometry', 'Id']]

# Scraping data about students and classes based on the provincial ID
# Removing those schools with no ID (copy to avoid chained-assignment issues)
students = students[students['Id'].notna()].copy()

The following function, given the provincial ID of a school, retrieves its number of students and classes. Although aprilascuola provides the number of students and classes for each grade, we only consider the cumulative values (their sum over the entire school).

The scraped data is saved as a pickle file, `students.pkl`, inside the data folder.
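Since the summing step does not depend on the network, it can be tested offline. The sketch below assumes a response containing the `alunniXClassiAnnoScolasticoCorrente` list with `numeroAlunni` and `numeroClassi` fields, as read by the function; the payload is made up and the real response may contain more fields. `sum_students_and_classes` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical payload: only the keys read by `get_students_and_classes`
# are reproduced; the real response may contain other fields.
payload = {
    "alunniXClassiAnnoScolasticoCorrente": [
        {"numeroAlunni": 120, "numeroClassi": 6},
        {"numeroAlunni": 95, "numeroClassi": 5},
    ]
}


def sum_students_and_classes(response):
    """Sum students and classes over all grades of one school."""
    rows = response.get("alunniXClassiAnnoScolasticoCorrente", [])
    students = sum(r["numeroAlunni"] for r in rows)
    classes = sum(r["numeroClassi"] for r in rows)
    return [students, classes]


print(sum_students_and_classes(payload))  # [215, 11]
```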

In [12]:
# Function to gather the number of students
# and classes for the current school year
from tqdm import tqdm


def get_students_and_classes(school_id):
    if pd.isna(school_id):
        return [np.nan, np.nan]
    # 1. Get the resource at the specified URL
    url = "https://www.istruzione.provincia.tn.it/services/sei/api/v1/institutes/students/{}"
    resp = requests.get(url.format(school_id), timeout=30)
    resp.raise_for_status()
    r = resp.json()
    # 2. Sum students and classes over all grades for the current year
    alunni = 0
    classi = 0
    for ordine in r['alunniXClassiAnnoScolasticoCorrente']:
        alunni += ordine['numeroAlunni']
        classi += ordine['numeroClassi']
    return [alunni, classi]


# Inserting students and classes for each school with a provincial code
students[['Studenti', 'Classi']] = [
    get_students_and_classes(x) for x in tqdm(students['Id'])]

# Saving this information
students.to_pickle("../data/Trentino/schools/students.pkl")


## Using the official data of the Dipartimento istruzione e cultura

As mentioned above, after a few meetings with [Francesco Pisanu](https://www.vivoscuola.it/Il-Dipartimento/Dipartimento-istruzione-e-cultura/Servizio-istruzione/Ufficio-per-la-valutazione-delle-politiche-scolastiche) (director of the department of education and culture of Trentino Province), the department expressed its willingness to collaborate and agreed to share its official data on the number of students and classes of each school.

*In the following steps we use these data, since they are more recent than those scraped from aprilascuola. For the same schools, the two sources can show different numbers of students and classes.*

In [13]:
# Now instead of scraping data from aprilascuola, use the official numbers provided
# by the Province of Trento (more reliable)

# Reading the department's data file with students and classes in December 2021
df = pd.read_csv("../data/population/students_per_school.csv",
                 sep=";", dtype=object)

# Renaming columns
df.rename(columns={
    'Istituzione Scolastica': 'Istituto',
    'Ordine Scolastico': 'Tipo Istituto',
    'Scuola/Indirizzo': 'Nome',
    'Scuola/Indirizzo - Codice PAT': 'Id',
    'Numero Iscritti': 'Studenti',
    'Numero Classi': 'Classi'
}, inplace=True)

# Applying some transformations
df['Studenti'] = df['Studenti'].astype("int32")
df['Classi'] = df['Classi'].astype("int32")
text_cols = ['Istituto', 'Tipo Istituto', 'Nome']
df[text_cols] = df[text_cols].apply(lambda col: col.str.title())

# Erasing duplicated lines
df.drop_duplicates(inplace=True)
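To see where the official numbers and the scraped aprilascuola numbers differ, one option is to join the two tables on the school ID and compute the gap. A minimal sketch with hypothetical IDs and counts; in the notebook, the scraped side would come from `students.pkl`.

```python
import pandas as pd

# Hypothetical numbers for the same schools in the two sources
scraped = pd.DataFrame({"Id": ["A1", "B2"], "Studenti": [100, 80]})
official = pd.DataFrame({"Id": ["A1", "B2"], "Studenti": [104, 80]})

# Align the two sources on the school ID and compute the gap
cmp = scraped.merge(official, on="Id", suffixes=("_scraped", "_official"))
cmp["diff"] = cmp["Studenti_official"] - cmp["Studenti_scraped"]
print(cmp[cmp["diff"] != 0])
```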

In [139]:
df.head(10)

Unnamed: 0,Istituto,Tipo Istituto,Nome,Id,Studenti,Classi
0,Associazione Pedagogica Steineriana - Trento,Primaria,Associazione R. Steiner - Primaria,222052148,117,5
1,Associazione Pedagogica Steineriana - Trento,Secondaria Di Primo Grado,Associazione R. Steiner - Secondaria I Grado,222053018,63,3
2,Collegio Arcivescovile C.Endrici - Trento,Primaria,Scuola Primaria Arcivescovile Trento,222052149,130,6
3,Collegio Arcivescovile C.Endrici - Trento,Secondaria Di Primo Grado,Scuola Secondaria Di Primo Grado Arcivescovile...,222053011,274,11
4,Collegio Arcivescovile C.Endrici - Trento,Secondaria Di Secondo Grado,Istituto Tecnico Per Il Settore Economico,222055432,56,5
5,Collegio Arcivescovile C.Endrici - Trento,Secondaria Di Secondo Grado,Istituto Tecnico Per Il Settore Tecnologico,222055433,65,4
6,Collegio Arcivescovile C.Endrici - Trento,Secondaria Di Secondo Grado,Liceo Classico Arcivescovile - Trento,222057214,46,4
7,Collegio Arcivescovile C.Endrici - Trento,Secondaria Di Secondo Grado,Liceo Linguistico Arcivescovile - Trento,222057138,7,1
8,Collegio Arcivescovile C.Endrici - Trento,Secondaria Di Secondo Grado,Liceo Scientifico Arcivescovile - Trento,222057116,96,6
9,Collegio Arcivescovile Dame Inglesi - Rovereto,Primaria,Scuola Primaria Arcivescovile Dame Inglesi Rov...,221612103,158,8


We can now merge the schools data with the students and classes data, matching on the ID, the school name and the affiliated institute.

In [14]:
# Merging schools data with students data
students = pd.merge(schools, df, on=["Id", 'Nome', 'Istituto'])
students.drop(['Tipo Istituto_x'], axis=1, inplace=True)
students.rename(columns={'Tipo Istituto_y': 'Tipo Istituto'}, inplace=True)
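`pd.merge` performs an inner join by default, so schools whose ID, name or institute differ even slightly between the two sources are silently dropped. One way to spot them, sketched here on hypothetical tables, is an outer merge with `indicator=True`, which labels each row as `both`, `left_only` or `right_only`.

```python
import pandas as pd

# Hypothetical schools and enrolment tables sharing the three merge keys
keys = ["Id", "Nome", "Istituto"]
schools_toy = pd.DataFrame({
    "Id": ["1", "2", "3"],
    "Nome": ["Scuola A", "Scuola B", "Scuola C"],
    "Istituto": ["Ist X", "Ist X", "Ist Y"],
})
enrol_toy = pd.DataFrame({
    "Id": ["1", "2"],
    "Nome": ["Scuola A", "Scuola B (sede)"],  # name mismatch on purpose
    "Istituto": ["Ist X", "Ist X"],
    "Studenti": [150, 90],
})

# An outer merge with indicator=True labels each row by where it was found
check = schools_toy.merge(enrol_toy, on=keys, how="outer", indicator=True)
print(check["_merge"].value_counts())
```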

In [141]:
students.head()

Unnamed: 0,Id,Nome,lat,lon,Istituto,Gestione,Indirizzo,Comune,Codice MIUR,CAP,geometry,Tipo Istituto,Studenti,Classi
0,220629699,Settore Industria E Artigianato,46.369265,11.031445,Centro Formazione Professionale Enaip - Cles,Paritaria,"Via Mitterer, 10",Cles,TNFP251STA,38023,POINT (11.03144 46.36927),Formazione Professionale,160,8
1,221319699,Settore Servizi,46.310258,10.745046,Centro Formazione Professionale Enaip - Ossana,Paritaria,"Via Di S Antonio, 1",Ossana,TNFP252STA,38026,POINT (10.74505 46.31026),Formazione Professionale,104,8
2,222049699,Settore Industria E Artigianato,46.176075,11.833824,Centro Formazione Professionale Enaip - Primiero,Paritaria,"Via Forno - Transacqua, 12",Primiero San Martino Di Castrozza,TNFP253STA,38054,POINT (11.83382 46.17608),Formazione Professionale,27,3
3,222049698,Settore Servizi,46.176075,11.833824,Centro Formazione Professionale Enaip - Primiero,Paritaria,"Via Forno - Transacqua, 12",Primiero San Martino Di Castrozza,TNFP253STA,38054,POINT (11.83382 46.17608),Formazione Professionale,48,7
4,221969698,Settore Industria E Artigianato,46.287118,11.511975,Centro Formazione Professionale Enaip - Tesero,Paritaria,"Via Caltrezza, 13",Tesero,TNFP254STA,38038,POINT (11.51198 46.28712),Formazione Professionale,101,5


Let's explore the data.

First, the distribution of students has a long tail: Trento and Rovereto have by far the highest number of students, while the other 129 municipalities have smaller numbers (for instance, the less populated areas). At the other end, municipalities such as Castelnuovo, Bedollo and San Lorenzo have fewer than 70 students.

*Note that it is possible to select a subset of municipalities from the barplot, such as the ones with the highest/lowest number of students*. 

*Remember that the numbers shown DO NOT include all the actual students in Trentino, but just those in the schools whose data are available.*

In [18]:
import plotly.express as px

# STUDENTS PER MUNICIPALITY (summing only the numeric column of interest)
stud_per_comune = (students.groupby('Comune', as_index=False)['Studenti'].sum()
                   .sort_values('Studenti', ascending=False).head(100))
px.bar(stud_per_comune, x='Comune', y='Studenti')

As for the number of classes, it is natural that the more students there are, the more classes are needed (or the more students there are per class). Indeed, the top 10 municipalities by number of classes are the same as those by number of students, although their order in the ranking may differ.

However, among the municipalities with the lowest number of classes, most have $5$ classes (counting only the schools for which we have data).

In [19]:
import plotly.express as px

# CLASSES PER MUNICIPALITY (summing only the numeric column of interest)
classes_per_comune = (students.groupby('Comune', as_index=False)['Classi'].sum()
                      .sort_values('Classi', ascending=False).head(100))
px.bar(classes_per_comune, x='Comune', y='Classi')
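The claim that the top municipalities by classes are the same as those by students can be checked by intersecting the two rankings. A minimal sketch on hypothetical data with placeholder names, using the top 3 instead of the top 10; `top_n` is a helper introduced here for illustration.

```python
import pandas as pd

# Hypothetical per-school rows; in the notebook this would be `students`
toy = pd.DataFrame({
    "Comune": ["Comune A", "Comune A", "Comune B", "Comune C", "Comune D", "Comune E"],
    "Studenti": [900, 700, 800, 400, 300, 150],
    "Classi": [40, 35, 38, 20, 16, 9],
})


def top_n(df, col, n):
    """Municipalities with the n largest totals of `col`."""
    return set(df.groupby("Comune")[col].sum().nlargest(n).index)


# Municipalities that appear in both top-n rankings
n = 3
overlap = top_n(toy, "Studenti", n) & top_n(toy, "Classi", n)
print(sorted(overlap))
```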

## Aggregated data

We can aggregate the students data by municipality, summing the number of students and classes into `stud_agg` and counting the schools into `stud_schools`:

In [15]:
# Group by municipality to get the total number of students and classes
# (selecting the numeric columns before summing)
stud_agg = students.groupby('Comune')[['Studenti', 'Classi']].sum()
stud_agg.head(10)

Comune,Studenti,Classi
Ala,682,34
Albiano,145,9
Aldeno,262,16
Altavalle,59,5
Altopiano Della Vigolana,407,26
Andalo,164,11
Arco,2133,113
Avio,347,18
Baselga Di Pinè,459,24
Bedollo,61,5


In [16]:
# Schools of which we know the students
stud_schools = students.groupby(['Comune']).size().to_frame('Schools')
stud_schools

Comune,Schools
Ala,3
Albiano,2
Aldeno,2
Altavalle,1
Altopiano Della Vigolana,4
...,...
Villa Lagarina,2
Ville D'Anaunia,3
Ville Di Fiemme,2
Volano,1
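`stud_agg` and `stud_schools` could also be computed in a single `groupby` using named aggregation, which sums only the columns that are explicitly listed. A sketch on hypothetical rows with placeholder municipality names:

```python
import pandas as pd

# Hypothetical per-school rows
toy = pd.DataFrame({
    "Comune": ["Comune A", "Comune A", "Comune A", "Comune B"],
    "Nome": ["S1", "S2", "S3", "S4"],
    "Studenti": [200, 150, 100, 80],
    "Classi": [10, 8, 5, 4],
})

# One pass: total students and classes plus number of schools per municipality
agg = toy.groupby("Comune").agg(
    Studenti=("Studenti", "sum"),
    Classi=("Classi", "sum"),
    Schools=("Nome", "size"),
)
print(agg)
```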


In the following cells, our goal is to create a GeoJSON with aggregated information for each municipality, which is needed both for the map in this notebook and for the [R analysis](07-SpatialRegression.Rmd).

First, we download the ISTAT boundaries of each municipality (if you followed the previous notebooks, this directory should already exist inside the data folder).

In [17]:
# If data is not downloaded yet, request it from ISTAT
if not os.path.exists('../data/Limiti01012021_g'):
    import io
    import zipfile
    zip_file_url = 'https://www.istat.it/storage/cartografia/confini_amministrativi/generalizzati/Limiti01012021_g.zip'
    # Request the file (SSL certificate verification is disabled here)
    r = requests.get(zip_file_url, verify=False)
    r.raise_for_status()
    # Unzip the archive into the data folder
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall("../data/")

Then we read the municipalities file, keep only the Province of Trento (`COD_PROV = 22`), reproject to WGS84 (EPSG:4326) and keep only the municipality name, ID and geometry. Note that some municipalities consist of several disjoint areas, which are stored as MultiPolygons.

In [18]:
trentino = gpd.read_file(
    "../data/Limiti01012021_g/Com01012021_g", encoding="utf-8")
trentino = trentino[trentino['COD_PROV'] == 22]
trentino = trentino.to_crs(4326)
trentino = trentino[['COMUNE', 'PRO_COM_T', 'geometry']].reset_index(drop=True)
trentino.rename(columns={
    'COMUNE': 'Comune',
    'PRO_COM_T': 'Id'
}, inplace=True)
trentino['Comune'] = [x.title() for x in trentino['Comune']]
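To make the Polygon/MultiPolygon distinction concrete, the sketch below builds a toy GeoDataFrame (made-up square coordinates, not real boundaries) in which one municipality consists of two disjoint parts, and counts the parts of each geometry.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical municipality made of two disjoint unit squares
part_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
part_b = Polygon([(3, 0), (4, 0), (4, 1), (3, 1)])
toy = gpd.GeoDataFrame(
    {"Comune": ["A", "B"]},
    geometry=[MultiPolygon([part_a, part_b]), part_a],
    crs="EPSG:4326",
)

# Geometry types and number of parts per municipality
print(toy.geom_type.value_counts())
n_parts = [len(g.geoms) if g.geom_type == "MultiPolygon" else 1 for g in toy.geometry]
print(n_parts)  # [2, 1]
```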

In [24]:
trentino.head(5)

Unnamed: 0,Comune,Id,geometry
0,Ala,22001,"POLYGON ((11.00066 45.82692, 11.00103 45.82634..."
1,Albiano,22002,"POLYGON ((11.20754 46.15620, 11.20994 46.15454..."
2,Aldeno,22003,"POLYGON ((11.10303 45.99093, 11.11333 45.98973..."
3,Andalo,22005,"MULTIPOLYGON (((10.95317 46.13797, 10.95315 46..."
4,Arco,22006,"POLYGON ((10.90371 45.98843, 10.90498 45.98776..."


We use the municipality as the index of the dataframe and then create the following columns:

* Total number of schools in the municipality (`Scuole totali`);
* Number of schools in the municipality for which we know the number of students/classes (`Scuole studenti`);
* Students and classes from the aggregated data per municipality (`stud_agg`);
* Mean number of students per class (`Media stud per classe`);
* Mean number of students per school (`Media stud per scuola`).

In [19]:
trentino.set_index("Comune", inplace=True)
# Series assignment aligns on the municipality name; missing ones get NaN
trentino['Scuole totali'] = schools.groupby('Comune').size()
trentino['Scuole studenti'] = students.groupby('Comune').size()
trentino[['Studenti', 'Classi']] = stud_agg
trentino['Media stud per classe'] = round(
    trentino['Studenti']/trentino['Classi'], 2)
trentino['Media stud per scuola'] = round(
    trentino['Studenti']/trentino['Scuole studenti'], 2)
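The column assignments above rely on pandas index alignment: a Series indexed by municipality is matched to `trentino` by name, and municipalities missing from the Series get `NaN`, which then propagates to the means (and shows up as white areas on the map). A sketch with hypothetical rows; the totals for Ala and Altavalle match the `stud_agg` output above, while the per-school split and *Comune X* are made up.

```python
import pandas as pd

# Hypothetical municipality table indexed by name, like `trentino`
muni = pd.DataFrame(index=pd.Index(["Ala", "Altavalle", "Comune X"], name="Comune"))

# Hypothetical per-school rows: Comune X has no school with known students
rows = pd.DataFrame({
    "Comune": ["Ala", "Ala", "Altavalle"],
    "Studenti": [400, 282, 59],
    "Classi": [20, 14, 5],
})

# Assigning a Series aligns on the index; missing municipalities get NaN
per_muni = rows.groupby("Comune")[["Studenti", "Classi"]].sum()
muni["Studenti"] = per_muni["Studenti"]
muni["Classi"] = per_muni["Classi"]
muni["Media stud per classe"] = (muni["Studenti"] / muni["Classi"]).round(2)
print(muni)
```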

Next, we add some population-related information: the total population, the population aged 5 to 20 (column `Pop under 20`), the ratio of this school-age population to the total population and the ratio of students to the school-age population.

*Note that we use 20 years as the threshold separating students from non-students among the citizens of Trentino. Choosing a different threshold would change the data and the resulting map.*

We start by reading the file with the Trentino population per age and converting age and population to integers:

In [20]:
# Loading data about Trentino Population per age
pop_age = pd.read_csv(
    "../data/population/trentino_pop_per_age.csv", dtype="str")
pop_age.replace("San Giovanni Di Fassa-Sèn Jan",
                "San Giovanni Di Fassa", inplace=True)
pop_age['Anni'] = pop_age['Anni'].astype("int32")
pop_age['Popolazione'] = pop_age['Popolazione'].astype("int32")

pop_age

Unnamed: 0,Id,Comune,Sesso,Anni,Popolazione
0,022205,Trento,maschi,0,483
1,022205,Trento,femmine,0,417
2,022205,Trento,totale,0,900
3,022001,Ala,maschi,0,27
4,022001,Ala,femmine,0,29
...,...,...,...,...,...
50293,022254,Ville Di Fiemme,femmine,92,1
50294,022252,Borgo D'Anaunia,maschi,94,1
50295,022253,Novella,totale,94,8
50296,022254,Ville Di Fiemme,femmine,94,1


We then keep only the gender totals (`Sesso == "totale"`) and group by municipality, obtaining in `pop_age_tot` the total population and the population aged between 5 and 20, both included.

In [21]:
# Keeping only the totals over gender, grouped by municipality
pop_tot_rows = pop_age[pop_age['Sesso'] == "totale"]
pop_age_tot = pop_tot_rows.groupby('Comune')[['Popolazione']].sum()
# Population aged 5 to 20 (both included)
pop_age_tot['Pop under 20'] = (
    pop_tot_rows[pop_tot_rows['Anni'].between(5, 20)]
    .groupby('Comune')['Popolazione'].sum())

pop_age_tot

Comune,Popolazione,Pop under 20
Ala,8792,1517
Albiano,1500,254
Aldeno,3187,517
Altavalle,1612,253
Altopiano Della Vigolana,5074,888
...,...,...
Villa Lagarina,3825,695
Ville D'Anaunia,4736,683
Ville Di Fiemme,2631,436
Volano,3020,489
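Since the results depend on the chosen age range (see the note on the threshold above), the selection can be wrapped in a function with the bounds as parameters, to compare different thresholds. `school_age_population` is a hypothetical helper, not part of the notebook; it assumes the columns of `trentino_pop_per_age.csv` and is tested here on a made-up extract.

```python
import pandas as pd


def school_age_population(pop_age, min_age=5, max_age=20):
    """Population aged min_age..max_age (both included) per municipality.

    Expects the columns of `trentino_pop_per_age.csv`:
    Comune, Sesso, Anni, Popolazione.
    """
    tot = pop_age[pop_age["Sesso"] == "totale"]
    in_range = tot[tot["Anni"].between(min_age, max_age)]
    return in_range.groupby("Comune")["Popolazione"].sum()


# Hypothetical extract: one municipality, gender totals only
toy = pd.DataFrame({
    "Comune": ["X"] * 4,
    "Sesso": ["totale"] * 4,
    "Anni": [4, 5, 20, 21],
    "Popolazione": [10, 20, 30, 40],
})
print(school_age_population(toy)["X"])        # 50
print(school_age_population(toy, 0, 20)["X"])  # 60
```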


We add these two columns to the `trentino` dataframe, together with the ratio of the school-age population (5 to 20) to the total population (`Pop_stud/Pop_tot`) and the ratio of students to the school-age population (`Stud/Pop_stud`). We then save the aggregated data both as an ESRI Shapefile and as GeoJSON.

In [22]:
trentino[['Popolazione', 'Pop under 20']] = pop_age_tot
trentino['Pop_stud/Pop_tot'] = round(trentino['Pop under 20'] /
                                     trentino['Popolazione'], 2)
trentino['Stud/Pop_stud'] = round(trentino['Studenti'] /
                                  trentino['Pop under 20'], 2)

trentino = trentino.fillna(np.nan)
trentino.to_file("../data/aggregated_data_per_municipality.geojson")
trentino.to_file("../data/aggregated_data_per_municipality",
                 driver="ESRI Shapefile")
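The two ratios can be written explicitly. Let $P$ be the total population of a municipality, $P_{5\text{-}20}$ its population aged 5 to 20 (column `Pop under 20`) and $S$ the number of students in its schools with available data; $R_{\text{pop}}$ corresponds to `Pop_stud/Pop_tot` and $R_{\text{stud}}$ to `Stud/Pop_stud`:

$$
R_{\text{pop}} = \frac{P_{5\text{-}20}}{P}, \qquad R_{\text{stud}} = \frac{S}{P_{5\text{-}20}}
$$

For example, with the values for Ala in the `tn` table further below ($P = 8792$, $P_{5\text{-}20} = 1517$, $S = 682$), $R_{\text{pop}} = 1517/8792 \approx 0.17$ and $R_{\text{stud}} = 682/1517 \approx 0.45$. Values of $R_{\text{stud}}$ above 1 (e.g. 1.15 for San Giovanni Di Fassa) are possible, presumably because schools also enroll students who live in other municipalities.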


In [156]:
trentino

Comune,Id,geometry,Scuole totali,Scuole studenti,Studenti,Classi,Media stud per classe,Media stud per scuola,Popolazione,Pop under 20,Pop_stud/Pop_tot,Stud/Pop_stud
Ala,022001,"POLYGON ((11.00066 45.82692, 11.00103 45.82634...",8.0,3.0,682.0,34.0,20.06,227.33,8792,1872,0.21,0.36
Albiano,022002,"POLYGON ((11.20754 46.15620, 11.20994 46.15454...",3.0,2.0,145.0,9.0,16.11,72.50,1500,323,0.22,0.45
Aldeno,022003,"POLYGON ((11.10303 45.99093, 11.11333 45.98973...",3.0,2.0,262.0,16.0,16.38,131.00,3187,644,0.20,0.41
Andalo,022005,"MULTIPOLYGON (((10.95317 46.13797, 10.95315 46...",3.0,2.0,164.0,11.0,14.91,82.00,1268,227,0.18,0.72
Arco,022006,"POLYGON ((10.90371 45.98843, 10.90498 45.98776...",15.0,11.0,2133.0,113.0,18.88,193.91,17798,3595,0.20,0.59
...,...,...,...,...,...,...,...,...,...,...,...,...
San Giovanni Di Fassa,022250,"POLYGON ((11.64313 46.47494, 11.64458 46.47282...",8.0,6.0,764.0,49.0,15.59,127.33,3698,812,0.22,0.94
Terre D'Adige,022251,"MULTIPOLYGON (((11.04934 46.17583, 11.05003 46...",4.0,2.0,191.0,11.0,17.36,95.50,3053,644,0.21,0.30
Borgo D'Anaunia,022252,"MULTIPOLYGON (((11.21582 46.44893, 11.21697 46...",5.0,3.0,348.0,21.0,16.57,116.00,2487,515,0.21,0.68
Novella,022253,"POLYGON ((11.06230 46.49919, 11.06404 46.49319...",5.0,3.0,263.0,16.0,16.44,87.67,3599,685,0.19,0.38


Let's reload the saved GeoJSON to check that it can be read back, and create a GeoDataFrame from it:

In [23]:
# Geodata with information to make popup in the map
with open("../data/aggregated_data_per_municipality.geojson", encoding="utf-8") as f:
    geo_data = geojson.load(f)

# Creating an aggregated dataframe to use in further analysis
# (from_features accepts the loaded FeatureCollection directly)
tn = gpd.GeoDataFrame.from_features(geo_data, crs="EPSG:4326")
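The same round trip can be illustrated without touching the disk: serialize a toy GeoDataFrame with `to_json` and rebuild it with `GeoDataFrame.from_features`, which accepts a FeatureCollection dict directly. Note that `to_json` stores the index as the feature `id` rather than as a property, hence the `reset_index()` so that the municipality name survives as a column. The data and coordinates are made up.

```python
import json

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical aggregated table (toy coordinates), indexed by municipality
agg = gpd.GeoDataFrame(
    {"Studenti": [120.0, None]},
    index=pd.Index(["Comune A", "Comune B"], name="Comune"),
    geometry=[Point(11.0, 45.8), Point(10.6, 45.9)],
    crs="EPSG:4326",
)

# Serialize to a GeoJSON FeatureCollection (index moved into a column first)
fc = json.loads(agg.reset_index().to_json())

# Rebuild a GeoDataFrame straight from the FeatureCollection dict
back = gpd.GeoDataFrame.from_features(fc, crs="EPSG:4326")
print(back[["Comune", "Studenti"]])
```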


## Interactive Map

We've come to the main part of this notebook, where we will build an interactive folium map with multiple layers (through `folium.FeatureGroup`):

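* Number of students (sum of the students of all schools within a municipality);
* Total population;
* Number of schools;
* Ratio of the school-age population to the total population (`Pop_stud/Pop_tot`, layer *Studenti/Popolazione*);
* Ratio of students to the population aged 5 to 20 (`Stud/Pop_stud`, layer *Studenti/Popolazione under 20*);
* Mean number of students per school.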

Each layer uses its own quantile-based bins, which determine how values are mapped onto the chosen `fill_color` palette. Most layers use a yellow-to-dark-red palette (`YlOrRd`), while the *Studenti/Popolazione under 20* layer uses a diverging red-white-blue palette (`RdBu`) to contrast municipalities with a shortage of students relative to their young population with those with an abundance.
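The following toy example (a made-up long-tailed series) shows how quantile-based bin edges behave: they follow the distribution of the data rather than equal-width steps, so most classes are spent on the many small values. With very skewed data, however, different quantiles can coincide; deduplicating the edges is one simple way to avoid empty classes.

```python
import numpy as np
import pandas as pd

# Hypothetical long-tailed variable (e.g. students per municipality)
values = pd.Series([1, 1, 1, 2, 3, 100])

# Bin edges at chosen quantiles, as done for each map layer
bins = list(values.quantile([0, 0.3, 0.7, 1]))
print(bins)  # [1.0, 1.0, 2.5, 100.0]

# Repeated edges would create empty classes; keep unique, sorted edges
bins_unique = np.unique(bins).tolist()
print(bins_unique)  # [1.0, 2.5, 100.0]
```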

For each layer a Choropleth is added, coloring each municipality according to its value and showing its information on hover.

*A legend would be useful here, but each layer would need its own; moreover, since students, population and classes follow a long-tailed distribution (a few municipalities with many, many municipalities with few), the legend would be unreadable for relatively small values.*

*Some municipalities are white because their numbers of students and classes are missing.*

In [26]:
# MAP
m = folium.Map(location=[46.1, 11.2],
               zoom_start=9,
               tiles=None,
               overlay=False)

fg1 = folium.FeatureGroup(name='Studenti', overlay=False).add_to(m)
fg2 = folium.FeatureGroup(name='Popolazione', overlay=False).add_to(m)
fg3 = folium.FeatureGroup(name='Scuole', overlay=False).add_to(m)
fg4 = folium.FeatureGroup(name='Studenti/Popolazione', overlay=False).add_to(m)
fg5 = folium.FeatureGroup(
    name='Studenti/Popolazione under 20', overlay=False).add_to(m)
fg6 = folium.FeatureGroup(
    name='Media studenti per scuola', overlay=False).add_to(m)

# STUDENTS LAYER
bins = list(tn["Studenti"].quantile([0, 0.3, 0.7, 0.95, 0.99, 0.995, 1]))
# Named stud_layer so that it does not overwrite the `students` dataframe
stud_layer = folium.Choropleth(
    geo_data=geo_data,
    data=tn,
    columns=['Comune', 'Studenti'],
    key_on='feature.properties.Comune',
    bins=bins,  # quantile-based bins computed above
    fill_color='YlOrRd',
    nan_fill_color="White",  # Use white color if there is no data available
    fill_opacity=0.7,
    line_opacity=0.2,
    highlight=True,
    overlay=True)

stud_layer.geojson.add_to(fg1)
# Information to visualize when hovering
folium.GeoJsonTooltip(fields=['Comune', 'Studenti', 'Classi',
                              'Scuole totali', 'Media stud per classe',
                              'Media stud per scuola'],
                      aliases=['Comune', 'Studenti', 'Classi',
                               'N. Scuole', 'Media studenti per classe',
                               'Media studenti per scuola']).add_to(stud_layer.geojson)

# POPULATION LAYER
bins = list(tn["Popolazione"].quantile([0, 0.4, 0.7, 0.9, 0.97, 0.99, 1]))
pop = folium.Choropleth(
    geo_data=geo_data,
    name="choropleth",
    data=tn,
    columns=["Comune", "Popolazione"],
    key_on="feature.properties.Comune",
    fill_color="YlOrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Population",
    bins=bins,
    highlight=True,
    reset=True,
    nan_fill_color="White",
)
pop.geojson.add_to(fg2)

folium.GeoJsonTooltip(fields=['Comune', 'Popolazione', 'Studenti', 'Scuole totali'],
                      aliases=['Comune', 'Popolazione', 'Studenti', 'N. Scuole']).add_to(pop.geojson)


# NUMBER OF SCHOOLS LAYER
bins = list(tn["Scuole totali"].quantile(
    [0, 0.6, 0.85, 0.95, 0.975, 0.993, 1]))
scu = folium.Choropleth(
    geo_data=geo_data,
    name="choropleth",
    data=tn,
    columns=["Comune", "Scuole totali"],
    key_on="feature.properties.Comune",
    fill_color="YlOrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Number of schools",
    bins=bins,
    highlight=True,
    reset=True,
    nan_fill_color="White",
)
scu.geojson.add_to(fg3)

folium.GeoJsonTooltip(fields=['Comune', 'Studenti', 'Classi',
                              'Scuole totali', 'Media stud per classe',
                              'Media stud per scuola', 'Popolazione'],
                      aliases=['Comune', 'Studenti', 'Classi',
                               'N. Scuole', 'Media studenti per classe',
                               'Media studenti per scuola', 'Popolazione']).add_to(scu.geojson)

# SCHOOL-AGE POPULATION OVER TOTAL POPULATION LAYER
bins = list(tn['Pop_stud/Pop_tot'].quantile([0, 0.4, 0.72, 0.95, 0.99, 1]))
den = folium.Choropleth(
    geo_data=geo_data,
    name="choropleth",
    data=tn,
    columns=["Comune", 'Pop_stud/Pop_tot'],
    key_on="feature.properties.Comune",
    fill_color="YlOrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="School-age population over total population",
    bins=bins,
    highlight=True,
    reset=True,
    nan_fill_color="White",
)
den.geojson.add_to(fg4)

folium.GeoJsonTooltip(fields=['Comune', 'Studenti', 'Pop_stud/Pop_tot',
                              'Scuole totali', 'Media stud per classe',
                              'Media stud per scuola', 'Popolazione'],
                      aliases=['Comune', 'Studenti', 'Studenti/Popolazione',
                               'N. Scuole', 'Media studenti per classe',
                               'Media studenti per scuola', 'Popolazione']).add_to(den.geojson)

# STUDENTS OVER POPULATION UNDER 20 LAYER (diverging palette)
bins = list(
    tn["Stud/Pop_stud"].quantile([0, 0.25, 0.50, 0.75, 0.90, 0.945, 1]))
den2 = folium.Choropleth(
    geo_data=geo_data,
    name="choropleth",
    data=tn,
    columns=["Comune", "Stud/Pop_stud"],
    key_on="feature.properties.Comune",
    fill_color="RdBu",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Students over population under 20",
    bins=bins,
    highlight=True,
    reset=True,
    nan_fill_color="White",
)
den2.geojson.add_to(fg5)
folium.GeoJsonTooltip(fields=['Comune', 'Studenti', 'Pop_stud/Pop_tot',
                              'Scuole totali', 'Pop under 20', 'Stud/Pop_stud'],
                      aliases=['Comune', 'Studenti', 'Studenti/Popolazione',
                               'N. Scuole', 'Popolazione (<21 anni)', 'Densità di studenti su popolazione studentesca']).add_to(den2.geojson)

# MEAN STUDENTS PER SCHOOL LAYER
bins = list(
    tn["Media stud per scuola"].quantile([0, 0.25, 0.50, 0.75, 0.90, 0.945, 1]))
med = folium.Choropleth(
    geo_data=geo_data,
    name="choropleth",
    data=tn,
    columns=["Comune", "Media stud per scuola"],
    key_on="feature.properties.Comune",
    fill_color="YlOrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    bins=bins,
    highlight=True,
    reset=True,
    nan_fill_color="White",
)
med.geojson.add_to(fg6)
folium.GeoJsonTooltip(fields=['Comune', 'Studenti', 'Media stud per scuola'],
                      aliases=['Comune', 'Studenti', 'Media studenti per scuola']).add_to(med.geojson)


folium.TileLayer('cartodbpositron', overlay=True,
                 control=False, name="Light Mode").add_to(m)
folium.LayerControl(collapsed=False).add_to(m)
m
#m.save("../viz/students_population.html")


In [45]:
tn

Unnamed: 0,geometry,Comune,Id,Scuole totali,Scuole studenti,Studenti,Classi,Media stud per classe,Media stud per scuola,Popolazione,Pop under 20,Pop_stud/Pop_tot,Stud/Pop_stud
0,"POLYGON ((11.00066 45.82692, 11.00103 45.82634...",Ala,022001,8.0,3.0,682.0,34.0,20.06,227.33,8792,1517,0.17,0.45
1,"POLYGON ((11.20754 46.15620, 11.20994 46.15454...",Albiano,022002,3.0,2.0,145.0,9.0,16.11,72.50,1500,254,0.17,0.57
2,"POLYGON ((11.10303 45.99093, 11.11332 45.98973...",Aldeno,022003,3.0,2.0,262.0,16.0,16.38,131.00,3187,517,0.16,0.51
3,"MULTIPOLYGON (((10.95317 46.13797, 10.95315 46...",Andalo,022005,3.0,2.0,164.0,11.0,14.91,82.00,1268,181,0.14,0.91
4,"POLYGON ((10.90371 45.98843, 10.90498 45.98776...",Arco,022006,15.0,11.0,2133.0,113.0,18.88,193.91,17798,2887,0.16,0.74
...,...,...,...,...,...,...,...,...,...,...,...,...,...
161,"POLYGON ((11.64313 46.47494, 11.64458 46.47282...",San Giovanni Di Fassa,022250,8.0,6.0,764.0,49.0,15.59,127.33,3698,662,0.18,1.15
162,"MULTIPOLYGON (((11.04934 46.17583, 11.05003 46...",Terre D'Adige,022251,4.0,2.0,191.0,11.0,17.36,95.50,3053,518,0.17,0.37
163,"MULTIPOLYGON (((11.21582 46.44893, 11.21697 46...",Borgo D'Anaunia,022252,5.0,3.0,348.0,21.0,16.57,116.00,2487,413,0.17,0.84
164,"POLYGON ((11.06230 46.49919, 11.06404 46.49319...",Novella,022253,5.0,3.0,263.0,16.0,16.44,87.67,3599,521,0.14,0.50


In [47]:
import plotly.express as px

# SHARE OF POPULATION UNDER 20 PER MUNICIPALITY
px.bar(tn.sort_values(['Pop_stud/Pop_tot']), x='Comune',y='Pop_stud/Pop_tot')

In [54]:
tn[tn['Stud/Pop_stud']<0.2]['Comune']

124    Terragnolo
130    Trambileno
Name: Comune, dtype: object

In [49]:
import plotly.express as px

# STUDENTS OVER POPULATION UNDER 20 PER MUNICIPALITY
px.bar(tn.sort_values(['Stud/Pop_stud']), x='Comune',y='Stud/Pop_stud')

## Additional stats per community

The Trentino Province is divided into 16 communities covering its whole territory, and each municipality belongs to exactly one of them. The file comunita_di_valle.json in the data folder lists, for each community, the municipalities it contains.
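Every municipality should belong to exactly one community, so it can be worth checking the JSON before using it. The following is a minimal sketch. It assumes the file maps community names to lists of municipality names. The excerpt is a hypothetical two-community subset built from names that appear later in this notebook, and `find_duplicates` is a helper introduced only for illustration.

```python
import json
from collections import Counter

# Hypothetical excerpt with the same shape as comunita_di_valle.json:
# community name -> list of its municipalities
raw = """
{
  "Comunità Della Valle Dei Laghi": ["Cavedine", "Madruzzo", "Vallelaghi"],
  "Comunità Della Paganella": ["Andalo", "Cavedago"]
}
"""
comunita = json.loads(raw)


def find_duplicates(mapping):
    """Return the municipalities listed under more than one community."""
    counts = Counter(name for names in mapping.values() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


n_municipalities = sum(len(names) for names in comunita.values())
print(len(comunita), "communities,", n_municipalities, "municipalities")
print("Duplicates:", find_duplicates(comunita))
```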

### Data preparation

We start by loading the json file, which has communities as keys and lists of municipalities as values. Exploding the resulting dataset gives 166 rows, one per municipality, each paired with its community. Municipality names are then title-cased to match the format used in tn.
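Here is a toy version of the same steps on a hypothetical two-community dictionary, to show what `explode` does to the index. The upper-case names are an assumption chosen to make the title-casing step visible; the real file's casing is not shown in this notebook. Each list element becomes its own row and keeps the index of the community it came from. This is why the comunita_df output shows repeated indices such as 0, 0, 0, 1, 1.

```python
import pandas as pd

# Hypothetical subset of the communities dictionary
comunita = {
    "Comunità Della Valle Dei Laghi": ["CAVEDINE", "MADRUZZO", "VALLELAGHI"],
    "Comunità Della Paganella": ["ANDALO", "CAVEDAGO"],
}

# One row per community, each holding a list of municipalities
comunita_df = pd.DataFrame({"Comunità": list(comunita.keys()),
                            "Comune": list(comunita.values())})
# One row per municipality; the community's row index is repeated
comunita_df = comunita_df.explode("Comune")
comunita_df["Comune"] = comunita_df["Comune"].str.title()
print(comunita_df)
```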

In [27]:
import json

# Load communities file
with open("../data/Trentino/comunita_di_valle.json", "r", encoding='utf-8') as f:
    comunita = json.load(f)
    
# One row per community, with the list of its municipalities
comunita_df = pd.DataFrame({'Comunità': list(comunita.keys()),
                            'Comune': list(comunita.values())})
# One row per municipality, repeating the index of its community
comunita_df = comunita_df.explode('Comune')
# Title-case names to match the 'Comune' column of tn
comunita_df['Comune'] = comunita_df['Comune'].str.title()

In [159]:
comunita_df

Unnamed: 0,Comunità,Comune
0,Comunità Della Valle Dei Laghi,Cavedine
0,Comunità Della Valle Dei Laghi,Madruzzo
0,Comunità Della Valle Dei Laghi,Vallelaghi
1,Comunità Della Paganella,Andalo
1,Comunità Della Paganella,Cavedago
...,...,...
15,Comunità Territoriale della Val di Fiemme,Predazzo
15,Comunità Territoriale della Val di Fiemme,Tesero
15,Comunità Territoriale della Val di Fiemme,Valfloriana
15,Comunità Territoriale della Val di Fiemme,Ville Di Fiemme


We can merge these data with the school data stored in tn and convert the resulting dataframe to a GeoDataFrame.
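A left merge keeps every municipality of comunita_df even when no row of tn matches. A name spelled differently in the two sources would therefore end up with empty school data rather than raising an error. Below is a minimal stdlib sketch of a check for this. It assumes names are compared after the same `str.title()` normalization used above, and the name lists are hypothetical.

```python
# Hypothetical name lists: municipalities from the communities file and
# from the school table tn (already title-cased there)
community_names = ["cavedine", "terre d'adige", "san giovanni di fassa", "ville di fiemme"]
school_names = {"Cavedine", "Terre D'Adige", "San Giovanni Di Fassa", "Ala"}

# Apply the same normalization used on comunita_df['Comune']
normalized = [name.title() for name in community_names]

# Names that a left merge would leave without a match in tn
unmatched = [name for name in normalized if name not in school_names]
print(normalized)
print("Unmatched:", unmatched)
```

In pandas, `pd.merge(..., how='left', indicator=True)` gives the same information directly. It adds a `_merge` column, and rows with the value `'left_only'` are the unmatched ones.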

In [28]:
comunita_df = pd.merge(comunita_df, tn, how='left', on='Comune')
comunita_df = gpd.GeoDataFrame(comunita_df, geometry = 'geometry')

In [161]:
comunita_df

Unnamed: 0,Comunità,Comune,geometry,Id,Scuole totali,Scuole studenti,Studenti,Classi,Media stud per classe,Media stud per scuola,Popolazione,Pop under 20,Pop_stud/Pop_tot,Stud/Pop_stud
0,Comunità Della Valle Dei Laghi,Cavedine,"POLYGON ((10.95268 46.02056, 10.96121 46.01852...",022053,4.0,2.0,229.0,12.0,19.08,114.50,3003,521,0.17,0.44
1,Comunità Della Valle Dei Laghi,Madruzzo,"POLYGON ((10.97786 46.06112, 10.97732 46.05899...",022243,5.0,2.0,165.0,10.0,16.50,82.50,2933,630,0.21,0.26
2,Comunità Della Valle Dei Laghi,Vallelaghi,"POLYGON ((11.06203 46.15178, 11.06541 46.15078...",022248,7.0,3.0,391.0,21.0,18.62,130.33,5159,1071,0.21,0.37
3,Comunità Della Paganella,Andalo,"MULTIPOLYGON (((10.95317 46.13797, 10.95315 46...",022005,3.0,2.0,164.0,11.0,14.91,82.00,1268,227,0.18,0.72
4,Comunità Della Paganella,Cavedago,"POLYGON ((11.02917 46.20174, 11.03018 46.20190...",022052,1.0,,,,,,545,106,0.19,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
161,Comunità Territoriale della Val di Fiemme,Predazzo,"POLYGON ((11.59264 46.38298, 11.59289 46.38088...",022147,6.0,5.0,580.0,37.0,15.68,116.00,4522,830,0.18,0.70
162,Comunità Territoriale della Val di Fiemme,Tesero,"POLYGON ((11.54557 46.34558, 11.55725 46.35095...",022196,5.0,4.0,535.0,29.0,18.45,133.75,2935,628,0.21,0.85
163,Comunità Territoriale della Val di Fiemme,Valfloriana,"POLYGON ((11.35954 46.26545, 11.37023 46.26245...",022209,2.0,1.0,15.0,1.0,15.00,15.00,458,59,0.13,0.25
164,Comunità Territoriale della Val di Fiemme,Ville Di Fiemme,"POLYGON ((11.49684 46.36064, 11.49898 46.36067...",022254,5.0,2.0,120.0,9.0,13.33,60.00,2631,540,0.21,0.22


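Before visualizing these data, let's compute the aggregated data for each community. Trentino is the parent of every community, so each community row stores its own name in the Comune column and Trentino in the Comunità column, mirroring the municipality rows. The geometries of each community's municipalities are then dissolved into a single shape covering the whole community territory.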
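Plotly's treemap reads the hierarchy from two columns: one with each node's name and one with its parent's name. This is why community names go in Comune and Trentino goes in Comunità. Below is a stdlib sketch of the resulting (name, parent) pairs on a hypothetical subset. The empty parent marks the root, as in the Trentino row added later in the notebook.

```python
# Hypothetical subset: community -> municipalities
comunita = {
    "Comunità Della Valle Dei Laghi": ["Cavedine", "Madruzzo", "Vallelaghi"],
    "Comunità Della Paganella": ["Andalo", "Cavedago"],
}

# (name, parent) pairs, mirroring the 'Comune' and 'Comunità' columns
rows = [("Trentino", "")]                                    # root
rows += [(community, "Trentino") for community in comunita]  # communities
rows += [(town, community)                                   # municipalities
         for community, towns in comunita.items() for town in towns]

names = {name for name, _ in rows}
# Every non-empty parent must itself be a node of the tree
orphans = [name for name, parent in rows if parent and parent not in names]
print(len(rows), "nodes, orphans:", orphans)
```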

In [29]:
# Dissolve municipality geometries into one shape per community
dissolved = comunita_df.dissolve(by='Comunità', as_index=False)
communities = gpd.GeoDataFrame({'Comune': dissolved['Comunità'],
                                'geometry': dissolved['geometry']},
                               geometry='geometry', crs="EPSG:4326")
# Trentino is the parent of every community in the hierarchy
communities['Comunità'] = "Trentino"

In [300]:
communities

Unnamed: 0,Comune,geometry,Comunità
0,Comun General de Fascia,"POLYGON ((11.84369 46.40992, 11.84242 46.40864...",Trentino
1,Comunità Alta Valsugana e Bersntol,"POLYGON ((11.31173 45.96053, 11.30380 45.95982...",Trentino
2,Comunità Alto Garda e Ledro,"POLYGON ((10.91672 45.87649, 10.91833 45.87503...",Trentino
3,Comunità Della Paganella,"POLYGON ((10.94694 46.14313, 10.93979 46.14468...",Trentino
4,Comunità Della Vallagarina,"POLYGON ((11.14580 45.71538, 11.14572 45.71490...",Trentino
5,Comunità Della Valle Dei Laghi,"POLYGON ((11.02698 46.00986, 11.02759 46.00580...",Trentino
6,Comunità Delle Giudicarie,"POLYGON ((10.60098 45.80300, 10.60063 45.80291...",Trentino
7,Comunità Rotaliana-Königsberg,"MULTIPOLYGON (((11.13599 46.17917, 11.13579 46...",Trentino
8,Comunità Territoriale della Val di Fiemme,"POLYGON ((11.37209 46.26375, 11.37238 46.26387...",Trentino
9,Comunità Valsugana e Tesino,"POLYGON ((11.49120 46.00855, 11.48968 46.00782...",Trentino


### Communities map

For reference, we also provide a map of Trentino's communities and the municipalities they contain. Each community gets its own color, as defined in the colors dictionary, and each municipality inherits the color of the community it belongs to.
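The inheritance is a plain two-step lookup: municipality to community, then community to color. Any community missing from the dictionary would make the lookup raise a KeyError. The stdlib sketch below uses a hypothetical subset of the dictionaries and checks for this first.

```python
# Hypothetical subset of the colors dictionary and of the community lists
colors = {
    "Comunità Della Valle Dei Laghi": "#B9E769",
    "Comunità Della Paganella": "#E95D8E",
}
comunita = {
    "Comunità Della Valle Dei Laghi": ["Cavedine", "Madruzzo"],
    "Comunità Della Paganella": ["Andalo"],
}

# A community without a color would raise KeyError below, so check first
missing = set(comunita) - set(colors)
assert not missing, f"No color defined for: {missing}"

# Each municipality inherits the color of its community
town_color = {town: colors[community]
              for community, towns in comunita.items() for town in towns}
print(town_color)
```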

In [30]:
colors = {
    "Territorio della Val d'Adige":"#54478C",
    "Comunità Della Vallagarina": "#2C699A",
    "Comunità Alta Valsugana e Bersntol": "#048BA8",
    "Comunità della Val di Non": "#0DB39E",
    "Comunità Delle Giudicarie": "#16DB93",
    "Comunità Valsugana e Tesino": "#83E377",
    "Comunità Della Valle Dei Laghi": "#B9E769", 
    "Comunità Alto Garda e Ledro": "#EFEA5A",
    "Comun General de Fascia": "#F1C453",
    "Comunità Territoriale della Val di Fiemme":"#F29E4C",
    "Comunità di Primiero": "#EA7434",
    "Comunità della Valle di Cembra": "#E35F3B",
    "Comunità della Valle di Sole": "#DF2A39",
    "Comunità Rotaliana-Königsberg":"#E22869",
    "Comunità Della Paganella": "#E95D8E",
    "Magnifica Comunità degli Altipiani Cimbri":"#9F528D"
}

In [31]:
communities['color'] = [colors[x] for x in communities['Comune']]
comunita_df['color'] = [colors[x] for x in comunita_df['Comunità']]

We can build the map with two feature groups: one for communities and one for municipalities.
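folium calls `style_function` once per GeoJSON feature and passes the feature as a dictionary. The GeoDataFrame columns end up under `properties`, so the style function in the next cell only needs to read the feature's `color` property. Below is a stdlib sketch of that call on a hand-written feature. The point coordinates are made up and not used by the style.

```python
def style(feature):
    """Same logic as the notebook's style function: both the fill and the
    border color come from the feature's 'color' property."""
    return {
        'fillColor': feature['properties']['color'],
        'color': feature['properties']['color'],
        'fillOpacity': 0.2,
        'weight': 1,
    }


# Hand-written GeoJSON feature, roughly shaped like those built from
# comunita_df; coordinates are made up
feature = {
    "type": "Feature",
    "properties": {"Comune": "Andalo", "Comunità": "Comunità Della Paganella",
                   "color": "#E95D8E"},
    "geometry": {"type": "Point", "coordinates": [10.95, 46.14]},
}
print(style(feature))
```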

In [33]:
# Base map
m = folium.Map(location=[46.1, 11.2],
               zoom_start=9,
               tiles=None,
               overlay=False)
# Feature groups
fg1 = folium.FeatureGroup(name = "Comunità")
fg2 = folium.FeatureGroup(name = "Comuni")
    
def style(feature):
    return {
        'fillColor': feature['properties']['color'],
        'color' : feature['properties']['color'],
        'fillOpacity' : 0.2,
        'weight': 1
    }

# Adding communities
folium.GeoJson(communities,
               style_function=style,
               tooltip=folium.features.GeoJsonTooltip(fields = ['Comune'],
                                                      aliases = ['Comunità'],
                                                      labels = True,
                                                      sticky = False)).add_to(fg1)

fg1.add_to(m)

# Adding municipalities
folium.GeoJson(comunita_df,
               style_function=style,
               tooltip=folium.features.GeoJsonTooltip(fields = ['Comune', 'Comunità'], 
                                                      labels = True,
                                                      sticky = False)).add_to(fg2)

fg2.add_to(m)

# Adding light mode layer
folium.TileLayer('cartodbpositron', overlay=True,
                 control=False, name="Light Mode").add_to(m)

# Layer control for selecting communities and/or municipalities
folium.LayerControl(collapsed=False).add_to(m)

m
#m.save("../viz/comunità.html")

Now that Trentino's communities and their municipalities are clear, we can compare their school figures using treemaps. We chose treemaps over alternatives such as bar plots or sunburst charts to vary the visualizations in this project and give the reader a different, more striking view of the data.

To build a treemap we need a hierarchy (Trentino > communities > municipalities) with data for every node, so the values of the inner nodes must be aggregated from their children. For each community we sum the municipality counts and then recompute the averages and ratios from those sums:
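Recomputing averages from the sums, instead of averaging the municipality averages, weights each municipality by its size. The stdlib check below uses the students and classes of Cavedine, Madruzzo and Vallelaghi (Comunità Della Valle Dei Laghi) from the comunita_df output. The ratio of sums matches the 18.255814 students per class reported for this community in the communities output.

```python
from statistics import mean

# (students, classes) per municipality, from the comunita_df output
valle_dei_laghi = {
    "Cavedine": (229, 12),
    "Madruzzo": (165, 10),
    "Vallelaghi": (391, 21),
}

students = sum(s for s, _ in valle_dei_laghi.values())
classes = sum(c for _, c in valle_dei_laghi.values())

ratio_of_sums = students / classes                                 # weighted
mean_of_ratios = mean(s / c for s, c in valle_dei_laghi.values())  # unweighted

print(f"{ratio_of_sums:.2f} vs {mean_of_ratios:.2f}")
```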

In [34]:
# Sum the municipality counts per community. groupby sorts by community name,
# as dissolve did above, so the rows line up with the communities dataframe.
cols = ['Scuole totali', 'Scuole studenti', 'Studenti', 'Classi', 'Popolazione', 'Pop under 20']
communities[cols] = comunita_df.groupby('Comunità')[cols].sum().reset_index(drop=True)

In [35]:
# Averages recomputed from the community totals
communities['Media stud per classe'] = communities['Studenti'] / communities['Classi']
communities['Media stud per scuola'] = communities['Studenti'] / communities['Scuole studenti']
# Share of the population under 20
communities['Pop_stud/Pop_tot'] = communities['Pop under 20'] / communities['Popolazione']
# Students over the population under 20
communities['Stud/Pop_stud'] = communities['Studenti'] / communities['Pop under 20']


In [36]:
communities

Unnamed: 0,Comune,geometry,Comunità,color,Scuole totali,Scuole studenti,Studenti,Classi,Popolazione,Pop under 20,Media stud per classe,Media stud per scuola,Pop_stud/Pop_tot,Stud/Pop_stud
0,Comun General de Fascia,"POLYGON ((11.84369 46.40992, 11.84242 46.40864...",Trentino,#F1C453,16.0,10.0,1205.0,80.0,10393,1613,15.0625,120.5,0.155201,0.747055
1,Comunità Alta Valsugana e Bersntol,"POLYGON ((11.31173 45.96053, 11.30380 45.95982...",Trentino,#048BA8,69.0,45.0,6207.0,351.0,55076,9446,17.683761,137.933333,0.171508,0.657104
2,Comunità Alto Garda e Ledro,"POLYGON ((10.91672 45.87649, 10.91833 45.87503...",Trentino,#EFEA5A,55.0,38.0,6847.0,377.0,51162,8549,18.161804,180.184211,0.167097,0.800912
3,Comunità Della Paganella,"POLYGON ((10.94694 46.14313, 10.93979 46.14468...",Trentino,#E95D8E,11.0,6.0,357.0,25.0,5119,738,14.28,59.5,0.144169,0.48374
4,Comunità Della Vallagarina,"POLYGON ((11.14580 45.71538, 11.14572 45.71490...",Trentino,#2C699A,109.0,67.0,13484.0,741.0,91474,15061,18.197031,201.253731,0.164648,0.895292
5,Comunità Della Valle Dei Laghi,"POLYGON ((11.02698 46.00986, 11.02759 46.00580...",Trentino,#B9E769,16.0,7.0,785.0,43.0,11095,1814,18.255814,112.142857,0.163497,0.432745
6,Comunità Delle Giudicarie,"POLYGON ((10.60098 45.80300, 10.60063 45.80291...",Trentino,#16DB93,65.0,41.0,4364.0,276.0,36859,5981,15.811594,106.439024,0.162267,0.729644
7,Comunità Rotaliana-Königsberg,"MULTIPOLYGON (((11.13599 46.17917, 11.13579 46...",Trentino,#E22869,30.0,21.0,4316.0,243.0,30649,5130,17.761317,205.52381,0.167379,0.841326
8,Comunità Territoriale della Val di Fiemme,"POLYGON ((11.37209 46.26375, 11.37238 46.26387...",Trentino,#F29E4C,38.0,25.0,2349.0,141.0,20065,3228,16.659574,93.96,0.160877,0.727695
9,Comunità Valsugana e Tesino,"POLYGON ((11.49120 46.00855, 11.48968 46.00782...",Trentino,#83E377,48.0,31.0,3047.0,199.0,26861,4064,15.311558,98.290323,0.151297,0.749754


In [37]:
com = pd.concat([comunita_df, communities])

Finally, let's compute the values for Trentino as a whole, which is the root of the tree:

In [38]:
# Community-level rows are those whose parent is Trentino
cols = ['Scuole totali', 'Scuole studenti', 'Studenti', 'Classi', 'Popolazione', 'Pop under 20']
sums = com.loc[com['Comunità'] == 'Trentino', cols].sum()

# Root node of the treemap: empty parent, name "Trentino".
# Values are set by column name, and the averages and ratios are
# recomputed from the sums with the same formulas used for the communities.
total = {
    'Comunità': '',
    'Comune': 'Trentino',
    'geometry': None,
    'Id': '',
    **sums.to_dict(),
    'Media stud per classe': sums['Studenti'] / sums['Classi'],
    'Media stud per scuola': sums['Studenti'] / sums['Scuole studenti'],
    'Pop_stud/Pop_tot': sums['Pop under 20'] / sums['Popolazione'],
    'Stud/Pop_stud': sums['Studenti'] / sums['Pop under 20'],
    'color': "#B1BBBE",
}


In [39]:
com.loc[len(com)] = total

The following cell shows an example treemap based on the number of students. Afterwards, we generate one treemap per feature and save each one to an HTML file, so that the user can choose which feature to use when comparing communities and municipalities.
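With `branchvalues='total'`, plotly treats each node's value as including all of its descendants, so a parent's value should be at least the sum of its children's values. If it is not, the treemap may not be drawn. This holds for the count features plotted here because the community and Trentino rows are sums of their children. It would not hold for ratios such as Stud/Pop_stud. Below is a stdlib sketch of such a check on hypothetical (name, parent, value) rows. The municipality student counts come from the comunita_df output, while the community and root values are sums over this subset only.

```python
from collections import defaultdict

# Hypothetical (name, parent, value) rows, as in the Comune/Comunità columns
rows = [
    ("Trentino", "", 785 + 164),
    ("Comunità Della Valle Dei Laghi", "Trentino", 785),
    ("Comunità Della Paganella", "Trentino", 164),
    ("Cavedine", "Comunità Della Valle Dei Laghi", 229),
    ("Madruzzo", "Comunità Della Valle Dei Laghi", 165),
    ("Vallelaghi", "Comunità Della Valle Dei Laghi", 391),
    ("Andalo", "Comunità Della Paganella", 164),
]


def invalid_parents(rows):
    """Parents whose value is smaller than the sum of their children's values."""
    value = {name: v for name, _, v in rows}
    children_sum = defaultdict(float)
    for name, parent, v in rows:
        if parent:
            children_sum[parent] += v
    return [p for p, s in children_sum.items() if value[p] < s]


print("Invalid parents:", invalid_parents(rows))
```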

In [40]:
import plotly.express as px

fig = px.treemap(
    com,
    names='Comune',
    parents='Comunità',
    values='Studenti',
    color_discrete_sequence=list(colors.values()),
    branchvalues='total',
    hover_data=['Studenti','Scuole totali','Classi','Popolazione','Pop under 20']
)

fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()

In [41]:
import plotly.express as px
for feature in ['Scuole totali', 'Popolazione', 'Classi', 'Studenti', 'Pop under 20']:
    fig = px.treemap(
        com,
        names='Comune',
        parents='Comunità',
        values=feature,
        color_discrete_sequence=list(colors.values()),
        branchvalues='total',
        hover_data=['Studenti','Scuole totali','Classi','Popolazione','Pop under 20']
    )

    fig.update_traces(root_color="lightgrey")
    fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
    fig.write_html("../viz/trees/"+feature+".html")