**Exercise 2**

The file limits_IT_province.geojson contains the boundaries of all the Italian provinces ("province").
The file polveri.csv contains pollution data recorded by different sensors in the provinces of the Veneto region over several years (number of days over the limits).

Task: 
- compute the average values measured in the different provinces in 2022 and 2012
- create a choropleth map with the provinces colored with a categorical colormap
- add a symbol map (scatter_geo) with marker size representing the average number of days over the limits in 2022 and color representing the increase/decrease with respect to 2012 (with a smart color mapping)


In [1]:
import pandas as pd
import plotly.express as px
import numpy as np

pm25 = pd.read_csv("polveri.csv")
pm25.head()

# Next step: group the stations by province and average them

Unnamed: 0,Provincia,Comune,Stazionediriferimento,CodiceStazione,Tipologiastazione,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,Belluno,Belluno,BL_ParcoCittàdiBologna,IT1594A,FU,22.0,19.0,17.0,16.0,16.0,14.0,15.0,13.0,15.0,14.0,13.0,13.0,13.0,14.0
1,Belluno,Feltre,AreaFeltrina,IT1619A,FU/FS,27.0,24.0,25.0,23.0,22.0,18.0,21.0,20.0,21.0,18.0,18.0,19.0,16.0,16.0
2,Padova,Padova,PD_Mandria,IT1453A,FU,32.0,31.0,34.0,32.0,28.0,24.0,31.0,30.0,34.0,27.0,24.0,25.0,21.0,23.0
3,Padova,Padova,PD_aps1,99902,IU,32.0,33.0,37.0,29.0,27.0,23.0,28.0,25.0,29.0,26.0,26.0,28.0,24.0,25.0
4,Padova,Padova,PD_aps2,99903,IU,29.0,26.0,29.0,28.0,26.0,22.0,28.0,24.0,26.0,24.0,24.0,25.0,22.0,24.0


**First question:** compute the average values measured in the different provinces in 2022 and 2012

In [2]:
# Compute the average values for each provincia in 2022
avg_2022 = pm25.groupby('Provincia')['2022'].mean().reset_index()
avg_2022.columns = ['Provincia', 'Avg_2022']

# Compute the average values for each provincia in 2012
avg_2012 = pm25.groupby('Provincia')['2012'].mean().reset_index()
avg_2012.columns = ['Provincia', 'Avg_2012']

# Merge the two dataframes
avg_values = pd.merge(avg_2022, avg_2012, on='Provincia')

avg_values

Unnamed: 0,Provincia,Avg_2022,Avg_2012
0,Belluno,15.0,19.5
1,Padova,21.2,29.666667
2,Rovigo,23.0,22.0
3,Treviso,18.0,26.0
4,Venezia,21.333333,30.0
5,Verona,18.0,24.0
6,Vicenza,19.75,24.5
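
The two group-bys and the merge above can also be collapsed into a single aggregation. The sketch below is one possible shorter form, not the notebook's own code. It assumes the column layout shown by `pm25.head()` and rebuilds only those five printed rows inline, so the Padova averages differ from the full-data result; the names `sample` and `avg_sample` are illustrative.

```python
import pandas as pd

# Illustrative subset: the five rows printed by pm25.head(),
# restricted to the columns needed here
sample = pd.DataFrame({
    "Provincia": ["Belluno", "Belluno", "Padova", "Padova", "Padova"],
    "2012": [16.0, 23.0, 32.0, 29.0, 28.0],
    "2022": [14.0, 16.0, 23.0, 25.0, 24.0],
})

# One groupby over both year columns replaces the two group-bys plus the merge
avg_sample = (
    sample.groupby("Provincia")[["2022", "2012"]]
    .mean()
    .rename(columns={"2022": "Avg_2022", "2012": "Avg_2012"})
    .reset_index()
)

print(avg_sample)
```

On this subset the Belluno row matches the full result above (15.0 and 19.5), because both Belluno stations appear in the printed rows.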


**Second question:** Create a choropleth map in which each “provincia” is represented with a categorical color

In [3]:
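import json

# Load the GeoJSON into a dict: px.choropleth expects a GeoJSON object rather than a local file path
with open("limits_IT_provinces.geojson", encoding="utf-8") as f:
    province = json.load(f)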
fig = px.choropleth(data_frame=avg_values,
                    geojson=province, 
                    locations='Provincia',
                    featureidkey="properties.prov_name",
                    color='Provincia',
                    scope="europe",
                    )

fig.update_geos(showcountries=False, showcoastlines=False, showland=False, fitbounds="locations")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
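
The map relies on every value in `Provincia` matching a feature's `properties.prov_name` exactly; a name with no match typically just does not get drawn. A minimal sketch of a pre-plot check, assuming the GeoJSON is a standard FeatureCollection. The helper `unmatched_locations` and the tiny inline `toy_geojson` are hypothetical stand-ins, not part of the exercise files.

```python
def unmatched_locations(geojson, names, key="prov_name"):
    """Return the names that have no feature with a matching properties[key]."""
    available = {feature["properties"].get(key) for feature in geojson["features"]}
    return sorted(set(names) - available)


# Hypothetical two-feature collection standing in for the real province file
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"prov_name": "Padova"}, "geometry": None},
        {"type": "Feature", "properties": {"prov_name": "Verona"}, "geometry": None},
    ],
}

missing = unmatched_locations(toy_geojson, ["Padova", "Verona", "Venezia"])
print(missing)  # ['Venezia']
```

With the real data, the first argument would be the parsed GeoJSON (for example via `json.load`) and the second `avg_values['Provincia']`; an empty list means every province can be drawn.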

What is wrong with the previous visualization?

A categorical map is useful for visually distinguishing provinces by assigning each a unique color, but it does not allow for the representation of numerical values or the analysis of quantitative differences between them. This approach is suitable for highlighting geographical boundaries or classifications, but it limits the ability to identify patterns or trends in the data. A quantitative map, on the other hand, uses a continuous color scale to represent numerical values, allowing for a more meaningful visualization of data distribution and intensity. By combining both maps, it is possible to meet the categorization requirement while providing a deeper analysis of the information.

To address this, the two maps below show the provincial mean values for 2022 and 2012 on a continuous color scale.

In [4]:
fig = px.choropleth(
    data_frame = avg_values,
    geojson = province, 
    locations = 'Provincia',
    featureidkey = 'properties.prov_name',
    color = 'Avg_2022', 
    color_continuous_scale = px.colors.sequential.Viridis, 
    scope = 'europe',
    hover_data = {'Provincia': True, 'Avg_2022': True, 'Avg_2012': True} 
)

fig.update_geos(
    showcountries = True, #display country borders
    countrycolor = 'black', #set color for country borders
    showcoastlines = False, #hide coastlines for a cleaner look
    showland = False, #hide land background
    fitbounds = 'locations' #adjust the map to fit the province locations
)

fig.update_layout(
    title = {
        'text': 'Average Values by Province (2022)',
        'x': 0.5, 
        'xanchor': 'center', 
        'font': {'size': 20, 'weight': 'bold'} 
    },
    autosize = True, 
    margin = {'r': 0, 't': 50, 'l': 0, 'b': 0}, 
    coloraxis_colorbar = dict(
        title = 'Average (2022)', 
        tickformat = ".2f" 
    ),
    template = 'plotly', 
)

fig.show()

In [7]:
fig = px.choropleth(
    data_frame = avg_values,
    geojson = province, 
    locations = 'Provincia',
    featureidkey = 'properties.prov_name',
    color = 'Avg_2012', 
    color_continuous_scale = px.colors.sequential.Viridis, 
    scope = 'europe',
    hover_data = {'Provincia': True, 'Avg_2022': True, 'Avg_2012': True} 
)

fig.update_geos(
    showcountries = True, 
    countrycolor = 'black', 
    showcoastlines = False, 
    showland = False, 
    fitbounds = 'locations' 
)

fig.update_layout(
    title = {
        'text': 'Average Values by Province (2012)',
        'x': 0.5, 
        'xanchor': 'center', 
        'font': {'size': 20, 'weight': 'bold'} 
    },
    autosize = True, 
    margin = {'r': 0, 't': 50, 'l': 0, 'b': 0}, 
    coloraxis_colorbar = dict(
        title = 'Average (2012)', 
        tickformat = ".2f" 
    ),
    template = 'plotly', 
)

fig.show()

**Third question:** Add a symbol map (scatter_geo) with dot size representing the average number of days over the limits in 2022 and color representing the increase/decrease with respect to 2012 (think of an optimal colormap to highlight improvements/deteriorations)

In [8]:
# Calculate the difference between 2022 and 2012
avg_values['Difference'] = avg_values['Avg_2022'] - avg_values['Avg_2012']
avg_values

Unnamed: 0,Provincia,Avg_2022,Avg_2012,Difference
0,Belluno,15.0,19.5,-4.5
1,Padova,21.2,29.666667,-8.466667
2,Rovigo,23.0,22.0,1.0
3,Treviso,18.0,26.0,-8.0
4,Venezia,21.333333,30.0,-8.666667
5,Verona,18.0,24.0,-6.0
6,Vicenza,19.75,24.5,-4.75
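
A diverging colormap only reads correctly when its neutral color sits at zero change. One way to get there, sketched below under the assumption that the `Difference` values are those in the table above, is to compute a range that is symmetric around zero; `limit`, `color_range`, `worse` and `better` are illustrative names.

```python
import pandas as pd

# Difference values copied from the table above (Avg_2022 - Avg_2012)
diff = pd.Series(
    [-4.5, -8.466667, 1.0, -8.0, -8.666667, -6.0, -4.75],
    index=["Belluno", "Padova", "Rovigo", "Treviso", "Venezia", "Verona", "Vicenza"],
    name="Difference",
)

# Largest absolute change, used as the half-width of a range centered on zero
limit = diff.abs().max()
color_range = (-limit, limit)

# Provinces that got worse (positive difference) versus better (negative difference)
worse = diff[diff > 0].index.tolist()
better = diff[diff < 0].index.tolist()

print(color_range)
print(worse)
```

A range like this could be passed as `range_color=color_range` to `px.scatter_geo`; alternatively, `color_continuous_midpoint=0` asks Plotly Express to center the scale itself.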


In [9]:
fig_choropleth = px.choropleth(data_frame=avg_values,
                               geojson=province,
                               locations='Provincia',
                               featureidkey="properties.prov_name",
                               color='Provincia'
                               )

fig_choropleth.update_geos(showcountries=False, showcoastlines=False, showland=False, fitbounds="locations")
fig_choropleth.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

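fig_scatter_geo = px.scatter_geo(avg_values,
                                 geojson=province,
                                 locations='Provincia',
                                 featureidkey="properties.prov_name",
                                 size='Avg_2022',
                                 color='Difference',
                                 color_continuous_scale=px.colors.diverging.RdBu,
                                 color_continuous_midpoint=0,  # neutral color at "no change"
                                 size_max=12,
                                 projection="mercator"
                                 )

# add_trace copies only the trace, so carry the scatter figure's color axis
# (RdBu scale, midpoint, colorbar title) over to the combined figure as well
fig_choropleth.layout.coloraxis = fig_scatter_geo.layout.coloraxis
fig_choropleth.update_layout(coloraxis_colorbar=dict(thickness=25, len=0.5, yanchor='bottom', y=0.05),coloraxis_colorbar_title='Difference')
fig_choropleth.add_trace(fig_scatter_geo.data[0])
# No-op unless the figures are animated (neither has frames here)
for i, frame in enumerate(fig_choropleth.frames):
    fig_choropleth.frames[i].data += (fig_scatter_geo.frames[i].data[0],)
fig_choropleth.show()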