# Interactive Map of pollution in Occitanie

The goal of this notebook is to provide an interactive map (built with `folium`) comparing ozone pollution levels in Occitanie and in Paris (Paris 13), or showing Occitanie alone. You choose a month, and the map displays a colored circle for each station showing its pollution level. A city may have more than one station. The different parts of this notebook present studies at different time scales.

In [15]:
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.colors as colors

import numpy as np  # use np.max / np.array explicitly to avoid shadowing the built-in max
import pandas as pd
import math
import folium
from folium import IFrame
from download import download
import branca.colormap as cm

# 1 - Monthly study in 2018

To cover an entire year, we use 2018: it is the only year for which ozone levels are available for both Occitanie and Paris.

In [2]:
# We choose monthly data
url = "https://opendata.arcgis.com/datasets/3acfa2aa5c0346a18ba7749c6885e503_0.csv"
path_target = "datasets/Mesure_mensuelle_Region_Occitanie_Polluants_Principaux.csv"
download(url, path_target, replace=False)

paris_df = pd.read_csv('datasets/PA13_2018.csv', sep=';',
                          comment='#',
                          na_values="n/d",
                          converters={'heure': str})

Replace is False and data exists, so doing nothing. Use replace==True to re-download the data.


### Data treatment:

Occitanie data: we keep only the ozone measurements and the variables we need, and we convert the date to a monthly period.

In [3]:
occ_df = pd.read_csv(path_target)
occ_df = occ_df[occ_df['nom_poll'] == 'O3'].copy()  # keep ozone only (copy avoids SettingWithCopyWarning)
occ_df['month'] = pd.to_datetime(occ_df['date_debut']).dt.to_period('M')  # monthly period, e.g. 2018-02
variables = ['X', 'Y', 'nom_com', 'nom_station', 'valeur', 'month']  # columns we keep
occ_df = occ_df[variables]



January 2018 is missing from the Occitanie data:

In [4]:
occ_df.month.unique()

<PeriodArray>
['2018-02', '2018-03', '2018-04', '2018-05', '2018-06', '2018-07', '2018-08',
 '2018-09', '2018-10', '2018-11', '2018-12']
Length: 11, dtype: period[M]
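
As a small illustration of the date conversion and of the check for missing months, here is a sketch on a toy data frame. The frame `toy` and its dates are made up for the example; only the column names `date_debut` and `valeur` come from the notebook.

```python
import pandas as pd

# Hypothetical measurements; the real file has many more rows and columns
toy = pd.DataFrame({
    "date_debut": ["2018-02-01", "2018-02-15", "2018-03-01"],
    "valeur": [50.0, 52.0, 67.0],
})

# Convert each date to a monthly period, as done for occ_df
toy["month"] = pd.to_datetime(toy["date_debut"]).dt.to_period("M")
months = [str(m) for m in toy["month"].unique()]

# Compare with the twelve months of 2018 to list the missing ones
expected = pd.period_range("2018-01", "2018-12", freq="M")
present = set(toy["month"])
missing = [str(p) for p in expected if p not in present]

print(months)
print(missing)
```

The same comparison applied to `occ_df` would be one way to confirm that only January is absent.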

Paris data: we reshape the data so that it has the same variables as the Occitanie data.

In [5]:
paris_df = paris_df[paris_df['date'].notna()]  # drop rows without a date
paris_df = paris_df.dropna(subset=['O3']).copy()  # "n/d" is already read as NaN (na_values), so drop missing O3
paris_df['O3'] = paris_df['O3'].astype('float') # convert type data as float
paris_df['month'] = pd.to_datetime(paris_df['date']).dt.to_period('M') # good format for month
par2018 = paris_df.groupby('month').agg({'O3':'mean'}) # We only care about month in this study

par2018['month'] = pd.PeriodIndex(['2018-01', '2018-02', '2018-03', '2018-04', 
                    '2018-05', '2018-06', '2018-07', '2018-08', 
                    '2018-09', '2018-10','2018-11','2018-12'], dtype='period[M]', freq='M')

par2018['nom_com'] = ['PARIS']*12
par2018['nom_station'] = ['Paris 13ème']*12
par2018['X'] = [2.3488]*12
par2018['Y'] = [48.8534]*12
par2018['valeur'] = par2018['O3']
par2018 = par2018[variables]
par2018 = par2018.iloc[1:12]  # drop January, which is missing from the Occitanie data
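
The reshaping of the Paris data can be sketched on a tiny in-memory file. This is only an illustration: the content of `raw` is invented, and the ISO date format is an assumption (the real `PA13_2018.csv` may use another format).

```python
import io
import pandas as pd

# Hypothetical extract with the same separator and "n/d" marker as the Paris file
raw = io.StringIO(
    "date;heure;O3\n"
    "2018-01-01;1;40\n"
    "2018-01-01;2;n/d\n"
    "2018-01-02;1;50\n"
    "2018-02-01;1;60\n"
)
toy = pd.read_csv(raw, sep=";", na_values="n/d", converters={"heure": str})

# "n/d" has become NaN, so a single dropna removes unusable rows
toy = toy.dropna(subset=["date", "O3"])
toy["month"] = pd.to_datetime(toy["date"]).dt.to_period("M")

# Monthly mean, then the same descriptive columns as the Occitanie data
monthly = toy.groupby("month")["O3"].mean().reset_index(name="valeur")
monthly["nom_com"] = "PARIS"
monthly["nom_station"] = "Paris 13ème"
print(monthly)
```

Using `reset_index` keeps the month as a regular column, which avoids typing the list of months by hand.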

In [6]:
df_2018 = pd.concat([occ_df, par2018], ignore_index=True)  # Paris and Occitanie in one data frame, with a clean index
df_2018

| | X | Y | nom_com | nom_station | valeur | month |
|---|---|---|---|---|---|---|
| 0 | 3.07218 | 44.1062 | MILLAU | Millau Urbain | 50.800000 | 2018-02 |
| 1 | 3.07218 | 44.1062 | MILLAU | Millau Urbain | 67.800000 | 2018-03 |
| 2 | 3.07218 | 44.1062 | MILLAU | Millau Urbain | 77.700000 | 2018-04 |
| 3 | 3.07218 | 44.1062 | MILLAU | Millau Urbain | 66.700000 | 2018-05 |
| 4 | 3.07218 | 44.1062 | MILLAU | Millau Urbain | 66.900000 | 2018-06 |
| ... | ... | ... | ... | ... | ... | ... |
| 309 | 2.34880 | 48.8534 | PARIS | Paris 13ème | 50.294737 | 2018-08 |
| 310 | 2.34880 | 48.8534 | PARIS | Paris 13ème | 39.766667 | 2018-09 |
| 311 | 2.34880 | 48.8534 | PARIS | Paris 13ème | 36.401617 | 2018-10 |
| 312 | 2.34880 | 48.8534 | PARIS | Paris 13ème | 26.974895 | 2018-11 |


In [7]:
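# Standardize the values (z-score) so that the color scale is centered on the mean;
# ddof=0 gives the population standard deviation, as np.std does by default
df_2018['standard'] = (df_2018['valeur'] - df_2018['valeur'].mean()) / df_2018['valeur'].std(ddof=0)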
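
To make the standardization concrete, here is a minimal sketch on three made-up values. The frame `toy` is hypothetical; the formula is the usual z-score, with the population standard deviation (`ddof=0`), which is what `np.std` computes by default, whereas the pandas `.std()` default is `ddof=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical ozone values
toy = pd.DataFrame({"valeur": [50.0, 60.0, 70.0]})

mean = toy["valeur"].mean()
std = toy["valeur"].std(ddof=0)  # population standard deviation
toy["standard"] = (toy["valeur"] - mean) / std

# After standardization the column has mean 0 and standard deviation 1
print(toy)
print(toy["standard"].mean(), toy["standard"].std(ddof=0))
```

A standardized value of 0 therefore corresponds to a station-month at the overall average, and negative values to cleaner-than-average measurements.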

The available cities in this example are:

In [8]:
df_2018.nom_com.unique()

array(['MILLAU', 'NIMES', 'BESSIERES', 'PEYRUSSE-VIEILLE', 'SAZE',
       'TOULOUSE', 'MONTGISCARD', 'BIARS-SUR-CERE', 'SAINT-ESTEVE',
       'BELESTA-EN-LAURAGAIS', 'CORNEILHAN', 'AGDE', 'LATTES', 'TARBES',
       'LA CALMETTE', 'SAINT-GELY-DU-FESC', 'MIRAMONT-DE-COMMINGES',
       'MONTPELLIER', 'LOURDES', 'PERPIGNAN', 'FRAISSE-SUR-AGOUT',
       'RODEZ', 'CARCASSONNE', 'MENDE', 'ALBI', 'CASTRES', 'PAMIERS',
       'VALLABREGUES', 'PARIS'], dtype=object)

### Interactive map for different months in 2018:

In [9]:
import branca.colormap as cm

# Green-yellow-red color scale applied to the standardized values
linear = cm.LinearColormap(
    ['green', 'yellow', 'red'],
    vmin=-3.5, vmax=1.5
)
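
To give an idea of what a three-color linear scale does with a standardized value, here is a pure-Python sketch. It is not branca's implementation: the function `lerp_color` is hypothetical, the RGB triples are assumed to be the CSS colors green, yellow and red, and clamping values outside the range is a choice made for this sketch.

```python
def lerp_color(value, vmin=-3.5, vmax=1.5):
    """Map a value to a hex color by linear interpolation between three stops."""
    stops = [(0, 128, 0), (255, 255, 0), (255, 0, 0)]  # green, yellow, red

    # Position of the value in [0, 1], clamped at both ends
    t = (value - vmin) / (vmax - vmin)
    if t < 0.0:
        t = 0.0
    elif t > 1.0:
        t = 1.0

    # Find the segment (green-yellow or yellow-red) and the position inside it
    pos = t * (len(stops) - 1)
    i = int(pos)
    if i > len(stops) - 2:
        i = len(stops) - 2
    frac = pos - i

    rgb = [round(a + (b - a) * frac) for a, b in zip(stops[i], stops[i + 1])]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(lerp_color(-3.5), lerp_color(-1.0), lerp_color(1.5))
```

With `vmin=-3.5` and `vmax=1.5`, the midpoint of the scale (yellow) sits at a standardized value of -1.0, so stations at the overall average already lean toward orange.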

In [10]:
from ipywidgets import interact  # widget manipulation
from IPython.display import HTML

def interactive_map(mois='2018-02'):
    """Return a folium map of ozone levels for the selected month."""
    map_2018 = df_2018[df_2018['month'] == mois]  # stations measured that month

    map_int = folium.Map(location=[46, 2.15],
                         zoom_start=6,
                         tiles='Stamen Terrain')

    # One circle per station: the radius (in meters) grows with the raw value,
    # the fill color follows the standardized value
    for _, row in map_2018.iterrows():
        folium.Circle(
            location=[row['Y'], row['X']],
            popup=row['nom_station'],
            radius=row['valeur'] * 500,
            color='black',
            fill=True,
            fill_color=linear(row['standard']),
            fill_opacity=0.5,
            opacity=0.4,
        ).add_to(map_int)

    return map_int

In [11]:
interact(interactive_map, mois=df_2018.month.unique())

interactive(children=(Dropdown(description='mois', options=(Period('2018-02', 'M'), Period('2018-03', 'M'), Pe…

<function __main__.interactive_map(mois='2018-02')>

# 2 - Annual study


Let's now look at annual data. The goal is to check whether some cities were more polluted than others in 2017 and 2018. There is not enough data to study 2019.

### Data Import and treatment:

In [12]:
annual = 'datasets/Mesure_annuelle_Region_Occitanie_Polluants_Principaux.csv'  # forward slash: portable, no invalid escape
occ_1718 = pd.read_csv(annual)
occ_1718 = occ_1718[occ_1718['nom_poll'] == 'O3']
occ_1718['year'] = pd.to_datetime(occ_1718['date_debut']).dt.to_period('Y')
variables2 = ['X', 'Y', 'nom_com', 'nom_station', 'valeur', 'year']
occ_1718 = occ_1718[variables2]

In [13]:
paris_ann = pd.read_csv('datasets/PA13_1718.csv', sep=';',
                          comment='#',
                          na_values="n/d",
                          converters={'heure': str})
paris_ann = paris_ann[paris_ann['date'].notna()]  # drop rows without a date
paris_ann = paris_ann.dropna(subset=['O3']).copy()  # "n/d" is already read as NaN, so drop missing O3
paris_ann['O3'] = paris_ann['O3'].astype('float')
paris_ann['year'] = pd.to_datetime(paris_ann['date']).dt.to_period('Y')
par1718 = paris_ann.groupby('year').agg({'O3':'mean'})
par1718['year'] = pd.PeriodIndex(['2017', '2018'], dtype='period[Y]', freq='Y')
par1718['nom_com'] = ['PARIS']*2
par1718['nom_station'] = ['Paris 13ème']*2
par1718['X'] = [2.3488]*2
par1718['Y'] = [48.8534]*2
par1718['valeur'] = par1718['O3']
par1718 = par1718[variables2]

In [14]:
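df_1718 = pd.concat([occ_1718, par1718], ignore_index=True)  # Occitanie and Paris in one data frame
# Standardize the values (z-score) for the color scale; ddof=0 matches np.std
df_1718['standard'] = (df_1718['valeur'] - df_1718['valeur'].mean()) / df_1718['valeur'].std(ddof=0)
df_1718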

| | X | Y | nom_com | nom_station | valeur | year | standard |
|---|---|---|---|---|---|---|---|
| 0 | 1.41861 | 43.5756 | TOULOUSE | Toulouse-Jacquier Urbain | 56.3 | 2018 | -0.615187 |
| 1 | 1.41861 | 43.5756 | TOULOUSE | Toulouse-Jacquier Urbain | 54.3 | 2017 | -0.910936 |
| 2 | 0.179722 | 43.6303 | PEYRUSSE-VIEILLE | Peyrusse Vieille Rural | 68.9 | 2018 | 1.24803 |
| 3 | 0.179722 | 43.6303 | PEYRUSSE-VIEILLE | Peyrusse Vieille Rural | 67.3 | 2017 | 1.011431 |
| 4 | 1.43861 | 43.6236 | TOULOUSE | Toulouse-Mazades Urbain | 58.1 | 2018 | -0.349013 |
| 5 | 1.43861 | 43.6236 | TOULOUSE | Toulouse-Mazades Urbain | 54.6 | 2017 | -0.866574 |
| 6 | 2.14611 | 43.9281 | ALBI | Albi Urbain | 54.5 | 2018 | -0.881361 |
| 7 | 2.14611 | 43.9281 | ALBI | Albi Urbain | 51.1 | 2017 | -1.384134 |
| 8 | 1.44389 | 43.5872 | TOULOUSE | Toulouse-Berthelot Urbain | 57.0 | 2018 | -0.511675 |
| 9 | 1.44389 | 43.5872 | TOULOUSE | Toulouse-Berthelot Urbain | 54.9 | 2017 | -0.822211 |


### Interactive annual map for Occitanie and Paris:

In [15]:
def interactive_map2(an='2018'):
    """Return a folium map of annual ozone levels for the selected year."""
    map_1718 = df_1718[df_1718['year'] == an]  # stations measured that year

    map_int2 = folium.Map(location=[46, 2.15],
                          zoom_start=6,
                          tiles='Stamen Terrain')

    # One circle per station, sized by raw value and colored by standardized value
    for _, row in map_1718.iterrows():
        folium.Circle(
            location=[row['Y'], row['X']],
            popup=row['nom_station'],
            radius=row['valeur'] * 500,
            color='black',
            fill=True,
            fill_color=linear(row['standard']),
            fill_opacity=0.5,
            opacity=0.4,
        ).add_to(map_int2)

    return map_int2

In [16]:
interact(interactive_map2, an=df_1718.year.unique())

interactive(children=(Dropdown(description='an', options=(Period('2018', 'A-DEC'), Period('2017', 'A-DEC')), v…

<function __main__.interactive_map2(an='2018')>

# 3 - Occitanie and all its pollutants

Let's now focus on Occitanie alone in 2018, looking at all the main pollutants rather than only ozone.

### Data import and treatment:

In [17]:
pollu = 'datasets/Mesure_annuelle_Region_Occitanie_Polluants_Principaux.csv'  # forward slash: portable, no invalid escape
occ_d = pd.read_csv(pollu)
occ_d['year'] = pd.to_datetime(occ_d['date_debut']).dt.to_period('Y')
variables3 = ['X', 'Y', 'nom_com', 'nom_station', 'nom_poll', 'valeur', 'year']
occ_d = occ_d[variables3]
occ_d = occ_d[occ_d['year'] == '2018']


There is not enough data for H2S, SO2 and PM2.5, so we drop these pollutants:

In [18]:
occ_d = occ_d[~occ_d['nom_poll'].isin(['H2S', 'SO2', 'PM2.5'])]  # drop pollutants with too few measurements
occ_d.nom_poll.unique()

array(['NO', 'NO2', 'NOX as NO2', 'O3', 'PM10'], dtype=object)

For the remaining pollutants, the minimum and maximum measured values are:

In [19]:
occ_mm = occ_d.groupby('nom_poll').agg({'valeur': ['min', 'max']})  # min and max value per pollutant
occ_mm

| nom_poll | valeur (min) | valeur (max) |
|---|---|---|
| NO | 0.05 | 83.0 |
| NO2 | 1.12 | 67.7 |
| NOX as NO2 | 1.2 | 193.2 |
| O3 | 54.1 | 71.0 |
| PM10 | 12.4 | 27.5 |
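
Because the aggregation produces two-level column names (`valeur` / `min` and `valeur` / `max`), a single bound can be read with a tuple as the column label. The sketch below uses a hypothetical frame `toy` that reuses the NO and O3 bounds from the table above as its only rows.

```python
import pandas as pd

# Hypothetical data: two rows per pollutant, taken from the bounds shown above
toy = pd.DataFrame({
    "nom_poll": ["NO", "NO", "O3", "O3"],
    "valeur": [0.05, 83.0, 54.1, 71.0],
})
toy_mm = toy.groupby("nom_poll").agg({"valeur": ["min", "max"]})

# Row label = pollutant, column label = (first level, second level)
poll = "O3"
vmin = toy_mm.loc[poll, ("valeur", "min")]
vmax = toy_mm.loc[poll, ("valeur", "max")]
print(poll, vmin, vmax)
```

Label-based access of this kind makes it straightforward to build a color scale whose bounds depend on the selected pollutant.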


### Interactive map for different pollutants in 2018 for Occitanie:

In [20]:
def interactive_map3(poll):
    """Return a folium map of Occitanie for the selected pollutant (2018)."""
    map_d = occ_d[occ_d['nom_poll'] == poll]

    map_int3 = folium.Map(location=[43, 2],
                          zoom_start=7.4,
                          tiles='Stamen Terrain')

    # Color scale spanning the min and max of the selected pollutant
    linear3 = cm.LinearColormap(
        ['green', 'yellow', 'red'],
        vmin=occ_mm.loc[poll, ('valeur', 'min')],
        vmax=occ_mm.loc[poll, ('valeur', 'max')]
    )

    # One fixed-size circle per station, colored by the measured value
    for _, row in map_d.iterrows():
        folium.Circle(
            location=[row['Y'], row['X']],
            popup=row['nom_station'],
            radius=10000,
            color='black',
            fill=True,
            fill_color=linear3(row['valeur']),
            fill_opacity=0.5,
            opacity=0.4,
        ).add_to(map_int3)

    return map_int3

In [21]:
interact(interactive_map3, poll=occ_d.nom_poll.unique())

interactive(children=(Dropdown(description='poll', options=('NO', 'NO2', 'NOX as NO2', 'O3', 'PM10'), value='N…

<function __main__.interactive_map3(poll)>