# Interactive Map of pollution in Occitanie

The goal of this notebook is to provide an interactive map (built with `folium`) that shows the level of ozone pollution in Occitanie, either on its own or compared with Paris (station Paris 13). You choose a month and the map displays one colored circle per station showing its pollution level. A city can have more than one station. The different parts of this notebook study the data at different time scales.

In [1]:
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.colors as colors

from numpy import array
from numpy import max
import numpy as np
import pandas as pd
import math
import folium
from download import download

## 1 - Monthly study in 2018

2018 is the only full year for which ozone data are available for both Occitanie and Paris, so the monthly study is limited to that year.

In [2]:
# We choose monthly data
url = "https://opendata.arcgis.com/datasets/3acfa2aa5c0346a18ba7749c6885e503_0.csv"
path_target = "datasets/Mesure_mensuelle_Region_Occitanie_Polluants_Principaux.csv"
download(url, path_target, replace=False)

paris_df = pd.read_csv('PA13_2018.csv', sep=';',
                          comment='#',
                          na_values="n/d",
                          converters={'heure': str})

Replace is False and data exists, so doing nothing. Use replace==True to re-download the data.


### Data treatment:

Treatment of the Occitanie data: we keep only the ozone measurements and the variables we need, and convert the date to a monthly period.

In [3]:
occ_df = pd.read_csv(path_target)
occ_df = occ_df[occ_df['nom_poll'] == 'O3'].copy() # only ozone; copy so the new column is not set on a slice
occ_df['month'] = pd.to_datetime(occ_df['date_debut']).dt.to_period('M') # good format for month
variables = ['X', 'Y', 'nom_com', 'nom_station', 'valeur', 'month'] # variables we care about
occ_df = occ_df[variables]
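
To see what the filtering and date conversion do, here is a minimal sketch on a made-up three-row table. Only the column names come from the dataset; the values and the simple date format are assumptions for illustration.

```python
import pandas as pd

# Toy data: invented values, real column names
toy = pd.DataFrame({
    'nom_poll': ['O3', 'NO2', 'O3'],
    'date_debut': ['2018-02-01', '2018-02-01', '2018-03-01'],
    'valeur': [50.8, 21.0, 67.8],
})

toy_o3 = toy[toy['nom_poll'] == 'O3'].copy()  # keep ozone rows only
# Convert the start date to a monthly period such as 2018-02
toy_o3['month'] = pd.to_datetime(toy_o3['date_debut']).dt.to_period('M')
print(toy_o3[['valeur', 'month']])
```

The `.copy()` makes the filtered table independent of the original, so adding the `month` column does not trigger a pandas warning about setting values on a slice.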



Treatment of the Paris data: we rebuild the table so that it has the same variables as the Occitanie data.

In [4]:
paris_df = paris_df.dropna(subset=['date', 'O3']).copy() # drop rows without a date or an O3 value ("n/d" was already read as NaN)
paris_df['O3'] = paris_df['O3'].astype('float') # make sure O3 is numeric
paris_df['month'] = pd.to_datetime(paris_df['date']).dt.to_period('M') # monthly period
par2018 = paris_df.groupby('month').agg({'O3':'mean'}) # monthly mean, the only time scale used in this study

par2018['month'] = par2018.index # the groupby index already holds the monthly periods

# Constant metadata for the single Paris station (scalars are broadcast to every row)
par2018['nom_com'] = 'PARIS'
par2018['nom_station'] = 'Paris 13ème'
par2018['X'] = 2.3488 # longitude
par2018['Y'] = 48.8534 # latitude
par2018['valeur'] = par2018['O3']
par2018 = par2018[variables]
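
The aggregation step can be sketched on a small invented sample: individual readings are grouped by monthly period and averaged, and `reset_index` turns the month index back into a regular column. The dates and values below are assumptions, not taken from the Paris file.

```python
import pandas as pd

toy = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-02', '2018-02-01', '2018-02-02'],
    'O3': [30.0, 40.0, 50.0, None],  # None stands for a missing reading
})
toy['month'] = pd.to_datetime(toy['date']).dt.to_period('M')

# mean() skips missing values by default
monthly = toy.groupby('month').agg({'O3': 'mean'}).reset_index()
print(monthly)
```

January averages to 35.0 and February to 50.0, because the missing February reading is ignored rather than counted as zero.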

In [5]:
df_2018 = pd.concat([occ_df, par2018]) # data frame with Paris and Occitanie data

In [6]:
# Standardized data for a good color scale
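# z-score: subtract the mean and divide by the population standard deviation (ddof=0, as np.std does by default)
df_2018['standard'] = (df_2018['valeur'] - df_2018['valeur'].mean()) / df_2018['valeur'].std(ddof=0)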
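
As a small worked example of this standardization, the sketch below applies the same formula, (value - mean) / standard deviation, to four invented ozone values. The numbers are illustrative only.

```python
import pandas as pd

values = pd.Series([40.0, 50.0, 60.0, 70.0])  # invented ozone values

# ddof=0 gives the population standard deviation
z = (values - values.mean()) / values.std(ddof=0)
print(z.round(3).tolist())  # [-1.342, -0.447, 0.447, 1.342]
```

After standardization the values have mean 0 and standard deviation 1, so a single color scale can be applied regardless of the original units.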

The cities available in this example are:

In [7]:
df_2018.nom_com.unique()

array(['MILLAU', 'NIMES', 'BESSIERES', 'PEYRUSSE-VIEILLE', 'SAZE',
       'TOULOUSE', 'MONTGISCARD', 'BIARS-SUR-CERE', 'SAINT-ESTEVE',
       'BELESTA-EN-LAURAGAIS', 'CORNEILHAN', 'AGDE', 'LATTES', 'TARBES',
       'LA CALMETTE', 'SAINT-GELY-DU-FESC', 'MIRAMONT-DE-COMMINGES',
       'MONTPELLIER', 'LOURDES', 'PERPIGNAN', 'FRAISSE-SUR-AGOUT',
       'RODEZ', 'CARCASSONNE', 'MENDE', 'ALBI', 'CASTRES', 'PAMIERS',
       'VALLABREGUES', 'PARIS'], dtype=object)

In [8]:
df_2018 = df_2018.reset_index(drop = True) # clean the index and keep the result
df_2018

Unnamed: 0,X,Y,nom_com,nom_station,valeur,month,standard
0,3.07218,44.1062,MILLAU,Millau Urbain,50.800000,2018-02,-0.605128
1,3.07218,44.1062,MILLAU,Millau Urbain,67.800000,2018-03,0.374121
2,3.07218,44.1062,MILLAU,Millau Urbain,77.700000,2018-04,0.944389
3,3.07218,44.1062,MILLAU,Millau Urbain,66.700000,2018-05,0.310758
4,3.07218,44.1062,MILLAU,Millau Urbain,66.900000,2018-06,0.322278
...,...,...,...,...,...,...,...
310,2.34880,48.8534,PARIS,Paris 13ème,50.294737,2018-08,-0.634232
311,2.34880,48.8534,PARIS,Paris 13ème,39.766667,2018-09,-1.240679
312,2.34880,48.8534,PARIS,Paris 13ème,36.401617,2018-10,-1.434516
313,2.34880,48.8534,PARIS,Paris 13ème,26.974895,2018-11,-1.977522


### Interactive map for different months in 2018:

In [9]:
import branca.colormap as cm

linear = cm.LinearColormap(
    ['green', 'yellow', 'red'],
    vmin=-3, vmax=1
)
# color scale for the standardized ozone values: green (low) through yellow to red (high)
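
To make the idea of a linear color scale concrete, the sketch below interpolates between three color stops in plain Python. It is not branca's implementation: the function name `linear_color`, the RGB stops, and the clamping of out-of-range values are assumptions chosen for illustration.

```python
def linear_color(value, vmin=-3.0, vmax=1.0):
    """Map a value to a hex color on a green -> yellow -> red scale (illustrative)."""
    stops = [(0, 128, 0), (255, 255, 0), (255, 0, 0)]  # green, yellow, red

    # Position of the value in [0, 1]; out-of-range values take the end colors
    t = (value - vmin) / (vmax - vmin)
    if t < 0.0:
        t = 0.0
    elif t > 1.0:
        t = 1.0

    # Find the segment between two neighboring stops and the fraction inside it
    scaled = t * (len(stops) - 1)
    i = int(scaled)
    if i > len(stops) - 2:
        i = len(stops) - 2
    frac = scaled - i

    rgb = [round(a + (b - a) * frac) for a, b in zip(stops[i], stops[i + 1])]
    return '#{:02x}{:02x}{:02x}'.format(*rgb)

print(linear_color(-3), linear_color(-1), linear_color(1))  # #008000 #ffff00 #ff0000
```

With `vmin=-3` and `vmax=1`, the midpoint of the scale is a standardized value of -1, which is where the color is pure yellow in this sketch.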

In [10]:
from ipywidgets import interact  # widget manipulation
from IPython.display import HTML

def interactive_map(mois = '2018-02'):
    
    map_2018 = df_2018[df_2018['month'] == mois]
    
    map_int = folium.Map(location = [46, 2.15], 
                         zoom_start = 6, 
                         tiles = 'Stamen Terrain')
    
    # One circle per station: the radius scales with the ozone value, the fill color with the standardized value
    for _, row in map_2018.iterrows():
        folium.Circle(
            location = [row['Y'], row['X']],  # [latitude, longitude]
            popup = row['nom_station'],
            radius = row['valeur']*500,  # radius in meters
            color = 'black',
            fill = True,
            fill_color = linear(row['standard']),
            fill_opacity = 0.5,
            opacity = 0.4,
        ).add_to(map_int)
    
    return map_int

In [11]:
interact(interactive_map, mois=df_2018.month.unique())

interactive(children=(Dropdown(description='mois', options=(Period('2018-02', 'M'), Period('2018-03', 'M'), Pe…

<function __main__.interactive_map(mois='2018-02')>

## 2 - Annual study


Let's now look at the annual data. The goal is to check whether some cities were more polluted than others in 2017 and 2018. There is not enough data to study 2019.

### Data Import and treatment

In [23]:
annual = 'datasets/Mesure_annuelle_Region_Occitanie_Polluants_Principaux.csv' # forward slashes work on every OS
occ_1718 = pd.read_csv(annual)
occ_1718 = occ_1718[occ_1718['nom_poll'] == 'O3'].copy() # only ozone; copy so new columns can be added safely
occ_1718['year'] = pd.to_datetime(occ_1718['date_debut']).dt.to_period('Y')
variables2 = ['X', 'Y', 'nom_com', 'nom_station', 'valeur', 'year']
occ_1718 = occ_1718[variables2]

Unnamed: 0,X,Y,nom_com,nom_station,valeur,year
6,1.41861,43.5756,TOULOUSE,Toulouse-Jacquier Urbain,56.3,2018
7,1.41861,43.5756,TOULOUSE,Toulouse-Jacquier Urbain,54.3,2017
24,0.179722,43.6303,PEYRUSSE-VIEILLE,Peyrusse Vieille Rural,68.9,2018
25,0.179722,43.6303,PEYRUSSE-VIEILLE,Peyrusse Vieille Rural,67.3,2017
32,1.43861,43.6236,TOULOUSE,Toulouse-Mazades Urbain,58.1,2018
33,1.43861,43.6236,TOULOUSE,Toulouse-Mazades Urbain,54.6,2017
44,2.14611,43.9281,ALBI,Albi Urbain,54.5,2018
45,2.14611,43.9281,ALBI,Albi Urbain,51.1,2017
54,1.44389,43.5872,TOULOUSE,Toulouse-Berthelot Urbain,57.0,2018
55,1.44389,43.5872,TOULOUSE,Toulouse-Berthelot Urbain,54.9,2017
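
One way to compare stations across the two years is to pivot the table so that each station is a row and each year a column. The sketch below does this for the ten rows shown above only; the variable names and the use of plain integers for the year are assumptions for illustration.

```python
import pandas as pd

# Rows copied from the output above (station, annual ozone value, year)
rows = [
    ('Toulouse-Jacquier Urbain', 56.3, 2018), ('Toulouse-Jacquier Urbain', 54.3, 2017),
    ('Peyrusse Vieille Rural', 68.9, 2018), ('Peyrusse Vieille Rural', 67.3, 2017),
    ('Toulouse-Mazades Urbain', 58.1, 2018), ('Toulouse-Mazades Urbain', 54.6, 2017),
    ('Albi Urbain', 54.5, 2018), ('Albi Urbain', 51.1, 2017),
    ('Toulouse-Berthelot Urbain', 57.0, 2018), ('Toulouse-Berthelot Urbain', 54.9, 2017),
]
sample = pd.DataFrame(rows, columns=['nom_station', 'valeur', 'year'])

# One row per station, one column per year
wide = sample.pivot(index='nom_station', columns='year', values='valeur')
wide['change'] = wide[2018] - wide[2017]  # positive means more ozone in 2018
print(wide.sort_values(2018, ascending=False))
```

In this excerpt of five stations, the rural station Peyrusse Vieille has the highest annual value in both years, and every station shows a higher value in 2018 than in 2017. The full dataset may tell a different story.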


In [13]:
paris_ann = pd.read_csv('PA13.csv', sep=';',
                          comment='#',
                          na_values="n/d",
                          converters={'heure': str})


