# AFPA data science internship topic
Atmospheric emissions from industry and road traffic, together with weather conditions,
affect air quality. Using open data sources (CSV files, APIs, etc.), study the impact of
these parameters on air quality in order to build a model that predicts an air quality
index.

https://www.atmo-hdf.fr/tout-savoir-sur-l-air/mesures-de-la-qualite-de-l-air.html
https://www.cerema.fr/fr/actualites/emissions-routieres-polluants-atmospheriquescourbes
https://atmo-france.org/les-effets-de-la-meteo-sur-lair/

Some available datasets:

- Meteorological data
https://public.opendatasoft.com/explore/?sort=modified
Historical weather observations for France (SYNOP): data covering several years
for several geographical areas, including Lille-Lesquin. Downloadable as CSV
(36,000 rows).

- Pollutant data
https://data-atmo-hdf.opendata.arcgis.com/
Atmo Hauts-de-France Open Data portal: measured concentrations, atmospheric emissions,
air quality index, pollution episodes, etc.
- Air quality index: ind_hdf_agglo (data table and visualisation of air quality in
Hauts-de-France over a rolling year). Downloadable as CSV (4,799 rows). From 1 January 2020
to 1 April 2021. Regional data, including Lille.
- Measured concentrations: mes_hdf_journalier_poll_princ, a data table of daily mean
concentrations of the main pollutants. Downloadable as CSV (44,959 rows). Data for 2021
and 2022.
https://ec.europa.eu/eurostat/fr/web/environment/air-emissions
Eurostat publishes data on two types of atmospheric emissions:
- Greenhouse gases: 7 gases, including CO2, that cause climate change.
- Air pollutants: 7 substances that are harmful to human health (for example, they can
cause respiratory conditions) and damaging to the environment and to biodiversity.
https://www.atmo-hdf.fr/tout-savoir-sur-l-air/inventaire-des-emissions-de-polluants.html
https://www.georisques.gouv.fr/risques/registre-des-emissions-polluantes

- Air quality data
https://atmo-france.org/les-donnees/
Open data from the AASQA (the approved air quality monitoring associations): the AASQA
have always worked towards transparent information on air quality. Using their monitoring
stations, they produce freely accessible data (open data) so that anyone can adopt and
reuse it.
The data is available through Atmo Data, a national aggregation of the regional data
produced by the associations of the network, or through the open data portal of each
AASQA.
Atmo Data, a single access point to the open data produced by the AASQA
Atmo Data is aimed at an informed audience: the press, associations, and private and
public companies through their developers, GIS specialists, etc. It offers four services
for accessing the data: a map visualisation, a widget, an API, and a Web Feature Service
(WFS).
API access: https://admindata.atmo-france.org/api/doc
PDF guide to using the API:
https://atmo-france.org/wp-content/uploads/2022/03/FAQ_API_Atmo_Data_20220330.pdf

- Road traffic data
http://trafic-routier.data.cerema.fr/metropole-europeenne-de-lille-r75.html
Road traffic data for the Métropole Européenne de Lille (MEL), including the Siredo
count history, with data for 2021 and 2022.
https://opendata.lillemetropole.fr/explore/dataset/comptage_siredo_historique/information/?disjunctive.ville
Bison Futé: https://www.bison-fute.gouv.fr/donnees-sur-le-rrn.html
Annual average daily traffic on the national road network:
https://www.data.gouv.fr/fr/datasets/trafic-moyen-journalier-annuel-sur-le-reseau-routiernational/

# Dataset retrieval and assembly - Building a dataset

# 1) Importing libraries

In [1]:
import pandas as pd
import numpy as np

# Regex
import re

# 2) Dataset on pollutant emissions

https://data-atmo-hdf.opendata.arcgis.com/datasets/atmo-hdf::mes-hdf-journalier-poll-princ/explore?location=50.150665%2C2.776740%2C9.46&showTable=true

In [2]:
emission = pd.read_csv('mes_hdf_journalier_poll_princ.csv')
pd.set_option('display.max_columns', None)
emission

Unnamed: 0,X,Y,objectid,nom_dept,nom_com,insee_com,nom_station,code_station,typologie,influence,nom_poll,id_poll_ue,valeur,unite,metrique,date_debut,date_fin,statut_valid,x_wgs84,y_wgs84,x_reg,y_reg,ObjectId2
0,390513.226482,6.508595e+06,755510,NORD,Valenciennes,59606,Valenciennes Acacias,FR06001,urban,2,Particules PM10,5,10.6,ug.m-3,journaliere,2021/05/06 00:00:00+00,2021/05/06 23:59:59+00,t,3.50804,50.3585,736201,7029100.0,1
1,390513.226482,6.508595e+06,755629,NORD,Valenciennes,59606,Valenciennes Acacias,FR06001,urban,2,Particules PM10,5,11.8,ug.m-3,journaliere,2021/05/07 00:00:00+00,2021/05/07 23:59:59+00,t,3.50804,50.3585,736201,7029100.0,2
2,390513.226482,6.508595e+06,755748,NORD,Valenciennes,59606,Valenciennes Acacias,FR06001,urban,2,Particules PM10,5,13.0,ug.m-3,journaliere,2021/05/08 00:00:00+00,2021/05/08 23:59:59+00,t,3.50804,50.3585,736201,7029100.0,3
3,390513.226482,6.508595e+06,755867,NORD,Valenciennes,59606,Valenciennes Acacias,FR06001,urban,2,Particules PM10,5,14.4,ug.m-3,journaliere,2021/05/09 00:00:00+00,2021/05/09 23:59:59+00,t,3.50804,50.3585,736201,7029100.0,4
4,390513.226482,6.508595e+06,755986,NORD,Valenciennes,59606,Valenciennes Acacias,FR06001,urban,2,Particules PM10,5,11.5,ug.m-3,journaliere,2021/05/10 00:00:00+00,2021/05/10 23:59:59+00,t,3.50804,50.3585,736201,7029100.0,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
44957,315714.321034,6.520835e+06,803830,PAS-DE-CALAIS,Lens,62498,Lens Varsovie,FR28115,suburban,4,Particules PM10,5,23.0,ug.m-3,journaliere,2022/05/01 00:00:00+00,2022/05/01 23:59:59+00,t,2.83611,50.4286,688338,7036800.0,44958
44958,315714.321034,6.520835e+06,803956,PAS-DE-CALAIS,Lens,62498,Lens Varsovie,FR28115,suburban,4,Particules PM10,5,35.8,ug.m-3,journaliere,2022/05/02 00:00:00+00,2022/05/02 23:59:59+00,t,2.83611,50.4286,688338,7036800.0,44959
44959,315714.321034,6.520835e+06,804207,PAS-DE-CALAIS,Lens,62498,Lens Varsovie,FR28115,suburban,4,Particules PM10,5,28.6,ug.m-3,journaliere,2022/05/03 00:00:00+00,2022/05/03 23:59:59+00,t,2.83611,50.4286,688338,7036800.0,44960
44960,315714.321034,6.520835e+06,804208,PAS-DE-CALAIS,Lens,62498,Lens Varsovie,FR28115,suburban,4,Particules PM10,5,25.2,ug.m-3,journaliere,2022/05/04 00:00:00+00,2022/05/04 23:59:59+00,t,2.83611,50.4286,688338,7036800.0,44961


In [3]:
emission.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44962 entries, 0 to 44961
Data columns (total 23 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   X             44962 non-null  float64
 1   Y             44962 non-null  float64
 2   objectid      44962 non-null  int64  
 3   nom_dept      44962 non-null  object 
 4   nom_com       44962 non-null  object 
 5   insee_com     44962 non-null  int64  
 6   nom_station   44962 non-null  object 
 7   code_station  44962 non-null  object 
 8   typologie     44962 non-null  object 
 9   influence     44962 non-null  int64  
 10  nom_poll      44962 non-null  object 
 11  id_poll_ue    44962 non-null  int64  
 12  valeur        38727 non-null  float64
 13  unite         44962 non-null  object 
 14  metrique      44962 non-null  object 
 15  date_debut    44962 non-null  object 
 16  date_fin      44962 non-null  object 
 17  statut_valid  44962 non-null  object 
 18  x_wgs84       44962 non-nu
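
The `info()` output above lists 38727 non-null `valeur` entries out of 44962, so 6235 measurements (about 13.9 %) are missing. A minimal sketch of how that share could be broken down per pollutant is given here; the helper name `missing_share` and the four-row sample frame are hypothetical and only stand in for the real `emission` table.

```python
import pandas as pd

def missing_share(df, value_col="valeur", group_col="nom_poll"):
    """Return the fraction of missing values in value_col for each group."""
    # isna() gives booleans; their mean per group is the missing fraction
    return df[value_col].isna().groupby(df[group_col]).mean().sort_values(ascending=False)

# Hypothetical sample, not taken from the real file
sample = pd.DataFrame({
    "nom_poll": ["Ozone", "Ozone", "Benzène", "Benzène"],
    "valeur": [41.0, None, None, None],
})
shares = missing_share(sample)
print(shares)
```

On the real table the call would presumably be `missing_share(emission)`.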

In [4]:
emission['nom_dept'].value_counts()

NORD             20075
PAS-DE-CALAIS    12842
OISE              5475
SOMME             3650
AISNE             2920
Name: nom_dept, dtype: int64

In [5]:
emission['nom_com'].value_counts()

Dunkerque                  3650
Lille                      2555
Calais                     2190
Valenciennes               2190
Béthune                    1674
Cartignies                 1460
Saint-Quentin              1460
Creil                      1460
Amiens                     1460
Douai                      1460
Cappelle-la-Grande         1460
Neuilly-Saint-Front        1460
Sangatte                   1313
Salouël                    1095
Nogent-sur-Oise            1095
Beauvais                   1095
Roubaix                    1095
Saint-Omer                 1095
Harnes                     1095
Campagne-lès-Boulonnais    1095
Grande-Synthe              1095
Outreau                    1095
Saint-Laurent-Blangy       1095
Noeux-les-Mines            1095
Denain                     1095
Maubeuge                   1095
Rieux                      1095
Tillé                       730
Halluin                     730
Arrest                      730
Boulogne-sur-Mer            730
Gravelin

In [60]:
emission['nom_station'].value_counts()

Bethune Stade             1698
Lille Fives               1460
Mardyck                   1460
Faiencerie Creil          1460
Cappelle                  1460
Calais Parmentier         1460
Douai Theuriet            1460
Neuilly-Saint-Front       1460
St Pol mer - cheminots    1460
Cartignies                1460
Sangatte                  1325
Trafic Beauvais 1         1095
Harnes Serres             1095
St Pierre Amiens          1095
St-Laurent-Blangy         1095
SMVO Rieux                1095
Noeux-les-Mines           1095
P. Roth St Quentin        1095
Salouel                   1095
Nogent sur Oise           1095
Roubaix Serres            1095
Valenciennes Acacias      1095
Lille Leeds               1095
Valenciennes Wallon       1095
Campagne les B.           1095
St Omer Ribot             1095
Grande-synthe             1095
Outreau                   1095
Maubeuge Joyeuse          1095
Denain Villars            1095
Aéroport de BEAUVAIS       730
Malo-les-Bains             730
Arrest  

In [6]:
emission['nom_poll'].value_counts()

Particules PM10           13830
Dioxyde d'azote           11680
Ozone                     10220
Particules fines PM2.5     5947
Dioxyde de soufre          2555
Benzène                     730
Name: nom_poll, dtype: int64

In [3]:
# Convert 'date_debut' to a datetime column: .dt.date drops the time and UTC offset,
# and the second to_datetime call restores a datetime64 dtype
emission["date_debut"] = pd.to_datetime(emission["date_debut"],  errors='coerce').dt.date
emission["date_debut"] = pd.to_datetime(emission["date_debut"])
emission = emission.sort_values(by='date_debut', ascending=True)
emission

Unnamed: 0,X,Y,objectid,nom_dept,nom_com,insee_com,nom_station,code_station,typologie,influence,nom_poll,id_poll_ue,valeur,unite,metrique,date_debut,date_fin,statut_valid,x_wgs84,y_wgs84,x_reg,y_reg,ObjectId2
0,390513.226482,6.508595e+06,755510,NORD,Valenciennes,59606,Valenciennes Acacias,FR06001,urban,2,Particules PM10,5,10.6,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,3.50804,50.3585,736201,7029100.0,1
6935,428583.379139,6.462276e+06,755524,NORD,Cartignies,59134,Cartignies,FR06133,rural-regional,2,Dioxyde d'azote,8,3.0,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,3.85003,50.0923,760889,6999650.0,6936
31608,275448.948019,6.319025e+06,755596,OISE,Creil,60175,Faiencerie Creil,FR18043,urban,2,Particules PM10,5,7.6,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,2.47440,49.2596,661733,6906780.0,31609
12410,196943.103527,6.612436e+06,755544,PAS-DE-CALAIS,Sangatte,62774,Sangatte,FR10025,suburban,2,Particules PM10,5,,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,1.76917,50.9499,613325,7095550.0,12411
37448,363039.576155,6.304488e+06,761811,AISNE,Neuilly-Saint-Front,2543,Neuilly-Saint-Front,FR18057,rural-regional,2,Dioxyde d'azote,8,5.0,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,3.26124,49.1743,667447,,37449
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12992,196943.103527,6.612436e+06,804245,PAS-DE-CALAIS,Sangatte,62774,Sangatte,FR10025,suburban,2,Particules PM10,5,,ug.m-3,journaliere,2022-05-05,2022/05/05 23:59:59+00,f,1.76917,50.9499,613325,7095550.0,12993
12991,196943.103527,6.612436e+06,804246,PAS-DE-CALAIS,Sangatte,62774,Sangatte,FR10025,suburban,2,Particules PM10,5,17.6,ug.m-3,journaliere,2022-05-05,2022/05/05 23:59:59+00,t,1.76917,50.9499,613325,7095550.0,12992
31972,275448.948019,6.319025e+06,804298,OISE,Creil,60175,Faiencerie Creil,FR18043,urban,2,Particules PM10,5,11.5,ug.m-3,journaliere,2022-05-05,2022/05/05 23:59:59+00,t,2.47440,49.2596,661733,6906780.0,31973
12409,259637.127547,6.627098e+06,804239,NORD,Dunkerque,59183,St Pol mer - cheminots,FR10017,urban,2,Dioxyde d'azote,8,19.0,ug.m-3,journaliere,2022-05-05,2022/05/05 23:59:59+00,t,2.33236,51.0328,653062,7104320.0,12410


# 3) Dataset on meteorological parameters

https://www.historique-meteo.net/

# 4) Extracting air quality data through the API

https://atmo-france.org/les-donnees/
    
API access: https://admindata.atmo-france.org/api/doc

PDF guide to using the API:
https://atmo-france.org/wp-content/uploads/2022/03/FAQ_API_Atmo_Data_20220330.pdf

In [63]:
# URL for Lille (INSEE code: 59350); the filter encoded below requests date_ech >= 2020-01-01
url='https://admindata.atmo-france.org/api/data/112/%7b%22code_zone%22:%7b%22operator%22:%22=%22,%22value%22:%2259350%22%7d,%22date_ech%22:%7b%22operator%22:%22%3e=%22,%22value%22:%222020-01-01%22%7d%7d?withGeom=false'
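
The filter in this URL is a percent-encoded JSON object, which is hard to read as written. As a rough sketch using only the standard library, it can be decoded back into a dictionary to check which zone and which start date are actually requested. The variable names `filter_json` and `filters` are illustrative.

```python
import json
from urllib.parse import unquote, urlsplit

url = 'https://admindata.atmo-france.org/api/data/112/%7b%22code_zone%22:%7b%22operator%22:%22=%22,%22value%22:%2259350%22%7d,%22date_ech%22:%7b%22operator%22:%22%3e=%22,%22value%22:%222020-01-01%22%7d%7d?withGeom=false'

# The last path segment carries the encoded filter; the query string is separate
encoded_filter = urlsplit(url).path.rsplit('/', 1)[-1]
filter_json = unquote(encoded_filter)
filters = json.loads(filter_json)

for field, condition in filters.items():
    print(field, condition['operator'], condition['value'])
```

Decoding shows two conditions: `code_zone = 59350` and `date_ech >= 2020-01-01`.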

In [64]:
# Send the request to the API
import requests
data = requests.get(url)
data

<Response [200]>

In [65]:
# Parse the response body as JSON and display it
import json
data = json.loads(data.content)
data

{'type': 'FeatureCollection',
 'name': 'national_data.national_ind_atmo_2021',
 'crs': {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:EPSG::3857'}},
 'features': [{'type': 'Feature',
   'properties': {'gml_id': 365166443,
    'aasqa': '32',
    'date_maj': '2021/10/07 17:38:37.808+02',
    'partition_field': '322021w1',
    'code_no2': 2,
    'code_o3': 1,
    'code_pm10': 1,
    'code_pm25': 2,
    'code_qual': 2,
    'code_so2': 1,
    'code_zone': '59350',
    'coul_qual': '#50CCAA',
    'date_dif': '2021/10/07',
    'date_ech': '2021-01-04',
    'epsg_reg': '2154',
    'lib_qual': 'Moyen',
    'lib_zone': 'LILLE',
    'source': 'Atmo HDF',
    'type_zone': 'commune',
    'x_reg': 703330.0,
    'x_wgs84': 3.04699,
    'y_reg': 7059432.0,
    'y_wgs84': 50.63186},
   'geometry': None},
  {'type': 'Feature',
   'properties': {'gml_id': 365170232,
    'aasqa': '32',
    'date_maj': '2021/10/07 17:38:37.808+02',
    'partition_field': '322021w1',
    'code_no2': 2,
    'code_o

In [66]:
# Convert the JSON data into a DataFrame (the fields of interest are under 'properties')
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
df = pd.json_normalize(data['features'])
df

Unnamed: 0,type,geometry,properties.gml_id,properties.aasqa,properties.date_maj,properties.partition_field,properties.code_no2,properties.code_o3,properties.code_pm10,properties.code_pm25,properties.code_qual,properties.code_so2,properties.code_zone,properties.coul_qual,properties.date_dif,properties.date_ech,properties.epsg_reg,properties.lib_qual,properties.lib_zone,properties.source,properties.type_zone,properties.x_reg,properties.x_wgs84,properties.y_reg,properties.y_wgs84
0,Feature,,365166443,32,2021/10/07 17:38:37.808+02,322021w1,2,1,1,2,2,1,59350,#50CCAA,2021/10/07,2021-01-04,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
1,Feature,,365170232,32,2021/10/07 17:38:37.808+02,322021w1,2,1,2,2,2,1,59350,#50CCAA,2021/10/07,2021-01-05,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
2,Feature,,365174021,32,2021/10/07 17:38:37.808+02,322021w1,2,1,2,3,3,1,59350,#F0E641,2021/10/07,2021-01-06,2154,Dégradé,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
3,Feature,,365177810,32,2021/10/07 17:38:37.808+02,322021w1,2,2,1,2,2,1,59350,#50CCAA,2021/10/07,2021-01-07,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
4,Feature,,365181599,32,2021/10/07 17:38:37.808+02,322021w1,3,1,3,4,4,1,59350,#FF5050,2021/10/07,2021-01-09,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
5,Feature,,365185388,32,2021/10/07 17:38:37.808+02,322021w1,2,1,2,4,4,1,59350,#FF5050,2021/10/07,2021-01-10,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
6,Feature,,365397572,32,2021/10/07 17:38:37.808+02,322021w10,2,2,2,4,4,1,59350,#FF5050,2021/10/07,2021-03-08,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
7,Feature,,365401361,32,2021/10/07 17:38:37.808+02,322021w10,2,2,2,4,4,1,59350,#FF5050,2021/10/07,2021-03-09,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
8,Feature,,365405151,32,2021/10/07 17:38:37.808+02,322021w10,1,2,2,4,4,1,59350,#FF5050,2021/10/07,2021-03-10,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
9,Feature,,365408939,32,2021/10/07 17:38:37.808+02,322021w10,1,2,1,1,2,1,59350,#50CCAA,2021/10/07,2021-03-11,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
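
`pd.json_normalize(data['features'])` keeps the constant `type` and empty `geometry` columns and prefixes every useful column with `properties.`. If only the properties are wanted, one possible alternative is to build the frame from the `properties` dictionaries directly. This is a sketch on a two-feature excerpt of the response shown above, reduced to four fields. The later cells of this notebook rely on the prefixed name `properties.date_ech`, so they would need adapting if this variant were used.

```python
import pandas as pd

# Reduced excerpt of the API response shown above (two features, four fields)
data = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"code_qual": 2, "date_ech": "2021-01-04",
                        "lib_qual": "Moyen", "lib_zone": "LILLE"}},
        {"type": "Feature", "geometry": None,
         "properties": {"code_qual": 3, "date_ech": "2021-01-06",
                        "lib_qual": "Dégradé", "lib_zone": "LILLE"}},
    ],
}

# One row per feature, one column per property, no 'properties.' prefix
properties_only = pd.DataFrame([feature["properties"] for feature in data["features"]])
print(properties_only)
```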


In [None]:
df.info()

In [67]:
# Convert the 'properties.date_ech' column to a datetime
df["properties.date_ech"] = pd.to_datetime(df["properties.date_ech"],  errors='coerce').dt.date
df["properties.date_ech"] = pd.to_datetime(df["properties.date_ech"])
df["properties.date_ech"].dtype

dtype('<M8[ns]')

In [68]:
df = df.sort_values(by='properties.date_ech', ascending=True)

In [69]:
df

Unnamed: 0,type,geometry,properties.gml_id,properties.aasqa,properties.date_maj,properties.partition_field,properties.code_no2,properties.code_o3,properties.code_pm10,properties.code_pm25,properties.code_qual,properties.code_so2,properties.code_zone,properties.coul_qual,properties.date_dif,properties.date_ech,properties.epsg_reg,properties.lib_qual,properties.lib_zone,properties.source,properties.type_zone,properties.x_reg,properties.x_wgs84,properties.y_reg,properties.y_wgs84
421,Feature,,365158865,32,2021/10/07 17:38:37.808+02,322021w53,4,1,3,4,4,1,59350,#FF5050,2021/10/07,2021-01-02,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
422,Feature,,365162654,32,2021/10/07 17:38:37.808+02,322021w53,3,1,2,4,4,1,59350,#FF5050,2021/10/07,2021-01-03,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
0,Feature,,365166443,32,2021/10/07 17:38:37.808+02,322021w1,2,1,1,2,2,1,59350,#50CCAA,2021/10/07,2021-01-04,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
1,Feature,,365170232,32,2021/10/07 17:38:37.808+02,322021w1,2,1,2,2,2,1,59350,#50CCAA,2021/10/07,2021-01-05,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
2,Feature,,365174021,32,2021/10/07 17:38:37.808+02,322021w1,2,1,2,3,3,1,59350,#F0E641,2021/10/07,2021-01-06,2154,Dégradé,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
3,Feature,,365177810,32,2021/10/07 17:38:37.808+02,322021w1,2,2,1,2,2,1,59350,#50CCAA,2021/10/07,2021-01-07,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
4,Feature,,365181599,32,2021/10/07 17:38:37.808+02,322021w1,3,1,3,4,4,1,59350,#FF5050,2021/10/07,2021-01-09,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
5,Feature,,365185388,32,2021/10/07 17:38:37.808+02,322021w1,2,1,2,4,4,1,59350,#FF5050,2021/10/07,2021-01-10,2154,Mauvais,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
76,Feature,,365189177,32,2021/10/07 17:38:37.808+02,322021w2,1,2,1,2,2,1,59350,#50CCAA,2021/10/07,2021-01-11,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186
77,Feature,,365192966,32,2021/10/07 17:38:37.808+02,322021w2,2,2,1,1,2,1,59350,#50CCAA,2021/10/07,2021-01-12,2154,Moyen,LILLE,Atmo HDF,commune,703330.0,3.04699,7059432.0,50.63186


In [None]:
# Save as CSV
#df.to_csv('ind_lille.csv', index=False)

In [None]:
# To be repeated for the 19 selected towns
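
Repeating the request by hand for each town is error-prone. The sketch below shows one way the loop could be organised: build the filtered URL from an INSEE code, then fetch every town through a `fetch` callable supplied by the caller. The names `index_url`, `collect_indices` and `TOWNS` are hypothetical. The three INSEE codes are the ones visible in the outputs of this notebook, and the remaining towns would have to be added.

```python
import json
from urllib.parse import quote

BASE_URL = "https://admindata.atmo-france.org/api/data/112/"

# INSEE codes as they appear in this notebook's outputs (list to be completed)
TOWNS = {"lille": "59350", "calais": "62193", "valenciennes": "59606"}

def index_url(insee_code, start_date="2020-01-01"):
    """Build the filtered URL for one town, mirroring the structure of the Lille URL."""
    filters = {
        "code_zone": {"operator": "=", "value": insee_code},
        "date_ech": {"operator": ">=", "value": start_date},
    }
    encoded = quote(json.dumps(filters, separators=(",", ":")), safe=":,=")
    return f"{BASE_URL}{encoded}?withGeom=false"

def collect_indices(towns, fetch):
    """Return {town: list of GeoJSON features}; fetch maps a URL to a parsed JSON dict."""
    return {town: fetch(index_url(code))["features"] for town, code in towns.items()}
```

With `requests`, the call could look like `collect_indices(TOWNS, lambda u: requests.get(u, timeout=30).json())`, after which each list of features can go through `pd.json_normalize` and `to_csv` as for Lille. Passing `fetch` as an argument keeps the loop testable without network access.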

# 5) Merging the 2021 and 2022 weather tables for each selected town

In [87]:
# Build a list of (2021 file, 2022 file) tuples for the weather exports
tuple_list = [('export-amiens2021.csv', 'export-amiens2022.csv'), ('export-beauvais-oise2021.csv', 'export-beauvais-oise2022.csv'), ('export-bethune2021.csv', 'export-bethune2022.csv'), ('export-boulogne-sur-mer2021.csv', 'export-boulogne-sur-mer2022.csv'), ('export-calais2021.csv', 'export-calais2022.csv'), ('export-creil2021.csv', 'export-creil2022.csv'), ('export-douai2021.csv', 'export-douai2022.csv'), ('export-dunkerque2021.csv', 'export-dunkerque2022.csv'), ('export-gravelines2021.csv', 'export-gravelines2022.csv'), ('export-lens2021.csv', 'export-lens2022.csv'), ('export-lille2021.csv', 'export-lille2022.csv'), ('export-maubeuge2021.csv', 'export-maubeuge2022.csv'), ('export-nogent-sur-oise2021.csv', 'export-nogent-sur-oise2022.csv'), ('export-roubaix2021.csv', 'export-roubaix2022.csv'), ('export-roye2021.csv', 'export-roye2022.csv'), ('export-saint-amand-les-eaux-parc-naturel-regional-scarpe-escaut2021.csv', 'export-saint-amand-les-eaux-parc-naturel-regional-scarpe-escaut2022.csv'), ('export-saint-omer2021.csv', 'export-saint-omer2022.csv'), ('export-sangatte2021.csv', 'export-sangatte2022.csv'), ('export-valenciennes2021.csv', 'export-valenciennes2022.csv')]


In [88]:
# Quick check: 2021 file of the second town
print(tuple_list[1][0])

export-beauvais-oise2021.csv
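
Typing the 19 file pairs by hand is easy to get wrong. As an illustrative alternative, the pairs could be derived from the file names themselves with a regular expression. The helper `pair_by_town` is hypothetical, and the sketch assumes every export follows the `export-<town><year>.csv` pattern seen above.

```python
import re
from collections import defaultdict

# Town name, then a four-digit year starting with 20, then the .csv extension
PATTERN = re.compile(r"export-(?P<town>.+?)(?P<year>20\d{2})\.csv$")

def pair_by_town(filenames):
    """Group export files by town and return (2021 file, 2022 file) tuples."""
    by_town = defaultdict(dict)
    for name in filenames:
        match = PATTERN.search(name)
        if match:  # ignore files that do not follow the naming pattern
            by_town[match["town"]][match["year"]] = name
    return [
        (years["2021"], years["2022"])
        for town, years in sorted(by_town.items())
        if years.keys() >= {"2021", "2022"}  # keep only towns with both years
    ]

files = ["export-lille2022.csv", "export-amiens2021.csv",
         "export-lille2021.csv", "export-amiens2022.csv", "notes.txt"]
pairs = pair_by_town(files)
print(pairs)
```

In the notebook, `files` could come from `os.listdir('meteo_datasets')`.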


In [4]:
# Use a loop and a regex to build the merged table for each town

In [89]:
for file_2021, file_2022 in tuple_list:
    met_2021 = pd.read_csv(f'meteo_datasets/{file_2021}', delimiter=',', skiprows=3)
    met_2022 = pd.read_csv(f'meteo_datasets/{file_2022}', delimiter=',', skiprows=3)
    # Extract the town name from the file name, e.g. 'export-amiens2021.csv' -> 'amiens'
    town_name = re.search('export-(.+?)20', file_2021).group(1)
    # Stack both years, then normalise DATE to a datetime column without time of day
    met = pd.concat([met_2021, met_2022], axis=0)
    met['DATE'] = pd.to_datetime(met['DATE']).dt.date
    met['DATE'] = pd.to_datetime(met['DATE'])
    met.to_csv(f'meteo_datasets/met_{town_name}.csv', index=False)

In [14]:
# Display the weather table for Amiens as an example
pd.read_csv('meteo_datasets/met_amiens.csv')

Unnamed: 0,DATE,MAX_TEMPERATURE_C,MIN_TEMPERATURE_C,WINDSPEED_MAX_KMH,TEMPERATURE_MORNING_C,TEMPERATURE_NOON_C,TEMPERATURE_EVENING_C,PRECIP_TOTAL_DAY_MM,HUMIDITY_MAX_PERCENT,VISIBILITY_AVG_KM,PRESSURE_MAX_MB,CLOUDCOVER_AVG_PERCENT,HEATINDEX_MAX_C,DEWPOINT_MAX_C,WINDTEMP_MAX_C,WEATHER_CODE_MORNING,WEATHER_CODE_NOON,WEATHER_CODE_EVENING,TOTAL_SNOW_MM,UV_INDEX,SUNHOUR,OPINION,SUNSET,SUNRISE,TEMPERATURE_NIGHT_C
0,2021-01-01,3,0,9,0,2,2,3.1,97,6.250,1014,86.125,3,2,-3,116,332,317,2.2,1,3.2,météo très défavorable,17:00:00,08:49:00,0
1,2021-01-02,5,0,8,0,4,2,0.0,89,9.375,1016,39.625,5,3,-2,116,122,116,0.0,2,3.2,météo très défavorable,17:01:00,08:49:00,0
2,2021-01-03,3,0,13,2,3,2,0.2,96,5.625,1018,87.875,3,2,-2,143,143,368,0.0,1,3.2,météo très défavorable,17:03:00,08:49:00,2
3,2021-01-04,2,1,22,2,2,2,0.5,96,8.375,1016,100.000,2,2,-3,122,332,332,0.2,1,3.2,météo très défavorable,17:04:00,08:48:00,2
4,2021-01-05,2,-3,20,2,2,1,0.4,97,8.750,1018,99.875,2,1,-3,332,332,326,0.1,1,3.2,météo très défavorable,17:05:00,08:48:00,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
480,2022-04-26,16,4,19,4,14,15,0.0,95,10.000,1023,38.250,16,7,1,116,116,113,0.0,3,13.4,météo correcte,21:01:00,06:37:00,6
481,2022-04-27,18,2,19,2,14,18,0.0,97,8.000,1026,21.875,18,10,-1,143,113,113,0.0,4,14.5,météo favorable,21:03:00,06:35:00,3
482,2022-04-28,20,4,22,4,19,20,0.0,92,10.000,1029,4.625,20,12,1,113,113,113,0.0,4,14.5,météo favorable,21:05:00,06:33:00,5
483,2022-04-29,17,7,15,7,15,16,0.0,88,10.000,1030,61.375,17,9,5,116,122,116,0.0,4,12.4,météo correcte,21:06:00,06:31:00,7


In [15]:
pd.read_csv('meteo_datasets/met_amiens.csv').info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 485 entries, 0 to 484
Data columns (total 25 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   DATE                    485 non-null    object 
 1   MAX_TEMPERATURE_C       485 non-null    int64  
 2   MIN_TEMPERATURE_C       485 non-null    int64  
 3   WINDSPEED_MAX_KMH       485 non-null    int64  
 4   TEMPERATURE_MORNING_C   485 non-null    int64  
 5   TEMPERATURE_NOON_C      485 non-null    int64  
 6   TEMPERATURE_EVENING_C   485 non-null    int64  
 7   PRECIP_TOTAL_DAY_MM     485 non-null    float64
 8   HUMIDITY_MAX_PERCENT    485 non-null    int64  
 9   VISIBILITY_AVG_KM       485 non-null    float64
 10  PRESSURE_MAX_MB         485 non-null    int64  
 11  CLOUDCOVER_AVG_PERCENT  485 non-null    float64
 12  HEATINDEX_MAX_C         485 non-null    int64  
 13  DEWPOINT_MAX_C          485 non-null    int64  
 14  WINDTEMP_MAX_C          485 non-null    in

In [8]:
# Note: 'DATE' comes back as object (text) because a CSV file does not store dtypes,
# so the conversion done before saving is lost on reload
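
The `object` dtype reported for `DATE` is expected, since `to_csv` writes dates as plain text. One way to get a datetime column back is to ask `read_csv` to parse it at load time with `parse_dates`. The sketch below demonstrates this on an in-memory buffer with two illustrative rows rather than on the real `met_amiens.csv` file.

```python
import io
import pandas as pd

# Two illustrative rows with a proper datetime column
met = pd.DataFrame({
    "DATE": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "MAX_TEMPERATURE_C": [3, 5],
})

# Round trip through CSV text, as to_csv / read_csv would do on disk
buffer = io.StringIO()
met.to_csv(buffer, index=False)

buffer.seek(0)
as_text = pd.read_csv(buffer)                         # DATE is read as text
buffer.seek(0)
as_dates = pd.read_csv(buffer, parse_dates=["DATE"])  # DATE is parsed to datetime64

print(as_text["DATE"].dtype, as_dates["DATE"].dtype)
```

Applied to the notebook, this would be `pd.read_csv('meteo_datasets/met_amiens.csv', parse_dates=['DATE'])`.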

# 6) Merging the weather and air quality index tables for each town

In [15]:
# Build a list of (weather file, air quality index file) tuples
tuple_list = [('met_amiens.csv', 'ind_amiens.csv'), ('met_beauvais-oise.csv', 'ind_beauvais.csv'), ('met_bethune.csv', 'ind_bethune.csv'), ('met_boulogne-sur-mer.csv', 'ind_boulogne_sur_mer.csv'), ('met_calais.csv', 'ind_calais.csv'), ('met_creil.csv', 'ind_creil.csv'), ('met_douai.csv', 'ind_douai.csv'), ('met_dunkerque.csv', 'ind_dunkerque.csv'), ('met_gravelines.csv', 'ind_gravelines.csv'), ('met_lens.csv', 'ind_lens.csv'), ('met_lille.csv', 'ind_lille.csv'), ('met_maubeuge.csv', 'ind_maubeuge.csv'), ('met_nogent-sur-oise.csv', 'ind_nogent_sur_oise.csv'), ('met_roubaix.csv', 'ind_roubaix.csv'), ('met_roye.csv', 'ind_roye.csv'), ('met_saint-amand-les-eaux-parc-naturel-regional-scarpe-escaut.csv', 'ind_saint_amand_les_eaux.csv'), ('met_saint-omer.csv', 'ind_saint_omer.csv'), ('met_sangatte.csv', 'ind_sangatte.csv'), ('met_valenciennes.csv', 'ind_valenciennes.csv')]

In [18]:
for met_file, iaq_file in tuple_list:
    met = pd.read_csv(f'meteo_datasets/{met_file}')
    met['DATE'] = pd.to_datetime(met['DATE']).dt.date
    met['DATE'] = pd.to_datetime(met['DATE'])
    iaq = pd.read_csv(f'indice_air_quality_datasets/{iaq_file}')
    # Align the name of the date column with the weather table before merging
    iaq.rename(columns={'properties.date_ech': 'DATE'}, inplace=True)
    iaq['DATE'] = pd.to_datetime(iaq['DATE']).dt.date
    iaq['DATE'] = pd.to_datetime(iaq['DATE'])
    # Town name taken from the index file name, e.g. 'ind_calais.csv' -> 'calais'
    x = re.search(r'ind_(.+?)\.csv', iaq_file).group(1)
    # Inner join: keep only the dates present in both tables
    ind_met = pd.merge(iaq, met, on='DATE', how='inner')
    ind_met.to_csv(f'iaq_met_datasets/iaq_met_{x}.csv', index=False)

In [19]:
# Example: result for the town of Calais
pd.set_option('display.max_columns', None)
pd.read_csv('iaq_met_datasets/iaq_met_calais.csv')

Unnamed: 0,type,geometry,properties.gml_id,properties.aasqa,properties.date_maj,properties.partition_field,properties.code_no2,properties.code_o3,properties.code_pm10,properties.code_pm25,properties.code_qual,properties.code_so2,properties.code_zone,properties.coul_qual,properties.date_dif,DATE,properties.epsg_reg,properties.lib_qual,properties.lib_zone,properties.source,properties.type_zone,properties.x_reg,properties.x_wgs84,properties.y_reg,properties.y_wgs84,MAX_TEMPERATURE_C,MIN_TEMPERATURE_C,WINDSPEED_MAX_KMH,TEMPERATURE_MORNING_C,TEMPERATURE_NOON_C,TEMPERATURE_EVENING_C,PRECIP_TOTAL_DAY_MM,HUMIDITY_MAX_PERCENT,VISIBILITY_AVG_KM,PRESSURE_MAX_MB,CLOUDCOVER_AVG_PERCENT,HEATINDEX_MAX_C,DEWPOINT_MAX_C,WINDTEMP_MAX_C,WEATHER_CODE_MORNING,WEATHER_CODE_NOON,WEATHER_CODE_EVENING,TOTAL_SNOW_MM,UV_INDEX,SUNHOUR,OPINION,SUNSET,SUNRISE,TEMPERATURE_NIGHT_C
0,Feature,,365160040,32,2021/10/07 17:38:37.808+02,322021w53,1,2,1,1,2,1,62193,#50CCAA,2021/10/07,2021-01-02,2154,Moyen,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,7,2,26,6,7,7,7.4,84,10.00,1017,85.500,7,4,3,353,353,353,0.0,3,3.0,météo très défavorable,16:58:00,08:55:00,6
1,Feature,,365163829,32,2021/10/07 17:38:37.808+02,322021w53,2,2,1,2,2,1,62193,#50CCAA,2021/10/07,2021-01-03,2154,Moyen,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,6,4,22,6,5,5,1.1,83,10.00,1019,90.500,6,3,1,122,353,176,0.0,1,3.0,météo très défavorable,16:59:00,08:55:00,5
2,Feature,,365167618,32,2021/10/07 17:38:37.808+02,322021w1,1,2,1,2,2,1,62193,#50CCAA,2021/10/07,2021-01-04,2154,Moyen,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,5,4,41,5,5,5,1.4,84,10.00,1019,100.000,5,3,-1,362,362,362,0.0,2,3.2,météo très défavorable,17:01:00,08:55:00,5
3,Feature,,365171407,32,2021/10/07 17:38:37.808+02,322021w1,1,2,1,2,2,1,62193,#50CCAA,2021/10/07,2021-01-05,2154,Moyen,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,4,3,29,4,4,4,1.3,88,9.00,1020,94.875,4,2,-2,371,371,362,0.2,1,3.2,météo très défavorable,17:02:00,08:55:00,4
4,Feature,,365175196,32,2021/10/07 17:38:37.808+02,322021w1,1,2,1,2,2,1,62193,#50CCAA,2021/10/07,2021-01-06,2154,Moyen,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,4,0,13,2,3,3,1.7,91,8.75,1019,93.750,4,2,-1,371,371,371,0.6,1,3.2,météo très défavorable,17:03:00,08:54:00,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
637,Feature,,1738399027,32,2022/04/29 13:32:26.692+02,322022w17,1,3,2,2,3,1,62193,#F0E641,2022/04/29,2022-04-28,2154,Dégradé,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,14,9,25,9,13,14,0.0,88,10.00,1031,3.875,14,10,5,113,113,113,0.0,4,14.5,météo correcte,21:09:00,06:32:00,9
638,Feature,,1738399028,32,2022/04/29 13:32:26.692+02,322022w17,1,2,2,2,2,1,62193,#50CCAA,2022/04/29,2022-04-29,2154,Moyen,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,11,9,27,9,11,11,0.0,85,10.00,1031,46.500,11,7,7,122,119,113,0.0,4,12.4,météo défavorable,21:11:00,06:30:00,9
639,Feature,,1743145976,32,2022/04/30 13:31:26.763+02,322022w17,1,2,2,1,2,1,62193,#50CCAA,2022/04/30,2022-04-29,2154,Moyen,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,11,9,27,9,11,11,0.0,85,10.00,1031,46.500,11,7,7,122,119,113,0.0,4,12.4,météo défavorable,21:11:00,06:30:00,9
640,Feature,,1743145977,32,2022/04/30 13:31:26.763+02,322022w17,2,2,1,1,2,1,62193,#50CCAA,2022/04/30,2022-04-30,2154,Moyen,CALAIS,Atmo HDF,commune,620796.0,1.87525,7095520.0,50.95059,13,8,25,8,11,11,0.0,82,10.00,1030,25.625,13,8,5,116,113,113,0.0,4,13.5,météo correcte,21:13:00,06:28:00,8
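
In the Calais output above, rows 638 and 639 share the same `DATE` (2022-04-29) and differ only in their `properties.date_maj` update stamp and some codes, so the index table contains more than one record per day. A possible way to handle this before merging is to keep only the most recently updated record for each date. The helper `latest_per_date` is hypothetical. It assumes all `date_maj` strings share the same format and UTC offset, so that sorting them as text is chronological.

```python
import pandas as pd

def latest_per_date(iaq, date_col="DATE", updated_col="properties.date_maj"):
    """Keep, for each date, the row with the latest update stamp."""
    ordered = iaq.sort_values([date_col, updated_col])
    return ordered.drop_duplicates(subset=date_col, keep="last").reset_index(drop=True)

# Excerpt modelled on rows 638 to 640 of the Calais output
iaq = pd.DataFrame({
    "DATE": pd.to_datetime(["2022-04-29", "2022-04-29", "2022-04-30"]),
    "properties.date_maj": ["2022/04/29 13:32:26.692+02",
                            "2022/04/30 13:31:26.763+02",
                            "2022/04/30 13:31:26.763+02"],
    "properties.code_pm25": [2, 1, 1],
})
met = pd.DataFrame({
    "DATE": pd.to_datetime(["2022-04-29", "2022-04-30"]),
    "MAX_TEMPERATURE_C": [11, 13],
})

# validate raises MergeError if either side still has duplicate dates
merged = pd.merge(latest_per_date(iaq), met, on="DATE", how="inner", validate="one_to_one")
print(merged)
```

Whether the latest record is the right one to keep is a modelling choice. The point is only that the merged table should have one row per day.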


In [20]:
pd.read_csv('iaq_met_datasets/iaq_met_calais.csv').info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 642 entries, 0 to 641
Data columns (total 49 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   type                        642 non-null    object 
 1   geometry                    0 non-null      float64
 2   properties.gml_id           642 non-null    int64  
 3   properties.aasqa            642 non-null    int64  
 4   properties.date_maj         642 non-null    object 
 5   properties.partition_field  642 non-null    object 
 6   properties.code_no2         642 non-null    int64  
 7   properties.code_o3          642 non-null    int64  
 8   properties.code_pm10        642 non-null    int64  
 9   properties.code_pm25        642 non-null    int64  
 10  properties.code_qual        642 non-null    int64  
 11  properties.code_so2         642 non-null    int64  
 12  properties.code_zone        642 non-null    int64  
 13  properties.coul_qual        642 non

In [None]:
# Note: once again, 'DATE' is not kept as a datetime after the round trip through CSV

# 7) Working on the emission table

In [22]:
emission

Unnamed: 0,X,Y,objectid,nom_dept,nom_com,insee_com,nom_station,code_station,typologie,influence,nom_poll,id_poll_ue,valeur,unite,metrique,date_debut,date_fin,statut_valid,x_wgs84,y_wgs84,x_reg,y_reg,ObjectId2
0,390513.226482,6.508595e+06,755510,NORD,Valenciennes,59606,Valenciennes Acacias,FR06001,urban,2,Particules PM10,5,10.6,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,3.50804,50.3585,736201,7029100.0,1
6935,428583.379139,6.462276e+06,755524,NORD,Cartignies,59134,Cartignies,FR06133,rural-regional,2,Dioxyde d'azote,8,3.0,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,3.85003,50.0923,760889,6999650.0,6936
31608,275448.948019,6.319025e+06,755596,OISE,Creil,60175,Faiencerie Creil,FR18043,urban,2,Particules PM10,5,7.6,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,2.47440,49.2596,661733,6906780.0,31609
12410,196943.103527,6.612436e+06,755544,PAS-DE-CALAIS,Sangatte,62774,Sangatte,FR10025,suburban,2,Particules PM10,5,,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,1.76917,50.9499,613325,7095550.0,12411
37448,363039.576155,6.304488e+06,761811,AISNE,Neuilly-Saint-Front,2543,Neuilly-Saint-Front,FR18057,rural-regional,2,Dioxyde d'azote,8,5.0,ug.m-3,journaliere,2021-05-06,2021/05/06 23:59:59+00,t,3.26124,49.1743,667447,,37449
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12992,196943.103527,6.612436e+06,804245,PAS-DE-CALAIS,Sangatte,62774,Sangatte,FR10025,suburban,2,Particules PM10,5,,ug.m-3,journaliere,2022-05-05,2022/05/05 23:59:59+00,f,1.76917,50.9499,613325,7095550.0,12993
12991,196943.103527,6.612436e+06,804246,PAS-DE-CALAIS,Sangatte,62774,Sangatte,FR10025,suburban,2,Particules PM10,5,17.6,ug.m-3,journaliere,2022-05-05,2022/05/05 23:59:59+00,t,1.76917,50.9499,613325,7095550.0,12992
31972,275448.948019,6.319025e+06,804298,OISE,Creil,60175,Faiencerie Creil,FR18043,urban,2,Particules PM10,5,11.5,ug.m-3,journaliere,2022-05-05,2022/05/05 23:59:59+00,t,2.47440,49.2596,661733,6906780.0,31973
12409,259637.127547,6.627098e+06,804239,NORD,Dunkerque,59183,St Pol mer - cheminots,FR10017,urban,2,Dioxyde d'azote,8,19.0,ug.m-3,journaliere,2022-05-05,2022/05/05 23:59:59+00,t,2.33236,51.0328,653062,7104320.0,12410


In [23]:
# Analysis of the 'nom_poll' column: which pollutants are measured in each town
town_list = ['Amiens', 'Beauvais', 'Béthune', 'Boulogne-sur-Mer', 'Calais', 'Creil', 'Douai', 'Dunkerque', 'Gravelines', 'Lens', 'Lille', 'Maubeuge', 'Nogent-sur-Oise', 'Roubaix', 'Roye', 'Saint-Amand-les-Eaux', 'Saint-Omer', 'Sangatte', 'Valenciennes']

In [68]:
for i in town_list:
    print('\n')
    print(i, emission['nom_poll'][emission['nom_com'] == i].value_counts())



Amiens Particules fines PM2.5    365
Particules PM10           365
Ozone                     365
Dioxyde d'azote           365
Name: nom_poll, dtype: int64


Beauvais Particules PM10           365
Particules fines PM2.5    365
Dioxyde d'azote           365
Name: nom_poll, dtype: int64


Béthune Particules PM10           472
Particules fines PM2.5    472
Ozone                     365
Dioxyde d'azote           365
Name: nom_poll, dtype: int64


Boulogne-sur-Mer Particules PM10           365
Particules fines PM2.5    365
Name: nom_poll, dtype: int64


Calais Particules PM10           730
Dioxyde de soufre         365
Particules fines PM2.5    365
Ozone                     365
Dioxyde d'azote           365
Name: nom_poll, dtype: int64


Creil Particules PM10           365
Particules fines PM2.5    365
Dioxyde d'azote           365
Ozone                     365
Name: nom_poll, dtype: int64


Douai Particules PM10           365
Ozone                     365
Dioxyde d'azote           365
Pa
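The per-town counts printed above can also be read as a single table. The sketch below uses `pd.crosstab` on a toy frame; the column names `nom_com` and `nom_poll` come from the `emission` table, while the rows are illustrative only.

```python
import pandas as pd

# Toy stand-in for the `emission` table (values are illustrative only)
toy = pd.DataFrame({
    "nom_com": ["Amiens", "Amiens", "Calais", "Calais", "Calais"],
    "nom_poll": ["Ozone", "Particules PM10", "Particules PM10",
                 "Particules PM10", "Ozone"],
})

# One row per town, one column per pollutant, cells = number of measurements
counts = pd.crosstab(toy["nom_com"], toy["nom_poll"])
print(counts)
```

A count above the number of days in the period (for example 730 for PM10 in Calais) shows at a glance that a town has several values per day for that pollutant.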

In [45]:
#Amiens
amiens_pol = emission[emission['nom_com'] == 'Amiens']
amiens_pol = amiens_pol[['date_debut', 'nom_poll', 'valeur']]
amiens_pol = amiens_pol.pivot(index=['date_debut'], columns='nom_poll')
# Flatten the ('valeur', pollutant) column labels, then add the pollutants not measured here as empty columns
amiens_pol = amiens_pol.set_axis(['Dioxyde d\'azote', 'Ozone','Particules PM10', 'Particules fines PM2.5'], axis='columns')
amiens_pol['DATE'] = amiens_pol.index
amiens_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
amiens_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
amiens_pol.to_csv('polluant_values_dataset/pol_amiens.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_amiens.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,11.0,,55.0,8.8,,2021-05-06
1,,13.0,,53.0,12.2,,2021-05-07
2,,8.0,,43.0,10.4,,2021-05-08
3,,7.0,,48.0,13.8,,2021-05-09
4,,5.0,,60.0,7.8,,2021-05-10
...,...,...,...,...,...,...,...
360,,11.0,,65.0,15.0,13.0,2022-05-01
361,,14.0,,73.0,27.2,25.0,2022-05-02
362,,13.0,,66.0,19.0,17.0,2022-05-03
363,,13.0,,59.0,16.8,12.0,2022-05-04


In [46]:
#Beauvais
Beauvais_pol = emission[emission['nom_com'] == 'Beauvais']
Beauvais_pol = Beauvais_pol[['date_debut', 'nom_poll', 'valeur']]
Beauvais_pol = Beauvais_pol.pivot(index=['date_debut'], columns='nom_poll')
Beauvais_pol = Beauvais_pol.set_axis(['Dioxyde d\'azote', 'Particules PM10', 'Particules fines PM2.5'], axis='columns')
Beauvais_pol['DATE'] = Beauvais_pol.index
Beauvais_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Beauvais_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Beauvais_pol.insert(3, "Ozone", float('nan'), allow_duplicates=False)
Beauvais_pol.to_csv('polluant_values_dataset/pol_beauvais.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_beauvais.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,45.0,,,12.4,11.0,2021-05-06
1,,39.0,,,19.1,12.0,2021-05-07
2,,30.0,,,15.9,,2021-05-08
3,,26.0,,,20.1,,2021-05-09
4,,32.0,,,16.5,10.0,2021-05-10
...,...,...,...,...,...,...,...
360,,12.0,,,12.6,11.0,2022-05-01
361,,18.0,,,28.1,22.0,2022-05-02
362,,17.0,,,22.9,14.0,2022-05-03
363,,15.0,,,14.9,11.0,2022-05-04


In [47]:
#Béthune
Bethune_pol = emission[emission['nom_com'] == 'Béthune']
Bethune_pol = Bethune_pol[['date_debut', 'nom_poll', 'valeur']]
Bethune_pol = Bethune_pol.pivot_table(index=['date_debut'], columns='nom_poll')
Bethune_pol = Bethune_pol.set_axis(['Dioxyde d\'azote', 'Ozone','Particules PM10', 'Particules fines PM2.5'], axis='columns')
Bethune_pol['DATE'] = Bethune_pol.index
Bethune_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Bethune_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Bethune_pol.to_csv('polluant_values_dataset/pol_bethune.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_bethune.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,9.0,,58.0,8.9,5.0,2021-05-06
1,,9.0,,62.0,9.2,6.0,2021-05-07
2,,11.0,,53.0,10.5,7.0,2021-05-08
3,,8.0,,66.0,13.9,8.0,2021-05-09
4,,5.0,,69.0,8.4,4.0,2021-05-10
...,...,...,...,...,...,...,...
357,,10.0,,63.0,21.0,13.0,2022-05-01
358,,15.0,,65.0,,24.0,2022-05-02
359,,12.0,,63.0,,16.0,2022-05-03
360,,11.0,,59.0,,15.0,2022-05-04


In [69]:
#Boulogne-sur-Mer
Boulogne_sur_Mer_pol = emission[emission['nom_com'] == 'Boulogne-sur-Mer']
Boulogne_sur_Mer_pol = Boulogne_sur_Mer_pol[['date_debut', 'nom_poll', 'valeur']]
Boulogne_sur_Mer_pol = Boulogne_sur_Mer_pol.pivot(index=['date_debut'], columns='nom_poll')
Boulogne_sur_Mer_pol = Boulogne_sur_Mer_pol.set_axis(['Particules PM10', 'Particules fines PM2.5'], axis='columns')
Boulogne_sur_Mer_pol['DATE'] = Boulogne_sur_Mer_pol.index
Boulogne_sur_Mer_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Boulogne_sur_Mer_pol.insert(1, "Dioxyde d'azote", float('nan'), allow_duplicates=False)
Boulogne_sur_Mer_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Boulogne_sur_Mer_pol.insert(3, "Ozone", float('nan'), allow_duplicates=False)
Boulogne_sur_Mer_pol.to_csv('polluant_values_dataset/pol_boulogne_sur_mer.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_boulogne_sur_mer.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,,,,,6.0,2021-05-06
1,,,,,,6.0,2021-05-07
2,,,,,,5.0,2021-05-08
3,,,,,,6.0,2021-05-09
4,,,,,8.3,4.0,2021-05-10
...,...,...,...,...,...,...,...
360,,,,,,,2022-05-01
361,,,,,,,2022-05-02
362,,,,,,19.0,2022-05-03
363,,,,,,13.0,2022-05-04


In [49]:
#Calais
Calais_pol = emission[emission['nom_com'] == 'Calais']
Calais_pol = Calais_pol[['date_debut', 'nom_poll', 'valeur']]
Calais_pol = Calais_pol.pivot_table(index=['date_debut'], columns='nom_poll')
Calais_pol = Calais_pol.set_axis(['Dioxyde d\'azote', 'Dioxyde de soufre', 'Ozone','Particules PM10', 'Particules fines PM2.5'], axis='columns')
Calais_pol['DATE'] = Calais_pol.index
Calais_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Calais_pol.to_csv('polluant_values_dataset/pol_calais.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_calais.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,14.0,1.0,58.0,9.85,5.0,2021-05-06
1,,12.0,2.0,67.0,13.40,5.0,2021-05-07
2,,8.0,3.0,55.0,11.25,5.0,2021-05-08
3,,13.0,2.0,53.0,13.70,5.0,2021-05-09
4,,6.0,5.0,72.0,12.05,4.0,2021-05-10
...,...,...,...,...,...,...,...
360,,23.0,-1.0,53.0,15.00,9.0,2022-05-01
361,,28.0,3.0,52.0,18.85,13.0,2022-05-02
362,,12.0,0.0,63.0,20.55,12.0,2022-05-03
363,,14.0,-3.0,52.0,19.15,10.0,2022-05-04


In [50]:
#Creil
Creil_pol = emission[emission['nom_com'] == 'Creil']
Creil_pol = Creil_pol[['date_debut', 'nom_poll', 'valeur']]
Creil_pol = Creil_pol.pivot(index=['date_debut'], columns='nom_poll')
Creil_pol = Creil_pol.set_axis(['Dioxyde d\'azote', 'Ozone', 'Particules PM10', 'Particules fines PM2.5'], axis='columns')
Creil_pol['DATE'] = Creil_pol.index
Creil_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Creil_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Creil_pol.to_csv('polluant_values_dataset/pol_creil.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_creil.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,19.0,,41.0,7.6,5.0,2021-05-06
1,,15.0,,49.0,11.9,6.0,2021-05-07
2,,15.0,,50.0,10.6,8.0,2021-05-08
3,,10.0,,63.0,12.8,8.0,2021-05-09
4,,7.0,,68.0,7.1,4.0,2021-05-10
...,...,...,...,...,...,...,...
360,,13.0,,72.0,16.2,11.0,2022-05-01
361,,19.0,,69.0,28.1,19.0,2022-05-02
362,,15.0,,70.0,21.2,16.0,2022-05-03
363,,14.0,,60.0,18.3,14.0,2022-05-04


In [51]:
#Douai
Douai_pol = emission[emission['nom_com'] == 'Douai']
Douai_pol = Douai_pol[['date_debut', 'nom_poll', 'valeur']]
Douai_pol = Douai_pol.pivot(index=['date_debut'], columns='nom_poll')
Douai_pol = Douai_pol.set_axis(['Dioxyde d\'azote', 'Ozone', 'Particules PM10', 'Particules fines PM2.5'], axis='columns')
Douai_pol['DATE'] = Douai_pol.index
Douai_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Douai_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Douai_pol.to_csv('polluant_values_dataset/pol_douai.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_douai.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,9.0,,56.0,8.3,,2021-05-06
1,,12.0,,55.0,10.7,,2021-05-07
2,,8.0,,54.0,11.5,,2021-05-08
3,,5.0,,67.0,12.0,,2021-05-09
4,,5.0,,70.0,7.6,,2021-05-10
...,...,...,...,...,...,...,...
360,,16.0,,64.0,20.5,12.0,2022-05-01
361,,19.0,,71.0,28.9,19.0,2022-05-02
362,,20.0,,59.0,24.8,14.0,2022-05-03
363,,19.0,,57.0,23.5,11.0,2022-05-04


In [52]:
#Dunkerque
Dunkerque_pol = emission[emission['nom_com'] == 'Dunkerque']
Dunkerque_pol = Dunkerque_pol[['date_debut', 'nom_poll', 'valeur']]
Dunkerque_pol = Dunkerque_pol.pivot_table(index=['date_debut'], columns='nom_poll')
Dunkerque_pol = Dunkerque_pol.set_axis(['Benzène', 'Dioxyde d\'azote', 'Dioxyde de soufre', 'Ozone', 'Particules PM10', 'Particules fines PM2.5'], axis='columns')
Dunkerque_pol['DATE'] = Dunkerque_pol.index
Dunkerque_pol.to_csv('polluant_values_dataset/pol_dunkerque.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_dunkerque.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,0.27,13.5,1.0,55.0,11.033333,5.0,2021-05-06
1,0.75,10.0,1.0,66.0,14.366667,,2021-05-07
2,0.22,9.5,0.5,52.0,10.733333,5.0,2021-05-08
3,3.88,13.5,-0.5,52.0,18.166667,,2021-05-09
4,0.18,6.5,0.5,68.0,8.700000,,2021-05-10
...,...,...,...,...,...,...,...
360,1.40,20.0,10.0,55.0,24.433333,15.0,2022-05-01
361,0.99,22.0,1.0,61.0,22.550000,15.0,2022-05-02
362,1.03,12.0,1.0,72.0,21.750000,10.0,2022-05-03
363,0.46,17.5,2.0,48.0,28.550000,12.0,2022-05-04


In [53]:
#Gravelines
Gravelines_pol = emission[emission['nom_com'] == 'Gravelines']
Gravelines_pol = Gravelines_pol[['date_debut', 'nom_poll', 'valeur']]
Gravelines_pol = Gravelines_pol.pivot(index=['date_debut'], columns='nom_poll')
Gravelines_pol = Gravelines_pol.set_axis(['Dioxyde de soufre', 'Particules PM10'], axis='columns')
Gravelines_pol['DATE'] = Gravelines_pol.index
Gravelines_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Gravelines_pol.insert(1, "Dioxyde d'azote", float('nan'), allow_duplicates=False)
Gravelines_pol.insert(4, "Particules fines PM2.5", float('nan'), allow_duplicates=False)
Gravelines_pol.insert(3, "Ozone", float('nan'), allow_duplicates=False)
Gravelines_pol.to_csv('polluant_values_dataset/pol_gravelines.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_gravelines.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,,0.0,,10.9,,2021-05-06
1,,,0.0,,12.0,,2021-05-07
2,,,0.0,,10.9,,2021-05-08
3,,,1.0,,14.2,,2021-05-09
4,,,0.0,,11.2,,2021-05-10
...,...,...,...,...,...,...,...
360,,,3.0,,19.0,,2022-05-01
361,,,4.0,,30.4,,2022-05-02
362,,,4.0,,24.0,,2022-05-03
363,,,3.0,,21.2,,2022-05-04


In [54]:
#Lens
Lens_pol = emission[emission['nom_com'] == 'Lens']
Lens_pol = Lens_pol[['date_debut', 'nom_poll', 'valeur']]
Lens_pol = Lens_pol.pivot(index=['date_debut'], columns='nom_poll')
Lens_pol = Lens_pol.set_axis(['Particules PM10'], axis='columns')
# Use this table's own index for DATE (the original cell copied Dunkerque_pol.index here)
Lens_pol['DATE'] = Lens_pol.index
Lens_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Lens_pol.insert(1, "Dioxyde d'azote", float('nan'), allow_duplicates=False)
Lens_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Lens_pol.insert(3, "Ozone", float('nan'), allow_duplicates=False)
Lens_pol.insert(4, "Particules fines PM2.5", float('nan'), allow_duplicates=False)
Lens_pol.to_csv('polluant_values_dataset/pol_lens.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_lens.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules fines PM2.5,Particules PM10,DATE
0,,,,,,9.1,2021-05-06
1,,,,,,10.7,2021-05-07
2,,,,,,9.2,2021-05-08
3,,,,,,12.3,2021-05-09
4,,,,,,,2021-05-10
...,...,...,...,...,...,...,...
360,,,,,,23.0,2022-05-01
361,,,,,,35.8,2022-05-02
362,,,,,,28.6,2022-05-03
363,,,,,,25.2,2022-05-04


In [55]:
#Lille
Lille_pol = emission[emission['nom_com'] == 'Lille']
Lille_pol = Lille_pol[['date_debut', 'nom_poll', 'valeur']]
Lille_pol = Lille_pol.pivot_table(index=['date_debut'], columns='nom_poll')
Lille_pol = Lille_pol.set_axis(['Benzène', 'Dioxyde d\'azote', 'Ozone', 'Particules PM10', 'Particules fines PM2.5'], axis='columns')
Lille_pol['DATE'] = Lille_pol.index
Lille_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Lille_pol.to_csv('polluant_values_dataset/pol_lille.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_lille.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,0.26,19.0,,47.0,11.5,6.0,2021-05-06
1,0.34,20.0,,53.0,14.4,8.0,2021-05-07
2,0.22,11.5,,51.0,13.9,8.5,2021-05-08
3,0.12,8.5,,64.0,15.9,8.5,2021-05-09
4,0.10,9.5,,63.0,12.5,4.0,2021-05-10
...,...,...,...,...,...,...,...
345,,22.0,,,,16.0,2022-05-01
346,,29.0,,,,21.0,2022-05-02
347,,24.0,,,,18.0,2022-05-03
348,,20.0,,,,13.0,2022-05-04


In [56]:
#Maubeuge
Maubeuge_pol = emission[emission['nom_com'] == 'Maubeuge']
Maubeuge_pol = Maubeuge_pol[['date_debut', 'nom_poll', 'valeur']]
Maubeuge_pol = Maubeuge_pol.pivot(index=['date_debut'], columns='nom_poll')
Maubeuge_pol = Maubeuge_pol.set_axis(['Dioxyde d\'azote', 'Ozone', 'Particules PM10'], axis='columns')
Maubeuge_pol['DATE'] = Maubeuge_pol.index
Maubeuge_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Maubeuge_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Maubeuge_pol.insert(4, "Particules fines PM2.5", float('nan'), allow_duplicates=False)
Maubeuge_pol.to_csv('polluant_values_dataset/pol_maubeuge.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_maubeuge.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules fines PM2.5,Particules PM10,DATE
0,,12.0,,61.0,,10.9,2021-05-06
1,,15.0,,62.0,,10.7,2021-05-07
2,,6.0,,66.0,,8.7,2021-05-08
3,,5.0,,73.0,,11.6,2021-05-09
4,,7.0,,72.0,,9.1,2021-05-10
...,...,...,...,...,...,...,...
360,,9.0,,72.0,,14.2,2022-05-01
361,,13.0,,70.0,,20.4,2022-05-02
362,,13.0,,70.0,,18.5,2022-05-03
363,,11.0,,64.0,,21.6,2022-05-04


In [57]:
#Nogent-sur-Oise
Nogent_sur_Oise_pol = emission[emission['nom_com'] == 'Nogent-sur-Oise']
Nogent_sur_Oise_pol = Nogent_sur_Oise_pol[['date_debut', 'nom_poll', 'valeur']]
Nogent_sur_Oise_pol = Nogent_sur_Oise_pol.pivot(index=['date_debut'], columns='nom_poll')
Nogent_sur_Oise_pol = Nogent_sur_Oise_pol.set_axis(['Dioxyde d\'azote', 'Ozone', 'Particules PM10'], axis='columns')
Nogent_sur_Oise_pol['DATE'] = Nogent_sur_Oise_pol.index
Nogent_sur_Oise_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Nogent_sur_Oise_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Nogent_sur_Oise_pol.insert(4, "Particules fines PM2.5", float('nan'), allow_duplicates=False)
Nogent_sur_Oise_pol.to_csv('polluant_values_dataset/pol_nogent_sur_oise.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_nogent_sur_oise.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules fines PM2.5,Particules PM10,DATE
0,,20.0,,38.0,,10.3,2021-05-06
1,,16.0,,49.0,,10.4,2021-05-07
2,,13.0,,51.0,,7.3,2021-05-08
3,,9.0,,63.0,,11.3,2021-05-09
4,,8.0,,64.0,,9.8,2021-05-10
...,...,...,...,...,...,...,...
360,,,,,,,2022-05-01
361,,,,,,,2022-05-02
362,,,,,,,2022-05-03
363,,,,,,,2022-05-04


In [58]:
#Roubaix
Roubaix_pol = emission[emission['nom_com'] == 'Roubaix']
Roubaix_pol = Roubaix_pol[['date_debut', 'nom_poll', 'valeur']]
Roubaix_pol = Roubaix_pol.pivot(index=['date_debut'], columns='nom_poll')
Roubaix_pol = Roubaix_pol.set_axis(['Dioxyde d\'azote', 'Particules PM10', 'Particules fines PM2.5'], axis='columns')
Roubaix_pol['DATE'] = Roubaix_pol.index
Roubaix_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Roubaix_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Roubaix_pol.insert(3, "Ozone", float('nan'), allow_duplicates=False)
Roubaix_pol.to_csv('polluant_values_dataset/pol_roubaix.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_roubaix.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,28.0,,,14.1,10.0,2021-05-06
1,,36.0,,,17.9,13.0,2021-05-07
2,,16.0,,,13.8,11.0,2021-05-08
3,,11.0,,,13.9,9.0,2021-05-09
4,,16.0,,,9.6,7.0,2021-05-10
...,...,...,...,...,...,...,...
360,,24.0,,,26.0,19.0,2022-05-01
361,,27.0,,,,25.0,2022-05-02
362,,23.0,,,,20.0,2022-05-03
363,,28.0,,,,16.0,2022-05-04


In [59]:
#Roye
Roye_pol = emission[emission['nom_com'] == 'Roye']
Roye_pol = Roye_pol[['date_debut', 'nom_poll', 'valeur']]
Roye_pol = Roye_pol.pivot(index=['date_debut'], columns='nom_poll')
Roye_pol = Roye_pol.set_axis(['Ozone'], axis='columns')
Roye_pol['DATE'] = Roye_pol.index
Roye_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Roye_pol.insert(1, "Dioxyde d'azote", float('nan'), allow_duplicates=False)
Roye_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Roye_pol.insert(4, "Particules PM10", float('nan'), allow_duplicates=False)
Roye_pol.insert(5, "Particules fines PM2.5", float('nan'), allow_duplicates=False)
Roye_pol.to_csv('polluant_values_dataset/pol_roye.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_roye.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,,,55.0,,,2021-05-06
1,,,,56.0,,,2021-05-07
2,,,,57.0,,,2021-05-08
3,,,,62.0,,,2021-05-09
4,,,,65.0,,,2021-05-10
...,...,...,...,...,...,...,...
360,,,,72.0,,,2022-05-01
361,,,,76.0,,,2022-05-02
362,,,,72.0,,,2022-05-03
363,,,,67.0,,,2022-05-04


In [60]:
#Saint-Amand-les-Eaux
Saint_Amand_les_Eaux_pol = emission[emission['nom_com'] == 'Saint-Amand-les-Eaux']
Saint_Amand_les_Eaux_pol = Saint_Amand_les_Eaux_pol[['date_debut', 'nom_poll', 'valeur']]
Saint_Amand_les_Eaux_pol = Saint_Amand_les_Eaux_pol.pivot(index=['date_debut'], columns='nom_poll')
Saint_Amand_les_Eaux_pol = Saint_Amand_les_Eaux_pol.set_axis(['Dioxyde d\'azote', 'Ozone'], axis='columns')
Saint_Amand_les_Eaux_pol['DATE'] = Saint_Amand_les_Eaux_pol.index
Saint_Amand_les_Eaux_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Saint_Amand_les_Eaux_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Saint_Amand_les_Eaux_pol.insert(4, "Particules PM10", float('nan'), allow_duplicates=False)
Saint_Amand_les_Eaux_pol.insert(5, "Particules fines PM2.5", float('nan'), allow_duplicates=False)
Saint_Amand_les_Eaux_pol.to_csv('polluant_values_dataset/pol_saint_amand_les_eaux.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_saint_amand_les_eaux.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,8.0,,57.0,,,2021-05-06
1,,8.0,,54.0,,,2021-05-07
2,,6.0,,58.0,,,2021-05-08
3,,4.0,,67.0,,,2021-05-09
4,,4.0,,71.0,,,2021-05-10
...,...,...,...,...,...,...,...
360,,11.0,,58.0,,,2022-05-01
361,,14.0,,66.0,,,2022-05-02
362,,12.0,,57.0,,,2022-05-03
363,,11.0,,52.0,,,2022-05-04


In [61]:
#Saint-Omer
Saint_Omer_pol = emission[emission['nom_com'] == 'Saint-Omer']
Saint_Omer_pol = Saint_Omer_pol[['date_debut', 'nom_poll', 'valeur']]
Saint_Omer_pol = Saint_Omer_pol.pivot(index=['date_debut'], columns='nom_poll')
Saint_Omer_pol = Saint_Omer_pol.set_axis(['Dioxyde d\'azote', 'Ozone', 'Particules PM10'], axis='columns')
Saint_Omer_pol['DATE'] = Saint_Omer_pol.index
Saint_Omer_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Saint_Omer_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Saint_Omer_pol.insert(5, "Particules fines PM2.5", float('nan'), allow_duplicates=False)
Saint_Omer_pol.to_csv('polluant_values_dataset/pol_saint_omer.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_saint_omer.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,12.0,,47.0,11.9,,2021-05-06
1,,10.0,,60.0,12.4,,2021-05-07
2,,9.0,,48.0,12.9,,2021-05-08
3,,9.0,,54.0,15.6,,2021-05-09
4,,6.0,,63.0,10.3,,2021-05-10
...,...,...,...,...,...,...,...
360,,13.0,,,19.6,,2022-05-01
361,,13.0,,,26.7,,2022-05-02
362,,14.0,,56.0,27.1,,2022-05-03
363,,14.0,,47.0,19.7,,2022-05-04


In [62]:
#Sangatte
Sangatte_pol = emission[emission['nom_com'] == 'Sangatte']
Sangatte_pol = Sangatte_pol[['date_debut', 'nom_poll', 'valeur']]
Sangatte_pol = Sangatte_pol.pivot_table(index=['date_debut'], columns='nom_poll')
Sangatte_pol = Sangatte_pol.set_axis(['Dioxyde d\'azote', 'Ozone', 'Particules PM10'], axis='columns')
Sangatte_pol['DATE'] = Sangatte_pol.index
Sangatte_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Sangatte_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Sangatte_pol.insert(5, "Particules fines PM2.5", float('nan'), allow_duplicates=False)
Sangatte_pol.to_csv('polluant_values_dataset/pol_sangatte.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_sangatte.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,6.0,,,,,2021-05-06
1,,8.0,,,,,2021-05-07
2,,5.0,,,,,2021-05-08
3,,6.0,,,,,2021-05-09
4,,4.0,,,,,2021-05-10
...,...,...,...,...,...,...,...
360,,12.0,,51.0,13.2,,2022-05-01
361,,28.0,,41.0,,,2022-05-02
362,,14.0,,53.0,17.2,,2022-05-03
363,,8.0,,56.0,,,2022-05-04


In [63]:
#Valenciennes
Valenciennes_pol = emission[emission['nom_com'] == 'Valenciennes']
Valenciennes_pol = Valenciennes_pol[['date_debut', 'nom_poll', 'valeur']]
Valenciennes_pol = Valenciennes_pol.pivot_table(index=['date_debut'], columns='nom_poll')
Valenciennes_pol = Valenciennes_pol.set_axis(['Dioxyde d\'azote', 'Ozone', 'Particules PM10', 'Particules fines PM2.5'], axis='columns')
Valenciennes_pol['DATE'] = Valenciennes_pol.index
Valenciennes_pol.insert(0, "Benzène", float('nan'), allow_duplicates=False)
Valenciennes_pol.insert(2, "Dioxyde de soufre", float('nan'), allow_duplicates=False)
Valenciennes_pol.to_csv('polluant_values_dataset/pol_valenciennes.csv', index=False)
pd.read_csv('polluant_values_dataset/pol_valenciennes.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE
0,,18.0,,60.0,11.85,7.0,2021-05-06
1,,24.0,,59.0,13.85,9.0,2021-05-07
2,,9.5,,63.0,12.65,9.0,2021-05-08
3,,6.5,,70.0,13.15,8.0,2021-05-09
4,,11.5,,71.0,12.35,8.0,2021-05-10
...,...,...,...,...,...,...,...
360,,16.0,,72.0,21.90,18.0,2022-05-01
361,,20.0,,74.0,31.15,23.0,2022-05-02
362,,17.0,,65.0,23.95,17.0,2022-05-03
363,,19.0,,57.0,27.10,16.0,2022-05-04


In [None]:
# town_list_1 = ['Amiens', 'Beauvais', 'Béthune', 'Boulogne-sur-Mer', 'Calais', 'Creil', 'Douai', 'Dunkerque', 'Gravelines', 'Lens', 'Lille', 'Maubeuge', 'Nogent-sur-Oise', 'Roubaix', 'Roye', 'Saint-Amand-les-Eaux', 'Saint-Omer', 'Sangatte', 'Valenciennes']
# town_list_2 = ['Amiens_pol', 'Beauvais_pol', 'Bethune_pol', 'Boulogne_sur_Mer_pol', 'Calais_pol', 'Creil_pol', 'Douai_pol', 'Dunkerque_pol', 'Gravelines_pol', 'Lens_pol', 'Lille_pol', 'Maubeuge_pol', 'Nogent_sur_Oise_pol', 'Roubaix_pol', 'Roye_pol', 'Saint_Amand_les_Eaux_pol', 'Saint_Omer_pol', 'Sangatte_pol', 'Valenciennes_pol']

In [None]:
# for i in town_list_2:
#     for j in town_list_1:
#         x = i
#         town_name = i
#         town_name = df[df['nom_com'] == j]
#         town_name = town_name[['date_debut', 'nom_poll', 'valeur']]
#         town_name = town_name.pivot_table(index=['date_debut'], columns='nom_poll')
#         #i.set_axis(['Dioxyde d\'azote', 'Ozone','Particules PM10', 'Particules fines PM2.5'], axis='columns', inplace=True)
#         town_name['date'] = town_name.index
#         town_name.to_csv(f'{x}.csv', index=False)
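The commented-out loop above tries to replace the per-town cells with a single pass. One possible shape for that idea is a small helper that pivots one town and then reindexes to a fixed pollutant list, so that unmeasured pollutants appear as empty columns without manual `insert` calls. The names `town_pollutants` and `POLLUTANTS` are hypothetical, and the toy data is illustrative; this is a sketch, not the notebook's method.

```python
import pandas as pd

# Common column layout, taken from the headers of the per-town outputs
POLLUTANTS = ["Benzène", "Dioxyde d'azote", "Dioxyde de soufre", "Ozone",
              "Particules PM10", "Particules fines PM2.5"]

def town_pollutants(emission: pd.DataFrame, town: str) -> pd.DataFrame:
    """Hypothetical helper: one row per day, one column per pollutant."""
    sub = emission.loc[emission["nom_com"] == town,
                       ["date_debut", "nom_poll", "valeur"]]
    # Mean over duplicate (day, pollutant) pairs, as pivot_table does by default
    wide = sub.pivot_table(index="date_debut", columns="nom_poll", values="valeur")
    # Pollutants not measured in this town become all-NaN columns
    wide = wide.reindex(columns=POLLUTANTS)
    wide.columns.name = None
    wide["DATE"] = wide.index
    return wide.reset_index(drop=True)

# Toy stand-in for the `emission` table (values are illustrative only)
toy = pd.DataFrame({
    "nom_com": ["Roye", "Roye", "Amiens"],
    "date_debut": ["2021-05-06", "2021-05-07", "2021-05-06"],
    "nom_poll": ["Ozone", "Ozone", "Ozone"],
    "valeur": [55.0, 56.0, 55.0],
})
roye = town_pollutants(toy, "Roye")
print(roye)
```

Because `pivot_table` aggregates, its handling of days without any valid value may differ from `pivot`, so row counts should be compared with the per-town outputs before switching.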

In [None]:
# Split the emission table by town:
# df_name_list = []
# for i in town_list:
#     x = f'df_{i}'
#     df_name_list.append(x)

# for j in df_name_list:
#     for k in town_list:
#         j = df[df['nom_com'] == k]
#     #f'df_{i}' = df[df['nom_com'] == i]
# df_Amiens
# #df_name_list

In [44]:
# Test: move the values into columns without 'pivot' --> fails because of the multi-index
# for i in pol_list:
#     emission[i] = 0
#     emission[i].loc[emission['nom_poll'] == f'{i}'] = emission['valeur'].loc[emission['nom_poll'] == f'{i}']
# emission
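A pivot-free way to reach the same wide layout is to group by day and pollutant and then unstack the pollutant level. This is offered as a sketch of an alternative, on toy values, and is not what the notebook finally uses.

```python
import pandas as pd

# Toy long-format data with the notebook's column names (illustrative values)
toy = pd.DataFrame({
    "date_debut": ["2021-05-06", "2021-05-06", "2021-05-07", "2021-05-07"],
    "nom_poll": ["Ozone", "Particules PM10", "Ozone", "Particules PM10"],
    "valeur": [55.0, 8.8, 53.0, 12.2],
})

# Average per (day, pollutant), then move the pollutant level into columns
wide = (toy.groupby(["date_debut", "nom_poll"])["valeur"]
           .mean()
           .unstack("nom_poll"))
print(wide)
```

Selecting the single `valeur` column before aggregating keeps the resulting column labels flat, which avoids the multi-index issue mentioned in the comment.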

# 8) Merging the emission tables (pol_'town') with the ind_met_'town' tables

In [22]:
# Build a list of (pol file, ind_met file) tuples, one per town
tuple_list = [('pol_amiens.csv', 'iaq_met_amiens.csv'), ('pol_beauvais.csv', 'iaq_met_beauvais.csv'), ('pol_bethune.csv', 'iaq_met_bethune.csv'), ('pol_boulogne_sur_mer.csv', 'iaq_met_boulogne_sur_mer.csv'), ('pol_calais.csv', 'iaq_met_calais.csv'), ('pol_creil.csv', 'iaq_met_creil.csv'), ('pol_douai.csv', 'iaq_met_douai.csv'), ('pol_dunkerque.csv', 'iaq_met_dunkerque.csv'), ('pol_gravelines.csv', 'iaq_met_gravelines.csv'), ('pol_lens.csv', 'iaq_met_lens.csv'), ('pol_lille.csv', 'iaq_met_lille.csv'), ('pol_maubeuge.csv', 'iaq_met_maubeuge.csv'), ('pol_nogent_sur_oise.csv', 'iaq_met_nogent_sur_oise.csv'), ('pol_roubaix.csv', 'iaq_met_roubaix.csv'), ('pol_roye.csv', 'iaq_met_roye.csv'), ('pol_saint_amand_les_eaux.csv', 'iaq_met_saint_amand_les_eaux.csv'), ('pol_saint_omer.csv', 'iaq_met_saint_omer.csv'), ('pol_sangatte.csv', 'iaq_met_sangatte.csv'), ('pol_valenciennes.csv', 'iaq_met_valenciennes.csv')]

In [28]:
import re

for i in tuple_list:
    # i[0] is the pollutant file, i[1] the matching air-quality-index + weather file
    pol = pd.read_csv(f'polluant_values_dataset/{i[0]}')
    # Drop any time-of-day part, then convert back to datetime so both keys have the same type
    pol['DATE'] = pd.to_datetime(pol['DATE']).dt.date
    pol['DATE'] = pd.to_datetime(pol['DATE'])
    iaq_met = pd.read_csv(f'iaq_met_datasets/{i[1]}')
    iaq_met['DATE'] = pd.to_datetime(iaq_met['DATE']).dt.date
    iaq_met['DATE'] = pd.to_datetime(iaq_met['DATE'])
    # Town slug, e.g. 'amiens' from 'pol_amiens.csv'
    x = re.search(r'pol_(.+?)\.csv', i[0]).group(1)
    # Inner join: keep only the dates present in both tables
    pol_ind_met = pd.merge(pol, iaq_met, on='DATE', how='inner')
    pol_ind_met.to_csv(f'iaq_met_pol_datasets/pol_iaq_met_{x}.csv', index=False)

In [32]:
# Example with the town of Amiens
pd.read_csv('iaq_met_pol_datasets/pol_iaq_met_amiens.csv')

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE,type,geometry,properties.gml_id,properties.aasqa,properties.date_maj,properties.partition_field,properties.code_no2,properties.code_o3,properties.code_pm10,properties.code_pm25,properties.code_qual,properties.code_so2,properties.code_zone,properties.coul_qual,properties.date_dif,properties.epsg_reg,properties.lib_qual,properties.lib_zone,properties.source,properties.type_zone,properties.x_reg,properties.x_wgs84,properties.y_reg,properties.y_wgs84,MAX_TEMPERATURE_C,MIN_TEMPERATURE_C,WINDSPEED_MAX_KMH,TEMPERATURE_MORNING_C,TEMPERATURE_NOON_C,TEMPERATURE_EVENING_C,PRECIP_TOTAL_DAY_MM,HUMIDITY_MAX_PERCENT,VISIBILITY_AVG_KM,PRESSURE_MAX_MB,CLOUDCOVER_AVG_PERCENT,HEATINDEX_MAX_C,DEWPOINT_MAX_C,WINDTEMP_MAX_C,WEATHER_CODE_MORNING,WEATHER_CODE_NOON,WEATHER_CODE_EVENING,TOTAL_SNOW_MM,UV_INDEX,SUNHOUR,OPINION,SUNSET,SUNRISE,TEMPERATURE_NIGHT_C
0,,11.0,,55.0,8.8,,2021-05-06,Feature,,365623018,32,2021/10/07 17:38:37.808+02,322021w18,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,9,3,15,4,9,8,1.5,85,8.75,1011,83.000,9,6,1,116,122,296,0.0,3,10.4,météo très défavorable,20:17:00,05:19:00,3
1,,13.0,,53.0,12.2,,2021-05-07,Feature,,365626807,32,2021/10/07 17:38:37.808+02,322021w18,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,12,3,14,4,11,11,0.1,90,9.00,1016,19.750,12,3,2,116,116,116,0.0,2,14.5,météo très défavorable,20:18:00,05:17:00,3
2,,8.0,,43.0,10.4,,2021-05-08,Feature,,365630596,32,2021/10/07 17:38:37.808+02,322021w18,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,17,3,28,6,9,17,2.0,91,9.00,1018,61.000,18,13,2,116,266,176,0.0,4,10.4,météo correcte,20:20:00,05:16:00,3
3,,7.0,,48.0,13.8,,2021-05-09,Feature,,365634385,32,2021/10/07 17:38:37.808+02,322021w18,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,20,12,31,14,20,17,5.2,91,9.00,1005,72.500,22,15,12,386,353,353,0.0,5,9.4,météo défavorable,20:21:00,05:14:00,12
4,,5.0,,60.0,7.8,,2021-05-10,Feature,,365638174,32,2021/10/07 17:38:37.808+02,322021w19,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,16,11,27,11,15,14,2.1,92,9.75,1008,77.250,16,13,9,353,122,353,0.0,3,10.5,météo défavorable,20:23:00,05:13:00,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
549,,15.0,,68.0,24.2,20.0,2022-04-28,Feature,,1738401187,32,2022/04/29 13:32:26.692+02,322022w17,1,3,2,2,3,1,80021,#F0E641,2022/04/29,2154,Dégradé,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,20,4,22,4,19,20,0.0,92,10.00,1029,4.625,20,12,1,113,113,113,0.0,4,14.5,météo favorable,21:05:00,06:33:00,5
550,,13.0,,59.0,29.1,24.0,2022-04-29,Feature,,1738401188,32,2022/04/29 13:32:26.692+02,322022w17,1,2,2,2,2,1,80021,#50CCAA,2022/04/29,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,17,7,15,7,15,16,0.0,88,10.00,1030,61.375,17,9,5,116,122,116,0.0,4,12.4,météo correcte,21:06:00,06:31:00,7
551,,13.0,,59.0,29.1,24.0,2022-04-29,Feature,,1743148136,32,2022/04/30 13:31:26.763+02,322022w17,1,2,2,2,2,1,80021,#50CCAA,2022/04/30,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,17,7,15,7,15,16,0.0,88,10.00,1030,61.375,17,9,5,116,122,116,0.0,4,12.4,météo correcte,21:06:00,06:31:00,7
552,,10.0,,70.0,12.7,12.0,2022-04-30,Feature,,1743148137,32,2022/04/30 13:31:26.763+02,322022w17,1,2,2,2,2,1,80021,#50CCAA,2022/04/30,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.2898,6978201.0,49.90044,16,4,20,4,16,15,0.0,96,9.00,1028,11.875,16,8,1,143,113,113,0.0,4,14.5,météo correcte,21:08:00,06:29:00,4


# 9) Merging the ind_met_pol_'city' tables

In [34]:
ind_met_pol_list = [
    'pol_iaq_met_amiens.csv',
    'pol_iaq_met_beauvais.csv',
    'pol_iaq_met_bethune.csv',
    'pol_iaq_met_boulogne_sur_mer.csv',
    'pol_iaq_met_calais.csv',
    'pol_iaq_met_creil.csv',
    'pol_iaq_met_douai.csv',
    'pol_iaq_met_dunkerque.csv',
    'pol_iaq_met_gravelines.csv',
    'pol_iaq_met_lens.csv',
    'pol_iaq_met_lille.csv',
    'pol_iaq_met_maubeuge.csv',
    'pol_iaq_met_nogent_sur_oise.csv',
    'pol_iaq_met_roubaix.csv',
    'pol_iaq_met_roye.csv',
    'pol_iaq_met_saint_Amand_les_Eaux.csv',
    'pol_iaq_met_saint_omer.csv',
    'pol_iaq_met_sangatte.csv',
    'pol_iaq_met_valenciennes.csv',
]

In [40]:
table_list = []
for filename in ind_met_pol_list:
    # Read each per-city table; the city is already recorded in properties.lib_zone
    df = pd.read_csv(f'iaq_met_pol_datasets/{filename}')
    table_list.append(df)
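
Each file name encodes its city. If a per-city label is needed (for example to name intermediate tables or to tag rows), it can be recovered with a regular expression. The sketch below is only an illustration on three of the file names listed above; the names `pattern`, `filenames` and `cities` are introduced here and are not part of the notebook. The dot before `csv` is escaped so that it is not captured along with the city name.

```python
import re

# Three of the file names from ind_met_pol_list, used as sample input
filenames = [
    'pol_iaq_met_amiens.csv',
    'pol_iaq_met_boulogne_sur_mer.csv',
    'pol_iaq_met_saint_Amand_les_Eaux.csv',
]

# Capture everything between the fixed prefix and the ".csv" extension
pattern = re.compile(r'pol_iaq_met_(.+)\.csv$')

cities = [pattern.search(name).group(1) for name in filenames]
print(cities)
```

The captured names keep the underscores and capitalisation of the file names, so they may need normalising before being compared with `properties.lib_zone`.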

In [42]:
# Stack the per-city tables row-wise; each table keeps its own row index
table = pd.concat(table_list, axis=0)
table

Unnamed: 0,Benzène,Dioxyde d'azote,Dioxyde de soufre,Ozone,Particules PM10,Particules fines PM2.5,DATE,type,geometry,properties.gml_id,properties.aasqa,properties.date_maj,properties.partition_field,properties.code_no2,properties.code_o3,properties.code_pm10,properties.code_pm25,properties.code_qual,properties.code_so2,properties.code_zone,properties.coul_qual,properties.date_dif,properties.epsg_reg,properties.lib_qual,properties.lib_zone,properties.source,properties.type_zone,properties.x_reg,properties.x_wgs84,properties.y_reg,properties.y_wgs84,MAX_TEMPERATURE_C,MIN_TEMPERATURE_C,WINDSPEED_MAX_KMH,TEMPERATURE_MORNING_C,TEMPERATURE_NOON_C,TEMPERATURE_EVENING_C,PRECIP_TOTAL_DAY_MM,HUMIDITY_MAX_PERCENT,VISIBILITY_AVG_KM,PRESSURE_MAX_MB,CLOUDCOVER_AVG_PERCENT,HEATINDEX_MAX_C,DEWPOINT_MAX_C,WINDTEMP_MAX_C,WEATHER_CODE_MORNING,WEATHER_CODE_NOON,WEATHER_CODE_EVENING,TOTAL_SNOW_MM,UV_INDEX,SUNHOUR,OPINION,SUNSET,SUNRISE,TEMPERATURE_NIGHT_C
0,,11.0,,55.0,8.80,,2021-05-06,Feature,,365623018,32,2021/10/07 17:38:37.808+02,322021w18,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.28980,6978201.0,49.90044,9,3,15,4,9,8,1.5,85,8.75,1011,83.000,9,6,1,116,122,296,0.0,3,10.4,météo très défavorable,20:17:00,05:19:00,3
1,,13.0,,53.0,12.20,,2021-05-07,Feature,,365626807,32,2021/10/07 17:38:37.808+02,322021w18,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.28980,6978201.0,49.90044,12,3,14,4,11,11,0.1,90,9.00,1016,19.750,12,3,2,116,116,116,0.0,2,14.5,météo très défavorable,20:18:00,05:17:00,3
2,,8.0,,43.0,10.40,,2021-05-08,Feature,,365630596,32,2021/10/07 17:38:37.808+02,322021w18,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.28980,6978201.0,49.90044,17,3,28,6,9,17,2.0,91,9.00,1018,61.000,18,13,2,116,266,176,0.0,4,10.4,météo correcte,20:20:00,05:16:00,3
3,,7.0,,48.0,13.80,,2021-05-09,Feature,,365634385,32,2021/10/07 17:38:37.808+02,322021w18,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.28980,6978201.0,49.90044,20,12,31,14,20,17,5.2,91,9.00,1005,72.500,22,15,12,386,353,353,0.0,5,9.4,météo défavorable,20:21:00,05:14:00,12
4,,5.0,,60.0,7.80,,2021-05-10,Feature,,365638174,32,2021/10/07 17:38:37.808+02,322021w19,1,2,1,1,2,1,80021,#50CCAA,2021/10/07,2154,Moyen,AMIENS,Atmo HDF,commune,648935.0,2.28980,6978201.0,49.90044,16,11,27,11,15,14,2.1,92,9.75,1008,77.250,16,13,9,353,122,353,0.0,3,10.5,météo défavorable,20:23:00,05:13:00,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
515,,19.5,,70.0,30.70,19.0,2022-04-28,Feature,,1738396237,32,2022/04/29 13:32:26.692+02,322022w17,1,3,2,2,3,1,59606,#F0E641,2022/04/29,2154,Dégradé,VALENCIENNES,Atmo HDF,commune,736780.0,3.51617,7029164.0,50.35911,20,5,18,5,20,20,0.0,90,10.00,1029,3.500,20,11,2,113,113,113,0.0,5,14.5,météo favorable,21:01:00,06:27:00,5
516,,22.5,,49.0,36.85,24.0,2022-04-29,Feature,,1743143186,32,2022/04/30 13:31:26.763+02,322022w17,1,2,2,3,3,1,59606,#F0E641,2022/04/30,2154,Dégradé,VALENCIENNES,Atmo HDF,commune,736780.0,3.51617,7029164.0,50.35911,16,7,17,7,14,15,0.0,84,10.00,1030,78.375,16,7,5,122,119,119,0.0,3,11.3,météo défavorable,21:03:00,06:25:00,7
517,,22.5,,49.0,36.85,24.0,2022-04-29,Feature,,1738396238,32,2022/04/29 13:32:26.692+02,322022w17,1,2,2,2,2,1,59606,#50CCAA,2022/04/29,2154,Moyen,VALENCIENNES,Atmo HDF,commune,736780.0,3.51617,7029164.0,50.35911,16,7,17,7,14,15,0.0,84,10.00,1030,78.375,16,7,5,122,119,119,0.0,3,11.3,météo défavorable,21:03:00,06:25:00,7
518,,11.0,,73.0,15.70,12.0,2022-04-30,Feature,,1746298149,32,2022/05/01 13:30:27.894+02,322022w17,1,2,1,1,2,1,59606,#50CCAA,2022/05/01,2154,Moyen,VALENCIENNES,Atmo HDF,commune,736780.0,3.51617,7029164.0,50.35911,16,4,22,4,16,15,0.0,95,10.00,1028,16.375,16,7,1,116,113,113,0.0,4,14.5,météo correcte,21:04:00,06:23:00,4
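
In the output above, 2022-04-29 appears twice for VALENCIENNES (rows 516 and 517), with different `properties.date_maj` values and different quality labels. One possible way to handle this, sketched below, is to keep only the most recently updated row per date and zone. This rests on an assumption that is not confirmed by the notebook: that a later `properties.date_maj` supersedes an earlier one. The sample frame reproduces four columns of rows 515 to 518; the names `sample` and `deduplicated` are introduced for the illustration.

```python
import pandas as pd

# Four columns taken from rows 515-518 of the merged table shown above
sample = pd.DataFrame({
    "DATE": ["2022-04-28", "2022-04-29", "2022-04-29", "2022-04-30"],
    "properties.lib_zone": ["VALENCIENNES"] * 4,
    "properties.date_maj": [
        "2022/04/29 13:32:26.692+02",
        "2022/04/30 13:31:26.763+02",
        "2022/04/29 13:32:26.692+02",
        "2022/05/01 13:30:27.894+02",
    ],
    "properties.lib_qual": ["Dégradé", "Dégradé", "Moyen", "Moyen"],
})

# Assumption: the row with the latest update timestamp is the one to keep.
# The timestamps are zero-padded, so sorting them as strings is chronological.
deduplicated = (
    sample.sort_values("properties.date_maj")
          .drop_duplicates(subset=["DATE", "properties.lib_zone"], keep="last")
          .sort_values(["properties.lib_zone", "DATE"])
          .reset_index(drop=True)
)
print(deduplicated)
```

The final `reset_index(drop=True)` also gives the result a single continuous row index, whereas the concatenated table keeps the per-city numbering (its last rows are numbered 515 to 518).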


In [43]:
table.to_csv('table.csv')
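
The `Unnamed: 0` column in the merged table is what pandas produces when a CSV written with its row index is read back without declaring that index. A minimal sketch of the effect, using a small hypothetical frame and in-memory buffers instead of files (the variable names are introduced for the illustration):

```python
import io

import pandas as pd

# Small stand-in for the merged table
table = pd.DataFrame({"DATE": ["2021-05-06", "2021-05-07"], "Ozone": [55.0, 53.0]})

# Default behaviour: the row index is written as a first, unnamed column
with_index = io.StringIO()
table.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# With index=False the row index is not written at all
without_index = io.StringIO()
table.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_without_index.columns))
```

Whether to drop the index on export depends on whether the row numbers carry any meaning downstream; for a concatenated table whose index restarts for every city, they probably do not.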

# 10) PDF export

In [6]:
#pip install -U notebook-as-pdf

In [7]:
#pip install pyppeteer