# Development of a Widget

The goal of this notebook is to provide a widget.

This widget compares the level of ozone pollution in cities of Occitanie.
It offers two interactive options:
  +  The first one lets you choose three cities
  +  The second one lets you choose a month from the past 12 months

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import interact  # widget manipulation
from IPython.display import HTML

## 1) Data import


We import data from the following website:

http://data-atmo-occitanie.opendata.arcgis.com/datasets/4a648b54876f485e92f22e2ad5a5da32_0

This website publishes air quality monitoring data for the Occitanie region. Daily, monthly and annual pollution data are available.

In [2]:
from download import download
# We choose daily data
url = "https://opendata.arcgis.com/datasets/2ab16a5fb61f42c1a689fd9cc466383f_0.csv"
path_target = "datasets/Mesure_journaliere_Region_Occitanie_Polluants_Principaux.csv"
download(url, path_target, replace=True)

Downloading data from https://opendata.arcgis.com/datasets/2ab16a5fb61f42c1a689fd9cc466383f_0.csv (1 byte)

Successfully downloaded file to datasets/Mesure_journaliere_Region_Occitanie_Polluants_Principaux.csv


file_sizes: 10.7MB [00:10, 1.04MB/s]

'datasets/Mesure_journaliere_Region_Occitanie_Polluants_Principaux.csv'

In [3]:
occ = pd.read_csv(path_target) # all data of Occitanie

## 2) Data processing

To exploit the data, we first convert the start date of each measurement to a usable format. We keep only the year and month, stored as a monthly period:

In [4]:
occ['date'] = pd.to_datetime(occ['date_debut']).dt.to_period('M')
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.to_period.html
#https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
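To illustrate what this conversion does, here is a minimal sketch on a toy table. The timestamp strings are an assumption made for the example; the actual layout of `date_debut` in the dataset may differ.

```python
import pandas as pd

# Toy data: the timestamp format below is assumed, not taken from the dataset.
toy = pd.DataFrame({
    "date_debut": ["2019-08-30 00:00:00",
                   "2019-08-31 00:00:00",
                   "2019-09-01 00:00:00"]
})

# Parse the strings, then keep only the year and month as a monthly Period.
toy["date"] = pd.to_datetime(toy["date_debut"]).dt.to_period("M")

# A monthly Period can be compared directly with the column.
august = toy[toy["date"] == pd.Period("2019-08", freq="M")]
print(toy["date"].astype(str).tolist())
print(len(august), "rows in August 2019")
```

Every measurement taken in the same month ends up with the same `date` value, which is what makes the month selectable as a single option later on.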



The variables we care about for our widget in this dataset are:

   + *nom_com*: name of the city
   + *nom_station*: name of the station, useful because there can be more than one station per city
   + *code_station*: code of the station
   + *nom_poll*: name of the pollutant
   + *valeur*: measured value of the pollutant
   + *date_debut*: start of the measurement, in the format year/month/day/hour
   + *date_fin*: end of the measurement

We only care about one pollutant: **Ozone**

In [5]:
occ = occ[occ['nom_poll'] == 'O3'] # only ozone

The available cities for ozone data are:

In [6]:
occ.nom_com.unique() # cities with ozone data available 

array(['NIMES', 'AGDE', 'CAUNES-MINERVOIS', 'BELESTA-EN-LAURAGAIS',
       'MONTGISCARD', 'MIRAMONT-DE-COMMINGES', 'PERPIGNAN',
       'SAINT-ESTEVE', 'PEYRUSSE-VIEILLE', 'LA CALMETTE', 'TOULOUSE',
       'MONTPELLIER', 'TARBES', 'CARCASSONNE', 'LATTES',
       'SAINT-GELY-DU-FESC', 'LOURDES', 'ROQUEREDONDE', 'ALBI', 'RODEZ',
       'BIARS-SUR-CERE', 'SAZE', 'CASTRES', 'CORNEILHAN', 'MENDE',
       'VALLABREGUES'], dtype=object)

That makes 26 cities, some of which have more than one station.
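One way to see which cities have several stations is to count distinct station codes per city. The sketch below uses a toy table with made-up station codes; on the real data, the same `groupby` could be applied to `occ`.

```python
import pandas as pd

# Toy data: station codes here are invented for illustration.
toy = pd.DataFrame({
    "nom_com": ["TOULOUSE", "TOULOUSE", "TOULOUSE", "LOURDES"],
    "code_station": ["ST_A", "ST_B", "ST_A", "ST_C"],
})

# Number of distinct stations per city, largest first.
stations_per_city = (
    toy.groupby("nom_com")["code_station"]
       .nunique()
       .sort_values(ascending=False)
)

# Cities monitored by more than one station.
multi = stations_per_city[stations_per_city > 1]
print(stations_per_city)
```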

## 3) Widget Development

This first widget only lets you choose the month. Three cities are already chosen: Lourdes, Toulouse and Montpellier.

In [7]:
def poluted_cities0(month):
    """Plot the ozone distribution of three fixed stations for one month."""
    # Station codes of the three pre-selected cities
    station = 'FR50030', 'FR50200', 'FR50042'
    df_villes = occ[occ['code_station'].isin(station)]

    # Keep only the measurements of the selected month
    df_villes = df_villes[df_villes.date == month]

    plt.style.use('dark_background')
    # One boxen plot per city
    sns.catplot(x='nom_com', y='valeur',
                data=df_villes,
                height=3, aspect=2,
                kind='boxen')
    plt.tight_layout()
    plt.xlabel('Cities')
    plt.ylabel('O3')
    plt.title("Ozone measurement of 3 cities in a month")
    plt.show()

In [8]:
interact(poluted_cities0, month=occ.date.unique())

interactive(children=(Dropdown(description='month', options=(Period('2019-08', 'M'), Period('2019-12', 'M'), P…

<function __main__.poluted_cities0(month)>
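The filtering step of the widget can be checked on its own, without any plotting. The sketch below is only an illustration: `select_month` is a hypothetical helper, and the station codes and values are made up.

```python
import pandas as pd

def select_month(df, station_codes, month):
    """Return the rows of df for the given station codes and monthly period."""
    mask = df["code_station"].isin(station_codes) & (df["date"] == month)
    return df[mask]

# Toy data with invented station codes and values.
toy = pd.DataFrame({
    "code_station": ["ST_A", "ST_A", "ST_B", "ST_C"],
    "valeur": [50.0, 60.0, 70.0, 80.0],
    "date": pd.PeriodIndex(["2019-08", "2019-09", "2019-08", "2019-08"], freq="M"),
})

subset = select_month(toy, ["ST_A", "ST_B"], pd.Period("2019-08", freq="M"))
if subset.empty:
    print("No measurement for this selection")
else:
    print(subset)
```

Keeping the selection in a separate function makes it easy to detect an empty result before handing the data to the plotting call.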

## 4) Second Widget Development

The following widget again compares ozone pollution across three cities, but this time we can choose both the stations and the month:

In [9]:
def poluted_cities(month, station_1='Montpellier Nord - Périurbain', 
                    station_2='Lourdes-Lapaca Urbain', 
                    station_3='Toulouse-Berthelot Urbain'):
    """Plot the ozone distribution of three chosen stations for one month."""
    stations = station_1, station_2, station_3

    df_station = occ[occ['nom_station'].isin(stations)] # only the requested stations
    df_station = df_station[df_station['nom_poll'] == 'O3'] # only ozone (occ is already filtered, kept as a safeguard)
    df_station = df_station[df_station.date == month] # only the selected month
    df_station = df_station[['nom_com', 'nom_station', 'valeur', 'date']]

    plt.style.use('dark_background')
    # One boxen plot per city
    sns.catplot(x='nom_com', y='valeur',
                data=df_station,
                height=3, aspect=2,
                kind='boxen')
    plt.tight_layout()
    plt.xlabel('Cities')
    plt.ylabel('O3')
    plt.title("Ozone measurement of 3 cities in a month")
    plt.show()

In [10]:
interact(poluted_cities, station_1=occ.nom_station.unique(), 
         station_2=occ.nom_station.unique(), 
         station_3=occ.nom_station.unique(), 
         month=occ.date.unique())

interactive(children=(Dropdown(description='month', options=(Period('2019-08', 'M'), Period('2019-12', 'M'), P…

<function __main__.poluted_cities(month, station_1='Montpellier Nord - Périurbain', station_2='Lourdes-Lapaca Urbain', station_3='Toulouse-Berthelot Urbain')>
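The month dropdown shown above lists periods in the order they appear in the data. As a possible refinement, the options could be sorted chronologically and given readable labels. The sketch below assumes a hypothetical helper, `month_options`, that returns (label, value) pairs, a form that `interact` accepts for dropdowns.

```python
import pandas as pd

def month_options(periods):
    """Build chronologically sorted (label, value) pairs from monthly periods."""
    unique_sorted = sorted(periods.dropna().unique())
    return [(str(p), p) for p in unique_sorted]

# Toy data: unsorted months with a duplicate.
months = pd.Series(
    pd.PeriodIndex(["2019-12", "2019-08", "2019-12", "2020-01"], freq="M")
)
options = month_options(months)
print([label for label, _ in options])

# Possible use in the notebook (not run here):
# interact(poluted_cities, month=month_options(occ.date), ...)
```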

*Warning: only one year of data is available, so both the current month and the first month of the dataset are incomplete.*
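
To spot such incomplete months, one option is to compare the number of distinct days with data against the length of each calendar month. This is a minimal sketch on synthetic daily timestamps; `incomplete_months` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def incomplete_months(timestamps):
    """Return monthly periods with fewer distinct days of data than the month has."""
    ts = pd.to_datetime(timestamps)
    # Count distinct calendar days observed in each month.
    days_seen = ts.dt.normalize().groupby(ts.dt.to_period("M")).nunique()
    return [m for m, n in days_seen.items() if n < m.days_in_month]

# Synthetic daily data: starts mid-August and stops in early October.
timestamps = pd.Series(pd.date_range("2019-08-20", "2019-10-05", freq="D"))
flagged = incomplete_months(timestamps)
print([str(m) for m in flagged])
```

On the real data, the same function could be applied to the parsed `date_debut` column to decide which months to flag or exclude from the dropdown.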