# Station Air Pollution Estimation

For each station, find a good model to predict the individual pollutants.

In [1]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import seaborn as sns


from utils import *

datasets_folder = './datasets'
verbosity = 0

## Data Import

In [2]:
giardini_margherita_pollution_dict, san_felice_pollution_dict, chiarini_pollution_dict = read_and_preprocess_dataset(datasets_folder, 'pollution', v=verbosity)
giardini_margherita_traffic_df, san_felice_traffic_df, chiarini_traffic_df = read_and_preprocess_dataset(datasets_folder, 'traffic', v=verbosity)
weather_df = read_and_preprocess_dataset(datasets_folder, 'weather', v=verbosity)

In [6]:
giardini_margherita_data = prepare_station_data_for_training(giardini_margherita_pollution_dict, giardini_margherita_traffic_df, weather_df)
san_felice_data = prepare_station_data_for_training(san_felice_pollution_dict, san_felice_traffic_df, weather_df)
chiarini_data = prepare_station_data_for_training(chiarini_pollution_dict, chiarini_traffic_df, weather_df)

## Models

For each station we develop a model to describe the air pollution. Because each station collects different data, sometimes with different intensities, we treat each station independently.

The Air Quality Index (AQI) is computed according to the following criteria:
- for each pollutant, a limit is defined (e.g. 25 µg/m³ for PM2.5)
- for each pollutant, $ \text{AQI} = \frac{\text{Concentration}}{\text{Limit}} \times 100 $
- the overall AQI is the maximum value across all the per-pollutant AQIs

The overall value is then matched against the following table:

| **AQI**         | **CONDITIONS** |
| --------------- | -------------- |
| <34             | EXCELLENT      |
| From 34 to 66   | GOOD           |
| From 67 to 99   | FAIR           |
| From 100 to 150 | POOR           |
| >150            | VERY POOR      |

We model each pollutant separately to predict its hourly value, then combine all of the predictions to compute the overall Air Quality Index and compare it with the one computed from the readings.

In [None]:
pollutant_limits = {
    'PM2.5': 25,    # µg/m³
    'PM10': 50,     # µg/m³
    'CO': 10,       # mg/m³
    'O3': 180,      # µg/m³
    'NO': None,     # µg/m³
    'NO2': 200,     # µg/m³
    'NOX': None,    # µg/m³
    'C6H6': None    # µg/m³
}
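
The AQI criteria described above can be sketched in a few lines of Python. This is only a minimal illustration, not the notebook's actual implementation: the helper names `pollutant_aqi`, `overall_aqi` and `aqi_condition` are hypothetical, the sample reading is made up for the example, and the bands are assumed to be contiguous with upper edges at 33, 66, 99 and 150 (the table leaves small gaps between bands for non-integer values). Pollutants without a defined limit (`None`) are assumed to be skipped.

```python
import math

# Limits copied from the pollutant_limits cell; None means no limit is defined.
example_limits = {
    'PM2.5': 25, 'PM10': 50, 'CO': 10, 'O3': 180,
    'NO': None, 'NO2': 200, 'NOX': None, 'C6H6': None,
}

# Upper edge of each band (assumption: bands are contiguous).
AQI_BANDS = [(33, 'EXCELLENT'), (66, 'GOOD'), (99, 'FAIR'), (150, 'POOR')]


def pollutant_aqi(concentration, limit):
    """AQI of a single pollutant: concentration / limit * 100."""
    return 100 * concentration / limit


def overall_aqi(concentrations, limits):
    """Return (overall AQI, driving pollutant) as the maximum per-pollutant AQI."""
    scores = {}
    for pollutant, value in concentrations.items():
        limit = limits.get(pollutant)
        # Skip pollutants without a limit and missing readings.
        if limit is None or value is None or math.isnan(value):
            continue
        scores[pollutant] = pollutant_aqi(value, limit)
    if not scores:
        raise ValueError('no pollutant with both a reading and a limit')
    worst = max(scores, key=scores.get)
    return scores[worst], worst


def aqi_condition(aqi):
    """Map an AQI value to the condition label of its band."""
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    return 'VERY POOR'


# Made-up hourly reading, used only to exercise the functions.
reading = {'PM2.5': 20, 'PM10': 30, 'O3': 90, 'NO2': 40, 'NO': 15}
score, worst = overall_aqi(reading, example_limits)
print(worst, score, aqi_condition(score))
```

Returning the driving pollutant alongside the score makes it easier to see which pollutant determines the index at a given hour, which can help when comparing the predicted AQI with the one computed from the readings.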

### Giardini Margherita