<div class="alert alert-info">
<u><strong>Authors:</strong></u> <b>Alberto Vavassori</b> (alberto.vavassori@polimi.it), <b>Andrea Folini</b> (andrea.folini@polimi.it), and <b>Mathilde Puche</b> (mathildedanielle.puche@mail.polimi.it) - 2023 - Politecnico di Milano, Italy <br>
</div>

# ARPA and Netatmo temperature time series cleaning

Import the functions for the automatic cleaning of ARPA and Netatmo time series.

In [None]:
import arpa_cleaning as ac
import netatmo_cleaning as nc
import pandas as pd
import ipywidgets as widgets
import datetime
pd.set_option('display.max_columns', None)

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
year_w = widgets.Dropdown(
    options = list(range(2014, 2024)),
    value = 2023,
    description = 'Year:',
    disabled = False,
    layout = {'width': 'max-content'},
    style = {'description_width': 'initial'}
)
year_w

In [None]:
year = year_w.value

In [None]:
month_w = widgets.Dropdown(
    options = list(range(1, 13)),
    value = 1,
    description = 'Month:',
    disabled = False,
    layout = {'width': 'max-content'},
    style = {'description_width': 'initial'}
)
month_w

In [None]:
month = month_w.value

In [None]:
arpa_first_month = month_w.value
arpa_last_month = month_w.value

Open the CSV file containing the ARPA data for the selected year.

In [None]:
arpa_out_path = 'Arpa_csv_files/'

In [None]:
# original_arpa_data = pd.read_csv(arpa_out_path + '%s_milan.csv' %year, skiprows=0)
original_arpa_data = pd.read_csv(arpa_out_path + 'ARPA_%s.csv' %year, skiprows=0)

Open the Netatmo CSV file containing the measurements for the selected year and month.

In [None]:
netatmo_out_path = 'Netatmo_csv_files/'

In [None]:
original_netatmo_data = pd.read_csv(netatmo_out_path + 'temp_Net_milan_%s-%s_clip.csv' % (year, month), skiprows=0)

<u>Original number of Netatmo stations:</u>

In [None]:
len(original_netatmo_data['module_id'].unique())

<u>Original number of Netatmo measurements:</u>

In [None]:
len(original_netatmo_data)

Use the following function (`remove_irregularity_in_dataset`) to remove duplicates or irregularities from the data set.

*Use it only for December 2019; June, July, September, and October 2020; and August and September 2021.*

In [None]:
# original_netatmo_data = nc.remove_irregularity_in_dataset(original_netatmo_data, year, month)
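
The internals of `remove_irregularity_in_dataset` are not shown in this notebook. As a rough illustration of one kind of irregularity, the sketch below drops records repeated for the same station and timestamp using plain pandas. The `time` and `temperature` column names and the sample values are assumptions; only `module_id` is taken from this notebook.

```python
import pandas as pd

# Hypothetical sample: 'time' and 'temperature' are assumed column names.
sample = pd.DataFrame({
    'module_id': ['a', 'a', 'a', 'b'],
    'time': ['2019-12-01 00:00', '2019-12-01 00:00',
             '2019-12-01 01:00', '2019-12-01 00:00'],
    'temperature': [4.1, 4.1, 3.9, 5.0],
})

# Keep the first record for each (station, timestamp) pair.
deduplicated = (
    sample
    .drop_duplicates(subset=['module_id', 'time'], keep='first')
    .reset_index(drop=True)
)

# Number of rows dropped as duplicates.
n_removed = len(sample) - len(deduplicated)
print(n_removed)
```

This only covers exact repeats of a station and timestamp; other irregularities handled by the project function may need different logic.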

### Clean ARPA data and create the ARPA virtual station

In [None]:
arpa_clean = ac.remove_outliers(year, arpa_first_month, arpa_last_month, original_arpa_data, arpa_out_path)
arpa_virtual_station = ac.create_virtual_station(year, arpa_first_month, arpa_last_month, arpa_clean, arpa_out_path)
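
How `ac.create_virtual_station` combines the ARPA stations is not shown in this notebook. Purely as an illustration of the idea of a single reference series, the sketch below averages several stations at each timestamp. The column names (`station_id`, `time`, `temperature`), the sample values, and the choice of the mean are all assumptions, not the project's method.

```python
import pandas as pd

# Hypothetical ARPA-like sample with three stations and two timestamps.
arpa_sample = pd.DataFrame({
    'station_id': [1, 2, 3, 1, 2, 3],
    'time': ['t0', 't0', 't0', 't1', 't1', 't1'],
    'temperature': [10.0, 12.0, 11.0, 9.0, 9.5, 10.0],
})

# One value per timestamp: the mean over all stations reporting at that time.
virtual_station = arpa_sample.groupby('time')['temperature'].mean()
print(virtual_station)
```

A median would be a natural alternative if individual stations can still contain extreme values after outlier removal.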

### Remove poorly correlated stations

In [None]:
arpa_net_correlation = nc.compute_corr(year, month, original_netatmo_data, arpa_virtual_station, netatmo_out_path)
netatmo_high_corr, correlation_stats = nc.remove_low_corr(year, month, netatmo_out_path, arpa_net_correlation, original_netatmo_data)
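
The implementation of `nc.compute_corr` and `nc.remove_low_corr` is not shown here. A minimal sketch of the general idea, assuming a Pearson correlation between each station and a reference series, could look as follows. The function name `correlation_with_reference`, the `time` and `temperature` columns, and the `min_corr` threshold of 0.9 are hypothetical.

```python
import pandas as pd

def correlation_with_reference(stations, reference, min_corr=0.9):
    """Return per-station Pearson correlations and the stations to keep.

    `stations` is a long table with assumed columns 'module_id', 'time'
    and 'temperature'; `reference` is a Series indexed by time.
    """
    # Wide table: one column per station, indexed by time.
    wide = stations.pivot_table(index='time', columns='module_id',
                                values='temperature')
    # Correlate each station column with the reference, aligned on time.
    corr = wide.corrwith(reference)
    kept = corr[corr >= min_corr].index.tolist()
    return corr, kept

# Hypothetical data: one station tracks the reference, one is reversed.
times = pd.date_range('2023-01-01', periods=6, freq=pd.Timedelta(hours=1))
reference = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=times)
stations = pd.DataFrame({
    'module_id': ['good'] * 6 + ['bad'] * 6,
    'time': list(times) * 2,
    'temperature': [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
                   + [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
})

corr, kept = correlation_with_reference(stations, reference)
print(kept)
```

In this toy data the station with a constant offset is kept, because a constant bias does not affect the correlation; bias is a separate concern from correlation.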

<u>Number of stations kept:</u>

In [None]:
len(netatmo_high_corr['module_id'].unique())

<u>Number of measurements kept:</u>

In [None]:
len(netatmo_high_corr)

### Remove unrealistic values

In [None]:
netatmo_realistic, unrealistic_stats = nc.remove_unrealistic_values(year, month, netatmo_out_path, netatmo_high_corr, arpa_virtual_station)

### Remove biased time series

In [None]:
netatmo_unbiased, biased_tot_stats, biased_station_stats = nc.remove_biased_series(year, month, netatmo_out_path, netatmo_realistic, arpa_virtual_station)

<u>Number of stations kept:</u>

In [None]:
len(netatmo_unbiased['module_id'].unique())

<u>Number of measurements kept:</u>

In [None]:
len(netatmo_unbiased)

### Remove local outliers

In [None]:
netatmo_cleaned = nc.remove_local_outliers(year, month, netatmo_out_path, netatmo_unbiased)

<u>Number of stations kept:</u>

In [None]:
len(netatmo_cleaned['module_id'].unique())

<u>Number of measurements kept:</u>

In [None]:
len(netatmo_cleaned)

### Remove unreliable stations

In [None]:
reliability_df, filtered_stations, removed_stations, netatmo_filtered = nc.remove_unreliable_stations(original_netatmo_data, netatmo_cleaned, netatmo_out_path, year, month)

<u>Number of stations kept:</u>

In [None]:
len(netatmo_filtered['module_id'].unique())

<u>Number of measurements kept:</u>

In [None]:
len(netatmo_filtered)
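
The station and measurement counts are repeated after each cleaning step. As a convenience, they could be collected in one table with a small helper such as the one sketched below. The helper `summarize_stage` and the sample data are hypothetical; the only assumption about the real data is that each stage is a DataFrame with a `module_id` column, as used throughout this notebook.

```python
import pandas as pd

def summarize_stage(stage_name, df, id_column='module_id'):
    """Count distinct stations and total measurements in one stage."""
    return {
        'stage': stage_name,
        'stations': df[id_column].nunique(),  # distinct station ids
        'measurements': len(df),              # one row per measurement
    }

# Hypothetical stand-ins for two stages of the cleaning pipeline.
before = pd.DataFrame({'module_id': ['a', 'a', 'b', 'c'],
                       'temperature': [4.0, 4.2, 5.1, 30.0]})
after = before[before['module_id'] != 'c']

summary = pd.DataFrame([
    summarize_stage('original', before),
    summarize_stage('cleaned', after),
])
print(summary)
```

With the real data, the same helper could be called with `original_netatmo_data`, `netatmo_high_corr`, `netatmo_unbiased`, `netatmo_cleaned`, and `netatmo_filtered` to compare all steps side by side.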