# **Example notebook**

In [1]:
import risicolive_QC as qc
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

## Algorithm description
The quality check algorithm consists of a sequence of consecutive tests, each computed for every row. Each row is flagged according to the tests it passes and is assigned to one of four classes accordingly.
The quality tests are the following:
1. **complete test**: checks that all required variables are present in each row. If the test fails, the row is flagged and classified as *incomplete*;
2. **range test**: checks that the value of each variable lies within a given range. If the test fails, the row is flagged and classified as *wrong*;
3. **step test**: checks for non-physical steps in the data. If the test fails, the row is flagged and classified as *suspicious*;
4. **time persistence test**: checks whether the data stay constant over time. This test works on a sliding window. If the test fails, the row is flagged and classified as *suspicious*;
5. if all tests are passed, the row is flagged and classified as *good*.
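
The cascade described in this list can be sketched in plain pandas. This is only an illustration and not the package's own implementation: the function `classify_rows` and its arguments are hypothetical, the persistence test is left out for brevity, and the first row (which has no predecessor) is assumed to pass the step test, which the library may handle differently.

```python
import numpy as np
import pandas as pd

def classify_rows(df, vars_check, ranges, steps):
    """Hypothetical helper: label each row with the first test it fails."""
    # complete test: every required variable must be present (not NaN)
    complete = df[vars_check].notna().all(axis=1)

    # range test: every listed variable must lie inside [low, high]
    in_range = pd.Series(True, index=df.index)
    for var, (low, high) in ranges.items():
        in_range &= df[var].between(low, high)

    # step test: the absolute change from the previous row must not exceed the step
    # (assumption: a row without a predecessor passes)
    smooth = pd.Series(True, index=df.index)
    for var, max_step in steps.items():
        smooth &= ~(df[var].diff().abs() > max_step)

    # the first failing condition wins; rows failing nothing are GOOD
    labels = np.select(
        [~complete, ~in_range, ~smooth],
        ["INCOMPLETE", "WRONG", "SUSPICIOUS"],
        default="GOOD",
    )
    return pd.Series(labels, index=df.index, name="QC_LABEL")

# made-up values, chosen so that each class appears once
df = pd.DataFrame({
    "t": [14.7, 15.3, 60.0, 15.0, np.nan],
    "h": [65.0, 66.0, 64.0, 90.0, 70.0],
})
labels = classify_rows(
    df,
    vars_check=["t", "h"],
    ranges={"t": [-30, 50], "h": [0, 100]},
    steps={"h": 10},
)
print(labels.tolist())
# ['GOOD', 'GOOD', 'WRONG', 'SUSPICIOUS', 'INCOMPLETE']
```

Ordering the conditions from *incomplete* to *suspicious* mirrors the order of the tests, so a row with a missing value is never reported as out of range.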
---
## Settings
The user must specify all the information needed for the tests in a dictionary:
* *VARS_CHECK*: a list, **[varA, varB]**, for the *complete_test*. The test checks whether these variables contain NaN values.
* *RANGES*: a dictionary, **{varA: [valueA, valueB]}**, for the *range_test*. The test checks whether varA has values outside the range between valueA and valueB.
* *STEPS*: a dictionary, **{varA: stepA}**, for the *step_test*. The test checks whether varA has steps greater than stepA.
* *WINDOW*: an **int**. Time window for the *persistence_test*.
* *VARIATIONS*: a dictionary, **{varA: [variationA, valueA, valueB]}**, for the *persistence_test*. The test checks whether varA varies by less than variationA within the specified window while it lies in the range between valueA and valueB.
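
To make the roles of *WINDOW* and *VARIATIONS* concrete, here is a minimal sketch of a persistence check. It assumes that "variation" means the difference between the maximum and minimum over a trailing window of rows; the package may define it differently, and the helper `persistence_passed` is hypothetical.

```python
import pandas as pd

def persistence_passed(series, window, variation, low, high):
    """Hypothetical helper: False where the series looks stuck in the trailing window."""
    rolling = series.rolling(window, min_periods=window)
    # spread of the values in the window; NaN until the window is full
    spread = rolling.max() - rolling.min()
    # the check only applies while the value lies inside [low, high]
    in_band = series.between(low, high)
    stuck = (spread < variation) & in_band
    return ~stuck

# made-up temperature series that stays constant for four rows
t = pd.Series([10.0, 10.0, 10.0, 10.0, 12.0, 13.5])
passed = persistence_passed(t, window=3, variation=0.01, low=-30, high=50)
print(passed.tolist())
# [True, True, False, False, True, True]
```

In this sketch the first `window - 1` rows always pass, because the spread cannot be computed until the window is full.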

If a setting is not provided, the related test is not performed and is considered passed. All DEFAULT settings are defined in the **config.py** file.
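
The "missing setting means passed" rule can be pictured as follows. This is a sketch under assumptions, not the package's code: `run_if_configured` and `no_missing` are hypothetical names.

```python
import numpy as np
import pandas as pd

def no_missing(df, variables):
    """Hypothetical test: True where none of the listed variables is NaN."""
    return df[variables].notna().all(axis=1)

def run_if_configured(df, settings, key, test):
    """Run `test` only if `key` is in the settings; otherwise every row passes."""
    config = settings.get(key)
    if config is None:
        # setting not provided: the test is skipped and counted as passed
        return pd.Series(True, index=df.index)
    return test(df, config)

df = pd.DataFrame({"t": [14.7, np.nan]})
with_setting = run_if_configured(df, {"VARS_CHECK": ["t"]}, "VARS_CHECK", no_missing)
without_setting = run_if_configured(df, {}, "VARS_CHECK", no_missing)
print(with_setting.tolist())     # [True, False]
print(without_setting.tolist())  # [True, True]
```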

In [2]:
qc.DEFAULT

{'VARS_CHECK': ['t', 'h', 'p', 'ws'],
 'RANGES': {'t': [-30, 50], 'h': [0, 100], 'p': [0, 400], 'ws': [0, 75]},
 'STEPS': {'t': 2, 'h': 10},
 'WINDOW': 12,
 'VARIATIONS': {'t': [0.01, -30, 50],
  'h': [0.01, 0, 100],
  'ws': [0.01, 0, 75]}}

---
## **Example**

In [3]:
# load the test measurements and index them by timestamp
df_TEST = pd.read_csv('test/test.csv')
df_TEST["date"] = pd.to_datetime(df_TEST["date"])
df_TEST = df_TEST.set_index("date")
df_TEST

date,p,t,h,ws
2022-09-26 14:20:00,0.0,14.7,65.0,1.9
2022-09-26 14:30:00,0.0,15.3,66.0,2.1
2022-09-26 14:40:00,0.0,14.7,59.0,1.5
2022-09-26 14:50:00,0.0,13.7,60.0,4.3
2022-09-26 15:00:00,0.0,13.7,58.0,3.7
...,...,...,...,...
2022-09-27 13:40:00,,,,
2022-09-27 13:50:00,,,,
2022-09-27 14:00:00,,,,
2022-09-27 14:10:00,,,,


In [4]:
# TEST ON A SINGLE STATION: DEFAULT ################################
# the function "quality_check" returns a new dataframe with the flag column "QC" and the class column "QC_LABEL"
df_check = qc.quality_check(df_TEST)
df_check

date,QC,QC_LABEL
2022-09-26 14:20:00,23,SUSPICIOUS
2022-09-26 14:30:00,31,GOOD
2022-09-26 14:40:00,31,GOOD
2022-09-26 14:50:00,31,GOOD
2022-09-26 15:00:00,31,GOOD
...,...,...
2022-09-27 13:40:00,18,INCOMPLETE
2022-09-27 13:50:00,18,INCOMPLETE
2022-09-27 14:00:00,18,INCOMPLETE
2022-09-27 14:10:00,18,INCOMPLETE
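
Once the labels are available, a typical next step is to count the classes and keep only the rows classified as good. The sketch below uses a small stand-in frame with four rows copied from the output above, so that it runs without the package or the test file.

```python
import pandas as pd

# stand-in for the frame returned by quality_check (four rows from the output above)
df_check = pd.DataFrame(
    {
        "QC": [23, 31, 31, 18],
        "QC_LABEL": ["SUSPICIOUS", "GOOD", "GOOD", "INCOMPLETE"],
    },
    index=pd.to_datetime([
        "2022-09-26 14:20:00",
        "2022-09-26 14:30:00",
        "2022-09-26 14:40:00",
        "2022-09-27 13:40:00",
    ]),
)
df_check.index.name = "date"

# number of rows in each class
counts = df_check["QC_LABEL"].value_counts()
print(counts["GOOD"])  # 2

# timestamps of the rows that passed every test
good_times = df_check.index[df_check["QC_LABEL"] == "GOOD"]
print(len(good_times))  # 2
```

With the real data, `df_TEST.loc[good_times]` would then select the measurements that passed all tests, since both frames share the same `date` index.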


In [5]:
# TEST ON A SINGLE STATION: CHANGE CONFIG
## define new settings in which complete_test, range_test and persistence_test are performed on variable 't'
## and step_test is performed on variable 'h'
settings_new = {
        'VARS_CHECK':['t'],
        'RANGES':{
            't':[0, 50]
        },
        'STEPS': {
            'h': 2,
        },
        'WINDOW': 3, 
        'VARIATIONS':{
            't':[1,-10,50]
        }
}
df_check_new = qc.quality_check(df_TEST, settings=settings_new)
df_check_new

date,QC,QC_LABEL
2022-09-26 14:20:00,23,SUSPICIOUS
2022-09-26 14:30:00,31,GOOD
2022-09-26 14:40:00,7,SUSPICIOUS
2022-09-26 14:50:00,31,GOOD
2022-09-26 15:00:00,31,GOOD
...,...,...
2022-09-27 13:40:00,18,INCOMPLETE
2022-09-27 13:50:00,18,INCOMPLETE
2022-09-27 14:00:00,18,INCOMPLETE
2022-09-27 14:10:00,18,INCOMPLETE
