# QA/QC
 
### What is QA/QC:
QA/QC (quality assurance / quality control) is the task of annotating each collected observation with a flag that describes its quality:

- GOOD
- BAD (FAIL in QARTOD terms)
- SUSPECT
- UNKNOWN

### Why QA/QC is needed:

Because conditions in the natural environment vary, observations collected by sensors are not always reliable. Quality assurance and control of sensor data is therefore essential to make sure the data are fit for use.

### What methods are used to annotate data with QA/QC flags:

[IOOS](https://ioos.github.io/ioos_qc/resources.html) has defined standard statistical methods for annotating data with quality-check flags. For each essential ocean variable (EoV), a different set of statistical tests is recommended to ensure the quality of the collected data. For details, see the [Argo quality control manual](https://github.com/ioos/ioos_qc/blob/main/resources/argo-quality-control-manual.pdf) and the [QARTOD manual for in-situ currents](https://cdn.ioos.noaa.gov/media/2019/08/QARTOD_Currents_Update_Second_Final.pdf).


### Why thresholds are necessary

[IOOS](https://ioos.github.io/ioos_qc/resources.html) developed the `ioos_qc` Python package, which implements these statistical tests. Each test takes a series of observations (time-series data) as input, together with additional parameters referred to as thresholds.

For example:

The most basic test for each EoV is the range test (`gross_range_test` in `ioos_qc`), in which the natural range of the variable is used to validate each observation (data value). For example, the global range for sea surface temperature is -2.5 to 40.0 °C.
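To illustrate how a threshold drives a test, here is a minimal pure-Python sketch of a range test that returns QARTOD-style integer flags. It is not the `ioos_qc` implementation: the function name `range_test` and the suspect span of 0 to 30 °C in the usage line are hypothetical, and only the -2.5 to 40.0 °C fail span comes from the text above.

```python
import math
from typing import List, Optional, Sequence, Tuple

# QARTOD flag codes (same values as the QartodFlags class in this notebook)
GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def range_test(values: Sequence[Optional[float]],
               fail_span: Tuple[float, float],
               suspect_span: Optional[Tuple[float, float]] = None) -> List[int]:
    """Hypothetical sketch: FAIL outside fail_span, SUSPECT outside suspect_span."""
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append(MISSING)          # no observation to test
        elif not fail_span[0] <= v <= fail_span[1]:
            flags.append(FAIL)             # outside the natural range
        elif suspect_span and not suspect_span[0] <= v <= suspect_span[1]:
            flags.append(SUSPECT)          # plausible but unusual
        else:
            flags.append(GOOD)
    return flags


# Sea surface temperature in °C; the suspect span is a made-up example
sst_flags = range_test([12.3, -3.0, 41.0, 35.0, float("nan")],
                       fail_span=(-2.5, 40.0), suspect_span=(0.0, 30.0))
print(sst_flags)  # [1, 4, 4, 3, 9]
```

The two spans are exactly the kind of per-variable thresholds that have to be chosen before a test can run.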

### What data is currently available with QA/QC annotations

In CIOOS Atlantic, only the datasets provided by CMAR contain quality-check flags, covering sea surface temperature, dissolved oxygen, salinity, and depth.

### How we are trying to estimate

```python
import pandas as pd
import math
```

## Step#1: Download the data as CSV
The CSV file contains the field names in its first row and the unit of each field in its second row.
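A minimal sketch of loading such a file with pandas, assuming (as in ERDDAP-style CSV exports) that the field names are in the first row and the units in the second. Reading both rows as a two-level header keeps the units without letting them leak into the data; the sample content and the function name `read_csv_with_units` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample: row 1 = field names, row 2 = units, data from row 3 on
sample = io.StringIO(
    "time,temperature\n"
    "UTC,degree_C\n"
    "2021-06-01T00:00:00Z,12.3\n"
    "2021-06-01T01:00:00Z,12.1\n"
)


def read_csv_with_units(source):
    """Return (data, units) where units maps each field name to its unit."""
    data = pd.read_csv(source, header=[0, 1])          # two header rows
    units = {name: unit for name, unit in data.columns}
    data.columns = data.columns.get_level_values(0)    # keep only field names
    return data, units


data, units = read_csv_with_units(sample)
```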

## Step#2: Replace string flags with integer flags
By default, the raw data stores flags as strings. To make them easier to process, we replace each string with its integer QARTOD code and save the result as a new .csv file.

```python
import os

import pandas as pd


class QartodFlags:
    """Primary flags for QARTOD."""
    GOOD = 1
    UNKNOWN = 2
    SUSPECT = 3
    FAIL = 4
    MISSING = 9


# String flags used in the raw CSV and their integer QARTOD codes
FLAG_MAP = {
    'Not Evaluated': QartodFlags.UNKNOWN,
    'Pass': QartodFlags.GOOD,
    'Suspect/Of Interest': QartodFlags.SUSPECT,
    'Fail': QartodFlags.FAIL,
}


################# REPLACE FLAGS FROM STRING TO INT FROM CSV ##################
def custom_replacement(value):
    """Return the integer code for a string flag; -1 for an empty (NaN) cell."""
    if pd.isna(value):
        return -1
    if value in FLAG_MAP:
        return FLAG_MAP[value]
    # Report unrecognised flags and leave them unchanged
    print(f"Unknown [{value}]")
    return value


csv_name = "D://CIOOS-Full-Data/"  # path to the downloaded CSV file
out_name = os.path.splitext(csv_name)[0] + "_FlagCode.csv"

# If the file is big, process it in chunks.
# skiprows=[1] drops the second line of the file, which holds the units.
df_chunks = pd.read_csv(csv_name, chunksize=10000, skiprows=[1])
columns_ = None
header_written = False
for df in df_chunks:
    if columns_ is None:
        # Flag columns are the ones whose name contains "qc"
        columns_ = [col for col in df.columns if "qc" in col.lower()]
    for col in columns_:
        df[col] = df[col].apply(custom_replacement)

    # Overwrite the output with the first chunk, then append the rest
    df.to_csv(out_name, index=False,
              mode='a' if header_written else 'w',
              header=not header_written)
    header_written = True
##############################################################################
```

## Step#3: Grouping the data
The data is naturally grouped by station and sensor.
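Because the QC tests and the features below operate on a single time series, one way to split the data is into one series per station/sensor pair. A minimal pandas sketch; the column names `station`, `sensor`, `time`, and `temperature` and the sample values are hypothetical, since the actual CMAR field names are not given here.

```python
import pandas as pd

# Hypothetical sample rows from two stations and two sensors
df = pd.DataFrame({
    "station": ["A", "A", "A", "B", "B"],
    "sensor": ["s1", "s1", "s2", "s1", "s1"],
    "time": pd.to_datetime(["2021-06-01 00:00", "2021-06-01 01:00",
                            "2021-06-01 00:00", "2021-06-01 00:00",
                            "2021-06-01 01:00"]),
    "temperature": [12.3, 12.1, 11.8, 9.5, 9.7],
})

# One time-ordered series per (station, sensor) pair
series = {
    key: group.sort_values("time").set_index("time")["temperature"]
    for key, group in df.groupby(["station", "sensor"])
}
```

Sorting by time inside each group matters because rolling, lag, and lead features assume consecutive rows are consecutive observations.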


## Current Feature Set:
1) Rolling standard deviation
2) Past-window mean - current value
3) Future-window mean - current value
4) Future-window mean - past-window mean
5) Current value - ((lead value - lag value) / 2)
6) Month average hourly change - (lag value - current value)
7) Month average hourly change - (lead value - current value)
8) (Current value - q_997) if current value > q_997 else 0
9) (Current value - q_003) if current value < q_003 else 0
10) (Current value - fwq_997) if current value > fwq_997 else 0
11) (Current value - fwq_003) if current value < fwq_003 else 0
12) Month (1-12)
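A minimal pandas sketch of how some of these features could be computed for a single station/sensor series. Several details are assumptions not stated in the list: `window` (a count of observations) and its default are hypothetical; the past window is taken as the `window` values before the current one and the future window as the `window` values after it; q_997 and q_003 are assumed to be the 0.997 and 0.003 quantiles of the whole series. Features 5 to 7, 10 and 11 are left out because their inputs (the month average hourly change, fwq_997, fwq_003) are not defined here.

```python
import numpy as np
import pandas as pd


def make_features(values: pd.Series, window: int = 3) -> pd.DataFrame:
    """Compute features 1-4, 8, 9 and 12 for one series with a DatetimeIndex.

    Assumptions (not from the original list): `window` is a hypothetical
    window length in observations; the past window is the `window` values
    before the current one and the future window the `window` values after
    it; q_997 / q_003 are the 0.997 / 0.003 quantiles of the whole series.
    """
    v = values.astype(float)
    # Mean of v[i-window .. i-1]: rolling mean ending at i-1
    past_mean = v.rolling(window).mean().shift(1)
    # Mean of v[i+1 .. i+window]: rolling mean ending at i+window
    future_mean = v.rolling(window).mean().shift(-window)
    q_997, q_003 = v.quantile(0.997), v.quantile(0.003)
    return pd.DataFrame({
        "rolling_std": v.rolling(window).std(),              # 1 (window ends at i)
        "past_mean_minus_value": past_mean - v,              # 2
        "future_mean_minus_value": future_mean - v,          # 3
        "future_minus_past_mean": future_mean - past_mean,   # 4
        "above_q997": np.where(v > q_997, v - q_997, 0.0),   # 8
        "below_q003": np.where(v < q_003, v - q_003, 0.0),   # 9
        "month": np.asarray(values.index.month),             # 12
    }, index=values.index)
```

Values near the start and end of a series get NaN for the window-based features, since their past or future window is incomplete.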
