# Creating New Tests

### Objective:
Show how to extend CoTeDe by creating new QC checks.

CoTeDe contains a collection of checks to evaluate the quality of the data. The user can define the parameters for each test, such as changing the acceptable threshold of the spike check, but sometimes a completely different procedure is necessary. CoTeDe was developed around a single engine into which modular checks can be plugged. Here you will see how to create a new check.

In [1]:
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

In [2]:
output_notebook()

Currently there are two main types of tests, QCCheck() and QCCheckVar().

- QCCheck is a hard-coded test whose criteria do not depend on the variable being evaluated. For instance, the increasing pressure test defined by Argo always checks pressure, regardless of whether the goal is to QC temperature, salinity, or chlorophyll.

- QCCheckVar checks a given variable. The criteria are applied to that specific variable. For instance, although the spike test procedure is always the same, it is applied to the temperature values if temperature is the variable being evaluated.

In [3]:
from cotede.qctests import QCCheck, QCCheckVar

https://github.com/castelao/CoTeDe/blob/master/cotede/qctests/qctests.py

Note that QCCheck() only requires the data object as input.

Let's suppose that platforms 10 and 11 had bad sensors and any measurements from those should be flagged bad. Note that in this case it doesn't matter which variable we are evaluating, so let's create a new test based on cotede.qctests.QCCheck.

The first question is: how is the platform identified in the data object? Let's suppose that this is available in the attributes of the data object, i.e. in data.attrs.
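To experiment with this idea before touching a real dataset, a stand-in data object is enough. The sketch below assumes only what is stated above, that the platform is stored in attrs. The class name ToyProfile, the variable names, and the values are hypothetical and are not part of CoTeDe.

```python
import numpy as np


class ToyProfile:
    """Hypothetical minimal data object: dict-like variables plus attrs."""

    def __init__(self, variables, attrs):
        self._variables = variables
        self.attrs = attrs

    def __getitem__(self, key):
        return self._variables[key]

    def keys(self):
        return self._variables.keys()


# Made-up profile, for illustration only
profile = ToyProfile(
    variables={
        "PRES": np.array([2.0, 10.0, 20.0]),
        "TEMP": np.array([25.3, 25.1, 24.8]),
    },
    attrs={"platform": 10},
)

# The grey-list decision depends only on the attribute, not on any variable
is_grey_listed = profile.attrs["platform"] in (10, 11)
print(is_grey_listed)
```

Because the decision never looks at a variable, the same answer applies whether temperature, pressure, or anything else is being evaluated.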

In [4]:
class GreyList(QCCheck):
    def test(self):
        """Example test to identify measurements from known bad platforms.

        How is the platform identified in this data object? That depends on
        your data. Let's suppose that it is available at
        >>> self.data.attrs["platform"]
        """

        platform = self.data.attrs["platform"]

        self.flags = {}
        if platform in (10, 11):
            flag = np.array(self.flag_bad, dtype="i1")
        else:
            flag = np.array(self.flag_good, dtype="i1")
        # Store the result; the key "grey_list" is a name chosen for this example
        self.flags["grey_list"] = flag

In [5]:
class MaximumValue(QCCheckVar):
    def test(self):
        assert ("threshold" in self.cfg), "Missing acceptable threshold"

        threshold = self.cfg["threshold"]

        # Mask NaN and inf so they can be flagged as missing below
        feature = np.ma.fix_invalid(self.data[self.varname])

        self.flags = {}
        flag = np.zeros(feature.shape, dtype="i1")
        # Anything above the configured threshold is bad, the rest is good
        flag[np.nonzero(feature > threshold)] = self.flag_bad
        flag[np.nonzero(feature <= threshold)] = self.flag_good
        flag[np.ma.getmaskarray(feature)] = 9
        self.flags["maximum_value"] = flag
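The flagging logic of a maximum value check can be tried on its own, without the CoTeDe classes. The following is a minimal sketch: the function name maximum_value_flags and the flag values 1 (good) and 4 (bad) are assumptions made for this illustration, while 9 for missing data follows the cell above.

```python
import numpy as np
from numpy import ma

# Assumed flag values for this sketch
FLAG_GOOD, FLAG_BAD, FLAG_MISSING = 1, 4, 9


def maximum_value_flags(values, threshold):
    """Flag values above threshold as bad and invalid values as missing."""
    feature = ma.fix_invalid(values)      # masks NaN and inf
    missing = ma.getmaskarray(feature)    # plain boolean array
    above = ma.getdata(feature) > threshold

    flag = np.full(feature.shape, FLAG_GOOD, dtype="i1")
    flag[above] = FLAG_BAD
    # Missing values are set last so they override any comparison result
    flag[missing] = FLAG_MISSING
    return flag


# Made-up measurements: one missing value and one above the threshold
flags = maximum_value_flags(np.array([1.0, 5.0, np.nan, 12.0]), threshold=10)
print(flags.tolist())
```

Setting the missing flag last matters here, since the fill value placed in masked positions could otherwise be compared against the threshold and flagged as bad.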

### Spike test for chlorophyll - BGC Argo

BGC Argo defines the spike test based on a running median, defined as

RES = V2 - median(V0, V1, V2, V3, V4)

bad if RES < 2 * percentile10(RES)

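where V2 is the measurement being tested, V0 to V4 are the five consecutive measurements centered on it, and percentile10 is the 10th percentile of RES for that profile.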
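A worked example may help to see the criterion in action. The sketch below uses plain NumPy and a made-up chlorophyll profile with a single negative spike at index 15. The function name running_median_residual and the synthetic values are assumptions for this illustration, not part of CoTeDe or of the BGC Argo documentation.

```python
import numpy as np


def running_median_residual(values, window=5):
    """RES = V2 - median(V0..V4); NaN where the window is incomplete."""
    values = np.asarray(values, dtype=float)
    half = window // 2
    res = np.full(values.shape, np.nan)
    for i in range(half, values.size - half):
        res[i] = values[i] - np.median(values[i - half : i + half + 1])
    return res


# Synthetic profile: small repeating variation around 0.5
chl = 0.5 + 0.01 * (np.arange(30) % 3)
chl[15] = 0.1  # inject one negative spike

res = running_median_residual(chl)
p10 = np.nanpercentile(res, 10)  # 10th percentile, ignoring the NaN edges

# A measurement is bad if RES < 2 * percentile10(RES)
bad = res < 2 * p10
print(np.flatnonzero(bad).tolist())
```

In this profile the 10th percentile is negative, so RES < 2 * p10 is equivalent to RES / p10 > 2. In very short profiles a single spike can dominate the 10th percentile itself, in which case the spike may not be flagged.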

In [6]:
try:
    import pandas as pd

    PANDAS_AVAILABLE = True
except ImportError:
    PANDAS_AVAILABLE = False


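def spike_median(x):
    # x is expected to be a pandas Series (it needs .rolling and .quantile)
    res = x - x.rolling(5, center=True).median()
    # Scaled by the 10th percentile: RES < 2 * p10 becomes ratio > 2 when p10 < 0
    return res / res.quantile(.1)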


class BGCChlSpike(QCCheckVar):
    """Spike test as recommended by the BGC Argo
    """
    cfg = {"threshold": 2}
    
    def set_features(self):
        self.features = {
            "spike_median": spike_median(self.data[self.varname]),
        }
        
    def test(self):
        assert ("threshold" in self.cfg), "Missing acceptable threshold"

        threshold = self.cfg["threshold"]

        # Use the feature computed in set_features(), not the raw variable
        feature = np.asanyarray(self.features["spike_median"])

        self.flags = {}
        flag = np.zeros(feature.shape, dtype="i1")
        flag[feature > threshold] = self.flag_bad
        flag[feature <= threshold] = self.flag_good
        x = self.data[self.varname]
        flag[np.ma.getmaskarray(x) | ~np.isfinite(x)] = 9
        self.flags["bgc_spike"] = flag