# Example QARTOD Testing

#### Intro
QARTOD (Quality Assurance/Quality Control of Real-Time Oceanographic Data) is an effort by the broader oceanographic observing community to standardize the quality control of oceanographic data. Part of that standardization is the identification and recommendation of algorithms for testing the data returned by a sensor in order to evaluate its quality. Currently, OOI is implementing the gross range and climatology tests, which use either a three-standard-deviation threshold (gross range) or a monthly-varying range (climatology) determined using a two-cycle harmonic model. The available flags are 1 (pass), 2 (not evaluated), 3 (suspect or of high interest), 4 (fail), and 9 (missing data).

The thresholds for the tests are calculated and saved in tables that are stored on GitHub for ingestion into OOINet.
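
As a rough illustration of how thresholds of this kind might be derived, the sketch below computes a three-standard-deviation range and fits a harmonic model by ordinary least squares. This is not the OOI processing code: the function names, the synthetic temperature record, and the reading of "two-cycle" as an annual plus a semiannual cycle are all assumptions made for the example.

```python
import numpy as np


def gross_range_span(values, n_std=3.0):
    """Return (low, high) as the mean +/- n_std standard deviations."""
    values = np.asarray(values, dtype=float)
    mu = np.nanmean(values)
    sigma = np.nanstd(values)
    return mu - n_std * sigma, mu + n_std * sigma


def fit_two_cycle_harmonic(t_years, values):
    """Least-squares fit of a mean plus annual and semiannual cycles.

    Returns the five fitted coefficients and a function that evaluates
    the fitted model at arbitrary times (in years).
    """
    def design(t):
        t = np.asarray(t, dtype=float)
        w = 2.0 * np.pi * t
        return np.column_stack([
            np.ones_like(t),
            np.sin(w), np.cos(w),          # one cycle per year
            np.sin(2 * w), np.cos(2 * w),  # two cycles per year
        ])

    coeffs, *_ = np.linalg.lstsq(
        design(t_years), np.asarray(values, dtype=float), rcond=None
    )
    return coeffs, lambda t: design(t) @ coeffs


# Synthetic (made-up) temperature record: three years of daily samples
t = np.arange(0, 3, 1 / 365.0)
temp = 12.0 + 2.0 * np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)

low, high = gross_range_span(temp)
coeffs, model = fit_two_cycle_harmonic(t, temp)
```

Evaluating `model` at mid-month times would give one expected value per month; a monthly range could then be built around it, though how OOI sets the width of that range is not covered here.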

#### Purpose
This is an example notebook for (1) testing and verifying that the implemented QARTOD tests are performing as expected and (2) calculating some descriptive summary statistics of the returned QARTOD flags. Testing and verification involves running the QARTOD tests locally with the appropriate QARTOD values from the GitHub tables and comparing the results with what was returned with the dataset from OOINet. The summary statistics 

1. Testing & Verification
Before a QARTOD test is moved to production, it is first implemented in the Development (Dev-1) environment on some example datasets. We want to 

Once

2. Statistics
Similarly, once a test is successfully running on production and available to end users as part of the downloaded datasets, we want to calculate some descriptive statistics from the QARTOD tests that let us know some . Outlined below is a very high level 

#### Summary Statistics
We calculate the total number
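
A minimal sketch of one such statistic, assuming the flags for a single test are available as a one-dimensional array: count how many data points received each flag and express that as a percentage of the total. The function name `flag_summary` and the example flags are hypothetical; the flag meanings follow the standard QARTOD mapping.

```python
import numpy as np

# Standard QARTOD flag meanings
FLAG_MEANINGS = {
    1: "pass",
    2: "not_evaluated",
    3: "suspect_or_of_high_interest",
    4: "fail",
    9: "missing_data",
}


def flag_summary(flags):
    """Count each flag value and report its share of all data points."""
    flags = np.asarray(flags).astype(int)
    total = flags.size
    values, counts = np.unique(flags, return_counts=True)
    summary = {}
    for value, count in zip(values, counts):
        name = FLAG_MEANINGS.get(int(value), "unknown")
        summary[name] = {
            "count": int(count),
            "percent": float(100.0 * count / total),
        }
    return summary


# Made-up flags for ten data points
example_flags = [1, 1, 1, 3, 1, 4, 1, 1, 9, 1]
summary = flag_summary(example_flags)
```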



#### Comparison
Finally, we want to run the QARTOD tests locally and execute a comparison of the results against the 

In [1]:
# Import libraries
import os
import re
import gc
import io
import ast
import pandas as pd
import numpy as np
import xarray as xr
import warnings
warnings.filterwarnings("ignore")

In [2]:
from ooinet import M2M
from ooinet.Instrument.common import process_file

In [3]:
from ooi_data_explorations.uncabled.process_dosta import dosta_datalogger
from ooi_data_explorations.combine_data import combine_datasets

In [4]:
import dask
from dask.diagnostics import ProgressBar

---
## Request and load the data
Based on the reference designator, method, and stream, we want to load the data into 

In [5]:
# Setup parameters needed to request data
refdes = "CP01CNSM-MFD37-03-CTDBPD000"
method = "recovered_inst"
stream = "ctdbp_cdef_instrument_recovered"

In [6]:
# Lazily open a dataset and apply the generic cleaning/processing routine
@dask.delayed
def preprocess(ds):
    ds = xr.open_dataset(ds)
    ds = process_file(ds)
    return ds

#### Production Data
Production data is the data available from ooinet.oceanobservatories.org. This is the data served via the API and the data portal (OOINet). The goldcopy is a static catalog of datasets that is used by Data Explorer (dataexplorer.oceanobservatories.org) and is updated once a day with new data. The goldcopy data is much faster to request and download.

In [7]:
# Use the gold copy THREDDs datasets
thredds_url = M2M.get_thredds_url(refdes, method, stream, goldCopy=True)

# Get the THREDDs catalog
thredds_catalog = M2M.get_thredds_catalog(thredds_url)

# Clean the THREDDs catalog
sensor_files, ancillary_files = M2M.clean_catalog(thredds_catalog, stream)

# Now build the url to access the data
sensor_files = [re.sub(r"catalog\.html\?dataset=", M2M.URLS["goldCopy_dodsC"], file) for file in sensor_files]
zs = [preprocess(file) for file in sensor_files]

# Load all the datasets
with ProgressBar():
    data = xr.concat([ds.chunk() for ds in dask.compute(*zs)], dim="time")

[########################################] | 100% Completed | 27.85 s


In [9]:
# Make a copy of the data
ds = data.copy()

#### Development
We may also want to examine new QARTOD tests that are staged in the Dev-1 environment before they are moved to production. The Development environment is at ooinet-dev1-west.intra.oceanobservatories.org. To access data on Dev-1, you need to be granted access and be connected to the CI-West VPN (vpn-west.oceanobservatories.org) at Oregon State.

The Dev-1 environment has no "goldcopy" equivalent THREDDs catalog. Instead, we have to make the normal data request and wait for the datasets to be assembled and made available for download.

In [None]:
# Substitute ooinet-dev1-west.intra.oceanobservatories.org into the available API urls
Dev01_urls = {}
for key, url in M2M.URLS.items():
    if "opendap" in url:
        dev1_url = re.sub("opendap", "opendap-dev1-west.intra", url)
    else:
        dev1_url = re.sub("ooinet", "ooinet-dev1-west.intra", url)
    Dev01_urls[key] = dev1_url
    
# Use the gold copy THREDDs datasets
thredds_url = M2M.get_thredds_url(refdes, method, stream, goldCopy=True)

# Get the THREDDs catalog
thredds_catalog = M2M.get_thredds_catalog(thredds_url)

# Clean the THREDDs catalog
sensor_files, ancillary_files = M2M.clean_catalog(thredds_catalog, stream)

# Now build the url to access the data
sensor_files = [re.sub(r"catalog\.html\?dataset=", M2M.URLS["goldCopy_dodsC"], file) for file in sensor_files]
zs = [preprocess(file) for file in sensor_files]

# Load all the datasets
with ProgressBar():
    data = xr.concat([ds.chunk() for ds in dask.compute(*zs)], dim="time")

In [None]:
def swap_timestamps(ds):
    """
    Swaps the timestamps from the host to the instrument timestamp
    for the CTDBPs
    """
    if "internal_timestamp" in ds.variables:
        # Calculate the timestamp
        inst_time = ds.internal_timestamp.to_pandas()
        attrs = ds.internal_timestamp.attrs
        # Convert the time
        inst_time = inst_time.apply(lambda x: np.datetime64(int(x), 's'))
        # Create a DataArray
        da = xr.DataArray(inst_time, attrs=attrs)
        ds['internal_timestamp'] = da
    ds = ds.set_coords(["internal_timestamp"])
    ds = ds.swap_dims({"time":"internal_timestamp"})
    ds = ds.reset_coords("time")
    ds = ds.rename_vars({"time":"host_time"})
    ds["host_time"].attrs = {
        "long_name": "DCL Timestamp",
        "comment": ("The timestamp of the instrument data as recorded by the mooring data "
                    "concentrator logger (DCL)")
    }
    ds = ds.rename({"internal_timestamp":"time"})
    return ds

#### Identify Test Parameters
Next, identify which parameters in the dataset have QARTOD applied to them. Sometimes the variable name in the dataset is different than the key that OOINet uses to build the datasets. In that case, we can check the variable's attributes for the "alternate_parameter_name".

In [11]:
# Create a dictionary of key-value pairs of dataset variable name:alternate parameter name
test_parameters={}
for var in data.variables:
    if "qartod_results" in var:
        # Get the parameter name
        param = var.split("_qartod")[0]
        
        # Check if the parameter has an alternative ooinet_name
        if "alternate_parameter_name" in ds[param].attrs:
            ooinet_name = ds[param].attrs["alternate_parameter_name"]
        else:
            ooinet_name = param
        
        # Save the results in a dictionary
        test_parameters.update({
            param: ooinet_name
        })
# Print out the results
test_parameters

{'sea_water_electrical_conductivity': 'ctdbp_seawater_conductivity',
 'sea_water_temperature': 'ctdbp_seawater_temperature',
 'sea_water_practical_salinity': 'practical_salinity',
 'sea_water_pressure': 'ctdbp_seawater_pressure'}

---
## Testing & Verification

To verify the results of the QARTOD tests being run by OOINet, we want to compare the QARTOD flags returned with the datasets against the results from running the tests locally using the same inputs. First, we have to parse out the separate test results from the ```qartod_executed``` variable. Then, we parse and load the appropriate GitHub tables. With the correct input tables, we can then run the different tests locally. Finally, we directly compare the locally run results against what was returned with the dataset and identify any disagreements. 
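
The final comparison step can be sketched as follows, assuming both sets of flags are one-dimensional arrays of the same length. The helper name `compare_flags` and the example flags are hypothetical. The flags parsed from the dataset are strings while the locally computed flags are integers, so both are cast to integers before comparing.

```python
import numpy as np


def compare_flags(ooinet_flags, local_flags):
    """Return the indices where the two flag arrays disagree."""
    ooinet_flags = np.asarray(ooinet_flags).astype(int)
    local_flags = np.asarray(local_flags).astype(int)
    if ooinet_flags.shape != local_flags.shape:
        raise ValueError("Flag arrays must have the same shape")
    return np.flatnonzero(ooinet_flags != local_flags)


# Made-up flags: strings as parsed from the dataset, integers from a local run
from_ooinet = np.array(["1", "1", "3", "4", "1"])
from_local = np.array([1, 1, 3, 1, 1], dtype="uint8")

mismatches = compare_flags(from_ooinet, from_local)
```

An empty result means the two runs agree everywhere; otherwise the returned indices point at the data to inspect.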

#### Parse the QARTOD Executed
The ```qartod_executed``` variable for a given parameter contains the individual QARTOD test flags. For each datum, flags are listed in a string matching the order of the tests_executed attribute. Flags should be interpreted using the standard QARTOD mapping: \[1: pass, 2: not_evaluated, 3: suspect_or_of_high_interest, 4: fail, 9: missing_data\].

For verification, we first want to split out each test into its own separate variable, named using the following convention: {param}\_qartod\_{test_name}. For example, parsing out the gross range test results for the CTD parameter ```sea_water_practical_salinity``` from the qartod flags ```sea_water_practical_salinity_qartod_executed``` will return a variable ```sea_water_practical_salinity_qartod_gross_range``` with just flags corresponding to the results of the gross range QARTOD test.
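
To make the parsing concrete, here is a small self-contained sketch that works on plain Python lists rather than an xarray dataset. The `tests_executed` value and the flag strings are made up for illustration, and `split_flags` is a hypothetical helper.

```python
def split_flags(flag_strings, tests_executed):
    """Split per-datum flag strings into one list of flags per test.

    The character at position i of each flag string belongs to the
    i-th test named in the comma-separated tests_executed string.
    """
    test_names = [name.strip() for name in tests_executed.split(",")]
    results = {name: [] for name in test_names}
    for flags in flag_strings:
        for position, name in enumerate(test_names):
            results[name].append(int(flags[position]))
    return results


# Hypothetical attribute value and flags for four data points
tests_executed = "gross_range_test, climatology_test"
flag_strings = ["11", "13", "41", "19"]

per_test = split_flags(flag_strings, tests_executed)
```

Here the third datum fails the first test but passes the second, which only becomes visible once the combined string is split by position.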

In [20]:
ds.sea_water_electrical_conductivity_qartod_executed

| | Array | Chunk |
|---|---|---|
| Bytes | 59.39 MiB | 17.89 MiB |
| Shape | (243242,) | (73272,) |
| Count | 52 Tasks | 13 Chunks |
| Type | numpy.ndarray | |


In [16]:
import io
import ast
import requests

def parse_qartod_executed(ds, parameters):
    """
    Parses the qartod tests for the given parameter into separate variables.
    
    Parameters
    ----------
    ds: xarray.Dataset
        The dataset downloaded from OOI with the QARTOD flags applied.
    parameters: str or list[str]
        The name(s) of the parameters in the dataset to parse the QARTOD flags

    Returns
    -------
    ds: xarray.Dataset
        The dataset with the QARTOD test for the given parameters split out
        into new separate data variables using the naming convention:
        {parameter}_qartod_{test_name}
    """
    # Put the parameters into a list if only a single string was passed
    if isinstance(parameters, str):
        parameters = [parameters]
    
    # Iterate through each parameter
    for param in parameters:
        # Generate the qartod executed name
        qartod_name = f"{param}_qartod_executed"
        
        if qartod_name not in ds.variables:
            continue
    
        # Fix the test types
        ds[qartod_name] = ds[qartod_name].astype(str)
    
        # Get the test order
        test_order = ds[qartod_name].attrs["tests_executed"].split(",")
    
        # Iterate through the available tests and create separate variables with the results
        for test_index, test in enumerate(test_order):
            test_name = f"{param}_qartod_{test.strip()}"
            ds[test_name] = ds[qartod_name].str.get(test_index)

    return ds

In [17]:
# Put the test parameter names in the dataset into a list
parameters = list(test_parameters.keys())
parameters

['sea_water_electrical_conductivity',
 'sea_water_temperature',
 'sea_water_practical_salinity',
 'sea_water_pressure']

In [18]:
# Parse all of the variables with QARTOD tests applied into separate tests
ds = parse_qartod_executed(ds, parameters)
ds

[Output: xarray Dataset summary. Every variable is a dask array of shape (243242,) stored in 13 chunks of shape (73272,) or smaller.]

#### Load & Parse the GitHub QARTOD Tables
We can grab the QARTOD tables with the test values straight from GitHub, which ensures we are using the same input and threshold values as OOINet. However, the QARTOD tables utilize the ```ooinet_parameter_name``` instead of the dataset variable name. Thus, when loading the tables we need to make sure we are requesting the correct parameter name.
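
The tables store the `parameters` and `qcConfig` columns as Python-literal strings, which is why they need `ast.literal_eval` before they can be filtered. The sketch below shows that step on a made-up, in-memory CSV row, using the standard library `csv` module instead of pandas so that it runs without any downloads. The site, sensor, stream, and parameter names and the span values are all hypothetical, as is the helper `find_gross_range_spans`.

```python
import ast
import csv
import io

# A made-up table with the same kind of columns as the gross range tables
csv_text = '''subsite,node,sensor,stream,parameters,qcConfig
SITE0001,NODE1,01-SENSR0000,example_stream,"{'inp': 'example_temperature'}","{'qartod': {'gross_range_test': {'suspect_span': [5.0, 20.0], 'fail_span': [-5.0, 35.0]}}}"
'''


def find_gross_range_spans(csv_text, ooinet_param):
    """Return (fail_span, suspect_span) for the first row matching ooinet_param."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The dictionaries are stored as strings and must be evaluated
        parameters = ast.literal_eval(row["parameters"])
        if parameters.get("inp") != ooinet_param:
            continue
        config = ast.literal_eval(row["qcConfig"])["qartod"]["gross_range_test"]
        return config["fail_span"], config["suspect_span"]
    return None


spans = find_gross_range_spans(csv_text, "example_temperature")
missing = find_gross_range_spans(csv_text, "sea_water_temperature")
```

The second lookup returns `None` because it uses a dataset-style variable name that does not appear in the `inp` field, which is the mismatch the paragraph above warns about.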

In [13]:
GITHUB_BASE_URL = "https://raw.githubusercontent.com/oceanobservatories/qc-lookup/master/qartod"

def load_gross_range_qartod_test_values(refdes, stream, ooinet_param):
    """
    Load the gross range QARTOD test values table from GitHub

    Parameters
    ----------
    refdes: str
        The reference designator for the given sensor
    stream: str
        The name of the data stream
    ooinet_param: str
        The OOINet parameter name to filter the table for
    """
    subsite, node, sensor = refdes.split("-", 2)
    sensor_type = sensor[3:8].lower()
    
    # GitHub url to the gross range table
    GROSS_RANGE_URL = f"{GITHUB_BASE_URL}/{sensor_type}/{sensor_type}_qartod_gross_range_test_values.csv"
    
    # Download the table, raising an error if the request failed
    download = requests.get(GROSS_RANGE_URL)
    download.raise_for_status()
    df = pd.read_csv(io.StringIO(download.content.decode('utf-8')))
    df["parameters"] = df["parameters"].apply(ast.literal_eval)
    df["qcConfig"] = df["qcConfig"].apply(ast.literal_eval)
        
    # Next, filter for the desired parameter
    mask = df["parameters"].apply(lambda x: x.get("inp") == ooinet_param)
    df = df[mask]
    
    # Now filter for the desired sensor and stream
    df = df[(df["subsite"] == subsite) & 
            (df["node"] == node) & 
            (df["sensor"] == sensor) &
            (df["stream"] == stream)]
    
    return df


def load_climatology_qartod_test_values(refdes, param):
    """
    Load the OOI climatology QARTOD test values table from GitHub
    
    Parameters
    ----------
    refdes: str
        The reference designator for the given sensor
    param: str
        The OOINet name of the parameter, as used in the climatology table filename
    """
    
    site, node, sensor = refdes.split("-", 2)
    sensor_type = sensor[3:8].lower()
    
    # gitHub url to the climatology tables
    CLIMATOLOGY_URL = f"{GITHUB_BASE_URL}/{sensor_type}/climatology_tables/{refdes}-{param}.csv"
    
    # Download the results
    download = requests.get(CLIMATOLOGY_URL)
    if download.status_code == 200:
        df = pd.read_csv(io.StringIO(download.content.decode('utf-8')), index_col=0)
        df = df.applymap(ast.literal_eval)
    else:
        return None
    return df

In [21]:
# Example: load the gross range QARTOD table for a specific parameter
gross_range_qartod_test_values = load_gross_range_qartod_test_values(refdes, stream, "ctdbp_seawater_temperature")
gross_range_qartod_test_values

| | subsite | node | sensor | stream | parameters | qcConfig | source | notes |
|---|---|---|---|---|---|---|---|---|
| 224 | CP01CNSM | MFD37 | 03-CTDBPD000 | ctdbp_cdef_instrument_recovered | {'inp': 'ctdbp_seawater_temperature'} | {'qartod': {'gross_range_test': {'suspect_span... | Sensor min/max derived from vendor documentati... | |


In [22]:
# Example: load the climatology QARTOD table for a specific parameter
climatology_qartod_test_values = load_climatology_qartod_test_values(refdes, "ctdbp_seawater_temperature")
climatology_qartod_test_values

| | [1, 1] | [2, 2] | [3, 3] | [4, 4] | [5, 5] | [6, 6] | [7, 7] | [8, 8] | [9, 9] | [10, 10] | [11, 11] | [12, 12] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [0, 0] | [12.5076, 14.8184] | [12.0924, 13.7305] | [11.5657, 13.3547] | [11.4895, 13.4294] | [11.4483, 13.9555] | [11.58, 14.1914] | [11.1502, 14.7142] | [11.7184, 14.3547] | [11.3198, 15.4957] | [12.7114, 15.2397] | [13.2261, 15.5345] | [12.3546, 16.1987] |


#### Run Tests Locally
Next, we run the tests locally to get results that can be compared with the flags returned by OOINet. This is done using the ```ioos_qc``` QARTOD package in conjunction with the ```qartod_test_values``` tables.
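
Before calling the package, it may help to see the basic idea of a gross range test in a few lines. The sketch below is a simplified illustration and not the ```ioos_qc``` implementation: the function name `simple_gross_range`, the treatment of values exactly on a span boundary as passing, and the handling of NaN as missing data are assumptions made for the example.

```python
import numpy as np


def simple_gross_range(values, fail_span, suspect_span):
    """Flag values as 1 (pass), 3 (suspect), 4 (fail), or 9 (missing)."""
    values = np.asarray(values, dtype=float)
    flags = np.ones(values.shape, dtype="uint8")  # start with 1 (pass)

    # Outside the suspect span but possibly still inside the fail span
    suspect = (values < suspect_span[0]) | (values > suspect_span[1])
    flags[suspect] = 3

    # Outside the fail span overrides the suspect flag
    fail = (values < fail_span[0]) | (values > fail_span[1])
    flags[fail] = 4

    # Missing values are flagged last so they are never marked as pass
    flags[np.isnan(values)] = 9
    return flags


# Made-up values and spans
flags = simple_gross_range(
    [10.0, 21.0, 40.0, np.nan, 4.0],
    fail_span=(-5.0, 35.0),
    suspect_span=(5.0, 20.0),
)
```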

#### Gross Range Test

In [23]:
# Import the ioos_qc QARTOD package tests
from ioos_qc.qartod import gross_range_test, climatology_test, ClimatologyConfig

In [24]:
# Run through all of the parameters which had the QARTOD tests applied by OOINet and
# run the tests locally, saving the results in a dictionary
gross_range_results = {}
for param in test_parameters:
    # Get the ooinet name
    ooinet_name = test_parameters.get(param)
    
    # Load the gross_range_qartod_test_values from gitHub
    gross_range_qartod_test_values = load_gross_range_qartod_test_values(refdes, stream, ooinet_name)
    
    # Get the qcConfig object, the fail_span, and the suspect_span
    qcConfig = gross_range_qartod_test_values["qcConfig"].values[0]
    fail_span = qcConfig.get("qartod").get("gross_range_test").get("fail_span")
    suspect_span = qcConfig.get("qartod").get("gross_range_test").get("suspect_span")
    
    # Run the gross_range_test
    param_results = gross_range_test(
        inp = ds[param].values,
        fail_span = fail_span,
        suspect_span = suspect_span)
    
    # Save the results
    gross_range_results.update(
        {param: param_results}
    )
    

In [25]:
gross_range_results

{'sea_water_electrical_conductivity': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_temperature': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_practical_salinity': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_pressure': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8)}

#### Climatology Test

In [37]:
# Run through all of the parameters which had the QARTOD tests applied by OOINet and
# run the tests locally, saving the results in a dictionary
climatology_results = {}

for param in test_parameters:
    # Get the ooinet name
    ooinet_name = test_parameters.get(param)
    
    # Load the climatology_qartod_test_values from GitHub
    climatology_qartod_test_values = load_climatology_qartod_test_values(refdes, ooinet_name)
    
    if climatology_qartod_test_values is None:
        climatology_results.update({
            param: "Not implemented."
        })
        continue
    
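    # Get the fail_span for this parameter from its gross range table, rather
    # than reusing the value left over from the gross range loop above
    gross_range_qartod_test_values = load_gross_range_qartod_test_values(refdes, stream, ooinet_name)
    qcConfig = gross_range_qartod_test_values["qcConfig"].values[0]
    fail_span = qcConfig.get("qartod").get("gross_range_test").get("fail_span")

    # Initialize a climatology config object
    c = ClimatologyConfig()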
    
    # Iterate through the pressure ranges
    for p_range in climatology_qartod_test_values.index:
        # Get the pressure range
        pmin, pmax = ast.literal_eval(p_range)

        # Convert the pressure range values into a dictionary
        p_values = climatology_qartod_test_values.loc[p_range].to_dict()

        # Check the pressure values. If [0, 0], then set the range [0, 5000]
        if pmax == 0:
            pmax = 5000

        for tspan in p_values.keys():
            # Get the time span
            tstart, tend = ast.literal_eval(tspan)

            # Get the values associated with the time span
            vmin, vmax = p_values.get(tspan)

            # Add the test to the climatology config object
            c.add(tspan=[tstart, tend],
                  vspan=[vmin, vmax],
                  fspan=[fail_span[0], fail_span[1]],
                  zspan=[pmin, pmax],
                  period="month")

    # Run the climatology test
    param_results = climatology_test(c,
                                     inp=ds[param],
                                     tinp=ds["time"],
                                     zinp=ds["sea_water_pressure"])
    
    # Append the results
    climatology_results.update({
        param: param_results
    })

In [38]:
climatology_results

{'sea_water_electrical_conductivity': 'Not implemented.',
 'sea_water_temperature': masked_array(data=[1, 1, 1, ..., 3, 3, 3],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_practical_salinity': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_pressure': 'Not implemented.'}
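
The month-by-month lookup that the climatology configuration encodes can be illustrated with a simplified NumPy sketch. It is not the ```ioos_qc``` implementation: the function `monthly_climatology_flags` is hypothetical, depth (`zspan`) is ignored, and the `fail_span` used in the example is an invented value. The two monthly ranges are the January and February entries from the table above.

```python
import numpy as np


def monthly_climatology_flags(values, months, monthly_spans, fail_span):
    """Simplified monthly climatology flagging (a sketch that ignores depth)."""
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)

    # Points whose month has no configured range stay at 2 (not evaluated)
    flags = np.full(values.shape, 2, dtype=np.uint8)
    outside_fail = (values < fail_span[0]) | (values > fail_span[1])

    for month, (vmin, vmax) in monthly_spans.items():
        in_month = months == month
        inside = (values >= vmin) & (values <= vmax)
        flags[in_month & inside] = 1         # pass
        flags[in_month & ~inside] = 3        # suspect
        flags[in_month & outside_fail] = 4   # fail overrides suspect
    return flags


# January and February ranges taken from the table above
monthly_spans = {1: (12.5076, 14.8184), 2: (12.0924, 13.7305)}

clim_example = monthly_climatology_flags(
    values=[13.0, 15.0, 13.0, 14.0, 50.0, 13.0],
    months=[1, 1, 2, 2, 2, 3],
    monthly_spans=monthly_spans,
    fail_span=(-5, 35),  # hypothetical fail limits
)
print(clim_example)
```

The same value can receive different flags in different months, which is the point of a climatology test: 14.0 would pass in January but is suspect in February.
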

### Compare the results
Finally, we compare the flags from the local tests with the flags returned in the dataset and look for the indices where they disagree. If there are none, the tests on OOINet are running as expected.

In [55]:
def run_comparison(ds, param, test_results, test="gross_range"):
    """
    Runs a comparison between the qartod results returned as part of the dataset
    and results calculated locally.

    Returns the indices where the two sets of flags disagree, or None if
    they agree everywhere.
    """
    # Get the local test results and convert to string type for comparison
    local_results = test_results[param].astype(str)
    
    # Run comparison against the dataset flags for the requested test
    not_equal = np.where(ds[f"{param}_qartod_{test}_test"] != local_results)[0]
    
    if len(not_equal) == 0:
        return None
    else:
        return not_equal
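
The core of the comparison can be tried on synthetic arrays without a dataset. The sketch below assumes, as the function above does, that the dataset stores flags as strings while the local results are integers. The helper `find_mismatches` and both input arrays are hypothetical.

```python
import numpy as np


def find_mismatches(dataset_flags, local_flags):
    """Return the indices where two flag arrays disagree."""
    # Cast both sides to strings so "1" and 1 compare as equal
    dataset_flags = np.asarray(dataset_flags).astype(str)
    local_flags = np.asarray(local_flags).astype(str)
    return np.flatnonzero(dataset_flags != local_flags)


# Synthetic flags: the two sources disagree only at index 1
dataset_flags = np.array(["1", "1", "3", "4", "1"])
local_flags = np.array([1, 3, 3, 4, 1], dtype=np.uint8)

mismatches = find_mismatches(dataset_flags, local_flags)
print(mismatches)
```

Returning an empty index array instead of `None` lets the caller use `len(mismatches)` directly as the number of disagreements.
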

---
## Descriptive Statistics
Next, we want to calculate statistics of the QARTOD flags for each test applied to each parameter in the dataset. The example ```qartod_results_summary``` below simply counts the number of each flag (e.g., 1, 3, 4) and its percentage of the total for each test (gross range, climatology, etc.) for each parameter that the tests are applied to.
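
The counting step can be sketched on a small synthetic array with `np.unique`. The helper `summarize_flags` and the `FLAG_NAMES` mapping are hypothetical; the mapping follows the standard QARTOD flag meanings (1 pass, 2 not evaluated, 3 suspect, 4 fail, 9 missing), and the flags are assumed to be stored as strings.

```python
import numpy as np

# Standard QARTOD flag meanings, keyed by the string form of the flag
FLAG_NAMES = {
    "1": "good",
    "2": "not evaluated",
    "3": "suspect",
    "4": "fail",
    "9": "missing",
}


def summarize_flags(flags):
    """Count each flag value and its percentage of the total."""
    flags = np.asarray(flags).astype(str)
    total = flags.size
    values, counts = np.unique(flags, return_counts=True)

    summary = {"total": total}
    for value, count in zip(values, counts):
        name = FLAG_NAMES.get(str(value), f"unknown ({value})")
        summary[name] = (int(count), round(float(count) / total * 100, 2))
    return summary


# Synthetic flags: five good, two suspect, one fail
summary = summarize_flags(["1", "1", "1", "3", "4", "1", "1", "3"])
print(summary)
```

Counting every flag value that is present, rather than only 1, 3 and 4, makes it easy to check that the per-flag counts add up to the total.
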

In [58]:
def qartod_results_summary(ds, params, tests):
    """
    Calculate the statistics for parameter qartod flags.
    
    This function takes in a list of the parameters and
    the associated QARTOD tests to calculate the number
    of each flag and the percent of the flag.
    
    Parameters
    ----------
    ds: xarray.Dataset
        An xarray dataset which contains the data
    params: list[strings]
        A list of the variables/parameters in the given
        dataset that have been tested with QARTOD
    tests: list[strings]
        A list of the QARTOD test names which to parse
        for the given parameters.
        
    Returns
    -------
    results: dict
        A dictionary which contains the number of each
        QARTOD flag and the percent of the total flags
        for each test applied to each parameter in the
        given dataset.
        
        results = {'parameter':
                        {'test_name':
                            {'total': int,
                             'good': (int, %),
                             'suspect': (int, %),
                             'fail': (int, %)}
                        },
                  }
    """
    # Check that the inputs are lists
    if not isinstance(params, list):
        params = [params]
        
    if not isinstance(tests, list):
        tests = [tests]
    
    # Initialize the result dictionary and iterate 
    # through the parameters for each test
    results = {}
    for param in params:
        
        # Now iterate through each test
        test_results = {}
        for test in tests:
            
            # First, check that the test was applied
            test_name = f"{param}_qartod_{test}_test"
            if test_name not in ds.variables:
                continue
                
            # Count the total number of values
            n = ds[test_name].count().compute().values
            
            # Count the number of good flags
            good = np.where(ds[test_name] == "1")[0]

            # Count the number of suspect/interesting flags
            suspect = np.where(ds[test_name] == "3")[0]
    
            # Count the number of fails
            bad = np.where(ds[test_name] == "4")[0]
    
            test_results.update({test :{
                     "total": int(n),
                     "good": (len(good), np.round(len(good)/n*100, 2)),
                     "suspect": (len(suspect), np.round(len(suspect)/n*100, 2)),
                     "fail": (len(bad), np.round(len(bad)/n*100, 2))
                    }
                }
            )
        
        # Save the test results for each parameter
        results.update({
            param: test_results
        })
    
    return results

In [59]:
qartod_results = qartod_results_summary(ds, parameters, ["gross_range", "climatology"])
qartod_results

{'sea_water_electrical_conductivity': {'gross_range': {'total': 243242,
   'good': (238606, 98.09),
   'suspect': (4635, 1.91),
   'fail': (0, 0.0)}},
 'sea_water_temperature': {'gross_range': {'total': 243242,
   'good': (238074, 97.88),
   'suspect': (5168, 2.12),
   'fail': (0, 0.0)},
  'climatology': {'total': 243242,
   'good': (212845, 87.5),
   'suspect': (30397, 12.5),
   'fail': (0, 0.0)}},
 'sea_water_practical_salinity': {'gross_range': {'total': 243242,
   'good': (239026, 98.27),
   'suspect': (4216, 1.73),
   'fail': (0, 0.0)},
  'climatology': {'total': 243242,
   'good': (226321, 93.04),
   'suspect': (16921, 6.96),
   'fail': (0, 0.0)}},
 'sea_water_pressure': {'gross_range': {'total': 243242,
   'good': (242583, 99.73),
   'suspect': (656, 0.27),
   'fail': (0, 0.0)}}}