# Run QARTOD Tests on Locally Saved Data

In this notebook we will load locally saved data from the interim data folder, extract QARTOD test parameters from spreadsheets on the OOI GitHub, run the QARTOD climatology and gross range tests on the imported data, and save the test results to the processed data folder.

More information about the QARTOD tests and the ioos_qc module can be found on the [Integrated Ocean Observing System website](https://ioos.noaa.gov/project/qartod/) and in the [Python module documentation](https://ioos.github.io/ioos_qc/), respectively.

### Import modules for data manipulation

In [1]:
# Import libraries
import os
import requests
import re
import gc
import io
import ast
import pandas as pd
import numpy as np
import xarray as xr
import warnings
warnings.filterwarnings("ignore")
import sys

# Import dask tools and ProgressBar
import dask
from dask.diagnostics import ProgressBar

from qartod_testing import data_processing as dp

### Load locally saved data

In [2]:
# Set reference designator, data stream, and method 

method = "recovered_inst"                           # non-decimated data from recovered instrument
stream = "ctdbp_cdef_instrument_recovered"          # name of data stream
refdes = "CP01CNSM-RID27-03-CTDBPC000"              # reference designator (site-node-sensor)

# Site, node, and sensor info from deconstructed reference designator
[site, node, sensor] = refdes.split('-', 2)


type = 'prod'                                       # dataset saved from OOINet/"production" or from dev1

In [3]:
# Build filename and path to interim data

def build_data_path(refdes, method, stream, type, folder='interim'):
    """Build the relative path to a dataset from the notebook folder.

    Input:
      refdes: string built from OOI site, node, and sensor for chosen dataset
      method: 'recovered_inst', 'recovered_host', or 'telemetered'(?)
      stream: name of data stream
      type: 'prod' or 'dev'
      folder: 'interim' (default), 'processed', 'raw', or 'external'

    Returns:
      ds_path: relative path to dataset from notebook folder
    """
    # Build filename from dataset type and source
    filename = '-'.join((type, refdes, method, stream)) + '.nc'

    # Path to data folder from notebook folder
    data_folder = os.path.relpath('../data')

    # Build full relative path
    ds_path = os.path.join(data_folder, folder, filename)

    return ds_path

In [4]:
ds_path = build_data_path(refdes, method, stream, type)
ds_path

'..\\data\\interim\\prod-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered.nc'

In [5]:
# Load data from .nc files

ds = xr.open_dataset(ds_path)
ds

### Identify Test Parameters

Next, identify which parameters in the dataset have QARTOD tests applied to them. Sometimes the variable name in the dataset differs from the key that OOINet uses to build the dataset. In that case, the OOINet name is stored in the variable's "alternate_parameter_name" attribute.

In [6]:
# Create a dictionary of key-value pairs of dataset variable name:alternate parameter name
test_parameters={}
for var in ds.variables:
    if "qartod_results" in var:
        # Get the parameter name
        param = var.split("_qartod")[0]
        
        # Check if the parameter has an alternative ooinet_name
        if "alternate_parameter_name" in ds[param].attrs:
            ooinet_name = ds[param].attrs["alternate_parameter_name"]
        else:
            ooinet_name = param
        
        # Save the results in a dictionary
        test_parameters[param] = ooinet_name
# Print out the results
test_parameters

{'sea_water_electrical_conductivity': 'ctdbp_seawater_conductivity',
 'sea_water_temperature': 'ctdbp_seawater_temperature',
 'sea_water_practical_salinity': 'practical_salinity',
 'sea_water_pressure': 'ctdbp_seawater_pressure'}

### Collect QARTOD test lookup tables from GitHub
We can grab the QARTOD tables with the test values straight from GitHub, which ensures we are using the same input and threshold values as OOINet. However, the QARTOD tables use the ```ooinet_parameter_name``` instead of the dataset variable name, so when loading the tables we need to make sure we request the correct parameter name.

Note: The QARTOD test functions used in the next section import the lookup table values themselves, so this notebook does not load the tables in a separate step.
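For readers who want to inspect the thresholds directly, the sketch below shows one way a lookup table could be decoded once its text has been downloaded. The column names (`subsite`, `node`, `sensor`, `stream`, `parameters`, `qcConfig`), the helper `find_gross_range_spans`, and every threshold value in the sample table are assumptions made for illustration. They are not taken from this notebook or from the `qartod_testing` package, and the real tables may be laid out differently.

```python
import ast
import csv
import io

# Stand-in for a downloaded lookup table. The column names and all values
# below are assumptions for illustration, not real OOI thresholds.
SAMPLE_TABLE = """subsite,node,sensor,stream,parameters,qcConfig
CP01CNSM,RID27,03-CTDBPC000,ctdbp_cdef_instrument_recovered,"{'inp': 'ctdbp_seawater_temperature'}","{'qartod': {'gross_range_test': {'suspect_span': [2.0, 25.0], 'fail_span': [-5.0, 35.0]}}}"
CP01CNSM,RID27,03-CTDBPC000,ctdbp_cdef_instrument_recovered,"{'inp': 'practical_salinity'}","{'qartod': {'gross_range_test': {'suspect_span': [30.0, 36.0], 'fail_span': [0.0, 42.0]}}}"
"""


def find_gross_range_spans(table_text, refdes, stream, ooinet_name):
    """Return (fail_span, suspect_span) for one parameter, or None if absent.

    Hypothetical helper: matches rows on the reference designator, the
    stream, and the OOINet parameter name (not the dataset variable name).
    """
    subsite, node, sensor = refdes.split('-', 2)
    for row in csv.DictReader(io.StringIO(table_text)):
        row_key = (row['subsite'], row['node'], row['sensor'], row['stream'])
        if row_key != (subsite, node, sensor, stream):
            continue
        # The dictionary-like columns are stored as strings. literal_eval
        # converts them to Python objects without executing arbitrary code.
        if ast.literal_eval(row['parameters'])['inp'] != ooinet_name:
            continue
        config = ast.literal_eval(row['qcConfig'])['qartod']['gross_range_test']
        return config['fail_span'], config['suspect_span']
    return None


spans = find_gross_range_spans(
    SAMPLE_TABLE,
    'CP01CNSM-RID27-03-CTDBPC000',
    'ctdbp_cdef_instrument_recovered',
    'ctdbp_seawater_temperature',  # OOINet name for 'sea_water_temperature'
)
print(spans)
```

Looking up the table with the dataset variable name (`sea_water_temperature`) instead of the OOINet name would return `None` here, which is the mismatch the paragraph above warns about.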

### Run QARTOD tests locally
Next, we run the gross range and climatology tests locally so that the local results can be compared with the QARTOD results already included in the dataset. This is done using the ```ioos_qc``` QARTOD package in conjunction with the ```qartod_test_values``` tables.

#### Gross Range Test

In [7]:
# Run the gross range test and check that gross_range_results contains the test results
gross_range_results = dp.qartod_gross_range_test(refdes, stream, test_parameters, ds)
gross_range_results

{'sea_water_electrical_conductivity': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_temperature': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_practical_salinity': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_pressure': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8)}
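To make the flag values in the output above easier to read, here is a simplified, plain-Python illustration of the gross range logic. It is not the `ioos_qc` implementation and not the code inside `dp.qartod_gross_range_test`. The function name `gross_range_flags`, the inclusive bounds, and the threshold values are assumptions. The flag values follow the QARTOD convention (1 = pass, 3 = suspect, 4 = fail, 9 = missing).

```python
import math

# QARTOD flag values
GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_flags(values, fail_span, suspect_span=None):
    """Flag each value against [low, high] spans (bounds assumed inclusive)."""
    flags = []
    for value in values:
        if value is None or math.isnan(value):
            # No observation to evaluate
            flags.append(MISSING)
        elif not fail_span[0] <= value <= fail_span[1]:
            # Outside the widest acceptable range
            flags.append(FAIL)
        elif suspect_span is not None and not suspect_span[0] <= value <= suspect_span[1]:
            # Inside the fail span but outside the narrower suspect span
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


# Invented temperatures and thresholds, for illustration only
temperature = [12.5, 1.0, 40.0, float('nan')]
flags = gross_range_flags(temperature, fail_span=(-5.0, 35.0), suspect_span=(2.0, 25.0))
print(flags)  # [1, 3, 4, 9]
```

Read this way, an output array consisting only of 1s means every sample fell inside both spans.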

#### Climatology Test

In [8]:
climatology_results = dp.qartod_climatology_test(refdes, stream, test_parameters, ds)
climatology_results

{'sea_water_electrical_conductivity': 'Not implemented.',
 'sea_water_temperature': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_practical_salinity': masked_array(data=[1, 1, 1, ..., 1, 1, 1],
              mask=False,
        fill_value=999999,
             dtype=uint8),
 'sea_water_pressure': 'Not implemented.'}
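The climatology test compares each sample against a range that depends on the time of year. The sketch below is a simplified illustration under stated assumptions: one value range per calendar month, no depth dependence, and flag 2 (not evaluated) for months without a range. The function name `monthly_climatology_flags` and all ranges are hypothetical. The actual `ioos_qc` test and `dp.qartod_climatology_test` may differ in how they bin time and depth.

```python
from datetime import datetime

# QARTOD flag values used in this sketch
GOOD, NOT_EVALUATED, SUSPECT = 1, 2, 3


def monthly_climatology_flags(times, values, monthly_ranges):
    """Flag values that fall outside the range given for their calendar month."""
    flags = []
    for time, value in zip(times, values):
        bounds = monthly_ranges.get(time.month)
        if bounds is None:
            # No climatology defined for this month
            flags.append(NOT_EVALUATED)
        elif bounds[0] <= value <= bounds[1]:
            flags.append(GOOD)
        else:
            flags.append(SUSPECT)
    return flags


# Invented monthly temperature ranges (month number -> (low, high))
monthly_ranges = {1: (4.0, 9.0), 7: (14.0, 22.0)}
times = [datetime(2020, 1, 15), datetime(2020, 7, 15), datetime(2020, 3, 1)]
values = [6.2, 25.0, 5.0]

clim_flags = monthly_climatology_flags(times, values, monthly_ranges)
print(clim_flags)  # [1, 3, 2]
```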

### Save test results to processed data folder

In [10]:
# Convert gross range results dictionary to a dataset with time coordinate
gr_ds = dp.timeseries_dict_to_xarray(gross_range_results, ds)
gr_ds

In [17]:
# Build path with filename to folder for saved results
gr_results_path = build_data_path(refdes, method, stream, type + '-gr-result', folder='processed')

# Write netCDF file with results to processed data folder
gr_ds.to_netcdf(gr_results_path)

In [9]:
# Convert climatology results dictionary to a dataset with time coordinate
climatology_ds = dp.timeseries_dict_to_xarray(climatology_results, ds)
climatology_ds

In [19]:
# Build path with filename to folder for saved results
clim_results_path = build_data_path(refdes, method, stream, type + '-clim-result', folder='processed')

# Write netCDF file with results to processed data folder
climatology_ds.to_netcdf(clim_results_path)