## Parsing and Plotting QC Algorithm Results and Annotations

In this example we will learn how to programmatically download OOI JSON data and work with the QC algorithm results as well as annotations. We will use data from the Global Irminger Sea Apex Surface Mooring - Near Surface Instrument Frame - Dissolved Oxygen instrument, but the mechanics apply to all datasets that are processed through the OOI Cyberinfrastructure (CI) system. You will learn:

* how to find the data you are looking for
* how to use the machine-to-machine (M2M) API to request JSON data
* how to explore and interactively plot data using bokeh
* how to parse and visualize QC results
* how to parse and visualize Annotations

For the instrument in this example, you will need the Reference Designator, Stream and Data Delivery Method to make the request to the M2M API. More information about the instrument can be found here:
http://ooi.visualocean.net/instruments/view/GI01SUMO-RID16-06-DOSTAD000

![GI01SUMO-RID16-06-DOSTAD000](https://github.com/ooi-data-review/ooi_datateam_notebooks/raw/master/images/GI01SUMO-RID16-06-DOSTAD000.png)

In [None]:
import requests
import datetime

Before we get started, log in at https://ooinet.oceanobservatories.org/ and obtain your <b>API username and API token</b> under your profile (top right corner), or use the credentials provided below.

In [None]:
username = 'OOIAPI-D8S960UXPK4K03'
token = 'IXL48EQ2XY'

Specify your inputs.

In [None]:
subsite = 'GI01SUMO'
node = 'RID16'
sensor = '06-DOSTAD000'
method = 'recovered_host'
stream = 'dosta_abcdjm_dcl_instrument_recovered'
beginDT = '2015-09-01T01:01:01.900Z'
endDT = '2016-03-01T01:01:01.900Z'

Build the GET request URL and send the request to the M2M API endpoint.

In [None]:
# No trailing slash here: the path segments are joined with '/' below.
base_url = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'

data_request_url = '/'.join((base_url, subsite, node, sensor, method, stream))
params = {
    'beginDT':beginDT,
    'endDT':endDT,
    'limit':1000,   
}

r = requests.get(data_request_url, params=params,auth=(username, token))
data = r.json()
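
A doubled or missing slash in the joined URL is an easy mistake to make. The sketch below is not part of the original workflow; it uses only the standard library to assemble the request URL and query string so they can be inspected before anything is sent. The helper name `build_data_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_data_url(base_url, *parts):
    """Join a base URL and path segments with exactly one '/' between them."""
    segments = [base_url.rstrip('/')] + [p.strip('/') for p in parts]
    return '/'.join(segments)


# Same inputs as above, repeated so the sketch runs on its own.
url = build_data_url(
    'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv/',
    'GI01SUMO', 'RID16', '06-DOSTAD000',
    'recovered_host', 'dosta_abcdjm_dcl_instrument_recovered',
)
query = urlencode({
    'beginDT': '2015-09-01T01:01:01.900Z',
    'endDT': '2016-03-01T01:01:01.900Z',
    'limit': 1000,
})

# Inspect the full request before sending it.
print(url + '?' + query)
```

Printing the URL first makes it easy to paste into a browser or a bug report if the request fails.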

How many data points were returned?

In [None]:
len(data)

Examine the content of the first data point.

In [None]:
data[0]

Convert the JSON response to a pandas DataFrame and convert the time stamps, which are given in seconds since 1900-01-01.

In [None]:
import pandas as pd
import numpy as np
import json

In [None]:
# r.json() already returns a list of dicts, so no JSON round trip is needed
df = pd.DataFrame.from_records(data)
df['time'] = pd.to_datetime(df['time'], unit='s', origin=pd.Timestamp('1900-01-01'))
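
The conversion above relies on the data time stamps being seconds since 1900-01-01. As a cross-check, the same arithmetic can be sketched with the standard library alone. The function name `ooi_seconds_to_datetime` is made up for this illustration, and UTC is assumed.

```python
from datetime import datetime, timedelta, timezone

# Reference epoch for the data time stamps (UTC assumed)
OOI_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ooi_seconds_to_datetime(seconds):
    """Convert seconds since 1900-01-01 to a timezone-aware datetime."""
    return OOI_EPOCH + timedelta(seconds=seconds)


# 70 years separate 1900 and 1970, 17 of them leap years (1900 is not one).
SECONDS_1900_TO_1970 = (70 * 365 + 17) * 86400

print(ooi_seconds_to_datetime(0))
print(ooi_seconds_to_datetime(SECONDS_1900_TO_1970))
```

The constant is handy when comparing data time stamps with the Unix-based time stamps used elsewhere.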

Extract the dissolved oxygen parameter for plotting.

In [None]:
time = list(df['time'].values)
oxygen = list(df['dissolved_oxygen'].values)

Plot the data.

In [None]:
!pip install bokeh

In [None]:
import os
from bokeh.plotting import figure, output_file, reset_output, show, ColumnDataSource, save
from bokeh.models import BoxAnnotation
from bokeh.io import output_notebook

In [None]:
p = figure(width=800,
           height=400,
           title='Global Irminger Sea Apex Surface Mooring - Near Surface Instrument Frame - Dissolved Oxygen',
           x_axis_label='Time (GMT)',
           y_axis_label='Oxygen umol kg-1',
           x_axis_type='datetime')

p.circle(time, oxygen, fill_color='white', fill_alpha=0.2, size=4)
output_notebook()
show(p)

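Keep only the time, the dissolved oxygen values, and the corresponding QC columns.

In [None]:
# .copy() gives an independent DataFrame, so later in-place changes do not trigger a SettingWithCopyWarning
df = df[['time', 'dissolved_oxygen', 'dissolved_oxygen_qc_results', 'dissolved_oxygen_qc_executed']].copy()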
df.head()

Each QC test is assigned one bit, and the flags for all tests are OR'd together to produce a single integer for each data point. Given a qc_executed value of 29 (binary 00011101), we can see which tests were run by reversing the process and unpacking the bits:

QC table
```
Test name              Bit position
                         15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
global_range_test         0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
dataqc_localrangetest     0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
dataqc_spiketest          0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
dataqc_polytrendtest      0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0
dataqc_stuckvaluetest     0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
dataqc_gradienttest       0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
dataqc_propagateflags     0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
```



In [None]:
np.unpackbits(np.array(29).astype('uint8'))

The bits are returned most significant first, so bit 0 is the last element. If you compare this result to the table above you can see that the following tests were executed:

```
global_range_test
dataqc_spiketest
dataqc_polytrendtest
dataqc_stuckvaluetest
```
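
The same decoding can be done without NumPy by testing each bit directly. The following is a minimal sketch; the `QC_TESTS` mapping is copied from the table above, and `decode_qc_executed` is a hypothetical helper name.

```python
# Bit position -> test name, copied from the QC table above
QC_TESTS = {
    0: 'global_range_test',
    1: 'dataqc_localrangetest',
    2: 'dataqc_spiketest',
    3: 'dataqc_polytrendtest',
    4: 'dataqc_stuckvaluetest',
    5: 'dataqc_gradienttest',
    7: 'dataqc_propagateflags',
}


def decode_qc_executed(value):
    """Return the names of the tests whose bit is set in `value`."""
    return [name for bit, name in sorted(QC_TESTS.items()) if value & (1 << bit)]


print(format(29, '08b'))       # binary representation, bit 0 on the right
print(decode_qc_executed(29))
```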

We can write a function that creates a new boolean variable for each test that was run, containing the pass/fail results from that test:

In [None]:
def parse_qc(df):
    # Variables that have QC columns, e.g. 'dissolved_oxygen'
    variables = [x.split('_qc_results')[0] for x in df.columns if 'qc_results' in x]

    for var in variables:
        qc_result = var + '_qc_results'
        qc_executed = var + '_qc_executed'
        # Bit position -> name of the new boolean column
        names = {
            0: var + '_global_range_test',
            1: var + '_dataqc_localrangetest',
            2: var + '_dataqc_spiketest',
            3: var + '_dataqc_polytrendtest',
            4: var + '_dataqc_stuckvaluetest',
            5: var + '_dataqc_gradienttest',
            7: var + '_dataqc_propagateflags',
        }
        # OR the flags of all data points together, just in case a different
        # set of tests was run on some data point. *This should never happen*
        executed = np.bitwise_or.reduce(df[qc_executed].values)
        executed_bits = np.unpackbits(executed.astype('uint8'))
        # Reverse the bits so that index 0 is the least significant bit
        for index, value in enumerate(executed_bits[::-1]):
            # Skip bits that are not set or have no test assigned
            if value and index in names:
                mask = 2 ** index
                df[names[index]] = (df[qc_result].values & mask) > 0
        df.drop([qc_executed, qc_result], axis=1, inplace=True)
    return df

Run the function. The result gives us the QC algorithm result for every data point. True = test passed.

In [None]:
df_qc = parse_qc(df)
df_qc.head()
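
To make the bitwise logic inside `parse_qc` concrete, here is a minimal pure-Python sketch for a single data point. The pair of values (executed 29, results 28) is made up for illustration, and `qc_outcomes` is a hypothetical helper. A test that was not executed is reported as `None` rather than `False`, so "not run" and "failed" stay distinguishable.

```python
# Bit position -> test name, as in the QC table
QC_BITS = {
    0: 'global_range_test',
    1: 'dataqc_localrangetest',
    2: 'dataqc_spiketest',
    3: 'dataqc_polytrendtest',
    4: 'dataqc_stuckvaluetest',
    5: 'dataqc_gradienttest',
    7: 'dataqc_propagateflags',
}


def qc_outcomes(executed, results):
    """Map each test name to True (passed), False (failed) or None (not run)."""
    outcomes = {}
    for bit, name in QC_BITS.items():
        mask = 1 << bit
        if executed & mask:
            outcomes[name] = bool(results & mask)
        else:
            outcomes[name] = None
    return outcomes


# Hypothetical data point: four tests executed, global range test failed
for name, passed in qc_outcomes(executed=29, results=28).items():
    print(name, passed)
```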

Select data points that failed the global range test, for example.

In [None]:
df_qc[df_qc['dissolved_oxygen_global_range_test'] == False]

Plot points that failed the test in red.

In [None]:
colormap = {False: 'red', True: 'green'}
colors = [colormap[x] for x in df_qc['dissolved_oxygen_global_range_test']]

In [None]:
p = figure(width=800,
           height=400,
           title='Global Irminger Sea Apex Surface Mooring - Near Surface Instrument Frame - Dissolved Oxygen',
           x_axis_label='Time (GMT)',
           y_axis_label='Oxygen umol kg-1',
           x_axis_type='datetime')

p.circle(time, oxygen, color=colors, fill_alpha=0.2, size=4)
output_notebook()
show(p)

Import annotations for 'GI01SUMO-RID16-06-DOSTAD000'. See the request_annotations.ipynb notebook for more details.

In [None]:
!pip install netCDF4

In [None]:
import netCDF4 as nc

In [None]:
beginDT = int(nc.date2num(datetime.datetime.strptime("2012-01-01T01:00:01Z",'%Y-%m-%dT%H:%M:%SZ'),'seconds since 1970-01-01')*1000)
endDT = int(nc.date2num(datetime.datetime.utcnow(),'seconds since 1970-01-01')*1000)

anno_base_url = 'https://ooinet.oceanobservatories.org/api/m2m/12580/anno/find?' # base url and port for annotations

params = { # define parameters
    'beginDT':beginDT,
    'endDT':endDT,
    'refdes':'GI01SUMO-RID16-06-DOSTAD000'
}

r = requests.get(anno_base_url, params=params,auth=(username, token)) # send data request

anno_data = pd.DataFrame(r.json()) # convert json response to pandas dataframe

Set up a function to convert the annotation time stamps, which are given in milliseconds since 1970. This is a different time convention from the one used for data, which is seconds since 1900.

In [None]:
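def convert_time(time_stamp):
    try:
        time_stamp = int(time_stamp) / 1000  # milliseconds -> seconds
        time_stamp = nc.num2date(time_stamp, 'seconds since 1970-01-01')
    except (TypeError, ValueError):
        # Leave values that cannot be converted (e.g. a missing end time) unchanged
        pass
    return time_stamp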

# convert time stamps
anno_data['beginDT'] = anno_data['beginDT'].apply(convert_time)
anno_data['endDT'] = anno_data['endDT'].apply(convert_time)
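
For comparison, here is a sketch of the same conversion using only the standard library instead of netCDF4. `annotation_ms_to_datetime` is a hypothetical name; the sketch assumes UTC and that a missing time stamp arrives as `None`.

```python
from datetime import datetime, timedelta, timezone

# Reference epoch for annotation time stamps (UTC assumed)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def annotation_ms_to_datetime(ms):
    """Convert milliseconds since 1970-01-01 to a datetime; pass None through."""
    if ms is None:
        return None
    return UNIX_EPOCH + timedelta(milliseconds=int(ms))


print(annotation_ms_to_datetime(1441069261000))
print(annotation_ms_to_datetime(None))
```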

Print the annotations.

In [None]:
for i in range(len(anno_data)):
    print(i)
    print(anno_data['annotation'].iloc[i])
    print('start time:', anno_data['beginDT'].iloc[i])
    print('end time:', anno_data['endDT'].iloc[i],'\n')

Select information from the annotation at index 4 (the fifth one printed above) and create the final plot.

In [None]:
anno_start_time = anno_data['beginDT'].iloc[4]
anno = anno_data['annotation'].iloc[4]

In [None]:
p = figure(width=800,
           height=400,
           title='Global Irminger Sea Apex Surface Mooring - Near Surface Instrument Frame - Dissolved Oxygen',
           x_axis_label='Time (GMT)',
           y_axis_label='Oxygen umol kg-1',
           x_axis_type='datetime')

# legend_label replaces the legend keyword, which newer Bokeh releases no longer accept
p.line([anno_start_time, time[-1]], [(min(oxygen)-10), (min(oxygen)-10)], line_width=10, legend_label='Annotation: '+anno)
p.circle(time, oxygen, color=colors, fill_alpha=0.2, size=4)
p.legend.location = "top_left"

output_notebook()
show(p)
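
The line drawn above marks the annotated period visually. To work with the affected data points themselves, one option is to select every time stamp that falls inside the annotation interval. This is a sketch with made-up time stamps; `in_annotation` is a hypothetical helper, and an end time of `None` is treated as an annotation that is still open.

```python
from datetime import datetime


def in_annotation(times, start, end=None):
    """Return one boolean per time stamp: True if it lies within [start, end]."""
    return [start <= t and (end is None or t <= end) for t in times]


# Made-up time stamps for illustration
times = [datetime(2015, 9, 1), datetime(2015, 10, 1), datetime(2015, 11, 1)]

open_ended = in_annotation(times, start=datetime(2015, 9, 15))
bounded = in_annotation(times, start=datetime(2015, 9, 15), end=datetime(2015, 10, 15))

print(open_ended)
print(bounded)
```

A mask like this could be used to color annotated points differently, in the same way the QC results are mapped to colors.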


Optionally, you can save the plot as an HTML file for sharing.

In [None]:
# output_file expects a file name, not a directory
output_file(os.path.join(os.getcwd(), 'plot.html'))
save(p)

This example was developed by Friedrich Knuth.