# Accessing OOI Data and Metadata through the M2M API

Most users use the [OOI M2M API](https://oceanobservatories.org/ooi-m2m-interface/) to request and download data from the various instruments and sensors deployed across the arrays. Indeed, the code in this repo is largely built around this activity. There are, however, additional API calls available that go beyond requesting data. These can be used to request more information about, for example, the various sites, different deployments, instrument serial numbers, and calibration coefficients.

All of this information, which is stored in the [OOI Asset Management Database](https://github.com/oceanobservatories/asset-management), is combined to form the metadata (the data about the data) that users can use to create data requests, re-process data, or supplement the requested data with additional information. This notebook demonstrates how to use these API calls, which have been mapped to functions within this repo, to access and use the OOI metadata. To understand the terminology used below, please refer to the [README](https://github.com/oceanobservatories/ooi-data-explorations/tree/master/python#m2m-terminology) for this repo and to the [OOI website](https://oceanobservatories.org/research-arrays/).

The functions are based on the [API Cheat Sheet](https://github.com/ooi-data-review/2018-data-workshops/raw/master/handouts/API%20Cheat%20Sheet.pdf) developed for the [2018 Early Career Data Workshops](https://oceanobservatories.org/data-workshops/) hosted at Rutgers University. They can be broken down into six categories:

* Sensor Information: List the sites, nodes and sensors deployed by OOI, along with the data delivery methods, the data streams, the parameters included in each stream, and the time range each stream covers.
* Preload Information: Pull information from the backend database that defines streams and the parameters contained within a stream.
* Deployment Information: Download deployment information (deployment number, dates, sensors deployed, location, etc.).
* Calibration Information: Download calibration data for an instrument defined by the site, node and sensor, or for a specific instrument as defined by its unique ID (derived from the serial number).
* Asset Information: In many ways a duplicate of the Calibration Information functions, with the queries and the results returned differing slightly.
* Annotations: Use the site, node and sensor names to obtain annotations (notes and HITL QC assessments of the data) about the instrument of interest.

Several of these functions will return a firehose of information. Users will need to spend time understanding the responses and how they can be used. The examples below will hopefully help with that process.

## Sensor Information

All of the functions used below return lists of the information a user can pull out of the [OOI M2M API](https://oceanobservatories.org/ooi-m2m-interface/) system. The first three functions imported below are hierarchical; in other words, they build on each other (e.g. listing the sites in OOI, then using a specific site to list the available nodes, and then using a specific site and node to list the available sensors). The combination of a specific site, node and sensor is used within OOI to create [the reference designator](https://oceanobservatories.org/knowledgebase/how-to-decipher-a-reference-designator/). The reference designator provides the unique code needed to access a specific instance of an instrument within OOI (e.g. the CTD on the midwater platform of the Oregon Shelf Surface mooring).

The remaining three methods all provide different pieces of information about that specific instrument, or reference designator: data delivery methods, streams (aka datasets), the available parameters and the time ranges covered by the different streams.
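
As a minimal sketch of how the three pieces relate, a reference designator can be split into its site, node and sensor parts, and joined back together, with plain string operations. The helper names `split_refdes` and `join_refdes` are illustrative only and are not functions from this repo.

```python
def split_refdes(refdes):
    """Split a reference designator into its (site, node, sensor) parts."""
    # the sensor part itself contains a hyphen (e.g. '03-CTDBPC000'),
    # so split on the first two hyphens only
    site_part, node_part, sensor_part = refdes.split('-', 2)
    return site_part, node_part, sensor_part


def join_refdes(site_part, node_part, sensor_part):
    """Join site, node and sensor names into a reference designator."""
    return '-'.join([site_part, node_part, sensor_part])


parts = split_refdes('CE02SHSM-RID27-03-CTDBPC000')
print(parts)
print(join_refdes(*parts))
```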

In [2]:
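import pprint

# pretty printer used to display the dictionaries returned by the functions below
pp = pprint.PrettyPrinter()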

# import the functions used to list available information about sites, nodes and sensors ...
from ooi_data_explorations.common import list_sites, list_nodes, list_sensors

# ... and the more detailed information about a specific site, node and sensor (reference designator)
from ooi_data_explorations.common import list_methods, list_streams, list_metadata

In [61]:
# create a list of all the sites in OOI
sites = list_sites()
sites

['CE01ISSM',
 'CE01ISSP',
 'CE02SHBP',
 'CE02SHSM',
 'CE02SHSP',
 'CE04OSBP',
 'CE04OSPD',
 'CE04OSPI',
 'CE04OSPS',
 'CE04OSSM',
 'CE05MOAS',
 'CE06ISSM',
 'CE06ISSP',
 'CE07SHSM',
 'CE07SHSP',
 'CE09OSPM',
 'CE09OSSM',
 'CP01CNPM',
 'CP01CNSM',
 'CP01CNSP',
 'CP02PMCI',
 'CP02PMCO',
 'CP02PMUI',
 'CP02PMUO',
 'CP03ISPM',
 'CP03ISSM',
 'CP03ISSP',
 'CP04OSPM',
 'CP04OSSM',
 'CP05MOAS',
 'GA01SUMO',
 'GA02HYPM',
 'GA03FLMA',
 'GA03FLMB',
 'GA05MOAS',
 'GI01SUMO',
 'GI02HYPM',
 'GI03FLMA',
 'GI03FLMB',
 'GI05MOAS',
 'GP02HYPM',
 'GP03FLMA',
 'GP03FLMB',
 'GP05MOAS',
 'GS01SUMO',
 'GS02HYPM',
 'GS03FLMA',
 'GS03FLMB',
 'GS05MOAS',
 'RS01OSBP',
 'RS01SBPD',
 'RS01SBPS',
 'RS01SHBP',
 'RS01SHDR',
 'RS01SLBS',
 'RS01SUM1',
 'RS01SUM2',
 'RS03ASHS',
 'RS03AXBS',
 'RS03AXPD',
 'RS03AXPS',
 'RS03AXSM',
 'RS03CCAL',
 'RS03ECAL',
 'RS03INT1',
 'RS03INT2',
 'SSRSPACC']

In [62]:
# select CE02SHSM as the site to use for the rest of this example
site = sites[3]

# create a list of nodes available for a particular site
nodes = list_nodes(site)
nodes

['RIC21', 'RID26', 'RID27', 'SBC11', 'SBD11', 'SBD12']

In [63]:
# select RID27 (1 of 2 data loggers on the midwater platform) as the node to use for the rest of this example
node = nodes[2]

# create a list of sensors for a particular node
sensors = list_sensors(site, node)
sensors

['00-DCLENG000',
 '01-OPTAAD000',
 '02-FLORTD000',
 '03-CTDBPC000',
 '04-DOSTAD000']

In [64]:
# select the CTD as the sensor to use for the rest of this example
sensor = sensors[3]

The above steps pedantically construct the reference designator that will be used in the requests below. There are other sources of this information, such as the [OOI website](https://oceanobservatories.org/), [Data Portal](https://ooinet.oceanobservatories.org/) or [Data Explorer](https://dataexplorer.oceanobservatories.org/), that a user could work through to construct the reference designator manually. The above functions provide programmatic tools to walk through all of the OOI instruments, of which there are 1398 unique instances (counting engineering and science instruments). It might take a while to search through all those entries manually. Alternatively, we can programmatically find every instance of, for example, the [CTDBP](https://oceanobservatories.org/instrument-class/ctd/) instruments in OOI ...

In [48]:
%%time
ctdbp = []
for site in sites:
    nodes = list_nodes(site)
    for node in nodes:
        sensors = list_sensors(site, node)
        for sensor in sensors:
            # keep any sensor whose name contains the instrument class
            if 'CTDBP' in sensor:
                ctdbp.append('-'.join([site, node, sensor]))

ctdbp

CPU times: user 973 ms, sys: 32.7 ms, total: 1.01 s
Wall time: 7.13 s


['CE01ISSM-MFD37-03-CTDBPC000',
 'CE01ISSM-RID16-03-CTDBPC000',
 'CE01ISSM-SBD17-06-CTDBPC000',
 'CE02SHBP-LJ01D-06-CTDBPN106',
 'CE02SHSM-RID27-03-CTDBPC000',
 'CE04OSBP-LJ01C-06-CTDBPO108',
 'CE04OSSM-RID27-03-CTDBPC000',
 'CE06ISSM-MFD37-03-CTDBPC000',
 'CE06ISSM-RID16-03-CTDBPC000',
 'CE06ISSM-SBD17-06-CTDBPC000',
 'CE07SHSM-MFD37-03-CTDBPC000',
 'CE07SHSM-RID27-03-CTDBPC000',
 'CE09OSSM-MFD37-03-CTDBPE000',
 'CE09OSSM-RID27-03-CTDBPC000',
 'CP01CNSM-MFD37-03-CTDBPD000',
 'CP01CNSM-RID27-03-CTDBPC000',
 'CP03ISSM-MFD37-03-CTDBPD000',
 'CP03ISSM-RID27-03-CTDBPC000',
 'CP04OSSM-MFD37-03-CTDBPE000',
 'CP04OSSM-RID27-03-CTDBPC000',
 'GA01SUMO-RID16-03-CTDBPF000',
 'GA01SUMO-RII11-02-CTDBPP031',
 'GA01SUMO-RII11-02-CTDBPP032',
 'GA01SUMO-RII11-02-CTDBPP033',
 'GI01SUMO-RID16-03-CTDBPF000',
 'GI01SUMO-RII11-02-CTDBPP031',
 'GI01SUMO-RII11-02-CTDBPP032',
 'GI01SUMO-RII11-02-CTDBPP033',
 'GS01SUMO-RID16-03-CTDBPF000',
 'GS01SUMO-RII11-02-CTDBPP031',
 'GS01SUMO-RII11-02-CTDBPP032',
 'GS01SU

... in ~7 seconds. There are 32 of them.
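
The same search can be wrapped in a small helper so that it works for any instrument class. The sketch below is an illustration under stated assumptions rather than part of this repo: `find_instruments` is a hypothetical name, and the lookup functions are passed in as arguments so the helper can be exercised with stand-in data (as done here, using names from the listings above) without calling the API.

```python
def find_instruments(instrument_class, sites, get_nodes, get_sensors):
    """Return the reference designators whose sensor name contains instrument_class."""
    matches = []
    for site_name in sites:
        for node_name in get_nodes(site_name):
            for sensor_name in get_sensors(site_name, node_name):
                if instrument_class in sensor_name:
                    matches.append('-'.join([site_name, node_name, sensor_name]))
    return matches


# stand-in data copied from the listings above, so the sketch runs offline
fake_nodes = {'CE02SHSM': ['RID27']}
fake_sensors = {
    ('CE02SHSM', 'RID27'): ['00-DCLENG000', '01-OPTAAD000', '02-FLORTD000',
                            '03-CTDBPC000', '04-DOSTAD000'],
}

found = find_instruments(
    'CTDBP',
    ['CE02SHSM'],
    lambda s: fake_nodes[s],
    lambda s, n: fake_sensors[(s, n)],
)
print(found)
```

With the functions imported earlier, the equivalent call would presumably be `find_instruments('CTDBP', sites, list_nodes, list_sensors)`, since `list_nodes` takes a site and `list_sensors` takes a site and a node.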

Circling back to the CTD on the Oregon Shelf Surface Mooring midwater platform, or `CE02SHSM-RID27-03-CTDBPC000`, we can use the remaining Sensor Information functions to gather more information about this particular sensor.

In [65]:
# reset the site, node and sensor to CE02SHSM-RID27-03-CTDBPC000
site = 'CE02SHSM'
node = 'RID27'
sensor = '03-CTDBPC000'

# create a list of the data delivery methods available for this sensor
methods = list_methods(site, node, sensor)
methods

['recovered_host', 'recovered_inst', 'telemetered']

In [51]:
# select recovered_host as the data delivery method to use for the rest of this example
method = methods[0]

# create a list of the data streams (aka datasets) available from this sensor and data delivery method
streams = list_streams(site, node, sensor, method)
streams

['ctdbp_cdef_dcl_instrument_recovered']

In [59]:
# select the one stream for this data delivery method to use for the rest of this example
stream = streams[0]

# create lists of dictionaries with the available parameters and time ranges covered by each 
# dataset associated with the sensor
metadata = list_metadata(site, node, sensor)
parameters = metadata.pop('parameters')
time_ranges = metadata['times']

# print the parameter dictionaries for the parameters associated with this stream
for p in parameters:
    if p['stream'] == stream:
        pp.pprint(p)

{'fillValue': '-9999999',
 'particleKey': 'conductivity',
 'pdId': 'PD1',
 'shape': 'SCALAR',
 'stream': 'ctdbp_cdef_dcl_instrument_recovered',
 'type': 'FLOAT',
 'units': 'S m-1',
 'unsigned': False}
{'fillValue': '-9999999',
 'particleKey': 'pressure',
 'pdId': 'PD2',
 'shape': 'SCALAR',
 'stream': 'ctdbp_cdef_dcl_instrument_recovered',
 'type': 'FLOAT',
 'units': 'dbar',
 'unsigned': False}
{'fillValue': '-9999999',
 'particleKey': 'density',
 'pdId': 'PD5',
 'shape': 'FUNCTION',
 'stream': 'ctdbp_cdef_dcl_instrument_recovered',
 'type': 'FLOAT',
 'units': 'kg m-3',
 'unsigned': False}
{'fillValue': '-9999999',
 'particleKey': 'temp',
 'pdId': 'PD6',
 'shape': 'SCALAR',
 'stream': 'ctdbp_cdef_dcl_instrument_recovered',
 'type': 'FLOAT',
 'units': 'ºC',
 'unsigned': False}
{'fillValue': '-9999999',
 'particleKey': 'time',
 'pdId': 'PD7',
 'shape': 'SCALAR',
 'stream': 'ctdbp_cdef_dcl_instrument_recovered',
 'type': 'DOUBLE',
 'units': 'seconds since 1900-01-01',
 'unsigned': False}
{

In [60]:
# print the time range for this stream
for t in time_ranges:
    if t['stream'] == stream:
        pp.pprint(t)

{'beginTime': '2015-04-02T20:45:22.170Z',
 'count': 2006073,
 'endTime': '2021-09-16T19:31:10.431Z',
 'method': 'recovered_host',
 'stream': 'ctdbp_cdef_dcl_instrument_recovered'}
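
The `beginTime` and `endTime` strings can be converted to `datetime` objects to work out how long a record spans. The following is a minimal sketch using only the standard library and the values printed above. The format string assumes the timestamps always carry fractional seconds and a trailing `Z`, which holds for this example but has not been checked for every stream, and `parse_ooi_time` is an illustrative name.

```python
from datetime import datetime, timezone

TIME_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def parse_ooi_time(value):
    """Convert a string such as '2015-04-02T20:45:22.170Z' to a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


# values copied from the time range printed above
time_range = {'beginTime': '2015-04-02T20:45:22.170Z',
              'endTime': '2021-09-16T19:31:10.431Z',
              'count': 2006073}

begin = parse_ooi_time(time_range['beginTime'])
end = parse_ooi_time(time_range['endTime'])
span = end - begin
print('record spans {} days'.format(span.days))
```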


## Preload Information

In [25]:
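# Grab the provenance info and the driver associated with this data stream for this deployment.
# BASE_URL, SENSOR_URL, SITE, NODE, SENSOR, METHOD, START, STOP and auth are assumed to have
# been defined in an earlier cell that is not shown here.
import json
import requests

STREAM = 'ctdbp_cdef_dcl_instrument'
OPTIONS = '?beginDT=' + START + '&endDT=' + STOP + '&limit=2&parameters=7&include_provenance=true&strict_range=true'
r = requests.get(BASE_URL + SENSOR_URL + SITE + NODE + SENSOR + METHOD + STREAM + OPTIONS, auth=(auth[0], auth[2]))
r.raise_for_status()  # raise on an HTTP error status instead of trying to parse the response
data = r.json()
print(json.dumps(data, indent=2))

# Note: this request will fail if the system does not have any data. That failure is
# informative, so code built on this example will need to handle it.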

{
  "data": [
    {
      "pk": {
        "node": "RID16",
        "stream": "ctdbp_cdef_dcl_instrument",
        "subsite": "CE01ISSM",
        "deployment": 11,
        "time": 3764719869.483,
        "sensor": "03-CTDBPC000",
        "method": "telemetered"
      },
      "provenance": "ba0d3067-bc6b-48e6-b986-a85eb883477e",
      "time": 3764719869.483
    },
    {
      "pk": {
        "node": "RID16",
        "stream": "ctdbp_cdef_dcl_instrument",
        "subsite": "CE01ISSM",
        "deployment": 11,
        "time": 3771743411.115,
        "sensor": "03-CTDBPC000",
        "method": "telemetered"
      },
      "provenance": "8aa65616-505c-4cd2-b4b0-1afea43dafbf",
      "time": 3771743411.115
    }
  ],
  "provenance": {
    "8aa65616-505c-4cd2-b4b0-1afea43dafbf": {
      "file_name": "/omc_data/whoi/OMC/CE01ISSM/D00011/cg_data/dcl16/ctdbp1/20190710.ctdbp1.log",
      "parser_name": "mi.dataset.driver.ctdbp_cdef.dcl.ctdbp_cdef_dcl_telemetered_driver",
      "parser_version": "

}
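
The `time` values in the response are large floating point numbers. Assuming they use the same units reported for the `time` parameter earlier (`seconds since 1900-01-01`), they can be converted to calendar dates with the standard library. This is a sketch under that assumption, and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# assumed reference epoch, taken from the units of the `time` parameter
OOI_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def seconds_since_1900_to_datetime(seconds):
    """Convert seconds since 1900-01-01 (UTC) to a timezone-aware datetime."""
    return OOI_EPOCH + timedelta(seconds=seconds)


# the two `time` values from the response above
for value in (3764719869.483, 3771743411.115):
    print(value, '->', seconds_since_1900_to_datetime(value).isoformat())
```

Under this assumption the second value falls on 2019-07-10, which agrees with the date embedded in the source file name (`20190710.ctdbp1.log`) listed in the matching provenance record.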


In [26]:
# determine the driver used to parse the data and the source of the data,
# taking both from the first provenance record in the response
provenance = data['provenance']
first_record = next(iter(provenance.values()))
driver = first_record['parser_name']
infile = first_record['file_name']
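
The lines above keep only the first provenance record. When a response holds several records, it can be useful to list every distinct driver and source file instead. The sketch below uses a sample dictionary shaped like the response shown earlier, reproducing only the fields that were printed there; `unique_provenance` is a hypothetical helper name.

```python
def unique_provenance(provenance, field):
    """Return the sorted, de-duplicated values of one field across all provenance records."""
    return sorted({record[field] for record in provenance.values() if field in record})


# sample record copied from the response shown earlier
sample = {
    '8aa65616-505c-4cd2-b4b0-1afea43dafbf': {
        'file_name': '/omc_data/whoi/OMC/CE01ISSM/D00011/cg_data/dcl16/ctdbp1/20190710.ctdbp1.log',
        'parser_name': 'mi.dataset.driver.ctdbp_cdef.dcl.ctdbp_cdef_dcl_telemetered_driver',
    },
}

drivers = unique_provenance(sample, 'parser_name')
files = unique_provenance(sample, 'file_name')
print(drivers)
print(files)
```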