# Introduction to the OOI M2M Interface (aka the API)

In this example, we will demonstrate how to access and use the OOI M2M interface to retrieve information about the instruments in the OOI database as well as how to retrieve data from those instruments.

For most of this example, I'm going to use my favorite instrument, the [30m CTD on Global Papa Flanking Mooring B](http://ooi.visualocean.net/instruments/view/GP03FLMB-RIM01-02-CTDMOG060). 

# Getting Started with the API

In order to use the API, you will need to have an OOI Net account, that is, an account on the [OOI Data Portal](https://ooinet.oceanobservatories.org).  Once you have an account, you will need your API Username and Token.  To get your API information, follow the steps below, or check out this [video](https://www.youtube.com/watch?time_continue=150&v=9YSIrDu4l24).

* If needed, create a new account on ooinet.oceanobservatories.org using either your email address or the CILogon button with your academic or Google account.
* Log in
* Click on your email address on the right side of the navigation bar to open up the pull-down menu.
* Click "User Profile" in the drop-down menu.
* Copy and save the following data from your user profile: API Username and API Token. The API Username is similar to “OOIAPI-QTULEV4STGAS35”. The API Token is similar to “YYP2X2W3SOW”.

For this example, we will include the API login information as variables so we can use them later.


In [None]:
USERNAME = 'YOUR API USERNAME'
TOKEN = 'YOUR API TOKEN'

We will also import a few Python libraries that we will need in our environment.

In [None]:
# First, we need to add some more Python libraries
import requests
import datetime
import time  # used later to pause between status checks

To access the OOI API, we will use the [python requests library](http://docs.python-requests.org/en/master/), which is very easy to use.  

> `r = requests.get(url, params=parameters, auth=('user', 'pass'))`

All we have to do is specify the URL we want to access, along with our login information.  In some cases, we will also include an additional set of optional parameters using the "params" option.  These "GET" parameters are typically found at the end of a URL following the format:
> `http://example.com/page.htm?var1=a&var2=b`

Thanks to the requests library, we don't have to worry about getting all the question marks and ampersands in the right places; we can just pass a dictionary of options.
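
For a quick look at what requests does with a params dictionary, the standard library's `urllib.parse.urlencode` builds essentially the same kind of query string. This sketch only assembles a URL string; nothing is sent to a server.

```python
from urllib.parse import urlencode

# The same kind of options we would pass as requests.get(..., params=...)
example_params = {'var1': 'a', 'var2': 'b'}

# Join key=value pairs with '&' and append them after a '?'
example_url = 'http://example.com/page.htm?' + urlencode(example_params)
print(example_url)  # http://example.com/page.htm?var1=a&var2=b

# Reserved characters, such as the colons in a timestamp, are percent-encoded
print(urlencode({'beginDT': '2016-10-01T00:00:00.000Z'}))
# beginDT=2016-10-01T00%3A00%3A00.000Z
```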


To see how this really works in practice, let's dive into some examples.


# Sensor Information

Perhaps the most useful API is the **Sensor Inventory**.  This endpoint essentially works like a tree, allowing you to navigate through the OOI instrument hierarchy.  Here is the starting point.

In [None]:
SENSOR_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'

Let's go ahead and take a look at what we get back from this URL.  Thanks to the requests library, we have a number of different ways to deal with the output.

In [None]:
r = requests.get(SENSOR_BASE_URL, auth=(USERNAME, TOKEN))
print( r.status_code )
print( r.headers )
print( r.encoding )
print( r.text )
print( r.json() )

As we can see, the returned status code is 200, which means our request was OK and information was returned.

The headers and encoding tell us a little bit more about the format of the information sent back.  Because this is a simple text response, there's nothing profound here, but you might find these handy in working with other APIs.

We can retrieve the actual data from the response using **r.text**. But because this response is in JSON format, we can use the **r.json()** method to automatically turn the returned data into a structured object we can use in Python.

For example, let's save the output as a variable and then loop through it and print out each entry.

In [None]:
sites = r.json()
for site in sites:
  print( site )

Now this may seem like gibberish, but it turns out that this is a list of all of the **Sites** that are in the OOI system.  A [similar list](http://oceanobservatories.org/site-list/) can be found on the OOI website.

From this point on, we can simply add one of the returned items to the previous URL in order to drill down into the sensor inventory.  For this example, let's investigate [**GP03FLMB-RIM01-02-CTDMOG060**](http://ooi.visualocean.net/instruments/view/GP03FLMB-RIM01-02-CTDMOG060).

To make this easy, let's set up a quick function to make the request and print the results.

In [None]:
# A quick function to make an API request and print the results
def get_and_print_api(url):
  r = requests.get(url, auth=(USERNAME, TOKEN))
  data = r.json()
  for d in data:
    print( d )

### List of Nodes for a Site

In [None]:
get_and_print_api(SENSOR_BASE_URL+'/GP03FLMB')

### List of Sensors (Instruments) for a Node

In [None]:
get_and_print_api(SENSOR_BASE_URL+'/GP03FLMB/RIM01')

### List of Methods for a Sensor

In [None]:
get_and_print_api(SENSOR_BASE_URL+'/GP03FLMB/RIM01/02-CTDMOG060')

### List of Data Streams for a Method

In [None]:
get_and_print_api(SENSOR_BASE_URL+'/GP03FLMB/RIM01/02-CTDMOG060/telemetered')

At this point, we have all the information we need to make a data request.  All we have to do is go down one additional level by adding the stream name to the request we made above.  **However, don't be tempted to do this.**  If you do, you will accidentally make an "asynchronous" request (aka a request for downloadable NetCDF files) for all of the data in the system for this data stream (unless, of course, that is what you want).

Instead of doing that, let's dive into how to make more precise data requests.

# Synchronous Data Requests 
* Results returned as JSON

To make a synchronous data request, we basically construct a URL using all the pieces we discovered above, like the following:
> /sensor/inv/{subsite}/{node}/{sensor}/{method}/{stream}
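
Because the site, node and sensor pieces all come from the reference designator, a small helper can split one and assemble this URL. This is a sketch, assuming the designator always has the `SITE-NODE-PORT-INSTRUMENT` shape seen here (`GP03FLMB-RIM01-02-CTDMOG060`); `build_data_url` is a hypothetical helper, not part of the API.

```python
def build_data_url(refdes, method, stream,
                   base_url='https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'):
    """Turn a reference designator plus method and stream into a data URL.

    Splitting on the first two dashes only keeps the port number attached to
    the instrument name, e.g. 'GP03FLMB', 'RIM01', '02-CTDMOG060'.
    """
    site, node, instrument = refdes.split('-', 2)
    # rstrip('/') avoids a double slash if base_url ends with one
    return '/'.join((base_url.rstrip('/'), site, node, instrument, method, stream))

sample_url = build_data_url('GP03FLMB-RIM01-02-CTDMOG060', 'telemetered',
                            'ctdmo_ghqr_sio_mule_instrument')
print(sample_url)
```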

To keep the code clear, we're going to set up several variables and then use the **join()** string method to concatenate them with slashes.

We are also going to use a **params** dictionary to specify a start and end date/time for our request, as well as a **limit** parameter to reduce the number of data points we get back.

Keep in mind that synchronous requests are **limited to 20,000 data points** within the time range you specify.  If the dataset you are requesting has more data points available, the data will be decimated (that is, roughly downsampled in a semi-random way).  To request all of the available data, you should make an asynchronous request (see below).  Alternatively, you could make a sequence of synchronous requests for a shorter time range, and then aggregate the results.  For a real-time plotting script, this actually makes more sense (but that's another example).
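
The last suggestion, splitting one long time range into several shorter synchronous requests, mostly comes down to generating a series of `beginDT`/`endDT` pairs. Below is a minimal sketch: `chunked_params` is a hypothetical helper, and the one-week window is an arbitrary choice. Pick a window small enough that each request stays under the 20,000-point limit for your instrument's sampling rate.

```python
import datetime

def chunked_params(begin, end, step, limit=1000):
    """Split the interval [begin, end) into windows no longer than `step` and
    return one params dictionary per window (hypothetical helper)."""
    fmt = '%Y-%m-%dT%H:%M:%S.000Z'  # sub-second precision is dropped
    windows = []
    start = begin
    while start < end:
        stop = min(start + step, end)
        windows.append({
            'beginDT': start.strftime(fmt),
            'endDT': stop.strftime(fmt),
            'limit': limit,
        })
        start = stop
    return windows

# Split October 2016 into weekly requests
weekly = chunked_params(datetime.datetime(2016, 10, 1),
                        datetime.datetime(2016, 11, 1),
                        datetime.timedelta(days=7))
for p in weekly:
    print(p['beginDT'], '->', p['endDT'])
```

Each dictionary can then be passed as `params=` to `requests.get()` in a loop, and the returned lists combined with `list.extend()`.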

In [None]:
# Instrument Information
site = 'GP03FLMB'
node = 'RIM01'
instrument = '02-CTDMOG060'
method = 'telemetered'
stream = 'ctdmo_ghqr_sio_mule_instrument'

data_request_url ='/'.join((SENSOR_BASE_URL,site,node,instrument,method,stream))

params = {
  'beginDT':'2016-10-01T00:00:00.000Z',
  'endDT':'2016-11-01T00:00:00.000Z',
  'limit':1000,   
}

Now that everything is set up, let's make the request. Here we will print the length of the returned data array.

In [None]:
r = requests.get(data_request_url, params=params, auth=(USERNAME, TOKEN))
data = r.json()
print( len(data) )


Let's take a look at the first item in the list.  From this, we can see that each item is a dictionary of every variable this instrument measures.

In [None]:
data[0]
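
The `time` value in each record is a bare number. Assuming it follows the convention of OOI's NetCDF files, seconds since 1900-01-01 UTC, a small converter like the hypothetical `ntp_to_datetime` below turns it into a Python datetime. Check a converted value against the time range you requested before relying on this.

```python
import datetime

# Assumed epoch of the 'time' field: 1900-01-01 UTC (the NTP epoch)
NTP_EPOCH = datetime.datetime(1900, 1, 1)

def ntp_to_datetime(seconds):
    """Convert seconds since 1900-01-01 to a naive datetime in UTC."""
    return NTP_EPOCH + datetime.timedelta(seconds=seconds)

# 2208988800 s is the offset between the 1900 and 1970 epochs
print(ntp_to_datetime(2208988800))  # 1970-01-01 00:00:00

# Applied to the records returned above, for example:
# [ntp_to_datetime(d['time']) for d in data]
```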

### What does it look like when there isn't any data?
Now, if we specified a time range that didn't have any data, this is what we would get.

In [None]:
params = {
  'beginDT':'2017-10-24T00:00:00.000Z',
  'endDT':'2017-10-25T00:00:00.000Z',
  'limit':1000,   
}

r = requests.get(data_request_url, params=params, auth=(USERNAME, TOKEN))
print( r.status_code )
data = r.json()
print( len(data) )
data

Note that in this case, the response is only 2 items long rather than a list of data points, and the returned status_code is **404** instead of **200**.  If you need to verify whether data was returned, you can check the status code to add error checking to your script.
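
One way to add that error checking is to branch on `r.status_code` before touching the JSON. This is a sketch: `get_records` is a hypothetical helper name, and the two `SimpleNamespace` objects merely stand in for real `requests` responses so the logic can be tried offline.

```python
from types import SimpleNamespace

def get_records(response):
    """Return the list of data points from a synchronous request, or an empty
    list (after printing the server's reply) when the status is not 200."""
    if response.status_code != 200:
        print('No data returned (HTTP %d): %s' % (response.status_code, response.text))
        return []
    return response.json()

# Offline demo with stand-in response objects; in the notebook you would
# call get_records(r) on the real response instead.
ok = SimpleNamespace(status_code=200, text='[...]', json=lambda: [{'time': 1.0}])
missing = SimpleNamespace(status_code=404, text='(server message)', json=lambda: {})
print(len(get_records(ok)))       # 1
print(len(get_records(missing)))  # prints the message, then 0
```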

### How do you limit the number of parameters returned?
To limit the amount of data in the returned JSON array, we can specify just the parameters we want.  However, to do this, we need to figure out the parameter IDs (see below).

In [None]:
params = {
  'beginDT':'2016-10-01T00:00:00.000Z',
  'endDT':'2016-11-01T00:00:00.000Z',
  'limit':1000,
  'parameters':'7,13' #time=7, salinity=13, temperature=2927, pressure=2926
}

r = requests.get(data_request_url, params=params, auth=(USERNAME, TOKEN))
data = r.json()
print( len(data) )
data[1]

# Asynchronous Data Requests
* Results returned as downloadable NetCDF files

Synchronous requests are great when you just want to grab a little bit of data quickly, which makes them very useful for applications like interactive graphs on a website.

However, if you want a larger and more complete dataset, you will need to create an Asynchronous request, which takes longer to process, sometimes up to an hour or more depending on how much data you are requesting and what else the system is doing.  The end result will be a web-accessible directory of NetCDF data files.

This is essentially the same functionality as requesting a download from the OOI Data Portal, but with the API you can easily create one or more requests in an automated way.

The request format is essentially the same as above, except we can drop the **limit** option from the params dictionary.  We could also omit the start and end dates if we want to grab all available data.

By default, asynchronous requests will return NetCDF files, but you can also request CSV or JSON using the **format** option.  Optionally, you can also specify **include_provenance** and **include_annotations**, which will add separate text files with that information to the output directory.

In [None]:
# Instrument Information
site = 'GP03FLMB'
node = 'RIM01'
instrument = '02-CTDMOG060'
method = 'telemetered'
stream = 'ctdmo_ghqr_sio_mule_instrument'

# Create the request URL
data_request_url ='/'.join((SENSOR_BASE_URL,site,node,instrument,method,stream))

# All of the following are optional
params = {
  'beginDT':'2016-10-01T00:00:00.000Z',
  'endDT':'2016-11-01T00:00:00.000Z',
  'format':'application/netcdf',
  'include_provenance':'true',
  'include_annotations':'true'
}


Now let's send the request.

In [None]:
# Note: this submits a real asynchronous request, which can take a while to process
r = requests.get(data_request_url, params=params, auth=(USERNAME, TOKEN))
data = r.json()

While the Synchronous request returned an array of actual data, this time we will just get some URLs and some other metadata about our request.

In [None]:
data

The first URL in the **allURLs** key points to the THREDDS server, which allows for programmatic data access without downloading the entire file.

In [None]:
print(data['allURLs'][0])

The second URL in the **allURLs** key provides a direct link to a web server which you can use to quickly download files if you don't want to go through THREDDS.

In [None]:
print(data['allURLs'][1])

### How can you check when a request is complete?

We can use the second URL to check whether a status.txt file has been written to that location. If it has, the request is complete and all data have been delivered to the THREDDS server.

The following for loop will poll the location for the status.txt file 1000 times, once every 0.5 seconds, so it will give up after about 8.3 minutes (1000 × 0.5 s = 500 s). If you are requesting a very large and dense dataset (3 years of BOTPT data collected at 20 Hz, for example), the request may take longer to complete, so you will want to increase the number of retries or the sleep interval.

In [None]:
%%time

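check_complete = data['allURLs'][1] + '/status.txt'
for i in range(1000):
    r = requests.get(check_complete)
    if r.status_code == requests.codes.ok:
        print('request completed')
        break
    time.sleep(.5)
else:
    # The loop ran out of retries without finding status.txt
    print('request still processing; increase the retries or check again later')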
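
If you often need longer waits, it may be cleaner to make the retry count and interval explicit arguments. In this sketch, `wait_for_file` is a hypothetical helper; `check` is any function returning True once the file exists, and `sleep` is a parameter only so the function can be tested without actually waiting. The worst-case wait is roughly `max_tries * interval` seconds, so, for example, 360 tries at 10-second intervals allows about an hour.

```python
import time

def wait_for_file(check, max_tries=1000, interval=0.5, sleep=time.sleep):
    """Call check() up to max_tries times, pausing `interval` seconds between
    attempts. Return True as soon as check() succeeds, False otherwise."""
    for attempt in range(max_tries):
        if check():
            return True
        if attempt < max_tries - 1:  # no need to sleep after the last attempt
            sleep(interval)
    return False

# In the notebook, something like:
# status_url = data['allURLs'][1] + '/status.txt'
# wait_for_file(lambda: requests.get(status_url).status_code == requests.codes.ok,
#               max_tries=360, interval=10)
```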

# Sensor Metadata

Getting data is nice, but one of the other big advantages to using the API is that you can query the system about all kinds of metadata.  To start, let's see what sensor metadata is available.

This query looks a lot like the above data requests, except we're explicitly asking for just the metadata.

The following metadata requests are possible:

```
/sensor/inv/{subsite}/{node}/{sensor}/metadata
/sensor/inv/{subsite}/{node}/{sensor}/metadata/times
/sensor/inv/{subsite}/{node}/{sensor}/metadata/parameters
```
The first request returns both times and parameters; the other two return just one of them.


In [None]:
# Instrument Information
site = 'GP03FLMB'
node = 'RIM01'
instrument = '02-CTDMOG060'
method = 'telemetered'
stream = 'ctdmo_ghqr_sio_mule_instrument'

# Create the request URL
data_request_url ='/'.join((SENSOR_BASE_URL,site,node,instrument,'metadata'))


In [None]:
r = requests.get(data_request_url, auth=(USERNAME, TOKEN))
data = r.json()

In [None]:
# Print out the response using PrettyPrint
# import pprint
# pp = pprint.PrettyPrinter(indent=2)
# pp.pprint(data)

import json

# Print out the response using json.dumps()
# print( json.dumps(data, indent=4) )

# Print out just one element
print( json.dumps(data['parameters'][0], indent=4) )
print( json.dumps(data['times'][0], indent=4) )

The response includes a list of every parameter for every data stream produced by the instrument.  There may be several entries with the same parameter, if multiple streams produce the same one.

The times array includes the start and end times of the entire data record for each stream.  There may still be gaps, but this is useful for knowing how up to date a data stream is (i.e., is it close to real time, or how recent is the last available recovered data?).
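
To turn that into a number, a sketch like the following computes how many days old the last data point of a stream is. It assumes each entry in `data['times']` has an `endTime` string in the format shown in the docstring; compare with the entry printed above and adjust the format string if yours differs. `days_since_last_data` is a hypothetical name.

```python
import datetime

def days_since_last_data(times_entry, now=None):
    """Days between a stream's 'endTime' and now (hypothetical helper).
    Assumes 'endTime' is an ISO 8601 UTC string like '2017-10-24T00:00:00.000Z'."""
    end = datetime.datetime.strptime(times_entry['endTime'], '%Y-%m-%dT%H:%M:%S.%fZ')
    end = end.replace(tzinfo=datetime.timezone.utc)
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return (now - end).total_seconds() / 86400

# With the metadata response above, for example:
# for t in data['times']:
#     print(round(days_since_last_data(t), 1))
```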

Let's put all of the parameters into a single table, and then filter the results to just the telemetered stream we were interested in above.

In [None]:
import pandas as pd

# Convert the parameters array into a DataFrame
parameters = pd.DataFrame(data['parameters'],
                          columns=['particleKey','pdId','units','stream'])

# Filter out the parameters for a single stream
parameters = parameters[parameters['stream']=='ctdmo_ghqr_sio_mule_instrument']

# Print out selected columns
print( parameters[['particleKey','pdId','units']] )
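
This table is also where the IDs for the **parameters** option of a synchronous request come from. The sketch below builds that comma-separated string directly from `data['parameters']`. It assumes each entry has `stream`, `particleKey` and a `pdId` of the form `PD<number>` (check the printout above); the helper name and the particle keys in the sample are made up, while the IDs 7, 13 and 2926 are the ones noted in the parameter-limited request earlier.

```python
def parameter_id_string(parameters, stream, keys):
    """Build a value for the 'parameters' request option, e.g. '7,13', from
    metadata entries for one stream whose particleKey is in `keys`."""
    ids = []
    for p in parameters:
        if p['stream'] == stream and p['particleKey'] in keys:
            pd_id = str(p['pdId'])
            ids.append(pd_id[2:] if pd_id.startswith('PD') else pd_id)
    # dict.fromkeys removes duplicate IDs while keeping their first-seen order
    return ','.join(dict.fromkeys(ids))

# Made-up entries in the assumed shape of data['parameters']
sample = [
    {'stream': 's1', 'particleKey': 'time', 'pdId': 'PD7'},
    {'stream': 's1', 'particleKey': 'salinity', 'pdId': 'PD13'},
    {'stream': 's1', 'particleKey': 'pressure', 'pdId': 'PD2926'},
    {'stream': 's2', 'particleKey': 'time', 'pdId': 'PD7'},
]
result = parameter_id_string(sample, 's1', {'time', 'salinity'})
print(result)  # 7,13
```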


# Parameter & Stream Information

We could have gotten a similar list by using a different API endpoint to retrieve information on a specific stream.

In [None]:
# Retrieve information for a Preload Stream given its name. 
url = 'https://ooinet.oceanobservatories.org/api/m2m/12575/stream/byname/ctdmo_ghqr_sio_mule_instrument'
r = requests.get(url, auth=(USERNAME, TOKEN))
data = r.json()

# Convert to Data Frame
parameters = pd.DataFrame(data['parameters'])
parameters[['id','name','data_product_type','display_name','unit']]


And to get information on a specific parameter...

In [None]:
# Retrieve information for a Preload Parameter given its identifier.
url = 'https://ooinet.oceanobservatories.org/api/m2m/12575/parameter/13'
r = requests.get(url, auth=(USERNAME, TOKEN))
data = r.json()

print( json.dumps(data, indent=4) )

# Deployments

To find out how many times an instrument has been deployed (and for which there is data in the system), we can use the Deployments API endpoint.

In [None]:
# Retrieve the list of deployments for a given instrument
DEPLOYMENT_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12587/events/deployment/inv'
request_url = DEPLOYMENT_BASE_URL+'/GP03FLMB/RIM01/02-CTDMOG060'
r = requests.get(request_url, auth=(USERNAME, TOKEN))
data = r.json()

data

We can then add a specific deployment number to retrieve detailed information on that deployment.  Alternatively, you can specify "-1" to get information about all deployments.

In [None]:
# Retrieve detailed information for deployment 5 of this instrument
request_url = DEPLOYMENT_BASE_URL+'/GP03FLMB/RIM01/02-CTDMOG060/5'
r = requests.get(request_url, auth=(USERNAME, TOKEN))
data = r.json()
# print( json.dumps(data, indent=4) )

Note that due to a quirk in the API, it typically returns all calibration values for an instrument, not all of which are relevant for the specific deployment you may be interested in.

Let's pull out the unique sensor IDs for this deployment.

In [None]:
print( data[0]['sensor']['uid'] )
print( data[0]['sensor']['assetId'] )

# Asset Information

OOI Reference Designators denote a specific measurement location in the OOI, for example the 30m CTD at the Papa Flanking Mooring B location.  But over time, measurements made at this "instrument location" are actually made by multiple instruments.  Each of those would have a specific Make, Model, and unique Serial Number.  We call these "assets" in the OOI system.

You can use the Asset API endpoint to query information about a specific asset once you know either its ID or UID.  In the above example, we extracted both to find out which sensor was used during a specific deployment.

In [None]:
# Request Asset Information by UID
ASSET_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12587/asset'
request_url = ASSET_BASE_URL+'?uid=CGINS-CTDMOG-13422'
r = requests.get(request_url, auth=(USERNAME, TOKEN))
data = r.json()
# print( json.dumps(data, indent=4) )


This returns essentially the same information that was embedded in the deployment information request above, including sensor calibration information.

We can also find out where this particular asset was deployed over time.

In [None]:
# Get deployment digests by the unique asset identifier (UID).
ASSET_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12587/asset'
request_url = ASSET_BASE_URL+'/deployments/CGINS-CTDMOG-13422'
r = requests.get(request_url, auth=(USERNAME, TOKEN))
data = r.json()

# print( json.dumps(data, indent=4) )
pd.DataFrame(data)

In this case, we see that this particular instrument has been deployed twice: once at Global Irminger (deployment 2 of that site) and a second time at Global Papa (deployment 5).

# Calibration Information

As we've seen above, a number of API endpoints provide calibration information, including the Sensor, Deployments and Asset ones.  We could also use the following Asset endpoint to search for sensor calibrations for a particular reference designator.

In [None]:
# Simple time conversion function (milliseconds since 1970 -> UTC datetime)
def convert_time(ms):
  if ms is not None:
    return datetime.datetime.utcfromtimestamp(ms/1000)
  else:
    return None

In [None]:
# Search for calibration information by reference designator
ASSET_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12587/asset'
request_url = ASSET_BASE_URL+'/cal?refdes=GP03FLMB-RIM01-02-CTDMOG060'

r = requests.get(request_url, auth=(USERNAME, TOKEN))
data = r.json()

# print( json.dumps(data, indent=4) )
# data[4]
pd.DataFrame(data)

In [None]:
# Reformat the data into a pretty table
rows = [] # Collect one dictionary per calibration value
for d in data:
  for dd in d['sensor']['calibration']:
    for ddd in dd['calData']:
      rows.append({
        'value': ddd['value'],
        'start': convert_time(ddd['eventStartTime']),
        'stop': convert_time(ddd['eventStopTime']),
        'name': ddd['eventName'],
        'assetUid': ddd['assetUid'],
        })

# Build the DataFrame in one step (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(rows)
df = df.sort_values(by=['name'])
df


Unfortunately, it turns out that most of these endpoints are not ideal, because they return all of the available calibrations for an asset, and not just those used for the deployment we're interested in.  In the example above, there are only 5 deployments, but we found over a dozen calibration sets.

The system uses the calibration values that are closest in time but before the deployment start date.  So really, for any given deployment, all we need to do is find the calibrations for that asset that occurred just prior to the deployment (though, this could be 6 months or more earlier).

Note also that calibrations do not have an end date.  It is assumed they are valid until the next calibration for the specific asset.
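
The selection rule described above can be sketched in a few lines of plain Python. For each coefficient name it keeps the most recent calibration that started on or before the deployment start; whether the system treats a calibration dated exactly on the start day the same way is an assumption here. `calibrations_for_deployment` and the sample rows are made up for illustration.

```python
import datetime

def calibrations_for_deployment(rows, deployment_start):
    """For each coefficient name, return the row whose 'start' is the latest
    one not after deployment_start (hypothetical helper)."""
    chosen = {}
    for row in rows:
        if row['start'] is None or row['start'] > deployment_start:
            continue  # calibrations dated after the deployment began are skipped
        best = chosen.get(row['name'])
        if best is None or row['start'] > best['start']:
            chosen[row['name']] = row
    return chosen

# Made-up calibration rows for a single asset
sample_rows = [
    {'name': 'coef_a', 'start': datetime.datetime(2015, 1, 10), 'value': 1.0},
    {'name': 'coef_a', 'start': datetime.datetime(2016, 3, 2), 'value': 2.0},
    {'name': 'coef_a', 'start': datetime.datetime(2017, 1, 5), 'value': 3.0},
    {'name': 'coef_b', 'start': datetime.datetime(2016, 3, 2), 'value': 0.5},
]
chosen = calibrations_for_deployment(sample_rows, datetime.datetime(2016, 6, 1))
print({name: row['value'] for name, row in chosen.items()})  # {'coef_a': 2.0, 'coef_b': 0.5}
```

With the calibration DataFrame built above, `df[df['assetUid'] == uid].to_dict('records')` would give rows in this shape for one asset (`uid` being whichever asset you are interested in).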

As an alternative, we can use the following request to retrieve the calibration values that were used by the system to process data on any given day (which should theoretically be the same across a given deployment).  If you keep the date range within one deployment, you should only get one set of calibration values.

In [None]:
# Setup the API request url
request_url = 'https://ooinet.oceanobservatories.org/api/m2m/12587/asset/cal'

params = {
  'beginDT':'2016-10-01T00:00:00.000Z',
  'endDT':'2016-11-01T00:00:00.000Z',
  'refdes':'GP03FLMB-RIM01-02-CTDMOG060'
}

# Grab the information from the server
r = requests.get(request_url, params=params, auth=(USERNAME, TOKEN))
data = r.json()

# Reformat the data into a pretty table
rows = [] # Collect one dictionary per calibration value
for d in data[0]['sensor']['calibration']:
  for dd in d['calData']:
    rows.append({
      'value': dd['value'],
      'start': convert_time(dd['eventStartTime']),
      'stop': convert_time(dd['eventStopTime']),
      'name': dd['eventName'],
      'assetUid': dd['assetUid'],
      })

# Build the DataFrame in one step (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(rows)
df = df.sort_values(by=['start','name'])
df

# Instrument Vocabulary
We can use the Vocabulary API endpoint to convert a given reference designator for an instrument into its descriptive names.

In [None]:
# Retrieve vocabulary information for a given instrument
VOCAB_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12586/vocab/inv'
request_url = VOCAB_BASE_URL+'/GP03FLMB/RIM01/02-CTDMOG060'
r = requests.get(request_url, auth=(USERNAME, TOKEN))
data = r.json()

data

# Annotations
Finally, let's grab all of the annotations for this instrument.  The annotations endpoint requires all of the options to be specified in the params dictionary, with beginDT and endDT given in milliseconds since 1970-01-01 UTC.

In [None]:
ANNO_API = 'https://ooinet.oceanobservatories.org/api/m2m/12580/anno/find'
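params = {
  # Times are given in milliseconds since 1970-01-01 UTC
  'beginDT': int(datetime.datetime(2010,1,1, tzinfo=datetime.timezone.utc).timestamp()*1000),
  'endDT': int(datetime.datetime(2018,1,1, tzinfo=datetime.timezone.utc).timestamp()*1000),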
  'refdes': 'GP03FLMB-RIM01-02-CTDMOG060',
#   'method': 'telemetered',
#   'stream': 'ctdmo_ghqr_sio_mule_instrument',
}

r = requests.get(ANNO_API, params=params, auth=(USERNAME, TOKEN))
data = r.json()

rows = [] # Collect one dictionary per annotation
for d in data:
  rows.append({
    'annotation': d['annotation'],
    'start': convert_time(d['beginDT']),
    'stop': convert_time(d['endDT']),
    'site': d['subsite'],
    'node': d['node'],
    'sensor': d['sensor'],
    'id': d['id']
  })

# Build the DataFrame in one step (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(rows)
pd.set_option('display.max_colwidth', None) # Show the full annotation text
df = df.sort_values(by=['start'])
df

In [None]:
pd.set_option('display.max_colwidth', 30) # Truncate long text columns again


# Resources
For more information on using the API, you can check out:

* [OOI Machine 2 Machine Introduction](http://oceanobservatories.org/ooi-m2m-interface/)
* [OOI Machine to Machine (M2M) Basics](https://github.com/ooi-data-review/m2m_demo/blob/master/notebooks/basic_examples.ipynb)
* [Accessing data as JSON via OOI RESTful API](https://github.com/ooi-data-review/m2m_demo/blob/master/notebooks/json_data_request.ipynb)
* [Accessing data as NetCDF via OOI RESTful API](https://github.com/ooi-data-review/m2m_demo/blob/master/notebooks/netcdf_data_request.ipynb)
* [OOI Machine to Machine (M2M) Realtime Requests](https://github.com/ooi-data-review/m2m_demo/blob/master/notebooks/realtime_requests.ipynb)
