# Example 1 - Requesting OOI Data in NetCDF Format
*Written by Sage Lichtenwalner, Rutgers University, June 14, 2018*

*Verified by Leila Belabbassi (to work with Pangeo), July 9, 2018*

In this example, we will demonstrate how to use the OOI M2M interface (also known as the OOI API) to:
* retrieve information about the instruments in the OOI database and
* request downloadable files for an instrument.

For this example, I'm going to use my favorite instrument, the [30m CTD on Global Papa Flanking Mooring B](http://ooi.visualocean.net/instruments/view/GP03FLMB-RIM01-02-CTDMOG060). Check out the link to the Data Team's Portal to find out more information about this instrument.

# Getting Started with the API

In order to use the API, you will need to have an OOI Net account, that is, an account on the [OOI Data Portal](https://ooinet.oceanobservatories.org).  Once you have an account, you will need your API Username and Token.  To get your API information, follow the steps below, or check out this [video](https://www.youtube.com/watch?time_continue=150&v=9YSIrDu4l24).

1. If needed, create a new account on ooinet.oceanobservatories.org.
  * You can use either your email address or the CILogon option with your university or Google account.
2. Log in.
3. Click on your email address on the right side of the navigation bar to open the pull-down menu.
4. Click on "User Profile" in the pull-down menu.
5. Copy and save the following data from your user profile: API Username and API Token. 
  * The API Username is similar to “OOIAPI-QTULEV4STGAS35”. 
  * The API Token is similar to “YYP2X2W3SOW”.

For this example, we will include the API login information as variables so we can use them later.


In [None]:
USERNAME = 'YOUR API USERNAME'
TOKEN =  'YOUR API TOKEN'

We will also import a few Python libraries that we will need in our environment.

In [None]:
# First, we need to add some more Python libraries
import requests
import datetime
import time

To access the OOI API, we will use the [python requests library](http://docs.python-requests.org/en/master/), which is very easy to use.  Here is the basic command format.

> `r = requests.get(url, params=parameters, auth=('user', 'pass'))`

All we have to do is specify the URL we want to access, along with our login information.  

In some cases, we will also include an additional set of optional parameters using the "params" option.  These "GET" parameters are typically found at the end of a URL.  
> `http://example.com/page.htm?var1=a&var2=b`

In this example, var1 and var2 are GET parameters.  Thanks to the requests library, we don't have to worry about getting all the question marks and ampersands in the right places; we can just specify these as a dictionary and pass it using the params option.
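
To see how the requests library encodes a set of parameters without sending anything over the network, you can prepare a request and inspect its URL. This is a minimal sketch that reuses the placeholder address from above; the parameter names and values are illustrative only.

```python
import requests

# GET parameters are given as a dictionary (placeholder names and values)
parameters = {'var1': 'a', 'var2': 'b'}

# Build the request without sending it, so no network access is needed
prepared = requests.Request('GET', 'http://example.com/page.htm', params=parameters).prepare()

# The question mark and ampersand are added for us
print(prepared.url)
```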

To see how this really works in practice, let's dive into an example.


## How do I find the instrument information needed to use the API?
To make a request to the OOI API, you will need to know the relevant OOI codes or IDs.  Many of these are available in the [OOI Data Portal](https://ooinet.oceanobservatories.org), but you may find the [Data Team portal](http://ooi.visualocean.net) helpful.

For the instrument in this example, you will need the following to make the request to the M2M API.
* the 3 parts of the Reference Designator (site, node and instrument),
* the stream name, and
* the data delivery method.

More information about this instrument can be found here:
http://ooi.visualocean.net/instruments/view/GP03FLMB-RIM01-02-CTDMOG060
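
The three parts of the Reference Designator can be read straight off the end of the link above. As a small sketch, the full designator can be split into the site, node and instrument codes used in the rest of this example:

```python
# Full reference designator, as it appears at the end of the link above
reference_designator = 'GP03FLMB-RIM01-02-CTDMOG060'

# Split on the first two hyphens only, so the instrument part keeps its own hyphen
site, node, instrument = reference_designator.split('-', 2)

print(site, node, instrument)
```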


# Instrument Vocabulary
We can use the **Vocabulary API endpoint** to convert a given reference designator for an instrument into its descriptive names.

In [None]:
# Instrument Information
site = 'GP03FLMB'
node = 'RIM01'
instrument = '02-CTDMOG060'

VOCAB_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12586/vocab/inv'

# Create the request URL (3 different ways)
# request_url = VOCAB_BASE_URL + '/'+site+'/'+ node+'/'+instrument # Good
# request_url = "%s/%s/%s/%s" % (VOCAB_BASE_URL,site,node,instrument) # Better
request_url ='/'.join((VOCAB_BASE_URL,site,node,instrument)) # Python wizard best

# Retrieve vocabulary information for a given instrument
r = requests.get(request_url, auth=(USERNAME, TOKEN))
data = r.json()

data

In [None]:
print("%s, %s, %s" % (data[0]['tocL1'],data[0]['tocL2'],data[0]['instrument']))

# Asynchronous Data Requests

The OOI provides 2 ways to access OOI data.

* **Synchronous** requests are great when you just want to grab a little bit of data quickly. They are therefore very useful for applications like interactive graphs on a web site.
* **Asynchronous** requests are for when you want a larger and more complete dataset. They take longer to process, sometimes up to an hour or more depending on how much data you are requesting and what else the system is doing.  The end result is a web accessible directory of NetCDF data files.


| | Synchronous requests | Asynchronous requests |
| -- | -- | -- |
| **Response Time** | Seconds | Minutes to hours |
| **Data Resolution** | Up to 20k points | Full resolution |
| **Data Format** | JSON | Downloadable NetCDF or CSV (not recommended) |
| **Advantages** | Good for interactive web graphs and quick look plots | Good for data analysis, data available via web server (temporarily) |
| **Disadvantages** | Low resolution, need to parse JSON array | Can take time to process, files can be large |




Making asynchronous requests through the API  is essentially the same as requesting a download from the OOI Data Portal, but with the API you can easily create one or more requests in an automated way.

To make a data request, we basically construct a URL using the reference designator, delivery method, stream name and other parameters.  The URL is constructed using the following format:
> /sensor/inv/{subsite}/{node}/{sensor}/{method}/{stream}

In order to make the code clear, we're going to set up several variables and then use the **join()** function to concatenate all of the variables together with slashes.

We can also specify a number of additional optional parameters using the **params** dictionary.
* We are also going to specify a start (**beginDT**) and ending date/time (**endDT**) for our request.
* By default, asynchronous requests will return NetCDF files, but you could also specify csv or json, using the **format** option.  
* Optionally, you can also specify **include_provenance** and **include_annotations** which will include separate text files in the output directory with that information.
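
If you would rather compute the start and end times than type them by hand, the datetime library can produce strings in the same layout as the ones used in this example. This is a minimal sketch: the helper name `to_ooi_time` is made up for illustration, and it assumes the datetime values are already in UTC.

```python
import datetime

def to_ooi_time(dt):
    """Format a datetime (assumed to be UTC) like the timestamps used in this example."""
    return dt.strftime('%Y-%m-%dT%H:%M:%S.000Z')

begin = datetime.datetime(2015, 7, 1)
end = datetime.datetime(2017, 7, 1)

# These two entries could be merged into the params dictionary of a request
time_params = {
    'beginDT': to_ooi_time(begin),
    'endDT': to_ooi_time(end),
}
print(time_params)
```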

In [None]:
# Instrument Information
site = 'GP03FLMB'
node = 'RIM01'
instrument = '02-CTDMOG060'
method = 'recovered_inst'
stream = 'ctdmo_ghqr_instrument_recovered'

# No trailing slash here: join() below adds the slashes between the parts
SENSOR_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'

# Create the request URL
data_request_url ='/'.join((SENSOR_BASE_URL,site,node,instrument,method,stream))

# All of the following are optional
params = {
  'beginDT':'2015-07-01T00:00:00.000Z',
  'endDT':'2017-07-01T00:00:00.000Z',
  'format':'application/netcdf',
  'include_provenance':'true',
  'include_annotations':'true'
}


### WARNING:
#### Data request lines are commented out to prevent accidental resubmission when running through the entire notebook quickly.

In [None]:
# r = requests.get(data_request_url, params=params, auth=(USERNAME, TOKEN))
# data = r.json()

While a synchronous request returns an array of actual data, an asynchronous request returns only some URLs and other metadata about the request. Note that the cells below assume you have uncommented and run the request above, so that `data` holds its response.

In [None]:
data

## Which data URL should I use?

The **first** URL in the **allURLs** key points to the THREDDS server, which allows for programmatic data access without downloading the entire file.

In [None]:
print(data['allURLs'][0])

The **second** URL in the **allURLs** key provides a direct link to a web server which you can use to quickly download files if you don't want to go through THREDDS.

In [None]:
print(data['allURLs'][1])
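
Because the request lines above are commented out by default, `data` may not actually contain an **allURLs** key when these cells are run. One way to guard against that is sketched below. The helper name `get_request_urls` is hypothetical, and the sample response uses made-up placeholder URLs rather than real OOI output.

```python
def get_request_urls(response):
    """Return (thredds_url, download_url) from an asynchronous request response.

    Raises ValueError if the response has no usable 'allURLs' entry,
    for example because the request failed or was never submitted.
    """
    urls = response.get('allURLs') if isinstance(response, dict) else None
    if not urls or len(urls) < 2:
        raise ValueError('No data URLs found in response: %r' % (response,))
    # First URL: THREDDS server; second URL: direct web server
    return urls[0], urls[1]

# Made-up response, for illustration only (not real OOI output)
example_response = {'allURLs': ['https://example.org/thredds/request-1',
                                'https://example.org/files/request-1']}
thredds_url, download_url = get_request_urls(example_response)
print(download_url)
```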

## How can you check when a request is complete?

We can use the second URL to check whether a status.txt file has been written to the location. If it has, then the request has completed and all data have been delivered to the THREDDS server.

The following for loop will poll the location for the status.txt file up to 1000 times, pausing one second between attempts. It will therefore give up after roughly 17 minutes (1000 seconds of waiting plus the time taken by the requests themselves). If you are requesting a very large and dense dataset (for example, 3 years of BOTPT data collected at 20 Hz), it may take longer to complete the request, so you will want to increase the retry range or the sleep interval.

In [None]:
%%time
check_complete = data['allURLs'][1] + '/status.txt'
for i in range(1000): 
    r = requests.get(check_complete)
    if r.status_code == requests.codes.ok:
        print('request completed')
        break
    else:
        time.sleep(1)
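
The same polling pattern can be wrapped in a small reusable function, which makes the retry count and sleep interval easy to adjust. This is only a sketch: `wait_until` and `fake_check` are hypothetical names, and the demonstration uses a stand-in check instead of a real web request so that it runs instantly.

```python
import time

def wait_until(check, retries=1000, interval=1.0, sleep=time.sleep):
    """Call check() up to `retries` times, pausing `interval` seconds after each failed try.

    Returns True as soon as check() returns True, or False if it never does.
    """
    for attempt in range(retries):
        if check():
            return True
        sleep(interval)
    return False

# Demonstration with a stand-in check that succeeds on the third call
calls = []

def fake_check():
    calls.append(1)
    return len(calls) >= 3

finished = wait_until(fake_check, retries=5, interval=0, sleep=lambda seconds: None)
print(finished, len(calls))
```

In the notebook, the check could be `lambda: requests.get(check_complete).status_code == requests.codes.ok`. With `retries=360` and `interval=10`, for example, the function would wait for about an hour before giving up.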

# Next Steps

In this example, we made an asynchronous data request and received a URL where the data will be available when it is ready.

Once the dataset has finished processing, we can download the resulting NetCDF files and start playing with the data, which we will cover in Example 2.

You can also use [NASA's Panoply](https://www.giss.nasa.gov/tools/panoply/) software to open the NetCDF files on your machine.  This is a great tool to peruse the metadata and make some quick plots.