## Downsampling cabled FLORT data
*Written by Friedrich Knuth, Rutgers University*

*Revised by Lori Garzio, Rutgers University, July 12, 2018*

This example demonstrates how to download data via the OOI API, downsample the data (necessary here because cabled data are collected at a high frequency), and save the more manageable downsampled data as a pickle file. Once you have a pickle file with the data you want, you can use the 02_plot_pk.ipynb notebook to visualize the data.

The downsampled pickle file from this example is created and saved in this directory for reference (chlor_a_and_temp.pd).

In [None]:
import requests
import time

import warnings
warnings.filterwarnings("ignore")

In [None]:
# Enter your API username and token
username = ''
token = ''

# Specify data for request
subsite = 'RS01SBPS'
node = 'SF01A'
sensor = '3A-FLORTD101'
method = 'streamed'
stream = 'flort_d_data_record'
beginDT = '2014-09-01T01:01:01.000Z'
endDT = '2018-06-30T01:01:01.000Z'

Build and send the request. 
#### Note: Data request lines are commented out to prevent accidental resubmission when running through the entire notebook quickly.

In [None]:
# No trailing slash here: the join below supplies the separators
base_url = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'

data_request_url = '/'.join((base_url, subsite, node, sensor, method, stream))
params = {
    'beginDT': beginDT,
    'endDT': endDT,
}

# r = requests.get(data_request_url, params=params, auth=(username, token))
# data = r.json()
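
To inspect the request before sending it, the URL and query string can be assembled with the standard library alone. This is a minimal sketch using the same values as the cells above; `parts` and `full_url` are names introduced here for illustration, and the query string that `requests` builds from `params` may differ in small details.

```python
from urllib.parse import urlencode

# Base URL without a trailing slash, so the join adds exactly one '/'
base_url = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'

# Subsite, node, sensor, delivery method and stream from the cells above
parts = ('RS01SBPS', 'SF01A', '3A-FLORTD101', 'streamed', 'flort_d_data_record')
data_request_url = '/'.join((base_url,) + parts)

params = {
    'beginDT': '2014-09-01T01:01:01.000Z',
    'endDT': '2018-06-30T01:01:01.000Z',
}

# urlencode percent-encodes the colons in the timestamps
full_url = data_request_url + '?' + urlencode(params)
print(full_url)
```

Printing the URL this way makes it easy to confirm the time range and reference designator before submitting a large request.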

In [None]:
# THREDDS directory containing data files
# data['allURLs'][0]

Wait for the data request to complete. Note: this may take a while if you have requested a large time range of cabled data.

In [None]:
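%%time
# Check up to 1800 times, one second apart, until status.txt is available
check_complete = data['allURLs'][1] + '/status.txt'
for i in range(1800):
    r = requests.get(check_complete)
    if r.status_code == requests.codes.ok:
        print('request completed')
        break
    time.sleep(1)
else:
    # The loop finished without finding status.txt
    print('request not completed after 1800 checks')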
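
The polling pattern above can be wrapped in a small reusable function. This is only a sketch: `wait_until` and `fake_check` are hypothetical names, and the fake check stands in for the real `status.txt` request so the example runs without network access.

```python
import time


def wait_until(check, attempts=1800, delay=1.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out.

    Returns the number of the attempt that succeeded, or None on timeout.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        sleep(delay)
    return None


# Stand-in for the status.txt check: succeeds on the third call
calls = {'n': 0}


def fake_check():
    calls['n'] += 1
    return calls['n'] >= 3


result = wait_until(fake_check, attempts=10, delay=0.0)
print(result)
```

In the notebook, the check could be a function that requests `check_complete` and compares the status code to `requests.codes.ok`. Returning `None` on timeout lets the caller decide whether to keep waiting or stop.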

In [None]:
import re
import xarray as xr
import pandas as pd
import os

In [None]:
# List all of the .nc files in the THREDDS directory
url = data['allURLs'][0] # or copy and paste the THREDDS url here
# url = 'https://opendap.oceanobservatories.org/thredds/catalog/ooi/ooidatateam@gmail.com/20180712T150853-RS01SBPS-SF01A-3A-FLORTD101-streamed-flort_d_data_record/catalog.html'
tds_url = 'https://opendap.oceanobservatories.org/thredds/dodsC'
datasets = requests.get(url).text
# Escape the dot so that only a literal '.nc' ends a match
x = re.findall(r'(ooi/.*?\.nc)', datasets)
# Keep only names whose last character before '.nc' is a digit.
# Building a new list avoids removing items from a list while iterating over it.
x = [i for i in x if i.endswith('.nc') and i[-4].isdigit()]
# Join with '/' rather than os.path.join, which is meant for file paths, not URLs
datasets = ['/'.join((tds_url, i)) for i in x]
datasets
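
To see what the regular expression and the digit filter keep, they can be tried on a short HTML string. The fragment below is hypothetical (the file names are made up, not taken from a real catalog page), so treat this as a sketch of the filtering logic rather than a description of the actual THREDDS output.

```python
import re

# Hypothetical fragment of a THREDDS catalog page; file names are invented
html = """
<a href='catalog.html?dataset=ooi/user/request/deployment0001_FLORT_20140901T010101.000000-20140902T010101.000000.nc'>a</a>
<a href='catalog.html?dataset=ooi/user/request/deployment0001_FLORT.ncml'>b</a>
<a href='catalog.html?dataset=ooi/user/request/notes_provenance.nc'>c</a>
"""

tds_url = 'https://opendap.oceanobservatories.org/thredds/dodsC'

# Every path that starts with 'ooi/' and runs to the first literal '.nc'
candidates = re.findall(r'(ooi/.*?\.nc)', html)

# Keep only names whose last character before '.nc' is a digit
nc_files = [c for c in candidates if c.endswith('.nc') and c[-4].isdigit()]

# Build the data access URLs
urls = ['/'.join((tds_url, f)) for f in nc_files]
print(urls)
```

Here the `.ncml` entry and the file without a trailing digit are both dropped by the digit check, leaving only the time-stamped data file.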

In [None]:
# Create a list of only the FLORT files (exclude files from other instruments that might also be provided in the data request)
datasets_sel = []
for i in datasets:
    if '2A-CTDPFA102' in i:
        pass
    else:
        datasets_sel.append(i)

In [None]:
datasets_sel

In [None]:
# make the output directory for the pickle files that are created below
new_dir = 'minute_mean_data/'
# exist_ok=True (Python 3.2+) makes this a no-op if the directory already exists
os.makedirs(new_dir, exist_ok=True)

In [None]:
import pickle as pk
import gc

For each .nc file: open the file, resample the chlorophyll a and temperature data by taking minute averages, and save the information to a pickle file. This step might take a while.

In [None]:
for num, i in enumerate(datasets_sel, start=1):
    print('Downsampling file {} of {}'.format(num, len(datasets_sel)))
    ds = xr.open_dataset(i)
    ds = ds.swap_dims({'obs': 'time'})

    # Minute means ('min' is the minute alias; 'T' is deprecated in recent pandas)
    chlor_a_min = pd.DataFrame()
    chlor_a_min['fluorometric_chlorophyll_a'] = ds['fluorometric_chlorophyll_a'].to_pandas().resample('min').mean()
    chlor_a_min['int_ctd_pressure'] = ds['int_ctd_pressure'].to_pandas().resample('min').mean()

    # Remove the units attribute; the default avoids a KeyError if it is absent
    ds['seawater_temperature'].attrs.pop('units', None)
    chlor_a_min['seawater_temperature'] = ds['seawater_temperature'].to_pandas().resample('min').mean()

    # Keep only the minutes in which all three variables have data
    chlor_a_min = chlor_a_min.dropna()

    out = 'minute_mean_data/' + i.split('/')[-1][:-3] + '_resampled' + '.pd'

    with open(out, 'wb') as fh:
        pk.dump(chlor_a_min, fh)

    # Release the remote dataset before moving on to the next file
    ds.close()
    gc.collect()
print('Complete!')
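
The minute averaging in the loop above can be checked on a small synthetic series. This sketch assumes 1 Hz sampling and made-up values; it does not use OOI data, and `values` and `minute_mean` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic 1 Hz series covering three minutes (values are made up)
index = pd.date_range('2018-06-30 00:00:00', periods=180, freq='s')
values = pd.Series(np.repeat([1.0, 2.0, 3.0], 60), index=index)

# One row per minute, each holding the mean of that minute's 60 samples
minute_mean = values.resample('min').mean()
print(minute_mean)
```

The 180 input samples reduce to three rows, which is the same reduction that keeps the downsampled cabled data manageable.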

Combine the individual pickle files into one file. An alternative would be to write the information to a single pickle file in the previous step.

In [None]:
# create a single file with all the pickled data.
frames = []
for path, subdirs, files in os.walk('minute_mean_data/'):
    # sort the file names so the files are read in a repeatable order
    for name in sorted(files):
        file_name = os.path.join(path, name)
        with open(file_name, 'rb') as f:
            frames.append(pk.load(f))

# DataFrame.append was removed in pandas 2.0; concatenate all frames once instead
chlor_a_and_temp = pd.concat(frames)

with open('chlor_a_and_temp.pd', 'wb') as f:
    pk.dump(chlor_a_and_temp, f)
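
The combine-and-save step can be tried on small stand-in data before running it on the real files. This is a sketch under stated assumptions: the two frames, their values, and the file name `combined_example.pd` are invented for illustration, and the file is written to a temporary directory.

```python
import os
import pickle as pk
import tempfile

import pandas as pd

# Two small stand-in frames, as if produced from two .nc files
df1 = pd.DataFrame({'seawater_temperature': [10.0, 10.5]},
                   index=pd.to_datetime(['2018-06-30 00:00', '2018-06-30 00:01']))
df2 = pd.DataFrame({'seawater_temperature': [11.0]},
                   index=pd.to_datetime(['2018-06-30 00:02']))

# Concatenate once, then write a single pickle file
combined = pd.concat([df1, df2])

out_path = os.path.join(tempfile.mkdtemp(), 'combined_example.pd')
with open(out_path, 'wb') as f:
    pk.dump(combined, f)

# Read the file back and confirm that nothing changed
with open(out_path, 'rb') as f:
    restored = pk.load(f)

print(restored.equals(combined))
```

Collecting the frames in a list and concatenating them once is also how the downsampling loop could write a single file directly, without the intermediate per-file pickles.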