# Request and Download OOI Data

This notebook builds requests for data and QARTOD QC test results from two sources: OOINet and the OOI dev1 server. The QC tests attached to OOINet datasets have already been implemented in production by the Data Team, while dev1 hosts datasets with results from QARTOD tests that are still in development. Access to dev1 is restricted to OOI personnel on the internal network.

Each request built below specifies the retrieval method, the data stream, and either the reference designator or the equivalent site, node, and sensor combination that identifies a specific instrument in the OOI M2M API. A request can also be limited to a time period with start and end datetime parameters.

After downloading the datasets and performing preprocessing to prepare the data for analysis, the datasets are saved locally to an interim data folder for the next step in testing and analyzing QARTOD test results.

### Import modules used in this notebook

In [106]:
# Import non-OOI libraries

import os
import re
import requests
import gc
import io
import ast
import warnings
warnings.filterwarnings("ignore")
import sys

import pandas as pd
import numpy as np
import xarray as xr
import netCDF4
import dask
from dask.diagnostics import ProgressBar

In [107]:
# Import OOINet library

sys.path.append("c:\\Users\\cooleyky\\Documents\\GitHub\\OOINet") # this is what was missing from the steps I followed to install ooinet and ooi-data-explorations as local dev repo
from ooinet import M2M
from ooinet.Instrument.common import process_file

In [108]:
# Import functions from ooi-data-explorations library

sys.path.append("c:\\Users\\cooleyky\\Documents\\GitHub\\ooi-data-explorations\\python") # why did the initial install not include this?
from ooi_data_explorations.uncabled.process_dosta import dosta_datalogger
from ooi_data_explorations.combine_data import combine_datasets
from ooi_data_explorations import common as ooi_common

### QARTOD in Production: Request data from the OOINet THREDDS catalog

##### Define data parameters and routines

In [None]:
# Setup parameters needed to request data

refdes = "CP01CNSM-MFD37-03-CTDBPD000"              # Coastal Pioneer Array (NES) - Central Surface Mooring, bottom-pumped CTD; same instrument as the site, node, sensor values below
method = "recovered_inst"                           # non-decimated data from recovered instrument
stream = "ctdbp_cdef_instrument_recovered"          # name of data stream

# Site, node, and sensor info from deconstructed reference designator

site = "CP01CNSM"
node = "MFD37"
sensor = "03-CTDBPD000"
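
The reference designator is the site, node, and sensor codes joined with hyphens, so either form identifies the same instrument. A minimal sketch of the split, assuming the sensor part is everything after the second hyphen; `split_refdes` is a hypothetical helper, not a function from ooinet or ooi-data-explorations:

```python
def split_refdes(refdes):
    """Split a reference designator into (site, node, sensor)."""
    # maxsplit=2 keeps the hyphen inside the sensor code, e.g. "03-CTDBPD000"
    site, node, sensor = refdes.split("-", 2)
    return site, node, sensor


site, node, sensor = split_refdes("CP01CNSM-MFD37-03-CTDBPD000")
print(site, node, sensor)
```

Deriving the three parts from `refdes` this way would avoid the two definitions drifting apart when the instrument is changed.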

In [None]:
# Generic preprocessing routine for basic dataset cleaning/processing

@dask.delayed
def preprocess(url):
    ds = xr.open_dataset(url)    # open the remote dataset from its URL
    ds = process_file(ds)        # clean it up with ooinet's process_file
    return ds

##### Using mostly OOINet module

In [111]:
# Use the gold copy THREDDS datasets

thredds_url = M2M.get_thredds_url(refdes, method, stream, goldCopy=True)

# Get the THREDDS catalog

thredds_catalog = M2M.get_thredds_catalog(thredds_url)

In [112]:
# Clean the THREDDS catalog

sensor_files, ancillary_files = M2M.clean_catalog(thredds_catalog, stream)
# sensor_files

# This step separates out entries from thredds_catalog that do not match the stream. These ancillary files are usually provided because they are used to calculate a derived variable from the measured variable stream.

In [113]:
# Now build the url to access the data

sensor_files = [re.sub(r"catalog\.html\?dataset=", M2M.URLS["goldCopy_dodsC"], file) for file in sensor_files]
# sensor_files
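
The substitution above turns each catalog entry into an OPeNDAP address by swapping the `catalog.html?dataset=` prefix for the dodsC base URL. A small sketch with the standard library only; the `dodsc_base` value and the file name are placeholders assumed for illustration, not the actual contents of `M2M.URLS["goldCopy_dodsC"]` or of the catalog:

```python
import re

# Placeholder values for illustration only
dodsc_base = "https://example.invalid/thredds/dodsC/"
entry = "catalog.html?dataset=ooigoldcopy/public/example/deployment0001_example.nc"

# A raw string with escaped "." and "?" matches the prefix literally
opendap_url = re.sub(r"catalog\.html\?dataset=", dodsc_base, entry)
print(opendap_url)
```

Writing the pattern as a raw string keeps Python from treating `\?` as a string escape, and escaping the dot stops it from matching any character.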

In [114]:
# Build the delayed preprocessing tasks; nothing is opened until dask.compute() runs in the next cell

zs = [preprocess(file) for file in sensor_files]

In [115]:
# Load all the datasets

with ProgressBar():
    data = xr.concat([ds.chunk() for ds in dask.compute(*zs)], dim="time")
# data

[########################################] | 100% Completed | 5.15 ss


##### Using ooi_data_explorations modules

In [116]:
# Load data with ooi_common module

data = ooi_common.load_gc_thredds(site, node, sensor, method, stream, use_dask=True)    # Request the gold copy data through the THREDDS catalog

# It looks like the OOINet approach above avoids collecting ancillary files along with the requested sensor files, which could otherwise add time to the download and open-dataset steps.
# load_gc_thredds() also calls process_file() within gc_collect(), so we get the same preprocessing as in the preprocess() routine defined above.

Downloading 15 data file(s) from the OOI Gold Copy THREDSS catalog
Downloading and Processing Data Files: 100%|██████████| 15/15 [00:17<00:00,  1.16s/it]


In [117]:
# Make a copy of the data with a unique name

ds_prod = data.copy()
ds_prod

[Output: HTML repr of the xarray Dataset `ds_prod`. Every variable is a dask array of shape (263601,) split into 35 chunks of up to 10000 elements, with dtypes uint8, int32, float64, datetime64[ns], and object.]

### QARTOD in Development: Request data from dev1 server

In [118]:
# Setup parameters needed to request data 
# We need to set new parameters since only a subset of OOI datasets is available from dev1 server.

refdes = "CP01CNSM-MFD35-05-PCO2WB000"              # Coastal Pioneer Array (NES) - Central Surface Mooring Seafloor Multi-Function Node - pCO2 Water
method = "recovered_inst"                           # non-decimated data from recovered instrument
stream = "pco2w_abc_instrument"                     # name of data stream

# Site, node, and sensor info from deconstructed reference designator
site = "CP01CNSM"
node = "MFD35"
sensor = "05-PCO2WB000"

In [88]:
# Substitute ooinet-dev1-west.intra.oceanobservatories.org into the available API URLs

Dev01_urls = {}
for key in M2M.URLS:
    url = M2M.URLS.get(key)
    if "opendap" in url:
        dev1_url = re.sub("opendap", "opendap-dev1-west.intra", url)
    else:
        dev1_url = re.sub("ooinet","ooinet-dev1-west.intra", url)
    Dev01_urls[key] = dev1_url
    
Dev01_urls
   

{'data': 'https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12576/sensor/inv',
 'anno': 'https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12580/anno/find',
 'vocab': 'https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12586/vocab/inv',
 'asset': 'https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12587',
 'deploy': 'https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12587/events/deployment/inv',
 'preload': 'https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12575/parameter',
 'cal': 'https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12587/asset/cal',
 'fileServer': 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/fileServer/',
 'dodsC': 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC/',
 'goldCopy': 'https://thredds.dataexplorer.oceanobservatories.org/thredds/catalog/ooigoldcopy/public/',
 'goldCopy_fileServer': 'https://thredds.dataexplorer.oceanobservatories.org/thre
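
As the output shows, only URLs containing `opendap` or `ooinet` change; the gold copy entries point at a different host and pass through untouched. A sketch of the same rewrite on single URLs, using plain string replacement instead of `re.sub`; `to_dev1` is a hypothetical helper, and the production URLs are inferred by reversing the substitution in the output above:

```python
def to_dev1(url):
    """Point a production OOI URL at the dev1 host (illustrative helper)."""
    if "opendap" in url:
        return url.replace("opendap", "opendap-dev1-west.intra", 1)
    return url.replace("ooinet", "ooinet-dev1-west.intra", 1)


print(to_dev1("https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv"))
print(to_dev1("https://opendap.oceanobservatories.org/thredds/dodsC/"))
# The gold copy host contains neither substring, so it is returned unchanged
print(to_dev1("https://thredds.dataexplorer.oceanobservatories.org/thredds/catalog/ooigoldcopy/public/"))
```

Note that this sketch replaces only the first occurrence of each substring, whereas `re.sub` without a `count` replaces all of them.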

In [119]:
# Use the Dev1 data catalog URL 

api_base_url = Dev01_urls['data']
api_base_url

# Our choice of URL is similar to the URL used in the M2M example notebook here: https://github.com/ooi-data-review/2018-data-workshops/blob/master/chemistry/examples/quickstart_python.ipynb 
# The rest of the data request process through this section is modeled after the linked tutorial above. 

'https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12576/sensor/inv'

In [120]:

# Create the request URL

data_request_url = '/'.join((api_base_url, site, node, sensor, method, stream))

# We are using a different process for downloading data than in the OOINet section since the default URLs that are set within the other functions connect to OOINet. 
# The development environment also doesn't have a gold copy, although different functions to request non-gold copy datasets from OOINet exist in the OOINet and ooi-data-explorations modules.

# All of the following parameters are optional, but you should specify a date range to control the size of the dataset requested 

params = {
  'beginDT':'2018-01-01T00:00:00.000Z',
  'endDT':'2019-01-01T00:00:00.000Z',
  'format':'application/netcdf',
  'include_provenance':'true',
  'include_annotations':'true'
}
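
To preview what the request will look like, the base URL, path components, and parameters can be assembled by hand. This is a sketch using only the standard library; `requests` performs its own encoding when given `params`, so the exact string it sends may differ slightly:

```python
from urllib.parse import urlencode

api_base_url = "https://ooinet-dev1-west.intra.oceanobservatories.org/api/m2m/12576/sensor/inv"
parts = ("CP01CNSM", "MFD35", "05-PCO2WB000", "recovered_inst", "pco2w_abc_instrument")
params = {
    "beginDT": "2018-01-01T00:00:00.000Z",
    "endDT": "2019-01-01T00:00:00.000Z",
    "format": "application/netcdf",
}

# Join the path with "/" and append the percent-encoded query string
full_url = "/".join((api_base_url, *parts)) + "?" + urlencode(params)
print(full_url)
```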

In [121]:
# Initialize credentials
# This process is borrowed from ooinet.M2M

import netrc
try:
    nrc = netrc.netrc()
    AUTH = nrc.authenticators('ooinet-dev1-west.intra.oceanobservatories.org')
    # Check for a missing entry before unpacking; authenticators() returns None when the host is absent
    if AUTH is None:
        raise RuntimeError(
            'No entry found for machine ``ooinet-dev1-west.intra.oceanobservatories.org`` in the .netrc file')
    login, password = AUTH[0], AUTH[2]
except FileNotFoundError as e:
    raise OSError(e, os.strerror(e.errno), os.path.expanduser('~'))
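
The credentials are read from a `.netrc` file in the home directory, which needs one entry for the dev1 host. The sketch below writes a throwaway file to show the expected entry format and how `authenticators()` returns `None` for an unknown host; the login and password values are placeholders, not real credentials:

```python
import netrc
import os
import tempfile

host = "ooinet-dev1-west.intra.oceanobservatories.org"

# Write a temporary .netrc-style file with placeholder credentials
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write(f"machine {host} login EXAMPLE-USER password EXAMPLE-TOKEN\n")
    path = f.name

try:
    nrc = netrc.netrc(path)
    auth = nrc.authenticators(host)                   # (login, account, password)
    login, password = auth[0], auth[2]
    missing = nrc.authenticators("unknown.example")   # None when no entry exists
finally:
    os.remove(path)

print(login, missing)
```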

In [None]:
# # Build and send the data request
# # This cell is commented out after it has already been run once so that we don't make redundant requests.

# r = requests.get(data_request_url, params=params, auth=(login, password))
# data = r.json()
# data
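
The JSON response to the request is where the result URLs come from. Following the quickstart notebook linked earlier, this sketch assumes the response carries an `allURLs` list that includes the THREDDS catalog URL; the dictionary below is a stand-in with placeholder URLs, not a real response from dev1:

```python
# Stand-in for r.json(); the "allURLs" key and its contents are assumptions
data = {
    "allURLs": [
        "https://example.invalid/thredds/catalog/ooi/user/request-id/catalog.html",
        "https://example.invalid/async_results/user/request-id",
    ]
}

# Pick the first URL that points at a THREDDS catalog page
catalog_urls = [u for u in data.get("allURLs", []) if u.endswith("catalog.html")]
url = catalog_urls[0] if catalog_urls else None
print(url)
```

Selecting the URL in code rather than pasting it by hand would keep the notebook in sync with the most recent request.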

In [93]:
# Loading in NetCDF files

# Copy one of the URLs returned by the data request and paste it below (I used the THREDDS catalog URL)

url = 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/catalog/ooi/kylene.cooley@whoi.edu/20230410T161638159Z-CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument/catalog.html'

# Find all available .nc files in the directory indicated by the catalog URL

tds_url = 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC' 

# The first URL above is the catalog page for the request we built; tds_url is the OPeNDAP (dodsC) base that each dataset path is appended to below.

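datasets = requests.get(url).text
urls = re.findall(r'href=[\'"]?([^\'" >]+)', datasets)
# Collect every path in the catalog page that looks like a NetCDF file
x = re.findall(r'(ooi/.*?\.nc)', datasets)
# Keep only files whose name ends in a digit before ".nc" (a timestamp).
# A comprehension avoids removing items from a list while iterating over it, which can skip entries.
x = [i for i in x if i.endswith('.nc') and i[-4].isdigit()]
datasets = ["/".join((tds_url, i)) for i in x]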

# I changed this from os.path.join because os.path.join will create paths with a "\\" delimiter instead of "/" on Windows which does not successfully connect to web addresses

datasets

['https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC/ooi/kylene.cooley@whoi.edu/20230410T161638159Z-CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument/deployment0010_CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument_blank_20181030T020053-20190101T200052.nc',
 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC/ooi/kylene.cooley@whoi.edu/20230410T161638159Z-CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument/deployment0010_CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument_20181030T030009-20181231T230009.nc',
 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC/ooi/kylene.cooley@whoi.edu/20230410T161638159Z-CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument/deployment0009_CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument_blank_20180325T000055-20181029T060055.nc',
 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC/ooi/kylene.cooley@whoi.
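
The filtering step can be tried offline on a small piece of HTML. This sketch uses made-up file names and a placeholder base URL; it keeps paths that end in a digit followed by `.nc`, the same test the cell above applies, and builds the result with a comprehension rather than removing items from a list while looping over it:

```python
import re

# Made-up catalog fragment for illustration
html = """
<a href="catalog.html?dataset=ooi/user/req/deployment0001_example_20180101T000009-20180329T190009.nc">a</a>
<a href="catalog.html?dataset=ooi/user/req/deployment0001_example_notes.nc">b</a>
<a href="catalog.html?dataset=ooi/user/req/deployment0001_example.json">c</a>
"""
tds_url = "https://example.invalid/thredds/dodsC"

paths = re.findall(r"(ooi/.*?\.nc)", html)
# Keep files whose name ends with a digit before ".nc" (a timestamp)
paths = [p for p in paths if p[-4].isdigit()]
datasets = ["/".join((tds_url, p)) for p in paths]
print(datasets)
```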

In [41]:
# Remove _blank dataset files
# I'm not sure what the files with "blank" are used for on dev1, but they seem redundant to me.

selected_datasets = [d for d in datasets if 'blank' not in d]
selected_datasets

['https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC/ooi/kylene.cooley@whoi.edu/20230410T161638159Z-CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument/deployment0010_CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument_20181030T030009-20181231T230009.nc',
 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC/ooi/kylene.cooley@whoi.edu/20230410T161638159Z-CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument/deployment0009_CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument_20180325T010009-20181029T120009.nc',
 'https://opendap-dev1-west.intra.oceanobservatories.org/thredds/dodsC/ooi/kylene.cooley@whoi.edu/20230410T161638159Z-CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument/deployment0008_CP01CNSM-MFD35-05-PCO2WB000-recovered_inst-pco2w_abc_instrument_20180101T000009-20180329T190009.nc']

In [124]:
# Load files into xarray dataset

ds = xr.open_mfdataset(selected_datasets)
ds = ds.swap_dims({'obs': 'time'})              # Swap the primary dimension
ds = ds.chunk({'time': 100})                    # Used for optimization
ds = ds.sortby('time')                          # Data from different deployments can overlap so we want to sort all data by time stamp.
ds

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,20 Tasks,19 Chunks
Type,int32,numpy.ndarray
"Array Chunk Bytes 7.32 kiB 400 B Shape (1874,) (100,) Count 20 Tasks 19 Chunks Type int32 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,20 Tasks,19 Chunks
Type,int32,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,1.83 kiB,100 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,int8,numpy.ndarray
"Array Chunk Bytes 1.83 kiB 100 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type int8 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,1.83 kiB,100 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,int8,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,1.83 kiB,100 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,int8,numpy.ndarray
"Array Chunk Bytes 1.83 kiB 100 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type int8 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,1.83 kiB,100 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,int8,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,1.83 kiB,100 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,int8,numpy.ndarray
"Array Chunk Bytes 1.83 kiB 100 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type int8 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,1.83 kiB,100 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,int8,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,1.83 kiB,100 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,int8,numpy.ndarray
"Array Chunk Bytes 1.83 kiB 100 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type int8 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,1.83 kiB,100 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,int8,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,14.64 kiB,800 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float64,numpy.ndarray
"Array Chunk Bytes 14.64 kiB 800 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type float64 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,14.64 kiB,800 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float64,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float32,numpy.ndarray
"Array Chunk Bytes 7.32 kiB 400 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type float32 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float32,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float32,numpy.ndarray
"Array Chunk Bytes 7.32 kiB 400 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type float32 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float32,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float32,numpy.ndarray
"Array Chunk Bytes 7.32 kiB 400 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type float32 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float32,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,14.64 kiB,800 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,datetime64[ns],numpy.ndarray
"Array Chunk Bytes 14.64 kiB 800 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type datetime64[ns] numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,14.64 kiB,800 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,datetime64[ns],numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,117.12 kiB,6.25 kiB
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,|S64,numpy.ndarray
"Array Chunk Bytes 117.12 kiB 6.25 kiB Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type |S64 numpy.ndarray",1874  1,

Unnamed: 0,Array,Chunk
Bytes,117.12 kiB,6.25 kiB
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,|S64,numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,7.32 kiB,400 B
Shape,"(1874,)","(100,)"
Count,40 Tasks,19 Chunks
Type,float32,numpy.ndarray
"Array Chunk Bytes 7.32 kiB 400 B Shape (1874,) (100,) Count 40 Tasks 19 Chunks Type float32 numpy.ndarray",1874  1,


In [56]:
# Preprocess the dataset with commands adapted for Dev01 datasets from ooi_data_explorations.common.process_file() 

# Address error in how the *_qartod_executed variables are set

qartod_pattern = re.compile(r'^.+_qartod_executed$')
for v in ds.variables:
    if qartod_pattern.match(v):
        # the shape of the QARTOD executed variables should compare to the provenance variable
        if ds[v].shape != ds['provenance'].shape:
            ds = ds.drop_vars(v)

# Convert the dimensions from obs to time and get rid of obs and other variables we don't need
# ds = ds.swap_dims({'obs': 'time'}) is already done in the cell above

ds = ds.reset_coords()
keys = ['obs', 'id', 'provenance', 'driver_timestamp', 'ingestion_timestamp',
        'port_timestamp', 'preferred_timestamp']
for key in keys:
    if key in ds.variables:
        ds = ds.drop_vars(key)

# Since the CF decoding of the time is failing, explicitly reset all instances where the units are
# seconds since 1900-01-01 to the correct CF units and convert the values to datetime64[ns] types

time_pattern = re.compile(r'^seconds since 1900-01-01.*$')
ntp_date = np.datetime64('1900-01-01')
for v in list(ds.variables):  # iterate over a static list because variables are reassigned below
    units = ds[v].attrs.get('units')
    if isinstance(units, str) and time_pattern.match(units):  # because some units use non-standard characters...
        ds[v].attrs.pop('_FillValue', None)  # no fill values for time! (no error if the attribute is absent)
        ds[v].attrs['units'] = 'seconds since 1900-01-01T00:00:00.000Z'
        np_time = ntp_date + (ds[v] * 1e9).astype('timedelta64[ns]')
        ds[v] = np_time

# Sort by time

ds = ds.sortby('time')

# Clean up some global attributes we will no longer be using

keys = ['DODS.strlen', 'DODS.dimName', 'DODS_EXTRA.Unlimited_Dimension', '_NCProperties', 'feature_Type']
for key in keys:
    if key in ds.attrs:
        del(ds.attrs[key])

ds.encoding.pop('unlimited_dims', None)  # drop the unlimited dimension encoding, if present

# Resetting cdm_data_type from Point to Station and the featureType from point to timeSeries

ds.attrs['cdm_data_type'] = 'Station'
ds.attrs['featureType'] = 'timeSeries'

# Update some global attributes

ds.attrs['acknowledgement'] = 'National Science Foundation'
ds.attrs['comment'] = 'Data collected from the OOI Dev01 M2M API and reworked for use in locally stored NetCDF files.'
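The time handling in the cell above relies on the fact that OOI timestamps are stored as seconds since 1900-01-01 (the NTP epoch) rather than the Unix epoch. The snippet below is a minimal, standard-library sketch of that same arithmetic, not the notebook's actual conversion; the helper name `ntp_seconds_to_datetime` and the constant names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# The NTP epoch used by the "seconds since 1900-01-01" units
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

# Seconds between the NTP epoch (1900) and the Unix epoch (1970):
# 70 years * 365 days + 17 leap days = 25567 days * 86400 s
NTP_TO_UNIX_OFFSET = 25567 * 86400


def ntp_seconds_to_datetime(seconds):
    """Convert seconds since 1900-01-01 to a timezone-aware datetime (hypothetical helper)."""
    return NTP_EPOCH + timedelta(seconds=seconds)


print(NTP_TO_UNIX_OFFSET)                           # 2208988800
print(ntp_seconds_to_datetime(NTP_TO_UNIX_OFFSET))  # 1970-01-01 00:00:00+00:00
```

A quick check like this can help confirm that a converted `time` value lands in the expected deployment period rather than being offset by 70 years.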

In [58]:
# Make a copy of the data with a unique name

ds_dev = ds.copy()
ds_dev


### Save datasets to interim data folder for further processing

In [94]:
# Production and development datasets are tested separately, so each xarray dataset is saved to its own netCDF file

interim_data = os.path.relpath('../data/interim')           # path to interim data folder from notebook folder
os.makedirs(interim_data, exist_ok=True)                    # create the folder if it does not exist yet

dev_filename = '-'.join(('dev', ds_dev.id)) + '.nc'         # build ds_dev filename from the dataset id attribute
prod_filename = '-'.join(('prod', ds_prod.id)) + '.nc'      # build ds_prod filename from the dataset id attribute

dev_path = os.path.join(interim_data, dev_filename)         # build full relative path with ds_dev filename
prod_path = os.path.join(interim_data, prod_filename)       # repeat for ds_prod

ds_dev.to_netcdf(path=dev_path)                             # write ds_dev to the interim data folder
ds_prod.to_netcdf(path=prod_path)                           # repeat for ds_prod
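If more datasets are added later, the filename logic in the cell above could be wrapped in a small helper so the `dev` and `prod` files always follow the same naming pattern. This is only a sketch: the function name `interim_path` and the placeholder id `EXAMPLE-DATASET-ID` are hypothetical and do not come from OOINet or ooi-data-explorations.

```python
import os


def interim_path(prefix, dataset_id, folder=os.path.join('..', 'data', 'interim')):
    """Build '<folder>/<prefix>-<dataset_id>.nc' (hypothetical helper)."""
    filename = '-'.join((prefix, dataset_id)) + '.nc'
    return os.path.join(folder, filename)


# Placeholder id; in the notebook this would be the dataset's id attribute
example_id = 'EXAMPLE-DATASET-ID'
dev_path = interim_path('dev', example_id)
prod_path = interim_path('prod', example_id)

print(os.path.basename(dev_path))   # dev-EXAMPLE-DATASET-ID.nc
print(os.path.basename(prod_path))  # prod-EXAMPLE-DATASET-ID.nc
```

Keeping the prefix as the first filename component makes it straightforward to tell production and development files apart when listing the interim folder.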