# Exploring OPTAA sensor data
In preparation for an OOIFB summer school in July 2023, I am building baseline knowledge of the spectrophotometer data available from OOI. The week-long summer school will teach OOI data users where OPTAA (spectrophotometer) data comes from, how to access it, and how to prepare it for analysis.

I am requesting and downloading data from one of the Coastal Pioneer NES array nodes that had an OPTAA on a recent deployment. After checking the content and structure of the downloaded dataset, I will plot a couple of different views of the data. I want to find out whether data flagged by annotations has already been removed from the dataset, and how my visualizations compare to those currently available at [OOI Data Explorer](https://dataexplorer.oceanobservatories.org).

In [9]:
# Import modules needed for this notebook

import os
import ooinet
import numpy as np
import pandas as pd
import xarray as xr
import dask
import matplotlib.pyplot as plt
import seaborn as sns

from ooi_data_explorations import common

# eventually when I have source code in this project folder, I will include the following line:
# import optaa_data_explorations as optaa

In [2]:
# Pick reference designator for instrument of interest
refdes = 'CP03ISSM-RID27-01-OPTAAD000'

[site, node, sensor] = refdes.split('-', 2)
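The `2` passed to `str.split` is `maxsplit`: splitting stops after the first two hyphens, so the sensor part keeps its own internal hyphen. A quick check with the reference designator above:

```python
# Split a reference designator into site, node, and sensor.
# maxsplit=2 leaves any remaining hyphens inside the sensor name.
refdes = 'CP03ISSM-RID27-01-OPTAAD000'
site, node, sensor = refdes.split('-', 2)
print(site, node, sensor)  # CP03ISSM RID27 01-OPTAAD000

# Without maxsplit the string splits into four parts,
# which would break the three-way unpacking above
print(refdes.split('-'))  # ['CP03ISSM', 'RID27', '01', 'OPTAAD000']
```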

In [3]:
# Show available data recovery methods for refdes (maybe also show metadata)
methods = common.list_methods(site, node, sensor)
methods

['recovered_host', 'telemetered']

In [4]:
# Show available data streams for refdes and method
streams = [common.list_streams(site, node, sensor, method) for method in methods]
streams

[['optaa_dj_dcl_instrument_recovered', 'optaa_dj_dcl_metadata_recovered'],
 ['optaa_dj_dcl_instrument', 'optaa_dj_dcl_metadata']]
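`streams` is a list of lists in the same order as `methods`, so a stream has to be taken from the inner list at the same position as the chosen method. A small sketch that pairs the two with `zip`, using the outputs above copied in as literals rather than requested from OOINet:

```python
# Outputs copied from the list_methods and list_streams cells above
methods = ['recovered_host', 'telemetered']
streams = [['optaa_dj_dcl_instrument_recovered', 'optaa_dj_dcl_metadata_recovered'],
           ['optaa_dj_dcl_instrument', 'optaa_dj_dcl_metadata']]

# Map each delivery method to the streams available for it
streams_by_method = dict(zip(methods, streams))

# Look up streams by method name instead of by parallel list index
method = 'recovered_host'
stream = streams_by_method[method][0]
print(method, stream)  # recovered_host optaa_dj_dcl_instrument_recovered
```

Looking streams up by method name keeps the method and stream in step without having to track matching positions in two lists.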

In [5]:
# List deployment numbers for refdes; the chosen number is passed to common.get_deployment_dates to get start and end dates for the data request
deployments = common.list_deployments(site, node, sensor)
deployments

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
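Python lists are zero-indexed, so each position in `deployments` is one less than the deployment number stored there. Checking against the list shown above:

```python
# Deployment numbers as returned by common.list_deployments above
deployments = list(range(1, 16))

print(deployments[13])        # 14: index 13 holds deployment 14
print(deployments.index(14))  # 13: position of deployment 14 in the list
print(deployments[-1])        # 15: the last (highest-numbered) deployment
```

Selecting by value (for example `deployment = 14`) instead of by position would keep the choice correct even if the list were not a contiguous run starting at 1.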

In [6]:
# Also show annotations available from OOINet for this sensor
annotations = common.get_annotations(site, node, sensor)
annotations = pd.DataFrame(annotations)
with pd.option_context('display.max_colwidth', None):
    display(annotations["annotation"])

0    Deployment 11: Telemeterd and recovered data potentially unavailable for several instruments on DCL27 from 2020-09-12 through 2020-09-14.
1    The OPTAA burst duration was changed from 2 minutes to 4 minutes as of the start date/time of this annotation (start of Deployment 11).
2
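To see which annotations matter for the deployment being downloaded, the annotation table can be narrowed to records whose time window overlaps the deployment's start and end dates. This sketch assumes the OOINet annotation records carry their start and end times as milliseconds since the Unix epoch in `beginDT` and `endDT`, with a missing `endDT` for open-ended annotations. The records are hypothetical (the first reuses the 2020-09-12 to 2020-09-14 window from annotation 0 above), and `overlapping` is a helper written for illustration, not part of `ooi_data_explorations`.

```python
import pandas as pd

# Hypothetical annotation records; assumes OOINet-style beginDT/endDT fields
# holding milliseconds since the Unix epoch, with endDT None when still open
example_annotations = pd.DataFrame({
    'beginDT': [1599868800000, 1635778800000],   # 2020-09-12 00:00, 2021-11-01 15:00 UTC
    'endDT': [1600041600000, None],              # 2020-09-14 00:00, open-ended
    'annotation': ['example annotation A', 'example annotation B'],
})

# Convert epoch milliseconds to timestamps; open-ended annotations become NaT
example_annotations['begin'] = pd.to_datetime(example_annotations['beginDT'], unit='ms')
example_annotations['end'] = pd.to_datetime(example_annotations['endDT'], unit='ms')


def overlapping(annotations, start, stop):
    """Hypothetical helper: return annotations whose window overlaps [start, stop]."""
    start, stop = pd.Timestamp(start), pd.Timestamp(stop)
    # Treat a missing end time as "still in effect"
    end = annotations['end'].fillna(pd.Timestamp.max)
    return annotations[(annotations['begin'] <= stop) & (end >= start)]


hits = overlapping(example_annotations, '2021-11-01', '2022-04-14')
print(hits['annotation'].tolist())  # ['example annotation B']
```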

In [7]:
# Pick method, stream, deployment from options shown above
method = methods[0]             # recovered_host; check OOINet data availability to choose the most complete time series
stream = streams[0][0]          # instrument data (not metadata); the outer index must match the method's index in methods
deployment = deployments[13]    # deployment 14 (zero-based index 13): a recent deployment with fewer annotations flagging missing or suspect data
[sdt, edt] = common.get_deployment_dates(site, node, sensor, deployment)

# Build path to directory where we'll store data downloaded from OOI (create it if it doesn't exist yet)
data_dir = os.path.abspath('../data/external')
os.makedirs(data_dir, exist_ok=True)
filename = '-'.join([site, node, sensor, method, stream, f"deployment{deployment}"]) + '.nc'
filepath = os.path.join(data_dir, filename)

# Try importing locally saved data first
try:
    ds = xr.open_dataset(filepath)
except FileNotFoundError:
    # Otherwise download data with common.m2m_request and common.m2m_collect then write to disk
    data = common.m2m_request(site, node, sensor, method, stream, sdt, edt)
    ds = common.m2m_collect(data, use_dask=True)
    ds.to_netcdf(filepath)
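Every choice made in the cell above is encoded in the cache filename, so a different method, stream, or deployment gets its own file instead of reopening an old download. With the selections above (where `deployments[13]` is deployment 14), the name comes out as follows; the values are typed in as literals so the snippet runs on its own:

```python
# Values picked in the cell above, written out as literals
site, node, sensor = 'CP03ISSM', 'RID27', '01-OPTAAD000'
method = 'recovered_host'
stream = 'optaa_dj_dcl_instrument_recovered'
deployment = 14

# One hyphen-joined name per selection, so each selection gets its own cache file
filename = '-'.join([site, node, sensor, method, stream, f"deployment{deployment}"]) + '.nc'
print(filename)
# CP03ISSM-RID27-01-OPTAAD000-recovered_host-optaa_dj_dcl_instrument_recovered-deployment14.nc
```

One consequence of this scheme: the request dates are not part of the name, so the cached file is reused as long as the site, node, sensor, method, stream, and deployment stay the same.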


In [15]:
ds
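The dataset summary alone does not say whether samples inside annotated time windows were removed before delivery. One way to check is to select the data that falls inside an annotation's window and count what remains. This sketch uses a small synthetic stand-in for `ds` (`ds_demo`) and a hypothetical window; with real data, the window would come from an annotation's begin and end times.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds: 6 hourly samples x 3 wavelength indices
time = pd.date_range('2021-11-01 15:00', periods=6, freq='60min')
ds_demo = xr.Dataset(
    {'optical_absorption': (('time', 'wavelength'), np.ones((6, 3)))},
    coords={'time': time, 'wavelength': np.arange(3)},
)

# Hypothetical annotation window (start, end) to test against the data
window = (pd.Timestamp('2021-11-01 16:00'), pd.Timestamp('2021-11-01 18:00'))

# Label-based slicing is inclusive at both ends: 16:00, 17:00, 18:00
inside = ds_demo.sel(time=slice(*window))
print(inside.sizes['time'])  # 3, so the annotated samples are still present

# Drop them explicitly if the annotation marks the data as unusable
in_window = (ds_demo.time >= window[0]) & (ds_demo.time <= window[1])
ds_clean = ds_demo.where(~in_window, drop=True)
print(ds_clean.sizes['time'])  # 3
```

If the count inside a window is nonzero for the real `ds`, the annotated data has not been removed and needs to be handled before plotting.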

In [24]:
absorption = ds.optical_absorption.to_dataframe(dim_order=["time", "wavelength"])
absorption

| time | wavelength | optical_absorption |
|---|---|---|
| 2021-11-01 15:00:20.000000000 | 0 | |
| 2021-11-01 15:00:20.000000000 | 1 | |
| 2021-11-01 15:00:20.000000000 | 2 | |
| 2021-11-01 15:00:20.000000000 | 3 | |
| 2021-11-01 15:00:20.000000000 | 4 | |
| ... | ... | ... |
| 2022-04-13 21:30:18.529000448 | 82 | |
| 2022-04-13 21:30:18.529000448 | 83 | |
| 2022-04-13 21:30:18.529000448 | 84 | |
| 2022-04-13 21:30:18.529000448 | 85 | |
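`to_dataframe` returns long-format data: one row per (time, wavelength) pair, with `optical_absorption` as the only column. `sns.heatmap` instead expects a rectangular matrix whose index and columns become the plot's rows and columns, so given the long frame it would draw a single column with one row per pair rather than a time-by-wavelength image. Unstacking the `wavelength` level produces the wide layout, which also makes it easy to count missing values per sample. A sketch on a small synthetic array (the 86 wavelength indices match the output above; the values are random and the all-NaN first sample is made up):

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds.optical_absorption: 4 samples x 86 wavelength indices,
# with the first sample set entirely to NaN
data = np.random.default_rng(0).random((4, 86))
data[0, :] = np.nan
absorption_da = xr.DataArray(
    data,
    dims=('time', 'wavelength'),
    coords={'time': np.arange(4), 'wavelength': np.arange(86)},
    name='optical_absorption',
)

# Long format: one row per (time, wavelength) pair, as in the cell above
long_df = absorption_da.to_dataframe(dim_order=['time', 'wavelength'])
print(long_df.shape)  # (344, 1)

# Wide format: one row per sample, one column per wavelength index
wide_df = long_df['optical_absorption'].unstack('wavelength')
print(wide_df.shape)  # (4, 86)

# Fraction of missing values in each sample
print(wide_df.isna().mean(axis=1).tolist())  # [1.0, 0.0, 0.0, 0.0]
```

For the real data, `ds.optical_absorption.to_pandas()` should give the same wide layout directly from the 2D DataArray.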


In [25]:
# plot optical absorption coefficient as a heatmap over time and wavelength
# sns.set_theme(style="dark")  # built-in seaborn styles: darkgrid, whitegrid, dark, white, ticks

sns.heatmap(absorption)

MemoryError: 
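A memory-friendlier approach is to shrink the array before drawing it: average the absorption into time bins, then plot the much smaller time-by-wavelength array. This sketch swaps `sns.heatmap` for xarray's built-in `DataArray.plot`, which draws a `pcolormesh` with the actual timestamps on the x-axis. The synthetic data and the hourly bin size are assumptions; with real OPTAA bursts, bins that contain no samples come out as NaN.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for running outside a notebook; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds.optical_absorption: 2 days of 1-minute samples x 86 wavelength indices
time = pd.date_range('2021-11-01', periods=2 * 24 * 60, freq='1min')
rng = np.random.default_rng(0)
absorption_da = xr.DataArray(
    rng.random((time.size, 86)),
    dims=('time', 'wavelength'),
    coords={'time': time, 'wavelength': np.arange(86)},
    name='optical_absorption',
)

# Average into hourly bins so the plotted array stays small (48 time bins x 86 wavelengths here)
hourly = absorption_da.resample(time='1h').mean()
print(dict(hourly.sizes))

# Color encodes absorption; time on x, wavelength index on y
fig, ax = plt.subplots(figsize=(10, 4))
hourly.plot(x='time', y='wavelength', ax=ax)
ax.set_ylabel('wavelength index')
```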