# Internal and External Cross-Validation of OOI Coastal Endurance Sea Surface Temperatures from Surface Moorings
**Evaluation Date**: 07/03/2018

**Evaluators**: Melanie Abecassis, Geoffrey Dilly, Iain R. Caldwell

### Review Summary
This report summarizes a cross-validation/data quality review of sea surface temperature (SST) data collected by four surface moorings from the Ocean Observatories Initiative's (OOI) Coastal Endurance Array: the Oregon Shelf Surface Mooring (CE02SHSM), the Oregon Offshore Surface Mooring (CE04OSSM), the Washington Shelf Surface Mooring (CE07SHSM), and the Washington Offshore Surface Mooring (CE09OSSM). <br><br>

![map](http://oceanobservatories.org/wp-content/uploads/2018/03/CEV-OOI-Endurance-Array.jpg =600x)
<br>
We focus on SST data collected by Bulk Meteorology Instrument Packages (METBKA) deployed on each of these four surface moorings between May 2017 and April 2018, which includes data from three deployments for each mooring. <br><br>
In this report, we specifically address the following:  
* Exploring the OOI METBKA SST data and metadata
  * Is the metadata complete? (3a. and 3b.)
  * Has any of the existing data been flagged or annotated? (3c.)
  * Does the plotted data make sense? (4b.)
 <br>
* Internal cross-validation of the SST data (i.e. comparing across OOI deployments/ instruments)
  * Are there any disparities with overlapping deployment data? (4b.)
  * How does the SST data from the METBKA packages compare to CTD casts from cruises at the same moorings? (6a.)
  <br>
* External cross-validation of the SST data (i.e. comparing with satellite-derived data and nearby, non-OOI buoys)
  * How does the SST data from the surface moorings compare to satellite-derived SST for the same locations (i.e. NASA JPL MUR product)? (6a. and 6b.)
  * How does the SST data from the surface moorings compare to SST collected by other buoys nearby (i.e. NDBC/SCRIPPS buoys)? (6a. and 6b.)

## 1. OOI instruments evaluated in this report
In this report, we evaluate and cross-validate sea surface temperatures measured by bulk meteorology instrument packages (METBKA) deployed on four of the surface moorings in the Coastal Endurance Array between May 1, 2017 and April 30, 2018. The instruments are listed below with associated information:

Site/ Mooring | Bottom Depth | Instrument Designator | Method | Stream 
 -- | -- | -- | -- | --
Oregon Shelf Surface Mooring | 80 m | [CE02SHSM-SBD11-06-METBKA000](https://ooinet.oceanobservatories.org/data_access/?search=CE02SHSM-SBD11-06-METBKA000) | telemetered | metbk_a_dcl_instrument
Oregon Offshore Surface Mooring | 588 m  | [CE04OSSM-SBD11-06-METBKA000](https://ooinet.oceanobservatories.org/data_access/?search=CE04OSSM-SBD11-06-METBKA000) | telemetered | metbk_a_dcl_instrument
Washington Shelf Surface Mooring | 87 m | [CE07SHSM-SBD11-06-METBKA000](https://ooinet.oceanobservatories.org/data_access/?search=CE07SHSM-SBD11-06-METBKA000) | telemetered | metbk_a_dcl_instrument
Washington Offshore Surface Mooring | 542 m | [CE09OSSM-SBD11-06-METBKA000](https://ooinet.oceanobservatories.org/data_access/?search=CE09OSSM-SBD11-06-METBKA000) | telemetered | metbk_a_dcl_instrument

<br>
We cross-validate the SST data collected by the four above instruments both internally (using CTD cast data collected during OOI cruises) and externally (comparing with satellite-derived data and other nearby buoys). 
<br> 

We focused specifically on the telemetered METBKA SST data because it covers the most complete time period. For some instruments, telemetered data is decimated (down-sampled in time because the acquisition frequency is too high to transmit the complete record), so the recovered data is more complete than the telemetered data. Transmission problems can also create gaps in the telemetered data while the instrument continues to record, in which case the recovered data is again more complete. However, the bulk meteorology package data is not decimated, and there have not been any telemetry issues (Chris Wingard, pers. comm.), so the available telemetered data should be equivalent to the recovered data. The telemetered data also covers a longer period, since deployment 7 has not been recovered yet.
<br>
<font color='red'>NOTE TO DATA TEAM : There is no information (or we could not find information) on which data gets decimated. Since there is no recovered data on the OOI ERDDAP, this is important information to put front and center if people will be using telemetered data. <br>
In time, for instruments with decimated data, we recommend that the recovered data SHOULD be made available on ERDDAP.</font>
<br>
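To make the telemetered-versus-recovered distinction concrete, the sketch below shows what temporal decimation does to a record. The one-minute sampling interval, the synthetic temperatures, and the keep-every-15th-sample rule are illustrative assumptions and do not describe any actual OOI telemetry scheme.

```python
import numpy as np
import pandas as pd

# Synthetic "recovered" record: one sample per minute for one day (assumed rate).
times = pd.date_range("2017-05-01", periods=24 * 60, freq=pd.Timedelta(minutes=1))
recovered = pd.Series(10.0 + 0.5 * np.sin(np.arange(times.size) / 200.0), index=times)

# Hypothetical decimation rule: keep every 15th sample for transmission.
telemetered = recovered.iloc[::15]

# Fraction of the recovered samples that survive decimation.
coverage = telemetered.size / recovered.size
print(f"{telemetered.size} of {recovered.size} samples kept ({coverage:.1%})")
```

For a non-decimated instrument such as METBKA, the two series would have the same length, which is the property this report relies on.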

  ## 2. Time period of interest
We will focus on one full year of data from the start of the first deployment available on the OOI ERDDAP, i.e. May 2017 - April 2018. The dates in the following table for each deployment of the four OOI Coastal Endurance Surface Moorings come from the ERDDAP data access forms:

Site | Site code | Deployment | Latitude| Longitude | Start Date | End Date 
 -- | -- | -- | -- | -- | -- | --
Oregon Shelf Surface Mooring | CE02SHSM | 5 | 44.636 | -124.304 | 04/20/2017 | 10/14/2017
Oregon Shelf Surface Mooring | CE02SHSM | 6 | 44.639 | -124.304 | 10/11/2017 | 04/04/2018
Oregon Shelf Surface Mooring | CE02SHSM | 7 | 44.636 | -124.304 | 04/03/2018 | 06/14/2018
-- | -- | -- | -- | --
Oregon Offshore Surface Mooring | CE04OSSM | 4 | 44.366 | -124.941 | 04/21/2017 | 10/13/2017
Oregon Offshore Surface Mooring | CE04OSSM | 5 | 44.382 | -124.950 | 10/11/2017 | 11/14/2017
Oregon Offshore Surface Mooring | CE04OSSM | 6 | 44.366 | -124.940 | 04/03/2018 | 06/14/2018
-- | -- | -- | -- | --
Washington Shelf Surface Mooring | CE07SHSM | 5 | 46.988 | -124.568 | 04/11/2017 | 09/20/2017
Washington Shelf Surface Mooring | CE07SHSM | 6 | 46.985 | -124.564 | 10/03/2017 | 12/01/2017
Washington Shelf Surface Mooring | CE07SHSM | 7 | 46.988 | -124.568 | 03/25/2018 | 06/14/2018
 -- | -- | -- | -- | --
Washington Offshore Surface Mooring | CE09OSSM | 5 | 46.853 | -124.959 | 04/12/2017 | 10/06/2017
Washington Offshore Surface Mooring | CE09OSSM | 6 | 46.848 | -124.984 | 10/04/2017 | 03/30/2018
Washington Offshore Surface Mooring | CE09OSSM | 7 | 46.853 | -124.958 | 03/26/2018 | 04/29/2018


<br><br>




## 3. Related metadata
In this section, we review some of the metadata available for the focal SST datasets to make sure it is present and correct.

Before we get started, we need to set up our Python environment with some libraries, variables and functions we will need later in this report.

In [None]:
#Install and import packages we will need for the code to run

!pip install netCDF4 #to work with netCDF files
!pip install xarray #to work with arrays
!pip install folium #for interactive mapping
!pip install cmocean
!pip install erddapy

from erddapy import ERDDAP
import requests
import datetime
import matplotlib.pyplot as plt #plotting
import matplotlib.colors as mc #plotting colors
import xarray as xr #arrays
import pandas as pd #working with dataframes
import folium #interactive mapping
import re
import os
import time
import warnings
import pickle as pk
import gc
import numpy as np
import netCDF4 as nc
from IPython.display import display
from IPython.display import Image

In [None]:
# Add the OOI API Information
USERNAME ='OOIAPI-ZMKS84D1KXKYWU' #username from IRC - change to your own
TOKEN= 'TEMP-TOKEN-B9UND6X00XQB2T' #token from IRC - change to your own
DATA_API = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'
VOCAB_API = 'https://ooinet.oceanobservatories.org/api/m2m/12586/vocab/inv'
ASSET_API = 'https://ooinet.oceanobservatories.org/api/m2m/12587'

In [None]:
# Specify some functions to convert timestamps
ntp_epoch = datetime.datetime(1900, 1, 1)
unix_epoch = datetime.datetime(1970, 1, 1)
ntp_delta = (unix_epoch - ntp_epoch).total_seconds()

def ntp_seconds_to_datetime(ntp_seconds):
    return datetime.datetime.utcfromtimestamp(ntp_seconds - ntp_delta).replace(microsecond=0)
  
def convert_time(ms):
  # ms is milliseconds since the Unix epoch; open-ended events come back as None
  if ms is not None:
    return datetime.datetime.utcfromtimestamp(ms/1000)
  else:
    return None
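As a quick sanity check on the two time conventions used here (NTP seconds since 1900 and milliseconds since 1970), the following sketch re-implements the conversions with timezone-aware datetimes. The names `ntp_to_utc` and `ms_to_utc` are hypothetical stand-ins for the helpers above, not part of any OOI library.

```python
import datetime

NTP_EPOCH = datetime.datetime(1900, 1, 1, tzinfo=datetime.timezone.utc)
UNIX_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
NTP_DELTA = (UNIX_EPOCH - NTP_EPOCH).total_seconds()  # seconds between the two epochs

def ntp_to_utc(ntp_seconds):
    """Convert seconds since 1900-01-01 to a timezone-aware UTC datetime."""
    return NTP_EPOCH + datetime.timedelta(seconds=ntp_seconds)

def ms_to_utc(ms):
    """Convert milliseconds since 1970-01-01 to UTC, passing None through."""
    if ms is None:
        return None
    return UNIX_EPOCH + datetime.timedelta(milliseconds=ms)

print(NTP_DELTA)                   # offset between the NTP and Unix epochs
print(ntp_to_utc(NTP_DELTA))       # should land exactly on the Unix epoch
print(ms_to_utc(1493596800000))    # start of the review period, 2017-05-01 UTC
```

Using timezone-aware values avoids the ambiguity of naive datetimes when results are later compared against UTC timestamps from other sources.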

### 3a. Vocabulary metadata
First, we will explore the basic vocabulary information (metadata) from the system to make sure we have the right instrument.

In [None]:
# Setup instrument variables for METBKA packages from four sites 
sites = ['CE02SHSM', 'CE04OSSM', 'CE07SHSM', 'CE09OSSM']
node = 'SBD11'
instrument = '06-METBKA000'
method = 'telemetered'
stream = 'metbk_a_dcl_instrument'

print(sites)

In [None]:
#Use a FOR loop to make requests for each of the four sites
for site in sites:
  # Setup the API request url
  data_request_url ='/'.join((VOCAB_API,site,node,instrument))
  print (data_request_url)
  
  # Grab the information from the server
  r = requests.get(data_request_url, auth=(USERNAME, TOKEN))
  data = r.json()
  print (data)
  

Everything looks as we expected here.

### 3b. Deployment information
Next, we will explore some information about the deployments for these four instruments. We will request metadata for all of the deployments between our dates of interest for each of the instruments and then output the date ranges, latitude/longitude, asset ID, and sensor ID for each. Note that the **reference designator** specified above represents the geographical location of an instrument across all deployments (e.g. the CTD on the Pioneer Upstream Offshore Profiler), whereas the **Sensor ID** (and its Asset ID equivalent) represents the specific instrument used for a given deployment (i.e. a unique make, model, and serial-numbered instrument).

In [None]:
#Use a function to make requests for each of the four sites and their deployments

def print_metadata(site):
  print (site)
  # Setup the API request url
  data_request_url = ASSET_API + '/events/deployment/query'
  params = {
      'beginDT':'2017-05-01T00:00:00.000Z',
      'endDT':'2018-04-30T00:00:00.000Z',
      'refdes':site+'-'+node+'-'+instrument,   
  }
  # Grab the information from the server
  r = requests.get(data_request_url, params=params, auth=(USERNAME, TOKEN))
  data = r.json()
  rows = [] # one dict per deployment (DataFrame.append was removed in pandas 2.0)
  for d in data:
    rows.append({
        'deployment': d['deploymentNumber'],
        'start': convert_time(d['eventStartTime']),
        'stop': convert_time(d['eventStopTime']),
        'latitude': d['location']['latitude'],
        'longitude': d['location']['longitude'],
        'sensor': d['sensor']['uid'],
        'asset_id': d['sensor']['assetId'],
    })
    
  return pd.DataFrame(rows)

In [None]:
print_metadata(sites[0])

There appear to be some tiny discrepancies in the dates compared to ERDDAP.

In [None]:
print_metadata(sites[1])

There are also some discrepancies with the dates in ERDDAP for this surface mooring. Deployment #5 actually ended on 11/14/2017.

In [None]:
print_metadata(sites[2])

Again, there appear to be some discrepancies in the dates for this surface mooring.

In [None]:
print_metadata(sites[3])

The dates are OK for this surface mooring. The location information (latitudes and longitudes) also seems consistent with the other information we found on ERDDAP.<br>

For each buoy, there is no end date for the third deployment (as of 06/21/2018), because the third deployment isn't complete (i.e. instruments are still in the water and haven't been recovered yet).
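The date comparison described above can be made explicit by merging the two sources and computing the differences. The sketch below is one way to do it, assuming the ERDDAP dates for CE04OSSM from the Section 2 table; the asset-management values in `api` are HYPOTHETICAL and are invented only to exercise the comparison.

```python
import pandas as pd

# Start/end dates for CE04OSSM as listed in the Section 2 table (from ERDDAP).
erddap = pd.DataFrame({
    "deployment": [4, 5, 6],
    "start": pd.to_datetime(["2017-04-21", "2017-10-11", "2018-04-03"]),
    "stop": pd.to_datetime(["2017-10-13", "2017-11-14", "2018-06-14"]),
})

# HYPOTHETICAL asset-management dates (not real API output); None = still deployed.
api = pd.DataFrame({
    "deployment": [4, 5, 6],
    "start": pd.to_datetime(["2017-04-21", "2017-10-12", "2018-04-03"]),
    "stop": pd.to_datetime(["2017-10-13", "2017-11-14", None]),
})

# Align the two sources on deployment number and express differences in days.
merged = erddap.merge(api, on="deployment", suffixes=("_erddap", "_api"))
for col in ("start", "stop"):
    delta = merged[f"{col}_api"] - merged[f"{col}_erddap"]
    merged[f"{col}_diff_days"] = delta.dt.total_seconds() / 86400.0

print(merged[["deployment", "start_diff_days", "stop_diff_days"]])
```

A missing end date shows up as `NaN` in the difference column, which separates "deployment still in the water" from a genuine mismatch.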


### 3c. Annotations

Finally, let's pull any relevant annotations for the METBKA instruments on our four surface moorings of interest.

In [None]:
def get_annotations(site):
  print(site)
  ANNO_API = 'https://ooinet.oceanobservatories.org/api/m2m/12580/anno/find'
  # Search window in milliseconds since the Unix epoch (UTC); it covers calendar year 2017.
  # strftime('%s') is avoided because it is platform-dependent and uses local time.
  utc = datetime.timezone.utc
  params = {
    'beginDT':int(datetime.datetime(2017,1,1,tzinfo=utc).timestamp())*1000,
    'endDT':int(datetime.datetime(2018,1,1,tzinfo=utc).timestamp())*1000,
    'refdes':site+'-'+node+'-'+instrument,
  }

  r = requests.get(ANNO_API, params=params, auth=(USERNAME, TOKEN))
  data = r.json()

  rows = [] # one dict per annotation (DataFrame.append was removed in pandas 2.0)
  for d in data:
    rows.append({
      'annotation': d['annotation'],
      'start': convert_time(d['beginDT']),
      'stop': convert_time(d['endDT']),
      'site': d['subsite'],
      'node': d['node'],
      'sensor': d['sensor'],
      'id': d['id']
    })
  pd.set_option('display.max_colwidth', None) # Show the full annotation text
  return pd.DataFrame(rows)

In [None]:
get_annotations(sites[0])

No annotations for CE02SHSM, and no data gap : <font color='green'>OK</font>

In [None]:
get_annotations(sites[1])

No existing annotations for CE04OSSM but there is a big data gap from 11/15/2017-04/03/2018 : <font color='red'>Need to add an annotation for the data gap.</font>

In [None]:
get_annotations(sites[2])

There are two data gaps for CE07SHSM:<br>
- from 09/21/2017-10/02/2017 : <font color='red'>This needs an annotation.</font><br>
- from 12/02/2017-03/25/2018 : <font color='red'>The existing annotation needs correct start and end dates.</font>



In [None]:
get_annotations(sites[3])

No existing annotations, and no apparent data gap for CE09OSSM : <font color='green'>OK</font>
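The data gaps listed above were identified from the data access forms and plots. As a complement, the sketch below shows one way to locate gaps programmatically from a time index. The function name `find_gaps`, the one-day threshold, and the synthetic hourly record (loosely modelled on the first CE07SHSM gap) are assumptions for illustration.

```python
import pandas as pd

def find_gaps(index, min_gap="1D"):
    """Return (last_sample_before, first_sample_after) pairs for gaps longer than min_gap."""
    idx = pd.DatetimeIndex(index).sort_values()
    steps = idx[1:] - idx[:-1]                 # time between consecutive samples
    is_gap = steps > pd.Timedelta(min_gap)
    return list(zip(idx[:-1][is_gap], idx[1:][is_gap]))

# Synthetic hourly record with one hole between 2017-09-20 and 2017-10-03.
hour = pd.Timedelta(hours=1)
part1 = pd.date_range("2017-09-18", "2017-09-20 23:00", freq=hour)
part2 = pd.date_range("2017-10-03", "2017-10-05 23:00", freq=hour)
gaps = find_gaps(part1.append(part2))

for start, end in gaps:
    print(f"gap from {start} to {end}")
```

The resulting pairs could be compared directly against annotation start and end times to flag gaps that lack an annotation.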

## 4. OOI data exploration
Now let's take a look at the full extent of the data for each surface mooring.

### 4a. ERDDAP
We first attempted to access the data through ERDDAP, but without success: the ERDDAP site did not seem to be working during the workshop.

In [None]:
server = 'https://erddap-uncabled.oceanobservatories.org/uncabled/erddap'

dataset_id = 'CE04OSSM-SBD11-06-METBKA000-metbk_a_dcl_instrument-telemetered-deployment0005-tabledap'
#dataset_id = 'CE02SHSM-SBD11-06-METBKA000-metbk_a_dcl_instrument-telemetered-deployment0005-tabledap'

constraints = {
    'time>=': '2017-05-01T00:00:00Z',
    #'time<=': '2017-05-04T00:00:00Z',
    'time<=': '2018-04-30T00:00:00Z',

}

variables = [
 'latitude',
 'longitude',
 'sea_surface_temperature',
 'sea_surface_temperature_qc_executed',
 'sea_surface_temperature_qc_results',
 'time',
]



In [None]:
e = ERDDAP(
    server=server,
    protocol='tabledap',
    response='nc'
)

e.dataset_id=dataset_id
e.constraints=constraints
e.variables=variables


print(e.get_download_url())

In [None]:
df = e.to_pandas(
    index_col='time',
    parse_dates=True,
    skiprows=(1,)  # units information can be dropped.
).dropna()

df.head()

ERDDAP is not working: every request, however small, times out.
We therefore switch to the OOI API.

### 4b. OOI API
Although the OOI API is more complex to use than ERDDAP, it has proven to be a more robust way of accessing the data.

One added benefit of accessing the data through the THREDDS server (as we have done below) is that the data is hosted on the server after each request. This means that the request only needs to be fulfilled once, and the results can then be accessed easily by referring to the THREDDS URL. In contrast, ERDDAP servers do not host the results of a request, so requests have to be made each time the code is run, which can take a long time depending on the size of the request.

In [None]:
sites = ['CE02SHSM', 'CE04OSSM', 'CE07SHSM', 'CE09OSSM']

In [None]:
# this function creates the folders on the THREDDS server for each buoy. Takes a while, so only run it once, 
#and then use the THREDDS URL (copy it manually) in subsequent functions to save time...

def get_request(site):
  
  # first we build the request:
  node = 'SBD11'
  instrument = '06-METBKA000'
  method = 'telemetered'
  stream = 'metbk_a_dcl_instrument'

  # no trailing slash here, so that '/'.join does not produce a double slash
  SENSOR_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'

  # Create the request URL
  data_request_url ='/'.join((SENSOR_BASE_URL,site,node,instrument,method,stream))

  # All of the following are optional
  params = {
    'beginDT':'2017-05-01T00:00:00.000Z',
    'endDT':'2018-04-30T00:00:00.000Z',
    'format':'application/netcdf',
    'include_provenance':'true',
    'include_annotations':'true'
  }

  r = requests.get(data_request_url, params=params, auth=(USERNAME, TOKEN))
  data = r.json()
  
  #then we wait (up to about 1000 s) until data is ready for download
  check_complete = data['allURLs'][1] + '/status.txt'
  tds_url = 'https://opendap.oceanobservatories.org/thredds/dodsC'
  datasets = [] # returned empty if the request never completes
  for attempt in range(1000): 
      r = requests.get(check_complete)
      
      #once data is ready, we select only the .nc files:
      if r.status_code == requests.codes.ok:
          catalog = requests.get(data['allURLs'][0]).text
          x = re.findall(r'(ooi/.*?\.nc)', catalog)
          # keep only names ending in <digit>.nc; a list comprehension avoids
          # removing items from a list while iterating over it
          x = [i for i in x if i.endswith('.nc') and i[-4].isdigit()]
          datasets = [tds_url + '/' + i for i in x]
          break
      else:
          time.sleep(1)
  return datasets



          

In [None]:
get_request(sites[3]) #Only need to run this once - as mentioned above

In [None]:
# this function cleans the list of files from THREDDS and only selects the NetCDF files
def get_datasets(url):
  tds_url = 'https://opendap.oceanobservatories.org/thredds/dodsC'
  catalog = requests.get(url).text
  x = re.findall(r'(ooi/.*?\.nc)', catalog)
  # keep only names ending in <digit>.nc; a list comprehension avoids
  # removing items from a list while iterating over it
  x = [i for i in x if i.endswith('.nc') and i[-4].isdigit()]
  # build URLs with '/' explicitly (os.path.join would use '\' on Windows)
  datasets = [tds_url + '/' + i for i in x]
  return datasets
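Because the THREDDS catalog is plain HTML, the file-selection logic can be tested offline. The sketch below applies a similar filter to a small catalog snippet; the function name `select_netcdf`, the stricter regular expression, and every path in `sample` are HYPOTHETICAL and only illustrate the idea.

```python
import re

TDS_URL = "https://opendap.oceanobservatories.org/thredds/dodsC"

def select_netcdf(catalog_html, tds_url=TDS_URL):
    """Return OPeNDAP URLs for .nc files whose name ends in a digit before '.nc'."""
    # The lookahead rejects longer extensions such as '.ncml'.
    paths = re.findall(r"(ooi/[^\s'\"<>]*?\.nc)(?![A-Za-z])", catalog_html)
    unique = sorted(set(paths))                       # drop repeated links
    keep = [p for p in unique if p[-4].isdigit()]     # data files end in a timestamp
    return [tds_url + "/" + p for p in keep]

# HYPOTHETICAL catalog snippet; the user, request, and file names are invented.
sample = """
<a href='catalog.html?dataset=ooi/user/req/deployment0005_METBKA_20170501-20171014.nc'>a</a>
<a href='catalog.html?dataset=ooi/user/req/deployment0005_METBKA_20170501-20171014.ncml'>b</a>
<a href='catalog.html?dataset=ooi/user/req/notes_provenance.nc'>c</a>
<a href='catalog.html?dataset=ooi/user/req/status.txt'>d</a>
"""
urls = select_netcdf(sample)
for url in urls:
    print(url)
```

Only the first link survives: the `.ncml` and `status.txt` entries fail the pattern, and the provenance file is dropped because the character before `.nc` is not a digit.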

In [None]:
# and this one removes ancillary data and combines all data into one file per site
def pool_data(datasets,site):
  #then we exclude the data that is irrelevant: 
  #in this case, VELPT data that is used to generate other L2 data from METBK, but not SST
  datasets_sel = []
  for i in datasets:
      if ('VELPT' in i):
          pass
      else:
          datasets_sel.append(i)

  # we make an output directory where we store one file per .nc file, containing only SST rather than every variable
  # NOTE: there is a way to select only a few variables using the API which would make this next step unnecessary
  new_dir = 'SST_mean_data_' + site + '/'
  os.makedirs(new_dir, exist_ok=True) # no error if the directory already exists

  # we select only time and SST from .nc files and save them as pandas df
  # NOTE: it might be helpful to also grab the SST quality data if time allows.
  for i in datasets_sel:
    
    ds = xr.open_dataset(i)
    ds = ds.swap_dims({'obs': 'time'})

    sst = pd.DataFrame()
    ds['sea_surface_temperature'].attrs.pop('units', None) # no error if 'units' is absent
    sst['sea_surface_temperature'] = ds['sea_surface_temperature'].to_pandas()

    #sst = sst.dropna()

    out = 'SST_mean_data_' + site + '/' + i.split('/')[-1][:-3] + '.pd'

    with open(out, 'wb') as fh:
        pk.dump(sst,fh)

    gc.collect()


  # we combine everything into a single file per site (ie. for each buoy, 3 deployments into one file).
  frames = []
  for path, subdirs, files in os.walk('SST_mean_data_' + site + '/'):
      for name in files:
          file_name = os.path.join(path, name) 
          with open(file_name, 'rb') as f:
              frames.append(pk.load(f))
  # DataFrame.append was removed in pandas 2.0; concatenate once and order by time
  sst_tot = pd.concat(frames).sort_index() if frames else pd.DataFrame()

  with open('sst_tot_' + site + '.pd', 'wb') as f:
      pk.dump(sst_tot,f)


In [None]:
#finally (... !!!), some plotting.
def plot_buoy(site):
    with open('sst_tot_' + site + '.pd', 'rb') as f:
        sst_data = pk.load(f)

    time_stamp = list(sst_data.index.values)
    sst1 = list(sst_data['sea_surface_temperature'].values)

    plt.figure(figsize=(12, 7))
    plt.plot(time_stamp,sst1,linestyle='None',marker=".")
    plt.title(site)
    plt.ylabel('SST (ºC)')  
    return sst_data

#### CE02SHSM

In [None]:
datasetsCE02=get_datasets('https://opendap.oceanobservatories.org/thredds/catalog/ooi/robertson.caldwell@gmail.com/20180621T150838-CE02SHSM-SBD11-06-METBKA000-telemetered-metbk_a_dcl_instrument/catalog.html')
pool_data(datasetsCE02,sites[0])
sst_CE02=plot_buoy(sites[0])

In [None]:
get_datasets('https://opendap.oceanobservatories.org/thredds/catalog/ooi/robertson.caldwell@gmail.com/20180621T150838-CE02SHSM-SBD11-06-METBKA000-telemetered-metbk_a_dcl_instrument/catalog.html')

Although there is no data gap, it looks like the SST sensor was not working during deployment #4 and there is no annotation to explain what might have happened.<br>
From Chris Wingard: the instrument was not connected correctly. This should be flagged.<br>
<font color='red'> -5ºC values are the FillValues for data issues for SST. They will be replaced by NaNs below.</font>

In [None]:
plt.hist(sst_CE02['sea_surface_temperature'],bins=100);
plt.title(sites[0])
plt.xlabel('SST (ºC)')  

In [None]:
sst_CE02_clean=sst_CE02.replace(-5, np.nan)
df0 = sst_CE02_clean.resample('1D').mean()
df0_sd = sst_CE02_clean.resample('1D').std()

time_stamp = np.array(df0.index.values)

sst = np.array(df0['sea_surface_temperature'].values)
sst_sd = np.array(df0_sd['sea_surface_temperature'].values)
  
plt.figure(figsize=(12, 7))

plt.plot(df0.index,df0['sea_surface_temperature'].values,linestyle='None',marker=".")

xlim_min = datetime.datetime(2017, 5, 1, 0, 0)
xlim_max = datetime.datetime(2018, 4, 29, 0, 0)
plt.xlim(xlim_min,xlim_max)

plt.errorbar(time_stamp, sst, yerr=sst_sd)
plt.title(sites[0])
plt.ylabel('SST (ºC)');  





Green bars are standard deviations over each day.
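The cleaning and daily-averaging steps used for each mooring can be checked on a small synthetic series. The sketch below assumes the -5 fill value noted above and a made-up 6-hourly record; it also counts the valid samples behind each daily mean, which the cells in this report do not do.

```python
import numpy as np
import pandas as pd

FILL_VALUE = -5.0  # fill value reported for bad SST samples in this report

# Synthetic 6-hourly SST record over two days, with one fill value injected.
times = pd.date_range("2017-05-01", periods=8, freq=pd.Timedelta(hours=6))
sst = pd.Series([10.0, 10.4, FILL_VALUE, 10.2, 11.0, 11.2, 11.4, 11.6], index=times)

clean = sst.mask(np.isclose(sst, FILL_VALUE))  # fill values become NaN
daily_mean = clean.resample("1D").mean()        # NaN samples are skipped
daily_std = clean.resample("1D").std()          # sample standard deviation (ddof=1)
daily_n = clean.resample("1D").count()          # valid samples behind each mean

print(pd.DataFrame({"mean": daily_mean, "std": daily_std, "n": daily_n}))
```

Keeping the sample count alongside the mean makes it easy to spot days where the daily value rests on only a few observations.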

In [None]:
plt.hist(sst[~np.isnan(sst)],bins=20);
plt.title(sites[0])
plt.xlabel('SST (ºC)');  

#### CE04OSSM

In [None]:
datasetsCE04=get_datasets('https://opendap.oceanobservatories.org/thredds/catalog/ooi/robertson.caldwell@gmail.com/20180621T151901-CE04OSSM-SBD11-06-METBKA000-telemetered-metbk_a_dcl_instrument/catalog.html')
pool_data(datasetsCE04,sites[1])
sst_CE04=plot_buoy(sites[1])

As expected from the ERDDAP data access forms, there is a big data gap at the end of 2017 and beginning of 2018, for which there is no annotation.<br>
From Chris Wingard: A wave event ripped off the WindSonic (wind sensor), which caused a short in the whole system.

In [None]:
plt.hist(sst_CE04['sea_surface_temperature'],bins=40);
plt.title(sites[1])
plt.xlabel('SST (ºC)'); 

In [None]:
sst_CE04_clean=sst_CE04.replace(-5, np.nan)
df1 = sst_CE04_clean.resample('1D').mean()
df1_sd = sst_CE04_clean.resample('1D').std()

time_stamp = np.array(df1.index.values)

sst = np.array(df1['sea_surface_temperature'].values)
sst_sd = np.array(df1_sd['sea_surface_temperature'].values)
  
plt.figure(figsize=(12, 7))

plt.plot(df1.index,df1['sea_surface_temperature'].values,linestyle='None',marker=".")

xlim_min = datetime.datetime(2017, 5, 1, 0, 0)
xlim_max = datetime.datetime(2018, 4, 29, 0, 0)
plt.xlim(xlim_min,xlim_max)

plt.errorbar(time_stamp, sst, yerr=sst_sd)
plt.title(sites[1])
plt.ylabel('SST (ºC)');  




In [None]:
plt.hist(sst[~np.isnan(sst)],bins=20);
plt.title(sites[1])
plt.xlabel('SST (ºC)');  

#### CE07SHSM

In [None]:
datasetsCE07=get_datasets('https://opendap.oceanobservatories.org/thredds/catalog/ooi/robertson.caldwell@gmail.com/20180621T153325-CE07SHSM-SBD11-06-METBKA000-telemetered-metbk_a_dcl_instrument/catalog.html')
pool_data(datasetsCE07,sites[2])
sst_CE07=plot_buoy(sites[2])

As expected, there are large data gaps (particularly at the end of 2017 and start of 2018), and one outlier (-5 value). <br>
From Chris Wingard: The large data gap is due to a power system failure. <font color='red'>However, there is raw data for Oct-Dec. 2017. It appears that it just has not been ingested yet for some reason.</font>


In [None]:
plt.hist(sst_CE07['sea_surface_temperature'],bins=40);
plt.title(sites[2])
plt.xlabel('SST (ºC)'); 

In [None]:
sst_CE07_clean=sst_CE07.replace(-5, np.nan)
df2 = sst_CE07_clean.resample('1D').mean()
df2_sd = sst_CE07_clean.resample('1D').std()

time_stamp = np.array(df2.index.values)

sst = np.array(df2['sea_surface_temperature'].values)
sst_sd = np.array(df2_sd['sea_surface_temperature'].values)
  
plt.figure(figsize=(12, 7))

plt.plot(df2.index,df2['sea_surface_temperature'].values,linestyle='None',marker=".")

xlim_min = datetime.datetime(2017, 5, 1, 0, 0)
xlim_max = datetime.datetime(2018, 4, 29, 0, 0)
plt.xlim(xlim_min,xlim_max)

plt.errorbar(time_stamp, sst, yerr=sst_sd)
plt.title(sites[2])
plt.ylabel('SST (ºC)');  

In [None]:
plt.hist(sst[~np.isnan(sst)],bins=20);
plt.title(sites[2])
plt.xlabel('SST (ºC)');  

#### CE09OSSM

In [None]:
datasetsCE09=get_datasets('https://opendap.oceanobservatories.org/thredds/catalog/ooi/robertson.caldwell@gmail.com/20180621T162604-CE09OSSM-SBD11-06-METBKA000-telemetered-metbk_a_dcl_instrument/catalog.html')
pool_data(datasetsCE09,sites[3])
sst_CE09=plot_buoy(sites[3])

This one looks good, with no data gaps.

In [None]:
plt.hist(sst_CE09['sea_surface_temperature'],bins=40);
plt.title(sites[3])
plt.xlabel('SST (ºC)'); 

In [None]:
sst_CE09_clean=sst_CE09.replace(-5, np.nan)
df3 = sst_CE09_clean.resample('1D').mean()
df3_sd = sst_CE09_clean.resample('1D').std()

time_stamp = np.array(df3.index.values)

sst = np.array(df3['sea_surface_temperature'].values)
sst_sd = np.array(df3_sd['sea_surface_temperature'].values)
  
plt.figure(figsize=(12, 7))

plt.plot(df3.index,df3['sea_surface_temperature'].values,linestyle='None',marker=".")

xlim_min = datetime.datetime(2017, 5, 1, 0, 0)
xlim_max = datetime.datetime(2018, 4, 29, 0, 0)
plt.xlim(xlim_min,xlim_max)

plt.errorbar(time_stamp, sst, yerr=sst_sd)
plt.title(sites[3])
plt.ylabel('SST (ºC)');  

In [None]:
plt.hist(sst[~np.isnan(sst)],bins=20);
plt.title(sites[3])
plt.xlabel('SST (ºC)');  

#### Combining into a single dataframe
To make it easier to compare the OOI surface mooring data with other data sources, we combine the data from all four moorings into a single dataset.

In [None]:
df_tot=pd.concat([df0,df0_sd,df1,df1_sd,df2,df2_sd,df3,df3_sd],axis=1)

In [None]:
df_tot.columns=['CE02_sst','CE02_sst_sd','CE04_sst','CE04_sst_sd','CE07_sst','CE07_sst_sd','CE09_sst','CE09_sst_sd']

In [None]:
df_tot.head()

<font color='red'>NOTE: We pulled data from all 3 deployments into one time-series for each buoy. However, for all of the buoys, the spring and fall deployments are at slightly different locations. It would be better to plot each deployment in separate colors.</font>
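One way to follow up on this note is to tag every sample with its deployment number before plotting. The sketch below is a minimal version of that idea, assuming the CE02SHSM deployment windows from the Section 2 table; the helper name `label_deployment` is hypothetical and a plain daily date range stands in for the real SST index.

```python
import numpy as np
import pandas as pd

# CE02SHSM deployment windows from the Section 2 table (ERDDAP dates).
deployments = pd.DataFrame({
    "deployment": [5, 6, 7],
    "start": pd.to_datetime(["2017-04-20", "2017-10-11", "2018-04-03"]),
    "stop": pd.to_datetime(["2017-10-14", "2018-04-04", "2018-06-14"]),
})

def label_deployment(index, deployments):
    """Assign each timestamp to the most recently started deployment covering it."""
    labels = pd.Series(np.nan, index=index, dtype="float64")
    for row in deployments.sort_values("start").itertuples():
        inside = (index >= row.start) & (index <= row.stop)
        labels[inside] = row.deployment  # later deployments win during overlaps
    return labels

# Daily timestamps standing in for the index of the pooled SST record.
days = pd.date_range("2017-05-01", "2018-04-29", freq="D")
dep = label_deployment(days, deployments)
print(dep.value_counts().sort_index())
```

With the labels in place, grouping the SST series by `dep` and calling `plt.plot` once per group would draw each deployment in its own color.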

## 5. Extracting comparative data sources

### 5a. Mapping the comparative data

To gain a sense of where each of the data sources was collected in geographic space, we now map the locations of the four surface moorings together with the track from the Endurance 08 cruise and the locations of the two non-OOI buoys that are closest to the OOI Coastal Endurance array. The locations for the non-OOI buoys are as follows: 

Buoy Station # | Site | Organization | Latitude | Longitude |Nearest OOI Surface Mooring
 -- | -- | -- | -- | -- | --
46050 | Stonewall Bank | NDBC | 44.677°N | 124.515°W | Oregon Shelf (CE02SHSM)
46211 | Grays Harbor | SCRIPPS | 46.858°N | 124.244°W | Washington Shelf (CE07SHSM)


<font color='red'>NOTE: We could not find information about where the Endurance 09 cruise travelled (i.e. locations and time stamps) on the Alfresco site or the r2r site. We recommend that such information be uploaded to both of those sites shortly after the cruise so it will be available for cross-comparisons as soon as possible.</font>

First, we create a dataframe with the site names and locations (latitudes and longitudes) of the surface moorings and the other nearby, non-OOI buoys of interest.

In [None]:
sites = ['CE02SHSM', 'CE04OSSM', 'CE07SHSM', 'CE09OSSM', 'NDBC_46050', 'SCRIPPS_46211']
lats = ['44.6393', '44.3811', '46.9859', '46.8508', '44.677', '46.858']
longs = ['-124.304', '-124.956', '-124.566', '-124.972', '-124.515', '-124.244']
orgs = ['OOI', 'OOI', 'OOI', 'OOI', 'NDBC', 'SCRIPPS']

siteLocs = pd.DataFrame(data={'Site':sites, 'Latitude':lats, 'Longitude':longs, 'Organization':orgs})
siteLocs
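To put a number on "nearby", the separation between each non-OOI buoy and its nearest OOI mooring can be estimated with the haversine formula. The sketch below uses the coordinates from the lists above; the function name `haversine_km` and the spherical-Earth radius of 6371 km are assumptions of this example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Coordinates copied from the site-location lists above.
pairs = {
    "NDBC_46050 vs CE02SHSM": ((44.677, -124.515), (44.6393, -124.304)),
    "SCRIPPS_46211 vs CE07SHSM": ((46.858, -124.244), (46.9859, -124.566)),
}
distances = {name: haversine_km(*a, *b) for name, (a, b) in pairs.items()}
for name, km in distances.items():
    print(f"{name}: {km:.1f} km")
```

Both separations come out on the order of tens of kilometres, which is worth keeping in mind when interpreting temperature differences between the buoys and the moorings.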

Next, we will download the Endurance 08 cruise track from the r2r website.

In [None]:
cruiseCE8_URL = 'http://get.rvdata.us/cruise/SKQ201715S/products/r2rnav/SKQ201715S_1min.r2rnav'
cruiseCE8_df = pd.read_table(cruiseCE8_URL, sep=r'\s+', header=None, skiprows=3) #whitespace-delimited; first three rows are a header
cruiseCE8_df.columns = ['Datetime','Longitude','Latitude','Speed','Course']
cruiseCE8_df.head()

Add function to create a colored track from the cruise location dataframe

In [None]:
def plot_track(df, name, popup = 'track', color='orange'): 
    df = df.reset_index().drop_duplicates(['Latitude','Longitude'], keep='first').sort_values('Datetime')

    locations = list(zip(df['Latitude'].values, df['Longitude'].values))
    folium.PolyLine(
        locations=locations,
        popup=popup,
        color=color,
        weight=8,
        opacity=0.7,
        tooltip=name
    ).add_to(map)

Add tiles for the basemap

In [None]:
tiles = ('http://services.arcgisonline.com/arcgis/rest/services/'
         'World_Topo_Map/MapServer/MapServer/tile/{z}/{y}/{x}') #accesses basemap

Create a map showing the locations of the four surface moorings, two nearby, non-OOI buoys, and the track of the Coastal Endurance 8 cruise.

In [None]:
#Create an empty folium interactive map
map = folium.Map(location=(45.5, -124), zoom_start=6.5,
               tiles=tiles, attr='ESRI')

#define the colors
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

#Plot the track of the Endurance 8 cruise
plot_track(cruiseCE8_df, 'Endurance 8', popup = 'SKQ201715S', color=mc.to_hex(colors[0]))

#Add markers for each of the moorings and buoys
for point in range(len(siteLocs)):
  lat = float(siteLocs.iloc[point]['Latitude'])
  lon = float(siteLocs.iloc[point]['Longitude'])
  location = [lat, lon]
  color = 'yellow'
  if siteLocs['Organization'][point] != 'OOI':
    color = 'green'
    
  #folium.Marker(location, popup=siteLocs.iloc[point]['Site'], icon = iconField).add_to(map)
  folium.CircleMarker(location,
                    radius=5,
                    popup=siteLocs.iloc[point]['Site'],
                    color=color,
                    fill_color=color,
                   ).add_to(map)
  
# Save it as html
#map.save('OOI_CEsurfaceVsCruiseVsBuoy.html')

map

These locations match what we expected. The four yellow circles mark the locations of the four OOI surface moorings being assessed. The two green circles mark the non-OOI buoys being used for comparison, and the blue lines mark the track for the Endurance 08 cruise. As noted above, the Endurance 09 locations were not available yet so we could not map their track, even though we use CTD casts from both cruises for our comparisons.

### 5b. Satellite-derived SST data

As one external cross-validation of SST from the four focal Coastal Endurance OOI surface moorings and the two non-OOI buoys, we compare with satellite-derived SST from [NASA's Jet Propulsion Lab Multi-scale Ultra-high Resolution Sea Surface Temperature (JPL MUR SST)](https://mur.jpl.nasa.gov/). The JPL MUR SST is a daily "blended" product, combining information from many satellites to get the best (cloud-free) data available. To extract this satellite data, we use the [NOAA Coastwatch](https://coastwatch.pfeg.noaa.gov/erddap/index.html) ERDDAP RESTful API, which makes it easy to search for and request data, focusing on the locations of our four surface moorings and the two non-OOI buoys of interest.
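An ERDDAP griddap request is just a URL, so it can be assembled and inspected before any data is downloaded. The sketch below builds such a URL for a single point; the dataset ID `jplMURSST41`, the variable name `analysed_sst`, and the 09:00 UTC daily time stamp are assumptions that should be confirmed against the Coastwatch catalog, and `mur_point_url` is a hypothetical helper name.

```python
def mur_point_url(lat, lon, start, end,
                  server="https://coastwatch.pfeg.noaa.gov/erddap",
                  dataset_id="jplMURSST41",    # assumed dataset ID
                  variable="analysed_sst"):    # assumed variable name
    """Build a griddap CSV request for the grid cell nearest (lat, lon)."""
    query = (
        f"{variable}"
        f"[({start}T09:00:00Z):1:({end}T09:00:00Z)]"  # time range, stride 1
        f"[({lat}):1:({lat})]"                        # nearest latitude
        f"[({lon}):1:({lon})]"                        # nearest longitude
    )
    return f"{server}/griddap/{dataset_id}.csv?{query}"

# First CE02SHSM deployment location and dates from the Section 2 table.
url = mur_point_url(44.636, -124.304, "2017-05-01", "2017-10-14")
print(url)
```

The resulting URL could be passed to `pd.read_csv`, although some HTTP clients require the square brackets to be percent-encoded first.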



First, we need to create a dataframe that includes all locations of the surface moorings and other nearby buoys and that takes into account the change in location of the surface moorings between deployments. We created a similar dataframe earlier for the mapping step, but it included only one location for each surface mooring.

In [None]:
#Create a dataframe with the site names and locations (latitude and longitude) 
#  of the surface moorings for which we want to extract the satellite data
ooiSitesA = ['CE02SHSM', 'CE04OSSM', 'CE07SHSM', 'CE09OSSM']
ooiDepA = ['5', '4', '5', '5']
ooiBeginDatesA = ['2017-05-01', '2017-05-01', '2017-05-01', '2017-05-01']
ooiEndDatesA = ['2017-10-14', '2017-10-13', '2017-09-20', '2017-10-06']
ooiLatsA = ['44.636', '44.366', '46.988', '46.853']
ooiLongsA = ['-124.304', '-124.941', '-124.568', '-124.958']
ooiLocsA = pd.DataFrame(data={'Site':ooiSitesA, 'Deployment': ooiDepA,
                              'BeginDate':ooiBeginDatesA, 'EndDate':ooiEndDatesA, 
                              'Latitude':ooiLatsA, 'Longitude':ooiLongsA})
#ooiLocsA

ooiSitesB = ['CE02SHSM', 'CE04OSSM', 'CE07SHSM', 'CE09OSSM']
ooiDepB = ['6', '5', '6', '6']
ooiBeginDatesB = ['2017-10-11', '2017-10-11', '2017-10-03', '2017-10-01']
ooiEndDatesB = ['2018-04-04', '2017-11-14', '2017-12-01', '2018-03-30']
ooiLatsB = ['44.639', '44.382', '46.985', '46.848']
ooiLongsB = ['-124.304', '-124.950', '-124.564', '-124.984']
ooiLocsB = pd.DataFrame(data={'Site':ooiSitesB, 'Deployment': ooiDepB,
                              'BeginDate':ooiBeginDatesB, 'EndDate':ooiEndDatesB, 
                              'Latitude':ooiLatsB, 'Longitude':ooiLongsB})
#ooiLocsB

ooiSitesC = ['CE02SHSM', 'CE04OSSM', 'CE07SHSM', 'CE09OSSM']
ooiDepC = ['7', '6', '7', '7']
ooiBeginDatesC = ['2018-04-03', '2018-04-03', '2018-03-25', '2018-03-26']
ooiEndDatesC = ['2018-05-01', '2018-05-01', '2018-05-01', '2018-05-01']
ooiLatsC = ['44.636', '44.366', '46.988', '46.853']
ooiLongsC = ['-124.304', '-124.940', '-124.568', '-124.958']
ooiLocsC = pd.DataFrame(data={'Site':ooiSitesC, 'Deployment': ooiDepC,
                              'BeginDate':ooiBeginDatesC, 'EndDate':ooiEndDatesC, 
                              'Latitude':ooiLatsC, 'Longitude':ooiLongsC})
#ooiLocsC

ndbcSites = ['NDBC_46050', 'SCRIPPS_46211']
ndbcDep = ['All', 'All']
ndbcBeginDates = ['2017-05-01', '2017-05-01']
ndbcEndDates = ['2018-05-01', '2018-05-01']
ndbcLats = ['44.677', '46.858']
ndbcLongs = ['-124.515', '-124.244']
ndbcLocs = pd.DataFrame(data={'Site':ndbcSites, 'Deployment': ndbcDep,
                              'BeginDate':ndbcBeginDates, 'EndDate':ndbcEndDates, 
                              'Latitude':ndbcLats, 'Longitude':ndbcLongs})

#ndbcLocs

siteLocs2 = pd.concat([ooiLocsA, ooiLocsB, ooiLocsC, ndbcLocs])
siteLocs2

Next, we assemble the URL needed to access the data from ERDDAP.

This is an example of the URL for the first entry:
https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.csv?analysed_sst[(2017-05-01T00:00:00Z):1:(2017-10-14T00:00:00Z)][(44.636):1:(44.636)][(-124.304):1:(-124.304)],analysis_error[(2017-05-01T00:00:00Z):1:(2017-10-14T00:00:00Z)][(44.636):1:(44.636)][(-124.304):1:(-124.304)]


In [None]:
#Assemble the URL's for the satellite data for each location and our 
# time period of interest and print them at the end for verification

baseURL = 'https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.csv?'
includeError = True #Indicates whether to include the estimated errors around the SST as well

allURLs = [] #creates an empty list to fill in with each URL
for i in range(len(siteLocs2)): #iterates through the row numbers of the 'siteLocs2' dataframe
  beginDate = siteLocs2.iloc[i]['BeginDate'] + 'T00:00:00Z'
  endDate = siteLocs2.iloc[i]['EndDate'] + 'T00:00:00Z'
  dateRangeURL = '[(' + beginDate + '):1:(' + endDate + ')]' #assembles the date section of the URL
  beginLat = siteLocs2.iloc[i]['Latitude']
  endLat = beginLat
  latRangeURL = '[(' + beginLat + '):1:(' + endLat + ')]' #assembles the latitude section of the URL
  beginLong = siteLocs2.iloc[i]['Longitude']
  endLong = beginLong
  longRangeURL = '[(' + beginLong + '):1:(' + endLong + ')]' #assembles the longitude section of the URL
  dateLocURL = dateRangeURL + latRangeURL + longRangeURL
  newURL = baseURL + 'analysed_sst' + dateLocURL #assembles the complete URL
  
  if includeError: #includes the SST error if we said we wanted it above
    newURL = newURL + ',analysis_error' + dateLocURL
    
  allURLs.append(newURL) #appends each URL into the empty list

siteLocs2['URL'] = allURLs #adds the list of URLs to the 'siteLocs2' dataframe
siteLocs2 #displays this dataframe for testing
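
The URL assembly above could also be wrapped in a small function. The following is only a sketch: `build_griddap_url` is a hypothetical helper name (not part of ERDDAP or pandas), and the inputs are the values from the first row of `siteLocs2`.

```python
def build_griddap_url(base_url, variables, start_date, end_date, lat, lon):
    """Assemble an ERDDAP griddap URL for a single point (hypothetical helper)."""
    # Each axis constraint has the form [(start):stride:(stop)]
    time_part = '[({0}T00:00:00Z):1:({1}T00:00:00Z)]'.format(start_date, end_date)
    lat_part = '[({0}):1:({0})]'.format(lat)
    lon_part = '[({0}):1:({0})]'.format(lon)
    constraint = time_part + lat_part + lon_part
    # Every requested variable repeats the same time/lat/lon constraint
    return base_url + ','.join(v + constraint for v in variables)


url = build_griddap_url(
    'https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.csv?',
    ['analysed_sst', 'analysis_error'],
    '2017-05-01', '2017-10-14', '44.636', '-124.304')
print(url)
```

Keeping the constraint string in one place makes it harder for the SST and error requests to drift apart.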

Using the URLs we generated, we now download the data from the ERDDAP server (this step can take a while but does need to be run each time, unlike the THREDDS request through the OOI API).

This next step is the rate-limiting step for the code, as it is the actual request for the data from ERDDAP and takes ~8 min to run.

In [None]:
#Get the satellite SST data for each of the locations using the URLs listed above
# then combine them into a single dataframe.
satFrames = [] #list to hold one dataframe per row of 'siteLocs2'
for i in range(len(siteLocs2)): #iterates through the row numbers of the 'siteLocs2' dataframe
  siteRow = siteLocs2.iloc[i]
  newDF = pd.read_csv(siteRow['URL']) #read the URL as a CSV (the first line becomes the header)
  newDF = newDF.iloc[1:].copy() #remove the units row that follows the header
  newDF['Site'] = siteRow['Site'] #Add the site to the new dataframe
  newDF['Latitude'] = siteRow['Latitude'] #Add the latitude to the new dataframe
  newDF['Longitude'] = siteRow['Longitude'] #Add the longitude to the new dataframe
  satFrames.append(newDF)
satSSTs = pd.concat(satFrames) #combine the data for all rows into one dataframe

In [None]:
satSSTs.head() #show the first part of the data (testing step)

In [None]:
satSSTs.tail() #another testing step showing the end of the data

In [None]:
##Creates a new dataframe with just the sites, dates, temperatures, and analysis errors
## for merging with the other datasets

#Make a new column for date that takes out the last portion of the 'time' column
dateCol = []
for i in range(len(satSSTs)):
  dateCol.append(satSSTs.iloc[i]['time'].split('T')[0])

  
satSSTs['Date'] = dateCol
satSSTs['MUR_Temp'] = satSSTs['analysed_sst']
satSSTs['MUR_Error'] = satSSTs['analysis_error']
satSSTs = satSSTs[['Site', 'Date', 'MUR_Temp', 'MUR_Error']]

In [None]:
satSSTs.head()
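
The date extraction above loops over every row. As a possible alternative, the same split can be done for the whole column at once with the pandas string accessor. This is a minimal sketch on made-up timestamps in the same format as the ERDDAP `time` column; `toy` is a hypothetical stand-in for `satSSTs`.

```python
import pandas as pd

# Made-up sample in the same format as the ERDDAP 'time' column
toy = pd.DataFrame({'time': ['2017-05-01T09:00:00Z', '2017-05-02T09:00:00Z']})

# Keep everything before the 'T' for every row at once
toy['Date'] = toy['time'].str.split('T').str[0]
print(toy['Date'].tolist())
```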

###5c.  Cruise CTD data (Endurance 08 & 09)

As another cross-validation approach, we will use SST data from CTD casts collected on two OOI cruises and compare those with the SST values recorded from the Surface Moorings and the MUR SST Data. During our time period of interest (May 1, 2017 to April 30, 2018), there were two OOI cruises that visited the Coastal Endurance array: the Endurance 08 and Endurance 09 cruises. We collected all of the CTD SST data available from both of those cruises for comparison.
<br>
The Cruise CTD casts were found on the Alfresco Explorer by navigating to the following directory in the Cruise Data archive: <br>
**Endurance Cruise 8**<br>
OOI > Coastal Endurance Array > Cruise Data > Endurance-08_SKQ201715S_2017-10 > Ship Data > ctd > proc
from Fall 2017
<br>
and 
<br>
**Endurance Cruise 9**<br>
OOI > Coastal Endurance Array > Cruise Data > Endurance-09_SKQ201808S_2018-03 > Ship Data > ctd > proc
from Spring 2018
<br>
<br>
The files we use are the averaged data for the shallowest reading from each file (e.g. ...ctd001avg.cnv).  

<font color='red'> Note to Data team: The files maintained for the Coastal Endurance Array differ from those for the Pioneer Array in that there are no ASCII files available, so the cnv files need to be used with the headers removed (skipping 314 lines for Endurance 08 or 297 lines for Endurance 09). Additionally, we were interested in further validating the SST data from the Surface Moorings and the satellite data against the cruise flowthrough system. This was a challenge because the temperature flowthrough data, as well as several other datasets housed at NOAA NCEI, were not mirrored on Alfresco. To track down much of that information, we had to go first to R2R and then find a link out to the bulk data maintained at NCEI. This was further hampered by the fact that the Endurance 09 data was not yet available in the R2R or NCEI databases, despite the ship returning to port more than 2 months ago. While the final cruise report can take some time, it would be helpful to have this information shortly after the conclusion of the cruise. </font>

In [None]:
# CTD_URL_EXAMPLE
# URL_EXAMPLE = 'https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/cb97e6a2-234d-46aa-8911-3f96c1958db5/skq201715s_ctd001avg.cnv'

We first build a list that includes, for each cast, the Cruise, CTD Cast #, Nearby Mooring, Date of Cast, and data URL.


In [None]:
# Endurance Cruise 8 and 9 CTD CASTS
cruiseURLs = [
  ['ENDURANCE08','CTD001','CE07SHSM','2017-10-03','https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/cb97e6a2-234d-46aa-8911-3f96c1958db5/skq201715s_ctd001avg.cnv'],
  ['ENDURANCE08','CTD005','CE09OSSM','2017-10-07','https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/0cdcb5cc-a642-4fbd-8186-efe69a66c7d4/skq201715s_ctd005avg.cnv'],
  ['ENDURANCE08','CTD009','CE04OSSM','2017-10-13','https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/43fcb532-6a55-4e24-a6ac-840abe98808c/skq201715s_ctd009avg.cnv'],
  ['ENDURANCE08','CTD010','CE02SHSM','2017-10-14','https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/191336a5-6deb-40e6-bdbc-dccda2c8a38d/skq201715s_ctd010avg.cnv'],
  ['ENDURANCE09','CTD001','CE09OSSM','2018-03-26','https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/8f21596b-8179-432c-a588-bf1e7d5c813b/skq201808s_ctd001avg.cnv'],
  ['ENDURANCE09','CTD003','CE07SHSM','2018-03-29','https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/75736d85-b15d-4c44-a65d-960f0e316562/skq201808s_ctd003avg.cnv'],
  ['ENDURANCE09','CTD006','CE02SHSM','2018-04-03','https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/15f8af4f-c25c-4d54-8369-48f69abf4633/skq201808s_ctd006avg.cnv'],
  ['ENDURANCE09','CTD007','CE04OSSM','2018-04-04','https://alfresco.oceanobservatories.org/alfresco/d/d/workspace/SpacesStore/617e0ecf-d2e8-4713-8523-e4a690b19a70/skq201808s_ctd007avg.cnv']
]


In [None]:
for row in cruiseURLs:
  print(row[4])

In [None]:
# Return the shallowest temperature reading from each CTD CAST
cruiseRows = [] #list to hold the first (shallowest) row from each cast
for row in cruiseURLs:
  #The cnv header is a different length for each cruise
  if row[0] == 'ENDURANCE08':
    nHeader = 314
  else:
    nHeader = 297
  #sep=r'\s+' splits on whitespace (replaces the deprecated delim_whitespace=True)
  cruise_data = pd.read_table(row[4], sep=r'\s+', header=None, skiprows=nHeader)
  cruise_data = cruise_data.rename( columns={ 0:'Depth', 1:'Latitude', 2:'Longitude', 6: 'Temperature', 7: 'Salinity'} )
  cruise_data = cruise_data[['Depth','Latitude','Longitude','Temperature','Salinity']].copy()
  cruise_data['Cruise'] = row[0]
  cruise_data['CTD'] = row[1]
  cruise_data['Mooring'] = row[2]
  cruise_data['Date'] = row[3]
  cruiseRows.append(cruise_data[0:1]) #keep only the first (shallowest) row
  #print(row[0])
#DataFrame.append was removed in pandas 2.0, so combine the rows with concat
cruiseDF = pd.concat(cruiseRows)
cruiseDF
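
The hard-coded header lengths (314 and 297 lines) will break if a file's header changes. Sea-Bird `.cnv` files end their header with a line reading `*END*`, so one option is to count header lines up to that marker. The sketch below is illustrative only: `count_cnv_header_lines` is a hypothetical helper, and the sample lines are made up rather than copied from the cruise files.

```python
def count_cnv_header_lines(lines):
    """Return the number of header lines, up to and including '*END*'."""
    for i, line in enumerate(lines):
        if line.strip() == '*END*':
            return i + 1
    raise ValueError('no *END* marker found')


# Made-up sample in the general shape of a .cnv file
sample = [
    '* Sea-Bird SBE 9 Data File:',
    '# name 0 = depSM: Depth [salt water, m]',
    '# name 1 = latitude: Latitude [deg]',
    '*END*',
    '      2.000   46.98600  -124.56600',
]
n_header = count_cnv_header_lines(sample)
print(n_header)
```

The returned count could then be passed as `skiprows` to `pd.read_table` in place of the fixed values.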

In [None]:
#Bar Graph of surface CTD Data from each Buoy Site between End 08 and End 09
 
# data to plot
n_groups = 4
Fall_2017 = (11.7, 14.2, 14.7, 14.9)
Spring_2018 = (10.2, 10.0, 9.1, 9.5)
 
# create plot
fig, ax = plt.subplots()
index = np.arange(n_groups)
bar_width = 0.35
opacity = 0.8
 
rects1 = plt.bar(index, Fall_2017, bar_width,
                 alpha=opacity,
                 color='b',
                 label='Endurance 08')
 
rects2 = plt.bar(index + bar_width, Spring_2018, bar_width,
                 alpha=opacity,
                 color='y',
                 label='Endurance 09')
 
plt.xlabel('Sites')
plt.ylabel('Temperature (°C)')
plt.title('CTD Surface Temps')
plt.xticks(index + bar_width / 2, ('CE02SHSM', 'CE04OSSM', 'CE07SHSM', 'CE09OSSM')) #center each tick between the paired bars
plt.legend()
 
plt.tight_layout()
plt.show()

###5d. Nearby buoys

For further cross-validation of the OOI surface mooring SST data and the satellite-derived data, we compared with SST data collected by nearby buoy stations maintained by other research groups. The two non-OOI buoys closest to the Coastal Endurance array are those at Station 46050 (maintained by the NDBC) and Station 46211 (maintained by Scripps Institution of Oceanography). The locations for these two non-OOI buoys are listed and mapped above (in section 5a.).

We downloaded the data for both of these buoys through the NOAA Coastwatch ERDDAP site. Example URLs for CSV and viewable html versions of these data files are as follows:

Example CSV URL --
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/cwwcNDBCMet.csv?longitude%2Clatitude%2Ctime%2Cwtmp&station=%2246050%22&time%3E=2017-05-01T00%3A00%3A00Z&time%3C=2018-05-01T00%3A00%3A00Z

Viewable html URL -- 
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/cwwcNDBCMet.htmlTable?station%2Clongitude%2Clatitude%2Ctime%2Cwtmp&station=%2246050%22&time%3E=2017-05-01T00%3A00%3A00Z&time%3C=2018-05-01T00%3A00%3A00Z
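
The `%2C`, `%22`, `%3A`, `%3E` and `%3C` sequences in these URLs are ordinary percent-encodings of `,`, `"`, `:`, `>` and `<`. As a sketch of where they come from, the example CSV URL can be reproduced from a readable query string with the standard library (the variable names here are illustrative):

```python
from urllib.parse import quote

base = 'https://coastwatch.pfeg.noaa.gov/erddap/tabledap/cwwcNDBCMet.csv?'

# Human-readable form of the query: variables, then constraints
query = ('longitude,latitude,time,wtmp'
         '&station="46050"'
         '&time>=2017-05-01T00:00:00Z'
         '&time<=2018-05-01T00:00:00Z')

# Percent-encode everything except the '=' and '&' separators
url = base + quote(query, safe='=&')
print(url)
```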

In [None]:
# Dataframe with the Buoy Station Number and Locations (latitude and longitude) 

Station = ['46050', '46211',]
lats = ['44.677','46.858']
longs = ['-124.515', '-124.244']

StationLocs = pd.DataFrame(data={'Station':Station, 'Latitude':lats, 'Longitude':longs})
StationLocs

In [None]:

#Assemble the URLs for nearby Buoys from NDBC - NOAA ERDDAP

baseURL = 'https://coastwatch.pfeg.noaa.gov/erddap/tabledap/cwwcNDBCMet.csv?longitude%2Clatitude%2Ctime%2Cwtmp&'

beginDate = '2017-05-01T00%3A00%3A00Z'
endDate = '2018-05-01T00%3A00%3A00Z'

allURLs = [] #creates an empty list to fill in with each URL
for i in range(len(StationLocs)): #iterates through the row numbers of the 'StationLocs' dataframe
  stationName = StationLocs['Station'][i]
  stationURL = 'station=%22' + stationName + '%22'
  dateRangeURL = 'time%3E=' + beginDate + '&time%3C=' + endDate 
  #assembles the date section of the URL
  newURL = baseURL + stationURL + '&' + dateRangeURL #assembles the complete URL
      
  allURLs.append(newURL) #appends each URL to the list
  print(newURL)
StationLocs['URL'] = allURLs #adds the list of URLs to the 'StationLocs' dataframe


In [None]:
# Plotting out the Year of Data for the two Additional Buoy Stations
St46050 = pd.read_csv('https://coastwatch.pfeg.noaa.gov/erddap/tabledap/cwwcNDBCMet.csv?longitude%2Clatitude%2Ctime%2Cwtmp&station=%2246050%22&time%3E=2017-05-01T00%3A00%3A00Z&time%3C=2018-05-01T00%3A00%3A00Z', skiprows=1, parse_dates=['UTC'])

plt.plot(St46050['UTC'],St46050['degree_C'], label='Temp over time')
plt.xlabel('Time')
plt.ylabel('Temp C')
plt.title('Station 46050\nOne Year')
plt.legend()
plt.show()

St46211 = pd.read_csv('https://coastwatch.pfeg.noaa.gov/erddap/tabledap/cwwcNDBCMet.csv?longitude%2Clatitude%2Ctime%2Cwtmp&station=%2246211%22&time%3E=2017-05-01T00%3A00%3A00Z&time%3C=2018-05-01T00%3A00%3A00Z', skiprows=1, parse_dates=['UTC'])
plt.plot(St46211['UTC'],St46211['degree_C'], label='Temp over time')
plt.xlabel('Time')
plt.ylabel('Temp C')
plt.title('Station 46211\nOne Year')
plt.legend()
plt.show()

In [None]:
# Use the UTC timestamps as the index so the data can be resampled by day
St46050.index = St46050['UTC']
St46211.index = St46211['UTC']

In [None]:
# Buoy downsampling to daily average

DS46050 = St46050.resample('1D').mean()
DS46050std = St46050.resample('1D').std()
DS46211 = St46211.resample('1D').mean()
DS46211std = St46211.resample('1D').std()

plt.plot(DS46050.index,DS46050['degree_C'], label='Temp over time')
plt.xlabel('Time')
plt.ylabel('Temp C')
plt.title('Station 46050\nMean per day')
plt.legend()
plt.show()

plt.plot(DS46211.index,DS46211['degree_C'], label='Temp over time')
plt.xlabel('Time')
plt.ylabel('Temp C')
plt.title('Station 46211\nMean per day')
plt.legend()
plt.show()

plt.plot(DS46050std.index,DS46050std['degree_C'], label='Temp over time')
plt.xlabel('Time')
plt.ylabel('Temp C')
plt.title('Station 46050\nSTD per day')
plt.legend()
plt.show()

plt.plot(DS46211std.index,DS46211std['degree_C'], label='Temp over time')
plt.xlabel('Time')
plt.ylabel('Temp C')
plt.title('Station 46211\nSTD per day')
plt.legend()
plt.show()

## 6. Cross-validation (internal and external)
To compare the OOI surface mooring SST data with the three other data sources (1. Cruise CTD data, 2. Non-OOI buoys, and 3. Satellite-derived data), we ran two different sets of comparisons: a discrete comparison (focusing on specific days of overlap for the locations of the four OOI surface moorings), and a more continuous time-series comparison (comparing patterns in time). The discrete analysis was necessary to compare with the cruise CTD data, since those were only measured on specific days during the cruises. In the discrete analysis we include SST data comparisons between the surface moorings, cruise CTD data, and satellite-derived data (i.e. the three data sources that have discrete data for the locations of the four moorings). For the time series comparison, however, we include the three data sources with available time series data: surface mooring data, non-OOI buoy data, and satellite-derived data.

###6a. Comparing OOI surface moorings with OOI cruise data and nearby buoys (discrete)

Here, we cross-validate the SST data from the METBKA instruments on the four OOI surface moorings both internally (comparing with SST data measured during the OOI Endurance 08 and Endurance 09 cruises) and externally (comparing with satellite-derived data). The SST data from the two cruises is from CTD casts that were done in locations near each of the surface moorings (as described above in 5c.). The satellite-derived data was from the JPL MUR blended SST product at the same locations (as described above in 5b.). This validation step allows us to compare discrete, high-resolution samples with long-term continuous data collected in close proximity.

####Combining the discrete data sources
First we will need to extract the data only for the days of the CTD casts from each of the dataframes we created, and then combine them into a single dataset for comparison.

Let's revisit the cruise CTD data we extracted earlier (5c.) and drop any of the columns we don't need: 

In [None]:
#Check the cruise CTD data
cruiseDF



We only need the columns for 'Cruise', 'Date', 'Mooring', and 'Temperature' for our comparisons so will drop the others

In [None]:
cruiseDF = cruiseDF.drop(columns=['CTD', 'Depth', 'Latitude', 'Longitude', 'Salinity'])
cruiseDF

We also want to rename the 'Mooring' column to 'Site' (to be consistent with other data sources) and rename 'Temperature' to 'CTD_Temp' since there will be temperatures in the other data sets as well

In [None]:
cruiseDF.rename(columns={'Mooring':'Site', 'Temperature':'CTD_Temp'}, inplace=True)
cruiseDF

Next we will merge the CTD data with the satellite data (JPL MUR).

In [None]:
#Merge the cruise CTD and MUR data on Date and Site
cruiseSatDF = pd.merge(cruiseDF, satSSTs, on=['Date','Site'], how='inner')
cruiseSatDF

This resulted in some repetition of rows (going from 8 to 11 rows), probably due to the overlap in deployments. Before fixing this, we need to make sure the three numeric columns are of the same data type.


In [None]:
print(type(cruiseSatDF['CTD_Temp'][0]))
print(type(cruiseSatDF['MUR_Temp'][0]))
print(type(cruiseSatDF['MUR_Error'][0]))


The first numeric column (CTD_Temp) is a float, but the second and third (MUR_Temp and MUR_Error) must have been imported as strings when they were downloaded from ERDDAP. Any numerical comparison among these columns will fail while the second and third columns are treated as text rather than numbers, so they need to be converted to a numeric type:

In [None]:
#Need to convert the MUR_Temp and MUR_Error columns to numeric (they download from ERDDAP as strings)
cruiseSatDF['MUR_Temp'] = pd.to_numeric(cruiseSatDF['MUR_Temp'])
cruiseSatDF['MUR_Error'] = pd.to_numeric(cruiseSatDF['MUR_Error'])
print(type(cruiseSatDF['CTD_Temp'][0]))
print(type(cruiseSatDF['MUR_Temp'][0]))
print(type(cruiseSatDF['MUR_Error'][0]))
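
A likely reason these columns arrive as text is that ERDDAP `.csv` responses carry a units row directly under the header, so pandas reads each column as strings. The sketch below uses a made-up response (the values are not real MUR data) to compare dropping that row after reading with skipping it during reading; `as_text` and `as_numbers` are illustrative names.

```python
import io
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up response in the ERDDAP .csv layout: header, units row, data
sample_csv = (
    'time,latitude,longitude,analysed_sst,analysis_error\n'
    'UTC,degrees_north,degrees_east,degree_C,degree_C\n'
    '2017-05-01T09:00:00Z,44.64,-124.3,11.2,0.38\n'
    '2017-05-02T09:00:00Z,44.64,-124.3,11.4,0.39\n'
)

# Dropping the units row after reading leaves the columns as text
as_text = pd.read_csv(io.StringIO(sample_csv)).iloc[1:]

# Skipping the units row while reading lets pandas infer numbers
as_numbers = pd.read_csv(io.StringIO(sample_csv), skiprows=[1])

print(is_numeric_dtype(as_text['analysed_sst']))
print(is_numeric_dtype(as_numbers['analysed_sst']))
```

With the second approach the later `pd.to_numeric` calls would not be needed, at the cost of changing the download cell.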


Now that all three of these columns are floats (i.e. numeric), we can fix the problem with the extra rows by averaging:

In [None]:
#Averaging the duplicated rows
bygroup_cruiseSat = cruiseSatDF.groupby(['Cruise', 'Date', 'Site'], as_index = False)
cruiseSatDF = bygroup_cruiseSat.mean() #same result as aggregate(np.mean), without the deprecation warning
cruiseSatDF
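
To illustrate why the merge duplicated some rows and how averaging collapses them, here is a toy example. It assumes the cause is the one suggested above: a cast date that falls inside two overlapping deployments, so the satellite table holds two rows for the same site and date. The MUR values are made up.

```python
import pandas as pd

# One CTD cast at a site on a single date
ctd = pd.DataFrame({'Site': ['CE02SHSM'], 'Date': ['2017-10-14'],
                    'CTD_Temp': [11.7]})

# Two satellite rows for that site and date, one per overlapping
# deployment (made-up values)
sat = pd.DataFrame({'Site': ['CE02SHSM', 'CE02SHSM'],
                    'Date': ['2017-10-14', '2017-10-14'],
                    'MUR_Temp': [11.5, 11.6]})

# The inner merge returns every matching pair, so the cast appears twice
merged = pd.merge(ctd, sat, on=['Date', 'Site'], how='inner')
print(len(merged))

# Averaging within each Site/Date group collapses the duplicates
deduped = merged.groupby(['Site', 'Date'], as_index=False).mean()
print(deduped)
```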


Finally, we need to add the data from the OOI surface moorings.

To do this, we will first merge the "cruiseSatDF" dataframe with the dataframe created with the OOI data (getting only the days that overlap), and then clean it up so that we only get the sites of interest.

We will also need to make sure that the dates are of the same type in both datasets before merging.

In [None]:
#Reset the index for the OOI surface mooring data and save as 'smDF'
smDF = df_tot.reset_index()

#Rename the 'time' column to 'Date'
smDF.rename(columns={'time':'Date'}, inplace=True)

#Rename the final column so it is consistent with the others
smDF.rename(columns={'CE09_sst_Sd':'CE09_sst_sd'}, inplace=True)

smDF.head() #for testing

In [None]:
#Check the data types for the 'Date' columns in both the surface mooring data and the combined cruise and satellite data
print(type(smDF['Date'][0]))
print(type(cruiseSatDF['Date'][0]))



The 'Date' column in the OOI surface mooring data is a time stamp, whereas the same column in the combined dataframe is a string (i.e. text), so we will need to convert one to match the other.

In [None]:
cruiseSatDF['Date'] = pd.to_datetime(cruiseSatDF['Date'])
print(type(cruiseSatDF['Date'][0]))
cruiseSatDF.head()

The 'Date' column in the previously combined cruise and satellite dataframe is now a time data type, which is the same as the OOI SM dataframe.

Now that the two columns are the same, we can merge them using that column:

In [None]:
cruiseSatSmDF = pd.merge(cruiseSatDF, smDF, on=['Date'], how='inner')
cruiseSatSmDF

Since each row of the previously combined dataframe refers to a single surface mooring (i.e. 'Site'), there should only be one set of surface mooring data per row, yet the merge brought in the columns for all four moorings. To fix this, we will write a loop that grabs only the SST and standard deviation from the correct surface mooring columns and assigns those to two new columns called 'SM_Temp' and 'SM_Sd'.

In [None]:
#Assign arbitrary columns to the new columns to retain formatting
cruiseSatSmDF['SM_Temp'] = cruiseSatSmDF['CE02_sst'] 
cruiseSatSmDF['SM_Sd'] = cruiseSatSmDF['CE02_sst_sd']
cruiseSatSmDF # for testing





In [None]:
#Iterate through the dataframe only assigning the appropriate column to the row
for i in range(len(cruiseSatSmDF)):
  siteNum = cruiseSatSmDF['Site'][i][:4]
  sstColname = siteNum + '_sst'
  sdColname = siteNum + '_sst_sd'
  #Use .loc to write into the dataframe itself (chained indexing may write to a copy)
  cruiseSatSmDF.loc[i, 'SM_Temp'] = cruiseSatSmDF[sstColname][i]
  cruiseSatSmDF.loc[i, 'SM_Sd'] = cruiseSatSmDF[sdColname][i]
  
cruiseSatSmDF #for testing
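
The same per-row selection can be written without pre-filling the new columns. This is a minimal sketch on a made-up two-row table (`toy` stands in for `cruiseSatSmDF`, and the temperatures are invented): build the name of the wanted column for each row, then look up that cell.

```python
import pandas as pd

# Made-up table with one SST column per mooring
toy = pd.DataFrame({
    'Site': ['CE02SHSM', 'CE04OSSM'],
    'CE02_sst': [11.0, 11.1],
    'CE04_sst': [12.0, 12.1],
})

# Name of the column holding each row's own mooring data
sst_cols = toy['Site'].str[:4] + '_sst'

# Look up the matching cell for every row
toy['SM_Temp'] = [toy.at[idx, col] for idx, col in sst_cols.items()]
print(toy[['Site', 'SM_Temp']])
```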

In [None]:
#Drop the extra columns now 
dropNames = [ colName for colName in list(cruiseSatSmDF) if colName[:2]=='CE' ] #Get all column names starting with 'CE'

cruiseSatSmDF = cruiseSatSmDF.drop(columns=dropNames) #drop those column names
cruiseSatSmDF

Now let's plot the three sources of data for each surface mooring as bar plots for comparison (SM vs. CTD vs. MUR):


####Plotting discrete SST comparisons

In [None]:
uniqueSites = sorted(cruiseSatSmDF.Site.unique().tolist()) #get a sorted list of all of the unique sites
uniqueSites

In [None]:
for i in range(len(uniqueSites)):
  ##Get the data for a single site
  siteName = uniqueSites[i]
  siteDF = cruiseSatSmDF.loc[cruiseSatSmDF['Site'] == siteName]
  
  ##Plot using that data
  # Setting the positions and width for the bars
  pos = list(range(len(siteDF['Cruise']))) 
  width = 0.25 
  
  # Plotting the bars
  fig, ax = plt.subplots(figsize=(10,5))
  
  # Create a bar with SM_Temp data,
  # in position pos,
  plt.bar(pos, 
          #using siteDF['SM_Temp'] data,
          siteDF['SM_Temp'], 
          # of width
          width, 
          # with alpha 0.5
          alpha=0.5, 
          # with yellow color
          color='y', 
          # with label
          label='OOI Surface Mooring',
          # and an error bar from siteDF['SM_sd']
          yerr = siteDF['SM_Sd']) 
  
  # Create a bar with CTD_Temp data,
  # in position pos + some width buffer,
  plt.bar([p + width for p in pos], 
          #using siteDF['CTD_Temp'] data,
          siteDF['CTD_Temp'],
          # of width
          width, 
          # with alpha 0.5
          alpha=0.5, 
          # with blue color
          color='b', 
          # with label
          label='Cruise CTD') 
  # Create a bar with MUR_Temp data,
  # in position pos + some width buffer,
  plt.bar([p + width*2 for p in pos], 
          #using siteDF['MUR_Temp'] data,
          siteDF['MUR_Temp'], 
          # of width
          width, 
          # with alpha 0.5
          alpha=0.5, 
          # with color
          color='r', 
          # with label
          label='Satellite-derived',
          # and an error bar from siteDF['MUR_Error']
          yerr = siteDF['MUR_Error']) 
  
  # Set the y axis label
  ax.set_ylabel(r'Sea Surface Temperature ($\degree C$)') #raw string avoids an invalid escape sequence
  
  # Set the chart's title
  ax.set_title('Comparison of SST from OOI Surface Moorings, OOI Cruises and Satellites\n' + siteName)
  
  # Set the position of the x ticks
  ax.set_xticks([p + 1 * width for p in pos])
  
  # Set the labels for the x ticks
  ax.set_xticklabels(siteDF['Cruise'])
  
  # Setting the x-axis and y-axis limits
  plt.xlim(min(pos)-width, max(pos)+width*4)
  plt.ylim([0, 16] )
  
  # Adding the legend and showing the plot
  plt.legend()
  plt.grid()
  plt.show()
  


From these plots, the SST from the METBKA packages on each mooring (i.e. yellow bars) seems quite consistent with the cruise CTD data (blue bars) and the satellite-derived JPL MUR product (red bars). However, the scale of the bars makes it difficult to compare them, so it helps to look at the differences between the data sources directly:

####Plotting differences between surface moorings and other sources
To take a closer look at the differences between the surface moorings and the other sources of discrete data (cruise CTD readings and satellite-derived data), we will also plot those differences.

We will need to first calculate those differences:

In [None]:
cruiseSatSmDF['SM-CTD'] = cruiseSatSmDF['SM_Temp'] - cruiseSatSmDF['CTD_Temp']
cruiseSatSmDF['SM-MUR'] = cruiseSatSmDF['SM_Temp'] - cruiseSatSmDF['MUR_Temp']
cruiseSatSmDF


Now we will plot those differences with the error bars from the surface moorings:

In [None]:
for i in range(len(uniqueSites)):
  ##Get the data for a single site
  siteName = uniqueSites[i]
  siteDF = cruiseSatSmDF.loc[cruiseSatSmDF['Site'] == siteName]
  
  ##Plot using that data
  # Setting the positions and width for the bars
  pos = list(range(len(siteDF['Cruise']))) 
  width = 0.35 
  
  # Plotting the bars
  fig, ax = plt.subplots(figsize=(10,7))
  
  # Create a bar with SM-CTD data,
  # in position pos,
  plt.bar(pos, 
          #using siteDF['SM-CTD'] data,
          siteDF['SM-CTD'], 
          # of width
          width, 
          # with alpha 0.5
          alpha=0.5, 
          # with blue color
          color='b', 
          # with label
          label='OOI Surface Mooring - Cruise CTD',
          # and an error bar from siteDF['SM_sd']
          yerr = siteDF['SM_Sd']) 
  
  # Create a bar with SM-MUR data,
  # in position pos + some width buffer,
  plt.bar([p + width for p in pos], 
          #using siteDF['SM-MUR'] data,
          siteDF['SM-MUR'],
          # of width
          width, 
          # with alpha 0.5
          alpha=0.5, 
          # with red color
          color='r', 
          # with label
          label='OOI Surface Mooring - Satellite-derived',
          # and an error bar from siteDF['SM_sd']
          yerr = siteDF['SM_Sd']) 
  
  # Set the y axis label
  ax.set_ylabel(r'$\Delta$ Sea Surface Temperature ($\degree C$)') #raw string avoids invalid escape sequences
  
  # Set the chart's title
  ax.set_title(r'Comparison of $\Delta$SST between OOI Surface Moorings, OOI Cruises and Satellites' + '\n' + siteName)
  
  # Set the position of the x ticks
  ax.set_xticks([p + 0.6 * width for p in pos])
  
  # Set the labels for the x ticks
  ax.set_xticklabels(siteDF['Cruise'])
  
  #Make a horizontal line at y=0
  ax.axhline(0, color='k', linewidth=1)
  
  # Setting the x-axis and y-axis limits
  plt.xlim(min(pos)-width, max(pos)+width*4)
  plt.ylim([-0.5, 1] )
  
  # Adding the legend and showing the plot
  plt.legend(loc = 'lower right')
  plt.grid()
  plt.show()

These difference plots allow us to look much more closely at the data and suggest that the sea surface temperatures from the METBKA instruments are less consistent with the cruise CTD and satellite-derived data than suggested by the previous plots. 

An internal cross-validation comparing the METBKA SST instruments at each OOI Surface Mooring with CTD casts during OOI cruises at the same locations is shown as the blue bars. An external cross-validation comparing the METBKA SST instruments at each OOI Surface Mooring with satellite-derived JPL MUR data at the same locations is shown as the red bars. In all cases, the error bars that are shown (i.e. black vertical lines in the middle of the bars) are the standard deviations calculated when we averaged the METBKA data by day. 

Only three of the bars (out of 16) have error bars that intersect with the horizontal line of perfect consistency between data sources, indicating that most of these comparisons showed that there were larger differences between the sources than the variation of readings throughout the day from the METBKA instruments. 

It is also interesting to note that most of the time the METBKA SST readings are higher than the other sources, with 10 of the 16 comparisons resulting in positive differences (i.e. bars above the horizontal line at 0). 
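
The two counts above (error bars that reach zero, and positive differences) can also be computed rather than read off the plots. A minimal sketch on made-up numbers, where `delta` stands in for a difference column such as 'SM-CTD' and `sd` for 'SM_Sd':

```python
import pandas as pd

# Made-up differences and daily standard deviations
toy = pd.DataFrame({
    'delta': [0.30, -0.05, 0.10, -0.20],
    'sd':    [0.10,  0.10, 0.15,  0.05],
})

# An error bar reaches the zero line when |difference| <= standard deviation
n_overlap = int((toy['delta'].abs() <= toy['sd']).sum())

# Positive differences mean the mooring read warmer than the other source
n_positive = int((toy['delta'] > 0).sum())

print(n_overlap, n_positive)
```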

Although these plots seem to indicate less consistency among the data sources, the scale of these differences should also be taken into account. In all cases, the differences between the data sources were less than 1ºC. Whether this amount of potential variability is important will depend on how someone wants to use the data. 

We can also take this one step further by plotting the mean and standard deviation of the differences across the four moorings for each cruise: 

In [None]:
### Data to plot
##Separate the data for the Endurance 08 and 09 cruises
end08df = cruiseSatSmDF.loc[cruiseSatSmDF['Cruise'] == 'ENDURANCE08']
end09df = cruiseSatSmDF.loc[cruiseSatSmDF['Cruise'] == 'ENDURANCE09']
smMinCtd_means = [np.mean(end08df['SM-CTD']), np.mean(end09df['SM-CTD'])]
smMinCtd_sds = [np.std(end08df['SM-CTD']), np.std(end09df['SM-CTD'])]
smMinSat_means = [np.mean(end08df['SM-MUR']), np.mean(end09df['SM-MUR'])]
smMinSat_sds = [np.std(end08df['SM-MUR']), np.std(end09df['SM-MUR'])]
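
One detail worth keeping in mind with these summaries: `np.std` defaults to the population standard deviation (`ddof=0`), whereas the pandas `std` method defaults to the sample standard deviation (`ddof=1`). With only four moorings per cruise the two can differ noticeably. The sketch below uses made-up differences to show both side by side.

```python
import numpy as np
import pandas as pd

# Made-up differences for two cruises
toy = pd.DataFrame({
    'Cruise': ['ENDURANCE08', 'ENDURANCE08', 'ENDURANCE09', 'ENDURANCE09'],
    'SM-CTD': [0.2, 0.4, -0.1, 0.1],
})

# Per-cruise mean and sample standard deviation (ddof=1) in one call
summary = toy.groupby('Cruise')['SM-CTD'].agg(['mean', 'std'])
print(summary)

# Population standard deviation (ddof=0) for the first cruise
first = toy.loc[toy['Cruise'] == 'ENDURANCE08', 'SM-CTD'].to_numpy()
pop_sd = np.std(first)
print(pop_sd)
```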

In [None]:
##Plot using all the data
# Setting the positions and width for the bars
pos = [0,1] 
width = 0.35 
# Plotting the bars
fig, ax = plt.subplots(figsize=(10,7))

# Create a bar with SM-CTD data,
# in position pos,
plt.bar(pos, 
        #using the mean of 'SM-CTD' for each of the cruises,
        smMinCtd_means, 
        # of width
        width, 
        # with alpha 0.5
        alpha=0.5, 
        # with blue color
        color='b', 
        # with label
        label='OOI Surface Mooring - Cruise CTD',
        # and an error bar of the standard deviation
        yerr = smMinCtd_sds) 

# Create a bar with SM-MUR data,
# in position pos + some width buffer,
plt.bar([p + width for p in pos], 
        #using the mean of 'SM-MUR' for each of the cruises,
        smMinSat_means,
        # of width
        width, 
        # with alpha 0.5
        alpha=0.5,
        # with red color
        color='r',
        # with label
        label='OOI Surface Mooring - Satellite-derived',
        # and an error bar of the standard deviation
        yerr = smMinSat_sds) 

# Set the y axis label
ax.set_ylabel(r'$\Delta$ Sea Surface Temperature ($\degree C$)')

# Set the chart's title
ax.set_title(r'Comparison of $\Delta$SST between OOI Surface Moorings, OOI Cruises and Satellites' + '\nAll Sites')

# Set the position of the x ticks
ax.set_xticks([p + 0.6 * width for p in pos])

# Set the labels for the x ticks (same order as the means: Endurance 08, then Endurance 09)
ax.set_xticklabels(['ENDURANCE08', 'ENDURANCE09'])

#Make a horizontal line at y=0
ax.axhline(0, color='k', linewidth=1)

# Setting the x-axis and y-axis limits
plt.xlim(min(pos)-width, max(pos)+width*4)
plt.ylim([-0.5, 1] )

# Adding the legend and showing the plot
plt.legend(loc = 'upper right')
plt.grid()
plt.show()

The above plot shows the mean (bars) and standard deviation (black vertical error lines) of the differences in SST between the four OOI surface moorings and the cruise CTDs (blue bars) and between the OOI surface moorings and the satellite data (red bars). This indicates that the METBKA instruments on the surface moorings read slightly higher temperatures than the cruise CTDs during the first cruise, with no consistent difference during the second cruise. The METBKA instruments also read slightly higher temperatures than the satellite-derived JPL MUR data, but only during the second cruise, with no consistent difference during the first cruise. As above, none of the differences were greater than 1ºC, and the METBKA readings were either consistent with or slightly higher than the other sources.
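
One way to make the reading of this plot explicit is to flag a "consistent difference" whenever the mean offset is larger than its standard deviation, i.e. when the error bar does not cross zero. The sketch below is only an illustration of that rule: the function name `summarize_offsets` and the input values are hypothetical, not taken from the cruise data.

```python
import numpy as np

def summarize_offsets(differences):
    """Return (mean, sd, consistent) for a set of SST differences.

    'consistent' is True when the mean +/- one standard deviation
    interval does not include zero.
    """
    diffs = np.asarray(differences, dtype=float)
    mean = diffs.mean()
    sd = diffs.std()  # population SD (ddof=0), the np.std default used in the cell above
    return mean, sd, bool(abs(mean) > sd)

# Hypothetical mooring-minus-CTD differences (ºC) for four moorings
mean, sd, consistent = summarize_offsets([0.30, 0.45, 0.25, 0.40])
print(round(mean, 3), round(sd, 3), consistent)

# Hypothetical differences that scatter around zero
mean2, sd2, consistent2 = summarize_offsets([-0.2, 0.3, 0.1, -0.1])
print(round(mean2, 3), round(sd2, 3), consistent2)
```

With only four moorings per cruise, this rule is a rough visual aid rather than a formal statistical test.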

In comparing the two OOI sources of data (surface moorings vs. cruise CTDs), it should be noted that they collect temperatures at slightly different depths. The METBKA instruments on the surface moorings measure temperature at the top of the water column (<1m). Although we extracted the shallowest CTD readings from each cast, these were between ~2m and ~4m (see section 5c.). As temperature tends to decrease with depth, this could explain why the surface moorings read slightly higher than the CTD casts. One reason why the results might differ between cruises (i.e. the Endurance 09 cruise being more consistent between the CTD casts and surface moorings than the Endurance 08 cruise) could be the difference in absolute temperature between the cruises. As shown by the temperatures from all three data sources in the first plots in this section, water temperatures were generally warmer during the first cruise (Endurance 08) than the second (Endurance 09). There may therefore have been a steeper gradient between the temperatures at the sea surface collected by the METBKA instrument (<1m) and those measured by the CTD casts in slightly deeper water (~2-4m), which could explain the pattern shown above. However, these results would need to be explored further before drawing any more definitive conclusions.

### 6b. Comparing OOI surface moorings with satellite-derived data and nearby buoys (time series)


#### Combining the time series data
Before we compare the SST time series from the OOI surface moorings with the other two sources of time series data (satellite-derived and nearby buoys), we want to combine all of that data into a single dataframe.

Since each of the data sources was extracted slightly differently, we will look at each one separately first, then try to match their formats before merging them together.

First, we will check the SST time series from the OOI Surface Moorings that we downloaded earlier (in step 4b.):

In [None]:
#Check the surface mooring dataframe:
print(smDF.shape)
smDF.head()

Since the above dataset is what we are trying to compare against, we will use that as the standard format and change the other datasets to match. This dataframe seems to be the right size (364 days).

We will next check the SST time series from the satellite-derived JPL MUR product we downloaded in step 5b.:

In [None]:
#Check the satellite dataframe:
satSSTs.head()

This satellite-derived SST data is in a long format, whereas the surface mooring data is in a wide format. Since we are using the latter as our default, we will need to reformat the satellite data to a wide format as well, giving each site its own columns:

In [None]:
##Run a for loop to iterate through each of the Site names and assign new columns to each.

#get a list of all the unique site names
uniqueSiteNames = sorted(satSSTs.Site.unique().tolist()) #get a sorted list of all of the unique sites

#get a sorted list of all of the unique dates (converted to a pandas datetime format in a later cell)
uniqueDates = sorted(satSSTs.Date.unique().tolist())

#create a dataframe with only the unique dates
wideSatSSTs = pd.DataFrame({'Date':uniqueDates})

##Iterate through the site names, add columns, and merge with the empty dataframe
for siteName in uniqueSiteNames:
  sstColName = "MUR_Temp_" + siteName #assign a new column name for the SST data
  errColName = "MUR_Error_" + siteName #assign a new column name for the SD data
  
  #extract the data for the specific siteName
  siteDF = satSSTs.loc[satSSTs['Site'] == siteName]
  
  #rename the temperature and error columns with the site specific names
  siteDF = siteDF.rename(index=str, columns={"MUR_Temp": sstColName, "MUR_Error": errColName})
  
  #drop the 'Site' column
  siteDF = siteDF.drop(columns = ['Site'])
  
  #merge with the wideSatSSTs dataframe
  wideSatSSTs = pd.merge(wideSatSSTs, siteDF, how = 'outer', on = 'Date')
  

print(wideSatSSTs.shape)
wideSatSSTs.head()  

The dataframe is now wide, but it has too many rows (389), so we will resample by date.

In [None]:
#Need to convert the date to a 'date time' format
wideSatSSTs['Date'] = pd.to_datetime(wideSatSSTs['Date'])
print(type(wideSatSSTs['Date'][0]))

In [None]:
#Set the index to 'Date'
wideSatSSTs = wideSatSSTs.set_index('Date')

wideSatSSTs.head()

In [None]:
#Check all of the data types for the columns 
wideSatSSTs.dtypes

In [None]:
#Since none of the data is numeric we need to convert all of the columns to numeric
wideSatSSTs[wideSatSSTs.columns] = wideSatSSTs[wideSatSSTs.columns].apply(pd.to_numeric, errors='coerce')
wideSatSSTs.dtypes

In [None]:
#Resample by day
satSSTsDaily = wideSatSSTs.resample('1D').mean()
satSSTsDaily.head()


In [None]:
#Finally, remove the indexing from Date:
satSSTsDaily = satSSTsDaily.reset_index()
print(satSSTsDaily.shape)
satSSTsDaily.head()

Now the dataframe seems to have the right number of days and is in wide format.
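
The reshape-and-resample steps above can also be written more compactly with `pivot_table`. The following is a minimal sketch, assuming a long-format table with the same column names as `satSSTs` (`Date`, `Site`, `MUR_Temp`, `MUR_Error`); the rows are made up for illustration and are not real MUR values.

```python
import pandas as pd

# Made-up long-format rows that mimic the layout of satSSTs (not real MUR values)
long_df = pd.DataFrame({
    'Date': ['2017-05-01 09:00:00', '2017-05-01 09:00:00',
             '2017-05-02 09:00:00', '2017-05-02 09:00:00',
             '2017-05-02 21:00:00'],
    'Site': ['CE02SHSM', 'CE04OSSM', 'CE02SHSM', 'CE04OSSM', 'CE02SHSM'],
    'MUR_Temp': ['10.5', '11.0', '10.7', '11.2', '10.9'],
    'MUR_Error': ['0.40', '0.40', '0.38', '0.41', '0.38'],
})

# Convert the text columns to proper datetime and numeric types first
long_df['Date'] = pd.to_datetime(long_df['Date'])
for col in ['MUR_Temp', 'MUR_Error']:
    long_df[col] = pd.to_numeric(long_df[col], errors='coerce')

# Long -> wide: one row per timestamp, one column per (variable, site) pair
wide = long_df.pivot_table(index='Date', columns='Site',
                           values=['MUR_Temp', 'MUR_Error'], aggfunc='mean')

# Flatten the two-level column index into names like 'MUR_Temp_CE02SHSM'
wide.columns = [var + '_' + site for var, site in wide.columns]

# Collapse multiple timestamps on the same day into one daily mean
daily = wide.resample('1D').mean().reset_index()
print(daily)
```

This produces the same kind of column names as the loop-and-merge approach, while averaging any repeated timestamps for a site along the way.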

Let's move on to check the third dataset - the nearby buoys:

In [None]:
#Check the nearby buoys dataframes (print the first so that both are displayed):
print(St46050.head())
St46211.head()

In [None]:
###Downsample the two buoy datasets, and extract only the necessary data (time & temperature)
##Oregon Buoy (Station 46050)

#drop the 'degrees_east' and 'degrees_north' columns
oregonBuoy = St46050.drop(columns = ['degrees_east', 'degrees_north'])

#resample to get the mean and standard deviations
oregonBuoySST = oregonBuoy.resample('1D').mean()
oregonBuoySD = oregonBuoy.resample('1D').std()

#get rid of the indexes in each and rename the columns appropriately
oregonBuoySST = oregonBuoySST.reset_index()
oregonBuoySST = oregonBuoySST.rename(index=str, columns={"UTC": "Date", "degree_C": "OregonBuoySST"})

oregonBuoySD = oregonBuoySD.reset_index()
oregonBuoySD = oregonBuoySD.rename(index=str, columns={"UTC": "Date2", "degree_C": "OregonBuoySD"})

#concatenate the two into one dataset and get rid of the extra 'Date' column
oregonBuoyDaily = pd.concat([oregonBuoySST, oregonBuoySD], axis = 1)
oregonBuoyDaily = oregonBuoyDaily.drop(columns = ['Date2'])

##Do the same for the Washington Buoy (Station 46211)
#drop the 'degrees_east' and 'degrees_north' columns
washingtonBuoy = St46211.drop(columns = ['degrees_east', 'degrees_north'])

#resample to get the mean and standard deviations
washingtonBuoySST = washingtonBuoy.resample('1D').mean()
washingtonBuoySD = washingtonBuoy.resample('1D').std()

#get rid of the indexes in each and rename the columns appropriately
washingtonBuoySST = washingtonBuoySST.reset_index()
washingtonBuoySST = washingtonBuoySST.rename(index=str, columns={"UTC": "Date", "degree_C": "WashingtonBuoySST"})

washingtonBuoySD = washingtonBuoySD.reset_index()
washingtonBuoySD = washingtonBuoySD.rename(index=str, columns={"UTC": "Date2", "degree_C": "WashingtonBuoySD"})

#concatenate the two into one dataset and get rid of the extra 'Date' column
washingtonBuoyDaily = pd.concat([washingtonBuoySST, washingtonBuoySD], axis = 1)
washingtonBuoyDaily = washingtonBuoyDaily.drop(columns = ['Date2'])

washingtonBuoyDaily.head()

In [None]:
###Merge the two buoy datasets into one:
nearbyBuoys = pd.merge(oregonBuoyDaily, washingtonBuoyDaily, how = 'outer', on = 'Date')
print(nearbyBuoys.shape)
nearbyBuoys.head()

This data looks good and also seems to have about the right number of days.
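
Because the same downsampling steps are repeated for both buoys, they could be wrapped in a small helper. This is a sketch only: the function name `daily_mean_sd` is hypothetical, and the input is a made-up frame that assumes, as in the cells above, a datetime index named `UTC` and a temperature column named `degree_C`.

```python
import pandas as pd

def daily_mean_sd(df, value_col, prefix):
    """Resample one column to daily mean and standard deviation in a single pass."""
    daily = df[value_col].resample('1D').agg(['mean', 'std'])
    daily = daily.rename(columns={'mean': prefix + 'SST', 'std': prefix + 'SD'})
    # Name the index 'Date' so it becomes the merge key after reset_index
    return daily.rename_axis('Date').reset_index()

# Made-up sub-daily readings (not real buoy data)
buoy = pd.DataFrame(
    {'degree_C': [10.0, 12.0, 11.0, 13.0]},
    index=pd.to_datetime(['2017-05-01 00:00', '2017-05-01 12:00',
                          '2017-05-02 00:00', '2017-05-02 12:00']),
)
buoy.index.name = 'UTC'

oregon_daily = daily_mean_sd(buoy, 'degree_C', 'OregonBuoy')
print(oregon_daily)
```

Calling the helper once per buoy with a different `prefix` would yield the same `...SST` and `...SD` columns as the cell above without the intermediate `Date2` column.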

Now that we have checked each of the datasets, we can attempt to merge the three by date:

In [None]:
#Check that the "Date" is the same format for all three datasets:
print(type(nearbyBuoys['Date'][0]))
print(type(satSSTsDaily['Date'][0]))
print(type(smDF['Date'][0]))

In [None]:
#Merge the surface mooring data with the satellite data
smSatTS = pd.merge(smDF, satSSTsDaily, how = 'outer', on = 'Date')
print(smSatTS.shape)
smSatTS.head()

In [None]:
#Merge the above combined data with the nearby buoy data
smSatBuoyTS = pd.merge(smSatTS, nearbyBuoys, how = 'outer', on = 'Date')
print(smSatBuoyTS.shape)
smSatBuoyTS.head()

We now have all the time series data in one dataframe and can start comparing them with graphs.
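
When merging by date, it can help to check how many dates matched in both sources. The sketch below uses the `indicator` and `validate` options of `pd.merge` on two small made-up frames; the column names follow the ones used above, but the values are invented.

```python
import pandas as pd

# Made-up daily series with partially overlapping dates
mooring = pd.DataFrame({
    'Date': pd.to_datetime(['2017-05-01', '2017-05-02', '2017-05-03']),
    'CE02_sst': [10.1, 10.3, 10.2],
})
satellite = pd.DataFrame({
    'Date': pd.to_datetime(['2017-05-02', '2017-05-03', '2017-05-04']),
    'MUR_Temp_CE02SHSM': [10.4, 10.0, 10.6],
})

# indicator=True adds a '_merge' column recording where each row came from;
# validate='one_to_one' raises an error if either side has duplicate dates
merged = pd.merge(mooring, satellite, how='outer', on='Date',
                  indicator=True, validate='one_to_one')

counts = merged['_merge'].value_counts()
print(counts)
```

Rows marked `left_only` or `right_only` are dates present in just one source, which is a quick way to spot gaps or mismatched date formats before plotting.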

#### Plotting the SST time series (surface moorings and buoys vs. satellite data)
To compare the SST time series from the OOI Surface Moorings with the satellite-derived estimates from the same locations, we will plot each of the moorings with the satellite data separately. We will then plot the difference between each of those four surface moorings and the satellite data. We will then do the same for the nearby buoys against the satellite data, to get a sense of whether the difference between OOI surface moorings and satellite data is comparable to the difference between other buoys and satellite data.

First, we will plot the surface mooring SSTs against the satellite data using a for loop:

In [None]:
smNames = ['CE02', 'CE04', 'CE07', 'CE09']

for smName in smNames:
  #assign the column names for the surface mooring data
  smMeanColname = smName + '_sst'
  smSdColname = smName + '_sst_sd'
  
  #assign appropriate ending to the satellite column name
  if smName in ['CE02', 'CE07']:
    satEnd = 'SHSM'
  else:
    satEnd = 'OSSM'
  
  #assign the satellite data column name
  satMeanColname = 'MUR_Temp_' + smName + satEnd
  satErrColname = 'MUR_Error_' + smName + satEnd
  
  #plot a figure showing the SST's from moorings and satellites  
  plt.figure(figsize=(12, 7))
  
  plt.plot(smSatBuoyTS['Date'], smSatBuoyTS[smMeanColname], label = smName + satEnd)
  plt.plot(smSatBuoyTS['Date'], smSatBuoyTS[satMeanColname], label = 'JPL MUR')
 
  plt.title('OOI Surface Mooring (' + smName + satEnd + ') vs. Satellite Data (JPL MUR)')
  plt.xlabel('Date')
  plt.ylabel('SST (ºC)')
  plt.legend()
  
  #get the difference between the two SST's
  deltaData = smSatBuoyTS[smMeanColname] - smSatBuoyTS[satMeanColname]
  
  #plot the differences
  plt.figure(figsize=(12, 7))
  
  plt.plot(smSatBuoyTS['Date'], deltaData, label = 'Surface mooring - Satellite data')
  plt.axhline(y = 0, linewidth=1, color = 'black')
  plt.axhline(y = 1, linewidth=4, color='r')
  plt.axhline(y = -1, linewidth=4, color='r')
  
  plt.title('OOI Surface Mooring (' + smName + satEnd + ') - Satellite Data (JPL MUR)')
  plt.xlabel('Date')
  plt.ylabel(r'$\Delta$ SST (ºC)')
  plt.legend()

On most dates, the OOI Surface Mooring SSTs (blue lines in the SST plots) are fairly consistent with the satellite-derived data (green lines in the SST plots). The difference plots show this with even more clarity, indicating that the two sources of data are usually within 1ºC of each other (i.e. the blue lines in the difference plots are within the horizontal red lines). It is interesting to note that all of the larger differences between the two sources seem to occur either in the summer or early fall, and that those differences are consistently in one direction during those times (i.e. the satellite data is estimating higher temperatures than the surface mooring readings). There were far fewer days when the OOI Surface Moorings measured much higher temperatures (i.e. >1ºC) than estimated by the satellite data, with the exception of some spring days for the shallow Oregon OOI Surface Mooring (CE07SHSM).

This information is particularly helpful for biologists who use satellite-derived SSTs as a proxy for in-situ readings where there are no instruments in the water.
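
The visual impression from the difference plots ("usually within 1ºC") can be backed up with a few summary numbers. The sketch below is one possible way to do that: the function name `summarize_difference` is hypothetical and the two series are made-up values, not the mooring or MUR data.

```python
import numpy as np
import pandas as pd

def summarize_difference(a, b, tolerance=1.0):
    """Summarize a - b: number of paired days, mean bias, RMSE, and the
    fraction of days with an absolute difference within the tolerance."""
    delta = (a - b).dropna()  # keep only days where both sources have data
    return {
        'n_days': int(delta.size),
        'mean_bias': float(delta.mean()),
        'rmse': float(np.sqrt((delta ** 2).mean())),
        'frac_within_tol': float((delta.abs() <= tolerance).mean()),
    }

# Made-up daily SSTs (ºC); NaN marks a day with no mooring reading
mooring = pd.Series([10.0, 11.0, 12.0, np.nan, 13.0])
satellite = pd.Series([10.5, 11.0, 14.0, 12.0, 12.5])

stats = summarize_difference(mooring, satellite)
print(stats)
```

Applied to columns such as `smSatBuoyTS['CE02_sst']` and `smSatBuoyTS['MUR_Temp_CE02SHSM']`, the same function would give the share of days that fall between the red lines in the plots.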

We will next look at the same SST and difference plots for the nearby buoys vs. satellite data:

In [None]:
##Oregon Buoy

#SST plot
plt.figure(figsize=(12, 7))
plt.plot(smSatBuoyTS['Date'], smSatBuoyTS['OregonBuoySST'], label = 'Oregon NDBC buoy') #Oregon buoy
plt.plot(smSatBuoyTS['Date'], smSatBuoyTS['MUR_Temp_NDBC_46050'], label = 'JPL MUR') #Satellite data for the same location
plt.title('Nearby Oregon Buoy (NDBC Station 46050) vs. Satellite Data (JPL MUR)')
plt.xlabel('Date')
plt.ylabel('SST (ºC)')
plt.legend();

#Difference plot
deltaData = smSatBuoyTS['OregonBuoySST'] - smSatBuoyTS['MUR_Temp_NDBC_46050']
plt.figure(figsize=(12, 7))
plt.plot(smSatBuoyTS['Date'], deltaData, label = 'Oregon buoy - Satellite data')
plt.axhline(y = 0, linewidth=1, color = 'black')
plt.axhline(y = 1, linewidth=4, color='r')
plt.axhline(y = -1, linewidth=4, color='r')
plt.title('Nearby Oregon Buoy (NDBC Station 46050) - Satellite Data (JPL MUR)')
plt.xlabel('Date')
plt.ylabel(r'$\Delta$ SST (ºC)')
plt.legend();


##Washington Buoy

#SST plot
plt.figure(figsize=(12, 7))
plt.plot(smSatBuoyTS['Date'], smSatBuoyTS['WashingtonBuoySST'], label = 'Washington SCRIPPS buoy') #Washington buoy
plt.plot(smSatBuoyTS['Date'], smSatBuoyTS['MUR_Temp_SCRIPPS_46211'], label = 'JPL MUR') #Satellite data for the same location
plt.title('Nearby Washington Buoy (SCRIPPS Station 46211) vs. Satellite Data (JPL MUR)')
plt.xlabel('Date')
plt.ylabel('SST (ºC)')
plt.legend();

#Difference plot
deltaData = smSatBuoyTS['WashingtonBuoySST'] - smSatBuoyTS['MUR_Temp_SCRIPPS_46211']
plt.figure(figsize=(12, 7))
plt.plot(smSatBuoyTS['Date'], deltaData, label = 'Washington buoy - Satellite data')
plt.axhline(y = 0, linewidth=1, color = 'black')
plt.axhline(y = 1, linewidth=4, color='r')
plt.axhline(y = -1, linewidth=4, color='r')
plt.title('Nearby Washington Buoy (SCRIPPS Station 46211) - Satellite Data (JPL MUR)')
plt.xlabel('Date')
plt.ylabel(r'$\Delta$ SST (ºC)')
plt.legend();

There appear to be similar patterns in the comparison between the non-OOI buoys and the satellite data as there were between the OOI Surface Moorings and the satellite data. On most days, the temperatures measured by the buoys were within 1ºC of the satellite estimates. Also consistent with the OOI Surface Moorings, the differences tended to be greater in the summer/fall (when satellite data tends to estimate higher SSTs) and, although less often, sometimes also in the spring (with satellite data tending to estimate lower SSTs). 

In general, though, the nearby non-OOI buoys are less consistent with the satellite data (with differences of up to ~5ºC) than the OOI surface mooring readings (with differences of ~3ºC or less).
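
The seasonal pattern described above could be checked by averaging the differences by calendar month. The following is a minimal sketch with made-up values; the function name `monthly_mean_difference` is hypothetical, and the column names simply mirror those in `smSatBuoyTS`.

```python
import pandas as pd

def monthly_mean_difference(df, col_a, col_b, date_col='Date'):
    """Mean of (col_a - col_b) for each calendar month (1-12)."""
    delta = df[col_a] - df[col_b]
    return delta.groupby(df[date_col].dt.month).mean()

# Made-up daily values (ºC), not real buoy or MUR data
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2017-07-01', '2017-07-15', '2018-03-01', '2018-03-15']),
    'OregonBuoySST': [12.0, 13.0, 10.0, 10.5],
    'MUR_Temp_NDBC_46050': [14.0, 14.0, 9.5, 10.0],
})

monthly = monthly_mean_difference(demo, 'OregonBuoySST', 'MUR_Temp_NDBC_46050')
print(monthly)
```

Negative monthly means would correspond to months when the satellite product estimates higher SSTs than the in-situ instrument, and positive means to the opposite.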

#### Plotting the differences between OOI Surface Moorings and other nearby buoys
As another external cross-validation, we compared the SSTs from the OOI Surface Moorings with other nearby buoys more directly. Although the nearby buoys weren't in exactly the same locations as the OOI Surface Moorings, they were close to two of those moorings. Therefore, we plotted those two nearby pairs here: 1. the CE02SHSM - NDBC_46050 pair off the Oregon shelf, and 2. the CE07SHSM - SCRIPPS_46211 pair off the Washington shelf.

In [None]:
##Oregon Buoys (CE02SHSM and NDBC_46050)

#SST plot
plt.figure(figsize=(12, 7))
plt.plot(smSatBuoyTS['Date'], smSatBuoyTS['CE02_sst'], label = 'CE02SHSM') #OOI Oregon shallow mooring
plt.plot(smSatBuoyTS['Date'], smSatBuoyTS['OregonBuoySST'], label = 'Oregon NDBC buoy') #Oregon buoy
plt.title('Oregon shallow OOI surface mooring (CE02SHSM) vs. Nearby Oregon Buoy (NDBC Station 46050)')
plt.xlabel('Date')
plt.ylabel('SST (ºC)')
plt.legend();

#Difference plot
deltaData = smSatBuoyTS['CE02_sst'] - smSatBuoyTS['OregonBuoySST']
plt.figure(figsize=(12, 7))
plt.plot(smSatBuoyTS['Date'], deltaData, label = 'Oregon shallow surface mooring - Nearby buoy')
plt.axhline(y = 0, linewidth=1, color = 'black')
plt.axhline(y = 1, linewidth=4, color='r')
plt.axhline(y = -1, linewidth=4, color='r')
plt.title('Oregon shallow OOI surface mooring (CE02SHSM) - Nearby Oregon Buoy (NDBC Station 46050)')
plt.xlabel('Date')
plt.ylabel(r'$\Delta$ SST (ºC)')
plt.legend();


##Washington Buoys (CE07SHSM and SCRIPPS_46211)

#SST plot
plt.figure(figsize=(12, 7))
plt.plot(smSatBuoyTS['Date'], smSatBuoyTS['CE07_sst'], label = 'CE07SHSM') #OOI Washington shallow mooring
plt.plot(smSatBuoyTS['Date'], smSatBuoyTS['WashingtonBuoySST'], label = 'Washington SCRIPPS buoy') #Washington buoy
plt.title('Washington shallow OOI surface mooring (CE07SHSM) vs. Nearby Washington Buoy (SCRIPPS Station 46211)')
plt.xlabel('Date')
plt.ylabel('SST (ºC)')
plt.legend();

#Difference plot
deltaData = smSatBuoyTS['CE07_sst'] - smSatBuoyTS['WashingtonBuoySST']
plt.figure(figsize=(12, 7))
plt.plot(smSatBuoyTS['Date'], deltaData, label = 'Washington shallow surface mooring - Nearby buoy')
plt.axhline(y = 0, linewidth=1, color = 'black')
plt.axhline(y = 1, linewidth=4, color='r')
plt.axhline(y = -1, linewidth=4, color='r')
plt.title('Washington shallow OOI surface mooring (CE07SHSM) - Nearby Washington Buoy (SCRIPPS Station 46211)')
plt.xlabel('Date')
plt.ylabel(r'$\Delta$ SST (ºC)')
plt.legend();

The SSTs measured by the two Oregon instruments (first two plots above) are much more consistent than those measured by the two Washington instruments (third and fourth plots above), as the differences are much more frequently greater than 1ºC in the latter. This is perhaps not that surprising when considering where the surface moorings and nearby buoys are located. In the map in section 5a. above, you can see that the Oregon buoys are located much closer to each other than the Washington buoys. The direction of the differences between the Washington instruments is also inconsistent (i.e. one instrument is not measuring consistently higher than the other), indicating they are not good proxies for each other. The Washington SCRIPPS buoy is closer to the coastline than the closest OOI Surface Mooring, and its shallower location could be why there is greater variation in SST readings from that non-OOI buoy from one day to the next, particularly in the summer months when the two instruments overlap the most (i.e. green lines in the third plot).
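
To put a number on how far apart each mooring-buoy pair is, the great-circle distance between two positions can be computed with the haversine formula. This is a sketch under a spherical-Earth assumption; the function name `haversine_km` is hypothetical, and the coordinates in the example are placeholders rather than the actual positions of any mooring or buoy.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Placeholder coordinates (NOT the real instrument positions)
placeholder_distance = haversine_km(44.6, -124.3, 44.7, -124.5)
print(round(placeholder_distance, 1), 'km')

# Sanity check: one degree of longitude along the equator
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(one_degree, 2), 'km')
```

Substituting the deployment coordinates from the mooring metadata and the buoy station pages would give the separation for each pair.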

## Conclusions and final notes

* Exploring the OOI METBKA SST data and metadata
  * ***Is the metadata complete? (3a. and 3b.)***
  <br>
  The metadata for the deployments of the four OOI surface moorings lists different deployment dates than those shown on the ERDDAP website. This should be addressed so that both sources have the correct information.
  <br>
  * ***Has any of the existing data been flagged or annotated? (3c.)***
  <br>
  We found only one annotation for any of the four surface moorings: indicating the reason for a gap in the data for CE07SHSM. However, we found a second gap in the data for that mooring and another data gap for another of the moorings we assessed (CE04OSSM), both of which should have annotations added.
  * ***Does the plotted data make sense? (4b.)***
  <br>
  In addition to the data gaps that appear in the plotted data, there are some odd values (-5) for two of the moorings we assessed (CE02SHSM and CE07SHSM) that do not make sense for seawater temperature. We learned that these are the fill values used when SST instruments are not working correctly, but we recommend that these also be annotated and perhaps replaced with NaN.
* Internal cross-validation of the SST data (i.e. comparing across OOI deployments/ instruments)
  * ***Are there any disparities with overlapping deployment data? (4b.)***
  <br>
  Due to the data gaps in some of the instruments, there was less overlapping data between deployments for us to compare than we had expected. However, where overlapping data was available, it did seem to be fairly consistent.
  * ***How does the SST data from the METBKA packages compare to CTD casts from cruises at the same moorings? (6a.)***
  <br>
  In general, the SSTs measured by the METBKA packages at the four OOI Coastal Endurance Surface Moorings were either consistent with or slightly higher than those measured by CTD instruments used during the two OOI cruises at the same locations. One reason for the discrepancies could be that, even though we extracted the shallowest CTD readings, they were measured in slightly deeper water (~2-4m) than the METBKA instruments (<1m). Even when the temperatures read by the METBKA instruments were not consistent with the CTD measurements, though, they differed by less than 1ºC, so the importance of this variation among sources will depend on how the data is being used.  
* External cross-validation of the SST data (i.e. comparing with satellite-derived data and nearby, non-OOI buoys)
  * ***How does the SST data from the surface moorings compare to satellite-derived SST for the same locations (i.e. NASA JPL MUR product)? (6a. and 6b.)***
  <br>
  On most dates throughout the year, SSTs measured by the METBKA packages on the OOI Surface Moorings tended to be fairly consistent with SSTs estimated by satellite data (i.e. they were within ~1ºC of the JPL MUR product). Most of the larger differences in temperature tended to be in the summer and fall, with the satellite data estimating higher temperatures (up to ~3ºC more) than the OOI Surface Moorings. There were far fewer days when the OOI Surface Moorings measured higher temperatures than the satellites estimated, but those days tended to be in the spring when they did occur. 
  * ***How does the SST data from the surface moorings compare to SST collected by other buoys nearby (i.e. NDBC/SCRIPPS buoys)? (6a. and 6b.)***
  <br>
  Our indirect comparison of the OOI Surface Moorings and the other nearby buoys (i.e. comparing each with the satellite data) revealed that there was more consistency between the OOI Surface Moorings and satellites (<~3ºC) than between the other nearby buoys and satellites (up to ~5ºC). When we directly compared the OOI Surface Moorings with the nearby buoys, there was more consistency between those two in-situ instruments than between each of them and the satellite data. However, the SSTs measured by the two Oregon instruments were aligned much more closely than the two Washington instruments, perhaps owing to the closer proximity of those in Oregon compared to those in Washington. 
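
One of the recommendations above, replacing the -5 fill values with NaN, could be sketched as follows. This assumes the fill value is exactly -5 (as observed in the plotted data) and uses a made-up series; the function name `mask_fill_values` is hypothetical.

```python
import numpy as np
import pandas as pd

FILL_VALUE = -5.0  # fill value seen in the plotted data; assumed to be exact here

def mask_fill_values(sst, fill_value=FILL_VALUE):
    """Return a copy of the series with fill values replaced by NaN."""
    is_fill = np.isclose(sst, fill_value)
    return sst.where(~is_fill)

# Made-up SST readings (ºC) containing two fill values
raw = pd.Series([10.2, -5.0, 10.4, -5.0])
cleaned = mask_fill_values(raw)
print(cleaned)
print(cleaned.mean())
```

Masking before computing daily means keeps the fill values from dragging the averages toward implausibly cold temperatures.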