# Quick Start Demo of Ocean Data Gateway

Goal: to be able to easily search for ocean datasets and read them in. The package we've written for this is called `ocean_data_gateway`, and here we show a short demo.

In [1]:
import ocean_data_gateway as odg
import pandas as pd
pd.set_option('display.max_rows', 5)

## Find Data in a Region

Here we will search for data in the Bering Sea region.

In [2]:
kw = {
    "min_lon": -180,
    "max_lon": -158,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}

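To make the meaning of `kw` concrete, here is a minimal sketch of how the same bounding box and time window could be written as ERDDAP `tabledap` constraints. It assumes the target dataset has variables named `time`, `latitude`, and `longitude` (common in ERDDAP but not guaranteed) and interprets the dates as UTC. `kw_to_erddap_constraints` is a hypothetical helper, and this is not necessarily how `ocean_data_gateway` builds its requests.

```python
from datetime import datetime

def kw_to_erddap_constraints(kw):
    """Hypothetical helper: express a region/time dict as ERDDAP tabledap constraints.

    Assumes the dataset uses the variable names time, latitude, and longitude.
    """
    # strptime accepts non-zero-padded dates such as '2021-4-1'.
    start = datetime.strptime(kw["min_time"], "%Y-%m-%d")
    end = datetime.strptime(kw["max_time"], "%Y-%m-%d")
    if start > end or kw["min_lat"] > kw["max_lat"] or kw["min_lon"] > kw["max_lon"]:
        raise ValueError("each min_* value must not exceed its max_* counterpart")
    iso = "%Y-%m-%dT%H:%M:%SZ"  # ERDDAP-style timestamp; the dates are assumed to be UTC
    return [
        f"time>={start.strftime(iso)}",
        f"time<={end.strftime(iso)}",
        f"latitude>={kw['min_lat']}",
        f"latitude<={kw['max_lat']}",
        f"longitude>={kw['min_lon']}",
        f"longitude<={kw['max_lon']}",
    ]

kw = {
    "min_lon": -180,
    "max_lon": -158,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}
print("&" + "&".join(kw_to_erddap_constraints(kw)))
```

When such constraints are appended to a real data URL, some HTTP clients require the `<` and `>` characters to be percent-encoded (for example with `urllib.parse.quote`).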
### All the servers

Set up the search object, `data`, then do an initial metadata search to find the `dataset_ids` of the relevant datasets. For now, we search across all variables.
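This notebook does not spell out what makes a dataset "relevant." One plausible reading, assumed in the following sketch, is that a dataset is kept when its bounding box and time coverage overlap the query in `kw`. The column names come from the `data.meta` output later in this notebook. `overlaps_query` is a hypothetical helper, not part of `ocean_data_gateway`.

```python
from datetime import datetime, timezone

def _to_utc(timestamp):
    # Metadata timestamps end in 'Z'; fromisoformat (before Python 3.11) needs '+00:00'.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

def overlaps_query(row, kw):
    """Hypothetical check: do a dataset's bounding box and time span overlap `kw`?"""
    q_start = datetime.strptime(kw["min_time"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    q_end = datetime.strptime(kw["max_time"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    lat_ok = row["geospatial_lat_min"] <= kw["max_lat"] and row["geospatial_lat_max"] >= kw["min_lat"]
    lon_ok = row["geospatial_lon_min"] <= kw["max_lon"] and row["geospatial_lon_max"] >= kw["min_lon"]
    time_ok = (_to_utc(row["time_coverage_start"]) <= q_end
               and _to_utc(row["time_coverage_end"]) >= q_start)
    return lat_ok and lon_ok and time_ok

kw = {"min_lon": -180, "max_lon": -158, "min_lat": 50, "max_lat": 66,
      "min_time": '2021-4-1', "max_time": '2021-4-2'}

# Values copied from the gov_noaa_water_tnua2 row of data.meta.
tnua2 = {
    "geospatial_lat_min": 60.578154, "geospatial_lat_max": 60.578154,
    "geospatial_lon_min": -165.269186, "geospatial_lon_max": -165.269186,
    "time_coverage_start": "2015-05-05T13:00:00Z",
    "time_coverage_end": "2021-08-19T10:00:00Z",
}
print(overlaps_query(tnua2, kw))  # True
```

This simple interval check does not handle bounding boxes that cross the antimeridian.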

In [3]:
%%time

# set up the data search object
data = odg.Gateway(kw=kw, approach='region')

# find dataset_ids to make sure it works
data.dataset_ids[:5]


CPU times: user 5.24 s, sys: 487 ms, total: 5.73 s
Wall time: 1min 36s


['noaa_nos_co_ops_9461081',
 'noaa_nos_co_ops_9462578',
 'wmo_46035',
 'noaa_nos_co_ops_9468226',
 'noaa_nos_co_ops_9460150']

The search checked each of the 5 readers for `dataset_ids` and found this many datasets in total:

In [4]:
len(data.dataset_ids)

441

This search covers 2 ERDDAP servers (more can be added by the user), 2 Axiom databases, and any known local files.
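Conceptually, the overall `dataset_ids` list pools the results of the individual readers. The following is a minimal sketch of that pooling with made-up per-reader lists. Two things are assumptions: that `ocean_data_gateway` drops IDs reported by more than one reader, and the behavior of `combine_dataset_ids`, which is a hypothetical helper.

```python
from itertools import chain

def combine_dataset_ids(*per_reader_ids):
    """Hypothetical helper: pool dataset_ids from several readers, keeping first-seen order."""
    # dict.fromkeys keeps only the first occurrence of each ID, in insertion order.
    return list(dict.fromkeys(chain.from_iterable(per_reader_ids)))

# Made-up per-reader results, for illustration only.
erddap_ids = ["noaa_nos_co_ops_9461081", "wmo_46035"]
axiom_ids = ["wmo_46035", "gov_noaa_nws_pate"]
local_ids = ["ANIMctd14.nc"]

combined = combine_dataset_ids(erddap_ids, axiom_ids, local_ids)
print(combined, len(combined))
# ['noaa_nos_co_ops_9461081', 'wmo_46035', 'gov_noaa_nws_pate', 'ANIMctd14.nc'] 4
```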

### Just one server

Since that search took over 1.5 minutes (wall time) just to find the dataset_ids, let's narrow which databases are checked.
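Restricting the search to one ERDDAP server replaces many metadata queries with a single one. For reference, ERDDAP exposes an advanced-search endpoint (`/search/advanced.csv`) that accepts bounding-box, time, and `standard_name` filters. The following sketch only builds such a URL for the IOOS server seen in the `data.meta` output; it does not contact the server. Whether `ocean_data_gateway` issues exactly this request is an assumption, and `erddap_search_url` is a hypothetical helper.

```python
from datetime import datetime
from urllib.parse import urlencode

def erddap_search_url(server, kw, standard_name=None):
    """Hypothetical helper: build an ERDDAP advanced-search URL with a CSV response."""
    iso = "%Y-%m-%dT%H:%M:%SZ"
    params = {
        "page": 1,
        "itemsPerPage": 1000,
        "minLon": kw["min_lon"],
        "maxLon": kw["max_lon"],
        "minLat": kw["min_lat"],
        "maxLat": kw["max_lat"],
        # Normalize dates like '2021-4-1' to full ISO 8601 timestamps.
        "minTime": datetime.strptime(kw["min_time"], "%Y-%m-%d").strftime(iso),
        "maxTime": datetime.strptime(kw["max_time"], "%Y-%m-%d").strftime(iso),
    }
    if standard_name is not None:
        # Optional filter on a CF standard name, e.g. "sea_water_temperature".
        params["standard_name"] = standard_name
    # urlencode percent-encodes characters such as ':' in the timestamps.
    return f"{server.rstrip('/')}/search/advanced.csv?{urlencode(params)}"

kw = {"min_lon": -180, "max_lon": -158, "min_lat": 50, "max_lat": 66,
      "min_time": '2021-4-1', "max_time": '2021-4-2'}
print(erddap_search_url("http://erddap.sensors.ioos.us/erddap", kw, "sea_water_temperature"))
```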

In [5]:
%%time

# set up the data search object
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap, erddap={'known_server': 'ioos'})

# look up dataset_ids
print(data.dataset_ids[:5], len(data.dataset_ids))

['noaa_nos_co_ops_9461081', 'noaa_nos_co_ops_9462578', 'wmo_46035', 'noaa_nos_co_ops_9468226', 'noaa_nos_co_ops_9460150'] 224
CPU times: user 13.5 ms, sys: 6.29 ms, total: 19.8 ms
Wall time: 881 ms


In [6]:
%%time
data.meta

CPU times: user 314 ms, sys: 141 ms, total: 456 ms
Wall time: 10.8 s


| dataset_id | database | download_url | info_url | is_prediction | geospatial_lat_min | geospatial_lat_max | geospatial_lon_min | geospatial_lon_max | time_coverage_start | time_coverage_end | defaultDataQuery | subsetVariables | keywords | id | infoUrl | institution | featureType | source | sourceUrl | variable names |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gov_noaa_water_tnua2 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/gov_... | False | 60.578154 | 60.578154 | -165.269186 | -165.269186 | 2015-05-05T13:00:00Z | 2021-08-19T10:00:00Z | z,time,height_geoid_local_station_datum,water_... |  |  | 49391 | https://sensors.ioos.us/#metadata/49391/station | NOAA Water Resources Regions, National Weather... | TimeSeries |  | https://sensors.axds.co/api/ |  |
| gov_noaa_nws_pate | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/gov_... | False | 65.233333 | 65.233333 | -166.333333 | -166.333333 | 2015-05-05T12:58:00Z | 2021-08-19T09:56:00Z | air_temperature,z,wind_speed,time,relative_hum... |  |  | 15066 | https://sensors.ioos.us/#metadata/15066/station | NOAA National Weather Service (NWS) | TimeSeries |  | https://sensors.axds.co/api/ |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| noaa_nos_co_ops_9462578 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | True | 53.678300 | 53.678300 | -166.832000 | -166.832000 | 2015-05-05T13:29:00Z | 2021-08-26T06:14:00Z | sea_surface_height_amplitude_due_to_geocentric... |  |  | 15494 | https://sensors.ioos.us/#metadata/15494/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries |  | https://sensors.axds.co/api/ |  |
| noaa_nos_co_ops_9461081 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | True | 51.603300 | 51.603300 | -178.617000 | -178.617000 | 2015-05-05T20:57:00Z | 2021-08-26T06:43:00Z | sea_surface_height_amplitude_due_to_geocentric... |  |  | 15543 | https://sensors.ioos.us/#metadata/15543/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries |  | https://sensors.axds.co/api/ |  |


In [7]:
%%time
data['noaa_nos_co_ops_nmta2']

CPU times: user 336 ms, sys: 109 ms, total: 445 ms
Wall time: 22.8 s


### One variable in one server

In [8]:
%%time

# set up the data search object
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap,
                   erddap={'known_server': 'ioos', 'variables': 'sea_water_temperature'})

# look up dataset_ids
print(data.dataset_ids[:5], len(data.dataset_ids))

['noaa_nos_co_ops_unla2', 'wmo_46035', 'wmo_46073', 'noaa_nos_co_ops_kgca2', 'noaa_nos_co_ops_9459450'] 26
CPU times: user 82.3 ms, sys: 19.5 ms, total: 102 ms
Wall time: 761 ms


In [9]:
%%time
data.meta

CPU times: user 31 ms, sys: 5.5 ms, total: 36.5 ms
Wall time: 1.01 s


| dataset_id | database | download_url | info_url | is_prediction | geospatial_lat_min | geospatial_lat_max | geospatial_lon_min | geospatial_lon_max | time_coverage_start | time_coverage_end | defaultDataQuery | subsetVariables | keywords | id | infoUrl | institution | featureType | source | sourceUrl | variable names |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| noaa_nos_co_ops_9464212 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | True | 57.125278 | 57.125278 | -170.285278 | -170.285278 | 2002-04-12T22:00:00Z | 2021-08-26T05:00:00Z | wind_speed_qc_agg,sea_surface_height_amplitude... |  |  | 12016 | https://sensors.ioos.us/#metadata/12016/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries |  | https://tidesandcurrents.noaa.gov/api/ | [sea_water_temperature] |
| noaa_nos_co_ops_snda2 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | False | 55.337000 | 55.337000 | -160.502000 | -160.502000 | 2015-05-05T12:06:00Z | 2021-08-19T10:48:00Z | air_temperature,wind_speed_of_gust,sea_water_t... |  |  | 13824 | https://sensors.ioos.us/#metadata/13824/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries |  | https://sensors.axds.co/api/ | [sea_water_temperature] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| wmo_46035 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/wmo_... | False | 57.026000 | 57.026000 | -177.738000 | -177.738000 | 2015-05-05T14:50:00Z | 2021-08-19T11:00:00Z | wind_speed_of_gust,sea_surface_swell_wave_to_d... |  |  | 13841 | https://sensors.ioos.us/#metadata/13841/station | NOAA National Data Buoy Center (NDBC) | TimeSeries |  | https://sensors.axds.co/api/ | [sea_water_temperature] |
| noaa_nos_co_ops_unla2 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | False | 53.879000 | 53.879000 | -166.540000 | -166.540000 | 2015-05-05T12:30:00Z | 2021-08-19T10:48:00Z | air_temperature,wind_speed_of_gust,sea_water_t... |  |  | 13808 | https://sensors.ioos.us/#metadata/13808/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries |  | https://sensors.axds.co/api/ | [sea_water_temperature] |


In [10]:
%%time
data['noaa_nos_co_ops_9459450']

CPU times: user 493 ms, sys: 178 ms, total: 670 ms
Wall time: 14 s


## Use Local Files

Local files can easily be read into the gateway; under the hood, this uses the Python package `intake`. The gateway automatically recognizes `csv` and `netcdf` files and reads them in.
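A simple way to "recognize" a file type is by its extension. The following sketch maps extensions to the intake driver names `csv` (bundled with intake) and `netcdf` (provided by the intake-xarray plugin). This notebook does not show how `ocean_data_gateway` actually detects file types, so the mapping and the hypothetical `pick_driver` helper are assumptions.

```python
from pathlib import Path

# Assumed mapping from file extension to intake driver name.
DRIVERS = {".csv": "csv", ".nc": "netcdf", ".nc4": "netcdf"}

def pick_driver(filename):
    """Hypothetical helper: choose an intake driver name from a file's extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return DRIVERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None

for name in ["ANIMctd14.nc", "SBE16plus_01604787_2015_08_09_final.csv"]:
    print(name, "->", pick_driver(name))
```

Checking the extension is cheap, but it trusts file names. A more defensive reader could inspect file contents instead; for example, netCDF files begin with a recognizable magic number.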

In [11]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc',
             '/Users/kthyng/Downloads/Harrison_Bay_CTD_MooringData_2014-2015/Harrison_Bay_data/SBE16plus_01604787_2015_08_09_final.csv']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

In [12]:
data.meta

| dataset_id | coords | geospatial_lat_max | time_variable | catalog_dir | time_coverage_start | geospatial_lon_min | lat_variable | download_url | geospatial_lon_max | variables | time_coverage_end | geospatial_lat_min | lon_variable |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ANIMctd14.nc | [time, lat, lon, pressure] | 71.488255 | time | /Users/kthyng/.ocean_data_gateway/catalogs/ | 2014-07-31T15:33:33.999999314 | -152.581114 | lat | /Users/kthyng/Downloads/ANIMIDA_III_BeaufortSe... | -141.717438 | [station_name, sal, tem, fluoro, turbidity, PA... | 2014-08-07T21:35:54.000004381 | 69.850874 | lon |
| SBE16plus_01604787_2015_08_09_final.csv |  | 70.6349 |  | /Users/kthyng/.ocean_data_gateway/catalogs/ | 2014-08-01T12:00:05Z | -150.237 |  | /Users/kthyng/Downloads/Harrison_Bay_CTD_Moori... | -150.237 | [time, latitude, longitude, water_depth, Condu... | 2015-08-09T06:00:05Z | 70.6349 |  |

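In the CSV row of `data.meta`, the latitude/longitude bounds and time coverage match the first and last rows of the file shown in the next output. The following is a minimal standard-library sketch of deriving such a summary. It assumes columns named `time`, `latitude`, and `longitude`, as in this file. `summarize_csv` is hypothetical and not necessarily how `ocean_data_gateway` fills in its metadata.

```python
import csv
import io
from datetime import datetime

def _parse_time(ts):
    # Timestamps end in 'Z'; fromisoformat (before Python 3.11) needs '+00:00'.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def summarize_csv(text, time_col="time", lat_col="latitude", lon_col="longitude"):
    """Hypothetical helper: compute bounds and time coverage from CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    lats = [float(r[lat_col]) for r in rows]
    lons = [float(r[lon_col]) for r in rows]
    # Sort by parsed time but report the original timestamp strings.
    times = sorted((r[time_col] for r in rows), key=_parse_time)
    return {
        "geospatial_lat_min": min(lats),
        "geospatial_lat_max": max(lats),
        "geospatial_lon_min": min(lons),
        "geospatial_lon_max": max(lons),
        "time_coverage_start": times[0],
        "time_coverage_end": times[-1],
    }

# First two and last two rows of the CSV file, reduced to a few columns.
sample_csv = """time,latitude,longitude,water_depth
2014-08-01T12:00:05Z,70.6349,-150.237,13.0
2014-08-01T13:00:05Z,70.6349,-150.237,13.0
2015-08-09T05:00:05Z,70.6349,-150.237,13.0
2015-08-09T06:00:05Z,70.6349,-150.237,13.0
"""
print(summarize_csv(sample_csv))
```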

In [13]:
data['ANIMctd14.nc']

In [14]:
data['SBE16plus_01604787_2015_08_09_final.csv']

|  | time | latitude | longitude | water_depth | Conductivity_[S/m] | Pressure_[db] | Temperature_ITS90_[deg C] | Salinity_Practical_[PSU] | Voltage0_[volts] | Instrument_Time_[juliandays] | flag |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2014-08-01T12:00:05Z | 70.6349 | -150.237 | 13.0 | 2.495646 | 12.687 | -1.4619 | 31.0905 | 0.3091 | 213.500058 | 0.0 |
| 1 | 2014-08-01T13:00:05Z | 70.6349 | -150.237 | 13.0 | 2.495454 | 12.699 | -1.4595 | 31.0854 | 0.3265 | 213.541725 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8945 | 2015-08-09T05:00:05Z | 70.6349 | -150.237 | 13.0 | 2.591448 | 12.777 | 0.3619 | 30.5086 | 0.3873 | 586.208391 | 0.0 |
| 8946 | 2015-08-09T06:00:05Z | 70.6349 | -150.237 | 13.0 | 2.585462 | 12.754 | 0.2862 | 30.5062 | 0.2441 | 586.250058 | 0.0 |


## Data QC

The user can lightly QC the data by calling `data.qc()`, as demonstrated here. A summary of the results is printed if requested (`verbose=True`). For each variable that is checked, a companion variable holding the QC flags (named `<variable>_qc`, e.g. `tem_qc`) is created.
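This notebook does not show which tests `data.qc()` runs or which thresholds it uses. The flag values in the summary (1 GOOD, 2 UNKNOWN, 3 SUSPECT, 4 FAIL, 9 MISSING) follow the IOOS QARTOD conventions. As an illustration only, the following sketch implements a QARTOD-style gross range test on a plain dict of lists and stores the flags under the `<variable>_qc` naming described above. The thresholds, the data, and the helpers `gross_range_flags` and `qc_variable` are all hypothetical. IOOS's `ioos_qc` package provides full implementations of the QARTOD tests.

```python
from collections import Counter

# QARTOD flag conventions, matching the labels in the verbose summary.
FLAG_LABELS = {1: "GOOD", 2: "UNKNOWN", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}

def gross_range_flags(values, fail_span, suspect_span):
    """Flag each value: 9 if missing, 4 outside fail_span, 3 outside suspect_span, else 1."""
    flags = []
    for value in values:
        if value is None:
            flags.append(9)
        elif not fail_span[0] <= value <= fail_span[1]:
            flags.append(4)
        elif not suspect_span[0] <= value <= suspect_span[1]:
            flags.append(3)
        else:
            flags.append(1)
    return flags

def qc_variable(dataset, name, fail_span, suspect_span, verbose=True):
    """Add a `<name>_qc` entry to a dict-of-lists dataset and optionally print a summary."""
    dataset[f"{name}_qc"] = gross_range_flags(dataset[name], fail_span, suspect_span)
    if verbose:
        counts = Counter(dataset[f"{name}_qc"])
        print(f"{name}_qc")
        for flag, label in FLAG_LABELS.items():
            print(f"Flag == {flag} ({label}): {counts[flag]}")
    return dataset

# Hypothetical temperatures (deg C) and thresholds, for illustration only.
sample = {"tem": [-1.5, 0.3, 12.0, 45.0, None]}
qc_variable(sample, "tem", fail_span=(-2.5, 40.0), suspect_span=(-2.0, 10.0))
```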

In [15]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

In [16]:
data['ANIMctd14.nc']

In [17]:
data_qc = data.qc(verbose=True)

ANIMctd14.nc
tem_qc
Flag == 4 (FAIL): 74825
Flag == 1 (GOOD): 15634
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
sal_qc
Flag == 4 (FAIL): 75119
Flag == 1 (GOOD): 15340
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0

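The counts are easier to compare as proportions. For both `tem_qc` and `sal_qc`, roughly 83% of the 90,459 values were flagged FAIL. The following sketch parses the printed summary (copied as text) and computes those shares. `parse_qc_summary` is a hypothetical helper. Computing the same numbers from `data_qc` directly would depend on its structure, which this notebook does not show.

```python
import re

# The verbose summary printed for ANIMctd14.nc, copied as text.
SUMMARY = """\
tem_qc
Flag == 4 (FAIL): 74825
Flag == 1 (GOOD): 15634
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
sal_qc
Flag == 4 (FAIL): 75119
Flag == 1 (GOOD): 15340
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
"""

FLAG_LINE = re.compile(r"Flag == (\d+) \((\w+)\): (\d+)")

def parse_qc_summary(text):
    """Return {qc_variable: {label: count}} from a printed QC summary."""
    counts, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        match = FLAG_LINE.fullmatch(line)
        if match:
            counts[current][match.group(2)] = int(match.group(3))
        elif line:
            # Any other non-empty line starts a new variable block, e.g. "tem_qc".
            current = line
            counts[current] = {}
    return counts

for name, by_label in parse_qc_summary(SUMMARY).items():
    total = sum(by_label.values())
    share = by_label["FAIL"] / total
    print(f"{name}: {by_label['FAIL']} of {total} values flagged FAIL ({share:.1%})")
# tem_qc: 74825 of 90459 values flagged FAIL (82.7%)
# sal_qc: 75119 of 90459 values flagged FAIL (83.0%)
```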

In [18]:
data_qc['ANIMctd14.nc']