# Quick Start Demo of Ocean Data Gateway

Goal: to search for ocean datasets and read them in easily. The package we've written for this is called `ocean_data_gateway`, and this notebook gives a short demo.

In [1]:
import ocean_data_gateway as odg
import pandas as pd
pd.set_option('display.max_rows', 5)

## Find Data in a Region

Here we will search for data in the Bering Sea region.

In [2]:
kw = {
    "min_lon": -180,
    "max_lon": -158,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}
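A malformed bounding box or time range is easy to miss until a search comes back empty. As a minimal sketch, the helper below checks the values before they are handed to the gateway. `make_region_kw` is a hypothetical name, not part of `ocean_data_gateway`, and the `%Y-%m-%d` date format is an assumption based on the strings used above.

```python
from datetime import datetime


def make_region_kw(min_lon, max_lon, min_lat, max_lat, min_time, max_time):
    """Hypothetical helper (not part of ocean_data_gateway).

    Validates a longitude/latitude/time box and returns a dict with the
    same keys as the `kw` dictionary above.
    """
    # Assumes the box does not cross the antimeridian.
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")

    # Assumed date format; strptime raises ValueError on unparseable input.
    start = datetime.strptime(min_time, "%Y-%m-%d")
    end = datetime.strptime(max_time, "%Y-%m-%d")
    if start > end:
        raise ValueError("min_time must not be later than max_time")

    return {
        "min_lon": min_lon,
        "max_lon": max_lon,
        "min_lat": min_lat,
        "max_lat": max_lat,
        "min_time": min_time,
        "max_time": max_time,
    }


# Same Bering Sea box and dates as in the cell above.
kw_checked = make_region_kw(-180, -158, 50, 66, "2021-4-1", "2021-4-2")
```

The returned dictionary has the same keys and values as `kw`, so it could be passed to the gateway in the same way.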

### All the servers

Set up the search object, `data`, then run an initial metadata search to find the `dataset_ids` of the relevant datasets. For now we search for all variables.

In [3]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region')

# find dataset_ids to make sure it works
data.dataset_ids[0][:5]


CPU times: user 5.92 s, sys: 519 ms, total: 6.44 s
Wall time: 1min 33s


['noaa_nos_co_ops_9461132',
 'noaa_nos_co_ops_9462694',
 'noaa_nos_co_ops_9468039',
 'noaa_nos_co_ops_9468333',
 'gov_usgs_waterdata_15304010']

The search collected `dataset_ids` from each of the 5 readers. The number of datasets found by each reader is:

In [4]:
for dataset_ids in data.dataset_ids:
    print(len(dataset_ids))

224
203
1
22
0


This search covers 2 ERDDAP servers (the user can add more), 2 Axiom databases, and any known local files.
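Because `data.dataset_ids` holds one list of IDs per reader, the per-reader counts can be condensed into a quick summary. A minimal sketch follows. It uses a stand-in nested list built from the counts printed above rather than a live search, and `summarize_readers` is a hypothetical helper, not part of the package.

```python
def summarize_readers(dataset_ids):
    """Hypothetical helper: summarize a list of per-reader dataset_id lists."""
    counts = [len(ids) for ids in dataset_ids]
    return {
        "counts": counts,
        "total": sum(counts),
        # Positions of readers that returned no datasets.
        "empty_readers": [i for i, n in enumerate(counts) if n == 0],
    }


# Stand-in for data.dataset_ids: placeholder IDs with the same lengths
# as the counts printed above.
printed_counts = [224, 203, 1, 22, 0]
fake_dataset_ids = [
    [f"reader{i}_dataset{j}" for j in range(n)]
    for i, n in enumerate(printed_counts)
]

summary = summarize_readers(fake_dataset_ids)
print(summary["total"])          # 450
print(summary["empty_readers"])  # [4]
```

With a real search, `summarize_readers(data.dataset_ids)` would presumably work the same way, assuming `dataset_ids` stays a list of lists as shown above.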

### Just one server

Since that search took about 1.5 minutes just to find the `dataset_ids`, let's narrow down which databases are checked. Here we use only the ERDDAP reader with the known server `ioos`.

In [5]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap, erddap={'known_server': 'ioos'})

# look up dataset_ids
print(data.dataset_ids[0][:5], len(data.dataset_ids[0]))

['noaa_nos_co_ops_9461132', 'noaa_nos_co_ops_9462694', 'noaa_nos_co_ops_9468039', 'noaa_nos_co_ops_9468333', 'gov_usgs_waterdata_15304010'] 224
CPU times: user 20.1 ms, sys: 7.45 ms, total: 27.5 ms
Wall time: 930 ms


In [6]:
%%time
data.meta[0]

CPU times: user 306 ms, sys: 123 ms, total: 429 ms
Wall time: 11 s


Unnamed: 0,database,download_url,info_url,is_prediction,geospatial_lat_min,geospatial_lat_max,geospatial_lon_min,geospatial_lon_max,time_coverage_start,time_coverage_end,defaultDataQuery,subsetVariables,keywords,id,infoUrl,institution,featureType,source,sourceUrl,variable names
noaa_nos_co_ops_9462955,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,54.85830,54.85830,-163.407,-163.407,2014-05-24T00:00:00Z,2021-08-03T15:00:00Z,"sea_surface_height_above_sea_level_geoid_mllw,...",,,15481,https://sensors.ioos.us/#metadata/15481/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://tidesandcurrents.noaa.gov/api/,
noaa_nos_co_ops_9461341,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,51.78000,51.78000,-176.807,-176.807,2015-05-05T21:19:00Z,2021-08-03T08:56:00Z,sea_surface_height_amplitude_due_to_geocentric...,,,15535,https://sensors.ioos.us/#metadata/15535/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://sensors.axds.co/api/,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
noaa_nos_co_ops_9462694,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,54.13328,54.13328,-165.800,-165.800,2009-03-07T04:00:00Z,2021-08-03T15:00:00Z,"sea_surface_height_above_sea_level_geoid_mllw,...",,,15656,https://sensors.ioos.us/#metadata/15656/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://tidesandcurrents.noaa.gov/api/,
noaa_nos_co_ops_9461132,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,51.67170,51.67170,-178.045,-178.045,2015-05-05T20:01:00Z,2021-08-03T08:07:00Z,sea_surface_height_amplitude_due_to_geocentric...,,,15542,https://sensors.ioos.us/#metadata/15542/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://sensors.axds.co/api/,


In [7]:
%%time
data_out = data.data[0]

CPU times: user 16 µs, sys: 11 µs, total: 27 µs
Wall time: 29.1 µs


In [8]:
data_out('noaa_nos_co_ops_nmta2')

### One variable in one server

In [9]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap, 
                   erddap={'known_server': 'ioos', 'variables': 'sea_water_temperature'})

# look up dataset_ids
print(data.dataset_ids[0][:5], len(data.dataset_ids[0]))

['noaa_nos_co_ops_kgca2', 'noaa_nos_co_ops_9459881', 'noaa_nos_co_ops_9468333', 'gov_usgs_waterdata_15304010', 'noaa_nos_co_ops_9462620'] 26
CPU times: user 78.8 ms, sys: 14.1 ms, total: 92.9 ms
Wall time: 802 ms


In [10]:
%%time
data.meta[0]

CPU times: user 33.1 ms, sys: 5.92 ms, total: 39.1 ms
Wall time: 969 ms


Unnamed: 0,database,download_url,info_url,is_prediction,geospatial_lat_min,geospatial_lat_max,geospatial_lon_min,geospatial_lon_max,time_coverage_start,time_coverage_end,defaultDataQuery,subsetVariables,keywords,id,infoUrl,institution,featureType,source,sourceUrl,variable names
shageluk-lake-shageluk-ak,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/shag...,False,62.679670,62.679670,-159.560820,-159.560820,2019-09-07T12:00:00Z,2021-05-27T12:00:00Z,"air_temperature,sea_water_temperature,z,time&t...",,,100180,https://sensors.ioos.us/#metadata/100180/station,Fresh Eyes on Ice,TimeSeriesProfile,,https://app.beadedstream.com/projects/7604/sit...,[sea_water_temperature]
noaa_nos_co_ops_adka2,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,False,51.863000,51.863000,-176.632000,-176.632000,2015-05-05T12:42:00Z,2021-07-27T09:30:00Z,"air_temperature,wind_speed_of_gust,sea_water_t...",,,13820,https://sensors.ioos.us/#metadata/13820/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://sensors.axds.co/api/,[sea_water_temperature]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
noaa_nos_co_ops_9459881,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,55.059889,55.059889,-162.326111,-162.326111,2005-08-26T18:00:00Z,2021-08-03T15:00:00Z,"wind_speed_qc_agg,sea_surface_height_amplitude...",,,12010,https://sensors.ioos.us/#metadata/12010/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://tidesandcurrents.noaa.gov/api/,[sea_water_temperature]
noaa_nos_co_ops_kgca2,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,False,55.062000,55.062000,-162.327000,-162.327000,2015-05-05T12:12:00Z,2021-07-27T09:00:00Z,"air_temperature,wind_speed_of_gust,sea_water_t...",,,13807,https://sensors.ioos.us/#metadata/13807/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://sensors.axds.co/api/,[sea_water_temperature]


In [11]:
%%time
data_out = data.data[0]

CPU times: user 16 µs, sys: 0 ns, total: 16 µs
Wall time: 18.4 µs


In [12]:
data_out('noaa_nos_co_ops_9459450')

## Use Local Files

Local files can be read through the gateway, which uses the Python package `intake` under the hood. It is set up to automatically recognize `csv` and `netcdf` files and read them in.

In [13]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc',
             '/Users/kthyng/Downloads/Harrison_Bay_CTD_MooringData_2014-2015/Harrison_Bay_data/SBE16plus_01604787_2015_08_09_final.csv']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})
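How the package tells the two file types apart is not shown in this demo. One plausible approach, sketched here purely as an assumption, is to dispatch on the file extension. `guess_file_kind` and its extension table are hypothetical and are not the package's actual logic.

```python
from pathlib import Path

# Assumed mapping from file extension to file kind (illustrative only).
KIND_BY_SUFFIX = {
    ".csv": "csv",
    ".nc": "netcdf",
}


def guess_file_kind(filename):
    """Hypothetical helper: guess 'csv' or 'netcdf' from the file extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return KIND_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unrecognized file type: {filename!r}") from None


# Base names of the two files used above.
for name in ["ANIMctd14.nc", "SBE16plus_01604787_2015_08_09_final.csv"]:
    print(name, "->", guess_file_kind(name))
```

A check like this only inspects the name, so a mislabeled file would still fail when it is actually read.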

In [14]:
data.meta[0]

Unnamed: 0,geospatial_lat_max,coords,geospatial_lat_min,time_variable,variables,lon_variable,geospatial_lon_min,lat_variable,time_coverage_start,download_url,catalog_dir,geospatial_lon_max,time_coverage_end
ANIMctd14.nc,71.488255,"[time, lat, lon, pressure]",69.850874,time,"[station_name, sal, tem, fluoro, turbidity, PA...",lon,-152.581114,lat,2014-07-31T15:33:33.999999314,/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSe...,/Users/kthyng/.ocean_data_gateway/catalogs/,-141.717438,2014-08-07T21:35:54.000004381
SBE16plus_01604787_2015_08_09_final.csv,70.6349,,70.6349,,"[time, latitude, longitude, water_depth, Condu...",,-150.237,,2014-08-01T12:00:05Z,/Users/kthyng/Downloads/Harrison_Bay_CTD_Moori...,/Users/kthyng/.ocean_data_gateway/catalogs/,-150.237,2015-08-09T06:00:05Z


In [17]:
data.data[0]('ANIMctd14.nc')

In [18]:
data.data[0]('SBE16plus_01604787_2015_08_09_final.csv')

Unnamed: 0,time,latitude,longitude,water_depth,Conductivity_[S/m],Pressure_[db],Temperature_ITS90_[deg C],Salinity_Practical_[PSU],Voltage0_[volts],Instrument_Time_[juliandays],flag
0,2014-08-01T12:00:05Z,70.6349,-150.237,13.0,2.495646,12.687,-1.4619,31.0905,0.3091,213.500058,0.0
1,2014-08-01T13:00:05Z,70.6349,-150.237,13.0,2.495454,12.699,-1.4595,31.0854,0.3265,213.541725,0.0
...,...,...,...,...,...,...,...,...,...,...,...
8945,2015-08-09T05:00:05Z,70.6349,-150.237,13.0,2.591448,12.777,0.3619,30.5086,0.3873,586.208391,0.0
8946,2015-08-09T06:00:05Z,70.6349,-150.237,13.0,2.585462,12.754,0.2862,30.5062,0.2441,586.250058,0.0


## Data QC

The user can lightly QC the data by calling `data.qc()`, as demonstrated here. A summary of the results is printed if requested (`verbose=True`), and a variable containing the QC flags (`*_qc`) is created to accompany each variable that is checked in the dataset.

In [19]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

In [20]:
data.data[0]('ANIMctd14.nc')

In [21]:
data_qc = data.qc(verbose=True)

ANIMctd14.nc
tem_qc
Flag == 4 (FAIL): 74825
Flag == 1 (GOOD): 15634
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
sal_qc
Flag == 4 (FAIL): 75119
Flag == 1 (GOOD): 15340
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
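The flag values in the summary above (1 GOOD, 2 UNKNOWN, 3 SUSPECT, 4 FAIL, 9 MISSING) can be tallied and used to filter a variable by hand. The sketch below works on small made-up lists rather than the real `tem` and `tem_qc` arrays, and `summarize_flags` and `keep_good` are hypothetical helpers, not part of `ocean_data_gateway`.

```python
from collections import Counter

# Flag meanings as printed in the QC summary above.
FLAG_NAMES = {1: "GOOD", 2: "UNKNOWN", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def summarize_flags(flags):
    """Hypothetical helper: count how often each named flag occurs."""
    counts = Counter(flags)
    return {name: counts.get(value, 0) for value, name in FLAG_NAMES.items()}


def keep_good(values, flags):
    """Hypothetical helper: keep only values whose flag is 1 (GOOD)."""
    return [v for v, f in zip(values, flags) if f == 1]


# Made-up temperatures and flags, for illustration only.
tem = [-1.46, 35.0, -1.45, 0.36]
tem_qc = [1, 4, 1, 1]

print(summarize_flags(tem_qc))
print(keep_good(tem, tem_qc))
```

Filtering on the GOOD flag alone is a deliberately strict choice; whether to also keep SUSPECT or UNKNOWN values depends on the analysis.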


In [23]:
data_qc[0]['ANIMctd14.nc']