# Quick Start Demo of Ocean Data Gateway

Goal: to make it easy to search for ocean datasets and read them in. The package we've written for this is called `ocean_data_gateway`; this is a short demo of it.

In [1]:
import ocean_data_gateway as odg
import pandas as pd
pd.set_option('display.max_rows', 5)

## Find Data in a Region

Here we will search for data in the Bering Sea region.

In [2]:
kw = {
    "min_lon": -180,
    "max_lon": -158,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}
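
A reversed bound or a bad date in this dictionary would only surface once the search runs, so it can be worth checking it first. The sketch below is illustrative only: `validate_region` is a hypothetical helper, not part of `ocean_data_gateway`, and it assumes longitudes in [-180, 180], a box that does not cross the dateline, and dates written as year-month-day.

```python
from datetime import datetime


def validate_region(kw):
    """Hypothetical helper (not part of ocean_data_gateway): sanity-check a region dict."""
    problems = []
    # Assumption: longitudes are in [-180, 180] and the box does not cross the dateline.
    if not -180 <= kw["min_lon"] < kw["max_lon"] <= 180:
        problems.append("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not -90 <= kw["min_lat"] < kw["max_lat"] <= 90:
        problems.append("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    # Assumption: dates are year-month-day strings such as '2021-4-1'.
    start = datetime.strptime(kw["min_time"], "%Y-%m-%d")
    end = datetime.strptime(kw["max_time"], "%Y-%m-%d")
    if start >= end:
        problems.append("min_time must be earlier than max_time")
    return problems


kw = {
    "min_lon": -180,
    "max_lon": -158,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": "2021-4-1",
    "max_time": "2021-4-2",
}
problems = validate_region(kw)
print(problems or "region looks consistent")
```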

### All the servers

Set up the search object, `data`, then run an initial metadata search to find the `dataset_ids` of the relevant datasets. No variables are specified here, so the search covers all variables.

In [3]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region')

# find dataset_ids to make sure it works
data.dataset_ids[:5]


CPU times: user 5.69 s, sys: 510 ms, total: 6.2 s
Wall time: 1min 37s


['gov_noaa_water_snka2',
 'noaa_nos_co_ops_9459895',
 'noaa_nos_co_ops_9459758',
 'noaa_nos_co_ops_9459016',
 'noaa_nos_co_ops_9468691']

The search collected `dataset_ids` from each of the 5 readers. The total number of datasets found is:

In [4]:
len(data.dataset_ids)

441

This search covers 2 ERDDAP servers (the user can add more), 2 Axiom databases, and any known local files.
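
With several hundred results from several sources, a quick tally by name prefix can show where the datasets come from. This is a rough sketch using only the five ids printed above; it assumes (without confirmation from the package) that the last underscore-separated token of a `dataset_id` is a station code and that the rest identifies the provider.

```python
from collections import Counter

# First five dataset_ids returned by the search above.
dataset_ids = [
    "gov_noaa_water_snka2",
    "noaa_nos_co_ops_9459895",
    "noaa_nos_co_ops_9459758",
    "noaa_nos_co_ops_9459016",
    "noaa_nos_co_ops_9468691",
]


def provider_prefix(dataset_id):
    # Assumption: everything before the last underscore names the provider.
    return dataset_id.rsplit("_", 1)[0]


counts = Counter(provider_prefix(d) for d in dataset_ids)
print(counts)
```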

### Just one server

Since that search took more than 1.5 minutes of wall time just to find the `dataset_ids`, let's narrow which databases are checked. Here only the ERDDAP reader is used, pointed at the IOOS server.

In [5]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap, erddap={'known_server': 'ioos'})

# look up dataset_ids
print(data.dataset_ids[:5], len(data.dataset_ids))

['gov_noaa_water_snka2', 'noaa_nos_co_ops_9459895', 'noaa_nos_co_ops_9459758', 'noaa_nos_co_ops_9459016', 'noaa_nos_co_ops_9468691'] 224
CPU times: user 19.8 ms, sys: 8.18 ms, total: 28 ms
Wall time: 978 ms
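
For a rough sense of the gain, the two wall times reported by `%%time` can be compared directly. The numbers below are copied from the outputs above; timings like these depend on the network and on server load, so treat the ratio as indicative only.

```python
# Wall times reported by %%time for the two searches above.
all_servers_s = 1 * 60 + 37  # "1min 37s" for all readers
one_server_s = 0.978         # "978 ms" for the IOOS ERDDAP server only

speedup = all_servers_s / one_server_s
print(f"single-server search was about {speedup:.0f}x faster")
```

The narrower search is not equivalent to the full one: it returned 224 datasets, compared with 441 from all readers.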


In [6]:
%%time
data.meta

CPU times: user 312 ms, sys: 126 ms, total: 438 ms
Wall time: 10.5 s


| | database | download_url | info_url | is_prediction | geospatial_lat_min | geospatial_lat_max | geospatial_lon_min | geospatial_lon_max | time_coverage_start | time_coverage_end | defaultDataQuery | subsetVariables | keywords | id | infoUrl | institution | featureType | source | sourceUrl | variable names |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| noaa_nos_co_ops_9459398 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | True | 55.361700 | 55.361700 | -160.360000 | -160.360000 | 2015-05-05T18:15:00Z | 2021-08-26T13:31:00Z | sea_surface_height_amplitude_due_to_geocentric... | | | 15463 | https://sensors.ioos.us/#metadata/15463/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries | | https://sensors.axds.co/api/ | |
| noaa_nos_co_ops_9462620 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | True | 53.879194 | 53.879194 | -166.540306 | -166.540306 | 1982-01-01T11:00:00Z | 2021-08-26T05:00:00Z | wind_speed_qc_agg,sea_surface_height_amplitude... | | | 12014 | https://sensors.ioos.us/#metadata/12014/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries | | https://tidesandcurrents.noaa.gov/api/ | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| noaa_nos_co_ops_9459895 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | True | 55.118300 | 55.118300 | -162.378000 | -162.378000 | 2015-05-05T18:29:00Z | 2021-08-26T13:35:00Z | sea_surface_height_amplitude_due_to_geocentric... | | | 15474 | https://sensors.ioos.us/#metadata/15474/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries | | https://sensors.axds.co/api/ | |
| gov_noaa_water_snka2 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/gov_... | False | 64.564167 | 64.564167 | -165.507222 | -165.507222 | 2015-05-11T18:19:00Z | 2021-08-19T14:15:00Z | river_discharge,z,time,height_geoid_local_stat... | | | 11716 | https://sensors.ioos.us/#metadata/11716/station | NOAA Water Resources Regions, National Weather... | TimeSeries | | https://sensors.axds.co/api/ | |
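
The metadata table can be used to narrow the list of datasets before any data are read. As a minimal sketch, the snippet below keeps only the visible rows whose `is_prediction` entry is `False`. It works on a plain list of dictionaries copied from the table above; the key name `dataset_id` for the index column is chosen here for illustration, and the reading of `is_prediction` as "contains predicted rather than only observed values" is an assumption.

```python
# Visible rows of the metadata table above, reduced to two columns.
rows = [
    {"dataset_id": "noaa_nos_co_ops_9459398", "is_prediction": True},
    {"dataset_id": "noaa_nos_co_ops_9462620", "is_prediction": True},
    {"dataset_id": "noaa_nos_co_ops_9459895", "is_prediction": True},
    {"dataset_id": "gov_noaa_water_snka2", "is_prediction": False},
]

# Keep the datasets that are not flagged as predictions.
observed = [row["dataset_id"] for row in rows if not row["is_prediction"]]
print(observed)
```

With the actual DataFrame, the same selection could probably be written as a boolean mask on `data.meta["is_prediction"]`, assuming that column holds booleans.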


In [7]:
%%time
data['noaa_nos_co_ops_nmta2']

CPU times: user 395 ms, sys: 102 ms, total: 497 ms
Wall time: 1.91 s


### One variable in one server

In [8]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap, 
                   erddap={'known_server': 'ioos', 'variables': 'sea_water_temperature'})

# look up dataset_ids
print(data.dataset_ids[:5], len(data.dataset_ids))

['gov_usgs_waterdata_15297610', 'noaa_nos_co_ops_9468333', 'noaa_nos_co_ops_atka2', 'noaa_nos_co_ops_9461380', 'gov_usgs_waterdata_15302000'] 26
CPU times: user 85.6 ms, sys: 14.4 ms, total: 100 ms
Wall time: 888 ms


In [9]:
%%time
data.meta

CPU times: user 31.6 ms, sys: 6.17 ms, total: 37.8 ms
Wall time: 1.55 s


| | database | download_url | info_url | is_prediction | geospatial_lat_min | geospatial_lat_max | geospatial_lon_min | geospatial_lon_max | time_coverage_start | time_coverage_end | defaultDataQuery | subsetVariables | keywords | id | infoUrl | institution | featureType | source | sourceUrl | variable names |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| noaa_nos_co_ops_9462620 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | True | 53.879194 | 53.879194 | -166.540306 | -166.540306 | 1982-01-01T11:00:00Z | 2021-08-26T05:00:00Z | wind_speed_qc_agg,sea_surface_height_amplitude... | | | 12014 | https://sensors.ioos.us/#metadata/12014/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries | | https://tidesandcurrents.noaa.gov/api/ | [sea_water_temperature] |
| wmo_46073 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/wmo_... | False | 55.031000 | 55.031000 | -172.001000 | -172.001000 | 2015-05-05T12:50:00Z | 2021-08-19T18:00:00Z | wind_speed_of_gust,sea_surface_swell_wave_to_d... | | | 41997 | https://sensors.ioos.us/#metadata/41997/station | NOAA National Data Buoy Center (NDBC) | TimeSeries | | https://sensors.axds.co/api/ | [sea_water_temperature] |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| noaa_nos_co_ops_9468333 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/noaa... | True | 63.871361 | 63.871361 | -160.784300 | -160.784300 | 2011-06-22T21:00:00Z | 2021-08-26T05:00:00Z | wind_speed_qc_agg,sea_surface_height_amplitude... | | | 15668 | https://sensors.ioos.us/#metadata/15668/station | NOAA Center for Operational Oceanographic Prod... | TimeSeries | | https://tidesandcurrents.noaa.gov/api/ | [sea_water_temperature] |
| gov_usgs_waterdata_15297610 | http://erddap.sensors.ioos.us/erddap | http://erddap.sensors.ioos.us/erddap/tabledap/... | http://erddap.sensors.ioos.us/erddap/info/gov_... | False | 55.176906 | 55.176906 | -162.689511 | -162.689511 | 2015-05-05T11:45:00Z | 2021-08-19T17:15:00Z | air_temperature,river_discharge,sea_water_temp... | | | 11662 | https://sensors.ioos.us/#metadata/11662/station | USGS National Water Information System (NWIS) | TimeSeries | | https://sensors.axds.co/api/ | [sea_water_temperature] |


In [10]:
%%time
data['noaa_nos_co_ops_9459450']

CPU times: user 539 ms, sys: 182 ms, total: 721 ms
Wall time: 15.2 s


## Use Local Files

Local files can also be read through the gateway, which uses the Python package `intake` under the hood. It automatically recognizes `csv` and `netcdf` files and reads them in.
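
To illustrate what recognizing a file type can look like, here is a minimal sketch that maps a file suffix to a format name. It is not the package's actual detection logic: `guess_file_kind` and the suffix table are hypothetical, and real detection may inspect file contents as well.

```python
from pathlib import Path

# Assumed mapping from file suffix to format name (illustrative only).
KINDS = {".csv": "csv", ".nc": "netcdf"}


def guess_file_kind(filename):
    """Hypothetical helper: classify a file by its suffix."""
    suffix = Path(filename).suffix.lower()
    return KINDS.get(suffix, "unknown")


for name in ["ANIMctd14.nc", "SBE16plus_01604787_2015_08_09_final.csv", "notes.txt"]:
    print(name, "->", guess_file_kind(name))
```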

In [11]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc',
             '/Users/kthyng/Downloads/Harrison_Bay_CTD_MooringData_2014-2015/Harrison_Bay_data/SBE16plus_01604787_2015_08_09_final.csv']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

In [12]:
data.meta

| | coords | lon_variable | geospatial_lat_max | lat_variable | time_variable | download_url | geospatial_lon_min | time_coverage_start | variables | catalog_dir | geospatial_lat_min | geospatial_lon_max | time_coverage_end |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ANIMctd14.nc | [time, lat, lon, pressure] | lon | 71.488255 | lat | time | /Users/kthyng/Downloads/ANIMIDA_III_BeaufortSe... | -152.581114 | 2014-07-31T15:33:33.999999314 | [station_name, sal, tem, fluoro, turbidity, PA... | /Users/kthyng/.ocean_data_gateway/catalogs/ | 69.850874 | -141.717438 | 2014-08-07T21:35:54.000004381 |
| SBE16plus_01604787_2015_08_09_final.csv | | | 70.6349 | | | /Users/kthyng/Downloads/Harrison_Bay_CTD_Moori... | -150.237 | 2014-08-01T12:00:05Z | [time, latitude, longitude, water_depth, Condu... | /Users/kthyng/.ocean_data_gateway/catalogs/ | 70.6349 | -150.237 | 2015-08-09T06:00:05Z |


In [13]:
data['ANIMctd14.nc']

In [14]:
data['SBE16plus_01604787_2015_08_09_final.csv']

| | time | latitude | longitude | water_depth | Conductivity_[S/m] | Pressure_[db] | Temperature_ITS90_[deg C] | Salinity_Practical_[PSU] | Voltage0_[volts] | Instrument_Time_[juliandays] | flag |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2014-08-01T12:00:05Z | 70.6349 | -150.237 | 13.0 | 2.495646 | 12.687 | -1.4619 | 31.0905 | 0.3091 | 213.500058 | 0.0 |
| 1 | 2014-08-01T13:00:05Z | 70.6349 | -150.237 | 13.0 | 2.495454 | 12.699 | -1.4595 | 31.0854 | 0.3265 | 213.541725 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8945 | 2015-08-09T05:00:05Z | 70.6349 | -150.237 | 13.0 | 2.591448 | 12.777 | 0.3619 | 30.5086 | 0.3873 | 586.208391 | 0.0 |
| 8946 | 2015-08-09T06:00:05Z | 70.6349 | -150.237 | 13.0 | 2.585462 | 12.754 | 0.2862 | 30.5062 | 0.2441 | 586.250058 | 0.0 |
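
The `Instrument_Time_[juliandays]` column appears to be a running day count in which day 1.0 is the start of 1 January 2014. That reading is an inference from the values shown, not something the file or the package states. Under that assumption, the first and last rows can be converted back to timestamps and compared with the `time` column:

```python
from datetime import datetime, timedelta


def day_count_to_datetime(day_count, base_year=2014):
    # Assumption: day 1.0 is midnight at the start of 1 January of base_year.
    return datetime(base_year, 1, 1) + timedelta(days=day_count - 1)


first = day_count_to_datetime(213.500058).replace(microsecond=0)
last = day_count_to_datetime(586.250058).replace(microsecond=0)
print(first.isoformat())
print(last.isoformat())
```

After dropping the fractional seconds, both values agree with the `time` entries of the first and last rows shown, which supports the assumed convention for this file.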


## Data QC

The user can lightly QC the data by calling `data.qc()`, as demonstrated here. Passing `verbose=True` prints a summary of the results. A companion variable holding the QC flags (named `*_qc`) is created alongside each variable that is checked.

In [15]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

In [16]:
data['ANIMctd14.nc']

In [17]:
data_qc = data.qc(verbose=True)

ANIMctd14.nc
tem_qc
Flag == 4 (FAIL): 74825
Flag == 1 (GOOD): 15634
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
sal_qc
Flag == 4 (FAIL): 75119
Flag == 1 (GOOD): 15340
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
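
The raw counts are easier to judge as fractions. The sketch below copies the counts from the summary above into a dictionary and reports the share of points flagged GOOD for each variable; it does not call the package.

```python
# Flag counts copied from the verbose QC summary above.
summary = {
    "tem_qc": {"FAIL": 74825, "GOOD": 15634, "MISSING": 0, "SUSPECT": 0, "UNKNOWN": 0},
    "sal_qc": {"FAIL": 75119, "GOOD": 15340, "MISSING": 0, "SUSPECT": 0, "UNKNOWN": 0},
}

good_fraction = {}
for name, counts in summary.items():
    total = sum(counts.values())
    good_fraction[name] = counts["GOOD"] / total
    print(f"{name}: {counts['GOOD']} of {total} points flagged GOOD ({good_fraction[name]:.1%})")
```

Both variables have the same total number of points, and in each case well under a fifth of them pass, so the failed points deserve a look before the data are used.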


In [18]:
data_qc['ANIMctd14.nc']
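
Once each variable has a matching `*_qc` variable, the flags can be used to mask out values that did not pass. The following is a minimal sketch on plain Python lists with made-up values, assuming only that a flag of 1 means GOOD, as in the summary above. How the masking would be written against the object returned by `data.qc()` depends on its type, which is not shown here.

```python
GOOD = 1  # flag value reported as GOOD in the QC summary


def keep_good(values, flags):
    """Replace every value whose flag is not GOOD with None."""
    return [v if f == GOOD else None for v, f in zip(values, flags)]


# Made-up temperatures and flags, for illustration only.
tem = [-1.46, 35.2, -1.45, 0.36]
tem_qc = [1, 4, 1, 9]

print(keep_good(tem, tem_qc))
```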