# Quick Start Demo of Ocean Data Gateway

Goal: search for ocean datasets and read them in easily. The package we've written for this is called `ocean_data_gateway`, and this notebook is a short demo of it.

In [1]:
import ocean_data_gateway as odg
import pandas as pd
pd.set_option('display.max_rows', 5)

## Find Data in a Region

Here we will search for data in the Bering Sea region.

In [2]:
kw = {
    "min_lon": -180,
    "max_lon": -158,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}
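
Before sending a region search to remote servers, it can help to sanity-check the bounding box and time range locally. The sketch below is an illustration only: `check_region` is a hypothetical helper, not part of `ocean_data_gateway`, and it assumes longitudes in the -180 to 180 convention and dates written as year-month-day, as in the dictionary above. The package itself may accept other conventions.

```python
from datetime import datetime, timedelta


def check_region(kw):
    """Validate a region dictionary and return the length of its time window.

    Hypothetical helper for illustration; raises ValueError on a malformed region.
    """
    if not -180 <= kw["min_lon"] < kw["max_lon"] <= 180:
        raise ValueError("expected -180 <= min_lon < max_lon <= 180")
    if not -90 <= kw["min_lat"] < kw["max_lat"] <= 90:
        raise ValueError("expected -90 <= min_lat < max_lat <= 90")

    # strptime accepts months and days without zero padding, e.g. '2021-4-1'.
    start = datetime.strptime(kw["min_time"], "%Y-%m-%d")
    end = datetime.strptime(kw["max_time"], "%Y-%m-%d")
    if start > end:
        raise ValueError("min_time must not be after max_time")
    return end - start


kw = {
    "min_lon": -180,
    "max_lon": -158,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": "2021-4-1",
    "max_time": "2021-4-2",
}
window = check_region(kw)
print(window)
```

For the Bering Sea region used here, the check passes and the time window is one day.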

### All the servers

Set up the search object, `data`, then run an initial metadata search to find the `dataset_ids` of the relevant datasets. No variables are specified, so the search covers all variables.

In [3]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region')

# find dataset_ids to make sure it works
data.dataset_ids[:5]


CPU times: user 5.56 s, sys: 513 ms, total: 6.08 s
Wall time: 1min 35s


['noaa_nos_co_ops_twc2307',
 'noaa_nos_co_ops_twc2311',
 'noaa_nos_co_ops_9468011',
 'noaa_nos_co_ops_twc2321',
 'noaa_nos_co_ops_9468132']

The search collected `dataset_ids` from each of 5 readers. The total number of datasets found across them is:

In [4]:
len(data.dataset_ids)

441

This search covers 2 ERDDAP servers (the user can add more), 2 Axiom databases, and any known local files.

### Just one server

That search took about 1.5 minutes just to find the `dataset_ids`, so let's narrow which databases are checked. Here we use only the ERDDAP reader, pointed at the known server `ioos`.

In [5]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap, erddap={'known_server': 'ioos'})

# look up dataset_ids
print(data.dataset_ids[:5], len(data.dataset_ids))

['noaa_nos_co_ops_twc2307', 'noaa_nos_co_ops_twc2311', 'noaa_nos_co_ops_9468011', 'noaa_nos_co_ops_twc2321', 'noaa_nos_co_ops_9468132'] 224
CPU times: user 19.3 ms, sys: 6.71 ms, total: 26 ms
Wall time: 886 ms

Metadata for all of the datasets found is in `data.meta`, and an individual dataset is read in by indexing `data` with its `dataset_id`:


In [6]:
%%time
data.meta

CPU times: user 313 ms, sys: 123 ms, total: 436 ms
Wall time: 10.9 s


Unnamed: 0,database,download_url,info_url,is_prediction,geospatial_lat_min,geospatial_lat_max,geospatial_lon_min,geospatial_lon_max,time_coverage_start,time_coverage_end,defaultDataQuery,subsetVariables,keywords,id,infoUrl,institution,featureType,source,sourceUrl,variable names
noaa_nos_co_ops_9461264,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,51.7183,51.7183,-177.2000,-177.2000,2015-05-05T19:56:00Z,2021-08-24T13:01:00Z,sea_surface_height_amplitude_due_to_geocentric...,,,15538,https://sensors.ioos.us/#metadata/15538/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://sensors.axds.co/api/,
noaa_nos_co_ops_9462721,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,54.1400,54.1400,-165.5270,-165.5270,2011-07-11T04:00:00Z,2021-08-24T21:00:00Z,"sea_surface_height_above_sea_level_geoid_mllw,...",,,15487,https://sensors.ioos.us/#metadata/15487/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://tidesandcurrents.noaa.gov/api/,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
noaa_nos_co_ops_twc2311,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,52.3667,52.3667,-173.6167,-173.6167,2015-05-05T16:05:00Z,2021-08-24T12:49:00Z,sea_surface_height_amplitude_due_to_geocentric...,,,15549,https://sensors.ioos.us/#metadata/15549/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://sensors.axds.co/api/,
noaa_nos_co_ops_twc2307,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,51.9833,51.9833,-177.5500,-177.5500,2015-05-05T21:27:00Z,2021-08-24T15:04:00Z,sea_surface_height_amplitude_due_to_geocentric...,,,15547,https://sensors.ioos.us/#metadata/15547/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://sensors.axds.co/api/,


In [7]:
%%time
data['noaa_nos_co_ops_nmta2']

CPU times: user 323 ms, sys: 131 ms, total: 455 ms
Wall time: 1.99 s


### One variable in one server

Adding `variables` to the `erddap` options restricts the search to datasets that contain that variable, here `sea_water_temperature`.

In [8]:
%%time

# set up the Gateway search object
data = odg.Gateway(kw=kw, approach='region', readers=odg.erddap,
                   erddap={'known_server': 'ioos', 'variables': 'sea_water_temperature'})

# look up dataset_ids
print(data.dataset_ids[:5], len(data.dataset_ids))

['wmo_46072', 'noaa_nos_co_ops_9461380', 'wmo_46075', 'gov_usgs_waterdata_15565447', 'noaa_nos_co_ops_9462620'] 26
CPU times: user 77 ms, sys: 14.5 ms, total: 91.5 ms
Wall time: 715 ms


In [9]:
%%time
data.meta

CPU times: user 32.4 ms, sys: 6.02 ms, total: 38.4 ms
Wall time: 904 ms


Unnamed: 0,database,download_url,info_url,is_prediction,geospatial_lat_min,geospatial_lat_max,geospatial_lon_min,geospatial_lon_max,time_coverage_start,time_coverage_end,defaultDataQuery,subsetVariables,keywords,id,infoUrl,institution,featureType,source,sourceUrl,variable names
noaa_nos_co_ops_nmta2,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,False,64.500000,64.500000,-165.430000,-165.430000,2015-05-05T12:24:00Z,2021-08-17T15:30:00Z,"air_temperature,wind_speed_of_gust,sea_water_t...",,,13819,https://sensors.ioos.us/#metadata/13819/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://sensors.axds.co/api/,[sea_water_temperature]
gov_usgs_waterdata_15304010,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/gov_...,False,61.889301,61.889301,-158.156842,-158.156842,2015-05-06T20:00:00Z,2021-08-17T15:45:00Z,"air_temperature,river_discharge,sea_water_elec...",,,11606,https://sensors.ioos.us/#metadata/11606/station,USGS National Water Information System (NWIS),TimeSeries,,https://sensors.axds.co/api/,[sea_water_temperature]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
noaa_nos_co_ops_9461380,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/noaa...,True,51.863306,51.863306,-176.632000,-176.632000,1950-03-17T01:00:00Z,2021-08-24T21:00:00Z,"wind_speed_qc_agg,sea_surface_height_amplitude...",,,12011,https://sensors.ioos.us/#metadata/12011/station,NOAA Center for Operational Oceanographic Prod...,TimeSeries,,https://tidesandcurrents.noaa.gov/api/,[sea_water_temperature]
wmo_46072,http://erddap.sensors.ioos.us/erddap,http://erddap.sensors.ioos.us/erddap/tabledap/...,http://erddap.sensors.ioos.us/erddap/info/wmo_...,False,51.663000,51.663000,-172.162000,-172.162000,2015-05-05T13:00:00Z,2021-08-17T15:40:00Z,"wind_speed_of_gust,sea_surface_swell_wave_to_d...",,,13828,https://sensors.ioos.us/#metadata/13828/station,NOAA National Data Buoy Center (NDBC),TimeSeries,,https://sensors.axds.co/api/,[sea_water_temperature]
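
The metadata table can be used to narrow the list of stations before any data is downloaded. The sketch below copies three rows from the table above into plain dictionaries so that it runs without the package. `select_stations` is a hypothetical helper, and it assumes that `is_prediction` marks datasets holding predicted rather than observed values, which the column name suggests but this demo does not confirm.

```python
# Three rows copied from the metadata table above, reduced to a few columns.
# Plain dictionaries stand in for `data.meta` so the sketch runs on its own.
meta_rows = {
    "noaa_nos_co_ops_nmta2": {
        "is_prediction": False,
        "geospatial_lat_min": 64.5,
        "geospatial_lon_min": -165.43,
    },
    "noaa_nos_co_ops_9461380": {
        "is_prediction": True,
        "geospatial_lat_min": 51.863306,
        "geospatial_lon_min": -176.632,
    },
    "wmo_46072": {
        "is_prediction": False,
        "geospatial_lat_min": 51.663,
        "geospatial_lon_min": -172.162,
    },
}


def select_stations(rows, min_lat):
    """Return dataset_ids not marked as predictions and located at or north of min_lat."""
    return [
        dataset_id
        for dataset_id, row in rows.items()
        if not row["is_prediction"] and row["geospatial_lat_min"] >= min_lat
    ]


northern = select_stations(meta_rows, min_lat=60)
print(northern)
```

If `data.meta` is a pandas DataFrame, as its display suggests, the same two conditions translate directly into boolean masks on the `is_prediction` and `geospatial_lat_min` columns.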


In [10]:
%%time
data['noaa_nos_co_ops_9459450']

CPU times: user 424 ms, sys: 153 ms, total: 577 ms
Wall time: 12.4 s


## Use Local Files

Local files can be read into the gateway, which uses the Python package `intake` under the hood. It automatically recognizes `csv` and `netcdf` files and knows how to read them in.

In [11]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc',
             '/Users/kthyng/Downloads/Harrison_Bay_CTD_MooringData_2014-2015/Harrison_Bay_data/SBE16plus_01604787_2015_08_09_final.csv']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})
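
To give a feel for what recognizing a file type can involve, here is a minimal sketch that picks a reader from the file extension. This is an assumption made for illustration, not a description of how `ocean_data_gateway` or `intake` decides: `guess_file_type` and the `READERS` mapping are hypothetical names, and a real implementation may inspect file contents instead.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a reader label.
READERS = {".csv": "csv", ".nc": "netcdf"}


def guess_file_type(filename):
    """Return 'csv' or 'netcdf' based on the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unrecognized file type: {filename}")
    return READERS[suffix]


# Base names of the two files used in this section.
for name in ["ANIMctd14.nc", "SBE16plus_01604787_2015_08_09_final.csv"]:
    print(name, "->", guess_file_type(name))
```

Dispatching on the extension is cheap and needs no file access, but it fails for files with missing or misleading extensions, which is one reason a library might look at the contents instead.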

In [12]:
data.meta

Unnamed: 0,time_coverage_start,time_variable,geospatial_lat_max,geospatial_lat_min,variables,lat_variable,download_url,catalog_dir,geospatial_lon_max,time_coverage_end,coords,geospatial_lon_min,lon_variable
ANIMctd14.nc,2014-07-31T15:33:33.999999314,time,71.488255,69.850874,"[station_name, sal, tem, fluoro, turbidity, PA...",lat,/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSe...,/Users/kthyng/.ocean_data_gateway/catalogs/,-141.717438,2014-08-07T21:35:54.000004381,"[time, lat, lon, pressure]",-152.581114,lon
SBE16plus_01604787_2015_08_09_final.csv,2014-08-01T12:00:05Z,,70.6349,70.6349,"[time, latitude, longitude, water_depth, Condu...",,/Users/kthyng/Downloads/Harrison_Bay_CTD_Moori...,/Users/kthyng/.ocean_data_gateway/catalogs/,-150.237,2015-08-09T06:00:05Z,,-150.237,


In [13]:
data['ANIMctd14.nc']

In [14]:
data['SBE16plus_01604787_2015_08_09_final.csv']

Unnamed: 0,time,latitude,longitude,water_depth,Conductivity_[S/m],Pressure_[db],Temperature_ITS90_[deg C],Salinity_Practical_[PSU],Voltage0_[volts],Instrument_Time_[juliandays],flag
0,2014-08-01T12:00:05Z,70.6349,-150.237,13.0,2.495646,12.687,-1.4619,31.0905,0.3091,213.500058,0.0
1,2014-08-01T13:00:05Z,70.6349,-150.237,13.0,2.495454,12.699,-1.4595,31.0854,0.3265,213.541725,0.0
...,...,...,...,...,...,...,...,...,...,...,...
8945,2015-08-09T05:00:05Z,70.6349,-150.237,13.0,2.591448,12.777,0.3619,30.5086,0.3873,586.208391,0.0
8946,2015-08-09T06:00:05Z,70.6349,-150.237,13.0,2.585462,12.754,0.2862,30.5062,0.2441,586.250058,0.0
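
A quick consistency check on the table above: the first two rows are one hour apart, so if the record is hourly throughout, the first and last timestamps should account for every row. The sketch below does that arithmetic with values copied from the output. Hourly sampling across the elided rows is an assumption, since only four rows are displayed.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%SZ"

# First and last timestamps and the last row index, copied from the table above.
start = datetime.strptime("2014-08-01T12:00:05Z", FMT)
end = datetime.strptime("2015-08-09T06:00:05Z", FMT)
last_index = 8946

# Number of samples expected for a gap-free hourly record (both endpoints included).
expected_samples = (end - start) // timedelta(hours=1) + 1
actual_samples = last_index + 1

print(expected_samples, actual_samples)
```

The two numbers agree, which is consistent with a gap-free hourly record over the deployment.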


## Data QC

The user can lightly QC the data by calling `data.qc()`, as demonstrated here. Pass `verbose=True` to print a summary of the results. A variable containing the QC flags (`*_qc`) is created to go along with each variable used in the dataset.

In [15]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

In [16]:
data['ANIMctd14.nc']

In [17]:
data_qc = data.qc(verbose=True)

ANIMctd14.nc
tem_qc
Flag == 4 (FAIL): 74825
Flag == 1 (GOOD): 15634
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
sal_qc
Flag == 4 (FAIL): 75119
Flag == 1 (GOOD): 15340
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
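
The raw counts are easier to compare as fractions. The sketch below copies the counts and flag names from the summary above into a dictionary and converts them. `flag_fractions` is a hypothetical helper for illustration and is not part of the package.

```python
# Flag counts copied from the verbose QC summary above.
qc_counts = {
    "tem_qc": {4: 74825, 1: 15634, 9: 0, 3: 0, 2: 0},
    "sal_qc": {4: 75119, 1: 15340, 9: 0, 3: 0, 2: 0},
}

# Flag meanings as printed in the summary.
FLAG_NAMES = {1: "GOOD", 2: "UNKNOWN", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def flag_fractions(counts):
    """Convert {flag: count} into {flag name: fraction of all values}."""
    total = sum(counts.values())
    return {FLAG_NAMES[flag]: n / total for flag, n in counts.items()}


for name, counts in qc_counts.items():
    fractions = flag_fractions(counts)
    print(f"{name}: {fractions['FAIL']:.1%} FAIL, {fractions['GOOD']:.1%} GOOD")
```

For both variables, more than 80% of the values are flagged FAIL, so it may be worth inspecting the flagged values before using the flags to mask data.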


In [18]:
data_qc['ANIMctd14.nc']