# Detailed usage

In [1]:
from cf_xarray.units import units  # isort:skip
import pint_xarray  # isort:skip

pint_xarray.unit_registry = units  # isort:skip
import ocean_data_gateway as odg
import pandas as pd
import xarray as xr
import numpy as np
pd.set_option('display.max_rows', 5)

## General Options

In [2]:
kw = {
    "min_lon": -124.0,
    "max_lon": -123.0,
    "min_lat": 39.0,
    "max_lat": 40.0,
    "min_time": '2021-4-1',
    "max_time": '2021-4-2',
}

### Parallel

You can control readers individually as needed. For example, the keyword `parallel`, which every reader accepts, can be input per reader (in case you want different values for different readers) or for all readers at once by including it at the top level of `kwargs`. Parallel runs use the `joblib` `Parallel` and `delayed` functions with multiple processes, running loops on different cores.

In [3]:
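kwargs = {
    'kw': kw,
    'approach': 'region',
    'parallel': True,  # applies to all readers unless overridden below
    'erddap': {
        'known_server': 'ioos',
        # 'parallel': False,  # uncomment to override for this reader only
        'variables': 'salinity',
    },
    'axds': {
        'catalog_name': None,
        # 'parallel': True,  # uncomment to override for this reader only
        'axds_type': 'platform2',
        'variables': 'Salinity',
    },
}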
data = odg.Gateway(**kwargs)
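The `parallel` option is described above as using `joblib`'s `Parallel` and `delayed`. As a rough illustration of that pattern only (not the package's internal code), the sketch below fans a per-item function out over worker processes. `fetch_metadata` and the list of IDs are hypothetical stand-ins for a per-dataset request.

```python
from joblib import Parallel, delayed


def fetch_metadata(dataset_id):
    """Hypothetical stand-in for a slow per-dataset request."""
    return {"dataset_id": dataset_id, "n_chars": len(dataset_id)}


dataset_ids = ["noaa_nos_co_ops_8771972", "wmo_42020", "tabs_b"]

# Serial version: one call after another.
serial = [fetch_metadata(dataset_id) for dataset_id in dataset_ids]

# Parallel version: delayed() wraps each call so that Parallel can dispatch
# it to a pool of worker processes (joblib's default process-based backend).
parallel = Parallel(n_jobs=2)(
    delayed(fetch_metadata)(dataset_id) for dataset_id in dataset_ids
)

# Parallel returns results in the same order as the inputs.
print(parallel == serial)
```

Parallelism mostly pays off when each call waits on a remote server; for quick calls, the cost of starting worker processes can outweigh the gain, which is one reason you might turn `parallel` off for a particular reader.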

### Reader Choice

Select readers as follows: `odg.erddap` connects to ERDDAP servers, `odg.axds` connects to Axiom databases, and `odg.local` enables easy reading of local files. If you don't input any readers, all of them are used; alternatively, you can input a subset.

In [4]:
readers = [odg.erddap,
           odg.axds,
           odg.local]

Use only the ERDDAP and Axiom readers:

In [5]:
data = odg.Gateway(kw=kw, approach='region', 
                   readers=[odg.erddap,
                            odg.axds])

## Region

Search by time/space region.

### All variables

Omit the `variables` keyword, or set `'variables': None`:

In [6]:
kwargs = {
          'kw': kw, 
          'approach': 'region',
          'readers': [odg.erddap,
                      odg.axds],
          'variables': None
}
data = odg.Gateway(**kwargs)

### By variable(s)

If no `variables` are specified for a given reader, datasets with any variables will be returned from a search. This is most relevant for a `region` search.

However, if you want to specify a variable or variables, keep in mind that different readers have different names for variables, which is why you can't just input a variable name for all the readers. 

Currently this is only relevant for the ERDDAP and Axiom readers (all variables are retained in local files). The Axiom reader of type `platform2` searches by variable using the available variable names, and for type `layer_group`, the `query` method is used for variable searching.

Let's say you want to search for salinity. You can input the base of the word as `variables`, such as "sal" or "salinity", and the code will check that it exactly matches a known variable name. If it doesn't, it throws an error with suggestions. Choose the base carefully: the checker looks for the whole input string within each variable name, so "salt" is a poor choice because most salinity variable names do not contain it. Matches are not selected automatically because a search can also return unwanted names; for example, "soil_salinity" matches "salinity".

You need to do this separately for each `known_server` of the `erddap` reader. Specific variables are only used to filter the `axds` reader for `axds_type='platform2'`; any variable names can be input for the `axds` reader for `axds_type='layer_group'`.
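The check described above can be sketched with `pandas`: look for exact matches among the known names and, if any input is missing, raise an error listing the known names that contain it, most-used first. `check_variables_sketch` and its count table are hypothetical, written only to illustrate the idea; this is not the package's `check_variables` implementation.

```python
import re

import pandas as pd

# Hypothetical table of known variable names and how many datasets use them,
# in the same shape as the "count" tables printed in this document.
known = pd.DataFrame(
    {"count": [954, 954, 778, 622, 622]},
    index=pd.Index(
        ["salinity", "salinity_qc", "sea_water_practical_salinity",
         "soil_salinity", "soil_salinity_qc_agg"],
        name="variable",
    ),
)


def check_variables_sketch(variables, known):
    """Raise if any input name is not an exact match, suggesting close names."""
    if isinstance(variables, str):
        variables = [variables]
    missing = [v for v in variables if v not in known.index]
    if missing:
        # Known names containing any missing input string, most-used first.
        pattern = "|".join(re.escape(v) for v in missing)
        mask = known.index.to_series().str.contains(pattern, regex=True)
        suggestions = known[mask].sort_values("count", ascending=False)
        raise ValueError(
            f"Not exact matches: {missing}. Try some of:\n{suggestions}"
        )


# Exact matches: no news is good news.
check_variables_sketch(["salinity", "sea_water_practical_salinity"], known)

try:
    check_variables_sketch("sal", known)
except ValueError as err:
    print(err)
```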

In [7]:
kwargs = {
    'kw': kw,
    'approach': 'region',
    'stations': '8771972',
    'readers': [odg.erddap,
                odg.axds],
    'erddap': {
        'known_server': ['coastwatch', 'ioos'],
        'variables': [['sal'],   # for coastwatch
                      ['sal']],  # for ioos
    },
    'axds': {
        'axds_type': ['platform2', 'layer_group'],
        'variables': ['sal', 'salinity'],
    },
}


data = odg.Gateway(**kwargs)

AssertionError: The input variables are not exact matches to ok variables for known_server ioos.                      
Check all parameter group values with `ErddapReader().all_variables()`                      
or search parameter group values with `ErddapReader().search_variables(['sal'])`.                     

 Try some of the following variables:
                                              count
variable                                           
salinity                                        954
salinity_qc                                     954
...                                             ...
sea_water_practical_salinity_4161sc_a_qc_agg      1
sea_water_practical_salinity_10091sc_a            1

[1148 rows x 1 columns]

You can do this process iteratively, trying out variables for each of the ERDDAP and Axiom readers until you get what you want. Once you have selected variables that match, the code won't complain anymore.

In [8]:
kwargs = {
    'kw': kw,
    'approach': 'region',
    'readers': [odg.erddap,
                odg.axds],
    'erddap': {
        'known_server': ['coastwatch', 'ioos'],
        # one list of variable names per known_server, in the same order
        'variables': [['salinity', 'sea_water_salinity'],
                      ['salinity', 'sea_water_practical_salinity']],
    },
    'axds': {
        # one variable name per axds_type, in the same order
        'axds_type': ['platform2', 'layer_group'],
        'variables': ['Salinity', 'Salinity'],
    },
}

data = odg.Gateway(**kwargs)

### Actions with variables

Alternatively, you can proactively search for variables for each reader. The ways to call the individual readers currently aren't pretty, but they work. The number of times a variable is used in the system is also included under "count", so you can see which names are popular (many are not widely used).
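As a hedged illustration of how a "count" table like the ones below can be built and searched, the sketch tallies made-up per-dataset variable lists with `pandas.Series.value_counts`. The dataset names and variable lists are invented, and the readers may compute their counts differently.

```python
import pandas as pd

# Made-up variable lists for three hypothetical datasets.
variables_per_dataset = {
    "dataset_a": ["time", "latitude", "longitude", "salinity"],
    "dataset_b": ["time", "latitude", "longitude", "soil_salinity"],
    "dataset_c": ["time", "latitude", "longitude", "salinity", "sst"],
}

# One entry per (dataset, variable) pair, then count datasets per variable.
all_names = pd.Series(
    [name for names in variables_per_dataset.values() for name in names]
)
counts = all_names.value_counts().rename("count").to_frame()
counts.index.name = "variable"

# Alphabetical listing, similar in spirit to all_variables().
alphabetical = counts.sort_index()

# Substring search, most-used first, similar in spirit to search_variables('sal').
is_match = counts.index.to_series().str.contains("sal", regex=False)
sal_matches = counts[is_match].sort_values("count", ascending=False)
print(sal_matches)
```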


#### All available variables

Return all variables for the two ERDDAP `known_server`s, then for the Axiom reader `axds_type='platform2'`.

In [9]:
odg.erddap.ErddapReader(known_server='coastwatch').all_variables().head()

              count
variable
abund_m3          2
ac_line           1
ac_sta            1
adg_412           8
adg_412_bias      8


In [10]:
odg.erddap.ErddapReader(known_server='ioos').all_variables().head()

                                   count
variable
air_pressure                        4028
air_pressure_10011met_a                2
air_pressure_10311ahlm_a               2
air_pressure_10311ahlm_a_qc_agg        1
air_pressure_10311ahlm_a_qc_tests      1


The Axiom reader variables are for `axds_type='platform2'`, not `axds_type='layer_group'`, since the latter are more unique gridded products that don't conform well to a shared variable list.

In [11]:
odg.axds.AxdsReader(axds_type='platform2').all_variables().head()

                                                 count
variable
Ammonium                                            23
Atmospheric Pressure: Air Pressure at Sea Level    362
Atmospheric Pressure: Barometric Pressure         4152
Backscatter Intensity                              286
Battery                                           2705


#### All available variables, sorted by count

In [12]:
odg.erddap.ErddapReader(known_server='coastwatch').search_variables('').head()

           count
variable
time        1637
longitude   1352
latitude    1352
altitude     725
sst          208


In [13]:
odg.erddap.ErddapReader(known_server='ioos').search_variables('').head()

           count
variable
time       38331
longitude  38331
latitude   38331
z          37377
station    37377


In [14]:
odg.axds.AxdsReader(axds_type='platform2').search_variables('').head()

                              count
variable
Stream Height                 19758
Water Surface above Datum     19489
Stream Flow                   15203
Temperature: Air Temperature   8369
Precipitation                  7364


#### Variables search, sorted by count

In [15]:
odg.erddap.ErddapReader(known_server='coastwatch').search_variables('sal').head()

                        count
variable
salinity                   73
salt                        4
sea_water_salinity          4
surface_salinity_trend      2
bucket_salinity             1


In [16]:
odg.erddap.ErddapReader(known_server='ioos').search_variables('sal').head()

                              count
variable
salinity                        954
salinity_qc                     954
sea_water_practical_salinity    778
soil_salinity_qc_agg            622
soil_salinity                   622


In [17]:
odg.axds.AxdsReader(axds_type='platform2').search_variables('sal').head()

Unnamed: 0_level_0,count
variable,Unnamed: 1_level_1
Salinity,3204
Soil Salinity,622


#### Check variables

Finally, you can check that your variables are valid. No news is good news here: nothing is returned when all of the variables match. As a reminder, variables are not checked for the `axds` reader with `axds_type='layer_group'`, because those are searched for in the database by name as a query.

In [18]:
odg.erddap.ErddapReader(known_server='coastwatch').check_variables(['salinity', 'sea_water_salinity'])

In [19]:
odg.erddap.ErddapReader(known_server='ioos').check_variables(['salinity', 'sea_water_practical_salinity'])

In [20]:
odg.axds.AxdsReader(axds_type='platform2').check_variables('Salinity')

Or, all together in one call:

In [21]:
kwargs = {
    'kw': kw,
    'approach': 'region',
    'readers': [odg.erddap,
                odg.axds],
    'erddap': {
        'known_server': ['coastwatch', 'ioos'],
        'variables': [['salinity', 'sea_water_salinity'],
                      ['salinity', 'sea_water_practical_salinity']],
    },
    'axds': {
        'axds_type': ['platform2', 'layer_group'],
        'variables': ['Salinity',
                      'salinity'],  # this one can be called anything that might make a match
    },
}

data = odg.Gateway(**kwargs)

In [22]:
data.dataset_ids

[[],
 [],
 [],
 ['5104d464-8a30-4720-aeb7-57e801844e6e',
  'd359748a-fe78-11e7-8128-0023aeec7b98',
  '99737f5d-c984-4bf0-82cd-18508fea413f',
  '3261285c-e3c9-45fd-b777-e6d681a3eaad']]

## Stations

You can search by either a general station name to be searched for, or by the specific database dataset_id if you know it (from performing a search previously, for example).


### By station name

If you know the names of stations, but they might not be the names used in the particular databases, you can use this approach.

In the following example, I use some station IDs I know off the top of my head. The dataset_ids are returned as a list of lists, one per reader in use, in reader order (ERDDAP IOOS, ERDDAP Coastwatch, Axiom platform2, Axiom layer_group, local reader). The module checks all of the readers for the station names.

There are two listings for the station "SFBOFS" because the database has two: one for unstructured grid output and one for interpolated structured grid output. For `axds_type='layer_group'` searches and stations, the module uuid (not the 'layer_group' uuid) is used as the "dataset_id".

In [23]:
kwargs = {
          'approach': 'stations',
          'stations': ['8771972','SFBOFS','42020','TABS_B']
}
data = odg.Gateway(**kwargs)

In [24]:
data.dataset_ids

[['noaa_nos_co_ops_8771972', None, 'wmo_42020', None],
 [None, None, None, None],
 [],
 ['03158b5d-f712-45f2-b05d-e4954372c1ce',
  '794f7bba-b3d2-4da8-8465-408c27ab433b'],
 []]
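Because `dataset_ids` is a list of lists in reader order, it can help to label each inner list. In a live session the labels could come from `[source.name for source in data.sources]` (the `name` attribute is shown under "Reader Options"); here the IDs are copied by hand from the output above, the labels follow the reader order described in the text, and `"local"` is a placeholder label.

```python
# Output of data.dataset_ids above, copied by hand.
dataset_ids = [
    ["noaa_nos_co_ops_8771972", None, "wmo_42020", None],
    [None, None, None, None],
    [],
    ["03158b5d-f712-45f2-b05d-e4954372c1ce",
     "794f7bba-b3d2-4da8-8465-408c27ab433b"],
    [],
]

# Reader order as described in the text; "local" is a placeholder label.
reader_labels = ["erddap_ioos", "erddap_coastwatch",
                 "axds_platform2", "axds_layer_group", "local"]

# Keep only the IDs that were actually found, keyed by reader.
found = {
    label: [dataset_id for dataset_id in ids if dataset_id is not None]
    for label, ids in zip(reader_labels, dataset_ids)
}
found = {label: ids for label, ids in found.items() if ids}
print(found)
```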

### By Dataset ID

Once we know the database dataset_ids, we can use them directly for future searches. Note that they need to be associated with the correct reader/database, as shown in the call below.

In [25]:
kwargs = {
    'approach': 'stations',
    'erddap': {
        'known_server': 'ioos',
        'dataset_ids': [['tabs_b', 'wmo_42020', 'noaa_nos_co_ops_8771972']],
    },
    'axds': {
        'axds_type': 'layer_group',
        'dataset_ids': '03158b5d-f712-45f2-b05d-e4954372c1ce',
    },
}
data = odg.Gateway(**kwargs)

In [26]:
data.dataset_ids

[['tabs_b', 'wmo_42020', 'noaa_nos_co_ops_8771972'],
 ['03158b5d-f712-45f2-b05d-e4954372c1ce'],
 []]

For `axds_type='layer_group'`, Axiom module uuids should be used as `dataset_ids` (these are returned from the search above in "By station name"). If for some reason you have an Axiom 'layer_group' uuid specifically, input it as a "station" instead. In both cases, the module uuid is returned as the dataset_id because that is how 'layer_group' information is organized.

In [27]:
# Example with module uuid input as dataset_id for 'layer_group'
kwargs = {
    'approach': 'stations',
    'axds': {
        'axds_type': 'layer_group',
        'dataset_ids': '03158b5d-f712-45f2-b05d-e4954372c1ce',
    },
}
data = odg.Gateway(**kwargs)
data.dataset_ids

[[], [], ['03158b5d-f712-45f2-b05d-e4954372c1ce'], []]

In [28]:
# Example with layer_group uuid input as station for 'layer_group'
kwargs = {
    'approach': 'stations',
    'axds': {
        'axds_type': 'layer_group',
        'stations': '04784baa-6be8-4aa7-b039-269f35e92e91',
    },
}
data = odg.Gateway(**kwargs)
data.dataset_ids

[[], [], ['03158b5d-f712-45f2-b05d-e4954372c1ce'], []]

### Include Time Range

By default, the full available time range will be returned for each dataset unless the user specifies one to narrow the returned datasets in time.

The data defined in the previous cell use this long default time range for every source. There are 4 sources, as you can tell from the 4 elements of the `dataset_ids` list printed in the previous cell.


In [29]:
data.sources[0].kw

{'min_time': '1900-01-01', 'max_time': '2100-12-31'}

When a time range is specified, it is used instead:

In [30]:
kwargs = {
          'kw': {'min_time': '2017-1-1', 
                 'max_time': '2017-1-2'},
          'approach': 'stations',
          'stations': ['8771972']
}
data = odg.Gateway(**kwargs)
data.sources[0].kw

{'min_time': '2017-1-1', 'max_time': '2017-1-2'}
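The default window shown above ('1900-01-01' to '2100-12-31') behaves like a fallback for time bounds the user does not give. One simple way to get that behavior is a dictionary merge in which user keys override the defaults; `resolve_kw` below is a hypothetical helper for illustration, not part of the package, and the package may handle partially specified ranges differently.

```python
import pandas as pd

# Default time window, as printed by data.sources[0].kw above.
DEFAULT_KW = {"min_time": "1900-01-01", "max_time": "2100-12-31"}


def resolve_kw(user_kw=None):
    """Hypothetical helper: user-supplied keys override the defaults."""
    return {**DEFAULT_KW, **(user_kw or {})}


print(resolve_kw())
print(resolve_kw({"min_time": "2017-1-1", "max_time": "2017-1-2"}))

# Dates without zero padding, such as '2017-1-1', still parse unambiguously.
print(pd.Timestamp("2017-1-1"))
```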

## Reader Options

### ERDDAP Reader

By default, the `Gateway` class uses `erddap` with two known servers: IOOS and Coastwatch.

In [31]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.erddap]
}
data = odg.Gateway(**kwargs)
data.sources[0].name, data.sources[1].name

('erddap_ioos', 'erddap_coastwatch')

#### Choose one known server

The user can specify to use just one of these:

In [32]:
kwargs = {
    'kw': kw,
    'approach': 'region',
    'readers': [odg.erddap],
    'erddap': {
        'known_server': ['ioos'],  # or 'coastwatch'
    },
}
data = odg.Gateway(**kwargs)
data.sources[0].name

'erddap_ioos'

#### New ERDDAP Server

You can give the necessary information to use a different ERDDAP server.

In [33]:
kwargs = {
    'kw': kw,
    'approach': 'region',
    'readers': [odg.erddap],
    'erddap': {
        'known_server': 'ifremer',
        'protocol': 'tabledap',
        'server': 'http://www.ifremer.fr/erddap',
    },
}
data = odg.Gateway(**kwargs)

In [34]:
data.dataset_ids

[['ArgoFloats-synthetic-BGC',
  'ArgoFloats',
  'copernicus-fos',
  'OceanGlidersGDACTrajectories']]

### AXDS Reader

By default, the `Gateway` class uses `axds` with two types of data: 'platform2' (like gliders) and 'layer_group' (model output, gridded products).

In [35]:
kwargs = {
          'kw': kw,
          'approach': 'region',
          'readers': [odg.axds]
}
data = odg.Gateway(**kwargs)
data.sources[0].name, data.sources[1].name

('axds_platform2', 'axds_layer_group')

#### Specify AXDS Type

The user can specify to use just one of these:

In [36]:
kwargs = {
    'kw': kw,
    'approach': 'region',
    'readers': [odg.axds],
    'axds': {
        'axds_type': 'platform2',  # or 'layer_group'
    },
}
data = odg.Gateway(**kwargs)
data.sources[0].name

'axds_platform2'

### Local Files

I can't remember exactly how I got these files from a portal, but they are only meant to be sample files. Hopefully the local reader works reasonably well with other files too.

The `region` and `stations` approaches don't add much for local files, since a user would usually only input filenames they already know they want to use. These approaches could be useful if the user has a collection of files or an existing catalog and wants the code to filter it down. That code is not in place, but could be added if that is a good use case.

So, currently, it doesn't matter which approach is used for local files. If nothing is input, a default `kw` and the default `region` approach are used, which is fine here since neither affects local files.

In [37]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc',
             '/Users/kthyng/Downloads/Harrison_Bay_CTD_MooringData_2014-2015/Harrison_Bay_data/SBE16plus_01604787_2015_08_09_final.csv']

data = odg.Gateway(readers=odg.local, local={'filenames': filenames})

You can look at the metadata or the data:

In [38]:
data.meta[0]

Unnamed: 0,lon_variable,geospatial_lat_max,geospatial_lon_max,time_variable,geospatial_lat_min,coords,catalog_dir,lat_variable,geospatial_lon_min,download_url,time_coverage_end,time_coverage_start,variables
ANIMctd14.nc,lon,71.488255,-141.717438,time,69.850874,"[time, lat, lon, pressure]",/Users/kthyng/.ocean_data_gateway/catalogs/,lat,-152.581114,/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSe...,2014-08-07T21:35:54.000004381,2014-07-31T15:33:33.999999314,"[station_name, sal, tem, fluoro, turbidity, PA..."
SBE16plus_01604787_2015_08_09_final.csv,,70.6349,-150.237,,70.6349,,/Users/kthyng/.ocean_data_gateway/catalogs/,,-150.237,/Users/kthyng/Downloads/Harrison_Bay_CTD_Moori...,2015-08-09T06:00:05Z,2014-08-01T12:00:05Z,"[time, latitude, longitude, water_depth, Condu..."


In [40]:
data.data[0]('ANIMctd14.nc')

In [41]:
data.data[0]('SBE16plus_01604787_2015_08_09_final.csv')

Unnamed: 0,time,latitude,longitude,water_depth,Conductivity_[S/m],Pressure_[db],Temperature_ITS90_[deg C],Salinity_Practical_[PSU],Voltage0_[volts],Instrument_Time_[juliandays],flag
0,2014-08-01T12:00:05Z,70.6349,-150.237,13.0,2.495646,12.687,-1.4619,31.0905,0.3091,213.500058,0.0
1,2014-08-01T13:00:05Z,70.6349,-150.237,13.0,2.495454,12.699,-1.4595,31.0854,0.3265,213.541725,0.0
...,...,...,...,...,...,...,...,...,...,...,...
8945,2015-08-09T05:00:05Z,70.6349,-150.237,13.0,2.591448,12.777,0.3619,30.5086,0.3873,586.208391,0.0
8946,2015-08-09T06:00:05Z,70.6349,-150.237,13.0,2.585462,12.754,0.2862,30.5062,0.2441,586.250058,0.0


## Other Functionality

### Data subselection

You can pull out the data for one, several, or all of the dataset_ids found in your search, as demonstrated here.

In [15]:
kw = {
    'min_lon': -94,
    'max_lon': -92,
    'min_lat': 28,
    'max_lat': 30,
    'min_time': pd.Timestamp('2021-05-27'),
    'max_time': pd.Timestamp('2021-06-02'),
}

kwargs = {
    'kw': kw,
    'approach': 'region',
    'parallel': False,
    'readers': [odg.erddap],
    'erddap': {
        'known_server': ['ioos'],
        'variables': [
            ['sea_surface_height_above_sea_level_geoid_mllw'],
        ],
    },
}

data = odg.Gateway(**kwargs)

In [16]:
# all dataset_ids found for this search
data.dataset_ids[0]

['noaa_nos_co_ops_8768094',
 'noaa_nos_co_ops_8770570',
 'noaa_nos_co_ops_8766072',
 'noaa_nos_co_ops_8770520',
 'noaa_nos_co_ops_8770822',
 'noaa_nos_co_ops_8770475']

#### Read in data for 1 dataset_id

Index with 0 to select the first reader in the list (in this case there is only one).

In [4]:
data.data[0]('noaa_nos_co_ops_8770822')

In [5]:
# See what dataset_ids have been read in
data.data[0]('printkeys')

#### Read in data for 2 dataset_ids

In [7]:
data.data[0](['noaa_nos_co_ops_8770822','noaa_nos_co_ops_8770475'])

{'noaa_nos_co_ops_8770822': <xarray.Dataset>
 Dimensions:                                        (time: 1458, timeseries: 1)
 Coordinates:
     latitude                                       (timeseries) float64 29.68
     longitude                                      (timeseries) float64 -93.84
   * time                                           (time) datetime64[ns] 2021...
 Dimensions without coordinates: timeseries
 Data variables:
     sea_surface_height_above_sea_level_geoid_mllw  (time, timeseries) float64 ...
 Attributes: (12/53)
     cdm_data_type:                 TimeSeries
     cdm_timeseries_variables:      station,longitude,latitude
     contributor_email:             None,feedback@axiomdatascience.com
     contributor_name:              Gulf of Mexico Coastal Ocean Observing Sys...
     contributor_role:              funder,processor
     contributor_role_vocabulary:   NERC
     ...                            ...
     standard_name_vocabulary:      CF Standard Name Table

In [8]:
# See what dataset_ids have been read in
data.data[0]('printkeys')

dict_keys(['noaa_nos_co_ops_8770822', 'noaa_nos_co_ops_8770475'])

#### Read in data for all dataset_ids

In [9]:
data.data[0]()

{'noaa_nos_co_ops_8770822': <xarray.Dataset>
 Dimensions:                                        (time: 1458, timeseries: 1)
 Coordinates:
     latitude                                       (timeseries) float64 29.68
     longitude                                      (timeseries) float64 -93.84
   * time                                           (time) datetime64[ns] 2021...
 Dimensions without coordinates: timeseries
 Data variables:
     sea_surface_height_above_sea_level_geoid_mllw  (time, timeseries) float64 ...
 Attributes: (12/53)
     cdm_data_type:                 TimeSeries
     cdm_timeseries_variables:      station,longitude,latitude
     contributor_email:             None,feedback@axiomdatascience.com
     contributor_name:              Gulf of Mexico Coastal Ocean Observing Sys...
     contributor_role:              funder,processor
     contributor_role_vocabulary:   NERC
     ...                            ...
     standard_name_vocabulary:      CF Standard Name Table

In [10]:
# See what dataset_ids have been read in
data.data[0]('printkeys')

dict_keys(['noaa_nos_co_ops_8770822', 'noaa_nos_co_ops_8770475', 'noaa_nos_co_ops_8768094', 'noaa_nos_co_ops_8770570', 'noaa_nos_co_ops_8766072', 'noaa_nos_co_ops_8770520'])

You can also print all of the dataset_ids and data with:

`data.data[0]('printall')`
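The `data.data[0]` object is called like a function: with one dataset_id, a list of them, no argument (all of them), or the special strings 'printkeys' and 'printall'. The sketch below is a hypothetical, simplified class that mimics that calling convention with an in-memory cache, so each dataset is read at most once. `CachedReader` and `fake_loader` are invented for illustration, and the real reader's return values may differ.

```python
class CachedReader:
    """Hypothetical sketch of a callable accessor that caches what it reads."""

    def __init__(self, dataset_ids, loader):
        self.dataset_ids = list(dataset_ids)
        self.loader = loader  # function: dataset_id -> data
        self._cache = {}

    def _get(self, dataset_id):
        # Read each dataset at most once; later calls reuse the cached copy.
        if dataset_id not in self._cache:
            self._cache[dataset_id] = self.loader(dataset_id)
        return self._cache[dataset_id]

    def __call__(self, key=None):
        if key == "printkeys":
            print(self._cache.keys())
            return None
        if key == "printall":
            print(self._cache)
            return None
        if key is None:  # no argument: read everything found by the search
            key = self.dataset_ids
        if isinstance(key, str):  # a single dataset_id
            return self._get(key)
        return {dataset_id: self._get(dataset_id) for dataset_id in key}


calls = []


def fake_loader(dataset_id):
    """Stand-in for a slow download; records each real read."""
    calls.append(dataset_id)
    return f"data for {dataset_id}"


reader = CachedReader(["id_1", "id_2", "id_3"], fake_loader)
reader("id_2")
reader(["id_2", "id_3"])
reader("printkeys")  # prints dict_keys(['id_2', 'id_3'])
reader()
print(calls)  # ['id_2', 'id_3', 'id_1']
```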

### Data QC

Some quality checking of the data is possible now, along with several improvements in Dataset functionality.

#### Variable definitions

Variables to be used in the model-data comparison need to be chosen and have some basic information attached: units, and reasonable ranges for the variable in those units. These are used to align the data and models so that we make appropriate comparisons, and the ranges are also used for basic QC. Currently we use the following variable information:

In [42]:
odg.var_def

{'temp': {'units': 'degree_Celsius',
  'fail_span': [-100, 100],
  'suspect_span': [-10, 40]},
 'salt': {'units': 'psu', 'fail_span': [-10, 60], 'suspect_span': [-1, 45]},
 'u': {'units': 'm/s', 'fail_span': [-10, 10], 'suspect_span': [-5, 5]},
 'v': {'units': 'm/s', 'fail_span': [-10, 10], 'suspect_span': [-5, 5]},
 'ssh': {'units': 'm', 'fail_span': [-10, 10], 'suspect_span': [-3, 3]}}
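The `fail_span` and `suspect_span` entries define a simple range test. A minimal numpy sketch follows, assuming the flag values that appear in the QC summary later in this section (1 GOOD, 3 SUSPECT, 4 FAIL, 9 MISSING). `range_test_sketch` is hypothetical and not the package's QC function.

```python
import numpy as np

# Spans for salinity, copied from odg.var_def above.
salt_def = {"units": "psu", "fail_span": [-10, 60], "suspect_span": [-1, 45]}


def range_test_sketch(values, fail_span, suspect_span):
    """Return uint8 flags: 1 GOOD, 3 SUSPECT, 4 FAIL, 9 MISSING."""
    values = np.asarray(values, dtype=float)
    flags = np.ones(values.shape, dtype=np.uint8)  # start as GOOD
    outside_suspect = (values < suspect_span[0]) | (values > suspect_span[1])
    outside_fail = (values < fail_span[0]) | (values > fail_span[1])
    flags[outside_suspect] = 3
    flags[outside_fail] = 4  # FAIL takes precedence over SUSPECT
    flags[np.isnan(values)] = 9  # NaN compares False above, so set it last
    return flags


sal = [31.09, 50.0, -9999.0, np.nan]
flags = range_test_sketch(sal, salt_def["fail_span"], salt_def["suspect_span"])
print(flags)  # [1 3 4 9]
```

Note that a fill value such as -9999.0 is flagged FAIL by a pure range test unless it is first converted to NaN.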

The `cf-xarray` package now supports calling variables in `xarray` Datasets by user-defined names, using regular expressions to match variable metadata against the criteria defined for those names. Here is the current version of how we identify variables:

In [43]:
odg.my_custom_criteria

{'ssh': {'standard_name': 'sea_surface_height$|sea_surface_elevation|sea_surface_height_above_sea_level$',
  'name': 'sea_surface_elevation|sea_surface_height_above_sea_level_geoid_mllw|zeta$'},
 'temp': {'name': 'temp$|temperature$|tem$|s.sea_water_temperature$'},
 'salt': {'standard_name': 'sea_water_salinity$|sea_water_practical_salinity$',
  'name': 'sea_water_salinity$|sea_water_practical_salinity$|salinity$|salt$|sal$|s.sea_water_practical_salinity$'},
 'u': {'standard_name': 'eastward_sea_water_velocity$|sea_water_x_velocity|surface_eastward_sea_water_velocity',
  'name': 'eastward_sea_water_velocity|sea_water_x_velocity|uo'},
 'v': {'standard_name': 'northward_sea_water_velocity$|sea_water_y_velocity|surface_northward_sea_water_velocity',
  'name': 'northward_sea_water_velocity|sea_water_y_velocity|vo'},
 'wind_speed': {'standard_name': 'wind_speed$'}}

In the following simple Dataset, a variable called "temperature" is recognized and can be accessed by the custom name defined above ("temp"), because `cf-xarray` compares its name and other metadata against the previously defined matching criteria.

In [44]:
ds = xr.Dataset()
ds["temperature"] = ("dim", np.arange(10), {"units": "degree_Fahrenheit"})
ds.cf['temp']

#### Units

Datasets are converted to known units so that comparisons with the models are appropriate.

In [45]:
ds = ds.pint.quantify()
ds.cf['temp'].pint.to('degree_Celsius')

Magnitude: [-17.777777777777743 -17.2222222222222 -16.66666666666663 -16.111111111111086 -15.555555555555543 -14.999999999999943 -14.4444444444444 -13.888888888888857 -13.333333333333314 -12.777777777777715]
Units: degree_Celsius


#### QC

Basic quality control is done for range testing of data. Currently, the output is available as datasets of flags and as a summary report (`verbose=True`).

In [18]:
filenames = ['/Users/kthyng/Downloads/ANIMIDA_III_BeaufortSea_2014-2015/kasper-netcdf/ANIMctd14.nc']
data = odg.Gateway(readers=odg.local, local={'filenames': filenames})
data_qc = data.qc(verbose=True)

ANIMctd14.nc
tem_qc
Flag == 4 (FAIL): 74825
Flag == 1 (GOOD): 15634
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0
sal_qc
Flag == 4 (FAIL): 75119
Flag == 1 (GOOD): 15340
Flag == 9 (MISSING): 0
Flag == 3 (SUSPECT): 0
Flag == 2 (UNKNOWN): 0


QC can also be run on a specific dataset_id or ids; the dataset_ids have to be input as nested lists that match the hierarchy of the sources:
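Since the nested list must mirror the sources, one inner list per source, it can be convenient to build it from a flat selection. `nest_by_source` is a hypothetical helper; the source names and order are supplied by the user (in a live session they would follow `data.sources`), and whether `qc` accepts empty inner lists for unselected sources is an assumption.

```python
def nest_by_source(selection, source_order):
    """Hypothetical helper: {source: [ids]} -> one inner list per source, in order."""
    unknown = set(selection) - set(source_order)
    if unknown:
        raise KeyError(f"Unknown sources: {sorted(unknown)}")
    return [list(selection.get(source, [])) for source in source_order]


# One local source, as in the QC example: a single file is selected.
print(nest_by_source({"local": ["ANIMctd14.nc"]}, ["local"]))
# [['ANIMctd14.nc']]

# Two sources, selecting a (made-up) ID from only the second one.
print(nest_by_source({"erddap_coastwatch": ["some_id"]},
                     ["erddap_ioos", "erddap_coastwatch"]))
# [[], ['some_id']]
```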

In [20]:
data.qc(dataset_ids=[['ANIMctd14.nc']])

[{'ANIMctd14.nc': <xarray.Dataset>
  Dimensions:   (nzmax: 1587, profile: 57)
  Coordinates:
      time      (profile) datetime64[ns] 2014-08-07T02:02:34.000002890 ... 2014...
      lat       (profile) float64 71.27 71.23 71.18 71.12 ... 70.38 70.45 70.46
      lon       (profile) float64 -152.2 -152.3 -152.4 ... -146.0 -145.8 -145.8
      pressure  (profile, nzmax) float64 2.187 2.399 ... -9.999e+03 -9.999e+03
  Dimensions without coordinates: nzmax, profile
  Data variables:
      tem       (profile, nzmax) float64 1.625 1.589 ... -9.999e+03 -9.999e+03
      sal       (profile, nzmax) float64 24.85 24.85 ... -9.999e+03 -9.999e+03
      tem_qc    (profile, nzmax) uint8 1 1 1 1 1 1 1 1 1 1 ... 4 4 4 4 4 4 4 4 4 4
      sal_qc    (profile, nzmax) uint8 1 1 1 1 1 1 1 1 4 1 ... 4 4 4 4 4 4 4 4 4 4
  Attributes: (12/35)
      Conventions:                CF-1.6
      Metadata_Conventions:       Unidata Dataset Discovery v1.0
      featureType:                profile
      cdm_data_type:    

Closer examination of the data, below, shows that missing values (stored as the fill value -9.999e+03) are flagged as FAIL by the QC check, which is why so many values come through as FAIL.
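One way to check this is to count how many values equal the fill value before running QC, and to replace them with NaN so they can be treated as missing. The sketch below uses a small made-up array with the -9.999e+03 fill value seen in the output; `xarray.DataArray.where` keeps values where the condition holds and inserts NaN elsewhere.

```python
import xarray as xr

FILL_VALUE = -9.999e03  # fill value visible in the ANIMctd14.nc output

# Made-up salinity profile padded with fill values, like the nzmax dimension.
sal = xr.DataArray([24.85, 24.85, 25.1, FILL_VALUE, FILL_VALUE], dims="nzmax")

n_fill = int((sal == FILL_VALUE).sum())
masked = sal.where(sal != FILL_VALUE)  # fill values become NaN

print(n_fill)                      # 2
print(int(masked.isnull().sum()))  # 2
```

If a file declares its fill value in a `_FillValue` or `missing_value` attribute, `xarray.open_dataset` applies this masking automatically (its default is `mask_and_scale=True`).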

In [47]:
data_qc

[{'ANIMctd14.nc': <xarray.Dataset>
  Dimensions:   (nzmax: 1587, profile: 57)
  Coordinates:
      time      (profile) datetime64[ns] 2014-08-07T02:02:34.000002890 ... 2014...
      lat       (profile) float64 71.27 71.23 71.18 71.12 ... 70.38 70.45 70.46
      lon       (profile) float64 -152.2 -152.3 -152.4 ... -146.0 -145.8 -145.8
      pressure  (profile, nzmax) float64 2.187 2.399 ... -9.999e+03 -9.999e+03
  Dimensions without coordinates: nzmax, profile
  Data variables:
      tem       (profile, nzmax) float64 1.625 1.589 ... -9.999e+03 -9.999e+03
      sal       (profile, nzmax) float64 24.85 24.85 ... -9.999e+03 -9.999e+03
      tem_qc    (profile, nzmax) uint8 1 1 1 1 1 1 1 1 1 1 ... 4 4 4 4 4 4 4 4 4 4
      sal_qc    (profile, nzmax) uint8 1 1 1 1 1 1 1 1 4 1 ... 4 4 4 4 4 4 4 4 4 4
  Attributes: (12/35)
      Conventions:                CF-1.6
      Metadata_Conventions:       Unidata Dataset Discovery v1.0
      featureType:                profile
      cdm_data_type:    