# Introduction to Argovis's API

Argovis provides an API that indexes and distributes numerous oceanographic datasets with detailed query parameters, enabling you to search for and download exactly the data of interest. In this notebook, we'll tour some of the standard usage patterns enabled by Argovis.

## Setup: Register an API key

In order to allocate Argovis's limited computing resources fairly, users are encouraged to register and request a free API key. This works like a password that identifies your requests to Argovis. To do so:

 - Visit [https://argovisbeta02.colorado.edu/api/](https://argovisbeta02.colorado.edu/api/)
 - Fill out the form under _New Account Registration_
 - An API key will be emailed to you shortly.
 
Treat this API key like a password - don't share it or leave it anywhere public. If you ever forget it or accidentally reveal it to a third party, see the same website above to change or deactivate your token.

Put your API key in the quotes in the variable below before moving on:

In [1]:
API_ROOT='https://argovisbeta01.colorado.edu/api/'
API_KEY=''
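
Since the key should be treated like a password, one option is to keep it out of the notebook entirely and read it from the environment instead. The following is only a minimal sketch: the environment variable name `ARGOVIS_API_KEY` is an assumption chosen for illustration, not something Argovis defines.

```python
import os

# ARGOVIS_API_KEY is a hypothetical variable name chosen for this sketch;
# set it in your shell before starting the notebook.
API_ROOT = 'https://argovisbeta01.colorado.edu/api/'
API_KEY = os.environ.get('ARGOVIS_API_KEY', '')

if not API_KEY:
    print('No API key found in the environment.')
```

This way the notebook can be shared or committed without the key ever appearing in it.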

# Argovis data structures

Argovis standard data structures divide measurements into _data_ and _metadata_ documents. Typically, a data document corresponds to measurements or gridded data associated with a discrete temporospatial column - a time, latitude and longitude. A single such document may contain measurements at multiple depths or altitudes, provided they share the same latitude, longitude, and time.

Each of these data documents will refer to a corresponding metadata document that captures additional information about the measurement. Argovis divides information between data and metadata documents in order to minimize redundancy in the data you download: many data documents will point to the same metadata document, allowing you to only download that metadata once. Typically, these metadata groupings will refer to some meaningful characteristic of the data; Argo metadata documents correspond to physical floats, while CCHDO metadata documents correspond to cruises, for example.

For more detail and specifications on the data and metadata documents for each collection, see [https://argovis.colorado.edu/docs/documentation/_build/html/data_management/schema.html](https://argovis.colorado.edu/docs/documentation/_build/html/data_management/schema.html).

# The standard data routes

## What datasets does Argovis index?

Argovis supports several different datasets with the API and data structures described here. They and their corresponding routes are:

 - Argo profiling float data, `/argo`
 - CCHDO ship-based profile data, `/cchdo`
 - tropical cyclone data from HURDAT and JTWC, `/tc`
 - Global Drifter Program data, `/drifters`
 - several gridded products:
   - Roemmich-Gilson total temperature, `/grids/temperature-rg`
   - Roemmich-Gilson total salinity, `/grids/salinity-rg`
   - ocean heat content, `/grids/ohc-kg`
 - Argo float position forecast model data, `/floatPositionForecast`
   
The examples that follow apply equally to all these routes; they all support similar query options and follow similar behavior patterns.

## Using Swagger and the `argovisHelpers` package to download data

In order to successfully explore Argovis data, there are two important tools to introduce in this section: Swagger, our API documentation engine, and `argovisHelpers`, our Python package of functions to help you access and interpret Argovis data.

### Using Swagger docs

Argovis's API documentation is found at [https://argovisbeta01.colorado.edu/api/docs/](https://argovisbeta01.colorado.edu/api/docs/). These docs are split into several categories. What follows applies to all categories _not_ marked experimental; the experimental categories are under development and may change or be removed at any time.

Categories have three typical routes:
 - The main _data route_, like `/argo`, or `/cchdo`. These routes provide the data documents for the dataset named in the route.
 - The _metadata route_, like `/argo/meta`. These routes provide the metadata documents referred to by data documents.
 - The _vocabulary route_, like `/argo/vocabulary`. These routes provide lists of possible options for search parameters used in the corresponding data and metadata routes.
 
Click on any of the routes, like `/argo` - a list of possible query string parameters is presented, each with a short explanation of what it means.

If you're familiar with REST APIs, this is enough information for you to construct a query string and issue a request in any programming environment that can make an HTTP GET request. If you're working in Python, we provide a helper library, `argovisHelpers`, to manage these requests for you. Let's try it out by making our first request for Argo data, for profiles found within 100 km of a point in the equatorial Atlantic in May 2011 (users of Python's `requests` module will notice a familiar pattern, providing the query string parameters listed in the Swagger docs and associated values as a dictionary):

In [2]:
from argovisHelpers import helpers as avh

In [3]:
argoSearch = {
    'startDate': '2011-05-01T00:00:00Z',
    'endDate': '2011-06-01T00:00:00Z',
    'center': '-22.5,0',
    'radius': 100
}

argoProfiles = avh.query('argo', options=argoSearch, apikey=API_KEY, apiroot=API_ROOT)
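
For readers working outside Python, or curious what such a request looks like as a plain URL, the sketch below builds the equivalent query string with the standard library only. It does not send anything, and it deliberately leaves out how the API key is attached to the request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

api_root = 'https://argovisbeta01.colorado.edu/api/'
argo_search = {
    'startDate': '2011-05-01T00:00:00Z',
    'endDate': '2011-06-01T00:00:00Z',
    'center': '-22.5,0',
    'radius': 100
}

# Join the API root, the route name and the URL-encoded query string.
url = api_root.rstrip('/') + '/argo?' + urlencode(argo_search)
print(url)

# Decoding the query string recovers the original parameters, all as strings.
decoded = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(decoded)
```

The same pattern applies to any route and any set of parameters listed in the Swagger docs.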

Let's have a look at what we get from the first profile returned:

In [4]:
argoProfiles[0]

{'_id': '4901283_003',
 'geolocation': {'type': 'Point', 'coordinates': [-23.139, -0.154]},
 'basin': 1,
 'timestamp': '2011-05-02T08:26:28.000Z',
 'date_updated_argovis': '2022-09-26T07:17:21.543Z',
 'source': [{'source': ['argo_bgc'],
   'url': 'ftp://ftp.ifremer.fr/ifremer/argo/dac/aoml/4901283/profiles/SD4901283_003.nc',
   'date_updated': '2022-06-29T21:21:10.000Z'},
  {'source': ['argo_core'],
   'url': 'ftp://ftp.ifremer.fr/ifremer/argo/dac/aoml/4901283/profiles/D4901283_003.nc',
   'date_updated': '2018-10-03T14:45:37.000Z'}],
 'cycle_number': 3,
 'geolocation_argoqc': 1,
 'profile_direction': 'A',
 'timestamp_argoqc': 1,
 'vertical_sampling_scheme': 'Primary sampling: averaged []',
 'data_keys_mode': {'doxy': 'D',
  'doxy_argoqc': None,
  'pressure': 'D',
  'pressure_argoqc': None,
  'salinity': 'D',
  'salinity_argoqc': None,
  'salinity_sfile': 'D',
  'salinity_sfile_argoqc': None,
  'temperature': 'D',
  'temperature_argoqc': None,
  'temperature_sfile': 'D',
  'temperature

This is a data document for Argo, matching the specification at [https://argovis.colorado.edu/docs/documentation/_build/html/data_management/schema.html](https://argovis.colorado.edu/docs/documentation/_build/html/data_management/schema.html). It contains the `timestamp` and `geolocation` properties that place this profile in time and space, and other parameters that typically change from point to point.
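
As a quick sanity check on the search, we can confirm that the returned profile lies inside the requested radius and time window. The sketch below uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption made here and may differ slightly from the distance calculation Argovis performs. It also assumes `center` is given as `longitude,latitude`, matching the coordinate order in `geolocation`.

```python
from datetime import datetime, timezone
from math import asin, cos, radians, sin, sqrt

def haversine_km(lon1, lat1, lon2, lat2, earth_radius_km=6371.0):
    """Great-circle distance in km between two longitude/latitude points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# Values copied from the first profile shown above.
doc = {
    'geolocation': {'type': 'Point', 'coordinates': [-23.139, -0.154]},
    'timestamp': '2011-05-02T08:26:28.000Z',
}

lon, lat = doc['geolocation']['coordinates']  # GeoJSON order: longitude first
when = datetime.strptime(doc['timestamp'], '%Y-%m-%dT%H:%M:%S.%fZ').replace(tzinfo=timezone.utc)

# Distance from the search center (-22.5, 0) to the profile position.
distance = haversine_km(-22.5, 0.0, lon, lat)
print(when.isoformat(), round(distance, 1))
```

The distance comes out at a little over 70 km, comfortably inside the 100 km radius we asked for.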

All data documents bear a `metadata` key, which is a pointer to the appropriate metadata document to find out more about this measurement. Let's fetch that document for this first profile by querying the `argo/meta` route for a document with an `id` that matches this `metadata` pointer:

In [5]:
metaOptions = {
    'id': argoProfiles[0]['metadata']
}

argoMeta = avh.query('argo/meta', options=metaOptions, apikey=API_KEY, apiroot=API_ROOT)
argoMeta

[{'_id': '4901283_m0',
  'data_type': 'oceanicProfile',
  'data_center': 'AO',
  'instrument': 'profiling_float',
  'pi_name': ['BRECK OWENS'],
  'platform': '4901283',
  'platform_type': 'SOLO_W',
  'fleetmonitoring': 'https://fleetmonitoring.euro-argo.eu/float/4901283',
  'oceanops': 'https://www.ocean-ops.org/board/wa/Platform?ref=4901283',
  'positioning_system': 'GPS',
  'wmo_inst_type': '851'}]
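
Because many data documents point to the same metadata document, it is usually worth collecting the distinct `metadata` pointers before fetching anything from the metadata route. The following is a small offline sketch of that bookkeeping; the document with `_id` `4901283_004` is a hypothetical entry added only to show two profiles sharing one pointer.

```python
from collections import defaultdict

# Trimmed-down stand-ins for data documents; only the keys needed here.
data_docs = [
    {'_id': '4901283_003', 'metadata': '4901283_m0'},
    {'_id': '4901283_004', 'metadata': '4901283_m0'},  # hypothetical sibling profile
    {'_id': '5905107_001', 'metadata': '5905107_m0'},
]

# Group data document IDs under the metadata document they point to.
by_metadata = defaultdict(list)
for doc in data_docs:
    by_metadata[doc['metadata']].append(doc['_id'])

# Each of these IDs would need to be requested from the metadata route only once.
unique_meta_ids = sorted(by_metadata)
print(unique_meta_ids)
```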

In addition to temporospatial searches, data and metadata routes typically support _category searches_, which are searches for documents that belong to certain categories. The categories available to search by vary from dataset to dataset; Argo floats can be searched by platform number, for example, while tropical cyclones can be searched by storm name. See the Swagger docs for the full set of possibilities for each category; let's now use Argo's platform category search to get all profiles collected by the same platform as the first profile above:

In [6]:
platformSearch = {
    'platform': argoMeta[0]['platform']
}

platformProfiles = avh.query('argo', options=platformSearch, apikey=API_KEY, apiroot=API_ROOT)
print(len(platformProfiles))

125


At the time of writing, 125 profiles are found for this platform in this way.

For all category searches, we may wish to know the full list of all possible values a category can take on; for this, there are the _vocabulary_ routes. Let's get a list of all possible Argo platforms we can search by:

In [7]:
platformVocabSearch = {
    'parameter': 'platform'
}

platforms = avh.query('argo/vocabulary', options=platformVocabSearch, apikey=API_KEY, apiroot=API_ROOT)
print(platforms[0:10])

['13857', '13858', '13859', '15819', '15820', '15851', '15852', '15853', '15854', '15855']


Here we just print out the first 10 platform IDs found, but all 17 thousand or so are present.

## Using the `data` query option

The astute reader may have noticed something about the data document shown above: there are no actual measurements included in it! By default, only the non-measurement data is returned, in order to minimize the bandwidth consumed; to get back actual measurements and their QC flags, we must include the `data` parameter in our query, the behavior of which we'll see in this section.

### Basic data request

Let's start by asking for one particular profile by ID, and ask for some temperature data to go with:

In [8]:
dataQuery = {
    'id': '4901283_003',
    'data': 'temperature'
}

profile = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
print(profile[0]['data'])

[{'pressure': 2, 'temperature': 28.669001}, {'pressure': 4, 'temperature': 28.667999}, {'pressure': 6, 'temperature': 28.722}, {'pressure': 8, 'temperature': 28.816}, {'pressure': 10, 'temperature': 28.823}, {'pressure': 12, 'temperature': 28.826}, {'pressure': 14, 'temperature': 28.830999}, {'pressure': 16, 'temperature': 28.783001}, {'pressure': 18, 'temperature': 28.775999}, {'pressure': 20, 'temperature': 28.740999}, {'pressure': 22, 'temperature': 28.694}, {'pressure': 24, 'temperature': 28.551001}, {'pressure': 26, 'temperature': 28.497}, {'pressure': 28, 'temperature': 28.489}, {'pressure': 30, 'temperature': 28.414}, {'pressure': 32, 'temperature': 28.191999}, {'pressure': 34, 'temperature': 28.087999}, {'pressure': 36, 'temperature': 28.044001}, {'pressure': 38, 'temperature': 27.836}, {'pressure': 40, 'temperature': 27.715}, {'pressure': 42, 'temperature': 27.655001}, {'pressure': 44, 'temperature': 27.41}, {'pressure': 46, 'temperature': 27.125999}, {'pressure': 48, 'tempera

We see the returned profile now has a `data` key, and that key holds a list with one entry for each level. By default, each of those per-level entries is a dictionary with the information we requested, in this case `temperature`, and also `pressure`, whether we requested it or not, as pressure information is crucial for contextualizing all other measurements.

### Downloading with minification

While the per-level dictionaries are easy to read, they are very inefficient in terms of bandwidth, especially for profiles with many levels. We can use a compression option to get faster downloads:

In [9]:
dataQuery = {
    'id': '4901283_003',
    'data': 'temperature',
    'compression': 'array'
}

profile = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
print(profile[0]['data'])

[[28.669001, 2], [28.667999, 4], [28.722, 6], [28.816, 8], [28.823, 10], [28.826, 12], [28.830999, 14], [28.783001, 16], [28.775999, 18], [28.740999, 20], [28.694, 22], [28.551001, 24], [28.497, 26], [28.489, 28], [28.414, 30], [28.191999, 32], [28.087999, 34], [28.044001, 36], [27.836, 38], [27.715, 40], [27.655001, 42], [27.41, 44], [27.125999, 46], [26.805, 48], [26.500999, 50], [26.311001, 52], [26.075001, 54], [25.448, 56], [25.120001, 58], [24.632999, 60], [23.709, 62], [22.400999, 64], [21.893, 66], [21.523001, 68], [21.122, 70], [20.861, 72], [20.657, 74], [20.104, 76], [19.707001, 78], [19.681, 80], [19.573999, 82], [19.396999, 84], [19.181, 86], [18.747, 88], [18.438999, 90], [18.240999, 92], [18.045, 94], [17.882, 96], [17.783001, 98], [17.664, 100], [17.563, 102], [17.474001, 104], [17.370001, 106], [17.250999, 108], [17.006001, 110], [16.861, 112], [16.638, 114], [16.465, 116], [16.337999, 118], [16.142, 120], [15.902, 122], [15.721, 124], [15.585, 126], [15.415, 128], [15

The `data` key still contains a list with one entry per level, but those entries are now lists with the appropriate numbers packed in them, without the redundant key names repeated every time. In order to tell which number is which, look at the `data_keys` property, and find the `units` similarly:

In [10]:
print(profile[0]['data_keys'])
print(profile[0]['units'])

['temperature', 'pressure']
['degree_Celsius', 'decibar']


So for every level in the `data` list, the first number is a temperature in degrees C, and the second is a pressure in decibar. If you'd like to re-inflate this data list to the dictionary style list after download, use the `data_inflate` helper:

In [11]:
avh.data_inflate(profile[0])

[{'temperature': 28.669001, 'pressure': 2},
 {'temperature': 28.667999, 'pressure': 4},
 {'temperature': 28.722, 'pressure': 6},
 {'temperature': 28.816, 'pressure': 8},
 {'temperature': 28.823, 'pressure': 10},
 {'temperature': 28.826, 'pressure': 12},
 {'temperature': 28.830999, 'pressure': 14},
 {'temperature': 28.783001, 'pressure': 16},
 {'temperature': 28.775999, 'pressure': 18},
 {'temperature': 28.740999, 'pressure': 20},
 {'temperature': 28.694, 'pressure': 22},
 {'temperature': 28.551001, 'pressure': 24},
 {'temperature': 28.497, 'pressure': 26},
 {'temperature': 28.489, 'pressure': 28},
 {'temperature': 28.414, 'pressure': 30},
 {'temperature': 28.191999, 'pressure': 32},
 {'temperature': 28.087999, 'pressure': 34},
 {'temperature': 28.044001, 'pressure': 36},
 {'temperature': 27.836, 'pressure': 38},
 {'temperature': 27.715, 'pressure': 40},
 {'temperature': 27.655001, 'pressure': 42},
 {'temperature': 27.41, 'pressure': 44},
 {'temperature': 27.125999, 'pressure': 46},
 {'
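
For intuition about what re-inflation involves, here is a pure-Python sketch that pairs each packed level with `data_keys`. It is an illustration of the idea using the first few levels printed above, not the implementation used inside `argovisHelpers`.

```python
# First few levels and the key order, copied from the outputs above.
data_keys = ['temperature', 'pressure']
levels = [[28.669001, 2], [28.667999, 4], [28.722, 6]]

# Pair every packed level with the key names to rebuild per-level dictionaries.
inflated = [dict(zip(data_keys, level)) for level in levels]
print(inflated[0])

# Alternatively, pull out a single variable as a flat list by its position.
temperature_index = data_keys.index('temperature')
temperatures = [level[temperature_index] for level in levels]
print(temperatures)
```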

> **Always use `compression=array`**: this setting can dramatically reduce your download times for Argovis data, and it doesn't cause any data losses as it's really just a minification, rather than a true compression. Once comfortable with the format, consider using this option for _all_ data requests.

What we've seen above allows us to be very targeted in the data we download; rather than being forced to spend time and bandwidth downloading data we aren't interested in, we can focus on just what we need. On the other hand, sometimes we really do want everything, and for that there's `data=all`:

In [12]:
dataQuery = {
    'id': '4901283_003',
    'data': 'all',
    'compression': 'array'
}

profile = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
avh.data_inflate(profile[0])[0:10]

[{'doxy': None,
  'doxy_argoqc': 4,
  'pressure': 2,
  'pressure_argoqc': 1,
  'salinity': 35.574966,
  'salinity_argoqc': 1,
  'salinity_sfile': 35.574966,
  'salinity_sfile_argoqc': 1,
  'temperature': 28.669001,
  'temperature_argoqc': 1,
  'temperature_sfile': 28.669001,
  'temperature_sfile_argoqc': 1},
 {'doxy': None,
  'doxy_argoqc': 4,
  'pressure': 4,
  'pressure_argoqc': 1,
  'salinity': 35.573761,
  'salinity_argoqc': 1,
  'salinity_sfile': 35.573761,
  'salinity_sfile_argoqc': 1,
  'temperature': 28.667999,
  'temperature_argoqc': 1,
  'temperature_sfile': 28.667999,
  'temperature_sfile_argoqc': 1},
 {'doxy': None,
  'doxy_argoqc': 4,
  'pressure': 6,
  'pressure_argoqc': 1,
  'salinity': 35.626602,
  'salinity_argoqc': 1,
  'salinity_sfile': 35.626602,
  'salinity_sfile_argoqc': 1,
  'temperature': 28.722,
  'temperature_argoqc': 1,
  'temperature_sfile': 28.722,
  'temperature_sfile_argoqc': 1},
 {'doxy': None,
  'doxy_argoqc': 4,
  'pressure': 8,
  'pressure_argoqc': 1,

### Filtering behavior of data requests

Note that adding a specific data filter is a _firm requirement_ that all returned profiles have some meaningful data for _all_ variables listed. Try demanding chlorophyll-a in addition to temperature for our current profile of interest:

In [13]:
dataQuery = {
    'id': '4901283_003',
    'data': 'temperature,chla',
    'compression': 'array'
}

profile = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
print(profile)

[]


We get nothing in our array of profiles; even though we asked for profile id '4901283_003' and we know it exists, `data=temperature,chla` filters our query down to _only_ profiles that have both temperature and chla reported; since the profile requested doesn't have any chla measurements, it is dropped from the returns in this case. This is useful if you only want to download profiles that definitely have data of interest; for example, try the same thing on our regional search from above:

In [14]:
argoSearch = {
    'startDate': '2011-05-01T00:00:00Z',
    'endDate': '2011-06-01T00:00:00Z',
    'center': '-22.5,0',
    'radius': 100,
    'data': 'temperature,chla'
}

argoProfiles = avh.query('argo', options=argoSearch, apikey=API_KEY, apiroot=API_ROOT)
print(len(argoProfiles))

0


Evidently Argo made no chlorophyll-a measurements in May 2011 within 100 km of our point of interest - a fact which we found using the data API without having to download or reduce any data at all. One final point on data filtering in this manner: it's not enough for a profile to nominally have a variable defined for it; it must have at least one non-null value reported for that variable somewhere in the search results. For example, when we did `data=all` for our profile of interest above, we saw dissolved oxygen, `doxy`, was defined for it. But:

In [15]:
dataQuery = {
    'id': '4901283_003',
    'data': 'doxy',
    'compression': 'array'
}

profile = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
print(profile)

[]


Again our search is filtered down to nothing, since every level in that profile reported `None` for `doxy`.
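
The rule can be restated as a small local predicate: a profile passes only if every requested variable has at least one non-null value across its levels. The sketch below mimics that rule on a couple of levels shaped like the `data=all` output above. It illustrates the behavior described here and is not the server's actual filtering code; the function name `has_meaningful_data` is made up for this example.

```python
def has_meaningful_data(levels, variables):
    """True if every variable has at least one non-null value across the levels."""
    return all(
        any(level.get(variable) is not None for level in levels)
        for variable in variables
    )

# Two levels in the style of the data=all output above (QC keys omitted).
levels = [
    {'doxy': None, 'pressure': 2, 'temperature': 28.669001},
    {'doxy': None, 'pressure': 4, 'temperature': 28.667999},
]

print(has_meaningful_data(levels, ['temperature']))          # temperature is populated
print(has_meaningful_data(levels, ['doxy']))                 # doxy is defined but always None
print(has_meaningful_data(levels, ['temperature', 'chla']))  # chla is absent entirely
```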

### Search negation

Let's find some profiles that do actually have dissolved oxygen in them, this time with a slightly different geography search: let's look for everything in August 2017 within a polygon region, defined as a list of `[longitude, latitude]` points: 

In [16]:
dataQuery = {
    'startDate': '2017-08-01T00:00:00Z',
    'endDate': '2017-09-01T00:00:00Z',
    'polygon': [[-150,-30],[-155,-30],[-155,-35],[-150,-35],[-150,-30]],
    'data': 'doxy',
    'compression': 'array'
}

profiles = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
print(profiles)

[{'_id': '5905107_001', 'geolocation': {'type': 'Point', 'coordinates': [-154.974, -32.415]}, 'basin': 2, 'timestamp': '2017-08-11T11:57:19.001Z', 'date_updated_argovis': '2022-09-22T19:27:47.409Z', 'source': [{'source': ['argo_bgc'], 'url': 'ftp://ftp.ifremer.fr/ifremer/argo/dac/aoml/5905107/profiles/SD5905107_001.nc', 'date_updated': '2022-07-09T07:14:33.000Z'}, {'source': ['argo_core'], 'url': 'ftp://ftp.ifremer.fr/ifremer/argo/dac/aoml/5905107/profiles/D5905107_001.nc', 'date_updated': '2019-06-24T15:29:23.000Z'}], 'cycle_number': 1, 'geolocation_argoqc': 1, 'profile_direction': 'A', 'timestamp_argoqc': 1, 'vertical_sampling_scheme': 'Primary sampling: mixed [deeper than nominal 985dbar: discrete; nominal 985dbar to surface: 2dbar-bin averaged]', 'data': [[235.335724, 7.6], [235.327026, 13.07], [235.418045, 17.720001], [235.212158, 22.02], [235.242828, 26.68], [235.235306, 31.320002], [235.273743, 36.709999], [235.165115, 41.73], [235.16153, 48.260002], [235.032471, 54.619999], [23

We find one profile with meaningful dissolved oxygen data in the region of interest.
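
Note that the polygon in the query above repeats its first vertex at the end, forming a closed ring as in GeoJSON. If you build polygons from a list of corners, a small helper can make sure the ring is closed before you pass it along. This is a sketch under the assumption that a closed ring is what the API expects; `close_ring` is a hypothetical name, not part of `argovisHelpers`.

```python
def close_ring(points):
    """Return a copy of points with the first vertex repeated at the end if needed."""
    ring = [list(point) for point in points]
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return ring

# The four corners of the search region, as [longitude, latitude] pairs.
corners = [[-150, -30], [-155, -30], [-155, -35], [-150, -35]]
polygon = close_ring(corners)
print(polygon)
```

Calling the helper on an already closed ring leaves it unchanged.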

The `data` key also accepts _tilde negation_, meaning 'filter for profiles that _don't_ contain this data', for example:

In [17]:
dataQuery = {
    'startDate': '2017-08-01T00:00:00Z',
    'endDate': '2017-09-01T00:00:00Z',
    'polygon': [[-150,-30],[-155,-30],[-155,-35],[-150,-35],[-150,-30]],
    'data': 'temperature,~doxy',
    'compression': 'array'
}

profiles = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
print(len(profiles))

19


We get a collection of profiles that appear in the region of interest, and have temperature but _not_ dissolved oxygen. In this way, we can split up our downloads into groups of related and interesting profiles without re-downloading the same profiles over and over.
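
To make the tilde syntax concrete, the sketch below splits a `data` value into required and excluded variables and applies it to a set of variables known to carry meaningful values. The helper names are hypothetical, and special tokens such as `all` or `except-data-values` are intentionally not handled.

```python
def split_data_param(data_param):
    """Split a data value like 'temperature,~doxy' into (required, excluded) lists."""
    required, excluded = [], []
    for token in data_param.split(','):
        token = token.strip()
        if token.startswith('~'):
            excluded.append(token[1:])
        elif token:
            required.append(token)
    return required, excluded

def matches(available, data_param):
    """True if all required variables are available and no excluded one is."""
    required, excluded = split_data_param(data_param)
    return all(v in available for v in required) and not any(v in available for v in excluded)

print(split_data_param('temperature,~doxy'))
print(matches({'temperature', 'pressure'}, 'temperature,~doxy'))
print(matches({'temperature', 'pressure', 'doxy'}, 'temperature,~doxy'))
```

Since `doxy` and `~doxy` are complementary, the two searches together cover every profile with temperature in the region exactly once.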

### Minimal data responses

Sometimes we want to use the `data` filter to confine our attention to profiles that have data of interest, but we only need general information or metadata about those measurements, not the measurements themselves. For this, we can add the `except-data-values` token:

In [18]:
dataQuery = {
    'startDate': '2017-08-01T00:00:00Z',
    'endDate': '2017-09-01T00:00:00Z',
    'polygon': [[-150,-30],[-155,-30],[-155,-35],[-150,-35],[-150,-30]],
    'data': 'doxy,except-data-values'
}

profiles = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
print(profiles)

[{'_id': '5905107_001', 'geolocation': {'type': 'Point', 'coordinates': [-154.974, -32.415]}, 'basin': 2, 'timestamp': '2017-08-11T11:57:19.001Z', 'date_updated_argovis': '2022-09-22T19:27:47.409Z', 'source': [{'source': ['argo_bgc'], 'url': 'ftp://ftp.ifremer.fr/ifremer/argo/dac/aoml/5905107/profiles/SD5905107_001.nc', 'date_updated': '2022-07-09T07:14:33.000Z'}, {'source': ['argo_core'], 'url': 'ftp://ftp.ifremer.fr/ifremer/argo/dac/aoml/5905107/profiles/D5905107_001.nc', 'date_updated': '2019-06-24T15:29:23.000Z'}], 'cycle_number': 1, 'geolocation_argoqc': 1, 'profile_direction': 'A', 'timestamp_argoqc': 1, 'vertical_sampling_scheme': 'Primary sampling: mixed [deeper than nominal 985dbar: discrete; nominal 985dbar to surface: 2dbar-bin averaged]', 'data_keys_mode': {'doxy': 'D', 'pressure': 'D'}, 'data_keys': ['doxy', 'pressure'], 'units': {'doxy': 'micromole/kg', 'pressure': 'decibar'}, 'metadata': '5905107_m0'}]


Note that specifying only `'data': 'except-data-values'` is the same as just leaving the `data` query key off completely; the purpose of this option is to allow you to filter by data, but then only get back the lightweight non-measurement values. 

If we want an even more minimal response, we can use the `compression=minimal` option:

In [19]:
dataQuery = {
    'startDate': '2017-08-01T00:00:00Z',
    'endDate': '2017-09-01T00:00:00Z',
    'polygon': [[-150,-30],[-155,-30],[-155,-35],[-150,-35],[-150,-30]],
    'data': 'doxy',
    'compression': 'minimal'
}

profiles = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
print(profiles)

[['5905107_001', -154.974, -32.415, '2017-08-11T11:57:19.001Z', ['argo_bgc', 'argo_core']]]


With `compression=minimal`, for each data document we get only a minimal amount of information describing it; each data product has a slightly different minimal representation tailored to suit.
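
Judging from the Argo output above, the minimal record holds an ID, a longitude, a latitude, a timestamp and a list of sources, in that order. The sketch below attaches names to those positions; the field names are labels chosen here for readability, and the ordering is inferred from this one example rather than taken from a specification.

```python
from datetime import datetime, timezone

# The minimal record printed above.
minimal = ['5905107_001', -154.974, -32.415, '2017-08-11T11:57:19.001Z', ['argo_bgc', 'argo_core']]

# Labels for each position; inferred from the example, so treat them as an assumption.
fields = ('_id', 'longitude', 'latitude', 'timestamp', 'source')
record = dict(zip(fields, minimal))

# Convert the ISO 8601 string into a timezone-aware datetime.
record['timestamp'] = datetime.strptime(
    record['timestamp'], '%Y-%m-%dT%H:%M:%S.%fZ'
).replace(tzinfo=timezone.utc)

print(record['_id'], record['longitude'], record['latitude'], record['timestamp'].date())
```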

### Data-adjacent variables

Each dataset includes some _data-adjacent variables_: things like `units` for all datasets, or `data_keys_mode` for Argo, where there's some piece of metadata associated with each measurement. Different data products may assign these either to data documents or metadata documents, depending on how frequently they change from measurement to measurement; as always, see [https://argovis.colorado.edu/docs/documentation/_build/html/data_management/schema.html](https://argovis.colorado.edu/docs/documentation/_build/html/data_management/schema.html) for details.

One important note about data-adjacent variables: they will be coerced onto data documents if you perform data filtering! For example, consider a tropical cyclone data document:

In [20]:
tcQuery = {
    'id': 'AL011851_18510625000000'
}

tc = avh.query('tc', options=tcQuery, apikey=API_KEY, apiroot=API_ROOT)
tc

[{'_id': 'AL011851_18510625000000',
  'metadata': 'AL011851',
  'geolocation': {'type': 'Point', 'coordinates': [-94.8, 28]},
  'basin': 1,
  'timestamp': '1851-06-25T00:00:00.000Z',
  'record_identifier': '',
  'class': 'HU'}]

And its corresponding metadata document:

In [21]:
tcMetaQuery = {
    'id': 'AL011851'
}

tcMeta = avh.query('tc/meta', options=tcMetaQuery, apikey=API_KEY, apiroot=API_ROOT)
tcMeta

[{'_id': 'AL011851',
  'data_type': 'tropicalCyclone',
  'data_keys': ['wind', 'surface_pressure'],
  'units': ['kt', 'mb'],
  'date_updated_argovis': '2022-10-04T15:43:04.671Z',
  'source': [{'source': ['tc_hurdat']}],
  'name': 'UNNAMED',
  'num': 1}]

Note units and data keys are on the metadata document. But now let's query the same tropical cyclone document, but filter for wind data:

In [22]:
tcQuery = {
    'id': 'AL011851_18510625000000',
    'data': 'wind'
}

tc = avh.query('tc', options=tcQuery, apikey=API_KEY, apiroot=API_ROOT)
tc

[{'_id': 'AL011851_18510625000000',
  'metadata': 'AL011851',
  'geolocation': {'type': 'Point', 'coordinates': [-94.8, 28]},
  'basin': 1,
  'timestamp': '1851-06-25T00:00:00.000Z',
  'data': [{'wind': 80}],
  'record_identifier': '',
  'class': 'HU',
  'data_keys': ['wind'],
  'units': {'wind': 'kt'}}]

`data_keys` and `units` are pulled into the data document to make the `data` key's measurements easy to interpret, now that we've filtered away some of what the metadata document refers to (specifically, `surface_pressure`).
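
In the outputs above, `units` appears as a list parallel to `data_keys` on the metadata document, but as a dictionary keyed by variable name on the filtered data document. A small helper can hide that difference. This is a sketch based only on the two shapes shown here, and `units_lookup` is a hypothetical name.

```python
def units_lookup(doc):
    """Return a {variable: unit} mapping whether units is a dict or a parallel list."""
    units = doc['units']
    if isinstance(units, dict):
        return dict(units)
    return dict(zip(doc['data_keys'], units))

# Shapes copied from the tropical cyclone documents above.
tc_meta_doc = {'data_keys': ['wind', 'surface_pressure'], 'units': ['kt', 'mb']}
tc_data_doc = {'data_keys': ['wind'], 'units': {'wind': 'kt'}}

print(units_lookup(tc_meta_doc))
print(units_lookup(tc_data_doc))
```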

## Point versus Gridded data

Point data (like Argo) and gridded data present very similar schema structures and API query options, but there are some important differences to be aware of. Firstly, the `data` key is packed differently between the two; consider the raw `data` array of one of the queries made above:

In [23]:
dataQuery = {
    'id': '4901283_003',
    'data': 'all'
}

profile = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
profile[0]['data']

[{'doxy_argoqc': 4,
  'pressure': 2,
  'pressure_argoqc': 1,
  'salinity': 35.574966,
  'salinity_argoqc': 1,
  'salinity_sfile': 35.574966,
  'salinity_sfile_argoqc': 1,
  'temperature': 28.669001,
  'temperature_argoqc': 1,
  'temperature_sfile': 28.669001,
  'temperature_sfile_argoqc': 1},
 {'doxy_argoqc': 4,
  'pressure': 4,
  'pressure_argoqc': 1,
  'salinity': 35.573761,
  'salinity_argoqc': 1,
  'salinity_sfile': 35.573761,
  'salinity_sfile_argoqc': 1,
  'temperature': 28.667999,
  'temperature_argoqc': 1,
  'temperature_sfile': 28.667999,
  'temperature_sfile_argoqc': 1},
 {'doxy_argoqc': 4,
  'pressure': 6,
  'pressure_argoqc': 1,
  'salinity': 35.626602,
  'salinity_argoqc': 1,
  'salinity_sfile': 35.626602,
  'salinity_sfile_argoqc': 1,
  'temperature': 28.722,
  'temperature_argoqc': 1,
  'temperature_sfile': 28.722,
  'temperature_sfile_argoqc': 1},
 {'doxy_argoqc': 4,
  'pressure': 8,
  'pressure_argoqc': 1,
  'salinity': 35.752064,
  'salinity_argoqc': 1,
  'salinity_sf

In this case, the list index runs over monotonically increasing depth levels, and the dictionaries at each level list the data measurements at that level. Now let's look at something similar for a piece of gridded data:

In [24]:
dataQuery = {
    'id': '20040115000000_29.5_-64.5',
    'data': 'all',
    'compression': 'array'
}

grid = avh.query('grids/grid_1_1_0.5_0.5', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
grid[0]['data']

[[0.035,
  -0.017,
  -0.156,
  -0.534,
  -1.112,
  -1.45,
  -1.545,
  -1.509,
  -1.284,
  -0.925,
  -0.491,
  -0.067,
  0.37,
  0.705,
  0.94,
  1.108,
  1.22,
  1.269,
  1.328,
  1.394,
  1.417,
  1.422,
  1.425,
  1.429,
  1.426,
  1.418,
  1.404,
  1.387,
  1.376,
  1.371,
  1.352,
  1.334,
  1.308,
  1.262,
  1.205,
  1.15,
  1.089,
  1.029,
  0.972,
  0.918,
  0.867,
  0.815,
  0.771,
  0.728,
  0.691,
  0.653,
  0.621,
  0.585,
  0.555,
  0.529,
  0.502,
  0.468,
  0.433,
  0.384,
  0.335,
  0.288,
  0.238,
  0.209],
 [33.702,
  33.718998,
  33.767002,
  33.876999,
  34.02,
  34.110001,
  34.160999,
  34.208,
  34.261002,
  34.318001,
  34.376999,
  34.435001,
  34.490002,
  34.532001,
  34.563,
  34.587997,
  34.606998,
  34.618,
  34.632999,
  34.651001,
  34.664001,
  34.674,
  34.681,
  34.688999,
  34.694,
  34.699001,
  34.702,
  34.704998,
  34.711002,
  34.707001,
  34.709003,
  34.711002,
  34.712002,
  34.711998,
  34.712997,
  34.712997,
  34.710999,
  34.709,
  34.707

We see the gridded `data` key is effectively the transpose of the point data key: measurement variables are listed first, and each variable presents a list of measurements in depth order (see `levels` in the corresponding grid metadata document to see what those levels actually are). This was done to accommodate the fact that many different gridded products may occupy the same longitude/latitude grid, but all have different level spectra, unlike point profile data which generally makes the same measurements at every level.
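
The relationship between the two layouts can be shown with a few lines of plain Python. The sketch below starts from point-style levels, regroups them by variable in the way gridded data is organized, and then converts back. The sample values are taken from the Argo profile earlier in this notebook and are used purely to illustrate the transposition.

```python
# Point-style layout: one dictionary per level.
per_level = [
    {'pressure': 2, 'temperature': 28.669001},
    {'pressure': 4, 'temperature': 28.667999},
    {'pressure': 6, 'temperature': 28.722},
]

# Grid-style layout: one list per variable, ordered by level.
keys = list(per_level[0])
per_variable = {key: [level[key] for level in per_level] for key in keys}
print(per_variable)

# Transposing again recovers the per-level dictionaries.
round_trip = [dict(zip(keys, values)) for values in zip(*per_variable.values())]
print(round_trip == per_level)
```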

Additionally, a given grid might have data from multiple upstream products; therefore, its `metadata` key is a dictionary or array, with entries for each data variable, similar to `units`. For example, from the grid data document we downloaded above:

In [25]:
avh.data_inflate(grid[0], dataschema='grid')

{'rg09_temperature': [0.035,
  -0.017,
  -0.156,
  -0.534,
  -1.112,
  -1.45,
  -1.545,
  -1.509,
  -1.284,
  -0.925,
  -0.491,
  -0.067,
  0.37,
  0.705,
  0.94,
  1.108,
  1.22,
  1.269,
  1.328,
  1.394,
  1.417,
  1.422,
  1.425,
  1.429,
  1.426,
  1.418,
  1.404,
  1.387,
  1.376,
  1.371,
  1.352,
  1.334,
  1.308,
  1.262,
  1.205,
  1.15,
  1.089,
  1.029,
  0.972,
  0.918,
  0.867,
  0.815,
  0.771,
  0.728,
  0.691,
  0.653,
  0.621,
  0.585,
  0.555,
  0.529,
  0.502,
  0.468,
  0.433,
  0.384,
  0.335,
  0.288,
  0.238,
  0.209],
 'rg09_salinity': [33.702,
  33.718998,
  33.767002,
  33.876999,
  34.02,
  34.110001,
  34.160999,
  34.208,
  34.261002,
  34.318001,
  34.376999,
  34.435001,
  34.490002,
  34.532001,
  34.563,
  34.587997,
  34.606998,
  34.618,
  34.632999,
  34.651001,
  34.664001,
  34.674,
  34.681,
  34.688999,
  34.694,
  34.699001,
  34.702,
  34.704998,
  34.711002,
  34.707001,
  34.709003,
  34.711002,
  34.712002,
  34.711998,
  34.712997,
  34.71

In [26]:
grid[0]['metadata']

['rg09_temperature_200401_Total', 'rg09_salinity_200401_Total']