# Introduction to Timeseries datasets

> Note: you will be best served by familiarizing yourself with the more basic notebooks _Introduction to Argovis_ and _Intro to Argovis' Grid API_ before following this notebook.

The generic point schema used by Argovis for point-like data such as Argo profiles and grids works well for data that can be feasibly captured as documents with unique latitude, longitude, and timestamps. However, for higher-resolution datasets, indexing an independent document for each such coordinate triple can dramatically exceed the computing resources that the point data requires; for example, while Argo has roughly 3 million such profile documents to consider at the time of writing, a global, quarter-degree grid measured daily for 30 years (a typical scale for satellite products) would have on the order of *10 billion* such documents. In order to represent, index and serve such high-resolution grids on similar compute infrastructure to the point data, we make a minor modification to the generic point schema to form the *generic timeseries schema*:

 - Vectors in the ``data`` object represent surface measurements, estimates or flags as an ordered timeseries.
 - The ``data`` document no longer has a single ``timestamp`` key, as the data within corresponds to many timestamps.
 - The ``metadata`` or ``data`` document must bear a ``timeseries`` key, which is an ordered list of timestamps corresponding to the times associated with each element in the ``data`` vectors.

The observant reader will notice that this is very similar to the gridded products which have a ``levels`` key indicating the model depths for each entry in their ``data`` vectors. All other aspects of the generic schema remain consistent between point and timeseries datasets. In this notebook, we'll illustrate the unique features of a couple of timeseries datasets; for all other details, the reader is encouraged to apply what they learned from other Argovis API examples, as most query filters and behaviors remain identical between point and timeseries datasets.
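
The scale estimate that motivates this schema is simple arithmetic. A rough sketch, assuming no land mask and 365-day years (both simplifications introduced here, not properties of any particular product):

```python
# Rough count of documents needed if every (longitude, latitude, time)
# triple of a global quarter-degree daily grid were its own document.
# Simplifying assumptions: no land mask, 365-day years.
lon_points = int(360 / 0.25)   # grid points around a latitude circle
lat_points = int(180 / 0.25)   # grid points from pole to pole
days = 365 * 30                # 30 years of daily fields

point_documents = lon_points * lat_points * days
timeseries_documents = lon_points * lat_points  # one document per grid point

print(f"{point_documents:.2e} documents with one document per coordinate triple")
print(f"{timeseries_documents:.2e} documents with one document per grid point")
```

Grouping each grid point's full history into a single document brings the count from roughly ten billion down to roughly one million, which is the same order of magnitude as the Argo profile collection.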

## Setup

In addition to importing a few Python packages, make sure to plug in your Argovis API key for `API_KEY` in the next cell. If you don't have a free Argovis API key yet, get one at https://argovis-keygen.colorado.edu/.

In [1]:
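from argovisHelpers import helpers as avh
# local helpers module assumed to sit alongside this notebook; the cells below
# call helpers.level_df and helpers.regional_mean
import helpers

from Argovis_tasks_helpers import get_route,list_values_for_parameter_to_api_query,show_variable_names_for_collections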

API_ROOT='https://argovis-api.colorado.edu/'
API_KEY=''
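
If you would rather not paste the key into the notebook itself, one option is to read it from an environment variable. A minimal sketch, where the variable name `ARGOVIS_API_KEY` is a hypothetical choice made here and not something Argovis prescribes:

```python
import os

def load_api_key(env_var="ARGOVIS_API_KEY", default=""):
    """Return the API key stored in an environment variable, or a default.

    The variable name is a hypothetical choice for this sketch; Argovis
    itself does not mandate one.
    """
    return os.environ.get(env_var, default)

API_KEY = load_api_key()
```

With this in place, an empty string is used when the variable is unset, which matches the placeholder in the cell above.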

In [2]:
# for a list of collections, please see the Argovis swagger page

#### in the following we select the collections whose variables and metadata we list below
selection_params = {}
#+++ example selecting three timeseries collections and three gridded products
selection_params['collections']  = ['timeseries/noaasst',
                                    'timeseries/copernicussla',
                                    'timeseries/ccmpwind',
                                    'grids/rg09',
                                    'grids/glodap',
                                    'grids/kg21']
#+++


In [4]:
# for each selected collection, we list the variables that are available
vars_lists = show_variable_names_for_collections(collections_list=selection_params['collections'],API_KEY=API_KEY,verbose=True)


https://argovis-api.colorado.edu/timeseries/noaasst/vocabulary?parameter=data
>>>>> timeseries/noaasst
['sst']
https://argovis-api.colorado.edu/timeseries/copernicussla/vocabulary?parameter=data
>>>>> timeseries/copernicussla
['sla', 'adt', 'ugosa', 'ugos', 'vgosa', 'vgos']
https://argovis-api.colorado.edu/timeseries/ccmpwind/vocabulary?parameter=data
>>>>> timeseries/ccmpwind
['uwnd', 'vwnd', 'ws', 'nobs']
https://argovis-api.colorado.edu/grids/rg09/vocabulary?parameter=data
>>>>> grids/rg09
['rg09_salinity', 'rg09_temperature']
https://argovis-api.colorado.edu/grids/glodap/vocabulary?parameter=data
>>>>> grids/glodap
['Cant', 'Cant_Input_N', 'Cant_Input_mean', 'Cant_Input_std', 'Cant_error', 'Cant_relerr', 'NO3', 'NO3_Input_N', 'NO3_Input_mean', 'NO3_Input_std', 'NO3_error', 'NO3_relerr', 'OmegaA', 'OmegaA_Input_N', 'OmegaA_Input_mean', 'OmegaA_Input_std', 'OmegaA_error', 'OmegaA_relerr', 'OmegaC', 'OmegaC_Input_N', 'OmegaC_Input_mean', 'OmegaC_Input_std', 'OmegaC_error', 'OmegaC_rel

In [None]:
# indicate the variable of interest for each collection
selection_params['varnames']     = ['doxy', 'oxygen']
selection_params['varnames_qc']  = [',1', ''] #[',1', ''] # argoqc = 1 is best quality
selection_params['varname_title']     = 'Oxygen'


In [10]:
# for each collection, we show the metadata
for icollection in selection_params['collections']:
    # e.g. 'timeseries/noaasst' -> family 'timeseries', product ID 'noaasst'
    family, product_id = icollection.split('/')
    metaQuery = {'id': product_id}
    meta = avh.query(family+'/meta', options=metaQuery, apikey=API_KEY, apiroot=get_route(icollection),verbose=True)
    print('--> '+icollection)
    if isinstance(meta, dict):
        # error responses come back as a dict with 'code' and 'message' keys
        print(meta.keys())
        print(meta)
    elif not meta:
        # no matching metadata documents
        print(meta)
    else:
        # a list of metadata documents
        for imeta in meta:
            print(imeta.keys())

https://argovis-api.colorado.edu/timeseries/meta?id=noaasst
--> timeseries/noaasst
dict_keys(['_id', 'data_type', 'data_info', 'date_updated_argovis', 'timeseries', 'source', 'lattice'])
https://argovis-api.colorado.edu/timeseries/meta?id=copernicussla
--> timeseries/copernicussla
dict_keys(['_id', 'data_type', 'data_info', 'date_updated_argovis', 'timeseries', 'source', 'tpa_correction', 'lattice'])
https://argovis-api.colorado.edu/timeseries/meta?id=ccmpwind
--> timeseries/ccmpwind
dict_keys(['_id', 'data_type', 'data_info', 'date_updated_argovis', 'timeseries', 'source', 'lattice'])
https://argovis-api.colorado.edu/grids/meta?id=rg09
--> grids/rg09
dict_keys(['code', 'message'])
{'code': 404, 'message': 'No grid product matching ID rg09'}
https://argovis-api.colorado.edu/grids/meta?id=glodap
--> grids/glodap
[]
https://argovis-api.colorado.edu/grids/meta?id=kg21
--> grids/kg21
dict_keys(['code', 'message'])
{'code': 404, 'message': 'No grid product matching ID kg21'}
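
The output above shows three response shapes: a list of metadata documents, an empty list, and an error dict with `code` and `message` keys. A small sketch of a helper that tells them apart, based only on those three shapes (the function name and return format are hypothetical, not part of `argovisHelpers`):

```python
def summarize_response(response):
    """Classify a query result as 'error', 'empty' or 'documents'.

    Based only on the three shapes printed above: a dict with 'code' and
    'message' keys, an empty list, or a list of document dicts.
    """
    if isinstance(response, dict) and "code" in response:
        return "error", f"{response['code']}: {response.get('message', '')}"
    if not response:
        return "empty", "no matching documents"
    # collect every key that appears on any returned document
    keys = sorted(set().union(*(doc.keys() for doc in response)))
    return "documents", ", ".join(keys)

# Hypothetical stand-ins mirroring the shapes printed above
examples = [
    [{"_id": "noaasst", "timeseries": [], "lattice": {}}],
    {"code": 404, "message": "No grid product matching ID rg09"},
    [],
]
for example in examples:
    print(summarize_response(example))
```

Checking the shape explicitly makes it easier to notice when a product ID is not recognized by a given route.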



# NOAA Sea surface temperature timeseries

Argovis indexes the weekly average sea surface temperature on a 1 degree grid as provided by NOAA via [https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.html](https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.html). Let's start by having a look at the metadata for this collection:

In [None]:
sstMetaQuery = {
    'id': 'noaasst' 
}

sstMeta = avh.query('timeseries/meta', options=sstMetaQuery, apikey=API_KEY, apiroot=API_ROOT,verbose=True)
print(sstMeta)

We can see from the usual `data_info` that this dataset contains one variable called `sst` corresponding to weekly mean sea surface temperature. A feature unique to timeseries datasets is that the metadata document (of which there is one per dataset) contains a `timeseries` key; this lists all the timesteps for all the timeseries in the dataset.

Additionally, all metadata documents for data products interpolated to a longitude/latitude grid also include a `lattice` key that describes the structure of the grid in latitude and longitude.

> **What do lattice center and spacing mean?** Each product with a regular grid indexed by Argovis describes its grid with a centerpoint, which is an arbitrary point on the grid close to [0,0], denoted as [longitude, latitude]. Other grid points are found by stepping away from it in multiples of `lattice.spacing`, denoted as [longitude step, latitude step].
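
The stepping rule in the note above can be sketched as a function that snaps an arbitrary point to its nearest lattice point. The center and spacing values used below are hypothetical; read the real ones from the `lattice` key of the metadata document. The sketch also ignores longitude wrap-around.

```python
def snap_to_lattice(lon, lat, center, spacing):
    """Return the lattice point nearest to (lon, lat).

    center  -- [longitude, latitude] of any one grid point
    spacing -- [longitude step, latitude step] between neighbours
    """
    snapped = []
    for value, c, step in zip((lon, lat), center, spacing):
        n_steps = round((value - c) / step)   # whole steps away from the center
        snapped.append(c + n_steps * step)
    return snapped

# Hypothetical 1-degree lattice centered at [0.5, 0.5]
print(snap_to_lattice(14.2, 39.8, center=[0.5, 0.5], spacing=[1, 1]))

# Hypothetical quarter-degree lattice centered at [0.125, 0.125]
print(snap_to_lattice(-46.9, 35.6, center=[0.125, 0.125], spacing=[0.25, 0.25]))
```

Snapping a point of interest this way gives coordinates that sit exactly on the grid, which is convenient when looking up a single grid cell.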

We can also have a look at a corresponding data document, ID'ed as `<longitude>_<latitude>`:

In [None]:
sstQuery = {
    'id': '14.5_39.5',
    'data': 'sst'
}

sst = avh.query('timeseries/noaasst', options=sstQuery, apikey=API_KEY, apiroot=API_ROOT)
print(sst)
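
How a data document's vectors line up with the `timeseries` list on the metadata document can be sketched with toy documents. The values and the reduced set of keys below are hypothetical; real documents carry more keys, and `as_records` is a name introduced here for illustration.

```python
# Toy illustration of the generic timeseries schema (hypothetical values;
# the real documents carry more keys than shown here).
toy_metadata = {
    "_id": "toy_product",
    # ordered timestamps shared by every data document in the collection
    "timeseries": ["2000-01-02T00:00:00Z", "2000-01-09T00:00:00Z", "2000-01-16T00:00:00Z"],
}
toy_document = {
    "_id": "14.5_39.5",
    "data_info": [["sst"]],          # names of the variables, in order
    "data": [[15.1, 15.3, 14.9]],    # one vector per variable, one entry per timestamp
}

def as_records(doc, timestamps):
    """Pair each timestamp with the value of every variable at that time."""
    names = doc["data_info"][0]
    records = []
    for i, t in enumerate(timestamps):
        record = {"timestamp": t}
        for name, vector in zip(names, doc["data"]):
            record[name] = vector[i]
        records.append(record)
    return records

for record in as_records(toy_document, toy_metadata["timeseries"]):
    print(record)
```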

The `data` key here is structured according to `data_info` like all other Argovis datasets; the elements correspond to the timestamps in order as found on the metadata document. Aside from looking at `data_info`, the same vocabulary routes seen in other Argovis data products also exist for timeseries. As always, use `enum` to see the options, and then drill into any one of them individually like so:

In [None]:
vocab = {
    'parameter': 'enum'
}

avh.query('timeseries/noaasst/vocabulary', options=vocab, apikey=API_KEY, apiroot=API_ROOT)

In [None]:
vocab = {
    'parameter': 'data'
}

avh.query('timeseries/noaasst/vocabulary', options=vocab, apikey=API_KEY, apiroot=API_ROOT)
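
The verbose output earlier in this notebook printed the URL behind a vocabulary request. A sketch that reproduces that URL shape with the standard library; this mirrors the printed URL only and makes no claim about how `avh.query` builds its requests internally:

```python
from urllib.parse import urlencode, urljoin

def vocabulary_url(api_root, collection, parameter):
    """Build the vocabulary URL for a collection such as 'timeseries/noaasst'."""
    path = f"{collection.strip('/')}/vocabulary"
    return urljoin(api_root, path) + "?" + urlencode({"parameter": parameter})

print(vocabulary_url("https://argovis-api.colorado.edu/", "timeseries/noaasst", "data"))
```

Seeing the plain URL can help when debugging a query in a browser or with another HTTP client.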

Going back to our data query, if instead we provide a time range:

In [None]:
sstQuery = {
    'id': '14.5_39.5',
    'data': 'sst',
    'startDate': '1993-01-01T00:00:00Z',
    'endDate': '1993-02-01T00:00:00Z'
}

sst = avh.query('timeseries/noaasst', options=sstQuery, apikey=API_KEY, apiroot=API_ROOT)
print(sst)

we get a `timeseries` key appended to the data document to indicate the timestamps of the filtered timeseries; note this is in close analogy to how levels are filtered in Argovis' representation of Argo grids, for example.
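
The effect of the time filter can be imitated locally on a list of timestamps and values. A minimal sketch with hypothetical data, assuming a half-open interval (start included, end excluded) and timestamps without fractional seconds; neither assumption is confirmed by this notebook:

```python
from datetime import datetime, timezone

def parse_time(stamp):
    """Parse timestamps such as '1993-01-01T00:00:00Z' into aware datetimes."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def filter_by_time(timestamps, values, start, end):
    """Keep entries with start <= timestamp < end (half-open interval assumed)."""
    t0, t1 = parse_time(start), parse_time(end)
    kept = [(t, v) for t, v in zip(timestamps, values) if t0 <= parse_time(t) < t1]
    return [t for t, _ in kept], [v for _, v in kept]

# Hypothetical weekly timestamps and values
times = ["1992-12-27T00:00:00Z", "1993-01-03T00:00:00Z",
         "1993-01-10T00:00:00Z", "1993-02-07T00:00:00Z"]
sst_values = [14.8, 14.6, 14.5, 14.1]
print(filter_by_time(times, sst_values, "1993-01-01T00:00:00Z", "1993-02-01T00:00:00Z"))
```

The filtered timestamps and values stay aligned index by index, just as the `timeseries` key returned with the data document stays aligned with its `data` vectors.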

## Zonal and meridional area-weighted averages for timeseries data

Much like gridded data, timeseries data can be arranged to easily compute zonal and meridional averages, with area weighting. Let's start by downloading ten years of data for a region in the North Atlantic:

In [None]:
sstQuery = {
  "startDate": '2002-01-01T00:00:00Z',
  "endDate": '2012-01-01T00:00:00Z',
  "polygon": [[-50,50],[-50,55],[-45,55],[-45,50],[-50,50]],
  "data": 'sst'
}
sst = avh.query('timeseries/noaasst', options=sstQuery, apikey=API_KEY, apiroot=API_ROOT)

> **Temporospatial limits of timeseries queries**: because timeseries documents contain information for every timestep in the series, we currently support queries on only small geographic areas, about 50 square degrees at the equator. However, long time duration queries like the one above are well supported. You can, of course, tile multiple requests to cover an arbitrary region.
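
Tiling a larger region into several small requests can be sketched as follows. The helper name and the 5-degree default tile size are choices made here for illustration; the sketch does not handle regions that cross the dateline.

```python
def tile_polygons(lon_min, lon_max, lat_min, lat_max, tile_deg=5.0):
    """Split a longitude/latitude box into closed polygon rings.

    Each ring is at most tile_deg x tile_deg degrees; vertices are
    [longitude, latitude] and the first vertex is repeated at the end,
    matching the polygon used in the query above.
    """
    polygons = []
    lat = lat_min
    while lat < lat_max:
        top = min(lat + tile_deg, lat_max)
        lon = lon_min
        while lon < lon_max:
            right = min(lon + tile_deg, lon_max)
            polygons.append([[lon, lat], [lon, top], [right, top], [right, lat], [lon, lat]])
            lon = right
        lat = top
    return polygons

tiles = tile_polygons(-60, -45, 50, 60)
print(len(tiles), "tiles; first:", tiles[0])
```

Each ring could then be passed as the `polygon` option of a separate query and the results concatenated. Grid points lying exactly on a shared tile edge might be returned by more than one tile, so dropping duplicates by `_id` afterwards is a sensible precaution.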

If we then arrange these data documents into a dataframe with columns for longitude, latitude, timestamp and measurement, we can compute and plot area-weighted meridional and zonal averages with our helpers:

In [None]:
df = helpers.level_df(sst, 
                      ['sst', 'longitude', 'latitude'], 
                      timesteps=sst[0]['timeseries'], 
                      index=["latitude","longitude","timestamp"]
                     )
ds = df.to_xarray()

In [None]:
sst_mer = helpers.regional_mean(ds, form='meridional')
sst_mer['sst'].plot(y="timestamp")

In [None]:
sst_zon = helpers.regional_mean(ds, form='zonal')
sst_zon['sst'].plot(y="timestamp")
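
The idea behind area weighting can be sketched in plain Python: on a regular longitude/latitude grid, cell area is proportional to the cosine of latitude, so values nearer the equator count for more. This is an illustration with hypothetical numbers and is not necessarily how `helpers.regional_mean` is implemented.

```python
import math

def area_weighted_mean(values_by_latitude):
    """Average values given as {latitude_in_degrees: value}, weighting each
    entry by cos(latitude), which is proportional to grid-cell area on a
    regular longitude/latitude grid."""
    weights = {lat: math.cos(math.radians(lat)) for lat in values_by_latitude}
    total = sum(weights.values())
    return sum(values_by_latitude[lat] * w for lat, w in weights.items()) / total

# Hypothetical SST values at the centers of a 1-degree grid column
column = {50.5: 8.0, 51.5: 7.5, 52.5: 7.0, 53.5: 6.5, 54.5: 6.0}
print("unweighted:", sum(column.values()) / len(column))
print("weighted:  ", area_weighted_mean(column))
```

Over a band only a few degrees wide the two means differ only slightly; the weighting matters more as the latitude range grows.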

# Copernicus sea level anomaly timeseries

Argovis indexes a quarter-degree grid of sea level anomalies and absolute dynamic topographies from [https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-sea-level-global](https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-sea-level-global); note that the original daily data reported at this link has been averaged down to weekly averages with timestamps aligned with the NOAA SST dataset described above, for scale and comparison purposes.
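
Reducing daily values to weekly means can be sketched as a block average. The exact windowing and the treatment of missing days used for the indexed product are not described here, so the choices below (consecutive 7-day blocks, trailing partial week dropped) are assumptions of this sketch.

```python
def weekly_means(daily_values, days_per_week=7):
    """Average consecutive blocks of daily values; a trailing partial week
    is dropped (an assumption of this sketch)."""
    n_weeks = len(daily_values) // days_per_week
    return [
        sum(daily_values[i * days_per_week:(i + 1) * days_per_week]) / days_per_week
        for i in range(n_weeks)
    ]

# Hypothetical daily sea level anomalies in metres: two full weeks plus one day
daily = [0.10] * 7 + [0.24] * 7 + [0.30]
print(weekly_means(daily))
```

Averaging this way shrinks each timeseries by a factor of seven, which is part of what keeps the documents a manageable size.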

Let's again start by looking at the single metadata document for this collection:

In [None]:
slaMetaQuery = {
    'id': 'copernicussla' 
}

slaMeta = avh.query('timeseries/meta', options=slaMetaQuery, apikey=API_KEY, apiroot=API_ROOT)
print(slaMeta)

This is identical in structure to the SST metadata. Of this dataset's data variables, the two we focus on here are the sea height anomaly `sla`, measured relative to the local average sea height over the reference period 1993-2012, and the absolute sea height including this anomaly, `adt`. We can query this dataset much the same as any other timeseries data:

In [None]:
slaQuery = {
    'id': '-46.875_35.625',
    'data': 'all',
    'startDate': '1993-01-01T00:00:00Z',
    'endDate': '1993-02-01T00:00:00Z'
}

sla = avh.query('timeseries/copernicussla', options=slaQuery, apikey=API_KEY, apiroot=API_ROOT)
print(sla)

Here's an example of making an xarray dataset out of SLA data, similar to the SST example above:

In [None]:
slaQuery_reg = {
  "startDate": '2002-01-01T00:00:00Z',
  "endDate": '2012-01-01T00:00:00Z',
  "polygon": [[-50,50],[-50,55],[-45,55],[-45,50],[-50,50]],
  "data": 'sla'
}

sla_reg = avh.query('timeseries/copernicussla', options=slaQuery_reg, apikey=API_KEY, apiroot=API_ROOT)

In [None]:
df_sla = helpers.level_df(sla_reg, 
                      ['sla', 'longitude', 'latitude'], 
                      timesteps=sla_reg[0]['timeseries'], 
                      index=["latitude","longitude","timestamp"]
                     )
ds_sla = df_sla.to_xarray()

In [None]:
ds_sla

In [None]:
ds_sla['sla'][:,:,0].plot()

# REMSS CCMP wind vector product

Similarly to the sea level anomaly timeseries, Argovis indexes a weekly average of the [REMSS CCMP wind vector product](https://www.remss.com/measurements/ccmp/). Have a look at the metadata:

In [None]:
ccmpMetaQuery = {
    'id': 'ccmpwind' 
}

ccmpMeta = avh.query('timeseries/meta', options=ccmpMetaQuery, apikey=API_KEY, apiroot=API_ROOT)
print(ccmpMeta)

Let's find the wind data for the same location and time period as the sea surface heights above:

In [None]:
ccmpQuery = {
    'id': '-46.875_35.625',
    'data': 'all',
    'startDate': '1993-01-01T00:00:00Z',
    'endDate': '1993-02-01T00:00:00Z'
}

ccmp = avh.query('timeseries/ccmpwind', options=ccmpQuery, apikey=API_KEY, apiroot=API_ROOT)
print(ccmp)

We can plot wind speeds in a region and on a date:

In [None]:
params = {
  'startDate': '1993-01-01T00:00:00Z',
  'endDate': '1993-02-01T00:00:00Z',
  "polygon": [[-50,50],[-50,55],[-45,55],[-45,50],[-50,50]],
  "data": 'all'
}

wsdata = avh.query('timeseries/ccmpwind', options=params, apikey=API_KEY, apiroot=API_ROOT)

In [None]:
df = helpers.level_df(wsdata, ['ws', 'longitude', 'latitude'], timesteps=wsdata[0]['timeseries'], index=['longitude', 'latitude', 'timestamp'])
ds = df.to_xarray()

In [None]:
gridmap = ds.loc[{"timestamp":avh.parsetime('1993-01-10T00:00:00.000Z')}]
gridmap['ws'].plot()
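
When comparing `ws` with the `uwnd` and `vwnd` components, keep in mind that the order of averaging and taking the magnitude matters: the mean of the speeds is never smaller than the speed of the mean vector. Whether the weekly `ws` values were averaged before or after taking the magnitude is not stated in this notebook, so the sketch below only illustrates the distinction using hypothetical component values.

```python
import math

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)

def wind_speed(u, v):
    """Magnitude of a horizontal wind vector from its two components."""
    return math.hypot(u, v)

# Hypothetical wind components in m/s for two time steps
u = [3.0, -3.0]
v = [4.0, 4.0]

speed_of_mean = wind_speed(mean(u), mean(v))                       # magnitude of the averaged vector
mean_of_speed = mean([wind_speed(a, b) for a, b in zip(u, v)])     # average of the magnitudes

print("speed of mean vector:", speed_of_mean)
print("mean of speeds:      ", mean_of_speed)
```

A difference between the two quantities is therefore expected whenever the wind direction varies within the averaging window.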