# Intro to Argovis' Grid API

Argovis offers a growing list of gridded products, indexed and downloadable through its API. In this notebook, we'll illustrate some basic operations and handling of this data.

## Setup

In addition to importing a few Python packages, set `API_KEY` in the next cell to your Argovis API key. If you don't have one yet, you can get a free key at https://argovis-keygen.colorado.edu/.

In [56]:
import requests, xarray, pandas, math
from datetime import datetime, timedelta

API_KEY=''

## Downloading Gridded Data

Argovis offers gridded data at its `/grids` endpoint. Available query string parameters are:

 - `gridName` (mandatory, one of `rgTempTotal`, `rgPsalTotal`, `ohc`): name of the gridded product to search.
 - `startDate` (mandatory, format `YYYY-MM-DDTHH:MM:SSZ`, in UTC): beginning of the time window to query.
 - `endDate` (mandatory, format `YYYY-MM-DDTHH:MM:SSZ`, in UTC): end of the time window to query.
 - `polygon` (mandatory, format `[[lon0,lat0],[lon1,lat1],...,[lonN,latN],[lon0,lat0]]`): geographical region to query. The first vertex is repeated at the end to close the polygon.
 - `presRange` (optional, format `minimum_pressure,maximum_pressure`): pressure window to filter for.
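
The parameter formats listed above can also be assembled programmatically. The following is a minimal sketch rather than anything provided by Argovis: `box_polygon` and `grid_params` are hypothetical helper names, and the sketch assumes input datetimes are already in UTC.

```python
import json
from datetime import datetime

def box_polygon(west, south, east, north):
    # Hypothetical helper: build a closed ring of [lon, lat] pairs for a rectangular box.
    # The first vertex is repeated at the end, as the polygon format requires.
    ring = [[west, north], [west, south], [east, south], [east, north], [west, north]]
    return json.dumps(ring)

def grid_params(grid_name, start, end, polygon, pres_range=None):
    # Hypothetical helper: format the query string parameters for /grids.
    # <start> and <end> are datetimes assumed to already be in UTC.
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    params = {
        "gridName": grid_name,
        "startDate": start.strftime(fmt),
        "endDate": end.strftime(fmt),
        "polygon": polygon,
    }
    if pres_range is not None:
        # presRange is optional and formatted as "minimum_pressure,maximum_pressure"
        params["presRange"] = '{},{}'.format(pres_range[0], pres_range[1])
    return params

example = grid_params('rgTempTotal', datetime(2012, 1, 1), datetime(2012, 4, 1),
                      box_polygon(-73, 38, -71, 40), pres_range=(0, 100))
```

The resulting dictionary can be passed as `params` to `requests.get`, in the same way as the hand-written dictionary in the next cell.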
 
Let's try a simple request that downloads a piece of the Roemmich-Gilson Argo climatology, a 2-degree box over the North Atlantic for the first quarter of 2012:

In [19]:
params = {
  "gridName": 'rgTempTotal',
  "startDate": '2012-01-01T00:00:00Z',
  "endDate": '2012-04-01T00:00:00Z',
  "polygon": '[[ -73, 40],[ -73, 38],[ -71, 38],[ -71, 40],[ -73, 40]]'
}

r = requests.get('https://argovis-api.colorado.edu/grids', params=params, headers={'x-argokey': API_KEY})
rgdata = r.json()

Like most Argovis API requests, this one returns a list of documents matching the query. Let's have a look at the first record the API returned:

In [20]:
rgdata[0]

{'_id': 'rgTempTotal',
 'units': 'degree celcius (ITS-90)',
 'levels': [2.5,
  10,
  20,
  30,
  40,
  50,
  60,
  70,
  80,
  90,
  100,
  110,
  120,
  130,
  140,
  150,
  160,
  170,
  182.5,
  200,
  220,
  240,
  260,
  280,
  300,
  320,
  340,
  360,
  380,
  400,
  420,
  440,
  462.5,
  500,
  550,
  600,
  650,
  700,
  750,
  800,
  850,
  900,
  950,
  1000,
  1050,
  1100,
  1150,
  1200,
  1250,
  1300,
  1350,
  1412.5,
  1500,
  1600,
  1700,
  1800,
  1900,
  1975],
 'date_added': '2022-04-22T00:56:19.631Z'}

The first object in a request to `/grids` is always the appropriate *metadata record* for the grid. The most important piece of information it includes is the `levels` key, which tells us the pressure in dbar of each level of this climatology; we'll need this to interpret the rest of the gridded data.

All the records returned by `/grids` after the first specify the actual data we wanted; let's have a look at one of them:

In [21]:
rgdata[1]

{'_id': '6264912baa7c850607148180',
 'g': {'type': 'Point', 'coordinates': [-72.5, 38.5]},
 't': '2012-03-15T00:00:00.000Z',
 'd': [14.328,
  14.364,
  14.000999,
  13.493001,
  13.293,
  13.512,
  13.667,
  13.725,
  13.753,
  13.749,
  13.731999,
  13.727,
  13.743,
  13.719,
  13.682,
  13.629001,
  13.542,
  13.439,
  13.219,
  12.803,
  12.322,
  11.679,
  11.028999,
  10.431,
  9.852,
  9.393,
  8.955999,
  8.55,
  8.164,
  7.798,
  7.456,
  7.137,
  6.83,
  6.39,
  5.946,
  5.645,
  5.392,
  5.208,
  5.054,
  4.89,
  4.717,
  4.57,
  4.5,
  4.437,
  4.36,
  4.3,
  4.227,
  4.096,
  4.046,
  3.993,
  3.964,
  3.906,
  3.814,
  3.718,
  3.624,
  3.519,
  3.407,
  3.313]}

By default, Argovis returns gridded data in a *profile-like* structure: this record, located in space by its `g` key (for geolocation) and in time by its `t` key, contains the grid values at every pressure level in its `d` (for data) key. To interpret the list of numbers, compare it entry by entry to the `levels` key in the metadata record. For example, the second entry of `levels` is 10 and the second entry of `d` is 14.364, so the two records printed above indicate that the temperature at these coordinates at 10 dbar on 2012-03-15 is 14.364 degrees Celsius.
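
The entry-by-entry comparison described above can be sketched in a few lines. The helper names `profile_pairs` and `value_at` are hypothetical, and the sample records are truncated copies of the two records printed above, keeping only the first three levels.

```python
def profile_pairs(metadata, record):
    # Hypothetical helper: pair each pressure level (dbar) from the metadata record
    # with the value at the same position in a data record's 'd' list.
    return list(zip(metadata['levels'], record['d']))

def value_at(metadata, record, pressure):
    # Hypothetical helper: look up the value stored at an exact pressure level.
    # Raises ValueError if <pressure> is not one of the grid's levels.
    return record['d'][metadata['levels'].index(pressure)]

# Truncated copies of the two records printed above (first three levels only).
meta = {'levels': [2.5, 10, 20]}
rec = {'g': {'type': 'Point', 'coordinates': [-72.5, 38.5]},
       't': '2012-03-15T00:00:00.000Z',
       'd': [14.328, 14.364, 14.000999]}

pairs = profile_pairs(meta, rec)
ten_dbar = value_at(meta, rec, 10)
```

With the full response, the same calls would be `profile_pairs(rgdata[0], rgdata[1])` and `value_at(rgdata[0], rgdata[1], 10)`.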

## Remapping Gridded Data

When working with gridded data, we often think of the data as maps at a constant time and pressure level across a region of interest, rather than the profile structure seen above. We can reconstruct gridded data in this format with a function like the following:

In [22]:
def gridmap(time, pressure, region):
    # given a datetime <time>, a <pressure> in dbar, and a <region> defined like polygon was above,
    # return a list of dictionaries {geolocation, meas} that represent the grid at the time and pressure specified.

    params = {
        "gridName": "rgTempTotal",
        # a two-minute window centered on <time>, so that only that timestamp matches
        "startDate": datetime.strftime(time - timedelta(minutes=1), '%Y-%m-%dT%H:%M:%SZ'),
        "endDate": datetime.strftime(time + timedelta(minutes=1), '%Y-%m-%dT%H:%M:%SZ'),
        "polygon": region,
        # a narrow pressure window around <pressure>, intended to select only that level
        "presRange": str(pressure-.001) + ',' + str(pressure+.001)
    }

    r = requests.get('https://argovis-api.colorado.edu/grids', params=params, headers={'x-argokey': API_KEY})
    data = r.json()
    # data[0] is the metadata record; each later record carries the selected level as the first entry of 'd'
    return [{"geolocation": p["g"], "meas": p["d"][0]} for p in data[1:]]
    
dmap = gridmap(datetime.strptime('2012-01-15T00:00:00Z', '%Y-%m-%dT%H:%M:%SZ'), 2.5, params['polygon'])
dmap

[{'geolocation': {'type': 'Point', 'coordinates': [-72.5, 38.5]},
  'meas': 15.683001},
 {'geolocation': {'type': 'Point', 'coordinates': [-72.5, 39.5]},
  'meas': 14.914},
 {'geolocation': {'type': 'Point', 'coordinates': [-71.5, 39.5]},
  'meas': 15.139999},
 {'geolocation': {'type': 'Point', 'coordinates': [-71.5, 38.5]},
  'meas': 15.451}]

The resulting list represents the temperature grid points in the region of interest at the 2.5 dbar level on January 15, 2012.
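
To view a list like this as an actual map, one option is to arrange it into rows of latitude and columns of longitude. This is a minimal sketch under that assumption; `to_lat_lon_table` is a hypothetical helper, and the sample cells are copied from the `gridmap()` output above.

```python
def to_lat_lon_table(dmap):
    # Hypothetical helper: arrange gridmap()-style output into rows of latitude
    # and columns of longitude. Cells missing from <dmap> are left as None.
    lons = sorted({c['geolocation']['coordinates'][0] for c in dmap})
    lats = sorted({c['geolocation']['coordinates'][1] for c in dmap})
    values = {tuple(c['geolocation']['coordinates']): c['meas'] for c in dmap}
    table = [[values.get((lon, lat)) for lon in lons] for lat in lats]
    return lats, lons, table

# The four cells returned by gridmap() above.
cells = [
    {'geolocation': {'type': 'Point', 'coordinates': [-72.5, 38.5]}, 'meas': 15.683001},
    {'geolocation': {'type': 'Point', 'coordinates': [-72.5, 39.5]}, 'meas': 14.914},
    {'geolocation': {'type': 'Point', 'coordinates': [-71.5, 39.5]}, 'meas': 15.139999},
    {'geolocation': {'type': 'Point', 'coordinates': [-71.5, 38.5]}, 'meas': 15.451},
]

lats, lons, table = to_lat_lon_table(cells)
```

Here `table[i][j]` holds the value at `lats[i]` and `lons[j]`, which is the layout most plotting routines expect for a 2D field.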

## Ingestion by xarray

xarray provides labeled, multi-dimensional data structures that will be familiar to many Python users; we can transform a raw API response into an xarray dataset with a helper similar to the following.

In [51]:
def xargrid(grid):
    # given the json response <grid> of a request to /grids,
    # return an xarray dataset with coordinates latitude, longitude, time and pressure,
    # and a single data variable named measurement.
    
    lat = []
    lon = []
    time = []
    pres = []
    meas = []
    levels = grid[0]['levels']  # pressure in dbar for each position in a record's 'd' list
    for p in grid[1:]:
        for i, value in enumerate(p['d']):
            # GeoJSON coordinates are ordered [longitude, latitude]
            lon.append(p['g']['coordinates'][0])
            lat.append(p['g']['coordinates'][1])
            time.append(p['t'])
            meas.append(value)
            pres.append(levels[i])
            
    df = pandas.DataFrame({"latitude": lat, 
                           "longitude": lon, 
                           "time": time, 
                           "pressure": pres, 
                           "measurement": meas}).set_index(["latitude","longitude","time","pressure"])
    return df.to_xarray()
    
ds = xargrid(rgdata)

Now we can do all the usual xarray operations; let's see what the ranges of our coordinate variables are:

In [52]:
print('latitudes:',ds['latitude'].data)
print('longitudes:',ds['longitude'].data)
print('times:',ds['time'].data)
print('pressures:',ds['pressure'].data)

latitudes: [38.5 39.5]
longitudes: [-72.5 -71.5]
times: ['2012-01-15T00:00:00.000Z' '2012-02-15T00:00:00.000Z'
 '2012-03-15T00:00:00.000Z']
pressures: [   2.5   10.    20.    30.    40.    50.    60.    70.    80.    90.
  100.   110.   120.   130.   140.   150.   160.   170.   182.5  200.
  220.   240.   260.   280.   300.   320.   340.   360.   380.   400.
  420.   440.   462.5  500.   550.   600.   650.   700.   750.   800.
  850.   900.   950.  1000.  1050.  1100.  1150.  1200.  1250.  1300.
 1350.  1412.5 1500.  1600.  1700.  1800.  1900.  1975. ]


And let's use a selection of these to look up data from our xarray dataset:

In [55]:
ds.loc[{"latitude": 38.5, "longitude":-71.5, "time":'2012-02-15T00:00:00.000Z', "pressure":500}]['measurement'].data

array(4.707)
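
Beyond exact lookups, two other common xarray operations are reducing over a dimension and selecting the nearest coordinate value. The sketch below uses a small made-up dataset with the same layout as the one built by `xargrid()`; the values are illustrative only and are not Argovis data.

```python
import pandas
import xarray

# Small stand-in with the same index layout as the dataset built by xargrid().
# The measurement values are made up for illustration.
rows = [
    (38.5, -72.5, "2012-01-15T00:00:00.000Z", 2.5, 15.0),
    (38.5, -72.5, "2012-01-15T00:00:00.000Z", 10.0, 14.0),
    (38.5, -72.5, "2012-02-15T00:00:00.000Z", 2.5, 13.0),
    (38.5, -72.5, "2012-02-15T00:00:00.000Z", 10.0, 12.0),
]
df = pandas.DataFrame(rows, columns=["latitude", "longitude", "time", "pressure", "measurement"])
demo = df.set_index(["latitude", "longitude", "time", "pressure"]).to_xarray()
is_dataset = isinstance(demo, xarray.Dataset)

# Average over the time dimension, keeping latitude, longitude and pressure.
time_mean = demo["measurement"].mean(dim="time")
surface_mean = float(time_mean.sel(latitude=38.5, longitude=-72.5, pressure=2.5))

# Select the level closest to a pressure that is not exactly on the grid.
nearest = demo.sel(pressure=9.0, method="nearest")
nearest_pressure = float(nearest["pressure"])
```

The same calls should apply unchanged to the `ds` object built above, for example `ds["measurement"].mean(dim="time")`.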

## Area-Weighted Means

A common operation on gridded data is to weight a mean by the area of each grid cell, which shrinks with increasing latitude. A helper to do this with Argovis grid data could look like the following.

In [69]:
def amean(grid):
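    # given a map of grid data like what's returned from gridmap(),
    # calculate the mean of the data variable, weighted by grid cell area.
    # each cell's weight is sin(upper latitude edge) - sin(lower latitude edge),
    # which is proportional to the cell's area on a sphere.
    
    cellsize = 1 # degrees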
    total = 0
    totalweight = 0
    for cell in grid:
        highlat = abs(cell['geolocation']['coordinates'][1])+cellsize/2
        if highlat > 90:
            highlat = 180 - highlat
        lowlat = abs(cell['geolocation']['coordinates'][1])-cellsize/2
        weight = math.sin(math.pi/180*highlat) - math.sin(math.pi/180*lowlat)
        total += cell['meas']*weight
        totalweight += weight
        
    return total / totalweight
        
amean(dmap)

15.298908060646564
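
The weights in `amean` follow from spherical geometry: on a sphere of radius R, a cell spanning a fixed longitude width between two latitudes has an area proportional to the difference of the sines of those latitudes. R and the longitude width are the same for every cell, so they cancel in the weighted mean. As a rough check, the sketch below compares this weight with the widely used cosine-of-latitude approximation. `band_weight` and `cos_weight` are hypothetical names introduced for illustration.

```python
import math

def band_weight(lat, cellsize=1.0):
    # Relative area of a cell centered at <lat> (degrees) spanning <cellsize> degrees:
    # sin(upper edge) - sin(lower edge). Constant factors (R^2 and the longitude width)
    # cancel in a weighted mean, so they are omitted. Assumes the cell does not cross a pole.
    upper = math.radians(abs(lat) + cellsize / 2)
    lower = math.radians(abs(lat) - cellsize / 2)
    return math.sin(upper) - math.sin(lower)

def cos_weight(lat, cellsize=1.0):
    # Common approximation: cos(latitude) times the cell height in radians.
    return math.cos(math.radians(lat)) * math.radians(cellsize)

# For 1-degree cells the two weights agree very closely.
ratio = band_weight(38.5) / cos_weight(38.5)

# Cells near the equator carry more weight than cells at high latitude.
equator_heavier = band_weight(0.5) > band_weight(60.5)
```

For small cells the two weightings are nearly interchangeable; the sine difference is the exact form, which matters more as cells get larger.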