# Intro to Argovis' Grid API

Argovis offers a growing list of gridded products, indexed and downloadable through its API. In this notebook, we'll illustrate some basic operations and handling of this data.

## Setup

In addition to importing a few Python packages, make sure to plug in your Argovis API key for `API_KEY` in the next cell. If you don't have an Argovis API key yet, you can get one for free at https://argovis-keygen.colorado.edu/.

In [95]:
import requests, xarray, pandas, math
from datetime import datetime, timedelta

API_KEY=''

## Downloading Gridded Data

Argovis offers gridded data at its `/grids` endpoint. Available query string parameters are:

 - `gridName` (mandatory, one of rgTempTotal, rgPsalTotal, ohc): name of gridded product to search.
 - `startDate` (mandatory, format YYYY-MM-DDTHH:MM:SSZ at GMT0): beginning of time window to query.
 - `endDate` (mandatory, format YYYY-MM-DDTHH:MM:SSZ at GMT0): end of time window to query.
 - `polygon` (mandatory, format [[lon0,lat0],[lon1,lat1],...,[lonN,latN],[lon0,lat0]]): geographical region to query.
 - `presRange` (optional, format minimum_pressure,maximum_pressure): pressure window to filter for.
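
As a sketch of how these formats fit together, the helper below assembles a query-string dictionary from ordinary Python objects. Note that `grid_params` is a hypothetical name introduced here for illustration; it is not part of Argovis or of this notebook. It assumes the datetimes passed in are already in UTC and that the polygon is given as a closed list of `[lon, lat]` pairs.

```python
import json
from datetime import datetime

def grid_params(grid_name, start, end, polygon, pres_range=None):
    # Hypothetical helper: format the /grids parameters listed above.
    # start, end: datetime objects, assumed to already be in UTC.
    # polygon: list of [lon, lat] pairs; first and last vertex must match.
    # pres_range: optional (minimum_pressure, maximum_pressure) tuple.
    if polygon[0] != polygon[-1]:
        raise ValueError("polygon must be closed: first and last vertex must match")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    params = {
        "gridName": grid_name,
        "startDate": start.strftime(fmt),
        "endDate": end.strftime(fmt),
        "polygon": json.dumps(polygon),
    }
    if pres_range is not None:
        params["presRange"] = f"{pres_range[0]},{pres_range[1]}"
    return params

example = grid_params(
    "rgTempTotal",
    datetime(2012, 1, 1),
    datetime(2012, 4, 1),
    [[-73, 40], [-73, 30], [-63, 30], [-63, 40], [-73, 40]],
    pres_range=(0, 100),
)
print(example)
```

The resulting dictionary can be passed as the `params` argument of `requests.get`, in the same way as the hand-written dictionaries used in the rest of this notebook.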

As noted, `gridName`, `startDate`, `endDate` and `polygon` are all required for downloading gridded data, but we can get just the metadata record that describes some high-level information about each grid by providing only the `gridName` parameter:


In [96]:
params = {
  "gridName": 'rgTempTotal',
}

r = requests.get('https://argovis-api.colorado.edu/grids', params=params, headers={'x-argokey': API_KEY})
print(r.json())

{'_id': 'rgTempTotal', 'units': 'degree celcius (ITS-90)', 'levels': [2.5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 182.5, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 420, 440, 462.5, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1412.5, 1500, 1600, 1700, 1800, 1900, 1975], 'date_added': '2022-04-27T23:07:27.943Z', 'lonrange': [-179.5, 179.5], 'latrange': [-64.5, 79.5], 'timerange': ['2004-01-15T00:00:00.000Z', '2018-12-15T00:00:00.000Z'], 'loncell': 1, 'latcell': 1}


We see information about the Roemmich-Gilson Argo climatology, such as which pressure levels are present, the minimum and maximum latitude, longitude and time covered by the grid, and so forth.
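
The metadata record can also give a rough sense of the size of the grid. The sketch below is only an illustration: it assumes that `lonrange` and `latrange` hold the centers of the first and last cells and that cells are evenly spaced by `loncell` and `latcell` degrees, which is consistent with the values printed above but is not stated by the API. The `axis_length` helper is a hypothetical name.

```python
# Subset of the metadata record printed above.
meta = {
    "lonrange": [-179.5, 179.5],
    "latrange": [-64.5, 79.5],
    "loncell": 1,
    "latcell": 1,
}

def axis_length(axis_range, cell):
    # Assumption: axis_range holds the first and last cell centers,
    # and centers are evenly spaced by <cell> degrees.
    return round((axis_range[1] - axis_range[0]) / cell) + 1

n_lon = axis_length(meta["lonrange"], meta["loncell"])
n_lat = axis_length(meta["latrange"], meta["latcell"])
print(n_lon, n_lat)
```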

Let's try a simple request to download a piece of data from this climatology, a 10 degree box over the North Atlantic from the first quarter of 2012:

In [164]:
params = {
  "gridName": 'rgTempTotal',
  "startDate": '2012-01-01T00:00:00Z',
  "endDate": '2012-04-01T00:00:00Z',
  "polygon": '[[ -73, 40],[ -73, 30],[ -63, 30],[ -63, 40],[ -73, 40]]'
}

r = requests.get('https://argovis-api.colorado.edu/grids', params=params, headers={'x-argokey': API_KEY})
rgdata = r.json()

As with most Argovis API requests, the response is a list of documents matching your query. Let's have a look at the first record the API returned to us:

In [165]:
rgdata[0]

{'_id': 'rgTempTotal',
 'units': 'degree celcius (ITS-90)',
 'levels': [2.5,
  10,
  20,
  30,
  40,
  50,
  60,
  70,
  80,
  90,
  100,
  110,
  120,
  130,
  140,
  150,
  160,
  170,
  182.5,
  200,
  220,
  240,
  260,
  280,
  300,
  320,
  340,
  360,
  380,
  400,
  420,
  440,
  462.5,
  500,
  550,
  600,
  650,
  700,
  750,
  800,
  850,
  900,
  950,
  1000,
  1050,
  1100,
  1150,
  1200,
  1250,
  1300,
  1350,
  1412.5,
  1500,
  1600,
  1700,
  1800,
  1900,
  1975],
 'date_added': '2022-04-27T23:07:27.943Z',
 'lonrange': [-179.5, 179.5],
 'latrange': [-64.5, 79.5],
 'timerange': ['2004-01-15T00:00:00.000Z', '2018-12-15T00:00:00.000Z'],
 'loncell': 1,
 'latcell': 1}

The first object in a request to `/grids` is always the appropriate *metadata record* for the grid, the same as what you got from `/grids?gridName=rgTempTotal` above. We automatically include it at the front of the full data requests as it is necessary for interpreting that data, as we'll see immediately below.

All the records returned by `/grids` after the first specify the actual data we wanted; let's have a look at one of them:

In [166]:
rgdata[1]

{'_id': '62649121aa7c850607147ba4',
 'g': {'type': 'Point', 'coordinates': [-70.5, 30.5]},
 't': '2012-03-15T00:00:00.000Z',
 'd': [20.604,
  20.57,
  20.526999,
  20.497,
  20.458,
  20.417999,
  20.386999,
  20.335001,
  20.261,
  20.217999,
  20.158001,
  20.059999,
  19.962,
  19.766001,
  19.559999,
  19.317001,
  19.201,
  19.138,
  18.983999,
  18.778999,
  18.555,
  18.421,
  18.333,
  18.248001,
  18.166,
  18.087,
  18.014,
  17.941,
  17.854,
  17.773001,
  17.646,
  17.496,
  17.334999,
  16.996,
  16.354,
  15.572001,
  14.469,
  13.345,
  12.179,
  10.858,
  9.599,
  8.599,
  7.766,
  7.4,
  6.79,
  6.27,
  5.87,
  5.5,
  5.264,
  5.036,
  4.884,
  4.701,
  4.485,
  4.29,
  4.115,
  3.963,
  3.849,
  3.753]}

By default, Argovis returns gridded data in a *profile-like* structure: this record, located in space by its `g` key (for geolocation) and in time by its `t` key, contains the grid values at all pressure levels in its `d` (for data) key. To interpret the list of numbers, compare it entry by entry with the `levels` key in the metadata record; for example, the two records printed above indicate that the temperature at these coordinates at the 10 dbar level is 20.57 degrees Celsius.
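
The entry-by-entry comparison can be sketched in a few lines. The example below uses shortened copies of the two records printed above (only the first three levels and values are kept) so that it runs on its own; with a real response, `rgdata[0]` and `rgdata[1]` would take the place of `metadata` and `record`.

```python
# Shortened copies of the two records printed above:
# only the first three levels and values are kept.
metadata = {"levels": [2.5, 10, 20]}
record = {
    "g": {"type": "Point", "coordinates": [-70.5, 30.5]},
    "t": "2012-03-15T00:00:00.000Z",
    "d": [20.604, 20.57, 20.526999],
}

# Pair each pressure level with the value at the same list position.
by_level = dict(zip(metadata["levels"], record["d"]))
print(by_level[10])
```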

## Ingestion by xarray

xarray provides labeled, multidimensional arrays that will be familiar to many Python users; we can transform a raw API response into an xarray dataset with a helper similar to the following.

In [167]:
def xargrid(grid):
    # given the json response <grid> of a request to /grids,
    # return an xarray dataset indexed by latitude, longitude, time and pressure,
    # with the gridded values stored in the 'measurement' variable.
    
    lat = []
    lon = []
    time = []
    pres = []
    meas = []
    levels = grid[0]['levels']  # pressure levels from the metadata record
    for p in grid[1:]:
        # pair each value in the profile with its pressure level
        for level, value in zip(levels, p['d']):
            lon.append(p['g']['coordinates'][0])
            lat.append(p['g']['coordinates'][1])
            time.append(p['t'])
            meas.append(value)
            pres.append(level)
            
    df = pandas.DataFrame({"latitude": lat, 
                           "longitude": lon, 
                           "time": time, 
                           "pressure": pres, 
                           "measurement": meas}).set_index(["latitude","longitude","time","pressure"])
    return df.to_xarray()
    
ds = xargrid(rgdata)

Now we can do all the usual xarray operations; let's see what values our coordinate variables take:

In [168]:
print('latitudes:',ds['latitude'].data)
print('longitudes:',ds['longitude'].data)
print('times:',ds['time'].data)
print('pressures:',ds['pressure'].data)

latitudes: [30.5 31.5 32.5 33.5 34.5 35.5 36.5 37.5 38.5 39.5]
longitudes: [-72.5 -71.5 -70.5 -69.5 -68.5 -67.5 -66.5 -65.5 -64.5 -63.5]
times: ['2012-01-15T00:00:00.000Z' '2012-02-15T00:00:00.000Z'
 '2012-03-15T00:00:00.000Z']
pressures: [   2.5   10.    20.    30.    40.    50.    60.    70.    80.    90.
  100.   110.   120.   130.   140.   150.   160.   170.   182.5  200.
  220.   240.   260.   280.   300.   320.   340.   360.   380.   400.
  420.   440.   462.5  500.   550.   600.   650.   700.   750.   800.
  850.   900.   950.  1000.  1050.  1100.  1150.  1200.  1250.  1300.
 1350.  1412.5 1500.  1600.  1700.  1800.  1900.  1975. ]
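
Note that the time coordinate printed above holds strings rather than datetime objects. If date arithmetic is needed, the strings can be parsed first; the sketch below assumes the `YYYY-MM-DDTHH:MM:SS.fffZ` layout seen in this output, and `parse_argovis_time` is a hypothetical helper name. The rest of this notebook keeps using the original strings.

```python
from datetime import datetime, timezone

def parse_argovis_time(t):
    # Assumes the 'YYYY-MM-DDTHH:MM:SS.fffZ' layout seen in the output above;
    # the trailing Z is taken to mean UTC.
    return datetime.strptime(t, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

times = [
    "2012-01-15T00:00:00.000Z",
    "2012-02-15T00:00:00.000Z",
    "2012-03-15T00:00:00.000Z",
]
parsed = [parse_argovis_time(t) for t in times]

# Time between the first two grid time steps.
print(parsed[1] - parsed[0])
```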


We can easily select a slice of this dataset at constant pressure and time to produce a more conventional, map-like grid representation, and plot it with xarray's built-in plotting:

In [None]:
gridmap = ds.loc[{"time":'2012-01-15T00:00:00.000Z', "pressure":2.5}]
gridmap['measurement'].plot()

## Area-Weighted Means

A common operation when considering gridded data is to weight a mean by area of grid cells, which changes with latitude. A helper to do this with Argovis grid data could look like the following.

In [163]:
def amean(grid, cellsize):
    # given an xarray dataset <grid> for a given depth and time,
    # and the latitudinal size <cellsize> of a grid cell in degrees,
    # calculate the mean of the gridded data variable, weighted by grid cell area
    
    total = 0
    totalweight = 0

    for lon in grid['longitude'].data:
        for lat in grid['latitude'].data:
            meas = grid['measurement'].loc[{"latitude":lat, "longitude":lon}].data
            highlat = abs(lat)+cellsize/2
            if highlat > 90:
                highlat = 180 - highlat
            lowlat = abs(lat)-cellsize/2
            # cell area is proportional to the difference of the sines of its edge latitudes
            weight = math.sin(math.pi/180*highlat) - math.sin(math.pi/180*lowlat)
            total += meas*weight
            totalweight += weight
    return total / totalweight
        
amean(gridmap, rgdata[0]['latcell'])

15.298908060646562
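
To see why the weights in `amean` depend on latitude, recall that on a sphere the area of a band between two latitudes is proportional to the difference of the sines of those latitudes. The sketch below isolates that weight in a hypothetical `cell_weight` helper; it uses the signed latitude directly rather than `abs(lat)`, which gives the same result because the sine function is odd.

```python
import math

def cell_weight(lat, cellsize):
    # Relative area of a cell centered at latitude <lat> (degrees)
    # spanning <cellsize> degrees of latitude on a sphere.
    high = math.radians(lat + cellsize / 2)
    low = math.radians(lat - cellsize / 2)
    return math.sin(high) - math.sin(low)

# Compare a cell at 60 degrees north with one on the equator.
ratio = cell_weight(60, 1) / cell_weight(0, 1)
print(ratio)
```

The ratio comes out very close to 0.5, the cosine of 60 degrees: a cell at 60 degrees latitude covers about half the area of an equatorial cell of the same angular size, so it should count about half as much in the mean.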