## Run this notebook

You can launch this notebook in SMCE DaskHub by clicking the link below.

[Launch in SMCE DaskHub (requires access)](https://daskhub.veda.smce.nasa.gov/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FNASA-IMPACT%2Fveda-docs&urlpath=lab%2Ftree%2Fveda-docs%2Fexample-notebooks%2Ftimeseries-rioxarray-stackstac.ipynb&branch=main) 


<details><summary>Learn more</summary>
    
### Inside the Hub

This notebook was written on the VEDA SMCE DaskHub and is designed to run on a JupyterHub associated with an AWS IAM role that has been granted access to the VEDA data store through its bucket policy. The instance used had 16 GB of RAM.

Please request access by emailing aimee@developmentseed.org with your affiliation, your interest in or expected use of the dataset, and an AWS IAM role or user Amazon Resource Name (ARN).

### Outside the Hub

The data is in a protected bucket. Please request access by emailing aimee@developmentseed.org or alexandra@developmentseed.org with your affiliation, your interest in or expected use of the dataset, and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the Cognito client.

You should then run:

```
%run -i 'cognito_login.py'
```
    
</details>

## Approach

   1. Use `pystac_client` to open the STAC catalog and retrieve the items in the collection
   2. Use `stackstac` to create an `xarray.DataArray` containing all the items, then crop it to the area of interest (AOI)
   3. Calculate the mean for each timestep over the AOI

In [None]:
from pystac_client import Client
import stackstac
import rioxarray  # noqa

## Declare your collection of interest

You can discover available collections in the following ways:

* Programmatically: see example in the `list-collections.ipynb` notebook
* JSON API: https://staging-stac.delta-backend.com/collections
* STAC Browser: http://veda-staging-stac-browser.s3-website-us-west-2.amazonaws.com
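
As a minimal sketch of the JSON API route, the snippet below fetches the `/collections` endpoint using only the standard library and extracts the collection ids. It assumes the response follows the STAC API layout, with a top-level `collections` array whose entries each carry an `id`. The helper names `collection_ids` and `fetch_collection_ids` are illustrative and not part of any VEDA tooling, and the sample payload is hand-written rather than real API output.

```python
import json
from urllib.request import urlopen


def collection_ids(payload: dict) -> list[str]:
    """Return the ids found in a STAC API `/collections` response body."""
    return [c["id"] for c in payload.get("collections", [])]


def fetch_collection_ids(api_url: str) -> list[str]:
    """Fetch `<api_url>/collections` and return the ids (needs network access)."""
    url = api_url.rstrip("/") + "/collections"
    with urlopen(url) as response:
        return collection_ids(json.load(response))


# Offline illustration with a hand-written payload (not real API output).
sample = {
    "collections": [{"id": "no2-monthly"}, {"id": "example-collection"}],
    "links": [],
}
print(collection_ids(sample))
```

Calling `fetch_collection_ids("https://staging-stac.delta-backend.com/")` should return the ids of the available collections, provided the API is reachable from your environment.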

In [None]:
STAC_API_URL = "https://staging-stac.delta-backend.com/"
collection = "no2-monthly"

## Discover items in collection for region and time of interest

Use `pystac_client` to search the STAC collection for a particular area of interest within specified datetime bounds.

In [None]:
china_bbox = [
    73.675,
    18.198,
    135.026,
    53.459,
]
datetime = "2000-01-01/2022-01-02"
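
The bounding box is given as `[west, south, east, north]` in degrees of longitude and latitude, and the datetime is a `start/end` interval string. A small sanity check along the lines of the sketch below can catch swapped coordinates before a search is sent. `validate_bbox` and `date_interval` are hypothetical helpers written for this illustration, and the check assumes the box does not cross the antimeridian.

```python
from datetime import date


def validate_bbox(bbox: list[float]) -> list[float]:
    """Check a 2D [west, south, east, north] box given in degrees.

    Assumes the box does not cross the antimeridian, so west < east.
    """
    if len(bbox) != 4:
        raise ValueError("expected [west, south, east, north]")
    west, south, east, north = bbox
    if not (-180 <= west < east <= 180):
        raise ValueError(f"invalid longitudes: {west}, {east}")
    if not (-90 <= south < north <= 90):
        raise ValueError(f"invalid latitudes: {south}, {north}")
    return bbox


def date_interval(start: date, end: date) -> str:
    """Build a `start/end` interval string from two dates."""
    if end < start:
        raise ValueError("end precedes start")
    return f"{start.isoformat()}/{end.isoformat()}"


bbox = validate_bbox([73.675, 18.198, 135.026, 53.459])
interval = date_interval(date(2000, 1, 1), date(2022, 1, 2))
print(bbox, interval)
```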

In [None]:
catalog = Client.open(STAC_API_URL)

search = catalog.search(
    bbox=china_bbox,
    datetime=datetime,
    collections=[collection],
    limit=1000
)
items = list(search.items())
print(f"Found {len(items)} items")

## Read data

Create an `xarray.DataArray` using `stackstac`.

In [None]:
# Workaround so that GDAL reads use the AWS credentials from boto3;
# this is expected to move into stackstac itself
import rasterio as rio
import boto3

session = rio.session.AWSSession(boto3.Session())
gdal_env = stackstac.DEFAULT_GDAL_ENV.updated(always=dict(session=session))

In [None]:
da = stackstac.stack(items, gdal_env=gdal_env)
da = da.assign_coords({"time": da.start_datetime})
da

## Clip the data to the bounding box for China

In [None]:
# Subset to Bounding Box for China
subset = da.rio.clip_box(*china_bbox)
subset

## Select a band of data

There is just one band in this case, `cog_default`.

In [None]:
# Select the default band from the clipped subset
data_band = subset.sel(band='cog_default')

## Aggregate the data

Calculate the spatial mean at each time step. Note that this is the first point at which the data is actually loaded.

In [None]:
# Average over entire spatial bounding box for each month
means = data_band.mean(dim=('x', 'y')).compute()
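
To make the reduction concrete, here is a small sketch on synthetic data rather than the NO2 collection. It builds a toy `xarray.DataArray` with the same `time`, `y`, `x` layout and collapses both spatial dimensions. The values and the name `toy` are invented for illustration only.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the stacked data: 3 time steps on a 2 x 2 grid.
values = np.array(
    [
        [[1.0, 2.0], [3.0, 4.0]],
        [[2.0, 2.0], [2.0, 2.0]],
        [[0.0, np.nan], [4.0, 8.0]],
    ]
)
times = np.array(["2020-01-01", "2020-02-01", "2020-03-01"], dtype="datetime64[ns]")
toy = xr.DataArray(values, dims=("time", "y", "x"), coords={"time": times})

# Collapse both spatial dimensions; NaN cells are skipped by default.
toy_means = toy.mean(dim=("x", "y"))
print(toy_means.dims, toy_means.values)
```

The result keeps only the `time` dimension, with one value per time step. Every grid cell contributes equally, so this is an unweighted mean over cells.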

In [None]:
means.plot()