# Create Monthly NDVI composites using Harmonized Landsat Sentinel (HLS) data from the NASA CMR STAC API using `pystac_client` & `stackstac`

In this tutorial we will do the following tasks:

1. Use leafmap to select an Area of Interest (AOI) and save it as a geojson
2. Explore NASA CMR STAC API using `pystac_client` CLI & Python API
3. Understand & visualize the availability of Harmonized Landsat Sentinel (HLS) data for an AOI
4. Use `dask` and `stackstac` to pull data efficiently
5. Load the data as lazy `xarray` objects, filter by cloud cover, compute NDVI & create monthly composites
6. Visualize the evolution of beautiful crop circles over time

Before starting this tutorial, make sure you have signed up at https://urs.earthdata.nasa.gov/ & have a `.netrc` file configured. If not, please [run this script](https://git.earthdata.nasa.gov/projects/LPDUR/repos/daac_data_download_python/browse/EarthdataLoginSetup.py) before proceeding any further.
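As a quick sanity check before running the rest of the notebook, the sketch below looks for an Earthdata entry in a `.netrc` file using only the standard library. The helper name `has_earthdata_login` is hypothetical, and the check only confirms that a login and password are present, not that they are valid.

```python
import netrc
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"

def has_earthdata_login(netrc_path=None):
    """Return True if the netrc file holds a login and password for Earthdata."""
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        entry = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    except (netrc.NetrcParseError, OSError):
        return False
    # entry is a (login, account, password) tuple, or None when the host is missing
    return entry is not None and bool(entry[0]) and bool(entry[2])

print(has_earthdata_login())
```

If this prints `False`, the setup script linked above is the place to start.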

In [None]:
# Import all the packages we will be using in our workflow
import json
import warnings
import os
from pathlib import Path

import leafmap
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import rasterio as rio
import stackstac
import xarray as xr
from dask import distributed
from geojson_rewind import rewind
from osgeo import gdal
from pystac_client import Client
from IPython import display

warnings.filterwarnings('ignore')

### Set GDAL configuration to successfully access LP DAAC Cloud Assets

In [None]:
rio_env = rio.Env(GDAL_DISABLE_READDIR_ON_OPEN='TRUE',
                  GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'))
rio_env.__enter__()

Create a `Path` object pointing to `DATA` and look at what we have inside it. We have a default `aoi.geojson` file that points to somewhere near Egypt's Toshka Project. You can choose to use the default file or create your own AOI.

In [None]:
DATA = Path('data')
!ls {DATA}

# Select an AOI

We will use `leafmap` to select an AOI & save it as GeoJSON inside the `data/` folder. For reference on how to do this using leafmap please check the create_vector notebook: https://leafmap.org/notebooks/45_create_vector/

In [None]:
import leafmap.leafmap as leafmap

In [None]:
# `center` is (lat, lon); the Toshka area lies near 22.5°N, 31.5°E
m = leafmap.Map(center=(22.5, 31.5), zoom=8, 
                draw_control=True, measure_control=False, fullscreen_control=False, attribution_control=True)
m

We can check that our newly created AOI is saved inside the `data/` folder. Here I am saving it as `aoi.geojson`.

In [None]:
!ls {DATA}

In [None]:
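# Hack: Leaflet saves polygons in a clockwise manner in the FeatureCollection. We can fix this using `geojson_rewind`
with (DATA/"aoi.geojson").open("r") as f:
    aoi = json.load(f)
aoi = rewind(aoi)
# Write the rewound AOI back; the `with` block closes the file so the contents are flushed to disk
with (DATA/"aoi.geojson").open("w") as f:
    json.dump(aoi, f)
aoi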
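To see why the rewind matters, here is a minimal sketch of how ring orientation can be checked with the shoelace formula. RFC 7946 asks for exterior rings to be counterclockwise. The helper names and the sample ring are hypothetical; this is not how `geojson_rewind` is implemented, only an illustration of the idea.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        area += x1 * y2 - x2 * y1
    return area / 2.0

def is_counterclockwise(ring):
    return signed_area(ring) > 0

# Hypothetical closed ring of (lon, lat) pairs drawn clockwise
clockwise = [(31.0, 22.0), (31.0, 23.0), (32.0, 23.0), (32.0, 22.0), (31.0, 22.0)]
counterclockwise = clockwise[::-1]

print(is_counterclockwise(clockwise))         # False
print(is_counterclockwise(counterclockwise))  # True
```

The same check could be applied to `aoi['features'][0]['geometry']['coordinates'][0]`, assuming the AOI is a single polygon.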

# NASA CMR STAC

NASA's Common Metadata Repository (CMR) is a metadata catalog of NASA Earth Science data. STAC, or SpatioTemporal Asset Catalog, is a specification for describing geospatial data with JSON and GeoJSON. The related STAC-API specification defines an API for searching and browsing STAC catalogs.

To know more about STAC & ARCO data formats please visit https://stacindex.org/ and https://pangeo-forge.readthedocs.io/en/latest/
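For orientation, a STAC Item is a GeoJSON Feature with a few extra fields. The sketch below is a hand-written, hypothetical item: the `id`, `collection`, `href` and property values are placeholders and do not come from the CMR catalog.

```python
import json

# Hypothetical STAC Item; every value below is a placeholder for illustration
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[31.0, 22.0], [32.0, 22.0], [32.0, 23.0], [31.0, 23.0], [31.0, 22.0]]],
    },
    "bbox": [31.0, 22.0, 32.0, 23.0],
    "properties": {"datetime": "2021-01-15T08:30:00Z", "eo:cloud_cover": 3},
    "assets": {
        "B04": {"href": "https://example.com/B04.tif", "type": "image/tiff; application=geotiff"}
    },
    "links": [],
}

# The spatial part is plain GeoJSON; the temporal part lives in properties
print(item["geometry"]["type"], item["properties"]["datetime"])
print(json.dumps(sorted(item["assets"])))
```

The `assets` entries are what later steps read pixel data from, while `geometry` and `properties` are what a search filters on.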

In [None]:
# NASA CMR STAC URL
CMR_STAC_URL = "https://cmr.earthdata.nasa.gov/stac"
providers = Client.open(CMR_STAC_URL)

### Sub Catalog

The NASA CMR STAC base catalog has a list of sub-catalogs such as NOAA, JAXA, LPCLOUD, etc. Let us list them here for reference.

In [None]:
for provider in providers.get_children():
    print(provider.title)

### We will use the Harmonized Landsat Sentinel (HLS) data available under the `LPCLOUD` Sub-Catalog

HLS combines input data from the joint NASA/USGS Landsat 8 and the ESA (European Space Agency) Sentinel-2A and Sentinel-2B satellites to generate a harmonized, analysis-ready surface reflectance data product with observations every two to three days.

# Access NASA CMR STAC LPCLOUD API using `pystac_client`

We will look at using both the python-client & CLI of `pystac_client` to access STAC data. 

## 1. `pystac_client` CLI

Here we are searching for tiles in the collections "HLSS30.v2.0" & "HLSL30.v2.0" that intersect our defined AOI. We set the date range from Jan 2019 to Jan 2022 & save the results inside `data/aoi-catalog.json`.

In [None]:
!stac-client search 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD' \
    --collection HLSS30.v2.0 HLSL30.v2.0 \
    --intersects {DATA/'aoi.geojson'} \
    --datetime 2019-01-01/2022-01-31 > data/aoi-catalog.json

Now, we can look manually inside the raw json file which has details about each tile like properties, bounds, assets etc.

We can also use [`stacterm`](https://github.com/stac-utils/stac-terminal) which is a library for displaying information (tables, calendars, plots, histograms) about STAC Items in the terminal to get a sense of the data coverage in our AOI.

### List down the days on which we have HLS data available 

In [None]:
!cat data/aoi-catalog.json | stacterm cal

> Only the HLSL30 (Landsat) collection has data until Aug 2020; the HLSS30 (Sentinel) collection starts appearing from Sept 2020. We can also represent the same data in a tabular manner by applying filter & sort functions.

In [None]:
!cat data/aoi-catalog.json | stacterm table \
    --fields collection date eo:cloud_cover \
    --sort eo:cloud_cover | head -20
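The calendar and table views can also be approximated in plain Python. The sketch below tallies items per collection and month, assuming the saved file is a GeoJSON FeatureCollection whose features carry a top-level `collection` field and a `datetime` property. The function name and the sample values are hypothetical.

```python
import json
from collections import Counter

def count_by_collection_and_month(feature_collection):
    """Tally STAC items per (collection, 'YYYY-MM') pair."""
    counts = Counter()
    for feature in feature_collection.get("features", []):
        props = feature.get("properties", {})
        # Fall back to start_datetime for items that carry a date range instead
        stamp = props.get("datetime") or props.get("start_datetime") or "unknown"
        counts[(feature.get("collection", "unknown"), stamp[:7])] += 1
    return counts

# Tiny hand-made stand-in for the search output (placeholder values)
sample = {
    "type": "FeatureCollection",
    "features": [
        {"collection": "HLSL30.v2.0", "properties": {"datetime": "2019-01-05T08:20:00Z"}},
        {"collection": "HLSL30.v2.0", "properties": {"datetime": "2019-01-21T08:20:00Z"}},
        {"collection": "HLSS30.v2.0", "properties": {"datetime": "2020-09-02T08:40:00Z"}},
    ],
}
for key, n in sorted(count_by_collection_and_month(sample).items()):
    print(key, n)

# With the real file from the CLI step (assumed to be a FeatureCollection):
# with open("data/aoi-catalog.json") as f:
#     counts = count_by_collection_and_month(json.load(f))
```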

## 2. `pystac_client` Python API

Connect to the LPCLOUD CMR catalog & look at the list of available collections.

In [None]:
catalog = Client.open(f'{CMR_STAC_URL}/LPCLOUD')

collections = catalog.get_children()
for collection in collections:
    print(collection.id, collection.title)

> We will be using the HLSL30.v2.0 & HLSS30.v2.0 collections in this notebook

### Query the STAC catalog to access HLS data separately for Sentinel & Landsat data

We can also pass in a `query` or `filter` parameter to the `search` function to further filter down the item search results. STAC catalog search API gives us options to filter results based on spatio-temporal factors.  

In [None]:
s30 = catalog.search(
        collections=['HLSS30.v2.0'],
        intersects=aoi['features'][0]['geometry'],
        datetime='2019-01-01/2022-01-31',
)
l30 = catalog.search(
        collections=['HLSL30.v2.0'],
        intersects=aoi['features'][0]['geometry'],
        datetime='2019-01-01/2022-01-31',
)

In [None]:
s30.matched(), l30.matched()

> 387 tiles of HLSS30 collection available for our AOI between Jan-2019 - Jan-2022  
> 140 tiles of HLSL30 collection available for our AOI between Jan-2019 - Jan-2022  

Read the filtered results & call `to_dict()` on them to get each result set as a GeoJSON-style `FeatureCollection` dictionary.

In [None]:
s30_tiles, l30_tiles = s30.get_all_items(), l30.get_all_items()
s30_tiles_json, l30_tiles_json = s30_tiles.to_dict(), l30_tiles.to_dict()

In [None]:
display.JSON(s30_tiles_json)

In [None]:
display.JSON(l30_tiles_json)
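Since these are plain dictionaries, they can be pre-filtered with ordinary Python before any pixels are read. The sketch below keeps only features under a cloud-cover threshold, assuming each feature has an `eo:cloud_cover` property as shown in the table earlier. The function name and sample values are hypothetical.

```python
def filter_by_cloud_cover(feature_collection, max_cloud_cover=10):
    """Return a copy of the FeatureCollection with only low-cloud features."""
    kept = []
    for feature in feature_collection["features"]:
        cloud = feature["properties"].get("eo:cloud_cover")
        # Skip items with no cloud-cover value rather than guessing
        if cloud is not None and cloud < max_cloud_cover:
            kept.append(feature)
    return {**feature_collection, "features": kept}

# Hand-made stand-in for `s30_tiles_json` (placeholder values)
sample = {
    "type": "FeatureCollection",
    "features": [
        {"id": "a", "properties": {"eo:cloud_cover": 2}},
        {"id": "b", "properties": {"eo:cloud_cover": 55}},
        {"id": "c", "properties": {}},
    ],
}
clear = filter_by_cloud_cover(sample, max_cloud_cover=10)
print([f["id"] for f in clear["features"]])  # ['a']
```

The notebook applies the same threshold later on the stacked array instead, which avoids a second pass over the metadata.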

### Plot the tile boundaries of HLSS30 & HLSL30 collection & our AOI

Convert the `geojsons` into `GeoDataFrames` to visualize inside Leafmap. Use the layers icon to filter different tiles & hover over to look at the properties.

In [None]:
s30_tiles_gdf = gpd.GeoDataFrame.from_features(s30_tiles_json, crs="EPSG:4326")
l30_tiles_gdf = gpd.GeoDataFrame.from_features(l30_tiles_json, crs="EPSG:4326")
aoi_gdf = gpd.GeoDataFrame.from_features(aoi["features"], crs="EPSG:4326")

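# Center the map on the AOI; leafmap expects (lat, lon)
minx, miny, maxx, maxy = aoi_gdf.total_bounds
m = leafmap.Map(center=((miny + maxy) / 2, (minx + maxx) / 2), zoom=9)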
m.add_gdf(s30_tiles_gdf, layer_name="Sentinel Tiles", fill_colors=["red"])
m.add_gdf(l30_tiles_gdf, layer_name="Landsat Tiles", fill_colors=["blue"])
m.add_gdf(aoi_gdf, layer_name="AOI", fill_colors=["black"], zoom_to_layer=False)
m

# Story so far

We used `pystac_client` to query all the tiles from the HLSS30 & HLSL30 collections within our defined spatial & temporal extent.

We want to create monthly NDVI composites over our AOI. To do this we have to
- Go over each tile in the collection result from previous steps
- Download the required assets, i.e. the NIR & Red bands
- Clip the tiles to the AOI
- Compute NDVI & create a monthly composite

**OR**

We can take advantage of the STAC & COG structures in the NASA CMR API and use `stackstac` along with `dask` to compute monthly NDVI composites over the AOI efficiently, without downloading & processing all the tiles.

# Stackstac

`stackstac` converts STAC collections into lazy `xarray` objects. It reads STAC metadata into xarray coordinates, which helps in indexing, filtering and computing aggregations over the dataset.

`stackstac` can also use `dask` to perform the computations in parallel.

For this example, we will create a local `dask` cluster to perform our `stackstac` operations. You can easily replace this with a cluster in the cloud to speed up the operations.

`Dask` has a nice UI that lets you visualize each step in the process and also provides information on compute & data usage. Visit the dashboard link to see more details.

In [None]:
cluster = distributed.LocalCluster()
client = distributed.Client(cluster)
client.dashboard_link

In [None]:
# Configure GDAL options to access COGs from Earthdata system
dist_env = stackstac.DEFAULT_GDAL_ENV.updated(dict(
    GDAL_DISABLE_READDIR_ON_OPEN='TRUE',
    GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
    GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'))
)

# Monthly NDVI composites 

We are trying to create monthly NDVI composites from Jan 2019 to Jan 2022 for the defined AOI, in this case the Toshka Project, Egypt. We will need the `NIR` & `Red` bands to compute NDVI from Sentinel & Landsat HLS imagery.

**Sentinel 2**:
 - "narrow" NIR = B8A  
 - Red = B04  

**Landsat 8**:
 - NIR = B05  
 - Red = B04  

`stackstac` can convert the STAC items into lazy xarray objects. We can then use them to filter by cloud cover, clip to our defined AOI, compute monthly composites, etc.

Here, we define the `bbox` of our AOI, `bands` (NIR, Red) we need & the resolution of imagery to access.

In [None]:
bbox = tuple(map(float, aoi_gdf.bounds.values[0]))

s30_stack = stackstac.stack(
    s30_tiles,
    assets=['B8A', 'B04'],
    bounds_latlon=bbox,
    resolution=30,
    epsg=32636,
    gdal_env=dist_env
)
l30_stack = stackstac.stack(
    l30_tiles,
    assets=['B05', 'B04'],
    bounds_latlon=bbox,
    resolution=30,
    epsg=32636,
    gdal_env=dist_env
)

> Great, that's all there is to `stackstac`. Now we have a lazy xarray & can perform all the operations on top of it. Note: All the operations are performed on the metadata & the actual computation happens only when you call the `persist()` or `compute()` method on the lazy xarray object.
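The `epsg=32636` passed above is the UTM zone 36N projection. A minimal sketch of how such a code can be derived from a longitude and latitude follows. The helper name is hypothetical, the sample coordinates (about 31.5°E, 22.5°N) are an assumed AOI centre, and the special-case zones around Norway and Svalbard are ignored.

```python
import math

def utm_epsg(lon, lat):
    """Return the WGS 84 / UTM EPSG code for a lon/lat pair (standard 6-degree zones only)."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(zone, 60)  # lon == 180 would otherwise give zone 61
    # 326xx codes are northern-hemisphere zones, 327xx are southern
    base = 32600 if lat >= 0 else 32700
    return base + zone

print(utm_epsg(31.5, 22.5))  # 32636
```

If you draw your own AOI elsewhere, a value computed this way could replace the hard-coded `epsg` in both `stackstac.stack` calls.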

#### Fix the band mismatch in Sentinel & Landsat data

In [None]:
s30_stack.coords['band'] = ['nir', 'red']
l30_stack.coords['band'] = ['nir', 'red']
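The assignment above relies on the assets being listed NIR first, Red second in both `stackstac.stack` calls. A sketch of a lookup that does not depend on order is shown below. The `BAND_NAMES` table and the helper are hypothetical, built from the band lists given earlier in this section.

```python
# Asset name -> common name, per collection (taken from the band lists above)
BAND_NAMES = {
    "HLSS30.v2.0": {"B8A": "nir", "B04": "red"},
    "HLSL30.v2.0": {"B05": "nir", "B04": "red"},
}

def common_band_names(collection, assets):
    """Translate a collection's asset names into shared 'nir' / 'red' labels."""
    mapping = BAND_NAMES[collection]
    return [mapping[asset] for asset in assets]

print(common_band_names("HLSS30.v2.0", ["B8A", "B04"]))  # ['nir', 'red']
print(common_band_names("HLSL30.v2.0", ["B04", "B05"]))  # ['red', 'nir']
```

The second call shows the point: if the asset order changes, the labels follow the assets instead of silently swapping NIR and Red.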

#### Combine both into a single stack

In [None]:
stack = xr.concat((s30_stack, l30_stack), dim='time').sortby("time")
stack.data

#### Filter by cloud cover score

In [None]:
cloudless = stack[stack['eo:cloud_cover'] < 10]

#### Compute NDVI for the AOI

In [None]:
nir, red = cloudless.sel(band='nir'), cloudless.sel(band='red')
ndvi = (nir - red)/((nir + red) + 1e-10)
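NDVI is the normalized difference (NIR - Red) / (NIR + Red), so it ranges from -1 to 1. The small `1e-10` term guards against division by zero where both bands are 0. Here is a small worked example with NumPy on made-up reflectance values, using a hypothetical `ndvi` helper that mirrors the expression above.

```python
import numpy as np

def ndvi(nir, red, eps=1e-10):
    """Normalized difference of NIR and Red, with eps to avoid 0/0."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    return (nir - red) / ((nir + red) + eps)

# Made-up reflectances: vegetated pixel, bare pixel, empty pixel
nir = np.array([0.50, 0.30, 0.00])
red = np.array([0.10, 0.30, 0.00])
print(np.round(ndvi(nir, red), 3))
```

The three pixels come out at roughly 0.667, 0 and 0: dense vegetation reflects much more NIR than Red, while the empty pixel yields 0 instead of a NaN thanks to the epsilon.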

#### Monthly composite with median

In [None]:
ndvi_monthly = ndvi.resample(time='M').median(dim='time')
ndvi_monthly.data
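To make the resampling step concrete, here is a rough NumPy analogue for a single pixel: observations are grouped by calendar month and reduced with a median, which is less sensitive than a mean to an occasional outlier such as a missed cloud. The function name and sample values are hypothetical. `xarray` labels each bin by month end, whereas this sketch uses `YYYY-MM` strings.

```python
import numpy as np

def monthly_median(times, values):
    """Group one pixel's time series by calendar month and take the median."""
    times = np.asarray(times, dtype="datetime64[D]")
    values = np.asarray(values, dtype="float64")
    months = times.astype("datetime64[M]")  # truncate each date to its month
    out = {}
    for month in np.unique(months):
        # nanmedian ignores missing observations within the month
        out[str(month)] = float(np.nanmedian(values[months == month]))
    return out

times = ["2021-01-03", "2021-01-13", "2021-01-23", "2021-02-02", "2021-02-12"]
values = [0.20, 0.90, 0.25, 0.40, 0.50]  # 0.90 plays the role of an outlier
print(monthly_median(times, values))
```

January comes out at 0.25, not the 0.45 a mean would give, so the single outlier does not dominate the composite.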

#### Do the actual computation

With `stackstac` we are not pulling all the tiles onto our machine (which would have been several GBs), but just a subset of them, i.e. our AOI. You can monitor the progress using the dask UI.

In [None]:
data = ndvi_monthly.compute()

#### Save the computed DataArray in NetCDF format

In [None]:
data.to_netcdf('data/egypt-thoska-ndvi-egypt.nc')

#### Visualize the NDVI composites

In [None]:
fig, axes = plt.subplots(nrows=3, ncols=12, figsize=(25,25))

for idx, ax in enumerate(axes.flatten()):
    datum = data.isel(time=idx)
    ax.imshow(datum, vmin=-1, vmax=1, cmap='RdYlGn')
    ax.set_title(datum.time.dt.strftime("%b-%Y").values)
    ax.set_axis_off()

# Adjust the spacing once, after all panels are drawn
plt.subplots_adjust(hspace=0.1, wspace=0.1, bottom=0.2, top=0.45)

> Ahh finally! We can see the beautiful crop circles evolving over time as a result of center-pivot irrigation.
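One practical note on the plotting grid: 3 rows of 12 give 36 panels, while Jan 2019 to Jan 2022 inclusive spans 37 months, and the number of time steps in `data` may differ again depending on which months have cloud-free scenes. The sketch below shows how a grid shape could be sized from the number of panels; both helper names are hypothetical.

```python
import math

def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of calendar months from start to end, counting both ends."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1

def grid_shape(n_panels, ncols=12):
    """Rows and columns needed to fit n_panels with a fixed column count."""
    return math.ceil(n_panels / ncols), ncols

n = months_inclusive(2019, 1, 2022, 1)
print(n, grid_shape(n))  # 37 (4, 12)
```

In the notebook, `n_panels` would be `data.sizes['time']`, and the loop would need to stop at that count so that `isel` is never asked for a missing time step.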