# Exercise - Terschelling NDVI using CoCliCo & MPC STAC

In this exercise we use both the Microsoft Planetary Computer (MPC) STAC and the Deltares CoCliCo STAC to look at vegetation change on the island of Terschelling. We load the Coastal Mask from the CoCliCo STAC and overlay it on Sentinel-2 derived NDVI images from the MPC STAC. We then make interactive plots in space and time and compute some statistics.

### Add project directory to Python path

This code defines two functions and retrieves the project directory path, which is useful when we want to import generic functions from the project. Retrieving the project directory path like this works in both an IPython and a plain Python environment.

- `is_interactive()`: Checks if the code is running in an interactive environment.
- `get_proj_dir()`: Determines the project directory path based on the execution context. If running interactively, it takes the parent of the Jupyter kernel's current working directory. Otherwise, it takes the directory two levels above the Python file. The function returns the project directory as a `pathlib.Path` object.

In [None]:
import os
import pathlib
import sys


def is_interactive() -> bool:
    """
    Check if the Python code is running in an interactive environment.
    """
    import __main__ as main

    return not hasattr(main, "__file__")


def get_proj_dir() -> pathlib.Path:
    """
    Get the project directory path.

    Returns:
        A `pathlib.Path` object representing the project directory path.
    """
    if is_interactive():
        print("Inferring project directory from the Jupyter kernel.")
        cwd = pathlib.Path().resolve()
        proj_dir = cwd.parent
    else:
        print("Inferring project directory from the Python file.")
        cwd = pathlib.Path(__file__)
        proj_dir = cwd.parent.parent

    return proj_dir


proj_dir: pathlib.Path = get_proj_dir()
sys.path.append(str(proj_dir / "src"))

### Import libraries

In [None]:
import colorcet as cc
import dask
import geopandas as gpd
import hvplot.pandas  # noqa
import hvplot.xarray  # noqa
import pandas as pd
import panel as pn
import planetary_computer
import pystac_client
import stackstac
import xarray as xr

# from azure.storage.blob import BlobServiceClient
from dask.distributed import Client
from ipyleaflet import Map, basemaps

### Load the Region of Interest (RoI) - Terschelling

In [None]:
m = Map(basemap=basemaps.Esri.WorldImagery, scroll_wheel_zoom=True)
m.center = 53.4, 5.35  # Terschelling
m.zoom = 12
m.layout.height = "800px"
m

Extract the coordinates from the interactive map. IMPORTANT: wait about 2 seconds until the map is rendered, otherwise the coordinates cannot be extracted.

In [None]:
from coastmonitor.geo.geometries import bbox_to_geometry, geo_bbox, geometry_to_bbox

bbox = [m.west, m.south, m.east, m.north]
bbox_geom = bbox_to_geometry(bbox)
roi = geo_bbox(*bbox, src_crs=4326, dst_crs=4326)
roi.explore()
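
The `intersects` argument of a STAC search, used later in this exercise, accepts a GeoJSON-like geometry. The `coastmonitor` helpers are not shown here, so the following is only a minimal sketch of how a `[west, south, east, north]` bounding box can be turned into a GeoJSON polygon. The function name `bbox_to_geojson_polygon` and the example coordinates are hypothetical and are not taken from `coastmonitor`.

```python
def bbox_to_geojson_polygon(west, south, east, north):
    """Return a GeoJSON-like Polygon dict for a lon/lat bounding box."""
    # A GeoJSON linear ring is closed: the last position repeats the first.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical bounding box roughly around Terschelling (EPSG:4326)
example_bbox = [5.15, 53.35, 5.55, 53.45]
example_geom = bbox_to_geojson_polygon(*example_bbox)
print(example_geom)
```

In the notebook itself the bounding box comes from the map (`m.west`, `m.south`, `m.east`, `m.north`), so the values depend on the current map view.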

### Load the CoCliCo STAC catalog (developed by Deltares within the [CoCliCo project](https://coclicoservices.eu/))

See the STAC here: [CoCliCo STAC](https://radiantearth.github.io/stac-browser/#/external/storage.googleapis.com/dgds-data-public/coclico/coclico-stac/catalog.json?.language=en). Note that this STAC is still in development, i.e. there are some duplicate links and it doesn't look as polished as the [MPC STAC](https://radiantearth.github.io/stac-browser/#/external/planetarycomputer.microsoft.com/api/stac/v1?.language=en). Yet, it already contains some nice datasets.

Load the CoCliCo STAC

In [None]:
catalog = pystac_client.Client.open(
    "https://storage.googleapis.com/dgds-data-public/coclico/coclico-stac/catalog.json"
)

Print all the datasets in the STAC

In [None]:
list(
    catalog.get_children()
)  # list all the STAC Collections (i.e. datasets in the CoCliCo STAC)

### Open the Coastal Mask dataset and filter on the RoI

Open the Coastal Mask dataset

In [None]:
cm_collection = catalog.get_collection("cm")
cm_items = list(cm_collection.get_all_items())
# cm_items[0]

Explore an item

In [None]:
cm_items[0]

Check all the Coastal Mask bounding boxes on a map. Tip: `bbox` is a property of each item.

In [None]:
cm_bboxes = pd.concat([geo_bbox(*i.to_dict()["bbox"]) for i in cm_items])
cm_bboxes = cm_bboxes.reset_index(drop=True)
cm_bboxes.explore()

Spatial join the Coastal Mask boxes on the RoI

In [None]:
cm_bboxes_roi = gpd.sjoin(cm_bboxes, roi)[cm_bboxes.columns]
cm_bboxes_roi.explore()
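
By default, `gpd.sjoin` performs an inner join with the `intersects` predicate and keeps the index of the left frame, which is why `cm_bboxes_roi.index` can be used afterwards to pick the matching STAC items. For axis-aligned boxes the intersection test reduces to a few comparisons. The sketch below illustrates this with hypothetical tile boxes and a hypothetical RoI; it is not the GeoPandas implementation.

```python
def bboxes_intersect(a, b):
    """True if boxes a and b (west, south, east, north) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical RoI and tile bounding boxes in lon/lat
roi_bbox = (5.15, 53.35, 5.55, 53.45)
tile_bboxes = [
    (5.0, 53.0, 6.0, 54.0),  # contains the RoI
    (6.0, 53.0, 7.0, 54.0),  # east of the RoI
    (4.0, 52.0, 5.0, 53.0),  # south-west of the RoI
]

# Keep the positions of the tiles that intersect the RoI
selected = [i for i, tile in enumerate(tile_bboxes) if bboxes_intersect(tile, roi_bbox)]
print(selected)
```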

Obtain all Coastal Mask STAC hrefs in the remaining RoI items

In [None]:
# obtain STAC items that cover the ROI
items_roi = [cm_items[i] for i in cm_bboxes_roi.index]
cm_hrefs = [i.assets["cm"].href for i in items_roi]
# cm_hrefs

### Local Dask cluster

Here we launch a local Dask cluster. Dask is a Python library for parallel computing, which will speed up the computation. The cluster we make here is local; when you want to scale up your computations you should use a Dask Gateway, hosted on a remote server close to the data.

In [None]:
# when running locally (parallel)
client = Client(threads_per_worker=1, processes=True, local_directory="/tmp")
client

# asking for plots (.plot()) or numerical values (.compute()) will trigger the computation, which you can see in the dask dashboard

### Read the Coastal Mask data in the RoI 

Read the Coastal Mask lazily using Xarray with a rasterio engine

In [None]:
%%time
@dask.delayed
def lazy_open(href):
    chunks = dict(band=1, x=512, y=512)
    return xr.open_dataset(href, chunks=chunks, engine="rasterio")


das = dask.compute(
    *[lazy_open(href) for href in cm_hrefs]
)  # here we start the computation
print(f"len das: {len(das)}")
das[0]

Combine the coordinates in the Xarray Coastal Mask dataset

In [None]:
%%time
cm = xr.combine_by_coords(das).compute()

Plot the Coastal Mask using HoloViz's hvplot

In [None]:
%%time
cm.squeeze("band").hvplot(
    rasterize=True, x="x", y="y", aspect="equal", tiles="EsriImagery"
)

### Load the Microsoft Planetary Computer (MPC) STAC catalog 

Have a look at their [documentation](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/) for examples

Load the MPC STAC

In [None]:
catalog2 = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

Print all the datasets in the STAC

In [None]:
list(
    catalog2.get_children()
)  # list all the STAC Collections (i.e. datasets hosted by planetary computer)

### Open the S2 dataset and filter on a cloud cover, datetime and the RoI

Search the S2 dataset and print the number of items found

In [None]:
search = catalog2.search(
    collections=[
        "sentinel-2-l2a"
    ],  # atmospherically corrected Surface Reflectances (SR)
    intersects=bbox_geom,
    datetime="2022-07-01/2023-06-22",  # "2020-01-01/2020-01-31",
    query={"eo:cloud_cover": {"lt": 50}},
)

items = search.item_collection()
print(f"{len(items)} items found in catalog search.")

List all bands within the S2 collection, together with their descriptions

In [None]:
s2 = catalog2.get_collection("sentinel-2-l2a")
pd.DataFrame(s2.summaries.get_list("eo:bands"))

Merge all items into an `xr.DataArray`

In [None]:
# stackstac contains many more arguments to filter the data (on the bounding box, a certain number of bands and by sorting the dates for instance)
# merge items in dataset exactly matching the bounding box
BAND = {
    "B02": "blue",
    "B03": "green",
    "B04": "red",
    "B08": "nir",
    "B11": "swir1",
    "SCL": "SCL",
}


stack = stackstac.stack(
    items,
    epsg=cm.rio.crs.to_epsg(),
    assets=list(BAND.keys()),
    bounds_latlon=bbox,
    sortby_date="desc",  # sort by date
)
stack

### Compute the NDVI and make an interactive plot

Compute the NDVI (red & nir band)

In [None]:
# select only the bands needed for the NDVI
red = stack.sel({"band": "B04"})
nir = stack.sel({"band": "B08"})
ndvi = (nir - red) / (nir + red)  # this is still a lazy Dask computation
ndvi
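
As a worked example of the formula above, NDVI = (NIR - red) / (NIR + red), the sketch below evaluates it for a few pixels. The reflectance values and the helper name `ndvi_value` are hypothetical and only illustrate the arithmetic; NDVI always falls between -1 and 1.

```python
import math


def ndvi_value(nir, red):
    """Normalized difference of the NIR and red reflectance of one pixel."""
    denom = nir + red
    if denom == 0:
        return math.nan  # undefined when both bands are zero
    return (nir - red) / denom


# Hypothetical (nir, red) reflectances, for illustration only
pixels = {
    "dense vegetation": (0.50, 0.05),
    "bare sand": (0.30, 0.25),
    "water": (0.02, 0.04),
}

for name, (nir_px, red_px) in pixels.items():
    print(f"{name}: {ndvi_value(nir_px, red_px):.2f}")
```

Vegetation reflects strongly in the near infrared and absorbs red light, so it ends up close to 1, whereas water typically gives negative values.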

Make an interactive plot for the third timestep (index 2) in the NDVI DataArray

In [None]:
ndvi.isel({"time": 2}).hvplot(x="x", y="y", cmap=cc.CET_D9[::-1], data_aspect=1)

### Combine the NDVI from MPC and Coastal Mask from CoCliCo

Both the NDVI (MPC) and the Coastal Mask (CoCliCo) are rasters, so in principle they can be matched directly. However, they need to share the same coordinates (raster grid) to overlay properly. Use `reindex` to put the Coastal Mask layer on the NDVI grid, with a tolerance of 0.001 (in coordinate units) and a fill value of 0.

In [None]:
cmr = cm.reindex(x=ndvi.x, y=ndvi.y, method="nearest", tolerance=0.001, fill_value=0)
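
To make the behaviour of `method="nearest"` with a `tolerance` and a `fill_value` concrete, here is a minimal one-dimensional sketch in plain Python. It mimics the idea only (take the nearest source coordinate, but only if it lies within the tolerance) and is not how xarray implements it; the function name `reindex_nearest` and the coordinates are hypothetical.

```python
from bisect import bisect_left


def reindex_nearest(src_coords, src_values, dst_coords, tolerance, fill_value=0):
    """Nearest-neighbour lookup on sorted src_coords, limited by a tolerance."""
    out = []
    for target in dst_coords:
        pos = bisect_left(src_coords, target)
        # Only the neighbours left and right of the insertion point can be nearest
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(src_coords)]
        best = min(candidates, key=lambda i: abs(src_coords[i] - target))
        if abs(src_coords[best] - target) <= tolerance:
            out.append(src_values[best])
        else:
            out.append(fill_value)  # no source coordinate close enough
    return out


src_x = [0.0, 10.0, 20.0, 30.0]     # hypothetical mask grid
src_mask = [1, 1, 1, 1]             # mask values on that grid
dst_x = [10.0004, 20.0, 25.0, 40.0]  # hypothetical NDVI grid

print(reindex_nearest(src_x, src_mask, dst_x, tolerance=0.001))
```

Target coordinates that differ from a source coordinate by less than the tolerance keep the mask value, while coordinates that fall between or outside the source grid receive the fill value.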

Make a plot of the newly constructed Coastal Mask and check your reindexing. The Coastal Mask should now have the same RoI as the NDVI layer.

In [None]:
cmr.squeeze("band").hvplot(
    rasterize=True, x="x", y="y", aspect="equal", tiles="EsriImagery"
)

Mask the NDVI layer with the reindexed Coastal Mask layer to get rid of the oceanic information. Plot the third timestep (index 2) of the masked NDVI data

In [None]:
# mask to match coastal mask land area
ndvi_masked = ndvi.where(cmr.squeeze("band") == True)

In [None]:
ndvi_masked.isel({"time": 2}).hvplot(x="x", y="y", cmap=cc.CET_D9[::-1], data_aspect=1)
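
The effect of `where` can be illustrated on a single row of pixels: values are kept where the condition holds and replaced by NaN elsewhere, and a later mean over the row skips those NaN values (as xarray does by default for floating-point data). The numbers below are hypothetical, and the sketch assumes that a mask value of 1 marks the pixels to keep.

```python
import math

# Hypothetical NDVI values and mask values for one row of four pixels
ndvi_row = [0.71, 0.12, -0.30, 0.45]
mask_row = [1, 1, 0, 0]  # assumption: 1 = keep, 0 = discard

# Keep values where the mask equals 1, otherwise set NaN
masked_row = [v if m == 1 else math.nan for v, m in zip(ndvi_row, mask_row)]

# Mean that ignores NaN
valid = [v for v in masked_row if not math.isnan(v)]
masked_mean = sum(valid) / len(valid)
print(masked_row, masked_mean)
```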

### Make a dashboard from the data using Panel, in which you can slide through the time steps

In [None]:
ndvi_masked["time"] = pd.DatetimeIndex(ndvi_masked["time"]).strftime(
    "%Y-%m-%dT%H:%M:%S"
)
time_options = ndvi_masked["time"].values.tolist()
time_slider = pn.widgets.DiscreteSlider(name="Time", options=time_options)


@pn.depends(time_slider.param.value)
def plot_ndvi(time, **kwargs):
    plot = ndvi_masked.sel({"time": time}).hvplot(
        x="x", y="y", cmap=cc.CET_D9[::-1], data_aspect=1
    )
    return plot.opts(title=f"NDVI of {time}")

In [None]:
pn.extension()
title_bar = pn.Row(
    pn.pane.Markdown(
        "## Interactive NDVI dashboard",
        styles={"color": "black"},
        width=800,
        sizing_mode="fixed",
        margin=(10, 5, 10, 15),
    ),
    pn.Spacer(),
)
eo_panel = pn.Column(title_bar, pn.Row(time_slider), pn.Row(plot_ndvi))

In [None]:
eo_panel

### Make a timeseries plot from the data using HvPlot

Make a timeseries plot of the per-image mean NDVI value within the masked NDVI layer

In [None]:
# make the plot: one mean NDVI value per image, averaged over the masked area

df = ndvi_masked.mean(dim=["x", "y"]).to_dataframe().reset_index()  # makes the compute
# df.hvplot.line(x="time", y="band_data")
df.plot.line(x="time", y="band_data")

Make a timeseries plot from the monthly resampled mean NDVI value within the masked NDVI layer

In [None]:
ndvi_masked["time"] = pd.to_datetime(ndvi_masked["time"])
df = ndvi_masked.sortby("time").resample(time="1M").mean()  # over time (compute)
dfm = (
    df.mean(dim=["x", "y"]).to_dataframe().reset_index()
)  # over space (compute & write to df)
# dfm.hvplot.line(x="time", y="band_data")
dfm.plot.line(x="time", y="band_data")
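
The monthly resampling groups all observations that fall in the same calendar month and averages them. The sketch below shows that grouping step in plain Python on hypothetical per-image mean values. It is a simplification: the notebook first averages per pixel over time and only then over space.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (date, mean NDVI) observations, for illustration only
observations = [
    ("2022-07-03", 0.62),
    ("2022-07-18", 0.58),
    ("2022-08-02", 0.55),
    ("2022-08-27", 0.51),
    ("2022-09-11", 0.47),
]

# Collect the values per calendar month
per_month = defaultdict(list)
for stamp, value in observations:
    month = datetime.strptime(stamp, "%Y-%m-%d").strftime("%Y-%m")
    per_month[month].append(value)

# Average per month, in chronological order
monthly_mean = {month: mean(values) for month, values in sorted(per_month.items())}
print(monthly_mean)
```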

### Compute statistics

Compute statistics using the `.describe()` method for dataframes

In [None]:
dfm.band_data.describe()

Close the cluster now that we are done with the analysis

In [None]:
# close the cluster
client.close()