# Load Timeseries from STAC Geoparquet

This notebook follows on from the previous notebooks by demonstrating how to
load a data product that was produced using cloud-native methods.

The full source code used to produce the data loaded in this notebook is
[available here](https://github.com/auspatious/ldn-land-productivity/).

In short, this is an operational implementation of the land productivity metric
explored in notebook 2. The output was 720 tiles over 24 years of Landsat
data for three case study sites. The notebook below loads the data over Belize,
masks it to the country's land boundary, and calculates land productivity
metrics for the whole country.
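The `evi2` variable loaded below is, we assume, the two-band Enhanced Vegetation Index (EVI2). As a reminder of what each pixel holds, the sketch computes the standard per-pixel EVI2 from red and near-infrared surface reflectance scaled to 0 to 1. How the product aggregates EVI2 into one value per year is defined in the linked repository and is not reproduced here; the reflectance values are illustrative only.

```python
import numpy as np


def evi2(red, nir):
    """Two-band Enhanced Vegetation Index (EVI2).

    Standard formulation, using surface reflectance scaled to 0..1:
        EVI2 = 2.5 * (NIR - Red) / (NIR + 2.4 * Red + 1)
    Works on scalars or NumPy arrays (computed element-wise).
    """
    red = np.asarray(red, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


# Illustrative reflectances only, not values from the product:
# dense vegetation is bright in NIR and dark in red.
print(evi2(red=0.05, nir=0.35))
```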

In [None]:
import geopandas as gpd
import stacrs
from dask.distributed import Client
from odc.geo import Geometry
from odc.geo.xr import mask
from odc.stac import load
from pystac import Item

## Country boundary

We first load a vector boundary of Belize from the geoBoundaries project,
reading the file directly from its URL into memory with GeoPandas.

In [None]:
url = (
    "https://media.githubusercontent.com/media/wmgeolab/geoBoundaries/"
    "9469f09592ced973a3448cf66b6100b741b64c0d/releaseData/gbOpen/BLZ/ADM0/geoBoundaries-BLZ-ADM0-all.zip"
)

geometry = gpd.read_file(url, layer="geoBoundaries-BLZ-ADM0")
geometry.explore()

## Access STAC Geoparquet Index to Tiles

Next, we use a STAC Geoparquet file as an index to the product's hundreds of
items (and thousands of GeoTIFF files), so that we only load data that overlaps
our region of interest.
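Conceptually, a STAC Geoparquet index is a table with one row per STAC Item, including each item's bounding box, so a spatial search can discard non-overlapping items without opening any GeoTIFFs. The sketch below mimics that bounding-box test in plain Python over Item-like dicts. It illustrates the idea only and is not how `stacrs` implements its search; the item IDs and coordinates are hypothetical, and the query box only roughly covers Belize.

```python
def bbox_intersects(a, b):
    """True if two [minx, miny, maxx, maxy] boxes overlap (touching edges count).

    Note: this simple test does not handle boxes crossing the antimeridian.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_index(items, bbox):
    """Return the items whose STAC 'bbox' overlaps the query bbox."""
    return [item for item in items if bbox_intersects(item["bbox"], bbox)]


# Hypothetical index rows: two tiles, only one of them near Belize.
index = [
    {"id": "tile_a", "bbox": [-90.0, 15.5, -88.0, 18.5]},
    {"id": "tile_b", "bbox": [140.0, -40.0, 150.0, -30.0]},
]

# Approximate query box in the same [minx, miny, maxx, maxy] order
# that GeoDataFrame.total_bounds returns.
query = [-89.3, 15.8, -87.4, 18.6]
print([item["id"] for item in search_index(index, query)])
```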

In [None]:
url = "https://data.ldn.auspatious.com/geo_ls_lp/geo_ls_lp_0_1_0.parquet"

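# Search the index for items overlapping Belize's bounding box, given as
# [minx, miny, maxx, maxy] in longitude/latitude. stacrs.search is async;
# top-level await works in Jupyter.
dict_list = await stacrs.search(url, bbox=geometry.total_bounds.tolist())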

items = [Item.from_dict(d) for d in dict_list]

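data = load(
    items,
    geopolygon=geometry,
    # With a projected CRS the spatial dimensions are named "x" and "y"
    # (not "longitude"/"latitude"), so chunk sizes must use those names.
    chunks={"time": 1, "x": 3000, "y": 3000},
    crs="utm",  # let odc-stac pick a suitable UTM zone for the area
    resolution=300,  # output pixel size in metres (UTM units)
)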

data

## Mask data, then do a visual check

In [None]:
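# Build an odc-geo Geometry, including its CRS, from the boundary polygon(s).
# GeoDataFrame.union_all() needs GeoPandas >= 1.0.
boundary = Geometry(geometry.union_all(), crs=geometry.crs)
masked = mask(data, boundary)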
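According to its documentation, `mask` sets pixels outside the rasterised polygon to NaN, which is what lets the later mean cover land pixels only. The NumPy sketch below reproduces that effect on a tiny array, with a hand-made boolean array standing in for the rasterised Belize boundary; both arrays are hypothetical. In the notebook, `odc.geo` does the rasterisation from the polygon, and xarray's `.mean()` skips NaN by default for floating-point data.

```python
import numpy as np

# Hypothetical 3x4 EVI2 tile (values made up for illustration).
evi2_tile = np.array(
    [
        [0.10, 0.40, 0.50, 0.20],
        [0.30, 0.60, 0.70, 0.10],
        [0.00, 0.20, 0.30, 0.05],
    ]
)

# Hypothetical rasterised boundary: True inside the polygon (land), False outside.
inside = np.array(
    [
        [False, True, True, False],
        [True, True, True, False],
        [False, True, False, False],
    ]
)

# Equivalent of masking: keep values inside the boundary, NaN everywhere else.
masked_tile = np.where(inside, evi2_tile, np.nan)

# A NaN-aware mean only averages the pixels inside the boundary.
land_mean = np.nanmean(masked_tile)
print(land_mean)
```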

In [None]:
masked.isel(time=-1).evi2.odc.explore()

## Summarise data using Dask

This next step is a big job, so we use Dask to run the work in parallel.
It will take a while, so consider what it is doing: the data for every year is
read over the whole country and reduced to a single value per year, the mean of
the land productivity metric EVI2.

That is a lot of data, but with a good internet connection this should still
take only a couple of minutes.
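Dask splits `masked.evi2` into chunks and cannot simply average per-chunk means, because chunks can hold different numbers of valid (non-NaN) land pixels. The sketch below shows the arithmetic a chunked, NaN-skipping mean needs: each chunk contributes a partial sum and a count, which are combined at the end. It uses NumPy on a hypothetical array split by hand; Dask's real task graph is more involved, so treat this as an illustration of why the chunked result matches a single in-memory mean.

```python
import numpy as np


def chunked_nanmean(array, chunk_size):
    """NaN-skipping mean of a 2-D array, computed chunk by chunk along axis 0.

    Each chunk yields (sum of valid values, count of valid values) and the
    partial results are combined. Averaging per-chunk means instead would
    be wrong when chunks have different numbers of valid pixels.
    """
    total, count = 0.0, 0
    for start in range(0, array.shape[0], chunk_size):
        chunk = array[start : start + chunk_size]
        valid = ~np.isnan(chunk)
        total += chunk[valid].sum()
        count += int(valid.sum())
    return total / count if count else np.nan


# Hypothetical masked tile: NaN marks pixels outside the land boundary.
tile = np.array(
    [
        [0.2, np.nan, np.nan],
        [0.4, 0.6, np.nan],
        [0.8, 0.8, 0.8],
    ]
)

print(chunked_nanmean(tile, chunk_size=1))
print(np.nanmean(tile))
```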

In [None]:
with Client(n_workers=2, threads_per_worker=16) as client:
    annual_mean = masked.evi2.mean(dim=["x", "y"]).compute()

## Visualise results

This final cell graphs the results. What can we infer from the graph?

In [None]:
import matplotlib.pyplot as plt

annual_mean.plot(size=3, aspect=2.5)
_ = plt.title("Annual Mean EVI2")
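One way to go beyond eyeballing the graph is to fit a straight line through the annual means and read off the slope, in EVI2 units per year; a positive slope suggests productivity increased over the period. The sketch below uses an ordinary least-squares fit via `numpy.polyfit`. This is a simple method assumed for illustration, not necessarily the trend method used in the linked repository or in notebook 2, and a single national mean can hide opposing local trends. The series in the example is synthetic.

```python
import numpy as np


def linear_trend(years, values):
    """Least-squares slope (units per year) and intercept of values vs. years.

    NaN values (e.g. years with no valid data) are dropped before fitting.
    """
    years = np.asarray(years, dtype="float64")
    values = np.asarray(values, dtype="float64")
    keep = ~np.isnan(values)
    slope, intercept = np.polyfit(years[keep], values[keep], deg=1)
    return slope, intercept


# Synthetic series for illustration only: a gentle upward trend.
years = np.arange(2000, 2010)
values = 0.3 + 0.002 * (years - 2000)
slope, _ = linear_trend(years, values)
print(round(slope, 4))
```

Applied to this notebook, a call such as `linear_trend(annual_mean.time.dt.year.values, annual_mean.values)` should work, assuming `annual_mean` has a single `time` dimension with one entry per year.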