# Lab 3: Build a Data Product

*Goal: Build a new data product and add it to your organization's data catalog*

This lab walks through two steps:

1. Developing a more advanced workflow that combines multiple datasets (elevation data from a DEM and rainfall data from ERA5).
2. Using these datasets to create a new, derived dataset (for example, a "Landslide Susceptibility Index") and writing it back into the Earthmover platform, making it a new, reusable asset.

In [None]:
import xarray as xr
import zarr
from arraylake import Client
from dask.diagnostics import ProgressBar

# Bounding Box for New Zealand
bbox = {"longitude": slice(165, 179), "latitude": slice(-33, -47)}

client = Client()
client.login()

## Input datasets

In [None]:
# open the DEM dataset
dem_repo = client.get_repo("earthsciencesnz/copernicus_dem")
dem_session = dem_repo.readonly_session("main")
dem_ds = xr.open_zarr(dem_session.store, group="90m_new_zealand_complete")
dem_ds

In [None]:
# Task 1: plot the DEM
# Hint: you may want to "coarsen" the data before plotting!

In [None]:
# open the ERA5 dataset
era5_repo = client.get_repo("earthmover-public/era5-surface-aws")
era5_session = era5_repo.readonly_session("main")
era5_ds = xr.open_zarr(era5_session.store, group="spatial")

In [None]:
era5_ds["tp"] = era5_ds.cp + era5_ds.lsp
era5_ds

In [None]:
# Task 2: plot the ERA5 total precipitation for the period February 6–15, 2023 over New Zealand
# Hint: Slice over time and space, then sum over time...

## Calculate Slope from the DEM

In [None]:
from landslide import calculate_slope

calculate_slope??

In [None]:
dem_ds["slope"] = calculate_slope(dem_ds["elevation"], 90, 90)
dem_ds
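
The source of `calculate_slope` lives in the workshop's `landslide` module and is not reproduced here. As a rough sketch of what a slope calculation on a regular grid can look like, the hypothetical helper below (`slope_degrees`, which is not the workshop function) uses central differences from `numpy.gradient` and assumes a uniform cell size given in meters.

```python
import numpy as np


def slope_degrees(elevation, dx=90.0, dy=90.0):
    """Hypothetical helper: slope in degrees from a 2D elevation array (rows = y, columns = x)."""
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x)
    dz_dy, dz_dx = np.gradient(elevation, dy, dx)
    # Slope angle is the arctangent of the gradient magnitude
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# A plane that rises 90 m per 90 m cell in x should have a 45 degree slope everywhere
plane = np.tile(np.arange(5) * 90.0, (4, 1))
slope = slope_degrees(plane)
```

The synthetic plane is a convenient sanity check: any slope routine should return a constant angle on it.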

In [None]:
from landslide import landslide_index

landslide_index??

👆 Note that `landslide_index` requires accumulated daily precipitation as an input.

## Data prep

Before we can calculate the full index, we need to create a daily version of the precipitation data on the 90m DEM grid.

In [None]:
time_period = slice("2023-01-15", "2023-02-28")

# Use the uppercase "D" alias; lowercase "d" is deprecated in recent pandas releases
daily_precip = era5_ds["tp"].sel(time=time_period).resample(time="1D").sum()
daily_precip_nz = daily_precip.sel(**bbox)
daily_precip_nz

In [None]:
daily_precip_nz_7d = daily_precip_nz.rolling(time=7).sum()
daily_precip_nz_7d
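
One detail worth checking: with the default `min_periods`, `rolling(time=7).sum()` returns NaN until a full 7-day window is available, which is presumably why the selected period starts in mid-January. A small example on synthetic values (not ERA5 data) illustrates the behavior:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Ten days of synthetic daily precipitation, 1 unit per day
time = pd.date_range("2023-01-15", periods=10, freq="D")
precip = xr.DataArray(np.ones(10), coords={"time": time}, dims="time")

precip_7d = precip.rolling(time=7).sum()

# Count the leading NaNs and read the first complete 7-day total
n_missing = int(precip_7d.isnull().sum())
first_valid = float(precip_7d.isel(time=6))
```

Passing `min_periods=1` would fill the leading days with partial sums instead, at the cost of those values not being true 7-day totals.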

In [None]:
# Interpolate onto the 90m grid
daily_precip_nz_7d_90m = daily_precip_nz_7d.interp(
    latitude=dem_ds.latitude, longitude=dem_ds.longitude, method="linear"
)
daily_precip_nz_7d_90m
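
Linear interpolation places the coarse precipitation values on the DEM grid but does not add spatial detail. The toy example below, with made-up values on a 2×2 grid, sketches what `interp` does when the target coordinates are finer than the source. It assumes SciPy is installed, which `interp` relies on.

```python
import numpy as np
import xarray as xr

# A coarse 2x2 field with made-up values
coarse = xr.DataArray(
    np.array([[0.0, 1.0], [2.0, 3.0]]),
    coords={"latitude": [-41.0, -40.0], "longitude": [170.0, 171.0]},
    dims=("latitude", "longitude"),
)

# Target coordinates on a finer 5x5 grid covering the same extent
fine_lat = np.linspace(-41.0, -40.0, 5)
fine_lon = np.linspace(170.0, 171.0, 5)

fine = coarse.interp(latitude=fine_lat, longitude=fine_lon, method="linear")

# The central point is the average of the four surrounding corners
center = float(fine.isel(latitude=2, longitude=2))
```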

In [None]:
dem_ds["index"] = landslide_index(
    dem_ds["elevation"], daily_precip_nz_7d_90m.sel(time="2023"), dx=90, dy=90
)
dem_ds

In [None]:
with ProgressBar():
    dem_ds["index"].sel(time="2023-02-01").coarsen(
        latitude=10, longitude=10
    ).max().plot(robust=True)

Finally, we'll update the dataset's metadata to make it more CF-compliant:

In [None]:
dem_ds["longitude"].attrs["axis"] = "X"
dem_ds["latitude"].attrs["axis"] = "Y"
dem_ds

## Write the Landslide Index to the Arraylake Catalog

In the final step, we'll each create our own Icechunk repository and write our Landslide Index to it.

The steps are as follows:

1. Create the repository, populating it with relevant metadata
2. Create a writable session using Icechunk
3. Write the data to the Icechunk Store using Xarray and Zarr
4. Finalize the write by calling `session.commit()`

In [None]:
my_name = (
    "jhamman"  # <- replace with a string that uniquely identifies you! (IMPORTANT)
)
description = ""  # Add a 1-line description to the catalog entry
metadata = {}  # Add Key: Value style metadata to the catalog entry

repo = client.create_repo(
    f"earthsciencesnz/landslide-{my_name}", description=description, metadata=metadata
)

# or reopen your repository if you already created it
# repo = client.get_repo(f"earthsciencesnz/landslide-{my_name}")

In [None]:
# create the writable session
session = repo.writable_session("main")
session

In [None]:
# write the data
# (for the sake of time and resources, we'll just write a few days out -- feel free to experiment with this)
with ProgressBar():
    dem_ds.sel(time=slice("2023-02-12", "2023-02-15")).chunk(
        {"latitude": 1000, "longitude": 1000, "time": 1}
    ).to_zarr(session.store)

# Inspect the changes before committing
session.status()

In [None]:
session.commit("added landslide index for feb 2023")

## Lab Activities

1. Explore the web catalog -- find your dataset and add/edit metadata there: https://app.earthmover.io/earthsciencesnz/repositories


2. Add a new field `max_index`

   🧐 Hint: you may want to use Xarray's `to_zarr` in append mode...

3. Find a dataset produced by a colleague and open their dataset... what did they do differently?


4. Add another dataset to the catalog -- perhaps something derived from ERA5 or a public dataset you often work with.