# regridding

Regridding usually happens on the fly for both satellite imagery and in-situ data. To demonstrate how it works, this notebook runs it as a separate step.

In [None]:
import distributed

client = distributed.Client()
client

In [None]:
import pathlib
import warnings

import geopandas as gpd
import pystac
import stac_geoparquet
import xarray as xr
import xdggs
from rich.progress import track

from pangeo_iaocea.regridding import aggregation_regridding, categorize_points

warnings.filterwarnings(
    category=UserWarning, message="Consolidated metadata", action="ignore"
)

In [None]:
cache_root = pathlib.Path.home() / "work/data/stac/cache"
data_root = pathlib.Path.cwd() / "data"

## regrid SST imagery

First, we need to define the target resolution:

In [None]:
grid_info = xdggs.HealpixInfo(level=11, indexing_scheme="nested")
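To get a feel for what `level=11` means: HEALPix divides the sphere into 12 · 4^level equal-area cells. The sketch below derives the cell count and a rough linear cell size from that. The helper names and the mean Earth radius of 6371 km are assumptions made for illustration; they are not part of `xdggs`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration only


def healpix_cell_count(level: int) -> int:
    """Number of cells at a refinement level: 12 * nside**2 with nside = 2**level."""
    nside = 2**level
    return 12 * nside**2


def approx_cell_size_km(level: int) -> float:
    """Square root of the (equal) cell area, used as a rough linear cell size."""
    sphere_area = 4 * math.pi * EARTH_RADIUS_KM**2
    return math.sqrt(sphere_area / healpix_cell_count(level))


n_cells = healpix_cell_count(11)
size_km = approx_cell_size_km(11)
print(f"level 11: {n_cells} cells, about {size_km:.1f} km across")
```

At level 11 this gives roughly 50 million cells, each a few kilometers across. Every additional level quadruples the cell count and halves the linear size.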

To regrid, we can first read the stored items back into memory:

In [None]:
image_items = gpd.read_parquet(data_root / "avhrr-sst-metop_b.parquet").pipe(
    stac_geoparquet.to_item_collection
)
image_items

and then apply the regridding by looping over the items. For each item, we:
- use `xpystac` (which provides the `"stac"` engine) to load the `data` asset into an `xarray` dataset
- apply aggregation regridding: bin the original data into HEALPix cells and compute the mean of each bin
- write the result to Zarr with uniform chunk sizes
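
The aggregation step in the second bullet can be pictured as a group-by on cell ids. The following is a minimal pure-Python sketch of that idea, not the implementation of `aggregation_regridding`. The function name `bin_means` and the toy cell ids are hypothetical (a HEALPix library would derive the ids from the pixel coordinates), and skipping missing values is a choice made for this sketch.

```python
from collections import defaultdict


def bin_means(cell_ids, values):
    """Average all values that share a cell id, skipping missing (NaN) values."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell_id, value in zip(cell_ids, values):
        if value != value:  # NaN is the only float that is not equal to itself
            continue
        sums[cell_id] += value
        counts[cell_id] += 1
    # only cells that received at least one valid value appear in the result
    return {cell_id: sums[cell_id] / counts[cell_id] for cell_id in sums}


# Toy input: five source pixels falling into three target cells.
cell_ids = [7, 7, 12, 12, 30]
sst = [290.0, 292.0, 285.0, float("nan"), 300.0]
means = bin_means(cell_ids, sst)
print(means)
```

Each target cell ends up with a single value, which is why the regridded output is indexed by cell rather than by the original image coordinates.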

In [None]:
regridded_root = cache_root / "healpix/avhrr-sst-metop_b"
regridded_root.mkdir(parents=True, exist_ok=True)
for item in track(image_items):
    ds = xr.open_dataset(
        item.assets["data"], engine="stac", chunks={}, decode_timedelta=True
    )

    regridded = aggregation_regridding(grid_info, ds).chunk({"cells": 100000})

    path = regridded_root.joinpath(item.id).with_suffix(".zarr")
    regridded.to_zarr(path, mode="w")
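
A note on the path construction above: `Path.with_suffix` replaces everything after the last dot in the final path component, so an item id that itself contains a dot would lose its tail. Whether the ids in this collection contain dots is not checked here. The ids below are hypothetical and only illustrate the behavior, together with one possible alternative that appends the extension instead.

```python
import pathlib

root = pathlib.Path("healpix")

plain_id = "item_20240101"  # hypothetical id without dots
dotted_id = "item_v1.2_20240101"  # hypothetical id containing a dot

# with_suffix swaps the text after the last dot for the new suffix
replaced_plain = root.joinpath(plain_id).with_suffix(".zarr")
replaced_dotted = root.joinpath(dotted_id).with_suffix(".zarr")

# appending the extension keeps the full id regardless of any dots in it
appended_dotted = root / f"{dotted_id}.zarr"

print(replaced_plain.name)
print(replaced_dotted.name)
print(appended_dotted.name)
```

As long as writing and reading use the same expression, as they do in this notebook, the paths stay consistent. The remaining risk is that two ids differing only after the last dot would map to the same store.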

We can then open one of the regridded Zarr stores (here, the second item) and visualize the result:

In [None]:
image = xr.open_dataset(
    regridded_root.joinpath(image_items[1].id).with_suffix(".zarr"),
    engine="zarr",
    decode_timedelta=True,
    chunks={},
).dggs.decode()
image

In [None]:
image["sea_surface_temperature"].compute().dggs.explore()

## transform in-situ data

For the in-situ data, the procedure is similar:
- open the datasets
- define the grid
- bin the coordinates into grid cells

However, no regridding is involved: each observation keeps its value and only gains the id of the cell it falls into.
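
The difference between labeling points and regridding can be sketched without any HEALPix machinery. In the toy example below, a regular 1° longitude/latitude grid stands in for the HEALPix grid, and `toy_cell_id` is a hypothetical helper, not how `categorize_points` computes ids. The observations are made up. The point is only that every observation is kept and gains a cell id, even when several observations share a cell.

```python
import math


def toy_cell_id(lon: float, lat: float) -> int:
    """Index of the 1-degree box containing (lon, lat); a stand-in for HEALPix ids."""
    col = int(math.floor(lon % 360.0))  # wrap longitude into 0..359
    row = int(math.floor(lat + 90.0))  # shift latitude into 0..180
    return row * 360 + col


# Hypothetical trajectory: three observations, the first two in the same box.
observations = [
    {"lon": -4.50, "lat": 48.30, "temp": 12.1},
    {"lon": -4.45, "lat": 48.35, "temp": 12.3},
    {"lon": -3.20, "lat": 47.90, "temp": 12.8},
]

# Label each observation with its cell id; nothing is averaged or dropped.
labeled = [
    {**obs, "cell_id": toy_cell_id(obs["lon"], obs["lat"])} for obs in observations
]

for obs in labeled:
    print(obs)
```

The first two observations end up with the same cell id but remain separate records, which is the behavior the in-situ workflow relies on.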

In [None]:
items = [
    pystac.Item.from_dict(item)
    for item in stac_geoparquet.json_reader.read_json(
        data_root / "insitu_global_phybgcwav_discrete_mynrt_013_030.jsonl"
    )
]

We'll use a higher-resolution grid to accommodate the point / trajectory data:

In [None]:
grid_info = xdggs.HealpixInfo(level=13, indexing_scheme="nested")

With that, we can derive cell ids from the geographic coordinates provided by the dataset:

In [None]:
regridded = []
for item in track(items[:3]):
    ds = xr.open_dataset(item.assets["public"], engine="stac", chunks={}).compute()
    regridded.append(
        ds.assign_coords(
            {"cell_ids": categorize_points(grid_info, ds["LONGITUDE"], ds["LATITUDE"])}
        )
    )

The datasets are small enough to stay in memory, so we can immediately visualize the result:

In [None]:
regridded[2].dggs.decode(grid_info).get("TEMP").dggs.explore()