Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

round-trip land-only unstructured grid #117

Closed
mathause opened this issue Dec 2, 2021 · 2 comments · Fixed by #217
Closed

round-trip land-only unstructured grid #117

mathause opened this issue Dec 2, 2021 · 2 comments · Fixed by #217

Comments

@mathause
Copy link
Member

mathause commented Dec 2, 2021

I thought about how we can round-trip an unstructured land-only grid. We will in some way need the original grid. I am ignoring this for now but this could be stored as coords on the new dataset (but then all coords and dims need renaming...).

Related to #105

import xarray as xr
from packaging.version import Version

def to_unstructured(ds, x_dim="lon", y_dim="lat", cell_dim="cell", multiindex=False):

    dims = {cell_dim: (y_dim, x_dim)}

    ds = ds.stack(dims)

    if not multiindex:
        # the behaviour changed in xarray 2022.06 (Index refactor)
        drop = True if Version(xr.__version__) > Version('2022.3') else False
        ds = ds.reset_index(list(dims.keys()), drop=drop)

    return ds


def to_landonly(
    ds, x_dim="lon", y_dim="lat", cell_dim="cell", x_coords="lon", y_coords="lat"
):

    mask = land_110.mask_3D(ds, lon_name=x_coords, lat_name=y_coords).squeeze(drop=True)

    # convert (lat, lon) -> (cells)
    ds = to_unstructured(ds, x_dim, y_dim, cell_dim)
    mask = to_unstructured(mask, x_dim, y_dim, cell_dim)

    return ds.sel({cell_dim: mask})


def from_unstructured(ds, ds_orig, x_dim="lon", y_dim="lat", cell_dim="cell"):

    if not isinstance(ds.indexes.get(cell_dim), pd.MultiIndex):
        ds = ds.set_index({cell_dim: (y_dim, x_dim)})

    ds = ds.unstack(cell_dim)

    return xr.align(ds, ds_orig, join="right")[0]

I don't comment the examples - have a look at the "Dimensions":

Example 1: 1D lat and lon coords

air = xr.tutorial.open_dataset("air_temperature")
air.attrs = {}

print("\n# Original dataset:\n")
print(air)

air_unstructured = to_unstructured(air)

print("\n# Unstructured:\n")
print(air_unstructured)

air_landonly = to_landonly(air)

print("\n# Land only:\n")
print(air_landonly)

air_restored = from_unstructured(air_landonly, air)

print("\n# Restored from land-only:\n")
print(air_restored)
Original dataset:
 - Dimensions:  (lat: 25, time: 2920, lon: 53)
Unstructured:
 - Dimensions:  (time: 2920, cell: 1325)
Land only:
 - Dimensions:  (time: 2920, cell: 517)
Restored from land-only:
 - Dimensions:  (lat: 25, time: 2920, lon: 53)

Example 2: 2D lat and lon coords

rasm = xr.tutorial.open_dataset("rasm")
# we require coords...
rasm = rasm.assign_coords(x=rasm.x, y=rasm.y)
rasm.attrs = {}

print("\n# Original dataset:\n")
print(rasm)

rasm_unstructured = to_unstructured(rasm, x_dim="x", y_dim="y")
print("\n# Unstructured:\n")
print(rasm_unstructured)

rasm_landonly = to_landonly(rasm, x_dim="x", y_dim="y", x_coords="xc", y_coords="yc")
print("\n# Land only:\n")
print(rasm_landonly)

rasm_restored = from_unstructured(rasm_landonly, rasm, x_dim="x", y_dim="y")

print("\n# Restored from land-only:\n")
print(rasm_restored)
Original dataset:
 - Dimensions:  (time: 36, y: 205, x: 275)
Unstructured:
 - Dimensions:  (time: 36, cell: 56375)
Land only:
 - Dimensions:  (time: 36, cell: 23472)
Restored from land-only:
 - Dimensions:  (x: 275, time: 36, y: 205)
@mathause
Copy link
Member Author

mathause commented Dec 3, 2021

Here is how one could attach the original coords to the unstructured dataset:

def renamed_orig_coords(ds, x_dim="lon", y_dim="lat"):

    # read dims from dataset
    dsc = ds[[y_dim, x_dim]]

    # if len(dsc[c].dims -> avoid attaching coords like "height"
    
    name_dict = {c: f"{c}_orig" for c in dsc.coords if len(dsc[c].dims)}
    coords = dsc.rename(name_dict).coords

    return coords


def restore_orig_coords_as_dataset(ds, x_dim="lon", y_dim="lat"):

    # read dims from dataset
    dsc = ds[[f"{y_dim}_orig", f"{x_dim}_orig"]]

    name_dict = {c: c.replace("_orig", "") for c in dsc.coords}
    coords_ds = dsc.rename(name_dict)

    return coords_ds

Pros

  • Can round-trip unstructured grid (from which we removed the ocean grid points)
  • No external knowledge needed
  • Works for any original grid (as far as I can tell)

Cons

  • It's ugly
  • Does not work with DataArray, only Dataset

Example 1

print(air_unstructured.assign_coords(renamed_orig_coords(air)))
<xarray.Dataset>
Dimensions:   (time: 2920, cell: 1325, lat_orig: 25, lon_orig: 53)
Coordinates:
  * time      (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
    lat       (cell) float64 75.0 75.0 75.0 75.0 75.0 ... 15.0 15.0 15.0 15.0
    lon       (cell) float64 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * lat_orig  (lat_orig) float32 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
  * lon_orig  (lon_orig) float32 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
Dimensions without coordinates: cell
Data variables:
    air       (time, cell) float32 241.2 242.5 243.5 244.0 ... 296.5 296.2 295.7

Example 2

print(rasm_unstructured.assign_coords(renamed_orig_coords(rasm, x_dim="x", y_dim="y")))
<xarray.Dataset>
Dimensions:  (time: 36, cell: 56375, y_orig: 205, x_orig: 275)
Coordinates:
  * time     (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (cell) float64 189.2 189.4 189.6 189.7 ... 17.65 17.4 17.15 16.91
    yc       (cell) float64 16.53 16.78 17.02 17.27 ... 28.26 28.01 27.76 27.51
    y        (cell) int64 0 0 0 0 0 0 0 0 0 ... 204 204 204 204 204 204 204 204
    x        (cell) int64 0 1 2 3 4 5 6 7 8 ... 267 268 269 270 271 272 273 274
  * y_orig   (y_orig) int64 0 1 2 3 4 5 6 7 ... 197 198 199 200 201 202 203 204
  * x_orig   (x_orig) int64 0 1 2 3 4 5 6 7 ... 267 268 269 270 271 272 273 274
    yc_orig  (y_orig, x_orig) float64 16.53 16.78 17.02 ... 28.01 27.76 27.51
    xc_orig  (y_orig, x_orig) float64 189.2 189.4 189.6 ... 17.4 17.15 16.91
Dimensions without coordinates: cell
Data variables:
    Tair     (time, cell) float64 nan nan nan nan nan ... 29.8 28.66 28.19 28.21

@mathause
Copy link
Member Author

Note: to_landonly should not convert to a unstructured grid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant