Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert grids to/ from unstructured #217

Merged
merged 11 commits into from Nov 11, 2022

Conversation

mathause
Copy link
Member

@mathause mathause commented Nov 2, 2022


This is the first of several of PRs which will add helper functions to deal with Dataset and DataArray objects. This first one deals with converting arrays from regular grids to unstructured grids (i.e. flattening the lon and lat dimension). What will follow are PRs on masking (land & antarctica), calculating the global mean, and potentially, stacking ensembles, and calculating anomalies.

I do have a number of questions - most of them concerning naming.


  1. General question: The functions are currently in mesmer.xarray_utils. I imagined this as follows:

    import mesmer.xarray_utils as mxu
    
    ds = mxu.to_unstructured(ds)

    Does this sound good? I had a number of other ideas but start to prefer this name - as the functionality is xarray-specific. However, I am open to suggestions.

    Other ideas I had:

    mesmer.data_utils
    mesmer.preproc
    mesmer.process
    mesmer.tools
    mesmer.utils
    mesmer.xarray_utils
  2. I now went with cell_dim="cell" - objections? See discussion in unstructured grid #105.

  3. Should the function be (quasi-)top-level (1) or not (2)?

    import mesmer.xarray_utils as mxu
    
    ds = mxu.to_unstructured(ds) # (1)
    ds = mxu.grid.to_unstructured(ds) # (2)
  4. What about the name unstructured? I think it comes from models that have a truly unstructured grid (e.g. ICON), and their spatial data has to be 1D. Thus, the name may not be entirely correct in this context as in most cases we will be flattening a regular grid. So what is a better name? (I mean I am happy to keep the current name if people understand what it means.)

  5. There is a difficulty with round-tripping because we remove cells that are NaN. We need the original coordinates to restore the original coordinates. I decided that the user will be responsible to supplying them. Other solutions did not convince me (round-trip land-only unstructured grid #117). This may look like:

    import mesmer.xarray_utils as mxu
    
    ds = mxu.grid.to_unstructured(ds) 
    coords_orig = ds.coords.to_dataset()[["lon", "lat"]]
    
    ds = mxu.grid.from_unstructured(ds, coords_orig) # (2)

This is a draft - I plan to tag some people later.

mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved
mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved
@mathause mathause marked this pull request as ready for review November 3, 2022 13:36
@codecov-commenter
Copy link

codecov-commenter commented Nov 3, 2022

Codecov Report

Base: 84.71% // Head: 85.03% // Increases project coverage by +0.31% 🎉

Coverage data is based on head (5a37c18) compared to base (1687f37).
Patch coverage: 96.00% of modified lines in pull request are covered.

❗ Current head 5a37c18 differs from pull request most recent head 3b38396. Consider uploading reports for the commit 3b38396 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #217      +/-   ##
==========================================
+ Coverage   84.71%   85.03%   +0.31%     
==========================================
  Files          33       35       +2     
  Lines        1452     1483      +31     
==========================================
+ Hits         1230     1261      +31     
  Misses        222      222              
Flag Coverage Δ
unittests 85.03% <96.00%> (+0.31%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
mesmer/xarray_utils/grid.py 95.83% <95.83%> (ø)
mesmer/xarray_utils/__init__.py 100.00% <100.00%> (ø)
mesmer/utils/regionmaskcompat.py 95.45% <0.00%> (+0.33%) ⬆️
mesmer/stats/linear_regression.py 91.02% <0.00%> (+0.35%) ⬆️
mesmer/calibrate_mesmer/train_gv.py 93.33% <0.00%> (+1.66%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@znichollscr
Copy link
Contributor

  1. I had a number of other ideas but start to prefer this name - as the functionality is xarray-specific

If you like it, let's go with that

2. objections?

None from me

3. Should the function be (quasi-)top-level (1) or not (2)?

I would do 2 (users can always do from mesmer.xarray_utils.grid import to_unstructured if they want something else

4. What about the name unstructured?

Hmm yes tricky. My instinct would be 'flat' or 'long' as they feel more like the usual data science terms. You could use the names stack_lat_lon and unstack_lat_lon so it's a bit clearer what is actually going on?

5. There is a difficulty with round-tripping because we remove cells that are NaN

Do we have to do this? Maybe it's naive, but I feel like we should be able to preserve nans unless the user doesn't want to? If we have to keep the nans, then sounds fine (I'll make suggestions during review)

Thanks for a great PR!

Copy link
Contributor

@znichollscr znichollscr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, couple of questions about naming and exactly how things are setup

mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved
mesmer/xarray_utils/grid.py Show resolved Hide resolved
mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved
Copy link
Member Author

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the helpful feedback! I agree stack and unstack are more intuitive names (flatten might work as well but I would not know how to name the inverse operation). Let's see how this looks like:

tas_masked = xru.mask.mask_ocean(tas)

tas_stacked = xru.grid.stack_lat_lon(tas_masked)

tas_unstacked = xr.grid.unstack_lat_lon(tas_stacked, coords_orig)

Maybe cell_dim="cell" should then be stack_dim="cell"?


We remove the <NA> because our statistical functions don't handle them, and, because we don't emulate the ocean (currently), it allows to remove ~70% of the data. That this cannot be round-tripped is an annoying little thing that bugs me without end. A minimal example:

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

%matplotlib

air = xr.tutorial.open_dataset("air_temperature")
coords_orig = air[["lat", "lon"]]

air = air.air
# add a row of Na -> e.g. by masking the ocean
air[:, 12, :] = np.NaN

air_stacked = air.stack(cell=("lat", "lon"))
air_stacked = air_stacked.dropna("cell")

air_unstacked = air_stacked.unstack("cell")
air_unstacked_aligned = xr.align(air_unstacked, coords_orig, join="right")[0]

print(f"{air.lat.shape=}")
print(f"{air_unstacked.lat.shape=}")
print(f"{air_unstacked_aligned.lat.shape=}")

# ---

f, axs = plt.subplots(2, 1)
# difficult to see but one lat-band is too big
air_unstacked.isel(time=0).plot(ax=axs[0])
air_unstacked_aligned.isel(time=0).plot(ax=axs[1])

mesmer/xarray_utils/grid.py Show resolved Hide resolved
mesmer/xarray_utils/grid.py Outdated Show resolved Hide resolved
@znicholls
Copy link
Collaborator

Maybe cell_dim="cell" should then be stack_dim="cell"?

Yep sounds good to me!

mathause and others added 4 commits November 10, 2022 15:08
Co-authored-by: znichollscr <114576287+znichollscr@users.noreply.github.com>
Co-authored-by: Zeb Nicholls <zebedee.nicholls@climate-energy-college.org>
Copy link
Member Author

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I implemented the naming updates. I also changed cell_dim="cell" to stack_dim="gridcell".

I'll merge in a few days unless I get more feedback.

mesmer/xarray_utils/grid.py Show resolved Hide resolved
docs/source/api.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

@znicholls znicholls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome, nice work!

@mathause
Copy link
Member Author

Thanks! My plan is to create a tutorial with a potential workflow once all these PRs are done.

@mathause mathause merged commit e8430f4 into MESMER-group:main Nov 11, 2022
@mathause mathause deleted the to_from_unstructured branch November 11, 2022 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

round-trip land-only unstructured grid unstructured grid
4 participants