
Implementation of salem-style x, y, and z coordinates #14

Merged: jthielen merged 2 commits into xarray-contrib:main from salem-style-coords on Feb 16, 2022

Conversation

@jthielen (Collaborator) commented Nov 20, 2021

Change Summary

As alluded to in #2, including dimension coordinates in the grid mapping/projection space is a key feature for integrating with other tools in the ecosystem like metpy and xgcm. In this (draft) PR, I've combined code ported from salem with some of my own one-off scripts and what already exists in xwrf to meet this goal. In particular, this introduces a pyproj dependency (for CRS handling and for transforming the domain center point from lon/lat to easting/northing). Matching the assumptions already present in xwrf and salem, this implementation assumes we do not have a moving domain (which simplifies things greatly). Also, this implements the c_grid_axis_shift attr as needed, so xgcm should be able to interpret our coords automatically, eliminating the need for direct handling (like #5) in xwrf.
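
For a sense of the mechanics, here is a sketch of the salem-style construction under my own assumptions (a Lambert conformal domain, MAP_PROJ == 1, and WRF's usual spherical datum); this is not the exact code in this PR:

```python
import numpy as np
from pyproj import CRS, Transformer

def wrf_grid_coords(attrs, nx, ny):
    """Sketch: easting/northing dimension coordinates for a Lambert conformal
    wrfout file; ``attrs`` are WRF's global attributes."""
    # WRF's computations assume a perfect sphere of radius 6370 km
    crs = CRS(
        proj="lcc",
        lat_1=attrs["TRUELAT1"],
        lat_2=attrs["TRUELAT2"],
        lat_0=attrs["MOAD_CEN_LAT"],
        lon_0=attrs["STAND_LON"],
        a=6370000,
        b=6370000,
    )
    # Transform the (fixed) domain center from lon/lat to easting/northing
    trf = Transformer.from_crs(crs.geodetic_crs, crs, always_xy=True)
    e, n = trf.transform(attrs["CEN_LON"], attrs["CEN_LAT"])
    # Evenly spaced mass-grid coordinates centered on the domain midpoint
    x = attrs["DX"] * (np.arange(nx) - (nx - 1) / 2) + e
    y = attrs["DY"] * (np.arange(ny) - (ny - 1) / 2) + n
    return x, y, crs
```

Staggered dimensions would then carry the same coordinates shifted by half a grid spacing, with attrs along the lines of {'axis': 'X', 'c_grid_axis_shift': 0.5}, which is what lets xgcm detect the staggering.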

Also, because they existed in salem and in my scripts alongside the dimension coordinate handling, I've included my envisioned diagnostic field calculations as well. These are deliberately limited to only the four fields that require WRF-specific handling (a sketch of the conversions follows the list):

  • ~~'T' going to potential temperature has a magic-number offset of 300 K~~
  • ~~'P' and 'PB' combine to form pressure, and are not otherwise used~~
  • ~~'PH' and 'PHB' combine to form geopotential, and are not otherwise used~~
  • ~~Geopotential to geopotential height conversion depends on a particular value of g (9.81 m s**-2) that may not match the value used elsewhere~~
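
Concretely, these four conversions amount to something like the following sketch (the output variable names are hypothetical CF-style names, not necessarily what this PR uses):

```python
import xarray as xr

def insert_basic_diagnostics(ds: xr.Dataset) -> xr.Dataset:
    """Sketch of the four WRF-specific conversions listed above."""
    ds = ds.assign(
        # 'T' is perturbation potential temperature; WRF's base state is 300 K
        air_potential_temperature=ds["T"] + 300.0,
        # perturbation plus base-state pressure, in Pa
        air_pressure=ds["P"] + ds["PB"],
        # perturbation plus base-state geopotential, in m**2 s**-2
        geopotential=ds["PH"] + ds["PHB"],
    )
    # The height conversion bakes in a particular value of g
    ds["geopotential_height"] = ds["geopotential"] / 9.81
    return ds
```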

Unless I'm missing something, any other diagnostics should be derivable from these or other existing fields in a non-WRF-specific way (and so fit outside of xwrf). If the netcdf4 backend already handles Dask chunks, then this should "just work" as currently written. However, I'm not sure how this should behave with respect to lazy loading when chunks are not specified, so that is definitely a discussion to have in relation to #10.

Right now, no tests are included, as this is just a draft implementation to get the conversation started on how we want to approach these features. So, please do share your thoughts and ask questions!

Related issue number

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable

@jthielen added the enhancement (New feature or request) label, Nov 20, 2021
@lpilz (Collaborator) commented Nov 23, 2021

Good work on the PR; I think this is a nice starting point. Did you get the Dask integration working yet? I tried it, and the calculation (especially of the air_pressure diagnostic) doesn't seem to be delayed. As a first try, I implemented _check_if_dask as hasattr(arr, 'chunks'), so the arrays being added should be Dask arrays.
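
(For reference, a sketch of that helper as described; one caveat, which is my note rather than the thread's: xarray DataArrays always expose a .chunks attribute, which is simply None when the data are not chunked, so a stricter check may be needed.)

```python
def _check_if_dask(arr) -> bool:
    """Duck-type check for Dask-backed data, per the suggestion above."""
    # .chunks is None on an unchunked DataArray, and absent on plain ndarrays,
    # so hasattr alone would be True for every DataArray
    return getattr(arr, "chunks", None) is not None
```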

@lpilz (Collaborator) left a review:

Some minor comments :)

(Two resolved review comments on xwrf/grid.py.)

On the diff:
This operation should be called before destaggering or any other cleaning operation.
"""
# Potential temperature
if _check_if_dask(dataset['T']):
@lpilz (Collaborator) commented, Nov 23, 2021:
Should this only work if dask is present?

I guess we should either

  • only allow xwrf usage with dask -> check somewhere earlier for dask (like in __init__ or open_dataset)
  • also allow xwrf usage without dask -> remove these checks

@jthielen (Collaborator, Author) replied:
This is really just a placeholder to capture the intent of needing a Dask check of some kind before diagnostics are computed. It would be a good conversation to have as to what level of Dask-based operations are required vs. optional, and if optional, how those choices are specified.

(Two resolved review comments on xwrf/diagnostics.py.)
@jthielen (Collaborator, Author) commented:

> Good work on the PR; I think this is a nice starting point. Did you get the Dask integration working yet? I tried it, and the calculation (especially of the air_pressure diagnostic) doesn't seem to be delayed. As a first try, I implemented _check_if_dask as hasattr(arr, 'chunks'), so the arrays being added should be Dask arrays.

Thanks! And no, Dask isn't working with this yet, since I'm not sure how to get Dask-backed arrays through the wrapped netcdf4 backend. Once I (or someone else) can figure out how best to route the chunks argument of open_dataset through the backend API, this should be closer to working.

@lpilz (Collaborator) commented Nov 23, 2021

> And no, Dask isn't working with this yet, since I'm not sure how to get Dask-backed arrays through the wrapped netcdf4 backend.

I thought so. I'm also not really sure how to do this, and I'm not getting much help from the dask and xarray documentation. Maybe @andersy005 can help?

@andersy005 (Member) commented Nov 23, 2021

> Dask isn't working with this yet, since I'm not sure how to get Dask-backed arrays through the wrapped netcdf4 backend.

chunks are handled by xarray. So, I'd expect passing chunks in open_dataset() to suffice:

In [5]: ds = xr.open_dataset(path, engine="xwrf", chunks={"south_north": 50})

In [6]: ds
Out[6]: 
<xarray.Dataset>
Dimensions:  (Time: 1, south_north: 546, west_east: 480)
Coordinates:
    XLONG    (south_north, west_east) float32 dask.array<chunksize=(50, 480), meta=np.ndarray>
    XLAT     (south_north, west_east) float32 dask.array<chunksize=(50, 480), meta=np.ndarray>
Dimensions without coordinates: Time, south_north, west_east
Data variables:
    Q2       (Time, south_north, west_east) float32 dask.array<chunksize=(1, 50, 480), meta=np.ndarray>
    PSFC     (Time, south_north, west_east) float32 dask.array<chunksize=(1, 50, 480), meta=np.ndarray>

P.S.: The existing backend plugin isn't robust enough (things such as time decoding aren't handled properly when using Dask)... I plan to look into this at some point. Also, notice how XLONG and XLAT are chunked coordinates... they should have been loaded into memory instead.
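
(A minimal way to force those two coordinates into memory after opening, as a sketch, while leaving the data variables chunked:)

```python
# Eagerly load the 2D lat/lon coordinate arrays; data variables stay lazy
ds = ds.assign_coords({name: ds[name].compute() for name in ("XLONG", "XLAT")})
```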

@jthielen (Collaborator, Author) commented:

> chunks are handled by xarray. So, I'd expect passing chunks in open_dataset() to suffice:
> ...
> P.S.: The existing backend plugin isn't robust enough (things such as time decoding aren't handled properly when using Dask)... I plan to look into this at some point. Also, notice how XLONG and XLAT are chunked coordinates... they should have been loaded into memory instead.

Alright! So if I'm interpreting what you're saying correctly, along with https://github.com/pydata/xarray/blob/main/xarray/backends/api.py#L335-L512, chunks is actually handled outside of (after) the backend API portion? If so, I think this means we can't have direct Dask operations within the backend; we would instead need to design custom backend arrays that play nicely with the Dask chunking xarray itself does, or re-evaluate the approach for derived quantities so that they live outside the backend. Perhaps the intake-esm approach could help in that regard, at least?

@andersy005 (Member) commented:

> If so, I think this means we can't have direct Dask operations within the backend; we would instead need to design custom backend arrays that play nicely with the Dask chunking xarray itself does, or re-evaluate the approach for derived quantities so that they live outside the backend. Perhaps the intake-esm approach could help in that regard, at least?

Wouldn't creating custom backend arrays be overkill? Assuming we want to support reading files via the Python netCDF4 library, we might be able to write a custom data store that borrows from xarray's NetCDF4DataStore: https://github.com/pydata/xarray/blob/5db40465955a30acd601d0c3d7ceaebe34d28d11/xarray/backends/netCDF4_.py#L291. With this custom data store, we would have more control over what to do with variables, dimensions, and attrs before passing them to xarray. Wouldn't this suffice for the data loading (without the derived quantities)?
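
For what it's worth, the usual shape of that pattern looks roughly like the following (a sketch under my own assumptions; the class name and the cleanup hook are hypothetical, not code from this PR):

```python
from xarray.backends import BackendEntrypoint, NetCDF4DataStore
from xarray.backends.store import StoreBackendEntrypoint


class WRFBackendEntrypoint(BackendEntrypoint):
    """Open a WRF file through xarray's own netCDF4 data store, then clean up."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        store = NetCDF4DataStore.open(filename_or_obj)
        ds = StoreBackendEntrypoint().open_dataset(
            store, drop_variables=drop_variables
        )
        # Hypothetical post-processing hook: fix attrs/metadata, decode WRF
        # times, attach projection coordinates, etc., before returning
        return ds
```

Since the heavy lifting stays in xarray's data store, chunking via the chunks argument to xr.open_dataset would still be applied by xarray after this returns.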

I think there's value in keeping the backend plugin simple (e.g. performing simple tasks such as decoding coordinates, fixing attributes/metadata, etc) and everything else outside the backend. Deriving quantities doesn't seem simple enough to warrant having this functionality during the data loading.

Some of the benefits of deriving quantities outside the backend are that this approach:

1. doesn't obfuscate what's going on;
2. gives users the opportunity to fix aspects of the dataset that xwrf might have missed during data loading, before passing the cleaned dataset to the quantity-deriving functionality;
3. removes the requirement for deriving quantities to be a lazy operation, i.e. if your dataset is in memory, the quantity is derived eagerly.

@andersy005 (Member) commented Nov 23, 2021

> Some of the benefits of deriving quantities outside the backend are that this approach:

Also, wouldn't it be beneficial for deriving quantities to be backend-agnostic? I'm imagining cases in which the data have been post-processed and saved in a different format (e.g. Zarr), and you still want to be able to use the same code for deriving quantities on the fly.

@lpilz (Collaborator) commented Nov 23, 2021

> Deriving quantities doesn't seem simple enough to warrant having this functionality during the data loading.

This sounds like it factors directly into the "keep the solutions as general as possible (so that maybe MPAS can profit from them too)" discussion. However, I feel that we also have to think about the user perspective. I don't have any set opinions on this, and we should definitely discuss it, maybe in a larger group too. Here are some thoughts so far:

I think the reason users like wrf-python is that it is an easy one-stop shop for getting WRF output to work with Python; this is especially true because many users are scientists rather than software engineers or programmers. I take from this that it would be prudent to keep the UX as easy as possible, and that is what the backend approach does really well: users just add the engine='xwrf' kwarg and it just works (TM), giving them CF-compliant, de-WRFified meteo data. Also, given that the de-WRFification of the variable data is not too difficult (it's basically just adding fields for three variables), I think the overhead in complexity wouldn't be too great. While I do see that it breaks the conceptual barrier between data loading (decoding, etc.) and computation, this breakage would be required in order to provide the user with meteo data rather than raw WRF fields.

@andersy005 do you already have some other ideas on how one could handle this elegantly?

Also, should we maybe move this discussion to a separate issue?

@jthielen (Collaborator, Author) commented:

@andersy005 @lpilz I think this API question (which xwrf features go in the backend and which, if any, go elsewhere) is worth discussing in its own issue. I'll refine the scope of this PR to just the coordinate operations (both because of that conversation and because the approach used here won't really work anyway, given how backends and Dask chunking interact).

@jthielen changed the title from "Draft implementation of salem-style coordinates and basic diagnostics" to "Draft implementation of salem-style coordinates", Nov 23, 2021
@kmpaul added this to "In progress" in Development, Dec 16, 2021
(Two resolved review comments on xwrf/grid.py.)
@lpilz (Collaborator) commented Jan 17, 2022

Something is still fishy with the CRS for me. If I transform like this:

trf = Transformer.from_crs(CRS.from_epsg(4326), CRS.from_string(str(ds['wrf_projection'].values)))
x, y = trf.transform(8.553972274305062, 49.509090717658204)

and then use ds.interp(west_east=x, south_north=y), I get N 49.38863641, E 8.79648642, which is quite a ways off from the starting point. Can anyone point me to what I'm doing wrong?

@lpilz (Collaborator) left a review:

Suggestion to add XGCM attributes to vertical coordinates

(Resolved review comment on xwrf/io_plugin.py.)
@jthielen (Collaborator, Author) commented:

I've pushed some updates to remove the Berlin-specific conditions, fix the missing-TRUELAT2 case, and add CF/COMODO attrs to the vertical coords!


> Something is still fishy with the CRS for me. If I transform like this:
>
>     trf = Transformer.from_crs(CRS.from_epsg(4326), CRS.from_string(str(ds['wrf_projection'].values)))
>     x, y = trf.transform(8.553972274305062, 49.509090717658204)
>
> and then use ds.interp(west_east=x, south_north=y), I get N 49.38863641, E 8.79648642, which is quite a ways off from the starting point. Can anyone point me to what I'm doing wrong?

I unfortunately don't quite have the right intuition to tell whether that's a datum shift error, as described in https://fabienmaussion.info/2018/01/06/wrf-projection/, or whether something else is going on. What dataset are you using? Also, @fmaussion, do you have any insight on this?

(Resolved review comment on xwrf/grid.py.)
@jthielen jthielen mentioned this pull request Jan 19, 2022
@lpilz (Collaborator) commented Jan 20, 2022

I attached the dataset in question (had to rename it to .zip, but it's still a netCDF file).
wrfout_d03_2018-10-18_00:00:00_only_1time.zip

I don't think this is the problem described in the blog post, because we do explicitly add the spherical datum (a and b) to the Lambert projection, but maybe Fabien has some more insights.


Also, here is the code:

import xarray as xr
import xwrf
from pyproj import CRS, Transformer

ds = xr.open_dataset("wrfout_d03_2018-10-18_00:00:00_only_1time.nc", chunks='auto', engine=xwrf.WRFBackendEntrypoint)
trf = Transformer.from_crs(CRS.from_epsg(4326), CRS.from_cf(ds['wrf_projection'].attrs))
x, y = trf.transform(8.553972274305062, 49.509090717658204)
var_in_question = ds.T.isel(bottom_top=0).interp(west_east=x, south_north=y)
print(var_in_question)
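
(One pyproj detail worth ruling out here, as an editorial aside rather than a confirmed diagnosis from the thread: Transformer.from_crs respects each CRS's authority-defined axis order by default, and EPSG:4326 is defined as (lat, lon), so passing (lon, lat) pairs without always_xy=True silently swaps the inputs.)

```python
from pyproj import CRS, Transformer

# always_xy=True forces (lon, lat) -> (x, y) ordering on both ends,
# regardless of the axis order the authority defines for either CRS
trf = Transformer.from_crs(
    CRS.from_epsg(4326),
    CRS.from_cf(ds['wrf_projection'].attrs),
    always_xy=True,
)
x, y = trf.transform(8.553972274305062, 49.509090717658204)  # lon, lat
```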

@fmaussion commented:

> I attached the dataset in question (had to rename it to .zip, but it's still a netCDF file).

Can Salem read and plot this properly?

@fmaussion commented:

This looks OK in salem to me: https://gist.github.com/fmaussion/97beef70976231a58c914819966643f5

With a great plot, haha! [attached: plot image]

@lpilz (Collaborator) commented Feb 2, 2022

@jthielen Would you mind merging main into your branch? I'm already using your code, so having it updated would be great :)

@lpilz (Collaborator) commented Feb 2, 2022

Maybe we should add some tests of xwrf against salem... I still could not get it to work myself (I've recently had trouble with metpy.plots), so asserting that it works would be useful.
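
Such a comparison test might look something like this (a sketch; it assumes salem's open_wrf_dataset produces projection coordinates under the same south_north/west_east names, and the fixture path is hypothetical):

```python
import numpy as np
import salem
import xarray as xr


def test_coords_against_salem(sample_wrfout_path):
    # sample_wrfout_path would be a pytest fixture pointing at a wrfout file
    ds_xwrf = xr.open_dataset(sample_wrfout_path, engine="xwrf")
    ds_salem = salem.open_wrf_dataset(sample_wrfout_path)
    # The projection x/y coordinate values should agree between the libraries
    np.testing.assert_allclose(
        ds_xwrf["south_north"], ds_salem["south_north"], rtol=1e-6
    )
    np.testing.assert_allclose(
        ds_xwrf["west_east"], ds_salem["west_east"], rtol=1e-6
    )
```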

@jthielen changed the title from "Draft implementation of salem-style coordinates" to "Implementation of salem-style x, y, and z coordinates", Feb 11, 2022
@jthielen jthielen marked this pull request as ready for review February 11, 2022 01:54
@jthielen force-pushed the salem-style-coords branch 2 times, most recently from 36ce734 to 3223827, February 11, 2022 02:00
@jthielen (Collaborator, Author) commented Feb 11, 2022

This PR has been reworked to be based on #44 (so it should not be merged until I can rebase on main once #44 is merged), but it is otherwise ready for review! Given the discussion in #36, I've included renaming south_north -> y (and the others likewise), but let me know whether or not I should keep that here.

@jthielen (Collaborator, Author) commented:

Now that #44 has been merged, this has been updated to just include the relevant commit!

@andersy005 (Member) left a review:

Looks good to me! 👍🏽

The code coverage for postprocess.py is relatively low, but we can address this in a separate PR.

Development automation moved this from In progress to Reviewer approved Feb 14, 2022
@andersy005 (Member) commented:

> Also, this implements the c_grid_axis_shift attr as needed, so xgcm should be able to interpret our coords automatically, eliminating the need for direct handling (like #5) in xwrf.

Also, should we add a test for xgcm interoperability to ensure xgcm is able to construct the grid object?
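
Such a test might look roughly like this (a sketch; the fixture name and sample file are assumptions, and the detected axes depend on the attrs this PR sets):

```python
import xarray as xr
import xgcm


def test_xgcm_grid_construction(sample_wrfout_path):
    # sample_wrfout_path is a hypothetical pytest fixture pointing at a wrfout file
    ds = xr.open_dataset(sample_wrfout_path, engine="xwrf")
    grid = xgcm.Grid(ds, periodic=False)
    # The axis/c_grid_axis_shift attrs should let xgcm find all three axes
    assert {"X", "Y", "Z"} <= set(grid.axes)
```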

@lpilz (Collaborator) commented Feb 15, 2022

Unfortunately, with the new changes, metpy.parse_cf() breaks for me, as it seems to try to cast XTIME to meters. I am getting a DimensionalityError:

var = var.metpy.convert_coordinate_units(coord_name, 'meters')
DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'meter' ([length])

I tried adding some units attributes and setting the coordinates kwarg of metpy.parse_cf(), but nothing helped. Does anybody have a hint as to what is happening here?

@kmpaul (Contributor) commented Feb 15, 2022

> Unfortunately, with the new changes, metpy.parse_cf() breaks for me, as it seems to try to cast XTIME to meters. I am getting a DimensionalityError:
>
>     var = var.metpy.convert_coordinate_units(coord_name, 'meters')
>     DimensionalityError: Cannot convert from 'dimensionless' (dimensionless) to 'meter' ([length])
>
> I tried adding some units attributes and setting the coordinates kwarg of metpy.parse_cf(), but nothing helped. Does anybody have a hint as to what is happening here?

@lpilz: This sounds like a good test to add.

@jthielen (Collaborator, Author) commented:

@lpilz Thanks for the heads up on that! Unidata/MetPy#2347 has been created upstream in MetPy to resolve it.

I've added a test against xgcm, and I can definitely do likewise against MetPy (and cf_xarray?) next for coordinate identification (it'll probably have to be version-bounded given Unidata/MetPy#2347, but oh well).

@lpilz (Collaborator) commented Feb 16, 2022

Thanks for always fixing these so quickly. Awesome work @jthielen!

While I agree in principle that we should test this against as many utilities that might be used in conjunction with xwrf as possible, maybe that can be done in future PRs. We could, for example, do it in the documentation phase when writing tutorials and such... I think this PR is quite nice already and doesn't have to be the MOAPRs ;)

@kmpaul (Contributor) commented Feb 16, 2022

@lpilz @jthielen: I totally agree that this can be done in future PRs. I'd just make a note of it in a new issue and then move forward. The fewer PRs we have blocking other work, the better.

Awesome work on this!

@kmpaul (Contributor) left a review:

Looks great @jthielen! Thanks for all the great work!

@kmpaul (Contributor) commented Feb 16, 2022

Merge whenever ready.

@jthielen jthielen merged commit 890e63d into xarray-contrib:main Feb 16, 2022
Development automation moved this from Reviewer approved to Done Feb 16, 2022
@jthielen (Collaborator, Author) commented:

With 3 approvals, I went ahead and merged it after resolving the merge conflicts with main. #49 has been created to keep track of the testing-related concerns brought up here!

Labels: enhancement (New feature or request)
Projects: Development

Successfully merging this pull request may close these issues:

  • Coordinate UX
  • Add to_xgcm_grid_dataset function to help read into XGCM

5 participants