Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse_cf loads all data into memory when grid_mapping is missing since .metpy.units checks .data #1748

Closed
jthielen opened this issue Mar 5, 2021 · 5 comments · Fixed by #1832
Labels
Area: Xarray Pertains to xarray integration Type: Bug Something is not working like it should
Milestone

Comments

@jthielen
Copy link
Collaborator

jthielen commented Mar 5, 2021

In the process of troubleshooting #1705, I followed the code to discover that for all variables without a grid_mapping, MetPy will inadvertently load that variable's data into memory when performing the lat/lon fallback check (since check_axis looks for units, and units checks the type of var.data to see if it is a Quantity, and var.data when a lazy-loaded array ends up loading the data).

While the more direct fix may just be removing the check_axis conditional (which should no longer even be needed after #1651 ...I guess I should have cleared it out then), that wouldn't change the root problem that looking up .metpy.units on a DataArray will load lazy data into memory just because of the type check. Unless there are other suggestions on how to handle this, I would propose

  • For 1.0.1, "cheating" and using xarray private API to check against .variable._data instead of .data so that loading isn't triggered
  • Propose an upstream change to have a way in the public API to get the actual data object xarray is wrapping even if it is an internal lazy-loader so that our type checks don't have side effects.
@jthielen jthielen added Type: Bug Something is not working like it should Area: Xarray Pertains to xarray integration labels Mar 5, 2021
@jthielen jthielen added this to the 1.0.1 milestone Mar 5, 2021
@kgoebber
Copy link
Collaborator

kgoebber commented Mar 8, 2021

I'll just confirm that I have run into this issue as well and it took me a while to realize why some loading of remote datasets was failing!

It would be good to have the "cheating" quick fix for now as we work to a more permanent solution. It will help people not have code fail for something it would be very difficult for them to actually know why their code was failing.

@dopplershift
Copy link
Member

👍 to getting something in for this release. How low risk is it to use the private API?

@jthielen
Copy link
Collaborator Author

jthielen commented Mar 9, 2021

How low risk is it to use the private API?

._data on Variable has been quite stable and I don't see any reason why they would change it, but then again, we've been broken before for relying on xarray's private API.

@dopplershift
Copy link
Member

If it's been around awhile, then I guess using that is the best path forward in the interim--we test against git master for a reason. We should then progress on getting an upstream method so we can move to that.

@dopplershift
Copy link
Member

#1835 is the tracker issue for a more permanent fix that uses a public API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Xarray Pertains to xarray integration Type: Bug Something is not working like it should
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants