-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Avoid copying vectorized indexes #10316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. |
Very nice! I have two minor requests:
Can you also add a note in |
OK, @dcherian I believe this is RFAL. The test failures seem to be because dask released a new version yesterday -- for the ones I spot checked, I can reproduce them locally with |
Thanks for tracking this down @jder! I know this is not an easy part of the codebase to edit :) |
@dcherian Thanks for the quick review, feedback, and landing it! |
- Lazily indexed arrays now use less memory to store keys by avoiding copies | ||
in :py:class:`~xarray.indexing.VectorizedIndexer` and :py:class:`~xarray.indexing.OuterIndexer` | ||
(:issue:`10316`). | ||
By `Jesse Rusak <https://github.com/jder>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
While investigating #10311, I noticed that the large memory usage described there was exacerbated by duplicating the passed indexes in
VectorizedIndexer.__init__
. While this could be a defensive copy to avoid someone else mutating the passed ndarrays, looking at the PR it was introduced I would guess this was an unintentional regression as part of supporting duck arrays, sinceasarray
defaults to not copying the data andastype
defaults to copying the data, but cc @dcherian in case that's not true.Also included in this PR is increasing the size of the benchmark to show the improvement more clearly. (On my machine, it's a ~20% time savings.) Happy to break this out or remove it if you'd prefer to leave the benchmarks as is.