Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy mask_fillvalues preprocessor function #2340

Merged
merged 13 commits into from
Jun 4, 2024
Merged

Conversation

bouweandela
Copy link
Member

@bouweandela bouweandela commented Feb 22, 2024

Description

Make the mask_fillvalues preprocessor function lazy.

Note that this changes how provenance is recorded: now all cubes/datasets that are used as input to the function are considered ancestors of the output cubes/datasets. Previously, input cubes/datasets were only considered ancestors if they contributed to the shared mask. However, for lazy computations, this requires loading all the input data from disk and doing the computations before mask_fillvalues twice: once for computing the provenance and once for computing the result. This seemed too much of a computational burden for a feature that is very little used and therefore the choice was made to have less fine-grained provenance.

Needs Iris v3.9 (SciTools/iris#5775)


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the πŸ›  Technical or πŸ§ͺ Scientific review.


To help with the number pull requests:

Copy link

codecov bot commented Feb 22, 2024

Codecov Report

All modified and coverable lines are covered by tests βœ…

Project coverage is 94.61%. Comparing base (d37a502) to head (119fefb).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2340      +/-   ##
==========================================
+ Coverage   94.52%   94.61%   +0.09%     
==========================================
  Files         246      246              
  Lines       14036    14032       -4     
==========================================
+ Hits        13267    13276       +9     
+ Misses        769      756      -13     

β˜” View full report in Codecov by Sentry.
πŸ“’ Have feedback on the report? Share it here.

@bouweandela bouweandela mentioned this pull request Feb 26, 2024
62 tasks
@bouweandela bouweandela added preprocessor Related to the preprocessor dask related to improvements using Dask labels May 26, 2024
@bouweandela
Copy link
Member Author

The tests pass, but the tests are not very comprehensive. Comparing data coming out diagnostic_1/ta of examples/recipe_preprocessor_test.yml with the comparison tool suggests results are different.

@bouweandela
Copy link
Member Author

bouweandela commented May 30, 2024

Comparing data coming out diagnostic_1/ta of examples/recipe_preprocessor_test.yml with the comparison tool suggests results are different.

This has now been solved: da.ones_like(a) with a a masked dask array drops the mask, while np.ones_like(a) with a a masked numpy array keeps the mask.

@bouweandela bouweandela marked this pull request as ready for review May 30, 2024 18:31
@bouweandela bouweandela requested a review from schlunma May 30, 2024 18:33
@schlunma schlunma modified the milestone: v2.12.0 Jun 3, 2024
Copy link
Contributor

@schlunma schlunma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Bouwe, looks very good to me already! Just have some minor comments.

esmvalcore/preprocessor/_mask.py Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@schlunma
Copy link
Contributor

schlunma commented Jun 4, 2024

Thanks Bouwe! Will approve once the merge conflict is solved and all tests are green πŸš€

@bouweandela
Copy link
Member Author

Thanks for reviewing @schlunma! Should be good to go now.

Copy link
Contributor

@schlunma schlunma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸš€

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work folks - @bouweandela you want this in the 2.11 release or you think it can wait for later on after the release? If it can wait, pls add M2.12 to it 🍺

@valeriupredoi valeriupredoi merged commit af5490f into main Jun 4, 2024
6 checks passed
@valeriupredoi valeriupredoi deleted the lazy-mask-fillvalues branch June 4, 2024 14:37
@bouweandela
Copy link
Member Author

Thanks V! This is for the v2.12 release.

@valeriupredoi
Copy link
Contributor

good man 🍺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dask related to improvements using Dask preprocessor Related to the preprocessor
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants