Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve subsample across Coreg subclasses and pipelines #436

Merged
merged 15 commits into from
Sep 26, 2023

Conversation

rhugonnet
Copy link
Contributor

@rhugonnet rhugonnet commented Sep 6, 2023

This PR makes the subsample argument consistent across all Coreg subclasses, and allows them to function in pipelines independently for each step. It also reworks Coreg.fit() to be more durable for future chunking routines.

All details in the full discussion here: #428
Only difference with the discussion: Now passing directly inlier_mask to each subclass' _fit_func, so that the subsampling mask can be computed on the final mask of valid pixels within the subclass (valid data changes in the subclass because of NaNs in auxiliary data not yet available to Coreg.fit(): slope, curvature, etc...). This way we will always have the exact number of valid points asked when subsampling 😄!
To perform the subsampling consistently on the final valid mask, each Coreg subclass calls the method get_subsample_on_valid_mask.

Tests significantly improved.

As now each class has a default subsample argument during instantiation, methods that currently rely on optimization without binning (NuthKaab, Deramping)* have 5e5 to avoid very long computing time, otherwise 1 for all others.

Also a small fix to geoutils.raster.subsampling was necessary: GlacioHack/geoutils#402.

Resolves #428
Resolves #243
Resolves #137

@rhugonnet rhugonnet marked this pull request as draft September 6, 2023 23:21
@rhugonnet rhugonnet marked this pull request as ready for review September 23, 2023 05:18
Copy link
Contributor

@erikmannerfelt erikmannerfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this and I love you

Just minor comments!

@@ -52,4 +52,4 @@ dependencies:
- noisyopt

# To run CI against latest GeoUtils
# - git+https://github.com/GlacioHack/GeoUtils.git
- git+https://github.com/GlacioHack/geoutils.git
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't there a reason we left pulling from github? I think if we make a breaking change on geoutils, xdem will fail on new PRs without it being the "PR's fault". Any way around this? Otherwise, please make an issue for it so we can revert this asap!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should only be used to test that CI passes temporarily on the main branch for devs, we don't want to make a release with it! Should we add a reminder in HOW_TO_RELEASE?

Now that GeoUtils 0.0.15 is published, I can revert it back!

tests/test_coreg/test_affine.py Show resolved Hide resolved
tests/test_coreg/test_affine.py Outdated Show resolved Hide resolved
tests/test_coreg/test_affine.py Outdated Show resolved Hide resolved
# Check that the estimated biases are similar
assert coreg_sub._meta["coefficients"] == pytest.approx(coreg_full._meta["coefficients"], rel=1e-1)

def test_subsample__pipeline(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double underscore is a typo, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No haha it's for subdividing tests more clearly! Many packages do it, and I liked it a lot. I thought it could improve the clarity/organization of our tests that were getting a bit messy.
I think we talked about this in the biascorr PR a bit already! Maybe we should establish guidelines for our tests?

tests/test_coreg/test_base.py Show resolved Hide resolved
tests/test_coreg/test_biascorr.py Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
tests/test_coreg/test_base.py Show resolved Hide resolved
xdem/coreg/base.py Show resolved Hide resolved
@rhugonnet
Copy link
Contributor Author

I love this and I love you

Just minor comments!

Happy you like it!! Thanks a lot for the quick review 😊
Next will come the re-structuration to have a single fit() function for all inputs + be able to pass any optimizer/binning also for Affine classes + some of the rest mentioned in #435.

@rhugonnet
Copy link
Contributor Author

All accounted for, merging and focusing on the next steps! 😄

@rhugonnet rhugonnet merged commit ead2f1b into GlacioHack:main Sep 26, 2023
12 of 13 checks passed
@rhugonnet rhugonnet deleted the subsample_pipeline branch September 26, 2023 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants