Add a grow parameter to sigma_clip et al. #10613
Gah, this dratted |
I think you can just move the scipy import to be a local one. astropy/astropy/stats/tests/test_funcs.py Lines 8 to 13 in d65dabb
astropy/astropy/stats/tests/test_funcs.py Line 142 in d65dabb
|
How keen are people for Alternatively, the (I suppose the last resort would be to add |
Thanks, @saimn, I always forget that. Do we need that comment |
The way it is done currently in astropy is to move the scipy import locally where it is needed, so people will need scipy only if they use a feature requiring it, and get an About the comment, just |
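A minimal sketch of the local-import pattern described here, assuming a hypothetical helper `grow_mask` (not a function from this PR) that needs `scipy.ndimage.binary_dilation`. Because the import sits inside the function body, the enclosing module can be imported without scipy installed, and an `ImportError` only surfaces when the scipy-dependent feature is actually used.

```python
def grow_mask(mask):
    """Dilate a boolean mask by one pixel (hypothetical helper, for illustration)."""
    # Local import: scipy is only required when this function is called,
    # not when the enclosing module is imported.
    from scipy.ndimage import binary_dilation

    return binary_dilation(mask)
```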
OK, so let's just do |
Having the fast cython version in astropy would be even more interesting, but I'm not sure how |
Could also add a |
I agree that it would be good to think about combining these 2 implementations of sigma clipping (or at least have coherent functionality in one place) at some point (not in the scope of what I'm currently doing).
Any reason why we can't call |
One thing that bothers me slightly here is whether we'll always want to "grow" bad pixels along the same axes that sigma is calculated over. Obviously it makes sense not to grow along a model set axis, because the models could be independent, but eg. if you have a 3D cube from an IFU, pixels will only be neighbours on the detector in up to 2 of the 3 dimensions, even though you're measuring sigma over all 3 (though OTOH if you've already interpolated you've probably spread the defect in 3D). Also, you can have a model set consisting of adjacent image rows, in which case it might make sense to grow in both dimensions anyway. So perhaps there's an argument for an additional parameter (pun not intended), but I think this is good baseline behaviour and does everything IRAF does. |
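One way to picture the distinction raised here is to build the neighbour offsets for growing independently of the axes used for the statistics. The sketch below is dependency-free and purely illustrative: `grow_offsets` and its `grow_axes` argument are hypothetical names, and the Euclidean-radius rule is an assumption, not something taken from the PR.

```python
from itertools import product


def grow_offsets(ndim, radius, grow_axes):
    """Return index offsets within `radius`, non-zero only along `grow_axes`."""
    reach = int(radius)
    # Axes that are not grown along contribute only a zero offset.
    ranges = [
        range(-reach, reach + 1) if ax in grow_axes else (0,)
        for ax in range(ndim)
    ]
    return [
        offset
        for offset in product(*ranges)
        if sum(step * step for step in offset) <= radius * radius
    ]


# 3D cube, growing only within the last two (detector) axes.
offsets = grow_offsets(ndim=3, radius=1, grow_axes=(1, 2))
```

In this example every offset has a zero first component, so a flagged pixel would only spread to its neighbours within the same plane of the cube.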
Because it's implemented with a C extension, not Cython (I had a look just before ;)). So linking to it is more difficult, it would be easier if they provide a cython wrapper.
The kernel could be passed to the cython routine, but what I meant is that to support the grow case it would be hard to do this with a generic ND function. Without grow it would be easier, since the ND case, with or without axis, can be seen as a set of 1D cases. |
Thanks again, @jehturner. As @saimn noted above, the only required dependency of One other quick note -- new keywords need to be added at the end of the function signature to not break the API (e.g., in case keywords are used as positional arguments). On the other hand, we've discussed using the |
OK. This seemed like the most logically-consistent place for the new keyword, but it's really not a big deal here. On the other hand, if this is policy, it might be a good idea to consider doing some API clean up for every major release, @pllim? I'm not sure how one would keep track of that though. |
Actually, it does get a bit messier in astropy/astropy/stats/sigma_clipping.py Lines 515 to 517 in f47f629
? |
Anyway, I have that change locally if you want it. When the other PR is merged I'll rebase from it. |
The test file already has a block like this with |
This complicates the existing |
Force-pushed from f47f629 to db96058.
Oh, right, need to remove the default function name from the regexes because of the optional bottleneck thing. Will have to do that next week as I must do something else urgently. Not sure off hand why readthedocs has started complaining about axis... |
So we expect |
I've got rid of the regex for str/repr again and instead matched the substrings corresponding to This raises a minor question though... I see that I'm still not sure about the readthedocs failure, but presumably it's the warnings about |
Single backticks create sphinx references, so you need double backticks for the axis occurrences.
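To illustrate the point about backticks, here is a sketch of a numpydoc-style docstring on a hypothetical function (`documented_example` is an invented name, and the parameter wording is only an example). The parameter name is wrapped in double backticks so that it renders as literal text rather than being treated as a reference.

```python
def documented_example(data, axis=None, grow=False):
    """
    Hypothetical function, used only to show the docstring markup.

    Parameters
    ----------
    grow : float, optional
        Radius within which to mask neighbouring pixels (only applied
        along ``axis``, if specified).
    """
    return data
```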
Does this look ready to merge? It would be helpful to have it in |
Any final thoughts on this, @larrybradley? It would help our development significantly if we could get it into Do we definitely want to keep the new parameter at the end of the function signature, as you had requested, even though it splits up the parameters that control the clipping (bearing in mind that this won't go in a bug fix release)? Thanks! |
@jehturner I agree that it's much nicer to group common keywords together, but unfortunately reordering keywords breaks backwards compatibility. I'd be in favor of adding the To do that, add a I'll review the rest of the PR. |
@larrybradley - Some major packages started to use kwarg-only arguments even though doing so introduced an API break. For astropy I think it makes sense to wait with such a change until the next LTS release, otherwise it may lead to confusion. And I don't think the benefits outweigh the harm for a kwarg reshuffle. We suffered this same problem with astroquery a lot, but had to stick to the no-API-break rule, and thus we gradually started to use kwarg-only arguments for new functionality, and will enforce more for the next bigger release (but keep in mind, astroquery has a linear release model without an LTS series). So, long story short, while the reshuffle makes sense, I don't think it's an appropriate reason for an API break. I would wait with kwarg-only arguments for existing functionality, but push for a full refactor for v5.0. |
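For readers unfamiliar with the keyword-only arguments mentioned here: a bare `*` in a Python signature forces every parameter after it to be passed by name, so those parameters can later be reordered without breaking callers. The function below is a hypothetical stand-in, not the astropy signature.

```python
def clip_example(data, sigma=3.0, maxiters=5, *, grow=False):
    """Hypothetical signature: `grow` can only be passed by keyword."""
    return {"sigma": sigma, "maxiters": maxiters, "grow": grow}


# Accepted: the new parameter is given by name.
result = clip_example([1, 2, 3], grow=1.5)

# Rejected: a fourth positional argument raises TypeError.
try:
    clip_example([1, 2, 3], 3.0, 5, 1.5)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```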
Thanks, @jehturner! BTW, don't think the CI tests are going to pass until you rebase this PR on the current master. I needed to rebase to run the tests locally.
Yeah, will do in a min. Just putting children in bed in between changing stuff... |
Does |
… re.match to match the substrings containing a median/std repr (so it doesn't only work with bottleneck installed or depend on the CPython __str__ implementation).
…of solving readthedocs failure. Co-authored-by: Simon Conseil <contact@saimon.org>
for how to interpret the grow parameter. Co-authored-by: Larry Bradley <larry.bradley@gmail.com>
Co-authored-by: Larry Bradley <larry.bradley@gmail.com>
Force-pushed from ac0a10b to adcf7ab.
As long as you put it somewhere in the commit message (doesn't matter which interface creates the commit message), CI will honor the directive. |
Thanks, @jehturner! |
👍 Thanks for finding some time to finish looking at it. |
Very sorry for the delay. |
No worries. |
Description

Adds a `grow` parameter to `SigmaClip`, `sigma_clip` & `sigma_clipped_stats`, to expand the masking of each deviant value to its neighbours within a specified pixel radius. This is a common feature of bad pixel rejection algorithms, notably in IRAF, and will also be useful for `FittingWithOutlierRejection` in `modeling`.

This will need merging with my PR #10610 from yesterday.

I've used `grow=False` as the default value, which seems clear to me, but `0.` would also work (and is fairly clear) if people prefer a fixed type (but that's not the case elsewhere).

This could perhaps be optimized slightly by caching the growth kernel calculated in `SigmaClip.__call__` -- but since it depends on the dimensionality of the input data and the value of `axis`, it would need storing in a dict by `axis` value or something. I would think the gain would be very marginal for the array sizes we typically deal with in astronomy and existing uses without `grow` will be unaffected either way, so I'm inclined to consider this "premature optimization" until demonstrated otherwise.
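
As a rough illustration of the behaviour described above, the following dependency-free sketch grows a 2D boolean mask to all pixels within a given radius of each flagged pixel, and memoizes the neighbour offsets with `functools.lru_cache` to show what caching the kernel might look like. The names `kernel_offsets` and `grow_mask` are hypothetical, the Euclidean-radius rule is an assumption, and this is not the PR's implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def kernel_offsets(radius):
    """Offsets (dy, dx) of all pixels within `radius` of the centre (cached)."""
    reach = int(radius)
    return tuple(
        (dy, dx)
        for dy in range(-reach, reach + 1)
        for dx in range(-reach, reach + 1)
        if dy * dy + dx * dx <= radius * radius
    )


def grow_mask(mask, radius):
    """Return a copy of a 2D boolean mask grown by `radius` pixels."""
    nrows, ncols = len(mask), len(mask[0])
    grown = [list(row) for row in mask]
    for y in range(nrows):
        for x in range(ncols):
            if not mask[y][x]:
                continue
            # Flag every in-bounds neighbour within the radius.
            for dy, dx in kernel_offsets(radius):
                yy, xx = y + dy, x + dx
                if 0 <= yy < nrows and 0 <= xx < ncols:
                    grown[yy][xx] = True
    return grown


mask = [[False] * 5 for _ in range(5)]
mask[2][2] = True  # one deviant pixel in the centre

cross = grow_mask(mask, 1)     # centre plus its 4 nearest neighbours
square = grow_mask(mask, 1.5)  # also includes the 4 diagonal neighbours
```

With a 2D-only kernel the cache key is just the radius; in the N-dimensional case discussed above, the key would also need the dimensionality of the data and the value of `axis`.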