Added in snr_thresholding with 1D and 3D tests #509
Looks promising! A few high-level thoughts/questions, though:
- Since this basically worked out of the box, perhaps it's worth trying whether it works just as easily with SpectralCollection? You could use the exact same test but have spectral_axis be (4,3,10) (i.e., replicate the existing one 12 times and reshape), and then just test whether they all match the existing case. If it doesn't work, this PR probably doesn't need to worry about fixing it, but if it does work, all the better.
- I'm not sure I understand what makes this higherlevel? I think it's not that different from SNR, which is in analysis. (Or it might be "manipulation", depending on the response to my next question.)
- This function essentially returns a mask. I had been thinking it's more useful to return a shallow copy of the input, with the mask set to what this function currently returns. Then if the user really just wants the mask, they can do result = snr_threshold(inspec, val).mask, but if they want a "fully functional" spectrum with the mask already in it, they can take the return value directly. How does that sound?
Branch updated from 2d7e46c to 536f1d7.
I added I put the two
In some out-of-band discussion with @brechmos-stsci, @camipacifici, and @nmearl, we realized that At the same time, there might still be an application for the NaN-masked versions, most notably that that might be the easiest way to make matplotlib plots of cubes and the like. But there are still some questions there too, so I'll make an issue for that. In this PR, though, I think the decision was that
@brechmos-stsci - I was just looking at the diff here and it seems like there's some template matching stuff in here now. Was there perhaps an unintentional merge or something?
@eteq Argh, let me check.
Ok. So remove
One minor implementation question, and also some doc items in brechmos-stsci#3.
One other thing I realized, though: I'm not sure the sign of the mask is right. Shouldn't a pixel be masked if its SNR is less than the threshold? I.e., the "good" ones are the pixels that are above the threshold?
Moreover, I imagine there are use cases for both. Perhaps that implies this should be slightly generalized:
1. Add a keyword that sets the type of comparison, e.g. <, >, >=, <=, ==, or !=. These would all be interpreted as "set the mask to that", meaning "get rid of pixels that meet the condition".
2. Add a keyword that's just "lessthan", which is either True or False, to limit the scope here to just those cases.
3. Split this into two separate functions, one called snr_greater_than and one called snr_less_than. (They could use the existing function as a private shared implementation, though, and just invert the mask at the end or similar.)
I tend to favor 3, because it's explicit and fairly readable, and I don't really see a strong need for anything other than the less than and greater than use cases. (Or more to the point, we should tell people to just set the mask on their own for anything more complicated.)
I would rather do 1. We should be able to pass an operator or something, and it should be clear enough to people how to use it (if they want to do, for example, less-than).
Given the two options, I agree with @brechmos, although I do not really see the use for all those cases. I presume somebody else could, and giving the option is a plus. Also, I would use "keep all the pixels that meet the condition" rather than "get rid of pixels that meet the condition". If we give all the options, this has to be super clear in the documentation. Otherwise, we stick to
Gotcha @camipacifici and @brechmos - I'm ok with 1, and also with using the "keep" (vs. "get rid of") convention. re:
Maybe we adjust this to be a "lite" version of 1 where we only implement
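Option 1 with the "keep" convention might look like the following. The op keyword, the accepted strings, and the _OPS table are assumptions for illustration, not the merged API. The comparisons come from Python's standard operator module, so a caller can also pass any two-argument callable such as operator.le.

```python
import operator
from typing import Callable, List, Union

# Hypothetical mapping from comparison strings to standard-library functions.
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def snr_threshold(flux: List[float], uncertainty: List[float], value: float,
                  op: Union[str, Callable[[float, float], bool]] = ">") -> List[bool]:
    """Return a mask under the "keep" convention.

    Pixels whose S/N satisfies op(snr, value) are kept (mask False);
    all other pixels are masked (True).
    """
    if isinstance(op, str):
        if op not in _OPS:
            raise ValueError(f"unsupported comparison {op!r}; expected one of {sorted(_OPS)}")
        compare = _OPS[op]
    else:
        compare = op  # any callable taking (snr, value) and returning a bool
    return [not compare(f / u, value) for f, u in zip(flux, uncertainty)]
```

Negating the comparison at the end is what turns "keep these pixels" into a mask where True means "masked out".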
@eteq See 6a6fe1e for how I am thinking it could be implemented. (@camipacifici too)
I was thinking about this a little further and the third
Lurking on this discussion, I'm starting to feel like even having an snr_threshold() function might not be a great idea. If this is basically a one-liner for most use cases, would it be better to give people the one-liner in documentation and tutorials rather than having a function that has to take a bunch of options? All of that is because we packaged the mask together with the spectrum and uncertainties (for convenience) and are trying to hide that level of complexity from the user. When hiding the complexity introduces more complexity, maybe we should just not try so hard to hide it. The line that does all the work in snr_threshold() is
But then there are lots and lots of tests, and there's extensive documentation. So I'm worried this is overkill for this particular operation.
@brechmos-stsci - I like the
@hcferguson - I can see your point here, but I'm concerned about some of the complexities that came out in #516. That is, I think the "simplest" version of this function as a documentation example would be this:
so we could have that be in the docs instead, but that has two disadvantages: 1) it's three lines of code instead of one, and 2) that way of computing SNR is wrong if the uncertainty is not StdDev (see #523), so by not wrapping it in a function, people will start using it in their science code in a forward-incompatible way. That said, a third way presents itself now that I think about it: we could change this function to be a method on
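To illustrate why a plain flux/uncertainty ratio is only right for standard deviations, here is a sketch that converts to a 1-sigma value first. The string tags mirror the uncertainty_type values of astropy's StdDevUncertainty ("std"), VarianceUncertainty ("var") and InverseVariance ("ivar"), but snr_from_uncertainty is a hypothetical function working on plain lists, not part of specutils.

```python
import math
from typing import List


def snr_from_uncertainty(flux: List[float], uncertainty: List[float],
                         uncertainty_type: str) -> List[float]:
    """Per-pixel S/N, converting the uncertainty to a 1-sigma value first."""
    if uncertainty_type == "std":
        sigma = list(uncertainty)                             # already 1-sigma
    elif uncertainty_type == "var":
        sigma = [math.sqrt(v) for v in uncertainty]           # sigma = sqrt(variance)
    elif uncertainty_type == "ivar":
        sigma = [1.0 / math.sqrt(w) for w in uncertainty]     # sigma = 1 / sqrt(inverse variance)
    else:
        raise ValueError(f"unknown uncertainty type {uncertainty_type!r}")
    return [f / s for f, s in zip(flux, sigma)]
```

The same data expressed as a standard deviation, a variance, or an inverse variance should give identical S/N values; dividing by the raw variance or inverse variance would not.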
I am totally with @hcferguson on this one: this PR is becoming a bit over-engineered in trying to encapsulate all use cases inside a single function call. But the solution @eteq mentioned above, I think, is excellent. It lets the user have much more freedom (the current implementation in this PR doesn't handle the case of a chain of operations, e.g.
@eteq Your suggestion sounds interesting, but I'm not sure I completely understand it. Is I like this, but it involves a deep copy. I could see an advantage in also having a set_mask() method. For example, to add an SNR threshold on top of some existing mask without copying data: But once you have a set_mask method, then you might not really need the
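The set_mask() idea could be sketched as an in-place update that ORs a new mask into the existing one, so neither the flux nor the uncertainty is copied. ToySpectrum and set_mask are hypothetical here (this is not the specutils class), and True marks a masked pixel.

```python
from typing import List, Optional


class ToySpectrum:
    """Hypothetical mutable stand-in for a spectrum; not the specutils class."""

    def __init__(self, flux: List[float], uncertainty: List[float],
                 mask: Optional[List[bool]] = None):
        self.flux = flux
        self.uncertainty = uncertainty
        self.mask = mask if mask is not None else [False] * len(flux)

    def set_mask(self, new_mask: List[bool]) -> None:
        """OR `new_mask` into the existing mask in place (True = masked)."""
        if len(new_mask) != len(self.mask):
            raise ValueError("new_mask must match the length of the existing mask")
        for i, flag in enumerate(new_mask):
            self.mask[i] = self.mask[i] or flag


# Usage: add an SNR cut on top of a pre-existing mask.
spec = ToySpectrum([10.0, 2.0, 6.0], [1.0, 1.0, 2.0], mask=[True, False, False])
low_snr = [f / u <= 2.5 for f, u in zip(spec.flux, spec.uncertainty)]
spec.set_mask(low_snr)
# spec.mask is now [True, True, False]; flux and uncertainty were never copied.
```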
Personally, I would lean toward having the @camipacifici, as the sprint PO, do you have an opinion?
The idea of this function is to make the life of the user simpler when dealing with masks in the context of An experienced user will surely be able to create their own masks just by looking at the documentation. I am thinking more of the less experienced users here. So, I still think that a
After discussing with @eteq and @hcferguson, the decision is to keep this function "as is". It is simple but includes some necessary checks that will be helpful to non-expert users.
We had a short offline conversation about what the mask means. Based on astropy NDData:
So I am going to fix/confirm my PR to follow this convention.
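A small sketch of the convention, assuming (as in astropy's NDData and numpy.ma) that a mask value of True marks an invalid pixel. A "keep" selection therefore has to be negated before it is stored as the mask, and downstream code drops the True entries. keep_to_mask and unmasked_values are hypothetical helper names.

```python
from typing import List


def keep_to_mask(keep: List[bool]) -> List[bool]:
    """Translate a "keep these pixels" selection into a mask where True = masked."""
    return [not k for k in keep]


def unmasked_values(values: List[float], mask: List[bool]) -> List[float]:
    """Return only the values whose mask entry is False (i.e. valid)."""
    return [v for v, m in zip(values, mask) if not m]


flux = [10.0, 2.0, 6.0]
snr = [10.0, 2.0, 3.0]
keep = [s > 2.5 for s in snr]        # keep pixels with S/N above 2.5
mask = keep_to_mask(keep)            # [False, True, False]
good = unmasked_values(flux, mask)   # [10.0, 6.0]
```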
LGTM now, thanks @brechmos-stsci!
There has been some discussion about starting to write and use specutils for ND datasets, where N > 1. This PR creates a higher-level function that sets the mask on a spectrum based on a threshold on the S/N of the spectrum.
Tests were added for 1D and 3D data.
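Because the S/N comparison is purely elementwise, it does not depend on how many dimensions the data has. The pure-Python sketch below (hypothetical helpers snr_mask_nd and replicate, nested lists standing in for arrays, True = masked) also mirrors the test suggested earlier in the review: tile a 1D case into a (4, 3, N) cube and check that every spectrum in the cube gets the same mask as the 1D spectrum.

```python
from typing import Any


def snr_mask_nd(flux: Any, uncertainty: Any, value: float) -> Any:
    """Recursively build a mask (True where S/N <= value) for nested lists of any depth.

    flux and uncertainty must have the same nested shape; the comparison
    is applied element by element, so 1D and 3D inputs use the same code.
    """
    if isinstance(flux, list):
        return [snr_mask_nd(f, u, value) for f, u in zip(flux, uncertainty)]
    return flux / uncertainty <= value


def replicate(row: list, shape: tuple) -> list:
    """Tile a 1D list into nested lists with the given leading shape."""
    if not shape:
        return list(row)
    return [replicate(row, shape[1:]) for _ in range(shape[0])]


# 1D case, then the same spectrum tiled 12 times into a (4, 3, 3) cube.
flux1d, unc1d = [10.0, 2.0, 6.0], [1.0, 1.0, 2.0]
mask1d = snr_mask_nd(flux1d, unc1d, 3.0)
flux3d, unc3d = replicate(flux1d, (4, 3)), replicate(unc1d, (4, 3))
mask3d = snr_mask_nd(flux3d, unc3d, 3.0)
```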