Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added in snr_thresholding with 1D and 3D tests #509

Merged
merged 25 commits into from
Oct 2, 2019

Conversation

brechmos-stsci
Copy link

There has been some discussion about starting to write and use specutils for ND datasets, where N > 1. This PR creates a higher level function that sets the mask on a spectrum based on a threshold of the S/N of the spectrum.

Tests were added for 1D and 3D data.

Copy link
Member

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks promising! A few high-level thoughts/questions though:

  1. Since this basically worked out-of-the-box, perhaps it's worth just trying if it works just as easily with SpectralCollection? You could use the exact same test but have spectral_axis be (4,3,10) (i.e., replicate the existing one 12 times and reshape), and then you can just test if they all match the existing case. If it doesn't work, this PR probably doesn't need to worry about fixing it, but if it does work all the better.
  2. I'm not sure I understand what makes this higherlevel? I think it's not that different from SNR which is in analysis. (Or it might be "manipulation" depending on the response to my next question)
  3. This function returns essentially a mask. I had been thinking it's more useful to return a shallow copy of the input, with the mask set to what is currently returned from this function. Then if the user really just wants the mask they can do result = snr_threshold(inspec, val).mask, but if they want a "fully-functional" spectrum with the mask already in it, they can take the return value directly. How does that sound?

@brechmos brechmos marked this pull request as ready for review September 9, 2019 19:13
@eteq eteq added this to the v0.7.0 milestone Sep 12, 2019
@brechmos
Copy link
Contributor

brechmos commented Sep 18, 2019

I added spectrum1d.flux_masked and spectrum1d.uncertainty_masked as properties in order to properly extract a masked flux or masked uncertainty.

I put the two *_masked in the mixin for Spectrum1D, only because .flux was in there.

@eteq
Copy link
Member

eteq commented Sep 19, 2019

In some out-of-band discussion with @brechmos-stsci , @camipacifici, and @nmearl, we realized that flux_masked/uncertainty_masked might be problematic because of the fact that NaN's are "infectious" (i.e., arithmetic on them tends to propagate the NaNs more than desired). So I think (although others can object if I mis-understood) that for this application it's OK to keep the mask in spectrum.mask, but we realized that we need to make sure that the mask is used and a reasonable way in "downstream" operations that the user would want to do after they do this SNR masking. I'll make a separate issue about that.

At the same time, there might still be an application for the NaN-masked versions, most notably that that might the easiest way to make matplotlib plots of cubes and the like. But there are still some questions there too so I'll make an issue for that.

In this PR, though, I think the decision was that flux_masked and uncertainty_masked are not desired, instead the recommendation is for the user to continue to carry the new spectrum around. But maybe you can pull them out as a draft PR @brechmos-stsci, as a way to provide an implementation example for the issue I'm going to create?

@eteq
Copy link
Member

eteq commented Sep 19, 2019

@brechmos-stsci - I was just looking at the diff here and it seems like there's some template matching stuff in here now. Was there perhaps an unintentional merge or something?

@brechmos-stsci
Copy link
Author

@eteq argh, let me ck

@brechmos-stsci
Copy link
Author

In this PR, though, I think the decision was that flux_masked and uncertainty_masked are not desired, instead the recommendation is for the user to continue to carry the new spectrum around. But maybe you can pull them out as a draft PR @brechmos-stsci, as a way to provide an implementation example for the issue I'm going to create?

Ok. So remove *_masked and put in a new PR (?). I'll have to update the docs, too, to show how to do this without *_masked (rather than the intuitive, but wrong, flux[mask].

Copy link
Member

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor implementation question, and also some doc items in brechmos-stsci#3 .

One other thing I realized, though: I'm not sure the sign of the mask is right: shouldn't it be masked if the threshod is less than the threshold? I.e., the "good" ones are the pixels that are above the threshold?

Moreover, I imagine there are use cases for both. Perhaps that implies this should be slightly generalized:

  1. Add a keyword that sets the type of comparison - e.g. </>/>=/<=/==/!=. These would all be interpreted as "set the mask to that" meaning "get rid of pixels that meet the condition".
  2. Add a keyword that's just "lessthan" which is either True or False, to limit the scope here to just those cases.
  3. Split this into two separate functions, one called snr_greater_than and snr_less_than. (They could use the existing function as a private shared implementation though, and just invert the mask at the end or similar.)

I tend to favor 3, because it's explicit and fairly readable, and I don't really see a strong need for anything other than the less than and greater than use cases. (Or more to the point, we should tell people to just set the mask on their own for anything more complicated.)

specutils/manipulation/manipulation.py Outdated Show resolved Hide resolved
@brechmos
Copy link
Contributor

I would rather do 1. We should be able to pass an operator or something and it should be clear enough for people how to use (if they want to do, for example, less-than).

@camipacifici
Copy link

Giving the two options snr_greater_than and snr_less_than seems unnecessary to me. For example, if you want the latter you can just take the [False] of the former. The only difference would be the "=" part, which makes it confusing.

I agree with @brechmos, although I do not really see the use for all those cases. I presume somebody else could and giving the option is a plus. Also, I would do "keep all the pixels that meet the condition", rather than "get rid of pixels that meet the condition".

If we give all the options, this has to be super clear in the documentation. Otherwise, we stick to > and if the user wants a more complicated mask, they can create it themselves.

@eteq
Copy link
Member

eteq commented Sep 23, 2019

Gotcha @camipacifici and @brechmos - I'm ok with 1, and also using the "keep" (vs get rid of) convention.

re:

although I do not really see the use for all those cases. I presume somebody else could and giving the option is a plus

Maybe we adjust this to be a "lite" version of 1 where we only implement > and <, but leave the option open for other operations if clear use cases appear? I personally only have use cases for those two, but this way we maintain flexibility.

@brechmos
Copy link
Contributor

brechmos commented Sep 24, 2019

@eteq See 6a6fe1e for what I am thinking of how it could be implemented. (@camipacifici too)

@brechmos-stsci
Copy link
Author

I was thinking about this a little further and the third op parameter might even be better if one could pass in a string (e.g., '>', '<', '>=', '<=') or the operator. Then if a string, it could be converted to an operator before the other checks.

@hcferguson
Copy link
Contributor

Lurking on this discussion, I'm starting to feel like even having an snr_threshold() function might not be a great idea. If this is just basically a one-liner for most use cases, would it be better to give people the one-liner in documentation and tutorials rather than having a function that has to take a bunch of options -- all because we packaged the mask together with the spectrum and uncertainties (for convenience), and are trying to hide that level of complexity from the user. When hiding the complexity introduces more complexity, maybe we should just not try so hard to hide it?

The line that does all the work in snr_threshold() is

mask = op((data / (spectrum.uncertainty.array*spectrum.uncertainty.unit)), value)

But then there are lots of lots of tests and there's extensive documentation. So I'm worried this is overkill for this particular operation.

@eteq
Copy link
Member

eteq commented Sep 25, 2019

@brechmos-stsci - I like the operator approach, especially with the string-to-operator mapping as an option. I didn't know about the operator module, so that's neat!

@hcferguson - I can see your point here, but I'm concerned about some of the complexities that came out in #516. That is, I think the "simplest" version of this function as a documentation example would be this:

from copy import copy
new_spectrum = copy(spectrum)
new_spectrum.mask = spectrum.flux/uncertainty.quantity < threshold

so we could have that be in the docs instead, but that has the disadvantages that 1) it's three lines of code instead of 1, and 2) that way of doing SNR is wrong if the uncertainty is not StdDev (see #523), so by not wrapping it in a function, people will start using it in their science code in a forward-incompatible way.

That said, a third way presents itself now that I think about it: we could change this function to be a method on Spectrum1D (or maybe the spectrum mixin?) along the lines of with_new_mask, which then would be called as new_spectrum = spectrum.with_new_mask(spectrum.flux/uncertainty.quantity < threshold) - we can keep this PR mostly as-is in terms of the tests and docs since we do want these tests in there, but with that invokation instead of a new function. Then when we solve #523 we'd change it to new_spectrum = spectrum.with_new_mask(pixel_snr(spectrum) < threshold) or whatever.

@nmearl
Copy link
Contributor

nmearl commented Sep 25, 2019

That said, a third way presents itself now that I think about it: we could change this function to be a method on Spectrum1D (or maybe the spectrum mixin?) along the lines of with_new_mask

I am totally with @hcferguson on this one -- this PR is becoming a bit over engineered with trying to encapsulate all use cases inside a single function call. But the solution @eteq mentioned above, I think, is excellent. It lets the user have much more freedom (the current implementation in this PR doesn't handle the case of a chain of operations, e.g. (spectrum.flux/spectrum.uncertainty) < threshold) & (spectrum.flux / spectrum.uncertainty > 0)).

@hcferguson
Copy link
Contributor

@eteq Your suggestion sounds interesting, but I'm not sure I completely understand it.

Is new_spectrum = spectrum.with_new_mask(pixel_snr(spectrum) < threshold) something we would give the user as an example in the docs? Is new_spectrum a deep copy (i.e. if you go back and modify spectrum, is new_spectrum going to be affected)? As a user, I would expect it not to be affected.

I like this, but it involves a deep copy. I could see an advantage of perhaps also having a set_mask() method. For example, to add an SNR threshold on top of some existing mask without copying data:
spectrum.set_mask(spectrum.mask | (pixel_snr(spectrum) < threshold))

But once you have a set_mask method, then you not might really need the with_new_mask() method, since you can just make a copy and then use set_mask().

@brechmos-stsci
Copy link
Author

Personally, I would lean to having the snr_threshold() method, I agree it is simple, but gives a framework for more complex methods too.

@camipacifici, as the sprint PO, do you have an opinion?

@camipacifici
Copy link

camipacifici commented Sep 25, 2019

The idea of this function is to make the life of the user simpler when dealing with masks in the context of Spectrum1D objects and making sure that the masked object can be safely used by other functions (e.g. continuum fitting, line finder, etc). Also, I found that simply applying a mask to a Spectrum1D object with multiple dimensions does not necessarily return the expected masked object, but the shape changes, so this can easily trick a user. Lastly, Spectrum1D.uncertainty can be confusing and a check that the right uncertainty is used to calculate the signal-to-noise ratio is definitely a plus.

An experienced user will surely be able to create their own masks just looking at the documentation. I am thinking more towards the less experienced users here.

So, I still think that a snr_threshold() function is useful and beneficial to a wide range of users. If you think the scope is too narrow, I guess this can be extended to any other property (e.g. asking for flux > xx, but I do not have a specific science case for this now) to create properly masked Spectrum1D objects.

specutils/manipulation/manipulation.py Outdated Show resolved Hide resolved
specutils/manipulation/manipulation.py Outdated Show resolved Hide resolved
@camipacifici
Copy link

After discussing with @eteq and @hcferguson, the decision is to keep this function "as is". It is simple but includes some necessary checks that will be of help to the non-expert user.
The documentation will include examples on the lines of what @hcferguson suggested for the more experienced users who need more than simple S/N masks.

@brechmos
Copy link
Contributor

Some small offline conversation about what mask means. Based on astropy ndata:

Masks should follow the numpy convention that valid data points are marked by False and invalid ones with True.

So I am going to change it my PR to fix/confirm my PR to follow this convention.

@eteq
Copy link
Member

eteq commented Oct 1, 2019

One more to-do item for after this PR is merged (hopefully pending the small change @brechmos noted above): I will make a small follow-on PR that updates the narrative docs for this PR to help with the decision-making in #518

@eteq
Copy link
Member

eteq commented Oct 2, 2019

LGTM now, thanks @brechmos-stsci !

@eteq eteq merged commit 53a00de into astropy:master Oct 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants