Added in snr_thresholding with 1D and 3D tests #509

brechmos-stsci · 2019-09-06T14:23:25Z

There has been some discussion about starting to write and use specutils for ND datasets, where N > 1. This PR creates a higher level function that sets the mask on a spectrum based on a threshold of the S/N of the spectrum.

Tests were added for 1D and 3D data.

eteq

Looks promising! A few high-level thoughts/questions though:

Since this basically worked out-of-the-box, perhaps it's worth just checking whether it works just as easily with SpectrumCollection? You could use the exact same test but have spectral_axis be shape (4,3,10) (i.e., replicate the existing one 12 times and reshape), and then you can just test if they all match the existing case. If it doesn't work, this PR probably doesn't need to worry about fixing it, but if it does work all the better.
I'm not sure I understand what makes this higher level? I think it's not that different from SNR, which is in analysis. (Or it might be "manipulation" depending on the response to my next question.)
This function returns essentially a mask. I had been thinking it's more useful to return a shallow copy of the input, with the mask set to what is currently returned from this function. Then if the user really just wants the mask they can do result = snr_threshold(inspec, val).mask, but if they want a "fully-functional" spectrum with the mask already in it, they can take the return value directly. How does that sound?
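
A minimal sketch of the two suggestions above, written with plain NumPy and a stand-in `Spec` dataclass rather than the real specutils classes. The names `Spec` and `snr_mask_copy` are hypothetical, the uncertainty is assumed to be a standard deviation, and flagging low-S/N pixels with True is an assumption at this point in the thread. It replicates a 10-pixel spectrum 12 times into shape (4, 3, 10), checks that every replica gets the same mask as the 1D case, and returns a shallow copy carrying the mask instead of a bare array.

```python
import copy
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Spec:
    """Stand-in for a spectrum object; NOT the specutils API."""
    flux: np.ndarray
    uncertainty: np.ndarray  # assumed to be a standard deviation
    mask: Optional[np.ndarray] = None


def snr_mask_copy(spec: Spec, threshold: float) -> Spec:
    """Return a shallow copy of `spec` whose mask flags low-S/N pixels."""
    result = copy.copy(spec)  # shares the flux/uncertainty arrays
    result.mask = (spec.flux / spec.uncertainty) < threshold
    return result


rng = np.random.default_rng(0)
flux_1d = rng.uniform(1.0, 100.0, size=10)
unc_1d = np.full(10, 10.0)

spec_1d = snr_mask_copy(Spec(flux_1d, unc_1d), threshold=5.0)

# Replicate the 1D case 12 times and reshape to (4, 3, 10).
flux_nd = np.tile(flux_1d, 12).reshape(4, 3, 10)
unc_nd = np.tile(unc_1d, 12).reshape(4, 3, 10)
spec_nd = snr_mask_copy(Spec(flux_nd, unc_nd), threshold=5.0)

# Every (i, j) spectrum should reproduce the 1D mask (broadcast comparison).
all_match = bool(np.all(spec_nd.mask == spec_1d.mask))
```

With this shape of return value, a caller who only wants the boolean array can still take `.mask` from the result, as described above.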

brechmos · 2019-09-18T18:01:16Z

I added spectrum1d.flux_masked and spectrum1d.uncertainty_masked as properties in order to properly extract a masked flux or masked uncertainty.

I put the two *_masked in the mixin for Spectrum1D, only because .flux was in there.

eteq · 2019-09-19T15:12:37Z

In some out-of-band discussion with @brechmos-stsci, @camipacifici, and @nmearl, we realized that flux_masked/uncertainty_masked might be problematic because NaNs are "infectious" (i.e., arithmetic on them tends to propagate the NaNs more than desired). So I think (although others can object if I misunderstood) that for this application it's OK to keep the mask in spectrum.mask, but we realized that we need to make sure that the mask is used in a reasonable way in "downstream" operations that the user would want to do after they do this SNR masking. I'll make a separate issue about that.

At the same time, there might still be an application for the NaN-masked versions, most notably that they might be the easiest way to make matplotlib plots of cubes and the like. But there are still some questions there too, so I'll make an issue for that.
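
To illustrate the "infectious" NaN concern, here is a small NumPy-only sketch (the arrays are made up for the example and no specutils objects are involved). It contrasts a NaN-filled flux array with a flux array that keeps its mask separately.

```python
import numpy as np

flux = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, False])  # True marks a bad pixel

# NaN-filled version: a single NaN poisons ordinary reductions.
flux_nan = np.where(mask, np.nan, flux)
plain_sum = flux_nan.sum()        # NaN propagates through the sum
nan_aware = np.nansum(flux_nan)   # works, but only if the caller uses nansum

# Keeping the mask separate: the data stay finite and the mask is
# applied explicitly where it is needed.
masked_sum = flux[~mask].sum()
```

The NaN-filled array is still convenient for plotting, since matplotlib leaves gaps at NaN values in line plots.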

In this PR, though, I think the decision was that flux_masked and uncertainty_masked are not desired, instead the recommendation is for the user to continue to carry the new spectrum around. But maybe you can pull them out as a draft PR @brechmos-stsci, as a way to provide an implementation example for the issue I'm going to create?

eteq · 2019-09-19T20:20:39Z

@brechmos-stsci - I was just looking at the diff here and it seems like there's some template matching stuff in here now. Was there perhaps an unintentional merge or something?

brechmos-stsci · 2019-09-20T16:03:10Z

@eteq argh, let me check

brechmos-stsci · 2019-09-20T16:38:21Z

> In this PR, though, I think the decision was that flux_masked and uncertainty_masked are not desired, instead the recommendation is for the user to continue to carry the new spectrum around. But maybe you can pull them out as a draft PR @brechmos-stsci, as a way to provide an implementation example for the issue I'm going to create?

Ok. So remove *_masked and put in a new PR (?). I'll have to update the docs, too, to show how to do this without *_masked (rather than the intuitive, but wrong, flux[mask]).
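
One plausible reading of why `flux[mask]` is misleading, sketched with plain NumPy (the thread does not spell out the exact failure, so treat this as an assumption): boolean indexing returns the flagged values rather than the good ones, and it flattens an N-D array. A NumPy masked array is one way to keep the original shape.

```python
import numpy as np

flux = np.arange(24, dtype=float).reshape(2, 3, 4)
mask = flux < 5  # True marks pixels to exclude (numpy convention)

# Boolean indexing selects the MASKED values and flattens the cube.
selected = flux[mask]

# A masked array keeps the (2, 3, 4) shape and hides the flagged pixels.
masked = np.ma.masked_array(flux, mask=mask)
good_mean = masked.mean()  # mean over unmasked pixels only
```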

eteq

One minor implementation question, and also some doc items in brechmos-stsci#3 .

One other thing I realized, though: I'm not sure the sign of the mask is right: shouldn't a pixel be masked if its S/N is less than the threshold? I.e., the "good" ones are the pixels that are above the threshold?

Moreover, I imagine there are use cases for both. Perhaps that implies this should be slightly generalized:

1. Add a keyword that sets the type of comparison - e.g. </>/>=/<=/==/!=. These would all be interpreted as "set the mask to that" meaning "get rid of pixels that meet the condition".
2. Add a keyword that's just "lessthan" which is either True or False, to limit the scope here to just those cases.
3. Split this into two separate functions, one called snr_greater_than and snr_less_than. (They could use the existing function as a private shared implementation though, and just invert the mask at the end or similar.)

I tend to favor 3, because it's explicit and fairly readable, and I don't really see a strong need for anything other than the less than and greater than use cases. (Or more to the point, we should tell people to just set the mask on their own for anything more complicated.)

specutils/manipulation/manipulation.py

brechmos · 2019-09-23T19:53:53Z

I would rather do 1. We should be able to pass an operator or something and it should be clear enough for people how to use (if they want to do, for example, less-than).

camipacifici · 2019-09-23T20:03:04Z

Giving the two options snr_greater_than and snr_less_than seems unnecessary to me. For example, if you want the latter you can just take the [False] of the former. The only difference would be the "=" part, which makes it confusing.

I agree with @brechmos, although I do not really see the use for all those cases. I presume somebody else could and giving the option is a plus. Also, I would do "keep all the pixels that meet the condition", rather than "get rid of pixels that meet the condition".

If we give all the options, this has to be super clear in the documentation. Otherwise, we stick to > and if the user wants a more complicated mask, they can create it themselves.

eteq · 2019-09-23T20:26:59Z

Gotcha @camipacifici and @brechmos - I'm ok with 1, and also using the "keep" (vs get rid of) convention.

re:

> although I do not really see the use for all those cases. I presume somebody else could and giving the option is a plus

Maybe we adjust this to be a "lite" version of 1 where we only implement > and <, but leave the option open for other operations if clear use cases appear? I personally only have use cases for those two, but this way we maintain flexibility.

brechmos · 2019-09-24T16:52:55Z

@eteq See 6a6fe1e for what I am thinking of how it could be implemented. (@camipacifici too)

brechmos-stsci · 2019-09-25T14:57:51Z

I was thinking about this a little further and the third op parameter might even be better if one could pass in a string (e.g., '>', '<', '>=', '<=') or the operator. Then if a string, it could be converted to an operator before the other checks.
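
The string-or-operator idea can be sketched as follows. The lookup table and the helper name `resolve_op` are hypothetical and are not taken from the PR; only the standard-library `operator` module and NumPy are used.

```python
import operator

import numpy as np

# Hypothetical lookup table; the PR's actual mapping may differ.
_OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def resolve_op(op):
    """Accept a comparison string or a callable and return a callable."""
    if isinstance(op, str):
        try:
            return _OPERATORS[op]
        except KeyError:
            raise ValueError(f"unsupported comparison: {op!r}") from None
    if callable(op):
        return op
    raise TypeError("op must be a string or a callable")


snr = np.array([1.0, 5.0, 12.0])
from_string = resolve_op(">")(snr, 3.0)
from_callable = resolve_op(operator.gt)(snr, 3.0)
```

Resolving the string once, before any other checks, means the rest of the function only ever deals with a callable.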

hcferguson · 2019-09-25T15:40:38Z

Lurking on this discussion, I'm starting to feel like even having an snr_threshold() function might not be a great idea. If this is just basically a one-liner for most use cases, would it be better to give people the one-liner in documentation and tutorials rather than having a function that has to take a bunch of options -- all because we packaged the mask together with the spectrum and uncertainties (for convenience), and are trying to hide that level of complexity from the user. When hiding the complexity introduces more complexity, maybe we should just not try so hard to hide it?

The line that does all the work in snr_threshold() is

mask = op((data / (spectrum.uncertainty.array*spectrum.uncertainty.unit)), value)

But then there are lots and lots of tests and there's extensive documentation. So I'm worried this is overkill for this particular operation.

eteq · 2019-09-25T18:43:51Z

@brechmos-stsci - I like the operator approach, especially with the string-to-operator mapping as an option. I didn't know about the operator module, so that's neat!

@hcferguson - I can see your point here, but I'm concerned about some of the complexities that came out in #516. That is, I think the "simplest" version of this function as a documentation example would be this:

from copy import copy
new_spectrum = copy(spectrum)
new_spectrum.mask = spectrum.flux / spectrum.uncertainty.quantity < threshold

so we could have that be in the docs instead, but that has the disadvantages that 1) it's three lines of code instead of 1, and 2) that way of doing SNR is wrong if the uncertainty is not StdDev (see #523), so by not wrapping it in a function, people will start using it in their science code in a forward-incompatible way.
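
To make point 2 concrete, here is a NumPy-only sketch of why dividing the flux by the raw uncertainty values is only valid for a standard deviation. astropy.nddata distinguishes StdDevUncertainty, VarianceUncertainty and InverseVariance; the `kind` labels and the helper `to_stddev` below are hypothetical stand-ins for that distinction, not part of any library.

```python
import numpy as np


def to_stddev(values: np.ndarray, kind: str) -> np.ndarray:
    """Convert an uncertainty array to a standard deviation.

    `kind` is a hypothetical label: 'std', 'var', or 'ivar'.
    """
    if kind == "std":
        return values
    if kind == "var":
        return np.sqrt(values)
    if kind == "ivar":
        return 1.0 / np.sqrt(values)
    raise ValueError(f"unknown uncertainty kind: {kind!r}")


flux = np.array([10.0, 20.0, 30.0])
variance = np.array([4.0, 4.0, 4.0])  # i.e. sigma = 2

naive_snr = flux / variance                      # wrong: divides by sigma**2
correct_snr = flux / to_stddev(variance, "var")  # divides by sigma
```

Here the naive ratio is a factor of two too small, so a threshold applied to it would mask pixels that actually pass the cut.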

That said, a third way presents itself now that I think about it: we could change this function to be a method on Spectrum1D (or maybe the spectrum mixin?) along the lines of with_new_mask, which would then be called as new_spectrum = spectrum.with_new_mask(spectrum.flux/spectrum.uncertainty.quantity < threshold). We can keep this PR mostly as-is in terms of the tests and docs, since we do want these tests in there, but with that invocation instead of a new function. Then when we solve #523 we'd change it to new_spectrum = spectrum.with_new_mask(pixel_snr(spectrum) < threshold) or whatever.

nmearl · 2019-09-25T18:51:16Z

> That said, a third way presents itself now that I think about it: we could change this function to be a method on Spectrum1D (or maybe the spectrum mixin?) along the lines of with_new_mask

I am totally with @hcferguson on this one -- this PR is becoming a bit over-engineered in trying to encapsulate all use cases inside a single function call. But the solution @eteq mentioned above, I think, is excellent. It gives the user much more freedom (the current implementation in this PR doesn't handle the case of a chain of operations, e.g. ((spectrum.flux / spectrum.uncertainty) < threshold) & ((spectrum.flux / spectrum.uncertainty) > 0)).
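
A rough sketch of how a chained condition could be passed to a `with_new_mask`-style method. The `Spec` class and its `with_new_mask` method are hypothetical stand-ins (no such method is confirmed to exist in specutils), and the uncertainty is assumed to be a standard deviation.

```python
import copy
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Spec:
    """Stand-in for a spectrum; `with_new_mask` is hypothetical."""
    flux: np.ndarray
    uncertainty: np.ndarray
    mask: Optional[np.ndarray] = None

    def with_new_mask(self, mask: np.ndarray) -> "Spec":
        """Return a shallow copy carrying `mask`; self is left unchanged."""
        new = copy.copy(self)
        new.mask = np.asarray(mask, dtype=bool)
        return new


spec = Spec(flux=np.array([-4.0, 1.0, 6.0, 50.0]),
            uncertainty=np.array([2.0, 2.0, 2.0, 2.0]))
snr = spec.flux / spec.uncertainty  # [-2.0, 0.5, 3.0, 25.0]
threshold = 5.0

# Each comparison needs its own parentheses because & binds tighter
# than < and > in Python.
new_spec = spec.with_new_mask((snr < threshold) & (snr > 0))
```

Because the caller builds the boolean array, any combination of conditions works without adding keywords to the method.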

hcferguson · 2019-09-25T19:03:56Z

@eteq Your suggestion sounds interesting, but I'm not sure I completely understand it.

Is new_spectrum = spectrum.with_new_mask(pixel_snr(spectrum) < threshold) something we would give the user as an example in the docs? Is new_spectrum a deep copy (i.e. if you go back and modify spectrum, is new_spectrum going to be affected)? As a user, I would expect it not to be affected.

I like this, but it involves a deep copy. I could see an advantage of perhaps also having a set_mask() method. For example, to add an SNR threshold on top of some existing mask without copying data:
spectrum.set_mask(spectrum.mask | (pixel_snr(spectrum) < threshold))

But once you have a set_mask method, you might not really need the with_new_mask() method, since you can just make a copy and then use set_mask().
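
The copy-then-`set_mask()` pattern described above might look like the sketch below. `Spec` and `set_mask` are hypothetical stand-ins rather than specutils API, and `snr` is computed inline in place of the `pixel_snr` helper mentioned in the thread.

```python
import copy

import numpy as np


class Spec:
    """Stand-in for a spectrum; `set_mask` is a hypothetical method."""

    def __init__(self, flux, uncertainty, mask=None):
        self.flux = np.asarray(flux, dtype=float)
        self.uncertainty = np.asarray(uncertainty, dtype=float)
        if mask is None:
            mask = np.zeros(self.flux.shape, dtype=bool)
        self.mask = np.asarray(mask, dtype=bool)

    def set_mask(self, mask):
        """Replace the mask in place; flux and uncertainty are untouched."""
        self.mask = np.asarray(mask, dtype=bool)


spec = Spec(flux=[1.0, 8.0, 30.0], uncertainty=[2.0, 2.0, 2.0],
            mask=[False, False, True])
snr = spec.flux / spec.uncertainty  # [0.5, 4.0, 15.0]

# Keep an independent copy first if the original must not change.
original = copy.deepcopy(spec)

# Add an S/N cut on top of the existing mask without copying the data.
spec.set_mask(spec.mask | (snr < 3.0))
```

The deep copy is what guarantees that later edits to `spec` leave `original` unaffected, which is the behaviour asked about above.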

brechmos-stsci · 2019-09-25T19:07:52Z

Personally, I would lean toward having the snr_threshold() method. I agree it is simple, but it gives a framework for more complex methods too.

@camipacifici, as the sprint PO, do you have an opinion?

camipacifici · 2019-09-25T20:28:46Z

The idea of this function is to make the life of the user simpler when dealing with masks in the context of Spectrum1D objects and making sure that the masked object can be safely used by other functions (e.g. continuum fitting, line finder, etc). Also, I found that simply applying a mask to a Spectrum1D object with multiple dimensions does not necessarily return the expected masked object, but the shape changes, so this can easily trick a user. Lastly, Spectrum1D.uncertainty can be confusing and a check that the right uncertainty is used to calculate the signal-to-noise ratio is definitely a plus.

An experienced user will surely be able to create their own masks just looking at the documentation. I am thinking more towards the less experienced users here.

So, I still think that a snr_threshold() function is useful and beneficial to a wide range of users. If you think the scope is too narrow, I guess this can be extended to any other property (e.g. asking for flux > xx, but I do not have a specific science case for this now) to create properly masked Spectrum1D objects.

specutils/manipulation/manipulation.py

camipacifici · 2019-09-27T16:01:23Z

After discussing with @eteq and @hcferguson, the decision is to keep this function "as is". It is simple but includes some necessary checks that will be of help to the non-expert user.
The documentation will include examples along the lines of what @hcferguson suggested for the more experienced users who need more than simple S/N masks.

brechmos · 2019-09-30T19:46:07Z

Some small offline conversation about what mask means. Based on astropy ndata:

> Masks should follow the numpy convention that valid data points are marked by False and invalid ones with True.

So I am going to update my PR to confirm (or fix) that it follows this convention.
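
A small NumPy sketch of how the "keep pixels that meet the condition" wording fits with the numpy mask convention quoted above. The variable names are illustrative, and how the merged function implements this internally is not shown in the thread.

```python
import numpy as np

flux = np.array([2.0, 40.0, 6.0, 90.0])
uncertainty = np.array([4.0, 4.0, 4.0, 4.0])  # assumed standard deviation
threshold = 5.0

snr = flux / uncertainty   # [0.5, 10.0, 1.5, 22.5]
keep = snr > threshold     # True where the pixel passes the cut
mask = ~keep               # numpy/NDData convention: True = invalid

# numpy masked arrays use the same polarity, so the mask plugs in directly.
good_flux = np.ma.masked_array(flux, mask=mask).compressed()
```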

eteq · 2019-10-01T03:32:01Z

One more to-do item for after this PR is merged (hopefully pending the small change @brechmos noted above): I will make a small follow-on PR that updates the narrative docs for this PR to help with the decision-making in #518

eteq · 2019-10-02T13:44:00Z

LGTM now, thanks @brechmos-stsci !

added in snr_thresholding with 1D and 3D tests

28ea2c9

brechmos-stsci requested a review from eteq September 6, 2019 14:23

eteq reviewed Sep 6, 2019

brechmos added 2 commits September 6, 2019 14:51

added in for NDData object

931877b

moved snr_threshold, changed to allow SpectrumCollection

359b6a6

brechmos marked this pull request as ready for review September 9, 2019 19:13

brechmos and others added 7 commits September 9, 2019 15:13

fixed missing import

46410f0

Merge branch 'master' into snr-threshold-nd

f150946

Fixed up documentation, renamed test file.

aec039b

added documentation

c5322a6

fixed code example

7ef3618

added doctest skip

705b106

fixed wavelength setup

900baa2

eteq added this to the v0.7.0 milestone Sep 12, 2019

brechmos added 3 commits September 12, 2019 13:51

fixed ref

bbfa33f

fixed up docs

c44cbb1

squash commits

536f1d7

brechmos-stsci force-pushed the snr-threshold-nd branch from 2d7e46c to 536f1d7 Compare September 18, 2019 16:56

eteq mentioned this pull request Sep 19, 2019

Ensure that existing manipulation and analysis functions use the mask where appropriate. #516

Open

eteq mentioned this pull request Sep 19, 2019

Decide if there are use cases for flux_nanfilled and if so implement it #518

Open

Merge branch 'master' of github.com:astropy/specutils into snr-threshold-nd

78691cb

brechmos added 2 commits September 20, 2019 13:19

removed _masked properties

1d81267

updated docs

661249c

eteq mentioned this pull request Sep 23, 2019

doc improvements brechmos-stsci/specutils#3

Open

eteq requested changes Sep 23, 2019

eteq mentioned this pull request Sep 23, 2019

Have a consistent concept/shared implementation of signal-to-noise #523

Open

Removed old test, added in operator parameter

6a6fe1e

doc fix

2a07663

nmearl reviewed Sep 26, 2019

brechmos added 4 commits September 30, 2019 08:52

finished adding str operator, tests, sp fixes

2c2924b

pep8

63f10a7

pep8

0242c87

changed array*unit to quantity

6a48c16

brechmos added 3 commits October 1, 2019 09:19

fixed mask polarity and corresponding tests

dc6f009

fixed pep8 doc issue

09af3c4

Fixed the polarity information for the mask.

10bffa8

eteq approved these changes Oct 2, 2019

eteq merged commit 53a00de into astropy:master Oct 2, 2019

eteq mentioned this pull request Feb 14, 2020

Why is mask not applied to Spectrum1D when present? #585

Open
