Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Support spectra & noise propagation #32

Closed
wants to merge 4 commits into from

Conversation

jehturner
Copy link
Member

This is the same PR as cmccully/lacosmicx#1, rebased on astroscrappy, so we can close that old one. It doesn't yet implement the intended behaviour summarized in that PR, so do not merge yet.

@jehturner
Copy link
Member Author

Regarding this:

  • You'd like to get rid of pssl and my bgsub and replace both of them with bkg, which can accept either a fixed background level (like pssl) or an array of model values to be subtracted (like bgsub, mainly for spectroscopy).

There's actually no reason to support subtracting a constant background internally, is there? You were probably just proposing to get rid of pssl altogether for imaging and require the user to supply indat with its background still included, unless also passing a noise image.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 110f162 on jehturner:spectroscopy into acaa02e on astropy:master.

@coveralls
Copy link

coveralls commented Sep 23, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling d628fc8 on jehturner:spectroscopy into 750cc76 on astropy:master.

@jehturner
Copy link
Member Author

Does this look like what you had in mind, @cmccully? I have not added a noise argument yet, but this removes the pssl argument -- instead expecting indat to have its sky background included -- and renames my bgsub to bkg, an optional array that gets subtracted temporarily from the data during detection (I don't see an obvious reason to subtract a constant for imaging but one could).

@jehturner
Copy link
Member Author

For completeness, I've had a go at adding a var parameter that accepts a user-supplied variance array, though perhaps it should go its own PR. Anyway, on reflection, it may be less useful than I had been thinking, given the following considerations:

  1. If bkg contains all the signal that has been subtracted prior to running astroscrappy then there isn't really any additional information about the noise in var.

  2. Like the main data array, I think var needs its own median filtering and cleaning, to determine what the noise would be in the absence of CRs (which is obviously a bit more processing).

  3. If (unlike the main data array) var is filtered without subtracting & restoring bkg, I think that can lead to less accurate noise structure by blurring out fine background details (such as sky lines or IFU fibres).

  4. In the case I've tested so far, there was very little difference in the cleaned result between using var or just indat, with only ~0.01% of pixels having significant differences. Those cleaned pixels appear similar but a little noisier when using var than they do with the original noise algorithm.

@jehturner
Copy link
Member Author

I'll push what I did so you can see it. We can always revert that commit if needed...

I'm not returning a noise image here, but I suppose cleanvar could be added to the return tuple? I tend to re-clean my data after running LA Cosmic anyway, since I find it does a better job of identifying the CRs than removing them (eg. compared with local interpolation), so crmask seems like the most useful return value.

Copy link
Contributor

@cmccully cmccully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overall, this looks really helpful. I made a few minor comments about the implementation. I'm going to merge #41 before this so you can rebase to that if you would like.

Input data array that will be used for cosmic ray detection. This
should include the sky background (or a mean background level, added
back in after sky subtraction), so that noise can be estimated
correctly from the data values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe say unless the var array is also provided.


var : float numpy array, optional
A pre-determined estimate of the data variance (ie. noise squared) in
each pixel, generated by previous processing of ``indat``. If provided,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might remove "generated by previous processing of indat".

each pixel, generated by previous processing of ``indat``. If provided,
this is used in place of an internal noise model based on ``indat``,
``gain`` and ``readnoise``. This still gets median filtered and cleaned
internally, to estimate what the noise in each pixel *would* be in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the median filtering done in place? We probably shouldn't. We should probably make a copy of the array if we are not already.

internally, to estimate what the noise in each pixel *would* be in the
absence of cosmic rays, but without removing ``bkg`` temporarily (a
difference that could lead to less accurate results than the default
noise model close to fine structure). This argument should be provided
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it just make sense to take the masked median of the variance array? Or just replace the masked regions with the median filter of the var array? This way you wouldn't have to worry about the bkg stuff.

noise model close to fine structure). This argument should be provided
for correct results if ``bkg`` does not include all signal that has
previously been subtracted from ``indat`` (other than electronic bias).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to note that var is expected to be in adu like bkg and indat (I really should have named that variable better).

if var is not None:
goodvar = np.empty_like(gooddata, order='c')
goodvar[:] = var[np.logical_not(mask)]
var_level = median(goodvar, len(goodvar))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want an else var_level = background_level?

# order to estimate the noise accurately (first saving a copy if
# using the same median to replace bad values):
if cleantype == 'median':
m5_nobkg = m5 if bkg is None else m5.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just replace this by working on the cleanvar below? See comment below.


if cleantype != 'median':
if var is None:
noise = np.sqrt(m5 + readnoise * readnoise)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using m5 in the noise, why don't you use the 5x5 median of cleanvar. Basically calculate one m5 for the cleanarr for the cr detection and one m5 for the noise calculation. Then you don't have to worry about adding back in bkg etc.

del m5
cleanarr[crinds] = m5_nobkg[crinds]
del m5_nobkg
if var is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, I think you can have a noise and just median of the cleanarr which would simplify this whole block considerably.

# Subtract the input sky model, if applicable.
if bkg is not None:
cleanarr -= bkg

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should happen much later in the function. The background needs to be included when you set cleanvar = cleanarr at line 221.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The background for the noise estimate gets restored again at L303, which needs to happen after the median filtering; see recent discussion at jehturner#1 for further explanation.

@jehturner
Copy link
Member Author

Rebased as requested.

@jehturner
Copy link
Member Author

Not sure what is going on with the first, Python 2.7 build test? It seems to install Python 3.7 with Conda and then fail, but I don't think it's relevant to this PR.

@jehturner
Copy link
Member Author

Before answering your detailed suggestions (thanks), I'd like to point out that I was leaning towards reverting my last commit (see comment of 16 April). Input variance was something we discussed on Slack that seemed like a good idea at the time, but once I actually implemented var, I realized that it doesn't work as well as the original LACosmic noise estimate, for 2 reasons: First, var has no equivalent of the bkg image for indat, so continuum/sky structure gets left in the variance when filtering for cosmic rays, producing noise estimates that are less, rather than more, accurate. Second, bkg should help ensure that all the signal gets accounted for anyway when producing LACosmic's usual noise estimate, because it includes any "missing" flux that a separate variance array would otherwise tell you about. So I would probably just get rid of var to avoid overcomplicating things, but obviously if you want to keep it then we can. Admittedly, I'm mostly interested in spectroscopy, where bkg is going to be essential, whereas for imaging bkg is more likely to be a scalar and then you might get slightly more accurate noise structure using a var image as well (off the top of my head). Does that make sense?

@jehturner
Copy link
Member Author

In our meeting earlier this week, @cmccully mentioned that he's keen to retain variance input as an option because he has a use case (I think for echelle data) where prior processing produces a non-trivial correspondence betweeen indat & var. I've therefore had another look at what's needed to address the limitations I mentioned above on 16 April 2018, but I've had quite a hard time remembering what I was doing here after 3 years...

The good news is that I think the code here already does most of the duplicate processing necessary to use input variance accurately and is "only" missing subtraction/restoration of the background structure (as for indat).

The problem is simply how to go about subtracting the background from the variance. Unless the only processing done prior to cosmic ray removal is bias subtraction and maybe stacking (in which case you don't really need var, because the original algorithm can derive the same thing from indat), the levels of background structure in var and indat may differ, even if just by a scaling factor or similar. AstroScrappy doesn't know anything about that, so the caller would be required to provide, say, bkg and varbkg separately, which is already getting a bit messy. Moreover, at present the run time for spectroscopy is already dominated by background fitting rather than by detect_cosmics itself, so if the calling function (equivalent to lacos_spec.cl) has to fit the sky+continuum for both indat & var separately, that's quite a large overhead. Alternatively, the caller might know enough about previous processing to convert bkg into varbkg directly, but that might sacrifice generality and/or produce a convoluted API -- I think it would be good to understand your use case a bit better here. And if the caller doesn't get this exactly right, things are liable to go (at least slightly) wrong without it being obvious.

So what are your thoughts on how exactly var would be used? I think I'm fairly clear on how detect_cosmics would need to be modified if we know how we want it to look.

@cmccully
Copy link
Contributor

Can you rebase this into master @jehturner ?

@jehturner
Copy link
Member Author

Rebase from master? OK.

…rappy's predecessor, lacosmicx (producing identical results in a quick test).
…d background internally instead of taking pre-subtracted input, as discussed with cmucully on the astropyspectroscopy slack channel in April.
…noise estimates, in place of the internal noise model. (May want some tweaking, eg. to protect bkg structure for noise estimates too??)
@jehturner
Copy link
Member Author

jehturner commented Aug 4, 2020

That was suspiciously easy... Anyway, the diff is just 1 file now.

@codecov
Copy link

codecov bot commented Aug 4, 2020

Codecov Report

Merging #32 into master will decrease coverage by 1.93%.
The diff coverage is 47.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #32      +/-   ##
==========================================
- Coverage   94.88%   92.94%   -1.94%     
==========================================
  Files           7        7              
  Lines        1016     1049      +33     
  Branches       53       53              
==========================================
+ Hits          964      975      +11     
- Misses         52       74      +22     
Impacted Files Coverage Δ
astroscrappy/astroscrappy.pyx 69.29% <47.50%> (-5.83%) ⬇️
astroscrappy/utils/medutils.c 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd4acfe...0d45f31. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants