Sigma clipping is painfully slow #624

Open
mwcraig opened this issue Jun 4, 2018 · 11 comments

Comments

@mwcraig
Member

mwcraig commented Jun 4, 2018

I don't know what the underlying performance issue is, but wow it is slow (clipping a stack of 25 4k x 4k images takes minutes).

@mwcraig mwcraig added this to the 2.0 milestone Jun 4, 2018
@mwcraig
Member Author

mwcraig commented Jun 4, 2018

About 4.5 minutes, but it feels longer...

@MSeifert04
Contributor

MSeifert04 commented Jun 4, 2018

I also noticed that it's very slow. Thanks for bringing this up (I kept forgetting to).

Could you provide a random dataset for profiling? It could be interesting depending on the data types and the strides of the dataset (the actual data is probably not important, except whether it contains lots of unusual values like NaNs and so forth).

Also, because we're re-using astropy's sigma clipping, it would be handy to know which astropy version you were using.

One additional thing that could be a problem is RAM. Depending on the available RAM and the size of the arrays, there may be a lot of swapping going on.

Sorry for asking for so much information, but without the specifics it's hard to profile this (correctly).

@bsipocz
Member

bsipocz commented Jun 4, 2018

This really brings up the issue that we need to add more benchmarks to astropy-benchmarks; currently there is nothing for the stats package. While a test that runs for 4.5 minutes is not feasible as a benchmark, if it's not a local RAM issue I suspect it can be scaled back to something that is still slow but usable?
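For concreteness, a scaled-down benchmark might look roughly like this (just a sketch in asv style; the class/method names and the 25 x 512 x 512 stack size are made up here, chosen so the run time stays in seconds rather than minutes):

import numpy as np
from astropy.stats import sigma_clip

class TimeSigmaClip:
    def setup(self):
        rng = np.random.default_rng(12345)
        # 25 "images" of 512 x 512, stacked along axis 0
        self.stack = rng.normal(size=(25, 512, 512))

    def time_sigma_clip_axis0(self):
        # clip each pixel's stack of 25 values, roughly what combine() does
        sigma_clip(self.stack, sigma=5, axis=0)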

@mwcraig
Member Author

mwcraig commented Jun 4, 2018

Sure; I can also provide the data set I was working with if you want. It is just a bunch of dark frames. It is conceivable the bottleneck is in the combining, not the clipping, I suppose.

@SaOgaz

SaOgaz commented Jun 4, 2018

Probably related, @larrybradley is working on speeding up the astropy one: astropy/astropy#7478

@mwcraig
Member Author

mwcraig commented Jun 4, 2018

Very related, thanks for mentioning that, @SaOgaz. And, of course, thanks to @larrybradley for working on it!

@mwcraig
Member Author

mwcraig commented Jun 10, 2018

I haven't had time to do much profiling, but here is an example of a set of calls that takes several minutes to run (for the combine part). Adding in some outlying pixels seems to drive up the run time (it is about 30 seconds faster without them).

There are a number of things here that can lead to a speedup, I think. Even without sigma clipping, it takes close to a minute to combine the images.

Away from keyboard for a week, but will take another look when I'm back...

import numpy as np
from astropy.io import fits
from ccdproc import combine, ImageFileCollection
from astropy.stats import median_absolute_deviation

size = [4096, 4109]
n_images = 25

base_name = 'test-combine-{num:03d}.fits'

for num in range(n_images):
    data = np.random.normal(size=size)
    # Now add some outlying pixels so there is something to clip
    n_bad = 50000
    bad_x = np.random.randint(0, high=size[0] - 1, size=n_bad)
    bad_y = np.random.randint(0, high=size[1] - 1, size=n_bad)
    data[bad_x, bad_y] = np.random.choice([-1, 1], size=n_bad) * (10 + np.random.rand(n_bad))
    hdu = fits.PrimaryHDU(data=data)
    hdu.header['for_prof'] = 'yes'
    hdu.header['bunit'] = 'adu'
    hdu.writeto(base_name.format(num=num), overwrite=True)

ic = ImageFileCollection('.', '*')
fils = ic.files_filtered(for_prof='yes')
combine(fils, output_file='foo12.fit', sigma_clip=True,
        sigma_clip_low_thresh=5, sigma_clip_high_thresh=5,
        sigma_clip_func=np.ma.median,
        sigma_clip_dev_func=median_absolute_deviation)

Edit 2018-06-23: move misplaced parentheses in hdu.writeto

@larrybradley
Member

larrybradley commented Jun 15, 2018

@mwcraig For your sigma-clipping stdfunc (sigma_clip_dev_func) you should probably use mad_std (http://docs.astropy.org/en/stable/api/astropy.stats.mad_std.html) instead of median_absolute_deviation. mad_std is a robust standard deviation estimator. For reference, see e.g. https://en.wikipedia.org/wiki/Median_absolute_deviation#Relation_to_standard_deviation.
Just an FYI -- it has no effect on performance. 😄
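A quick numeric check of the difference (a sketch, nothing from ccdproc):

import numpy as np
from astropy.stats import mad_std, median_absolute_deviation

x = np.random.normal(scale=2.0, size=100_000)
print(median_absolute_deviation(x))  # ~1.35: the raw MAD, which underestimates sigma
print(mad_std(x))                    # ~2.0: the MAD scaled by ~1.4826 to estimate sigma
print(np.std(x))                     # ~2.0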

@larrybradley
Member

One other note:
If you want to clean up the combine API a bit, you could use the SigmaClip object instead of having a bunch of keywords to control the sigma clipping. That's exactly the use case for why I added the SigmaClip class.

i.e. instead of this:

combine(fils, output_file='foo12.fit', sigma_clip=True, sigma_clip_low_thresh=5, sigma_clip_high_thresh=5, sigma_clip_func=np.ma.median, sigma_clip_dev_func=median_absolute_deviation)

you could do this:

from astropy.stats import SigmaClip, mad_std

sigclip = SigmaClip(sigma_lower=5, sigma_upper=5, cenfunc=np.ma.median, stdfunc=mad_std)
combine(fils, output_file='foo12.fit', sigma_clip=sigclip)

where sigma_clip=None would mean do not perform sigma clipping.

If interested, I could put in a PR.

@MSeifert04
Contributor

One thing I noticed during my tests is that we do most operations on the stacked arrays in Combiner along the first axis. For C-contiguous arrays that is the worst axis to operate along, because elements along the first axis have the largest "step" (stride) in memory between them. However, stacking along the first axis is faster than stacking along the last axis (which would be the fastest axis to "reduce" along).

I'm not sure how much that matters, because we both stack and reduce; however, counting the operations, the last axis would be better in the cases where we do clipping:

  • We stack once.
  • We reduce once for the baseline, once for the deviation, then once each for the "combined" value, the "combined" mask, and the "combined" uncertainty.

However, processors are typically very clever about what to pre-fetch, so it might not be too bad that we reduce along the "worst" (memory-layout-wise) axis.
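A quick way to check that locally (a rough sketch; the shapes are arbitrary and the timings will vary a lot with CPU and cache sizes):

import numpy as np

n_img, ny, nx = 25, 1024, 1024
first = np.random.normal(size=(n_img, ny, nx))          # stacked along axis 0, as Combiner does
last = np.ascontiguousarray(first.transpose(1, 2, 0))   # same data, stacked along the last axis

# In IPython:
# %timeit np.median(first, axis=0)    # each pixel's 25 values are far apart in memory
# %timeit np.median(last, axis=-1)    # each pixel's 25 values are contiguous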

Then I played a bit with possible optimizations. It's really tricky, though, because we have to "assume" things. For example, it's easily possible to speed up the clipping by a factor of 3-20 (depending on the shape of the stack) with a numba function (or a C extension), but that assumes the stack is 3D and that the clip functions are either (np.ma.mean and np.ma.std) or (np.ma.median and astropy.stats.median_absolute_deviation). And do we really want to include numba (a heavy dependency if one doesn't already have it) or ship code that needs compiling (particularly problematic for Windows users who don't use Anaconda) just for some particular combinations, even though they are probably the most common ones?
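For illustration only, the kind of specialized kernel I mean could look roughly like this (a sketch, not ccdproc code: it assumes a 3D float stack, a single clipping pass, and hard-codes mean/std as the clip functions):

import numpy as np
from numba import njit

@njit(cache=True)
def clip_mask_mean_std(stack, low=5.0, high=5.0):
    # stack has shape (n_images, ny, nx); returns True where a value should be clipped
    n, ny, nx = stack.shape
    mask = np.zeros(stack.shape, dtype=np.bool_)
    for j in range(ny):
        for i in range(nx):
            col = stack[:, j, i]
            center = col.mean()
            std = col.std()
            for k in range(n):
                dev = col[k] - center
                if dev < -low * std or dev > high * std:
                    mask[k, j, i] = True
    return mask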

@mwcraig
Member Author

mwcraig commented Aug 9, 2018

I've had a chance to do a bit of profiling; the notebook I used to generate the profiles is in this gist.

The notebook combines 25 images, each roughly 2k x 2k, by averaging, with sigma clipping applied before combining.

The upshot is that numpy's masked median is really, really slow. Two thirds of the time in the combination is spent in sigma clipping; each of the three medians it computes takes roughly 7 seconds. Of that time, a surprising amount is spent in numpy MaskedArray's __getitem__ (10 seconds).

A screenshot from SnakeViz is below. Following @larrybradley's approach of using the nan* functions speeds things up some, and bottleneck does even better.

[SnakeViz profile screenshot: profile_fun_mamedian_dev_mask_med_abs_dev]
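For reference, a minimal comparison along those lines (a sketch, not the notebook from the gist; bottleneck is an optional extra dependency and the numbers depend heavily on the machine):

import numpy as np
import bottleneck as bn

stack = np.random.normal(size=(25, 2048, 2048))
mask = stack > 4.0                          # stand-in for the clipped pixels
masked = np.ma.array(stack, mask=mask)
as_nan = np.where(mask, np.nan, stack)

# In IPython:
# %timeit np.ma.median(masked, axis=0)   # masked median, the slow path in the profile
# %timeit np.nanmedian(as_nan, axis=0)   # NaN-based median, noticeably faster
# %timeit bn.nanmedian(as_nan, axis=0)   # bottleneck, typically fastest of the three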

@crawfordsm crawfordsm modified the milestones: 2.0, 2.1 Jul 25, 2019
@mwcraig mwcraig modified the milestones: 2.1.0, 2.1.1 Dec 24, 2019
@mwcraig mwcraig modified the milestones: 2.1.1, 2.2.0 Mar 15, 2021
@mwcraig mwcraig modified the milestones: 2.2.0, 2.2.1 May 25, 2021
@mwcraig mwcraig modified the milestones: 2.2.1, 2.3 Nov 21, 2021
@mwcraig mwcraig modified the milestones: 2.3, 3.0 Jan 19, 2022