Change formula for chunking of files in combine #642

Merged: 19 commits merged into astropy:master on Jul 27, 2019

Conversation

mwcraig (Member) commented Aug 14, 2018

This PR fixes #638 by changing the formula used to calculate the number of chunks in combiner.combine. While the change looks a bit arbitrary (adding a "memory factor" of 2 or 3, depending on the combine method, and another factor of 1.5), it is based on memory profiling of several cases, described in the comment below. I believe the pattern of memory use by combine is consistent enough to justify the insertion of these factors.
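For concreteness, here is a minimal sketch of the kind of chunk-count calculation described above; this is illustrative only, not the exact code in combiner.combine, and the function and argument names are made up for the example:

import math

def estimate_num_chunks(single_frame_bytes, n_files, mem_limit, combine_method):
    # Profiling (below) showed median combination peaking near 3x the naive
    # stack size and average/sum near 2x, hence the method-dependent factor.
    memory_factor = 3 if combine_method == 'median' else 2
    # Additional ~50% head-room for temporaries created during combination.
    total_needed = 1.5 * memory_factor * single_frame_bytes * n_files
    return max(1, math.ceil(total_needed / mem_limit))

# e.g. ten 4000 x 4000 float64 frames with a roughly 1 GB limit
print(estimate_num_chunks(4000 * 4000 * 8, 10, 1_000_000_000, 'median'))  # -> 6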

A few things remain to be done to complete this PR, I think:

  • Add a test for memory use that goes too far beyond what the user has set.
  • Add a test for memory use that falls too far below what the user has set. This is intended to remind us to change these factors if future implementations of combination become more memory efficient.
  • Increase current default memory limit (?). Seems unnecessary, since it is currently at 16GB.

Next up, profiling justification for these changes....

mwcraig (Member Author) commented Aug 14, 2018

Memory profiling of combine

Background and tools

Profiling was done on an iMac with sufficient memory (32GB) that these runs were not close to using the available memory. Other tasks were being done on the machine while the tests were performed, so these should not be used to try to examine execution speed.

The package memory_profiler was used to carry out the profiling. Memory use was calculated just prior to performing the combine, and that memory is subtracted from the output of the memory profile.
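As a rough sketch of that measurement approach using memory_profiler's memory_usage function (the combine call here is a stand-in, not the notebook's actual helper):

from memory_profiler import memory_usage
import numpy as np

def run_combine():
    # Stand-in for the ccdproc combine call being profiled.
    stack = np.ones((10, 1000, 1000))
    return stack.mean(axis=0)

# Memory (MiB) just prior to the combine; this baseline is subtracted
# from every sample in the profile.
baseline = memory_usage(-1, interval=0.01, timeout=0.1)[-1]
samples = memory_usage((run_combine, (), {}), interval=0.01)  # 0.01 s sampling
net = [m - baseline for m in samples]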

Prior to the profiling runs shown below, a single run was done because the speed of that first run consistently differed from all subsequent runs in the same notebook. Results from that "warm up" run were discarded.

All runs used as their base the notebook run_profile in commit 8058300.

Numerical parameters and astropy version

All runs except one were done with these parameters:

image_size = 4000   # Square image, so 4000 x 4000
num_files = 10
sampling_interval = 0.01 # sec
memory_limit = 1000000000  # bytes, roughly 1GB

One run increased the memory limit to 2GB.

All runs were done with astropy 3.0.4.

For each commit, all three combine methods (average, median, sum) were tested both with and without sigma clipping.

Summary of commits profiled

Memory profiles were performed for several commits; for each commit there is a notebook that was run to perform the profiling. The table below summarizes the commits tested.

Commit   Description
38d14d2  This PR, BEFORE adding memory factor (effectively current master)
5ec598c  PR #632 -- use numba JIT to speed up median sigma clipping
e73b93e  PR #630 -- Reduce number of open files and del some variables
df233f3  This PR, AFTER adding memory factor but BEFORE additional factor of 1.5
a73e72f  This PR, AFTER modification of memory factor by additional factor of 1.5
a73e72f  As above, but memory limit of 2GB

TL;DR results (red vertical line is memory limit)

[Side-by-side memory profile plots: left, before this PR (commit 38d14d2, 4 reps, clipping off, 1.0 GB limit); right, after this PR (commit a73e72f, 4 reps, clipping off, 1.0 GB limit)]

Profiling of 38d14d2; notebook link

Commit on this PR, before adding memory factor (effectively current master)

Summary: Combination with average or sum routinely uses 2x the memory limit; median uses 3x

Runs with no sigma clipping

[Memory profile plot: commit 38d14d2, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit 38d14d2, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of 5ec598c; notebook link

PR #632 -- use numba JIT to speed up median sigma clipping

Summary: Roughly the same as case above, but somewhat higher memory use in median

Runs with no sigma clipping

[Memory profile plot: commit 5ec598c, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit 5ec598c, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of e73b93e; notebook link

PR #630 -- Reduce number of open files and del some variables

Summary: Roughly the same as first case above, but memory use more frequently drops back to near zero

Runs with no sigma clipping

[Memory profile plot: commit e73b93e, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit e73b93e, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of df233f3; notebook link

This PR, AFTER adding memory factor but BEFORE additional factor of 1.5

Summary: Memory use same for all combine methods, but ~50% higher than limit

Runs with no sigma clipping

[Memory profile plot: commit df233f3, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit df233f3, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of a73e72f; notebook link

This PR, AFTER modification of memory factor by additional factor of 1.5

Summary: All methods use about same memory; while limit is exceeded, it is not by much or for long

Runs with no sigma clipping

[Memory profile plot: commit a73e72f, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit a73e72f, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of a73e72f, memory limit of 2GB; notebook link

Same as previous case, higher memory limit

Summary: All methods use about same memory; while limit is exceeded, it is not by much or for long

Runs with no sigma clipping

[Memory profile plot: commit a73e72f, 4 reps, clipping off, 2.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit a73e72f, 4 reps, "clip on" run (clipping not actually applied), 2.0 GB memory limit]

mwcraig (Member Author) commented Aug 15, 2018

This is ready for a look (I think the tests will pass)

MSeifert04 (Contributor) commented:

Thanks for the profiling. However, it seems a bit odd, at least for the numba solution: I expected significantly lower memory usage (maybe the intermediate boolean arrays aren't a problem after all...) and at least a bit of performance improvement (which I cannot see from the timeline in your measurements).

However I probably won't have time to look at this (and the other PRs) today or tomorrow.

mwcraig (Member Author) commented Aug 16, 2018

I'd be hesitant to draw any performance conclusions from this (I don't know how the memory profiling itself affects performance), but I was a little surprised too. I can try re-running it this weekend....

mwcraig (Member Author) commented Aug 16, 2018

Hmmm, it may be time to update some of the test matrix entries... the failures are because of pathlib.Path vs. string issues for the directory name.

MSeifert04 (Contributor) commented:

@mwcraig Just a thought but could you separate the different parts of the PR into separate PRs? There are essentially 3 things here: the refactor of the "size estimate" (which is very good and straightforward), the "fudge factor" (which I would prefer not to have), and the "memory testing" (which could be very useful but requires a more thorough review).

That's by no means necessary, but it would make reviewing and discussing the different parts just a bit easier.

MSeifert04 (Contributor) commented:

For the Path vs. io thing: Just wrap the Path in str and it should work fine. Dropping Python 3.4 seems fine but I wouldn't drop 3.5 just yet (maybe in ccdproc 2.1 or so).
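A quick illustration of the suggested workaround (the file name below is hypothetical):

from pathlib import Path
from astropy.nddata import CCDData

path = Path('data') / 'image_0.fit'
# Some of the code paths involved expect a plain string, so convert the Path.
ccd = CCDData.read(str(path), unit='adu')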

MSeifert04 (Contributor) commented:

I'd be hesitant to draw any performance profiling conclusions from this (I don't know how the memory profiling itself affects performance) but I was a little surprised too.

Is it possible that it hasn't picked up the current state but some other installation instead? I just had a look at #632, and the performance difference should be large enough to be clearly visible here as well (or I'm missing something obvious).

mwcraig (Member Author) commented Aug 18, 2018

Is it possible that it hasn't picked up the current state but some other installation instead?

Turns out my script to turn clipping on set all of the sigma clipping parameters but not sigma_clip=True, so clipping wasn't actually done.
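For illustration, this is the kind of call that was intended; the file list is hypothetical and the thresholds are just example values:

from ccdproc import combine

file_list = ['img_0.fit', 'img_1.fit', 'img_2.fit']  # hypothetical inputs
combined = combine(file_list, method='average',
                   sigma_clip=True,             # the flag the script forgot to set
                   sigma_clip_low_thresh=3,
                   sigma_clip_high_thresh=3,
                   mem_limit=1e9)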

A couple of examples with sigma clipping are below; in them, the difference in timing between the numba version and the non-numba version (current master) is clear.

The memory use in both cases is about the same and (thankfully) not that different from the original plots above.

Memory profile, numba branch, sigma clipping on

[Memory profile plot: numba branch (commit 5ec598c), 4 reps, sigma clipping on, 1.0 GB memory limit]

Memory profile, current master (d1b67f0), sigma clipping on

[Memory profile plot: current master (commit d1b67f0), 4 reps, sigma clipping on, 1.0 GB memory limit]

mwcraig (Member Author) commented Aug 18, 2018

@mwcraig Just a thought but could you separate the different parts of the PR into separate PRs? There are essentially 3 things here

I'd rather not... I think the memory test and the change in the memory formula should be together, because the parameters to use in the test depend on what we expect the memory usage to be.

mwcraig (Member Author) commented Aug 18, 2018

I ran the profile below today to try to get a better sense of why the memory estimate calculated in the combine function underestimates actual memory use by so much. The profile below was run with parameters set so that there would be no breaking into smaller chunks. I ran it on the tip of the branch for the open-files PR because that PR eliminated some potential memory overuse.

I used average_combine and no sigma (or other) clipping.

The first thing that jumps out is that Combiner always makes a copy of the data in its __init__. That the original memory use formula in the combine function didn't take that into account is, I think, a bug in that formula.
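To put numbers on that copy (a back-of-the-envelope check, not code from the PR):

import numpy as np
import numpy.ma as ma

n_files, ny, nx = 10, 4000, 4000
stack = ma.masked_all((n_files, ny, nx), dtype=np.float64)
# ~1221 MiB of data plus ~153 MiB of boolean mask, together ~1373 MiB --
# consistent with the total increment at lines 96 and 100 of the profile below.
print(stack.data.nbytes / 1024**2, stack.mask.nbytes / 1024**2)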

It looks like, in the average combine case at least, an additional spike of roughly 50% more memory happens when the actual combination happens (in this case via np.ma.average; in the profile below that is lines 420-422 in average_combine, where memory use increases then drops).

Not sure why that uses a temporary ~50% more.

Will post a profile of median_combine in a moment.

$ python -m memory_profiler run_for_memory_profile.py 10 --memory-limit 2000000000 --size 4000 -c 'average' --generate-files

Filename: /Users/mattcraig/development/ccdproc/ccdproc/combiner.py

Line #    Mem usage    Increment   Line Contents
================================================
    59 1007.816 MiB 1007.816 MiB       @profile
    60                                 def __init__(self, ccd_list, dtype=None):
    61 1007.816 MiB    0.000 MiB           if ccd_list is None:
    62                                         raise TypeError("ccd_list should be a list of CCDData objects.")
    63
    64 1007.816 MiB    0.000 MiB           if dtype is None:
    65                                         dtype = np.float64
    66
    67 1007.816 MiB    0.000 MiB           default_shape = None
    68 1007.816 MiB    0.000 MiB           default_unit = None
    69 1007.816 MiB    0.000 MiB           for ccd in ccd_list:
    70                                         # raise an error if the objects aren't CCDData objects
    71 1007.816 MiB    0.000 MiB               if not isinstance(ccd, CCDData):
    72                                             raise TypeError(
    73                                                 "ccd_list should only contain CCDData objects.")
    74
    75                                         # raise an error if the shape is different
    76 1007.816 MiB    0.000 MiB               if default_shape is None:
    77 1007.816 MiB    0.000 MiB                   default_shape = ccd.shape
    78                                         else:
    79 1007.816 MiB    0.000 MiB                   if not (default_shape == ccd.shape):
    80                                                 raise TypeError("CCDData objects are not the same size.")
    81
    82                                         # raise an error if the units are different
    83 1007.816 MiB    0.000 MiB               if default_unit is None:
    84 1007.816 MiB    0.000 MiB                   default_unit = ccd.unit
    85                                         else:
    86 1007.816 MiB    0.000 MiB                   if not (default_unit == ccd.unit):
    87                                                 raise TypeError("CCDData objects don't have the same unit.")
    88
    89 1007.816 MiB    0.000 MiB           self.ccd_list = ccd_list
    90 1007.816 MiB    0.000 MiB           self.unit = default_unit
    91 1007.816 MiB    0.000 MiB           self.weights = None
    92 1007.816 MiB    0.000 MiB           self._dtype = dtype
    93
    94                                     # set up the data array
    95 1007.816 MiB    0.000 MiB           new_shape = (len(ccd_list),) + default_shape
    96 1160.406 MiB  152.590 MiB           self.data_arr = ma.masked_all(new_shape, dtype=dtype)
    97
    98                                     # populate self.data_arr
    99 2381.109 MiB    0.000 MiB           for i, ccd in enumerate(ccd_list):
   100 2381.109 MiB 1220.703 MiB               self.data_arr[i] = ccd.data
   101 2381.109 MiB    0.000 MiB               if ccd.mask is not None:
   102                                             self.data_arr.mask[i] = ccd.mask
   103                                         else:
   104 2381.109 MiB    0.000 MiB                   self.data_arr.mask[i] = ma.zeros(default_shape)
   105
   106                                     # Must be after self.data_arr is defined because it checks the
   107                                     # length of the data array.
   108 2381.109 MiB    0.000 MiB           self.scaling = None


Filename: /Users/mattcraig/development/ccdproc/ccdproc/combiner.py

Line #    Mem usage    Increment   Line Contents
================================================
   380 2381.109 MiB 2381.109 MiB       @profile
   381                                 def average_combine(self, scale_func=ma.average, scale_to=None,
   382                                                     uncertainty_func=ma.std):
   383                                     """
   384                                     Average combine together a set of arrays.
   385
   386                                     A `~astropy.nddata.CCDData` object is returned with the data property
   387                                     set to the average of the arrays. If the data was masked or any
   388                                     data have been rejected, those pixels will not be included in the
   389                                     average. A mask will be returned, and if a pixel has been
   390                                     rejected in all images, it will be masked. The uncertainty of
   391                                     the combined image is set by the standard deviation of the input
   392                                     images.
   393
   394                                     Parameters
   395                                     ----------
   396                                     scale_func : function, optional
   397                                         Function to calculate the average. Defaults to
   398                                         `numpy.ma.average`.
   399
   400                                     scale_to : float or None, optional
   401                                         Scaling factor used in the average combined image. If given,
   402                                         it overrides `scaling`. Defaults to ``None``.
   403
   404                                     uncertainty_func : function, optional
   405                                         Function to calculate uncertainty. Defaults to `numpy.ma.std`.
   406
   407                                     Returns
   408                                     -------
   409                                     combined_image: `~astropy.nddata.CCDData`
   410                                         CCDData object based on the combined input of CCDData objects.
   411                                     """
   412 2381.109 MiB    0.000 MiB           if scale_to is not None:
   413                                         scalings = scale_to
   414 2381.109 MiB    0.000 MiB           elif self.scaling is not None:
   415                                         scalings = self.scaling
   416                                     else:
   417 2381.109 MiB    0.000 MiB               scalings = 1.0
   418
   419                                     # set up the data
   420 3755.266 MiB 1374.156 MiB           data, wei = scale_func(scalings * self.data_arr,
   421 3755.266 MiB    0.000 MiB                                  axis=0, weights=self.weights,
   422 3053.598 MiB -701.668 MiB                                  returned=True)
   423
   424                                     # set up the mask
   425 3053.598 MiB    0.000 MiB           masked_values = self.data_arr.mask.sum(axis=0)
   426 3053.602 MiB    0.004 MiB           mask = (masked_values == len(self.data_arr))
   427
   428                                     # set up the deviation
   429 3572.516 MiB  518.914 MiB           uncertainty = uncertainty_func(self.data_arr, axis=0)
   430                                     # Divide uncertainty by the number of pixel (#309)
   431 3572.516 MiB    0.000 MiB           uncertainty /= np.sqrt(len(self.data_arr) - masked_values)
   432                                     # Convert uncertainty to plain numpy array (#351)
   433 3572.516 MiB    0.000 MiB           uncertainty = np.asarray(uncertainty)
   434
   435                                     # create the combined image with a dtype that matches the combiner
   436 3572.516 MiB    0.000 MiB           combined_image = CCDData(np.asarray(data.data, dtype=self.dtype),
   437 3572.516 MiB    0.000 MiB                                    mask=mask, unit=self.unit,
   438 3572.516 MiB    0.000 MiB                                    uncertainty=StdDevUncertainty(uncertainty))
   439
   440                                     # update the meta data
   441 3572.516 MiB    0.000 MiB           combined_image.meta['NCOMBINE'] = len(self.data_arr)
   442
   443                                     # return the combined image
   444 3572.516 MiB    0.000 MiB           return combined_image

mwcraig (Member Author) commented Aug 18, 2018

The median profile confuses me. In the time-based profiles graphed above, median clearly peaks out at higher memory than average or sum. The profile below uses less memory (peak) than the profile of average above. 🙄

$ python -m memory_profiler run_for_memory_profile.py 10 --memory-limit 2000000000 --size 4000 -c 'median' --generate-files

Filename: /Users/mattcraig/development/ccdproc/ccdproc/combiner.py

Line #    Mem usage    Increment   Line Contents
================================================
    59 1007.797 MiB 1007.797 MiB       @profile
    60                                 def __init__(self, ccd_list, dtype=None):
    61 1007.797 MiB    0.000 MiB           if ccd_list is None:
    62                                         raise TypeError("ccd_list should be a list of CCDData objects.")
    63
    64 1007.797 MiB    0.000 MiB           if dtype is None:
    65                                         dtype = np.float64
    66
    67 1007.797 MiB    0.000 MiB           default_shape = None
    68 1007.797 MiB    0.000 MiB           default_unit = None
    69 1007.797 MiB    0.000 MiB           for ccd in ccd_list:
    70                                         # raise an error if the objects aren't CCDData objects
    71 1007.797 MiB    0.000 MiB               if not isinstance(ccd, CCDData):
    72                                             raise TypeError(
    73                                                 "ccd_list should only contain CCDData objects.")
    74
    75                                         # raise an error if the shape is different
    76 1007.797 MiB    0.000 MiB               if default_shape is None:
    77 1007.797 MiB    0.000 MiB                   default_shape = ccd.shape
    78                                         else:
    79 1007.797 MiB    0.000 MiB                   if not (default_shape == ccd.shape):
    80                                                 raise TypeError("CCDData objects are not the same size.")
    81
    82                                         # raise an error if the units are different
    83 1007.797 MiB    0.000 MiB               if default_unit is None:
    84 1007.797 MiB    0.000 MiB                   default_unit = ccd.unit
    85                                         else:
    86 1007.797 MiB    0.000 MiB                   if not (default_unit == ccd.unit):
    87                                                 raise TypeError("CCDData objects don't have the same unit.")
    88
    89 1007.797 MiB    0.000 MiB           self.ccd_list = ccd_list
    90 1007.797 MiB    0.000 MiB           self.unit = default_unit
    91 1007.797 MiB    0.000 MiB           self.weights = None
    92 1007.797 MiB    0.000 MiB           self._dtype = dtype
    93
    94                                     # set up the data array
    95 1007.797 MiB    0.000 MiB           new_shape = (len(ccd_list),) + default_shape
    96 1160.387 MiB  152.590 MiB           self.data_arr = ma.masked_all(new_shape, dtype=dtype)
    97
    98                                     # populate self.data_arr
    99 2381.090 MiB    0.000 MiB           for i, ccd in enumerate(ccd_list):
   100 2381.090 MiB 1220.703 MiB               self.data_arr[i] = ccd.data
   101 2381.090 MiB    0.000 MiB               if ccd.mask is not None:
   102                                             self.data_arr.mask[i] = ccd.mask
   103                                         else:
   104 2381.090 MiB    0.000 MiB                   self.data_arr.mask[i] = ma.zeros(default_shape)
   105
   106                                     # Must be after self.data_arr is defined because it checks the
   107                                     # length of the data array.
   108 2381.090 MiB    0.000 MiB           self.scaling = None


Filename: /Users/mattcraig/development/ccdproc/ccdproc/combiner.py

Line #    Mem usage    Increment   Line Contents
================================================
   306 2381.090 MiB 2381.090 MiB       @profile
   307                                 # set up the combining algorithms
   308                                 def median_combine(self, median_func=ma.median, scale_to=None,
   309                                                    uncertainty_func=sigma_func):
   310                                     """
   311                                     Median combine a set of arrays.
   312
   313                                     A `~astropy.nddata.CCDData` object is returned with the data property set to
   314                                     the median of the arrays. If the data was masked or any data have been
   315                                     rejected, those pixels will not be included in the median. A mask will
   316                                     be returned, and if a pixel has been rejected in all images, it will be
   317                                     masked. The uncertainty of the combined image is set by 1.4826 times
   318                                     the median absolute deviation of all input images.
   319
   320                                     Parameters
   321                                     ----------
   322                                     median_func : function, optional
   323                                         Function that calculates median of a `numpy.ma.MaskedArray`.
   324                                         Default is `numpy.ma.median`.
   325
   326                                     scale_to : float or None, optional
   327                                         Scaling factor used in the average combined image. If given,
   328                                         it overrides `scaling`.
   329                                         Defaults to None.
   330
   331                                     uncertainty_func : function, optional
   332                                         Function to calculate uncertainty.
   333                                         Defaults is `~ccdproc.sigma_func`.
   334
   335                                     Returns
   336                                     -------
   337                                     combined_image: `~astropy.nddata.CCDData`
   338                                         CCDData object based on the combined input of CCDData objects.
   339
   340                                     Warnings
   341                                     --------
   342                                     The uncertainty currently calculated using the median absolute
   343                                     deviation does not account for rejected pixels.
   344                                     """
   345 2381.090 MiB    0.000 MiB           if scale_to is not None:
   346                                         scalings = scale_to
   347 2381.090 MiB    0.000 MiB           elif self.scaling is not None:
   348                                         scalings = self.scaling
   349                                     else:
   350 2381.090 MiB    0.000 MiB               scalings = 1.0
   351
   352                                     # set the data
   353 2855.453 MiB  474.363 MiB           data = median_func(scalings * self.data_arr, axis=0)
   354
   355                                     # set the mask
   356 2855.453 MiB    0.000 MiB           masked_values = self.data_arr.mask.sum(axis=0)
   357 2855.453 MiB    0.000 MiB           mask = (masked_values == len(self.data_arr))
   358
   359                                     # set the uncertainty
   360 3251.941 MiB  396.488 MiB           uncertainty = uncertainty_func(self.data_arr, axis=0)
   361                                     # Divide uncertainty by the number of pixel (#309)
   362 3251.961 MiB    0.020 MiB           uncertainty /= np.sqrt(len(self.data_arr) - masked_values)
   363                                     # Convert uncertainty to plain numpy array (#351)
   364                                     # There is no need to care about potential masks because the
   365                                     # uncertainty was calculated based on the data so potential masked
   366                                     # elements are also masked in the data. No need to keep two identical
   367                                     # masks.
   368 3251.961 MiB    0.000 MiB           uncertainty = np.asarray(uncertainty)
   369
   370                                     # create the combined image with a dtype matching the combiner
   371 3251.961 MiB    0.000 MiB           combined_image = CCDData(np.asarray(data.data, dtype=self.dtype),
   372 3251.961 MiB    0.000 MiB                                    mask=mask, unit=self.unit,
   373 3251.961 MiB    0.000 MiB                                    uncertainty=StdDevUncertainty(uncertainty))
   374
   375                                     # update the meta data
   376 3251.961 MiB    0.000 MiB           combined_image.meta['NCOMBINE'] = len(self.data_arr)
   377
   378                                     # return the combined image
   379 3251.961 MiB    0.000 MiB           return combined_image

crawfordsm (Member) commented:

If memory_profiler is not installed, these tests should be skipped

"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"\n",
"from run_for_memory_profile import run_with_limit, generate_fits_files\n",
mwcraig (Member Author) replied:

Raise an obvious exception in case memory_profiler is not installed.
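A sketch of the two options (skip in the tests, fail loudly in the profiling script); neither is necessarily verbatim what the PR ended up with:

# In the test module: skip memory tests when memory_profiler is missing.
import pytest
memory_profiler = pytest.importorskip('memory_profiler')

# In the standalone profiling script: raise an obvious error instead.
try:
    from memory_profiler import memory_usage
except ImportError:
    raise ImportError('memory_profiler is required to run this profiling script')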

(Commit message: "because apparently numpy 1.13 can be a little more memory-intensive than others")

crawfordsm merged commit b605b17 into astropy:master on Jul 27, 2019
Linked issue: Memory use in the combine function exceeds (by a factor of 2-4 or more) the memory limit passed in