Change formula for chunking of files in combine #642

Merged: 19 commits merged into astropy:master on Jul 27, 2019

Conversation

mwcraig (Member) commented Aug 14, 2018

This PR fixes #638 by changing the formula used to calculate the number of chunks in combiner.combine. While the change looks a bit arbitrary (adding a "memory factor" of 2 or 3, depending on the combine method, and another factor of 1.5), it is based on memory profiling of several cases, described in the comment below. I believe the pattern of memory use by combine is consistent enough to justify the insertion of these factors.
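For concreteness, here is a minimal sketch of the kind of chunk-count calculation described above; this is illustrative only, not the exact code in combiner.combine, and the function and argument names are made up for the example:

import math

def estimate_num_chunks(single_frame_bytes, n_files, mem_limit, combine_method):
    # Profiling (below) showed median combination peaking near 3x the naive
    # stack size and average/sum near 2x, hence the method-dependent factor.
    memory_factor = 3 if combine_method == 'median' else 2
    # Additional ~50% head-room for temporaries created during combination.
    total_needed = 1.5 * memory_factor * single_frame_bytes * n_files
    return max(1, math.ceil(total_needed / mem_limit))

# e.g. ten 4000 x 4000 float64 frames with a roughly 1 GB limit
print(estimate_num_chunks(4000 * 4000 * 8, 10, 1_000_000_000, 'median'))  # -> 6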

A few things remain to be done to complete this PR, I think:

  • Add a test for memory use that goes too far beyond what the user has set.
  • Add a test for memory use that falls too far below what the user has set. This is intended to remind us to change these factors if future implementations of combination become more memory efficient.
  • Increase current default memory limit (?). Seems unnecessary, since it is currently at 16GB.

Next up, profiling justification for these changes....

mwcraig (Member Author) commented Aug 14, 2018

Memory profiling of combine

Background and tools

Profiling was done on an iMac with sufficient memory (32GB) that these runs were not close to using the available memory. Other tasks were being done on the machine while the tests were performed, so these should not be used to try to examine execution speed.

The package memory_profiler was used to carry out the profiling. Memory use was calculated just prior to performing the combine, and that memory is subtracted from the output of the memory profile.
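As a rough sketch of that measurement approach using memory_profiler's memory_usage function (the combine call here is a stand-in, not the notebook's actual helper):

from memory_profiler import memory_usage
import numpy as np

def run_combine():
    # Stand-in for the ccdproc combine call being profiled.
    stack = np.ones((10, 1000, 1000))
    return stack.mean(axis=0)

# Memory (MiB) just prior to the combine; this baseline is subtracted
# from every sample in the profile.
baseline = memory_usage(-1, interval=0.01, timeout=0.1)[-1]
samples = memory_usage((run_combine, (), {}), interval=0.01)  # 0.01 s sampling
net = [m - baseline for m in samples]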

Prior to the profiling runs shown below, a single run was done because the speed of that first run consistently differed from all subsequent runs in the same notebook. Results from that "warm up" run were discarded.

All runs used as their base the notebook run_profile in commit 8058300.

Numerical parameters and astropy version

All runs except one were done with these parameters:

image_size = 4000   # Square image, so 4000 x 4000
num_files = 10
sampling_interval = 0.01 # sec
memory_limit = 1000000000  # bytes, roughly 1GB

One run increased the memory limit to 2GB.

All runs were done with astropy 3.0.4.

For each commit, all three combine methods (average, median, sum) were tested both with and without sigma clipping.

Summary of commits profiled

Memory profiles were performed for several commits; for each commit there is a notebook that was run to perform the profiling. The table below summarizes the commits tested.

Commit   Description
38d14d2  This PR, BEFORE adding memory factor (effectively current master)
5ec598c  PR #632 -- use numba JIT to speed up median sigma clipping
e73b93e  PR #630 -- Reduce number of open files and del some variables
df233f3  This PR, AFTER adding memory factor but BEFORE additional factor of 1.5
a73e72f  This PR, AFTER modification of memory factor by additional factor of 1.5
a73e72f  As above, but memory limit of 2GB

TL;DR results (red vertical line is memory limit)

[Side-by-side memory profile plots: left, before this PR (commit 38d14d2, 4 reps, clipping off, 1.0 GB limit); right, after this PR (commit a73e72f, 4 reps, clipping off, 1.0 GB limit)]

Profiling of 38d14d2; notebook link

Commit on this PR, before adding memory factor (effectively current master)

Summary: Combination with average or sum routinely uses 2x the memory limit; median uses 3x

Runs with no sigma clipping

[Memory profile plot: commit 38d14d2, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit 38d14d2, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of 5ec598c; notebook link

PR #632 -- use numba JIT to speed up median sigma clipping

Summary: Roughly the same as case above, but somewhat higher memory use in median

Runs with no sigma clipping

[Memory profile plot: commit 5ec598c, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit 5ec598c, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of e73b93e; notebook link

PR #630 -- Reduce number of open files and del some variables

Summary: Roughly the same as first case above, but memory use more frequently drops back to near zero

Runs with no sigma clipping

[Memory profile plot: commit e73b93e, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit e73b93e, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of df233f3; notebook link

This PR, AFTER adding memory factor but BEFORE additional factor of 1.5

Summary: Memory use same for all combine methods, but ~50% higher than limit

Runs with no sigma clipping

[Memory profile plot: commit df233f3, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit df233f3, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of a73e72f; notebook link

This PR, AFTER modification of memory factor by additional factor of 1.5

Summary: All methods use about same memory; while limit is exceeded, it is not by much or for long

Runs with no sigma clipping

[Memory profile plot: commit a73e72f, 4 reps, clipping off, 1.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit a73e72f, 4 reps, "clip on" run (clipping not actually applied), 1.0 GB memory limit]

Profiling of a73e72f, memory limit of 2GB; notebook link

Same as previous case, higher memory limit

Summary: All methods use about same memory; while limit is exceeded, it is not by much or for long

Runs with no sigma clipping

[Memory profile plot: commit a73e72f, 4 reps, clipping off, 2.0 GB memory limit]

Runs with sigma clipping -- WRONG (no sigma clipping was done)

[Memory profile plot: commit a73e72f, 4 reps, "clip on" run (clipping not actually applied), 2.0 GB memory limit]

mwcraig (Member Author) commented Aug 15, 2018

This is ready for a look (I think the tests will pass)

MSeifert04 (Contributor) commented:

Thanks for the profiling. However, it seems a bit odd, at least for the numba solution: I expected significantly lower memory usage (maybe the intermediate boolean arrays aren't a problem after all...) and at least a bit of performance improvement (which I cannot see from the timeline in your measurements).

However I probably won't have time to look at this (and the other PRs) today or tomorrow.

mwcraig (Member Author) commented Aug 16, 2018

I'd be hesitant to draw any performance conclusions from this (I don't know how the memory profiling itself affects performance), but I was a little surprised too. I can try re-running it this weekend....

mwcraig (Member Author) commented Aug 16, 2018

Hmmm, it may be time to update some of the test matrix entries... the failures are because of pathlib.Path vs. string issues for the directory name.

MSeifert04 (Contributor) commented:

@mwcraig Just a thought but could you separate the different parts of the PR into separate PRs? There are essentially 3 things here: the refactor of the "size estimate" (which is very good and straightforward), the "fudge factor" (which I would prefer not to have), and the "memory testing" (which could be very useful but requires a more thorough review).

That's by no means necessary, but it would make reviewing and discussing the different parts just a bit easier.

MSeifert04 (Contributor) commented:

For the Path vs. io thing: Just wrap the Path in str and it should work fine. Dropping Python 3.4 seems fine but I wouldn't drop 3.5 just yet (maybe in ccdproc 2.1 or so).
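A quick illustration of the suggested workaround (the file name below is hypothetical):

from pathlib import Path
from astropy.nddata import CCDData

path = Path('data') / 'image_0.fit'
# Some of the code paths involved expect a plain string, so convert the Path.
ccd = CCDData.read(str(path), unit='adu')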

MSeifert04 (Contributor) commented:

I'd be hesitant to draw any performance profiling conclusions from this (I don't know how the memory profiling itself affects performance) but I was a little surprised too.

Is it possible that it hasn't picked up the current state but some other installation instead? I just had a look at #632, and the performance difference should be large enough to be clearly visible here as well (or I'm missing something obvious).

mwcraig (Member Author) commented Aug 18, 2018

Is it possible that it hasn't picked up the current state but some other installation instead?

Turns out my script to turn clipping on set all of the sigma clipping parameters but not sigma_clip=True, so clipping wasn't actually done.
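For illustration, this is the kind of call that was intended; the file list is hypothetical and the thresholds are just example values:

from ccdproc import combine

file_list = ['img_0.fit', 'img_1.fit', 'img_2.fit']  # hypothetical inputs
combined = combine(file_list, method='average',
                   sigma_clip=True,             # the flag the script forgot to set
                   sigma_clip_low_thresh=3,
                   sigma_clip_high_thresh=3,
                   mem_limit=1e9)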

A couple of examples with sigma clipping are below; in them, the difference in timing between the numba version and the non-numba version (current master) is clear.

The memory use in both cases is about the same and (thankfully) not that different from the original plots above.

Memory profile, numba branch, sigma clipping on

[Memory profile plot: numba branch (commit 5ec598c), 4 reps, sigma clipping on, 1.0 GB memory limit]

Memory profile, current master (d1b67f0), sigma clipping on

[Memory profile plot: current master (commit d1b67f0), 4 reps, sigma clipping on, 1.0 GB memory limit]

mwcraig (Member Author) commented Aug 18, 2018

@mwcraig Just a thought but could you separate the different parts of the PR into separate PRs? There are essentially 3 things here

I'd rather not... I think the memory test and the change in the memory formula should be together, because the parameters to use in the test depend on what we expect the memory usage to be.

mwcraig (Member Author) commented Aug 18, 2018

I ran the profile below today to try to get a better sense of why the memory estimate calculated in the combine function underestimates actual memory use by so much. The profile below was run with parameters set so that there would be no breaking into smaller chunks. I ran it on the tip of the branch for the open-files PR because that PR eliminated some potential memory overuse.

I used average_combine and no sigma (or other) clipping.

The first thing that jumps out is that Combiner always makes a copy of the data in its __init__. That the original memory use formula in the combine function didn't take that into account is, I think, a bug in that formula.
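To put numbers on that copy (a back-of-the-envelope check, not code from the PR):

import numpy as np
import numpy.ma as ma

n_files, ny, nx = 10, 4000, 4000
stack = ma.masked_all((n_files, ny, nx), dtype=np.float64)
# ~1221 MiB of data plus ~153 MiB of boolean mask, together ~1373 MiB --
# consistent with the total increment at lines 96 and 100 of the profile below.
print(stack.data.nbytes / 1024**2, stack.mask.nbytes / 1024**2)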

It looks like, in the average combine case at least, an additional spike of roughly 50% more memory happens when the actual combination happens (in this case via np.ma.average; in the profile below that is lines 420-422 in average_combine, where memory use increases then drops).

Not sure why that uses a temporary ~50% more.

Will post a profile of median_combine in a moment.

$ python -m memory_profiler run_for_memory_profile.py 10 --memory-limit 2000000000 --size 4000 -c 'average' --generate-files

Filename: /Users/mattcraig/development/ccdproc/ccdproc/combiner.py

Line #    Mem usage    Increment   Line Contents
================================================
    59 1007.816 MiB 1007.816 MiB       @profile
    60                                 def __init__(self, ccd_list, dtype=None):
    61 1007.816 MiB    0.000 MiB           if ccd_list is None:
    62                                         raise TypeError("ccd_list should be a list of CCDData objects.")
    63
    64 1007.816 MiB    0.000 MiB           if dtype is None:
    65                                         dtype = np.float64
    66
    67 1007.816 MiB    0.000 MiB           default_shape = None
    68 1007.816 MiB    0.000 MiB           default_unit = None
    69 1007.816 MiB    0.000 MiB           for ccd in ccd_list:
    70                                         # raise an error if the objects aren't CCDData objects
    71 1007.816 MiB    0.000 MiB               if not isinstance(ccd, CCDData):
    72                                             raise TypeError(
    73                                                 "ccd_list should only contain CCDData objects.")
    74
    75                                         # raise an error if the shape is different
    76 1007.816 MiB    0.000 MiB               if default_shape is None:
    77 1007.816 MiB    0.000 MiB                   default_shape = ccd.shape
    78                                         else:
    79 1007.816 MiB    0.000 MiB                   if not (default_shape == ccd.shape):
    80                                                 raise TypeError("CCDData objects are not the same size.")
    81
    82                                         # raise an error if the units are different
    83 1007.816 MiB    0.000 MiB               if default_unit is None:
    84 1007.816 MiB    0.000 MiB                   default_unit = ccd.unit
    85                                         else:
    86 1007.816 MiB    0.000 MiB                   if not (default_unit == ccd.unit):
    87                                                 raise TypeError("CCDData objects don't have the same unit.")
    88
    89 1007.816 MiB    0.000 MiB           self.ccd_list = ccd_list
    90 1007.816 MiB    0.000 MiB           self.unit = default_unit
    91 1007.816 MiB    0.000 MiB           self.weights = None
    92 1007.816 MiB    0.000 MiB           self._dtype = dtype
    93
    94                                     # set up the data array
    95 1007.816 MiB    0.000 MiB           new_shape = (len(ccd_list),) + default_shape
    96 1160.406 MiB  152.590 MiB           self.data_arr = ma.masked_all(new_shape, dtype=dtype)
    97
    98                                     # populate self.data_arr
    99 2381.109 MiB    0.000 MiB           for i, ccd in enumerate(ccd_list):
   100 2381.109 MiB 1220.703 MiB               self.data_arr[i] = ccd.data
   101 2381.109 MiB    0.000 MiB               if ccd.mask is not None:
   102                                             self.data_arr.mask[i] = ccd.mask
   103                                         else:
   104 2381.109 MiB    0.000 MiB                   self.data_arr.mask[i] = ma.zeros(default_shape)
   105
   106                                     # Must be after self.data_arr is defined because it checks the
   107                                     # length of the data array.
   108 2381.109 MiB    0.000 MiB           self.scaling = None


Filename: /Users/mattcraig/development/ccdproc/ccdproc/combiner.py

Line #    Mem usage    Increment   Line Contents
================================================
   380 2381.109 MiB 2381.109 MiB       @profile
   381                                 def average_combine(self, scale_func=ma.average, scale_to=None,
   382                                                     uncertainty_func=ma.std):
   383                                     """
   384                                     Average combine together a set of arrays.
   385
   386                                     A `~astropy.nddata.CCDData` object is returned with the data property
   387                                     set to the average of the arrays. If the data was masked or any
   388                                     data have been rejected, those pixels will not be included in the
   389                                     average. A mask will be returned, and if a pixel has been
   390                                     rejected in all images, it will be masked. The uncertainty of
   391                                     the combined image is set by the standard deviation of the input
   392                                     images.
   393
   394                                     Parameters
   395                                     ----------
   396                                     scale_func : function, optional
   397                                         Function to calculate the average. Defaults to
   398                                         `numpy.ma.average`.
   399
   400                                     scale_to : float or None, optional
   401                                         Scaling factor used in the average combined image. If given,
   402                                         it overrides `scaling`. Defaults to ``None``.
   403
   404                                     uncertainty_func : function, optional
   405                                         Function to calculate uncertainty. Defaults to `numpy.ma.std`.
   406
   407                                     Returns
   408                                     -------
   409                                     combined_image: `~astropy.nddata.CCDData`
   410                                         CCDData object based on the combined input of CCDData objects.
   411                                     """
   412 2381.109 MiB    0.000 MiB           if scale_to is not None:
   413                                         scalings = scale_to
   414 2381.109 MiB    0.000 MiB           elif self.scaling is not None:
   415                                         scalings = self.scaling
   416                                     else:
   417 2381.109 MiB    0.000 MiB               scalings = 1.0
   418
   419                                     # set up the data
   420 3755.266 MiB 1374.156 MiB           data, wei = scale_func(scalings * self.data_arr,
   421 3755.266 MiB    0.000 MiB                                  axis=0, weights=self.weights,
   422 3053.598 MiB -701.668 MiB                                  returned=True)
   423
   424                                     # set up the mask
   425 3053.598 MiB    0.000 MiB           masked_values = self.data_arr.mask.sum(axis=0)
   426 3053.602 MiB    0.004 MiB           mask = (masked_values == len(self.data_arr))
   427
   428                                     # set up the deviation
   429 3572.516 MiB  518.914 MiB           uncertainty = uncertainty_func(self.data_arr, axis=0)
   430                                     # Divide uncertainty by the number of pixel (#309)
   431 3572.516 MiB    0.000 MiB           uncertainty /= np.sqrt(len(self.data_arr) - masked_values)
   432                                     # Convert uncertainty to plain numpy array (#351)
   433 3572.516 MiB    0.000 MiB           uncertainty = np.asarray(uncertainty)
   434
   435                                     # create the combined image with a dtype that matches the combiner
   436 3572.516 MiB    0.000 MiB           combined_image = CCDData(np.asarray(data.data, dtype=self.dtype),
   437 3572.516 MiB    0.000 MiB                                    mask=mask, unit=self.unit,
   438 3572.516 MiB    0.000 MiB                                    uncertainty=StdDevUncertainty(uncertainty))
   439
   440                                     # update the meta data
   441 3572.516 MiB    0.000 MiB           combined_image.meta['NCOMBINE'] = len(self.data_arr)
   442
   443                                     # return the combined image
   444 3572.516 MiB    0.000 MiB           return combined_image

mwcraig (Member Author) commented Aug 18, 2018

The median profile confuses me. In the time-based profiles graphed above, median clearly peaks out at higher memory than average or sum. The profile below uses less memory (peak) than the profile of average above. 🙄

$ python -m memory_profiler run_for_memory_profile.py 10 --memory-limit 2000000000 --size 4000 -c 'median' --generate-files

Filename: /Users/mattcraig/development/ccdproc/ccdproc/combiner.py

Line #    Mem usage    Increment   Line Contents
================================================
    59 1007.797 MiB 1007.797 MiB       @profile
    60                                 def __init__(self, ccd_list, dtype=None):
    61 1007.797 MiB    0.000 MiB           if ccd_list is None:
    62                                         raise TypeError("ccd_list should be a list of CCDData objects.")
    63
    64 1007.797 MiB    0.000 MiB           if dtype is None:
    65                                         dtype = np.float64
    66
    67 1007.797 MiB    0.000 MiB           default_shape = None
    68 1007.797 MiB    0.000 MiB           default_unit = None
    69 1007.797 MiB    0.000 MiB           for ccd in ccd_list:
    70                                         # raise an error if the objects aren't CCDData objects
    71 1007.797 MiB    0.000 MiB               if not isinstance(ccd, CCDData):
    72                                             raise TypeError(
    73                                                 "ccd_list should only contain CCDData objects.")
    74
    75                                         # raise an error if the shape is different
    76 1007.797 MiB    0.000 MiB               if default_shape is None:
    77 1007.797 MiB    0.000 MiB                   default_shape = ccd.shape
    78                                         else:
    79 1007.797 MiB    0.000 MiB                   if not (default_shape == ccd.shape):
    80                                                 raise TypeError("CCDData objects are not the same size.")
    81
    82                                         # raise an error if the units are different
    83 1007.797 MiB    0.000 MiB               if default_unit is None:
    84 1007.797 MiB    0.000 MiB                   default_unit = ccd.unit
    85                                         else:
    86 1007.797 MiB    0.000 MiB                   if not (default_unit == ccd.unit):
    87                                                 raise TypeError("CCDData objects don't have the same unit.")
    88
    89 1007.797 MiB    0.000 MiB           self.ccd_list = ccd_list
    90 1007.797 MiB    0.000 MiB           self.unit = default_unit
    91 1007.797 MiB    0.000 MiB           self.weights = None
    92 1007.797 MiB    0.000 MiB           self._dtype = dtype
    93
    94                                     # set up the data array
    95 1007.797 MiB    0.000 MiB           new_shape = (len(ccd_list),) + default_shape
    96 1160.387 MiB  152.590 MiB           self.data_arr = ma.masked_all(new_shape, dtype=dtype)
    97
    98                                     # populate self.data_arr
    99 2381.090 MiB    0.000 MiB           for i, ccd in enumerate(ccd_list):
   100 2381.090 MiB 1220.703 MiB               self.data_arr[i] = ccd.data
   101 2381.090 MiB    0.000 MiB               if ccd.mask is not None:
   102                                             self.data_arr.mask[i] = ccd.mask
   103                                         else:
   104 2381.090 MiB    0.000 MiB                   self.data_arr.mask[i] = ma.zeros(default_shape)
   105
   106                                     # Must be after self.data_arr is defined because it checks the
   107                                     # length of the data array.
   108 2381.090 MiB    0.000 MiB           self.scaling = None


Filename: /Users/mattcraig/development/ccdproc/ccdproc/combiner.py

Line #    Mem usage    Increment   Line Contents
================================================
   306 2381.090 MiB 2381.090 MiB       @profile
   307                                 # set up the combining algorithms
   308                                 def median_combine(self, median_func=ma.median, scale_to=None,
   309                                                    uncertainty_func=sigma_func):
   310                                     """
   311                                     Median combine a set of arrays.
   312
   313                                     A `~astropy.nddata.CCDData` object is returned with the data property set to
   314                                     the median of the arrays. If the data was masked or any data have been
   315                                     rejected, those pixels will not be included in the median. A mask will
   316                                     be returned, and if a pixel has been rejected in all images, it will be
   317                                     masked. The uncertainty of the combined image is set by 1.4826 times
   318                                     the median absolute deviation of all input images.
   319
   320                                     Parameters
   321                                     ----------
   322                                     median_func : function, optional
   323                                         Function that calculates median of a `numpy.ma.MaskedArray`.
   324                                         Default is `numpy.ma.median`.
   325
   326                                     scale_to : float or None, optional
   327                                         Scaling factor used in the average combined image. If given,
   328                                         it overrides `scaling`.
   329                                         Defaults to None.
   330
   331                                     uncertainty_func : function, optional
   332                                         Function to calculate uncertainty.
   333                                         Defaults is `~ccdproc.sigma_func`.
   334
   335                                     Returns
   336                                     -------
   337                                     combined_image: `~astropy.nddata.CCDData`
   338                                         CCDData object based on the combined input of CCDData objects.
   339
   340                                     Warnings
   341                                     --------
   342                                     The uncertainty currently calculated using the median absolute
   343                                     deviation does not account for rejected pixels.
   344                                     """
   345 2381.090 MiB    0.000 MiB           if scale_to is not None:
   346                                         scalings = scale_to
   347 2381.090 MiB    0.000 MiB           elif self.scaling is not None:
   348                                         scalings = self.scaling
   349                                     else:
   350 2381.090 MiB    0.000 MiB               scalings = 1.0
   351
   352                                     # set the data
   353 2855.453 MiB  474.363 MiB           data = median_func(scalings * self.data_arr, axis=0)
   354
   355                                     # set the mask
   356 2855.453 MiB    0.000 MiB           masked_values = self.data_arr.mask.sum(axis=0)
   357 2855.453 MiB    0.000 MiB           mask = (masked_values == len(self.data_arr))
   358
   359                                     # set the uncertainty
   360 3251.941 MiB  396.488 MiB           uncertainty = uncertainty_func(self.data_arr, axis=0)
   361                                     # Divide uncertainty by the number of pixel (#309)
   362 3251.961 MiB    0.020 MiB           uncertainty /= np.sqrt(len(self.data_arr) - masked_values)
   363                                     # Convert uncertainty to plain numpy array (#351)
   364                                     # There is no need to care about potential masks because the
   365                                     # uncertainty was calculated based on the data so potential masked
   366                                     # elements are also masked in the data. No need to keep two identical
   367                                     # masks.
   368 3251.961 MiB    0.000 MiB           uncertainty = np.asarray(uncertainty)
   369
   370                                     # create the combined image with a dtype matching the combiner
   371 3251.961 MiB    0.000 MiB           combined_image = CCDData(np.asarray(data.data, dtype=self.dtype),
   372 3251.961 MiB    0.000 MiB                                    mask=mask, unit=self.unit,
   373 3251.961 MiB    0.000 MiB                                    uncertainty=StdDevUncertainty(uncertainty))
   374
   375                                     # update the meta data
   376 3251.961 MiB    0.000 MiB           combined_image.meta['NCOMBINE'] = len(self.data_arr)
   377
   378                                     # return the combined image
   379 3251.961 MiB    0.000 MiB           return combined_image

crawfordsm (Member) commented:

If memory_profiler is not installed, these tests should be skipped

"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"\n",
"from run_for_memory_profile import run_with_limit, generate_fits_files\n",
mwcraig (Member Author) replied:

Raise an obvious exception in case memory_profiler is not installed.
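A sketch of the two options (skip in the tests, fail loudly in the profiling script); neither is necessarily verbatim what the PR ended up with:

# In the test module: skip memory tests when memory_profiler is missing.
import pytest
memory_profiler = pytest.importorskip('memory_profiler')

# In the standalone profiling script: raise an obvious error instead.
try:
    from memory_profiler import memory_usage
except ImportError:
    raise ImportError('memory_profiler is required to run this profiling script')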

(Commit message: "because apparently numpy 1.13 can be a little more memory-intensive than others")

crawfordsm merged commit b605b17 into astropy:master on Jul 27, 2019
Linked issue: Memory use in the combine function exceeds (by a factor of 2-4 or more) the memory limit passed in