Combiner_fits for large number of images with limited memory #218

indiajoe · 2015-06-04T20:29:04Z

As discussed in #176 , Combiner_fits is a class which can be used to combine large number of images using all the features of Combiner (under the constrain of an user defined memory limit).

…ccd object

…g onces from Combiner

indiajoe · 2015-06-04T20:43:50Z

ccdproc/combiner.py

+
+    **kwargs: Other keyword arguments for CCD Object's fits reader.
+    """
+    def __init__(self, img_list, mem_limit=16e9, output_fits=None, **kwargs):


Should we ask user to input mem_limit in in MB or GB instead of bytes?

crawfordsm · 2015-06-04T21:22:22Z

Thanks for the contribution @indiajoe. Sorry for the delay in feeback on this. I think in my head I had more of an idea of a convenience function rather than another class in order to do this so it might take a little time for me to process what you have done here.

One of my first thought that I had thought was that it should inherit from Combiner and overwrite the functions that it uses and that it should also use the property decorator rather than the set_method naming scheme. However, before you proceed further on this, let's figure out what would work best.

indiajoe · 2015-06-05T01:26:39Z

@crawfordsm Thanks for the feedback. A separate class was also my least favourite choice. Following are the reasons in the end I thought a separate class might be needed.

Combiner instance has many steps like sigma_clipping which are invoked before calling the actual combine method. If we ask user to provide all the parameters for all the steps in a single function call we will have to keep a long list of keywords unique.
( I could implement this too so that we can compare which will be better.)
If we inherit the class, we will have to override pretty much every method and data attributes of Combiner, mainly because we need to invoke multiple instances of the Combiner object one after other,
Keeping in view that Combiner class might get additional attributes in future, I was trying to keep the design of Combiner_fits in such a way that requires only minor additions.

crawfordsm · 2015-06-05T21:24:34Z

The more I think about this I think that a function would be a better way to go. It would probably provide more convenience to the user and could be easily replaced if/when a cython implementation of combine can be written.

I don't think it would be too much work as most of it you have done already in run_on_all_tiles. I think it would primarily be taken that code and then most of the work would be in writing the doc_string.

However, it would be nice if the function could also take in a list of CCDData objects as well as a list of filenames. And one more thing to consider in the estimate of the memory issue, is that if there are variance frames and bad pixel maps, they will further limit the amount of memory available.

Let me know if it would be too much work to change this over to a function or if you have any questions, but I think the work that you have done so far is very helpful.

indiajoe · 2015-06-05T21:47:24Z

@crawfordsm Sure, I shall convert them into a function.

mwcraig · 2015-07-16T19:20:12Z

@indiajoe -- giving this a friendly bump to see if you have had time to convert to a function.

indiajoe · 2015-07-16T19:46:53Z

@mwcraig Hi, Really sorry for the delay, got tied up with a paper submission. give me one more week, I shall do it and send pull request.

mwcraig · 2015-07-16T19:57:52Z

No worries -- my PR #217 for just as long!

mwcraig · 2015-08-04T18:02:53Z

@indiajoe -- another friendly bump, @crawfordsm and I were just looking through pull requests and are eager to add this!

indiajoe · 2015-08-06T02:51:29Z

@mwcraig @crawfordsm Hi, sorry for the delay. I have just now converted the class into a convenience function and pushed to the branch. This is an untested code. Aim is to make sure we all agree with the general structure of the function. My only worry is that the long list of arguments will have to keep growing as Combiner class gets new features in future.

mwcraig · 2015-08-06T17:28:22Z

@indiajoe -- thanks!

I see what you mean about the long list of keywords, that seems fragile.

Keeping in mind that I haven't really thought this through, what if we modified the current Combiner class so that it has one additional option that specifies whether it tiles or not? That would avoid the issues that came up when subclassing (i.e. having to override everything) and the issue of a long list of args, but presumably means changing nearly every method in Combiner.

I'll defer to you and @crawfordsm on a final decision here since you have both thought about this much more than I have!

indiajoe · 2015-08-07T12:45:16Z

@mwcraig If we want to modify Combiner itself. It will be some substantial change. For instance, Combiner has a method which clips certain pixels by masking them. Since we cannot keep all the images in memory we will have to write that mask to hardisk and then load it again when we run the actual combine function. This can make the entire combining process really slow.
The other option is to change in Combiner to call all preprocessing functions like the clipping function etc. only at the last moment when the data is being combined.

crawfordsm · 2015-08-07T15:45:58Z

Longer term, I think we do want to re-write the combiner class at a lower level so that it does handle the rejection and combining at a pixel to pixel level. Until that happens though, I think this is a pretty good compromise. I don't see the parameter list growing significantly until that time, so I think it is okay to keep the long list of keywords. I'll have some comments on the PR this weekend hopefully.

indiajoe · 2015-08-08T11:53:08Z

@crawfordsm Sure. That sounds good.
I would like to add two more extra features to combiner which will add three more arguments after this pull request is over.

zero : This is equivalent to zero shift option in IRAF imcombine. It is a very useful and frequently used feature while median combining NIR images.
statsection : Another very useful feature in IRAF imcombine, since in most cases we want only a section of good part of detector to be used for calculating stats for scaling and zero shifting.
I shall open separate pull request for these later.

crawfordsm · 2015-08-18T11:30:55Z

ccdproc/combiner.py

+    #determine the number of chunks to split the images into
+    no_chunks = int((size_of_an_img*no_of_img)/mem_limit)+1
+    print('Spliting each image into {1} chunks to limit memory usage to {0} bytes.'.format(self.mem_limit,no_chunks))
+    Xs, Ys = ccd_dummy.data.shape


change these to lower case xs, ys. Same goes for Xstep, Ystep below

crawfordsm · 2015-08-18T11:35:38Z

@indiajoe Overall the code looks pretty good. I have a few small minor comments that just changed some of the names to match up with our style requirements. The other request would be to set it so that it could be passed a list of CCDData objects to eliminate the step of reading in new files if the files are already in memory.

crawfordsm · 2015-08-18T11:36:52Z

Of course, now that it is at a good spot, the other thing that I was hoping to ask you to add would be some unit tests. At the very least, anything that raises an error should have a test. I don't think you need to add tests for the combination as those are tested at a lower level.

indiajoe · 2015-08-18T16:34:30Z

@crawfordsm I was wondering about how to make unit test for this function. I should look for some way to check memory usage during this function call, but I should not use external dependencies like the memory_profiler module. Right?

crawfordsm · 2015-08-18T17:10:43Z

First, I'd just add tests that check the inputs and the errors that are raised. Then it would probably be good to add a test with combining a small data set with and without tiling to confirm they produce the same results.

I'm not sure if we do want to add something that checks the memory of a specific system as they might run into problems with different OSes and different memory limits, which might be beyond the scope of what we can test for.

indiajoe · 2015-09-05T23:27:11Z

@crawfordsm Really sorry for the delay to write tests. This weekend I hope to finish all the test writing.
I am getting an IOError while running the tests. I added the fits filename in ccdproc/tests/setup_package.py . Not sure I understand where I'm making mistake.

mwcraig · 2015-09-06T02:04:17Z

@indiajoe -- the FITS file should already be included (it is in ccdproc/data). The way to access it from the test suite is like this:

# Add to imports
from astropy.utils.data import get_pkg_data_filename

# In your test:
def test_my_great_test():
     path_to_file = get_pkg_data_filename('../data/a8280271.fits')

Should also note this is not at all obvious...took me ~45 min poking through the astropy source code to get this right (I hope).

…rtion

indiajoe · 2015-09-06T14:56:09Z

@mwcraig Thanks a lot for showing how to add the path to file. That part is working fine now.

While testing I found something else strange.
Average of an array while using np.ma.average() is significantly different from .mean()!
For example incase of the fits file in data/
ccd.data.mean() and np.ma.average(ccd.data) gives 296.0 and 296.000315729 respectively.

I suspect this is the reason why combined array are not matching even up to 4th decimal point on Travis CL. (On my machine, It matches fine up to 4 decimal places!).
I need to trouble shoot, exactly which stage we are loosing precision.

Another query, are we going to keep all the test data in ccdproc/data/ ?
In other astropy packages, I thought the test data are usually kept in a files/ subdirectory of tests/.

mwcraig · 2015-09-08T01:45:26Z

@indiajoe -- interesting, no idea why there is precision loss. I wonder if maybe masking is being done in one case but not another?

I think (though I'd have to double check) that it is the norm to put files in a data subdirectory of the directory containing the test, so our data directory should be tests, not in the top level. Feel free to move it as part of this pr!

mwcraig · 2015-09-08T18:50:25Z

@indiajoe -- @crawfordsm and I talked today and we think it makes the most sense to mark these tests as xfail on numpy versions less than 1.9, which are the only places it is failing.

You can take a look here to see how to do that in pytest: https://pytest.org/latest/skipping.html

There some significant changes to masked arrays going from numpy 1.8 to 1.9, and we already recommend that people use 1.9 if at all possible.

indiajoe · 2015-09-08T22:31:34Z

I have added xfail and also moved data/ to tests/data/

mwcraig · 2015-09-09T01:46:20Z

@crawfordsm -- I'll leave this one to you to merge...I'm fine with it, but you have more experience with this part of the code.

Combiner_fits for large number of images with limited memory

indiajoe added 6 commits June 3, 2015 05:03

First outline for the combiner helper function . Note: Untested.

060f046

Write mask and uncertanity only if present; Also return the combined …

3938417

…ccd object

Print the number of chunks created

2dc0ca3

Replace the doc strings of methods in Combiner_fits with correspondin…

776a7f4

…g onces from Combiner

Scale callable functions are to be applied on entire image

8141929

Removing inline comments to be consistent with coding style

3fed011

indiajoe reviewed Jun 4, 2015
View reviewed changes

Convert the memory conscious combiner class into a convenience function

d1928f2

indiajoe added 2 commits August 8, 2015 16:45

Neater sanity check at the beginning

2520554

adding size of uncertanity, mask and flag into total image size

d344f96

crawfordsm reviewed Aug 18, 2015
View reviewed changes

Convert variables with upper case name to lower case name

abdd33a

indiajoe added 2 commits September 6, 2015 04:47

Adding few tests for the new combine function

fcd51b0

Sample fits file for the tests

de9f62d

indiajoe added 2 commits September 6, 2015 19:59

Add array size only when array is not None ; also removed two typo

1ee0a24

Corrected test calls with proper filename call, as well as array asse…

6b52cfb

…rtion

indiajoe added 3 commits September 9, 2015 03:20

Added xfail to expect errors on numpy version < 1.9

8ef9a1a

Moving data/ to tests/data

8ce0552

Used astropy.utils.compat.NUMPY_LT_1_9 to control xfail

5d2c715

crawfordsm added a commit that referenced this pull request Sep 9, 2015

Merge pull request #218 from indiajoe/CombinerForLargeImages

5481bb6

Combiner_fits for large number of images with limited memory

crawfordsm merged commit 5481bb6 into astropy:master Sep 9, 2015

indiajoe deleted the CombinerForLargeImages branch September 11, 2015 09:25

This was referenced Oct 20, 2015

Combiner memory consumption #176

Closed

ccdproc.combine #65

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Combiner_fits for large number of images with limited memory #218

Combiner_fits for large number of images with limited memory #218

indiajoe commented Jun 4, 2015

indiajoe Jun 4, 2015

crawfordsm commented Jun 4, 2015

indiajoe commented Jun 5, 2015

crawfordsm commented Jun 5, 2015

indiajoe commented Jun 5, 2015

mwcraig commented Jul 16, 2015

indiajoe commented Jul 16, 2015

mwcraig commented Jul 16, 2015

mwcraig commented Aug 4, 2015

indiajoe commented Aug 6, 2015

mwcraig commented Aug 6, 2015

indiajoe commented Aug 7, 2015

crawfordsm commented Aug 7, 2015

indiajoe commented Aug 8, 2015

crawfordsm Aug 18, 2015

crawfordsm commented Aug 18, 2015

crawfordsm commented Aug 18, 2015

indiajoe commented Aug 18, 2015

crawfordsm commented Aug 18, 2015

indiajoe commented Sep 5, 2015

mwcraig commented Sep 6, 2015

indiajoe commented Sep 6, 2015

mwcraig commented Sep 8, 2015

mwcraig commented Sep 8, 2015

indiajoe commented Sep 8, 2015

mwcraig commented Sep 9, 2015

Combiner_fits for large number of images with limited memory #218

Combiner_fits for large number of images with limited memory #218

Conversation

indiajoe commented Jun 4, 2015

indiajoe Jun 4, 2015

Choose a reason for hiding this comment

crawfordsm commented Jun 4, 2015

indiajoe commented Jun 5, 2015

crawfordsm commented Jun 5, 2015

indiajoe commented Jun 5, 2015

mwcraig commented Jul 16, 2015

indiajoe commented Jul 16, 2015

mwcraig commented Jul 16, 2015

mwcraig commented Aug 4, 2015

indiajoe commented Aug 6, 2015

mwcraig commented Aug 6, 2015

indiajoe commented Aug 7, 2015

crawfordsm commented Aug 7, 2015

indiajoe commented Aug 8, 2015

crawfordsm Aug 18, 2015

Choose a reason for hiding this comment

crawfordsm commented Aug 18, 2015

crawfordsm commented Aug 18, 2015

indiajoe commented Aug 18, 2015

crawfordsm commented Aug 18, 2015

indiajoe commented Sep 5, 2015

mwcraig commented Sep 6, 2015

indiajoe commented Sep 6, 2015

mwcraig commented Sep 8, 2015

mwcraig commented Sep 8, 2015

indiajoe commented Sep 8, 2015

mwcraig commented Sep 9, 2015