Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combiner_fits for large number of images with limited memory #218

Merged
merged 17 commits into from
Sep 9, 2015

Conversation

indiajoe
Copy link
Contributor

@indiajoe indiajoe commented Jun 4, 2015

As discussed in #176 , Combiner_fits is a class which can be used to combine large number of images using all the features of Combiner (under the constrain of an user defined memory limit).


**kwargs: Other keyword arguments for CCD Object's fits reader.
"""
def __init__(self, img_list, mem_limit=16e9, output_fits=None, **kwargs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we ask user to input mem_limit in in MB or GB instead of bytes?

@crawfordsm
Copy link
Member

Thanks for the contribution @indiajoe. Sorry for the delay in feeback on this. I think in my head I had more of an idea of a convenience function rather than another class in order to do this so it might take a little time for me to process what you have done here.

One of my first thought that I had thought was that it should inherit from Combiner and overwrite the functions that it uses and that it should also use the property decorator rather than the set_method naming scheme. However, before you proceed further on this, let's figure out what would work best.

@indiajoe
Copy link
Contributor Author

indiajoe commented Jun 5, 2015

@crawfordsm Thanks for the feedback. A separate class was also my least favourite choice. Following are the reasons in the end I thought a separate class might be needed.

  1. Combiner instance has many steps like sigma_clipping which are invoked before calling the actual combine method. If we ask user to provide all the parameters for all the steps in a single function call we will have to keep a long list of keywords unique.
    ( I could implement this too so that we can compare which will be better.)
  2. If we inherit the class, we will have to override pretty much every method and data attributes of Combiner, mainly because we need to invoke multiple instances of the Combiner object one after other,
    Keeping in view that Combiner class might get additional attributes in future, I was trying to keep the design of Combiner_fits in such a way that requires only minor additions.

@crawfordsm
Copy link
Member

The more I think about this I think that a function would be a better way to go. It would probably provide more convenience to the user and could be easily replaced if/when a cython implementation of combine can be written.

I don't think it would be too much work as most of it you have done already in run_on_all_tiles. I think it would primarily be taken that code and then most of the work would be in writing the doc_string.

However, it would be nice if the function could also take in a list of CCDData objects as well as a list of filenames. And one more thing to consider in the estimate of the memory issue, is that if there are variance frames and bad pixel maps, they will further limit the amount of memory available.

Let me know if it would be too much work to change this over to a function or if you have any questions, but I think the work that you have done so far is very helpful.

@indiajoe
Copy link
Contributor Author

indiajoe commented Jun 5, 2015

@crawfordsm Sure, I shall convert them into a function.

@mwcraig
Copy link
Member

mwcraig commented Jul 16, 2015

@indiajoe -- giving this a friendly bump to see if you have had time to convert to a function.

@indiajoe
Copy link
Contributor Author

@mwcraig Hi, Really sorry for the delay, got tied up with a paper submission. give me one more week, I shall do it and send pull request.

@mwcraig
Copy link
Member

mwcraig commented Jul 16, 2015

No worries -- my PR #217 for just as long!

@mwcraig
Copy link
Member

mwcraig commented Aug 4, 2015

@indiajoe -- another friendly bump, @crawfordsm and I were just looking through pull requests and are eager to add this!

@indiajoe
Copy link
Contributor Author

indiajoe commented Aug 6, 2015

@mwcraig @crawfordsm Hi, sorry for the delay. I have just now converted the class into a convenience function and pushed to the branch. This is an untested code. Aim is to make sure we all agree with the general structure of the function. My only worry is that the long list of arguments will have to keep growing as Combiner class gets new features in future.

@mwcraig
Copy link
Member

mwcraig commented Aug 6, 2015

@indiajoe -- thanks!

I see what you mean about the long list of keywords, that seems fragile.

Keeping in mind that I haven't really thought this through, what if we modified the current Combiner class so that it has one additional option that specifies whether it tiles or not? That would avoid the issues that came up when subclassing (i.e. having to override everything) and the issue of a long list of args, but presumably means changing nearly every method in Combiner.

I'll defer to you and @crawfordsm on a final decision here since you have both thought about this much more than I have!

@indiajoe
Copy link
Contributor Author

indiajoe commented Aug 7, 2015

@mwcraig If we want to modify Combiner itself. It will be some substantial change. For instance, Combiner has a method which clips certain pixels by masking them. Since we cannot keep all the images in memory we will have to write that mask to hardisk and then load it again when we run the actual combine function. This can make the entire combining process really slow.
The other option is to change in Combiner to call all preprocessing functions like the clipping function etc. only at the last moment when the data is being combined.

@crawfordsm
Copy link
Member

Longer term, I think we do want to re-write the combiner class at a lower level so that it does handle the rejection and combining at a pixel to pixel level. Until that happens though, I think this is a pretty good compromise. I don't see the parameter list growing significantly until that time, so I think it is okay to keep the long list of keywords. I'll have some comments on the PR this weekend hopefully.

@indiajoe
Copy link
Contributor Author

indiajoe commented Aug 8, 2015

@crawfordsm Sure. That sounds good.
I would like to add two more extra features to combiner which will add three more arguments after this pull request is over.

  1. zero : This is equivalent to zero shift option in IRAF imcombine. It is a very useful and frequently used feature while median combining NIR images.
  2. statsection : Another very useful feature in IRAF imcombine, since in most cases we want only a section of good part of detector to be used for calculating stats for scaling and zero shifting.
    I shall open separate pull request for these later.

#determine the number of chunks to split the images into
no_chunks = int((size_of_an_img*no_of_img)/mem_limit)+1
print('Spliting each image into {1} chunks to limit memory usage to {0} bytes.'.format(self.mem_limit,no_chunks))
Xs, Ys = ccd_dummy.data.shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change these to lower case xs, ys. Same goes for Xstep, Ystep below

@crawfordsm
Copy link
Member

@indiajoe Overall the code looks pretty good. I have a few small minor comments that just changed some of the names to match up with our style requirements. The other request would be to set it so that it could be passed a list of CCDData objects to eliminate the step of reading in new files if the files are already in memory.

@crawfordsm
Copy link
Member

Of course, now that it is at a good spot, the other thing that I was hoping to ask you to add would be some unit tests. At the very least, anything that raises an error should have a test. I don't think you need to add tests for the combination as those are tested at a lower level.

@indiajoe
Copy link
Contributor Author

@crawfordsm I was wondering about how to make unit test for this function. I should look for some way to check memory usage during this function call, but I should not use external dependencies like the memory_profiler module. Right?

@crawfordsm
Copy link
Member

First, I'd just add tests that check the inputs and the errors that are raised. Then it would probably be good to add a test with combining a small data set with and without tiling to confirm they produce the same results.

I'm not sure if we do want to add something that checks the memory of a specific system as they might run into problems with different OSes and different memory limits, which might be beyond the scope of what we can test for.

@indiajoe
Copy link
Contributor Author

indiajoe commented Sep 5, 2015

@crawfordsm Really sorry for the delay to write tests. This weekend I hope to finish all the test writing.
I am getting an IOError while running the tests. I added the fits filename in ccdproc/tests/setup_package.py . Not sure I understand where I'm making mistake.

@mwcraig
Copy link
Member

mwcraig commented Sep 6, 2015

@indiajoe -- the FITS file should already be included (it is in ccdproc/data). The way to access it from the test suite is like this:

# Add to imports
from astropy.utils.data import get_pkg_data_filename

# In your test:
def test_my_great_test():
     path_to_file = get_pkg_data_filename('../data/a8280271.fits')

Should also note this is not at all obvious...took me ~45 min poking through the astropy source code to get this right (I hope).

@indiajoe
Copy link
Contributor Author

indiajoe commented Sep 6, 2015

@mwcraig Thanks a lot for showing how to add the path to file. That part is working fine now.

While testing I found something else strange.
Average of an array while using np.ma.average() is significantly different from .mean()!
For example incase of the fits file in data/
ccd.data.mean() and np.ma.average(ccd.data) gives 296.0 and 296.000315729 respectively.

I suspect this is the reason why combined array are not matching even up to 4th decimal point on Travis CL. (On my machine, It matches fine up to 4 decimal places!).
I need to trouble shoot, exactly which stage we are loosing precision.

Another query, are we going to keep all the test data in ccdproc/data/ ?
In other astropy packages, I thought the test data are usually kept in a files/ subdirectory of tests/.

@mwcraig
Copy link
Member

mwcraig commented Sep 8, 2015

@indiajoe -- interesting, no idea why there is precision loss. I wonder if maybe masking is being done in one case but not another?

I think (though I'd have to double check) that it is the norm to put files in a data subdirectory of the directory containing the test, so our data directory should be tests, not in the top level. Feel free to move it as part of this pr!

@mwcraig
Copy link
Member

mwcraig commented Sep 8, 2015

@indiajoe -- @crawfordsm and I talked today and we think it makes the most sense to mark these tests as xfail on numpy versions less than 1.9, which are the only places it is failing.

You can take a look here to see how to do that in pytest: https://pytest.org/latest/skipping.html

There some significant changes to masked arrays going from numpy 1.8 to 1.9, and we already recommend that people use 1.9 if at all possible.

@indiajoe
Copy link
Contributor Author

indiajoe commented Sep 8, 2015

I have added xfail and also moved data/ to tests/data/

@mwcraig
Copy link
Member

mwcraig commented Sep 9, 2015

@crawfordsm -- I'll leave this one to you to merge...I'm fine with it, but you have more experience with this part of the code.

crawfordsm added a commit that referenced this pull request Sep 9, 2015
Combiner_fits for large number of images with limited memory
@crawfordsm crawfordsm merged commit 5481bb6 into astropy:master Sep 9, 2015
@indiajoe indiajoe deleted the CombinerForLargeImages branch September 11, 2015 09:25
This was referenced Oct 20, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants