-
-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Combiner_fits for large number of images with limited memory #218
Conversation
…g onces from Combiner
|
||
**kwargs: Other keyword arguments for CCD Object's fits reader. | ||
""" | ||
def __init__(self, img_list, mem_limit=16e9, output_fits=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we ask user to input mem_limit in in MB or GB instead of bytes?
Thanks for the contribution @indiajoe. Sorry for the delay in feeback on this. I think in my head I had more of an idea of a convenience function rather than another class in order to do this so it might take a little time for me to process what you have done here. One of my first thought that I had thought was that it should inherit from Combiner and overwrite the functions that it uses and that it should also use the property decorator rather than the set_method naming scheme. However, before you proceed further on this, let's figure out what would work best. |
@crawfordsm Thanks for the feedback. A separate class was also my least favourite choice. Following are the reasons in the end I thought a separate class might be needed.
|
The more I think about this I think that a function would be a better way to go. It would probably provide more convenience to the user and could be easily replaced if/when a cython implementation of combine can be written. I don't think it would be too much work as most of it you have done already in run_on_all_tiles. I think it would primarily be taken that code and then most of the work would be in writing the doc_string. However, it would be nice if the function could also take in a list of CCDData objects as well as a list of filenames. And one more thing to consider in the estimate of the memory issue, is that if there are variance frames and bad pixel maps, they will further limit the amount of memory available. Let me know if it would be too much work to change this over to a function or if you have any questions, but I think the work that you have done so far is very helpful. |
@crawfordsm Sure, I shall convert them into a function. |
@indiajoe -- giving this a friendly bump to see if you have had time to convert to a function. |
@mwcraig Hi, Really sorry for the delay, got tied up with a paper submission. give me one more week, I shall do it and send pull request. |
No worries -- my PR #217 for just as long! |
@indiajoe -- another friendly bump, @crawfordsm and I were just looking through pull requests and are eager to add this! |
@mwcraig @crawfordsm Hi, sorry for the delay. I have just now converted the class into a convenience function and pushed to the branch. This is an untested code. Aim is to make sure we all agree with the general structure of the function. My only worry is that the long list of arguments will have to keep growing as Combiner class gets new features in future. |
@indiajoe -- thanks! I see what you mean about the long list of keywords, that seems fragile. Keeping in mind that I haven't really thought this through, what if we modified the current I'll defer to you and @crawfordsm on a final decision here since you have both thought about this much more than I have! |
@mwcraig If we want to modify |
Longer term, I think we do want to re-write the combiner class at a lower level so that it does handle the rejection and combining at a pixel to pixel level. Until that happens though, I think this is a pretty good compromise. I don't see the parameter list growing significantly until that time, so I think it is okay to keep the long list of keywords. I'll have some comments on the PR this weekend hopefully. |
@crawfordsm Sure. That sounds good.
|
#determine the number of chunks to split the images into | ||
no_chunks = int((size_of_an_img*no_of_img)/mem_limit)+1 | ||
print('Spliting each image into {1} chunks to limit memory usage to {0} bytes.'.format(self.mem_limit,no_chunks)) | ||
Xs, Ys = ccd_dummy.data.shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change these to lower case xs, ys. Same goes for Xstep, Ystep below
@indiajoe Overall the code looks pretty good. I have a few small minor comments that just changed some of the names to match up with our style requirements. The other request would be to set it so that it could be passed a list of CCDData objects to eliminate the step of reading in new files if the files are already in memory. |
Of course, now that it is at a good spot, the other thing that I was hoping to ask you to add would be some unit tests. At the very least, anything that raises an error should have a test. I don't think you need to add tests for the combination as those are tested at a lower level. |
@crawfordsm I was wondering about how to make unit test for this function. I should look for some way to check memory usage during this function call, but I should not use external dependencies like the memory_profiler module. Right? |
First, I'd just add tests that check the inputs and the errors that are raised. Then it would probably be good to add a test with combining a small data set with and without tiling to confirm they produce the same results. I'm not sure if we do want to add something that checks the memory of a specific system as they might run into problems with different OSes and different memory limits, which might be beyond the scope of what we can test for. |
@crawfordsm Really sorry for the delay to write tests. This weekend I hope to finish all the test writing. |
@indiajoe -- the FITS file should already be included (it is in # Add to imports
from astropy.utils.data import get_pkg_data_filename
# In your test:
def test_my_great_test():
path_to_file = get_pkg_data_filename('../data/a8280271.fits') Should also note this is not at all obvious...took me ~45 min poking through the astropy source code to get this right (I hope). |
@mwcraig Thanks a lot for showing how to add the path to file. That part is working fine now. While testing I found something else strange. I suspect this is the reason why combined array are not matching even up to 4th decimal point on Travis CL. (On my machine, It matches fine up to 4 decimal places!). Another query, are we going to keep all the test data in |
@indiajoe -- interesting, no idea why there is precision loss. I wonder if maybe masking is being done in one case but not another? I think (though I'd have to double check) that it is the norm to put files in a |
@indiajoe -- @crawfordsm and I talked today and we think it makes the most sense to mark these tests as You can take a look here to see how to do that in pytest: https://pytest.org/latest/skipping.html There some significant changes to masked arrays going from numpy 1.8 to 1.9, and we already recommend that people use 1.9 if at all possible. |
I have added xfail and also moved data/ to tests/data/ |
@crawfordsm -- I'll leave this one to you to merge...I'm fine with it, but you have more experience with this part of the code. |
Combiner_fits for large number of images with limited memory
As discussed in #176 , Combiner_fits is a class which can be used to combine large number of images using all the features of Combiner (under the constrain of an user defined memory limit).