Use mmap directly for memory-mapped FITS files #7597
Eliminates the use of numpy.memmap just to get an mmap, and a workaround for numpy 1.6
Hi there @mwcraig 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃. Everything looks good from my point of view! 👍 If there are any issues with this message, please report them here. |
@mwcraig - Do you think this should be backported? |
I don't think it needs to be backported; it isn't a bug and should have little or no effect on users. I thought about just removing the workaround for numpy 1.6, but it was easy enough to just use mmap.
Restarted the windows build...don't see how this PR could have caused that failure... |
...but the windows failure happened again. Booting a VM. |
I'm surprised we still had a workaround for Numpy 1.6 - we only support 1.10+! |
I think @taldcroft had some issues with mmap on Windows and might have some insight here? In particular, it might have issues with files above 2 GB, if I recall?
It must have slipped through the last two times we removed support for old numpy versions.
It's always a bit scary to change something like this in the FITS code 😨 😁 . It looks good (thanks for doing it!), and should change nothing, but I don't understand the Windows failure ... |
I can reproduce it locally, so I should be able to fix it tomorrow. Still no idea why it happens...
Since you are able to run this locally on Windows, please make sure you test it with a large file (e.g. 20-30 GB) to be sure it works properly, since we don't test that in CI.
Any request for file contents (multiple HDUs, ...)? Or a link to an appropriate file?
Good news/bad news: Bad news: I still do not understand why this is failing the way it is. Good news: Going to the beginning or end of the file before creating the I haven't been able to reduce this to a simple example that could be reported as a CPython bug, unfortunately... |
Should have added that the reason using |
@mwcraig - have you been able to test this with a large file on Windows? |
@astrofrog — not yet, no. I need to venture into Windows later today to test install instructions for something else, will try it then. I assume it doesn’t matter what the contents of the monster FITS file are? |
@astrofrog -- this works on Windows with a large file. Created a file following these instructions, modified to be 50,000 x 50,000, which is roughly 20GB. Am able to open the fits file and access a piece |
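For reference, the "roughly 20GB" figure can be checked with a little arithmetic. This is only a sketch: it assumes 8-byte (64-bit float) pixels and a single 2880-byte header block, neither of which is stated in the comment above, and the function name is made up for illustration.

```python
FITS_BLOCK = 2880  # FITS files are organised in 2880-byte blocks


def fits_image_size(nx, ny, bytes_per_pixel=8, header_blocks=1):
    """Estimate the on-disk size of a single-HDU FITS image.

    bytes_per_pixel=8 and header_blocks=1 are assumptions for this sketch.
    Returns (raw data bytes, data bytes padded to a whole block, total bytes).
    """
    data_bytes = nx * ny * bytes_per_pixel
    # Round the data section up to a whole number of FITS blocks.
    padded = -(-data_bytes // FITS_BLOCK) * FITS_BLOCK
    total = padded + header_blocks * FITS_BLOCK
    return data_bytes, padded, total


data_bytes, padded, total = fits_image_size(50_000, 50_000)
# True means the file is far beyond the 2 GB (32-bit) limit discussed here.
print(total > 2**31)
```

Under those assumptions the file is well past the 32-bit address range, which is the case the reviewers wanted exercised on Windows.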
@mwcraig - ok, great! |
Late to the conversation, and I don't have anything incredibly useful to add. I recall long ago that I had problems with memmap and files bigger than 2 Gb (the 32-bit address range). It might have been on Mac. Anyway, looks like things are somewhat under control, good job fighting the Windows! |
It seems like this PR also directly addresses #5797.
I also tested it with a 10GB file on Windows without problems. It also worked in |
#5797 is marked as a bug, should this be backported with change log? |
The relevant deprecation was reverted as far as I know, so it's not actually a bug anymore.
Nice cleanup. Just one comment.
astropy/io/fits/file.py
Outdated
@@ -63,6 +62,11 @@ | |||
MEMMAP_MODES = {'readonly': 'c', 'copyonwrite': 'c', 'update': 'r+', |
Do you still need these now that np.memmap is not used any more?
@mhvk -- those modes are no longer necessary, have removed them. |
OK, looks all fine now.
Looks good to me too, nice cleanup.
I think this will also make it possible to fix #1380 |
Currently, when a FITS file is opened with memmap=True, the memory map is created by making a numpy.memmap, copying the underlying Python mmap from a private attribute of the numpy object, and deleting the numpy.memmap object.
This PR replaces that code by opening the mmap directly. As a side effect, it also removes a workaround for numpy 1.6.
The relevant numpy code being replaced is at https://github.com/numpy/numpy/blob/master/numpy/core/memmap.py#L202. Current astropy code always passes in a file descriptor, always sets offset=0 and never passes in a size.
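As a rough illustration of what "opening the mmap directly" can look like with only the standard library, here is a minimal sketch. It is not the code in this PR: the helper names are hypothetical, and the choice of ACCESS_COPY for read-only use and ACCESS_WRITE for update is an assumption based on the 'c' and 'r+' modes shown in the diff. It mirrors the description above by passing a file descriptor, leaving the offset at 0 and mapping the whole file (length 0).

```python
import mmap
import os
import tempfile


def map_whole_file(fileobj, update=False):
    """Map the entire file behind an open file object (hypothetical helper)."""
    # Assumption: copy-on-write for read-only access, write-through for update.
    access = mmap.ACCESS_WRITE if update else mmap.ACCESS_COPY
    # length=0 maps the whole file; offset is left at its default of 0.
    return mmap.mmap(fileobj.fileno(), 0, access=access)


def demo():
    """Write a 2880-byte file, map it copy-on-write, and modify the mapping."""
    fd, path = tempfile.mkstemp(suffix=".fits")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"SIMPLE  =" + b" " * 2871)  # 9 + 2871 = 2880 bytes
        with open(path, "rb") as f:
            mm = map_whole_file(f)
            try:
                size = len(mm)
                mm[0:6] = b"CHANGE"  # changes only the private in-memory copy
                in_memory = bytes(mm[0:6])
            finally:
                mm.close()
        with open(path, "rb") as f:
            on_disk = f.read(6)
        return size, in_memory, on_disk
    finally:
        os.remove(path)
```

With a copy-on-write mapping, the write is visible through the map but never reaches the file on disk, which is the behaviour one would want for a file opened read-only.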