Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use mmap directly for memory-mapped FITS files #7597

Merged
merged 3 commits into from Jul 2, 2018

Conversation

mwcraig
Copy link
Member

@mwcraig mwcraig commented Jun 25, 2018

Currently, when a FITS file is opened with memmap=True, the memory map is created by making a numpy.memmap, copying the underlying Python mmap from a private attribute of the numpy objet and deleting the numpy.memmap object.

This PR replaces that code by opening the memmap directly. As a side effect, it also removes a workaround for numpy 1.6.

The relevant numpy code being replaced is at https://github.com/numpy/numpy/blob/master/numpy/core/memmap.py#L202. Current astropy code always passes in a file descriptor, always sets offset=0 and never passes in a size.

Eliminates the use of numpy.memmap just to get an mmap, and a workaround
for numpy 1.6
@astropy-bot
Copy link

astropy-bot bot commented Jun 25, 2018

Hi there @mwcraig 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

Everything looks good from my point of view! 👍

If there are any issues with this message, please report them here.

@bsipocz
Copy link
Member

bsipocz commented Jun 25, 2018

@mwcraig - Do you think this should be backported?

@mwcraig
Copy link
Member Author

mwcraig commented Jun 25, 2018

I don't think it needs to be backported; it isn't a bug and should have little or no affect on users.

I thought about just removing the workaround for numpy 1.6, but it was easy enough to just use mmap.

@mwcraig
Copy link
Member Author

mwcraig commented Jun 25, 2018

Restarted the windows build...don't see how this PR could have caused that failure...

@mwcraig
Copy link
Member Author

mwcraig commented Jun 25, 2018

...but the windows failure happened again. Booting a VM.

@astrofrog
Copy link
Member

I'm surprised we still had a workaround for Numpy 1.6 - we only support 1.10+!

@astrofrog
Copy link
Member

astrofrog commented Jun 25, 2018

I think @taldcroft had some issues with mmap on Windows and might have some insight here? In particular it might have issues above 2Gb if I recall?

@bsipocz
Copy link
Member

bsipocz commented Jun 25, 2018

I'm surprised we still had a workaround for Numpy 1.6 - we only support 1.10+!

must have slipped through last time and the one before when we removed old numpy supports

@saimn
Copy link
Contributor

saimn commented Jun 25, 2018

It's always a bit scary to change something like this in the FITS code 😨 😁 . It looks good (thanks for doing it!), and should change nothing, but I don't understand the Windows failure ...

@mwcraig
Copy link
Member Author

mwcraig commented Jun 25, 2018

I can reproduce it locally, so I should able to fix it tomorrow. Still no idea why it happens...

@astrofrog
Copy link
Member

Since you are able to run this locally on Windows, please make sure you test it with a e.g. 20-30Gb file to be sure it works properly (since we don't test that in CI)

@mwcraig
Copy link
Member Author

mwcraig commented Jun 26, 2018

Any request for file contents (multiple hdus, ...)? Or a link to an appropriate file

@mwcraig
Copy link
Member Author

mwcraig commented Jun 26, 2018

Good news/bad news:

Bad news: I still do not understand why this is failing the way it is.

Good news: Going to the beginning or end of the file before creating the mmap makes the error go away.

I haven't been able to reduce this to a simple example that could be reported as a CPython bug, unfortunately...

@mwcraig
Copy link
Member Author

mwcraig commented Jun 26, 2018

Should have added that the reason using numpy.memmap worked is that they f.seek(0, 2) (go to end of file) before creating their mmap.

@astrofrog
Copy link
Member

@mwcraig - have you been able to test this with a large file on Windows?

@mwcraig
Copy link
Member Author

mwcraig commented Jun 29, 2018

@astrofrog — not yet, no. I need to venture into Windows later today to test install instructions for something else, will try it then. I assume it doesn’t matter what the contents of the monster FITS file are?

@mwcraig
Copy link
Member Author

mwcraig commented Jun 29, 2018

@astrofrog -- this works on Windows with a large file. Created a file following these instructions, modified to be 50,000 x 50,000, which is roughly 20GB. Am able to open the fits file and access a piece hdu.data[5000:5010, 28000:28010] without crashing my VM.

@astrofrog
Copy link
Member

@mwcraig - ok, great!

@taldcroft
Copy link
Member

Late to the conversation, and I don't have anything incredibly useful to add. I recall long ago that I had problems with memmap and files bigger than 2 Gb (the 32-bit address range). It might have been on Mac. Anyway, looks like things are somewhat under control, good job fighting the Windows!

@MSeifert04
Copy link
Contributor

It seems like this PR also directly adresses #5797.

@MSeifert04
Copy link
Contributor

MSeifert04 commented Jun 30, 2018

I also tested it with a 10GB file on Windows without problems. It also worked in update mode. 👍

@pllim
Copy link
Member

pllim commented Jun 30, 2018

#5797 is marked as a bug, should this be backported with change log?

@MSeifert04
Copy link
Contributor

MSeifert04 commented Jun 30, 2018

The relevant deprecation was reverted as far as i know so it's not actually a bug anymore.

@mwcraig
Copy link
Member Author

mwcraig commented Jun 30, 2018

I just opened #7611 to address #5797. I don't think this PR actually addresses that issue. I'm not sure the new PR correctly addresses the issue either...

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup. Just one comment.

@@ -63,6 +62,11 @@
MEMMAP_MODES = {'readonly': 'c', 'copyonwrite': 'c', 'update': 'r+',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need these now that np.memmap is not used any more?

@mwcraig
Copy link
Member Author

mwcraig commented Jul 2, 2018

@mhvk -- those modes are no longer necessary, have removed them.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, looks all fine now.

@mhvk mhvk added the 💤 merge-when-ci-passes Do not use: We have auto-merge option now. label Jul 2, 2018
Copy link
Contributor

@saimn saimn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too, nice cleanup.

@saimn saimn merged commit fd0a775 into astropy:master Jul 2, 2018
@embray
Copy link
Member

embray commented Sep 18, 2018

I think this will also make it possible to fix #1380

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants