Issue with astropy.io.fits and large arrays #839

Merged: 2 commits, Jul 22, 2013


embray
Member

@embray embray commented Jul 19, 2013

I'm seeing the following issue with astropy.io.fits:

In [1]: import numpy as np

In [2]: from astropy.io import fits

In [3]: x = np.random.random((1e9))

In [4]: fits.writeto('test.fits', x, clobber=True)
WARNING: Overwriting existing file 'test.fits'. [astropy.io.fits.hdu.hdulist]

In [5]: d = fits.getdata('test.fits', memmap=False)
WARNING: File may have been truncated: actual file length (3705036224) is smaller than the expected size (8000003520) [astropy.io.fits.file]

In [6]: d = fits.getdata('test.fits', memmap=True)
ERROR: ValueError: mmap length is greater than file size [numpy.core.memmap]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-3493dc939351> in <module>()
----> 1 d = fits.getdata('test.fits', memmap=True)

/Volumes/Raptor/Library/Python/3.2/lib/python/site-packages/astropy-0.3.dev3272-py3.2-macosx-10.8-x86_64.egg/astropy/io/fits/convenience.py in getdata(filename, *args, **kwargs)
    186     hdulist, extidx = _getext(filename, mode, *args, **kwargs)
    187     hdu = hdulist[extidx]
--> 188     data = hdu.data
    189     if data is None and extidx == 0:
    190         try:

/Volumes/Raptor/Library/Python/3.2/lib/python/site-packages/astropy-0.3.dev3272-py3.2-macosx-10.8-x86_64.egg/astropy/utils/misc.py in __get__(self, obj, owner)
    259         key = self._fget.__name__
    260         if key not in obj.__dict__:
--> 261             val = self._fget(obj)
    262             obj.__dict__[key] = val
    263             return val

/Volumes/Raptor/Library/Python/3.2/lib/python/site-packages/astropy-0.3.dev3272-py3.2-macosx-10.8-x86_64.egg/astropy/io/fits/hdu/image.py in data(self)
    201             return
    202 
--> 203         data = self._get_scaled_image_data(self._datLoc, self.shape)
    204         self._update_header_scale_info(data.dtype)
    205 

/Volumes/Raptor/Library/Python/3.2/lib/python/site-packages/astropy-0.3.dev3272-py3.2-macosx-10.8-x86_64.egg/astropy/io/fits/hdu/image.py in _get_scaled_image_data(self, offset, shape)
    521         code = _ImageBaseHDU.NumCode[self._orig_bitpix]
    522 
--> 523         raw_data = self._get_raw_data(shape, code, offset)
    524         raw_data.dtype = raw_data.dtype.newbyteorder('>')
    525 

/Volumes/Raptor/Library/Python/3.2/lib/python/site-packages/astropy-0.3.dev3272-py3.2-macosx-10.8-x86_64.egg/astropy/io/fits/hdu/base.py in _get_raw_data(self, shape, code, offset)
    366                               offset=offset)
    367         else:
--> 368             return self._file.readarray(offset=offset, dtype=code, shape=shape)
    369 
    370     # TODO: Rework checksum handling so that it's not necessary to add a

/Volumes/Raptor/Library/Python/3.2/lib/python/site-packages/astropy-0.3.dev3272-py3.2-macosx-10.8-x86_64.egg/astropy/io/fits/py3compat.py in readarray(self, size, offset, dtype, shape)
    171             if isinstance(dtype, numpy.dtype):
    172                 dtype = _fix_dtype(dtype)
--> 173             return old_File.readarray(self, size, offset, dtype, shape)
    174         readarray.__doc__ = old_File.readarray.__doc__
    175     file._File = _File

/Volumes/Raptor/Library/Python/3.2/lib/python/site-packages/astropy-0.3.dev3272-py3.2-macosx-10.8-x86_64.egg/astropy/io/fits/file.py in readarray(self, size, offset, dtype, shape)
    270             return Memmap(self.__file, offset=offset,
    271                           mode=MEMMAP_MODES[self.mode], dtype=dtype,
--> 272                           shape=shape).view(np.ndarray)
    273         else:
    274             count = reduce(lambda x, y: x * y, shape)

/Volumes/Raptor/Library/Python/3.2/lib/python/site-packages/numpy/core/memmap.py in __new__(subtype, filename, dtype, mode, offset, shape, order)
    251             bytes -= start
    252             offset -= start
--> 253             mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
    254         else:
    255             mm = mmap.mmap(fid.fileno(), bytes, access=acc)

ValueError: mmap length is greater than file size

This is on Mac OS X 10.8 with 64 GB of RAM.

@iguananaut - I know you want issues to be opened on the pyfits tracker, but this is an issue that IMHO should be fixed in 0.2.1 (if it's an easy fix) so I need to open it here so we can keep track of it.

@astrofrog
Member Author

I'm also seeing a discrepancy with a valid table, depending on whether memory mapping is used:

In [3]: fits.getdata('combined_i1.fits')[-1]
Out[3]: 14.259655

In [4]: fits.getdata('combined_i1.fits', memmap=False)[-1]
Out[4]: 0.0

But the table is 6 GB, so it's not easy to share (though I can put it online if needed).

@embray
Member

embray commented Mar 1, 2013

I think I've seen this before. To begin with, the entire file isn't being written. I think this is the problem where it keeps writing and then the file pointer overflows. It was a bug in Numpy that I fixed at some point...

@embray
Member

embray commented Mar 1, 2013

Though I thought PyFITS had a workaround for it too, so I'll have to check on that as well.

@embray
Member

embray commented Mar 1, 2013

Well, I'm not getting this on Linux with either Numpy 1.6.2 or 1.7.0. So once again the problem seems to be something particular to OS X 💢

@embray
Member

embray commented Mar 1, 2013

I remember now, this was the issue: http://projects.scipy.org/numpy/ticket/2114 Yay OSX \o/

@embray
Member

embray commented Mar 1, 2013

Yup. That's gotta be it:

>>> int(8e9) & 0xffffffff
3705032704
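
To spell out how that number lines up with the warning in the original report, here is a small arithmetic check. It assumes the primary header fits in a single 2880-byte FITS block and that only the data write itself was truncated to 32 bits; both are assumptions for illustration, not something confirmed in this thread.

```python
FITS_BLOCK = 2880  # FITS files are laid out in 2880-byte blocks

data_bytes = 10**9 * 8              # 1e9 float64 values, 8 bytes each
header_bytes = FITS_BLOCK           # assumption: minimal header, one block
padding = -data_bytes % FITS_BLOCK  # zero fill up to the next block boundary

# Size the file should have had.
expected_size = header_bytes + data_bytes + padding

# Size if the data write was truncated to its low 32 bits
# (assumption: header and padding were written normally).
truncated_data = data_bytes & 0xFFFFFFFF
observed_size = header_bytes + truncated_data + padding

print(expected_size)  # 8000003520, the "expected size" in the warning
print(observed_size)  # 3705036224, the "actual file length" in the warning
```

Both values match the figures in the "File may have been truncated" warning, which is consistent with the 32-bit overflow explanation.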

Now I'm just concerned because I could have sworn I put a workaround for this in pyfits. I thought there was even a regression test, but now I can't find it...

@embray
Member

embray commented Jul 19, 2013

I've attached a workaround for this issue. With this fix the test given in the original issue now works as expected. Some more complex tests work too.
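
The patch itself isn't reproduced in this thread. Purely as an illustration, one generic way to avoid handing the OS a single write of 2**32 bytes or more is to split the buffer into smaller chunks. The function name `write_array_in_chunks` and the `chunk_bytes` default below are hypothetical and are not taken from astropy or pyfits.

```python
import io

import numpy as np


def write_array_in_chunks(fileobj, arr, chunk_bytes=2**30):
    """Write arr's raw bytes to fileobj in pieces of at most chunk_bytes.

    Hypothetical helper for illustration; not the actual astropy fix.
    """
    # View the array as a flat run of bytes without changing byte order.
    flat = np.ascontiguousarray(arr).reshape(-1).view(np.uint8)
    for start in range(0, flat.size, chunk_bytes):
        # Each slice is a contiguous uint8 view, so no data is copied here.
        fileobj.write(flat[start:start + chunk_bytes])
    return flat.size


# Small demonstration with an in-memory file and a tiny chunk size.
buf = io.BytesIO()
demo = np.arange(1000, dtype=">f8")  # big-endian doubles, as FITS stores them
written = write_array_in_chunks(buf, demo, chunk_bytes=1000)
```

With a real file object and the default chunk size, each individual write stays well below the 32-bit limit regardless of the total array size.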

I have not added a unit test for this issue because it takes a little time depending on filesystem I/O throughput and requires some 4GB of disk space. Perhaps in the future we should consider adding a test marker for "big" tests that are normally skipped by default.
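
A rough sketch of what such an opt-in "big" test marker could look like, using only the standard library's `unittest` so it is self-contained. The environment variable `RUN_BIG_TESTS`, the `big` decorator, and the test names are all hypothetical; a real version would hook into whatever test runner the project uses.

```python
import os
import unittest

# Hypothetical opt-in switch: big tests run only when RUN_BIG_TESTS=1 is set.
RUN_BIG_TESTS = os.environ.get("RUN_BIG_TESTS") == "1"

# Reusable decorator that skips a test unless big tests were requested.
big = unittest.skipUnless(RUN_BIG_TESTS, "big test; set RUN_BIG_TESTS=1 to run")


class TestLargeArrays(unittest.TestCase):
    @big
    def test_large_array_round_trip(self):
        # Placeholder body: a real test would write and re-read a
        # multi-gigabyte array here and compare the results.
        pass
```

By default the test is reported as skipped, so the normal test run needs neither the extra disk space nor the extra time.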

…o removed a Sphinx-specific directive from the changelog (just for now--I need to revisit whether or not we want to be able to render the changelog without Sphinx).
@astrofrog
Member Author

This works for me! Feel free to merge :)

@embray
Member

embray commented Jul 22, 2013

Thanks--wasn't sure if I'd hear back from you on this or not since I know you were on vacation. So I was leaving it open 'til the last minute.

embray added a commit that referenced this pull request Jul 22, 2013
Issue with astropy.io.fits and large arrays
@embray embray merged commit 893063e into astropy:master Jul 22, 2013
@embray embray deleted the issue-839 branch July 22, 2013 14:27
@astrofrog
Member Author

@taldcroft is on vacation, not me :)

@embray
Member

embray commented Jul 22, 2013

Damn! Getting my Toms mixed up 😫

embray added a commit that referenced this pull request Jul 22, 2013
Issue with astropy.io.fits and large arrays
Conflicts:

	CHANGES.rst