append mode doesn't work for gzipped files #7457

jotaylor · 2018-05-11T18:59:26Z

Astropy documentation seems to imply that the only type of zipped files that do not work with append or update modes are bzipped files:
http://docs.astropy.org/en/stable/io/fits/#working-with-compressed-files
However, opening in append mode does not work for gzipped files. Update mode does still work.

I'm using python 3.5.5 and astropy 3.0.2

pllim · 2018-05-11T21:28:43Z

Is it possible for you to provide a minimal snippet to reproduce your problem? If you can provide the data (or subset of it), that would be great too. Thanks.

drdavella · 2018-05-11T23:34:12Z

I seem to remember that there was a good technical reason that gzip doesn't support append either, so maybe it's just a documentation issue. I might be wrong, though, so more investigation is needed.

jotaylor · 2018-05-14T15:16:22Z

@pllim sure- This is one dataset I couldn't get it to work on, LBUI23V3Q:
http://archive.stsci.edu/cgi-bin/mastpreview?mission=hst&dataid=LBUI23V3Q
For a code example, I initially found this in calcos, and filed an issue there in which I referenced this problem (see above). Here is a summary I wrote of how calcos is writing an output file and how the output varies between append and update modes:

from astropy.io import fits
import os

indata = "lbui23v3q_rawacq.fits.gz"
outdata = "lbui23v3q_flt.fits.gz"

def write_primary(inhdu):
    if os.path.exists(outdata):
        os.remove(outdata)
    outhdu0 = fits.PrimaryHDU(header=inhdu[0].header)
    outhdulist = fits.HDUList(outhdu0)
    outhdulist.writeto(outdata)

def appendmode(inhdu):
    with fits.open(outdata, mode="readonly") as outhdulist:
        print("In readonly mode: ", outhdulist)

    with fits.open(outdata, mode="append") as outhdulist:
        print("In append mode: ", outhdulist)

        hdu1 = fits.ImageHDU(data=inhdu[1].data)
        outhdulist.append(hdu1)
        hdu2 = fits.ImageHDU(data=inhdu[2].data)
        outhdulist.append(hdu2)
        hdu3 = fits.ImageHDU(data=inhdu[3].data)
        outhdulist.append(hdu3)

        print("\t{}\n".format(outhdulist))

def updatemode(inhdu):
    with fits.open(outdata, mode="readonly") as outhdulist:
        print("In readonly mode: ", outhdulist)

    with fits.open(outdata, mode="update") as outhdulist:
        print("In update mode: ", outhdulist)

        hdu1 = fits.ImageHDU(data=inhdu[1].data)
        outhdulist.append(hdu1)
        hdu2 = fits.ImageHDU(data=inhdu[2].data)
        outhdulist.append(hdu2)
        hdu3 = fits.ImageHDU(data=inhdu[3].data)
        outhdulist.append(hdu3)

        print("\t{}\n".format(outhdulist))

if __name__ == "__main__":
    with fits.open(indata) as inhdu:
        write_primary(inhdu)
        appendmode(inhdu)

        write_primary(inhdu)
        updatemode(inhdu)

The output should be one PrimaryHDU followed by 3 ImageHDUs, but you can see that when you open the file in append mode, it doesn't end up this way.

In readonly mode:  [<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810aa0e48>]
In append mode:  []
	[<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810ba3f98>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810ba6eb8>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810bae400>]

In readonly mode:  [<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810bba828>]
In update mode:  [<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810bc6630>]
	[<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810bc6630>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810bbd208>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810bbd080>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810bbd2b0>]

This is not necessarily how I would choose to write an output file, but it is currently done this way in calcos accum.py. In the end, it's not a big deal- and I already submitted a pull request for calcos- but the astropy documentation should be updated.

saimn · 2018-05-14T15:22:01Z

Does it work before Astropy 3.0 ? (Could it be related to #6373 @drdavella ?)

jotaylor · 2018-05-14T15:27:50Z

I tried with astropy 2.0.6 and 3.0.2 and both produce the same results.

drdavella · 2018-05-14T15:55:27Z

@saimn yes it could be. I'll take another look.

drdavella · 2018-05-14T17:56:48Z

@saimn I'm not exactly sure where this bug was introduced, but I wonder if you would be willing to accept as a solution an error when using append with gzip that says something like the following:

append mode is not supported for gzipped FITS files. Use update instead

Neither append nor update is supported for zip or bzip, and there are already appropriate error messages. I'm not sure why append doesn't work, but since update seems to serve the same purpose in this case I'm not sure we should jump through hoops to make append work as well.

saimn · 2018-05-14T22:16:04Z

I'm having a look, and the bug is probably not so difficult to fix. The issue is that for files opened in append mode, the length is 0:

In [1]: %astropy
Numpy 1.14.3
Astropy 3.1.dev21815
f
In [2]: fits.info('test.fits.gz')
Filename: test.fits.gz
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       4   ()      
  1                1 ImageHDU         6   (10,)   int64   
  2                1 ImageHDU         6   (20,)   int64   
  3                1 ImageHDU         6   (30,)   int64   

In [3]: hdu1 = fits.open('test.fits.gz', mode='update')

In [4]: len(hdu1)
Out[4]: 4

In [5]: hdu1 = fits.open('test.fits.gz', mode='append')

In [6]: len(hdu1)
Out[6]: 0

And this is because the gzip file is opened in write-only mode, so reading from it is not possible, and hence the length (number of HDUs) cannot be computed (which causes all HDUs to be added as PrimaryHDU):

In [7]: gzip.GzipFile(filename='test.fits.gz', mode='rb+').read(2880)
Out[7]: b'SIMPLE  =                    T / conforms to FITS standard                      BITPIX  =                    8 / array data type                                NAXIS   =                    0 / number of array dimensions                     EXTEND  =                    T                                                  END                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             '

In [8]: gzip.GzipFile(filename='test.fits.gz', mode='ab+').read(2880)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-8-814463253e61> in <module>()
----> 1 gzip.GzipFile(filename='test.fits.gz', mode='ab+').read(2880)

~/.pyenv/versions/3.6.3/lib/python3.6/gzip.py in read(self, size)
    273         if self.mode != READ:
    274             import errno
--> 275             raise OSError(errno.EBADF, "read() on write-only GzipFile object")
    276         return self._buffer.read(size)
    277 

OSError: [Errno 9] read() on write-only GzipFile object

Not sure of the best way to fix, opening in 'rb+' mode as for 'update' seems reasonable, @drdavella ?

drdavella · 2018-05-15T14:05:42Z

@saimn I tried changing it to rb+ and wb+ and I was not able to get either way to work. Maybe I missed something though, so if you can get it to work it seems like a reasonable solution.

saimn · 2018-05-15T19:51:49Z

@drdavella - It works for me with rb+, maybe we are not testing the same way ? I will open a PR with my test so we can check with it fails on your side.

saimn · 2018-05-15T21:55:38Z

@drdavella - Ok I think I found the same issue as you ;). I did not get far enough previously, with rb+ the flush fails because the file is not readable, and with ab+ and wb+ it is not readable. It seems that gzip does not support a read+write mode actually :(
This could probably be solved by opening the file twice with different modes, but I'm now more inclined to accept your previous solution (Use update instead)!

saimn · 2018-05-15T21:57:55Z

Hmm, but 'update' uses also rb+, so how is it working? 😕

saimn · 2018-05-15T22:16:50Z

So, it seems that 'update' write to a temporary (writable) file before replacing the original one (_flush_update, _flush_resize), which could be used for 'append' I guess but it does not work... giving up for now.

jotaylor mentioned this issue May 11, 2018

Calibrating gzipped ACCUM data exposes a bug in astropy spacetelescope/calcos#25

Closed

pllim added the io.fits label May 11, 2018

drdavella mentioned this issue May 16, 2018

Raise error when opening gzipped FITS file in append mode #7473

Merged

saimn closed this as completed in #7473 May 16, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

append mode doesn't work for gzipped files #7457

append mode doesn't work for gzipped files #7457

jotaylor commented May 11, 2018

pllim commented May 11, 2018

drdavella commented May 11, 2018

jotaylor commented May 14, 2018

saimn commented May 14, 2018

jotaylor commented May 14, 2018

drdavella commented May 14, 2018

drdavella commented May 14, 2018 •

edited

saimn commented May 14, 2018

drdavella commented May 15, 2018

saimn commented May 15, 2018

saimn commented May 15, 2018

saimn commented May 15, 2018

saimn commented May 15, 2018

append mode doesn't work for gzipped files #7457

append mode doesn't work for gzipped files #7457

Comments

jotaylor commented May 11, 2018

pllim commented May 11, 2018

drdavella commented May 11, 2018

jotaylor commented May 14, 2018

saimn commented May 14, 2018

jotaylor commented May 14, 2018

drdavella commented May 14, 2018

drdavella commented May 14, 2018 • edited

saimn commented May 14, 2018

drdavella commented May 15, 2018

saimn commented May 15, 2018

saimn commented May 15, 2018

saimn commented May 15, 2018

saimn commented May 15, 2018

drdavella commented May 14, 2018 •

edited