Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

append mode doesn't work for gzipped files #7457

Closed
jotaylor opened this issue May 11, 2018 · 13 comments · Fixed by #7473
Closed

append mode doesn't work for gzipped files #7457

jotaylor opened this issue May 11, 2018 · 13 comments · Fixed by #7473
Labels

Comments

@jotaylor
Copy link

Astropy documentation seems to imply that the only type of zipped files that do not work with append or update modes are bzipped files:
http://docs.astropy.org/en/stable/io/fits/#working-with-compressed-files
However, opening in append mode does not work for gzipped files. Update mode does still work.

I'm using python 3.5.5 and astropy 3.0.2

@pllim
Copy link
Member

pllim commented May 11, 2018

Is it possible for you to provide a minimal snippet to reproduce your problem? If you can provide the data (or subset of it), that would be great too. Thanks.

@pllim pllim added the io.fits label May 11, 2018
@drdavella
Copy link
Contributor

I seem to remember that there was a good technical reason that gzip doesn't support append either, so maybe it's just a documentation issue. I might be wrong, though, so more investigation is needed.

@jotaylor
Copy link
Author

@pllim sure- This is one dataset I couldn't get it to work on, LBUI23V3Q:
http://archive.stsci.edu/cgi-bin/mastpreview?mission=hst&dataid=LBUI23V3Q
For a code example, I initially found this in calcos, and filed an issue there in which I referenced this problem (see above). Here is a summary I wrote of how calcos is writing an output file and how the output varies between append and update modes:

from astropy.io import fits
import os

indata = "lbui23v3q_rawacq.fits.gz"
outdata = "lbui23v3q_flt.fits.gz"

def write_primary(inhdu):
    if os.path.exists(outdata):
        os.remove(outdata)
    outhdu0 = fits.PrimaryHDU(header=inhdu[0].header)
    outhdulist = fits.HDUList(outhdu0)
    outhdulist.writeto(outdata)

def appendmode(inhdu):
    with fits.open(outdata, mode="readonly") as outhdulist:
        print("In readonly mode: ", outhdulist)

    with fits.open(outdata, mode="append") as outhdulist:
        print("In append mode: ", outhdulist)

        hdu1 = fits.ImageHDU(data=inhdu[1].data)
        outhdulist.append(hdu1)
        hdu2 = fits.ImageHDU(data=inhdu[2].data)
        outhdulist.append(hdu2)
        hdu3 = fits.ImageHDU(data=inhdu[3].data)
        outhdulist.append(hdu3)

        print("\t{}\n".format(outhdulist))

def updatemode(inhdu):
    with fits.open(outdata, mode="readonly") as outhdulist:
        print("In readonly mode: ", outhdulist)

    with fits.open(outdata, mode="update") as outhdulist:
        print("In update mode: ", outhdulist)

        hdu1 = fits.ImageHDU(data=inhdu[1].data)
        outhdulist.append(hdu1)
        hdu2 = fits.ImageHDU(data=inhdu[2].data)
        outhdulist.append(hdu2)
        hdu3 = fits.ImageHDU(data=inhdu[3].data)
        outhdulist.append(hdu3)

        print("\t{}\n".format(outhdulist))

if __name__ == "__main__":
    with fits.open(indata) as inhdu:
        write_primary(inhdu)
        appendmode(inhdu)

        write_primary(inhdu)
        updatemode(inhdu)

The output should be one PrimaryHDU followed by 3 ImageHDUs, but you can see that when you open the file in append mode, it doesn't end up this way.

In readonly mode:  [<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810aa0e48>]
In append mode:  []
	[<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810ba3f98>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810ba6eb8>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810bae400>]

In readonly mode:  [<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810bba828>]
In update mode:  [<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810bc6630>]
	[<astropy.io.fits.hdu.image.PrimaryHDU object at 0x810bc6630>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810bbd208>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810bbd080>, <astropy.io.fits.hdu.image.ImageHDU object at 0x810bbd2b0>]

This is not necessarily how I would choose to write an output file, but it is currently done this way in calcos accum.py. In the end, it's not a big deal- and I already submitted a pull request for calcos- but the astropy documentation should be updated.

@saimn
Copy link
Contributor

saimn commented May 14, 2018

Does it work before Astropy 3.0 ? (Could it be related to #6373 @drdavella ?)

@jotaylor
Copy link
Author

I tried with astropy 2.0.6 and 3.0.2 and both produce the same results.

@drdavella
Copy link
Contributor

@saimn yes it could be. I'll take another look.

@drdavella
Copy link
Contributor

drdavella commented May 14, 2018

@saimn I'm not exactly sure where this bug was introduced, but I wonder if you would be willing to accept as a solution an error when using append with gzip that says something like the following:

append mode is not supported for gzipped FITS files. Use update instead

Neither append nor update is supported for zip or bzip, and there are already appropriate error messages. I'm not sure why append doesn't work, but since update seems to serve the same purpose in this case I'm not sure we should jump through hoops to make append work as well.

@saimn
Copy link
Contributor

saimn commented May 14, 2018

I'm having a look, and the bug is probably not so difficult to fix. The issue is that for files opened in append mode, the length is 0:

In [1]: %astropy
Numpy 1.14.3
Astropy 3.1.dev21815
f
In [2]: fits.info('test.fits.gz')
Filename: test.fits.gz
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       4   ()      
  1                1 ImageHDU         6   (10,)   int64   
  2                1 ImageHDU         6   (20,)   int64   
  3                1 ImageHDU         6   (30,)   int64   

In [3]: hdu1 = fits.open('test.fits.gz', mode='update')

In [4]: len(hdu1)
Out[4]: 4

In [5]: hdu1 = fits.open('test.fits.gz', mode='append')

In [6]: len(hdu1)
Out[6]: 0

And this is because the gzip file is opened in write-only mode, so reading from it is not possible, and hence the length (number of HDUs) cannot be computed (which causes all HDUs to be added as PrimaryHDU):

In [7]: gzip.GzipFile(filename='test.fits.gz', mode='rb+').read(2880)
Out[7]: b'SIMPLE  =                    T / conforms to FITS standard                      BITPIX  =                    8 / array data type                                NAXIS   =                    0 / number of array dimensions                     EXTEND  =                    T                                                  END                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             '

In [8]: gzip.GzipFile(filename='test.fits.gz', mode='ab+').read(2880)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-8-814463253e61> in <module>()
----> 1 gzip.GzipFile(filename='test.fits.gz', mode='ab+').read(2880)

~/.pyenv/versions/3.6.3/lib/python3.6/gzip.py in read(self, size)
    273         if self.mode != READ:
    274             import errno
--> 275             raise OSError(errno.EBADF, "read() on write-only GzipFile object")
    276         return self._buffer.read(size)
    277 

OSError: [Errno 9] read() on write-only GzipFile object

Not sure of the best way to fix, opening in 'rb+' mode as for 'update' seems reasonable, @drdavella ?

@drdavella
Copy link
Contributor

@saimn I tried changing it to rb+ and wb+ and I was not able to get either way to work. Maybe I missed something though, so if you can get it to work it seems like a reasonable solution.

@saimn
Copy link
Contributor

saimn commented May 15, 2018

@drdavella - It works for me with rb+, maybe we are not testing the same way ? I will open a PR with my test so we can check with it fails on your side.

@saimn
Copy link
Contributor

saimn commented May 15, 2018

@drdavella - Ok I think I found the same issue as you ;). I did not get far enough previously, with rb+ the flush fails because the file is not readable, and with ab+ and wb+ it is not readable. It seems that gzip does not support a read+write mode actually :(
This could probably be solved by opening the file twice with different modes, but I'm now more inclined to accept your previous solution (Use update instead)!

@saimn
Copy link
Contributor

saimn commented May 15, 2018

Hmm, but 'update' uses also rb+, so how is it working? 😕

@saimn
Copy link
Contributor

saimn commented May 15, 2018

So, it seems that 'update' write to a temporary (writable) file before replacing the original one (_flush_update, _flush_resize), which could be used for 'append' I guess but it does not work... giving up for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants