Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixing gzip import issue 2774 #2783

Closed
wants to merge 24 commits into from
Closed

Conversation

abostroem
Copy link

Changed import from:
import gzip
to:
from ...utils.compat import gzip

ran setup.py test and all io/fits tests passed. I don't have Python 2.6 to test that this really fixed the problem, but it should based on the comments in issue #2774

@embray
Copy link
Member

embray commented Jul 23, 2014

Thanks for the fix @abostroem--unfortunately I don't think this quite did it. This is tricky because really there are two gzip modules in use and both need to be supported. This fix is in the right direction--astropy.io.fits.file should import gzip from the astropy utils. However, there are several places in that module that do a check like isinstance(fileobj, gzip.GzipFile). This gets ugly because that needs to return True for both the system gzip.GzipFile, and the copy from Astropy.

So really what this needs is something at the top like:

from ..utils.compat import gzip
import gzip as _system_gzip

_GZIP_FILE_TYPES = (gzip.GzipFile, _system_gzip.GzipFile)

and then replace all instances of isinstance(fileobj, gzip.GzipFile) with isinstance(fileobj, _GZIP_FILE_TYPES).

It's ugly but it might have to do.

@embray embray added this to the v0.4.1 milestone Jul 23, 2014
For clarity the astropy gzip is called _astropy_gzip
and the system gzip is called _system_gzip.

All gzip.x statements have been changed to use the
_astropy_gzip.

All isinstances statements check both
@abostroem
Copy link
Author

Thanks to Matt Craig I ran this in python 2.6, all the io/fits tests passed

@embray
Copy link
Member

embray commented Jul 24, 2014

Thanks, looks good. As one last thing would you mind implementing my suggestion of putting the two GzipFile types as a tuple in some global variable _GZIP_FILE_TYPES. This isn't merely cosmetic--it makes clear that this should be used when type checking this sort of thing and makes it harder to forget to implement this workaround.

Then please also add a changelog entry under the 0.4.1 heading mentioning that this was fixed, along with the issue number (following the example of other entries).

@embray
Copy link
Member

embray commented Jul 24, 2014

Oh, sorry, and if you don't mind could you add a regression test? Basically recreating the example from #2774 would suffice (just do the compression using the gzip module rather than calling an external program, but other than that it would just ensure the file an be read once zipped). If you don't have time I can add it.

@abostroem
Copy link
Author

@embray I'm a little unclear on what regression test you want me to add. Should I zip one of the files in data in place, try to read it and then delete it at the end of the test?

@abostroem
Copy link
Author

@embray I tried the above test using os.system call to gzip. Instead do you want me to do something like this?
import gzip
f_in = open('file.txt', 'rb')
f_out = gzip.open('file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

@embray
Copy link
Member

embray commented Jul 30, 2014

@abostroem Take a look at the tests starting around

https://github.com/astropy/astropy/blob/master/astropy/io/fits/tests/test_core.py#L588

which test other cases of reading and writing gzip files. This would be just one other such test but specifically implementing the case of #2774. In that issue @cdeil demonstrated the issue using a test data file that comes in the test suite--you can access it from within a test function using self.data('table.fits').

The trick would be to write the test first (without your fixes)--see that it fails--and then add your fixes on top and confirm that the test now passes.

@embray
Copy link
Member

embray commented Jul 30, 2014

This test is a particularly good model since it's also a regression test for a bug:

https://github.com/astropy/astropy/blob/master/astropy/io/fits/tests/test_core.py#L598

@abostroem
Copy link
Author

@embray I'm still a little confused, I think mostly because I'm not entirely clear on exactly what broke originally (what exactly is Table calling that is breaking), what _make_gzip does (how close to a real gzip object is this, is it a table?), what should be the length of the fits.open statement, and what does ignore_warnings do?

This was the original test I wrote before you responded. It passes and fails in the correct places:

from astropy.io import fits
import os
import shutil
import astropy
from astropy.table import Table

def test_gzip():
    data_dir = os.path.join(os.path.dirname(__file__), 'data')
    shutil.copy(os.path.join(data_dir, 'table.fits'), os.path.join(data_dir, 'table2.fits'))
    os.system('gzip '+os.path.join(data_dir, 'table2.fits'))
    try:
        Table.read(os.path.join(data_dir, 'table2.fits.gz'))
        passed = True
    except ValueError as e:
        passed = False
    os.remove(os.path.join(data_dir, 'table2.fits.gz'))
    if not passed:
        raise(e)

This is the function I wrote after your email, it passes in both 2.6 and 2.7. Suggestions?

    def test_gzip(self):
        gf = gzip.GzipFile(self._make_gzip_file())
        with ignore_warnings():
            assert len(Table.read(gf)) == 5

I sound say my original code was a stand alone file while the second one is included in test_core.py

@embray
Copy link
Member

embray commented Aug 5, 2014

The _make_gzip_file() helper that's used in some of the tests isn't applicable here since it doesn't gzip a file containing a table.

The original test you wrote is mostly fine. It should just make a few changes when integrating in with the the other tests:

  1. To get at the test files for the io.fits package there's a helper method self.data. Just do self.data('table.fits') to get the path to the data file (like the other tests do). Also there's no reason to copy the file since it's not going to be modified.
  2. Instead of using os.system to gzip the file use the gzip module in Python. In the tests you can use self.temp('filename.fits') to get a temporary filename to write to.
  3. No need for the try/except. If an error occurs it can just bubble up and the test runner will catch it and perform the necessary cleanup.

@embray embray modified the milestones: v0.4.1, v0.4.2 Aug 5, 2014
@abostroem
Copy link
Author

Is this what you had in mind?

def test_gzip_open_w_table_mod():
    temp_table = self.temp('table.fits.gz')
    with open(self.data('table.fits'), 'rb') as f:
        gz = gzip.open(temp_table, 'wb')
        gz.write(f.read())
        gz.close()
    Table.read(os.path.join('table.fits.gz'))

So now I have another problem. I've put the following on line 647 in test_core.py. I'm trying to run my test using astropy.test('io') but it doesn't seem to be picking it up my new test (when I delete the function is still runs the same number of tests and when I put something blatantly wrong in there it doesn't break (like assert 1 == 2). I did accidentally break another function so I know its looking at the right file. Do I have to do something else?

@embray
Copy link
Member

embray commented Aug 6, 2014

Try just running,

python setup.py test --package=io.fits

No need to run all the io tests. Trying to run astropy.test() might not be importing your development version of astropy.

@embray
Copy link
Member

embray commented Aug 26, 2014

The following should suffice:

    def test_read_open_astropy_gzip_file(self):
        """  
        Regression test for https://github.com/astropy/astropy/issues/2774

        This tests reading from a ``GzipFile`` object from Astropy's
        compatibility copy of the ``gzip`` module.
        """

        from ....utils.compat import gzip 

        gf = gzip.GzipFile(self._make_gzip_file())
        try: 
            assert len(fits.open(gf)) == 5 
        finally:
            gf.close()

This doesn't need to invoke Table.read directly since the real problem is supporting both the system gzip and astropy.utils.compat.gzip (there's no guarantee in the future as to how Table.read is implemented, so better to go directly to the issue).

@abostroem Just lemme know when you have time and I can walk you through rebasing and stuff if need be :)

@embray
Copy link
Member

embray commented Aug 26, 2014

I should add, earlier I wrote:

The _make_gzip_file() helper that's used in some of the tests isn't applicable here since it doesn't gzip a file containing a table.

My bad--there's nothing about this that is exclusive to tables. It's just that this code path was being invoked originally by way of Table.read. After giving this a closer look I realized that the test for this probably shouldn't involve Table.read at all.

@abostroem
Copy link
Author

@embray I removed a few trailing spaces from my test, let's see if it passes this time...

@embray
Copy link
Member

embray commented Aug 28, 2014

@abostroem What editor are you using? I would recommend finding out how to configure it to do this automatically; especially on .py files.

@abostroem
Copy link
Author

@embray I use TextWrangler, I've just enabled it to strip trailing whitespace.

Having done that, when I do a git status, neither the test_core.py nor file.py show up as modified.

Is there something that will tell me where the trailing whitespace is?

Additionally, astropy_helpers does show up as modified (even though I haven't touched it) and a diff shows this:

Kyras-MacBook-Pro:astropy abostroem$ git diff
diff --git a/astropy_helpers b/astropy_helpers
index c64c7b7..1bca250 160000
--- a/astropy_helpers
+++ b/astropy_helpers
@@ -1 +1 @@
-Subproject commit c64c7b7f0490caca2355b013d3ae76170a339da9
+Subproject commit 1bca250

Thoughts?

@embray
Copy link
Member

embray commented Aug 28, 2014

@abostroem You can look at the Travis build that failed here. It shows that there is trailing whitespace at test_core.py on line 658. Indeed if you look it's there after the try: statement.

Sorry about this--I know it's pretty annoying (and for once not even my idea!). But I think ultimately a good thing in the long term.

Don't worry about astropy_helpers--it's a submodule. When you switch between branches, if that submodule is at different commits on those branches it won't automatically move the submodule for you (hence showing it as "modified" which is really probably not the best description of what's actually going on). You can update the submodule to the correct version for the branch you're on by running git submodule update, but really in most cases it doesn't matter.

@abostroem
Copy link
Author

@embray Done!

@@ -204,6 +204,8 @@ Bug Fixes

- ``astropy.io.fits``

- Fix ability to read gzipped fits files using Python 2.6 [#2783]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one last thing--could you fix this to read something more specific to how the original problem manifested? Something like:

Fixed crash when reading gzip-compressed FITS tables through the Astropy
``Table`` interface. [#2783]

Then also rebase your branch on upstream/master and delete the merge commits from the history. See http://docs.astropy.org/en/stable/development/workflow/additional_git_topics.html?highlight=rebasing#rebasing-on-trunk (it talks about "trunk" for people coming from SVN, but it should probably say "master" instead of "trunk"--I'll see about rewriting that at some point).

@abostroem
Copy link
Author

@embray Let me know if this looks right. I followed the directions:
I made my changes on the master branch (learning experience).

git fetch upstream
git rebase -i upstream/master
git push origin master

Looking at the commits I didn't see any merge commits, so I didn't modify the commit history.

Since you said the real issue was not the call from Tables, do you want the commit message to reference the real issue. For now I copied what you wrote.

@embray
Copy link
Member

embray commented Aug 29, 2014

I have no idea what's going on anymore because now you're including commits from other issues as well. I'm just going to merge this manually.

@embray
Copy link
Member

embray commented Aug 29, 2014

Merged manually via 5d7acd6 Thanks again @abostroem , and indeed next time you'll probably have an easier time if you start a separate branch to work on the issue.

@embray embray closed this Aug 29, 2014
embray added a commit that referenced this pull request Aug 29, 2014
@embray
Copy link
Member

embray commented Aug 29, 2014

I'm not sure if it's something I did, or if it was a result of refreshing the page or what, but now the commit history in this PR doesn't look as messed up as it did a few minutes ago! Nevertheless, when I did the merge I also squashed a few commits down (typo fixes and things like that). There was also a tab character in the changelog :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants