
Write compressed ascii outputs #2216

Closed
richardjgowers opened this issue Mar 1, 2019 · 9 comments


@richardjgowers
Member

We can currently read ASCII files that are compressed (e.g. mda.Universe('this.pdb.gz')). It would be cool if u.atoms.write('new.pdb.gz') also worked (i.e. wrote a PDB file compressed with gzip).

I'm not 100% sure how this should work, but one idea is to add a GZWriter which acts as a wrapper, calling the "real" format Writer and then compressing whatever was created. So the calling chain would be:

  • u.atoms.write('something.pdb.gz') makes a GZWriter
  • GZWriter makes a PDBWriter
    • maybe GZWriter can get the child Writer to write to a stream to lessen I/O operations?
  • PDBWriter does its thing (oblivious to what's happening)
  • GZWriter then compresses the PDB file
    • (or compresses the stream and writes that?)
  • ????
  • Profit
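A minimal sketch of the wrapper idea, assuming a format writer only needs a writable text stream. ToyPDBWriter and GZWriter are hypothetical stand-ins, not MDAnalysis classes, and the record layout is simplified. The child writes into an in-memory stream (the "write to a stream" sub-idea), and the wrapper then compresses that text into the target file:

```python
import gzip
import io


class ToyPDBWriter:
    """Hypothetical stand-in for a real format writer.

    It writes plain-text records to whatever stream it is given and knows
    nothing about compression. The layout is simplified, not column-exact PDB.
    """

    def __init__(self, stream):
        self.stream = stream

    def write(self, atoms):
        # atoms: iterable of (name, x, y, z) tuples
        for i, (name, x, y, z) in enumerate(atoms, start=1):
            self.stream.write(f"ATOM {i} {name} {x:.3f} {y:.3f} {z:.3f}\n")
        self.stream.write("END\n")


class GZWriter:
    """Hypothetical wrapper: let the child writer fill an in-memory text
    stream, then gzip-compress that text into the target file."""

    def __init__(self, filename, child_cls):
        self.filename = filename
        self.child_cls = child_cls

    def write(self, atoms):
        buffer = io.StringIO()
        # Any exception raised by the child propagates unchanged, and because
        # it happens before the output file is opened, no partial .gz is left.
        self.child_cls(buffer).write(atoms)
        with gzip.open(self.filename, "wt") as fh:
            fh.write(buffer.getvalue())
```

Buffering keeps the wrapper transparent to errors but holds the whole file in memory. For large outputs, handing the child a gzip.open(..., "wt") handle directly would avoid that, at the cost of possibly leaving a partial file behind if the child fails.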

Other things to consider

  • check that compression makes sense (i.e. only ASCII formats, rather than letting people try to compress a DCD)
  • making sure errors from child writers are properly passed along (ie the wrapper is transparent where needed)
  • other compression formats? (and generic CompressionWriter?)
  • testssss
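The first and third points could be handled together with a small registry, sketched here under stated assumptions: COMPRESSORS, ASCII_FORMATS, check_compressible and open_compressed are hypothetical names, and the format allowlist is illustrative rather than MDAnalysis's actual list. Binary formats such as DCD are simply left out, so asking to compress them fails early with a clear error:

```python
import bz2
import gzip
import lzma

# Hypothetical registry: compression suffix -> opener accepting (path, mode).
# Supporting another compression format means adding one entry here.
COMPRESSORS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}

# Illustrative allowlist of text-based formats; binary formats such as DCD
# are deliberately absent, so compressing them is refused up front.
ASCII_FORMATS = frozenset({"PDB", "GRO", "XYZ", "PQR", "CRD"})


def check_compressible(fmt, compression):
    """Raise ValueError unless `fmt` may be written with `compression`."""
    if compression is None:
        return  # uncompressed output: nothing to check
    if compression not in COMPRESSORS:
        raise ValueError(f"unsupported compression {compression!r}")
    if fmt.upper() not in ASCII_FORMATS:
        raise ValueError(
            f"refusing to compress {fmt.upper()} output: "
            "only ASCII formats are supported"
        )


def open_compressed(path, compression):
    """Open `path` for text writing with the registered compressor."""
    return COMPRESSORS[compression](path, "wt")
```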
@fenilsuchak
Member

Hello, I am new to this organization. I would like to contribute to the project "Making Cython Ascii Parsers" for GSoC 2019. I am looking for a starter issue to familiarize myself with the codebase. Can I take up this issue?

@richardjgowers
Member Author

@Fenil3510 sure yeah. I think the steps above should work, so I'd follow them through to get an idea of what needs doing here.

@fenilsuchak
Member

I'll send an initial PR as soon as possible. Thank you.

@jbarnoud
Contributor

Don't we use openany in the PDB writer? If so, it is more a matter of selecting the right writer despite the .gz extension, isn't it?

@fenilsuchak
Member

fenilsuchak commented Mar 12, 2019

@jbarnoud yes, openany from util.py is used in PDBWriter. I was thinking along similar lines too. Do you mean that changing openany (anyopen specifically) to handle the .gz extension would do the job?

@jbarnoud
Contributor

@Fenil3510 I did not look at the code, but if I recall correctly, openany can already open a compressed file transparently. The issue, I guess, is that the method that decides which writer to use does not look past the .gz extension. From my recollection, this is where to make changes.

@fenilsuchak
Member

fenilsuchak commented Mar 12, 2019

@jbarnoud, yes, that is precisely what is happening. The get_writer() function searches for a suitable writer based on the filename extension passed to u.atoms.write(). But when the argument is something.pdb.gz, there is no GZ writer, so it raises an error saying no writer was found. So I guess the fix would be to change the code so that it picks the writer for the extension before .gz, i.e. PDB in this case, and openany would handle the compression part. I have done some hacks to check whether this works, and it does. So this would be the way to go, right?
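A minimal sketch of that approach, assuming a simple name-to-writer registry: guess_format, pick_writer, open_for_writing and WRITERS are hypothetical names (strings stand in for writer classes) and do not reproduce MDAnalysis's own get_writer() or openany. Writer lookup skips a trailing .gz/.bz2, while the opener looks only at the last suffix to decide on compression, so the two concerns stay separate:

```python
import bz2
import gzip
import os

# Hypothetical registry keyed by format name; strings stand in for writer classes.
WRITERS = {"PDB": "PDBWriter", "GRO": "GROWriter"}

COMPRESSION_SUFFIXES = (".gz", ".bz2")


def guess_format(filename):
    """Format name from the extension, looking past a compression suffix."""
    root, ext = os.path.splitext(filename)
    if ext.lower() in COMPRESSION_SUFFIXES:
        # 'something.pdb.gz' -> look at 'something.pdb' instead
        root, ext = os.path.splitext(root)
    return ext.lstrip(".").upper()


def pick_writer(filename):
    """Hypothetical writer lookup using the format before any .gz/.bz2."""
    fmt = guess_format(filename)
    if fmt not in WRITERS:
        raise ValueError(f"No writer found for format {fmt!r}")
    return WRITERS[fmt]


def open_for_writing(filename):
    """openany-style opener: only the last suffix decides the compression."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".gz":
        return gzip.open(filename, "wt")
    if ext == ".bz2":
        return bz2.open(filename, "wt")
    return open(filename, "w")
```

With this split, the PDB writer itself never needs to know that its output ends up compressed.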

@jbarnoud
Contributor

jbarnoud commented Mar 12, 2019 via email

@fenilsuchak
Member

Ok @jbarnoud. I'll put up a PR by tomorrow evening at the latest. Thanks.

This was referenced Mar 16, 2019
orbeckst added a commit that referenced this issue Apr 5, 2019
Writes compressed output of given format (fixes #2216)