Write compressed ascii outputs #2216

richardjgowers · 2019-03-01T19:57:02Z

We can currently read ASCII files that are compressed (e.g. mda.Universe('this.pdb.gz')). It would be cool if u.atoms.write('new.pdb.gz') also worked (i.e. wrote a PDB file compressed with gzip).

I'm not 100% sure how this should work, but one idea is to add a GZWriter which acts as a wrapper, calling the "real" format Writer and then compressing whatever was created. So the call chain would be:

u.atoms.write('something.pdb.gz') makes a GZWriter
GZWriter makes a PDBWriter
- maybe GZWriter can get the child Writer to write to a stream to lessen I/O operations?
PDBWriter does its thing (oblivious to what's happening)
GZWriter then compresses the PDB file
- (or compresses the stream and writes that?)
????
Profit

Other things to consider

check that compression makes sense (ie only ascii formats, rather than letting people try and compress a dcd)
making sure errors from child writers are properly passed along (ie the wrapper is transparent where needed)
other compression formats? (and generic CompressionWriter?)
testssss
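
The wrapper idea above can be sketched roughly as follows. This is only an illustration using the standard library, not MDAnalysis code: `LineWriter`, `GZWriter` and `TEXT_FORMATS` are hypothetical names, and the child writer is assumed to accept an already open text stream.

```python
import gzip
import os

# Hypothetical set of text formats for which compression makes sense.
TEXT_FORMATS = {"pdb", "gro", "xyz"}


class LineWriter:
    """Stand-in for a "real" format writer; it only knows about a text stream."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, lines):
        for line in lines:
            self.stream.write(line + "\n")


class GZWriter:
    """Hypothetical wrapper: opens a gzip text stream and hands it to a child writer."""

    def __init__(self, filename, child_cls=LineWriter):
        root, ext = os.path.splitext(filename)  # ("something.pdb", ".gz")
        inner = os.path.splitext(root)[1].lstrip(".").lower()
        # Only allow compression of known text formats (e.g. not a dcd).
        if ext.lower() != ".gz" or inner not in TEXT_FORMATS:
            raise ValueError("refusing to compress format %r" % inner)
        self.filename = filename
        self.child_cls = child_cls

    def write(self, lines):
        # Errors raised by the child propagate unchanged; the file is still closed.
        with gzip.open(self.filename, "wt") as stream:
            self.child_cls(stream).write(lines)
```

Writing straight into the gzip stream avoids creating an uncompressed file first, which is the "lessen I/O operations" variant of the chain.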


fenilsuchak · 2019-03-08T16:03:48Z

Hello, I am new to this organization. I would like to contribute to the project "Making Cython Ascii Parsers" for GSOC 2019. I am looking for a starter issue to familiarize myself with the codebase. Can I take up this issue?

richardjgowers · 2019-03-08T16:06:24Z

@Fenil3510 sure yeah. I think the steps above should work, so I'd follow them through to get an idea of what needs doing here.

fenilsuchak · 2019-03-09T07:39:32Z

I'll send an initial PR as soon as possible. Thank you

jbarnoud · 2019-03-12T12:06:39Z

Don't we use openany in the PDB writer? If so, it is more a matter of selecting the right writer despite the .gz extension isn't it?

fenilsuchak · 2019-03-12T12:28:15Z

@jbarnoud yes, openany from util.py is used in PDBWriter. I was thinking along similar lines too. Do you mean that changing the openany function (anyopen specifically) to handle the .gz extension would do the job?

jbarnoud · 2019-03-12T12:49:01Z

@Fenil3510 I did not look at the code but, if I recall correctly, openany can already open a compressed file transparently. The issue, I guess, is that the method that decides which writer to use does not look past the .gz extension. From my recollection, this is where to make changes.
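
For readers unfamiliar with the idea of opening compressed files transparently, here is a minimal standard-library sketch. It is not the MDAnalysis openany implementation; `open_by_extension` and `_OPENERS` are hypothetical names chosen for illustration.

```python
import bz2
import gzip
import os

# Map a compression extension to the function that opens such a file.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_by_extension(filename, mode="r"):
    """Open filename in text mode, compressing or decompressing based on its extension."""
    ext = os.path.splitext(filename)[1].lower()
    opener = _OPENERS.get(ext, open)  # fall back to the builtin open
    # Append "t" so gzip/bz2 return text streams, just like the builtin open.
    return opener(filename, mode + "t")
```

A writer that obtains its file handle this way never needs to know whether the output is compressed.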

fenilsuchak · 2019-03-12T14:03:31Z

@jbarnoud, yes, that is precisely what is happening. The get_writer() function searches for a suitable writer based on the extension of the filename passed to u.atoms.write(). But when the argument is something.pdb.gz, there is no GZ writer, so it raises an error that no writer was found. So I guess the fix would be to change the code so that it picks the writer for the extension before .gz, i.e. PDB in this case, and openany would handle the compression part. I have done some hacks to check whether this works, and it does. So this would be the way to go, right?
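
A rough sketch of that fix, independent of the actual MDAnalysis code: strip a trailing compression extension before looking up the writer. The names `format_from_filename`, `get_writer_name`, `COMPRESSION_EXTENSIONS` and the `WRITERS` registry are all hypothetical.

```python
import os

# Assumed compression extensions and a toy writer registry (illustration only).
COMPRESSION_EXTENSIONS = {"gz", "bz2"}
WRITERS = {"PDB": "PDBWriter", "GRO": "GROWriter"}


def format_from_filename(filename):
    """Return the upper-case format, ignoring a trailing compression extension."""
    root, ext = os.path.splitext(filename)
    ext = ext.lstrip(".").lower()
    if ext in COMPRESSION_EXTENSIONS:
        # Look one extension further: "something.pdb.gz" -> "pdb".
        ext = os.path.splitext(root)[1].lstrip(".").lower()
    return ext.upper()


def get_writer_name(filename):
    """Look up the writer for filename, raising if the format is unknown."""
    fmt = format_from_filename(filename)
    try:
        return WRITERS[fmt]
    except KeyError:
        raise ValueError("no writer found for format %r" % fmt) from None
```

With this lookup, 'something.pdb.gz' and 'something.pdb' resolve to the same writer, and compression is left entirely to whatever opens the file.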

jbarnoud · 2019-03-12T14:28:47Z

Sounds good, indeed.

fenilsuchak · 2019-03-12T14:40:09Z

Ok @jbarnoud. I'll put up a PR by tomorrow evening at the latest. Thanks.


richardjgowers added Component-Writers proposal labels Mar 1, 2019

This was referenced Mar 16, 2019

Master #2220

Closed

Writes compressed output of given format #2221

Merged

Luthaf mentioned this issue Mar 28, 2019

Add Chemfiles as a coordinate reader/writer #1862

Merged

6 tasks

orbeckst added enhancement and removed proposal labels Apr 3, 2019

orbeckst closed this as completed in #2221 Apr 5, 2019

orbeckst added a commit that referenced this issue Apr 5, 2019

Merge pull request #2221 from Fenil3510/develop

1be9f73

Writes compressed output of given format (fixes #2216)
