Basically, all final output is uncompressed as well. For light intermediate compression, which costs very little CPU but saves disk I/O, you could use LZ4, for instance.
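Here is a minimal sketch of light intermediate compression. LZ4 is not in Python's standard library (the third-party `lz4` package provides it), so this sketch uses `gzip` at `compresslevel=1` as a stand-in. That is gzip's fastest setting, and it shows the same trade of a little CPU for less disk I/O. The function names and the record layout are hypothetical.

```python
import gzip
import os
import tempfile

def write_intermediate(path, records):
    """Write intermediate records using gzip's fastest level (a stand-in for LZ4)."""
    # compresslevel=1 favours speed over ratio, which is the point of
    # "light" intermediate compression.
    with gzip.open(path, "wt", compresslevel=1) as fh:
        for rec in records:
            fh.write(rec + "\n")

def read_intermediate(path):
    """Stream records back line by line; the whole file is never decompressed at once."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            yield line.rstrip("\n")

if __name__ == "__main__":
    # Hypothetical tab-separated intermediate records.
    records = [f"read_{i}\tACGTACGTACGT" for i in range(10000)]
    path = os.path.join(tempfile.mkdtemp(), "intermediate.tsv.gz")
    write_intermediate(path, records)
    assert list(read_intermediate(path)) == records
    raw = sum(len(r) + 1 for r in records)
    print(f"{raw} bytes raw -> {os.path.getsize(path)} bytes on disk")
```

With the real `lz4` package the structure would stay the same; only the open call changes.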
-
LZ4 and Snappy are still good candidates, and it would be great if the huge text-based output files could be written in a compressed (de facto standard) format.
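Here is one possible way to offer compressed output without making it mandatory. It assumes that a `.gz` suffix on the output path should switch compression on. `open_output` is a hypothetical helper name, not something from the project.

```python
import gzip

def open_output(path, compress=None):
    """Open a text output file, gzip-compressed when requested.

    compress=None (the default) infers the choice from a ".gz" suffix, so
    plain output stays available for tools that cannot read gzip.
    """
    if compress is None:
        compress = path.endswith(".gz")
    if compress:
        # gzip.open in "wt" mode returns a text stream that compresses on write.
        return gzip.open(path, "wt")
    return open(path, "w")
```

Callers write to the returned handle exactly as they would to a plain file. Passing `compress=False` forces plain text regardless of the suffix.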
-
You may disagree, but about 95% of the software for processing large FASTA files that I have used in the past 10 years either supports gzipped files directly or supports reading from stdin, which allows decompressing on the fly. Raw reads are also typically compressed, and gzip-compatible (BGZF) compression is already built into formats such as BAM, the binary form of SAM. Output compression doesn't have to be mandatory if you want to keep supporting software that cannot read compressed input. The overall reason to use compression, whether for intermediate files (LZ4, Snappy) or for output (gzip), is to cut the required disk storage to an estimated 10 to 20% of the uncompressed size. Depending on the system you run the software on, this can make a huge difference.
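The input side of this is cheap to support. This sketch accepts either a plain or a gzipped FASTA file and detects gzip by its two magic bytes (`1f 8b`) rather than by file extension. It also treats `-` as stdin, so users can decompress on the fly with something like `zcat reads.fa.gz | tool -`, where `tool` is a placeholder. `open_input` and `count_records` are hypothetical helpers.

```python
import gzip
import sys

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip (and BGZF) stream

def open_input(path):
    """Open FASTA input as text: '-' means stdin; gzip is detected by magic bytes."""
    if path == "-":
        # Lets callers pipe in already-decompressed data from zcat, pigz -dc, etc.
        return sys.stdin
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")

def count_records(handle):
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in handle if line.startswith(">"))
```

Sniffing the magic bytes means a mislabelled file still opens correctly. Note that using the stdin handle in a `with` block will close stdin when the block exits.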
-
Hi,
I see that you already use pigz when it is available. You might also consider compressing reads as they are generated, by piping them straight into the compressor. That saves disk space and also avoids the doubled disk I/O of writing plain reads, reading them back, and then writing compressed reads.
Best,
Johannes
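Here is a minimal sketch of the pipe idea. Reads are streamed into `pigz -c` (or `gzip -c` if pigz is missing), which writes compressed output directly to the destination file, so no uncompressed copy ever touches the disk. The fallback to Python's in-process `gzip` module when neither binary is on `PATH` is an assumption added for portability. `simulate_reads` is a hypothetical stand-in for the real read generator.

```python
import gzip
import shutil
import subprocess

def write_compressed_stream(lines, out_path):
    """Compress text lines as they are produced, never writing a plain copy."""
    tool = shutil.which("pigz") or shutil.which("gzip")
    if tool is None:
        # Assumed fallback: compress in-process if no external binary exists.
        with gzip.open(out_path, "wt") as fh:
            for line in lines:
                fh.write(line)
        return
    with open(out_path, "wb") as out:
        # The compressor reads from our pipe and writes straight into out_path.
        proc = subprocess.Popen([tool, "-c"], stdin=subprocess.PIPE, stdout=out)
        try:
            for line in lines:
                proc.stdin.write(line.encode())
        finally:
            proc.stdin.close()  # signals EOF so the compressor can finish
            returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"{tool} exited with status {returncode}")

def simulate_reads(n):
    """Hypothetical generator yielding FASTQ-style records one at a time."""
    for i in range(n):
        yield f"@read_{i}\nACGTACGT\n+\nIIIIIIII\n"
```

pigz produces standard gzip output, so the result can be read back with `gzip.open` or `zcat` like any other `.gz` file.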