Intel deflater + long-read data = intermittently corrupt BAMs #99
@kvg reports that running GATK with the Intel deflater on long-read data intermittently produces corrupt BAM output. His specific use case is sharding a single unaligned BAM file into multiple smaller BAMs. Running with the JDK deflater (--use-jdk-deflater) appears to resolve the issue.
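Since switching to the JDK deflater makes the problem go away, a natural next check is whether a long record survives a deflate/inflate round trip with each implementation after being cut into block-sized chunks. The sketch below is a minimal harness that uses only `java.util.zip`. Everything in it is an assumption for illustration. The class and method names are hypothetical, and the 65280-byte chunk size stands in for a BGZF-sized block. A single `Deflater` is reused with `reset()` between chunks, a pattern we assume a block writer might follow. The GKL deflater is assumed, but not confirmed here, to plug in as a `java.util.zip.Deflater` through the `Supplier<Deflater>` parameter.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.function.Supplier;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DeflaterRoundTrip {
    // Hypothetical per-chunk payload size for this sketch; BGZF blocks must fit in
    // 64 KiB compressed, so writers keep uncompressed chunks somewhat smaller.
    static final int CHUNK = 65280;

    /** Raw-deflates in[off, off+len) with a reused Deflater, reset before each chunk. */
    static byte[] deflateChunk(Deflater d, byte[] in, int off, int len) {
        d.reset();
        d.setInput(in, off, len);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!d.finished()) {
            int n = d.deflate(buf);
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Raw-inflates one chunk; returns null if the stream is malformed or incomplete. */
    static byte[] inflateChunk(byte[] compressed, int expectedLen) {
        Inflater inf = new Inflater(true); // nowrap: raw deflate, no zlib header
        try {
            // In nowrap mode the Inflater javadoc asks for one extra dummy input byte.
            inf.setInput(Arrays.copyOf(compressed, compressed.length + 1));
            byte[] out = new byte[expectedLen + 1]; // +1 so surplus output is noticed
            int total = 0;
            while (!inf.finished() && total < out.length) {
                int n = inf.inflate(out, total, out.length - total);
                if (n == 0 && !inf.finished()) return null; // truncated stream
                total += n;
            }
            return inf.finished() ? Arrays.copyOf(out, total) : null;
        } catch (DataFormatException e) {
            return null; // corrupt deflate data
        } finally {
            inf.end();
        }
    }

    /** Counts chunks of one long record whose deflate -> inflate round trip fails. */
    static int countBadChunks(Supplier<Deflater> factory, byte[] record) {
        Deflater d = factory.get();
        int bad = 0;
        try {
            for (int off = 0; off < record.length; off += CHUNK) {
                int len = Math.min(CHUNK, record.length - off);
                byte[] restored = inflateChunk(deflateChunk(d, record, off, len), len);
                if (restored == null
                        || !Arrays.equals(restored, 0, restored.length, record, off, off + len)) {
                    bad++;
                }
            }
        } finally {
            d.end();
        }
        return bad;
    }

    /** Synthetic "long read": random ACGT bases, long enough to span many chunks. */
    static byte[] syntheticRead(int length, long seed) {
        byte[] bases = {'A', 'C', 'G', 'T'};
        Random rng = new Random(seed);
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) out[i] = bases[rng.nextInt(4)];
        return out;
    }
}
```

Passing `() -> new Deflater(5, true)` exercises the JDK path (level 5 is arbitrary here). A nonzero count from another supplier on the same record would point at that deflater rather than at the sharding code. A clean result would not rule out bugs that depend on buffer sizes, threading, or other conditions this harness does not reproduce.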
Example error when trying to read a corrupt shard (reading with htsjdk produces the same error):
There may be a bug in https://github.com/Intel-HLS/GKL/blob/master/src/main/native/compression/IntelDeflater.cc, possibly triggered when a single read spans many compressed (BGZF) blocks.
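The error only appears when a shard is read back, so it can help to locate the first BGZF block that is actually damaged. The sketch below walks a BAM file's BGZF blocks using the layout described in the SAM/BAM specification. Each block is a gzip member whose `BC` extra subfield holds `BSIZE` (the total block size minus one), followed by the raw-deflated data and then CRC32 and ISIZE. The sketch checks each block's decompressed length and CRC. The class and method names are hypothetical, the sketch reads the whole file into memory, and `block` exists only to build test input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class BgzfBlockScanner {
    static int u16(byte[] b, int p) { return (b[p] & 0xff) | (b[p + 1] & 0xff) << 8; }
    static long u32(byte[] b, int p) { return u16(b, p) | (long) u16(b, p + 2) << 16; }

    /** Offset of the first malformed BGZF block, or -1 if every block checks out. */
    static long firstBadBlock(byte[] data) {
        int pos = 0;
        while (pos < data.length) {
            int start = pos;
            // gzip magic 31/139, CM=8 (deflate), FLG.FEXTRA set; 12-byte header incl. XLEN
            if (data.length - pos < 12 || (data[pos] & 0xff) != 31 || (data[pos + 1] & 0xff) != 139
                    || data[pos + 2] != 8 || (data[pos + 3] & 4) == 0) return start;
            int xEnd = pos + 12 + u16(data, pos + 10);
            if (xEnd > data.length) return start;
            int bsize = -1;
            for (int x = pos + 12; x + 4 <= xEnd; ) { // walk the extra subfields
                int slen = u16(data, x + 2);
                if (x + 4 + slen > xEnd) return start;
                if (data[x] == 'B' && data[x + 1] == 'C' && slen == 2) bsize = u16(data, x + 4);
                x += 4 + slen;
            }
            int blockEnd = start + bsize + 1; // BSIZE = total block size - 1
            if (bsize < 0 || blockEnd - 8 < xEnd || blockEnd > data.length) return start;
            long crc = u32(data, blockEnd - 8);
            long isize = u32(data, blockEnd - 4);
            if (isize > 65536) return start;
            byte[] out = inflateRaw(Arrays.copyOfRange(data, xEnd, blockEnd - 8), (int) isize);
            CRC32 c = new CRC32();
            if (out != null) c.update(out);
            if (out == null || c.getValue() != crc) return start;
            pos = blockEnd;
        }
        return -1;
    }

    /** Inflates raw deflate data; returns exactly expectedLen bytes, or null on failure. */
    static byte[] inflateRaw(byte[] cdata, int expectedLen) {
        Inflater inf = new Inflater(true);
        try {
            inf.setInput(Arrays.copyOf(cdata, cdata.length + 1)); // dummy byte for nowrap
            byte[] out = new byte[expectedLen + 1];
            int total = 0;
            while (!inf.finished() && total < out.length) {
                int n = inf.inflate(out, total, out.length - total);
                if (n == 0 && !inf.finished()) return null;
                total += n;
            }
            return inf.finished() && total == expectedLen ? Arrays.copyOf(out, total) : null;
        } catch (DataFormatException e) {
            return null;
        } finally {
            inf.end();
        }
    }

    /** Test helper: wraps a payload in a single BGZF block. */
    static byte[] block(byte[] payload) {
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        d.setInput(payload);
        d.finish();
        ByteArrayOutputStream cdata = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!d.finished()) cdata.write(buf, 0, d.deflate(buf));
        d.end();
        byte[] c = cdata.toByteArray();
        int total = 18 + c.length + 8; // header + BC subfield, data, CRC32 + ISIZE
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer bb = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
        bb.put(new byte[] {31, (byte) 139, 8, 4, 0, 0, 0, 0, 0, (byte) 255});
        bb.putShort((short) 6).put((byte) 'B').put((byte) 'C').putShort((short) 2);
        bb.putShort((short) (total - 1)).put(c);
        bb.putInt((int) crc.getValue()).putInt(payload.length);
        return bb.array();
    }
}
```

Calling `firstBadBlock(Files.readAllBytes(path))` on a corrupt shard would return the offset of the first block that fails to inflate or fails its checksum. If every block passes, that would suggest the damage lies in the serialized records rather than in the compression layer.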
This bug is also tracked in the GATK repo: broadinstitute/gatk#5798