Transient bad GZIP header bug when loading BGZF FASTQ #1658

Closed
fnothaft opened this issue Aug 4, 2017 · 2 comments
@fnothaft
Member

fnothaft commented Aug 4, 2017

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 27, 10.126.251.149, executor 0): htsjdk.samtools.SAMFormatException: Invalid GZIP header
	at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
	at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
	at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:533)
	at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:515)
	at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:451)
	at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:441)
	at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:194)
	at org.seqdoop.hadoop_bam.util.BGZFSplitCompressionInputStream.readWithinBlock(BGZFSplitCompressionInputStream.java:81)
	at org.seqdoop.hadoop_bam.util.BGZFSplitCompressionInputStream.read(BGZFSplitCompressionInputStream.java:48)
	at java.io.InputStream.read(InputStream.java:101)
	at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:370)
	at org.bdgenomics.adam.io.FastqRecordReader.positionAtFirstRecord(FastqRecordReader.java:244)
	at org.bdgenomics.adam.io.FastqRecordReader.<init>(FastqRecordReader.java:175)
	at org.bdgenomics.adam.io.SingleFastqInputFormat$SingleFastqRecordReader.<init>(SingleFastqInputFormat.java:53)
	at org.bdgenomics.adam.io.SingleFastqInputFormat.createRecordReader(SingleFastqInputFormat.java:112)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:178)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:177)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
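For context on the exception above: htsjdk raises "Invalid GZIP header" when the bytes at the position it is asked to inflate do not look like the start of a BGZF block, which is what a reader would see if it began decompressing at an offset that is not a block boundary. Whether that is what happens here is not established in this issue. The sketch below is only an illustration of the header check, not the htsjdk or Hadoop-BAM code; the class and method names are hypothetical, and it assumes the `BC` subfield is the first gzip extra subfield, as in the standard BGZF layout.

```java
public class BgzfHeaderCheck {
    // The 28-byte empty BGZF block that terminates a well-formed BGZF file.
    public static final byte[] EOF_BLOCK = {
        0x1f, (byte) 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00,
        0x00, (byte) 0xff, 0x06, 0x00, 0x42, 0x43, 0x02, 0x00,
        0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00
    };

    // Length of the fixed gzip header (12 bytes) plus the 6-byte BC extra subfield.
    private static final int HEADER_LENGTH = 18;

    // Returns true if buf[offset..] starts with a plausible BGZF block header.
    public static boolean looksLikeBgzfBlockStart(byte[] buf, int offset) {
        if (offset < 0 || buf.length - offset < HEADER_LENGTH) {
            return false;
        }
        return (buf[offset] & 0xff) == 0x1f          // ID1: gzip magic
            && (buf[offset + 1] & 0xff) == 0x8b      // ID2: gzip magic
            && buf[offset + 2] == 8                  // CM: deflate
            && (buf[offset + 3] & 4) != 0            // FLG: FEXTRA set
            && buf[offset + 12] == 'B'               // SI1
            && buf[offset + 13] == 'C'               // SI2
            && buf[offset + 14] == 2                 // SLEN low byte
            && buf[offset + 15] == 0;                // SLEN high byte
    }

    // Scans forward from 'from' and returns the first offset that looks like
    // a block start, or -1 if none is found in the buffer.
    public static int findBlockStart(byte[] buf, int from) {
        for (int i = Math.max(from, 0); i + HEADER_LENGTH <= buf.length; i++) {
            if (looksLikeBgzfBlockStart(buf, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two back-to-back empty blocks stand in for a real BGZF file.
        byte[] twoBlocks = new byte[2 * EOF_BLOCK.length];
        System.arraycopy(EOF_BLOCK, 0, twoBlocks, 0, EOF_BLOCK.length);
        System.arraycopy(EOF_BLOCK, 0, twoBlocks, EOF_BLOCK.length, EOF_BLOCK.length);

        System.out.println(looksLikeBgzfBlockStart(twoBlocks, 0)); // true
        System.out.println(looksLikeBgzfBlockStart(twoBlocks, 5)); // false
        System.out.println(findBlockStart(twoBlocks, 5));          // 28
    }
}
```

A byte-pattern scan like `findBlockStart` is only a heuristic, since the same byte pattern can occur inside compressed data; a production reader needs further validation before trusting a candidate offset.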
@fnothaft fnothaft added the bug label Aug 4, 2017
@fnothaft fnothaft added this to the 0.23.0 milestone Aug 4, 2017
@fnothaft fnothaft self-assigned this Aug 4, 2017
@fnothaft
Member Author

As an additional ask, we should make sure there is a flag to disable splitting.

@fnothaft
Member Author

This was resolved upstream in Hadoop-BAM.
