Transient bad GZIP header bug when loading BGZF FASTQ #1658

Closed
fnothaft opened this issue Aug 4, 2017 · 2 comments

fnothaft commented Aug 4, 2017

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 27, 10.126.251.149, executor 0): htsjdk.samtools.SAMFormatException: Invalid GZIP header
	at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
	at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
	at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:533)
	at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:515)
	at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:451)
	at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:441)
	at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:194)
	at org.seqdoop.hadoop_bam.util.BGZFSplitCompressionInputStream.readWithinBlock(BGZFSplitCompressionInputStream.java:81)
	at org.seqdoop.hadoop_bam.util.BGZFSplitCompressionInputStream.read(BGZFSplitCompressionInputStream.java:48)
	at java.io.InputStream.read(InputStream.java:101)
	at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:370)
	at org.bdgenomics.adam.io.FastqRecordReader.positionAtFirstRecord(FastqRecordReader.java:244)
	at org.bdgenomics.adam.io.FastqRecordReader.<init>(FastqRecordReader.java:175)
	at org.bdgenomics.adam.io.SingleFastqInputFormat$SingleFastqRecordReader.<init>(SingleFastqInputFormat.java:53)
	at org.bdgenomics.adam.io.SingleFastqInputFormat.createRecordReader(SingleFastqInputFormat.java:112)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:178)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:177)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
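
For context on the message itself: the trace shows the exception being raised while a BGZF block is inflated through a split-aware stream. A BGZF file is a series of gzip members, each opening with a fixed header, so a decoder that starts at a byte that is not a block boundary will not find that header. The thread does not state whether that is what happened here. The following is only a minimal sketch of the header check as laid out in the BGZF section of the SAM specification, not htsjdk's implementation. The class and method names are hypothetical, and the check assumes the `BC` extra subfield comes first, which is how common writers emit it.

```java
public class BgzfHeaderCheck {
    // Fixed fields of a BGZF block header (BGZF section of the SAM specification).
    private static final int ID1 = 0x1f;        // gzip magic, byte 0
    private static final int ID2 = 0x8b;        // gzip magic, byte 1
    private static final int CM_DEFLATE = 8;    // compression method, byte 2
    private static final int FLG_FEXTRA = 4;    // "extra field present" flag, byte 3
    private static final int HEADER_LENGTH = 18;

    // The standard 28-byte empty BGZF block that marks end of file.
    static final byte[] EOF_BLOCK = {
        0x1f, (byte) 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00,
        0x00, (byte) 0xff, 0x06, 0x00, 0x42, 0x43, 0x02, 0x00,
        0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00
    };

    /**
     * Returns true if the bytes starting at offset look like a BGZF block header.
     * Simplification (assumption): the 'B','C' subfield is the first extra subfield.
     */
    public static boolean looksLikeBgzfBlock(byte[] buf, int offset) {
        if (offset < 0 || buf.length - offset < HEADER_LENGTH) {
            return false; // not enough bytes left to hold a header
        }
        return (buf[offset] & 0xff) == ID1
            && (buf[offset + 1] & 0xff) == ID2
            && (buf[offset + 2] & 0xff) == CM_DEFLATE
            && (buf[offset + 3] & 0xff) == FLG_FEXTRA
            && buf[offset + 12] == 'B'               // SI1
            && buf[offset + 13] == 'C'               // SI2
            && (buf[offset + 14] & 0xff) == 2        // SLEN, low byte
            && (buf[offset + 15] & 0xff) == 0;       // SLEN, high byte
    }

    public static void main(String[] args) {
        // Reading from the block boundary finds a valid header.
        System.out.println(looksLikeBgzfBlock(EOF_BLOCK, 0)); // prints "true"
        // Reading from one byte past the boundary does not.
        System.out.println(looksLikeBgzfBlock(EOF_BLOCK, 1)); // prints "false"
    }
}
```

A check of this kind only tells you whether a given offset could be a block start; it says nothing about how a split boundary ended up at a bad offset in the first place.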
fnothaft added the bug label Aug 4, 2017
fnothaft added this to the 0.23.0 milestone Aug 4, 2017
fnothaft self-assigned this Aug 4, 2017

fnothaft commented Aug 10, 2017

An additional ask is to make sure that we have a flag to disable splitting.

fnothaft commented Oct 11, 2017

This was resolved upstream in Hadoop-BAM.

fnothaft closed this Oct 11, 2017
heuermh added this to Completed in Release 0.23.0 Jan 4, 2018