Transient bad GZIP header bug when loading BGZF FASTQ #1658

Closed
fnothaft opened this Issue Aug 4, 2017 · 2 comments

fnothaft commented Aug 4, 2017

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 27, 10.126.251.149, executor 0): htsjdk.samtools.SAMFormatException: Invalid GZIP header
	at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:121)
	at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
	at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:533)
	at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:515)
	at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:451)
	at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:441)
	at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:194)
	at org.seqdoop.hadoop_bam.util.BGZFSplitCompressionInputStream.readWithinBlock(BGZFSplitCompressionInputStream.java:81)
	at org.seqdoop.hadoop_bam.util.BGZFSplitCompressionInputStream.read(BGZFSplitCompressionInputStream.java:48)
	at java.io.InputStream.read(InputStream.java:101)
	at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:370)
	at org.bdgenomics.adam.io.FastqRecordReader.positionAtFirstRecord(FastqRecordReader.java:244)
	at org.bdgenomics.adam.io.FastqRecordReader.<init>(FastqRecordReader.java:175)
	at org.bdgenomics.adam.io.SingleFastqInputFormat$SingleFastqRecordReader.<init>(SingleFastqInputFormat.java:53)
	at org.bdgenomics.adam.io.SingleFastqInputFormat.createRecordReader(SingleFastqInputFormat.java:112)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:178)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:177)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
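
For context on the message in the trace: "Invalid GZIP header" indicates that the bytes at the position where a block was expected did not parse as a gzip/BGZF block header. The following is a minimal Python sketch of such a header check, assuming the BGZF block layout described in the SAM specification (gzip magic bytes, the FEXTRA flag, and a `BC` extra subfield holding the block size minus one). The function name `bgzf_block_size` is hypothetical, and this is not the htsjdk or Hadoop-BAM implementation.

```python
import struct

# The standard 28-byte empty BGZF block that marks end-of-file.
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)


def bgzf_block_size(buf, offset=0):
    """Return the total size of the BGZF block starting at `offset`,
    or None if the bytes there are not a valid BGZF block header."""
    header = buf[offset:offset + 12]
    if len(header) < 12:
        return None
    # Fixed gzip header: ID1, ID2, CM, FLG, MTIME, XFL, OS, XLEN.
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack(
        "<BBBBIBBH", header
    )
    # gzip magic, deflate compression, and FEXTRA (bit 2) must be set.
    if (id1, id2, cm) != (0x1F, 0x8B, 8) or not (flg & 4):
        return None
    extra = buf[offset + 12:offset + 12 + xlen]
    if len(extra) < xlen:
        return None
    # Walk the extra subfields looking for 'B','C' with a 2-byte payload.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            if pos + 6 > len(extra):
                return None
            (bsize,) = struct.unpack_from("<H", extra, pos + 4)
            return bsize + 1  # BSIZE stores the total block size minus 1
        pos += 4 + slen
    return None
```

Calling `bgzf_block_size(BGZF_EOF)` returns 28, while an offset that does not fall on a block boundary, such as `bgzf_block_size(BGZF_EOF, 1)`, returns `None`. A reader positioned at an offset like that would be expected to reject the header.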

@fnothaft fnothaft added the bug label Aug 4, 2017

@fnothaft fnothaft added this to the 0.23.0 milestone Aug 4, 2017

@fnothaft fnothaft self-assigned this Aug 4, 2017

fnothaft commented Aug 10, 2017

An additional ask is to make sure that we have a flag to disable splitting.

fnothaft commented Oct 11, 2017

This was resolved upstream in Hadoop-BAM.

@fnothaft fnothaft closed this Oct 11, 2017

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
