Off-by-1 error in FASTQ InputFormat start positioning code #1383

Closed
fnothaft opened this Issue Feb 6, 2017 · 2 comments

@fnothaft
Member

fnothaft commented Feb 6, 2017

See: https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/java/org/bdgenomics/adam/io/SingleFastqInputFormat.java#L65. This causes an ArrayIndexOutOfBoundsException (AIOOBE):

java.lang.ArrayIndexOutOfBoundsException: 0
	at org.bdgenomics.adam.io.SingleFastqInputFormat$SingleFastqRecordReader.checkBuffer(SingleFastqInputFormat.java:66)
	at org.bdgenomics.adam.io.FastqRecordReader.positionAtFirstRecord(FastqRecordReader.java:169)
	at org.bdgenomics.adam.io.FastqRecordReader.<init>(FastqRecordReader.java:126)
	at org.bdgenomics.adam.io.SingleFastqInputFormat$SingleFastqRecordReader.<init>(SingleFastqInputFormat.java:49)
	at org.bdgenomics.adam.io.SingleFastqInputFormat.createRecordReader(SingleFastqInputFormat.java:107)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)

This bug is in both the single and interleaved code.

@fnothaft fnothaft added the bug label Feb 6, 2017

@fnothaft fnothaft added this to the 0.21.1 milestone Feb 6, 2017

@fnothaft fnothaft self-assigned this Feb 6, 2017

@A-Tsai

Contributor

A-Tsai commented Feb 17, 2017

I found that this happens when the first character of the block accessed by the FileSplit is '\n'. In that scenario, bufferLength is 0 and the buffer is empty. The ArrayIndexOutOfBoundsException is thrown because line 66 of SingleFastqInputFormat.java tries to read buffer.getBytes()[0].
If we remove the '=' from the "bufferLength >= 0" check on line 65 of SingleFastqInputFormat.java, the issue is resolved. I'm not sure whether this is the right solution, but it works in my pipeline.
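
A minimal sketch of the failure mode and the proposed change, not the actual ADAM code. It assumes the check reads the first byte of the buffered line and compares it to '@' (the character that starts a FASTQ record header). The class and method names are hypothetical, and a plain byte array stands in for the Hadoop buffer.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the buffer check discussed above.
class BufferCheckSketch {

    // Mirrors the reported condition: ">= 0" lets an empty buffer through,
    // so buffer[0] is read from a zero-length array.
    static boolean startsRecordUnguarded(byte[] buffer, int bufferLength) {
        return bufferLength >= 0 && buffer[0] == '@';
    }

    // Proposed change: "> 0" short-circuits before the array access
    // whenever the line is empty.
    static boolean startsRecordGuarded(byte[] buffer, int bufferLength) {
        return bufferLength > 0 && buffer[0] == '@';
    }

    public static void main(String[] args) {
        // An empty line, as produced when the split begins with '\n'.
        byte[] empty = new byte[0];
        byte[] header = "@read1".getBytes(StandardCharsets.US_ASCII);

        System.out.println(startsRecordGuarded(empty, 0));
        System.out.println(startsRecordGuarded(header, header.length));

        try {
            startsRecordUnguarded(empty, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded check threw: " + e.getClass().getSimpleName());
        }
    }
}
```

With the strict comparison, an empty first line is simply treated as "not a record start", which is the behavior described as working in the pipeline above.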

@heuermh

Member

heuermh commented Feb 21, 2017

Thank you for the feedback, @A-Tsai
