SPARK-4222 [CORE] use readFully in FixedLengthBinaryRecordReader #3093

industrial-sloth · 2014-11-04T18:38:14Z

replaces the existing read() call with readFully().

srowen · 2014-11-04T18:41:40Z

core/src/main/scala/org/apache/spark/input/FixedLengthBinaryRecordReader.scala

@@ -115,7 +115,7 @@ private[spark] class FixedLengthBinaryRecordReader
    if (currentPosition < splitEnd) {
      // setup a buffer to store the record
      val buffer = recordValue.getBytes
-      fileInputStream.read(buffer, 0, recordLength)
+      fileInputStream.readFully(buffer)


Hm, but this also doesn't check how many bytes were actually read?

industrial-sloth · 2014-11-04

Yep, but readFully will either block until the full number of bytes is available or throw an error:
http://docs.oracle.com/javase/6/docs/api/java/io/DataInput.html#readFully(byte[])
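
To make the difference concrete, here is a minimal sketch, not code from the patch. It assumes a hypothetical `ChunkedStream` wrapper that returns at most a few bytes per `read()` call, standing in for a source whose internal buffer does not line up with record boundaries. The names `ChunkedStream`, `ReadFullyDemo`, and the 10-byte record length are illustrative assumptions.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, EOFException, InputStream}

// Hypothetical helper: hands back at most `chunk` bytes per read() call,
// imitating a source whose internal buffer is smaller than one record.
class ChunkedStream(underlying: InputStream, chunk: Int) extends InputStream {
  override def read(): Int = underlying.read()
  override def read(b: Array[Byte], off: Int, len: Int): Int =
    underlying.read(b, off, math.min(len, chunk))
}

object ReadFullyDemo {
  val recordLength = 10
  val data: Array[Byte] = Array.tabulate[Byte](recordLength)(i => (i + 1).toByte)

  private def open(bytes: Array[Byte]): DataInputStream =
    new DataInputStream(new ChunkedStream(new ByteArrayInputStream(bytes), 4))

  // Plain read(): returns after the first chunk; the rest of the buffer
  // keeps its initial zeros unless the caller checks the count and retries.
  def plainRead(): (Int, Array[Byte]) = {
    val buffer = new Array[Byte](recordLength)
    val n = open(data).read(buffer, 0, recordLength)
    (n, buffer)
  }

  // readFully(): keeps reading until the whole buffer has been filled.
  def fullRead(): Array[Byte] = {
    val buffer = new Array[Byte](recordLength)
    open(data).readFully(buffer)
    buffer
  }

  // readFully() on a stream that ends early throws EOFException
  // instead of silently returning a partial record.
  def truncatedThrows(): Boolean =
    try { open(data.take(6)).readFully(new Array[Byte](recordLength)); false }
    catch { case _: EOFException => true }
}
```

With this wrapper, `plainRead()` reports only 4 of the 10 requested bytes, while `fullRead()` returns the complete record. A wrapper along these lines might also serve as the basis for a unit test, since it forces short reads without needing a remote filesystem.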

AmplabJenkins · 2014-11-04T18:42:11Z

Can one of the admins verify this patch?

industrial-sloth · 2014-11-04T18:56:17Z

FWIW, we got bit by this over in the Thunder project, where we'd been using this class before Jeremy contributed it into Spark. The circumstance where it came up was doing big reads from S3, where the underlying buffer size was something like 16kb? I forget exactly... anyway, when the records didn't align with this buffer, badness happened.

Here's the equivalent change I made over there:
thunder-project/thunder@7ea31c8

The earlier revision of that patch shows my first fix, which checks the number of bytes returned; I wrote it before I realized that readFully() was a thing. Both that fix and the one here resolve the original problem we encountered, though I'm having some trouble coming up with a clean unit test for it...
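
For comparison, the "check the number of bytes returned" approach can be sketched roughly as below. This is an illustrative guess at the general shape, not the actual Thunder commit; `ReadLoop` and `readExactly` are hypothetical names.

```scala
import java.io.{EOFException, InputStream}

object ReadLoop {
  // Hypothetical helper: fill `buffer` completely by calling read() repeatedly
  // and advancing by however many bytes each call actually returned.
  def readExactly(in: InputStream, buffer: Array[Byte]): Unit = {
    var offset = 0
    while (offset < buffer.length) {
      val n = in.read(buffer, offset, buffer.length - offset)
      // read() signals end of stream with -1; treat that as a truncated record.
      if (n < 0) {
        throw new EOFException(s"stream ended after $offset of ${buffer.length} bytes")
      }
      offset += n
    }
  }
}
```

This does by hand what `readFully()` already provides, which is why the one-line change in this PR is the simpler fix.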

mateiz · 2014-11-05T00:28:08Z

Good catch, thanks! Can you check that this is the only version of read() in that code?

mateiz · 2014-11-05T00:28:39Z

BTW it would be good to open a JIRA issue for this on https://issues.apache.org/jira/browse/SPARK but unfortunately ASF JIRA seems to be down at the moment.

mateiz · 2014-11-05T00:28:51Z

Jenkins, this is ok to test

SparkQA · 2014-11-05T00:32:37Z

Test build #22904 has started for PR 3093 at commit a245c8a.

This patch merges cleanly.

SparkQA · 2014-11-05T01:54:25Z

Test build #22904 has finished for PR 3093 at commit a245c8a.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class Params(
- class RDDFunctions[T: ClassTag](self: RDD[T]) extends Serializable
- class VectorUDT(UserDefinedType):
- class NullType(PrimitiveType):
- class UserDefinedType(DataType):
- case class ScalaUdfBuilder[T: TypeTag](f: AnyRef)

AmplabJenkins · 2014-11-05T01:54:29Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22904/

industrial-sloth · 2014-11-05T01:59:43Z

Thanks @mateiz - Oops, I actually did open https://issues.apache.org/jira/browse/SPARK-4222 for this, then (inexplicably...) left it off the commit message. Not sure offhand of how best to address that - would be happy to close this PR and open a new one with a proper message, let me know if that's how you'd like to proceed.

This was indeed the only use of read() in the FixedLengthBinary* files.

srowen · 2014-11-05T06:35:07Z

@industrial-sloth just change the PR title to "SPARK-4222 [CORE] ..." since that title will be the commit message when merged.

industrial-sloth · 2014-11-05T12:28:01Z

Thanks @srowen! Done.

mateiz · 2014-11-05T23:37:08Z

Cool, thanks. Will merge this soon.

replaces the existing read() call with readFully().

Author: industrial-sloth <industrial-sloth@users.noreply.github.com>

Closes #3093 from industrial-sloth/branch-1.2-fixedLenRecRdr and squashes the following commits:

a245c8a [industrial-sloth] use readFully in FixedLengthBinaryRecordReader

use readFully in FixedLengthBinaryRecordReader

a245c8a

srowen reviewed Nov 4, 2014

industrial-sloth changed the title ~~use readFully in FixedLengthBinaryRecordReader~~ SPARK-4222 [CORE] use readFully in FixedLengthBinaryRecordReader Nov 5, 2014

asfgit closed this in f37817b Nov 5, 2014
