[SPARK-9358][SQL] Code generation for UnsafeRow joiner. #7821

rxin · 2015-07-31T07:48:28Z

This patch creates a code-generated UnsafeRow concatenator that can be used to concatenate/join two UnsafeRows into a single UnsafeRow.

Since low-level code like this is inherently hard to test, the test suites rely heavily on randomized testing to build confidence in its correctness.

SparkQA · 2015-07-31T09:16:12Z

Test build #39190 has finished for PR 7821 at commit 3efda28.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class MulticlassClassificationEvaluator (override val uid: String)
- class NaiveBayes(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol):
- class NaiveBayesModel(JavaModel):
- class MulticlassClassificationEvaluator(JavaEvaluator, HasLabelCol, HasPredictionCol):
- abstract class UnsafeRowConcat
- |class SpecificRowConat extends $

SparkQA · 2015-07-31T09:49:43Z

Test build #39192 has finished for PR 7821 at commit eacc9b1.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class UnsafeRowConcat
- |class SpecificRowConat extends $

SparkQA · 2015-07-31T18:37:37Z

Test build #1258 has finished for PR 7821 at commit eacc9b1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-07-31T22:06:22Z

Test build #39279 has finished for PR 7821 at commit 5fab88c.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class UnsafeRowConcat
- |class SpecificRowConat extends $
- class SpecificUnsafeProjection extends $

davies · 2015-07-31T23:04:32Z

.../src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeFixedWidthAggregationMap.java

-          return false;
-        }
-      } else if (!UnsafeRow.settableFieldTypes.contains(field.dataType())) {
+      if (!UnsafeRow.settableFieldTypes.contains(field.dataType())) {


call isFixedLength ?

oops yes - missed that

SparkQA · 2015-07-31T23:53:41Z

Test build #39284 has finished for PR 7821 at commit b6ff5a3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class UnsafeRowConcat
- |class SpecificRowConat extends $
- class SpecificUnsafeProjection extends $

davies · 2015-08-01T00:41:00Z

...yst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateRowConcat.scala

+     """.stripMargin
+
+    // --------------------- copy fixed length portion from row 2 ----------------------- //
+    cursor += schema1.size * 8


Should we move this into above section?

SparkQA · 2015-08-01T00:47:15Z

Test build #39296 has finished for PR 7821 at commit db168db.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class UnsafeRowConcat
- |class SpecificRowConat extends $
- class SpecificUnsafeProjection extends $

davies · 2015-08-01T00:49:15Z

...yst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateRowConcat.scala

+       |// Copy variable length data for row2
+       |long numBytesBitsetAndFixedRow2 = ${(bitset2Words + schema2.size) * 8};
+       |long numBytesVariableRow2 = row2.getSizeInBytes() - numBytesBitsetAndFixedRow2;
+       |PlatformDependent.copyMemory(


We could skip the copy if no variable length
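
One way to read this suggestion, as a minimal sketch: the variable-length byte count is the row size minus the bitset and fixed-width regions, and the copy can be skipped when that count is zero. The class and method names below are hypothetical, and plain `byte[]` arrays with `System.arraycopy` stand in for the `PlatformDependent.copyMemory` call in the generated code.

```java
/** Hypothetical helper, not part of Spark: byte arrays stand in for UnsafeRow memory. */
final class VariableLengthCopySketch {
    /** Number of 8-byte words in the null bitset of a row with numFields fields. */
    static int bitsetWords(int numFields) {
        return (numFields + 63) / 64;
    }

    /** Bytes taken by the bitset plus one 8-byte slot per field. */
    static int bitsetAndFixedBytes(int numFields) {
        return (bitsetWords(numFields) + numFields) * 8;
    }

    /**
     * Copies the variable-length tail of row into out at outOffset and returns
     * the number of bytes copied. Does nothing when the row has no tail.
     */
    static int copyVariablePart(byte[] row, int numFields, byte[] out, int outOffset) {
        int start = bitsetAndFixedBytes(numFields);
        int numBytesVariable = row.length - start;
        if (numBytesVariable > 0) {
            System.arraycopy(row, start, out, outOffset, numBytesVariable);
        }
        return numBytesVariable;
    }
}
```

The sketch makes the decision at run time from the row size. Whether the check would instead be made at code-generation time from the schema is not stated in the comment.
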

rxin · 2015-08-01T02:36:34Z

@davies I did everything except the following:

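Seq.tabulate(outputBitsetWords) { i =>
  // Offsets are in bytes (8 per bitset word); bit k of a row sits at bit k % 64 of word k / 64.
  val bitset = if (bitset1Remainder == 0) {
    // row1 ends on a word boundary, so whole words are copied unchanged
    if (i < bitset1Words) getLong(obj1, i * 8) else getLong(obj2, (i - bitset1Words) * 8)
  } else if (i < bitset1Words - 1) {
    getLong(obj1, i * 8)
  } else if (i == bitset1Words - 1) {
    // last word of row1: its unused high bits take the low bits of row2's first word, if any
    val low2 = if (bitset2Words > 0) getLong(obj2, 0) << bitset1Remainder else 0L
    getLong(obj1, i * 8) | low2
  } else {
    // j is the row2 word whose high bits spill into output word i
    val j = i - bitset1Words
    val next = if (j + 1 < bitset2Words) getLong(obj2, (j + 1) * 8) << bitset1Remainder else 0L
    (getLong(obj2, j * 8) >>> (64 - bitset1Remainder)) | next
  }
  putLong(buf, i * 8, bitset)
}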

I like the idea, but we can revisit it during QA to clean this up. I have too many other things to do right now, and in my experience it takes a lot of time to get a rewrite like this working. Filed a ticket to track it: https://issues.apache.org/jira/browse/SPARK-9518
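
For reference, the word-by-word merge discussed above can be sketched as runnable Java. This is an illustration, not the code in this PR. It assumes that null bit k of a row sits at bit k % 64 of word k / 64, and that unused high bits of each input bitset are zero. The class and method names are hypothetical, and `long[]` arrays stand in for the rows' memory.

```java
/** Hypothetical sketch, not part of Spark: merges two null bitsets held in long[] arrays. */
final class BitsetConcatSketch {
    /** Number of 64-bit words needed to hold numFields null bits. */
    static int words(int numFields) {
        return (numFields + 63) / 64;
    }

    /** Returns a bitset holding row1's bits followed directly by row2's bits. */
    static long[] concat(long[] bits1, int numFields1, long[] bits2, int numFields2) {
        int words1 = words(numFields1);
        int words2 = words(numFields2);
        int rem = numFields1 % 64; // bits used in row1's last word (0 = word-aligned)
        long[] out = new long[words(numFields1 + numFields2)];
        for (int i = 0; i < out.length; i++) {
            if (rem == 0) {
                // row1 ends on a word boundary: copy whole words from either side
                out[i] = i < words1 ? bits1[i] : bits2[i - words1];
            } else if (i < words1 - 1) {
                out[i] = bits1[i];
            } else if (i == words1 - 1) {
                // guard against a right-hand row with no fields and therefore no bitset words
                long low2 = words2 > 0 ? bits2[0] << rem : 0L;
                out[i] = bits1[i] | low2;
            } else {
                int j = i - words1;
                long next = j + 1 < words2 ? bits2[j + 1] << rem : 0L;
                out[i] = (bits2[j] >>> (64 - rem)) | next;
            }
        }
        return out;
    }

    /** True if the bit for the given field index is set. */
    static boolean isSet(long[] bits, int index) {
        return (bits[index >> 6] & (1L << (index & 63))) != 0;
    }
}
```

The `rem == 0` branch is separate because Java masks `long` shift distances to six bits, so a shift by 64 would behave like a shift by 0.
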

davies · 2015-08-01T03:15:04Z

LGTM

SparkQA · 2015-08-01T04:04:44Z

Test build #39316 has finished for PR 7821 at commit 72c5d8e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-01T04:08:47Z

Test build #39317 has finished for PR 7821 at commit 8717f35.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

…enerateUnsafeRowJoiner

## What changes were proposed in this pull request?

This PR fixes a longstanding correctness bug in `GenerateUnsafeRowJoiner`. This class was introduced in #7821 (July 2015 / Spark 1.5.0+) and is used to combine pairs of UnsafeRows in TungstenAggregationIterator, CartesianProductExec, and AppendColumns.

### Bugs fixed by this patch

1. **Incorrect combining of null-tracking bitmaps**: when concatenating two UnsafeRows, the implementation is meant to "Concatenate the two bitsets together into a single one, taking padding into account". If one row has no columns then it has a bitset size of 0, but the code incorrectly assumed that if the left row had a non-zero number of fields then the right row would also have at least one field, so it copied invalid bytes and treated them as part of the bitset. I'm not sure whether this bug was also present in the original implementation or whether it was introduced in #7892 (which fixed another bug in this code).
2. **Incorrect updating of data offsets for null variable-length fields**: after updating the bitsets and copying fixed-length and variable-length data, we need to adjust the offsets pointing to the start of the variable-length fields' data. The existing code was _conditionally_ adding a fixed offset to correct for the new length of the combined row, but it is unsafe to do this if the variable-length field has a null value: we always represent nulls by storing `0` in the fixed-length slot, but this code was incorrectly incrementing those values. This bug was present since the original version of `GenerateUnsafeRowJoiner`.

### Why this bug remained latent for so long

The PR which introduced `GenerateUnsafeRowJoiner` features several randomized tests, including tests of the cases where one side of the join has no fields and where string-valued fields are null.
However, the existing assertions were too weak to uncover this bug:

- If a null field has a non-zero value in its fixed-length data slot then this will not cause problems for field accesses because the null-tracking bitmap should still be correct and we will not try to use the incorrect offset for anything.
- If the null-tracking bitmap is corrupted by joining against a row with no fields then the corruption occurs in field numbers past the actual field numbers contained in the row. Thus valid `isNullAt()` calls will not read the incorrectly-set bits.

The existing `GenerateUnsafeRowJoinerSuite` tests only exercised `.get()` and `isNullAt()`, but didn't actually check the UnsafeRows for bit-for-bit equality, preventing these bugs from failing assertions. It turns out that there was even a [GenerateUnsafeRowJoinerBitsetSuite](https://github.com/apache/spark/blob/03377d2522776267a07b7d6ae9bddf79a4e0f516/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerBitsetSuite.scala) but it looks like it also didn't catch this problem because it only tested the bitsets in an end-to-end fashion by accessing them through the `UnsafeRow` interface instead of actually comparing the bitsets' bytes.

### Impact of these bugs

- This bug will cause `equals()` and `hashCode()` to be incorrect for these rows, which will be problematic in case `GenerateUnsafeRowJoiner`'s results are used as join or grouping keys.
- Chained / repeated invocations of `GenerateUnsafeRowJoiner` may result in reads from invalid null bitmap positions causing fields to incorrectly become NULL (see the end-to-end example below).
- It looks like this generally only happens in `CartesianProductExec`, which our query optimizer often avoids executing (usually we try to plan a `BroadcastNestedLoopJoin` instead).

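
The offset bug and the reason it stayed hidden can be illustrated with a small model. This is a sketch under stated assumptions, not the joiner's code. A one-field row is modelled as two `long` words (the null bitset, then the field's slot). The slot is assumed to pack the data offset in its upper 32 bits and the size in its lower 32 bits, and a zero slot is taken to mean a null field. All names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical model, not part of Spark: row[0] is the null bitset, row[1] the field slot. */
final class NullSlotSketch {
    /** Packs offset (upper 32 bits) and size (lower 32 bits) into one slot. Assumed layout. */
    static long pack(int offset, int size) {
        return ((long) offset << 32) | (size & 0xFFFFFFFFL);
    }

    /** Adds the shift to the offset half even when the slot holds the 0 stored for a null. */
    static long shiftIgnoringNulls(long slot, int shiftBytes) {
        return slot + ((long) shiftBytes << 32);
    }

    /** Leaves a zero slot untouched, so a null field keeps its canonical 0. */
    static long shiftSkippingNulls(long slot, int shiftBytes) {
        return slot == 0L ? slot : slot + ((long) shiftBytes << 32);
    }

    static boolean isNullAt(long[] row, int field) {
        return (row[0] & (1L << field)) != 0;
    }

    /** Field-level comparison in the style of get()/isNullAt() assertions. */
    static boolean sameFields(long[] a, long[] b) {
        if (isNullAt(a, 0) != isNullAt(b, 0)) {
            return false;
        }
        // The slot of a null field is never read, so a stray value there goes unnoticed.
        return isNullAt(a, 0) || a[1] == b[1];
    }

    /** Word-for-word comparison, standing in for a bit-for-bit equality check. */
    static boolean sameBits(long[] a, long[] b) {
        return Arrays.equals(a, b);
    }
}
```

With a null field, a row built with `shiftIgnoringNulls` passes `sameFields` against the expected row but fails `sameBits`. That is the kind of difference a byte-level `equals()` or `hashCode()` would see.
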
### End-to-end test case demonstrating the problem

The following query demonstrates how this bug may result in incorrect query results:

```sql
set spark.sql.autoBroadcastJoinThreshold=-1; -- Needed to trigger CartesianProductExec
create table a as select * from values 1;
create table b as select * from values 2;
SELECT t3.col1, t1.col1 FROM a t1 CROSS JOIN b t2 CROSS JOIN b t3
```

This should return `(2, 1)` but instead was returning `(null, 1)`. Column pruning ends up trimming off all columns from `t2`, so when `t2` joins with another table this triggers the bitmap-copying bug. This incorrect bitmap is subsequently copied again when performing the final join, causing the final output to have an incorrectly-set null bit for the first field.

## How was this patch tested?

Strengthened the assertions in existing tests in GenerateUnsafeRowJoinerSuite. Also verified that the end-to-end test case which uncovered this now passes.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #20181 from JoshRosen/SPARK-22984-fix-generate-unsaferow-joiner-bitmap-bugs.

(cherry picked from commit f20131d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

rxin force-pushed the rowconcat branch from eacc9b1 to 35e894b Compare July 31, 2015 21:51

rxin changed the title ~~[SPARK-9358][SQL][WIP] Code generation for UnsafeRow concat.~~ [SPARK-9358][SQL] Code generation for UnsafeRow concat. Jul 31, 2015

davies reviewed Jul 31, 2015

davies reviewed Aug 1, 2015

rxin added 9 commits July 31, 2015 19:28

[SPARK-9358][SQL][WIP] Code generation for UnsafeRow concat.

0f89716

Fixed a bug .

6269f96

Updated.

e9a4347

Support concat data as well.

00354b9

Updated documentation.

6687b6f

Test fixes.

f0913aa

Reset random data generator.

40c3fb2

Fixed offset.

a84ed2e

Fixed a bug.

72c5d8e

rxin force-pushed the rowconcat branch from db168db to 72c5d8e Compare August 1, 2015 02:28

Rebase and code review.

8717f35

rxin changed the title ~~[SPARK-9358][SQL] Code generation for UnsafeRow concat.~~ [SPARK-9358][SQL] Code generation for UnsafeRow joiner. Aug 1, 2015

asfgit closed this in 03377d2 Aug 1, 2015

JoshRosen mentioned this pull request Jan 8, 2018

[SPARK-22984] Fix incorrect bitmap copying and offset adjustment in GenerateUnsafeRowJoiner #20181

Closed
