Skip to content

[SPARK-16269][SQL] Support null handling for vectorized hashmap during hash aggregate#13960

Closed
ooq wants to merge 5 commits intoapache:masterfrom
ooq:spt_nullable_in_vhm
Closed

[SPARK-16269][SQL] Support null handling for vectorized hashmap during hash aggregate#13960
ooq wants to merge 5 commits intoapache:masterfrom
ooq:spt_nullable_in_vhm

Conversation

@ooq
Copy link
Contributor

@ooq ooq commented Jun 29, 2016

What changes were proposed in this pull request?

The current impl of vectorized hashmap does not support null-value keys for nullable data types. This patch fix the problem by adding generateFindOrInsertWithNullable() method in VectorizedHashMapGenerator.scala, which code-generates another version of findOrInsert that handles null-value keys.

We need null support so the aggregate logic does not have to fallback to BytesToBytesMap. This would also us to remove BytesToBytesMap completely.

Note that this patch does degrade query performance when there are no nulls. Because the vectorized hashmap now contains null-value keys, we need to perform more null checks when comparing key values, i.e., in equals. In the future, we might use two levels of vectorized hashmap, with the first level only dealing non-null values, and second level dealing any values.

How was this patch tested?

No additional test is added. A simple benchmark test is included to show the performance gain.

@ooq
Copy link
Contributor Author

ooq commented Jun 29, 2016

JIRA will be added after server is up. cc @davies @sameeragarwal @rxin

@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61430 has finished for PR 13960 at commit 4355039.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ooq ooq changed the title [SQL] Support null handling for vectorized hashmap during hash aggregate [SPARK-16269][SQL] Support null handling for vectorized hashmap during hash aggregate Jun 29, 2016
@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61433 has finished for PR 13960 at commit 7bb9570.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61434 has finished for PR 13960 at commit 7b59a55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 29, 2016

Test build #61487 has finished for PR 13960 at commit 95abd4f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 24, 2016

Test build #62766 has finished for PR 13960 at commit 95abd4f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ooq
Copy link
Contributor Author

ooq commented Jul 27, 2016

closing this because the patch degrade query performance when there are no nulls

@ooq ooq closed this Jul 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants