[SPARK-16269][SQL] Support null handling for vectorized hashmap during hash aggregate #13960
Closed
ooq wants to merge 5 commits into apache:master
JIRA will be added after the server is up. cc @davies @sameeragarwal @rxin
Test build #61430 has finished for PR 13960 at commit
Test build #61433 has finished for PR 13960 at commit
Test build #61434 has finished for PR 13960 at commit
Test build #61487 has finished for PR 13960 at commit
Test build #62766 has finished for PR 13960 at commit
Closing this because the patch degrades query performance when there are no nulls.
What changes were proposed in this pull request?
The current implementation of the vectorized hashmap does not support null-valued keys for nullable data types. This patch fixes the problem by adding a generateFindOrInsertWithNullable() method in VectorizedHashMapGenerator.scala, which code-generates another version of findOrInsert that handles null-valued keys.
We need null support so that the aggregate logic does not have to fall back to BytesToBytesMap. This would also allow us to remove BytesToBytesMap completely.
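To make the null-handling idea concrete, here is a minimal hand-written sketch of what a null-aware findOrInsert could look like for a single nullable long key. It is not the code that generateFindOrInsertWithNullable() emits. The class name NullableKeyHashMap, its fields, and the linear-probing layout are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a code-generated vectorized hash map
// with one nullable long key and one long aggregation buffer (a running sum).
// Names and layout are illustrative assumptions, not Spark's generated code.
final class NullableKeyHashMap {
    private final int capacity;        // maximum number of distinct keys
    private final int numBuckets;      // power of two, always larger than capacity
    private final int[] buckets;       // bucket -> row index, or -1 if empty
    private final long[] keys;         // key value per row (unused when the key is null)
    private final boolean[] keyIsNull; // null flag per row
    private final long[] sums;         // aggregation buffer per row
    private int numRows = 0;

    NullableKeyHashMap(int capacity) {
        this.capacity = capacity;
        this.numBuckets = Integer.highestOneBit(Math.max(capacity, 1)) * 4;
        this.buckets = new int[numBuckets];
        Arrays.fill(buckets, -1);
        this.keys = new long[capacity];
        this.keyIsNull = new boolean[capacity];
        this.sums = new long[capacity];
    }

    // All null keys hash to the same value; the stored key value is ignored.
    private static int hash(long key, boolean isNull) {
        return isNull ? 0 : Long.hashCode(key);
    }

    // The extra null checks here are the cost the patch description mentions.
    private boolean equalsKey(int row, long key, boolean isNull) {
        if (keyIsNull[row] || isNull) {
            return keyIsNull[row] && isNull; // null only equals null
        }
        return keys[row] == key;
    }

    // Returns the row index for the key, inserting a new row if it is absent.
    // Returns -1 when the map is full, so a caller could fall back elsewhere.
    int findOrInsert(long key, boolean isNull) {
        int idx = hash(key, isNull) & (numBuckets - 1);
        for (int step = 0; step < numBuckets; step++) {
            int row = buckets[idx];
            if (row == -1) {
                if (numRows == capacity) {
                    return -1;
                }
                keys[numRows] = isNull ? 0L : key;
                keyIsNull[numRows] = isNull;
                buckets[idx] = numRows;
                return numRows++;
            }
            if (equalsKey(row, key, isNull)) {
                return row;
            }
            idx = (idx + 1) & (numBuckets - 1); // linear probing
        }
        return -1;
    }

    void add(int row, long value) { sums[row] += value; }
    long sum(int row) { return sums[row]; }
    int size() { return numRows; }
}
```

In this sketch a null key and a non-null key with value 0 land in different rows even though both hash to 0, because equalsKey compares the null flags before it compares values.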
Note that this patch does degrade query performance when there are no nulls. Because the vectorized hashmap now contains null-valued keys, we need to perform more null checks when comparing key values, i.e., in equals. In the future, we might use two levels of vectorized hashmap, with the first level dealing only with non-null values and the second level dealing with any values.
How was this patch tested?
No additional tests were added. A simple benchmark is included to show the performance gain.
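The two-level idea raised in the note on performance can be sketched as follows. This is only an illustration under stated assumptions, not a design taken from the patch. The class TwoLevelSumMap is hypothetical, the first level is a small open-addressing table whose key comparison needs no null check, and the second level is a plain java.util.HashMap, which accepts a null key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-level aggregation map (sum per key); illustrative only.
final class TwoLevelSumMap {
    // Level 1: non-null keys only, so the comparison is a bare long ==.
    private final long[] keys;
    private final long[] sums;
    private final boolean[] used;
    private final int mask;
    private final int capacity;
    private int fastSize = 0;

    // Level 2: accepts any key, including null, and takes the overflow of level 1.
    private final Map<Long, Long> fallback = new HashMap<>();

    TwoLevelSumMap(int capacity) {
        this.capacity = capacity;
        // Power-of-two bucket count, strictly larger than capacity,
        // so probing always reaches an empty slot.
        int numBuckets = Integer.highestOneBit(Math.max(capacity, 1)) * 4;
        this.mask = numBuckets - 1;
        this.keys = new long[numBuckets];
        this.sums = new long[numBuckets];
        this.used = new boolean[numBuckets];
    }

    // Returns the slot holding the key, or -1 if level 1 does not contain it.
    private int findFast(long key) {
        int idx = Long.hashCode(key) & mask;
        while (used[idx]) {
            if (keys[idx] == key) {
                return idx;
            }
            idx = (idx + 1) & mask;
        }
        return -1;
    }

    // Tries level 1; returns false when the key is absent and level 1 is full.
    private boolean addFast(long key, long value) {
        int idx = Long.hashCode(key) & mask;
        while (used[idx]) {
            if (keys[idx] == key) {
                sums[idx] += value;
                return true;
            }
            idx = (idx + 1) & mask;
        }
        if (fastSize == capacity) {
            return false;
        }
        used[idx] = true;
        keys[idx] = key;
        sums[idx] = value;
        fastSize++;
        return true;
    }

    void add(Long key, long value) {
        if (key != null && addFast(key, value)) {
            return;
        }
        fallback.merge(key, value, Long::sum); // null keys and overflow land here
    }

    // Returns the running sum for the key, or 0 if the key was never added.
    long sum(Long key) {
        if (key != null) {
            int idx = findFast(key);
            if (idx >= 0) {
                return sums[idx];
            }
        }
        return fallback.getOrDefault(key, 0L);
    }

    int fastSize() { return fastSize; }
    int fallbackSize() { return fallback.size(); }
}
```

With this split, input that contains no nulls and fits in the first level never executes a null-aware comparison. Null keys, and non-null keys that arrive after the first level is full, pay the cost of the general map.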