
[Backport] Fix nested column bug caused by not properly validating that global id is present in global dictionary before looking up local id (#13561) #13572

Merged
kfaraz merged 1 commit into apache:25.0.0 from kfaraz:backport_nested_col_fix
Dec 16, 2022

Conversation

@kfaraz
Contributor

@kfaraz kfaraz commented Dec 15, 2022

Backports #13561

…bal id is present in global dictionary before lookup up local id (apache#13561)

This commit fixes a bug with nested column "value set" indexes caused by not properly
validating that the globalId looked up for a value is present in the global dictionary prior to
looking it up in the local dictionary. When "adjusting" the global ids for the value type,
this can cause incorrect selection of value indexes.

As an example, consider a variant typed nested column with 3 values `["1", null, -2]`.
The string dictionary is `[null, "1"]`, the long dictionary is `[-2]` and our local dictionary is `[0, 1, 2]`.

The code for variant typed indexes checks if the value is present in all global dictionaries
and returns indexes for all matches. So in this case, we first look up "1" in the string dictionary,
find it at global id 1, all is good. Now, we check the long dictionary for `1`, which is not present,
so the lookup returns `-(insertionPoint + 1)`, which gives us `-(1 + 1) = -2`. Since the global id space is actually stacked
dictionaries, global ids for long and double values must be "adjusted": by the size of the string
dictionary for longs, and by the size of string + size of long for doubles.

Prior to this patch we were not checking that the globalId is 0 or larger. Instead, we immediately
looked up `localDictionary.indexOf(-2 + adjustLong) = localDictionary.indexOf(-2 + 2) = localDictionary.indexOf(0)` ... which is an actual value contained in the dictionary! The fix is
to skip the longs completely since there were no global matches.

On to doubles, `-(insertionPoint + 1)` gives us `-(0 + 1) = -1`. The double adjust value is `3`
since there are 2 strings and 1 long, so `localDictionary.indexOf(-1 + 3)` = `localDictionary.indexOf(2)`
which is also a real value in our local dictionary that is definitely not '1'.

So in this one case, looking for '1' incorrectly ended up matching every row.
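
The walkthrough above can be reproduced with a small sketch. It is not the Druid implementation: `index_of`, `matching_local_ids`, and the `validate` flag are hypothetical names, and the dictionaries are the ones from the example. With `validate=False` the negative "not found" result is adjusted and looked up anyway, as described for the pre-patch behavior; with `validate=True` negative global ids are skipped.

```python
from bisect import bisect_left

# Dictionaries from the example above; all names here are hypothetical.
STRINGS = [None, "1"]  # global ids 0..1
LONGS = [-2]           # global id 2
DOUBLES = []           # no double values
LOCAL = [0, 1, 2]      # global ids that appear in this column


def _key(value):
    # Sort null before every real value so a dictionary may start with None.
    return (0,) if value is None else (1, value)


def index_of(sorted_values, value):
    """Return the position of value, or -(insertion_point + 1) if absent."""
    keys = [_key(v) for v in sorted_values]
    pos = bisect_left(keys, _key(value))
    if pos < len(keys) and keys[pos] == _key(value):
        return pos
    return -(pos + 1)


def matching_local_ids(value, validate):
    """Collect local ids whose global id matches value in any dictionary."""
    candidates = [
        (STRINGS, value, 0),
        (LONGS, int(value), len(STRINGS)),
        (DOUBLES, float(value), len(STRINGS) + len(LONGS)),
    ]
    matches = []
    for dictionary, typed_value, adjust in candidates:
        global_id = index_of(dictionary, typed_value)
        if validate and global_id < 0:
            continue  # no global match, so skip this type entirely
        local_id = index_of(LOCAL, global_id + adjust)
        if local_id >= 0:
            matches.append(local_id)
    return sorted(matches)


print(matching_local_ids("1", validate=False))  # [0, 1, 2] (every row)
print(matching_local_ids("1", validate=True))   # [1] (only the string "1")
```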
@kfaraz kfaraz added this to the 25.0 milestone Dec 15, 2022
@kfaraz kfaraz merged commit 1b009ed into apache:25.0.0 Dec 16, 2022
@kfaraz kfaraz deleted the backport_nested_col_fix branch May 3, 2023 15:48