More specific null/empty str handling in IndexMerger #2306
@@ -641,6 +642,7 @@ public Metadata apply(IndexableAdapter input)
List<Indexed<String>> dimValueLookups = Lists.newArrayListWithCapacity(indexes.size() + 1);
DimValueConverter[] converters = new DimValueConverter[indexes.size()];
boolean dimHasValues = false;
boolean dimHasMissing = false;
can we have a more descriptive name?
@fjy changed this to "dimAbsentFromIndex"
👍
What happens if a dim has values in every index, but none of the indexes has a null/empty value?
@binlijin The null/empty str will not be added to the dictionary in that case (i.e., dimAbsentFromIndex will be false). Off the top of my head, testNonLexicographicDimOrderMerge() in IndexMergerTest is an example where the dims don't have null/empty values.
@jon-wei good
dimHasValuesByIndex[i] = false;
}
}

boolean convertMissingDims = dimHasValues & dimAbsentFromIndex;
this should be &&
@gianm changed, thanks for catching that
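For readers following the `&` vs `&&` point: on two `boolean` operands both operators yield the same result, but `&` always evaluates its right-hand side while `&&` short-circuits. A minimal sketch of the difference follows; the class `ShortCircuitDemo` and its helpers are hypothetical and not part of this PR.

```java
public class ShortCircuitDemo {
    // Counts how many times the right-hand operand was evaluated.
    static int calls = 0;

    // Hypothetical right-hand operand with an observable side effect.
    static boolean rhs() {
        calls++;
        return true;
    }

    // Non-short-circuit AND: rhs() runs even when lhs is false.
    static int callsWithBitwiseAnd(boolean lhs) {
        calls = 0;
        boolean ignored = lhs & rhs();
        return calls;
    }

    // Short-circuit AND: rhs() is skipped when lhs is false.
    static int callsWithLogicalAnd(boolean lhs) {
        calls = 0;
        boolean ignored = lhs && rhs();
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("& evaluated rhs " + callsWithBitwiseAnd(false) + " time(s)");
        System.out.println("&& evaluated rhs " + callsWithLogicalAnd(false) + " time(s)");
    }
}
```

With two plain local flags, as in the line under review, the result is identical either way, so the change to `&&` is about idiom rather than behavior.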
@jon-wei It seems like the ideal condition would be:
Checking for mixed nulls/values shouldn't add additional expense as
@gianm It's still necessary to add the null/empty str to the dictionary for the case where a dim is missing from one index, and has no null values in another (missing and non-mixed) when they merge. In the legacy merger this was handled by bumping the dictionary in IndexIO:
👍
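A rough sketch of the condition discussed in this thread, assuming a dimension counts as "absent" from an index when its value list is null or empty. The class `MissingDimSketch` and the method `needsNullEntry` are hypothetical illustrations, not the IndexMerger code; only the flag names `dimHasValues`, `dimAbsentFromIndex`, and the `&&` condition come from the diff and comments.

```java
import java.util.Arrays;
import java.util.List;

public class MissingDimSketch {
    // Hypothetical helper: decides whether the null/empty str should be added
    // to the merged dictionary for one dimension, given that dimension's
    // values in each index being merged (null or empty = absent, an assumption).
    static boolean needsNullEntry(List<List<String>> perIndexValues) {
        boolean dimHasValues = false;
        boolean dimAbsentFromIndex = false;
        for (List<String> values : perIndexValues) {
            if (values == null || values.isEmpty()) {
                dimAbsentFromIndex = true;   // dimension missing from this index
            } else {
                dimHasValues = true;         // dimension has values in this index
            }
        }
        // Same shape as convertMissingDims in the diff.
        return dimHasValues && dimAbsentFromIndex;
    }

    public static void main(String[] args) {
        List<String> withValues = Arrays.asList("a", "b");
        List<String> other = Arrays.asList("c");

        // Dimension present in one index, missing from the other.
        System.out.println(needsNullEntry(Arrays.asList(withValues, null)));
        // Dimension present with values in every index.
        System.out.println(needsNullEntry(Arrays.asList(withValues, other)));
    }
}
```

Under these assumptions, the first call corresponds to the "missing and non-mixed" case @jon-wei describes, and the second to @binlijin's question, where the null/empty str is not added.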
More specific null/empty str handling in IndexMerger
Addresses a concern raised in PR #2006 about dimension cardinalities.
Changes the conditions for when an entry for the null/empty str is added to the value dictionary during index merging.
Old behavior:
New behavior: