Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More specific null/empty str handling in IndexMerger #2306

Merged
merged 1 commit into from
Jan 21, 2016

Conversation

jon-wei
Copy link
Contributor

@jon-wei jon-wei commented Jan 20, 2016

Addresses a concern raised in PR #2006 about dimension cardinalities.

Changes the conditions for when an entry for the null/empty str is added to the value dictionary during index merging.

Old behavior:

  • If dim has values in at least one index, ensure the null/empty str is in the dictionary and convert missing columns to empty strings for merging

New behavior:

  • If dim is entirely missing from an index, but has values in another index, ensure the null/empty str is in the dictionary and convert missing columns to empty strings for merging

@@ -641,6 +642,7 @@ public Metadata apply(IndexableAdapter input)
List<Indexed<String>> dimValueLookups = Lists.newArrayListWithCapacity(indexes.size() + 1);
DimValueConverter[] converters = new DimValueConverter[indexes.size()];
boolean dimHasValues = false;
boolean dimHasMissing = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a more descriptive name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fjy changed this to "dimAbsentFromIndex"

@fjy
Copy link
Contributor

fjy commented Jan 20, 2016

@navis does this address your concerns from #2006 ?

@fjy
Copy link
Contributor

fjy commented Jan 21, 2016

👍

@fjy fjy added this to the 0.9.0 milestone Jan 21, 2016
@binlijin
Copy link
Contributor

What will happen if a dim has values in all index, but all index do not have null/empty value?

@jon-wei
Copy link
Contributor Author

jon-wei commented Jan 21, 2016

@binlijin The null/empty str will not be added to the dictionary in that case (i.e., dimAbsentFromIndex will be false)

Off the top of my head, testNonLexicographicDimOrderMerge() in IndexMergerTest is an example where the dims don't have null/empty values

@binlijin
Copy link
Contributor

@jon-wei good

dimHasValuesByIndex[i] = false;
}
}

boolean convertMissingDims = dimHasValues & dimAbsentFromIndex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be &&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm changed, thanks for catching that

@gianm
Copy link
Contributor

gianm commented Jan 21, 2016

@jon-wei It seems like the ideal condition would be:

  • If dim is entirely missing from one index, but has mixed nulls and values in another index, ensure the null/empty str is in the dictionary and convert missing columns to empty strings for merging

Checking for mixed nulls/values shouldn't add additional expense as isNullColumn already iterates over all the dim values. So I think it's feasible to do this check the ideal way.

@jon-wei
Copy link
Contributor Author

jon-wei commented Jan 21, 2016

@gianm It's still necessary to add the null/empty str to the dictionary for the case where a dim is missing from one index, and has no null values in another (missing and non-mixed) when they merge.

In the legacy merger this was handled by bumping the dictionary in IndexIO:

          if (onlyOneValue) {
            log.info("Dimension[%s] is single value, converting...", dimension);
            final boolean bumpedDictionary;
            if (nullsSet != null) {
              log.info("Dimension[%s] has null rows.", dimension);
              final ImmutableBitmap theNullSet = bitmapFactory.makeImmutableBitmap(nullsSet);

              if (dictionary.get(0) != null) {
                log.info("Dimension[%s] has no null value in the dictionary, expanding...", dimension);
                bumpedDictionary = true;
                final List<String> nullList = Lists.newArrayList();
                nullList.add(null);

                dictionary = GenericIndexed.fromIterable(
                    Iterables.concat(nullList, dictionary),
                    GenericIndexed.STRING_STRATEGY
                );

@gianm
Copy link
Contributor

gianm commented Jan 21, 2016

👍

fjy added a commit that referenced this pull request Jan 21, 2016
More specific null/empty str handling in IndexMerger
@fjy fjy merged commit 3f99811 into apache:master Jan 21, 2016
@fjy fjy mentioned this pull request Feb 5, 2016
@fjy fjy added the Improvement label Feb 5, 2016
@jon-wei jon-wei deleted the inherit2 branch October 6, 2017 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants