-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support default star-tree #5147
Support default star-tree #5147
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5147 +/- ##
============================================
+ Coverage 65.90% 65.95% +0.05%
Complexity 12 12
============================================
Files 1052 1055 +3
Lines 54170 54150 -20
Branches 8078 8063 -15
============================================
+ Hits 35702 35717 +15
+ Misses 15819 15783 -36
- Partials 2649 2650 +1
Continue to review full report at Codecov.
|
List<StarTreeV2BuilderConfig> starTreeV2BuilderConfigs = new ArrayList<>(starTreeIndexConfigs.size()); | ||
for (StarTreeIndexConfig starTreeIndexConfig : starTreeIndexConfigs) { | ||
starTreeV2BuilderConfigs.add(StarTreeV2BuilderConfig.fromIndexConfig(starTreeIndexConfig)); | ||
if (indexingConfig.isEnableDefaultStarTree()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both are defined, you should pay attention to the specific start tree definition requested. This will keep backward compat also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't have backward compatible issue because default star-tree is not enabled by default.
But this is a great question, and we should not ignore the explicitly configured star-tree. Modify the code to generate star-tree for both configs (if both are defined, generate both default and customized).
Also, please be sure to add documentation |
42fd0e9
to
08095da
Compare
Documentation for default star-tree config is inside StarTreeV2BuilderConfig. Will add more documentation in gitbook later |
08095da
to
03f2fae
Compare
} | ||
|
||
public void setEnableDefaultStarTree(boolean enableDefaultStarTree) { | ||
_enableDefaultStarTree = enableDefaultStarTree; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to add this config may be to add a "mode" field in StarTreeIndexConfig. If mode is set to "auto" then ignore other settings. Otherwise, follow the other serttings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the current way so that the default star-tree can be enabled by one boolean flag. Don't want to change the existing StarTreeIndexConfig because adding a mode seems more confusing to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang will be good to get user feedback on this. @elonazoulay - you have used this feature - your inputs will be valuable.
@@ -145,6 +146,14 @@ public void setOnHeapDictionaryColumns(List<String> onHeapDictionaryColumns) { | |||
_onHeapDictionaryColumns = onHeapDictionaryColumns; | |||
} | |||
|
|||
public boolean isEnableDefaultStarTree() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isDefaultStarTreeEnabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to match the variable name (key of the config), and I prefer the config to be "enableDefaultStarTree": true
} | ||
|
||
public void setEnableDefaultStarTree(boolean enableDefaultStarTree) { | ||
_enableDefaultStarTree = enableDefaultStarTree; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang will be good to get user feedback on this. @elonazoulay - you have used this feature - your inputs will be valuable.
switch (fieldSpec.getFieldType()) { | ||
case DIMENSION: | ||
case DATE_TIME: | ||
ColumnMetadata columnMetadata = segmentMetadata.getColumnMetadataFor(column); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we checking the cardinality threshold for date_time but not for time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I assume time will be included in most queries and in the range filter or group by, so I decide to always include time column as the last dimension to split.
For DATE_TIME
, I assume the query pattern should be similar to other dimensions, so use the same rule for them.
Updated the comments for this.
Another way is to just treat all of them the same, but IMO always putting time column last should suit a wider range of use cases.
break; | ||
case TIME: | ||
columnMetadata = segmentMetadata.getColumnMetadataFor(column); | ||
if (columnMetadata.hasDictionary()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cannot generate star tree on a column that's not dictionary encoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the dimension must be dictionary encoded because we only store dictionary id in star-tree.
} | ||
break; | ||
case METRIC: | ||
if (fieldSpec.getDataType().isNumeric()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about hyperloglog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For BYTES
column, there is no way to tell what type of data it represents from the metadata. Similar to all other aggregation types (such as MIN
, MAX
, etc.), DICTINCTCOUNTHLL
needs to be configured through the customized star-tree config.
03f2fae
to
3ae1065
Compare
3ae1065
to
f2c78e0
Compare
@Jackie-Jiang did you get feedback from @elonazoulay as suggested by Kishore on the table config part? Or, are we going ahead without the feedback? |
No I haven't. @elonazoulay Does the default config (described in the pr message) look good to you? |
please rebase and push |
Support generating default star-tree config with the following rules: - All dictionary-encoded single-value dimensions (including date-time columns) with cardinality smaller or equal to the threshold (10000) will be included in the split order, sorted by their cardinality in descending order - Time column (if exists and dictionary-encoded) will be appended to the split order as the last element - Use COUNT(*) and SUM for all numeric metrics as function column pairs - Use default value (10000) for max leaf records
f2c78e0
to
274a212
Compare
Support generating default star-tree config with the following rules: