Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add compression configuration for aggregationConfigs to StartreeIndexConfigs #11744

Conversation

t0mpere
Copy link
Contributor

@t0mpere t0mpere commented Oct 5, 2023

Adding aggregationConfigs to enable configuring compression.

"starTreeIndexConfigs": [
        {
          "dimensionsSplitOrder": [
            "a",
            "b",
            "c"
          ],
          "skipStarNodeCreationForDimensions": [],
          "functionColumnPairs": [],
          "aggregationConfigs": [
            {
              "columnName": "column1",
              "aggregationFunction": "SUM",
              "compressionCodec": "SNAPPY"
            },
            {
              "columnName": "column2",
              "aggregationFunction": "distinctcounthll",
              "compressionCodec": "LZ4"
            }
          ],
          "maxLeafRecords": 10000
        }
      ]

See: #9694

This change was tested manually with multiple segment refresh on DOUBLE and BYTES metrics columns.

@t0mpere t0mpere marked this pull request as draft October 5, 2023 16:15
@t0mpere t0mpere marked this pull request as ready for review October 6, 2023 17:34
@codecov-commenter
Copy link

codecov-commenter commented Oct 7, 2023

Codecov Report

Merging #11744 (200fbcf) into master (e0a1f62) will increase coverage by 0.05%.
Report is 49 commits behind head on master.
The diff coverage is 62.62%.

@@             Coverage Diff              @@
##             master   #11744      +/-   ##
============================================
+ Coverage     63.13%   63.18%   +0.05%     
- Complexity     1118     1141      +23     
============================================
  Files          2342     2344       +2     
  Lines        125883   126415     +532     
  Branches      19357    19450      +93     
============================================
+ Hits          79477    79878     +401     
- Misses        40749    40859     +110     
- Partials       5657     5678      +21     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (?)
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 63.16% <62.62%> (+0.10%) ⬆️
java-17 63.05% <62.62%> (+0.12%) ⬆️
java-20 63.05% <62.62%> (+0.06%) ⬆️
temurin 63.18% <62.62%> (+0.05%) ⬆️
unittests 63.18% <62.62%> (+0.05%) ⬆️
unittests1 67.36% <62.62%> (+0.07%) ⬆️
unittests2 14.39% <0.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...al/startree/v2/builder/StarTreeIndexSeparator.java 85.41% <100.00%> (-1.13%) ⬇️
...egment/spi/index/startree/StarTreeV2Constants.java 50.00% <ø> (ø)
...segment/spi/index/startree/StarTreeV2Metadata.java 95.00% <100.00%> (+1.25%) ⬆️
...cal/startree/v2/builder/BaseSingleTreeBuilder.java 90.47% <91.66%> (+0.02%) ⬆️
...ocal/startree/v2/builder/MultipleTreesBuilder.java 54.25% <0.00%> (-1.79%) ⬇️
...he/pinot/segment/local/utils/TableConfigUtils.java 71.08% <75.00%> (-2.34%) ⬇️
...he/pinot/spi/config/table/StarTreeIndexConfig.java 80.00% <57.14%> (-20.00%) ⬇️
...l/startree/v2/builder/StarTreeV2BuilderConfig.java 76.25% <71.42%> (+3.01%) ⬆️
...ot/spi/config/table/StarTreeAggregationConfig.java 0.00% <0.00%> (ø)
.../index/startree/AggregationFunctionColumnPair.java 66.66% <55.88%> (-14.82%) ⬇️

... and 146 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Jackie-Jiang Jackie-Jiang added feature Configuration Config changes (addition/deletion/change in behavior) labels Oct 9, 2023
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add some tests to ensure the behavior is correct? We may consider using different codec for each aggregation in BaseStarTreeV2Test or StarTreeClusterIntegrationTest

_dimensionsSplitOrder = dimensionsSplitOrder;
_skipStarNodeCreationForDimensions = skipStarNodeCreationForDimensions;
_functionColumnPairs = functionColumnPairs;
_functionColumnPairs = functionColumnPairs != null ? functionColumnPairs : new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest treating empty as null instead of treating null as empty. In most of the cases, only one will be configured. We may also add this logic to _skipStarNodeCreationForDimensions

Suggested change
_functionColumnPairs = functionColumnPairs != null ? functionColumnPairs : new ArrayList<>();
_skipStarNodeCreationForDimensions = CollectionUtils.isNotEmpty(skipStarNodeCreationForDimensions) ? skipStarNodeCreationForDimensions : null;
_functionColumnPairs = CollectionUtils.isNotEmpty(functionColumnPairs) ? functionColumnPairs : null;
_aggregationConfigs = CollectionUtils.isNotEmpty(aggregationConfigs) ? aggregationConfigs : null;
Preconditions.checkArgument(_functionColumnPairs != null || _aggregationConfigs != null, ...)


// Adds current object to Configuration by reference
public void addToConfiguration(Configuration configuration) {
String configPrefix = StarTreeV2Constants.MetadataKey.AGGREGATION_CONFIG + "." + this.toColumnName();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest not using column name as the prefix because in the future we might want to deprecate it. Using index might be good: aggregate.0.function, aggregate.0.column, aggregate.0.codec

Configuration functionColPairsConfig =
metadataProperties.subset(MetadataKey.AGGREGATION_CONFIG + "." + functionColumnPair);
if (!functionColPairsConfig.isEmpty()) {
_functionColumnPairs.add(AggregationFunctionColumnPair.fromConfiguration(functionColPairsConfig));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(CRITICAL) This will cause the same aggregation being added twice if the codec is different

Copy link
Contributor Author

@t0mpere t0mpere Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can modify the equals to account for this, ignoring the compression type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the data structure to a treeMap with functionColumnPair as a string key to avoid duplicates.

@t0mpere
Copy link
Contributor Author

t0mpere commented Oct 17, 2023

Thanks for the review. Adressing everything :)

@Jackie-Jiang
Copy link
Contributor

Have you pushed the changes?

@t0mpere
Copy link
Contributor Author

t0mpere commented Oct 19, 2023

Not yet, I'll try to do it by eod

@Jackie-Jiang
Copy link
Contributor

I applied a commit to introduce the AggregationSpec and revert the changes in AggregationFunctionColumnPair because that is used to match the functions at query time. Changed the TreeSet<AggregationFunctionColumnPair> to TreeMap<AggregationFunctionColumnPair, AggregationSpec> so that we can use keySet() to achieve the same behavior.
Other changes:

  • Removed the duplicate config check because I feel user might config both fields, and we prioritize on aggregationConfigs over functionColumnPairs
  • Simplified the test changes to use a random compression codec in BaseStarTreeV2Test.

Please take a look and let me know if you have concern about the change

@Jackie-Jiang Jackie-Jiang added the release-notes Referenced by PRs that need attention when compiling the next release notes label Oct 25, 2023
@t0mpere
Copy link
Contributor Author

t0mpere commented Oct 26, 2023

Looks good from my side, and much cleaner. Thanks for helping. :)

@Jackie-Jiang Jackie-Jiang merged commit 9da03b4 into apache:master Oct 30, 2023
19 checks passed
@Jackie-Jiang
Copy link
Contributor

@t0mpere Thanks for the contribution! Could you please also help contribute the documentation for the new config? It can be added to this repo: git@github.com:pinot-contrib/pinot-docs.git

@t0mpere
Copy link
Contributor Author

t0mpere commented Oct 31, 2023

Sure, I'm OOO at the moment I will take care in 2 weeks as soon as I will get back to my laptop.

@t0mpere t0mpere changed the title Add compression configuration for functionColumnPairs to StartreeIndexConfigs Add compression configuration for aggregationConfigs to StartreeIndexConfigs Nov 13, 2023
@t0mpere
Copy link
Contributor Author

t0mpere commented Nov 13, 2023

For reference,
Documentation pr: pinot-contrib/pinot-docs#258

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Configuration Config changes (addition/deletion/change in behavior) documentation feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants