Wire EmptySegmentPruner to routing config #8067

Merged: 6 commits, merged on Feb 1, 2022

mqliang
Contributor

@mqliang mqliang commented Jan 24, 2022

Description

#6466 added EmptySegmentPruner to prune empty segments. This pruner is added to every table when the routing table is built, and it reads the segment metadata of every segment in the table.

As a result, a starting broker reads the metadata of ALL segments, which can take a long time, especially when the BrokerResource grows large.

This PR disables EmptySegmentPruner by default and wires it to the routing config, so that users who need it can enable it by listing it in the table's routing config.
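A minimal sketch of what opt-in wiring could look like: pruners are built only from the types listed in the routing config, so a table that does not list the empty pruner never pays for reading every segment's metadata. The class names mirror Pinot's, but the bodies are hypothetical stand-ins, and the type strings "empty", "time" and "partition" are assumptions about how the routing config names each pruner.

```java
import java.util.ArrayList;
import java.util.List;

public class PrunerWiringSketch {
  // Hypothetical stand-ins for Pinot's pruner classes; the real ones take a table config and a property store.
  interface SegmentPruner {
  }

  record EmptySegmentPruner() implements SegmentPruner {
  }

  record TimeSegmentPruner() implements SegmentPruner {
  }

  record PartitionSegmentPruner() implements SegmentPruner {
  }

  // Assumed names for each pruner type in the routing config's pruner type list.
  static final String EMPTY = "empty";
  static final String TIME = "time";
  static final String PARTITION = "partition";

  // Builds pruners only for the configured types: the empty pruner is opt-in, not added unconditionally.
  static List<SegmentPruner> buildPruners(List<String> configuredTypes) {
    List<SegmentPruner> pruners = new ArrayList<>();
    if (configuredTypes == null) {
      return pruners;
    }
    for (String type : configuredTypes) {
      switch (type) {
        case EMPTY -> pruners.add(new EmptySegmentPruner());
        case TIME -> pruners.add(new TimeSegmentPruner());
        case PARTITION -> pruners.add(new PartitionSegmentPruner());
        default -> {
          // Unknown types are silently ignored in this sketch.
        }
      }
    }
    return pruners;
  }

  public static void main(String[] args) {
    System.out.println(buildPruners(List.of(TIME)).size());        // 1: no empty pruner unless configured
    System.out.println(buildPruners(List.of(EMPTY, TIME)).size()); // 2: empty pruner opted in
  }
}
```

The point of the design is that the cost of the empty pruner (one metadata read per segment at startup) is only paid by tables that explicitly ask for it.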

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@mqliang
Contributor Author

mqliang commented Jan 25, 2022

cc @snleee @npawar

@mqliang mqliang closed this Jan 25, 2022
@mqliang mqliang reopened this Jan 25, 2022
@codecov-commenter

codecov-commenter commented Jan 25, 2022

Codecov Report

Merging #8067 (9c5d913) into master (e7ea235) will increase coverage by 26.77%.
The diff coverage is 17.50%.

@@              Coverage Diff              @@
##             master    #8067       +/-   ##
=============================================
+ Coverage     37.93%   64.70%   +26.77%     
- Complexity       81     4306     +4225     
=============================================
  Files          1606     1572       -34     
  Lines         83405    81999     -1406     
  Branches      12455    12326      -129     
=============================================
+ Hits          31638    53060    +21422     
+ Misses        49311    25177    -24134     
- Partials       2456     3762     +1306     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.93% <0.00%> (?)
unittests2 14.14% <17.50%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...he/pinot/segment/local/utils/TableConfigUtils.java 65.83% <0.00%> (+65.83%) ⬆️
...g/apache/pinot/spi/config/table/RoutingConfig.java 100.00% <ø> (+100.00%) ⬆️
...er/routing/segmentpruner/SegmentPrunerFactory.java 92.53% <70.00%> (-2.63%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...ache/pinot/server/access/AccessControlFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/SegmentReloadMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/core/data/manager/realtime/TimerService.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1170 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e7ea235...9c5d913.

@Jackie-Jiang
Contributor

Add @npawar to the discussion.
I believe only the Kinesis consumer can create empty segments. Should we auto-enable the empty segment pruner for Kinesis tables only?

@snleee
Contributor

snleee commented Jan 25, 2022

@Jackie-Jiang We may need the empty segment pruner for other use cases in the future, so it is probably good to keep the empty segment pruner config wired. That said, I like your idea of auto-enabling the empty segment pruner when the Kinesis consumer is used. That way, we keep backward compatibility and save the effort of enabling the pruner for every table that uses the Kinesis consumer.

@mqliang What do you think?

@@ -69,7 +70,10 @@ private SegmentPrunerFactory() {
}
}
}
segmentPruners.addAll(sortSegmentPruners(configuredSegmentPruners));
// sort all segment pruners so that always prune empty segments first. After that, pruned based on time
Contributor

@snleee snleee Jan 25, 2022

Can you add a comment explaining why you picked this order?

empty -> time -> partition

If we sort them in a particular order to improve performance, this order may not be optimal: the pruner that will potentially prune the most segments should be moved to the front.
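One way to make the order explicit, and easy to revisit if another pruner turns out to prune more segments, is to give each pruner type a rank and apply a stable sort. This comparator approach is neither the PR's in-place partition nor the older multi-pass loop; it is a sketch with hypothetical stand-in classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PrunerOrderSketch {
  // Hypothetical stand-ins for Pinot's pruner classes.
  interface SegmentPruner {
  }

  static final class EmptySegmentPruner implements SegmentPruner {
    @Override
    public String toString() {
      return "empty";
    }
  }

  static final class TimeSegmentPruner implements SegmentPruner {
    @Override
    public String toString() {
      return "time";
    }
  }

  static final class PartitionSegmentPruner implements SegmentPruner {
    @Override
    public String toString() {
      return "partition";
    }
  }

  // Lower rank runs first; the ranks encode empty -> time -> everything else.
  static int rank(SegmentPruner pruner) {
    if (pruner instanceof EmptySegmentPruner) {
      return 0;
    }
    if (pruner instanceof TimeSegmentPruner) {
      return 1;
    }
    return 2;
  }

  static List<SegmentPruner> sortSegmentPruners(List<SegmentPruner> pruners) {
    List<SegmentPruner> sorted = new ArrayList<>(pruners);
    // ArrayList.sort is stable, so pruners with the same rank keep their configured order.
    sorted.sort(Comparator.comparingInt(PrunerOrderSketch::rank));
    return sorted;
  }

  public static void main(String[] args) {
    List<SegmentPruner> configured =
        List.of(new PartitionSegmentPruner(), new TimeSegmentPruner(), new EmptySegmentPruner());
    System.out.println(sortSegmentPruners(configured)); // [empty, time, partition]
  }
}
```

Changing the preferred order then means editing one rank function rather than the sorting logic.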

Contributor Author

done

Comment on lines 140 to 160
-}
-for (SegmentPruner pruner : pruners) {
-  if (!(pruner instanceof TimeSegmentPruner)) {
-    sortedPruners.add(pruner);
+int left = 0;
+int right = pruners.size() - 1;
+int current = 0;
+while (current <= right) {
+  SegmentPruner currentPruner = pruners.get(current);
+  // if currentPruner is EmptySegmentPruner, move it to front by swapping with the left pointer
+  if (currentPruner instanceof EmptySegmentPruner) {
+    pruners.set(current, pruners.get(left));
+    pruners.set(left, currentPruner);
+    left++;
+  }
+  // if current is PartitionSegmentPruner, move it to end by swapping with right pointer
+  if (currentPruner instanceof PartitionSegmentPruner) {
+    pruners.set(current, pruners.get(right));
+    pruners.set(right, currentPruner);
+    right--;
+    // may have swapped an EmptySegmentPruner/PartitionSegmentPruner from the end of list that requires extra
+    // processing, so decrement the current index to account for it.
+    current--;
+  }
+  current++;
Contributor

The algorithm here is not hard to follow, but it's not as easy to understand as the older version. Since at most there will be three pruners, looping through pruners multiple times doesn't affect performance. For the sake of simplicity, I suggest using the older version.
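For comparison, a multi-pass version in the spirit of the reviewer's suggestion: one pass per pruner category, appending to a new list. This is an illustrative sketch, not the older code itself, and the pruner classes are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiPassSortSketch {
  // Hypothetical stand-ins for Pinot's pruner classes.
  interface SegmentPruner {
  }

  static final class EmptySegmentPruner implements SegmentPruner {
    @Override
    public String toString() {
      return "empty";
    }
  }

  static final class TimeSegmentPruner implements SegmentPruner {
    @Override
    public String toString() {
      return "time";
    }
  }

  static final class PartitionSegmentPruner implements SegmentPruner {
    @Override
    public String toString() {
      return "partition";
    }
  }

  // One pass per category: empty pruners first, then time pruners, then everything else.
  static List<SegmentPruner> sortSegmentPruners(List<SegmentPruner> pruners) {
    List<SegmentPruner> sorted = new ArrayList<>();
    for (SegmentPruner pruner : pruners) {
      if (pruner instanceof EmptySegmentPruner) {
        sorted.add(pruner);
      }
    }
    for (SegmentPruner pruner : pruners) {
      if (pruner instanceof TimeSegmentPruner) {
        sorted.add(pruner);
      }
    }
    for (SegmentPruner pruner : pruners) {
      if (!(pruner instanceof EmptySegmentPruner) && !(pruner instanceof TimeSegmentPruner)) {
        sorted.add(pruner);
      }
    }
    return sorted;
  }

  public static void main(String[] args) {
    List<SegmentPruner> configured =
        List.of(new PartitionSegmentPruner(), new TimeSegmentPruner(), new EmptySegmentPruner());
    System.out.println(sortSegmentPruners(configured)); // [empty, time, partition]
  }
}
```

With at most three pruners, the extra passes have no measurable cost, each pass preserves the configured order within its category, and the input list is left unmodified.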

Contributor Author

done

@mqliang
Contributor Author

mqliang commented Jan 25, 2022

Agreed. The PR has been updated.

@mqliang mqliang force-pushed the prune-config branch 2 times, most recently from cd5e17e to 2d8bd07 January 25, 2022 22:37
<groupId>org.apache.pinot</groupId>
<artifactId>pinot-kinesis</artifactId>
<version>0.10.0-SNAPSHOT</version>
<scope>compile</scope>
Contributor

remove hardcoded version

List<SegmentPruner> segmentPruners = new ArrayList<>();
// Always prune out empty segments first
segmentPruners.add(new EmptySegmentPruner(tableConfig, propertyStore));
boolean isKinesisEnabled = isKinesisEnabled(tableConfig);
Contributor

@npawar npawar Jan 25, 2022

This leaks very specific knowledge about Kinesis and its empty-segment behavior into the broker. Also, adding a dependency on the Kinesis plugin to pinot-broker is not ideal.
How about one of these options:

  1. Add a validation to TableConfigUtils.validate that checks that a Kinesis stream table has this pruner configured (or, if any logic in that path decorates the table config, add the pruner there).
  2. Move isKinesisEnabled to TableConfigUtils and rename it to needsEmptySegmentPruner. Inside it, check whether the routing config already lists EmptySegmentPruner; if not, check whether the table uses Kinesis. Possibly even add "needsEmptySegmentPruner" to StreamConfig.

Contributor Author

@npawar
There are two TableConfigUtils classes:

  • org.apache.pinot.segment.local.utils.TableConfigUtils
  • org.apache.pinot.common.utils.config.TableConfigUtils

Which one should needsEmptySegmentPruner go in?

Contributor Author

cc @snleee There are two TableConfigUtils classes:

  • org.apache.pinot.segment.local.utils.TableConfigUtils
  • org.apache.pinot.common.utils.config.TableConfigUtils

  • If we put needsEmptySegmentPruner in org.apache.pinot.segment.local.utils.TableConfigUtils, we need to add a pinot-kinesis dependency to pinot-segment-local.
  • If we put it in org.apache.pinot.common.utils.config.TableConfigUtils, we need to add a pinot-kinesis dependency to pinot-common.

Either way, the Kinesis plugin dependency would no longer be added to pinot-broker, but it must be added to either pinot-segment-local or pinot-common.

Contributor

add it to the one that has all the table config validations

Contributor

But neither pinot-common nor pinot-segment-local should depend on a plugin. Instead, add "needsEmptySegmentPruner" to StreamConfig.

-assertTrue(segmentPruners.get(0) instanceof EmptySegmentPruner);
-assertTrue(segmentPruners.get(1) instanceof TimeSegmentPruner);
+assertEquals(segmentPruners.size(), 1);
+assertTrue(segmentPruners.get(0) instanceof TimeSegmentPruner);
}
Contributor

Can you add tests checking that EmptySegmentPruner is added as expected when 1. it is already in the table config, and 2. the table is a streaming table with isEmptySegment?

@mqliang mqliang force-pushed the prune-config branch 2 times, most recently from 430147a to 2d8bd07 January 31, 2022 20:02
Contributor

@jackjlli jackjlli left a comment

Minor but LGTM.

segmentPruners.add(new EmptySegmentPruner(tableConfig, propertyStore));
boolean needsEmptySegment = TableConfigUtils.needsEmptySegmentPruner(tableConfig);
if (needsEmptySegment) {
// Always add EmptySegmentPruner if Kinesis consumer is used.
Contributor

Add this sentence to the javadoc of the needsEmptySegmentPruner method as well.

@@ -857,4 +860,53 @@ public static void verifyHybridTableConfigs(String rawTableName, TableConfig off
public enum ValidationType {
ALL, TASK, UPSERT
}

/**
* Helper method to check is EmptySegmentPruner for a TableConfig.
Contributor

Update the description here.

IndexingConfig indexingConfig = tableConfig.getIndexingConfig();
if (indexingConfig != null) {
Map<String, String> streamConfig = indexingConfig.getStreamConfigs();
if (streamConfig != null && KinesisConfig.STREAM_TYPE.equals(
Contributor

@snleee snleee Jan 31, 2022

After discussion with @Jackie-Jiang and @npawar, it's probably better to put the following hard-coded value in TableConfigUtils than to pull in the entire pinot-kinesis module. Let's add a comment explaining why we use a hard-coded value.

// Please add the explanation on why we use this instead of KinesisConfig.STREAM_TYPE
private static final String KINESIS_STREAM_TYPE = "kinesis";
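A minimal sketch of the resulting check, assuming the routing config's pruner types arrive as a list of strings and the stream configs as a string map. The stream-type key "streamType", the pruner type name "empty", and the class name are assumptions for illustration; "kinesis" is hard-coded as suggested, so the sketch needs no plugin dependency.

```java
import java.util.List;
import java.util.Map;

public class EmptyPrunerCheckSketch {
  // Hard-coded instead of referencing KinesisConfig.STREAM_TYPE so that this code does not need the
  // pinot-kinesis plugin on its classpath.
  private static final String KINESIS_STREAM_TYPE = "kinesis";
  // Assumed key holding the stream type inside the table's stream configs.
  private static final String STREAM_TYPE_KEY = "streamType";
  // Assumed routing-config name for the empty segment pruner.
  private static final String EMPTY_SEGMENT_PRUNER_TYPE = "empty";

  // Returns true if the empty pruner is listed explicitly in the routing config, or if the table consumes
  // from Kinesis, the consumer believed to produce empty segments.
  static boolean needsEmptySegmentPruner(List<String> segmentPrunerTypes, Map<String, String> streamConfigs) {
    if (segmentPrunerTypes != null) {
      for (String type : segmentPrunerTypes) {
        if (EMPTY_SEGMENT_PRUNER_TYPE.equalsIgnoreCase(type)) {
          return true;
        }
      }
    }
    return streamConfigs != null && KINESIS_STREAM_TYPE.equals(streamConfigs.get(STREAM_TYPE_KEY));
  }

  public static void main(String[] args) {
    System.out.println(needsEmptySegmentPruner(List.of("empty"), null));                        // true
    System.out.println(needsEmptySegmentPruner(null, Map.of("streamType", "kinesis")));         // true
    System.out.println(needsEmptySegmentPruner(List.of("time"), Map.of("streamType", "kafka"))); // false
  }
}
```

These three calls also correspond to the test cases requested in the review: pruner already in the table config, Kinesis stream table, and neither.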

@@ -39,6 +39,7 @@
import org.apache.pinot.common.request.BrokerRequest;
import org.apache.pinot.controller.helix.ControllerTest;
import org.apache.pinot.parsers.QueryCompiler;
import org.apache.pinot.plugin.stream.kinesis.KinesisConfig;
Contributor

This may also cause the same issue. We can use the hard-coded value here as well.

Contributor

@snleee snleee left a comment

LGTM. Thank you for the change!

@snleee snleee merged commit 71e28a2 into apache:master Feb 1, 2022
@mqliang mqliang deleted the prune-config branch February 1, 2022 07:54

7 participants