New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wire EmptySegmentPruner to routing config #8067
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8067 +/- ##
=============================================
+ Coverage 37.93% 64.70% +26.77%
- Complexity 81 4306 +4225
=============================================
Files 1606 1572 -34
Lines 83405 81999 -1406
Branches 12455 12326 -129
=============================================
+ Hits 31638 53060 +21422
+ Misses 49311 25177 -24134
- Partials 2456 3762 +1306
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Add @npawar to the discussion. |
@Jackie-Jiang We may need the empty segment pruner in the future for some other use case so it is probably good to keep the empty segment pruner config wired. Anyway, I like your idea to auto-enable empty segment pruner if Kinesis consumer is used. In that way, we can keep the backward compatibility. That will save some effort to enable the empty segment pruner for all tables using Kinesis consumers. @mqliang How do you think? |
@@ -69,7 +70,10 @@ private SegmentPrunerFactory() { | |||
} | |||
} | |||
} | |||
segmentPruners.addAll(sortSegmentPruners(configuredSegmentPruners)); | |||
// sort all segment pruners so that always prune empty segments first. After that, pruned based on time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the comment on why you picked the order in this way?
empty -> time -> partition
If we are trying to sort them in a particular order for improving the performance, this order may not be the optimal case. (We need to move the pruner that will potentially prune the most segments to front)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
int left = 0; | ||
int right = pruners.size() - 1; | ||
int current = 0; | ||
while (current <= right) { | ||
SegmentPruner currentPruner = pruners.get(current); | ||
// if currentPruner is EmptySegmentPruner, move it to front by swapping with the left pointer | ||
if (currentPruner instanceof EmptySegmentPruner) { | ||
pruners.set(current, pruners.get(left)); | ||
pruners.set(left, currentPruner); | ||
left++; | ||
} | ||
} | ||
for (SegmentPruner pruner : pruners) { | ||
if (!(pruner instanceof TimeSegmentPruner)) { | ||
sortedPruners.add(pruner); | ||
// if current is PartitionSegmentPruner, move it to end by swapping with right pointer | ||
if (currentPruner instanceof PartitionSegmentPruner) { | ||
pruners.set(current, pruners.get(right)); | ||
pruners.set(right, currentPruner); | ||
right--; | ||
// may have swapped an EmptySegmentPruner/PartitionSegmentPruner from the end of list that requires extra | ||
// processing, so decrement the current index to account for it. | ||
current--; | ||
} | ||
current++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The algorithm here is not hard to follow, but it's not as easy to understand as the older version. Since at most there will be three pruners, looping through pruners multiple times doesn't affect performance. For the sake of simplicity, I suggest using the older version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Agree. PR has been updated |
cd5e17e
to
2d8bd07
Compare
pinot-broker/pom.xml
Outdated
<groupId>org.apache.pinot</groupId> | ||
<artifactId>pinot-kinesis</artifactId> | ||
<version>0.10.0-SNAPSHOT</version> | ||
<scope>compile</scope> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove hardcoded version
List<SegmentPruner> segmentPruners = new ArrayList<>(); | ||
// Always prune out empty segments first | ||
segmentPruners.add(new EmptySegmentPruner(tableConfig, propertyStore)); | ||
boolean isKinesisEnabled = isKinesisEnabled(tableConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is leaking immensely specific information into the broker about Kinesis and the behavior of empty segments. Plus, adding kinesis plugin dependency in pinot-broker is not the best..
How about one of these options:
- Add validations to TableConfigUtils.validate, to check that a kinesis stream table has this pruner added (or if there's any logic in that path which decorates the table config)
- Move this method
isKinesisEnabled
toTableConfigUtils
and rename it asneedsEmptySegmentPruner
. Part of that, check if routingTypes already has EmptySegmentPruner, if not check if kinesis. Possibly even add "needsEmptySegmetPruner" to StreamConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@npawar
There are two TableConfigUtils,
- org.apache.pinot.segment.local.utils.TableConfigUtils;
- org.apache.pinot.common.utils.config.TableConfigUtils;
Which one is preferred to put needsEmptySegmentPruner
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @snleee There are two TableConfigUtils,
org.apache.pinot.segment.local.utils.TableConfigUtils;
org.apache.pinot.common.utils.config.TableConfigUtils;
- If we put the
needsEmptySegmentPruner
function inorg.apache.pinot.segment.local.utils.TableConfigUtils;
, then we need addpinot-kinesis
dependency to packagepinot-segment-local
- If we put the
needsEmptySegmentPruner
function inorg.apache.pinot.common.utils.config.TableConfigUtils
, then we need addpinot-kinesis
dependency to packagepinot-common
kinesis plugin dependency will not been added to pinot-broker, but must be add to either pinot-segment-local
or pinot-common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add it to the one that has all the table config validations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But neither pinot-common nor pinot-segment-local should depend on plugin. Do this instead add "needsEmptySegmetPruner" to StreamConfig
assertTrue(segmentPruners.get(0) instanceof EmptySegmentPruner); | ||
assertTrue(segmentPruners.get(1) instanceof TimeSegmentPruner); | ||
assertEquals(segmentPruners.size(), 1); | ||
assertTrue(segmentPruners.get(0) instanceof TimeSegmentPruner); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some test to check that EmptySegmentPruner is added as expected if 1. already in table config 2. because streaming table with isEmptySegment?
430147a
to
2d8bd07
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but LGTM.
segmentPruners.add(new EmptySegmentPruner(tableConfig, propertyStore)); | ||
boolean needsEmptySegment = TableConfigUtils.needsEmptySegmentPruner(tableConfig); | ||
if (needsEmptySegment) { | ||
// Always add EmptySegmentPruner if Kinesis consumer is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this sentence to the javadoc of needsEmptySegmentPruner
method as well.
@@ -857,4 +860,53 @@ public static void verifyHybridTableConfigs(String rawTableName, TableConfig off | |||
public enum ValidationType { | |||
ALL, TASK, UPSERT | |||
} | |||
|
|||
/** | |||
* Helper method to check is EmptySegmentPruner for a TableConfig. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the description here.
IndexingConfig indexingConfig = tableConfig.getIndexingConfig(); | ||
if (indexingConfig != null) { | ||
Map<String, String> streamConfig = indexingConfig.getStreamConfigs(); | ||
if (streamConfig != null && KinesisConfig.STREAM_TYPE.equals( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion with @Jackie-Jiang and @npawar, it's probably better to put the following hard-coded value in TableConfigUtils
than pulling the entire pinot-kinesis
module. Let's add the comments on why we use hard-coded value.
// Please add the explanation on why we use this instead of KinesisConfig.STREAM_TYPE
private static final String KINESIS_STREAM_TYPE = "kinesis";
@@ -39,6 +39,7 @@ | |||
import org.apache.pinot.common.request.BrokerRequest; | |||
import org.apache.pinot.controller.helix.ControllerTest; | |||
import org.apache.pinot.parsers.QueryCompiler; | |||
import org.apache.pinot.plugin.stream.kinesis.KinesisConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also may cause the issue. we can use hard-coded value here as well.
…odule as dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for the change!
Description
#6466 adds EmptySegmentPruner to prune empty segments, this EmptySegmentPruner is added to all tables during the routing table build, and will read segment metadata for all segments on a table.
This means that when brokers start, it read metadata for ALL segments, which may take a long time, especially when BrokerResource getting too big
This PR makes EmptySegmentPruner be disabled by default and wire it to the config so that user can specify the EmptySegmentPruner in routing config if they need it.
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
backward-incompat
, and complete the section below on Release Notes)Does this PR fix a zero-downtime upgrade introduced earlier?
backward-incompat
, and complete the section below on Release Notes)Does this PR otherwise need attention when creating release notes? Things to consider:
release-notes
and complete the section on Release Notes)Release Notes
Documentation