Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the replication in segment assignment strategy #9816

Merged
merged 1 commit into from Nov 17, 2022

Conversation

GSharayu
Copy link
Contributor

@GSharayu GSharayu commented Nov 16, 2022

The Balanced and Replica group segment assignment strategy would assume replication to be fetched from only table config as tableConfig.getValidationConfig().getReplicationNumber() irrespective of whether its a realtime table or offline table
But for real time tables, if we have tableConfig.getValidationConfig().getReplicasPerPartitionNumber(), this should be preferred over table replication

No additional changes are required
Reference Issue: #9309
Also, issue #8804

@GSharayu
Copy link
Contributor Author

GSharayu commented Nov 16, 2022

@mcvsubbu @sajjad-moradi @jugomezv @snleee : please review

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GSharayu Do we have a case where we pick up BalancedNumSegmentAssignment for realtime table? I thought that we haven't wired the segment assignment strategy for realtime tables yet.

@@ -51,6 +51,10 @@ public void init(HelixManager helixManager, TableConfig tableConfig) {
SegmentsValidationAndRetentionConfig validationAndRetentionConfig = tableConfig.getValidationConfig();
Preconditions.checkState(validationAndRetentionConfig != null, "Validation Config is null");
_replication = validationAndRetentionConfig.getReplicationNumber();
// Number of replicas per partition of low-level consumers check is for the real time tables only
if (validationAndRetentionConfig.getReplicasPerPartition() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check if the table is offline or real-time and then decide where to pull the replication data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to wrap all the logic for whether to use replication or replicasPerPartition within the TableConfig itself? That way, tableConfig.getReplicationNumber should just return the correct value depending on whether it is offline or realtime table.

Copy link
Contributor Author

@GSharayu GSharayu Nov 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we saw real time tables using balanced strategy in Production. @jugomezv can you please confirm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navina We can wire that in table config but it would also mean cleaning up at all the other places we use this. I would so keep that out of scope for this fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please tag this PR with the (open) issue #8804.

Since we have a problem in production, we should fix in some way (add TODOs as needed), but fix the overall replication/replicasPerPartition as a part of that issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@codecov-commenter
Copy link

codecov-commenter commented Nov 16, 2022

Codecov Report

Merging #9816 (f95ac08) into master (47c8f18) will increase coverage by 9.68%.
The diff coverage is 86.27%.

@@             Coverage Diff              @@
##             master    #9816      +/-   ##
============================================
+ Coverage     60.58%   70.26%   +9.68%     
+ Complexity     5281     4998     -283     
============================================
  Files          1949     1964      +15     
  Lines        104632   105033     +401     
  Branches      15847    15896      +49     
============================================
+ Hits          63389    73803   +10414     
+ Misses        36540    26096   -10444     
- Partials       4703     5134     +431     
Flag Coverage Δ
integration1 25.27% <2.94%> (?)
integration2 24.68% <0.98%> (+0.24%) ⬆️
unittests1 67.86% <85.41%> (+0.05%) ⬆️
unittests2 15.77% <86.27%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...apache/pinot/common/function/FunctionRegistry.java 86.84% <ø> (ø)
...not/query/planner/logical/RelToStageConverter.java 83.33% <ø> (+3.70%) ⬆️
...che/pinot/query/planner/logical/RexExpression.java 81.33% <ø> (+5.33%) ⬆️
...ache/pinot/query/planner/logical/StagePlanner.java 100.00% <ø> (ø)
...e/pinot/query/runtime/operator/FilterOperator.java 76.00% <ø> (+16.00%) ⬆️
...ery/runtime/operator/operands/FunctionOperand.java 100.00% <ø> (ø)
...query/runtime/operator/MailboxReceiveOperator.java 79.36% <50.00%> (-0.97%) ⬇️
...query/runtime/operator/operands/FilterOperand.java 83.90% <50.00%> (+16.05%) ⬆️
...va/org/apache/pinot/query/runtime/QueryRunner.java 86.58% <83.33%> (+8.90%) ⬆️
...ot/query/runtime/executor/RoundRobinScheduler.java 87.50% <87.50%> (ø)
... and 397 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@snleee
Copy link
Contributor

snleee commented Nov 16, 2022

Can we also add the testing case to cover this? We should add the test where we set replication / numReplicasPerPartition differently.

@@ -51,6 +52,10 @@ public void init(HelixManager helixManager, TableConfig tableConfig) {
SegmentsValidationAndRetentionConfig validationAndRetentionConfig = tableConfig.getValidationConfig();
Preconditions.checkState(validationAndRetentionConfig != null, "Validation Config is null");
_replication = validationAndRetentionConfig.getReplicationNumber();
// Number of replicas per partition of low-level consumers check is for the real time tables only
if (tableConfig.getTableType() == TableType.REALTIME) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add the TODO comment that we can clean this up once the table config has the new API that picks up the correct replication depending on the table type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a todo and a unit test.

Copy link
Contributor

@sajjad-moradi sajjad-moradi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Could you add a unit test?

Comment on lines 54 to 58
_replication = validationAndRetentionConfig.getReplicationNumber();
// Number of replicas per partition of low-level consumers check is for the real time tables only
if (tableConfig.getTableType() == TableType.REALTIME) {
_replication = validationAndRetentionConfig.getReplicasPerPartitionNumber();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

_replication = tableConfig.getTableType() == TableType.REALTIME
    ? validationAndRetentionConfig.getReplicasPerPartitionNumber()
    : validationAndRetentionConfig.getReplicationNumber();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@GSharayu GSharayu force-pushed the new_master branch 3 times, most recently from 9edbf3f to ee3aab0 Compare November 16, 2022 23:33
Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@snleee snleee merged commit 6d49424 into apache:master Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants