-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](fe) modify replicas to replica in CloudTablet #59814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the Tablet class hierarchy to differentiate between cloud and local tablet implementations. The Tablet class is converted to an abstract class, with CloudTablet now storing a single replica (instead of a list) and LocalTablet maintaining the original list-based approach.
Changes:
- Converted Tablet to an abstract class with abstract methods for replica management
- Modified CloudTablet to use a single Replica field instead of a List, with backward-compatible deserialization
- Created LocalTablet class to handle local (non-cloud) tablets with list-based replica management
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/catalog/Tablet.java | Converted from concrete to abstract class, moved replica management methods to subclasses |
| fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudTablet.java | Changed to use single Replica field with GsonPostProcessable for backward compatibility |
| fe/fe-core/src/main/java/org/apache/doris/catalog/LocalTablet.java | New class with replica list management moved from Tablet |
| fe/fe-core/src/test/java/org/apache/doris/catalog/TabletTest.java | Removed test for clearReplica method which no longer exists |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudTablet.java
Show resolved
Hide resolved
7188e24 to
bd94591
Compare
|
run buildall |
bd94591 to
413f524
Compare
|
run buildall |
TPC-H: Total hot run time: 31628 ms |
TPC-DS: Total hot run time: 172943 ms |
| } | ||
|
|
||
| return delete || !hasBackend; | ||
| if (replica.getVersion() <= newReplica.getVersion()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in cloud, On the cloud, there is only one replica; this function (isLatestReplicaAndDeleteOld) can probably be deleted.
deardeng
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by anyone and no changes requested. |
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Review: [fix](fe) modify replicas to replica in CloudTablet
Summary
This PR refactors the Tablet class hierarchy to properly differentiate between cloud and local tablet implementations:
- Converting
Tabletto an abstract class - Moving replica management to subclasses (
LocalTabletandCloudTablet) - Changing
CloudTabletto use a singleReplicafield instead ofList<Replica>(since cloud mode only has one replica)
Overall Assessment: ⚠️ Needs Revision
Critical Issues
1. Missing hashCode() implementation violates Java contract 🔴
Both LocalTablet and CloudTablet override equals() but don't override hashCode(). This violates Java's equals/hashCode contract and will cause bugs if tablets are used in HashMap, HashSet, or any hash-based collections.
LocalTablet.java - needs:
@Override
public int hashCode() {
return Objects.hash(id, replicas);
}CloudTablet.java - needs:
@Override
public int hashCode() {
return Objects.hash(id, replica);
}2. Potential breaking change: deleteReplica/deleteReplicaByBackendId throw UnsupportedOperationException for CloudTablet 🔴
The base Tablet class now throws UnsupportedOperationException for these methods, and CloudTablet doesn't override them:
// Tablet.java
public boolean deleteReplica(Replica replica) {
throw new UnsupportedOperationException("deleteReplica is not supported in Tablet");
}These methods are called from:
TabletSchedCtx.java:934and:968ReportHandler.java:1124InternalCatalog.java:1165
Please verify that these code paths are never executed in cloud mode, or implement these methods in CloudTablet.
3. readyToBeRepaired also throws UnsupportedOperationException 🔴
Same issue - this method is called from TabletScheduler, TabletChecker, and ColocateTableCheckerAndBalancer. If tablet repair is disabled in cloud mode, this should be documented; otherwise, implement it.
Medium Issues
4. Redundant null check in LocalTablet constructor 🟡
public LocalTablet(long tabletId) {
super(tabletId);
if (this.replicas == null) { // Always true for a new instance
this.replicas = new ArrayList<>();
}
...
}Consider using a field initializer or implementing GsonPostProcessable like CloudTablet does.
5. CloudTablet.getReplicas() creates new ArrayList on every call 🟡
@Override
public List<Replica> getReplicas() {
if (replica == null) {
return Lists.newArrayList();
}
return Lists.newArrayList(replica);
}This allocates a new ArrayList on every call, which could be problematic in hot paths. Consider:
@Override
public List<Replica> getReplicas() {
if (replica == null) {
return Collections.emptyList();
}
return Collections.singletonList(replica);
}6. PR title/description doesn't match scope 🟡
The actual changes are much broader than "modify replicas to replica in CloudTablet":
- Making
Tabletabstract - Moving 150+ lines of code to
LocalTablet - Changing method signatures and contracts
The description should be updated to accurately reflect the architectural refactoring.
Minor Issues
7. isLatestReplicaAndDeleteOld could be simplified in CloudTablet 🟢
As @deardeng noted, since cloud only has one replica, this method could be inlined:
@Override
public void addReplica(Replica newReplica, boolean isRestore) {
if (replica == null || replica.getVersion() <= newReplica.getVersion()) {
this.replica = newReplica;
if (!isRestore) {
Env.getCurrentInvertedIndex().addReplica(id, newReplica);
}
}
}Test Coverage Concern
The test changes only remove clearReplica() test. Given the significant refactoring:
- No new tests for
LocalTablet - No tests for
CloudTablet.gsonPostProcess()migration - No tests verifying
UnsupportedOperationExceptionbehavior
Recommendation
Hold merge until:
- Add
hashCode()implementations to bothLocalTabletandCloudTablet - Verify or document that
deleteReplica/deleteReplicaByBackendId/readyToBeRepairedare never called in cloud mode (or implement them) - Update PR description to reflect actual scope of changes
- Consider the performance optimization for
CloudTablet.getReplicas()
PR Review:
|
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
### What problem does this PR solve? #59814 modify `replicas` to `replica` in `CloudTablet`, and will merge into 4.1. This pr add compatible code for CloudTablet to make 4.1 can downgrade to 4.0. Issue Number: close #xxx Related PR: #xxx Problem Summary: ### Release note None ### Check List (For Author) - Test <!-- At least one of them must be included. --> - [ ] Regression test - [ ] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason <!-- Add your reason? --> - Behavior changed: - [ ] No. - [ ] Yes. <!-- Explain the behavior change --> - Does this need documentation? - [ ] No. - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 --> ### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)