
@mymeiyi
Contributor

@mymeiyi mymeiyi commented Jan 13, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings January 13, 2026 06:22
@Thearas
Contributor

Thearas commented Jan 13, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mymeiyi mymeiyi changed the title [fix](fe) modify replicas in CloudTablet [fix](fe) modify replicas to replica in CloudTablet Jan 13, 2026

Copilot AI left a comment

Pull request overview

This PR refactors the Tablet class hierarchy to differentiate between cloud and local tablet implementations. The Tablet class is converted to an abstract class, with CloudTablet now storing a single replica (instead of a list) and LocalTablet maintaining the original list-based approach.

Changes:

  • Converted Tablet to an abstract class with abstract methods for replica management
  • Modified CloudTablet to use a single Replica field instead of a List, with backward-compatible deserialization
  • Created LocalTablet class to handle local (non-cloud) tablets with list-based replica management
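For orientation, the resulting hierarchy can be sketched roughly as below. This is a minimal sketch, not the PR's code: it assumes a stripped-down `Replica` (id, backend id, version) and a two-method replica API, while the real Doris classes carry much more state and use different signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for org.apache.doris.catalog.Replica (assumption: only these fields).
class Replica {
    final long id;
    final long backendId;
    final long version;

    Replica(long id, long backendId, long version) {
        this.id = id;
        this.backendId = backendId;
        this.version = version;
    }
}

// Abstract base: shared state lives here, replica storage is left to subclasses.
abstract class Tablet {
    protected final long id;

    protected Tablet(long id) {
        this.id = id;
    }

    abstract List<Replica> getReplicas();

    abstract void addReplica(Replica replica);
}

// Local (non-cloud) mode: a list of replicas, typically one per backend.
class LocalTablet extends Tablet {
    private final List<Replica> replicas = new ArrayList<>();

    LocalTablet(long id) {
        super(id);
    }

    @Override
    List<Replica> getReplicas() {
        return Collections.unmodifiableList(replicas);
    }

    @Override
    void addReplica(Replica replica) {
        replicas.add(replica);
    }
}

// Cloud mode: at most one replica per tablet, so a single field suffices.
class CloudTablet extends Tablet {
    private Replica replica;

    CloudTablet(long id) {
        super(id);
    }

    @Override
    List<Replica> getReplicas() {
        if (replica == null) {
            return Collections.emptyList();
        }
        return Collections.singletonList(replica);
    }

    @Override
    void addReplica(Replica replica) {
        // Sketch only: the newest call wins; see the review below for version checks.
        this.replica = replica;
    }
}
```

Returning an unmodifiable view from `LocalTablet.getReplicas()` is a choice made for this sketch, not necessarily what the PR does.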

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Files:

  • fe/fe-core/src/main/java/org/apache/doris/catalog/Tablet.java: Converted from concrete to abstract class, moved replica management methods to subclasses
  • fe/fe-core/src/main/java/org/apache/doris/cloud/catalog/CloudTablet.java: Changed to use single Replica field with GsonPostProcessable for backward compatibility
  • fe/fe-core/src/main/java/org/apache/doris/catalog/LocalTablet.java: New class with replica list management moved from Tablet
  • fe/fe-core/src/test/java/org/apache/doris/catalog/TabletTest.java: Removed test for clearReplica method which no longer exists
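The backward-compatible deserialization in `CloudTablet` can be sketched as follows. This assumes that an older image stores the replica in a list field named `replicas`, that Gson fills whichever field is present, and that a post-processing hook runs afterwards. `PostProcessable` is a local stand-in for Doris's `GsonPostProcessable`, and Gson itself is left out so the example runs on the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Doris's GsonPostProcessable (assumption: invoked once after Gson fills fields).
interface PostProcessable {
    void gsonPostProcess();
}

class Replica {
    final long id;
    final long version;

    Replica(long id, long version) {
        this.id = id;
        this.version = version;
    }
}

class CloudTablet implements PostProcessable {
    // Legacy field, only populated when reading an image written before the refactor.
    List<Replica> replicas;
    // New single-replica field used by the refactored code.
    Replica replica;

    @Override
    public void gsonPostProcess() {
        // Old image: in cloud mode the legacy list is expected to hold at most one element.
        if (replica == null && replicas != null && !replicas.isEmpty()) {
            replica = replicas.get(0);
        }
        // Drop the legacy list so it is neither kept in memory nor re-serialized.
        replicas = null;
    }
}
```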


@mymeiyi mymeiyi force-pushed the fix-cloud-replica-master branch from 7188e24 to bd94591 Compare January 13, 2026 06:56
@mymeiyi
Contributor Author

mymeiyi commented Jan 13, 2026

run buildall

@mymeiyi mymeiyi force-pushed the fix-cloud-replica-master branch from bd94591 to 413f524 Compare January 13, 2026 07:06
@mymeiyi
Contributor Author

mymeiyi commented Jan 13, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 31628 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 413f5241b7ab908e86a8930e06de142ef469c713, data reload: false

------ Round 1 ----------------------------------
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	17665	4184	4041	4041
q2	2096	365	270	270
q3	10067	1294	724	724
q4	10203	815	313	313
q5	7547	2095	1814	1814
q6	198	173	138	138
q7	948	807	653	653
q8	9280	1373	1129	1129
q9	4831	4531	4564	4531
q10	6758	1805	1426	1426
q11	524	299	292	292
q12	701	740	582	582
q13	17765	3854	3105	3105
q14	311	299	271	271
q15	579	522	507	507
q16	679	686	634	634
q17	652	770	512	512
q18	6591	6334	6825	6334
q19	1142	1082	613	613
q20	412	395	289	289
q21	3291	2711	2437	2437
q22	1171	1084	1013	1013
Total cold run time: 103411 ms
Total hot run time: 31628 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	4298	4387	4259	4259
q2	359	430	338	338
q3	2275	2711	2417	2417
q4	1440	1917	1489	1489
q5	4413	4340	4413	4340
q6	214	185	128	128
q7	1988	1943	1761	1761
q8	2542	2480	2431	2431
q9	7176	7026	7101	7026
q10	2520	2782	2283	2283
q11	571	480	480	480
q12	707	804	608	608
q13	3626	4066	3304	3304
q14	275	296	258	258
q15	521	489	483	483
q16	622	663	601	601
q17	1102	1307	1308	1307
q18	7705	7207	7322	7207
q19	836	791	783	783
q20	1876	1959	1813	1813
q21	4413	4300	4059	4059
q22	1093	1038	1002	1002
Total cold run time: 50572 ms
Total hot run time: 48377 ms

@doris-robot

TPC-DS: Total hot run time: 172943 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 413f5241b7ab908e86a8930e06de142ef469c713, data reload: false
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)

query5	4780	617	487	487
query6	348	225	216	216
query7	4223	467	246	246
query8	332	249	241	241
query9	8730	2854	2844	2844
query10	514	375	328	328
query11	15147	15116	14905	14905
query12	193	115	124	115
query13	1276	471	366	366
query14	6476	3025	2797	2797
query14_1	2665	2627	2668	2627
query15	210	198	171	171
query16	1002	485	393	393
query17	1088	707	589	589
query18	2696	435	333	333
query19	231	229	200	200
query20	132	124	119	119
query21	216	140	128	128
query22	3848	3998	3911	3911
query23	16066	15690	15344	15344
query23_1	15418	15718	15393	15393
query24	7186	1541	1139	1139
query24_1	1178	1156	1172	1156
query25	571	475	447	447
query26	1249	266	158	158
query27	2764	437	268	268
query28	4513	2169	2151	2151
query29	776	538	450	450
query30	342	243	212	212
query31	817	637	549	549
query32	85	77	75	75
query33	543	355	317	317
query34	906	879	514	514
query35	731	776	728	728
query36	881	906	801	801
query37	128	97	94	94
query38	2737	2686	2608	2608
query39	778	763	740	740
query39_1	712	711	705	705
query40	215	133	119	119
query41	71	65	60	60
query42	111	102	105	102
query43	460	423	409	409
query44	1312	739	735	735
query45	189	189	181	181
query46	836	922	557	557
query47	1474	1431	1322	1322
query48	286	336	244	244
query49	602	415	338	338
query50	600	271	205	205
query51	3741	3753	3738	3738
query52	108	115	96	96
query53	291	318	277	277
query54	284	279	252	252
query55	83	78	77	77
query56	313	332	319	319
query57	1021	995	967	967
query58	271	266	258	258
query59	2054	2224	1985	1985
query60	339	334	306	306
query61	162	157	155	155
query62	409	344	317	317
query63	308	265	272	265
query64	4825	1329	963	963
query65	3793	3715	3770	3715
query66	1384	435	326	326
query67	15651	15021	15037	15021
query68	5970	989	709	709
query69	513	353	325	325
query70	1061	899	946	899
query71	346	310	289	289
query72	6032	3367	3346	3346
query73	761	723	299	299
query74	8848	8782	8523	8523
query75	2786	2803	2435	2435
query76	3481	1064	654	654
query77	508	370	315	315
query78	9682	9746	9313	9313
query79	1516	892	563	563
query80	618	584	488	488
query81	514	262	228	228
query82	224	145	111	111
query83	266	257	240	240
query84	269	115	95	95
query85	892	517	452	452
query86	393	298	290	290
query87	2872	2843	2763	2763
query88	4073	2134	2099	2099
query89	385	350	329	329
query90	2241	170	159	159
query91	172	158	136	136
query92	89	71	72	71
query93	1717	904	524	524
query94	567	335	292	292
query95	586	382	325	325
query96	561	480	198	198
query97	2312	2416	2295	2295
query98	236	206	216	206
query99	592	568	501	501
Total cold run time: 253899 ms
Total hot run time: 172943 ms

}

return delete || !hasBackend;
if (replica.getVersion() <= newReplica.getVersion()) {
Contributor


In cloud mode there is only one replica, so this function (isLatestReplicaAndDeleteOld) can probably be deleted.

Contributor

@deardeng deardeng left a comment


LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment


PR Review: [fix](fe) modify replicas to replica in CloudTablet

Summary

This PR refactors the Tablet class hierarchy to properly differentiate between cloud and local tablet implementations:

  1. Converting Tablet to an abstract class
  2. Moving replica management to subclasses (LocalTablet and CloudTablet)
  3. Changing CloudTablet to use a single Replica field instead of List<Replica> (since cloud mode only has one replica)

Overall Assessment: ⚠️ Needs Revision


Critical Issues

1. Missing hashCode() implementation violates Java contract 🔴

Both LocalTablet and CloudTablet override equals() but don't override hashCode(). This violates Java's equals/hashCode contract and will cause bugs if tablets are used in HashMap, HashSet, or any hash-based collections.

LocalTablet.java - needs:

@Override
public int hashCode() {
    return Objects.hash(id, replicas);
}

CloudTablet.java - needs:

@Override
public int hashCode() {
    return Objects.hash(id, replica);
}
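Why the missing `hashCode()` matters: hash-based collections compare hash codes before calling `equals()`, so two objects that are equal but hash differently are treated as distinct. The class names below are hypothetical and model only a tablet id.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Overrides equals() only: equal instances keep distinct identity hash codes.
class EqualsOnly {
    final long id;

    EqualsOnly(long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof EqualsOnly && ((EqualsOnly) o).id == id;
    }
}

// Overrides both, consistently: equal instances always share a hash code.
class EqualsAndHash {
    final long id;

    EqualsAndHash(long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof EqualsAndHash && ((EqualsAndHash) o).id == id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}

class HashContractDemo {
    public static void main(String[] args) {
        Set<EqualsOnly> broken = new HashSet<>();
        broken.add(new EqualsOnly(1L));
        // Almost always false: the lookup hash differs, so equals() is never consulted.
        System.out.println(broken.contains(new EqualsOnly(1L)));

        Set<EqualsAndHash> fixed = new HashSet<>();
        fixed.add(new EqualsAndHash(1L));
        System.out.println(fixed.contains(new EqualsAndHash(1L))); // true
    }
}
```

Design note: the contract only requires equal objects to have equal hash codes, so hashing `id` alone would also be valid for a tablet whose `equals()` compares `id` plus replicas. Including the replica collection in `hashCode()`, as suggested above, makes the hash change whenever a replica is added or removed, which would strand a tablet already stored in a `HashSet` or used as a `HashMap` key.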

2. Potential breaking change: deleteReplica/deleteReplicaByBackendId throw UnsupportedOperationException for CloudTablet 🔴

The base Tablet class now throws UnsupportedOperationException for these methods, and CloudTablet doesn't override them:

// Tablet.java
public boolean deleteReplica(Replica replica) {
    throw new UnsupportedOperationException("deleteReplica is not supported in Tablet");
}

These methods are called from:

  • TabletSchedCtx.java:934 and :968
  • ReportHandler.java:1124
  • InternalCatalog.java:1165

Please verify that these code paths are never executed in cloud mode, or implement these methods in CloudTablet.

3. readyToBeRepaired also throws UnsupportedOperationException 🔴

Same issue - this method is called from TabletScheduler, TabletChecker, and ColocateTableCheckerAndBalancer. If tablet repair is disabled in cloud mode, this should be documented; otherwise, implement it.
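If these paths can run in cloud mode, one possible shape for the `CloudTablet` overrides is sketched below. The signatures are simplified (the real `readyToBeRepaired` takes scheduler arguments that are omitted here), the base-class messages for the last two methods are modeled on the `deleteReplica` one, and returning `false` from `readyToBeRepaired` assumes FE-side replica repair never applies in cloud mode, which is exactly what the review asks the author to confirm.

```java
// Simplified Replica: just enough to identify it.
class Replica {
    final long id;
    final long backendId;

    Replica(long id, long backendId) {
        this.id = id;
        this.backendId = backendId;
    }
}

// Base-class defaults as described in the review: unsupported unless overridden.
abstract class Tablet {
    boolean deleteReplica(Replica replica) {
        throw new UnsupportedOperationException("deleteReplica is not supported in Tablet");
    }

    boolean deleteReplicaByBackendId(long backendId) {
        throw new UnsupportedOperationException("deleteReplicaByBackendId is not supported in Tablet");
    }

    // Simplified signature: the real method takes scheduler arguments.
    boolean readyToBeRepaired() {
        throw new UnsupportedOperationException("readyToBeRepaired is not supported in Tablet");
    }
}

class CloudTablet extends Tablet {
    Replica replica;

    @Override
    boolean deleteReplica(Replica target) {
        // Clear the single replica only if it is the one the caller refers to.
        if (replica != null && target != null && replica.id == target.id) {
            replica = null;
            return true;
        }
        return false;
    }

    @Override
    boolean deleteReplicaByBackendId(long backendId) {
        if (replica != null && replica.backendId == backendId) {
            replica = null;
            return true;
        }
        return false;
    }

    @Override
    boolean readyToBeRepaired() {
        // Assumption: FE-side replica repair does not apply in cloud mode.
        return false;
    }
}
```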


Medium Issues

4. Redundant null check in LocalTablet constructor 🟡

public LocalTablet(long tabletId) {
    super(tabletId);
    if (this.replicas == null) {  // Always true for a new instance
        this.replicas = new ArrayList<>();
    }
    ...
}

Consider using a field initializer or implementing GsonPostProcessable like CloudTablet does.
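A minimal sketch of the field-initializer alternative, assuming `replicas` is declared on `LocalTablet` itself. The `gsonPostProcess` method here is a hypothetical stand-in for the `GsonPostProcessable` hook: Gson can leave the field `null` after deserialization (for example when the JSON carries an explicit `null`, or when Gson instantiates the class without running a constructor), so a null check there still has a purpose, unlike the one in the constructor.

```java
import java.util.ArrayList;
import java.util.List;

class Replica {
    final long id;

    Replica(long id) {
        this.id = id;
    }
}

class LocalTablet {
    final long id;
    // Field initializer: runs on every constructor call, so the constructor needs no null check.
    List<Replica> replicas = new ArrayList<>();

    LocalTablet(long id) {
        this.id = id;
    }

    // Hypothetical post-deserialization hook: repairs a field Gson may have left null.
    void gsonPostProcess() {
        if (replicas == null) {
            replicas = new ArrayList<>();
        }
    }
}
```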

5. CloudTablet.getReplicas() creates new ArrayList on every call 🟡

@Override
public List<Replica> getReplicas() {
    if (replica == null) {
        return Lists.newArrayList();
    }
    return Lists.newArrayList(replica);
}

This allocates a new ArrayList on every call, which could be problematic in hot paths. Consider:

@Override
public List<Replica> getReplicas() {
    if (replica == null) {
        return Collections.emptyList();
    }
    return Collections.singletonList(replica);
}
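One behavioural difference is worth checking before adopting this suggestion: `Collections.emptyList()` and `Collections.singletonList()` return immutable lists, so any caller that mutates the result of `getReplicas()` would start throwing `UnsupportedOperationException`, whereas today it silently mutates a throwaway copy. The demo below uses `String` in place of `Replica` to stay self-contained; the method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReturnedListDemo {
    // Behaviour in the PR: a fresh, mutable copy on every call.
    static List<String> copying(String replica) {
        List<String> out = new ArrayList<>();
        if (replica != null) {
            out.add(replica);
        }
        return out;
    }

    // Suggested behaviour: immutable empty or single-element lists.
    static List<String> nonCopying(String replica) {
        if (replica == null) {
            return Collections.emptyList();
        }
        return Collections.singletonList(replica);
    }

    // Returns true if the list rejects a mutation.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsAdd(copying("r1")));    // false: caller mutates its own copy
        System.out.println(rejectsAdd(nonCopying("r1"))); // true: singletonList is immutable
        System.out.println(rejectsAdd(nonCopying(null))); // true: emptyList is immutable
    }
}
```

In OpenJDK, `emptyList()` returns a shared instance, while `singletonList()` still allocates a small wrapper per call, just without an `ArrayList` backing array.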

6. PR title/description doesn't match scope 🟡

The actual changes are much broader than "modify replicas to replica in CloudTablet":

  • Making Tablet abstract
  • Moving 150+ lines of code to LocalTablet
  • Changing method signatures and contracts

The description should be updated to accurately reflect the architectural refactoring.


Minor Issues

7. isLatestReplicaAndDeleteOld could be simplified in CloudTablet 🟢

As @deardeng noted, since cloud only has one replica, this method could be inlined:

@Override
public void addReplica(Replica newReplica, boolean isRestore) {
    if (replica == null || replica.getVersion() <= newReplica.getVersion()) {
        this.replica = newReplica;
        if (!isRestore) {
            Env.getCurrentInvertedIndex().addReplica(id, newReplica);
        }
    }
}
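The inlined `addReplica` can be exercised in isolation. `InvertedIndexStub` below is a hypothetical stand-in for `Env.getCurrentInvertedIndex()` that only records which replica ids were registered, and `Replica` is reduced to an id and a version.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified Replica: only the fields the version check needs.
class Replica {
    final long id;
    final long version;

    Replica(long id, long version) {
        this.id = id;
        this.version = version;
    }
}

// Hypothetical stand-in for Env.getCurrentInvertedIndex(): records registered replica ids.
class InvertedIndexStub {
    final List<Long> registered = new ArrayList<>();

    void addReplica(long tabletId, Replica replica) {
        registered.add(replica.id);
    }
}

class CloudTablet {
    final long id;
    final InvertedIndexStub index;
    Replica replica;

    CloudTablet(long id, InvertedIndexStub index) {
        this.id = id;
        this.index = index;
    }

    // Accept the new replica if none is held or it is at least as new as the current one.
    void addReplica(Replica newReplica, boolean isRestore) {
        if (replica == null || replica.version <= newReplica.version) {
            this.replica = newReplica;
            // As in the review snippet, restore skips the inverted-index update.
            if (!isRestore) {
                index.addReplica(id, newReplica);
            }
        }
    }
}
```

Like the snippet in the review, this sketch never removes a superseded replica from the inverted index; if the original `isLatestReplicaAndDeleteOld` does that (as its name suggests), the inlined version would need to keep that step.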

Test Coverage Concern

The test changes only remove the clearReplica() test, which is thin coverage for a refactoring of this size:

  • No new tests for LocalTablet
  • No tests for CloudTablet.gsonPostProcess() migration
  • No tests verifying UnsupportedOperationException behavior

Recommendation

Hold merge until:

  1. Add hashCode() implementations to both LocalTablet and CloudTablet
  2. Verify or document that deleteReplica/deleteReplicaByBackendId/readyToBeRepaired are never called in cloud mode (or implement them)
  3. Update PR description to reflect actual scope of changes
  4. Consider the performance optimization for CloudTablet.getReplicas()

Contributor

@dataroaring dataroaring left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 15, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit d11a318 into apache:master Jan 15, 2026
29 of 32 checks passed
yiguolei pushed a commit that referenced this pull request Jan 15, 2026
### What problem does this PR solve?

#59814 changed `replicas` to
`replica` in `CloudTablet` and will be merged into 4.1.
This PR adds compatibility code to CloudTablet so that 4.1 can be downgraded to
4.0.
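The compatibility code itself is not reproduced on this page. One way such downgrade compatibility could work, assuming 4.0 reads only a list field named `replicas`, is to keep writing that legacy field alongside the new one. `beforeSerialize` and `afterDeserialize` are hypothetical hook names, not Doris APIs.

```java
import java.util.ArrayList;
import java.util.List;

class Replica {
    final long id;

    Replica(long id) {
        this.id = id;
    }
}

class CloudTablet {
    // New in-memory representation (4.1).
    Replica replica;
    // Legacy field kept so that an image written by 4.1 can still be read by 4.0.
    List<Replica> replicas;

    // Hypothetical pre-serialization hook: mirror the single replica into the legacy list.
    void beforeSerialize() {
        replicas = new ArrayList<>();
        if (replica != null) {
            replicas.add(replica);
        }
    }

    // Hypothetical post-deserialization hook: accept images written by either version.
    void afterDeserialize() {
        if (replica == null && replicas != null && !replicas.isEmpty()) {
            replica = replicas.get(0);
        }
        replicas = null;
    }
}
```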

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.3-merged reviewed


6 participants