[improvement](publish) Add publish task cumulation window and clone f… #27968

deardeng · 2023-12-04T12:41:48Z

…ailed window to check be's health state

Proposed changes

Add two windows to detect the health status of each BE, and optimize the publish version and TabletScheduler logic.
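To make the window idea concrete, here is a minimal sketch of a fixed-size window over recent task outcomes for one backend. It is not the code from this PR: the class name `BackendHealthWindow`, the window size, and the failure threshold are all hypothetical, and the real change may track publish accumulation and clone failures quite differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical window over the most recent task outcomes of a single backend. */
final class BackendHealthWindow {
    private final int windowSize;
    private final int failureThreshold;
    private final Deque<Boolean> outcomes = new ArrayDeque<>();
    private int failures = 0;

    BackendHealthWindow(int windowSize, int failureThreshold) {
        if (windowSize <= 0 || failureThreshold <= 0 || failureThreshold > windowSize) {
            throw new IllegalArgumentException("invalid window configuration");
        }
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
    }

    /** Records one outcome and evicts the oldest one once the window is full. */
    synchronized void record(boolean failed) {
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
        if (outcomes.size() > windowSize) {
            boolean evictedWasFailure = outcomes.removeFirst();
            if (evictedWasFailure) {
                failures--;
            }
        }
    }

    /** The backend counts as unhealthy while the window holds enough failures. */
    synchronized boolean isUnhealthy() {
        return failures >= failureThreshold;
    }
}
```

One window of this kind per backend could feed the clone-failure check, and a second one the publish-accumulation check; because old outcomes fall out of the window, a backend that recovers is no longer flagged.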

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

yujun777 · 2023-12-06T02:12:12Z

fe/fe-core/src/main/java/org/apache/doris/clone/TabletSchedCtx.java

            }

+            if (Config.create_new_replica_in_health_backends && !healthBes.isEmpty()
+                    && (Env.getCurrentSystemInfo().isLastPublishVersionAccumulated(be.getId())


No need to consider publish version task accumulation here. Just check whether the backend has had a lot of failed replica clones.

yujun777 · 2023-12-06T02:15:21Z

fe/fe-core/src/main/java/org/apache/doris/clone/TabletSchedCtx.java

     * database lock should be held.
     */
-    public void chooseDestReplicaForVersionIncomplete(Map<Long, PathSlot> backendsWorkingSlots)
+    public void chooseDestReplicaForVersionIncomplete(Map<Long, PathSlot> backendsWorkingSlots, List<Long> healthBes)


Use a boolean such as 'skipAlwaysCloneFail' here.

yujun777 · 2023-12-06T02:33:54Z

fe/fe-core/src/main/java/org/apache/doris/clone/TabletScheduler.java

        stat.counterReplicaVersionMissingErr.incrementAndGet();
        try {
-            tabletCtx.chooseDestReplicaForVersionIncomplete(backendsWorkingSlots);
+            Set<Long> bes = Sets.newHashSet();


Suggested pseudocode:
boolean skipAlwaysCloneFail = Config.create_new_replica_in_health_backends
        && backends.stream().anyMatch(be ->
                be.isSchedAvailable
                && be not in tablet.backends
                && be.getTag() == tablet.getTag()
                && be has a disk with the tablet's storage medium);
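The predicate suggested above can be sketched as runnable Java. Everything here is an assumption made for illustration: `CandidateBackend`, `CloneDestPolicy`, and their fields are hypothetical stand-ins rather than Doris classes, and tags and storage mediums are modeled as plain strings.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified view of a backend; not the Doris Backend class. */
final class CandidateBackend {
    final long id;
    final boolean schedAvailable;
    final String tag;
    final Set<String> diskMediums;

    CandidateBackend(long id, boolean schedAvailable, String tag, String... diskMediums) {
        this.id = id;
        this.schedAvailable = schedAvailable;
        this.tag = tag;
        this.diskMediums = new HashSet<>(Arrays.asList(diskMediums));
    }
}

final class CloneDestPolicy {
    private CloneDestPolicy() {}

    /**
     * True when the feature is enabled and at least one schedulable backend
     * outside the tablet's current backends has the same tag and a disk of
     * the tablet's storage medium.
     */
    static boolean skipAlwaysCloneFail(boolean configEnabled,
                                       Collection<CandidateBackend> backends,
                                       Set<Long> tabletBackendIds,
                                       String tabletTag,
                                       String tabletMedium) {
        return configEnabled && backends.stream().anyMatch(be ->
                be.schedAvailable
                        && !tabletBackendIds.contains(be.id)
                        && be.tag.equals(tabletTag)
                        && be.diskMediums.contains(tabletMedium));
    }
}
```

Note that the sketch compares tags with `equals` rather than `==`, which is what the pseudocode presumably intends for object-valued tags.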

yujun777 · 2023-12-06T02:36:58Z

fe/fe-core/src/main/java/org/apache/doris/clone/TabletScheduler.java

            tabletCtx.setErrMsg(e.getMessage());
            if (e.getStatus() == Status.RUNNING_FAILED) {
                tabletCtx.increaseFailedRunningCounter();
+                Env.getCurrentSystemInfo().updateControlMaps(tabletCtx.getSrcBackendId(),


Can we check whether the cloneTask return code is OK at line 1718?

yujun777 · 2023-12-06T02:44:07Z

fe/fe-core/src/main/java/org/apache/doris/system/SystemInfoService.java

+        backendIdLastTimesIsAccumulated = ImmutableMap.copyOf(copiedMap);
+    }
+
+    public void updateControlMaps(Long backendId, Map<Long, Set<Long>> map) {


Could this set grow too big?

yujun777 · 2023-12-06T02:47:58Z

fe/fe-core/src/main/java/org/apache/doris/task/AgentTaskQueue.java

+        if (!tasks.containsRow(backendId) || !runningTasks.containsKey(TTaskType.PUBLISH_VERSION)) {
+            return;
+        }
+        Env.getCurrentSystemInfo().updateLastPublishVersionFailedMap(backendId,
+                runningTasks.get(TTaskType.PUBLISH_VERSION).size() > Config.publish_version_queued_limit_number);


Delete the tasks.containsRow(backendId) check.
When a txn finishes, the FE removes the publish tasks of all BEs from the agent queue, so the tasks held in the FE may be empty.
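A minimal sketch of the check once the `tasks.containsRow(backendId)` guard is dropped, so that the decision depends only on the running-task report. The class `PublishBacklogCheck`, the string task-type key, and the `Optional` return value are assumptions for illustration; the PR itself uses `TTaskType.PUBLISH_VERSION` and `Config.publish_version_queued_limit_number`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class PublishBacklogCheck {
    private PublishBacklogCheck() {}

    /**
     * Returns empty when no publish tasks were reported at all (the caller
     * leaves the previous state untouched), otherwise whether the number of
     * queued publish tasks exceeds the limit.
     */
    static Optional<Boolean> isAccumulated(Map<String, Set<Long>> runningTasksByType,
                                           int queuedLimit) {
        Set<Long> publishTasks = runningTasksByType.get("PUBLISH_VERSION");
        if (publishTasks == null) {
            return Optional.empty();
        }
        return Optional.of(publishTasks.size() > queuedLimit);
    }
}
```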

yujun777 · 2023-12-06T02:53:07Z

fe/fe-core/src/main/java/org/apache/doris/transaction/PublishVersionDaemon.java

+            Set<Long> slowBes = Sets.newHashSet();
+            AtomicBoolean hasBackendAliveAndUnfinishedTask = new AtomicBoolean(false);
+            transactionState.getPublishVersionTasks().forEach((beId, task) -> {
+                if (task.isFinished()) {


Combine this code, e.g. (pseudocode):
boolean unfinishedTaskIsDeadOrPublishSlow = false;
if (task.isFinished()) {
    finishNum++;
    // other ...
} else {
    if (be.isDead || be.isPublishSlow()) {
        unfinishedTaskIsDeadOrPublishSlow = true;
    }
}
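The combined loop suggested above could look roughly like the following. This is a sketch under stated assumptions: `PublishScan` is a hypothetical helper, task state is reduced to a map from backend id to a finished flag, and dead or slow backends are passed in as plain id sets instead of being looked up through Doris APIs.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical result of one pass over a transaction's publish tasks. */
final class PublishScan {
    final int finishNum;
    final boolean unfinishedTaskIsDeadOrPublishSlow;

    private PublishScan(int finishNum, boolean unfinishedTaskIsDeadOrPublishSlow) {
        this.finishNum = finishNum;
        this.unfinishedTaskIsDeadOrPublishSlow = unfinishedTaskIsDeadOrPublishSlow;
    }

    /** Counts finished tasks and flags unfinished ones on dead or slow backends in a single loop. */
    static PublishScan scan(Map<Long, Boolean> taskFinishedByBackend,
                            Set<Long> deadBackends,
                            Set<Long> slowBackends) {
        int finishNum = 0;
        boolean flagged = false;
        for (Map.Entry<Long, Boolean> entry : taskFinishedByBackend.entrySet()) {
            if (entry.getValue()) {
                finishNum++;
            } else if (deadBackends.contains(entry.getKey())
                    || slowBackends.contains(entry.getKey())) {
                flagged = true;
            }
        }
        return new PublishScan(finishNum, flagged);
    }
}
```

Using local variables inside an ordinary `for` loop also avoids the `AtomicBoolean` that a lambda passed to `forEach` needs for mutation.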

[improvement](publish) Add publish task cumulation window and clone f…

5cdcea1

…ailed window to check be's health state

deardeng marked this pull request as draft December 4, 2023 12:42

deardeng marked this pull request as ready for review December 4, 2023 12:42

deardeng marked this pull request as draft December 4, 2023 12:43

deardeng added 2 commits December 4, 2023 22:21

fix

973c4e6

fix

d91fef3

yujun777 reviewed Dec 6, 2023

deardeng closed this Dec 25, 2023
