[Improvement](tablet clone) impr tablet sched speed and fix tablet sched failed too many times #21856
This reverts commit 0189a37.
dataroaring
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
yiguolei
left a comment
LGTM
Proposed changes
Issue Number: close #xxx
Improve tablet sched speed:
a. Many sched steps are very quick, for example beginning a clone task, beginning or ending the deletion of a replica, or marking a replica as slow. They cost far less than 1s, so the sched thread does not need to wait 1s between rounds. We change the sched period from 1s to 100ms;
b. When a sched ctx finishes, check the tablet immediately. If it is unhealthy, put a new ctx for this tablet into the pending queue, so it can be repaired quickly without waiting for the TabletChecker's next check;
c. If all the backends of a tablet are alive or being decommissioned, the tablet can be put into the sched pending queue immediately.
Test run: 3 BEs, each holding 1000 empty tablets, with 1 BE decommissioned. The old scheduler took 900s; the new scheduler took 40s.
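The immediate recheck in item b can be illustrated with a minimal sketch. This is not the code in this PR: the class `ImmediateRecheckSketch`, its fields, and the "one replica repaired per finished ctx" model are all hypothetical and only show the idea of re-enqueueing an unhealthy tablet as soon as its ctx finishes, instead of waiting for a separate checker pass.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch, not the Doris TabletScheduler; all names are illustrative.
class ImmediateRecheckSketch {
    // Sched period from the description: 100 ms instead of 1000 ms.
    static final long SCHEDULE_INTERVAL_MS = 100;

    // Pending queue of tablet ids waiting to be scheduled.
    final Queue<Long> pending = new ArrayDeque<>();
    // tabletId -> number of replicas still missing (0 means healthy).
    final Map<Long, Integer> missingReplicas = new HashMap<>();

    boolean isHealthy(long tabletId) {
        return missingReplicas.getOrDefault(tabletId, 0) == 0;
    }

    // Called when a ctx for the tablet finishes. Assumption: each finished
    // ctx repairs exactly one missing replica.
    void onCtxFinished(long tabletId) {
        missingReplicas.computeIfPresent(tabletId, (id, n) -> Math.max(0, n - 1));
        // Recheck right away and re-enqueue if the tablet is still unhealthy.
        if (!isHealthy(tabletId)) {
            pending.add(tabletId);
        }
    }

    // Drains the pending queue and returns how many sched rounds were needed.
    int runUntilIdle() {
        int rounds = 0;
        while (!pending.isEmpty()) {
            long tabletId = pending.poll();
            onCtxFinished(tabletId);
            rounds++;
        }
        return rounds;
    }
}
```

In this model a tablet missing two replicas is fully repaired in two consecutive rounds, so the waiting time is bounded by the number of rounds times `SCHEDULE_INTERVAL_MS` rather than by the checker's interval.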
Fix tablets being scheduled too many times, which may block forever.
a. Repair tasks had no limit on their sched failed count, so a task could stay in the pending queue forever and stop other tasks from being added to the pending queue. For example, a decommission task may fail forever if its txn cannot finish. So we limit the sched failed count;
b. A running clone task may also fail forever, so we also limit the running failed count.
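The two limits can be sketched as a pair of counters on the ctx. The class name, the method names, and the limit values below are assumptions chosen for illustration; the description does not state the actual thresholds.

```java
// Hypothetical sketch of bounded retries; limits and names are illustrative,
// not the values or identifiers used in this PR.
class FailLimitSketch {
    static final int MAX_SCHED_FAILED = 10;   // assumed limit
    static final int MAX_RUNNING_FAILED = 3;  // assumed limit

    int schedFailed = 0;
    int runningFailed = 0;

    // Records a failure to schedule the task.
    // Returns true if the ctx may go back to the pending queue,
    // false if it should be dropped so other tasks can be added.
    boolean onSchedFailed() {
        schedFailed++;
        return schedFailed < MAX_SCHED_FAILED;
    }

    // Records a failure of a running task (for example a clone).
    // Returns true if the task may be retried, false if it should be given up.
    boolean onRunningFailed() {
        runningFailed++;
        return runningFailed < MAX_RUNNING_FAILED;
    }
}
```

With a bound in place, a task that can never succeed leaves the queue after a fixed number of attempts, and the tablet can be picked up again later by a fresh check.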
Remove tablet sched ctx's dynamic priority.
The dynamic priority is hard to understand. It also changes slowly: a priority change takes a few minutes, which is rather long. So we remove the dynamic priority.
For a tablet, once its balance task has been added to the pending queue, a repair task for it cannot be added. This may cause a problem: if the tablet is unhealthy, the balance task will fail, but in the next loop the balance task may be selected and added to the queue again, which keeps the tablet from being repaired.
So we add a fix: if a balance ctx has been added for a tablet and a repair ctx is later added for the same tablet, the balance ctx is automatically converted to a repair task.
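A minimal sketch of the conversion rule, assuming the pending queue holds at most one ctx per tablet. `BalanceToRepairSketch`, `Type`, and `add` are hypothetical names, not identifiers from the Doris code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of converting a pending balance ctx into a repair ctx.
class BalanceToRepairSketch {
    enum Type { BALANCE, REPAIR }

    static class Ctx {
        final long tabletId;
        Type type;

        Ctx(long tabletId, Type type) {
            this.tabletId = tabletId;
            this.type = type;
        }
    }

    // Assumption: at most one pending ctx per tablet, keyed by tablet id.
    final Map<Long, Ctx> pendingByTablet = new LinkedHashMap<>();

    // Returns true if a ctx was added or an existing balance ctx was
    // converted to repair; false if the request was rejected as a duplicate.
    boolean add(long tabletId, Type type) {
        Ctx existing = pendingByTablet.get(tabletId);
        if (existing == null) {
            pendingByTablet.put(tabletId, new Ctx(tabletId, type));
            return true;
        }
        if (existing.type == Type.BALANCE && type == Type.REPAIR) {
            // Repair takes over: the balance ctx no longer blocks the repair.
            existing.type = Type.REPAIR;
            return true;
        }
        return false;
    }
}
```

The conversion is one-way in this sketch: a later balance request for a tablet that already has a pending repair ctx is still rejected.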