[fix](tablet clone) fix tablet sched failed when tablet missing tag and version incomplete #22861

Merged
dataroaring merged 3 commits into apache:master from yujun777:miss-tag-sched-failed
Aug 13, 2023

@yujun777
Contributor

Proposed changes

Tablet scheduling always fails in the following case:

  1. Initially there are 4 BEs: A, B, C, D. The replica allocation number is 3, and the three replicas are on A, B, C.
  2. The user changes BEs B/C/D to tag:foo, and also changes the table and its partitions to tag:foo.
  3. The tablet scheduler migrates the replica from A => D.
  4. First, D clones a replica. While it is cloning, the table loads data. After cloning, there are 4 replicas:
    a) A: version complete, tag mismatch;
    b) B: version complete, tag match;
    c) C: version complete, tag match;
    d) D: version incomplete, tag match;
  5. The tablet's aliveAndVersionComplete check passes (3 replicas: A, B, C). The scheduler then checks whether each tag has enough replicas and finds that tag:foo does not: it has only 2 replicas, B and C, because the alloc map excludes the version-incomplete D. So the tablet's health status is REPLICA_MISSING_FOR_TAG.
  6. The scheduler tries to find another BE to clone a new replica onto, and this fails.

This PR fixes that. For a given tag, if there are enough alive replicas but not enough version-complete replicas, the tablet's status should be VERSION_INCOMPLETE.
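
The rule above can be illustrated with a small, self-contained sketch. This is not the Doris FE code: the `Replica` class, the `status_by_tag` function, the `fixed` switch, and the `"default"` tag assigned to A are all hypothetical names chosen for illustration, and only the per-tag check from step 5 is modelled. The status names REPLICA_MISSING_FOR_TAG and VERSION_INCOMPLETE are taken from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class TabletStatus(Enum):
    # HEALTHY here only means "the per-tag check passed"; other checks are not modelled.
    HEALTHY = "HEALTHY"
    REPLICA_MISSING_FOR_TAG = "REPLICA_MISSING_FOR_TAG"
    VERSION_INCOMPLETE = "VERSION_INCOMPLETE"


@dataclass(frozen=True)
class Replica:
    backend: str
    tag: str
    alive: bool
    version_complete: bool


def status_by_tag(replicas: List[Replica], required: Dict[str, int], fixed: bool) -> TabletStatus:
    """Per-tag replica check. `fixed=False` mimics the behaviour described in step 5,
    `fixed=True` applies the rule proposed in this PR (hypothetical switch)."""
    for tag, need in required.items():
        alive = [r for r in replicas if r.tag == tag and r.alive]
        complete = [r for r in alive if r.version_complete]
        if len(complete) >= need:
            continue  # this tag has enough usable replicas
        if fixed and len(alive) >= need:
            # Enough alive replicas, some still catching up: wait for them, do not clone.
            return TabletStatus.VERSION_INCOMPLETE
        return TabletStatus.REPLICA_MISSING_FOR_TAG
    return TabletStatus.HEALTHY


# The four replicas from step 4. A's tag name is an assumption; it only has to differ from "foo".
SCENARIO = [
    Replica("A", "default", alive=True, version_complete=True),
    Replica("B", "foo", alive=True, version_complete=True),
    Replica("C", "foo", alive=True, version_complete=True),
    Replica("D", "foo", alive=True, version_complete=False),
]
REQUIRED = {"foo": 3}  # replica allocation: 3 replicas on tag:foo

if __name__ == "__main__":
    print(status_by_tag(SCENARIO, REQUIRED, fixed=False).name)  # REPLICA_MISSING_FOR_TAG
    print(status_by_tag(SCENARIO, REQUIRED, fixed=True).name)   # VERSION_INCOMPLETE
```

With `fixed=False` the sketch counts only B and C for tag:foo and reports a missing replica, which is the state that sends the scheduler looking for another BE. With `fixed=True` it sees three alive replicas on tag:foo and reports VERSION_INCOMPLETE instead.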

@yujun777
Contributor Author

run buildall

@yujun777
Contributor Author

run buildall

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.96 seconds
stream load tsv: 511 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162394049 Bytes

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.11 seconds
stream load tsv: 510 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.5 seconds inserted 10000000 Rows, about 338K ops/s
storage size: 17161981474 Bytes

dataroaring previously approved these changes Aug 11, 2023
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Aug 11, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yujun777
Contributor Author

run buildall

@github-actions github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) Aug 11, 2023
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Aug 11, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.45 seconds
stream load tsv: 512 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17161991308 Bytes

@dataroaring dataroaring merged commit bff3b90 into apache:master Aug 13, 2023
xiaokang pushed a commit that referenced this pull request Aug 17, 2023
airborne12 pushed a commit to airborne12/apache-doris that referenced this pull request Aug 21, 2023

Labels

approved (Indicates a PR has been approved by one committer.), dev/2.0.1-merged, reviewed

5 participants