
[improve](env) Ensure next majority is met before drop an alive follower #28101

Merged

yiguolei merged 1 commit into apache:master from w41ter:improve/check_quorum_before_drop_frontend on Dec 8, 2023


w41ter (Contributor) commented Dec 7, 2023

Proposed changes

Issue Number: close #xxx

Here is an example:

```
mysql> ALTER SYSTEM DROP FOLLOWER "127.0.0.1:19017";
ERROR 1105 (HY000): errCode = 2, detailMessage = Unable to drop this alive
follower, because the quorum requirements are not met after this command
execution. Current num alive followers 2, num followers 3, majority after
execution 2
```
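The numbers in this message follow from majority arithmetic. As a minimal sketch (the class and method names below are hypothetical, not the actual Doris implementation): dropping an alive follower leaves `numFollowers - 1` followers, whose majority is `(numFollowers - 1) / 2 + 1`, while only `numAliveFollowers - 1` of them remain alive. The drop is rejected when the remaining alive followers fall below that majority.

```java
/**
 * Hypothetical sketch of the quorum check described in this PR.
 * Names are illustrative and do not come from the Doris source.
 */
final class QuorumCheck {

    private QuorumCheck() {}

    /** Majority (quorum size) for a group of numFollowers voting followers. */
    static int majority(int numFollowers) {
        return numFollowers / 2 + 1;
    }

    /**
     * Returns true if dropping one alive follower still leaves enough alive
     * followers to form a majority of the remaining follower set.
     */
    static boolean canDropAliveFollower(int numFollowers, int numAliveFollowers) {
        int majorityAfter = majority(numFollowers - 1);
        int aliveAfter = numAliveFollowers - 1;
        return aliveAfter >= majorityAfter;
    }

    public static void main(String[] args) {
        // Values taken from the error message: 3 followers, 2 of them alive.
        int numFollowers = 3;
        int numAlive = 2;
        System.out.println("majority after execution: " + majority(numFollowers - 1));
        System.out.println("can drop: " + canDropAliveFollower(numFollowers, numAlive));
    }
}
```

With 3 followers and 2 alive, the majority after the drop is 2 but only 1 follower would remain alive, so `main` prints `majority after execution: 2` and `can drop: false`, matching the error above.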

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc.

w41ter (Contributor Author) commented Dec 7, 2023

run buildall

```java
for (Frontend fe : frontends.values()) {
    if (fe.getRole() == FrontendNodeType.FOLLOWER) {
        numFollower += 1;
        if (fe.isAlive()) {
            // ... (rest of the excerpt is not shown in the review thread)
        }
    }
}
```
Contributor (review comment):

The alive status is set by the heartbeat manager, which has a latency of 5 seconds. In some situations, could this check fail to detect that an FE is down?

Contributor Author (reply):

Yes, this check only intercepts common cases; some extreme cases cannot be guaranteed.
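The reviewer's concern can be illustrated with a hypothetical model (the `StaleAliveExample` and `Follower` classes and their helpers are illustrative, not Doris code) in which each follower's alive flag is a cached value refreshed by heartbeats. If one follower has just crashed but is still flagged alive, a check based on the cached flags sees 3 of 3 followers alive and allows dropping another one, although only 1 of the remaining 2 would actually be alive.

```java
import java.util.List;

final class StaleAliveExample {

    /** A follower whose cached alive flag may lag behind its real state (hypothetical model). */
    static final class Follower {
        final String host;
        final boolean cachedAlive;   // what the heartbeat manager last recorded
        final boolean actuallyAlive; // the real state of the process

        Follower(String host, boolean cachedAlive, boolean actuallyAlive) {
            this.host = host;
            this.cachedAlive = cachedAlive;
            this.actuallyAlive = actuallyAlive;
        }
    }

    /** Drop is allowed only if the remaining alive followers still form a majority. */
    static boolean canDrop(int numFollowers, int numAlive) {
        int majorityAfter = (numFollowers - 1) / 2 + 1;
        return numAlive - 1 >= majorityAfter;
    }

    /** Counts alive followers using either the cached flag or the real state. */
    static int countAlive(List<Follower> followers, boolean useCachedFlag) {
        int n = 0;
        for (Follower f : followers) {
            if (useCachedFlag ? f.cachedAlive : f.actuallyAlive) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Follower> followers = List.of(
                new Follower("fe1", true, true),
                new Follower("fe2", true, true),
                // Crashed moments ago; the heartbeat has not marked it dead yet.
                new Follower("fe3", true, false));

        int cached = countAlive(followers, true);
        int actual = countAlive(followers, false);
        System.out.println("check on cached flags allows drop: " + canDrop(followers.size(), cached));
        System.out.println("drop is actually safe: " + canDrop(followers.size(), actual));
    }
}
```

`main` prints `true` for the cached check and `false` for the real state, which is the kind of extreme case the author says this check does not guard against.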

w41ter force-pushed the improve/check_quorum_before_drop_frontend branch from 5662195 to 6a9cdbc on December 7, 2023 06:18
w41ter (Contributor Author) commented Dec 7, 2023

run buildall

w41ter force-pushed the improve/check_quorum_before_drop_frontend branch from 6a9cdbc to 5283adf on December 7, 2023 08:45
w41ter (Contributor Author) commented Dec 7, 2023

run buildall

dataroaring (Contributor) left a comment:

LGTM

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Dec 7, 2023
github-actions bot commented Dec 7, 2023

PR approved by at least one committer and no changes requested.

github-actions bot commented Dec 7, 2023

PR approved by anyone and no changes requested.

doris-robot commented:

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 5283adf6ee08e1ab22ebfaba1a27642ffd7e1562, data reload: false

run tpch-sf100 query with default conf and session variables
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	4711	4521	4512	4512
q2	372	139	161	139
q3	1466	1218	1242	1218
q4	1105	934	873	873
q5	3201	3212	3188	3188
q6	249	129	128	128
q7	997	495	484	484
q8	2215	2243	2201	2201
q9	6663	6687	6655	6655
q10	3220	3263	3275	3263
q11	321	203	200	200
q12	356	209	212	209
q13	4550	3805	3784	3784
q14	243	214	223	214
q15	568	524	521	521
q16	449	389	394	389
q17	1016	613	541	541
q18	7640	7161	6870	6870
q19	1528	1394	1428	1394
q20	567	300	932	300
q21	3102	2669	2729	2669
q22	362	294	289	289
Total cold run time: 44901 ms
Total hot run time: 40041 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	4415	4425	4403	4403
q2	274	163	173	163
q3	3541	3531	3537	3531
q4	2392	2402	2378	2378
q5	5764	5739	5749	5739
q6	240	121	120	120
q7	2374	1876	1886	1876
q8	3528	3516	3530	3516
q9	9079	8993	9059	8993
q10	3908	4004	3995	3995
q11	509	370	403	370
q12	769	602	602	602
q13	4281	3583	3563	3563
q14	291	268	265	265
q15	582	529	520	520
q16	518	455	488	455
q17	1899	1859	1883	1859
q18	8792	8188	8300	8188
q19	1759	1721	1745	1721
q20	2269	1956	1964	1956
q21	6537	6175	6180	6175
q22	502	415	448	415
Total cold run time: 64223 ms
Total hot run time: 60803 ms

doris-robot commented:

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.72 seconds
stream load tsv: 584 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17194698137 Bytes
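The stream load rates appear to be bytes loaded divided by elapsed seconds, expressed in units of 2^20 bytes and truncated to an integer; this reading is inferred from the reported figures, not documented by the robot. A minimal sketch under that assumption (the class and method names are hypothetical):

```java
/**
 * Sketch that reproduces the stream load throughput figures, assuming
 * "MB" means 2^20 bytes and the result is truncated toward zero.
 */
final class LoadThroughput {

    private LoadThroughput() {}

    static final long BYTES_PER_MB = 1L << 20;

    /** Integer MB/s for a load of the given size and duration. */
    static long mbPerSecond(long bytes, long seconds) {
        return bytes / seconds / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        // Sizes and durations copied from the clickbench report.
        String[] formats = {"tsv", "json", "orc", "parquet"};
        long[] bytes = {74807831229L, 2358488459L, 1101869774L, 861443392L};
        long[] seconds = {584, 19, 66, 33};
        for (int i = 0; i < formats.length; i++) {
            System.out.println(formats[i] + ": " + mbPerSecond(bytes[i], seconds[i]) + " MB/s");
        }
    }
}
```

Under these assumptions it prints 122, 118, 15 and 24 MB/s, matching the report; with decimal megabytes (10^6 bytes) the TSV figure would come out near 128 instead.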

w41ter requested a review from swjtu-zhanglei on December 8, 2023 01:56
yiguolei merged commit 99b38dd into apache:master on Dec 8, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request on Dec 14, 2023: …wer (apache#28101)


Labels: approved, reviewed


5 participants