[Alter]Schema Change in a big table will cause FE timeout and cancel this job #2942

Closed
WingsGo opened this issue Feb 19, 2020 · 0 comments

WingsGo commented Feb 19, 2020

Describe the bug
When I run a schema change job on a table with 147 partitions, each containing 175 tablets, the job fails in the pending stage with this error message: Create replicas failed. Error: Error replicas: 15072977=32666105, 15072977=32672253,15072977=32653793. I looked at the FE source code and found the code linked below. It means that if the BE takes longer than the maximum timeout (1 minute) to create the tablets, this error is raised. So I searched be.info for more information, to find out whether something actually went wrong while creating the replicas or whether the creation simply timed out.

https://github.com/apache/incubator-doris/blob/87a84a793e0c66a2c3286a8d3adde3a273462f33/fe/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java#L249-L279
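The behavior described above (wait for all create-replica tasks, but give up after a fixed deadline) can be illustrated with a small, self-contained sketch. This is not the Doris code: the class name, the method name, the two-thread pool standing in for BE workers, and the sleep-based "tablet creation" are all hypothetical. It only shows why a fixed cap fails a large batch even when every individual task succeeds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from the Doris code base.
class CreateReplicaWaitSketch {

    /**
     * Submits one simulated "create replica" task per entry in taskMillis and
     * waits at most timeoutMs for all of them to finish.
     * Returns true only if every task completed before the deadline.
     */
    static boolean createAllReplicas(long[] taskMillis, long timeoutMs) {
        CountDownLatch latch = new CountDownLatch(taskMillis.length);
        ExecutorService pool = Executors.newFixedThreadPool(2); // stand-in for BE workers
        try {
            for (long millis : taskMillis) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(millis); // simulated tablet creation
                        latch.countDown();    // report success for this replica
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The wait is bounded by timeoutMs regardless of how many tasks are queued.
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // abandon whatever is still queued or running
        }
    }

    public static void main(String[] args) {
        long[] tasks = {50, 50, 50, 50, 50, 50}; // six tasks, two workers: about 150 ms in total
        System.out.println("generous deadline: " + createAllReplicas(tasks, 1000));
        System.out.println("tight deadline:    " + createAllReplicas(tasks, 60));
    }
}
```

With the generous deadline the call should report success. With the tight one it should report failure although no task hit an error, which matches the symptom in this report: the work is healthy, only the deadline is too short for the batch size.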

As shown in the screenshot, the task starts at 00:00:31 and ends at 00:01:32. I then checked the BE machine; its last log lines are as follows. They show that the create replica tasks finished and the last task was erased from the queue at 00:01:33.88, about two seconds after the FE had already given up. So I think we should make the maximum timeout configurable to avoid this case, and I will submit a PR later.

I0218 00:01:33.876698 37320 tablet_manager.cpp:277] begin to process create tablet. tablet=32677985, schema_hash=1328100050
I0218 00:01:33.876894 37319 task_worker_pool.cpp:328] finish task success. result:1
I0218 00:01:33.876904 37319 task_worker_pool.cpp:286] type: CREATE, signature: 32677937, has been erased, queue size: 1
I0218 00:01:33.878705 37320 tablet_manager.cpp:1329] next_unique_id:267
I0218 00:01:33.879608 37320 tablet.cpp:333] no rowset for version:0-1, tablet: 32677985.1328100050.e24345ecc852edf0-4814d0d8717e04b9
I0218 00:01:33.879624 37320 tablet_manager.cpp:379] this request is for alter tablet request v2, so that not add alter task to tablet
I0218 00:01:33.880174 37320 tablet_meta_manager.cpp:115] save tablet meta , key:tabletmeta_32677985_1328100050 meta_size=11029
I0218 00:01:33.880295 37320 tablet_manager.cpp:437] finish to process create tablet. res=0
I0218 00:01:33.880302 37320 tablet_manager.cpp:324] finish to process create tablet. res=0
I0218 00:01:33.880513 37320 task_worker_pool.cpp:328] finish task success. result:1
I0218 00:01:33.880523 37320 task_worker_pool.cpp:286] type: CREATE, signature: 32677985, has been erased, queue size: 0
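One way to picture the proposed fix is a timeout that still has an upper bound, but one that an operator can raise. The sketch below is an assumption-laden illustration, not the eventual patch: the class, the two settings, and the idea of scaling the timeout with the replica count are all hypothetical. Only the 1-minute cap and the 147 x 175 table shape come from the report. The replica count per tablet is not stated, so the tablet count is used as a lower bound.

```java
// Hypothetical sketch; field and method names are illustrative, not Doris configuration items.
class CreateTimeoutSketch {
    // Assumed per-replica allowance and overall cap; a real system would read both from config.
    static long perReplicaTimeoutSeconds = 1;
    static long maxCreateTimeoutSeconds = 60; // the 1-minute limit described in the report

    /** Timeout in milliseconds: grows with the number of replicas but never exceeds the cap. */
    static long createTimeoutMs(long replicaCount) {
        long scaledSeconds = Math.multiplyExact(perReplicaTimeoutSeconds, replicaCount);
        return Math.min(scaledSeconds, maxCreateTimeoutSeconds) * 1000L;
    }

    public static void main(String[] args) {
        long tablets = 147L * 175L; // partitions x tablets per partition = 25725
        System.out.println("tablets: " + tablets);
        System.out.println("timeout with 60 s cap:  " + createTimeoutMs(tablets) + " ms");  // 60000 ms
        maxCreateTimeoutSeconds = 600; // operator raises the cap for a large table
        System.out.println("timeout with 600 s cap: " + createTimeoutMs(tablets) + " ms");  // 600000 ms
    }
}
```

With a hard-coded 60-second cap, a table of this size always gets 60 seconds, which the logs above show was about two seconds too short. With a configurable cap the same job could be given more headroom without changing the code.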

Screenshots

Snip20200219_1
