Fe will notify new FE type transfer and quit unexpected #2357

Closed
HangyuanLiu opened this issue Dec 3, 2019 · 1 comment

@HangyuanLiu HangyuanLiu commented Dec 3, 2019

Describe the bug
When the FE is under heavy load, it logs "notify new FE type transfer", transfers from MASTER to UNKNOWN, and exits.

To Reproduce
2019-12-03 01:23:04,052 WARN 53 [Catalog.notifyNewFETypeTransfer():2152] notify new FE type transfer: UNKNOWN
2019-12-03 01:23:04,053 INFO 65 [Catalog$4.runOneCycle():2175] begin to transfer FE type from MASTER to UNKNOWN
2019-12-03 01:23:04,053 ERROR 65 [Catalog$4.runOneCycle():2252] transfer FE type from MASTER to UNKNOWN. exit

@morningman morningman self-assigned this Dec 3, 2019
@morningman morningman added the bug label Dec 3, 2019

@morningman morningman commented Dec 3, 2019

The timeline of this issue is as follows:

  1. For some reason, the master lost contact with the other two followers. Judging from the master's logs, it printed nothing for almost 40 seconds. We suspect it was stuck in a full GC or stalled for some other reason, which led the other two followers to conclude that the master had disconnected.

  2. The other two followers elected a new master and continued to provide service.

  3. The old master node was later restarted manually. On the first restart it needed to roll back some committed logs, so it shut down and had to be restarted again. After the second restart it returned to normal.
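
The silent window mentioned in step 1 can be located mechanically by scanning the log for large jumps between consecutive timestamps. The following is a minimal sketch, assuming every log line starts with a timestamp in the same layout as the lines quoted in this issue. The class name `LogGapFinder`, the method `findGaps`, and the sample lines in `main` are hypothetical and made up for illustration; they are not taken from the real fe.log or from the project.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the project: finds silent periods in a log.
class LogGapFinder {
    // Timestamp layout as seen in the quoted log lines, e.g. "2019-12-03 01:23:04,052".
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");
    private static final int TS_LENGTH = 23;

    /** Returns every interval between consecutive timestamped lines that is at least minGap long. */
    static List<Duration> findGaps(List<String> lines, Duration minGap) {
        List<Duration> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : lines) {
            if (line.length() < TS_LENGTH) {
                continue; // too short to carry a timestamp
            }
            LocalDateTime current;
            try {
                current = LocalDateTime.parse(line.substring(0, TS_LENGTH), TS);
            } catch (DateTimeParseException e) {
                continue; // e.g. a stack-trace continuation line
            }
            if (previous != null) {
                Duration gap = Duration.between(previous, current);
                if (gap.compareTo(minGap) >= 0) {
                    gaps.add(gap);
                }
            }
            previous = current;
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Made-up sample lines for illustration only.
        List<String> sample = Arrays.asList(
                "2019-12-03 01:22:20,000 INFO 1 [Example.before():1] last line before the stall",
                "2019-12-03 01:23:00,000 INFO 1 [Example.after():2] first line after the stall");
        for (Duration gap : findGaps(sample, Duration.ofSeconds(30))) {
            System.out.println("silent for " + gap.getSeconds() + " seconds");
        }
    }
}
```

Lines that do not begin with a parseable timestamp are skipped, so multi-line entries such as stack traces do not break the scan. A gap found this way only shows that nothing was logged; it does not by itself distinguish a full GC pause from any other stall.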

The root cause is that the master was stuck for about 40 seconds for an unknown reason. This needs further observation.

Meanwhile, to mitigate the problem, we decided to make bdbje's heartbeat timeout configurable. The default is 30 seconds; it can be raised to, for example, 1 minute to avoid this problem for now.
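
To illustrate why raising the timeout helps in this case, here is a minimal sketch of the timeout comparison. It assumes a peer is treated as disconnected once its silence exceeds the configured timeout. It is not bdbje's implementation and not the actual change made in the project; `HeartbeatTimeoutSketch`, `timeoutFromConfig`, and `isConsideredDisconnected` are hypothetical names used only for illustration.

```java
import java.time.Duration;

// Hypothetical sketch, not bdbje code: models a configurable heartbeat timeout.
class HeartbeatTimeoutSketch {
    /** Parses a timeout given in seconds, falling back to the default when unset. */
    static Duration timeoutFromConfig(String value, Duration defaultTimeout) {
        if (value == null || value.trim().isEmpty()) {
            return defaultTimeout;
        }
        long seconds = Long.parseLong(value.trim());
        if (seconds <= 0) {
            throw new IllegalArgumentException("timeout must be positive: " + value);
        }
        return Duration.ofSeconds(seconds);
    }

    /** Assumed rule: a peer counts as disconnected once its silence exceeds the timeout. */
    static boolean isConsideredDisconnected(Duration silence, Duration timeout) {
        return silence.compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Duration stall = Duration.ofSeconds(40);          // stall length reported in this issue
        Duration defaultTimeout = Duration.ofSeconds(30); // default stated in this issue
        Duration raised = timeoutFromConfig("60", defaultTimeout);

        System.out.println("default timeout: disconnected = "
                + isConsideredDisconnected(stall, defaultTimeout));
        System.out.println("raised timeout: disconnected = "
                + isConsideredDisconnected(stall, raised));
    }
}
```

Under this assumption a 40-second stall exceeds the 30-second default but not a 60-second setting. The trade-off is that a larger timeout also delays detection of a master that is genuinely down.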
