-
Notifications
You must be signed in to change notification settings - Fork 5k
Closed
Labels
Description
Search before asking
- I had searched in the issues and found no similar issues.
What happened
Master-server often disconnect with zookeeper due to server down
log
[WARN] 2022-08-25 12:19:25.150 +0800 org.apache.zookeeper.ClientCnxn:[1108] - [WorkflowInstance-0][TaskInstance-0] - Client session timed out, have not heard from server in 20010ms for sessionid 0x1017e2ca6e10058
[INFO] 2022-08-25 12:19:44.757 +0800 org.apache.zookeeper.ClientCnxn:[1156] - [WorkflowInstance-0][TaskInstance-0] - Client session timed out, have not heard from server in 20010ms for sessionid 0x1017e2ca6e10058, closing socket connection and attempting reconnect
[INFO] 2022-08-25 12:19:44.859 +0800 org.apache.curator.framework.state.ConnectionStateManager:[251] - [WorkflowInstance-0][TaskInstance-0] - State change: SUSPENDED
[WARN] 2022-08-25 12:19:44.861 +0800 org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperConnectionStateListener:[52] - [WorkflowInstance-0][TaskInstance-0] - Registry suspended
[WARN] 2022-08-25 12:19:44.862 +0800 org.apache.dolphinscheduler.server.master.registry.MasterConnectionStateListener:[48] - [WorkflowInstance-0][TaskInstance-0] - registry connection state is SUSPENDED, ready to retry connection
[INFO] 2022-08-25 12:19:46.203 +0800 org.apache.zookeeper.ClientCnxn:[1025] - [WorkflowInstance-0][TaskInstance-0] - Opening socket connection to server development-7/192.168.120.17:2181. Will not attempt to authenticate using SASL (unknown error)
[INFO] 2022-08-25 12:19:46.204 +0800 org.apache.zookeeper.ClientCnxn:[879] - [WorkflowInstance-0][TaskInstance-0] - Socket connection established to development-7/192.168.120.17:2181, initiating session
[WARN] 2022-08-25 12:19:46.206 +0800 org.apache.zookeeper.ClientCnxn:[1285] - [WorkflowInstance-0][TaskInstance-0] - Unable to reconnect to ZooKeeper service, session 0x1017e2ca6e10058 has expired
[WARN] 2022-08-25 12:19:46.206 +0800 org.apache.curator.ConnectionState:[376] - [WorkflowInstance-0][TaskInstance-0] - Session expired event received
[INFO] 2022-08-25 12:19:46.206 +0800 org.apache.zookeeper.ClientCnxn:[1154] - [WorkflowInstance-0][TaskInstance-0] - Unable to reconnect to ZooKeeper service, session 0x1017e2ca6e10058 has expired, closing socket connection
[INFO] 2022-08-25 12:19:46.207 +0800 org.apache.zookeeper.ZooKeeper:[442] - [WorkflowInstance-0][TaskInstance-0] - Initiating client connection, connectString=192.168.120.17:2181 sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@16d41725
[INFO] 2022-08-25 12:19:46.207 +0800 org.apache.curator.framework.state.ConnectionStateManager:[251] - [WorkflowInstance-0][TaskInstance-0] - State change: LOST
[INFO] 2022-08-25 12:19:46.208 +0800 org.apache.zookeeper.ClientCnxn:[522] - [WorkflowInstance-0][TaskInstance-0] - EventThread shut down for session: 0x1017e2ca6e10058
[WARN] 2022-08-25 12:19:46.208 +0800 org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperConnectionStateListener:[44] - [WorkflowInstance-0][TaskInstance-0] - Registry disconnected
[INFO] 2022-08-25 12:19:46.209 +0800 org.apache.zookeeper.ClientCnxn:[1025] - [WorkflowInstance-0][TaskInstance-0] - Opening socket connection to server development-7/192.168.120.17:2181. Will not attempt to authenticate using SASL (unknown error)
[WARN] 2022-08-25 12:19:46.209 +0800 org.apache.dolphinscheduler.server.master.registry.MasterConnectionStateListener:[56] - [WorkflowInstance-0][TaskInstance-0] - registry connection state is DISCONNECTED, ready to stop myself
[INFO] 2022-08-25 12:19:46.209 +0800 org.apache.zookeeper.ClientCnxn:[879] - [WorkflowInstance-0][TaskInstance-0] - Socket connection established to development-7/192.168.120.17:2181, initiating session
[INFO] 2022-08-25 12:19:46.321 +0800 org.apache.zookeeper.ClientCnxn:[1299] - [WorkflowInstance-0][TaskInstance-0] - Session establishment complete on server development-7/192.168.120.17:2181, sessionid = 0x1017e2ca6e1005f, negotiated timeout = 30000
[INFO] 2022-08-25 12:19:46.321 +0800 org.apache.curator.framework.state.ConnectionStateManager:[251] - [WorkflowInstance-0][TaskInstance-0] - State change: RECONNECTED
[WARN] 2022-08-25 12:19:46.371 +0800 org.apache.dolphinscheduler.server.master.dispatch.host.LowerWeightHostManager:[136] - [WorkflowInstance-0][TaskInstance-0] - worker 192.168.120.17:1234 in work group default have not received the heartbeat
[WARN] 2022-08-25 12:19:46.488 +0800 org.apache.dolphinscheduler.server.master.MasterServer:[121] - [WorkflowInstance-0][TaskInstance-0] - MasterServer is already stopped, current cause: i was judged to death, release resources and stop myself
[WARN] 2022-08-25 12:19:46.489 +0800 org.apache.dolphinscheduler.server.master.MasterServer:[121] - [WorkflowInstance-0][TaskInstance-0] - MasterServer is already stopped, current cause: i was judged to death, release resources and stop myself
[WARN] 2022-08-25 12:19:46.489 +0800 org.apache.dolphinscheduler.server.master.MasterServer:[121] - [WorkflowInstance-0][TaskInstance-0] - MasterServer is already stopped, current cause: i was judged to death, release resources and stop myself
[WARN] 2022-08-25 12:19:46.490 +0800 org.apache.dolphinscheduler.server.master.MasterServer:[121] - [WorkflowInstance-0][TaskInstance-0] - MasterServer is already stopped, current cause: i was judged to death, release resources and stop myself
[INFO] 2022-08-25 12:19:49.209 +0800 org.apache.dolphinscheduler.server.master.MasterServer:[133] - [WorkflowInstance-0][TaskInstance-0] - Master server is stopping, current cause : registry connection state is DISCONNECTED, stop myself
[INFO] 2022-08-25 12:19:49.212 +0800 org.quartz.core.QuartzScheduler:[585] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_development-71661396192695 paused.
[INFO] 2022-08-25 12:19:49.220 +0800 org.eclipse.jetty.server.AbstractConnector:[381] - [WorkflowInstance-0][TaskInstance-0] - Stopped ServerConnector@16bd7ae1{HTTP/1.1, (http/1.1)}{0.0.0.0:5679}
[INFO] 2022-08-25 12:19:49.220 +0800 org.eclipse.jetty.server.session:[149] - [WorkflowInstance-0][TaskInstance-0] - node0 Stopped scavenging
[INFO] 2022-08-25 12:19:49.221 +0800 org.eclipse.jetty.server.handler.ContextHandler.application:[2347] - [WorkflowInstance-0][TaskInstance-0] - Destroying Spring FrameworkServlet 'dispatcherServlet'
[INFO] 2022-08-25 12:19:49.222 +0800 org.eclipse.jetty.server.handler.ContextHandler:[1153] - [WorkflowInstance-0][TaskInstance-0] - Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@7f2b584b{application,/,[file:///tmp/jetty-docbase.5679.7084582183858887588/],STOPPED}
[INFO] 2022-08-25 12:19:49.226 +0800 org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[109] - [WorkflowInstance-0][TaskInstance-0] - Closing Master RPC Server...
[INFO] 2022-08-25 12:19:49.228 +0800 org.apache.dolphinscheduler.remote.NettyRemotingServer:[212] - [WorkflowInstance-0][TaskInstance-0] - netty server closed
[INFO] 2022-08-25 12:19:49.228 +0800 org.apache.dolphinscheduler.server.master.rpc.MasterRPCServer:[111] - [WorkflowInstance-0][TaskInstance-0] - Closed Master RPC Server...
[INFO] 2022-08-25 12:19:49.229 +0800 org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - [WorkflowInstance-0][TaskInstance-0] - Shutting down Quartz Scheduler
[INFO] 2022-08-25 12:19:49.229 +0800 org.quartz.core.QuartzScheduler:[666] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_development-71661396192695 shutting down.
[INFO] 2022-08-25 12:19:49.229 +0800 org.quartz.core.QuartzScheduler:[585] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_development-71661396192695 paused.
[INFO] 2022-08-25 12:19:49.234 +0800 org.quartz.core.QuartzScheduler:[740] - [WorkflowInstance-0][TaskInstance-0] - Scheduler DolphinScheduler_$_development-71661396192695 shutdown complete.
[INFO] 2022-08-25 12:19:49.234 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[117] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopping...
[WARN] 2022-08-25 12:19:49.234 +0800 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[125] - [WorkflowInstance-0][TaskInstance-0] - State event loop service interrupted, will stop this loop
java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:121)
[INFO] 2022-08-25 12:19:49.234 +0800 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerBootstrap:[118] - [WorkflowInstance-0][TaskInstance-0] - Master schedule bootstrap stopped...
[INFO] 2022-08-25 12:19:49.235 +0800 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[132] - [WorkflowInstance-0][TaskInstance-0] - State event loop service stopped
[INFO] 2022-08-25 12:19:49.237 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[226] - [WorkflowInstance-0][TaskInstance-0] - Master node : 192.168.120.17:5678 unRegistry to register center.
[INFO] 2022-08-25 12:19:49.237 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[228] - [WorkflowInstance-0][TaskInstance-0] - MasterServer heartbeat executor shutdown
[INFO] 2022-08-25 12:19:49.240 +0800 org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - [WorkflowInstance-0][TaskInstance-0] - backgroundOperationsLoop exiting
[INFO] 2022-08-25 12:19:49.240 +0800 org.apache.zookeeper.ClientCnxn:[522] - [WorkflowInstance-0][TaskInstance-0] - EventThread shut down for session: 0x1017e2ca6e1005f
[INFO] 2022-08-25 12:19:49.240 +0800 org.apache.zookeeper.ZooKeeper:[693] - [WorkflowInstance-0][TaskInstance-0] - Session: 0x1017e2ca6e1005f closed
[INFO] 2022-08-25 12:19:49.241 +0800 org.apache.dolphinscheduler.server.master.processor.queue.TaskEventService:[126] - [WorkflowInstance-0][TaskInstance-0] - StateEventResponseWorker stopped
[INFO] 2022-08-25 12:19:49.245 +0800 com.zaxxer.hikari.HikariDataSource:[350] - [WorkflowInstance-0][TaskInstance-0] - DolphinScheduler - Shutdown initiated...
[INFO] 2022-08-25 12:19:49.250 +0800 com.zaxxer.hikari.HikariDataSource:[352] - [WorkflowInstance-0][TaskInstance-0] - DolphinScheduler - Shutdown completed.
What you expected to happen
run normal.
How to reproduce
start master-server register with zookeeper, wait some time.
Anything else
No response
Version
3.0.0
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct
Reactions are currently unavailable