Don't wait joinThread when stopping #8359

bleskes · 2014-11-06T10:22:46Z

When a node stops, we cancel any ongoing join process. With #8327, we improved this logic to wait for the join to complete before shutting down the node. However, the joining thread belongs to a thread pool and will not stop until the thread pool is shut down.

Another issue raised by the unneeded wait is that when we shut down, we may ping ourselves, which results in an ugly warn-level log. We now log all remote exceptions during pings at debug level.
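To illustrate why waiting on a thread pool thread is pointless, here is a minimal, self-contained sketch using plain `java.util.concurrent`. It is not the Elasticsearch code; the class and method names (`JoinThreadSketch`, `workerAliveAfterTask`, `workerStopsAfterShutdown`) are hypothetical. It only shows that a pool worker stays alive after its task finishes, so `Thread.join` on it does not return until the pool itself is shut down.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class JoinThreadSketch {

    /** Runs a trivial task on a single-thread pool and returns the worker thread that ran it. */
    private static Thread runTaskAndGetWorker(ExecutorService pool) throws InterruptedException {
        AtomicReference<Thread> worker = new AtomicReference<>();
        CountDownLatch taskDone = new CountDownLatch(1);
        pool.execute(() -> {
            worker.set(Thread.currentThread());
            taskDone.countDown();
        });
        taskDone.await();
        return worker.get();
    }

    /** True if the worker is still alive after its task finished and we waited on it. */
    static boolean workerAliveAfterTask() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Thread worker = runTaskAndGetWorker(pool);
            // The task is done, but the worker is parked waiting for more work,
            // so this bounded join simply times out.
            worker.join(200);
            return worker.isAlive();
        } finally {
            pool.shutdownNow();
        }
    }

    /** True if the worker terminates once the pool is shut down. */
    static boolean workerStopsAfterShutdown() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Thread worker = runTaskAndGetWorker(pool);
        pool.shutdown(); // idle workers are interrupted and exit
        worker.join(5000);
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("alive after task: " + workerAliveAfterTask());
        System.out.println("stopped after shutdown: " + workerStopsAfterShutdown());
    }
}
```

Under these assumptions, any wait on the join thread during stop can only end by timing out, which is why skipping the wait loses nothing.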

When a node stops, we cancel any ongoing join process. With elastic#8327, we improved this logic to wait for the join to complete before shutting down the node. In our tests we typically shut down an entire cluster at once, which makes it very likely that nodes are joining while shutting down. This introduces a race condition where joinThread.interrupt can happen before the thread starts waiting on pings, which makes the shutdown logic slow. This commit improves on that by repeatedly trying to stop the thread in smaller waits. Another side effect of the change is that we are now more likely to ping ourselves while shutting down, which results in an ugly warn-level log. We now log all remote exceptions during pings at debug level.
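The "repeatedly trying to stop the thread in smaller waits" idea can be sketched roughly as follows. This is an illustration only, not the code from the commit: `LenientStopSketch`, `stopWithRetries`, and the attempt and timeout values are hypothetical. The demo worker deliberately swallows its first interrupt to stand in for an interrupt that arrives at the wrong moment.

```java
public class LenientStopSketch {

    /**
     * Interrupts the thread and waits up to stepMillis for it to die,
     * repeating until it stops or maxAttempts is exhausted.
     * Returns true if the thread has stopped.
     */
    static boolean stopWithRetries(Thread t, int maxAttempts, long stepMillis)
            throws InterruptedException {
        for (int i = 0; i < maxAttempts && t.isAlive(); i++) {
            t.interrupt();
            t.join(stepMillis);
        }
        return !t.isAlive();
    }

    /** Demo: a worker that ignores one interrupt and only exits on a later one. */
    static boolean demo() throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                // First interrupt is swallowed (simulates a badly timed interrupt).
            }
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                // A later interrupt ends the thread.
            }
        });
        worker.start();
        return stopWithRetries(worker, 10, 50);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stopped: " + demo());
    }
}
```

A single interrupt followed by one long join would leave this worker sleeping for the full wait, whereas the retry loop stops it after a couple of short waits.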

martijnvg · 2014-11-06T10:32:56Z

LGTM

bleskes · 2014-11-06T10:55:26Z

This pull request took the wrong approach. I reverted it from master and am re-opening it.

martijnvg · 2014-11-06T11:31:30Z

LGTM

When a node stops, we cancel any ongoing join process. With #8327, we improved this logic to wait for the join to complete before shutting down the node. However, the joining thread belongs to a thread pool and will not stop until the thread pool is shut down. Another issue raised by the unneeded wait is that when we shut down, we may ping ourselves, which results in an ugly warn-level log. We now log all remote exceptions during pings at debug level. Closes #8359

s1monw · 2014-12-09T11:00:11Z

@bleskes should we push this to 1.4 too? We just timed out on a node here:

http://build-us-00.elasticsearch.org/job/es_core_14_suse/144/

When a node stops, we cancel any ongoing join process. With #8327, we improved this logic to wait for the join to complete before shutting down the node. In our tests we typically shut down an entire cluster at once, which makes it very likely that nodes are joining while shutting down. This introduces a race condition where joinThread.interrupt can happen before the thread starts waiting on pings, which makes the shutdown logic slow. This commit improves on that by repeatedly trying to stop the thread in smaller waits. Another side effect of the change is that we are now more likely to ping ourselves while shutting down, which results in an ugly warn-level log. We now log all remote exceptions during pings at debug level. Closes #8359

bleskes · 2014-12-11T14:59:23Z

@s1monw done.

s1monw · 2014-12-11T15:05:26Z

thx

When a node stops, we cancel any ongoing join process. With elastic#8327, we improved this logic to wait for the join to complete before shutting down the node. In our tests we typically shut down an entire cluster at once, which makes it very likely that nodes are joining while shutting down. This introduces a race condition where joinThread.interrupt can happen before the thread starts waiting on pings, which makes the shutdown logic slow. This commit improves on that by repeatedly trying to stop the thread in smaller waits. Another side effect of the change is that we are now more likely to ping ourselves while shutting down, which results in an ugly warn-level log. We now log all remote exceptions during pings at debug level. Closes elastic#8359

bleskes added the v1.5.0, :Core/Infra/Core, v2.0.0-beta1, >enhancement, and review labels Nov 6, 2014

bleskes closed this in 83d9dab Nov 6, 2014

don't wait on joinThread at all. it's a thread pool thread.

00955da

bleskes reopened this Nov 6, 2014

bleskes changed the title ~~Discovery: a more lenient wait joinThread when stopping~~ Discovery: don't wait joinThread when stopping Nov 6, 2014

bleskes closed this in 9192219 Nov 7, 2014

bleskes added the v1.4.2 label Dec 11, 2014

clintongormley changed the title ~~Discovery: don't wait joinThread when stopping~~ Discovery: Don't wait joinThread when stopping Dec 16, 2014

clintongormley removed the review label Mar 19, 2015

clintongormley added the :Distributed/Discovery-Plugins label and removed the :Core/Infra/Core label Jun 6, 2015

clintongormley changed the title ~~Discovery: Don't wait joinThread when stopping~~ Don't wait joinThread when stopping Jun 6, 2015
