This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Speed up shutdown time #838

Merged
merged 5 commits into from
Feb 12, 2019

Contributor

@ajsutton ajsutton commented Feb 12, 2019

PR description

EthScheduler executes a lot of tasks that wait for responses from the network, and it may have a significant number of tasks queued. Using shutdown would wait for all network responses and all queued tasks to complete before exiting, which almost always reaches the 2-minute timeout allowed before switching to shutdownNow. All tasks have to cope with being unexpectedly terminated (as would happen with a kill -9), so there's no reason to have this extra delay.
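
For illustration, the difference can be sketched with a plain `java.util.concurrent` executor. This is a minimal sketch and not the actual EthScheduler code: the class name `FastShutdownSketch`, the helper `stopNow`, the 60-second sleep standing in for a network wait, and the 5-second termination timeout are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the EthScheduler implementation.
public class FastShutdownSketch {

  /** Interrupts running tasks, discards queued ones, then waits briefly for termination. */
  public static List<Runnable> stopNow(final ExecutorService executor, final long timeoutSeconds) {
    // shutdownNow() interrupts worker threads and returns the tasks that never started.
    final List<Runnable> neverStarted = executor.shutdownNow();
    try {
      if (!executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
        System.err.println("Executor still running after " + timeoutSeconds + "s");
      }
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    return neverStarted;
  }

  /** Runs one blocked task plus one queued task and returns how many never started. */
  public static int runDemo() {
    final ExecutorService executor = Executors.newSingleThreadExecutor();
    final CountDownLatch started = new CountDownLatch(1);
    // Stand-in for a task waiting on a network response.
    executor.execute(() -> {
      started.countDown();
      try {
        Thread.sleep(60_000);
      } catch (final InterruptedException e) {
        // Interrupted by shutdownNow(): give up and let the worker thread exit.
      }
    });
    // Queued behind the blocked task on the single worker thread.
    executor.execute(() -> System.out.println("queued task ran"));
    try {
      started.await(); // make sure the first task is actually running
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return stopNow(executor, 5).size();
  }

  public static void main(final String[] args) {
    System.out.println("Tasks never started: " + runDemo());
  }
}
```

With `shutdown()` in place of `shutdownNow()`, the same demo would wait for the sleeping task and then run the queued one before terminating, which is the delay described above.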

PantheonCommand now also shuts down Log4J correctly to ensure log messages aren't lost. We disable Log4J's shutdown handler via a system property in our start script (which allows logging to work in our shutdown hook), so we need to do this manually.
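
The ordering this implies (stop services first, shut logging down last) can be sketched as follows. This is a hedged, dependency-free illustration rather than the PantheonCommand code: `OrderedShutdownSketch` and its methods are hypothetical names, and the logging step is a plain `Runnable`. With Log4j 2 that step would presumably be a call to `LogManager.shutdown()`, and the start-script property is presumably `log4j.shutdownHookEnabled=false`; the PR text itself names neither.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the PantheonCommand implementation.
public class OrderedShutdownSketch {
  private final List<Runnable> stopActions = new ArrayList<>();
  private final Runnable shutdownLogging;

  public OrderedShutdownSketch(final Runnable shutdownLogging) {
    this.shutdownLogging = shutdownLogging;
  }

  public void addStopAction(final Runnable action) {
    stopActions.add(action);
  }

  /** Runs every stop action, then shuts logging down last so their messages are kept. */
  public void run() {
    for (final Runnable action : stopActions) {
      try {
        action.run();
      } catch (final RuntimeException e) {
        // Keep going: one failed stop action should not skip the logging shutdown.
        System.err.println("Stop action failed: " + e);
      }
    }
    shutdownLogging.run();
  }

  public static void main(final String[] args) {
    // Assumption: with Log4j 2 the final step would be LogManager.shutdown().
    // A plain Runnable keeps this sketch free of external dependencies.
    final OrderedShutdownSketch sequence =
        new OrderedShutdownSketch(() -> System.out.println("logging shut down"));
    sequence.addStopAction(() -> System.out.println("scheduler stopped"));
    Runtime.getRuntime().addShutdownHook(new Thread(sequence::run, "shutdown-sketch"));
  }
}
```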

… instead of trying shutdown first.

@shemnon
Contributor

shemnon commented Feb 12, 2019

Feels a bit heavy-handed. My thought is that being better at cleanup should be our first approach: #841

@ajsutton
Contributor Author

It is heavy-handed, but Pantheon must be designed to handle being killed without any warning anyway, so making normal shutdown heavy-handed doesn't introduce any new requirements. It also frees us up to use blocking calls in our tasks, which would otherwise delay shutdown.

@shemnon
Contributor

shemnon commented Feb 12, 2019

Per your gist (https://gist.github.com/ajsutton/d8afc6fe7aeac94a2d82cfc98d6b8a6c), it looks like we are attempting to shut down in two separate threads. There are no more service threads in that stack, so I wonder if we were actually shutting down in three threads and only one of the threads got notified.

@ajsutton
Contributor Author

Yeah, I couldn't understand why that one was hanging. The awaitTermination on schedulers should be rock solid regardless of how many threads are waiting for it to shut down.
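
The expectation that every waiting thread is released can be checked with a small standalone experiment. This is only a sketch against a plain `ExecutorService`, not a reproduction of the hang in the gist: the class name `AwaitTerminationSketch`, the waiter count, and the timeouts are assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; does not reproduce the stack from the gist.
public class AwaitTerminationSketch {

  /** Returns how many waiter threads saw awaitTermination() report termination. */
  public static int countSuccessfulWaiters(final int waiterCount) {
    final ExecutorService scheduler = Executors.newSingleThreadExecutor();
    // Stand-in for a service task that only exits when interrupted.
    scheduler.execute(() -> {
      try {
        Thread.sleep(60_000);
      } catch (final InterruptedException e) {
        // Interrupted by shutdownNow(): return so the worker thread can exit.
      }
    });

    final AtomicInteger successes = new AtomicInteger();
    final CountDownLatch waitersDone = new CountDownLatch(waiterCount);
    for (int i = 0; i < waiterCount; i++) {
      final Thread waiter = new Thread(() -> {
        try {
          // Several threads wait on the same executor at once.
          if (scheduler.awaitTermination(5, TimeUnit.SECONDS)) {
            successes.incrementAndGet();
          }
        } catch (final InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          waitersDone.countDown();
        }
      }, "waiter-" + i);
      waiter.start();
    }

    scheduler.shutdownNow();
    try {
      waitersDone.await(10, TimeUnit.SECONDS);
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return successes.get();
  }

  public static void main(final String[] args) {
    System.out.println("Waiters released: " + countSuccessfulWaiters(3));
  }
}
```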

@@ -55,9 +55,7 @@ public void shutdown_syncWorkerShutsDown() throws InterruptedException {
    ethScheduler.stop();

    assertThat(syncWorkerExecutor.isShutdown()).isTrue();
    assertThat(syncWorkerExecutor.isTerminated()).isFalse();
Contributor


Do we want to test for multiple tasks in the queue and assert they don't get executed?

Contributor Author


Good idea. Added.
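
A check along the lines suggested here might look like the sketch below. It is not the test added in the PR: it uses a plain single-thread executor instead of EthScheduler, and `RecordingTask` is a hypothetical stand-in for a mock task that records whether it ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not the test added in this PR.
public class QueuedTasksNotExecutedSketch {

  /** Hypothetical stand-in for a mock task that records whether it was run. */
  static final class RecordingTask implements Runnable {
    private final AtomicBoolean executed = new AtomicBoolean(false);

    @Override
    public void run() {
      executed.set(true);
    }

    boolean hasBeenExecuted() {
      return executed.get();
    }
  }

  /** Queues tasks behind a blocked worker, forces a stop, and counts how many ran. */
  public static long countExecutedAfterForcedStop(final int queuedTaskCount) {
    final ExecutorService executor = Executors.newSingleThreadExecutor();
    final CountDownLatch workerBusy = new CountDownLatch(1);
    final CountDownLatch neverReleased = new CountDownLatch(1);
    // Occupy the only worker thread until it is interrupted.
    executor.execute(() -> {
      workerBusy.countDown();
      try {
        neverReleased.await();
      } catch (final InterruptedException e) {
        // Forced stop: exit without running anything else.
      }
    });

    final List<RecordingTask> queued = new ArrayList<>();
    for (int i = 0; i < queuedTaskCount; i++) {
      final RecordingTask task = new RecordingTask();
      queued.add(task);
      executor.execute(task);
    }

    try {
      workerBusy.await();
      executor.shutdownNow(); // drains the queue instead of running it
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return queued.stream().filter(RecordingTask::hasBeenExecuted).count();
  }

  public static void main(final String[] args) {
    System.out.println("Queued tasks executed: " + countExecutedAfterForcedStop(5));
  }
}
```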

@@ -19,11 +19,9 @@
public class MockEthTask extends AbstractEthTask<Object> {

  private boolean executed = false;
  private CountDownLatch countdown;
Contributor


If we are going to verify non-execution for forced shutdown we will need the unblockable tasks.

Contributor Author


Turns out we needed it for EthSchedulerTest anyway - it just didn't need the countdown method.

@ajsutton ajsutton merged commit d779d9f into PegaSysEng:master Feb 12, 2019
@ajsutton ajsutton deleted the fast-shutdown branch February 12, 2019 22:58
2 participants