Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STORM-3355: Use supervisor.worker.shutdown.sleep.secs to set worker s… #2974

Merged
merged 1 commit into from Mar 15, 2019

Conversation

srdo
Copy link
Contributor

@srdo srdo commented Mar 12, 2019

…uicide delay to allow users to configure how long they're willing to wait for orderly shutdown

https://issues.apache.org/jira/browse/STORM-3355

This will not be useful on Windows, as there is no easy way that I can see for shutting down JVMs cleanly. I dug around for a while and it looks like the JVM devs also haven't found a reasonable way to do orderly shutdowns on Windows https://bugs.openjdk.java.net/browse/JDK-8056139. We can probably be excused for not doing better.

On Linux it works as expected. Tested by running ExclamationTopology with a cleanup that sleeps for 15 seconds, and setting the timeout parameter to 30 seconds. The worker was allowed to shut down cleanly.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my memory is correct, it was to ensure worker to try to shutdown earlier, and let supervisor kills worker when worker fails to shutdown. If we wait for same secs in both worker and supervisor, it would be non-deterministic.

I'm not sure why we don't rely on supervisor's kill and try to kill in both side. I might see the reason but unfortunately I can't remember. Do you know why?

@@ -127,7 +127,9 @@ public static void main(String[] args) throws Exception {
Worker worker = new Worker(conf, null, stormId, assignmentId, Integer.parseInt(supervisorPort),
Integer.parseInt(portStr), workerId);
worker.start();
Utils.addShutdownHookWithForceKillIn1Sec(worker::shutdown);
LOG.info("Adding shutdown hook with kill in {} secs", ObjectReader.getInt(conf.get(Config.SUPERVISOR_WORKER_SHUTDOWN_SLEEP_SECS)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObjectReader.getInt(conf.get(Config.SUPERVISOR_WORKER_SHUTDOWN_SLEEP_SECS)) will be executed twice once end users configure to print INFO or lower. Would be better to have separate variable here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will fix

@srdo
Copy link
Contributor Author

srdo commented Mar 15, 2019

I'm not sure it's an issue that it's non-deterministic. The force kill only happens if the worker fails to shut down within the time limit, which should hopefully be rare. If the time limit is hit, does it matter whether the supervisor kill -9's the worker JVM, or whether the worker JVM suicides?

I assumed the reason we want the worker to suicide is in case the supervisor issues a soft kill, and then crashes. I'm not sure how the supervisor handles it if it is restarted and the worker is a "zombie" hanging in the middle of shutdown?

If that's not the reason, I'm not sure. Relying on the supervisor's kill might be fine?

…uicide delay to allow users to configure how long they're willing to wait for orderly shutdown
@HeartSaVioR
Copy link
Contributor

2c2570f

I found origin commit to introduce 1 sec suicide but Github fails to link the PR and I can't get the reason
of doing it from only commit message. Maybe it's simply (and blindly) added for all of JVM processes? Daemon processes definitely need it.

I wasn't sure about worker, but if we concern about supervisor being crashed just after issuing soft kill, that would make sense.

Btw, I think you're right. Once it reaches force kill it wouldn't matter who triggers it.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 The change is OK to me, and doesn't seem to bring any regression here.

@srdo
Copy link
Contributor Author

srdo commented Mar 15, 2019

The reason for replacing Runtime.halt with Runtime.exit everywhere is most likely that Runtime.exit invokes shutdown hooks before quitting, where Runtime.halt just kills the JVM immediately. It looks to me like that commit is an attempt to do best-effort cleanup (e.g. running Worker.shutdown) on shutdown, where the previous behavior was to just suicide the JVM immediately.

Since the behavior before that commit was for the worker to just die immediately on shutdown, I think the shutdown hook was probably added in the worker because that was the same way it was done for the daemons.

I'm still not sure the shutdown hook is necessary for the worker, but I don't think it hurts either.

Edit: Thanks for looking into why the hook is there btw.

@asfgit asfgit merged commit 60199bd into apache:master Mar 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants