
Invoker graceful shutdown and drain mode #3704

Closed
wants to merge 12 commits

Conversation

dubee
Member

@dubee dubee commented May 24, 2018

Allows an invoker to manually be marked as down so that maintenance can be performed on the invoker. When in maintenance mode, the invoker will finish executing all actions currently running and finish processing its queued actions. Since an invoker in maintenance mode will appear as down to the load balancer, new actions will not be queued up for the invoker. The invoker will exit maintenance mode when it is restarted.

Related to #3681

Description

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@dubee dubee changed the title Terminate invoker health scheduler via REST endpoint Provide invoker maintenance mode May 24, 2018
@dubee dubee requested a review from markusthoemmes May 24, 2018 18:30
@dubee dubee added the review Review for this PR has been requested and yet needs to be done. label May 24, 2018
@codecov-io

codecov-io commented May 24, 2018

Codecov Report

Merging #3704 into master will increase coverage by 0.44%.
The diff coverage is 13.46%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3704      +/-   ##
==========================================
+ Coverage   74.34%   74.79%   +0.44%     
==========================================
  Files         128      128              
  Lines        6081     6129      +48     
  Branches      398      387      -11     
==========================================
+ Hits         4521     4584      +63     
+ Misses       1560     1545      -15
Impacted Files Coverage Δ
.../scala/src/main/scala/whisk/common/Scheduler.scala 100% <ø> (ø) ⬆️
...er/src/main/scala/whisk/core/invoker/Invoker.scala 0% <ø> (ø) ⬆️
...ain/scala/whisk/core/invoker/InvokerReactive.scala 0% <0%> (ø) ⬆️
...scala/whisk/core/containerpool/ContainerPool.scala 84.88% <0%> (-1%) ⬇️
...n/scala/whisk/core/connector/MessageConsumer.scala 75.28% <43.75%> (-8.29%) ⬇️
...cala/src/main/scala/whisk/http/ErrorResponse.scala 89.77% <0%> (+1.13%) ⬆️
...on/scala/src/main/scala/whisk/common/Logging.scala 87.35% <0%> (+1.14%) ⬆️
...ain/scala/whisk/core/entity/ActivationResult.scala 92.3% <0%> (+1.53%) ⬆️
...in/scala/whisk/utils/ExecutionContextFactory.scala 100% <0%> (+7.69%) ⬆️
...rc/main/scala/whisk/core/entity/ExecManifest.scala 96.34% <0%> (+14.63%) ⬆️
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9e5a5e8...3c5855f.

@chetanmeh
Member

I believe this is the first management API over REST that allows changing the state of the system. Does this API have some form of security enabled?

@rabbah
Member

rabbah commented May 25, 2018

+1 @chetanmeh

The administrative interfaces should have authentication in place. While we're at it, perhaps we should apply the same to the /invokers route in the controller.

@markusthoemmes
Contributor

What do we gain over docker stopping the container in this case?

Have you thought about using kill-signal based functionality here to integrate better with tools like kube?

@csantanapr
Member

csantanapr commented May 27, 2018

I vote for not using a URL; I consider this an anti-pattern.

I'd prefer os.Signal for any use cases that the Docker CLI doesn't already cover.

@chetanmeh
Member

Both Kubernetes and Mesos support graceful shutdown via SIGTERM followed by some kind of graceful shutdown timeout (~30 secs, and configurable). The only caveat here is that the shell shipped by default with Alpine Linux does not pass signals to children, so we would need to launch with bash (we currently do install bash in the common Docker base image).

So if we can ensure that the invoker can finish shutting down within this time window, then we should rely on this support. If we do need to expose a URL endpoint, we can secure it by leveraging the same credentials we use for JMX authentication (either directly or probably via some form of JAAS auth).

Member Author

@dubee dubee left a comment

@csantanapr, @chetanmeh, I pushed a second commit that uses a signal handler to put the invoker in maintenance mode. To put an invoker in this mode, execute: docker kill --signal=USR2 invoker0.

I think it is important to note that maintenance mode and a graceful shutdown are two different features. A graceful shutdown would have to check that the invoker's queue is empty and that there are no user containers running, which I think is useful, whereas maintenance mode just marks the invoker as down in the load balancer.

case "USR2" =>
  logger.info(this, s"Stopping health scheduler")
  actorSystem.stop(healthScheduler)
  while (true) {}
Member Author

@dubee dubee May 29, 2018

For a graceful shutdown here, we would need to make sure user containers are not running and possibly check that the queue is empty.

Member

By queue do you mean the internal message feed or the kafka topic? The topic doesn't need to be empty as that can take an indeterminate amount of time --- bringing up another invoker to take over the topic would address the messages held in the topic.

Member Author

For the Kafka topic, I mean. Need to find a way to make the invoker stop consuming messages from the topic. Will take a look.

Member Author

For the above, might just be able to shutdown the activationFeed actor.

Member

I thought the feed had a shutdown hook. It is worth adding one.

Member Author

Looks like there is a shutdown hook for containerFactory. Perhaps maintenance mode is not necessary if we have a graceful shutdown.

@rabbah
Member

rabbah commented May 29, 2018

@chetanmeh 30s may not be enough time to drain all the requests live in memory, since each request could run for the max allowed duration. This assumes draining means executing the outstanding requests. It could also mean posting these requests back to the topic and letting some other invoker deal with them. For that, 30s is likely more suitable.

@markusthoemmes
Contributor

Correct, maintenance mode and graceful shutdown are different things. What's the benefit of this extra kill signal vs. just killing the whole invoker? Running actions will die either way.

@dubee
Member Author

dubee commented May 30, 2018

@markusthoemmes, as long as the thread where we hook the signal is busy, the invoker will not shut down. Currently there is just an infinite loop there, but a graceful shutdown would keep the thread busy until a shutdown can happen.
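
To make the blocking-handler idea concrete, here is a minimal sketch, assuming sun.misc.Signal is available and using a hypothetical gracefulShutdown() in place of the real drain logic:

import java.util.concurrent.CountDownLatch
import sun.misc.{Signal, SignalHandler}

object ShutdownSketch {
  // Completed once the (hypothetical) drain work has finished.
  private val drained = new CountDownLatch(1)

  def install(gracefulShutdown: () => Unit): Unit = {
    // Replaces the JVM's default TERM behavior; the handler thread stays busy
    // until the drain is done, so the process does not exit prematurely.
    Signal.handle(new Signal("TERM"), new SignalHandler {
      override def handle(sig: Signal): Unit = {
        gracefulShutdown()
        drained.countDown()
      }
    })
  }

  // The main thread can park here instead of spinning in an infinite loop.
  def awaitShutdown(): Unit = drained.await()
}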

@dubee dubee changed the title Provide invoker maintenance mode Provide graceful shutdown for invoker Jun 1, 2018
@dubee dubee added wip and removed review Review for this PR has been requested and yet needs to be done. labels Jun 1, 2018
}
})

Signal.handle(
Member Author

The code is still a little clunky, but I think the behavior is right, i.e., stopping the health scheduler, stopping the Kafka message consumer, and waiting for the user containers to finish.

@dubee dubee force-pushed the invoker-maintenance branch 3 times, most recently from 6f3cb5c to 558647a on June 4, 2018 20:48
@dubee
Member Author

dubee commented Jun 4, 2018

@dgrove-oss, do you think these changes will work with Kube's graceful shutdown feature now?

@dubee
Member Author

dubee commented Jun 4, 2018

@dgrove-oss, I am using docker kill --signal=TERM invoker0 to kick off the invoker graceful shutdown. This should be equivalent to kill 1 from within the invoker container.

@dgrove-oss
Member

This looks pretty good to me. I think we'd just need to tweak invoker.yaml to set the container's terminationGracePeriodSeconds to a bigger value than its default of 30 seconds to make sure we give all the user containers the time they need to finish cleanly. Maybe set the grace period for Kubernetes to the max time for a user function invocation + a small fudge factor?

@rabbah
Member

rabbah commented Jun 4, 2018

Max duration isn't enough, since there could still be requests queued in memory (feeds are double-buffered).
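
As a rough sizing sketch (assumptions: the default max action duration of 5 minutes and a double-buffered feed; the numbers are illustrative, not from this PR):

import scala.concurrent.duration._

// Back-of-the-envelope for a Kubernetes terminationGracePeriodSeconds value:
// currently running actions plus up to two more buffered batches in the feed.
val maxActionDuration = 5.minutes      // assumed default TimeLimit.MAX_DURATION in a stock deployment
val bufferFactor      = 3              // running + double-buffered feed
val fudge             = 30.seconds
val terminationGracePeriod = maxActionDuration * bufferFactor + fudge   // 15.5 minutes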

@markusthoemmes
Contributor

I'm wondering if a race condition is possible: can the pool be intermittently non-busy while getting new messages from the feed? That is, should we attach this condition to the feed, as in:

  1. Shut down the feed.
  2. Wait until the feed has no more messages and has received a "Processed" message for every message it already handed out.
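
To illustrate that two-step idea, a rough sketch of what such a drain condition on the feed could look like (Drain, Drained, HandedOut, and FeedLike are hypothetical names, not the actual MessageFeed protocol):

import akka.actor.{Actor, ActorRef}

case object Drain      // hypothetical: stop filling and report once everything handed out is done
case object Drained    // hypothetical reply sent when no messages remain outstanding
case object HandedOut  // hypothetical: a message was handed to the container pool
case object Processed  // stand-in for the feed's existing completion acknowledgement

class FeedLike extends Actor {
  private var outstanding = 0                    // handed out but not yet acknowledged
  private var draining: Option[ActorRef] = None  // who asked for the drain

  def receive = {
    case HandedOut =>
      outstanding += 1
    case Processed =>
      outstanding -= 1
      if (outstanding == 0) draining.foreach(_ ! Drained)
    case Drain =>
      // A real implementation would also stop fillPipeline() here so no new messages arrive.
      draining = Some(sender())
      if (outstanding == 0) sender() ! Drained
  }
}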

@rabbah
Member

rabbah commented Jun 5, 2018

There's no other way to know the invoker has quiesced - but a Processed message isn't enough, since the pool releases the resource (sends the message) before all the bookkeeping has finished. For example, outstanding DB operations might still be in flight to record the activation metadata.

while (Await.result(pool ? Busy, timeout.duration).asInstanceOf[Boolean] == true) {
  logging.info(this, s"Container pool is busy")
  Thread.sleep(1000)
}
Contributor

WDYT about applying some Future composition here to simplify the flow, like

/** Polls the pool's status and returns a future which completes once the pool is idle. */
def waitForContainerPoolIdle(pool: ActorRef): Future[Unit] = {
  (pool ? Busy).mapTo[Boolean].flatMap {
    case true  => akka.pattern.after(1.second, system.scheduler)(waitForContainerPoolIdle(pool))
    case false => Future.successful(())
  }.recoverWith { case _ => akka.pattern.after(1.second, system.scheduler)(waitForContainerPoolIdle(pool)) }
}

val shutdowns = Seq(
  gracefulStop(healthScheduler, 5.seconds).recover { case _ => logging.info(this, "Health communication failed to shutdown gracefully") },
  gracefulStop(activationFeed, 5.seconds).recover { case _ => logging.info(this, "Activation feed failed to shutdown gracefully") },
  waitForContainerPoolIdle(pool))

// Allow the shutdown to take a maximum of 3 times the maximum action runtime since the
// feed can be buffered and we want to allow for some grace period.
Await.result(Future.sequence(shutdowns), TimeLimit.MAX_DURATION * 3)

Does that make sense?

Member Author

I like it!

Member

Can you emit a log message while waiting for quiescence?

sendOutstandingMessages()
stay

case Event(FillCompleted(messages), _) =>
Member Author

@dubee dubee Jun 6, 2018

Still concerned about a race condition here. There might be a chance that the Busy event returns false while fillPipeline() is running and has yet to trigger the FillCompleted event.

Member

right - if filling, you need to wait.
you'll also need to prevent a further fill.

Member Author

@dubee dubee Jun 7, 2018

Hmm, do we know whether events are handled in a synchronous or an asynchronous fashion in the Akka FSM?

Contributor

Always asynchronous for Akka.
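
For context, a minimal standalone sketch (not OpenWhisk code) of the Akka model: messages queue up in an actor's mailbox and are processed one at a time, asynchronously with respect to the sender, so an FSM never handles two events concurrently:

import akka.actor.{Actor, ActorSystem, Props}

// Because the mailbox is drained one message at a time, the mutable counter
// needs no synchronization even though senders run on other threads.
class Counter extends Actor {
  private var count = 0
  def receive = {
    case "inc" => count += 1
    case "get" => sender() ! count
  }
}

object MailboxDemo extends App {
  val system = ActorSystem("demo")
  val counter = system.actorOf(Props[Counter], "counter")
  counter ! "inc" // tell returns immediately; the actor processes the message later
  counter ! "inc"
  system.terminate()
}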

@dubee dubee changed the title Provide graceful shutdown for invoker Invoker graceful shutdown and drain mode Jun 8, 2018
@dubee dubee added review Review for this PR has been requested and yet needs to be done. and removed wip labels Jun 8, 2018
try {
  // Allow the shutdown to take a maximum of 3 times the maximum action runtime since the feed can be
  // buffered and we want to allow for some grace period.
  Await.result(shutdowns, TimeLimit.MAX_DURATION * 3)
Member

Maybe we should return a different exit code in case of an unclean shutdown.

} finally {
  containerFactory.cleanup()
  logging.info(this, "Shutting down invoker")
  System.exit(0)
Member

Given it's a System.exit, we should catch and log exceptions from the Await operation.

// Capture SIGTERM signals to gracefully shut down the invoker. When gracefully shutting down, the health scheduler is
// shut down, preventing additional actions from being scheduled to the invoker, then the invoker processes its buffered
// messages from the activation feed, and waits for its user containers to finish running before the process exits.
Signal.handle(
Member

Do we need to rely on Signal here, or would a normal shutdown hook work, since it is also invoked in response to SIGTERM?

Member Author

From experimenting, the order of shutdown hooks is not guaranteed. That means the actor system may shut down before the graceful shutdown hook is executed. However, the actor system needs to be running in order to perform the graceful shutdown.
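
For reference, a minimal sketch of the shutdown-hook alternative discussed above, with a hypothetical drain() standing in for the real work; as noted, the JVM gives no ordering guarantee between registered hooks, so the actor system's own hook may run first:

object ShutdownHookSketch {
  def install(drain: () => Unit): Unit = {
    // Runs on SIGTERM and on normal JVM exit, but in no guaranteed order relative
    // to other hooks, e.g. the one that terminates the ActorSystem.
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      override def run(): Unit = drain()
    }))
  }
}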

Member

@chetanmeh chetanmeh Jun 11, 2018

Okie. Reading this comment and the Signal source docs, it appeared that the JVM would not allow registering a handler for signals for which it registers its own handler, like SIGTERM.

 * Java code cannot register a handler for signals that are already used
 * by the Java VM implementation. The <code>Signal.handle</code>
 * function raises an <code>IllegalArgumentException</code> if such an attempt
 * is made.

Per that comment, the Oracle JDK would not throw an exception and would silently ignore the handler, so I was not sure whether the current approach would always work.
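
A hedged sketch of registering the handler defensively, since behavior may differ between JVMs (the helper name is illustrative):

import scala.util.{Failure, Success, Try}
import sun.misc.{Signal, SignalHandler}

// On JVMs that reserve the signal, Signal.handle raises IllegalArgumentException;
// log it instead of letting handler installation crash the invoker.
def tryHandle(name: String)(f: Signal => Unit): Unit =
  Try(Signal.handle(new Signal(name), new SignalHandler {
    override def handle(sig: Signal): Unit = f(sig)
  })) match {
    case Success(_) => println(s"installed handler for SIG$name")
    case Failure(e) => println(s"could not install handler for SIG$name: ${e.getMessage}")
  }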

@markusthoemmes
Contributor

@dubee what's the status of this? Seems like you've run into some dead-ends/race conditions?

@markusthoemmes markusthoemmes added the awaits-contributor The contributor needs to respond to comments from reviewer. label Sep 18, 2018
@markusthoemmes markusthoemmes added wip and removed wip labels Sep 18, 2018
@dubee dubee added awaits-reviewer The reviewer needs to respond to comments from contributor. and removed awaits-contributor The contributor needs to respond to comments from reviewer. labels Sep 24, 2018
@dubee
Member Author

dubee commented Jan 14, 2019

Closing this as there seem to be race conditions.

@dubee dubee closed this Jan 14, 2019
Labels
awaits-reviewer The reviewer needs to respond to comments from contributor.
review Review for this PR has been requested and yet needs to be done.

7 participants