Typed child actors are not stopped when parent actor is restarted by supervision strategy #25556

svroonland · 2018-08-31T09:28:29Z

I'm spawning a child in a typed actor and using a restart supervision strategy on the parent. When the parent fails and is restarted, an exception "not unique actor name [child actor name here]" is thrown.

Apparently the child is not stopped as part of the restart procedure. This is unlike with untyped actors where, when an actor is restarted, its children are stopped and recreated. For typed actors I would also expect behavior as documented here for untyped actors: https://doc.akka.io/docs/akka/2.5/general/supervision.html#what-restarting-means

(I can provide some test cases for untyped and typed on request)

patriknw · 2018-08-31T11:05:27Z

Thanks for reporting

patriknw · 2018-09-05T12:19:08Z

@svroonland I think this is actually expected behavior. I found this comment in a test:

akka/akka-actor-typed-tests/src/test/scala/akka/actor/typed/SupervisionSpec.scala

Lines 395 to 396 in f12f686

    
           // TODO document this difference compared to classic actors, and that 
        
           //      children can be stopped if needed in PreRestart

Unless there are objections I will continue this by adding that documentation, including an example.

svroonland · 2018-09-06T08:41:07Z

Not sure I find it intuitive behavior to not restart the children, but I trust you have broader considerations of course.

patriknw · 2018-09-06T08:55:58Z

One drawback with always stopping the children is that any direct communication with the children from the outside is stopped. The actor references are obsolete and anyone using them must watch and re-establish new ones, or all such interaction must go via the parent.

The ActorRef of the parent is still valid after a restart, but not the ActorRefs of the children. Might be a surprise for many?

michkhol · 2018-09-12T16:41:22Z

I see another problem: if the child actor is stopped in the PreStart message handler, the parent actor apparently does not wait for the stop to finish. If upon restart the parent tries to recreate the child actor with the same name fast enough, you still get the exception.

patriknw · 2018-09-12T18:30:14Z

Have you verified that or is it a theory?
Stopping from parent should free the name immediately and new child with same name should be possible to create.

michkhol · 2018-09-12T18:33:45Z

I have this problem in my code currently and I believe I can come up with a code sample shortly.

michkhol · 2018-09-12T19:14:31Z

The supervisor is supposed to do 10 retries before stopping, but I'm getting an exception at the first try.

import akka.actor.ActorSystem
import akka.actor.typed.{ActorRef, Behavior, PostStop, PreRestart, SupervisorStrategy, Terminated}
import akka.actor.typed.scaladsl.Behaviors

import scala.concurrent.duration._
import akka.actor.typed.scaladsl.adapter._

object TypedSupervisedBug {
  import BadActor._

  implicit val actorSystem = ActorSystem("actorsystem")
  implicit val ec = actorSystem.dispatcher

  val start: Behavior[Msg] = Behaviors.setup { ctx =>
    val supervised: Behavior[Msg] = Behaviors.supervise(BadActor.start)
      .onFailure[IndexOutOfBoundsException](SupervisorStrategy.restartWithLimit(maxNrOfRetries = 10, withinTimeRange = 10.minutes))
    val badActor = ctx.spawn(supervised, "badActor")
    ctx.watch(badActor)
    active
  }

  val active: Behavior[Msg] = Behaviors.receive[Msg] { (ctx, msg) =>
    Behaviors.same
  }.receiveSignal {
    case (_, Terminated(w)) =>
      Behaviors.stopped
    case (_, PostStop) =>
      System.err.println("System has been terminated.")
      actorSystem.terminate().foreach(_ => System.exit(1))
      Behaviors.same
  }

  def main(args: Array[String]): Unit = {
     actorSystem.spawn(TypedSupervisedBug.start, "supevisor")
  }
}

object BadActor {
  trait Msg
  case object TestMsg extends Msg

  val start: Behavior[Msg] = Behaviors.setup { ctx =>
    val child = ctx.spawn(PoorChild.active, "poorchild")
//    ctx.watch(child)
    ctx.self ! TestMsg
    active(child)
  }

  def active(childRef: ActorRef[Msg]): Behavior[Msg] = Behaviors.receive[Msg] { (ctx, msg) =>
    throw new IndexOutOfBoundsException("Test exception")
  }.receiveSignal {
    case (ctx, PreRestart) =>
      System.err.println("BadActor is being restarted.")
      ctx.stop(childRef)
      Behaviors.same
  }
}

object PoorChild {
  import BadActor._

  val active: Behavior[Msg] = Behaviors.receive[Msg] { (ctx, msg) =>
    Behaviors.same
  }
}

patriknw · 2018-09-13T09:52:54Z

Ah, I mixed up things. It's when Terminated is received in the parent that the name is free to be used again. Thanks.

Hmm, stopping children and then starting new with same name for a restart might still be something we want to support. I have to think about this.

michkhol · 2018-09-13T16:17:34Z

Thanks! I would really like to see it implemented as in untyped actors, perhaps with an option not to stop children on parent restart. Otherwise while migrating code to typed actors, one would have an upleasant surprise.

patriknw · 2018-09-14T07:29:51Z

I have analyzed how it's implemented in untyped. It's documented here: https://doc.akka.io/docs/akka/current/general/supervision.html#what-restarting-means

Here is a summary of the interaction. Given an actor hierarchy of grandparent, parent, child where an exception is thrown in parent and parent is restarted.

parent: FaultHandling.handleInvokeFailure
- suspend
- parent ! Failed(self)
grandparent: FaultHandling.handleFailure
- restartChild
parent: receive Recreate sys msg => FaultHandling.faultRecreate
- preRestart
- unwatch and stop children
child: receive Terminate
parent: DeathWatch.watchedActorTerminated
- FaultHandling.handleChildTerminated
- FaultHandling.finishRecreate
- create freshActor instance
- restart surviving children

In Typed the grandparent isn't involved since the supervision strategy is taking care of exception in the failing actor itself. We must probably hook into some of these steps from the supervision strategy (internal.Restarter), such as suspend and faultRecreate. Then the other steps will follow automatically.

We also have to override preRestart to not stop children by default , since that is supposed to be done explicitly from PreRestart signal if desired.

One question is if we want to keep the restart of surviving children or if we should keep them untouched?

I'd like @rkuhn 's advice on this issue so that we don't have an accidental misunderstanding of desired semantics.

michkhol · 2018-09-14T18:42:43Z

We also have to override preRestart to not stop children by default , since that is supposed to be done explicitly from PreRestart signal if desired.

Why change the already expected behavior (from untyped)? If the user does nothing, the children should be stopped by default. I presume, in most cases the parent fully controls the life-cycle of its children, the explicit stop would be semantically redundant. Optionally the user could signal that the children should not be stopped, but in my opinion it is really optional.

patriknw · 2018-09-18T14:34:46Z

I tried a few things. It's not that easy to hook into untyped stuff since we implement supervision and restart quite differently.

Then I started to think about implementing the "wait for children to stop" with ordinary typed things in the Restarter. Would have to use StashBuffer instead of suspend / resume.

An annoying detail is that I can't know which children were stopped in the PreRestart and which were kept so can't know when all expected Terminated signals have been received. One way around that could be to add extra bookkeeping for this specific thing in the typed.ActorContextImpl

@chbatey wdyt?

rkuhn · 2018-09-18T14:49:48Z

Instead of implementing the same semantics, we could also think about whether something else would be more appropriate, also given the theme of simplifying the actor lifecycle by switching to Behavior decorators and making the local decision without inspecting the exception itself.

One thing to keep in mind is that this is a library facility that could as well live in user code, no special magic shall be included—otherwise we break the premise of Behaviors being pure functions of the incoming message and the context (declaring method invocations on the context as permitted effects).

So we could add a second Restarter (or differentiate by a construction parameter) that terminates all child actors, switches to stashing behavior, awaits all Terminated messages, then injects the stash into the initial behavior. The simplification here being that either all or none of the child actors are kept. Users who want something else can copy the code and modify it according to their needs.

patriknw · 2018-09-18T15:25:49Z

I agree that it's probably better to do this in Typed land, and starting with an option to stop all children is probably covering what most will ever need.

patriknw · 2018-09-19T05:36:11Z

Closed the docs PR. When implementing this for real some of the samples from #25570 might be used and updated.

chbatey · 2018-09-19T06:42:19Z

Sorry this notification got lost in the PR rebuild storm.
+1 to all or nothing to keep it simple.
I've nearly rewritten the Restarters as BehaviorInterceptors so we may be in for some fun conflicts.

ktoso · 2018-09-19T06:55:29Z

For the record: also +1 for keeping this in "typed land" :)

patriknw · 2018-09-19T07:55:37Z

rewritten the Restarters as BehaviorInterceptors

I know. I was only trying things out. We'll do the real thing after your refactoring

* stash messages and signals while waiting for children to be stopped * handle watch of other actors * exception from unstash * exception from first setup * merge RestartSupervisor and BackoffSupervisor * remove unused PreStart * docs

* stash messages and signals while waiting for children to be stopped * handle watch of other actors * exception from unstash * exception from first setup * merge RestartSupervisor and BackoffSupervisor * API change: restartWithLimit => restart.withLimit * remove unused PreStart * docs

* stash messages and signals while waiting for children to be stopped * handle watch of other actors * exception from unstash * exception from first setup * merge RestartSupervisor and BackoffSupervisor * API change: restartWithLimit => restart.withLimit * remove unused PreStart * docs * move BubblingSample to separate class * fix: fail after more than limit in restart.withLimit when deferred factory throws * match case RestartOrBackoff instead

olijaun · 2019-10-24T09:07:39Z

I'm a newbie to AKKA and I just investigated a problem that is related to this issue. I finally solved it by using SupervisorStrategy.restart().withStopChildren(true)

I think you really should update the docs: The "What Restarting Means" section (https://doc.akka.io/docs/akka/current/general/supervision.html) is in the Chapter "General Concepts". People starting with Akka will probably read this (regardless of whether they use "untyped" or "typed"). The documentation says: "call the old instance’s preRestart hook (defaults to sending termination requests to all children and calling postStop)"

Regards

patriknw · 2019-10-24T09:33:30Z

Do you see the same problem in the latest documentation: https://doc.akka.io/docs/akka/2.6/general/supervision.html

Since it seems that you are developing something new I'd recommend that you use version 2.6.0-RC1.

olijaun · 2019-10-24T12:54:02Z

I won't switch to RC1 yet because I'm not sure if we will go into production before akka 2.6 :-) But I will at least check the Doc for 2.6 from now on. The respective chapter is not misleading anymore. Thanks!

patriknw self-assigned this Sep 5, 2018

patriknw added a commit that referenced this issue Sep 6, 2018

Document that children are not stopped on parent restart, #25556

1beaf04

patriknw added a commit that referenced this issue Sep 6, 2018

Document that children are not stopped on parent restart, #25556

56334e7

patriknw mentioned this issue Sep 6, 2018

WIP: Document that children are not stopped on parent restart, #25556 #25570

Closed

michkhol mentioned this issue Oct 16, 2018

Akka typed: No Terminated signal from child if parent is restarted #25802

Closed

patriknw removed their assignment Nov 26, 2018

patriknw added the 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted label Nov 26, 2018

patriknw removed the 3 - in progress Someone is working on this ticket label Nov 26, 2018

helena self-assigned this Nov 26, 2018

helena added 3 - in progress Someone is working on this ticket and removed 3 - in progress Someone is working on this ticket labels Nov 26, 2018

helena removed their assignment Nov 28, 2018

patriknw self-assigned this Dec 12, 2018

patriknw added 3 - in progress Someone is working on this ticket and removed 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted labels Dec 12, 2018

patriknw mentioned this issue Dec 19, 2018

Support stop or keep of child actors when parent is restarted, #25556 #26137

Merged

patriknw removed the 3 - in progress Someone is working on this ticket label Jan 17, 2019

patriknw added this to the 2.5.20 milestone Jan 17, 2019

patriknw closed this as completed Jan 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Typed child actors are not stopped when parent actor is restarted by supervision strategy #25556

Typed child actors are not stopped when parent actor is restarted by supervision strategy #25556

svroonland commented Aug 31, 2018

patriknw commented Aug 31, 2018

patriknw commented Sep 5, 2018

svroonland commented Sep 6, 2018

patriknw commented Sep 6, 2018

michkhol commented Sep 12, 2018 •

edited

patriknw commented Sep 12, 2018

michkhol commented Sep 12, 2018

michkhol commented Sep 12, 2018 •

edited

patriknw commented Sep 13, 2018

michkhol commented Sep 13, 2018

patriknw commented Sep 14, 2018

michkhol commented Sep 14, 2018 •

edited

patriknw commented Sep 18, 2018

rkuhn commented Sep 18, 2018

patriknw commented Sep 18, 2018

patriknw commented Sep 19, 2018

chbatey commented Sep 19, 2018

ktoso commented Sep 19, 2018

patriknw commented Sep 19, 2018

olijaun commented Oct 24, 2019

patriknw commented Oct 24, 2019

olijaun commented Oct 24, 2019 •

edited

Typed child actors are not stopped when parent actor is restarted by supervision strategy #25556

Typed child actors are not stopped when parent actor is restarted by supervision strategy #25556

Comments

svroonland commented Aug 31, 2018

patriknw commented Aug 31, 2018

patriknw commented Sep 5, 2018

svroonland commented Sep 6, 2018

patriknw commented Sep 6, 2018

michkhol commented Sep 12, 2018 • edited

patriknw commented Sep 12, 2018

michkhol commented Sep 12, 2018

michkhol commented Sep 12, 2018 • edited

patriknw commented Sep 13, 2018

michkhol commented Sep 13, 2018

patriknw commented Sep 14, 2018

michkhol commented Sep 14, 2018 • edited

patriknw commented Sep 18, 2018

rkuhn commented Sep 18, 2018

patriknw commented Sep 18, 2018

patriknw commented Sep 19, 2018

chbatey commented Sep 19, 2018

ktoso commented Sep 19, 2018

patriknw commented Sep 19, 2018

olijaun commented Oct 24, 2019

patriknw commented Oct 24, 2019

olijaun commented Oct 24, 2019 • edited

michkhol commented Sep 12, 2018 •

edited

michkhol commented Sep 12, 2018 •

edited

michkhol commented Sep 14, 2018 •

edited

olijaun commented Oct 24, 2019 •

edited