Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing remote deployment, see #2983 #1094

Merged
merged 3 commits into from
Jan 31, 2013

Conversation

patriknw
Copy link
Member

  • Share same instance of deadLetters between LARP and RARP
  • Generate ChildTerminated from all Terminated in RemoteDeploymentWatcher
    • The problem is that we do remote deployment to a node that isn't alive and with ordinary
      remoting that is not detected at all, as we know. With cluster this was taken care of by
      a later AddressTerminated and ChildTerminated generated by
      RemoteDeploymentWatcher. With the new RemoteDeadLetters the additional watch
      triggers an immediate Terminate which is captured by RemoteDeploymentWatcher
      but not acted upon since it's not an addressTerminated. RemoteDeploymentWatcher
      unwatch and will therefor not act on later AddressTerminated.
    • The long term solution is to have reliable system messages and remote supervision
      without explicit watch, so that we know that the remote deployment fails.
    • This short term solution is to let RemoteDeploymentWatcher always generate
      ChildTerminated, also for non-addressTerminated.
    • It's possibly racy since ChildTerminated is not idempotent.
  • FIXME question of how to handle removeChildWhenToAddressTerminated

@akka-ci
Copy link

akka-ci commented Jan 31, 2013

Started jenkins job akka-pr-validator at https://jenkins.akka.io/job/akka-pr-validator/414/

@bantonsson
Copy link
Member

So isn't there a similar problem in autoReceiveMessage in ActorCell? It uses the same check for addressTerminated on the Terminated message.

@akka-ci
Copy link

akka-ci commented Jan 31, 2013

jenkins job akka-pr-validator: Success - https://jenkins.akka.io/job/akka-pr-validator/414/

private def removeChildWhenToAddressTerminated(child: ActorRef): Unit =
private def removeChildWhenToAddressTerminated(child: ActorRef): Unit = {
// FIXME Needs to call handleChildTerminated, and handle the return value, which is a list of SystemMessages to be processed.
// How to do that? Note that it must be done before processing of the Terminated message.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a ticket and not a comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we actually solved it during the meeting today, will update
there will be a ticket for the better long term solution though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :-)

…er, see #2983

* The problem is that we do remote deployment to a node that isn't alive and with ordinary
  remoting that is not detected at all, as we know. With cluster this was taken care of by
  a later AddressTerminated and ChildTerminated generated by RemoteDeploymentWatcher. With
  the new RemoteDeadLetters the additional watch triggers an immediate Terminate which is
  captured by RemoteDeploymentWatcher but not acted upon since it's not an addressTerminated.
  RemoteDeploymentWatcher unwatch and will therefor not act on later AddressTerminated.
* The long term solution is to have reliable system messages and remote supervision without
  explicit watch, so that we know that the remote deployment fails.
* This short term solution is to let RemoteDeploymentWatcher always generate ChildTerminated,
  also for non-addressTerminated.
* It's possibly racy since ChildTerminated is not idempotent.
@patriknw
Copy link
Member Author

The replacement for removeChildWhenToAddressTerminated works fine.
Here is the ticket: https://www.assembla.com/spaces/akka/simple_planner#/ticket:2993
Please adjust if you think I missed something.

@akka-ci
Copy link

akka-ci commented Jan 31, 2013

Started jenkins job akka-pr-validator at https://jenkins.akka.io/job/akka-pr-validator/414/

@akka-ci
Copy link

akka-ci commented Jan 31, 2013

jenkins job akka-pr-validator: Success - https://jenkins.akka.io/job/akka-pr-validator/414/


.. note:: Creating a remote deployed child actor with the same name as the terminated
actor is not fully supported. There is a race condition that potentially removes the new
actor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this wants to be a warning …

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'll change that and then merge.
I think cherry-pick of the last two commits to release-2.1 should be no problem. Ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, backporting to 2.1 should only be done once we are certain how to proceed; in 2.1 links between systems are not bullet-proof yet with or without this patch, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@rkuhn
Copy link
Contributor

rkuhn commented Jan 31, 2013

LGTM!

@drewhk
Copy link
Member

drewhk commented Jan 31, 2013

LGTM!
Incredible detective job!

* Replaces the previous half-baked removeChildWhenToAddressTerminated
@bantonsson
Copy link
Member

LGTM

patriknw added a commit that referenced this pull request Jan 31, 2013
…-patriknw

Failing remote deployment, see #2983
@patriknw patriknw merged commit f809558 into master Jan 31, 2013
@patriknw patriknw deleted the wip-2983-failing-remote-deployment-patriknw branch January 31, 2013 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants