Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

link/unlink is exception unsafe #286

Open
facundominguez opened this issue May 3, 2016 · 10 comments
Open

link/unlink is exception unsafe #286

facundominguez opened this issue May 3, 2016 · 10 comments
Assignees
Labels
Projects

Comments

@facundominguez
Copy link
Contributor

facundominguez commented May 3, 2016

Suppose I have the following code

f pid = do
  link pid
  blocking_operations_which_depend_on_health_of pid
  unlink pid

If f is interrupted with an exception, unlink might never be called. Thus I rewrite to:

f pid = bracket_ (link pid) (unlink pid) $
          blocking_operations_which_depend_on_health_of pid

But this still doesn't help, because unlink is interruptible, thus if unlink is interrupted with an exception, the process will remain linked to pid after leaving f.

How about

f pid = bracket_ (link pid) (uninterruptibleMask_ $ unlink pid) $
          blocking_operations_which_depend_on_health_of pid

?

This introduces the potential for blocking indefinitely. If pid dies while unlink is executing, the link exception cannot be thrown because uninterruptibleMask_ is in effect. However, when unlink completes, the exception cannot be thrown either, because the very unlink definition does not allow it. The only way to avoid illegal behavior would be to have unlink never return or somehow don't throw the link exception (the latter looks impossible to me).

The second attempt is the one I would expect to work. The problem is determining what tweak of unlink semantics would make it to work.

I would like unlink to be uninterruptible, but I don't see how that could be implemented. An alternative is to guarantee that when unlink runs with exceptions masked, it always unlinks before returning, but it might return with an exception if it is interrupted during its execution.

@hyperthunk
Copy link
Member

That last paragraph sounds like the most pragmatic approach to me?

@mboes
Copy link
Contributor

mboes commented Jun 23, 2016

I'm not sure what exactly the problem is here. Until unlink completes, because I called link, I accept that I'll have some exception thrown at me at any time. So if someone else throws an async exception at me, is that really a problem?

@facundominguez
Copy link
Contributor Author

facundominguez commented Jun 23, 2016

Yes it is, the link exception could interrupt the important clean up in:

bracket_ importantResource importantCleanup $ bracket_ (link pid) (unlink pid) ...

hm, though probably we could argue that importantCleanup should be made uninterruptible ...

Another bad example:

timeout $ ... bracket_ (link pid) (unlink pid) ...

The link exception could arrive after timeout completes if the timeout occurs just when unlink is called.

@mboes
Copy link
Contributor

mboes commented Jun 24, 2016

There is no need to bracketize unlink because if unlink doesn't get called that's quite okay: link is just there to signal that the current thread should have an async exception thrown at it in case a peer disappears, but if it's gotten one anyways then there's no point in calling unlink.

And aplications shouldn't make any assumptions about what order async exceptions will be delivered in, because CH provides no such guarantees.

@hyperthunk
Copy link
Member

Since that's the case, can we close this issue? Should I add something about this to the documentation?

It feels to me like I should probably compile the various issues and conversations around async exceptions in Cloud Haskell - and there have been many - into a page, and publish it to the website. Or would it be better placed in the haddocks?

@facundominguez
Copy link
Contributor Author

facundominguez commented Nov 8, 2018

but if it's gotten one anyways then there's no point in calling unlink.

I'm not getting why there is no point in calling unlink if the original exception is caught and handled. Or if I did get it in the past, I've forgotten how :)

+1 for discussing async exceptions in the documentation.

@hyperthunk
Copy link
Member

Yes actually that's a good question...

I think the answer is in the client's court. Calling unlink means I no logger want link exceptions. If I want to avoid having some other arbitrary async exception interfere with unlink, I can mask it. One might argue that you should always mask unlink, but that's another matter entirely.

Now assuming you mask unlink and miss the link exception, who cares? You were done with linking anyway. The more onerous issue is that you might've missed another async exception that mattered to you more than the link exception.

The broader design issue here, and we should document this when we update the docs around async exception behaviour in general, is don't use links if you require time to do cleanup. Instead you should use monitors, and kill your process in response to a monitor notification if that's the semantics you require.

Better still, use supervisors, and encode the dependency between running processes that way. If all those processes are managed processes based on the client-server library, they'll each get a chance to do cleanup in a structured fashion, and can even deal with their inbox without risking message loss by using the safe message filters available in the client-server or fsm libraries.

So I guess my perspective is, these issues are probably best solved higher up the value chain in libraries, rather than having complex semantics surrounding the primitives those libraries depend on.

Not many erlang programmers use messaging primitives directly all that often. 95% of production erlang code uses gen_servers and supervisors, which have well understood models for fault tolerance, built on the underlying semantics.

@facundominguez
Copy link
Contributor Author

So I guess my perspective is, these issues are probably best solved higher up the value chain in libraries, rather than having complex semantics surrounding the primitives those libraries depend on.

I agree that the semantics shouldn't be complex. The primitives in the foundations should be simple to use. But, alas, unlink is not so easy. After thinking it some more I still don't see a clear way to improve it.

I'm fine promoting this task to discussing the problem in the documentation.

@hyperthunk hyperthunk self-assigned this Nov 9, 2018
@hyperthunk hyperthunk added the task label Nov 9, 2018
@hyperthunk
Copy link
Member

hyperthunk commented Nov 9, 2018

Looking at the code, I'm not sure some of your assertions hold @facundominguez. After sending a control message to the node controller, unlink waits for a message of type DidUnlinkProcess, not an exception. So assuming we run unlink with uninterruptibleMask, why do we think it might never return?

@hyperthunk
Copy link
Member

And in actual fact, ncEffectUnlink in the node controller doesn't even care if the target (i.e. linked-to process) is alive or not. So there's not even a race, because even if the the linked-to process dies before the node controller handles the signal raised by unlinkAsync in the calling process, it will still send DidUnlink back to the caller.

@hyperthunk hyperthunk added this to To do in Release 0.8 Nov 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Release 0.8
  
Backlog
Development

No branches or pull requests

3 participants