Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dead client producer from pub sub #1429

Merged

Conversation

jason-bragg
Copy link
Contributor

The pubsub system was unable to detect (and remove) dead producers on unavailable clients. This would put the pubsub system in a non-recoverable state where all further subscriptions on a stream would fail.

@jason-bragg
Copy link
Contributor Author

There is no consensus that this is the right approach to this issue. Please don't merge until an agreement has been noted.

@gabikliot
Copy link
Contributor

Agree, lets keep this as is without merging, until we investigate other options.
What I would take from this PR for now and merge separately is the ClientNotAvailableException but with the following changes:
Based on the place where it is thrown - in PlacementDirectorsManager - this exc may go to application (grain notifying an outdated client observer). Thus, the exception:
a) has to be public.
b) has to extend OrleansException, as do all our public exception.
c) be moved to https://github.com/dotnet/orleans/blob/master/src/Orleans/Core/Exceptions.cs

@jason-bragg
Copy link
Contributor Author

Regarding the ClientNotAvailableException

The original exception was a KeyNotFoundException, so the ClientNotAvailableException was added as a specialization of KeyNotFoundException to preserve any error handling that may be in place for this error. I'm not opposed to the exception being a specialization of OrleansExceptions, which it should be, but we should be aware that this is a change to the public behavior, which a specialization of KeyNotFoundException would not be.

@gabikliot
Copy link
Contributor

Yes, I understand. But I would not worry about KeyNotFoundException.
We can't be slaves backward compat due to all possible bad decisions. Its much worse I think to have it now not be OrleansException, like all our other exceptions do. Yes theoretically breaking change, but practically I can't imagine someone catching only KeyNotFoundException and NOT OrleansException if he is catching anything, since OrleansException can also be thrown, from any call essentially.

@jason-bragg
Copy link
Contributor Author

@gabikliot Agreed.

@gabikliot
Copy link
Contributor

The plan on this subject is:

  1. Modify this PR to only throw KeyNotFoundException but not do any actual removal from the PubSub Table. I believe @jason-bragg is on it.
  2. Add a test to reproduce the problem reported in Client Stream Producers are not being removed #1421. I think @amccool said he is trying to do that.
  3. Once 1+2 are done, we will open a separate issue and discuss the options for the fix.

@amccool
Copy link
Contributor

amccool commented Feb 22, 2016

re 2) is there is a way to force a failure of a client in TestingSiloHost ?

@gabikliot
Copy link
Contributor

One way is to host the client in a separate app domain and unload it without calling Client.Unitialize. Another is to add support to "KillClient" with some test hook.

I wander what @jdom thinks.

@jdom
Copy link
Member

jdom commented Feb 24, 2016

IMO, I think the AppDomain unloading should be a closer-to-reality test.
Nevertheless AppDomains are not available in .NET Core, so not sure whether introducing new uses of it is the right approach. Although there are some examples of creating clients in new appdomains (for the MultiCluster testing, for example, see TestingClusterHost.NewClient.

Regarding the exception, I agree that it's pretty unlikely users are already catch KeyNotFoundException, so breaking it by specializing OrleansException should be OK. Nevertheless, these kind of sneaky breaking changes in behavior MUST be captured in the changelog... I think we should not way for the release date to start capturing things in the changelog, as we will most likely forget about them by that time.

@gabikliot
Copy link
Contributor

Thumbs up on the changelog. Great idea!

@gabikliot gabikliot changed the title Remove dead client producer from pub sub [ON HOLD] Remove dead client producer from pub sub Feb 26, 2016
@sergeybykov sergeybykov mentioned this pull request Apr 1, 2016
12 tasks
@jason-bragg jason-bragg force-pushed the RemoveClientProducerFromPubSub branch from b60ea0f to bc73477 Compare April 2, 2016 00:11
@jason-bragg jason-bragg changed the title [ON HOLD] Remove dead client producer from pub sub Remove dead client producer from pub sub Apr 2, 2016
@jason-bragg
Copy link
Contributor Author

Updated and tested with "Client Streaming Tests #1637"

@@ -29,6 +29,14 @@ protected OrleansException(SerializationInfo info, StreamingContext context)
}
}

public static class OrleansExceptionExtensions
{
public static bool IsClientNotAvailableException(this OrleansException orleansException)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?? Why not check exception type? even better, use standard catch block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this sucks. I had planned to discuss this after this PR, but we can do this now. :)
The problem is that when the catalog encounters this condition, it rejects the message and rejected messages always come back as OrleansException. The original exception becomes the message of the Orleans exception. I was not sure what depended on this behavior so I chose not to change it. This extension class allowed me to avoid addressing what was potentially a breaking change (how we handle rejection responses) in this PR. :/

I think the behavior we want would be:
If original exception is of type OrleansException (which all should be), we pass it through. If it is not, we create an OrleansException with the message and inner exception coming from the oringial exception.

Questions:
Should we address the rejection exception change in this PR?
If so, is the above behavior desirable, or is there better alternative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dug into this with Sergey. We've an approach I'm going to attempt. Pls see next iteration.

@gabikliot gabikliot mentioned this pull request Apr 2, 2016
@gabikliot gabikliot added the bug label Apr 2, 2016
@gabikliot
Copy link
Contributor

Looks good overall, some small comments.

CI is stuck for some reason.

@sergeybykov
Copy link
Contributor

@dotnet-bot test this please

@jason-bragg jason-bragg force-pushed the RemoveClientProducerFromPubSub branch from bc73477 to ab493db Compare April 4, 2016 19:17
@jason-bragg
Copy link
Contributor Author

Not ready for merge. I'm modifying rejected message error handling to preserve original exception type. Will note when complete.

@jason-bragg jason-bragg force-pushed the RemoveClientProducerFromPubSub branch from ab493db to b869d0c Compare April 5, 2016 00:48
@jason-bragg
Copy link
Contributor Author

Rejected message error handling now preserves original exception type if original exception was of type OrleansException.

PubSub grain error handling adjusted accordingly.

CI build seems to be stuck, but assuming that gets resolved, I think this is ready to go, unless there are more comments.

@gabikliot
Copy link
Contributor

I hope you know that is was not me who added that "up-for-grabs" label.
Its CI impersonating me. ;-)

@gabikliot gabikliot merged commit ec071b9 into dotnet:master Apr 5, 2016
@jdom jdom mentioned this pull request Apr 5, 2016
@github-actions github-actions bot locked and limited conversation to collaborators Dec 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants