Support cancellation tokens in grain method signatures #1599

Merged
merged 1 commit into from
Jun 17, 2016

dVakulen
Contributor

Implemented using grain extensions, in accordance with #1569 (comment).

@gabikliot
Contributor

So the 2 questions (at least) that we need to decide upon are:

  1. Prog. model - are we exposing the standard cancellation token or not? The problem: what if the grain registers a callback on the token (OnCancelled)? You need to make sure to invoke it, but without violating single-threaded execution, and you also have to deal with non-reentrant grains. This adds a lot of complexity. An alternative is to introduce an OrleansCancellationToken (or RemotableCancellationToken) that mimics the standard one but has no OnCancelled. It is ugly to define our own, but it simplifies a lot.

  2. Are we OK with the general approach of "tracing" the call chain back? (If a token was sent to grain A, then to grain B, then to grain C, then upon cancellation we send a cancel request to A, then to B, then to C.) If A or B is not there (silo crashed), we can't cancel B and C.
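To make the failure mode in question 2 concrete, here is a minimal in-memory sketch. It assumes plain `CancellationTokenSource`s stand in for per-activation state; `Hop`, `Alive`, `DeliverCancel`, `WireForwarding` and `Chain` are hypothetical names, not Orleans APIs. Each hop forwards cancellation to the next one from a callback on its own token, so one unreachable hop stops the signal from reaching anything downstream.

```csharp
using System.Threading;

// Hypothetical model of hop-by-hop cancellation (not an Orleans API): each
// hop owns a local CancellationTokenSource and forwards cancellation onward.
public sealed class Hop
{
    public CancellationTokenSource Source { get; } = new CancellationTokenSource();
    public Hop Next { get; set; }
    public bool Alive { get; set; } = true;   // false simulates a crashed silo

    // Forward cancellation downstream from a callback on this hop's own token.
    public void WireForwarding()
    {
        Source.Token.Register(() => Next?.DeliverCancel());
    }

    // A cancel request only has an effect if this hop is still reachable.
    public void DeliverCancel()
    {
        if (!Alive) return;   // message lost: nothing downstream is cancelled
        Source.Cancel();      // runs the forwarding callback synchronously
    }

    // Builds a chain A -> B -> C ... with forwarding wired up on every hop.
    public static Hop[] Chain(int length)
    {
        var hops = new Hop[length];
        for (int i = 0; i < length; i++) hops[i] = new Hop();
        for (int i = 0; i < length; i++)
        {
            if (i + 1 < length) hops[i].Next = hops[i + 1];
            hops[i].WireForwarding();
        }
        return hops;
    }
}
```

With `Hop.Chain(3)` and the middle hop's `Alive` set to false, cancelling the first hop never reaches the third one, which is exactly the limitation raised above.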

var cancelled = ctw.CancellationToken.IsCancellationRequested;
if (!cancelled)
{
    CancellationTokenManager.RegisterTokenCallbacks(ctw);
}
Contributor

Let's try to maintain layering. Serialization should only serialize, not call upward into silo logic (we are trying, as best we can, to maintain an acyclic graph of logical dependencies).
You can call CancellationTokenManager.RegisterTokenCallbacks when the request is sent, from the Outside/Inside grain client's SendRequest.

Contributor Author

Initially this was done with an Action callback, which maintained the layering and avoided unnecessary computation in the case of a local grain call; it was then replaced due to #1569 (comment). But if you think the cost of a grain extension call and a CancellationTokenSource allocation is now negligible compared to code readability and supportability, I will move it to the SendRequest methods.

Contributor

I just don't want serializer to call back into the runtime logic. Serializer may change, be replaced by another one, ... It should only serialize and have zero side effects.

Contributor Author

Agreed. Would it be fine to move these methods into CancellationTokenWrapper and mark them with the [SerializerMethod], [DeserializerMethod], etc. attributes?

Contributor

yes.

@dVakulen
Contributor Author

  1. Assuming that we're talking about the target grain subscribing to a token whose source was created remotely: such a token won't differ in callback execution from the same token created locally. Registration also captures the current execution context.
  2. Reliable cancellation would require either the broadcast to all silos suggested by @sergeybykov, which would incur an additional cost on every cancellation token pass (we would need to track original token sources that were disposed but not cancelled, and dispose the remote ones), or an even more complicated solution.

@gabikliot
Contributor

Regarding reentrancy - I am not sure what you are saying is correct. The OnCancelled is a callback, you don't know on what thread it is called. If you do know, please provide a reference and test.

@gabikliot
Contributor

Can you please explain the whole story about disposing? I don't understand what we are disposing, or what Task DisposeTokenSources(List<Guid> tokenIds); is for. Who calls it and why?

@dVakulen
Contributor Author

About disposing: the token cancellation callback accepts a state object, and a GCObserver is passed to it. After the cancellation token source is disposed, it will be marked for finalization and will call the following method https://github.com/dotnet/orleans/pull/1599/files#diff-7cf5943b9e8300f7919f8e294c7c7182R88, which contains the call to Task DisposeTokenSources(List<Guid> tokenIds);.
It's needed because some "hot" grains, or grains with delayed deactivation, can accumulate thousands of cancellation token sources that won't become eligible for collection before the grain deactivates.

@gabikliot
Contributor

Who initiates the disposal? The source at the sender? So we will be doing an extra grain call to dispose each token? Unacceptable of course.

@dVakulen
Contributor Author

No, we make one extra grain call per batch of tokens (currently 25).

@dVakulen
Contributor Author

Regarding reentrancy - can you please describe which `OnCancelled` callback you mean?
The one that issues the remote cancellation call, or a user-defined callback on the remote token?

@gabikliot
Contributor

Regarding reentrancy - I mean user defined callback on remote token. That is part of what I called "programming model".

I still don't get it about remote disposing - why do we need it? Why doing any at all?

@dVakulen
Contributor Author

It's not really about disposing; it's about releasing references to CancellationTokenSources so that the GC can collect them. It was added to avoid memory issues; otherwise lots of cancellation token sources could accumulate in the target grain extension. If this seems like an unlikely issue, I will remove it.

@gabikliot
Contributor

Of course we need to clean up resources (unused cancellation tokens in the target grain extension). But we don't need to send remote messages for that. We can clean up locally.
For example, store a weak reference to it in the extension and then when the local token goes out of scope, it is just GCed.

We use those "tricks" all the time in the runtime - remote calls are expensive. Need to do them only when absolutely necessary.
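A minimal sketch of the local-cleanup idea, assuming the extension keys token sources by a `Guid` id and is only touched from one activation turn at a time (hence a plain `Dictionary`). `WeakTokenRegistry`, `GetToken`, `TryCancel` and `Prune` are hypothetical names, not part of Orleans. Because a `CancellationToken` keeps a reference to its source, the source stays alive as long as grain code holds the token and becomes collectable once it doesn't, without any remote message.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical registry for a grain extension: token sources are held only
// weakly, so cleanup happens through the local GC instead of remote calls.
// Assumes single-threaded (turn-based) access, so a plain Dictionary is used.
public sealed class WeakTokenRegistry
{
    private readonly Dictionary<Guid, WeakReference<CancellationTokenSource>> _sources =
        new Dictionary<Guid, WeakReference<CancellationTokenSource>>();

    public int Count => _sources.Count;

    // Returns the local token for a remote token id, creating its source on first use.
    // The returned token references its source, keeping it alive while grain code holds it.
    public CancellationToken GetToken(Guid id)
    {
        if (_sources.TryGetValue(id, out var weak) && weak.TryGetTarget(out var existing))
            return existing.Token;

        var source = new CancellationTokenSource();
        _sources[id] = new WeakReference<CancellationTokenSource>(source);
        return source.Token;
    }

    // Handles an incoming cancel request; returns false if nobody holds the token anymore.
    public bool TryCancel(Guid id)
    {
        if (!_sources.TryGetValue(id, out var weak))
            return false;
        if (weak.TryGetTarget(out var source))
        {
            source.Cancel();
            return true;
        }
        _sources.Remove(id);   // source already collected: drop the stale entry
        return false;
    }

    // Removes entries whose sources have been garbage-collected.
    public int Prune()
    {
        var dead = new List<Guid>();
        foreach (var pair in _sources)
            if (!pair.Value.TryGetTarget(out _))
                dead.Add(pair.Key);
        foreach (var id in dead)
            _sources.Remove(id);
        return dead.Count;
    }
}
```

Stale entries are dropped lazily by `TryCancel` or in bulk by `Prune`, for example from a periodic timer; the small dictionary entry is the only per-token cost left once the source itself has been collected.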

@dVakulen
Contributor Author

That would be an easy solution - just store a WeakReference and wait for the token to go out of scope. But the token can be passed into a long-running task or a DB query, and we need a guarantee that it will be cancelled in any case.

@gabikliot
Contributor

What do you mean? Why would you want to cancel it as long as the application holds a reference to it? Plus, most importantly, there are no resources associated with the token, except the token itself. As long as the app holds it, you can't get rid of the token. The only other "resource" is the entry in the extension dictionary, which is minor/negligible.

Unless there is something else that you mean: the source (whoever created the cancellation source) can dispose the source/token, and that operation has a semantic meaning in the application (like it is important to the app that if the source was disposed, any check on the token will throw ObjectDisposedException). I hope you don't mean that.

@dVakulen
Contributor Author

I was talking about the entry in the extension dictionary. 1 million entries take 500 MB of memory, so I thought this could potentially be an issue, but it seems my assumptions were wrong. I will remove the remote disposing part.

@gabikliot
Contributor

But as long as the token itself is alive, we have to keep something for its management and we can't get rid of it as long as app can use it to cancel via it. Correct? So if app sends million tokens and wants to keep using them all, I am not sure what we can do better.
In practice, I don't think you will have million outstanding, still-in-progress long-running requests, per silo.

One other option is to say that the token is bound to the request it came with: when this request is replied to or times out, we also dispose the token. But that means the token cannot be used for longer background operations that stay around after the request has timed out. Regardless, I think we should design the cancellation orthogonally to this.

@dVakulen
Contributor Author

About callback reentrancy and the "programming model": lots of libraries and .NET Framework methods accept a CancellationToken, but an OrleansCancellationToken couldn't be passed to them, since it cannot inherit from CancellationToken while hiding one of its methods; that would hold even if CancellationToken were a class instead of a struct. Consumers would not even be able to create a local token source and pass its token to library methods, because the incoming Orleans token lacks an OnCancelled method. So I think that preventing users from calling Register would make this feature almost completely useless.

@gabikliot
Contributor

Can you guarantee single threaded execution? If not, I would vote against the whole feature, so just not to sacrifice, by any means, the single threaded execution.

@gabikliot
Contributor

Is the callback called synchronously with the Cancel call?

@veikkoeeva
Contributor

Just to chime in with the observation that this has been implemented in a few places already. Would it be worth taking a look to widen perspective, so to speak?

Maybe the way to go is to stash it in the context as a wrapped token, but if there is a CancellationToken in the last argument position, unwrap it automatically. Checking for cancellation requires cooperation from the code anyway, so synchronicity shouldn't be an issue there. It doesn't look like remote cancellation can do much more than signal all the remote tokens (e.g. via CreateLinkedTokenSource, calling their .Cancel()) that cancellation has been requested, which would set IsCancellationRequested to true and call the locally registered handlers, if any.
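The linked-token idea can be illustrated with the standard `CancellationTokenSource.CreateLinkedTokenSource`. In this sketch `remoteSignal` is an assumption standing in for whatever local source the runtime would cancel when a remote cancel message arrives, `localTimeout` is a source owned by grain code, and `LinkedTokenDemo` is a hypothetical name.

```csharp
using System.Threading;

public static class LinkedTokenDemo
{
    // Returns true if cancelling the "remote" source cancels the linked token,
    // runs its registered handler, and leaves the grain-owned source untouched.
    public static bool Run()
    {
        using (var remoteSignal = new CancellationTokenSource())   // assumed runtime-owned source
        using (var localTimeout = new CancellationTokenSource())   // grain-owned source
        using (var linked = CancellationTokenSource.CreateLinkedTokenSource(
                   remoteSignal.Token, localTimeout.Token))
        {
            bool handlerRan = false;
            linked.Token.Register(() => handlerRan = true);

            remoteSignal.Cancel();   // simulate the "cancellation requested" message

            return linked.Token.IsCancellationRequested
                && handlerRan
                && !localTimeout.IsCancellationRequested;
        }
    }
}
```

The linked source is cancelled as soon as either input is, so grain code can combine the remote signal with its own timeout without the runtime knowing about the latter.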

These handlers wouldn't respect the Orleans turn-based guarantees, but that might be OK, because cancellation may have been requested precisely because the grain was perceived to be stuck. One thing to think about here is how linked tokens are handled. I would imagine they'd need to go via a wrapper, and so would the wrapper register itself as one of the cancellation handlers? I'm not sure how this works in the current implementation. Here are links to docs by the original implementers (including a PDF; at the end it describes how to add CancellationTokenSource to something that implements cancellation differently). There may also be useful ideas in "How do I cancel non-cancelable async operations?".

Sorry if I pollute discussion now, this is somewhat a half-baked writing.

@dVakulen
Contributor Author

The callback is executed as part of the cancel method in the grain extension, on a runtime scheduler worker-pool thread. It can interleave with the original grain methods, but execution remains single-threaded.

@gabikliot
Contributor

I was not asking about how it executes in Orleans. I was asking about the Cancellation token itself.
When one calls CancellationTokenSource.Cancel, how is the callback registered on the matching CancellationToken invoked? Can you confirm, with 100% confidence, that it is executed synchronously with the Cancel call? If not, does it execute on the thread pool?
If it is the former case then all is good, it will work seamlessly in Orleans. But in the latter case we got a problem.

@dVakulen
Contributor Author

I'm talking about the cancellation token itself - registered callbacks are executed either synchronously with the Cancel call or on the Orleans thread pool.

@gabikliot
Contributor

How did you come to that conclusion? Do you have documentation, evidence? Where did you see that the callback runs on the current Task Scheduler?

@dVakulen
Copy link
Contributor Author

Added test for cancellation callbacks execution context.

@gabikliot
Contributor

Looks like it is called synchronously:
http://stackoverflow.com/questions/31495411/a-call-to-cancellationtokensource-cancel-never-returns
meaning we should be OK.
Good, one less thing to worry about.
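A quick way to check the synchronous behavior locally, using only `System.Threading` (the `CallbackThreadDemo` name is illustrative): register a callback, call `Cancel()`, and observe that the callback has already run, on the calling thread, by the time `Cancel()` returns. One caveat: with `CancelAfter`, cancellation is triggered by a timer, so the callbacks then run on a thread-pool thread rather than on the caller's thread.

```csharp
using System;
using System.Threading;

public static class CallbackThreadDemo
{
    // Registers a callback, cancels, and reports whether the callback ran
    // before Cancel() returned and on the same thread that called Cancel().
    public static (bool RanBeforeCancelReturned, bool SameThread) Run()
    {
        using (var cts = new CancellationTokenSource())
        {
            int callbackThread = -1;
            bool ran = false;
            cts.Token.Register(() =>
            {
                callbackThread = Environment.CurrentManagedThreadId;
                ran = true;
            });

            int cancellingThread = Environment.CurrentManagedThreadId;
            cts.Cancel();                  // registered callbacks run inline here
            bool ranBeforeReturn = ran;    // read immediately after Cancel() returns

            return (ranBeforeReturn, callbackThread == cancellingThread);
        }
    }
}
```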


internal void AddGrainReference(GrainReference grainReference)
{
    _targetGrainReferences.AddOrUpdate(grainReference.GrainId, id => grainReference, (id, ignore) => grainReference);
}
Member

Why not use TryAdd instead of AddOrUpdate? It shouldn't matter which instance of GrainReference you use, as long as they both reference the same grain, correct?
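A sketch of the suggested change, using hypothetical stand-ins (`FakeGrainReference`, `TargetRegistry`) instead of the real Orleans `GrainReference`/`GrainId` types, with a string id as the key. `ConcurrentDictionary.TryAdd` stores the first reference for a grain and leaves it in place, which is enough when any reference to the same grain is interchangeable; it also avoids allocating the two delegates that `AddOrUpdate` takes on every call.

```csharp
using System.Collections.Concurrent;

// Stand-in for GrainReference: two instances can refer to the same grain id.
public sealed class FakeGrainReference
{
    public FakeGrainReference(string grainId) { GrainId = grainId; }
    public string GrainId { get; }
}

public sealed class TargetRegistry
{
    private readonly ConcurrentDictionary<string, FakeGrainReference> _targets =
        new ConcurrentDictionary<string, FakeGrainReference>();

    // Keeps whichever reference was stored first; later references to the
    // same grain are ignored because they address the same target.
    public bool AddGrainReference(FakeGrainReference reference) =>
        _targets.TryAdd(reference.GrainId, reference);

    public FakeGrainReference Get(string grainId) => _targets[grainId];

    public int Count => _targets.Count;
}
```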

@jdom
Member

jdom commented Jun 16, 2016

@dVakulen this is great! as you saw, I only had some minor comments for you to address. As we expected, load test results show no impact with your change in it (again, we did not add a scenario that uses cancellation tokens, I just verified that code without them is unaffected).

I'll merge once you address those minor things.

@gabikliot
Contributor

Julian, please leave me the honor of merging this PR (after all your comments are addressed, of course). 233 comments and 3 months of work, including multiple rounds of redesign: I think I deserve it. 😀

@jdom
Member

jdom commented Jun 16, 2016

Haha, of course, thanks a lot @gabikliot!
BTW @dVakulen, once you address my comments, please rebase and squash, so that we can do an old-school merge and Gabi can get credit as the one merging :)

@veikkoeeva
Contributor

@dVakulen This is a great one. 👍

A note for the future: following the original scenario, a great test and example use case would be to add an integration test scenario with Microsoft.Owin.Testing that does
HttpClient -> Server -> Orleans Silo -> Storage where the calls are async and then HttpClient cancels out and see that cancellation works on the other side too (I don't know the most suitable way to test this scenario).

@sergeybykov
Contributor

@dVakulen @gabikliot This is an epic PR that adds a non-trivial feature that went through a number of iterations. Your effort is very much appreciated!

@veikkoeeva We don't need http in the test scenario, do we? Just making a call from an Orleans client and cancelling it should be enough I think.

@dVakulen
Contributor Author

Comments addressed, rebased and squashed.

@jdom
Member

jdom commented Jun 16, 2016

Thanks, @dVakulen. There was just one unaddressed comment, but considering it is minor, I'm OK if @gabikliot merges this, and I can address it in a separate PR.
EDIT: sorry for the original comment, I was still typing and removing some comments that were actually addressed, when Github decided to focus on the Comment button

@jdom
Member

jdom commented Jun 16, 2016

@gabikliot please do the honors! :)

@jdom
Member

jdom commented Jun 16, 2016

@dotnet-bot test this please

@gabikliot gabikliot merged commit 3e5d6eb into dotnet:master Jun 17, 2016
@gabikliot
Contributor

Thank you @dVakulen !
This was a great contribution, with a complicated implementation, involving both new abstractions, scalable and performant implementation and various edge case considerations.

@dVakulen
Contributor Author

Thank you to everyone who participated in this work, especially @gabikliot and @ReubenBond for thorough reviews and advice; lots of lessons learned, and I will try to make my next PRs more refined.

@dVakulen dVakulen deleted the cancellation-tokens branch June 17, 2016 20:58
@sergeybykov
Contributor

@dVakulen Nah, don't worry, it was refined enough. It is very typical to go through a number of iterations for a significant change like this. Thanks again for slogging through the long process!

7 participants