Make timer callbacks respect reentrancy of grains #2574
Comments
Heads-up: this is a breaking change to an actively used feature(/bug? :-) ) We are actively relying on timer callback reentrancy. This being said, I agree that the behavior is confusing and should be changed. I think a straightforward migration path would be to call a reentrant grain method (specified using |
@cata, thanks for the heads-up. I didn't realize people rely on this behavior. |
This timer reentrancy initially surprised me as well, but now our code has also come to rely on it. The non-reentrant methods in our grains often control flags used by the timer callback. Would be happy to migrate, but this "loophole" is currently the only way to have a partially reentrant grain - a useful feature that would be sad to lose. |
Maybe we should make the behavior configurable then, to explicitly allow for both reentrant and non-reentrant execution of timer callbacks? |
@sergeybykov - configurable behavior (an extra parameter on a |
I think that is exactly what we agreed upon and planned to do a long time ago. That is at least what I remember. |
@sergeybykov @gabikliot Please can I clarify if this behavior is applicable to Reminders too? I ask because the Reminders are implemented via Timers underneath. I'm thinking something I've implemented on top of GrainServices may be subject to this interleaving, which I did not anticipate would be an issue. If someone can explain the proposed method of resolving this I am happy to have a go. |
@jamescarter-le Reminders are different in behavior from timers because their ticks are delivered via true grain calls that follow reentrancy/interleaving rules. |
It's very important for timer callbacks not to interleave. Optional configuration (to allow reentrancy) is also welcome. Please also consider an option to register a timer such that it won't allow Orleans to deactivate the grain. |
We don't have concrete plans for this. But we are open to contributions. |
In the meantime, what is the correct way to 'self invoke' from a timer method to avoid interleaving? |
Yes, as reference. |
Ok, got it, thanks |
To avoid the reentrancy of timers - I've used in my timer The timer only invokes the other method, which gets its turn scheduled, and sometimes, due to other long method turns, it can be scheduled sequentially, and so it doesn't respect the expected 'timer' intervals (i.e. run every x seconds). Now, I can add a defense to avoid it running sequentially by checking the last time it was executed, but I was wondering if you have a better solution. FYI - this is the method I used to self-invoke a method with the Grain context scheduler:
|
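The "defense" mentioned above (skipping a tick when the previous run finished too recently) could be sketched roughly as follows. `TickGuard` and its members are hypothetical names invented for illustration, not part of Orleans, and the caller passes in the current time so the logic stays easy to test.

```csharp
using System;

// Hypothetical helper (not an Orleans API): decides whether a timer tick
// should do work, based on when the previous run completed.
public sealed class TickGuard
{
    private readonly TimeSpan _minInterval;
    private DateTime? _lastCompletedUtc;

    public TickGuard(TimeSpan minInterval)
    {
        _minInterval = minInterval;
    }

    // True if no run has completed yet, or if at least the minimum
    // interval has elapsed since the last completed run.
    public bool ShouldRun(DateTime nowUtc)
    {
        return !_lastCompletedUtc.HasValue
            || nowUtc - _lastCompletedUtc.Value >= _minInterval;
    }

    // Record the completion time once the work has finished.
    public void MarkCompleted(DateTime nowUtc)
    {
        _lastCompletedUtc = nowUtc;
    }
}
```

The method invoked by the timer would check `ShouldRun(DateTime.UtcNow)` first, return early when it is false, and call `MarkCompleted(DateTime.UtcNow)` after the work is done.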
@shlomiw I may be misunderstanding what you're trying to do here, but if the goal is to ensure that work triggered by the timer conforms to grain call semantics (i.e., not reentrant), you need only await the grain call.
Assuming "action" contains a call to the grain, awaiting SelfInvokeAfter in the timer callback will prevent the next timer call from being scheduled until the first one is complete. By calling factory.StartNew(async () => await action(g)) and not awaiting the completion of the operation, the timer callback is free to return immediately and begin counting down to the next trigger before the current work is complete. |
If I can await the self-invoke from the timer then it sure makes sense (I thought I had to schedule it for the next turn to avoid a 'deadlock', but it should be possible since the timer is reentrant!). |
@jason-bragg - thanks! it works correctly as you suggested :) |
@shlomiw, Glad that helped. |
@jason-bragg - I'm aware of it, thanks for bringing that up! |
Now I have a scenario where I want to use a timer for grain local maintenance, which still respects the reentrancy (i.e., non-reentrant), but I don't want it to keep the grain alive. Any idea how? |
I am sometimes using await loops with delays:

    while (!done)
    {
        do_maintenance_work();
        await Task.Delay(period);
    }

As far as I know, this does the right thing if the grain deactivates (i.e. it will no longer resume at the await) but I am not 100% sure. |
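One way to remove the uncertainty about what happens at deactivation is to give the loop an explicit stop signal. The following is a minimal sketch using only the .NET base class library; `MaintenanceLoop`, `RunAsync`, and `Stop` are hypothetical names, and the assumption is that a grain would start the loop when it activates and call `Stop()` from its deactivation hook.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not an Orleans API): a delay loop that can be
// stopped explicitly instead of relying on deactivation behavior.
public sealed class MaintenanceLoop
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    // Number of completed work invocations, exposed for observation.
    public int Iterations { get; private set; }

    public async Task RunAsync(Func<Task> work, TimeSpan period)
    {
        try
        {
            while (!_cts.IsCancellationRequested)
            {
                await work();
                Iterations++;
                // Task.Delay throws OperationCanceledException once Stop() is called.
                await Task.Delay(period, _cts.Token);
            }
        }
        catch (OperationCanceledException)
        {
            // Expected when the loop is stopped during the delay.
        }
    }

    public void Stop()
    {
        _cts.Cancel();
    }
}
```

Whether the continuations run on the grain's scheduler depends on where `RunAsync` is started; this sketch does not try to enforce that.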
@sebastianburckhardt - tnx for your reply! |
btw - instead of using timers, I was thinking about using grain call filter (interceptor) and execute my maintenance after I got a request (i.e. every once in a while). |
@gabikliot, @sergeybykov, @jason-bragg, I just tested timer invocation with the suggested workaround and I still see interleaving calls. I'm using Orleans
The grain implementation:

    using System;
    using System.Threading.Tasks;
    using OrleansGrainInterfaces;
    using Orleans;
    using Microsoft.Extensions.Logging;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using Orleans.Concurrency;

    namespace OrleansGrains
    {
        public class ValueGrain : Grain, IValueGrain
        {
            private readonly ILogger _logger;
            private readonly Random _rnd;
            IDisposable _timer;

            public ValueGrain(ILogger<ValueGrain> logger)
            {
                _logger = logger;
                _rnd = new Random();
            }

            public async Task<string> Get()
            {
                Console.WriteLine($"***Start***");
                await Task.Delay(3000);
                Console.WriteLine($"***Finish***");
                return this.GetPrimaryKeyString();
            }

            // Timer callback: forwards to Update() through a grain reference.
            public async Task OnTimer(object data)
            {
                await this.AsReference<IValueGrain>().Update();
            }

            public async Task Update()
            {
                Console.WriteLine($"***StartTimer***");
                var waitPeriod = _rnd.Next(5000, 15000);
                await Task.Delay(waitPeriod);
                Console.WriteLine($"***FinishTimer***");
            }

            public override Task OnActivateAsync()
            {
                _timer = RegisterTimer(this.OnTimer, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(10));
                return Task.CompletedTask;
            }

            public override Task OnDeactivateAsync()
            {
                _timer?.Dispose();
                return Task.CompletedTask;
            }
        }
    }

Output produced:
I understand that the calls are (supposedly?) interleaved between
I guess I'm not understanding these two sentences correctly, as they seem contradictory to me. So, timer callbacks will always interleave (based on testing), but the workaround, by using I'd appreciate an explanation, as I'm probably just misunderstanding some concepts here. |
@ReubenBond in my tests I get I took the |
@turowicz, could you link me to a repro? It may be that something is causing the timer callback passed to |
Would really love to see this be resolved so grain timers fit within the actor's expected concurrency model. |
@ReubenBond this is the failing change: Commit: Surveily/Orleans.Streaming.Grains@1cdd9e5 |
@ReubenBond it fails correctly as the access violation is swallowed inside the IQueueAdapterReceiver, but it shows in test as invalid number of invocations in Moq due to reattempt to process message. |
When I want timer to execute in non-interleaved way I just call self via interface method. Works like a charm. |
@yevhen that's great, but my calls are made via IQueueAdapterReceiver and it causes Access Violations in composition with a timer. Perhaps it's only caused by the OrleansInternalClient? |
It shouldn't make any difference in theory. AFAIU, the internal client just skips the socket and routes messages by directly using the silo's internal infrastructure. |
@turowicz Can you share the code? |
@yevhen the code is linked here #2574 (comment) |
@ReubenBond can you confirm that there is nothing odd with calling the Grains from the IQueueAdapter? |
@ReubenBond the proposed solution by you and others with the grain queueing a request to itself from inside the callback has a serious flaw unfortunately… Take the following scenario:
Of course the deadlock will eventually resolve, as the call to the grain itself from inside the timer callback will eventually time out, depending on how Orleans is configured. Also, this call cannot be picked up by another activation of the same grain, because this one (the previous one) has not yet deactivated AFAIK (right?). A good way to solve this is to not await the call to the grain itself from inside the timer callback, or even better, have the method that gets called be annotated with [OneWay]. This might, however, cause timer ticks to get batched together one after another when the period is really small and the execution of one tick takes longer than the period, because the callback no longer awaits. IMO, this should AT LEAST be properly documented/described and provided with alternatives with pros/cons, even before any fix is added (will it be?). This has to be done, given that Orleans is supported by MS and is a first-class citizen. PS: If my description does not provide enough explanation, I am happy to provide more information or even reproducible steps in code. |
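The [OneWay] variant described in the comment above might look roughly like the sketch below. It assumes Orleans 3.x-style signatures, as in the repro posted earlier in this thread. `IMaintenanceGrain`, `MaintenanceGrain`, and `RunMaintenance` are hypothetical names chosen for illustration, and this is not a confirmed or recommended fix.

```csharp
using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Concurrency;

// Hypothetical grain interface, for illustration only.
public interface IMaintenanceGrain : IGrainWithStringKey
{
    // With [OneWay], the caller does not wait for the method to finish,
    // so the timer callback is not blocked on the grain's own request queue.
    [OneWay]
    Task RunMaintenance();
}

public class MaintenanceGrain : Grain, IMaintenanceGrain
{
    private IDisposable _timer;

    public override Task OnActivateAsync()
    {
        _timer = RegisterTimer(OnTimer, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(10));
        return Task.CompletedTask;
    }

    // Timer callback: only sends a regular grain call to this same grain.
    private Task OnTimer(object state)
    {
        return this.AsReference<IMaintenanceGrain>().RunMaintenance();
    }

    // Runs as an ordinary grain call, so it follows the grain's reentrancy rules.
    public Task RunMaintenance()
    {
        // Maintenance work goes here.
        return Task.CompletedTask;
    }

    public override Task OnDeactivateAsync()
    {
        _timer?.Dispose();
        return Task.CompletedTask;
    }
}
```

As the comment notes, the trade-off is that ticks can pile up when one run takes longer than the timer period, since nothing waits for the previous run to complete.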
@tomachristian I've had no issues with the workaround from this thread when using If you're already doing that then I'm not sure how you'd be getting a deadlock at all when timer callbacks don't block calls to the grain. That's the whole issue being raised in this thread - timer callbacks do not block calls to the grain (which isn't intuitive). If they did then that workaround should also trigger a deadlock because the timer callback triggers an external call (separate call chain via |
@Rohansi please try this code to see what I mean. I didn't say I have this problem, I am using OneWay to avoid this, but people using the solution proposed here MIGHT have this problem. It is non-deterministic.
You are right that timer callbacks do not block calls to the grain, I am not saying they do. I explained in those bullet points what is going on. The grain gets deactivated (e.g. by the runtime) and cannot process further requests/invocations (neither from outside nor from itself), the call to itself never gets fulfilled, the callback never completes, and thus OnDeactivateAsync never gets called. |
@tomachristian I haven't tried that specifically because it is different from the suggested workaround - there is no The call to |
@Rohansi it dead-locks with and without Task.Run, I invite you to try it. |
It is already reported here #7865, btw. |
Good to know. I actually stopped using Orleans because it was so prone to tricky deadlocks. Loved using it but these are some pretty critical issues that have been unresolved for too long. |
@Rohansi indeed. Unfortunately I am noticing this myself as well, more and more |
@tomachristian @Rohansi thank you for reporting. I've opened some PRs (#8950 (3.x: #8949, 7.x: #8951), #8953, and #8954) to fix the deadlock and improve timers in Orleans. |
PR for Non-reentrant timers: #8955 |
Dear @ReubenBond, apologies for coming back so late. Are you saying I should add |
@turowicz you should not have ConfigureAwait(false) in that code. If you still hit this with v3.7.2 or v8.2.0 (once it's released), please open a new issue and we can investigate together |
We have none of those calls. We are on |
Today, timer callbacks interleave with grain methods (at await points) for both reentrant and non-reentrant grains. This is a source of confusion because developers, naturally, do not expect interleaving in non-reentrant grains. We need to make timer callbacks respect reentrancy of grain classes and not interleave for non-reentrant grains.