Start ReminderService initial load in the background #1520
Ping @fsimonazzi
Can you provide some background on the problem that motivated this change?
The goal is to reduce silo startup time when there are A LOT of reminders to load. This is independent of what's being discussed in #947 about separating into quantums: even assuming that all reminders fit in memory, it takes very long to scan the reminders table. Doing the initial load of the reminders in the background, after the silo is considered functional, helps reduce the downtime with no considerable side effects. The only minor side effect is potentially timing out on the call to
@@ -242,6 +238,28 @@ public void RangeChangeNotification(IRingRange old, IRingRange now, bool increas

        #region Internal implementation methods

        private async Task StartInBackground()
        {
            await ReadAndUpdateReminders();
If that throws, the rest of the code will not execute.
Just adding a try/finally might not be good either: you have not really started yet, and the reminders have not been read into memory for the first time, so if you start accepting register requests, it may lead to wrong behaviour.
Instead, I think you need to carefully make sure you can start the timer first and check all cases.
OK, in general sounds good, BUT:
Completely agree on the error handling for the first read. I actually did the work and rebased, but I guess I didn't force push and didn't realize I hadn't updated the PR. I'll push it tomorrow morning.
No, I don't agree. That was exactly my point. Init only checks and creates the table; it does not attempt to read. The first read may fail for many reasons, for example a format change. What is very unlikely is for the 2nd read to fail after the 1st succeeded, but the 1st read failing is not unlikely. So if you go with the semantics change (which I generally support), you may want to indicate that error more clearly, for example with a specific error message to the register requests if you decide not to serve them. The bigger problem is this: let's say there are no new registers and the first read failed. The reminders will just not tick, silently, without any visible indication, maybe only an error in the log. That is a big semantic change.
Thanks @gabikliot for the feedback. I added (an updated version of) the retry logic that would retry a number of times, and afterwards will stop the service but making the I hardcoded some details related to retrying (retry count, delay time, and max wait time for callers). We could add config knobs for them, but I wasn't sure if it's what we want, and also wanted to validate that you agree with the approach first.
@@ -400,7 +477,7 @@ private bool TryStopPreviousTimer(GrainReference grainRef, string reminderName)
         private async Task DoResponsibilitySanityCheck(GrainReference grainRef, string debugInfo)
         {
             if (status != ReminderServiceStatus.Started)
-                await startedTask.Task;
+                await startedTask.Task.WithTimeout(InitialLoadMaxWaitTimeForUpdates);
So callers will get this timeout if they call while the service is still trying to init.
Instead, it would be better to keep what you do now, await startedTask.Task.WithTimeout(InitialLoadMaxWaitTimeForUpdates), but then catch the timeout and rethrow it as "service is still trying to init".
But that could potentially make the grain be stuck for 10 minutes. I would prefer to bail early.
No, it would not.
As I wrote, you still keep the WithTimeout (await startedTask.WithTimeout(...)); you just rethrow an explicit exception instead of a Timeout exception.
Oh, I misunderstood. I thought you were suggesting that I wait longer, until no retries are left. Good, I will change the exception.
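For readers following the thread above, the agreed behaviour (bounded wait, then an explicit "still initializing" error rather than a bare timeout) can be sketched roughly as follows. This is only an illustration, not the code from the PR: `StartupGate` and `WaitUntilStarted` are hypothetical names, `InvalidOperationException` stands in for whatever exception type the PR ends up using, and `Task.WhenAny` with `Task.Delay` is used in place of the `WithTimeout` extension so the snippet is self-contained.

```csharp
using System;
using System.Threading.Tasks;

public static class StartupGate
{
    // Hypothetical helper (not the Orleans API): waits for `started` for at most
    // `maxWait`, then fails with an explicit message instead of a TimeoutException.
    public static async Task WaitUntilStarted(Task started, TimeSpan maxWait)
    {
        Task first = await Task.WhenAny(started, Task.Delay(maxWait));
        if (first != started)
        {
            // The caller is released after maxWait, not after all retries are exhausted.
            throw new InvalidOperationException(
                "The reminder service is still loading its initial reminders; retry later.");
        }

        // Observe the startup task so a fault or cancellation reaches the caller.
        await started;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var startedTask = new TaskCompletionSource<bool>();

        try
        {
            // The service has not finished starting, so this fails after ~50 ms.
            await StartupGate.WaitUntilStarted(startedTask.Task, TimeSpan.FromMilliseconds(50));
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }

        // Once the initial load completes, the same call returns immediately.
        startedTask.SetResult(true);
        await StartupGate.WaitUntilStarted(startedTask.Task, TimeSpan.FromMilliseconds(50));
        Console.WriteLine("started");
    }
}
```

The point of the design is that the wait stays bounded by the same timeout as before, so a grain is not blocked for the whole retry period; only the error the caller sees changes.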
@gabikliot, I addressed all the comments. I kept separate commits so that it is easier for you to review what changed from iteration to iteration. Let me know if it's good for merging and I'll squash and rebase on top of the current master.
When this one is ready, we need to merge it and include it in the 1.1.3 release.
OK, you can squash.
This way the service will not block the silo from fully initializing before all the reminders are loaded.
Avoid reading and starting all the reminders if the Silo is stopped before finishing the first load.
Give an explicit exception when trying to register a reminder and the service is still booting.
Retry forever, but fail fast to update reminders after a few retries.
91993c9 to d1286c3
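The commit message above describes a background load that keeps retrying while callers stop waiting after a few failures. A minimal sketch of that idea, under stated assumptions, might look like the following. `BackgroundLoader`, `RunAsync`, `EnsureStartedAsync`, and the parameter names are all hypothetical, the retry count and delays are arbitrary, and the real `ReminderService` differs in its details (logging, status tracking, timers).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration only; not the Orleans ReminderService.
public sealed class BackgroundLoader
{
    private readonly TaskCompletionSource<bool> started = new TaskCompletionSource<bool>();
    private volatile bool failFast;

    // Retries `load` until it succeeds or `stopping` is cancelled. After
    // `failFastAfter` failures, callers are rejected immediately instead of waiting.
    public async Task RunAsync(Func<Task> load, int failFastAfter, TimeSpan retryDelay,
                               CancellationToken stopping)
    {
        int failures = 0;
        while (!stopping.IsCancellationRequested)
        {
            try
            {
                await load();
                failFast = false;
                started.TrySetResult(true);
                return;
            }
            catch (Exception)
            {
                // A real service would log the failure here.
                failures++;
                if (failures >= failFastAfter) failFast = true;
            }

            try { await Task.Delay(retryDelay, stopping); }
            catch (OperationCanceledException) { break; }
        }

        // Stopped before the first load finished: nothing is read or started.
        started.TrySetCanceled();
    }

    // Called by operations (such as registering a reminder) that need the initial load.
    public async Task EnsureStartedAsync(TimeSpan maxWait)
    {
        if (started.Task.IsCompleted) { await started.Task; return; }
        if (failFast)
            throw new InvalidOperationException("Initial reminder load has failed repeatedly.");

        Task first = await Task.WhenAny(started.Task, Task.Delay(maxWait));
        if (first != started.Task)
            throw new InvalidOperationException("Reminder service is still booting.");
        await started.Task;
    }
}

public static class Program
{
    public static async Task Main()
    {
        int attempts = 0;
        Func<Task> flakyLoad = () =>
        {
            attempts++;
            if (attempts < 3) throw new InvalidOperationException("table scan failed");
            return Task.CompletedTask;
        };

        var loader = new BackgroundLoader();
        // Not awaited up front: startup continues while the load runs in the background.
        Task background = loader.RunAsync(flakyLoad, 2, TimeSpan.FromMilliseconds(10),
                                          CancellationToken.None);
        await loader.EnsureStartedAsync(TimeSpan.FromSeconds(5));
        await background;
        Console.WriteLine($"loaded after {attempts} attempts");
    }
}
```

Keeping the retry loop separate from the caller-facing check is what lets the load continue indefinitely while register requests still get a prompt, explicit error.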
Done, thanks @gabikliot. This was rebased and squashed, and it's ready to merge unless someone else has more feedback.
Thanks for introducing this change in the 1.1.3 release. We were experiencing a delay of nearly 40 seconds during silo startup due to the initial reminders load from Azure table storage. We've upgraded to the latest Orleans bits and now the general silo startup time has considerably decreased. Even with a lot of reminders, it is aligned with what we've seen in some tests that we performed in the past where we had no reminders at all in Azure table storage. Just wanted to share our findings.
@giglesias Thanks for confirming!
Awesome @giglesias, thanks for validating it.