Reminders scalability issues #947
Comments
That's the general direction in which we wanted to refactor the reminder service: to keep in memory only the fraction of the overall data that is relevant within the next, relatively small, window of time. |
Proposal
In order to improve the scalability of Orleans reminders, we should page them into memory based on when they are due, rather than loading up all (partitioned) reminders at silo startup. This work can effectively be sub-divided into 2 parts:

Reminder Service
Much like the existing implementation, the custom IReminderService will load reminders into memory, and trigger them on the appropriate grains when necessary. Unlike the existing implementation, it will load them based on the time period ('quantum') the original reminder was set for. The quantum length will be configurable, allowing reminders to be loaded in at a configurable granularity (e.g. only reminders that will fire in the next 5 minutes, or the next hour). For the most part, this can be handled by the existing GrainTimer. A timer can be set up to load reminders from the data store periodically, based on the configured quantum size. When the timer fires, reminders will be fetched from storage which match both the current time and the consistent hashing chunk the silo is responsible for.

Challenges
The most significant challenge with this approach is reminder recurrence. Reminders can be set to recur on a regular time period, like this:

RegisterOrUpdateReminder("reminderName", TimeSpan.Zero, TimeSpan.FromMinutes(5));

When all reminders are loaded into memory, the 'due-time' of a reminder can be calculated based on the recurrence interval and the first time the reminder was fired. However, when we're loading reminders into memory based only on their due time, we can't calculate this recurrence interval. We also can't store recurring reminders infinitely into the future (e.g. adding a database record for every 5 minutes, ad infinitum). Our proposal is to introduce a 'clean-up' task which will schedule subsequent reminders in the future. Reminders will be marked for clean-up, and periodically a task will run which will calculate the next time a reminder should run (based on the recurrence period). These reminders will then explicitly be scheduled in the future.
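To make the quantum and clean-up ideas above concrete, here is a minimal Python sketch (not Orleans code). It assumes quanta are counted from the Unix epoch and that a recurring reminder's next occurrence is the first multiple of its period strictly after the current time; the names `quantum_index` and `next_due` are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: quanta are counted from the Unix epoch (any fixed origin would do).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def quantum_index(due: datetime, quantum: timedelta) -> int:
    """Number of whole quanta between EPOCH and `due`."""
    return (due - EPOCH) // quantum


def next_due(first_due: datetime, period: timedelta, now: datetime) -> datetime:
    """Next occurrence strictly after `now` of a reminder that first
    fired at `first_due` and recurs every `period`."""
    if now < first_due:
        return first_due
    elapsed_periods = (now - first_due) // period
    return first_due + (elapsed_periods + 1) * period


if __name__ == "__main__":
    quantum = timedelta(minutes=5)
    first = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
    now = datetime(2020, 1, 1, 0, 12, tzinfo=timezone.utc)
    upcoming = next_due(first, timedelta(minutes=5), now)
    print(upcoming, quantum_index(upcoming, quantum))
```

A clean-up task along these lines would call `next_due` for each reminder marked for clean-up and store the result under the quantum returned by `quantum_index`.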
Custom Storage Providers
As with the existing implementation, the storage provider for reminders will be configurable. A new interface, 'IQuantumReminderTable', will be introduced:

public interface IQuantumReminderTable : IReminderTable
{
    Task<ReminderTableData> ReadRowsForQuantum(uint begin, uint end, DateTime now);
    bool IsInCurrentQuantum(ReminderEntry entry, DateTime utcNow);
    Task<IEnumerable<ReminderTableData>> GetPastReminders(uint begin, uint end, DateTime now);
}

These operations are required to support querying based on a specified time period, and cleaning up past reminders (as discussed above). We aim to provide storage providers reaching parity with the existing implementations in the Core Orleans project (Azure Table Storage, SQL, and a Mock implementation).

Challenges
The primary challenge we have encountered is with the Azure Table Storage provider. Specifically, it is not clear how to create a partitioning scheme which supports both querying by quantum (necessary for paging in reminders) and querying by grain (necessary for operations like deleting reminders) in a way which does not cause a full-table scan. It may be that we have to introduce 2 Azure tables, one of which will effectively act as an index for the other. The first table could be the same as the existing Reminders table, with the second providing a mapping from time/quantum to reminder ID (which can be queried in the other table). This potentially poses some issues with keeping the tables consistent, as transactions can only be performed on entities in the same table and partition. Thanks for reading! I'd appreciate any feedback. |
I agree that the storage mapping is the hardest problem here. |
Hi guys, is there an ETA for this one or any update that you can share? |
Apologies - I didn't see Gabi's response to this issue! From the perspective of the table schemas, I'd expect one of the tables to be identical to the current reminders table. An alternative implementation would be to use an Azure Queue in order to schedule the reminders (using the built-in visibility timeout mechanism). In terms of our progress/an ETA: whilst I started implementing this, our work has been re-prioritized slightly, so I'm not currently actively working on it. |
If the PK is the quantum, then in the situation where all your billion reminders have the same quantum (12 hours, let's say), you will put all reminders in that index table in the same PK, which would not scale. Did I understand it right? There might not be a large number of different quanta. We need to use a PK based on something that is naturally very well partitioned, like next tick time (assuming all reminders were not created at the same time), but I don't know how to build such an index without re-keying (reshuffling the reminder after every tick). |
Yeah, that's correct. The choice of quantum size is a trade-off: if you make it too big then you're paging lots of reminders into memory (effectively the issue with the current reminders system, which has a 'quantum' of infinite time). If you choose too small a quantum then no (or few) reminders will be in the same quantum, so you'll be doing lots of unnecessary queries to table storage. Ideally, you'd probably want around 1000 reminders per quantum, as that's the max you can retrieve in 1 table storage request. The appropriate quantum size is therefore application specific. If you set the quantum to be 1 tick (thereby partitioning by tick) it's fairly unlikely you'd have more than 1 reminder per partition. That'd mean you'd be doing 1 query per reminder, which is quite a large overhead. |
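The trade-off above can be put into rough numbers. The following Python sketch assumes, purely for illustration, that due times are spread uniformly over some horizon (for example, one day); under that assumption it picks the quantum length that yields about 1000 reminders per quantum, the figure mentioned in the comment. The function names are hypothetical.

```python
from datetime import timedelta

# Figure quoted in the thread: maximum rows returned by one table storage request.
MAX_ROWS_PER_REQUEST = 1000


def quantum_for_target(total_reminders: int, horizon: timedelta,
                       target_per_quantum: int = MAX_ROWS_PER_REQUEST) -> timedelta:
    """Quantum length giving ~target_per_quantum reminders per quantum,
    assuming due times are uniformly spread across `horizon`."""
    if total_reminders <= 0:
        raise ValueError("total_reminders must be positive")
    if total_reminders <= target_per_quantum:
        return horizon  # everything fits in a single quantum
    return horizon * target_per_quantum / total_reminders


def queries_per_horizon(horizon: timedelta, quantum: timedelta) -> int:
    """Number of quantum reads needed to cover the horizon (ceiling division)."""
    return -(-horizon // quantum)


if __name__ == "__main__":
    day = timedelta(hours=24)
    q = quantum_for_target(1_000_000, day)
    print(q.total_seconds(), queries_per_horizon(day, q))
```

With one million reminders spread over a day, this estimate gives a quantum of roughly a minute and a half; real workloads with clustered due times would need a smaller quantum or multiple reads per quantum.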
OK, now I see! But you referred to something very different. You referred to an internal runtime tuning parameter, which controls how reminders are grouped in the table and how much is read together in one read. It is basically the "index shard/partition". I get it now. I think the most important point that you did not mention is that in your solution the index is "dynamic" and changes based on reminder ticks (and not only when new reminders are created/deleted): it needs to be updated after every reminder tick. Once a reminder ticks, it needs to be reinserted into the index table in a different place. That is, the index table is organized in chronological order of next time to tick, just rounded into buckets/shards/partitions: the next partition to page in. Did I now understand it correctly? |
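The "dynamic index" described above can be illustrated with a small in-memory Python model. This is only a sketch of the idea, not the proposed storage layout: the class name `QuantumIndex` and its methods are hypothetical, times are plain integer seconds, and two dictionaries stand in for the reminders table and the index table.

```python
from collections import defaultdict


class QuantumIndex:
    """Toy model: a reminders 'table' plus an index keyed by quantum number."""

    def __init__(self, quantum_seconds: int):
        self.quantum_seconds = quantum_seconds
        self.reminders = {}                  # reminder id -> (next_due, period)
        self.by_quantum = defaultdict(set)   # quantum number -> reminder ids

    def _quantum(self, due: int) -> int:
        return due // self.quantum_seconds

    def register(self, reminder_id: str, first_due: int, period: int) -> None:
        self.reminders[reminder_id] = (first_due, period)
        self.by_quantum[self._quantum(first_due)].add(reminder_id)

    def page_in(self, now: int) -> list:
        """Reminder ids whose next tick falls in the quantum containing `now`."""
        return sorted(self.by_quantum.get(self._quantum(now), ()))

    def tick(self, reminder_id: str) -> int:
        """Fire a reminder and re-key it: one index delete plus one index insert."""
        due, period = self.reminders[reminder_id]
        self.by_quantum[self._quantum(due)].discard(reminder_id)
        next_due = due + period
        self.reminders[reminder_id] = (next_due, period)
        self.by_quantum[self._quantum(next_due)].add(reminder_id)
        return next_due


if __name__ == "__main__":
    index = QuantumIndex(quantum_seconds=300)
    index.register("a", first_due=10, period=600)
    print(index.page_in(0), index.tick("a"), index.page_in(600))
```

The `tick` method makes the cost visible: every firing moves the reminder to a different index partition, which in a real store means writes on each tick.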
@MSFT-AshleyIngram any feedback? Did I now understand it correctly? |
Yeah that seems correct. There are effectively 2 challenges - the first is keeping the 2 tables in sync, especially given we will have to update the index after a reminder triggers/ticks in order to handle recurring reminders. The second is what to do if the reminders table needs to be re-partitioned (e.g. if the quantum size changes). |
If I understand correctly, for OrleansRemindersTable in relational storage this essentially means calculating using |
@MSFT-AshleyIngram, why would the quantum ever change? Also, the index table is constantly being changed: after every reminder tick the reminder is removed from the index and re-inserted into a new place. Correct? So this is an opportunity to re-partition the table, if one wants to. But why would one want to? @veikkoeeva, indeed with relational storage this will be much easier and more natural. |
I'm concerned about a design where every reminder tick would cause a write (potentially multiple writes) to the index table. That would make it not only more expensive to run, but also more brittle in case of storage outages/throttling. The original design we discussed more than a year ago didn't require writes upon ticks by imposing certain limitations on reminder periods and accuracy. |
Here's a copy of the original proposal.

Constraints & Assumptions
In the interest of time, there are things not considered in this prototype which are assumed to be solvable in a future iteration (terms are defined later in this document):
• Caching policy for reminder bucket agents to keep more frequent reminders in memory.
A menu of reminder periods will be supported:
• Smallest supported period is 5 minutes.

Semantics & Topology
Reminders will be divided between buckets, each bucket containing reminders of a given period.
• The number of buckets is predetermined and finite because the number of periods supported is predetermined and finite.
Each bucket will be managed by a reminder bucket agent, a grain that reads upcoming reminders from the database and triggers them.
• One reminder bucket agent exists per bucket per silo.

Quantum Affinity Space
The quantum affinity space (QAS) is defined as the number of quanta needed to equal the length of the period:
QAS(rp, q) = rp / q
where:
The QAS cannot be fractional, so the quantum must be a factor of all available bucket periods.

Quantum Affinity
An additional value called the quantum affinity must be calculated for each reminder.
QA(gi, rn, qas) = UD(gr ++ rn) % qas
where:
• QA is a function that calculates the quantum affinity.

WRT Azure Table Storage
The quantum affinity will be stored in a new column to make the information accessible.
• used by a specified service id.
Azure Table performs best when it doesn't have to do a full table scan. Therefore, it's best if the partition key contains information that is constructed in such a way that most queries result in range scans rather than full table scans. The following fields will be concatenated in order to create a partition key:
Fields 1-3 are queried by identity, not as a range. Field 4, however, is queried using a range. Placing this last in the key means that if we know fields 1, 2, and 3, we can query ranges of hash values because the partition keys are sorted according to hash value and bucketed according to the other values. For example, the following sorted partition keys are separated by implicit buckets:
// BUCKET__CASH
Serviceid0_category0_quantum0__00000000
Serviceid0_category0_quantum1__00000002
Serviceid0_category1_quantum0__00000000
Serviceid0_category1_quantum1__00000004
Serviceid1_category0_quantum0__00000000
Within each implicit bucket (e.g. Serviceid0_category0_quantum1), consistent hashes can be queried by range (0-1, 2-5, etc.).

Algorithm Steps
Activation
When a reminder bucket agent is activated, it:
Timer Tick
When a reminder agent's quantum timer ticks, it:
QAI_{n+1} = (QAI_n + 1) % QAS
where:
a. Construct the partition key that the agent is interested in by combining the grain reference descriptor, bucket, and QAI_{n+1}.
a. Reminders may be triggered in parallel, if it works out in practice.

Hand Waving
For simplicity, creation and deletion of reminders are handled through direct writes to the database. This is, in fact, the only time writes need to be made to the database. We'll need to change this strategy once we wish to support either caching or cancellation guarantees sooner than one quantum's worth of time. |
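The formulas in the proposal above can be sketched in Python as follows. Several details are assumptions, because the definitions after "where:" are missing from this copy: `rp` is taken to be the reminder period, `q` the quantum, `UD` a uniformly distributed hash (SHA-256 is used here as a stand-in), and the partition key layout is inferred from the sample keys (`Serviceid0_category0_quantum1__00000002`), with the hash rendered as 8 zero-padded hex digits. All function names are hypothetical.

```python
import hashlib
from datetime import timedelta


def quantum_affinity_space(reminder_period: timedelta, quantum: timedelta) -> int:
    """QAS(rp, q) = rp / q; the quantum must divide the period exactly."""
    if reminder_period % quantum != timedelta(0):
        raise ValueError("quantum must be a factor of the reminder period")
    return reminder_period // quantum


def uniform_hash(text: str) -> int:
    """Stand-in for UD: an (assumed) uniformly distributed hash of a string."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def quantum_affinity(grain_ref: str, reminder_name: str, qas: int) -> int:
    """QA = UD(grain reference ++ reminder name) % qas."""
    return uniform_hash(grain_ref + reminder_name) % qas


def next_affinity_index(current: int, qas: int) -> int:
    """QAI_{n+1} = (QAI_n + 1) % QAS."""
    return (current + 1) % qas


def partition_key(service_id: str, bucket: str, quantum: str, consistent_hash: int) -> str:
    """Assumed layout: three identity fields, then the range-queried hash last."""
    return f"{service_id}_{bucket}_{quantum}__{consistent_hash:08X}"


if __name__ == "__main__":
    qas = quantum_affinity_space(timedelta(hours=24), timedelta(minutes=5))
    print(qas, quantum_affinity("grain-1", "cleanup", qas))
    print(partition_key("Serviceid0", "category0", "quantum1", 2))
```

Because a reminder's affinity depends only on its identity and the QAS, an agent that advances its affinity index once per quantum would visit each reminder exactly once per period without any writes on tick.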
I agree that writing to table storage every time we trigger a reminder is problematic. We originally thought of this to resolve the issue of recurrent reminders. If my understanding of your proposal is correct Sergey, this would be roughly analogous to giving a reminder a label based on (say) the number of the hour in a week it occurs (from 1 to 168) and using that as a Partition Key. Assuming that a reminder only recurs once a week, it never needs to be updated (unless it is deleted/edited). If a reminder occurred more often than once a week, it would require duplicate entries in the index (say, if you wanted it to occur at 3am every day, you'd need entries for 3, 27, 51, etc). If a reminder occurred less often than once a week (say every fortnight) you'd be unable to represent it within this system. If your reminder happened more frequently than once an hour, you'd be unable to represent it either (at least in table storage) as the PK and RK would have to be unique. The 2 values (1 hour and 1 week) would therefore have to be configurable in order to ensure you can accurately represent the full range of reminders you need within your application. Is this correct? |
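The hour-of-week labelling described above can be checked with a short Python sketch. It assumes slots are numbered 1 to 168 and, as an additional simplification not stated in the thread, that a reminder's period must divide the window evenly; the function name `index_slots` is hypothetical.

```python
def index_slots(first_slot: int, period_slots: int, window_slots: int = 168) -> list:
    """Slots (1-based) in which a recurring reminder needs an index entry.

    first_slot   -- slot of the first occurrence (e.g. 3 for 3am on day one)
    period_slots -- recurrence period expressed in whole slots
    window_slots -- number of slots in the repeating window (168 hours = 1 week)
    """
    if not 1 <= first_slot <= window_slots:
        raise ValueError("first_slot is outside the window")
    if period_slots < 1 or period_slots > window_slots:
        raise ValueError("period cannot be represented in this window")
    if window_slots % period_slots != 0:
        # Assumption for this sketch: only periods that divide the window are allowed.
        raise ValueError("period must divide the window evenly")
    occurrences = window_slots // period_slots
    return sorted(((first_slot - 1 + k * period_slots) % window_slots) + 1
                  for k in range(occurrences))


if __name__ == "__main__":
    print(index_slots(3, 24))    # daily at 3am
    print(index_slots(10, 168))  # weekly
```

A daily 3am reminder produces the entries 3, 27, 51 and so on, as in the comment, while a fortnightly period (336 slots) is rejected because it does not fit in a one-week window.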
First of all, this is not my proposal. It was produced by a member of the team at the time, Michael. My interpretation of it is that it simply breaks a day into 5-minute buckets, and a reminder can be scheduled only within those buckets. That's why it says:
So a reminder that you need to do something once a week would need to be scheduled to fire every day, and then the grain may decide to do nothing 6 out of 7 times. |
I agree that this is better than our original proposal. It's much better than having to shuffle reminders around to handle recurrence. Is there any reason you wouldn't make those 2 periods (5 mins and 24 hrs) configurable, with some sensible defaults? That way, as in my example above, you could change it to 1 hour and 1 week if you wanted to schedule longer-distance reminders with less granularity (or whatever trade-off makes sense for that application). This seems to me to be slightly better than having the grain itself handle skipping days (which seems pretty similar to the scheduling that ReminderService handles). |
It's an interesting question. I think there will have to be an obvious shared starting point. For 24 hours it's easy: midnight UTC. If it's a week instead of 24 hours, midnight on Monday of some fixed year? A month is already a problem, with unequal numbers of days in them. But what if you say it's 4 days: when would we start each 4-day period? What would be the 'anchor' point? I guess we could choose an arbitrary day in the past (1/1/2000) and start counting days from that date. So I don't see any sensible settings other than 24 hours (or factors of it), or a week, or N days. My hunch is that the 24-hour limit would be the least confusing, but I see how N days could work. |
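The anchor idea above can be sketched in Python. The anchor of 1 January 2000 (taken here as midnight UTC) comes from the comment; the 5-minute slot size and the function names are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Arbitrary shared anchor suggested in the thread (assumed to be midnight UTC).
ANCHOR = datetime(2000, 1, 1, tzinfo=timezone.utc)


def period_start(now: datetime, period_days: int) -> datetime:
    """Start of the N-day period containing `now`, counted from ANCHOR."""
    period = timedelta(days=period_days)
    whole_periods = (now - ANCHOR) // period
    return ANCHOR + whole_periods * period


def slot_in_period(now: datetime, period_days: int,
                   quantum: timedelta = timedelta(minutes=5)) -> int:
    """Zero-based index of the quantum-sized slot containing `now` within its period."""
    return (now - period_start(now, period_days)) // quantum


if __name__ == "__main__":
    now = datetime(2000, 1, 6, 12, 0, tzinfo=timezone.utc)
    print(period_start(now, 4), slot_in_period(now, 4))
    print(period_start(now, 1), slot_in_period(now, 1))
```

With a fixed anchor, every silo computes the same period boundaries for any N, so a 4-day window is no harder to agree on than a 24-hour one.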
Yeah, I think having the "upper bound" in terms of a "number of days" would make most sense. For the "lower bound" I think minutes would be suitable (I think there are many use-cases for a reminder which triggers on a higher frequency than 1m). |
I'd like to add a point regarding the "lower bound" of the reminder period: currently the reminder service supports a minimum interval of 1 minute. This should remain for backward compatibility, as part of a customizable reminder service if not the default one. I think SOLID is important here: the current implementation is suitable for a 'small' number of 'high frequency' reminders and could support intervals of less than 1 minute; its 1-minute lower bound is artificial and may be removed. The suggestion above, which can help with scaling to millions of reminders, should in my mind be added as a separate reminder service, in open-closed fashion. |
Persistent reminders are a great tool in the Orleans tool belt. They can also be very useful to keep specific grains alive across silos, without requiring a direct invocation for reactivation (in my case I need it for maintenance clean-up logic after a period of time). |
No worries, it won't get lost. It's in the backlog, and contributions are more than welcome. :-) |
To add to this. We have roughly 20-30k of a specific type of grain that all fire off the RegisterOrUpdateReminder method in the activation of the grain. We've just implemented this and are seeing some degraded performance across our nodes. Response rates have gone up and our database is showing that the Update call on the clustered index in the reminder table is the most cpu intense transaction being made for a good 5 minutes after startup - I've briefly read some of the suggestions and it seems it may help our situation? |
@NoMercy82 Just to be clear on this, are you using ADO.NET for the reminders? If so, they're almost as slow as possible currently. Adding indexes will help, but the layout and a bit of the query would need to be redone. There's a lot of room for improvement when it comes to ADO.NET. |
@veikkoeeva - we're using ADO.NET also for the reminders. Until it will be redone, what indexes would you suggest adding? Thanks in advance |
@shlomiw Apologies for the delay. I'm not sure what would be the most useful index to create without testing. If you look at https://github.com/dotnet/orleans/blob/master/src/AdoNet/Orleans.Reminders.AdoNet/SQLServer-Reminders.sql#L102, the range queries are most often used (I assume), so maybe
You can also test these by capturing (or creating) a query, copying the table with its contents, and then running queries and checking the query plan characteristics. There are some more options; more information at https://gitter.im/dotnet/orleans?at=5ae8c72797e5506e048ca6cc. |
@veikkoeeva I looked at the reminder SQL scripts and don't understand why null parameter values are checked in the query and not in the code. If one of the parameters is null, why not just skip running the query?
|
@ifle Good question. Until recently everything was in one script, so the rationale was more clearly presented. If you look at https://github.com/dotnet/orleans/blob/master/src/AdoNet/Shared/SQLServer-Main.sql, you see the idea is that the database boundary is like an interface, and Orleans sends in data in a certain format along with a script to run it. I.e. the names and types matter. From that perspective it is a sanity check: defensive programming that also documents the intention. It shouldn't hurt performance, adds a bit of robustness and, maybe a bit opinionatedly, doesn't remove checks "because we can, knowing that somewhere else there is code that checks the invariant and that no change will ever expose a bug with potentially serious ramifications because of that". Just in case you, or someone else reading this, wonder why this is like it is: it matters to be able to change the queries or the layout in deployment-specific ways, either statically or dynamically, such as filegroups or schemas, or even to introduce dynamic partitioning of tables (e.g. by creating new tables on the fly and starting to use them). Additionally, not all features are available on all vendors or versions (e.g. in-memory tables with natively compiled procedures might make sense, and one might even add them to this default implementation via a flag if available, or otherwise adjust on one's own). Though this is documented elsewhere. |
@veikkoeeva - thank you very much for the lead. I'll keep track and monitor. Will add the indices as needed. |
@shlomiw Did you add the index, by the way? If so, what was the effect? |
@veikkoeeva - since my relevant grains have short life timespan, the reminders are being cleaned-up, and the table is being kept small. When we'll have more traffic coming in, it might grow enough to add the index. Anyway, I think it's important to add all the 'hot-spot' indices to the sql script, it can make a big performance difference. |
@shlomiw True. Let's do that unless a bigger refactoring takes place. :) |
We currently have a silo running on version 2.3.4 which uses ADO.NET reminders, and we have about 53,000 reminders which tick every minute or less. (It is set up as a cluster of 4 silos.) All worked well until we updated to Orleans 2.4.3, at which point the performance of the MariaDB server where reminders are persisted degraded badly. CPU and memory usage are maxing out, and latency on the grains where ReceiveReminder runs has increased from 500ms/req to approximately 20,000ms/req. We've tried several things to investigate the issue, such as deploying only the csproj files with the updated NuGet packages. We've turned on in-memory reminders instead of ADO.NET, and the silo works fine. We've tried the previous version, 2.4.2, but no luck. We've tried updating all Orleans DLLs to 2.4.3 except Microsoft.Orleans.Reminders.AdoNet, still no luck. We've updated MySql.Data to the latest, 8.0.18, yet we still get these performance issues. We've tried adding the index on the orleansreminderstable, which made the CPU go down slightly, but we still get latency on grains, which has a ripple effect of everything then failing. Can you kindly shed some light on this please? |
@claycass17 This sounds unrelated to the limitation that this issue is tracking. Can you open a separate issue and share logs? Between 2.3.4 and 2.4.3, I don't see any change that could have impacted behavior of reminders. So, my hunch is the problem is somewhere else. Logs might help to find that out. |
@claycass17 Can you share which kind of index you added? Do you have the opportunity to switch back to version 2.3.4 to double-check that there is this dramatic difference between 2.3.4 and 2.4.2 & 2.4.3? Can you say whether the call frequency to the database changed with the initial version switch? Or what is the frequency of reminder calls now? |
I can easily switch from 2.4.3 to 2.3.4 and the environment goes back to normal immediately on 2.3.4. As for query frequency, when tested locally using UseLocalhostClustering having about 1000 reminders I get the same amount of hits (29 hits) on both versions SELECT However on a dev environment having 53,000 reminders and using UseConsulClustering (with version 2.4.3) we get thousands of hits of the same query (having different GrainHash filters). |
@claycass17 Sorry if I have missed this, but how do you host your deployment? E.g. Kubernetes, Linux/Windows, something else? How easy or difficult would it be to create a project that reproduces the problem? Using https://github.com/dotnet/orleans/tree/master/Samples/OneBoxDeployment might be one option: you just add reminder grains and add a test or an API function that creates a lot of grains artificially (like the one about state), so that one can observe the case under the debugger. I'm unfortunately very blocked until the second week of December (one has to sleep sometimes), but my thinking here is that if this case can be replicated easily like this, maybe someone, maybe the core team, will have an easier time troubleshooting. I have a .NET Core 3 update in progress at https://github.com/veikkoeeva/orleans/tree/update-oneboxdeployment-to-core3 if it matters, but I'm not sure when I'll get that done: December at the latest, or maybe earlier if I get a few other, very minor things solved. |
The index we added was simply on the GrainHash (create index orleansreminderstable_GrainHash_index on orleansreminderstable (GrainHash);) knowing ServiceId is part of a composite key. However the issue here is not the performance of the select, but more the frequency. We've captured the frequency of the SELECT quoted above in previous post on both versions in a 40min time span and these are the results..drum roll.. Orleans v2.3.4 = 2,144 hits, v2.4.3 = 251,133 hits on approx 53,000 reminders. |
Digging deeper into this, we've noticed that the 'Added Server..' log is being printed more than 13,000 times in 40 min. The log comes from AddServer in ConsistentRingProvider when it's called from SiloStatusChangeNotification, which in turn is called from NotifyObservers in SiloStatusListenerManager, from ProcessMembershipUpdates. Eventually, deep down, these calls end up running the Orleans query with key ReadRangeRows2Key in large amounts. Our membership is handled via Consul. What could be changing the state of our silos? This issue is visible on version 2.4.3 but not on 2.3.4. |
Apologies for cluttering this post. A new issue has been reported here #6089 |
Orleans currently loads the entire reminders table into memory on Silo startup, partitioning reminders across available silos in the cluster, regardless of the time that reminders are set for.
This causes a limitation on the number of reminders that can be set at any given time, because of memory constraints across the cluster.
We propose paging reminders, loading only the reminders which will trigger within a configurable time period in the future.