Question: Removing grain from data store #4475
Comments
Hi again, Is //Jan |
Nah, It only cleared the state, it didn't remove the grain per se from storage. //Jan |
Behavior of |
Thanks Sergey, I'll check //Jan |
@jan-johansson-mr Currently the ADO.NET provider just clears the state, as you have seen. The reason is that removing potentially many rows from the index one by one causes trouble in busy systems, and clearing the state likely still saves space. One option is a special, user-initiated, in-storage scrubbing operation that batch-removes cleared rows and maybe fixes the index, since the user presumably knows best when this should be done. That said, the index should work well even when severely fragmented. A question: would it feel sensible to add a flag like in some other providers? Maybe with information about the downsides and potential options (the actual removal could perhaps be a batch removal). It seems we need documentation either way. |
To be honest, I don't know. It's very easy to find which rows have the cleared state (set to NULL) in the database, and perhaps an external 'clean up' might be needed now and then. But the amount of memory "leaked" by not removing the rows in the database is kind of negligible compared to the rows with populated state. But I think it's good to document clearing the state and what it implies :-) //Jan |
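The external clean-up mentioned above could look roughly like the following. This is a minimal sketch, not part of Orleans: it uses SQLite through Python's `sqlite3` module as a stand-in for the real database, and the table `grain_storage` with its columns `grain_id`, `payload`, and `version` are hypothetical names, not the actual ADO.NET provider schema. It deletes rows whose state is NULL in bounded batches, each in its own transaction, so that no single statement touches a large number of rows at once.

```python
import sqlite3


def purge_cleared_rows(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Delete rows whose payload is NULL, batch by batch; return the total deleted.

    Table and column names are hypothetical, chosen for illustration only.
    """
    total = 0
    while True:
        with conn:  # commits one transaction per batch
            cur = conn.execute(
                "DELETE FROM grain_storage WHERE rowid IN ("
                " SELECT rowid FROM grain_storage"
                " WHERE payload IS NULL LIMIT ?)",
                (batch_size,),
            )
        if cur.rowcount == 0:
            # Nothing left to purge.
            return total
        total += cur.rowcount
```

The sketch does not coordinate with Orleans in any way, so the race-condition concern raised in this thread still applies to it.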
HELP WANTED: Can someone provide me with some information about clearing the state and what it implies? |
Hi Jill, State management is kind of complicated. In my case I wanted to remove unused grains, to have a clean data set with no "leak" of old (unused) grains. Here is an illustration from my use case.

I'm using PostgreSQL as the store for grains. Whenever I activate a grain that has never been activated before, Orleans creates the grain for me; I don't create it explicitly. When I persist the grain state, a row is created in the database. The row is then updated with each successive persist.

At some point, the grain state is no longer needed from my business point of view. I then need to make sure that the grain is re-initialized if the same ID is ever used again, and I do not want the state to be "leaked" in the DB once it is no longer needed. The most natural way to do this is to delete the grain instance from the store (and thus the row in the database). But this is not supported outside of Azure (as pointed out earlier in this thread).

What I can do (with PostgreSQL) is clear the state. The state is simply a column in the database where the grain state resides. However, not deleting the rows leaves them "dangling" with cleared state (the column data in those rows is set to NULL). I could write a simple SQL statement to look up the rows with cleared state, but cleaning them up is less simple than it looks. Orleans owns the state management, and if I change the rows while Orleans is busy managing the same set, things can go wrong (a classic race condition).

So a simple resolution is this: knowing that in my case the number of "dangling" database rows will be far smaller than the number of grains with state, I simply don't care. But that is my case. In other cases the number of "dangling" grains can be much greater, and then it can become an issue.

Please let me know if you want more information from me, and I'll do my best. BR, Jan |
I'm removing the documentation tag from this, as we can't really document this for each provider. We need to have a uniform approach first, then we can document that approach. |
Okay, thanks Jason-Bragg. |
@jan-johansson-mr, Reopening this because this is definitely a limitation in our storage patterns that we should address. We have the work scheduled for the 2.1.0 release, though it's lower priority than other items in the release, so it may be pushed. Hopefully not. |
To be on the safe side here (/cc @JillHeaden, @jan-johansson-mr), the For ADO.NET both In addition the index would be fragmented also (but it doesn't look like being such a problem). So though Orleans owns the state, one could start a scrubbing operation in-database that'll purge the rows that are in This pattern is widely used in ADO.NET as soft delete, here adopted just for performance. I'm not sure, but maybe @JillHeaden can work on some notes at #1682 (comment). This might be something you're interested in too, @jan-johansson-mr. Another longer point of discussion is at #1682 (comment). @jason-bragg Mostly the providers seem to provide a flag to do a soft delete or a hard delete. The ADO.NET one doesn't provide the option, since people would likely just get into trouble with "singular deletes", but it could be added. :) One take-away here is that "batch delete" could be a useful thing, maybe even so that one could intercept the operation just prior to commencing and augment the list as seen fit. This also relates to ADO.NET documentation, so linking #4771. |
@jan-johansson-mr you wrote:
It depends on what your grains are storing, of course, but in the financial industry we have regulatory requirements to delete older data, and in our case older data is rarely in active use, so we find it's safe to trim rows based on the storage table's |
I can confirm the state does not need to be nulled before removing the rows. Nulling is a handy way of seeing which states have been "deleted" by Orleans. There can be application-specific knowledge about this too, such as seeing that the last operation happened more than some threshold time ago. If an active grain happens to interact with state that has been removed underneath it, this causes a versioning exception and eventually, in this case, Orleans recreates an initial state for the grain. |
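Why a removed row surfaces as a versioning error can be illustrated with the general optimistic-concurrency pattern. The following is a minimal sketch under stated assumptions, not Orleans code: SQLite via Python's `sqlite3` stands in for the real database, and `grain_storage`, `grain_id`, `payload`, `version`, `write_state`, and `VersionConflict` are all hypothetical names. A write succeeds only if the row still carries the version the writer last read; a row deleted in the meantime matches nothing, which looks the same as a version mismatch.

```python
import sqlite3


class VersionConflict(Exception):
    """The stored row no longer matches the version the writer expected."""


def write_state(conn: sqlite3.Connection, grain_id: str,
                payload: bytes, expected_version: int) -> int:
    """Update the row only if its version is unchanged; return the new version.

    All table and column names are hypothetical.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE grain_storage SET payload = ?, version = version + 1 "
            "WHERE grain_id = ? AND version = ?",
            (payload, grain_id, expected_version),
        )
    if cur.rowcount == 0:
        # The row was modified or deleted since it was read.
        raise VersionConflict(grain_id)
    return expected_version + 1
```

How a caller recovers from the conflict (for example by re-reading and starting from an initial state) is outside the scope of this sketch.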
Thanks for the information! Even though we may want data to be present "forever", because removing stuff can be harder than adding it (due to implied dependencies etc.), there are legal concerns about keeping sensitive data around. E.g. GDPR requires you to remove ALL sensitive information about individuals from the system unless there is a particular reason granting you an exemption (after some grace period - I think one month or so). In other words, there is at least one business case for removing grains (with sensitive data) from the store :-) There is, however, one way to handle this case without removing the grain per se, and that is to simply overwrite the sensitive data in the grain, e.g.
This strategy can actually be safer than removing from the store, since the removed data may not actually be erased, just have its references dropped (maybe some villain takes the hard disk and restores all the deleted data anyway at some point in the future). |
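The overwrite approach can be sketched as follows. This is a hypothetical illustration, not the example from the original comment and not Orleans code: `CustomerState`, its fields, and `SENSITIVE_FIELDS` are invented names. The idea is to blank the sensitive fields and then persist the state as usual, so the stored copy is overwritten rather than deleted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CustomerState:
    # Hypothetical grain state, for illustration only.
    customer_id: str
    name: str
    email: str
    order_count: int


# Fields treated as sensitive in this sketch (an assumption).
SENSITIVE_FIELDS = ("name", "email")


def redact(state: CustomerState) -> CustomerState:
    """Return a copy with every sensitive field overwritten by an empty string."""
    return replace(state, **{field: "" for field in SENSITIVE_FIELDS})
```

Non-sensitive fields such as `order_count` are left untouched, which is what keeps the grain usable after redaction.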
@jan-johansson-mr Maybe. You need to ensure the IDs aren't pseudonymous either, so they do not act as surrogates for the real person. Backups may also be a weak spot. If you want to keep the "blob semantics", you can also modify the script so that sensitive data goes to a schema, filegroup, and table of its own and is perhaps even encrypted. For that you need to add a table appropriately and add something like |
Thanks for contacting us. We believe that the question you've raised has been answered. If you still feel a need to continue the discussion, feel free to reopen the issue and add your comments. |
Hi,
What is the best way to remove a grain from the data store (when the grain is not needed anymore)? The documentation describes how to persist, read and write - but not how to remove a grain from the store.
Kind regards,
Jan