
Question: Removing grain from data store #4475

Closed
jan-johansson-mr opened this issue Apr 16, 2018 · 17 comments
Labels
✔️ Resolution: Answered Resolved because the question asked by the original author has been answered. question Status: Resolved
Milestone

Comments

@jan-johansson-mr
Contributor

Hi,

What is the best way to remove a grain from the data store (when the grain is no longer needed)? The documentation describes how to persist, read, and write state, but not how to remove it from the store.

Kind regards,
Jan

@jan-johansson-mr
Contributor Author

Hi again,

Is ClearStateAsync the operation I'm looking for? Going to test... :-)

//Jan

@jan-johansson-mr
Contributor Author

Nah,

It only cleared the state; it didn't remove the grain itself from storage.

//Jan

@sergeybykov
Contributor

The behavior of ClearStateAsync is provider specific. For example, AzureTableGrainStorage has a bool property DeleteStateOnClear in AzureTableStorageOptions to control whether clearing also deletes the underlying record.

@sergeybykov sergeybykov added this to the Triage milestone Apr 16, 2018
@jan-johansson-mr
Contributor Author

Thanks Sergey,

I'll check DeleteStateOnClear and see what it does. I'm using PostgreSQL.

//Jan

@veikkoeeva
Contributor

@jan-johansson-mr Currently the ADO.NET provider just clears the state, as you have seen. The reason is that removing potentially many rows from the index one by one would cause trouble in busy systems, and clearing the state saves space (likely). One option is a special, in-storage, user-initiated scrubbing operation that batch-removes cleared rows and perhaps rebuilds the index, run when the operator presumably knows best that it should be done. Though the index should work well even when severely fragmented.

A question: would it seem sensible to add a flag like some other providers have? Maybe with information about the downsides and potential options (the actual removal could perhaps be a batch removal). It seems we need documentation either way.

@jan-johansson-mr
Contributor Author

To be honest, I don't know.

It's very easy to find which rows have cleared state (set to NULL) in the database, and perhaps an external "clean up" might be needed now and then. But the amount of storage "leaked" by not removing those rows is fairly negligible compared to the rows with populated state.
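The look-up-and-delete idea described above can be sketched as follows. This is a hypothetical illustration using SQLite in place of PostgreSQL; the table and column names (GrainState, Payload, ModifiedOn) are invented for the example and are not the ADO.NET provider's actual schema, so check your deployed SQL script for the real names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE GrainState (
        GrainTypeHash INTEGER NOT NULL,
        GrainIdHash   INTEGER NOT NULL,
        Payload       BLOB,             -- NULL after the state is cleared
        ModifiedOn    TEXT NOT NULL
    )""")
con.executemany(
    "INSERT INTO GrainState VALUES (?, ?, ?, datetime('now'))",
    [(1, 10, b"live-state"), (1, 11, None), (2, 12, None)])

# A "dangling" row is simply one whose payload column is NULL.
dangling = con.execute(
    "SELECT COUNT(*) FROM GrainState WHERE Payload IS NULL").fetchone()[0]
print(dangling)  # 2

# An occasional external clean-up job could then remove them.
con.execute("DELETE FROM GrainState WHERE Payload IS NULL")
```

As the thread goes on to discuss, running such a delete while Orleans is actively managing the same rows is where the race-condition concerns come in.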

But I think it's good to document clearing the state and what it implies :-)

//Jan

@JillHeaden
Contributor

HELP WANTED: Can someone provide me with some information about clearing the state and what it implies?
(Also, can someone please add the Help Wanted label to this issue? @jason-bragg)

@jan-johansson-mr
Contributor Author

Hi Jill,

State management is kind of complicated. In my case I wanted to remove unused grains, to have a clean data set with no "leak" of old (unused) grains.

Here is an illustration from my use case. I'm using PostgreSQL as the store for grain state. Whenever I activate a grain that has never been activated before, Orleans creates the grain for me; I don't create it explicitly. When I persist the grain state, a row is created in the database, and that row is then updated with each successive persist.

At some point, the grain state is no longer needed from my business point of view. So I need to make sure the grain gets re-initialized if the same ID is ever used again, and I don't want the state to be "leaked" in the database once it's no longer needed. The most natural way to do this is to delete the grain state from the store (and thus the row in the database). But this is not supported outside of Azure (as pointed out earlier in this thread).

What I can do (with PostgreSQL) is clear the state. The state is simply a column in the database row where the grain state resides. However, not deleting the rows leaves them "dangling" with cleared state (the state column is set to NULL).

I could write a simple SQL statement to look up the rows with cleared state and delete them. But doing that safely is not simple: Orleans owns the state management, and if I change the rows while Orleans is busy managing the same set, things can go wrong (a classic race condition).

So a simple resolution, in my case, is to simply not care: I know the number of "dangling" database rows will be far smaller than the number of grains with state. In other cases the number of "dangling" grains can be much greater, and then it can become an issue.

Please let me know if you want more information from me, and I'll do my best.

BR, Jan

@jason-bragg
Contributor

I'm removing the documentation tag from this, as we can't really document this for each provider. We need to have a uniform approach first, then we can document that approach.

@jason-bragg jason-bragg added the P3 label Aug 2, 2018
@jan-johansson-mr
Contributor Author

Okay, thanks @jason-bragg.

@jason-bragg
Contributor

jason-bragg commented Aug 3, 2018

@jan-johansson-mr, Reopening this because this is definitely a limitation in our storage patterns we should address.

We have the work scheduled for the 2.1.0 release, though it's lower priority than other items in the release, so it may be pushed. Hopefully not.

@veikkoeeva
Contributor

veikkoeeva commented Aug 4, 2018

To be on the safe side here (/cc @JillHeaden, @jan-johansson-mr), the "dangling" is indeed in quotes, because the storage space is reclaimed but the ID information is left behind. It can also happen that, some time after a grain's state has been removed, a new grain with the same cluster ID, grain type, and grain ID is created; in that case it's one very fast UPDATE instead of an INSERT.

For ADO.NET, both INSERT and DELETE are heavy operations, and DELETE in particular would make deadlocking more prevalent. Currently the design creates a heap index. The linked article makes a good argument for using this construct for an insert-heavy table. Especially in the case of Orleans, it wouldn't be a good move to use database-generated sequential IDs. With the current construct, even severe index fragmentation shouldn't be much of a concern (even billions of IDs in the index will fit in memory, and even if not, warm index data should).

In addition, the index would also become fragmented (but that doesn't look like much of a problem). So even though Orleans owns the state, one could run a scrubbing operation in the database that purges the rows whose state is NULL. If done in a transaction, the case where a state gets "resurrected" shouldn't cause logically erroneous behavior. If one deletes states in multiple transactional batches, some percentage at a time, then maybe there's a window for an error (I'm not sure, maybe not).
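The batched, transactional scrub described above could look roughly like the sketch below. SQLite stands in for the real database, and the GrainState table, Payload column, and batch size are all invented for the example; the point is only that each batch is its own short transaction, so locks are held briefly and other writers can slot in between batches.

```python
import sqlite3

def scrub_cleared_rows(con, batch_size=2):
    """Remove NULL-payload rows in short transactional batches."""
    total = 0
    while True:
        with con:  # each batch commits (or rolls back) on its own
            cur = con.execute(
                "DELETE FROM GrainState WHERE rowid IN ("
                " SELECT rowid FROM GrainState WHERE Payload IS NULL LIMIT ?)",
                (batch_size,))
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE GrainState (GrainId INTEGER, Payload BLOB)")
con.executemany("INSERT INTO GrainState VALUES (?, ?)",
                [(i, None if i % 2 else b"live") for i in range(10)])
removed = scrub_cleared_rows(con)
print(removed)  # 5
```

A production version against PostgreSQL would use the provider's real schema and likely a larger batch size tuned to observed lock contention.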

This pattern is widely used in ADO.NET as soft delete, here adopted just for performance. I'm not sure, but maybe @JillHeaden can work on some notes at #1682 (comment). This might be something you're interested in too, @jan-johansson-mr. Another longer point of discussion is at #1682 (comment).

@jason-bragg Mostly the providers seem to provide a flag to choose between a soft delete and a hard delete. The ADO.NET one doesn't provide the option, since people would likely just get into trouble with "singular deletes", but it could be added. :) One take-away here is that a "batch delete" could be a useful thing, maybe even such that one could intercept the operation just before it commences and augment the list as seen fit.

This also relates to ADO.NET documentation, so linking #4771.

@sergeybykov sergeybykov modified the milestones: 2.1.0, 2.2.0 Aug 20, 2018
@sergeybykov sergeybykov modified the milestones: 2.2.0, 2.3.0 Dec 12, 2018
@sergeybykov sergeybykov removed this from the 2.3.0 milestone Feb 25, 2019
@sergeybykov sergeybykov added this to the 2.4.0 milestone Feb 25, 2019
@sergeybykov sergeybykov modified the milestones: 2.4.0, 3.0.0 Aug 15, 2019
@sergeybykov sergeybykov modified the milestones: 3.0.0, Backlog Oct 1, 2019
@MV10

MV10 commented Oct 26, 2019

@jan-johansson-mr you wrote:

I could write a simple SQL statement, looking up the rows with cleared state. But this is not a simple operation to do. Orleans owns the state management, and if I change the rows while Orleans is busy managing the same set, things can go wrong (classical race condition case).

It depends on what your grains are storing, of course, but in the financial industry we have regulatory requirements to delete older data. In our case older data is rarely in active use, so we find it safe to trim rows based on the storage table's ModifiedDate column: a daily job trims data more than 90 days old. We don't even need to null the payload first.
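A daily trim job of the kind described above might be sketched as follows. This is a hypothetical illustration with SQLite as a stand-in; GrainState and ModifiedOn are invented names (MV10's table uses a ModifiedDate column), and the 90-day cutoff is just the retention period from the comment.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE GrainState (
    GrainId INTEGER, Payload BLOB, ModifiedOn TEXT NOT NULL)""")
con.executemany(
    "INSERT INTO GrainState VALUES (?, ?, datetime('now', ?))",
    [(1, b"old", "-120 days"), (2, b"old", "-91 days"), (3, b"new", "-5 days")])

# Daily job: trim anything not modified in 90 days, nulled payload or not.
cur = con.execute(
    "DELETE FROM GrainState WHERE ModifiedOn < datetime('now', '-90 days')")
print(cur.rowcount)  # 2
```

Trimming by modification time sidesteps the race condition concern raised earlier, on the assumption (as in MV10's case) that data past the cutoff is no longer being touched by active grains.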

@veikkoeeva
Contributor

veikkoeeva commented Oct 27, 2019

I can confirm the state does not need to be nulled before removing the rows. Nulling is simply a handy way of seeing which states have been "deleted" by Orleans. There can be application-specific knowledge about this too, such as the last operation being longer ago than some threshold.

If there happens to be an active grain that tries to interact with state that has been removed underneath it, it causes a versioning exception, and eventually Orleans recreates an initial state for the grain.

@jan-johansson-mr
Contributor Author

jan-johansson-mr commented Oct 30, 2019

Thanks for the information!

Even though we may want data to be present "forever" (removing stuff can be harder than adding it, due to implied dependencies etc.), there are legal concerns with keeping sensitive data around. E.g. GDPR requires you to remove ALL sensitive information about individuals in the system if there is no particular reason granting you an exemption (after some grace period; I think one month or so).

In other words, there is at least one business case to remove grains (with sensitive data) from the store :-)

There is, however, one way to handle this case without removing the grain per se, and that is to simply overwrite the sensitive data in the grain, e.g.

** REDACTED **

This strategy can actually be safer than removing from the store, since "removed" data may not actually be erased from the disk, just dereferenced (maybe some villain takes the hard disk and restores all the deleted data in some future).
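The code sample in the comment above was redacted from the thread, but the overwrite idea can be sketched independently. This is a hypothetical illustration, not Jan's actual code: the field names and the set of sensitive keys are invented, and in a real Orleans grain the anonymized state would then be persisted with a normal state write.

```python
import json

def anonymize(state: dict) -> dict:
    """Overwrite personally identifiable fields instead of deleting the record."""
    SENSITIVE = {"name", "email", "ssn"}  # assumption: known up front per grain type
    return {k: ("** REDACTED **" if k in SENSITIVE else v)
            for k, v in state.items()}

state = {"name": "Jan", "email": "jan@example.com", "orders": 3}
scrubbed = anonymize(state)
print(json.dumps(scrubbed))
# {"name": "** REDACTED **", "email": "** REDACTED **", "orders": 3}
```

Persisting the scrubbed state overwrites the sensitive values in the storage row itself, which is why (modulo backups, as noted below) nothing recoverable is left on disk.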

@veikkoeeva
Contributor

@jan-johansson-mr Maybe. You need to ensure the IDs aren't pseudonymous either, so that they don't act as surrogates for the real person. Backups may also be a weak spot.

If you want to keep the "blob semantics", you can also modify the script so that sensitive data goes into a schema, filegroup, and table of its own, perhaps even encrypted. For that you need to add a table appropriately and add something like a switch-case to the read and write paths that chooses the right target.

@ReubenBond ReubenBond removed the P3 label Sep 3, 2021
@rafikiassumani-msft rafikiassumani-msft added the ✔️ Resolution: Answered Resolved because the question asked by the original author has been answered. label Nov 18, 2021
@ghost

ghost commented Nov 18, 2021

Thanks for contacting us. We believe that the question you've raised has been answered. If you still feel a need to continue the discussion, feel free to reopen the issue and add your comments.

@ghost ghost added the Status: Resolved label Nov 18, 2021
@ghost ghost closed this as completed Nov 18, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2021
This issue was closed.

8 participants