
Question: Removing grain from data store #4475

Closed
jan-johansson-mr opened this issue Apr 16, 2018 · 17 comments
Labels
✔️ Resolution: Answered Resolved because the question asked by the original author has been answered. question Status: Resolved
Milestone

Comments

@jan-johansson-mr
Contributor

Hi,

What is the best way to remove a grain from the data store (when the grain is no longer needed)? The documentation describes how to persist, read, and write state, but not how to remove it from the store.

Kind regards,
Jan

@jan-johansson-mr
Contributor Author

Hi again,

Is ClearStateAsync the operation I'm looking for? Going to test... :-)

//Jan

@jan-johansson-mr
Contributor Author

Nah,

It only cleared the state; it didn't remove the grain itself from storage.

//Jan

@sergeybykov
Contributor

The behavior of ClearStateAsync is provider specific. For example, AzureTableGrainStorage has a bool property DeleteStateOnClear in AzureTableStorageOptions to control whether clearing also deletes the underlying record.

@sergeybykov sergeybykov added this to the Triage milestone Apr 16, 2018
@jan-johansson-mr
Contributor Author

Thanks Sergey,

I'll check DeleteStateOnClear and see what it does. I'm using PostgreSQL.

//Jan

@veikkoeeva
Contributor

@jan-johansson-mr Currently the ADO.NET provider just clears the state, as you have seen. The reason is that removing potentially many rows from the index one by one would cause trouble in busy systems, and clearing the state saves space (likely). One option is a special, in-storage, user-initiated scrubbing operation that batch-removes cleared rows and perhaps rebuilds the index, run when the operator presumably knows best that it should be done. Though the index should work well even when severely fragmented.

A question: would it seem sensible to add a flag like some other providers have? Maybe with information about the downsides and potential options (the actual removal could perhaps be a batch removal). It seems we need documentation either way.

@jan-johansson-mr
Contributor Author

To be honest, I don't know.

It's very easy to find which rows have cleared state (set to NULL) in the database, and perhaps an external "clean up" might be needed now and then. But the amount of storage "leaked" by not removing those rows is fairly negligible compared to the rows with populated state.
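The look-up-and-delete idea described above can be sketched as follows. This is a hypothetical illustration using SQLite in place of PostgreSQL; the table and column names (GrainState, Payload, ModifiedOn) are invented for the example and are not the ADO.NET provider's actual schema, so check your deployed SQL script for the real names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE GrainState (
        GrainTypeHash INTEGER NOT NULL,
        GrainIdHash   INTEGER NOT NULL,
        Payload       BLOB,             -- NULL after the state is cleared
        ModifiedOn    TEXT NOT NULL
    )""")
con.executemany(
    "INSERT INTO GrainState VALUES (?, ?, ?, datetime('now'))",
    [(1, 10, b"live-state"), (1, 11, None), (2, 12, None)])

# A "dangling" row is simply one whose payload column is NULL.
dangling = con.execute(
    "SELECT COUNT(*) FROM GrainState WHERE Payload IS NULL").fetchone()[0]
print(dangling)  # 2

# An occasional external clean-up job could then remove them.
con.execute("DELETE FROM GrainState WHERE Payload IS NULL")
```

As the thread goes on to discuss, running such a delete while Orleans is actively managing the same rows is where the race-condition concerns come in.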

But I think it's good to document clearing the state and what it implies :-)

//Jan

@JillHeaden
Contributor

HELP WANTED: Can someone provide me with some information about clearing the state and what it implies?
(Also, can someone please add the Help Wanted label to this issue? @jason-bragg)

@jan-johansson-mr
Contributor Author

Hi Jill,

State management is kind of complicated. In my case I wanted to remove unused grains, to have a clean data set with no "leak" of old (unused) grains.

Here is an illustration from my use case. I'm using PostgreSQL as the store for grain state. Whenever I activate a grain that has never been activated before, Orleans creates the grain for me; I don't create it explicitly. When I persist the grain state, a row is created in the database, and that row is then updated with each successive persist.

At some point, the grain state is no longer needed from my business point of view. So I need to make sure the grain gets re-initialized if the same ID is ever used again, and I don't want the state to be "leaked" in the database once it's no longer needed. The most natural way to do this is to delete the grain state from the store (and thus the row in the database). But this is not supported outside of Azure (as pointed out earlier in this thread).

What I can do (with PostgreSQL) is clear the state. The state is simply a column in the database row where the grain state resides. However, not deleting the rows leaves them "dangling" with cleared state (the state column is set to NULL).

I could write a simple SQL statement to look up the rows with cleared state and delete them. But doing that safely is not simple: Orleans owns the state management, and if I change the rows while Orleans is busy managing the same set, things can go wrong (a classic race condition).

So a simple resolution, in my case, is to simply not care: I know the number of "dangling" database rows will be far smaller than the number of grains with state. In other cases the number of "dangling" grains can be much greater, and then it can become an issue.

Please let me know if you want more information from me, and I'll do my best.

BR, Jan

@jason-bragg
Contributor

I'm removing the documentation tag from this, as we can't really document this for each provider. We need to have a uniform approach first, then we can document that approach.

@jason-bragg jason-bragg added the P3 label Aug 2, 2018
@jan-johansson-mr
Contributor Author

Okay, thanks @jason-bragg.

@jason-bragg
Contributor

jason-bragg commented Aug 3, 2018

@jan-johansson-mr, Reopening this because this is definitely a limitation in our storage patterns we should address.

We have the work scheduled for the 2.1.0 release, though it's lower priority than other items in the release, so it may be pushed. Hopefully not.

@veikkoeeva
Contributor

veikkoeeva commented Aug 4, 2018

To be on the safe side here (/cc @JillHeaden, @jan-johansson-mr), the "dangling" is indeed in quotes, because the storage space is reclaimed but the ID information is left behind. It can also happen that, some time after a grain's state has been removed, a new grain with the same cluster ID, grain type, and grain ID is created; in that case it's one very fast UPDATE instead of an INSERT.

For ADO.NET, both INSERT and DELETE are heavy operations, and DELETE in particular would make deadlocking more prevalent. Currently the design creates a heap index. The linked article makes a good argument for using this construct for an insert-heavy table. Especially in the case of Orleans, it wouldn't be a good move to use database-generated sequential IDs. With the current construct, even severe index fragmentation shouldn't be much of a concern (even billions of IDs in the index will fit in memory, and even if not, warm index data should).

In addition, the index would also become fragmented (but that doesn't look like much of a problem). So even though Orleans owns the state, one could run a scrubbing operation in the database that purges the rows whose state is NULL. If done in a transaction, the case where a state gets "resurrected" shouldn't cause logically erroneous behavior. If one deletes states in multiple transactional batches, some percentage at a time, then maybe there's a window for an error (I'm not sure, maybe not).
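The batched, transactional scrub described above could look roughly like the sketch below. SQLite stands in for the real database, and the GrainState table, Payload column, and batch size are all invented for the example; the point is only that each batch is its own short transaction, so locks are held briefly and other writers can slot in between batches.

```python
import sqlite3

def scrub_cleared_rows(con, batch_size=2):
    """Remove NULL-payload rows in short transactional batches."""
    total = 0
    while True:
        with con:  # each batch commits (or rolls back) on its own
            cur = con.execute(
                "DELETE FROM GrainState WHERE rowid IN ("
                " SELECT rowid FROM GrainState WHERE Payload IS NULL LIMIT ?)",
                (batch_size,))
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE GrainState (GrainId INTEGER, Payload BLOB)")
con.executemany("INSERT INTO GrainState VALUES (?, ?)",
                [(i, None if i % 2 else b"live") for i in range(10)])
removed = scrub_cleared_rows(con)
print(removed)  # 5
```

A production version against PostgreSQL would use the provider's real schema and likely a larger batch size tuned to observed lock contention.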

This pattern is widely used in ADO.NET as soft delete, here adopted just for performance. I'm not sure, but maybe @JillHeaden can work on some notes at #1682 (comment). This might be something you're interested in too, @jan-johansson-mr. Another longer point of discussion is at #1682 (comment).

@jason-bragg Mostly the providers seem to provide a flag to choose between a soft delete and a hard delete. The ADO.NET one doesn't provide the option, since people would likely just get into trouble with "singular deletes", but it could be added. :) One take-away here is that a "batch delete" could be a useful thing, maybe even such that one could intercept the operation just before it commences and augment the list as seen fit.

This also relates to ADO.NET documentation, so linking #4771.

@sergeybykov sergeybykov modified the milestones: 2.1.0, 2.2.0 Aug 20, 2018
@sergeybykov sergeybykov modified the milestones: 2.2.0, 2.3.0 Dec 12, 2018
@sergeybykov sergeybykov removed this from the 2.3.0 milestone Feb 25, 2019
@sergeybykov sergeybykov added this to the 2.4.0 milestone Feb 25, 2019
@sergeybykov sergeybykov modified the milestones: 2.4.0, 3.0.0 Aug 15, 2019
@sergeybykov sergeybykov modified the milestones: 3.0.0, Backlog Oct 1, 2019
@MV10

MV10 commented Oct 26, 2019

@jan-johansson-mr you wrote:

I could write a simple SQL statement, looking up the rows with cleared state. But this is not a simple operation to do. Orleans owns the state management, and if I change the rows while Orleans is busy managing the same set, things can go wrong (classical race condition case).

It depends on what your grains are storing, of course, but in the financial industry we have regulatory requirements to delete older data. In our case older data is rarely in active use, so we find it safe to trim rows based on the storage table's ModifiedDate column: a daily job trims data more than 90 days old. We don't even need to null the payload first.
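A daily trim job of the kind described above might be sketched as follows. This is a hypothetical illustration with SQLite as a stand-in; GrainState and ModifiedOn are invented names (MV10's table uses a ModifiedDate column), and the 90-day cutoff is just the retention period from the comment.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE GrainState (
    GrainId INTEGER, Payload BLOB, ModifiedOn TEXT NOT NULL)""")
con.executemany(
    "INSERT INTO GrainState VALUES (?, ?, datetime('now', ?))",
    [(1, b"old", "-120 days"), (2, b"old", "-91 days"), (3, b"new", "-5 days")])

# Daily job: trim anything not modified in 90 days, nulled payload or not.
cur = con.execute(
    "DELETE FROM GrainState WHERE ModifiedOn < datetime('now', '-90 days')")
print(cur.rowcount)  # 2
```

Trimming by modification time sidesteps the race condition concern raised earlier, on the assumption (as in MV10's case) that data past the cutoff is no longer being touched by active grains.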

@veikkoeeva
Contributor

veikkoeeva commented Oct 27, 2019

I can confirm the state does not need to be nulled before removing the rows. Nulling is simply a handy way of seeing which states have been "deleted" by Orleans. There can be application-specific knowledge about this too, such as the last operation being longer ago than some threshold.

If there happens to be an active grain that tries to interact with state that has been removed underneath it, it causes a versioning exception, and eventually Orleans recreates an initial state for the grain.

@jan-johansson-mr
Contributor Author

jan-johansson-mr commented Oct 30, 2019

Thanks for the information!

Even though we may want data to be present "forever" (removing stuff can be harder than adding it, due to implied dependencies etc.), there are legal concerns with keeping sensitive data around. E.g. GDPR requires you to remove ALL sensitive information about individuals in the system if there is no particular reason granting you an exemption (after some grace period; I think one month or so).

In other words, there is at least one business case to remove grains (with sensitive data) from the store :-)

There is, however, one way to handle this case without removing the grain per se, and that is to simply overwrite the sensitive data in the grain, e.g.

** REDACTED **

This strategy can actually be safer than removing from the store, since "removed" data may not actually be erased from the disk, just dereferenced (maybe some villain takes the hard disk and restores all the deleted data in some future).
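The code sample in the comment above was redacted from the thread, but the overwrite idea can be sketched independently. This is a hypothetical illustration, not Jan's actual code: the field names and the set of sensitive keys are invented, and in a real Orleans grain the anonymized state would then be persisted with a normal state write.

```python
import json

def anonymize(state: dict) -> dict:
    """Overwrite personally identifiable fields instead of deleting the record."""
    SENSITIVE = {"name", "email", "ssn"}  # assumption: known up front per grain type
    return {k: ("** REDACTED **" if k in SENSITIVE else v)
            for k, v in state.items()}

state = {"name": "Jan", "email": "jan@example.com", "orders": 3}
scrubbed = anonymize(state)
print(json.dumps(scrubbed))
# {"name": "** REDACTED **", "email": "** REDACTED **", "orders": 3}
```

Persisting the scrubbed state overwrites the sensitive values in the storage row itself, which is why (modulo backups, as noted below) nothing recoverable is left on disk.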

@veikkoeeva
Contributor

@jan-johansson-mr Maybe. You need to ensure the IDs aren't pseudonymous either, so that they don't act as surrogates for the real person. Backups may also be a weak spot.

If you want to keep the "blob semantics", you can also modify the script so that sensitive data goes into a schema, filegroup, and table of its own, perhaps even encrypted. For that you need to add a table appropriately and add something like a switch-case to the read and write paths that chooses the right target.

@ReubenBond ReubenBond removed the P3 label Sep 3, 2021
@rafikiassumani-msft rafikiassumani-msft added the ✔️ Resolution: Answered Resolved because the question asked by the original author has been answered. label Nov 18, 2021
@ghost

ghost commented Nov 18, 2021

Thanks for contacting us. We believe that the question you've raised has been answered. If you still feel a need to continue the discussion, feel free to reopen the issue and add your comments.

@ghost ghost added the Status: Resolved label Nov 18, 2021
@ghost ghost closed this as completed Nov 18, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2021
This issue was closed.

8 participants