any progress on bucket-level expiry #1090

Open
macmarcdhas opened this issue Dec 2, 2021 · 9 comments

Comments

@macmarcdhas

Hi,

We are using a Riak 3.0.4 multi-cluster deployment. We currently have 3 buckets for various functional use cases, and would like to set a custom expiration for each of these buckets rather than at the global level.

As per the doc:
https://riak.com/posts/technical/riak-kv-2-2-release-highlights/index.html?p=13017.html

> We have started development on bucket-level expiry which is expected to be available in a future release of Riak KV.

Any progress on this?

@martinsumner
Contributor

martinsumner commented Dec 2, 2021

The original idea for how to implement this hit some issues when trying to expire concurrently from the AAE stores as well as the backend stores. Essentially, it was hard to stop there being a performance spike after AAE caches were rebuilt. The work then got shelved.

I have an alternative approach in mind now: explicitly exclude buckets from the AAE caches where those buckets have an expiry. This will mean no AAE on these buckets, as we consider them to be temporary. This work is not currently prioritised though.
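
Purely as an illustration of that idea (the `expiry_secs` bucket property name and the check itself are hypothetical, not an existing Riak KV API), the kind of per-bucket decision it implies might look something like:

```erlang
%% Hypothetical sketch only - 'expiry_secs' is an invented bucket property
%% name. Buckets with an expiry configured would simply be left out of the
%% AAE caches, as they are considered temporary.
-spec include_in_aae(proplists:proplist()) -> boolean().
include_in_aae(BucketProps) ->
    case proplists:get_value(expiry_secs, BucketProps) of
        undefined -> true;   %% no expiry set: cache in AAE as normal
        _Seconds  -> false   %% expiring bucket: exclude from AAE
    end.
```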

Can you explain a bit more about your cluster - node count, backend, object counts, etc.? Would not having AAE for expiring buckets be an issue for you? If you use AAE at the moment, is it the original hashtree-based AAE or the new tictactree-based version? What sort of expiry times are you looking for (i.e. hours, days, months, or years)?

If I know there's a willing user, and I think there's a relatively easy-to-implement solution, then I'm happy to consider re-prioritising this.

@martinsumner
Contributor

Some other questions:

  • what is the rate at which objects would expire? (For objects with a relatively slow/steady rate of expiry there may be an alternative AAE-friendly route using riak_kv_eraser - see the sketch after this list.)
  • is it preferred that the expiry time be absolute from insert, or relative to the last time modified?
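
To make the slow/steady option above concrete, here is a minimal client-side sketch (not riak_kv_eraser itself) that deletes an already-identified list of expired keys at a throttled rate, assuming the riak-erlang-client (riakc_pb_socket) and an arbitrary pause between deletes:

```erlang
%% Sketch: delete a pre-computed list of expired {Bucket, Key} pairs at a
%% deliberately slow, steady rate from the client side.
-module(slow_expiry).
-export([run/3]).

%% Pid is a riakc_pb_socket connection; PauseMs throttles the delete rate.
run(_Pid, [], _PauseMs) ->
    ok;
run(Pid, [{Bucket, Key} | Rest], PauseMs) ->
    ok = riakc_pb_socket:delete(Pid, Bucket, Key),
    timer:sleep(PauseMs),
    run(Pid, Rest, PauseMs).
```

For example, `{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087)` followed by `slow_expiry:run(Pid, ExpiredKeys, 100)` would spread the deletes out at roughly ten per second.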

@shhnwz

shhnwz commented Dec 2, 2021

> The original idea for how to implement this hit some issues when trying to expire concurrently from the AAE stores as well as the backend stores. Essentially, it was hard to stop there being a performance spike after AAE caches were rebuilt. The work then got shelved.
>
> I have an alternative approach in mind now: explicitly exclude buckets from the AAE caches where those buckets have an expiry. This will mean no AAE on these buckets, as we consider them to be temporary. This work is not currently prioritised though.
>
> Can you explain a bit more about your cluster - node count, backend, object counts, etc.? Would not having AAE for expiring buckets be an issue for you? If you use AAE at the moment, is it the original hashtree-based AAE or the new tictactree-based version? What sort of expiry times are you looking for (i.e. hours, days, months, or years)?
>
> If I know there's a willing user, and I think there's a relatively easy-to-implement solution, then I'm happy to consider re-prioritising this.

Thanks for the quick response.
Our use case is pretty simple: the bucket is not a CRDT bucket and our retention period is 7 days max. We do have CRDT buckets as well, but their retention period is months.

@shhnwz

shhnwz commented Dec 2, 2021

> Some other questions:
>
> • what is the rate at which objects would expire? (For objects with a relatively slow/steady rate of expiry there may be an alternative AAE-friendly route using riak_kv_eraser.)
> • is it preferred that the expiry time be absolute from insert, or relative to the last time modified?

It is OK to consider the insert time as the point of reference.
A slow rate of expiry will also serve the purpose.

@martinsumner
Contributor

What backend are you using - bitcask, memory, leveldb, leveled? Do you currently enable anti-entropy, e.g.

anti_entropy = active

or

tictacaae_active = active

@shhnwz

shhnwz commented Dec 2, 2021

> What backend are you using - bitcask, memory, leveldb, leveled? Do you currently enable anti-entropy, e.g.
>
> anti_entropy = active
>
> or
>
> tictacaae_active = active

Bitcask with anti_entropy = active

@martinsumner
Contributor

@martincox - is there anything you have done with per-bucket expiry in a bitcask/aae setup?

@martincox
Contributor

martincox commented Dec 3, 2021

So we had added per-key expiry into bitcask, but it is currently only utilised as part of the delete path. The absolute expiration timestamp is encoded as part of the key - determined by an arbitrary value, defined in seconds, that is added to the insert timestamp of the tombstone. At the point of deletion, the keys are removed from the AAE store as with a normal delete. During a bitcask merge, the expiry timestamps are inspected and entries are merged out where the expiry is less than the current time.
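
As a rough sketch of the timestamp handling described above (illustrative only, not the actual bitcask code):

```erlang
%% Sketch: the absolute expiry is the insert timestamp of the tombstone plus
%% an arbitrary TTL in seconds; at merge time, anything whose expiry is less
%% than the current time can be merged out.
absolute_expiry(InsertTimestamp, TTLSeconds) ->
    InsertTimestamp + TTLSeconds.

merge_out(ExpiryTimestamp) ->
    ExpiryTimestamp < erlang:system_time(second).
```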

Within bitcask, there are options (not_found_expiring and not_found_expired) to control the visibility of KVs that are pending deletion via expiry - which prevents an AAE rebuild from re-inserting a bunch of deleted keys. These options are also used in fallback vnodes to preserve expired objects until handing off to the owner.
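
Conceptually, the visibility rule those options give an AAE rebuild fold is something like the following (a sketch under the assumption that the fold can see the expiry timestamp; the function below is invented for illustration):

```erlang
%% Sketch: during an AAE rebuild fold, hide keys that are pending deletion
%% via expiry, so the rebuild does not re-insert a bunch of deleted keys.
%% Here "pending" is approximated as "expiry timestamp already passed".
maybe_accumulate(BKey, ExpiryTS, Acc) ->
    Now = erlang:system_time(second),
    case is_integer(ExpiryTS) andalso ExpiryTS =< Now of
        true  -> Acc;           %% treat as not_found in the rebuilt tree
        false -> [BKey | Acc]   %% still live: include as normal
    end.
```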

We talked about extending this to bucket-level properties, but didn't have the use case for it, so the work was never picked up. That said, I think it'd be fairly trivial to plumb in.

@martinsumner
Contributor

I'm not sure I understand how this works.

Is this more about reaping tombstones than actually expiring keys? So you're running in keep mode, but add a special hidden timestamp to the tombstone so that eventually it is removed (via bitcask merge), rather than staying there forever?

The AAE issue I struggled with is with rebuilds. We don't want rebuilds to be co-ordinated, but if one vnode in a preflist rebuilds and the rebuild fold now no longer includes a lot of entries that were lost during merges - doesn't that lead to a lot of AAE mismatches at the next exchange? Do you accept this, and change the read repair process so that it reaps the expired tombstones rather than re-replicating them?

Or have I got the wrong end of the stick?
