
Allow clients to be given key range restrictions #1419

Open
alexmiller-apple opened this issue Apr 4, 2019 · 11 comments
@alexmiller-apple (Contributor) commented Apr 4, 2019

To try and provide some layer of safety for use cases that have multiple different applications sharing an FDB cluster, it'd be nice to be able to forcefully restrict clients to only be able to interact with keys owned by their application.

I'm envisioning something like adding to the environment of clients
FDB_NETWORK_OPTION_MINIMUM_KEY_PERMITTED=\x02
FDB_NETWORK_OPTION_MAXIMUM_KEY_PERMITTED=\x03

It's possible that for layers, we'd need to allow multiple ranges of restrictions, to allow the layer metadata subspace to be accessed as well as the data subspace.

And we would likely need to always permit access to \xff keys if ACCESS_SYSTEM_KEYS is set, and potentially also the directory layer subspace as well.

Clients that access a key outside of [min, max) would receive a non-retryable error restricted_key_accessed().
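The proposed behavior could be sketched as follows. This is a pure-Python illustration, not real FDB API: the option names mirror the environment variables proposed above, and `RestrictedKeyAccessed` stands in for the proposed `restricted_key_accessed()` error.

```python
# Hypothetical sketch of the proposed client-side key range enforcement.
# Option names and the error class are illustrative, not real FDB API.

class RestrictedKeyAccessed(Exception):
    """Non-retryable error raised when a key falls outside [min, max)."""

MIN_KEY_PERMITTED = b"\x02"  # FDB_NETWORK_OPTION_MINIMUM_KEY_PERMITTED
MAX_KEY_PERMITTED = b"\x03"  # FDB_NETWORK_OPTION_MAXIMUM_KEY_PERMITTED

def check_key(key: bytes, access_system_keys: bool = False) -> bytes:
    # \xff system keys stay accessible when ACCESS_SYSTEM_KEYS is set.
    if access_system_keys and key.startswith(b"\xff"):
        return key
    # Enforce the half-open interval [min, max).
    if not (MIN_KEY_PERMITTED <= key < MAX_KEY_PERMITTED):
        raise RestrictedKeyAccessed(f"key {key!r} outside permitted range")
    return key
```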

@xumengpanda (Contributor) commented Apr 4, 2019

Does this mean that we want multi-tenant support?
If we are going in the multi-tenant direction, it may be better to first define the problem scope.
For example, in a multi-tenant environment, do we allow any of the following things to happen:

  1. When a tenant gets rate limited, will other tenants be affected?
  2. Should we limit the space (normal key space and system key space) a tenant uses?
    We probably don't want a misbehaving tenant to use all the space.
  3. Should we assign different priorities to different types of tenants?
    etc...

@alexmiller-apple (Contributor, Author)

> Does this mean that we want multi-tenant support?

I am not proposing multi-tenant support, but that doesn't stop use cases from existing where one ends up packing disjoint sets of clients into one cluster, and having an extra layer of enforced safety there to prevent a stray clearrange \x00 \xff is always nice.

@ajbeamon (Contributor) commented Apr 5, 2019

Would this restriction be just for writes or would it also include reads?

> And we would likely need to always permit access to \xff keys if ACCESS_SYSTEM_KEYS is set, and potentially also the directory layer subspace as well.

If reads are restricted, we also have READ_SYSTEM_KEYS that would probably want to apply the same logic.

For the directory layer, it may get a little tricky. If you wanted to restrict a client to using the directory /foo, then you could set their range to be the one with the prefix assigned to the /foo directory. However, if the client opens a subdirectory /foo/bar, it will be assigned a disjoint prefix and would require specifying an additional range. There exist ways to work around this (non-directory subspaces, directory partitions, or even separate directory layer instances), though they have tradeoffs and may not be the typical patterns used with the directory layer.

To support arbitrary uses of the directory layer, then, you'd probably need to be able to specify multiple ranges, and you should probably expect these ranges to be added programmatically in response to opening directories. Done this way, though, you probably lose some of the protection benefits (though certainly not all).
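The multiple-range scheme with programmatically added ranges might look like the sketch below. The class and the directory prefixes are hypothetical; real directory-layer prefix allocation works differently.

```python
# Sketch: allowed ranges accumulated as directories are opened.
# Class name and prefixes are hypothetical, for illustration only.

class RangeRestriction:
    def __init__(self):
        self.ranges = []  # list of (begin, end) half-open byte ranges

    def allow(self, begin: bytes, end: bytes):
        # Called programmatically when a directory is opened and
        # its (possibly disjoint) prefix becomes known.
        self.ranges.append((begin, end))

    def permits(self, key: bytes) -> bool:
        return any(b <= key < e for b, e in self.ranges)

restriction = RangeRestriction()
restriction.allow(b"\x15\x01", b"\x15\x02")  # prefix assigned to /foo
restriction.allow(b"\x15\x07", b"\x15\x08")  # disjoint prefix for /foo/bar
```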

Another question is what behavior key selectors would have when they extend past the legal ranges. Currently, we clamp the result at the legal range boundary (e.g. a key selector that resolves into system key space returns \xff if system key access is disallowed), which seems like it could be a reasonable choice in this case as well. Presumably we wouldn't want key selectors to span across different ranges.
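The clamping behavior described above could be sketched like this (illustrative only; the real resolution happens server-side during key selector evaluation):

```python
# Sketch: clamp a resolved key selector at the legal range boundaries,
# mirroring how a selector resolving into system key space returns \xff
# today when system key access is disallowed.

def clamp_resolved_key(resolved: bytes, begin: bytes, end: bytes) -> bytes:
    if resolved < begin:
        return begin   # selector resolved before the legal range
    if resolved >= end:
        return end     # selector resolved past the legal range
    return resolved
```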

One possible way we could manage the options is to have this be a transaction option with a database option for the default behavior. That would still provide the capability to set the ranges globally that you've described while offering the flexibility for a particular transaction to request more if needed.
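The option layering described here might be sketched as follows; the class and option names are hypothetical, not real FDB bindings.

```python
# Sketch: a database option supplies the default restriction, which a
# particular transaction can override. Names are hypothetical.

class Database:
    def __init__(self):
        self._default_ranges = []

    def set_restricted_key_ranges(self, ranges):
        # Database option: default applied to every new transaction.
        self._default_ranges = list(ranges)

    def create_transaction(self):
        return Transaction(list(self._default_ranges))

class Transaction:
    def __init__(self, ranges):
        self.ranges = ranges

    def set_restricted_key_ranges(self, ranges):
        # Transaction option: this transaction requests different ranges.
        self.ranges = list(ranges)

db = Database()
db.set_restricted_key_ranges([(b"\x02", b"\x03")])
tr = db.create_transaction()    # inherits the global default
wide = db.create_transaction()  # asks for more, as described above
wide.set_restricted_key_ranges([(b"\x02", b"\x03"), (b"\xfe", b"\xff")])
```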

@alexmiller-apple (Contributor, Author)

> However, if the client opens a subdirectory /foo/bar, it will be assigned a disjoint prefix and would require specifying an additional range.

Well.... darn. I was hoping for something external to the process, as an extra enforcement of safety, but I'll now agree that Database/Transaction option sounds far more usable.

@gregwebs

Why is enforcing two ranges more difficult than one?

> you should probably expect these ranges to be added programmatically in response to opening directories. Done this way, though, you probably lose some of the protection benefits (though certainly not all).

But if the goal is to restrict a client to just its key space, then this would work well and there is no loss of protection benefits.

@KrzysFR (Contributor) commented Jan 28, 2020

FYI, I just removed such a key range restriction feature from the .NET binding, based on directory partitions (the easy case, with a constant top-level prefix), because it is not possible to do this efficiently and safely (at least client side) when directory subspaces and partitions can be renamed and/or moved at any time (maintenance, emergency backup restore into a new cluster).

It used to work by resolving the partition prefix when opening a database, and then forbidding any key outside that prefix (with exceptions for the \xFE directory layer prefix and \xFF system keys). I would even "bound check" key selectors to return \xFF instead of a key that would fall outside the allowed range, to give the illusion that there is nothing outside the restricted prefix.

But since partitions can move, even with an efficient cache (built on top of the metadata version key) there is still at least one asynchronous lookup at the start of each transaction to revalidate the prefix. Since all the APIs around key serialization were not async, and were based on cached instances (created during layer init), this was impossible to salvage.

I removed this feature, and instead tried to guide the API into using subspace "locations" (which are only the path to a subspace) that get "resolved" inside a transaction into the actual binary key prefix (using a cache), through an async API.
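The "location" pattern described here might be sketched as below. Names are illustrative, the lookup is a synchronous stand-in for the real async directory-layer read, and the cache-invalidation rule (clear on metadata version change) is the assumption being demonstrated.

```python
# Sketch: a location is just the path to a subspace; the binary prefix is
# resolved inside each transaction through a cache keyed on the metadata
# version key. Illustrative only; the real .NET binding API differs.

class LocationResolver:
    def __init__(self):
        self._cache = {}      # path -> resolved prefix
        self._version = None  # metadata version the cache was built against
        self.lookups = 0      # counts simulated directory-layer reads

    def resolve(self, metadata_version: bytes, path: tuple) -> bytes:
        # A changed metadata version means directories may have moved:
        # throw the cache away and re-resolve.
        if metadata_version != self._version:
            self._cache.clear()
            self._version = metadata_version
        if path not in self._cache:
            self.lookups += 1
            self._cache[path] = self._lookup(path)
        return self._cache[path]

    def _lookup(self, path: tuple) -> bytes:
        # Stand-in for the real (asynchronous) directory-layer lookup.
        return b"\x15" + "/".join(path).encode()

resolver = LocationResolver()
p1 = resolver.resolve(b"v1", ("foo", "bar"))
p2 = resolver.resolve(b"v1", ("foo", "bar"))  # served from the cache
p3 = resolver.resolve(b"v2", ("foo", "bar"))  # version changed: re-resolved
```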

In my experience, the vast majority of transactions always work inside a single top-level directory partition, so a single restricted key range would probably be enough... but since the value of the prefix can change (infrequently, but without notice), I'm not sure what the API would look like.

I guess there could be some way to create key "aliases" that would be resolved by the cluster into the actual prefix of that particular directory? But that means the Directory Layer would need to be a first-class concept understood deeply by all the components of the database, not a layer built on top of it.

I wish I had a time machine to kick myself 6 years ago when I added this feature, because removing it and going through ALL the codebase to update to the new pattern was a huge pain... >_<

@gregwebs

@KrzysFR Am I right that these problems occur because you are implementing it in a client, and that if it were implemented directly in FDB none of this would be an issue? Note that this would require the client to authenticate.

@KrzysFR (Contributor) commented Jan 28, 2020

Without some sort of key aliasing, I don't think that's an issue of client-side vs db-side: there are some layers that reference other keys in the values, so if the prefix can change, this makes it a lot more difficult.

If you are using the Directory Layer, you will have to pay some latency cost at each transaction, somehow, to re-validate the key prefixes against concurrent change.

If you are NOT using the directory layer, and instead rely on your own system, then this is much easier, and a key range restriction is very easy to do (even at the binding level). But when you have to mix with the Directory Layer (or any other way of having dynamic prefixes), I'm not sure how you'd do it with a single global key space.

Maybe with key aliases, or having sub-trees like some other key value stores do? (I'm thinking of the Voron storage engine used by RavenDB, see https://www.slideshare.net/ayenderahien/voron starting from slide 13).

Maybe Redwood, with key prefix compression, can solve this by simply making the issue go away, but we would miss some of the features of the Directory Layer (temporary directories that are atomically swapped with the "active" directory).

@gregwebs

The directory layer is implemented in the client? But if it were implemented directly in FDB the latency cost could approach 0?

@KrzysFR (Contributor) commented Jan 28, 2020

You'd need a way to explain to fdb which part of the key (or value) is the prefix for a specific directory, and which part is regular data, which would make the API much more complicated.

Though I guess that if we had such a feature, this would make a lot of things way easier :) This is somewhat related to pushdown predicates, but focused on creating key aliases...

@gregwebs

I agree that resolving a directory alias that is mutable from the client in a pessimistic implementation adds latency and in an optimistic implementation requires retrying failed requests.

We are getting into backwards compatibility now. A simple API would be to add a new field for the directory alias. If empty, someone is using the default global directory. Note that this is in fact backwards compatible in that someone can keep using the directory layer and leave the directory alias field empty.
