Cache canonicalization #485
Conversation
internal/dispatch/caching/caching.go
Outdated
requestKey := dispatch.CheckRequestToKey(req)
if cd.nsm != nil && req.ObjectAndRelation.Namespace != req.Subject.Namespace {
instead of checking if the nsm is nil, can there just be variants of the dispatcher that can do canonicalization?
I suppose, although the only reason we have this is to remove the need for a mock namespace manager in the caching testing. Do you prefer I just mock out the nsm there?
Can we not just use a real namespace manager in the tests?
We could, but then we also need a real datastore
Maybe there's value to a "KeyProvider" abstraction, and the dispatcher just takes a reference to a key provider.
When there's a datastore available, we use a key provider backed w/ a nsm so that we get the pre-calculated cache keys, but for testing we can inject a simple static keyer, etc
This would also make it easier to get the canonical keys outside of the dispatcher, though I don't have any particular use case for that in mind.
A good idea
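To make the idea concrete, here is a minimal sketch of what the proposed "KeyProvider" abstraction might look like: the dispatcher depends only on a small interface, and tests can inject a trivial static keyer instead of a namespace-manager-backed one. All names and the key format here are illustrative assumptions, not the actual SpiceDB types.

```go
package main

import "fmt"

// CheckRequest is a stand-in for the real dispatch check request type.
type CheckRequest struct {
	Namespace string
	Relation  string
	ObjectID  string
	Subject   string
}

// KeyProvider computes a cache key for a check request. The dispatcher
// would hold one of these instead of an optional namespace manager.
type KeyProvider interface {
	CheckKey(req CheckRequest) string
}

// StaticKeyer is a trivial keyer suitable for tests: no namespace manager,
// no canonicalization, just the raw request fields.
type StaticKeyer struct{}

func (StaticKeyer) CheckKey(req CheckRequest) string {
	return fmt.Sprintf("check/%s:%s#%s@%s", req.Namespace, req.ObjectID, req.Relation, req.Subject)
}

func main() {
	var kp KeyProvider = StaticKeyer{}
	fmt.Println(kp.CheckKey(CheckRequest{Namespace: "document", Relation: "view", ObjectID: "doc1", Subject: "user:alice"}))
}
```

A production implementation backed by a namespace manager would satisfy the same interface, returning the pre-calculated canonical keys when a datastore is available.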
internal/dispatch/caching/caching.go
Outdated
requestKey := dispatch.CheckRequestToKey(req)
if cd.nsm != nil && req.ObjectAndRelation.Namespace != req.Subject.Namespace {
Discussed in Slack, but we should determine whether this branch needs to happen for any checks to the same namespace, or whether we only need to worry about checks with relations to the same object (or ideally avoid it entirely, though I'm not sure that's possible).
internal/dispatch/dispatch.go
Outdated
// NOTE: canonical cache keys are only unique *within* a version of a namespace, so they used
// essentially as a new relation name in the cache key.
return fmt.Sprintf("check//canonical/%s:%s#%s@%s@%s", req.ObjectAndRelation.Namespace, req.ObjectAndRelation.ObjectId, possibleCanonicalKey, tuple.StringONR(req.Subject), req.Metadata.AtRevision)
this is arguably a separate issue entirely, but why not share cache with lookup/expand?
Because it would require a significant amount of reworking of the results from those caches (not to mention that lookup is not necessarily going to have the full result set, since it supports limits)
internal/dispatch/dispatch.go
Outdated
// CheckRequestToKeyWithPossibleCanonical converts a check request into a cache key based possibly
// on the canonical key, or the relation name if the canonical key is empty
func CheckRequestToKeyWithPossibleCanonical(req *v1.DispatchCheckRequest, possibleCanonicalKey string) string {
Could we unify this with CheckRequestToKey? i.e. pass in the relation name as the canonical key if we don't have a real canonical key, and then compute the cache keys the same?
They can't have the same key as there is a non-zero possibility someone creates a relation with a name that matches the computed hash, which would result in the cache keys overlapping incorrectly. I explicitly have different prefixes to prevent that from occurring
We always compute the cache keys ourselves ahead of time, if that's a concern why not just detect it and either reject the schema or change the canonical key?
Two reasons:
(1) It's not backwards compatible, and limiting the names of relations to prevent overlap is a user-facing error for an internal implementation
(2) Why do so when we can simply just construct a different key to guarantee no overlap? The cache itself is still shared
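The prefix argument above can be illustrated with a small sketch: by giving canonical-key cache entries a distinct prefix ("check//canonical/...") from relation-name entries ("check/..."), the two key spaces can never overlap, even if a user names a relation identically to a computed hash. The formats below are simplified stand-ins for the real ones quoted in the diff.

```go
package main

import "fmt"

// relationKey builds a cache key from the raw relation name.
func relationKey(ns, objID, relation, subject string) string {
	return fmt.Sprintf("check/%s:%s#%s@%s", ns, objID, relation, subject)
}

// canonicalKey builds a cache key from a computed canonical hash, under a
// prefix that no relation-name key can ever produce.
func canonicalKey(ns, objID, canonicalHash, subject string) string {
	return fmt.Sprintf("check//canonical/%s:%s#%s@%s", ns, objID, canonicalHash, subject)
}

func main() {
	// Even a relation deliberately named like a hash ("abc123") cannot
	// collide with the canonical-key entry, because the prefixes differ.
	fmt.Println(relationKey("ns", "obj", "abc123", "user:a"))
	fmt.Println(canonicalKey("ns", "obj", "abc123", "user:a"))
}
```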
Changing the canonical key if there's an existing relation with that hash (which I would be surprised if anyone, ever, hits) would also guarantee no overlap
we can pick a deterministic way to avoid conflicts: if hash(rudd) is the same as some user-provided name, calculate hash(hash(rudd)), etc
We're hashing the BDD output ourselves, and it's FNV, which is much more likely to collide than a crypto hash fn
Yeah, but it is isolated to the BDD input, vs the relation and BDD. I very much do not want to mix them. Why the push to have a single cache key?
I'm just trying to push runtime decisions into compile-time (since we're making exactly this type of decision at compile time already)
I'm inclined to agree. We have perfect information at schema compile time. We can just straight up check if there are any collisions if that's a real concern.
The "upside" of the design is to remove a conditional (?) and yet the downsides are:
- If we ever mess it up, the cache will return incorrect results
- If there is a collision, we either have to issue an error to the user (bad) or pick a new revision (which could completely cascade forward)
I really don't see any upside of real value, and there are major downsides in additional complexity at annotation time OR user-facing error... plus, we'll still need the conditional anyway, because we won't necessarily have a cache key for older schemas
Updated
// We begin by assigning a unique integer index to each relation and arrow found for all
// expressions in the namespace:
// definition somenamespace {
// relation first: ...
Indentation.
What's wrong?
There are extra spaces before relation and the arrow points to the middle of the word relation.
// ^ index 0
// relation second: ...
// ^ index 1
// permission someperm = second + (first - third->something)
third should probably be shown as a permission or a relation.
// }
//
// These indexes are then used with the rudd library to build the expression:
// someperm => `bdd.And(bdd.Ithvar(1), bdd.Or(bdd.Ithvar(0), bdd.NIthvar(2)))`
These ops seem wrong for the example. + shouldn't be And, right?
}, varMap)

case *core.UsersetRewrite_Exclusion:
return convertToBdd(relation, bdd, rw.Exclusion, bdd.Or, func(childIndex int, varIndex int) rudd.Node {
Shouldn't Exclusion be bdd.And, i.e. it's A and not B?
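The reviewer's point can be checked with a quick truth table: exclusion ("A - B", i.e. A and not B) corresponds to AND with a negated second operand, not OR. Plain booleans stand in for the BDD nodes here; this is an illustration of the boolean identity, not the rudd API.

```go
package main

import "fmt"

// exclusion models "A - B" as A AND NOT B, the semantics the review
// comment argues for.
func exclusion(a, b bool) bool { return a && !b }

func main() {
	// Print the full truth table: only a=true, b=false yields true.
	for _, a := range []bool{false, true} {
		for _, b := range []bool{false, true} {
			fmt.Printf("a=%v b=%v -> %v\n", a, b, exclusion(a, b))
		}
	}
}
```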
internal/dispatch/dispatch.go
Outdated
return CheckRequestToKey(req)
}

// NOTE: canonical cache keys are only unique *within* a version of a namespace, so they used
grammar
internal/dispatch/caching/caching.go
Outdated
return nil, err
}

requestKey = dispatch.CheckRequestToKeyWithPossibleCanonical(req, relation.CanonicalCacheKey)
If there is no canonical cache key, this method calls CheckRequestToKey to re-compute the key for the check request. Seems wasteful to do it twice...
// computePermissionAliases computes a map of aliases between the various permissions in a
// namespace. A permission is considered an alias if it *directly* refers to another permission
// or relation without any other form of expression.
func computePermissionAliases(typeSystem *ValidatedNamespaceTypeSystem) (map[string]string, error) {
Do we expect this to save us enough time to be worth it? If there were more than one alias of the same relation or permission, they would share a cache key and only 1/n calls would actually be invoked. Am I missing something that makes this worth adding?
In the case where there isn't any caching yet, and where there is a chain of aliasing (2-3 at least), it absolutely could save time because we'd be redispatching from node to node for each aliasing.
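A sketch of the chain-collapsing the author describes: if permission A aliases B, which aliases relation C, a naive dispatcher would re-dispatch node to node twice before doing real work, whereas resolving chains once at schema annotation time collapses every alias directly to its terminal target. The function name and map shape below are illustrative assumptions, not the actual computePermissionAliases implementation.

```go
package main

import "fmt"

// resolveAliases collapses alias chains so each alias maps directly to its
// terminal (non-alias) target. A real implementation would also need to
// detect cycles; this sketch assumes the input is acyclic.
func resolveAliases(aliases map[string]string) map[string]string {
	resolved := make(map[string]string, len(aliases))
	for name := range aliases {
		target := aliases[name]
		// Follow the chain until we reach something that is not itself an alias.
		for {
			next, ok := aliases[target]
			if !ok {
				break
			}
			target = next
		}
		resolved[name] = target
	}
	return resolved
}

func main() {
	// viewer -> can_view -> view: both aliases collapse to "view", so a
	// check on "viewer" dispatches straight to "view" with no hops.
	resolved := resolveAliases(map[string]string{
		"viewer":   "can_view",
		"can_view": "view",
	})
	fmt.Println(resolved["viewer"], resolved["can_view"])
}
```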
internal/namespace/aliasing.go
Outdated
}

// Otherwise, add the permission to the working set.
workingSet[rel.Name] = aliasedPermOrRel
Can we call this unresolvedAliases or something? workingSet could mean anything.
Rebased and updated
@ecordell Updated as discussed
LGTM
As discussed offline: we'll want this in a release before issuing a migration that re-writes existing namespaces.
//
// For example, for the namespace:
// definition somenamespace {
// relation first: ...
nit: this line has different spacing
LGTM
…calized cache key
…red resource and subject type
…tate all namespaces
Also makes use of a new key handler interface in both caching and dispatching
LGTM
Add support for aliasing and cache canonicalization on dispatch.
Aliasing means that any permission which refers directly to another permission or relation will be marked as an "alias" of that permission/relation, and dispatch will skip the unnecessary intermediate step(s).
Cache canonicalization generates a canonical key for each permission, where the key is shared amongst permissions on the same namespace that have the same expression. The key (if available) is then used for the cache, to ensure that permissions referencing the same expressions share the same cache.
Fixes AUTHZ-459
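The canonicalization idea in the description above can be sketched as follows: permissions whose expressions normalize to the same form hash to the same canonical key, so their check results can share a cache entry. The real implementation canonicalizes via a BDD of the expression; hashing a normalized string with FNV (the hash mentioned in the discussion) is a deliberate simplification here, and all names are illustrative.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// canonicalKeyFor returns a stable hash for a normalized permission
// expression. Two permissions with identical normalized expressions get
// the same canonical key and therefore share cache entries.
func canonicalKeyFor(normalizedExpr string) string {
	h := fnv.New64a()
	h.Write([]byte(normalizedExpr))
	return fmt.Sprintf("%x", h.Sum64())
}

func main() {
	// Same expression, same key; different expression, different key.
	a := canonicalKeyFor("first + second")
	b := canonicalKeyFor("first + second")
	c := canonicalKeyFor("first - second")
	fmt.Println(a == b, a == c)
}
```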