Cache canonicalization #485

josephschorr · 2022-03-17T18:39:07Z

Add support for aliasing and cache canonicalization on dispatch.

Aliasing means that any permission which refers directly to another permission or relation will be marked as an "alias" of that permission/relation, and dispatch will skip the unnecessary intermediate step(s).

Cache canonicalization generates a canonical key for each permission, where the key is shared amongst permissions on the same namespace that have the same expression. The key (if available) is then used for the cache, to ensure that permissions referencing the same expressions share the same cache.
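For readers skimming the description, here is a rough sketch of the key-sharing idea. It is not the PR's implementation: the PR canonicalizes expressions through a BDD, while this sketch assumes a permission is a plain union of relation names. The `canonicalKey` helper and the relation names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// canonicalKey is a hypothetical helper. It treats a permission as a plain
// union of relation names, sorts the operands so their order does not matter,
// and hashes the result with FNV-1a. The PR itself canonicalizes via a BDD.
func canonicalKey(unionOf []string) string {
	operands := append([]string(nil), unionOf...) // copy so the caller's slice is untouched
	sort.Strings(operands)

	h := fnv.New64a()
	h.Write([]byte(strings.Join(operands, "+")))
	return fmt.Sprintf("%016x", h.Sum64())
}

func main() {
	view := canonicalKey([]string{"reader", "writer"}) // permission view = reader + writer
	read := canonicalKey([]string{"writer", "reader"}) // permission read = writer + reader
	edit := canonicalKey([]string{"writer"})           // permission edit = writer

	fmt.Println(view == read) // same expression, so the cache entry is shared
	fmt.Println(view == edit) // different expression, so (barring a hash collision) a different key
}
```

With keys like these, a check on `view` and a check on `read` for the same object, subject, and revision would land on the same cache entry.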

Fixes AUTHZ-459

ecordell · 2022-03-21T13:23:31Z

internal/dispatch/caching/caching.go

 	requestKey := dispatch.CheckRequestToKey(req)
+	if cd.nsm != nil && req.ObjectAndRelation.Namespace != req.Subject.Namespace {


instead of checking if the nsm is nil, can there just be variants of the dispatcher that can do canonicalization?

I suppose, although the only reason we have this is to remove the need for a mock namespace manager in the caching testing. Do you prefer I just mock out the nsm there?

Can we not just use a real namespace manager in the tests?

We could, but then we also need a real datastore

Maybe there's value to a "KeyProvider" abstraction, and the dispatcher just takes a reference to a key provider.

When there's a datastore available, we use a key provider backed by an nsm so that we get the pre-calculated cache keys, but for testing we can inject a simple static keyer, etc.

This would also make it easier to get the canonical keys outside of the dispatcher, though I don't have any particular use case for that in mind.

A good idea
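A minimal sketch of what that abstraction could look like, purely for discussion. Every name here (`KeyProvider`, `checkRequest`, `staticKeyProvider`, `canonicalKeyProvider`) is hypothetical, the map stands in for a namespace-manager lookup, and the non-canonical `check//` prefix is an assumption; only the `check//canonical/` format comes from the diff.

```go
package main

import "fmt"

// checkRequest is a simplified, hypothetical stand-in for the dispatch check request.
type checkRequest struct {
	Namespace, ObjectID, Relation, Subject, Revision string
}

// KeyProvider is the hypothetical abstraction: the dispatcher asks it for a
// cache key and does not care where the key comes from.
type KeyProvider interface {
	CheckCacheKey(req checkRequest) string
}

// staticKeyProvider derives the key from the relation name alone, which is
// enough for tests that have no datastore.
type staticKeyProvider struct{}

func (staticKeyProvider) CheckCacheKey(req checkRequest) string {
	return fmt.Sprintf("check//%s:%s#%s@%s@%s",
		req.Namespace, req.ObjectID, req.Relation, req.Subject, req.Revision)
}

// canonicalKeyProvider uses pre-calculated canonical keys. The map is an
// assumption standing in for a namespace-manager lookup.
type canonicalKeyProvider struct {
	canonicalKeys map[string]string // "namespace#relation" -> canonical key
}

func (p canonicalKeyProvider) CheckCacheKey(req checkRequest) string {
	key, ok := p.canonicalKeys[req.Namespace+"#"+req.Relation]
	if !ok {
		// No canonical key available: fall back to the relation-based key.
		return staticKeyProvider{}.CheckCacheKey(req)
	}
	return fmt.Sprintf("check//canonical/%s:%s#%s@%s@%s",
		req.Namespace, req.ObjectID, key, req.Subject, req.Revision)
}

func main() {
	req := checkRequest{Namespace: "document", ObjectID: "doc1", Relation: "view", Subject: "user:alice", Revision: "rev1"}
	providers := []KeyProvider{
		staticKeyProvider{},
		canonicalKeyProvider{canonicalKeys: map[string]string{"document#view": "abc123"}},
	}
	for _, p := range providers {
		fmt.Println(p.CheckCacheKey(req))
	}
}
```

The caching tests would then inject `staticKeyProvider{}` and never need a namespace manager or datastore.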

internal/dispatch/graph/graph.go

internal/namespace/aliasing_test.go

internal/namespace/aliasing.go

internal/namespace/canonicalization.go

ecordell · 2022-03-21T14:47:26Z

internal/dispatch/caching/caching.go

 	requestKey := dispatch.CheckRequestToKey(req)
+	if cd.nsm != nil && req.ObjectAndRelation.Namespace != req.Subject.Namespace {


discussed in slack, but we should determine if this branch needs to happen for any checks to the same namespace, or if we can only worry about checks with relations to the same object (or ideally avoid it entirely, though I'm not sure that's possible)

ecordell · 2022-03-21T14:53:13Z

internal/dispatch/dispatch.go

+
+	// NOTE: canonical cache keys are only unique *within* a version of a namespace, so they are used
+	// essentially as a new relation name in the cache key.
+	return fmt.Sprintf("check//canonical/%s:%s#%s@%s@%s", req.ObjectAndRelation.Namespace, req.ObjectAndRelation.ObjectId, possibleCanonicalKey, tuple.StringONR(req.Subject), req.Metadata.AtRevision)


this is arguably a separate issue entirely, but why not share cache with lookup/expand?

Because it would require a significant amount of reworking of the results from those caches (not to mention that lookup is not necessarily going to have the full result set, since it supports limits).

internal/dispatch/dispatch.go

ecordell · 2022-03-21T14:59:49Z

internal/dispatch/dispatch.go

+
+// CheckRequestToKeyWithPossibleCanonical converts a check request into a cache key based possibly
+// on the canonical key, or the relation name if the canonical key is empty
+func CheckRequestToKeyWithPossibleCanonical(req *v1.DispatchCheckRequest, possibleCanonicalKey string) string {


Could we unify this with CheckRequestToKey?

i.e. pass in the relation name as the canonical key if we don't have a real canonical key, and then compute the cache keys the same?

They can't have the same key as there is a non-zero possibility someone creates a relation with a name that matches the computed hash, which would result in the cache keys overlapping incorrectly. I explicitly have different prefixes to prevent that from occurring

We always compute the cache keys ourselves ahead of time, if that's a concern why not just detect it and either reject the schema or change the canonical key?

Two reasons:
(1) It's not backwards compatible, and limiting the names of relations to prevent overlap is a user-facing error for an internal implementation detail
(2) Why do so when we can simply construct a different key to guarantee no overlap? The cache itself is still shared

Changing the canonical key if there's an existing relation with that hash (which I would be surprised if anyone, ever, hits) would also guarantee no overlap

we can pick a deterministic way to avoid conflicts: if hash(rudd) is the same as some user-provided name, calculate hash(hash(rudd)), etc

We're hashing the bdd output ourselves, and it's fnv which is much more likely to collide than a crypto hash fn

Yeah, but it is isolated to the BDD input, vs the relation and BDD. I very much do not want to mix them. Why the push to have a single cache key?

I'm just trying to push runtime decisions into compile-time (since we're making exactly this type of decision at compile time already)

I'm inclined to agree. We have perfect information at schema compile time. We can just straight up check if there are any collisions if that's a real concern.

The "upside" of the design is to remove a conditional (?) and yet the downsides are:
(1) If we ever mess it up, the cache will return incorrect results
(2) If there is a collision, we either have to issue an error to the user (bad) or pick a new revision (which could completely cascade forward)

I really don't see any upside of real value and major downsides in additional complexity at annotation time OR user error... plus, we'll still need the conditional anyway because we won't necessarily have a cache key for older schemas anyway
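To make the prefix argument from this thread concrete, here is a small sketch of the case being debated: a user-defined relation whose name happens to equal a computed hash. The helper names and the hash value are made up, and the non-canonical `check//` prefix is an assumption; only the `check//canonical/` format appears in the diff.

```go
package main

import "fmt"

// relationKey builds a cache key from the relation name (assumed format).
func relationKey(namespace, objectID, relation, subject, revision string) string {
	return fmt.Sprintf("check//%s:%s#%s@%s@%s", namespace, objectID, relation, subject, revision)
}

// canonicalKey builds a cache key from the canonical hash, using the
// distinct prefix shown in the diff.
func canonicalKey(namespace, objectID, hash, subject, revision string) string {
	return fmt.Sprintf("check//canonical/%s:%s#%s@%s@%s", namespace, objectID, hash, subject, revision)
}

func main() {
	// A made-up hash value, and a relation that a user happened to name identically.
	const hash = "5f3a9c"

	byName := relationKey("document", "doc1", hash, "user:alice", "rev1")
	byCanonical := canonicalKey("document", "doc1", hash, "user:alice", "rev1")

	fmt.Println(byName)
	fmt.Println(byCanonical)
	fmt.Println(byName == byCanonical) // false: the prefixes keep the two entries apart
}
```

With a single shared format, both calls would produce the same string and the two cache entries would overlap, which is the scenario the separate prefix is meant to rule out.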

josephschorr · 2022-03-21T16:14:06Z

Updated

jakedt · 2022-03-22T14:22:24Z

internal/namespace/canonicalization.go

+// We begin by assigning a unique integer index to each relation and arrow found for all
+// expressions in the namespace:
+//   definition somenamespace {
+//	    relation first: ...


Indentation.

What's wrong?

There are extra spaces before relation and the arrow points to the middle of the word relation.

jakedt · 2022-03-22T14:23:07Z

internal/namespace/canonicalization.go

+//               ^ index 0
+//      relation second: ...
+//               ^ index 1
+//      permission someperm = second + (first - third->something)


third should probably be shown as a permission or a relation.

jakedt · 2022-03-22T14:24:35Z

internal/namespace/canonicalization.go

+//   }
+//
+// These indexes are then used with the rudd library to build the expression:
+//    someperm => `bdd.And(bdd.Ithvar(1), bdd.Or(bdd.Ithvar(0), bdd.NIthvar(2)))`


These ops seem wrong for the example. + shouldn't be And, right?

jakedt · 2022-03-22T14:26:42Z

internal/namespace/canonicalization.go

+		}, varMap)
+
+	case *core.UsersetRewrite_Exclusion:
+		return convertToBdd(relation, bdd, rw.Exclusion, bdd.Or, func(childIndex int, varIndex int) rudd.Node {


Shouldn't Exclusion be bdd.And, i.e. it's A and not B?
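A quick truth-table check of the two review comments on the operators, using plain booleans rather than the rudd API. It assumes `+` is union (OR), `-` is exclusion (A AND NOT B), and that the indexes in the documented example map to first = 0, second = 1, and the arrow = 2. Function names are hypothetical.

```go
package main

import "fmt"

// intended evaluates `second + (first - third->something)` with + as OR and
// - as "A and not B".
func intended(first, second, arrow bool) bool {
	return second || (first && !arrow)
}

// asDocumented mirrors the ops in the comment under review:
// And(var 1, Or(var 0, not var 2)), with var 0 = first, var 1 = second,
// var 2 = the arrow (assumed mapping).
func asDocumented(first, second, arrow bool) bool {
	return second && (first || !arrow)
}

// countMismatches enumerates all eight assignments and reports how many
// times the two formulas disagree.
func countMismatches() int {
	mismatches := 0
	values := []bool{false, true}
	for _, first := range values {
		for _, second := range values {
			for _, arrow := range values {
				if intended(first, second, arrow) != asDocumented(first, second, arrow) {
					fmt.Printf("differs at first=%v second=%v arrow=%v\n", first, second, arrow)
					mismatches++
				}
			}
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", countMismatches())
}
```

The two formulas disagree on two of the eight assignments, so swapping the operators would produce a different function and therefore a different canonical key.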

jakedt · 2022-03-22T19:07:05Z

internal/dispatch/dispatch.go

+		return CheckRequestToKey(req)
+	}
+
+	// NOTE: canonical cache keys are only unique *within* a version of a namespace, so they are used


jakedt · 2022-03-22T21:21:18Z

internal/dispatch/caching/caching.go

+			return nil, err
+		}
+
+		requestKey = dispatch.CheckRequestToKeyWithPossibleCanonical(req, relation.CanonicalCacheKey)


If there is no canonical cache key, this method calls CheckRequestToKey to recompute the key for the check request, even though requestKey was already computed above. Seems wasteful to do it twice...

jakedt · 2022-03-22T21:47:06Z

internal/namespace/aliasing.go

+// computePermissionAliases computes a map of aliases between the various permissions in a
+// namespace. A permission is considered an alias if it *directly* refers to another permission
+// or relation without any other form of expression.
+func computePermissionAliases(typeSystem *ValidatedNamespaceTypeSystem) (map[string]string, error) {


Do we expect this to save us enough time to be worth it? If there were more than one alias of the same relation or permission, they would share a cache key and only 1/n calls would actually be invoked. Am I missing something that makes this worth adding?

In the case where there isn't any caching yet, and where there is a chain of aliasing (2-3 at least), it absolutely could save time because we'd be redispatching from node to node for each aliasing.
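A sketch of the chained-alias case described in the reply above, assuming aliases are held as a simple name-to-target map. `resolveAlias` and the permission names are hypothetical, not the PR's code.

```go
package main

import "fmt"

// resolveAlias follows a chain of aliases to the permission or relation that
// is not itself an alias, so dispatch can jump straight to it. It returns an
// error if the chain loops back on itself.
func resolveAlias(aliases map[string]string, name string) (string, error) {
	seen := map[string]bool{}
	current := name
	for {
		next, isAlias := aliases[current]
		if !isAlias {
			return current, nil
		}
		if seen[current] {
			return "", fmt.Errorf("alias cycle detected at %q", current)
		}
		seen[current] = true
		current = next
	}
}

func main() {
	// permission view = read; permission read = reader
	aliases := map[string]string{
		"view": "read",
		"read": "reader",
	}

	target, err := resolveAlias(aliases, "view")
	if err != nil {
		panic(err)
	}
	fmt.Println(target) // reader
}
```

Without the alias map, a check on `view` would dispatch to `read` and then again to `reader`; with it, the check can go to `reader` in one step.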

jakedt · 2022-03-22T21:48:39Z

internal/namespace/aliasing.go

+		}
+
+		// Otherwise, add the permission to the working set.
+		workingSet[rel.Name] = aliasedPermOrRel


Can we call this unresolvedAliases or something? workingSet could mean anything.

josephschorr · 2022-04-04T12:16:45Z

Rebased and updated for nil support

evan is the real reviewer

josephschorr · 2022-04-12T20:44:31Z

@ecordell Updated as discussed

ecordell

LGTM

As discussed offline: we'll want this in a release before issuing a migration that re-writes existing namespaces.

ecordell · 2022-04-14T17:58:03Z

internal/namespace/canonicalization.go

+//
+// For example, for the namespace:
+//   definition somenamespace {
+//	    relation first: ...


nit: this line has different spacing

ecordell

LGTM


ecordell

LGTM

josephschorr requested a review from ecordell March 17, 2022 18:39

github-actions bot added the area/api v0, area/api v1, area/dependencies, area/dispatch, and area/tooling labels Mar 17, 2022

josephschorr force-pushed the cache-canonicalization branch 3 times, most recently from 6a68f48 to 35014ab Compare March 17, 2022 22:35

jzelinskie linked an issue Mar 17, 2022 that may be closed by this pull request

Canonicalize cache keys for shared subproblems #388

Closed

josephschorr force-pushed the cache-canonicalization branch from 35014ab to 4fb4e05 Compare March 20, 2022 08:12

ecordell reviewed Mar 21, 2022

josephschorr force-pushed the cache-canonicalization branch from 4fb4e05 to 4b85d5a Compare March 21, 2022 16:01

josephschorr force-pushed the cache-canonicalization branch from ed566c0 to 1db92b8 Compare March 21, 2022 16:14

jakedt previously requested changes Mar 22, 2022

josephschorr force-pushed the cache-canonicalization branch 3 times, most recently from b89560f to 9bb0db2 Compare March 23, 2022 09:31

josephschorr force-pushed the cache-canonicalization branch from 9bb0db2 to b6b18dc Compare April 4, 2022 12:16

josephschorr force-pushed the cache-canonicalization branch from 3f0c9c5 to f511335 Compare April 6, 2022 15:51

josephschorr mentioned this pull request Apr 10, 2022

Implement a reachability graph and use for lookup #517

Merged

josephschorr force-pushed the cache-canonicalization branch from f511335 to 0c70e8c Compare April 10, 2022 18:15

josephschorr force-pushed the cache-canonicalization branch 3 times, most recently from d68cfe1 to ec59897 Compare April 12, 2022 20:44

ecordell previously approved these changes Apr 12, 2022

josephschorr dismissed ecordell’s stale review via 56c142c April 14, 2022 15:55

josephschorr force-pushed the cache-canonicalization branch 2 times, most recently from 56c142c to 59f53ea Compare April 14, 2022 16:26

authzed deleted a comment from github-actions bot Apr 14, 2022

ecordell reviewed Apr 14, 2022

ecordell previously approved these changes Apr 14, 2022

josephschorr added 13 commits April 14, 2022 14:30

Add functions for computing the aliases and canonical cache keys of permissions

41ec700

Add alias and cache key fields to relation and add code to fill them in

2e00a53

Add namespace manager to caching dispatcher and use to get the canonicalized cache key

a6dff9c

Disable canonical cache key usage and aliasing when checking on a shared resource and subject type

abdd4c6

Have the validationfile loader also validate the type system and annotate all namespaces

1ab5615

Lint fixes

2e94c97

Regenerate with the correct version

278081c

Switch annotation to require a validated type system and address other feedback

4276ea5

Add extended description of canonicalization

b711706

Feedback adjustments

4d48253

Update canonicalization for nil support

39d8472

Tidy up go.mod

efc98fe

Change to all relations having canonical cache keys

dd3bfdf

Also makes use of a new key handler interface in both caching and dispatching

josephschorr dismissed ecordell’s stale review via dd3bfdf April 14, 2022 18:31

josephschorr force-pushed the cache-canonicalization branch from 59f53ea to dd3bfdf Compare April 14, 2022 18:31

josephschorr requested a review from a team as a code owner April 14, 2022 18:31

josephschorr enabled auto-merge April 14, 2022 18:32

ecordell approved these changes Apr 14, 2022

josephschorr merged commit d9bd4b0 into authzed:main Apr 14, 2022

github-actions bot locked and limited conversation to collaborators Apr 14, 2022
