Relationship integrity #1980
internal/datastore/crdb/crdb.go
Outdated
	colRelation         = "relation"
	colUsersetNamespace = "userset_namespace"
	colUsersetObjectID  = "userset_object_id"
	colUsersetRelation  = "userset_relation"
Maybe add a blank line here to keep all the caveat columns together.
)

func init() {
	err := CRDBMigrations.Register("add-integrity-columns", "add-relationship-counters-table", addIntegrityColumns, noAtomicMigration)
Is there a MustRegister? I swear it's been written at some point.
Nope, we never added it. We could, but this seems fine
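For reference, a "must" variant is usually just a thin wrapper that panics on a registration error. The sketch below uses a toy `manager` type with a simplified `Register` signature; both are hypothetical stand-ins and not the migration manager's real API.

```go
package main

import "fmt"

// migrationFunc is a hypothetical, simplified migration callback.
type migrationFunc func() error

// manager is a toy stand-in for a migration manager; it is NOT the real type.
type manager struct {
	migrations map[string]migrationFunc
}

func newManager() *manager {
	return &manager{migrations: make(map[string]migrationFunc)}
}

// Register records a migration and rejects duplicate version names.
func (m *manager) Register(version string, up migrationFunc) error {
	if _, exists := m.migrations[version]; exists {
		return fmt.Errorf("migration %q is already registered", version)
	}
	m.migrations[version] = up
	return nil
}

// MustRegister wraps Register and panics on error, which suits init()
// functions where a registration failure is a programming mistake.
func (m *manager) MustRegister(version string, up migrationFunc) {
	if err := m.Register(version, up); err != nil {
		panic(fmt.Sprintf("failed to register migration: %v", err))
	}
}
```

With a helper like this, an `init()` would shrink to a single `MustRegister(...)` call, at the cost of one more method to maintain.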
@@ -36,6 +36,9 @@ var (
	colUsersetRelation,
	colCaveatContextName,
	colCaveatContext,
	colIntegrityKeyID,
	colIntegrityHash,
	colTimestamp,
Do we need the timestamp if we have the key ID?
Yes: if a key has expired, the timestamp lets us still accept relationships that were signed before its expiration time.
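A minimal sketch of that rule, assuming an expired key carries an expiration time and each relationship stores its signing timestamp. The names `signingKey`, `canVerify`, and `at` are hypothetical and not taken from this PR.

```go
package main

import "time"

// signingKey is a hypothetical, simplified view of an integrity key.
type signingKey struct {
	ID        string
	ExpiredAt *time.Time // nil means the key has not expired
}

// canVerify reports whether a relationship signed at signedAt may still be
// accepted under key: always for a live key, and only when the signature
// predates the expiration for an expired key.
func canVerify(key signingKey, signedAt time.Time) bool {
	if key.ExpiredAt == nil {
		return true
	}
	return signedAt.Before(*key.ExpiredAt)
}

// at is a small helper that builds a UTC timestamp from Unix seconds.
func at(sec int64) time.Time {
	return time.Unix(sec, 0).UTC()
}
```

Without the stored timestamp, the key ID alone could not distinguish a relationship written before the expiration from one written after it.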
var sb strings.Builder
sb.WriteString(tpl.ResourceAndRelation.Namespace)
sb.WriteString(":")
sb.WriteString(tpl.ResourceAndRelation.ObjectId)
sb.WriteString("#")
sb.WriteString(tpl.ResourceAndRelation.Relation)
sb.WriteString("@")
sb.WriteString(tpl.Subject.Namespace)
sb.WriteString(":")
sb.WriteString(tpl.Subject.ObjectId)
sb.WriteString("#")
sb.WriteString(tpl.Subject.Relation)

if tpl.Caveat != nil && tpl.Caveat.CaveatName != "" {
	sb.WriteString(" with ")
	sb.WriteString(tpl.Caveat.CaveatName)
	sb.WriteString(":")
	sb.WriteString(tpl.Caveat.Context.String())
}
Is this something we can put in the tuple package? Maybe something similar already exists.
Yes, we could put it in there.
func (r *relationshipIntegrityProxy) CheckRevision(ctx context.Context, revision datastore.Revision) error {
	return r.ds.CheckRevision(ctx, revision)
}

func (r *relationshipIntegrityProxy) Close() error {
	return r.ds.Close()
}

func (r *relationshipIntegrityProxy) Features(ctx context.Context) (*datastore.Features, error) {
	return r.ds.Features(ctx)
}

func (r *relationshipIntegrityProxy) HeadRevision(ctx context.Context) (datastore.Revision, error) {
	return r.ds.HeadRevision(ctx)
}

func (r *relationshipIntegrityProxy) OptimizedRevision(ctx context.Context) (datastore.Revision, error) {
	return r.ds.OptimizedRevision(ctx)
}

func (r *relationshipIntegrityProxy) ReadyState(ctx context.Context) (datastore.ReadyState, error) {
	return r.ds.ReadyState(ctx)
}

func (r *relationshipIntegrityProxy) RevisionFromString(serialized string) (datastore.Revision, error) {
	return r.ds.RevisionFromString(serialized)
}

func (r *relationshipIntegrityProxy) Statistics(ctx context.Context) (datastore.Stats, error) {
	return r.ds.Statistics(ctx)
}
I like to make functions like these a single line each so they all stack together without blank lines between them. It makes it clearer that they're just boilerplate.
Tried, but the Go formatter keeps reformatting it back to this.
Likely because the receiver is named relationshipIntegrityProxy, and thus it's making the lines too long. Shall I rename the struct?
currentKeyHMAC := hmacConfig{
	keyID:     currentKey.ID,
	expiredAt: currentKey.ExpiredAt,
	hmacPool: &sync.Pool{
I'm sure it depends, but for tiger I found that a buffered channel performed better than sync.Pool (with a select on retrieval that returns a new instance if the channel is empty, similar to sync.Pool).
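A rough sketch of the buffered-channel approach, for comparison when benchmarking. `chanPool`, `newHMACPool`, and `signWithPool` are hypothetical names, and HMAC-SHA256 is assumed only to make the example concrete.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"hash"
)

// chanPool is a bounded free list backed by a buffered channel.
type chanPool[T any] struct {
	items chan T
	newFn func() T
}

func newChanPool[T any](size int, newFn func() T) *chanPool[T] {
	return &chanPool[T]{items: make(chan T, size), newFn: newFn}
}

// Get returns a pooled item, or builds a new one when the channel is empty.
func (p *chanPool[T]) Get() T {
	select {
	case item := <-p.items:
		return item
	default:
		return p.newFn()
	}
}

// Put hands an item back, silently dropping it when the channel is full.
func (p *chanPool[T]) Put(item T) {
	select {
	case p.items <- item:
	default:
	}
}

// newHMACPool builds a pool of HMAC-SHA256 instances for a single key.
func newHMACPool(key []byte, size int) *chanPool[hash.Hash] {
	return newChanPool(size, func() hash.Hash { return hmac.New(sha256.New, key) })
}

// signWithPool computes the MAC of data using a pooled hasher.
func signWithPool(p *chanPool[hash.Hash], data []byte) []byte {
	h := p.Get()
	defer p.Put(h)
	h.Reset() // clear any state left by a previous user
	h.Write(data)
	return h.Sum(nil)
}
```

Unlike sync.Pool, the channel has a fixed capacity and its contents are never released by the garbage collector, so the size has to be chosen up front. Whether it is actually faster here would need a benchmark under realistic concurrency.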
keyID:     key.ID,
expiredAt: key.ExpiredAt,
hmacPool: &sync.Pool{
	New: func() any { return hmac.New(alg, key.Bytes) },
It's probably worth benchmarking SHA-3 here, since you can use it as a MAC directly instead of through an HMAC.
I also learned recently about https://github.com/google/highwayhash (and its native Go implementation, https://github.com/minio/highwayhash), which is theoretically one order of magnitude faster than SHA-256 thanks to SIMD. The Go implementation says:
HighwayHash is not a general-purpose cryptographic hash function (such as Blake2b, SHA-3 or SHA-2) and should not be used if strong collision resistance is required.
But the reference C++ implementation from Google says:
Given a strong hash function and secret seed, it appears infeasible for attackers to generate hash collisions because s and/or R are unknown. However, they can still observe the timings of data structure operations for various m. With typical table sizes of 2^10 to 2^17 entries, attackers can detect some 'bin collisions' (inputs mapping to the same bin). Although this will be costly for the attacker, they can then send many instances of such inputs, so we need to limit the resulting work for our data structure.
Did some preliminary tests: HighwayHash is faster by about 50%, while SHA-3 seemed to be slower.
However, I'm concerned about the lack of strong collision resistance. We'll need to investigate the implications of that.
Updated branch with the highwayhash impl for testing
The scenario this attempts to protect against is unauthorized modification of the database state. Defeating the hash requires a sophisticated attack and compute running for days to find a collision that could reveal the seed. Honestly, if someone has access to the database and the machinery to run such an attack, the hash is not going to get in their way; they probably have enough incentive to find other ways in. I don't think remote attacks, with all the levels of indirection involved, can give an attacker a clear signal.
This is a decision customers could make themselves, and we could make it configurable via a flag. If a customer considers only collision-resistant hashes acceptable and is willing to accept the tradeoff in computational cost, they can choose one. HighwayHash is by no means a "weak" hash; it makes the cost of an attack high enough to be infeasible for most attackers.
By contrast, 'strong' hashes such as SipHash or HighwayHash require infeasible attacker effort to find a hash collision (an expected 2^32 guesses of m per the birthday paradox) or recover the seed (2^63 requests). These security claims assume the seed is secret. It is reasonable to suppose s is initially unknown to attackers, e.g. generated on startup or even per-connection.
Consider how the authors describe the complexity of an attack:
A timing attack by Wool/Bar-Yosef recovers 13-bit seeds by testing all 8K possibilities using millions of requests, which takes several days (even assuming unrealistic 150 us round-trip times). It appears infeasible to recover 64-bit seeds in this way.
… they are written, and matches hashes when they are read
7ed1e6d
to
2308290
Compare
type relationshipIntegrityProxy struct {
	ds         datastore.Datastore
	primaryKey hmacConfig
	keysByID   map[string]hmacConfig
It's usually a good idea to encode the hash algorithm and its parameters in the key ID for later retrieval (password hashing usually does this, for example), so that the algorithm and parameters can be rotated alongside the keys themselves.
For example, it's SHA-256 now, but we may later decide we need to rotate to SHA-512 or SHA-3.
That's why we have the algorithm version byte; we can map from that to the alg used
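To illustrate the idea of a version byte, here is a minimal sketch in which the stored hash is the version byte followed by the MAC, and verification looks up the algorithm from that byte. The version values, the byte layout, and the names `signVersioned` and `verifyVersioned` are assumptions made for the example, not details confirmed by this PR.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/sha512"
	"errors"
	"fmt"
	"hash"
)

// Hypothetical version identifiers; the real values may differ.
const (
	versionHMACSHA256 byte = 1
	versionHMACSHA512 byte = 2
)

// algorithms maps a version byte to the hash constructor used for the HMAC.
var algorithms = map[byte]func() hash.Hash{
	versionHMACSHA256: sha256.New,
	versionHMACSHA512: sha512.New,
}

// signVersioned returns version || HMAC(key, payload).
func signVersioned(version byte, key, payload []byte) ([]byte, error) {
	newHash, ok := algorithms[version]
	if !ok {
		return nil, fmt.Errorf("unknown algorithm version %d", version)
	}
	mac := hmac.New(newHash, key)
	mac.Write(payload)
	// Sum appends the digest to the slice holding the version byte.
	return mac.Sum([]byte{version}), nil
}

// verifyVersioned recomputes the MAC with the algorithm named by the stored
// version byte and compares it in constant time.
func verifyVersioned(key, payload, stored []byte) error {
	if len(stored) == 0 {
		return errors.New("stored hash is empty")
	}
	expected, err := signVersioned(stored[0], key, payload)
	if err != nil {
		return err
	}
	if !hmac.Equal(expected, stored) {
		return errors.New("integrity hash mismatch")
	}
	return nil
}
```

With this layout, rotating to a new algorithm means adding a map entry and writing new rows with the new version byte; existing rows keep verifying under the version they were written with.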
NOTE: the new columns are not yet implemented outside of memdb and CRDB, so those tests will fail
Addresses: #1953