Relationship integrity #1980

Draft: wants to merge 5 commits into base: main

Conversation

josephschorr
Member

@josephschorr josephschorr commented Jul 8, 2024

NOTE: the new columns are not yet implemented outside of memdb and CRDB, so the tests for the other datastores will fail

Addresses: #1953

@josephschorr josephschorr requested a review from a team as a code owner July 8, 2024 21:23
@josephschorr josephschorr marked this pull request as draft July 8, 2024 21:23
@github-actions github-actions bot added the area/datastore (Affects the storage system) and area/tooling (Affects the dev or user toolchain, e.g. tests, ci, build tools) labels Jul 8, 2024
colRelation = "relation"
colUsersetNamespace = "userset_namespace"
colUsersetObjectID = "userset_object_id"
colUsersetRelation = "userset_relation"
Member

maybe put a new line here to keep all the caveat stuff together

)

func init() {
err := CRDBMigrations.Register("add-integrity-columns", "add-relationship-counters-table", addIntegrityColumns, noAtomicMigration)
Member

is there a must register? i swear it's been written at some point

Member Author

Nope, we never added it. We could, but this seems fine
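
A minimal sketch of what such a "must register" helper could look like. The `migrationRegistry` type and its two-argument `Register` are hypothetical stand-ins invented for illustration; they do not reproduce the signature of `CRDBMigrations.Register` shown in the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// migrationRegistry is a hypothetical stand-in for a migration manager.
// It only records each migration name and the migration it follows.
type migrationRegistry struct {
	previous map[string]string
}

func newMigrationRegistry() *migrationRegistry {
	return &migrationRegistry{previous: map[string]string{}}
}

// Register records a migration and returns an error for an empty or
// duplicate name.
func (r *migrationRegistry) Register(name, previous string) error {
	if name == "" {
		return errors.New("migration name must not be empty")
	}
	if _, ok := r.previous[name]; ok {
		return fmt.Errorf("migration %q already registered", name)
	}
	r.previous[name] = previous
	return nil
}

// MustRegister wraps Register and panics on failure, which removes the
// error-handling boilerplate from init() functions.
func (r *migrationRegistry) MustRegister(name, previous string) {
	if err := r.Register(name, previous); err != nil {
		panic("failed to register migration: " + err.Error())
	}
}

func main() {
	reg := newMigrationRegistry()
	reg.MustRegister("add-integrity-columns", "add-relationship-counters-table")
	fmt.Println("registered migrations:", len(reg.previous))
}
```

Panicking is only reasonable here because registration happens at program start, where a failure indicates a programming error rather than a runtime condition.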

@@ -36,6 +36,9 @@ var (
colUsersetRelation,
colCaveatContextName,
colCaveatContext,
colIntegrityKeyID,
colIntegrityHash,
colTimestamp,
Member

do we need the timestamp if we have the key id?

Member Author

Yes. The timestamp ensures that, once a key has expired, we can still accept relationships that were signed before its expiration time
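
The rule described in this reply can be sketched as follows. The `signingKey` type, the `canVerify` method, and the `day` helper are hypothetical names chosen for illustration; only the idea of comparing the stored timestamp against the key's expiration comes from the thread.

```go
package main

import (
	"fmt"
	"time"
)

// signingKey is a hypothetical key record. A nil expiredAt means the key
// has not expired.
type signingKey struct {
	id        string
	expiredAt *time.Time
}

// canVerify reports whether a relationship signed at signedAt may still be
// accepted under this key: always for an active key, and only for
// signatures made before the expiration time for an expired key.
func (k signingKey) canVerify(signedAt time.Time) bool {
	if k.expiredAt == nil {
		return true
	}
	return signedAt.Before(*k.expiredAt)
}

// day parses a YYYY-MM-DD date and panics on malformed input; it exists
// only to keep the example short.
func day(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	expiry := day("2024-07-01")
	old := signingKey{id: "key-1", expiredAt: &expiry}

	fmt.Println(old.canVerify(day("2024-06-15"))) // true: signed before expiry
	fmt.Println(old.canVerify(day("2024-07-08"))) // false: signed after expiry
}
```

Without the stored timestamp, the key ID alone could not distinguish these two cases.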

Comment on lines 105 to 123
var sb strings.Builder
sb.WriteString(tpl.ResourceAndRelation.Namespace)
sb.WriteString(":")
sb.WriteString(tpl.ResourceAndRelation.ObjectId)
sb.WriteString("#")
sb.WriteString(tpl.ResourceAndRelation.Relation)
sb.WriteString("@")
sb.WriteString(tpl.Subject.Namespace)
sb.WriteString(":")
sb.WriteString(tpl.Subject.ObjectId)
sb.WriteString("#")
sb.WriteString(tpl.Subject.Relation)

if tpl.Caveat != nil && tpl.Caveat.CaveatName != "" {
sb.WriteString(" with ")
sb.WriteString(tpl.Caveat.CaveatName)
sb.WriteString(":")
sb.WriteString(tpl.Caveat.Context.String())
}
Member

is this something we can put in the tuple package? maybe something similar already exists

Member Author

Yes, we could put it into there
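
A rough sketch of how the string building above might look once extracted into a reusable function. The `relationship`, `objectAndRelation`, and `caveat` types are simplified, hypothetical stand-ins for the real tuple types, and the caveat context is assumed to be pre-serialized to a string.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified stand-ins for the tuple types in the diff.
type objectAndRelation struct{ namespace, objectID, relation string }

type caveat struct{ name, context string }

type relationship struct {
	resource objectAndRelation
	subject  objectAndRelation
	caveat   *caveat
}

// writeONR appends "namespace:objectID#relation" to the builder.
func writeONR(sb *strings.Builder, onr objectAndRelation) {
	sb.WriteString(onr.namespace)
	sb.WriteString(":")
	sb.WriteString(onr.objectID)
	sb.WriteString("#")
	sb.WriteString(onr.relation)
}

// canonicalString builds the byte-stable form that gets hashed:
// resource@subject, followed by " with name:context" for a named caveat.
func canonicalString(rel relationship) string {
	var sb strings.Builder
	writeONR(&sb, rel.resource)
	sb.WriteString("@")
	writeONR(&sb, rel.subject)
	if rel.caveat != nil && rel.caveat.name != "" {
		sb.WriteString(" with ")
		sb.WriteString(rel.caveat.name)
		sb.WriteString(":")
		sb.WriteString(rel.caveat.context)
	}
	return sb.String()
}

func main() {
	rel := relationship{
		resource: objectAndRelation{"document", "readme", "viewer"},
		subject:  objectAndRelation{"user", "alice", "..."},
	}
	fmt.Println(canonicalString(rel)) // document:readme#viewer@user:alice#...
}
```

Whatever form is chosen must stay identical between write time and verification time, since any change to the string changes the resulting hash.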

Comment on lines +147 to +164
func (r *relationshipIntegrityProxy) CheckRevision(ctx context.Context, revision datastore.Revision) error {
return r.ds.CheckRevision(ctx, revision)
}

func (r *relationshipIntegrityProxy) Close() error {
return r.ds.Close()
}

func (r *relationshipIntegrityProxy) Features(ctx context.Context) (*datastore.Features, error) {
return r.ds.Features(ctx)
}

func (r *relationshipIntegrityProxy) HeadRevision(ctx context.Context) (datastore.Revision, error) {
return r.ds.HeadRevision(ctx)
}

func (r *relationshipIntegrityProxy) OptimizedRevision(ctx context.Context) (datastore.Revision, error) {
return r.ds.OptimizedRevision(ctx)
}

func (r *relationshipIntegrityProxy) ReadyState(ctx context.Context) (datastore.ReadyState, error) {
return r.ds.ReadyState(ctx)
}

func (r *relationshipIntegrityProxy) RevisionFromString(serialized string) (datastore.Revision, error) {
return r.ds.RevisionFromString(serialized)
}

func (r *relationshipIntegrityProxy) Statistics(ctx context.Context) (datastore.Stats, error) {
return r.ds.Statistics(ctx)
}
Member

i like to make functions like these 1 single line so they all stack together without newlines between. it makes it more clear that they're just boilerplate

Member Author

Tried but go is reformatting it back to this

Member Author

Likely because the receiver type is named relationshipIntegrityProxy, and thus it's making the lines too long. Shall I rename the struct?
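
For illustration, this is the stacked one-line style the reviewer describes, using a shorter receiver type. The `store` interface, `memStore`, and `integrityProxy` are hypothetical toy types, not the real datastore interface. Standard gofmt leaves one-line function bodies as written; whether another formatter or a line-length linter in the project's toolchain splits them is not something this thread settles.

```go
package main

import "fmt"

// store is a hypothetical, much smaller stand-in for a datastore interface.
type store interface {
	Close() error
	Name() string
	Ready() bool
}

// memStore is a trivial implementation used only to exercise the proxy.
type memStore struct{ closed bool }

func (m *memStore) Close() error { m.closed = true; return nil }
func (m *memStore) Name() string { return "mem" }
func (m *memStore) Ready() bool  { return !m.closed }

// integrityProxy wraps a store; a shorter type name keeps each delegating
// method comfortably on one line.
type integrityProxy struct{ ds store }

// Pure pass-through methods, stacked without blank lines so they read as
// boilerplate at a glance.
func (p *integrityProxy) Close() error { return p.ds.Close() }
func (p *integrityProxy) Name() string { return p.ds.Name() }
func (p *integrityProxy) Ready() bool  { return p.ds.Ready() }

func main() {
	var s store = &integrityProxy{ds: &memStore{}}
	fmt.Println(s.Name(), s.Ready())
}
```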

currentKeyHMAC := hmacConfig{
keyID: currentKey.ID,
expiredAt: currentKey.ExpiredAt,
hmacPool: &sync.Pool{
Contributor

I'm sure it depends, but for tiger I found that a buffered channel performed better than sync.Pool (using a select when retrieving from the channel, so that a new instance is returned if it is empty, similar to sync.Pool)
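
A minimal sketch of the buffered-channel pool described in this comment, applied to HMAC-SHA256 instances. The `hashPool` type and its helpers are hypothetical names, and nothing here measures which approach is faster; that would still need a benchmark against `sync.Pool` under the real workload.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// hashPool is a hypothetical fixed-capacity pool backed by a buffered channel.
type hashPool struct {
	free  chan hash.Hash
	newFn func() hash.Hash
}

// newHMACPool builds a pool of HMAC-SHA256 instances keyed with key.
func newHMACPool(size int, key []byte) *hashPool {
	return &hashPool{
		free:  make(chan hash.Hash, size),
		newFn: func() hash.Hash { return hmac.New(sha256.New, key) },
	}
}

// get returns a pooled instance, or creates a new one if the pool is empty.
func (p *hashPool) get() hash.Hash {
	select {
	case h := <-p.free:
		return h
	default:
		return p.newFn()
	}
}

// put resets the instance and returns it to the pool, dropping it if the
// pool is already full.
func (p *hashPool) put(h hash.Hash) {
	h.Reset()
	select {
	case p.free <- h:
	default:
	}
}

// computeMAC returns the hex-encoded MAC of msg using a pooled instance.
func computeMAC(p *hashPool, msg string) string {
	h := p.get()
	defer p.put(h)
	h.Write([]byte(msg))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	pool := newHMACPool(4, []byte("example-key"))
	fmt.Println(computeMAC(pool, "document:readme#viewer@user:alice#..."))
	fmt.Println("pooled instances:", len(pool.free))
}
```

Unlike `sync.Pool`, a channel-backed pool is never emptied by the garbage collector, so its capacity bounds the number of retained instances.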

keyID: key.ID,
expiredAt: key.ExpiredAt,
hmacPool: &sync.Pool{
New: func() any { return hmac.New(alg, key.Bytes) },
Contributor

it's probably worth benchmarking sha3 here, since you can use it as a MAC directly instead of through an HMAC

Contributor

@vroldanbet vroldanbet Jul 9, 2024

I also learned recently about https://github.com/google/highwayhash (and its native Go implementation, https://github.com/minio/highwayhash), theoretically an order of magnitude faster than SHA256 thanks to SIMD. The Go implementation says:

HighwayHash is not a general-purpose cryptographic hash function (such as Blake2b, SHA-3 or SHA-2) and should not be used if strong collision resistance is required.

But the reference C++ implementation from Google says:

Given a strong hash function and secret seed, it appears infeasible for attackers to generate hash collisions because s and/or R are unknown. However, they can still observe the timings of data structure operations for various m. With typical table sizes of 2^10 to 2^17 entries, attackers can detect some 'bin collisions' (inputs mapping to the same bin). Although this will be costly for the attacker, they can then send many instances of such inputs, so we need to limit the resulting work for our data structure.

Member Author

Did some preliminary tests: highwayhash is about 50% faster, while SHA3 seemed to be slower.

However, I'm concerned about the lack of strong collision resistance. We'll need to investigate the implications of that.

Member Author

Updated branch with the highwayhash impl for testing

Contributor

The scenario this attempts to protect against is unauthorized modification of the database state. Defeating the hash requires a sophisticated attack, with compute running for days to find a collision that could reveal the seed. If someone has access to the database and the machinery to run such an attack, the hash honestly is not going to get in their way; they probably have enough incentive to find other routes. I also don't think remote attacks, with all the levels of indirection involved, can give a clear signal to an attacker.

This is a decision customers could make themselves, and we could make it configurable via a flag. If a customer considers only collision-resistant hashes acceptable and is willing to accept the tradeoff in computational cost, they can choose one. HighwayHash is by no means a "weak" hash; it makes the cost of an attack high enough to be infeasible for most attackers.

By contrast, 'strong' hashes such as SipHash or HighwayHash require infeasible attacker effort to find a hash collision (an expected 2^32 guesses of m per the birthday paradox) or recover the seed (2^63 requests). These security claims assume the seed is secret. It is reasonable to suppose s is initially unknown to attackers, e.g. generated on startup or even per-connection.

Consider how the authors describe the complexity of an attack:

A timing attack by Wool/Bar-Yosef recovers 13-bit seeds by testing all 8K possibilities using millions of requests, which takes several days (even assuming unrealistic 150 us round-trip times). It appears infeasible to recover 64-bit seeds in this way.

@github-actions github-actions bot added the area/dependencies (Affects dependencies) label Jul 11, 2024
type relationshipIntegrityProxy struct {
ds datastore.Datastore
primaryKey hmacConfig
keysByID map[string]hmacConfig
Contributor

it's usually a good idea to encode the hash algorithm/parameters in the key id for later retrieval (password hashing usually does this for example) so that the algorithm / parameters can be rotated alongside the keys themselves.

E.g. it's sha256 now, but we later decide we need to rotate to sha512 or sha3, etc.

Member Author

That's why we have the algorithm version byte; we can map from that to the alg used
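
One way such a version byte could work, sketched under assumptions: the byte values, the `algorithms` table, and the choice to store the byte as a prefix of the hash are all hypothetical, since the thread does not show the actual encoding.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/sha512"
	"errors"
	"fmt"
	"hash"
)

// Hypothetical version values; the real mapping is not shown in the thread.
const (
	versionHMACSHA256 byte = 0
	versionHMACSHA512 byte = 1
)

// algorithms maps a version byte to the hash constructor used with HMAC.
var algorithms = map[byte]func() hash.Hash{
	versionHMACSHA256: sha256.New,
	versionHMACSHA512: sha512.New,
}

// sign returns the version byte followed by the MAC of payload.
func sign(version byte, key, payload []byte) ([]byte, error) {
	newHash, ok := algorithms[version]
	if !ok {
		return nil, fmt.Errorf("unknown algorithm version %d", version)
	}
	mac := hmac.New(newHash, key)
	mac.Write(payload)
	// Sum appends the MAC to the slice it is given, yielding version||mac.
	return mac.Sum([]byte{version}), nil
}

// verify reads the version byte from the stored value, recomputes the MAC
// with the matching algorithm, and compares in constant time.
func verify(key, payload, stored []byte) (bool, error) {
	if len(stored) == 0 {
		return false, errors.New("stored hash is empty")
	}
	expected, err := sign(stored[0], key, payload)
	if err != nil {
		return false, err
	}
	return hmac.Equal(expected, stored), nil
}

func main() {
	key := []byte("example-key")
	payload := []byte("document:readme#viewer@user:alice#...")

	stored, _ := sign(versionHMACSHA256, key, payload)
	ok, err := verify(key, payload, stored)
	fmt.Println(ok, err, len(stored))
}
```

With this layout, rows written under an older algorithm remain verifiable after new writes move to a different one, independently of key rotation.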
