Is a hash needed for design 2 in the cuckoo filter #145

Closed
dirkx opened this issue Apr 12, 2020 · 8 comments
Labels
protocol Questions about the protocol/cryptography

Comments

@dirkx

dirkx commented Apr 12, 2020

It seems that, for typical EU country sizes, the cuckoo filter (when serialised along the lines of https://github.com/dirkx/DP-3T-Documents/blob/implementation-profile-start/implementation-profiles/profile.md) reveals only 40-50 of the 256 bits of H(TRUNCH128(H(seed))||i).

So quite possibly it is not required for the cuckoo filter itself to hash the keys using its own (e.g. SHA256) hash.
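
For a rough sense of where an estimate like this can come from, here is a back-of-the-envelope sketch in Python. It assumes a filter that takes its fingerprint and bucket index directly from the bits of the inserted value (no internal hashing) and uses a power-of-two number of buckets. The function name, the 4-slot buckets, the 0.95 load factor and the example figures are illustrative assumptions, not values taken from the profile document.

```python
import math


def exposed_bits_upper_bound(num_items: int, fingerprint_bits: int,
                             bucket_size: int = 4,
                             load_factor: float = 0.95) -> int:
    """Rough upper bound on input bits visible per entry in a 'raw' cuckoo filter.

    Assumption: the stored fingerprint and the bucket index are both read
    straight from the input value, so an observer of the serialised filter
    sees at most fingerprint_bits + log2(number of buckets) bits per entry.
    """
    buckets_needed = math.ceil(num_items / (bucket_size * load_factor))
    # Round up to the next power of two so that the index is a whole number of bits.
    num_buckets = 1 << max(0, (buckets_needed - 1).bit_length())
    index_bits = num_buckets.bit_length() - 1
    return fingerprint_bits + index_bits


# Hypothetical example: one million entries with 16-bit fingerprints.
print(exposed_bits_upper_bound(1_000_000, 16))  # -> 35
```

The bound grows only logarithmically with the number of entries, so the fingerprint size dominates the result.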

@burdges

burdges commented Apr 12, 2020

If we have d days in the reveal period, then each daily download might contain d-1 cuckoo filters, one for each possible meeting day during the reveal period. A meeting day filter in one daily download has a different seed than the same meeting day filter in a daily download on a different day. This prevents collision attacks on the filter.

I doubt separate daily filters harm privacy, since users' devices could retain the meeting day information anyway.

Infected people could just upload all their individual Bluetooth IDs, rather than some seed from which you generate them. In this way, they can freely omit any sensitive time periods. As a compromise, a tree-like hash ratchet would permit them to edit the time periods without uploading everything separately: TCNCoalition/TCN#40

@lbarman added the "protocol" label (Questions about the protocol/cryptography) on Apr 13, 2020
@kennypaterson
Collaborator

Cuckoo filters and similar data structures tend to use very fast but non-cryptographic hash functions internally. In general, one cannot ensure that in any given data structure library, a cryptographic hash function such as SHA-256 (in the original post) will be used inside the Cuckoo filter. To get the privacy guarantees that we desire, an extra cryptographic hashing step is used before data is presented to the filter.

@dirkx
Author

dirkx commented Apr 13, 2020

@kennypaterson apologies, but the question here was a different one and, with all due respect, one that is quite key to get to the bottom of if we want Design 2 used in interoperable European cross-border implementations.

An actual implementor will also need more guidance, as the CFs need to be distributed on the wire and hence need an interoperable serialisation format. That format cannot rely on an arbitrary non-cryptographic hash.

So the two options I would enquire about are:

Option 1: use the value from the protocol, H(TRUNCH128(H(seed))||i), i.e. those 32 bytes, as the 'key' (i.e. the value of the hash in the cuckoo filter). The cuckoo filter then simply uses (part of) this hash directly, without further hashing.

Option 2: insist that H(TRUNCH128(H(seed))||i) is hashed AGAIN with SHA256 before it is used as the 'key' (i.e. the value of the hash in the cuckoo filter). I.e. use H(H(TRUNCH128(H(seed))||i)). The cuckoo filter then simply uses (part of) this hash directly, without further hashing.

As for a third option: letting a CF do some ill-defined hash does not make for a stable cross-app/cross-border/cross-technology implementation.
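
To make the two options concrete, here is a minimal Python sketch of the values each one would hand to the filter. It assumes that H is SHA-256, that TRUNCH128 keeps the first 16 bytes of its input, and that i is encoded as a 4-byte big-endian integer; these encodings and the function names are assumptions for illustration only, not something this thread or the whitepaper fixes.

```python
import hashlib


def H(data: bytes) -> bytes:
    """SHA-256, assumed here to be the protocol hash H."""
    return hashlib.sha256(data).digest()


def trunc128(data: bytes) -> bytes:
    """Assumed meaning of TRUNCH128: keep the first 128 bits (16 bytes)."""
    return data[:16]


def option1_key(seed: bytes, i: int) -> bytes:
    """Option 1: H(TRUNCH128(H(seed)) || i), handed to the filter as-is."""
    # The 4-byte big-endian encoding of i is an assumption of this sketch.
    return H(trunc128(H(seed)) + i.to_bytes(4, "big"))


def option2_key(seed: bytes, i: int) -> bytes:
    """Option 2: the Option 1 value hashed once more with SHA-256."""
    return H(option1_key(seed, i))


seed = bytes(32)  # placeholder all-zero seed, for illustration only
print(option1_key(seed, 0).hex())
print(option2_key(seed, 0).hex())
```

Both options yield a 32-byte key; they differ only in the one extra SHA-256 invocation.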

In relation to this, I submit that, given the sizes involved (a few million infected people, and the size of H(TRUNCH128(H(seed))||i)), around 1/4 to 1/3 of the original H(TRUNCH128(H(seed))||i) hash is exposed in the daily CFs, so that it is not necessary to SHA256 it again.

I also submit that this value can be used directly by the CF and serialized 'raw' (e.g. as described in https://github.com/dirkx/DP-3T-Documents/blob/implementation-profile-start/implementation-profiles/profile.md and shown as rudimentary code in https://github.com/dirkx/DP-3T-Documents/tree/editable-version/impl/design-2-openssl-C).

Would you mind re-opening this ticket until we've got this resolved (or Design 2 is dropped from the proposal)?

@kennypaterson
Collaborator

We want to be able to use the CF as a blackbox with only the following guarantees: if an object is present in the CF, then the CF query API will always confirm this; if it is not, the CF query API may still report that it is present, with a small false positive rate. What happens inside the blackbox is library-dependent, and we do not want to rely on any particular behaviour that it might exhibit. This is the safe way to use an object like a CF in a security-critical application; it is consistent with a modular approach to programming; and we don't want to be delving inside a CF library to make it do things it doesn't already do.

We apply a cryptographic hash to our data before presenting it to the CF "add item" API because we want additional properties of the CF; these are discussed in the whitepaper.
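
As an illustration of the blackbox contract described above, here is a toy partial-key cuckoo filter in Python. It is a sketch under stated assumptions, not the DP-3T filter or any particular library: the class name, the 16-bit fingerprints, the bucket sizes and the use of SHA-256 for the internal hashes are all hypothetical choices. Callers only touch `add` and `contains`, and the items they pass in are already cryptographic hashes.

```python
import hashlib
import random


class ToyCuckooFilter:
    """Toy cuckoo filter exposing only add() and contains(); illustrative only."""

    def __init__(self, num_buckets: int = 1024, bucket_size: int = 4,
                 max_kicks: int = 500):
        assert num_buckets & (num_buckets - 1) == 0, "num_buckets must be a power of two"
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(0)  # fixed seed keeps the demo reproducible

    def _fingerprint(self, item: bytes) -> int:
        # 16-bit fingerprint derived from the item.
        return int.from_bytes(hashlib.sha256(b"fp" + item).digest()[:2], "big")

    def _index(self, data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(b"ix" + data).digest()[:4], "big") % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: XOR with the hash of the fingerprint,
        # so applying this twice returns the original index.
        return index ^ self._index(fp.to_bytes(2, "big"))

    def add(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both candidate buckets are full: evict a resident and relocate it.
        i = self._rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self._rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is too full; a real library would handle this case

    def contains(self, item: bytes) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


# Items are hashed with SHA-256 before they reach the filter's "add item" API.
cf = ToyCuckooFilter()
items = [hashlib.sha256(b"ephid-%d" % n).digest() for n in range(100)]
for item in items:
    cf.add(item)
assert all(cf.contains(item) for item in items)  # inserted items are always found
```

A lookup for an item that was never added can still return `True` when its fingerprint happens to match one stored in either candidate bucket, which is the small false positive rate mentioned above.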

For your other remarks, and relating to many other issues you've filed: I am delighted that you are so enthusiastic about implementing, but I don't recommend ever trying to implement from a whitepaper. Wait for the spec and the reference implementation. They will be along soon enough.

@dirkx
Author

dirkx commented Apr 13, 2020

@kennypaterson - with all due respect (and I personally do not like this situation either), the reality is that implementations have already started in many countries, and in some EU countries tenders are due tomorrow at 12:00 noon, with nation-size production following before the end of the month.

So to some extent this desire for detail is an unavoidable side effect of that. Also note that the official EU guidelines are expected to be signed off tomorrow as well, carving things in stone, whether we like it or not. But it also means that any improvement we can make in the next hours and days is very material, and will help keep this an open, interoperable standard. Because once 2 or 3 neighbouring countries have gone live, cross-border interoperability starts driving a network effect and introduces stagnation in protocol design.

So I would really like to get as much detail here as possible. And given that in your Design 2 - the serialisation of the CF filter is required (as it needs to be distributed to the mobile device) - one could observe that that makes it part of the spec.

With that in mind - and given your answer; can I conclude that:

  1. It is required by the DP3T design 2 that the value H(TRUNCH128(H(seed))||i) is securely hashed again (and that another SHA256 is sufficient for this) prior to it being handed to any CF? Regardless of how this filter works, as it would otherwise expose too many bits of each H(TRUNCH128(H(seed))||i)? (For DE/all of NL/BE/border-DE my estimate is currently around 21, expected to peak at 50 to 60 bits of the 256 bits.)

And that you will document in the whitepaper why this is so?

@kennypaterson
Collaborator

The whitepaper says that H(TRUNCH128(H(seed))||i) is inserted into the CF; that already includes an "extra" hash beyond what you may have otherwise expected, namely the outer hash. No additional hashing step is done beyond this (outside of whatever hashing operations the CF does internally). Please look back to your original post and my reply and maybe this will help you understand why I wrote what I wrote in reply.

@dirkx
Author

dirkx commented Apr 13, 2020

Thank you. Apologies for having trouble reading that in your first reply, but this reply is very clear.

We'll update our implementation profile details document to incorporate/track this.

Your quick response is very much appreciated!

@kennypaterson
Collaborator

You are very welcome - good luck!

dirkx added a commit to dirkx/DP-3T-Documents that referenced this issue Apr 13, 2020