
PSI integration #25

Open
bcebere opened this issue Jun 7, 2020 · 2 comments
Assignees
Labels
Type: Epic 🤙 Describes a large amount of functionality that will likely be broken down into smaller issues

Comments


bcebere commented Jun 7, 2020

Description

The current PSI implementation supports two operations:

  • Get the intersection elements.
  • Get the intersection size.

In order to finish the PSI integration, we need:

On completion, it should be straightforward to integrate PIR in https://github.com/OpenMined/PSI

The current PSI flow:

1. Setup phase

The server encrypts all its elements x under a commutative encryption scheme, computing H(x)^s where s is its secret key. The encrypted elements are then inserted into a Bloom filter, which is sent to the client encoded as JSON. The message has the following form:

{
  "num_hash_functions": <int>,
  "bits": <string>
}

Here, bits is a Base64-encoded string.
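The setup phase can be sketched as follows. This is a toy illustration, not the library's implementation: modular exponentiation over a prime stands in for the commutative encryption scheme (a real implementation would use an elliptic-curve group), and the Bloom filter, element names, and parameters are all made up for the example:

```python
import base64, hashlib, json, secrets

P = 2**127 - 1  # toy prime modulus: exponentiation mod P commutes

def h(x: str) -> int:
    """Hash an element into the multiplicative group mod P (stand-in for H)."""
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % P

class BloomFilter:
    """Minimal Bloom filter with k hash functions over an m-bit array."""
    def __init__(self, m_bits: int = 1024, k: int = 4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _indices(self, item: bytes):
        for i in range(self.k):
            d = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(d, "big") % self.m

    def add(self, item: bytes):
        for j in self._indices(item):
            self.bits[j // 8] |= 1 << (j % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[j // 8] & (1 << (j % 8)) for j in self._indices(item))

s = secrets.randbelow(P - 2) + 2  # server's secret key
bf = BloomFilter()
for x in ["alice@example.com", "bob@example.com"]:  # server's elements
    bf.add(pow(h(x), s, P).to_bytes(16, "big"))     # insert H(x)^s

setup_message = json.dumps({
    "num_hash_functions": bf.k,
    "bits": base64.b64encode(bytes(bf.bits)).decode(),
})
```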

2. Client request

The client encrypts all their elements x using the commutative encryption scheme, computing H(x)^c, where c is the client's secret key. The encrypted elements are sent to the server as a JSON array of Base64 strings, together with a boolean reveal_intersection that indicates whether the client wants to learn the elements in the intersection or only its size.

{
  "reveal_intersection": <bool>,
  "encrypted_elements": [ Base64(H(x_1)^c), Base64(H(x_2)^c), ... ]
}
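A client-side sketch under the same toy assumptions (modular exponentiation stands in for the commutative scheme; the key c is drawn invertible mod P-1 so it can be stripped again in step 4):

```python
import base64, hashlib, json, math, secrets

P = 2**127 - 1  # toy prime modulus; a real implementation uses an EC group

def h(x: str) -> int:
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % P

# Draw c invertible mod P-1 so (H(x)^(cs))^(1/c) is computable in step 4.
while True:
    c = secrets.randbelow(P - 2) + 2
    if math.gcd(c, P - 1) == 1:
        break

client_elements = ["alice@example.com", "carol@example.com"]
request_message = json.dumps({
    "reveal_intersection": False,
    "encrypted_elements": [
        base64.b64encode(pow(h(x), c, P).to_bytes(16, "big")).decode()
        for x in client_elements
    ],
})
```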

3. Server response

For each encrypted element H(x)^c received from the client, the server encrypts it again under the commutative encryption scheme with its secret key s, computing (H(x)^c)^s = H(x)^(cs). The result is sent back to the client as a JSON array of strings:

{
  "encrypted_elements": [ Base64(H(x_1)^(cs)), Base64(H(x_2)^(cs)), ... ]
}

If reveal_intersection is false, the array is sorted to hide the order of entries from the client.
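The server side of this step might look like the following sketch (hypothetical function name; s and P as in the earlier toy setup, not the library's actual API):

```python
import base64, json

def respond(request_json: str, s: int, P: int) -> str:
    """Re-encrypt each client element H(x)^c into (H(x)^c)^s = H(x)^(cs)."""
    request = json.loads(request_json)
    reencrypted = []
    for b64 in request["encrypted_elements"]:
        v = int.from_bytes(base64.b64decode(b64), "big")
        reencrypted.append(
            base64.b64encode(pow(v, s, P).to_bytes(16, "big")).decode()
        )
    if not request["reveal_intersection"]:
        # Hide which response entry maps to which request entry.
        reencrypted.sort()
    return json.dumps({"encrypted_elements": reencrypted})
```

The sort only runs in the size-only case, since a client that wants the actual intersection needs the response order to match its request.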

4. Client computes intersection

The client decrypts each element received from the server's response using its secret key c, computing (H(x)^(cs))^(1/c) = H(x)^s. It then checks if each element is present in the Bloom filter, and reports the number of matches as the intersection size.
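Putting the four steps together end to end under the same toy assumptions (a plain Python set stands in for the Bloom filter, and all element values are invented):

```python
import hashlib, math, secrets

P = 2**127 - 1  # toy prime modulus; exponentiation mod P is commutative

def h(x: str) -> int:
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % P

def keygen() -> int:
    """Secret exponent invertible mod P-1, so it can be stripped later."""
    while True:
        k = secrets.randbelow(P - 2) + 2
        if math.gcd(k, P - 1) == 1:
            return k

s, c = keygen(), keygen()
server_set = {"alice@example.com", "bob@example.com"}
client_set = ["alice@example.com", "carol@example.com"]

# Step 1: the server publishes H(x)^s for its elements (a set stands in
# for the Bloom filter of the real protocol).
server_encrypted = {pow(h(x), s, P) for x in server_set}

# Steps 2-3: client encrypts with c, server re-encrypts with s.
doubly_encrypted = [pow(pow(h(x), c, P), s, P) for x in client_set]

# Step 4: the client strips c, recovering H(x)^s, and counts matches.
c_inv = pow(c, -1, P - 1)
intersection_size = sum(
    pow(y, c_inv, P) in server_encrypted for y in doubly_encrypted
)
print(intersection_size)  # → 1 (only alice@example.com is shared)
```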

@bcebere bcebere added the Type: Epic 🤙 Describes a large amount of functionality that will likely be broken down into smaller issues label Jun 7, 2020
@bcebere bcebere self-assigned this Jun 7, 2020

kshehata commented Jun 9, 2020

A couple of thoughts here:

  1. I don't think we should do cardinality within PIR. Whatever integrates PIR should do it on that end.
  2. For our current use case, I think that bucketing might be better than directly supporting sparse databases. The latter will require cuckoo hashing or something similar, which is inefficient. See Support bucketing in PIR Database #27


bcebere commented Jun 9, 2020

> A couple of thoughts here:
>
>   1. I don't think we should do cardinality within PIR. Whatever integrates PIR should do it on that end.
>   2. For our current use case, I think that bucketing might be better than directly supporting sparse databases. The latter will require cuckoo hashing or something similar, which is inefficient. See Support bucketing in PIR Database #27
  1. The cardinality is useful for not revealing the actual database content to the user. This should be done on the server side, as in the current PSI implementation.
  2. As far as I understand, the bucketing approach imposes restrictions on the database content/indexes, and we might want a generic solution for that.
> Since tokens are expected to have a uniform distribution (both before and after transformation), tokens should be uniformly distributed across shards and buckets. As such, the bucket addresses can simply be the first log2(n_shards · n_buckets) most significant bits of the transformed token itself.

I don't think we should enforce this assumption in a general-purpose library.
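For reference, the quoted addressing scheme is only well-defined when n_shards · n_buckets is a power of two, which is exactly the kind of restriction at issue. A sketch, with illustrative names and a hash standing in for the token transformation:

```python
import hashlib

def bucket_address(token: bytes, n_shards: int, n_buckets: int) -> int:
    """Address = top log2(n_shards * n_buckets) bits of the hashed token.
    Requires a power-of-two total, and is only balanced when tokens are
    uniformly distributed."""
    total = n_shards * n_buckets
    assert total & (total - 1) == 0, "requires a power-of-two bucket count"
    bits = total.bit_length() - 1  # log2(total)
    digest = int.from_bytes(hashlib.sha256(token).digest(), "big")
    return digest >> (256 - bits)

addr = bucket_address(b"some-token", n_shards=4, n_buckets=16)
```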
