Skip to content
Karan Bedi edited this page Dec 14, 2018 · 1 revision

Problem

When a user first signs up for a chat application, it is useful if the user can see which of his or her contacts have already signed up and so which of their contacts the user can message without having to send an invite. Such a feature may impact virality/adoption significantly. This functionality is typically implemented by uploading a users contacts to the server where phone number/email info is compared with registered user accounts in order to then inform the user which contacts are available to chat with on the app. This uploading of a user's contacts database violates GDPR b/c the contacts themselves have not had a chance to provide consent for their information to be used.

Proposed Solution Summary

Instead of sending contact info in the raw, the client sends weak hashes of the contact info. These weak hashes are small enough such that they do not uniquely identify the contact. The server responds by indicating which weak hashes match existing chat users who have consented to being discoverable through this mechanism. For each such match, the server provides a strong hash (which is PII, but of a consenting user) of the chat user's contact info. The client compares these strong hashes to their phone contacts to determine which contacts are discoverable chat users.

Example and Algorithm Details

User's Phone Contacts

Name Email Phone
Johny j@ex.com +918888888888
Jane d +919999999999

Chat User Database

Username Name Email Phone
jhn Johny johny@example.com +918888888888

Algorithm

  1. The chat client performs a weak hash on each phone number and email address in the user's contacts, and sends each token to the server. In this example, let's say:
    1. j@ex.com maps to weak hash 0a4b91
    2. +918888888888 maps to weak hash 1e83e
    3. +919999999999 maps to weak hash c11a42
  2. The server which stores a mapping weak hash -> list[(strong hash, username)] , responds with strong hashes & usernames corresponding to each of the weak hashes. In this case, union of strong hashes corresponding to 0a4b91, 1e83e, c11a42 and usernames corresponding to them will be sent back. Among this union, there will be one strong hash which corresponds to +918888888888.
  3. The client performs a strong hash on each phone number and email that it sent, and checks if it belongs to the returned set of strong hashes, and realises that for its contact "Johnny", there is a discoverable chat user having a matching phone number with username jhn. The result is persisted in DB.

Implementation Details

  • Strong hash algorithm : SHA1
  • Weak hash algorithm : substring of SHA1 of length 4 starting at position 3.

Payload Calculation

  • sw : Size of weak hash in bytes
  • ss : Size of strong hash in bytes
  • su : Average size of a username in bytes
  • n : Total number of users on the server
  • a : Average number of phone+email per user
  • f : The fraction of users who marked themselves as discoverable
  • u : Number of users on android client
  • b : Fraction of client's contacts who have account on RC which is marked as discoverable
  • p : Possible number of weak hashes
  • y : Probability that a contact not having account on RC matches with colliding cases.
  • d : Number of discoverable users on server = (n x f)
  • c : Average number of collision per weak hash = max ( (d x a)/p, 1 )
  • Approx. request payload : u x sw
  • Approx. response payload : u x (b + (1-b) x y ) x c x (sw + ss + su)

Example calculation

  • sw : 6 bytes
  • ss : 40 bytes
  • su : 10 bytes
  • n : 100000
  • a : 2.5
  • f : 0.8
  • u : 1000
  • b : 0.1
  • p : 16^4
  • d : 80000
  • c : 3.05
  • y : (c-1) x (d/(total number of possible phone+email)) taken as 0 for calculations
  • Approx. request payload : 6000 bytes
  • Approx. response payload : 17080 bytes

Flows

Server side

Client side

Clone this wiki locally