# Suspicious Person List: Private Set Intersection Using Homomorphic Encryption

In this notebook we run a simple experiment on the suspicious person list use case: checking whether a customer appears on a suspect list.

In [None]:
#Install homomorphic encryption package
!pip install phe

Collecting phe
  Downloading phe-1.5.0-py2.py3-none-any.whl (53 kB)
Installing collected packages: phe
Successfully installed phe-1.5.0


We upload a synthetic suspect-list dataset generated by Brett Youngman.

In [None]:
# Check the first 5 rows of the suspect list dataset
import pandas as pd
df = pd.read_csv("/content/suspects.csv")
df.head()

|   | Unnamed: 0 | Name | DOB | ID | Address | Country |
|---|---|---|---|---|---|---|
| 0 | 0 | Vanessa Gallagher | 1948-04-11 | 36932642 | 81505 Brandon Ways\nWest Francisco, AR 75131 | Qatar |
| 1 | 1 | Michelle Robinson | 1914-06-04 | 43865869 | 2842 Page Square Suite 853\nCassandrahaven, HI... | Kuwait |
| 2 | 2 | Jessica Beasley | 1926-02-12 | 39914562 | 32291 Yesenia View Suite 026\nHoffmanberg, AR ... | Mozambique |
| 3 | 3 | Carolyn Gibson | 1951-11-27 | 65637513 | Unit 9582 Box 6765\nDPO AE 01099 | Saint Vincent and the Grenadines |
| 4 | 4 | Samuel Buckley | 1952-04-30 | 67370992 | 7654 Garcia Station Suite 147\nSouth Stephensh... | Tunisia |


In [None]:
#Check dataset information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  100 non-null    int64 
 1   Name        100 non-null    object
 2   DOB         100 non-null    object
 3   ID          100 non-null    int64 
 4   Address     100 non-null    object
 5   Country     100 non-null    object
dtypes: int64(2), object(4)
memory usage: 4.8+ KB


In [None]:
#Check the last 5 rows
df.tail()

|   | Unnamed: 0 | Name | DOB | ID | Address | Country |
|---|---|---|---|---|---|---|
| 95 | 95 | John Knight | 1927-08-09 | 46283837 | 1172 Jake Brooks Apt. 141\nPort Brianmouth, AS... | Colombia |
| 96 | 96 | Paige Wang | 1909-03-31 | 56348757 | Unit 8248 Box 4403\nDPO AA 28808 | Norfolk Island |
| 97 | 97 | Danielle Wells | 1952-12-04 | 55347633 | 568 Clark Hill Suite 706\nEstradastad, LA 62456 | Vietnam |
| 98 | 98 | Kelly Riley | 1993-06-18 | 16089513 | 437 Marie Mews\nLake Davidburgh, IL 11733 | Paraguay |
| 99 | 99 | Dalton White | 1938-11-10 | 34610753 | 20833 Justin Parkways\nSouth Dana, TX 15223 | San Marino |


In [None]:
# Import libraries
import hashlib

import pandas as pd
from phe import paillier

# Key generation: the public key encrypts, the private key decrypts
public_key, private_key = paillier.generate_paillier_keypair()

def hash_to_int(value):
    """Map any value to a 256-bit integer via SHA-256 of its string form."""
    return int(hashlib.sha256(str(value).encode()).hexdigest(), 16)

# Data encryption: hash each value, then Paillier-encrypt the digest
def encrypt_data(data, public_key):
    return [public_key.encrypt(hash_to_int(x)) for x in data]

# Intersection computation
# Note: this decrypts both lists with the same private key, so whoever
# holds the key sees every hashed element; it is not yet a private protocol.
def compute_intersection(enc_data1, enc_data2, private_key):
    decrypted2 = {private_key.decrypt(y) for y in enc_data2}
    return [x for x in enc_data1 if private_key.decrypt(x) in decrypted2]

# Result decryption: returns the SHA-256 digests (as integers), not the
# original values, because hashing is one-way
def decrypt_result(intersection, private_key):
    return [private_key.decrypt(x) for x in intersection]

# Read CSV file
def read_csv_file(filename):
    return pd.read_csv(filename)

# File path to your CSV
csv_filename = "/content/suspects.csv"
df = read_csv_file(csv_filename)

# Get ID from user input
id_to_check = input("Check Customer ID: ")

try:
    id_to_check = int(id_to_check)
except ValueError:
    print("Please enter a valid integer for ID.")
else:
    # The lookup itself runs on plaintext data
    matched_row = df[df['ID'] == id_to_check]
    if not matched_row.empty:
        row_data = matched_row.values.flatten().tolist()
        # Encrypt once and reuse the result: Paillier encryption is
        # randomized, so a second call would give different ciphertexts
        encrypted_row = encrypt_data(row_data, public_key)
        print("Encrypted Row Customer ID found in the suspect list:")
        print(encrypted_row)

        # Decrypting yields the hashed row values, not the original fields
        decrypted_row = decrypt_result(encrypted_row, private_key)
    else:
        print("ID not found in the suspect list.")


Check Customer ID: 56348757
Encrypted Row Customer ID found in the suspect list:
[<phe.paillier.EncryptedNumber object at 0x78d307bcfd60>, <phe.paillier.EncryptedNumber object at 0x78d33673cca0>, <phe.paillier.EncryptedNumber object at 0x78d307bced40>, <phe.paillier.EncryptedNumber object at 0x78d307bcfbb0>, <phe.paillier.EncryptedNumber object at 0x78d307bcfaf0>, <phe.paillier.EncryptedNumber object at 0x78d307bcfbe0>]
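The `phe` calls in the cell above (`generate_paillier_keypair`, `encrypt`, `decrypt`) hide the Paillier arithmetic. The toy re-implementation below is for illustration only and is not `phe`'s code: it assumes the common g = n + 1 simplification, fixed Mersenne primes 2**61 - 1 and 2**89 - 1 (far too small for real keys), and Python 3.8+ for `pow(x, -1, n)`. The names `encrypt`, `decrypt`, `add_encrypted` and `mul_plain` are hypothetical. It shows why the printed ciphertexts are opaque and that Paillier is additively homomorphic, the property that homomorphic-encryption PSI protocols build on.

```python
import math
import secrets

# Toy primes (hypothetical choice): Mersenne primes 2**61 - 1 and 2**89 - 1.
# Real Paillier keys use primes of roughly 1536 bits each.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lambda = lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # mu = lambda^-1 mod n, valid because g = n + 1

def encrypt(m):
    """Enc(m) = (1 + n)^m * r^n mod n^2 = (1 + m*n) * r^n mod n^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # fresh randomness per call
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N_SQ) % N_SQ

def decrypt(c):
    """Dec(c) = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    return (pow(c, LAM, N_SQ) - 1) // N * MU % N

def add_encrypted(c1, c2):
    """Multiplying two ciphertexts adds their plaintexts."""
    return c1 * c2 % N_SQ

def mul_plain(c, k):
    """Raising a ciphertext to a plain integer k multiplies its plaintext by k."""
    return pow(c, k, N_SQ)

a, b = encrypt(20), encrypt(22)
print(decrypt(add_encrypted(a, b)))  # 42
print(decrypt(mul_plain(a, 3)))      # 60
```

Because every call to `encrypt` draws a fresh `r`, encrypting the same value twice gives different ciphertexts. This is also why ciphertexts produced by `phe` cannot be compared for equality directly, and why `compute_intersection` has to decrypt before comparing.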


# Conclusion

This program aims to check whether a customer is on the suspect list while keeping their personal information private. Each field of a matching row is hashed with SHA-256 and encrypted with the Paillier public key, and `compute_intersection` takes two lists of encrypted data and computes their intersection by decrypting both lists with the private key.

The general idea behind private set intersection is that each party encrypts its set of data in such a way that the other party cannot decrypt it directly. The parties then run a cryptographic protocol that computes the intersection of their encrypted sets without revealing anything beyond the intersection itself. This simple experiment does not fully implement such a protocol, and further improvement is needed: the ID lookup runs on plaintext data, `compute_intersection` is never called, and it relies on a single private key that can decrypt both sets.
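The "each party encrypts its set so the other cannot read it" idea can be sketched with Diffie-Hellman-style PSI. Note that this is a different technique from Paillier-based PSI, swapped in here because it maps directly onto that description: blinding by exponentiation is commutative, so (H(x)^a)^b = (H(y)^b)^a exactly when H(x) = H(y). This is a toy sketch assuming semi-honest parties, a hypothetical bank holding customer IDs, an authority holding the suspect list, and the Mersenne prime 2**127 - 1 as modulus, which is far too weak for real use (practical deployments use an elliptic-curve group with a proper hash-to-curve). All function names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy modulus (hypothetical choice): the Mersenne prime 2**127 - 1.
P = 2**127 - 1

def hash_to_group(value):
    """Hash a value into [1, P-1] using SHA-256 (toy hash-to-group)."""
    digest = int(hashlib.sha256(str(value).encode()).hexdigest(), 16)
    return digest % (P - 1) + 1

def random_exponent():
    """Secret key coprime to P-1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 2) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def blind(values, key):
    """Raise every value to the secret key mod P (order is preserved)."""
    return [pow(v, key, P) for v in values]

def dh_psi(customer_ids, suspect_ids):
    a = random_exponent()  # bank's secret, never sent
    b = random_exponent()  # authority's secret, never sent

    # Bank -> authority: H(x)^a, unreadable without a
    bank_msg = blind([hash_to_group(x) for x in customer_ids], a)

    # Authority -> bank: (H(x)^a)^b in the same order, plus H(y)^b shuffled
    double_blinded_customers = blind(bank_msg, b)
    suspect_msg = blind([hash_to_group(y) for y in suspect_ids], b)
    secrets.SystemRandom().shuffle(suspect_msg)

    # Bank: (H(y)^b)^a matches (H(x)^a)^b only when x == y (barring hash collisions)
    double_blinded_suspects = set(blind(suspect_msg, a))
    return [x for x, v in zip(customer_ids, double_blinded_customers)
            if v in double_blinded_suspects]

# Customer IDs 11111111 and 22222222 are hypothetical; the suspect IDs
# are taken from the dataset rows shown earlier.
print(dh_psi([11111111, 56348757, 22222222], [56348757, 34610753, 36932642]))
```

In this sketch the bank learns only which of its customers are on the list, and the authority learns only how many customer IDs were submitted; neither side ever holds a key that decrypts the other's full set.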