# I. Full text search using homomorphic encryption

The use case is the following:
1. I have a string and I want to compare it against another string.
2. Both strings are secret and belong to two different parties.
3. I would like to check how similar the strings are without seeing the other string.

First, let's define a sentence-to-vector function. We create a dictionary of all the lowercase English letters and map the given sentence onto it letter by letter. The result is a fixed-size vector (26 entries, one per letter) that holds the number of occurrences of each letter in the sentence.

In [302]:
from collections import Counter
import string
english_letters = dict.fromkeys(string.ascii_lowercase, 0)

def sentence2Vec(sentence, vocabulary = english_letters):
    cw = Counter(sentence)
    return list(map(lambda a: cw[a[0]], vocabulary.items()))
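
Because the vector only stores letter counts, it ignores word order and any character outside the vocabulary (spaces, digits, uppercase letters). The sketch below illustrates this with a standalone helper, `letter_counts`, which is a hypothetical name used only here and is meant to mirror `sentence2Vec`.

```python
from collections import Counter
import string

def letter_counts(sentence):
    # Hypothetical standalone equivalent of sentence2Vec: one count per
    # lowercase ASCII letter, in alphabetical order.
    counts = Counter(sentence)
    return [counts[letter] for letter in string.ascii_lowercase]

# Anagrams produce identical vectors, so letter order is lost.
same_vector = letter_counts("listen") == letter_counts("silent")

# Uppercase letters and spaces are not in the vocabulary, so they are not counted.
upper_total = sum(letter_counts("HELLO THERE"))
lower_total = sum(letter_counts("hello there"))
```

Lowercasing the input first (`sentence.lower()`) would be one way to make the comparison case-insensitive.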

Let's test it

In [291]:
v1 = sentence2Vec("hello there")
v1

[0, 0, 0, 0, 3, 0, 0, 2, 0, 0, 0, 2, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

In [292]:
v2 = sentence2Vec("hello everyone")
v2

[0, 0, 0, 0, 4, 0, 0, 1, 0, 0, 0, 2, 0, 1, 2, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]

## Now we can use the function cosine_similarity to see how close one sentence is to another

In [58]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
cosine_similarity(np.array(v1).reshape(1,-1),
                  np.array(v2).reshape(1,-1))

array([[0.87197754]])

## Let's implement this function

In [195]:
def dot(K, L):
    # dot product of two equal-length vectors
    if len(K) != len(L):
        raise ValueError("vectors must have the same length")

    return sum(k * l for k, l in zip(K, L))

In [196]:
import math
dot(v1,v2)/( math.sqrt(dot(v1,v1)) * math.sqrt(dot(v2,v2)))

0.8719775384642696
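
Written out, the calculation above is the standard cosine-similarity formula. For the two example vectors the dot product is 21 and the squared norms are 20 and 29:

$$
\cos\theta = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert} = \frac{21}{\sqrt{20}\,\sqrt{29}} \approx 0.87198
$$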

# Let's define homomorphic operations in Python. I am using the homomorphic kernel I built in the past, based on the Microsoft SEAL project

<b><i>(!) Please note that homomorphic encryption supports only 2 operations: addition and multiplication.</i></b>

In [280]:
import requests
baseUrl = 'http://localhost:8080/v2'

def encrypt_array(array):
    return requests.post(baseUrl + '/encrypt-array', params=dict(pubKey='alex', value=array)).json()['encrypted']
def encrypt(value):
    return requests.post(baseUrl + '/encrypt', params=dict(pubKey='alex', value=value)).json()['encrypted']
def emul(encrypted1, encrypted2):
    return requests.post(baseUrl + '/mul', params=dict(pubKey='alex', c1=encrypted1, c2=encrypted2)).json()['encrypted']
def eproduct(eArray1, eArray2):
    # pair up the encrypted elements; the service returns the encrypted dot product
    pairs = [dict(v1=a, v2=b) for a, b in zip(eArray1, eArray2)]
    return requests.post(baseUrl + '/product', json=dict(pubKey='alex', pairs=pairs)).json()['encrypted']
def decrypt(value):
    return requests.post(baseUrl + '/decrypt', params=dict(key='alex', value=value)).json()['value']
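
The helpers above need the service running at `localhost:8080`. For readers without it, a plaintext stand-in can mimic the data flow. This is only a sketch: the `mock_*` names and the `FakeCipher` class are hypothetical, nothing here is encrypted, and it assumes `/product` returns the sum of the pairwise products, as the notebook uses it later.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FakeCipher:
    # Hypothetical placeholder for a ciphertext. It simply wraps the
    # plaintext value, so it offers NO security at all.
    value: float

def mock_encrypt(value):
    return FakeCipher(float(value))

def mock_encrypt_array(array):
    return [mock_encrypt(x) for x in array]

def mock_emul(c1, c2):
    # Multiplication of two "ciphertexts".
    return FakeCipher(c1.value * c2.value)

def mock_eproduct(e_array1, e_array2):
    # Dot product built only from multiplications and additions.
    total = 0.0
    for c1, c2 in zip(e_array1, e_array2):
        total += c1.value * c2.value
    return FakeCipher(total)

def mock_decrypt(cipher):
    return cipher.value

example = mock_decrypt(
    mock_eproduct(mock_encrypt_array([1, 2, 3]), mock_encrypt_array([4, 5, 6]))
)
```

Swapping these in for `encrypt`, `encrypt_array`, `emul`, `eproduct` and `decrypt` lets the rest of the notebook run locally, at the cost of losing the privacy property entirely.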

## Now let's encrypt the two vectors

In [281]:
ev1 = encrypt_array(v1)
ev2 = encrypt_array(v2)

## Let's encrypt the partial denominator
<b><i>(!) We encrypt one divided by the norm because homomorphic encryption does not support division. With the reciprocal in hand, computing the cosine only needs multiplication.</i></b>

In [282]:
pd1 = encrypt( 1./math.sqrt(dot(v2,v2)) )  # reciprocal of the v2 norm; the v1 norm is applied after decryption

## Now let's calculate the dot product of v1 and v2

In [284]:
ev12 = eproduct(ev1, ev2)

## Calculating the encrypted partial cosine, without the v1 norm

In [298]:
e_cos_sim_part = emul(ev12,pd1)

## Now we can decrypt the result and divide it by the v1 norm to obtain the cosine
(!) It was not necessary to encrypt the v1 part of the denominator. We have access to this vector, so we can apply its norm after decrypting the value to get the cosine.

In [286]:
decrypt(e_cos_sim_part)/math.sqrt(dot(v1,v1))

0.8719775384642696

Voilà. The result is the same as the one we computed without encryption.
cos(0°) = 1 corresponds to the smallest angle and cos(90°) = 0 to the largest one (letter counts are never negative, so the angle stays between 0° and 90°).
So the larger the cosine, the more similar the strings.
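
The three steps can be replayed in the clear to see why no division is needed on the encrypted side. This is a plaintext sketch only, assuming the encrypted reciprocal belongs to the vector we cannot see (v2); the numbers are the dot product and squared norms of the two example vectors.

```python
import math

v1_dot_v2 = 21    # dot(v1, v2) for "hello there" and "hello everyone"
norm_v1_sq = 20   # dot(v1, v1)
norm_v2_sq = 29   # dot(v2, v2)

# Step 1: the reciprocal of the v2 norm is computed in the clear by whoever
# holds v2. In the notebook this value is then encrypted.
inv_norm_v2 = 1.0 / math.sqrt(norm_v2_sq)

# Step 2: one multiplication, no division. In the notebook this happens on
# ciphertexts through emul.
partial = v1_dot_v2 * inv_norm_v2

# Step 3: after decryption, divide by the v1 norm, which we know in the clear.
cos_sim = partial / math.sqrt(norm_v1_sq)
```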

# II. Database search

In [354]:
database = [
         "we walked to the store",
         "my father is an excellent cook",
         "we will have fun at the beach",
         "the brothers fought but loved each other anyway"]

In [355]:
database2vec = np.array(list(map(sentence2Vec,database)))
database2vec

array([[1, 0, 0, 1, 4, 0, 0, 1, 0, 0, 1, 1, 0, 0, 2, 0, 0, 1, 1, 3, 0, 0,
        2, 0, 0, 0],
       [2, 0, 2, 0, 4, 1, 0, 1, 1, 0, 1, 2, 1, 2, 2, 0, 0, 1, 1, 2, 0, 0,
        0, 1, 1, 0],
       [3, 1, 1, 0, 4, 1, 0, 3, 1, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 2, 1, 1,
        2, 0, 0, 0],
       [3, 2, 1, 1, 5, 1, 1, 5, 0, 0, 0, 1, 0, 1, 4, 0, 0, 3, 1, 5, 2, 1,
        1, 0, 2, 0]])

In [356]:
database2vect_w_denom = np.array(list(
    map(lambda v: (v, 1./math.sqrt(dot(v,v))),database2vec)
), dtype=object)  # dtype=object: each row pairs a vector with a scalar
database2vect_w_denom

array([[array([1, 0, 0, 1, 4, 0, 0, 1, 0, 0, 1, 1, 0, 0, 2, 0, 0, 1, 1, 3, 0, 0,
       2, 0, 0, 0]),
        0.15811388300841897],
       [array([2, 0, 2, 0, 4, 1, 0, 1, 1, 0, 1, 2, 1, 2, 2, 0, 0, 1, 1, 2, 0, 0,
       0, 1, 1, 0]),
        0.14285714285714285],
       [array([3, 1, 1, 0, 4, 1, 0, 3, 1, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0, 2, 1, 1,
       2, 0, 0, 0]),
        0.13736056394868904],
       [array([3, 2, 1, 1, 5, 1, 1, 5, 0, 0, 0, 1, 0, 1, 4, 0, 0, 3, 1, 5, 2, 1,
       1, 0, 2, 0]),
        0.08770580193070293]], dtype=object)

## Let's Encrypt the Database

In [366]:
db_encrypted = list(map(
                        lambda a: ( encrypt_array(a[0]), encrypt(a[1]) ), 
                        database2vect_w_denom
                ))

## Create a search function

In [368]:
def search(text, db = db_encrypted):
    # text to vector
    v = sentence2Vec(text)
    # norm of the query vector, kept in the clear
    d_v = math.sqrt(dot(v,v))
    # encrypt the vector
    e_v = encrypt_array(v)
    # calculate the encrypted dot product with each row, scaled by the row's encrypted 1/norm
    e_results = list(map(
        lambda db_v: emul(db_v[1], eproduct(db_v[0], e_v)), db
    ))
    # decrypt the final results and normalize them by v's norm
    return np.array(list(map(lambda a: decrypt(a)/d_v, e_results)))
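
A plaintext baseline is a handy sanity check for the encrypted search: both should return the same scores up to rounding. The sketch below is self-contained and does not touch the encryption service; `to_vec` and `plain_search` are hypothetical names introduced for illustration.

```python
import math
import string
from collections import Counter

database = [
    "we walked to the store",
    "my father is an excellent cook",
    "we will have fun at the beach",
    "the brothers fought but loved each other anyway",
]

def to_vec(text):
    # Letter-count vector over a-z, same idea as sentence2Vec.
    counts = Counter(text)
    return [counts[c] for c in string.ascii_lowercase]

def plain_search(text, db=database):
    # Cosine similarity between the query and every sentence, in the clear.
    q = to_vec(text)
    q_norm = math.sqrt(sum(x * x for x in q))
    scores = []
    for sentence in db:
        d = to_vec(sentence)
        d_norm = math.sqrt(sum(x * x for x in d))
        dot_qd = sum(a * b for a, b in zip(q, d))
        scores.append(dot_qd / (q_norm * d_norm))
    return scores

scores = plain_search("store")
```

Comparing `plain_search(text)` with `search(text)` for a few queries should show agreement to several decimal places if the encrypted pipeline is wired correctly.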

# Let's run some searches against the encrypted database
cos(0°) = 1 corresponds to the smallest angle and cos(90°) = 0 to the largest one.
So the larger the cosine, the more similar the strings.

In [373]:
search("we walked to the store")

array([1.        , 0.76798172, 0.73843281, 0.81818279])

In [370]:
search("store")

array([0.77781746, 0.63887656, 0.36857707, 0.70601809])

In [371]:
search("brother")

array([0.63245553, 0.52380952, 0.45786855, 0.78935222])

In [372]:
search("brothers fought")

array([0.64048987, 0.54823041, 0.4978513 , 0.86015123])
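
To turn the scores into an answer, the rows can be ranked by cosine and the top one returned. A minimal sketch, using the scores printed above for "brothers fought"; the variable names are illustrative.

```python
database = [
    "we walked to the store",
    "my father is an excellent cook",
    "we will have fun at the beach",
    "the brothers fought but loved each other anyway",
]

# Scores copied from the search("brothers fought") output above.
scores = [0.64048987, 0.54823041, 0.4978513, 0.86015123]

# Row indices ordered from most to least similar.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
best_match = database[ranking[0]]
```

Here the highest score belongs to the fourth sentence, the only one that contains both query words.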