# Privacy Preserving Search

## Introduction
This example demonstrates a privacy-preserving search against an encrypted database: the client retrieves the encrypted value that matches its query without learning anything else about the database, and the server learns nothing about the query. The database is a key-value store prepopulated with the English names of countries and their capital cities. Selecting a country uses HElib, with the BGV encryption scheme, to search for the matching capital. This demo walks you through an encrypted query on an encrypted data set to find the capital city of Albania.

**NOTE: this demo implements a simple database search algorithm for demonstration purposes.**

## Use case
In a cyber attack, databases are often the most crucial infrastructure to protect. A data leak may cost an organization a considerable fine and damage its reputation. With fully homomorphic encryption (FHE), the database remains encrypted at all times, even during a breach, and can only be decrypted with the private key held by the data owner. This maintains confidentiality and prevents unauthorized access to the data.

Privacy preserving search is a common scenario to demonstrate the benefits of homomorphic encryption. Being able to perform a query while preserving the privacy and confidentiality of the parameters of the query has many applications across various industry segments spanning from genomics to finance. This example uses a simple and easy to follow algorithm that demonstrates how one can use homomorphic encryption based techniques to generate a mask to retrieve data from a key-value pair database. It uses a dataset that is understandable for users across all industries and expertise levels.

The dataset here is a list of the countries of the world. In a real use case, it could hold customer information or financial records, for example. Because this is an educational example, a small dataset was chosen so that responses are timely and the data is relevant to all users.

<br>

In [14]:
def string_to_ascii(text):
    # Convert each character to its integer code point
    return [ord(c) for c in text]

<br>

## Step 1. Import and initialize

In [15]:
import os
import csv
import copy
import pyhelayers
import utils
print("misc. init ready")

In [16]:

db_filename = os.path.join(utils.get_data_sets_dir(), "countries", "countries.csv") # input database file name
country_name = "Albania" # country to get its capital
print("input parameters ready.")

<br>

## Step 2. Initialize FHE parameters
Note: Although we can hide them away, for demonstration purposes, we show you the parameters (e.g. cyclotomic polynomial) here. The parameters have been chosen to provide a somewhat faster running time with a non-realistic security level. Do not use these parameters in real applications.

In [17]:
conf = pyhelayers.HelibConfig()
conf.p = 127 # Plaintext prime modulus
conf.m = 128 # Cyclotomic polynomial - defines phi(m)
# With p = 127 and m = 128 this will give 32 slots
conf.r = 1 # Hensel lifting
conf.L = 1000 # Number of bits of the modulus chain
conf.c = 2 # Number of columns of Key-Switching matrix
print("configuration ready")
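
The 32-slot figure in the comment above can be checked by hand. The sketch below assumes the standard relation for BGV in HElib, where the number of slots equals phi(m) divided by the multiplicative order of p modulo m. The helper names (`euler_phi`, `multiplicative_order`, `slot_count`) are illustrative and are not part of pyhelayers.

```python
from math import gcd

def euler_phi(m):
    # Count the integers in [1, m] that are coprime to m
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)

def multiplicative_order(p, m):
    # Smallest d >= 1 such that p**d == 1 (mod m); requires gcd(p, m) == 1
    if gcd(p, m) != 1:
        raise ValueError("p and m must be coprime")
    order, value = 1, p % m
    while value != 1:
        value = (value * p) % m
        order += 1
    return order

def slot_count(p, m):
    # Assumed relation: slots = phi(m) / ord_m(p)
    return euler_phi(m) // multiplicative_order(p, m)

print(slot_count(127, 128))  # 32
```

Here phi(128) is 64 and 127 has order 2 modulo 128 (since 127 is congruent to -1), which gives 64 / 2 = 32.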

<br>

## Step 3. Initialize HElib BGV Context

In [18]:
utils.start_timer()
he = pyhelayers.HelibBgvContext()
he.init(conf)
print(he)
utils.end_timer("Initializing HE context")

In [19]:
assert(he.get_traits().is_modular_arithmetic)
assert(he.get_traits().arithmetic_modulus >= 127)
print("asserts passed")
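
The modulus check matters because each character is stored as its integer code in one slot, and the arithmetic is modular. As a minimal sketch (the helper `fits_modulus` is hypothetical, not a pyhelayers function), the following checks that every character code of a string is below the plaintext modulus, so that distinct characters stay distinct. Printable ASCII tops out at 126, which is why a modulus of 127 is enough for plain English names.

```python
P = 127  # plaintext modulus, matching conf.p in this example

def string_to_ascii(text):
    # Convert each character to its integer code point
    return [ord(c) for c in text]

def fits_modulus(text, modulus=P):
    # Hypothetical helper: True when every code is a valid residue below the modulus
    return all(code < modulus for code in string_to_ascii(text))

print(string_to_ascii("Albania"))     # [65, 108, 98, 97, 110, 105, 97]
print(fits_modulus("Albania"))        # True
print(fits_modulus("Côte d'Ivoire"))  # False: ord("ô") is 244
```

Names containing characters outside this range would need a larger modulus or a different encoding; how the encoder treats such values is not covered here.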

<br>

## Step 4. Read world countries database from file
Each character occupies one ciphertext slot, so the code below makes sure that no country or capital name is longer than `he.slot_count()`.

In [20]:
country_db = []
with open(db_filename, encoding="utf8") as db_file_csv:
    csv_reader = csv.reader(db_file_csv, delimiter=',')
    for row in csv_reader:
        ascii_country = string_to_ascii(row[0])
        ascii_capital = string_to_ascii(row[1])

        if len(row[0]) > he.slot_count():
            raise RuntimeError(f"Country name {row[0]} too long")
        if len(row[1]) > he.slot_count():
            raise RuntimeError(f"Capital name {row[1]} too long")
        country_db.append((ascii_country, ascii_capital))
print("finished reading database")

<br>

## Step 5. Encrypt the database

In [22]:
utils.start_timer()
enc = pyhelayers.Encoder(he)
encrypted_country_db = []
for country_str, capital_str in country_db:
    country_ctxt = enc.encode_encrypt(country_str)
    capital_ctxt = enc.encode_encrypt(capital_str)
    encrypted_country_db.append((country_ctxt, capital_ctxt))
utils.end_timer("Encrypting DB")

<br>

## Step 6. Encrypt the query

In [23]:
utils.start_timer()
country_name_ascii = string_to_ascii(country_name)
encrypted_query = enc.encode_encrypt(country_name_ascii)
utils.end_timer("Encrypting Query")

<br>

## Step 7. Perform the encrypted database search

In [24]:
utils.start_timer()
eval = pyhelayers.NativeFunctionEvaluator(he)
mask = []

n = he.slot_count()
# Parentheses are required: == binds tighter than & in Python
is_power_of_2 = ((n & (n - 1)) == 0)

# For every entry in the database we perform the following calculation:
for country,capital in encrypted_country_db:
    # Copy of database key: a country name.
    # A deep copy keeps the encrypted database itself unmodified.
    mask_entry = copy.deepcopy(country)
    # Calculate the difference.
    # In each slot now we'll have 0 when characters match,
    # or non-zero when there's a mismatch.
    mask_entry.sub(encrypted_query)

    # Fermat's little theorem:
    # Since the underlying plaintexts are in modular arithmetic with a
    # prime modulus p, raising to the power of p - 1 converts all
    # non-zero values to 1 and leaves 0 as 0.
    eval.power_in_place(mask_entry, conf.p - 1)

    # Negate the ciphertext.
    # Now we'll have 0 for match, -1 for mismatch.
    mask_entry.negate()

    # Add 1.
    # Now we'll have 1 for match, 0 for mismatch.
    mask_entry.add_scalar(1)

    # We'll now multiply all slots together, since
    # we want a complete match across all slots.
    # If slot count is a power of 2 there's an efficient way to do it:
    # we'll do a rotate-and-multiply algorithm, similar to
    # a rotate-and-sum one.
    if is_power_of_2:
        rot = 1
        while rot < he.slot_count():
            tmp = copy.deepcopy(mask_entry)
            tmp.rotate(-rot)
            mask_entry.multiply(tmp)
            rot *= 2 
    else:
        # Otherwise we'll create all possible rotations, and multiply all of
        # them.
        # Note that for non powers of 2 a rotate-and-multiply algorithm
        # can still be used as well, though it's more complicated and
        # beyond the scope of this example.
        rotated_masks = pyhelayers.CTileVector([mask_entry] * he.slot_count())
        for i in range(1, he.slot_count()):
            rotated_masks[i].rotate(-i)
        mask_entry = eval.total_product(rotated_masks, he)

    # mask_entry is now either all 1s if query==country,
    # or all 0s otherwise.
    # After we multiply by capital name it will be either
    # the capital name, or all 0s.
    mask_entry.multiply(capital)
    # We collect all our findings
    mask.append(mask_entry)

# Aggregate results into a single ciphertext
value = mask[0]
for i in range(1, len(mask)):
    value.add(mask[i])
utils.end_timer("Query search")
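
To make the masking logic easier to follow, here is a plaintext simulation of the same steps using ordinary Python lists in place of ciphertexts. It is a sketch under stated assumptions, not the pyhelayers implementation: the names `encode`, `rotate`, `equality_mask`, and `search` are illustrative, unused slots are assumed to hold 0, and the three-row database is made up for the demonstration.

```python
P = 127     # plaintext modulus, as in conf.p
SLOTS = 32  # slot count for p = 127, m = 128

def encode(text):
    # One character code per slot; assume unused slots are zero
    codes = [ord(c) for c in text]
    return codes + [0] * (SLOTS - len(codes))

def rotate(vec, k):
    # Cyclic rotation of the slot vector by k positions
    k %= len(vec)
    return vec[k:] + vec[:k]

def equality_mask(key, query):
    # Slot-wise difference: 0 where characters match
    diff = [(a - b) % P for a, b in zip(key, query)]
    # Fermat's little theorem: x**(P-1) is 1 for non-zero x, and 0 for x = 0
    nonzero = [pow(d, P - 1, P) for d in diff]
    # Negate and add 1: 1 for match, 0 for mismatch
    mask = [(1 - x) % P for x in nonzero]
    # Rotate-and-multiply: after log2(SLOTS) rounds every slot holds
    # the product of all slots (SLOTS must be a power of 2)
    rot = 1
    while rot < SLOTS:
        mask = [a * b % P for a, b in zip(mask, rotate(mask, rot))]
        rot *= 2
    return mask

def search(db, query_text):
    query = encode(query_text)
    result = [0] * SLOTS
    for country, capital in db:
        mask = equality_mask(encode(country), query)
        # All-1s mask keeps the capital; all-0s mask wipes it out
        selected = [m * c % P for m, c in zip(mask, encode(capital))]
        result = [(r + s) % P for r, s in zip(result, selected)]
    return result

db = [("Albania", "Tirana"), ("Algeria", "Algiers"), ("Austria", "Vienna")]
print(search(db, "Albania")[:6])   # [84, 105, 114, 97, 110, 97], i.e. "Tirana"
print(any(search(db, "Atlantis")))  # False: no row matched
```

Because at most one row matches, summing the masked capitals yields either the single matching capital or an all-zero vector.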
        
    

<br>

## Step 8. Decrypt the result

In [25]:
utils.start_timer()
ascii_result = enc.decrypt_decode_int(value)
utils.end_timer("Decrypting result")

<br>

## Step 9. Print the result

In [26]:
string_result = ''.join(chr(c) for c in ascii_result)

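# An all-zero result means no country matched; compare against the
# NUL character, since string_result holds characters, not integers.
if string_result[0] == '\x00':
    string_result = ("Country name not in the database.\n*** Please make sure "
                     "to enter the name of a country\n*** with the "
                     "first letter in upper case.")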
print("\nQuery result: ", string_result)
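
The decoded result has one value per slot, so a short capital name is followed by zero-valued slots. A possible way to tidy this up is sketched below. It assumes the decrypted result is a sequence of integers, one per slot, and `decode_result` is a hypothetical helper rather than part of pyhelayers.

```python
def decode_result(slots):
    # Return the decoded string, or None when every slot is zero (no match)
    codes = [int(v) for v in slots]
    if not any(codes):
        return None
    # Drop the trailing zero slots that pad out the ciphertext
    return "".join(chr(c) for c in codes).rstrip("\x00")

print(decode_result([84, 105, 114, 97, 110, 97] + [0] * 26))  # Tirana
print(decode_result([0] * 32))                                # None
```

Returning `None` for the no-match case keeps the error message separate from the decoding step.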
