
# Analysis of Privacy Techniques

This notebook analyzes how various privacy techniques perform on synthetic transaction data.

##### Helpers

In [0]:
import numpy as np
import pandas as pd
import seaborn as sns

### Naive Dataset 1

We will generate a naive dataset that consists of rows of tuples in the following format: $(id_1, id_2, amt, des)$

Where: 
- $id_1$ is the sender
- $id_2$ is the receiver
- $amt$ is the amount transferred
- $des$ is a description

For simplicity, we start with $amt = 1$ and draw $des$ uniformly at random from the interval $[0, 1)$; the ids are integers, $id_1, id_2 \in \mathbb{Z}$.



In [0]:
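class naiveDataSet:
  def __init__(self, population = 1000, tx = 3000):
    # One array per column: sender and receiver ids in [0, population), unit amounts, uniform [0, 1) descriptions.
    # Stacking them with np.array casts every column to float, so ids display as e.g. 513.0.
    columns = [np.random.randint(population, size=tx), np.random.randint(population, size=tx), np.ones(tx), np.random.rand(tx)]
    self.data = pd.DataFrame(data = np.array(columns).T, columns = ['sender', 'receiver', 'amount', 'description'])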
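
The dataset is defined with integer ids, but stacking the columns into a single NumPy array stores them as floats. The following is a minimal sketch of an alternative construction that keeps the ids as integers and makes the draw reproducible. The function name `make_naive_frame` and the `seed` parameter are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def make_naive_frame(population=1000, tx=3000, seed=0):
    """Build the (sender, receiver, amount, description) table column by column."""
    rng = np.random.default_rng(seed)  # seeded generator (assumption: reproducibility is wanted)
    return pd.DataFrame({
        'sender': rng.integers(population, size=tx),    # integer ids in [0, population)
        'receiver': rng.integers(population, size=tx),  # integer ids in [0, population)
        'amount': np.ones(tx),                          # every transfer moves 1 unit
        'description': rng.random(tx),                  # uniform floats in [0, 1)
    })

df = make_naive_frame()
print(df.dtypes)
```

Passing a dict of columns lets pandas keep a separate dtype per column, so `sender` and `receiver` stay integer while `amount` and `description` remain floats.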

Previewing the first rows of the generated dataset:

In [73]:
nds = naiveDataSet()
nds.data.head()

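|   | sender | receiver | amount | description |
|---|--------|----------|--------|-------------|
| 0 | 513.0 | 284.0 | 1.0 | 0.718012 |
| 1 | 151.0 | 724.0 | 1.0 | 0.352845 |
| 2 | 466.0 | 885.0 | 1.0 | 0.513737 |
| 3 | 311.0 | 693.0 | 1.0 | 0.417908 |
| 4 | 646.0 | 626.0 | 1.0 | 0.165068 |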


In [77]:
nds.data.groupby(['sender']).sum().mean()

receiver       1574.314286
amount            3.174603
description       1.587842
dtype: float64
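
The mean amount per sender is about 3.17 rather than 3000 / 1000 = 3 because `groupby` averages only over ids that appear as a sender at least once; 3.174603 corresponds to 3000 / 945, i.e. 945 distinct senders in this run. A rough check of that figure, assuming senders are drawn independently and uniformly as in `naiveDataSet` (the variable names below are illustrative):

```python
population, tx = 1000, 3000

# Probability that a given id is never drawn as a sender in tx independent draws.
p_absent = (1 - 1 / population) ** tx

# Expected number of ids that appear as a sender at least once.
expected_active = population * (1 - p_absent)

# Approximate mean number of transactions per active sender.
approx_mean_per_sender = tx / expected_active

print(round(expected_active, 1), round(approx_mean_per_sender, 3))
```

The estimate comes out near 950 active senders and a mean slightly above 3.15, which is in line with the observed value; the remaining gap is ordinary sampling variation.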

### Homomorphic Encryption

#### [Paillier Cryptosystem](https://en.wikipedia.org/wiki/Paillier_cryptosystem)
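
Paillier is additively homomorphic: multiplying two ciphertexts modulo $n^2$ yields a ciphertext of the sum of the plaintexts. The following is a minimal pure-Python sketch of the standard textbook construction with $g = n + 1$. It is not taken from this notebook; the function names and the toy primes are illustrative assumptions, and the key size is far too small for real use.

```python
import math
import secrets

def paillier_keygen(p, q):
    """Return ((n, g), (lam, mu)) for distinct primes p and q (toy sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                                           # common simplifying choice of g
    mu = pow(lam, -1, n)                                # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer m with 0 <= m < n using fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1                # r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# Toy demo: sum three unit transfers (amt = 1) without decrypting any single one.
pub, priv = paillier_keygen(1009, 1013)
total = encrypt(pub, 0)
for amt in (1, 1, 1):
    total = add_encrypted(pub, total, encrypt(pub, amt))
print(decrypt(pub, priv, total))
```

Because each encryption uses fresh randomness, two ciphertexts of the same amount will almost always differ, yet their product still decrypts to the correct sum (modulo $n$).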

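|   | sender | receiver | amount | description |
|---|--------|----------|--------|-------------|
| 0 | 134.0 | 606.0 | 1.0 | 0.985679 |
| 1 | 421.0 | 399.0 | 1.0 | 0.947905 |
| 2 | 671.0 | 400.0 | 1.0 | 0.260391 |
| 3 | 587.0 | 702.0 | 1.0 | 0.929521 |
| 4 | 815.0 | 443.0 | 1.0 | 0.946756 |
| ... | ... | ... | ... | ... |
| 9995 | 555.0 | 19.0 | 1.0 | 0.205084 |
| 9996 | 578.0 | 854.0 | 1.0 | 0.061327 |
| 9997 | 772.0 | 325.0 | 1.0 | 0.896216 |
| 9998 | 711.0 | 472.0 | 1.0 | 0.054223 |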
