# Exponential mechanism

In [30]:
import pandas as pd
import numpy as np
import scipy.stats as stat
adult = pd.read_csv('../../datasets/adult_with_pii.csv')
adult.dropna(subset = ['Occupation'], inplace=True) 

Consider the following queries for the Adult dataset:

- What is the most popular employment type?
- Which employment has the maximum hours of work on average?

These queries do not return numeric values. Instead, each returns an object with some property: in the first case, the employment type with the highest popularity, and in the second case, the employment type with the highest average working hours.

Let's assume that $R$ denotes the set of candidate objects, one of which is returned, and $u: \mathcal{D} \times R \rightarrow \mathbb{R}$ denotes a scoring function that assigns a real-valued score to each object for a given dataset $D \in \mathcal{D}$. The query is to return the "best" object from $R$ according to these scores. For example, if the scoring function returns the popularity of each employment type, then the first query returns the employment type held by the largest number of people in the dataset. However, to protect privacy, instead of always returning the true most popular employment type, we sometimes (randomly) return another one that is not the most popular. The probability of returning an object grows exponentially with the score it receives from $u$, so the highest-scored item (the most popular job, in this case) also has the highest probability of being returned. Because the exponential mechanism is probabilistic, it may also return an object that is not the highest scored. Formally,

---

If $R$ is the set of objects from which to return one, and $u$ is the scoring function with sensitivity $\Delta u$, then the exponential mechanism $M$ will return an object $r\in R$ with probability proportional to $e^{\frac{\epsilon u(D,r)}{2\Delta u}}$. Concretely:

$P(M(D)=r) = \frac{e^{\frac{\epsilon u(D,r)}{2\Delta u}}} {\sum_{\bar{r}\in R}e^{\frac{\epsilon u(D,\bar{r})}{2\Delta u}}}$.

---
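As a small worked example of the formula above, the following sketch computes the selection probabilities for three candidate objects. The scores, the helper name `exponential_probabilities`, and the choice of $\epsilon = 1$ and $\Delta u = 1$ are assumptions made purely for illustration.

```python
import numpy as np

def exponential_probabilities(scores, sensitivity, epsilon):
    """Return P(M(D) = r) for every r, given the scores u(D, r)."""
    scores = np.asarray(scores, dtype=float)
    # Numerator of the formula: exp(epsilon * u(D, r) / (2 * sensitivity))
    weights = np.exp(epsilon * scores / (2 * sensitivity))
    # Denominator: the sum of the same expression over all candidates
    return weights / weights.sum()

# Hypothetical scores for three candidate objects
toy_scores = [0.0, 1.0, 2.0]
toy_probs = exponential_probabilities(toy_scores, sensitivity=1, epsilon=1)
print(toy_probs)
```

Each extra point of score multiplies the probability by $e^{0.5}$ here, so the best candidate is the most likely to be returned but by no means the only one.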

Two things to remember:
- The exponential mechanism returns an item from $R$, not some noisy value based on $D$.
- The privacy cost of the mechanism is always $\epsilon$ and independent of the size of $R$.

In [46]:
# the set of options
R = adult.Occupation.unique()
# the scoring function: how many times an item appears in the data, divided by the max count
def score(data, option):
    counts = data.value_counts()
    return counts[option] / counts.max()
score(adult.Occupation, 'Tech-support')

0.22415458937198068

In [52]:
def exponential(D, R, u, sensitivity, epsilon):
    # Score every candidate option against the data
    scores = np.array([u(D, r) for r in R])
    # Unnormalized weight of each option: exp(epsilon * score / (2 * sensitivity))
    probs = np.exp(epsilon * scores / (2 * sensitivity))
    # Normalize so that the weights sum to 1
    probs = probs / probs.sum()
    # Sample one option according to these probabilities
    return np.random.choice(R, 1, p=probs)[0]

exponential(adult.Occupation,  R, score, 1, 1)

'Handlers-cleaners'
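If the scoring function returns large values (for example raw counts in the thousands instead of normalized ones), `np.exp` can overflow before the weights are normalized. A possible variant, sketched below, subtracts the largest exponent first, which leaves the resulting probabilities unchanged. The function name `exponential_stable`, the toy data, and the `count_score` function are hypothetical and not part of the notebook above.

```python
import numpy as np

def exponential_stable(D, R, u, sensitivity, epsilon, rng=None):
    """Exponential mechanism with overflow-safe normalization (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array([u(D, r) for r in R], dtype=float)
    exponents = epsilon * scores / (2 * sensitivity)
    # Subtracting the maximum rescales every weight by the same factor,
    # so the normalized probabilities are unchanged but exp() cannot overflow.
    weights = np.exp(exponents - exponents.max())
    probs = weights / weights.sum()
    # Sample an index rather than the option itself, then look the option up
    return R[rng.choice(len(R), p=probs)]

# Hypothetical toy data: a plain list of labels
toy_data = ['a'] * 5000 + ['b'] * 4990 + ['c'] * 100
toy_options = ['a', 'b', 'c']

def count_score(data, option):
    # Raw count: adding or removing one record changes it by at most 1
    return data.count(option)

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
picks = [exponential_stable(toy_data, toy_options, count_score, 1, 1, rng)
         for _ in range(200)]
print({option: picks.count(option) for option in toy_options})
```

With raw counts as scores, a difference of only 10 records between `'a'` and `'b'` already makes `'a'` far more likely, while `'c'` is effectively never returned.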

In [58]:
items = [exponential(adult.Occupation, R, score, 1, 1) for i in range(200)]
pd.Series(items).value_counts()

Craft-repair         19
Exec-managerial      17
Handlers-cleaners    16
Farming-fishing      16
Sales                15
Other-service        15
Prof-specialty       15
Armed-Forces         14
Tech-support         12
Transport-moving     12
Machine-op-inspct    12
Adm-clerical         11
Priv-house-serv      10
Baby                  9
Protective-serv       7
dtype: int64

In [59]:
adult.Occupation.value_counts()

Prof-specialty       4140
Craft-repair         4100
Exec-managerial      4066
Adm-clerical         3770
Sales                3650
Other-service        3295
Machine-op-inspct    2002
Transport-moving     1597
Handlers-cleaners    1370
Farming-fishing       994
Tech-support          928
Protective-serv       649
Priv-house-serv       149
Armed-Forces            9
Baby                    1
Name: Occupation, dtype: int64

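From the outputs above, you can see that the number of times an item was returned only loosely follows that item's popularity in the dataset. Because the scores are normalized to $[0, 1]$ and both $\epsilon$ and the sensitivity are set to 1, the exponent $\frac{\epsilon u(D,r)}{2\Delta u}$ is at most $0.5$, so the most popular occupation is only about $e^{0.5} \approx 1.65$ times as likely to be returned as the least popular one. This is why rare values such as Armed-Forces (9 records) were still returned 14 times out of 200.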
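To compare the sampled counts with what the mechanism is expected to produce, the exact selection probabilities can be computed directly from the occupation counts printed above. This is a minimal sketch that assumes the same settings as the run above (normalized scores, $\epsilon = 1$, sensitivity 1); the variable names are illustrative.

```python
import numpy as np

# Occupation counts copied from the value_counts() output above
occupation_counts = {
    'Prof-specialty': 4140, 'Craft-repair': 4100, 'Exec-managerial': 4066,
    'Adm-clerical': 3770, 'Sales': 3650, 'Other-service': 3295,
    'Machine-op-inspct': 2002, 'Transport-moving': 1597,
    'Handlers-cleaners': 1370, 'Farming-fishing': 994, 'Tech-support': 928,
    'Protective-serv': 649, 'Priv-house-serv': 149, 'Armed-Forces': 9,
    'Baby': 1,
}

epsilon, sensitivity = 1, 1
counts = np.array(list(occupation_counts.values()), dtype=float)
scores = counts / counts.max()                       # same normalization as score()
weights = np.exp(epsilon * scores / (2 * sensitivity))
exact_probs = weights / weights.sum()
expected_in_200 = 200 * exact_probs                  # expected hits in 200 draws

for name, p, e in zip(occupation_counts, exact_probs, expected_in_200):
    print(f"{name:<18} p={p:.3f}  expected draws in 200: {e:.1f}")

ratio = exact_probs.max() / exact_probs.min()
print("max/min probability ratio:", ratio)
```

Under these settings the expected number of draws per occupation ranges only from roughly 10 to 17 out of 200, which is consistent with the nearly flat sample shown earlier.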