# Exercise session 3: Query-based systems

In this session, you will be given access to a private dataset through queries. You will be confronted to different noise addition mechanisms that attempt to preserve the privacy of people in the datasets, and will develop attacks to obtain private information on users. 

## The dataset

This dataset containst records for 32561 people from a census. The columns of this dataset are:
`age`, `workclass`, `fnlwgt`, `education`, `education-num`, `marital-status`, `occupation`, `relationship`, `race`, `sex`, `capital-gain`, `capital-loss`, `hours-per-week`, `native-country`, `salaryclass`. All of the columns are categorical, with values replaced by integers.

This dataset is hosted on a server (whose IP will be given to you). The different data protection mechanisms are hosted available on two different ports. The queries that you are allowed to send belong to a subset of SQL: 

$$\texttt{condition} \ \ \text{and} \ \ \ldots \ \ \text{and} \ \ \texttt{condition}$$

where `condition := columnname '<'|'>'|'<>'|'='|'<='|'>=' value`, and `value` is either an integer or an arithmetic expression that evaluates to an integer (e.g. `(2*2)`). Each of these queries returns the _count_ of users who satisfy all conditions in the query. 

In this notebook, you will try and find the `salaryclass` of some users,  using only queries. For simplicity, `salaryclass` is a binary attribute, where 0 corresponds to standard income and 1 is higher income.

In [1]:
import socket
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
def remote_query(query, host, port):
    '''This function uses a very simple socket protocol to send 
       queries to the database system.'''
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        s.sendall(query.encode('utf-8'))
        response = s.recv(1024).decode('utf-8')
        if response.startswith('ERROR '):
            raise Exception(response[6:])
    return eval(response)

The user you are asked to "attack" (let's call this user Alex), has the following attributes:

In [3]:
# Here are the attributes in Python form ;-)

Alex = { 'age':            39,
         'workclass':       7,
         'education':       9,
         'education_num':  13,
         'marital_status':  4,
         'occupation':      1,
         'relationship':    1,
         'race':            4,
         'sex':             1,
         'capital_gain': 2174,
         'capital_loss':    0,
         'hours_per_week': 40,
         'native_country': 39 }

These attributes will constitute your _background knowledge_ on the victim.

## Exercise 1: Two simple mechanisms

In this exercise, you will break two simple mechanisms: noise addition, and query set size restriction. Each mechanism requires a different attack. Then, we will combine these mechanisms, and show that (at least in this case) they do no effectively protect the sensitive information.

All of these queries will be sent to the same server, whose IP will be provided by the TA. The first character of each query will specify which mechanism to use (but we have already taken care of this by defining convenient functions for you to use).

In [4]:
HOST  = "146.169.41.52"
PORT1 = 42420

### Exercise 1a | Noise addition

Firstly, you will interact with a very simple privacy mechanism, noise addition: the query-based system adds a noise $N \sim \mathcal{N}(0, 5)$ to the query's response. This noise is independent for each query.

The following code defines the `query_1a` function that queries this specific system.

In [5]:
# you can use a direct call to `query_1a("your query")` for exercise 1.a
query_1a = lambda query: remote_query('a'+query, HOST, PORT1)

**Step 1:** Execute the following cell several times to see how the mechanism works. Feel free to change the query.

In [6]:
query_1a('age = 32 and sex = 1')

571.080233122161

Since all you have access to are `COUNT` queries, you can't directly query the system to ask for Alex's `salaryclass`. There is a workaround, however: you can count the number of users who are Alex, _and_ have `salaryclass = 1`.

**Step 2 _(action required)_**: Write and perform a query that targets specifically Alex and tests for `salaryclass = 1`. Is the result informative?

_Your code should define a `query` variable._

In [7]:
# YOUR CODE HERE
query = '' # YOUR QUERY HERE

# ANSWER
query = ' and '.join('%s = %s' % x for x in Alex.items())
query += ' and salaryclass = 1'

# WHAT HAPPENS WHEN YOU QUERY IT?
print( query_1a(query) )

-3.9048387115905068


As you learned in the class, such a simple mechanism falls to simple _averaging attacks_. By the central limit theorem, repeating the query several times and taking the average reduces the variance of the noise, thus converging to the true answer.

**Step 3 _(action required)_**: Given the distribution of the noise, propose a simple criterion to determine Alex's value, given the noisy samples. Compute how many queries you would need to have an error probability of less than 5%. This is a paper-and-pencil exercise.

_Your answer here._

`SAMPLE ANSWER`

Let's start with some notations. Let $b \in \{0,1\}$ the true value for Alex. Each sample is thus $X_i = b + N_i \sim \mathcal{N}(b,5)$. The test we propose to use, if given $n$ results $x_i$, is to estimate Alex's value as $1$ if $\frac{1}{n} \sum_{i=1}^n x_i \geq \frac{1}{2}$ (one can show this is the maximum likelihood estimator). Denote by $\hat{B}$ this estimate. We also denote $Y = \frac{1}{n} \sum_{i=1}^n N_i \sim \mathcal{N}(0, \frac{5}{n})$, the sum of all noise. This implies that $\hat{B}$ is 1 if $b + Y \geq 1/2$ and is 0 otherwise.

There are two possible errors: predicting 1 for a 0, and vice versa: 
1. $P[\hat{B} = 1~|~b=0] = P[Y \geq 1/2]$
2. $P[\hat{B} = 0~|~b=1] = P[1 + Y \leq 1/2] = P[Y \leq -1/2]$

$Y$ is normally distributed, so by symmetry of the Gaussian these two errors are the same.

We define "error" to mean that the prediction we make is different from the true value $b$. This means:
$$P[error] = P\left[\hat{B} = 0, b = 1\right] + P\left[\hat{B} = 1, b = 0\right]$$

This means that the probability of error can be written as follows (independently of the probability distribution of $b$):
$$P[error] = P[\hat{B} = 1~|~b=0] \cdot P[b=0] + P[\hat{B} = 0~|~b=1] \cdot P[b=1] = P[Y \geq 1/2] \cdot (P[b=0] + P[b=1]) = P[Y \geq 1/2]$$

Finally, we impose that this probability is less than 5% (denote $Z \sim \mathcal{N}(0,1)$ the standard normal distribution):
$$P[Y \geq 1/2] = P[Z \geq \frac{1}{2 \sqrt{5/n}}] \leq 0.05$$

Using the CDF of the Gaussian, we find the following contraint on $n$:
$$\frac{\sqrt{n}}{2 \sqrt{5}} \geq z_{0.95} = 1.96,$$
where $z_{0.95}$ is the 95th percentile of the standard normal distribution.

And thus $n \geq 4 \cdot 1.96^2 \cdot 5 = 76.83$. Hence, **77 queries are enough**.

**Step 4 _(action required)_**: Perform the attack, using the `query` defined in step 2 and the number of queries you found in step 3.

In [8]:
# YOUR CODE HERE

# ANSWER
samples = []
for x in range( 77 ):
    samples.append(query_1a(query))

average = np.mean(samples)
decision = average >= 0.5

print('Samples collected:', samples)
print('Average:', average)
print("Alex's value is likely to be", int(decision))

Samples collected: [0.23201370544411057, -0.3015259117498421, -0.7898421997883777, -3.424961825077621, 4.953804656218172, 0.7720031376083756, -4.316016092247198, -1.3463533397882264, -0.5945770784526034, 0.39004590947729834, 0.1972707587689222, -2.518205463121084, 0.5509251775375861, -3.280021324305288, -4.917168370718214, 0.13178283963679424, 2.3049616230990977, 3.8817133507045942, 1.3437758304353806, -1.4686301505858737, 0.7681679052013052, -0.2658627364568269, -0.6017648779634516, 2.4262394692023235, 1.6548385019286274, -0.12971759658284038, -4.050078583968898, -3.6438289767780474, 3.7925478679854545, -0.9365383906513686, 0.8861672451414787, 1.326755100410307, -3.32294976065766, -2.3388721692006866, -1.4957859340659798, -3.85657558802101, -0.427986176964712, -2.012460053143236, 0.8668764252229783, 4.6900277098710905, 0.2512033586130026, 4.782554672325066, -0.0745019956869695, 1.528818912084902, -0.47023269004025753, 1.313593962612844, -3.4242734660853804, 0.8204571777999554, 4.52925

### Exercise 1b | Query set size restriction

The previous mechanism fails easily, because it is easy to target Alex directly. Another option is to _suppress every query that selects too few users_. So we implemented another privacy mechanism that enforces a query set size restriction (without noise addition). Specifically, if the user count in the response is less than **5**, the query returns __-1__.

In [9]:
# as before, you can use a direct call to `query_1b("your query")` for exercise 1.b
query_1b = lambda query: remote_query('b'+query, HOST, PORT1)

**Step 1**: what happens now, when you try your query from exercise 1.a ?

In [10]:
query_1b( query )

-1

In this exercise, you will be asked to attack Bob, whose characteristics are as follows:

In [11]:
# Here are the attributes in Python form ;-)

Bob = { 'age':            50,
        'workclass':       6,
        'education':       9,
        'education_num':  13,
        'marital_status':  2,
        'occupation':      4,
        'relationship':    0,
        'race':            4,
        'sex':             1,
        'capital_gain':    0,
        'capital_loss':    0,
        'hours_per_week': 13,
        'native_country': 39 }

It doesn't work anymore. However, it does not mean that query set size restriction is the solution for privacy. Indeed, this technique is vulnerable to _intersection attacks_. The idea of an intersection attack is to perform two queries whose answers select several users, but with only one user as the difference between them. Thus, the difference of the two queries gives the exact answer for that user.

Typically, they take the form:
1. `Query1 = (condition1 and condition2 and ...)`
2. `Query2 = Query1 and Discriminative condition`

Where `Query1` relates to many users, and `Query2` relates to the same users as `Query1` **except for the target user** (for whom the `Discriminative condition` is false). To find these queries, use the target's values, and assert that the count for `Query2` is the same as for `Query1` _minus 1_.

**Step 2  _(action required)_**: Using queries to the server, find (by trial and error) a pair `(Query1, Query2)` using the data _you know_ about Bob.

_Your code should define two variables, called `query1` and `query2`._

In [12]:
### YOUR CODE HERE
query1 = '' # YOUR QUERY HERE
query2 = '' # YOUR QUERY HERE

### ANSWER
query1 = 'age = 50 and education = 9'
query2 = query1 + ' and hours_per_week <> 13'

# print the content
print(query_1b(query1), query_1b(query2))

# we get: 103 and 102 --> Bob is the only user with (age=50, education=9, hours_per_week=13) and many users
#  have age=50 and education=9

# The idea here is that we used hours_per_week, a high entropy attribute, to discriminate Bob.

103 102


**Step 3 _(action required)_**: Using these queries, find Bob's (exact) secret value (you will need to adapt these queries to take into account the `salaryclass`).

In [13]:
### YOUR CODE HERE

### ANSWER
guess = ' and salaryclass = 1'
difference = query_1b(query1+guess) - query_1b(query2+guess)
print('my guess is: salaryclass = %d' % (difference)) # if query1 = query2, then Bob is not part of query1 or query2 --> Bob's salaryclass is 0

my guess is: salaryclass = 0


### Exercise 1c | Noise addition + Query set size restriction

Finally, we could combine these two mechanisms to have a stronger mechanism. Mechanism 1c returns either (-1) if the query concerns less than 5 users, and otherwise the true answer + independent noise according to $\mathcal{N}(0,5)$.

In [14]:
# you can use a direct call to `query_1c(your_query)` for exercise 1.
query_1c = lambda query: remote_query('c'+query, HOST, PORT1)

**Step 1**: Use your query from exercise 1a. Is the result informative?

In [15]:
query_1c(query)

-1

**Step 2**: Use `query1` and `query2` from exercise 1b. Are the results informative?

In [16]:
print(query_1c(query1), query_1c(query2))

100.92534438768934 103.44155683876646


**Step 3  _(action required)_**: Adapt your attack from exercise 1b to attack this mechanism, and (again) find Bob's secret value. Does combining the mechanisms make them stronger?

In [17]:
### YOUR CODE HERE

### ANSWER
samples = [query_1b(query1+guess) - query_1b(query2+guess) for _ in range(154)]
print('my guess is: salaryclass = %d' % (np.mean(samples) > 0.5))
# Note: we need twice as many queries because the variance of a difference of gaussians is equal to the sum of variances.

my guess is: salaryclass = 0


## Exercise 2

In this second exercise, the query-based system uses _static sticky noise_: it adds noise that depends on the conditions. That is, if a query $Q \equiv$ `C1 and C2 and C3` is issued, our mechanism adds one fixed noise value per condition. The output to Q is then:
$\newcommand{\static}{\operatorname{static}}$

$$\widetilde{Q}(D) = Q(D) + \static[\text{C1}] + \static[\text{C2}] + \static[\text{C3}],$$

where each $\static[\text{C}x]$ is a noise value drawn from a _seeded_ normal distribution. In this exercise, the mechanism draws from $\mathcal{N}(0,2)$. This means that the noise level for one condition is low, but if you have many conditions in your query, the overall noise will be larger than for the first mechanism. The mechanism is seeded such that if a same condition is used several times (say, in subsequent queries), the same noise will be used every time for that condition.

That is, when you query condition `C1`, a noise `n1` is added to the result (always the same noise), and so on. For instance, you would have the following behaviour, if you do these queries (in whatever order):
- `C1` ==> answer1 + `n1`
- `C1 and C2` ==>  answer2 + `n1` + `n2`
- `C1 and C3` ==> answer3 + `n1` + `n3`
- `C1` (again) ==> answer1 + `n1`

In practice, this is implemented by [seeding](https://en.wikipedia.org/wiki/Random_seed) the random number generator (RNG) with a hash of the condition (XORed with some salt). The details of the implementation are not important to solve the exercise, but for more details you can look at the slides from last week.

This mechanism thwarts the attack you developed in exercise 1a: repeating the query will always yield the same result.
To make the mechanism more robust, it also implements query set size restriction, and will return -1 for every query whose user set contains less than 5 users.

In [18]:
# this is running on a different port of the server!
PORT2 = 42422

In [19]:
# same as for exercise 1, the function `query_2` is defined for simplicity of use.
query_2 = lambda query: remote_query(query, HOST, PORT2)

In this exercise, you will attack Carl, whose characteristics are given below.

In [20]:
Carl = {'age':            43,
        'workclass':       4,
        'education':      15,
        'education_num':  10,
        'marital_status':  2,
        'occupation':     13,
        'relationship':    0,
        'race':            2,
        'sex':             1,
        'capital_gain':    0,
        'capital_loss':    0,
        'hours_per_week': 40,
        'native_country': 23 }

**Background knowledge**: from your expert knowledge of the situation, you know that Carl is uniquely identified by his `age`, `marital_status`, and `native_country`. Furthermore, you suspect that _many_ people in the dataset share the same `age` and `marital_status` (hence, the `native_country` seems to be a good discriminative condition).

_Why do we have to assume that the adversary knows that?_

**Step 1**: Perform the following queries. Observe that the result you obtain is the same as your neighbour's. Why is that important?

In [21]:
### try a few queries here
print('Users aged 42:  \t', query_2('age = 42'))
print('Users aged 1000:\t', query_2('age = 1000'))
print('Number of men:  \t', query_2('sex = 0'))

Users aged 42:  	 781.3844738173221
Users aged 1000:	 -1
Number of men:  	 10770.164862533984


**Step 2  _(action required)_**: Perform your attack 1c here (on Carl). Do you think the result is reliable?

In [22]:
## YOUR CODE HERE

## ANSWER
c1 = ['age', 'marital_status']
discr = 'native_country'

query1 = ' and '.join( '%s = %s' % (c, Carl[c]) for c in c1)
query2 = query1 + ' and %s <> %s' % (discr, Carl[discr])

samples = [query_2(query1+guess) - query_2(query2+guess) for _ in range(9)]
print(samples)
print('my guess is: salaryclass = %d' % (np.mean(samples) > 0.5)) # ow no it's wrong (in this particular case)

[-0.9182497779437142, -0.9182497779437142, -0.9182497779437142, -0.9182497779437142, -0.9182497779437142, -0.9182497779437142, -0.9182497779437142, -0.9182497779437142, -0.9182497779437142]
my guess is: salaryclass = 0


_Spoiler alert_ : this mechanism is still not secure. As you see, this mechanism is still vulnerable to **intersection attacks**. Even if the repeated queries return the same result, the noise on the response is actually _very low_. Why is that?

**Step 3  _(action required)_**: What is the total noise on `query_2(query1) - query_2(query2)` ? Why is it low?

_Your answer here._

`SAMPLE ANSWER`

Because `query1` and `query2` differ by only one condition, the difference of the results is thus only the noise corresponding to the `Discriminative condition`. This noise is distributed according to $\mathcal{N}(0,2)$.

While the resulting noise is indeed _small_ , the confidence on your result is low if you can only perform one query. Finding many `(query1, query2)` pairs can be difficult, especially using this more secure interface: it is hard to make sure that Carl is the only user in one query but not the other when the count you find is noisy. In this case, we assume that the pair you already have `(query1, query2)` is _background knowledge_ : something you know about the dataset that you could not figure out by using queries. 

However, there is a trick to get more queries, using this single piece of background knowledge: modifying the queries so that they are _semantically equivalent_ but _syntactically different_. For instance, `sex=0` is equivalent to `sex <> 1`. This is called a **semantic attack**.

**Step 4  _(action required)_**: How many queries do you need for a 95% confidence, given that the noise on one condition is $\mathcal{N}(0,2)$ ?

_Your answer here._

`SAMPLE ANSWER`

The difference between two queries is distributed according to $\mathcal{N}(0,2)$. The development of exercise 1a still holds. Thus $n \geq 4 \cdot 1.96^2 \cdot 2 = 30.7$. Hence, **31 queries are enough**.

**Step 5  _(action required)_**: Vary _syntactically_ the `Discriminative condition` (while keeping the same _semantic_ meaning) to obtain independent samples in the difference attack (by obtaining different expressions for `query2`), in order to have enough queries for a confidence of 95%. Perform the attack. Do you obtain the correct result?

_Hint: this SQL database supports arithmetic operations._

In [23]:
### YOUR CODE HERE

### ANSWER

# we need to find 30 queries ==> 29 equivalents to query2
query2s = [query2]
for x in range(30):
    query2_tmp = query1 + ' and native_country <> (%d + %d)' % (Carl['native_country'] - x, x)
    query2s.append(query2_tmp)

# using these equivalent samples, we can perform the difference attack
samples = [(query_2(query1 + guess) - query_2(q + guess)) for q in query2s]
print(samples)
print('my guess is: salaryclass = %d' % (np.mean(samples) > 0.5)) # yes, it works!

[-0.9182497779437142, -1.1775853129173015, 0.7095874197762555, -0.5668460242313813, 3.2787899648506595, -0.1936358564389309, 1.3724589039698571, 0.926042560092867, 1.4299838025924032, -0.06351527577209026, 0.47380789838010173, -0.09022312850507319, 1.090486626972421, 4.295033910911002, 1.6932596689377988, 1.0551009884456164, -0.5958249367825772, 2.841814828186102, 0.5022953935353769, 2.86763186750008, 3.2170199637105554, 1.9152609460104486, 2.751460954849165, 1.506039091187688, 1.2338554631363934, -0.3992219606829792, 2.6362393886879545, 0.7204307730117421, -0.7546799654964786, 1.3505038337480073, 1.243136992593378]
my guess is: salaryclass = 1


In [24]:
query1

'age = 43 and marital_status = 2'

In [25]:
query2

'age = 43 and marital_status = 2 and native_country <> 23'

### Bonus exercise: find the age of a record without background knowledge on uniqueness

In this last part, you will attack the same mechanism as for question 2, but _without information on the attributes to use_. That is, in the last part, we told you that the target was unique according to some attributes, but many other users shared all but one of these attributes. Here, you will have to figure out how to get these attributes.

Furthermore, to make it a bit harder, this time we ask you to obtain the `age` (and optionally `salaryclass`) of this record. As always, you are only allowed to query the system through the limited syntax used in this class. Let us know when you have the age (so we can confirm your result)!

In [26]:
Deborah = {'workclass':       4,
           'education':      15,
           'education_num':  10,
           'marital_status':  0,
           'occupation':      3,
           'relationship':    4,
           'race':            4,
           'sex':             0,
           'capital_gain':    0,
           'capital_loss':    0,
           'hours_per_week': 20,
           'native_country': 39 }

### A solution

Here is one way you could attack this system. Many more exist, and feel free to play with this and come up with a more efficient solution!

The key issue is that we don't know which subset of attributes can be used to uniquely pinpoint Deborah. The goal is thus to first find a set of attributes $A$ and an attribute $u$ such that $Query(A = a^{Deborah}) = C \gg 5$ and $Query(A = a^{Deborah} \land U \neq u^{Deborah}) = C - 1$.

For this, we can iterate over subsets, and use the trick from the previous exercise to _denoise_ the result sufficiently, so we can make sure that the difference between the two queries is indeed one (and so, $U \neq u^{Deborah}$ is a discriminative condition).

Then, we can simply perform $Query(A = a^{Deborah} \land Age = x)$ and $Query(A = a^{Deborah} \land Age = x \land U \neq u^{Deborah})$ for $x = 1, \dots, 100$, with _denoising_ , until we find a pair for which the difference is exactly 1 (which means that this is Deborah's age). In fact, this doesn't work as is, because $Age = x$ is a very rare condition, so the queries are bucket suppressed. Instead, we use $Age \neq x$ and check for the one query where the difference is 0.

Step 1: Iterating over subsets $(A',u)$.  **This performs a lot of queries. Can you optimise it?**

In [28]:
import itertools  # combination of subsets

attributes = list(Deborah.keys())

found = False

# Let's iterate over all subsets of attributes (in fact, we iterate over all pairs X,y).
# We iterate in increasing size (to prevent bucket suppression).
for subset_size in range(2, len(attributes)):
    if found: break
    # Iterate over all subsets of k+1 elements, for k >= 1.
    for subset in itertools.combinations(attributes, subset_size):
        if found: break
        # Iterate over all attributes (setting each as u).
        for i in range(subset_size):
            Aprime = subset[:i] + subset[i+1:]
            u = subset[i]
            # We want to do a difference attack between these queries, to check that
            # the true difference is 1.
            query_1_usr = ' and '.join('%s = %d' % (a, Deborah[a]) for a in Aprime)
            query_2_usr = query_1_usr + ' and %s <> %d' % (u, Deborah[u])
            c1 = query_2(query_1_usr)
            c2 = query_2(query_2_usr)
            diff = c1 - c2
            print('Query: [Aprime = %s, u=%s]: c1=%.3f, c2=%.3f' % (Aprime, u, c1, c2))
            # We want to check if diff ~ N(1, 2). Hence, if diff is too large, then
            #  we can reject. Otherwise, we need more samples.
            if diff > 10:  # Some arbitrary threshold (many standard deviations).
                continue
            # This is a candidate! Let's do many more queries (just to be sure).
            print('Good candidate! Denoising more.')
            q2s = [query_1_usr + ' and %s <> (%d+%d-%d)' % (u, Deborah[u], i, i) for i in range(50)]
            diffs = [diff] + [c1 - query_2(q) for q in q2s]
            diff = np.mean(diffs)
            print('Query: [Aprime = %s, u=%s]: c1=%.3f, diff=%.3f' % (Aprime, u, c1, diff))
            if c1 > 10 and abs( np.mean(diff) - 1 ) < 0.5:  # If the difference is approx ~ 1.
                print('Found!')
                found = True
                break

Query: [Aprime = ('education',), u=workclass]: c1=7290.282, c2=2197.457
Query: [Aprime = ('workclass',), u=education]: c1=22695.336, c2=17601.748
Query: [Aprime = ('education_num',), u=workclass]: c1=7291.990, c2=2199.164
Query: [Aprime = ('workclass',), u=education_num]: c1=22695.336, c2=17601.954
Query: [Aprime = ('marital_status',), u=workclass]: c1=4442.675, c2=1324.850
Query: [Aprime = ('workclass',), u=marital_status]: c1=22695.336, c2=19575.418
Query: [Aprime = ('occupation',), u=workclass]: c1=4100.794, c2=906.969
Query: [Aprime = ('workclass',), u=occupation]: c1=22695.336, c2=19500.128
Query: [Aprime = ('relationship',), u=workclass]: c1=3445.622, c2=968.796
Query: [Aprime = ('workclass',), u=relationship]: c1=22695.336, c2=20217.082
Query: [Aprime = ('race',), u=workclass]: c1=27814.877, c2=8412.052
Query: [Aprime = ('workclass',), u=race]: c1=22695.336, c2=3292.352
Query: [Aprime = ('sex',), u=workclass]: c1=10770.165, c2=3019.339
Query: [Aprime = ('workclass',), u=sex]: c1

Query: [Aprime = ('race',), u=hours_per_week]: c1=27814.877, c2=26805.127
Query: [Aprime = ('native_country',), u=race]: c1=29168.840, c2=3548.856
Query: [Aprime = ('race',), u=native_country]: c1=27814.877, c2=2195.807
Query: [Aprime = ('capital_gain',), u=sex]: c1=29848.352, c2=19700.591
Query: [Aprime = ('sex',), u=capital_gain]: c1=10770.165, c2=623.430
Query: [Aprime = ('capital_loss',), u=sex]: c1=31042.804, c2=20641.043
Query: [Aprime = ('sex',), u=capital_loss]: c1=10770.165, c2=366.228
Query: [Aprime = ('hours_per_week',), u=sex]: c1=1226.353, c2=582.592
Query: [Aprime = ('sex',), u=hours_per_week]: c1=10770.165, c2=10127.414
Query: [Aprime = ('native_country',), u=sex]: c1=29168.840, c2=19487.079
Query: [Aprime = ('sex',), u=native_country]: c1=10770.165, c2=1090.095
Query: [Aprime = ('capital_loss',), u=capital_gain]: c1=31042.804, c2=2714.069
Query: [Aprime = ('capital_gain',), u=capital_loss]: c1=29848.352, c2=1516.415
Query: [Aprime = ('hours_per_week',), u=capital_gain]:

Query: [Aprime = ('marital_status', 'hours_per_week'), u=workclass]: c1=100.028, c2=42.203
Query: [Aprime = ('workclass', 'hours_per_week'), u=marital_status]: c1=817.689, c2=757.771
Query: [Aprime = ('workclass', 'marital_status'), u=hours_per_week]: c1=3118.012, c2=3060.261
Query: [Aprime = ('marital_status', 'native_country'), u=workclass]: c1=4160.515, c2=1232.690
Query: [Aprime = ('workclass', 'native_country'), u=marital_status]: c1=20133.176, c2=17203.258
Query: [Aprime = ('workclass', 'marital_status'), u=native_country]: c1=3118.012, c2=190.942
Query: [Aprime = ('occupation', 'relationship'), u=workclass]: c1=248.416, c2=39.590
Query: [Aprime = ('workclass', 'relationship'), u=occupation]: c1=2476.958, c2=2266.750
Query: [Aprime = ('workclass', 'occupation'), u=relationship]: c1=3196.131, c2=2985.876
Query: [Aprime = ('occupation', 'race'), u=workclass]: c1=3694.672, c2=832.846
Query: [Aprime = ('workclass', 'race'), u=occupation]: c1=19402.213, c2=16539.006
Query: [Aprime = (

Query: [Aprime = ('education', 'relationship'), u=education_num]: c1=787.904, c2=-1.000
Query: [Aprime = ('education', 'education_num'), u=relationship]: c1=7291.272, c2=6502.018
Query: [Aprime = ('education_num', 'race'), u=education]: c1=6206.867, c2=-1.000
Query: [Aprime = ('education', 'race'), u=education_num]: c1=6205.160, c2=-1.000
Query: [Aprime = ('education', 'education_num'), u=race]: c1=7291.272, c2=1085.288
Query: [Aprime = ('education_num', 'sex'), u=education]: c1=2806.155, c2=-1.000
Query: [Aprime = ('education', 'sex'), u=education_num]: c1=2804.447, c2=-1.000
Query: [Aprime = ('education', 'education_num'), u=sex]: c1=7291.272, c2=4485.511
Query: [Aprime = ('education_num', 'capital_gain'), u=education]: c1=6818.342, c2=-1.000
Query: [Aprime = ('education', 'capital_gain'), u=education_num]: c1=6816.634, c2=-1.000
Query: [Aprime = ('education', 'education_num'), u=capital_gain]: c1=7291.272, c2=474.537
Query: [Aprime = ('education_num', 'capital_loss'), u=education]: 

Query: [Aprime = ('education', 'race'), u=capital_loss]: c1=6205.160, c2=247.223
Query: [Aprime = ('race', 'hours_per_week'), u=education]: c1=1012.230, c2=682.642
Query: [Aprime = ('education', 'hours_per_week'), u=race]: c1=400.635, c2=71.651
Query: [Aprime = ('education', 'race'), u=hours_per_week]: c1=6205.160, c2=5876.409
Query: [Aprime = ('race', 'native_country'), u=education]: c1=25618.717, c2=19747.129
Query: [Aprime = ('education', 'native_country'), u=race]: c1=6738.122, c2=867.138
Query: [Aprime = ('education', 'race'), u=native_country]: c1=6205.160, c2=335.090
Query: [Aprime = ('sex', 'capital_gain'), u=education]: c1=10146.517, c2=7462.929
Query: [Aprime = ('education', 'capital_gain'), u=sex]: c1=6816.634, c2=4132.873
Query: [Aprime = ('education', 'sex'), u=capital_gain]: c1=2804.447, c2=121.712
Query: [Aprime = ('sex', 'capital_loss'), u=education]: c1=10401.969, c2=7670.380
Query: [Aprime = ('education', 'capital_loss'), u=sex]: c1=7006.086, c2=4274.325
Query: [Aprim

Query: [Aprime = ('education_num', 'capital_gain'), u=relationship]: c1=6818.342, c2=6058.087
Query: [Aprime = ('education_num', 'relationship'), u=capital_gain]: c1=789.611, c2=30.877
Query: [Aprime = ('relationship', 'capital_loss'), u=education_num]: c1=3356.425, c2=2581.043
Query: [Aprime = ('education_num', 'capital_loss'), u=relationship]: c1=7007.793, c2=6231.539
Query: [Aprime = ('education_num', 'relationship'), u=capital_loss]: c1=789.611, c2=11.675
Query: [Aprime = ('relationship', 'hours_per_week'), u=education_num]: c1=117.975, c2=95.592
Query: [Aprime = ('education_num', 'hours_per_week'), u=relationship]: c1=402.343, c2=379.088
Query: [Aprime = ('education_num', 'relationship'), u=hours_per_week]: c1=789.611, c2=767.861
Query: [Aprime = ('relationship', 'native_country'), u=education_num]: c1=3031.462, c2=2306.079
Query: [Aprime = ('education_num', 'native_country'), u=relationship]: c1=6739.830, c2=6013.575
Query: [Aprime = ('education_num', 'relationship'), u=native_co

Query: [Aprime = ('marital_status', 'capital_loss'), u=relationship]: c1=4278.479, c2=2725.225
Query: [Aprime = ('marital_status', 'relationship'), u=capital_loss]: c1=1600.297, c2=45.360
Query: [Aprime = ('relationship', 'hours_per_week'), u=marital_status]: c1=117.975, c2=86.056
Query: [Aprime = ('marital_status', 'hours_per_week'), u=relationship]: c1=100.028, c2=68.774
Query: [Aprime = ('marital_status', 'relationship'), u=hours_per_week]: c1=1600.297, c2=1570.547
Query: [Aprime = ('relationship', 'native_country'), u=marital_status]: c1=3031.462, c2=1548.543
Query: [Aprime = ('marital_status', 'native_country'), u=relationship]: c1=4160.515, c2=2678.261
Query: [Aprime = ('marital_status', 'relationship'), u=native_country]: c1=1600.297, c2=120.227
Query: [Aprime = ('race', 'sex'), u=marital_status]: c1=8640.042, c2=6413.124
Query: [Aprime = ('marital_status', 'sex'), u=race]: c1=2670.840, c2=445.856
Query: [Aprime = ('marital_status', 'race'), u=sex]: c1=3795.553, c2=1569.792
Quer

Query: [Aprime = ('occupation', 'capital_loss'), u=sex]: c1=3904.598, c2=3691.837
Query: [Aprime = ('occupation', 'sex'), u=capital_loss]: c1=222.959, c2=8.023
Query: [Aprime = ('sex', 'hours_per_week'), u=occupation]: c1=645.518, c2=638.310
Good candidate! Denoising more.
Query: [Aprime = ('sex', 'hours_per_week'), u=occupation]: c1=645.518, diff=7.084
Query: [Aprime = ('occupation', 'hours_per_week'), u=sex]: c1=54.147, c2=47.386
Good candidate! Denoising more.
Query: [Aprime = ('occupation', 'hours_per_week'), u=sex]: c1=54.147, diff=7.120
Query: [Aprime = ('occupation', 'sex'), u=hours_per_week]: c1=222.959, c2=217.209
Good candidate! Denoising more.
Query: [Aprime = ('occupation', 'sex'), u=hours_per_week]: c1=222.959, diff=7.295
Query: [Aprime = ('sex', 'native_country'), u=occupation]: c1=9680.005, c2=9483.797
Query: [Aprime = ('occupation', 'native_country'), u=sex]: c1=3685.634, c2=3489.873
Query: [Aprime = ('occupation', 'sex'), u=native_country]: c1=222.959, c2=28.890
Query:

Query: [Aprime = ('capital_loss', 'hours_per_week'), u=race]: c1=1192.157, c2=212.173
Query: [Aprime = ('race', 'hours_per_week'), u=capital_loss]: c1=1012.230, c2=29.294
Query: [Aprime = ('race', 'capital_loss'), u=hours_per_week]: c1=26469.681, c2=25489.930
Query: [Aprime = ('capital_loss', 'native_country'), u=race]: c1=27790.644, c2=3442.660
Query: [Aprime = ('race', 'native_country'), u=capital_loss]: c1=25618.717, c2=1267.781
Query: [Aprime = ('race', 'capital_loss'), u=native_country]: c1=26469.681, c2=2122.611
Query: [Aprime = ('hours_per_week', 'native_country'), u=race]: c1=1125.193, c2=175.209
Query: [Aprime = ('race', 'native_country'), u=hours_per_week]: c1=25618.717, c2=24668.967
Query: [Aprime = ('race', 'hours_per_week'), u=native_country]: c1=1012.230, c2=63.160
Query: [Aprime = ('capital_gain', 'capital_loss'), u=sex]: c1=28330.156, c2=18551.395
Query: [Aprime = ('sex', 'capital_loss'), u=capital_gain]: c1=10401.969, c2=624.234
Query: [Aprime = ('sex', 'capital_gain')

Query: [Aprime = ('workclass', 'marital_status', 'sex'), u=education]: c1=1902.176, c2=1399.588
Query: [Aprime = ('workclass', 'education', 'sex'), u=marital_status]: c1=2017.783, c2=1513.865
Query: [Aprime = ('workclass', 'education', 'marital_status'), u=sex]: c1=753.294, c2=250.533
Query: [Aprime = ('education', 'marital_status', 'capital_gain'), u=workclass]: c1=1017.310, c2=300.484
Query: [Aprime = ('workclass', 'marital_status', 'capital_gain'), u=education]: c1=2936.363, c2=2218.775
Query: [Aprime = ('workclass', 'education', 'capital_gain'), u=marital_status]: c1=4773.970, c2=4055.052
Query: [Aprime = ('workclass', 'education', 'marital_status'), u=capital_gain]: c1=753.294, c2=36.559
Query: [Aprime = ('education', 'marital_status', 'capital_loss'), u=workclass]: c1=1040.761, c2=305.936
Query: [Aprime = ('workclass', 'marital_status', 'capital_loss'), u=education]: c1=2999.815, c2=2264.227
Query: [Aprime = ('workclass', 'education', 'capital_loss'), u=marital_status]: c1=4902.4

Query: [Aprime = ('workclass', 'education', 'race'), u=capital_gain]: c1=4354.496, c2=288.761
Query: [Aprime = ('education', 'race', 'capital_loss'), u=workclass]: c1=5954.963, c2=1771.138
Query: [Aprime = ('workclass', 'race', 'capital_loss'), u=education]: c1=18527.017, c2=14342.429
Query: [Aprime = ('workclass', 'education', 'capital_loss'), u=race]: c1=4902.422, c2=718.438
Query: [Aprime = ('workclass', 'education', 'race'), u=capital_loss]: c1=4354.496, c2=167.559
Query: [Aprime = ('education', 'race', 'hours_per_week'), u=workclass]: c1=330.512, c2=99.687
Query: [Aprime = ('workclass', 'race', 'hours_per_week'), u=education]: c1=676.566, c2=444.978
Query: [Aprime = ('workclass', 'education', 'hours_per_week'), u=race]: c1=280.971, c2=49.987
Query: [Aprime = ('workclass', 'education', 'race'), u=hours_per_week]: c1=4354.496, c2=4123.745
Query: [Aprime = ('education', 'race', 'native_country'), u=workclass]: c1=5868.999, c2=1768.174
Query: [Aprime = ('workclass', 'race', 'native_co

Query: [Aprime = ('workclass', 'education_num', 'marital_status'), u=native_country]: c1=755.001, c2=44.931
Query: [Aprime = ('education_num', 'occupation', 'relationship'), u=workclass]: c1=54.406, c2=10.580
Query: [Aprime = ('workclass', 'occupation', 'relationship'), u=education_num]: c1=210.752, c2=166.370
Query: [Aprime = ('workclass', 'education_num', 'relationship'), u=occupation]: c1=547.947, c2=502.740
Query: [Aprime = ('workclass', 'education_num', 'occupation'), u=relationship]: c1=670.120, c2=624.866
Query: [Aprime = ('education_num', 'occupation', 'race'), u=workclass]: c1=780.661, c2=187.836
Query: [Aprime = ('workclass', 'occupation', 'race'), u=education_num]: c1=2863.008, c2=2269.625
Query: [Aprime = ('workclass', 'education_num', 'race'), u=occupation]: c1=4356.203, c2=3761.995
Query: [Aprime = ('workclass', 'education_num', 'occupation'), u=race]: c1=670.120, c2=77.136
Query: [Aprime = ('education_num', 'occupation', 'sex'), u=workclass]: c1=50.949, c2=12.123
Query: 

Query: [Aprime = ('workclass', 'sex', 'capital_loss'), u=education_num]: c1=7504.305, c2=5533.922
Query: [Aprime = ('workclass', 'education_num', 'capital_loss'), u=sex]: c1=4904.129, c2=2933.369
Query: [Aprime = ('workclass', 'education_num', 'sex'), u=capital_loss]: c1=2019.491, c2=46.554
Query: [Aprime = ('education_num', 'sex', 'hours_per_week'), u=workclass]: c1=233.507, c2=66.682
Query: [Aprime = ('workclass', 'sex', 'hours_per_week'), u=education_num]: c1=443.854, c2=276.471
Query: [Aprime = ('workclass', 'education_num', 'hours_per_week'), u=sex]: c1=282.679, c2=114.918
Query: [Aprime = ('workclass', 'education_num', 'sex'), u=hours_per_week]: c1=2019.491, c2=1852.740
Query: [Aprime = ('education_num', 'sex', 'native_country'), u=workclass]: c1=2594.994, c2=735.169
Query: [Aprime = ('workclass', 'sex', 'native_country'), u=education_num]: c1=6923.341, c2=5062.958
Query: [Aprime = ('workclass', 'education_num', 'native_country'), u=sex]: c1=4677.166, c2=2816.405
Query: [Aprime =

Query: [Aprime = ('workclass', 'relationship', 'capital_loss'), u=marital_status]: c1=2412.761, c2=1331.843
Query: [Aprime = ('workclass', 'marital_status', 'capital_loss'), u=relationship]: c1=2999.815, c2=1919.561
Query: [Aprime = ('workclass', 'marital_status', 'relationship'), u=capital_loss]: c1=1115.633, c2=33.697
Query: [Aprime = ('marital_status', 'relationship', 'hours_per_week'), u=workclass]: c1=32.650, c2=12.824
Query: [Aprime = ('workclass', 'relationship', 'hours_per_week'), u=marital_status]: c1=76.311, c2=54.393
Query: [Aprime = ('workclass', 'marital_status', 'hours_per_week'), u=relationship]: c1=60.364, c2=39.110
Query: [Aprime = ('workclass', 'marital_status', 'relationship'), u=hours_per_week]: c1=1115.633, c2=1095.883
Query: [Aprime = ('marital_status', 'relationship', 'native_country'), u=workclass]: c1=1480.137, c2=446.311
Query: [Aprime = ('workclass', 'relationship', 'native_country'), u=marital_status]: c1=2148.798, c2=1112.879
Query: [Aprime = ('workclass', 

Query: [Aprime = ('workclass', 'occupation', 'relationship'), u=capital_gain]: c1=210.752, c2=17.017
Query: [Aprime = ('occupation', 'relationship', 'capital_loss'), u=workclass]: c1=244.220, c2=40.394
Query: [Aprime = ('workclass', 'relationship', 'capital_loss'), u=occupation]: c1=2412.761, c2=2207.554
Query: [Aprime = ('workclass', 'occupation', 'capital_loss'), u=relationship]: c1=3049.934, c2=2844.680
Query: [Aprime = ('workclass', 'occupation', 'relationship'), u=capital_loss]: c1=210.752, c2=3.816
Query: [Aprime = ('occupation', 'relationship', 'hours_per_week'), u=workclass]: c1=-1.000, c2=-1.000
Good candidate! Denoising more.
Query: [Aprime = ('occupation', 'relationship', 'hours_per_week'), u=workclass]: c1=-1.000, diff=0.000
Query: [Aprime = ('workclass', 'relationship', 'hours_per_week'), u=occupation]: c1=76.311, c2=74.103
Good candidate! Denoising more.
Query: [Aprime = ('workclass', 'relationship', 'hours_per_week'), u=occupation]: c1=76.311, diff=2.084
Query: [Aprime =

Query: [Aprime = ('workclass', 'relationship', 'sex'), u=race]: c1=1908.123, c2=551.139
Query: [Aprime = ('workclass', 'relationship', 'race'), u=sex]: c1=1818.835, c2=461.074
Query: [Aprime = ('relationship', 'race', 'capital_gain'), u=workclass]: c1=2345.851, c2=616.025
Query: [Aprime = ('workclass', 'race', 'capital_gain'), u=relationship]: c1=17861.565, c2=16130.311
Query: [Aprime = ('workclass', 'relationship', 'capital_gain'), u=race]: c1=2364.310, c2=634.326
Query: [Aprime = ('workclass', 'relationship', 'race'), u=capital_gain]: c1=1818.835, c2=89.100
Query: [Aprime = ('relationship', 'race', 'capital_loss'), u=workclass]: c1=2418.303, c2=654.477
Query: [Aprime = ('workclass', 'race', 'capital_loss'), u=relationship]: c1=18527.017, c2=16761.763
Query: [Aprime = ('workclass', 'relationship', 'capital_loss'), u=race]: c1=2412.761, c2=648.778
Query: [Aprime = ('workclass', 'relationship', 'race'), u=capital_loss]: c1=1818.835, c2=51.898
Query: [Aprime = ('relationship', 'race', 'h

Query: [Aprime = ('race', 'capital_loss', 'hours_per_week'), u=workclass]: c1=983.034, c2=323.208
Query: [Aprime = ('workclass', 'capital_loss', 'hours_per_week'), u=race]: c1=799.493, c2=139.509
Query: [Aprime = ('workclass', 'race', 'hours_per_week'), u=capital_loss]: c1=676.566, c2=13.630
Query: [Aprime = ('workclass', 'race', 'capital_loss'), u=hours_per_week]: c1=18527.017, c2=17867.266
Query: [Aprime = ('race', 'capital_loss', 'native_country'), u=workclass]: c1=24347.521, c2=7453.695
Query: [Aprime = ('workclass', 'capital_loss', 'native_country'), u=race]: c1=19235.980, c2=2341.996
Query: [Aprime = ('workclass', 'race', 'native_country'), u=capital_loss]: c1=17725.053, c2=828.117
Query: [Aprime = ('workclass', 'race', 'capital_loss'), u=native_country]: c1=18527.017, c2=1633.947
Query: [Aprime = ('race', 'hours_per_week', 'native_country'), u=workclass]: c1=951.070, c2=314.244
Query: [Aprime = ('workclass', 'hours_per_week', 'native_country'), u=race]: c1=754.529, c2=117.545
Qu

Query: [Aprime = ('education', 'education_num', 'native_country'), u=marital_status]: c1=6739.112, c2=5728.194
Query: [Aprime = ('education', 'education_num', 'marital_status'), u=native_country]: c1=1068.947, c2=60.878
Query: [Aprime = ('education_num', 'occupation', 'relationship'), u=education]: c1=54.406, c2=-1.000
Query: [Aprime = ('education', 'occupation', 'relationship'), u=education_num]: c1=52.698, c2=-1.000
Query: [Aprime = ('education', 'education_num', 'relationship'), u=occupation]: c1=788.894, c2=736.686
Query: [Aprime = ('education', 'education_num', 'occupation'), u=relationship]: c1=870.066, c2=817.812
Query: [Aprime = ('education_num', 'occupation', 'race'), u=education]: c1=780.661, c2=-1.000
Query: [Aprime = ('education', 'occupation', 'race'), u=education_num]: c1=778.954, c2=-1.000
Query: [Aprime = ('education', 'education_num', 'race'), u=occupation]: c1=6206.149, c2=5426.941
Query: [Aprime = ('education', 'education_num', 'occupation'), u=race]: c1=870.066, c2=

Query: [Aprime = ('education', 'education_num', 'sex'), u=capital_loss]: c1=2805.437, c2=71.500
Query: [Aprime = ('education_num', 'sex', 'hours_per_week'), u=education]: c1=233.507, c2=-1.000
Query: [Aprime = ('education', 'sex', 'hours_per_week'), u=education_num]: c1=231.800, c2=-1.000
Query: [Aprime = ('education', 'education_num', 'hours_per_week'), u=sex]: c1=401.625, c2=170.864
Query: [Aprime = ('education', 'education_num', 'sex'), u=hours_per_week]: c1=2805.437, c2=2575.686
Query: [Aprime = ('education_num', 'sex', 'native_country'), u=education]: c1=2594.994, c2=-1.000
Query: [Aprime = ('education', 'sex', 'native_country'), u=education_num]: c1=2593.287, c2=-1.000
Query: [Aprime = ('education', 'education_num', 'native_country'), u=sex]: c1=6739.112, c2=4143.351
Query: [Aprime = ('education', 'education_num', 'sex'), u=native_country]: c1=2805.437, c2=211.367
Query: [Aprime = ('education_num', 'capital_gain', 'capital_loss'), u=education]: c1=6534.145, c2=-1.000
Query: [Apri

Query: [Aprime = ('education', 'relationship', 'hours_per_week'), u=marital_status]: c1=24.257, c2=12.339
Query: [Aprime = ('education', 'marital_status', 'hours_per_week'), u=relationship]: c1=32.311, c2=21.056
Query: [Aprime = ('education', 'marital_status', 'relationship'), u=hours_per_week]: c1=420.579, c2=410.829
Good candidate! Denoising more.
Query: [Aprime = ('education', 'marital_status', 'relationship'), u=hours_per_week]: c1=420.579, diff=11.295
Query: [Aprime = ('marital_status', 'relationship', 'native_country'), u=education]: c1=1480.137, c2=1085.549
Query: [Aprime = ('education', 'relationship', 'native_country'), u=marital_status]: c1=723.744, c2=327.826
Query: [Aprime = ('education', 'marital_status', 'native_country'), u=relationship]: c1=1007.798, c2=612.543
Query: [Aprime = ('education', 'marital_status', 'relationship'), u=native_country]: c1=420.579, c2=27.510
Query: [Aprime = ('marital_status', 'race', 'sex'), u=education]: c1=2223.718, c2=1640.129
Query: [Aprime

Query: [Aprime = ('education', 'relationship', 'capital_loss'), u=occupation]: c1=775.708, c2=724.500
Query: [Aprime = ('education', 'occupation', 'capital_loss'), u=relationship]: c1=825.880, c2=774.626
Query: [Aprime = ('education', 'occupation', 'relationship'), u=capital_loss]: c1=52.698, c2=-1.000
Query: [Aprime = ('occupation', 'relationship', 'hours_per_week'), u=education]: c1=-1.000, c2=-1.000
Good candidate! Denoising more.
Query: [Aprime = ('occupation', 'relationship', 'hours_per_week'), u=education]: c1=-1.000, diff=0.000
Query: [Aprime = ('education', 'relationship', 'hours_per_week'), u=occupation]: c1=24.257, c2=23.049
Good candidate! Denoising more.
Query: [Aprime = ('education', 'relationship', 'hours_per_week'), u=occupation]: c1=24.257, diff=1.084
Found!


Step 2. We now perform a difference attack with the various ages. The idea is to count if $Age \neq x$ for $(A' = a^{Deborah})$ and $(A' = a^{Deborah}\land U \neq u^{Deborah})$. Since we know that $U$ is a discriminative condition, Deborah is the only user who can be in the first query and not the other. So, if the difference between the two counts is 1, it means Deborah was included by $Age \neq x$, and her age thus isn't $x$.

In [29]:
# We first define all the queries for Deborah and (Aprime, u) -- see exercise 2 step 4.
query1 = ' and '.join( '%s = %s' % (c, Deborah[c]) for c in Aprime)
query2s = []
for x in range(20):
    q2 = query1 + ' and %s <> (%s+%d-%d)' % (u, Deborah[u], x, x)
    query2s.append(q2)


# Perform the queries with the various ages.
for age in range(10, 100):

    age_condition = ' and age <> %d' % age

    # Difference query: the result is the true value + N(0,2).
    samples = [(query_2(query1 + age_condition) - query_2(q + age_condition)) for q in query2s]
    print('[Age = %d] difference: %.3f' % (age, np.mean(samples)))

[Age = 10] difference: 1.041
[Age = 11] difference: 1.041
[Age = 12] difference: 1.041
[Age = 13] difference: 1.041
[Age = 14] difference: 1.041
[Age = 15] difference: 1.041
[Age = 16] difference: 1.041
[Age = 17] difference: 1.041
[Age = 18] difference: 1.041
[Age = 19] difference: 1.041
[Age = 20] difference: 1.041
[Age = 21] difference: 1.041
[Age = 22] difference: 1.041
[Age = 23] difference: 1.041
[Age = 24] difference: 1.041
[Age = 25] difference: 1.041
[Age = 26] difference: 1.041
[Age = 27] difference: 1.041
[Age = 28] difference: 1.041
[Age = 29] difference: 1.041
[Age = 30] difference: 1.041
[Age = 31] difference: 0.041
[Age = 32] difference: 1.041
[Age = 33] difference: 1.041
[Age = 34] difference: 1.041
[Age = 35] difference: 1.041
[Age = 36] difference: 1.041
[Age = 37] difference: 1.041
[Age = 38] difference: 1.041
[Age = 39] difference: 1.041
[Age = 40] difference: 1.041
[Age = 41] difference: 1.041
[Age = 42] difference: 1.041
[Age = 43] difference: 1.041
[Age = 44] dif

There you have it: the answer is **31**.

Now, you may have noticed that the noises observed here are _identical_ : we get the same result (+- 1) for all queries when age changes. Why is that the case? Can you then devise a better attack that exploits this fact?