Look into the "create NULL values with JOIN using count(*)" difference attack #164

yoid2000 · 2021-05-21T11:25:24Z

Cristian raised the following difference attack:

I was under the impression this attack was already described in another discussion. But to be more explicit:

We have 2 tables: purchases, with columns product_id and client_id, and clients, with columns id and ssn.
Column ssn is labeled as the AID. Assume the following data is present:

Clients:

id	ssn
1	a
2	b
3	c
4	d
5	e
6	f

Purchases:

product_id	client_id
1	1
2	2
3	3
4	4
5	5
1	6
2	6
3	6
4	6
5	6

If we issue a query like:

select count(*) from purchases join clients on client_id = clients.id

we get the following input to the count aggregator:

count	ssn
1	a
1	b
1	c
1	d
1	e
5	f

If Ne=1 and Nt = 2, this results in a flattening of 4. The output will be 10 - 4 + noise(1) = 6 + noise(1).

For the query:

select count(*) from purchases left join clients on client_id = clients.id and ssn <> 'f'

we get the following input to the count aggregator:

count	ssn
1	a
1	b
1	c
1	d
1	e
5	NULL

The last row is put aside as unaccounted.
If Ne=1 and Nt = 2, this results in a flattening of 0 for accounted rows. The output will be 5 - 0 + 5 - 0 + noise(1) = 10 + noise(1).

The difference in the results is large enough to dominate the noise, allowing us to know the entity with ssn = 'f' is an outlier and to detect other attributes of it by conditional inclusion/exclusion in queries.

The text was updated successfully, but these errors were encountered:

sebastian · 2021-05-26T10:41:47Z

The last row is put aside as unaccounted.

The last row is not put aside as unaccounted for. The unaccounted for business only applies if there is another AID for us to attach the information to, which in this case there is not.

The output will be 5 - 0 + 5 - 0 + noise(1) = 10 + noise(1).

The result, in this example, would be: 5 + noise(1).

Presumably there would additionally be a noise layer associated with the negative condition excluding ssn f, or LED compensating for the low effect exclusion of AID values.

If you can exclude a victim, then this is clearly not safe...

I think a premise for this attack to work would be if we have a query that meets the following criteria:

with multiple AIDs (this results in the unaccounted for functionality kicking in) where at least one of the AIDs has an AID value for the offending row
a condition or query construct that excludes the victim that isn't detected or compensated for by LED.

Let's alter the above example such that both tables have an AID to make this attack work. TL;DR: The difference is now between 6 + noise(2) in the first query, and 10 + noise(2) (and presumably extra noise layers?) in the second query.

The tables would be as in the original example, only with product_id being an AID as well.

Clients:

id	ssn (AID)
1	a
2	b
3	c
4	d
5	e
6	f

Purchases:

product_id (AID)	client_id
1	1
2	2
3	3
4	4
5	5
1	6
2	6
3	6
4	6
5	6

Now the query select count(*) from purchases join clients on client_id = clients.id would yield:

Anonymized based on ssn:

ssn	count	Flattening by AID value
f	5	4
a	1
b	1
c	1
d	1
e	1

i.e. a count of 10 - 4 + noise(1) = 6 + noise(1)

Anonymized based on product_id:

product_id	count	Flattening by AID value
1	2	0
2	2
3	2
4	2
5	2

i.e. a count of 10 - 0 + noise(2) = 10 + noise(2)

The final anonymized result would be: 6 + noise(2)

For query select count(*) from purchases left join clients on client_id = clients.id and ssn <> 'f' would produce:

Anonymized based on ssn:

ssn	count	Flattening by AID value
a	1
b	1
c	1
d	1
e	1
unaccounted for	5

i.e. a count of 5 + 5 + noise(1) = 10 + noise(1)

Anonymized based on product_id:

product_id	count	Flattening by AID value
1	2
2	2
3	2
4	2
5	2

i.e. a count of 10 + noise(2)

The final anonymized result would therefore be 10 + noise(2)

cristianberneanu · 2021-05-26T10:56:15Z

The example with a single AID was for illustrative purposes.
In many joins, there would be 2 or more AIDs, with the ones on the left side being non-null.
Plus, if we have an algorithm for handling null AID values, why wouldn't we use it consistently in all cases?

a condition or query construct that excludes the victim that isn't detected or compensated for by LED.

I don't think LED can compensate for not flattening outliers.

yoid2000 · 2021-05-27T07:12:09Z

I don't think LED can compensate for not flattening outliers.

The design I've settled on will adjust for one AID, and this adjustment takes place before noise and flattening are computed. In fact it happens within the selectable containing the isolating condition.

So in the case of this attack, LED would insert the rows with ssn = f back into the computation, and there would be difference in the underlying data for both queries.

Can I suggest that we revisit this attack after LED is implemented in the reference implementation?

yoid2000 self-assigned this May 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Look into the "create NULL values with JOIN using count(*)" difference attack #164

Look into the "create NULL values with JOIN using count(*)" difference attack #164

yoid2000 commented May 21, 2021

sebastian commented May 26, 2021 •

edited

Loading

cristianberneanu commented May 26, 2021

yoid2000 commented May 27, 2021

Look into the "create NULL values with JOIN using count(*)" difference attack #164

Look into the "create NULL values with JOIN using count(*)" difference attack #164

Comments

yoid2000 commented May 21, 2021

sebastian commented May 26, 2021 • edited Loading

cristianberneanu commented May 26, 2021

yoid2000 commented May 27, 2021

sebastian commented May 26, 2021 •

edited

Loading