Do we include NULL contributors in AID seed? #21

edongashi · 2021-02-10T15:24:27Z

If an AID contributes with NULL, do we include that AID in the seed material?


cristianberneanu · 2021-02-10T15:48:18Z

After thinking about it a bit more, I think they should be included in the seed (but they should be ignored during aggregation).

NULL values were ignored during aggregation in the previous system, are ignored in the reference implementation and are also ignored in standard SQL aggregators.

But the seed is a property of the bucket, so any encountered AIDs have to contribute to it. If we find a way to compute the seed while we digest data, then we can drop NULL handling from the aggregators, simplifying stuff a bit more.

sebastian · 2021-02-10T15:50:00Z

Assuming it's the value contributed by the AID that's null, and not the AID itself, I think it should be:

- For count(*), yes: the row with the null value is counted, so the AID should also be used in the seed material.
- For aggregates that ignore null values, the AID does not contribute to the seed for such a value (for example, count(price) when price is null).

sebastian · 2021-02-10T15:56:27Z

> But the seed is a property of the bucket, so any encountered AIDs have to contribute to it.

Why should any encountered AID contribute to the seed, if that AID doesn't otherwise contribute to the bucket?
I agree that "the seed is a property of the bucket" rings true, but I read it as an argument for excluding the AID from the seed when it didn't contribute to the aggregate.

edongashi · 2021-02-10T16:24:44Z

> Why should any encountered AID contribute to the seed, if that AID doesn't otherwise contribute to the bucket?

diffix_lcf(aid) has no knowledge of what's happening in aggregates.

Hmm, let's consider this scenario:

There are 100 rows of shape (aid, col) in a bucket where AIDs are unique in interval [1...100].
Let's suppose 99 rows have col set to NULL and we want to calculate count(col).
Should we suppress this bucket? diffix_lcf(aid) says no because it has no idea that col is NULL...
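
For illustration, a minimal Python sketch of this scenario. It only counts AIDs; it does not reproduce the actual `diffix_lcf` logic, and the variable names are hypothetical. It shows the gap between what a filter that sees only AIDs observes and what count(col) actually aggregates.

```python
# Toy data for the scenario: 100 rows of shape (aid, col), AIDs 1..100,
# with col set to NULL (None) in 99 of them. The value "x" is arbitrary.
rows = [(aid, None) for aid in range(1, 100)] + [(100, "x")]

# What a filter that only looks at the AID column sees.
distinct_aids = {aid for aid, _ in rows}

# What count(col) aggregates: NULL values are ignored.
count_col = sum(1 for _, col in rows if col is not None)
contributing_aids = {aid for aid, col in rows if col is not None}

print(len(distinct_aids), count_col, len(contributing_aids))
```

The bucket looks well populated by AID count, while the aggregate rests on a single contributor.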

cristianberneanu · 2021-02-10T16:28:02Z

As much as I would like to use this opportunity to get rid of some code, I still think it should affect the seed.
Even if it has no effect on the aggregate, it still helps the bucket pass LCF.

sebastian · 2021-02-11T09:31:16Z

> There are 100 rows of shape (aid, col) in a bucket where AIDs are unique in interval [1...100].
> Let's suppose 99 rows have col set to NULL and we want to calculate count(col).
> Should we suppress this bucket? diffix_lcf(aid) says no because it has no idea that col is NULL...

It passes the low count filter (and all AIDs contribute to the seed for the low count filter), but we don't produce an aggregate, because we have insufficient data for the aggregate.
We can still output some value such as null, <insufficient data for an aggregate count> (assuming the query was SELECT col, count(...)).
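
A rough Python sketch of this two-stage idea, under stated assumptions: the thresholds `LCF_THRESHOLD` and `MIN_CONTRIBUTORS` and the function `evaluate_bucket` are hypothetical, and noise is left out entirely. The low count filter looks at all AIDs in the bucket, while the aggregate is only reported when enough AIDs contribute a non-NULL value.

```python
from typing import List, Optional, Tuple

LCF_THRESHOLD = 5      # assumption: fixed threshold, no noise
MIN_CONTRIBUTORS = 5   # assumption: minimum AIDs needed to report count(col)

Row = Tuple[int, Optional[str]]

def evaluate_bucket(rows: List[Row]) -> Tuple[bool, Optional[int]]:
    """Return (bucket_reported, count_col); count_col is None if data is insufficient."""
    all_aids = {aid for aid, _ in rows}
    if len(all_aids) < LCF_THRESHOLD:
        return (False, None)  # bucket is suppressed by the low count filter
    contributing = {aid for aid, col in rows if col is not None}
    if len(contributing) < MIN_CONTRIBUTORS:
        return (True, None)   # bucket is reported, aggregate is withheld
    return (True, sum(1 for _, col in rows if col is not None))

# The scenario above: 100 AIDs, 99 of them with a NULL col.
scenario = [(aid, None) for aid in range(1, 100)] + [(100, "x")]
print(evaluate_bucket(scenario))
```

In this sketch the bucket from the scenario is reported, but its count(col) comes back as `None`, standing in for the "insufficient data" placeholder.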

cristianberneanu · 2021-02-11T10:35:45Z

The main question is whether we need to have the same seed for all aggregators (including LCF).
Previously, the seed was computed separately and per bucket. We don't have this option now, so maybe we don't have to keep the same design, unless it causes a vulnerability.

yoid2000 · 2021-02-11T12:18:46Z

There are a number of different things being discussed here, so I'm a bit confused. The set of questions seems to be:

1. Should a NULL AID value contribute to the seed?
2. Should a NULL AID value be counted as a distinct user when counting the number of distinct users?
3. Do we need to have the same seed for all aggregators (including LCF) or not?

Regarding 1, what would cause an AID to be NULL?

Regarding 2, can we avoid this question by always knowing what the actual AID is?

Regarding 3, this question doesn't arise for Publish AFAIK, and I think it is premature to ask it for the other variants.

In fact, none of this really matters for Publish...

edongashi · 2021-02-11T12:27:49Z

> Should a NULL AID value contribute to the seed?

The question is not for NULL AID but for NULL contribution coming from a (non-null) AID.

In the example below, which AIDs will be used for the AID noise seed for count(col): 1,2,4 or 1,2,3,4?

aid	col
1	'a'
2	'b'
3	NULL
4	'c'
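
To make the two options concrete, here is a small Python sketch over the table above. The helpers `seed_aids` and `toy_seed` are hypothetical, and hashing the sorted AID list with SHA-256 is only a stand-in, not how the extension or the reference implementation derives its seed.

```python
import hashlib
from typing import List, Optional, Tuple

# The example table: (aid, col), with None standing in for NULL.
rows: List[Tuple[int, Optional[str]]] = [(1, "a"), (2, "b"), (3, None), (4, "c")]

def seed_aids(rows, include_null_contributions: bool) -> List[int]:
    """AIDs that would feed the seed for count(col) under each policy."""
    return sorted(
        {aid for aid, col in rows if include_null_contributions or col is not None}
    )

def toy_seed(aids: List[int]) -> str:
    """Stand-in seed: a hash of the sorted AID list (assumption, for illustration)."""
    material = ",".join(str(a) for a in aids).encode()
    return hashlib.sha256(material).hexdigest()[:16]

included = seed_aids(rows, include_null_contributions=True)   # AIDs 1, 2, 3, 4
excluded = seed_aids(rows, include_null_contributions=False)  # AIDs 1, 2, 4
print(included, excluded, toy_seed(included) != toy_seed(excluded))
```

The two policies feed different AID sets into the seed, so the resulting noise would differ even though count(col) itself is the same in both cases.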

yoid2000 · 2021-02-11T12:39:11Z

Ah, I thought you meant the AID itself was NULL, not the aggregate.

My intuition is that the AIDs with NULL contribution to the aggregator should be included in the seed as well as the LCF computation.

And I can't think of an attack that would exploit this.

And I presume it is simpler to just include the AID in all cases (no special cases to deal with NULL).

So let's go with including all AIDs regardless of contribution to the aggregator.

cristianberneanu · 2021-02-11T16:32:30Z

> And I presume it is simpler to just include the AID in all cases (no special cases to deal with NULL).

Actually, it is easier to exclude NULL contributions (simpler to ignore stuff sooner, rather than later).
But the code in the extension is already written to include them, so maybe it is simpler to leave it as is? Then we need to update the reference implementation to do the same.

cristianberneanu · 2021-03-05T11:25:16Z

Closing this as NULL contributions are already included in the seed.

edongashi added the question label Feb 10, 2021

sebastian mentioned this issue Feb 15, 2021

How to handle null-AIDs #25

Closed

cristianberneanu closed this as completed Mar 5, 2021
