Refactor and clarify the count and count_distinct code a bit #142

pdobacz · 2022-01-05T11:43:18Z

Squashed version of #139; see that PR for the discussion. The diff should be identical to #139, which is kept intact to preserve its discussion threads.

cristianberneanu · 2022-01-05T13:20:52Z

pg_diffix/aggregation/count.h

 {
  int64 max_flattening;
-  int64 max_flattened_count;
+  int64 max_flattened_count_with_max_flattening;


Some verbosity is good, but such a long field name is very weird to work with. Makes all the lines it appears in extra long.

cristianberneanu · 2022-01-05T13:21:41Z

src/aggregation/count.c

  if (flattening >= accumulator->max_flattening)
  {
    accumulator->max_flattening = flattening;
-    /* Get the largest flattened count from the ones with the maximum flattening. */


Putting this comment next to the field definition would have been better IMO.

I was on the fence and opted for fewer comments. As this is contentious, I'll revert at the nearest opportunity and move/copy the comment 👍. A comment only here is definitely not enough.

Another approach here would be to group related fields into nested structs. Not sure if this works, but it would be shorter and clearer:

typedef struct CountResultAccumulator { struct { int64 amount; int64 count; } max_flattened; } CountResultAccumulator;

FWIW, it seems the pg_diffix implementation differs from the reference. Once we can confirm this (Slack), we'll have to redo the naming accordingly. Let's postpone the naming thread until then.
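To make the nested-struct suggestion concrete, here is a minimal sketch of how the grouped fields might be declared and updated. The `int64` typedef is a stand-in for PostgreSQL's type, `accumulate_flattening` is a hypothetical helper name, and the update rule (keep the largest flattened count among contributions with the maximum flattening) is an assumption taken from the removed comment, not the confirmed pg_diffix or reference behavior.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's int64 so the sketch compiles on its own. */
typedef int64_t int64;

typedef struct CountResultAccumulator
{
  struct
  {
    int64 amount; /* Largest flattening seen so far. */
    int64 count;  /* Largest flattened count among entries with that flattening. */
  } max_flattened;
} CountResultAccumulator;

/*
 * Hypothetical update step. The tie-breaking rule is an assumption based on
 * the removed comment; the real code may differ.
 */
static void accumulate_flattening(CountResultAccumulator *acc, int64 flattening, int64 flattened_count)
{
  if (flattening > acc->max_flattened.amount)
  {
    /* A strictly larger flattening replaces both fields. */
    acc->max_flattened.amount = flattening;
    acc->max_flattened.count = flattened_count;
  }
  else if (flattening == acc->max_flattened.amount && flattened_count > acc->max_flattened.count)
  {
    /* Same flattening: keep the larger flattened count. */
    acc->max_flattened.count = flattened_count;
  }
}
```

With this layout the call sites read `acc->max_flattened.amount` and `acc->max_flattened.count`, which keeps the pairing explicit without a long field name.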

cristianberneanu · 2022-01-05T13:22:53Z

src/aggregation/count_distinct.c

+static const int COUNT_DISTINCT_AIDS_OFFSET = 2;



I think we can omit the COUNT_DISTINCT_ prefix since this constant is private to this module.

And why OFFSET? We use index everywhere else.

I followed the convention from count.c https://github.com/diffix/pg_diffix/blob/master/src/aggregation/count.c#L39. So INDEX is for a single thing, OFFSET is to indicate where the series of variadic things begins.

The prefix there makes sense because we have 2 aggs in the same file. Here it's just one so it can be removed safely.

So INDEX is for a single thing, OFFSET is to indicate where the series of variadic things begins.

OK, makes sense.
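The INDEX/OFFSET convention agreed on in this thread can be illustrated with a small sketch. The argument layout, the constant values, and the helper names below are all hypothetical, chosen only to show the distinction; they are not taken from count.c or count_distinct.c.

```c
/* Hypothetical argument layout for an aggregate call: [0] state, [1] value, [2..] AIDs. */
static const int VALUE_INDEX = 1; /* INDEX: position of a single, fixed argument. */
static const int AIDS_OFFSET = 2; /* OFFSET: position where the variadic AID arguments begin. */

/* Position of the single value argument. */
static int value_arg_position(void)
{
  return VALUE_INDEX;
}

/* Number of variadic AID arguments, given the total argument count. */
static int aids_count_from_nargs(int nargs)
{
  return nargs > AIDS_OFFSET ? nargs - AIDS_OFFSET : 0;
}

/* Position of the AID argument with the given zero-based index. */
static int aid_arg_position(int aid_index)
{
  return AIDS_OFFSET + aid_index;
}
```

An INDEX constant is used directly, while an OFFSET constant is always combined with a running index or a total count.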

edongashi · 2022-01-05T13:30:37Z

src/aggregation/count_distinct.c

 /* Maps values per-AID given the list of low-count tracker entries and an AID values set index. */
-static List *transpose_lc_values_per_aid(List *lc_entries, int aidvs_index, uint32 *lc_values_true_count)
+static List *transpose_lc_values_per_aid(List *lc_entries, int aid_index, uint32 *lc_values_true_count)


Missed these changes. The index is for the set (comment above), not for some particular AID value.

Maybe we should call the aid parameter index something else. aid_pos/aid_no/aid_offset?

Wait, I think we covered this in #139 (comment). aidvs is more naturally expanded to AID valueS, so it could be confused with the "vertical" dimension.

That as well. I can't tell if it's pluralized or the set.

edongashi · 2022-01-05T13:32:29Z

src/aggregation/count_distinct.c

 * The number of low count values has to be anonymized.
 */
-static CountDistinctResult count_distinct_calculate_final(DistinctTracker_hash *tracker, int aidvs_count)
+static CountDistinctResult count_distinct_calculate_final(DistinctTracker_hash *tracker, int aids_count)


Same here. This is the number of AID instances, right?

Answered in the other thread.

pdobacz requested a review from edongashi January 5, 2022 11:43

Refactor and clarify the count and count_distinct code a bit

503d57b

pdobacz force-pushed the piotr/clarify-aid-aidv-naming-squash branch from cc474f2 to 503d57b January 5, 2022 13:01

edongashi approved these changes Jan 5, 2022


pdobacz merged commit 2a6c75f into master Jan 5, 2022

pdobacz deleted the piotr/clarify-aid-aidv-naming-squash branch January 5, 2022 13:05

pdobacz mentioned this pull request Jan 5, 2022

Rename and refactor #139

Closed

cristianberneanu reviewed Jan 5, 2022


edongashi reviewed Jan 5, 2022

