@pdobacz
Contributor

@pdobacz pdobacz commented Jan 5, 2022

Squashed version of #139; see there for the discussion. The diff should be the same as in #139, which is kept intact to preserve the discussions.

@pdobacz pdobacz requested a review from edongashi January 5, 2022 11:43
@pdobacz pdobacz force-pushed the piotr/clarify-aid-aidv-naming-squash branch from cc474f2 to 503d57b
@pdobacz pdobacz merged commit 2a6c75f into master Jan 5, 2022
@pdobacz pdobacz deleted the piotr/clarify-aid-aidv-naming-squash branch January 5, 2022 13:05
@pdobacz pdobacz mentioned this pull request Jan 5, 2022
{
int64 max_flattening;
int64 max_flattened_count;
int64 max_flattened_count_with_max_flattening;
Collaborator

Some verbosity is good, but such a long field name is very awkward to work with. It makes every line it appears in extra long.

if (flattening >= accumulator->max_flattening)
{
accumulator->max_flattening = flattening;
/* Get the largest flattened count from the ones with the maximum flattening. */
Collaborator

Putting this comment next to the field definition would have been better IMO.

Contributor Author

I was on the fence and opted for fewer comments. As this is contentious, I'll revert at the nearest opportunity and move/copy the comment 👍. A comment only here is definitely not enough.

Collaborator

Another approach here would be to group related fields into nested structs. Not sure if this works, but it would be shorter and clearer:

typedef struct CountResultAccumulator
{
  struct
  {
    int64 amount;
    int64 count;
  } max_flattened;
} CountResultAccumulator;
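A minimal sketch of how the nested layout could be used, assuming (this is not confirmed by the diff) that `amount` holds the maximum flattening and `count` the largest flattened count among inputs sharing that flattening, following the comment in the quoted hunk. The `int64` typedef and the `accumulate` helper are hypothetical stand-ins so the snippet compiles outside PostgreSQL; the explanatory comments sit next to the field definitions, as suggested earlier in this thread.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's int64 so the sketch compiles on its own. */
typedef int64_t int64;

typedef struct CountResultAccumulator
{
  struct
  {
    int64 amount; /* Assumed: the maximum flattening seen so far. */
    int64 count;  /* Assumed: the largest flattened count among inputs with that flattening. */
  } max_flattened;
} CountResultAccumulator;

/* Hypothetical helper: keeps the largest flattening and, among inputs
 * that tie on it, the largest flattened count. */
static void accumulate(CountResultAccumulator *acc, int64 flattening, int64 flattened_count)
{
  if (flattening > acc->max_flattened.amount)
  {
    /* A strictly larger flattening replaces both fields. */
    acc->max_flattened.amount = flattening;
    acc->max_flattened.count = flattened_count;
  }
  else if (flattening == acc->max_flattened.amount &&
           flattened_count > acc->max_flattened.count)
  {
    /* A tie on flattening keeps the larger flattened count. */
    acc->max_flattened.count = flattened_count;
  }
}
```

For example, starting from `{{0, 0}}` and feeding the (flattening, count) pairs (3, 10), (5, 7), (5, 9), (4, 20) leaves `amount == 5` and `count == 9`: the tie at 5 keeps the larger count, and the later, smaller flattening is ignored.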

Contributor Author

FWIW, it seems like the pg_diffix implementation is different from the reference implementation. Once we are able to confirm this (on Slack), we'll have to redo the naming accordingly. Let's postpone the naming thread until then.

Comment on lines +71 to 72
static const int COUNT_DISTINCT_AIDS_OFFSET = 2;

Collaborator

I think we can omit the COUNT_DISTINCT_ prefix since this constant is private to this module.

Collaborator

And why OFFSET? We use index everywhere else.

Contributor Author

I followed the convention from count.c https://github.com/diffix/pg_diffix/blob/master/src/aggregation/count.c#L39. So INDEX is for a single thing, OFFSET is to indicate where the series of variadic things begins.

Member

The prefix there makes sense because we have 2 aggs in the same file. Here it's just one so it can be removed safely.

Collaborator

"So INDEX is for a single thing, OFFSET is to indicate where the series of variadic things begins."

OK, makes sense.
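To illustrate the INDEX/OFFSET convention described above, here is a hypothetical argument layout in plain C. It assumes that argument 0 is the aggregate state (as in a PostgreSQL transition function), argument 1 is the single counted value, and arguments from the offset onward are the variadic AIDs. `VALUE_INDEX`, `AIDS_OFFSET`, `aids_count` and `aid_arg` are illustrative names (with the module prefix dropped, as suggested), and a `long` array stands in for PostgreSQL's `Datum` arguments.

```c
/* Hypothetical layout: args[0] = aggregate state, args[1] = counted value,
 * args[2..nargs-1] = variadic AID arguments. */
static const int VALUE_INDEX = 1; /* INDEX: position of a single argument. */
static const int AIDS_OFFSET = 2; /* OFFSET: where the variadic series begins. */

/* Number of variadic AID arguments following AIDS_OFFSET. */
static int aids_count(int nargs)
{
  return nargs > AIDS_OFFSET ? nargs - AIDS_OFFSET : 0;
}

/* The i-th AID argument, counted from 0 within the variadic series. */
static long aid_arg(const long *args, int i)
{
  return args[AIDS_OFFSET + i];
}
```

The offset is only ever added to a position within the series, whereas an index is used directly; that difference is what the two suffixes are meant to signal.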

Comment on lines 316 to +317
/* Maps values per-AID given the list of low-count tracker entries and an AID values set index. */
- static List *transpose_lc_values_per_aid(List *lc_entries, int aidvs_index, uint32 *lc_values_true_count)
+ static List *transpose_lc_values_per_aid(List *lc_entries, int aid_index, uint32 *lc_values_true_count)
Member

Missed these changes. The index is for the set (comment above), not for some particular AID value.

Member

Maybe we should call the aid parameter index something else. aid_pos/aid_no/aid_offset?

Contributor Author

Wait, I think we covered this in #139 (comment). aidvs more naturally expands to "AID valueS", so it could be confused with the "vertical" dimension.

Member

That as well. I can't tell if it's pluralized or the set.
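A hypothetical sketch of the two dimensions being debated: each low-count entry holds one AID values set per AID instance, and each set holds many AID values. The parameter `aid_index` below selects a set (which AID instance), not a position within a set, which is the distinction raised in this thread. `AidValuesSet`, `Entry`, `count_values_for_aid` and the fixed-size arrays are assumptions for illustration and do not reflect pg_diffix's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define AID_INSTANCES 2  /* Hypothetical: number of AID columns. */
#define MAX_AID_VALUES 4 /* Hypothetical: capacity of one AID values set. */

/* One AID values set: the AID values seen for a single AID instance. */
typedef struct AidValuesSet
{
  size_t length;
  uint64_t values[MAX_AID_VALUES];
} AidValuesSet;

/* One low-count entry: one AID values set per AID instance. */
typedef struct Entry
{
  AidValuesSet aidvs[AID_INSTANCES];
} Entry;

/* Totals the AID values across all entries for one AID instance.
 * `aid_index` picks the set, never an individual AID value. */
static size_t count_values_for_aid(const Entry *entries, size_t entry_count, int aid_index)
{
  size_t total = 0;
  for (size_t i = 0; i < entry_count; i++)
    total += entries[i].aidvs[aid_index].length;
  return total;
}
```

Under this layout `aid_index` ranges over AID instances only, which is why a name such as `aid_pos` or `aid_no` avoids suggesting a single AID value.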

* The number of low count values has to be anonymized.
*/
- static CountDistinctResult count_distinct_calculate_final(DistinctTracker_hash *tracker, int aidvs_count)
+ static CountDistinctResult count_distinct_calculate_final(DistinctTracker_hash *tracker, int aids_count)
Member

Same here. This is the number of AID instances, right?

Contributor Author

Answered in the other thread.


4 participants