Rename and refactor #139
  {
    int64 max_flattening;
    int64 max_flattened_count;
    int64 max_flattened_count_with_max_flattening;
What does this do?
this is somewhat along the lines of noise_with_max_sigma. This field doesn't hold the max_flattened_count, but a max from the subset having max_flattening. I thought I could trade removing a comment for a longer name.
  {
    Datum value; /* Unique value */
-   List *aidvs; /* AID value sets for the unique value */
+   List *aidvs; /* List of (hashes of) AID value lists, one for each AID instance */
I think the items are hash sets, not lists.
I may be wrong. I have no clue how count distinct works here.
yup, they're lists, it confused me on my reading, so I prefer an explicit version
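The structure under discussion can be sketched in plain C as follows. This is an illustration under stated assumptions, not the extension's code: fixed-size arrays stand in for the PostgreSQL `List`, and the names `AidHashList`, `ValueTracker`, `aid_list_add`, `tracker_add_row`, `MAX_AID_INSTANCES` and `MAX_LIST_SIZE` are all hypothetical. It keeps one bounded list of AID hashes per AID instance and appends a hash only if the list is not full and does not already contain it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_AID_INSTANCES 4 /* hypothetical limit for this sketch */
#define MAX_LIST_SIZE 3     /* hypothetical stand-in for max_size */

/* One bounded list of AID hashes; stands in for a PostgreSQL List. */
typedef struct AidHashList
{
  size_t length;
  uint64_t hashes[MAX_LIST_SIZE];
} AidHashList;

/* Per unique value: one list per AID instance, indexed by aid_index. */
typedef struct ValueTracker
{
  size_t aids_count;
  AidHashList aidvs[MAX_AID_INSTANCES];
} ValueTracker;

/* Appends aid_hash unless the list is full or already contains it.
 * Returns true if the hash was appended. */
bool aid_list_add(AidHashList *list, uint64_t aid_hash)
{
  assert(list != NULL);
  if (list->length == MAX_LIST_SIZE)
    return false; /* list is full, value is not low-count */
  for (size_t i = 0; i < list->length; i++)
    if (list->hashes[i] == aid_hash)
      return false; /* already tracked */
  list->hashes[list->length++] = aid_hash;
  return true;
}

/* Records one row: aid_hashes[aid_index] belongs to AID instance aid_index. */
void tracker_add_row(ValueTracker *tracker, const uint64_t *aid_hashes)
{
  assert(tracker != NULL && tracker->aids_count <= MAX_AID_INSTANCES);
  for (size_t aid_index = 0; aid_index < tracker->aids_count; aid_index++)
    aid_list_add(&tracker->aidvs[aid_index], aid_hashes[aid_index]);
}
```

A linear scan is enough here because each list is capped at a small size, which is presumably also why a list rather than a hash set is workable.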
src/aggregation/count_distinct.c
Outdated
    bool insufficient_data = false;
    CountResultAccumulator result_accumulator = {0};

    for (int aidvs_index = 0; aidvs_index < aids_count; aidvs_index++)
FYI: aidvs means AID value set. aid_index also works to know that we're referring to some particular AID type instead of an aid value in a list.
ah ok, this clarifies this a bit. I thought this is AID values. I think aid_index dispels the ambiguity
src/aggregation/count_distinct.c
Outdated
    if (list_length(aidv) == max_size) // set is full, value is not low-count
      return aidv;
-   return list_append_unique_ptr(aidv, (void *)aid);
+   return list_append_unique_ptr(aidv, (void *)aid_hash);
Something is wrong here (not your changes). list_append_unique_ptr is being called for a value type. This assumes we're running on a 64 bit system, otherwise we lose half of the AID when storing to the list.
I'd need some more ramp up to figure out a fix for this - could you open an issue pls?
Yes, will do after investigating how count distinct works.
I think it is fine to assume 64-bit system. We should have a compile time assert for that, if one doesn't exist already.
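A compile-time check along those lines could look like the sketch below. It uses the standard C11 `static_assert` from `<assert.h>` so that it stands alone; inside the extension, PostgreSQL's own `StaticAssertStmt` macro would probably be the more idiomatic choice. The helper names `aid_hash_to_cell` and `aid_hash_from_cell` are hypothetical and only illustrate the round trip through a pointer-sized list cell.

```c
#include <assert.h>
#include <stdint.h>

/* Fail the build on platforms where a pointer cannot hold a 64-bit hash. */
static_assert(sizeof(void *) >= sizeof(uint64_t),
              "AID hashes are stored in pointer-sized list cells; "
              "a 64-bit platform is required");

/* Hypothetical helper: pack a hash into a pointer-sized cell. */
void *aid_hash_to_cell(uint64_t aid_hash)
{
  return (void *)(uintptr_t)aid_hash;
}

/* Hypothetical helper: recover the hash from a pointer-sized cell. */
uint64_t aid_hash_from_cell(const void *cell)
{
  return (uint64_t)(uintptr_t)cell;
}
```

With the assertion in place, a 32-bit build stops at compile time instead of silently dropping the upper half of each hash.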
This reverts commit 704e2ed.
# Conflicts:
#   src/aggregation/count.c
#   src/aggregation/count_distinct.c
pg_diffix--0.0.1.sql
Outdated
    /* ----------------------------------------------------------------
-    * anon_count(any, aids...)
+    * anon_count_any(value, aids...)
I'd keep this any to mirror postgres.
But ok since it matches with the oid comments below.
I did a back and forth here. I think value is consistent.
Yes but value is not a valid type. Neither is aids but we can't help it because we need the any polymorphism.
pg_diffix--0.0.1.sql
Outdated
     */

-   CREATE FUNCTION diffix.anon_count_any_transfn(internal, "any", variadic "any")
+   CREATE FUNCTION diffix.anon_count_any_transfn(internal, "any", variadic aids "any")
Can you put the value name here instead: anon_count_any_transfn(internal, value "any", variadic aids "any")? Does this work?
I did so, and it worked, then I took it back to not overthink. If you like it, I'll reintroduce 👍
     */

-   CREATE FUNCTION diffix.anon_count_distinct_transfn(internal, "any", variadic aids "any")
+   CREATE FUNCTION diffix.anon_count_distinct_transfn(internal, value "any", variadic aids "any")
Since we have value here now I'd revert to any in the comments (both sql and C). I think it's better because we have no typing in C-land and that's the closest we can have to a signature. This will become noticeable when we get more UDFs and possibly overloads. Imagine we have a max function of datetime and integer, would you rather write them as
max(value)
max(value)
or
max(integer)
max(datetime)
?
but if you put any, aids... there it doesn't make much sense, neither does any, any.... Documenting "what" the argument represents is more useful than the any type. Since those are caption comments, maybe we can drop the arguments - everything is properly documented in the CREATE ... statements anyway.
aid has special meaning for us because we produce the aid adapters in code and we know right away what to expect. any, aids... is perfectly fine.
ok, let it be, you mean like this, correct?
The latest commit looks nice. We have experimented with making it a proper postgres type but without success and ended up with our current design with the AidSpec stuff.
Superseded by #142.
This is a loose collection of things I've picked up along the way while exploring and figuring out how things work in here.
It's all split by commits, so we can pick only those which make sense. I tried to use my common-sense understanding of things and (a little bit) the reference code.
Review commit by commit; they should be rather atomic steps. Also, if you don't like a commit codewise, let me know if I was on the correct track.
If anything worthy of master comes out of this exercise, I'll squash.