Skip to content

Conversation

@cristianberneanu
Copy link
Collaborator

No description provided.

Copy link
Contributor

@pdobacz pdobacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting taking a step further in refactoring here

}

extern hash_t hash_set_combine(List *hash_set);
extern hash_t hash_set_combine(const List *hash_set);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not up to your change, but when I read this name, I thought first it is another operation on hash sets, as it's among union and add. Only on second read I noticed the return type and realized this is a "destructive" digest of the hash set.

Having said that, I don't know what other name would work. hash_set_digest, ...collapse come to mind. For your consideration, but can also stay as is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash_set_reduce?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we call this a hash set? The backing implementation is an array. There is an ambiguity in wording between set of hashes and hash-table based set.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backing implementation doesn't matter for the semantics. The purpose was to abstract away the common operations on sets of hashes and worry less about the implementation details.
I thought about also defining a separate type for the hash set, i.e. typedef List HashSet, but it seemed overkill.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sets of hashes

When I see hash_set I immediately think of the HashSet<T> type. Could we name this something different because of this ambiguity?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's literally a set of hashes (not AID), so I'm not sure. Let's leave it as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's literally a set of hashes (not AID), so I'm not sure. Let's leave it as is.

I'm referring to our old discussion which keeps popping up, I think started in #139, where we decided to avoid calling it hash. I'd be in general be in favor of calling it hash, but we decided not to, so I guess we should be consistent.

Whew, going about the naming here is realllly hard, so I might be wrong overall, excuse me if that so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was a set of hashes used for seeding of some sort...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base abstraction represents a set of hashes that can be reduced to a noise layer seed.
The hashes can come from AID values, producing a seed for the AID noise layer, or they can come from bucket labels and expressions, producing a seed for the bucket noise layer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I forgot about bucket labels/exprs, right!

aid_t aid = get_aid_descriptor(aid_type).make_aid(args[aid_index].value);
List **aidv = (List **)&lfirst(cell);
*aidv = list_append_unique_ptr(*aidv, (void *)aid);
List **src_aid_values_set = (List **)&lfirst(cell);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src_ part looks like copypasta, as there's no dest here.

Other than that, this is the unclear code that we removed clarifying comments from earlier, saying we should refactor to make it clearer.

Can we extract some portion of this to a named function to achieve this? A function like include_aid_in_set containing the entire if would be my starting point, but can be done differently perhaps.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I incorrectly copied the naming from a different section.

List *aidvs; /* List of AID sets (represented as Lists), one for each AID instance */
char status; /* Required for hash table */
Datum value; /* Unique value */
List *aid_values_sets; /* List of AID sets, one for each AID instance */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re the removed comment; to reiterate :)

I think it's hard to learn the type of the values held in this list, given the nesting and overall complexity of this module, I wish I had such a comment when first confronted with this code. It's actually even more pressing now, as the name would indicate it's some kind of a set internally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand which part was removed and you think was valuable.
Would it be better reformulated as: "List of hashed AID values sets, one for each AID instance"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the represented as Lists part, or, if the option to typedef ... <something> prevails (other thread), represented as <something>, unless after the typedef change it becomes much better self documenting, what type concretely is held inside aid_value_sets.

This is to fill in for the type-less-ness of List, which in the case of a nested List of Lists, becomes an unnecessary hurdle, when trying to understand the code (unless you're already familiar with it). Because of this, I'd even say that doing the typedef ... HashSet is worthwhile.

@cristianberneanu cristianberneanu merged commit a633ba4 into master Mar 21, 2022
@cristianberneanu cristianberneanu deleted the cristian/misc branch March 21, 2022 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants