Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trivial Hash Collision between 0. and 0.F #5379

Closed
zwimer opened this issue Jun 30, 2021 · 9 comments
Closed

Trivial Hash Collision between 0. and 0.F #5379

zwimer opened this issue Jun 30, 2021 · 9 comments

Comments

@zwimer
Copy link
Contributor

zwimer commented Jun 30, 2021

The following ASTs have the following hashes.
(_ +zero 8 24) has a hash of 1098364533
(_ +zero 11 53) has a hash of 1098364533

The same collision happens for -0. and -0.F.

Is this just two 1/(2^32) hash collisions? Or do the hashes not consider the sort? Or is this a bug?

@zwimer zwimer changed the title Trivial Hash Collision between 0. and +0.F Trivial Hash Collision between 0. and 0.F Jun 30, 2021
@zwimer
Copy link
Contributor Author

zwimer commented Jun 30, 2021

I reported this as an issue since it feels like a bug as Z3_get_ast_hash reports:

Return a hash code for the given AST. The hash code is structural. You can use Z3_get_ast_id interchangeably with this function.

Likewise, the identical function Z3_get_ast_id reports:

Return a unique identifier for t. The identifier is unique up to structural equality. Thus, two ast nodes created by the same context and having the same children and same function symbols have the same identifiers. Ast nodes created in the same context, but having different children or different functions have different identifiers. Variables and quantifiers are also assigned different identifiers according to their structure.

Docs linked here.

@nunoplopes
Copy link
Collaborator

Of course hashes are not unique, and there are collisions. Identifiers are unique (just a bumped counter).
The question is: do you observe a performance issue because of this collision in practice? If so, can you produce a repro?

Of course the hashing function can always be improved. But there's a tradeoff between speed and quality. Changing the hashing function is not something to be done lightly.
Please reopen if you have data that backs up a change.

@wintersteiger
Copy link
Contributor

I think function type parameters (to _) are not hashed. Like Nuno said, that's not a problem in practice, but the documentation promises those IDs to be unique, for which we would have to take parameters into account, or clean up the docs.

@wintersteiger wintersteiger reopened this Jul 5, 2021
@NikolajBjorner
Copy link
Contributor

(_ +zero 8 24)
(_ +zero 11 53)
Have different sorts, the same function symbol. The sorts are not used when computing hashes.

@NikolajBjorner
Copy link
Contributor

I assume you can use the id instead of the hash.

@zwimer
Copy link
Contributor Author

zwimer commented Jul 5, 2021

@NikolajBjorner What ID are you referring to? According to the docs Z3_get_ast_id are the same thing Z3_get_ast_hash:

Return a hash code for the given AST. The hash code is structural. You can use Z3_get_ast_id interchangeably with this function.

Also, would you like me to make a PR noting that sorts are not considered 'structural', at least by the hash docs' definition of structural?

NikolajBjorner added a commit that referenced this issue Jul 6, 2021
Signed-off-by: Nikolaj Bjorner <nbjorner@microsoft.com>
@NikolajBjorner
Copy link
Contributor

I assume you can use Z3_get_ast_id instead of Z3_get_ast_hash.

@NikolajBjorner
Copy link
Contributor

NikolajBjorner commented Jul 6, 2021

I changed the hash function to use the sort as well so now you get two different hash values for your example.
Hashes are still not unique. If you have 65K nodes you should be very likely to get a hash collision but there are never collisions on the value of get_ast_id.

@zwimer
Copy link
Contributor Author

zwimer commented Jul 10, 2021

Ah, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants