
MultiSimhash #70

Open
wants to merge 4 commits into
base: master
Conversation

@orapic commented Feb 10, 2022

Adds the ability to concatenate several simhashes into a single, larger one.

This makes it possible to build a kind of "signature" in which multiple simhashes are combined into one. Tests for the new class are included.

Example with 4 simhashes of 32 bits:

input_to_concat = [simhash1, simhash2, simhash3, simhash4]
new_simhash = MultiSimhash(input_to_concat)
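To illustrate what concatenating fingerprints can mean at the bit level, here is a minimal sketch that works on plain integers. It is not the PR's `_concatenate_simhashes`; the helper name `concatenate_fingerprints` and the choice to place the first fingerprint in the highest bits are assumptions made for illustration.

```python
def concatenate_fingerprints(values, f):
    """Pack equal-width fingerprints into one integer.

    Hypothetical helper: the first value ends up in the highest bits.
    Returns the combined value and its total width in bits.
    """
    combined = 0
    for value in values:
        if not 0 <= value < (1 << f):
            raise ValueError("fingerprint does not fit in %d bits" % f)
        # Make room for the next f bits, then place the value there.
        combined = (combined << f) | value
    return combined, f * len(values)


# Four 32-bit fingerprints become one 128-bit value.
parts = [0xDEADBEEF, 0x00000001, 0xCAFEBABE, 0x12345678]
value, width = concatenate_fingerprints(parts, 32)
print(hex(value), width)
```

Because every part occupies a fixed number of bits, leading zeros inside a part are preserved, which is why all inputs need the same `f`.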

@1e0ng (Owner) commented Feb 12, 2022

Hi @orapic, Thanks for the PR!
I can see one build failed. Could you check the cause and fix it?

@orapic (Author) commented Mar 6, 2022

Ok, should be fixed now.

for i in simhashes:
    multi_f = multi_f + i.f
if multi_f % 8:
    raise Exception('Simhashes do not the same length (f)')
@1e0ng (Owner):

Hi, could you explain what you want to check here?

@orapic (Author):

Hi. Looking at it again, that code doesn't make much sense. I will change it to compare the length (f) of each simhash, one by one, against the length of the first simhash.
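The check described in this comment could look roughly like the sketch below. It is an illustration, not the code that ended up in the PR: `FakeSimhash` is a hypothetical stand-in exposing only the `f` attribute seen in the diff, `check_same_length` is an invented name, and the list is assumed to be non-empty.

```python
from collections import namedtuple

# Hypothetical stand-in for a simhash object; only `f` matters here.
FakeSimhash = namedtuple("FakeSimhash", ["value", "f"])


def check_same_length(simhashes):
    """Return the total bit width, or raise if any `f` differs from the first.

    Assumes `simhashes` is non-empty.
    """
    expected_f = simhashes[0].f
    for index, simhash in enumerate(simhashes):
        if simhash.f != expected_f:
            raise ValueError(
                "simhash at index %d has f=%d, expected f=%d"
                % (index, simhash.f, expected_f)
            )
    return expected_f * len(simhashes)


total_bits = check_same_length([FakeSimhash(1, 32), FakeSimhash(2, 32)])
print(total_bits)
```

Unlike `multi_f % 8`, which only tests whether the summed width is a multiple of 8, this comparison fails as soon as one element has a different `f`.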

if multi_f % 8:
    raise Exception('Simhashes do not the same length (f)')
multi_value = self._concatenate_simhashes(simhashes)
super(MultiSimhash, self).__init__(value=multi_value, f=multi_f, hashfunc=simhashes[0].hashfunc)
@1e0ng (Owner):

Before using simhashes[0], we need to check the length of simhashes to make sure it's not empty. Also, since you are using the first element's hashfunc, do we assume that hashfunc should be the same for every element in the simhashes list?

@orapic (Author):

True, I will add a check for an empty list.
Regarding the hashfunc, I think it's safe to require that they all be the same. If they are not, which one would you choose for the new multihash?
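A minimal sketch of the two checks discussed in this thread (non-empty input and a single shared hashfunc) might look as follows. The function name `validate_inputs` is hypothetical, and `SimpleNamespace` objects stand in for real simhash instances, exposing only the `hashfunc` attribute used in the diff.

```python
import hashlib
from types import SimpleNamespace


def validate_inputs(simhashes):
    """Return the shared hashfunc, or raise if the list is empty or mixed."""
    if not simhashes:
        raise ValueError("at least one simhash is required")
    first = simhashes[0]
    for index, simhash in enumerate(simhashes[1:], start=1):
        # Plain functions compare by identity, so this detects a different callable.
        if simhash.hashfunc != first.hashfunc:
            raise ValueError("simhash at index %d uses a different hashfunc" % index)
    return first.hashfunc


# Hypothetical stand-ins that only carry a `hashfunc` attribute.
a = SimpleNamespace(hashfunc=hashlib.md5)
b = SimpleNamespace(hashfunc=hashlib.md5)
shared = validate_inputs([a, b])
print(shared is hashlib.md5)
```

Rejecting mixed hash functions sidesteps the question of which one the combined object should carry.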
