-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MultiSimhash #70
base: master
Are you sure you want to change the base?
MultiSimhash #70
Conversation
Hi @orapic, Thanks for the PR! |
Ok, should be fixed now. |
for i in simhashes: | ||
multi_f = multi_f + i.f | ||
if multi_f % 8: | ||
raise Exception('Simhashes do not the same length (f)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, could you explain here, what do you want to check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, well looking at it twice, that code doesn't make much sense. I will change it to check them all 1 by 1 to the length of the first simshash.
if multi_f % 8: | ||
raise Exception('Simhashes do not the same length (f)') | ||
multi_value = self._concatenate_simhashes(simhashes) | ||
super(MultiSimhash, self).__init__(value=multi_value, f=multi_f, hashfunc=simhashes[0].hashfunc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before using simhases[0]
, we need to check the length of simhashes
to make sure it's not empty. Also since you are using the first element's hashfunc
, do we assume all hashfunc
should be the same for each element in simahases
list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, will add some check to look if its empty or not.
Regarding the hashfunc
, I think it's safe they must be the same. If they are not, which one do you chose for the new multihash?
Add the possibility to concatenate simhashes to make a larger one.
This way one can make a sort of "signature" where multiple simhashes are combined into one. Also made the tests for it.
Example with 4 simhashes of 32 bits: