-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement approximate file matching #5 #6
Conversation
Signed-off-by: Jono Yang <jyang@nexb.com>
* Compute resource fingerprints in fingerprint_codebase Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
9293abc
to
287f8ec
Compare
* Use open instead of codecs.open Signed-off-by: Jono Yang <jyang@nexb.com>
file_fingerprint = BitAverageHaloHash(ngs_bytes) if ngs_bytes else None | ||
|
||
return dict( | ||
halo1=file_fingerprint.hexdigest().decode('utf-8') if file_fingerprint else '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne What should the file fingerprint be called? Currently it is halo1
but I don't think that's a great label. Also, if a fingerprint could not be generated for a file, should we still populate that resource's extra_data field with {'halo1': ''}
or just avoid returning anything at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should the file fingerprint be called? Currently it is
halo1
but I don't think that's a great label.
This is OK for now IMHO.
Also, if a fingerprint could not be generated for a file, should we still populate that resource's extra_data field with
{'halo1': ''}
or just avoid returning anything at all?
IMHO, always provide a value, null or empty (the same we do for other hashes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pombredanne I have a question regarding the file fingerprint name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking fine.... but I wonder if the tests are correct or enough? Especially using power of two repeated words may be a special corner case?
@pombredanne What additional tests do you suggest? |
@JonoYang what about a test that would be using two real code files where the second one has been modified a little? |
Signed-off-by: Jono Yang <jyang@nexb.com>
No description provided.