Different similarity outputs between libraries #1

Open
WJDigby opened this issue May 26, 2019 · 2 comments

WJDigby commented May 26, 2019

Hello,

Thank you for providing this code.

This library outputs different "similarity" ratings when comparing two hashes than other ssdeep libraries / examples:

Python 3 ssdeep library, using the same EICAR strings as the readme:

>>> import ssdeep
>>> e1 = ssdeep.hash("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")
>>> e2 = ssdeep.hash("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-THREATPINCH-ANTIVIRUS-TEST-FILE!$H+H*")
>>> e1
'3:a+JraNvsgzsVqSwHq9:tJuOgzsko'
>>> e2
'3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo'
>>> ssdeep.compare(e1, e2)
18

JavaScript ssdeep.js library:

>> e1 = ssdeep.digest("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*")
"3:a+JraNvsgzsVqSwHq9:tJuOgzsko"
>> e2 = ssdeep.digest("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-THREATPINCH-ANTIVIRUS-TEST-FILE!$H+H*")
"3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo"
>> ssdeep.similarity(e1, e2)
70

Both libraries produce identical hashes.

The ssdeep online demo also produces a value of 18 when comparing the two Eicar strings:

[image: screenshot of the ssdeep online demo result]

Is this intended behavior? Is there a "weight" or some metric that can adjust the grading scale of the comparison?
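
One possible explanation for the value 18, offered as a sketch rather than a confirmed diagnosis of ssdeep.js: the reference C implementation caps the score when the block size is small, so that short hashes cannot produce an exaggerated match. The Python below re-creates only that capping step. The constants and the formula follow my reading of the reference `fuzzy.c`; the helper names `parse_signature` and `cap_score` are made up for illustration, and 70 is used only as a stand-in for an uncapped score because that is what ssdeep.js returned above.

```python
# Constants as used in the ssdeep reference implementation (assumed from fuzzy.c).
MIN_BLOCKSIZE = 3
ROLLING_WINDOW = 7


def parse_signature(sig):
    """Split 'blocksize:chunk1:chunk2' into (int, str, str). Hypothetical helper."""
    block_size, chunk1, chunk2 = sig.split(":", 2)
    return int(block_size), chunk1, chunk2


def cap_score(raw_score, block_size, len1, len2):
    """Apply the small-block-size cap to a 0-100 similarity score."""
    # Scores for sufficiently large block sizes are returned unchanged.
    if block_size >= (99 + ROLLING_WINDOW) // ROLLING_WINDOW * MIN_BLOCKSIZE:
        return raw_score
    # Integer division mirrors the C arithmetic.
    limit = block_size // MIN_BLOCKSIZE * min(len1, len2)
    return min(raw_score, limit)


e1 = "3:a+JraNvsgzsVqSwHq9:tJuOgzsko"
e2 = "3:a+JraNvsg7QhyqzWwHq9:tJuOg7Q4Wo"
bs1, e1_first, e1_second = parse_signature(e1)
bs2, e2_first, e2_second = parse_signature(e2)

# 70 is an illustrative uncapped value, not a number taken from the C code.
print(cap_score(70, bs1, len(e1_first), len(e2_first)))        # 18
print(cap_score(70, bs1 * 2, len(e1_second), len(e2_second)))  # 18
```

With block size 3 and first chunks of 18 and 20 characters, the cap is 3 / 3 * 18 = 18, which matches what the Python library and the online demo report. If ssdeep.js skips this step, that alone could account for the gap, but this has not been checked against its source.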

@gehaxelt

I noticed the same. Any idea why this happens @cloudtracer ?

@memcorrupt

I noticed this library has a few bugs in its comparison algorithm and is also inefficient since it runs synchronously. I created a project fast-ssdeep that binds to the ssdeep C API to provide a performant and compliant implementation.

Not sure if this repository is maintained at all. If it isn't, it would be nice if the maintainer could mention my project.
