Performance & hash based whitelisting #592
What is the recommended way to implement hash-based whitelists in YARA? Some projects, such as PHP Malware Finder, do it roughly like this (a sketch of the pattern follows below):

However, this seems to take O(n) time, while I would expect O(1).

Proof, timing with a single sha1sum:

So checking for 100 hashes takes 35 times as much CPU power as checking for 1 hash. What is the best way to whitelist thousands of hashes?
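For illustration, a minimal sketch of that hash-comparison pattern, assuming YARA's hash module; the hashes below are placeholders, not taken from the thread:

```
import "hash"

// Sketch of the whitelist style used by projects like PHP Malware Finder:
// one comparison per known-good hash, so the check is linear in the
// number of whitelisted hashes.
private rule Whitelist
{
    condition:
        hash.sha1(0, filesize) == "da39a3ee5e6b4b0d3255bfef95601890afd80709" or
        hash.sha1(0, filesize) == "356a192b7913b04c54574d18c28d46e6395428ab"
        // ... one "or" term per whitelisted file
}
```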
I suggest calculating the hash (once per sample!) externally and passing it as an external variable to your ruleset.

Test run with 100 hashes:

Your whitelist rule would now look like this:

...
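The rule itself is truncated above; a minimal sketch of what it might look like, assuming an external string variable named `filehash` (that name is my assumption, not from the comment):

```
// Hypothetical external-variable whitelist: the scanner computes the
// SHA-1 once and passes it in, so the rule only does string compares.
// e.g. on the command line: yara -d filehash=<sha1> rules.yar sample
rule Whitelisted
{
    condition:
        filehash == "da39a3ee5e6b4b0d3255bfef95601890afd80709" or
        filehash == "356a192b7913b04c54574d18c28d46e6395428ab"
        // ... one comparison per whitelisted hash
}
```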
Thanks! That's what I do now in Python, but it's 25% slower than YARA's built-in hash module (for single hashes, that is).

```python
import hashlib

def scan(path, whitelist, rules):
    with open(path, 'rb') as fh:
        data = fh.read()
    # Skip the YARA scan entirely when the file's SHA-1 is whitelisted.
    digest = hashlib.sha1(data).hexdigest()
    if digest in whitelist:
        return False
    return rules.match(data=data)
```

So I was hoping YARA could memoize the results of its hash lookups (or use another caching mechanism).
@gwillem YARA 3.5 doesn't cache the hash results, but the latest version in master does.
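To illustrate what that caching buys, a sketch with two rules hashing the same region (placeholder hashes again):

```
import "hash"

// Without caching (YARA 3.5), each hash.sha1(0, filesize) call below
// rehashes the file; with caching, the digest is computed once and reused.
rule WhitelistA { condition: hash.sha1(0, filesize) == "da39a3ee5e6b4b0d3255bfef95601890afd80709" }
rule WhitelistB { condition: hash.sha1(0, filesize) == "356a192b7913b04c54574d18c28d46e6395428ab" }
```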
Awesome, thank you!
Does anyone have a benchmark to share about this?
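No numbers were posted in the thread; as a starting point, a minimal harness (assuming the yara-python bindings and a rules.yar file, both my assumptions) could look like:

```python
import time
import yara  # pip install yara-python

# Compile once; declare the external so rules may reference it.
rules = yara.compile(filepath='rules.yar', externals={'filehash': ''})

with open('sample.bin', 'rb') as fh:
    data = fh.read()

# Time repeated matches against the same buffer.
start = time.perf_counter()
for _ in range(100):
    rules.match(data=data)
elapsed = time.perf_counter() - start
print('avg match time: %.6f s' % (elapsed / 100))
```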