Performance & hash based whitelisting #592

gwillem · 2017-01-06T11:13:38Z

What is the recommended way to implement hash-based whitelists in Yara? Some projects, such as php-malware-finder, do it like this:

import "hash"
global private rule Whitelist {
	condition:
		hash.sha1(0, filesize) != "c9cf738d8b1a8a77f6d200f327c5d4ec8201a99d" and
		hash.sha1(0, filesize) != "40a0a6e5ff86f75e6723e0008ddae29b1ed384c8" and
		[...]
}

However, this seems to take O(n) time while I would expect O(1).
Proof, timing with a single sha1sum:

$ time yara -r whitelist-1-hash.yar magento-2.0
real	0m2.780s
user	0m4.056s
sys	0m2.344s

$ time yara -r whitelist-100-hashes.yar magento-2.0
real	0m38.553s
user	2m15.468s
sys	0m2.348s

So checking for 100 hashes takes roughly 33 times as much user CPU time as checking for 1 hash (and about 14 times the wall-clock time). What is the best way to whitelist thousands of hashes?


aschuster99 · 2017-01-06T11:57:04Z

I suggest calculating the hash externally (once per sample!) and passing it to your ruleset as an external variable.

$ sha1deep -b sample
bb1ab80641f80fdd0e6258a032a8e9dd9f2f5ee6 sample

Test run with 100 hashes:

$ time yara -d ext_hash="bb1ab80641f80fdd0e6258a032a8e9dd9f2f5ee6" ruleset.yar sample
real    0m0.005s
user    0m0.001s
sys     0m0.002s

Your whitelist rule would now look like this:

$ cat ruleset.yar
rule whitelist {
condition:
  ext_hash != "1e6f6dcbc28d0fdcd01d49a71a90d0e2e447c96a" and
  ext_hash != "f0d3ad63910d8d3051bb4dd7af8513652259d796" and
  ext_hash != "47e000551379fe895e4f13a34ea7eb77db5439e2" and

...
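The external-variable approach above can be scripted. The following is a minimal sketch, not a definitive implementation: it hashes the sample once, in chunks, and assembles the same `yara -d ext_hash=...` command line shown in the test run. The function names `sha1_of_file` and `yara_command` are hypothetical, and the sketch only builds the argument list; it does not invoke `yara` itself.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def yara_command(ruleset, sample):
    """Build the argv for: yara -d ext_hash=<sha1> <ruleset> <sample>.

    Hypothetical helper: the hash is computed once per sample, outside YARA.
    """
    return ["yara", "-d", "ext_hash=" + sha1_of_file(sample), ruleset, sample]
```

The resulting list can be handed to `subprocess.run`. Because no shell is involved, the digest is passed without the surrounding quotes, which is also what the shell does with the quoted form in the command above.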

gwillem · 2017-01-06T12:10:20Z

Thanks! That's what I do now in Python, but it's 25% slower than Yara's built-in hash module (for single hashes, that is).

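import hashlib

# Hypothetical wrapper; the original snippet was the body of a function.
def scan(path, rules, whitelist):
    # Read the file once and reuse the bytes for hashing and matching.
    with open(path, 'rb') as fh:
        data = fh.read()
    digest = hashlib.sha1(data).hexdigest()  # avoid shadowing built-in hash()
    if digest in whitelist:
        return False  # whitelisted: skip the YARA scan
    return rules.match(data=data)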
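The snippet above only gets O(1) lookups if `whitelist` is a set rather than a list. As a rough sketch, assuming the hashes are stored one per line in a text file (the file name `whitelist.txt` and the helper `load_whitelist` are hypothetical), the set could be built like this:

```python
def load_whitelist(lines):
    """Build a frozenset of lowercase SHA-1 hex digests from an iterable of lines."""
    hashes = set()
    for line in lines:
        token = line.strip().lower()
        # Keep only well-formed digests: exactly 40 hexadecimal characters.
        # Blank lines, comments, and malformed entries are skipped.
        if len(token) == 40 and all(c in "0123456789abcdef" for c in token):
            hashes.add(token)
    return frozenset(hashes)
```

Usage would look like `with open("whitelist.txt") as fh: whitelist = load_whitelist(fh)`. Membership tests on the resulting set take constant time on average, so thousands of hashes cost about the same per file as one.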

So I was hoping it would be possible that Yara would memoize the results of its hash lookups (or another cache mechanism).

plusvic · 2017-01-09T12:22:01Z

@gwillem YARA 3.5 doesn't cache the hash results, but the latest version in master does.
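To illustrate what caching buys here, the following Python sketch memoizes digests by `(offset, size)`, so a chain of 100 comparisons against the same file range computes SHA-1 only once. This is an illustration of the general idea only, not YARA's actual implementation (which is written in C); the class `CachedHasher` and the function `is_whitelisted` are hypothetical names.

```python
import hashlib


class CachedHasher:
    """Memoizes SHA-1 digests of byte ranges of a single in-memory file."""

    def __init__(self, data):
        self._data = data
        self._cache = {}
        self.computations = 0  # how many times SHA-1 was really computed

    def sha1(self, offset, size):
        key = (offset, size)
        if key not in self._cache:
            self.computations += 1
            chunk = self._data[offset:offset + size]
            self._cache[key] = hashlib.sha1(chunk).hexdigest()
        return self._cache[key]


def is_whitelisted(hasher, filesize, whitelist):
    # Mirrors the rule's chain of comparisons: one sha1() call per entry,
    # but only the first call does the hashing work.
    return any(hasher.sha1(0, filesize) == h for h in whitelist)
```

Without the cache, each comparison would hash the whole file again, which matches the roughly linear slowdown measured at the top of this thread.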

gwillem · 2017-01-09T12:30:27Z

Awesome, thank you!

jvoisin · 2017-01-18T12:56:30Z

Does anyone have benchmarks to share about this?
For the record, the commit implementing this behaviour is 22ce5e0.

gwillem changed the title ~~Performance & whitelisting~~ Performance & hash based whitelisting Jan 6, 2017

plusvic closed this as completed Jan 9, 2017

gwillem mentioned this issue Jan 17, 2017

40 times speed optimization jvoisin/php-malware-finder#45

Closed

nikhilh-20 mentioned this issue Aug 10, 2022

Iterating over constant strings in yara conditions block #1765

Open
