Skip digest of empty payload (or static whitelist) #10
Originally reported by gryphius.
We're seeing a lot of pyzor "false positives" from messages with attachments but little or no body text. These messages are all different but generate the same digest, da39a3ee5e6b4b0d3255bfef95601890afd80709, which is the SHA-1 sum of the empty string. It looks like this is the digest produced when all content is stripped out by the pyzor normalizer.
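To check the claim about the digest, here is a minimal sketch using only Python's standard `hashlib`. The helper name `digest_of` is hypothetical and stands in for whatever pyzor does after normalization; it is not pyzor's actual API.

```python
import hashlib

# SHA-1 of zero bytes: the digest every fully stripped message collapses to.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def digest_of(normalized_body: bytes) -> str:
    """Hypothetical stand-in: hash whatever is left after normalization."""
    return hashlib.sha1(normalized_body).hexdigest()


if __name__ == "__main__":
    print(EMPTY_SHA1)
    # Two unrelated messages whose bodies normalize to nothing collide here.
    print(digest_of(b"") == EMPTY_SHA1)
```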
Current public.pyzor.org result for this hash:
Skipping this digest would be especially helpful in SpamAssassin setups, where only the hit count is checked (https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6108).
Attached is a quick & dirty patch we're using to skip this hash.
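The idea of skipping the hash could look roughly like the sketch below. This is not the attached patch; the names `EMPTY_PAYLOAD_DIGEST`, `SKIP_DIGESTS`, and `should_skip_digest` are assumptions made for illustration, and where such a check would hook into pyzor is left open.

```python
import hashlib

# Digest of an empty payload; computed rather than hard-coded to avoid typos.
EMPTY_PAYLOAD_DIGEST = hashlib.sha1(b"").hexdigest()

# Hypothetical static list of digests that carry no information.
SKIP_DIGESTS = frozenset({EMPTY_PAYLOAD_DIGEST})


def should_skip_digest(digest: str, skip_digests=SKIP_DIGESTS) -> bool:
    """Return True if the digest should not be checked or reported."""
    return digest.strip().lower() in skip_digests
```

A caller would test `should_skip_digest(digest)` before querying the server and treat a skipped digest as "no result" rather than as a hit.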
I was under the impression that SpamAssassin checks the exit code of the pyzor check. Which is:
These thresholds are also now configurable as of 0.6 from the configuration file:
Comment by gryphius:
Even if SA checks the exit code, this doesn't solve the problem for this special digest. Hash collisions like this cause problems in every case: checking the whitelist could cause false negatives, while not checking the whitelist causes false positives, which is what I see in our setups.
Maybe the real fix would be to change the hashing algorithm so that a payload that is empty after stripping is not hashed as-is. Instead, pyzor could hash the full unstripped message, or at least try to use some unique header value to build the hash.
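The suggestion above could be sketched as follows. This is only an illustration: `fallback_digest` is a hypothetical function, the choice of `Message-ID` as the "unique header value" is an assumption, and none of this reflects how pyzor actually computes digests.

```python
import hashlib
from email import message_from_bytes


def fallback_digest(raw_message: bytes, normalized_body: bytes) -> str:
    """Hash the normalized body, falling back when nothing is left of it."""
    if normalized_body.strip():
        # Normal case: something survived normalization.
        return hashlib.sha1(normalized_body).hexdigest()

    # Assumption: Message-ID is the header used to tell messages apart.
    message_id = message_from_bytes(raw_message).get("Message-ID")
    if message_id:
        value = str(message_id).strip().encode("utf-8", "replace")
        return hashlib.sha1(value).hexdigest()

    # Last resort: hash the full unstripped message.
    return hashlib.sha1(raw_message).hexdigest()
```

One trade-off to keep in mind: a digest built from a unique header or from the raw message will only match exact copies of the same message, so it avoids the collision but gives up the fuzzy matching that normalization normally provides.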