Skip digest of empty payload (or static whitelist) #10
Comments
I was under the impression that SpamAssassin checks the exit code of the pyzor check. Which is:
These thresholds are also now configurable as of 0.6 from the configuration file:
Comment by gryphius: Even if SA checks the exit code, this doesn't solve the problem for this special digest. Hash collisions like this cause problems either way: checking the whitelist could cause false negatives, while not checking it causes false positives, which is what I see in our setups. Maybe the real fix would be to change the hashing algorithm so that content that is empty after stripping is ignored and the full unstripped message is hashed instead, or at least to try to get some unique header value for building the hash?
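The fallback suggested in this comment could look roughly like the sketch below. This is only an illustration, not pyzor's digest code: `normalize` is a hypothetical stand-in for the real normalizer (it just drops whitespace), and `digest_with_fallback` is an assumed name.

```python
import hashlib

EMPTY_SHA1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"


def normalize(body_text: str) -> str:
    """Hypothetical stand-in for the pyzor normalizer: drops all whitespace."""
    return "".join(body_text.split())


def digest_with_fallback(raw_message: bytes, body_text: str) -> str:
    """Hash the normalized body, or the raw message if nothing is left."""
    normalized = normalize(body_text)
    if normalized:
        return hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    # Everything was stripped: hash the full unstripped message so that
    # different messages no longer collide on the empty-string digest.
    return hashlib.sha1(raw_message).hexdigest()
```

With this approach two different attachment-only messages produce different digests, at the cost that such digests are much less likely to match between near-identical spam runs.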
Let's solve this with the local static whitelist in 1.0. We'll also need to add a command that adds and removes digests from the list. It should accept messages in various formats (including digests), just like the normal whitelist/report commands.
…ipt. A new file, `LocalWhitelist`, can be set through the command line or the configuration file. Any pyzor digest listed in the file always gets a count of 0 and a whitelist count of 0, and checking it skips contacting the actual server.
- `pyzor local_whitelist` adds to the local whitelist; an error is shown if the message is already whitelisted.
- `pyzor local_unwhitelist` removes from the local whitelist; an error is shown if the message is not whitelisted.

These commands can receive messages in all the different styles that are currently supported for the other methods.
Originally reported by gryphius.
We're seeing a lot of pyzor "false positives" from messages with attachments but little or no body text. These messages are all different but generate the same digest, da39a3ee5e6b4b0d3255bfef95601890afd80709, which is the SHA-1 sum of the empty string. It looks like this is the digest produced if all content is stripped out by the pyzor normalizer.
Current public.pyzor.org result for this hash:
public.pyzor.org:24441 (200, 'OK') 159015 5706
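The claim that this digest is the SHA-1 of the empty string is easy to verify with Python's standard `hashlib`; the helper name `sha1_hex` is only for illustration.

```python
import hashlib

EMPTY_SHA1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"


def sha1_hex(payload: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of a byte string."""
    return hashlib.sha1(payload).hexdigest()


# Any message whose content is stripped down to nothing hashes to this value.
print(sha1_hex(b"") == EMPTY_SHA1)  # True
```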
pyzord could maybe treat this special hash as statically whitelisted (without the need to have clients submit this hash into the whitelist first) and always return a zero hit count.
This would be especially helpful in SpamAssassin setups, where only the hit count is checked ( https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6108 ).
If hardcoding this hash is not an option, you could maybe add a config option to read a static whitelist from a file.
Attached is a quick & dirty patch we're using to skip this hash.
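The attached patch is not reproduced here. As a rough sketch of the static-whitelist idea itself (not pyzord's actual code), the check could short-circuit before any lookup. The file format (one digest per line, `#` comments), the function names and the `query_server` callback are assumptions.

```python
from typing import Callable, Iterable, Set, Tuple

EMPTY_SHA1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709"


def load_static_whitelist(lines: Iterable[str]) -> Set[str]:
    """Parse one digest per line, skipping blank lines and '#' comments."""
    digests = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            digests.add(entry)
    return digests


def check_digest(
    digest: str,
    static_whitelist: Set[str],
    query_server: Callable[[str], Tuple[int, int]],
) -> Tuple[int, int]:
    """Return (hit count, whitelist count) for a digest."""
    if digest.lower() in static_whitelist:
        # Statically whitelisted: report zero hits without any lookup.
        return (0, 0)
    return query_server(digest)
```

A setup that only looks at the hit count would then see 0 for the empty-payload digest, whether the set is hardcoded as `{EMPTY_SHA1}` or loaded from a file.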