New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip digest of empty payload (or static whitelist) #10

Closed
alexkiro opened this Issue Jun 6, 2014 · 3 comments

Comments

Projects
None yet
1 participant
@alexkiro
Contributor

alexkiro commented Jun 6, 2014

Originally reported by gryphius.

We're seing a lot of pyzor "false positives" from messages with attachments but little or no body text. these messages are all different but generate the same digest da39a3ee5e6b4b0d3255bfef95601890afd80709, which is the sha1-sum of the empty string . It looks like this is is the digest produced if all content is stripped out by the pyzor normalizer.

current public.pyzor.org result for this hash:
public.pyzor.org:24441 (200, 'OK') 159015 5706
pyzord could maybe treat this special hash as statically whitelisted (whithout the need to have clients submit this hash into the whitelist first) and always return a zero hitcount.

This would be especially helpful in spamassassin setups, where only the hitcount is checked ( https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6108 )
if hardcoding this hash is not an option, you could maybe add a config option to read a static whitelist from a file.

Attached is a quick & dirty patch we're using to skip this hash.

@alexkiro

This comment has been minimized.

Contributor

alexkiro commented Jun 6, 2014

I was under the impression that SpamAssassin checks the exit code of the pyzor check. Which is:

  • 1 if the report count is 0 or the whitelist count is > 0
  • 0 if the report count is > 0 and the whitelist count is 0

These thresholds are also now configurable as of 0.6 from the configuration file:

## These options specify the threshold for number of reports/whitelists. 
## According to these thresholds the pyzor client exit code will differ.
# ReportThreshold = 0
# WhitelistThreshold = 0
@alexkiro

This comment has been minimized.

Contributor

alexkiro commented Jun 6, 2014

Comment by gryphius:

Even if SA checks the exit code, this doesn't solve the problem for this special digest. Hash collisions like this cause problems in every case: checking whitelist could cause false negatives, not checking whitelist causes false postivies - this is what I see in our setups.

Maybe the real fix would be changing the hashing algorithm so that empty strings (after stripping) are ignored and hash the full unstripped message or at least try to get some unique header value for building the hash?

gryphius added a commit to gryphius/pyzor that referenced this issue Jul 3, 2014

@alexkiro alexkiro added this to the 1.0 milestone Sep 18, 2014

@alexkiro alexkiro self-assigned this Sep 18, 2014

@alexkiro

This comment has been minimized.

Contributor

alexkiro commented Sep 18, 2014

Let's solve this with the local static whitelist in 1.0. We'll also need to add a command that adds remove digests from the list. It should accept messages in various formats (including digests) just like the normal whitelist/report commands.

alexkiro added a commit that referenced this issue Nov 5, 2014

Refs. #10. Add ability to skip digest locally in the pyzor client scr…
…ipt.

A new file is available `LocalWhitelist` through the command line
or the configuration file. The file can contain a pyzor digest
that will always have 0 count and 0 whitelist and skips contacting
the actuall server.

alexkiro added a commit that referenced this issue Nov 5, 2014

Refs. #10. Add commands to add and remove from the local whitelist.
 - `pyzor local_whitelist` will add to the local_whitelist, an error
    is shown if the message is already whitelisted.
 - `pyzor local_unwhitelist` will remove from the local_whitelist an
    error is show if the message is not whitelisted.

These commands can receive messages in all the different style that
are currently supported for all other methods.

alexkiro added a commit that referenced this issue Nov 6, 2014

@alexkiro alexkiro closed this Dec 10, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment