Bayesian spam filter for activitystrea.ms data

README

This is an experimental server for filtering Activity Streams (http://activitystrea.ms/) data for spam.

Apache 2.0 license.

The approach more or less copies the filtering described in Paul Graham's "A Plan for Spam", but adds pseudo-tokens for Activity Streams fields. So an activity like:

     { id: "urn:uuid:7e4ed55a-2b99-48c8-a274-42819b2ddd39",
       url: "http://example.net/status/35",
       published: "2011-09-23T10:49:00Z",
       actor: { displayName: "John Smith",
                id: "urn:uuid:bff0ecdd-a944-4d92-aed3-d6af8f13d610",
                url: "http://example.net/status/johnsmith" },
       verb: "post",
       object: { id: "urn:uuid:81e43564-c66f-40c5-878b-733275229521",
                 type: "note",
                 content: "<a href='http://example.com/viagra-spam'>Buy Viagra Now!</a>" } }

Would tokenize as:

      id=urn:uuid:7e4ed55a-2b99-48c8-a274-42819b2ddd39
      url=http://example.net/status/35
      published=2011-09-23T10:49:00Z
      actor.displayName=John-Smith
      actor.id=urn:uuid:bff0ecdd-a944-4d92-aed3-d6af8f13d610
      actor.url=http://example.net/status/johnsmith
      verb=post
      object.id=urn:uuid:81e43564-c66f-40c5-878b-733275229521
      object.type=note
      a
      href
      http://example.com/viagra-spam
      Buy
      Viagra
      Now
      a
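
The tokenization above could be sketched roughly as follows. This is not the code in this repository, only a minimal illustration under some assumptions: the names `tokenize`, `tokenizeText` and `TEXT_FIELDS` are hypothetical, only `content` is treated as free text, and whitespace inside field values is replaced with a hyphen (as in `John-Smith`).

```javascript
// Hypothetical: fields whose values are split into word tokens
// instead of being emitted as a single path=value pseudo-token.
const TEXT_FIELDS = new Set(["content"]);

// Split free text (possibly HTML) on whitespace and markup punctuation,
// then trim leftover punctuation from both ends of each piece.
// URLs survive intact because ":" and "/" are not split characters.
function tokenizeText(text) {
  return String(text)
    .split(/[\s<>="'!?,;()]+/)
    .map((t) => t.replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, ""))
    .filter((t) => t.length > 0);
}

// Walk the activity recursively and emit "dotted.path=value" pseudo-tokens.
// Arrays are walked like objects, so their indexes become path segments.
function tokenize(obj, prefix = "") {
  const tokens = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object") {
      tokens.push(...tokenize(value, path));
    } else if (TEXT_FIELDS.has(key)) {
      tokens.push(...tokenizeText(value));
    } else {
      // Assumption: whitespace in a value is collapsed to "-".
      tokens.push(`${path}=${String(value).trim().replace(/\s+/g, "-")}`);
    }
  }
  return tokens;
}

const activity = {
  id: "urn:uuid:7e4ed55a-2b99-48c8-a274-42819b2ddd39",
  url: "http://example.net/status/35",
  published: "2011-09-23T10:49:00Z",
  actor: {
    displayName: "John Smith",
    id: "urn:uuid:bff0ecdd-a944-4d92-aed3-d6af8f13d610",
    url: "http://example.net/status/johnsmith",
  },
  verb: "post",
  object: {
    id: "urn:uuid:81e43564-c66f-40c5-878b-733275229521",
    type: "note",
    content: "<a href='http://example.com/viagra-spam'>Buy Viagra Now!</a>",
  },
};

console.log(tokenize(activity).join("\n"));
```

Keeping the field path in each pseudo-token means that `verb=post` and a plain word `post` inside the content are counted separately, which is the point of making pseudo-tokens in the first place.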

There may be some value in also grabbing the domains of URLs as extra tokens (example.com and example.net here).
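
One way the domain idea might look, as a sketch only: it uses the standard `URL` class available in Node.js, while the function name `domainTokens` and the `domain=` prefix are made up for illustration. Each domain is emitted once per activity here, which is a design choice rather than something the README specifies.

```javascript
// Hypothetical helper: derive "domain=<host>" pseudo-tokens from any
// token that contains an http(s) URL, either bare or after "field=".
function domainTokens(tokens) {
  const domains = new Set();
  for (const token of tokens) {
    const start = token.search(/https?:\/\//);
    if (start === -1) continue; // no URL in this token
    try {
      domains.add(new URL(token.slice(start)).hostname);
    } catch (err) {
      // Not a parseable URL; ignore it rather than fail the whole activity.
    }
  }
  return [...domains].map((d) => `domain=${d}`);
}

const sampleTokens = [
  "url=http://example.net/status/35",
  "actor.url=http://example.net/status/johnsmith",
  "verb=post",
  "http://example.com/viagra-spam",
  "Buy",
];

console.log(domainTokens(sampleTokens));
```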