RFC: Message Anti-Spam Module #121

We are currently planning a new module to combat spam messages in Odoo. The plan at the moment is to implement a basic backpropagation algorithm [similar to this](http://radimrehurek.com/data_science_python/).

The algorithm will initially be trained on open-source datasets such as [Enron](http://www.aueb.gr/users/ion/data/enron-spam/), [SpamAssassin](https://spamassassin.apache.org/publiccorpus/), and [LingSpam](http://csmining.org/index.php/ling-spam-datasets.html).

The module will create a private Junk channel for each user and automatically filter identified spam into it. Any item moved out of Junk will be marked as a false positive, and anything moved into Junk will be considered a false negative. How this feedback will play into the learning algorithm is TBD.
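To make the feedback idea concrete, here is a minimal sketch of how a move across the Junk boundary could be turned into a corrected label. The names `JUNK`, `INBOX`, `Feedback`, and `feedback_from_move` are hypothetical and not part of Odoo; how the resulting labels feed back into training is still open.

```python
from dataclasses import dataclass
from typing import Optional

JUNK = "junk"    # hypothetical identifier for the per-user Junk channel
INBOX = "inbox"  # hypothetical identifier for any non-junk location


@dataclass(frozen=True)
class Feedback:
    message_id: int
    is_spam: bool    # corrected label implied by the user's action
    error_type: str  # "false_positive" or "false_negative"


def feedback_from_move(message_id: int, source: str, target: str) -> Optional[Feedback]:
    """Translate a user moving a message into a corrected training label."""
    if source == JUNK and target != JUNK:
        # The filter said spam and the user disagreed.
        return Feedback(message_id, is_spam=False, error_type="false_positive")
    if source != JUNK and target == JUNK:
        # The filter said not-spam and the user disagreed.
        return Feedback(message_id, is_spam=True, error_type="false_negative")
    # Moves that do not cross the Junk boundary carry no signal.
    return None
```

Collected `Feedback` records could then be queued for whatever re-training strategy is eventually chosen.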

Messages would be evaluated in the `create` method of `mail.message`.

Spam scanning will need to be strictly opt-in. I'm thinking it makes sense to put this at the model level; something similar to the [`ir.model.website_form_access`](https://github.com/odoo/odoo/blob/10.0/addons/website_form/models/models.py#L19) used for forms, with the `mail.message.res_model` dictating the model.
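As a rough illustration of the opt-in check, the sketch below shows the decision that the `create` hook might make. It deliberately avoids Odoo imports so it can run standalone: `SPAM_SCAN_MODELS` stands in for a hypothetical per-model flag on `ir.model`, and `route_message` and the `is_spam` callable are assumptions, not existing APIs.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical stand-in for a per-model opt-in flag stored on ir.model.
SPAM_SCAN_MODELS: FrozenSet[str] = frozenset({"mail.channel", "project.task"})


def route_message(values: Dict[str, str],
                  is_spam: Callable[[str], bool],
                  opted_in: FrozenSet[str] = SPAM_SCAN_MODELS) -> str:
    """Return "junk" or "inbox" for the values passed to a create call."""
    if values.get("res_model") not in opted_in:
        # The model has not opted in, so the body is never scanned.
        return "inbox"
    return "junk" if is_spam(values.get("body", "")) else "inbox"


def keyword_check(body: str) -> bool:
    """Toy classifier used only to exercise the routing logic."""
    return "free money" in body.lower()
```

In a real module the same check would sit inside the `mail.message` override, with the trained classifier in place of `keyword_check`.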

Challenges I currently foresee:
* The obvious - training the algorithm
* Support of multiple languages
* Handling of spam not directed to specific users (projects, help desk, blog, crm)
* Storage and subsequent search of data. [Looks like pickle may work](http://scikit-learn.org/stable/modules/model_persistence.html), but with some challenges around upgrades: we will need to add versioning, re-training, and A/B testing.
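On the storage point, one possible way to handle upgrades is to pickle the model inside a small versioned envelope and refuse to load anything written under an older format. This is only a sketch: `MODEL_FORMAT_VERSION`, `StaleModelError`, `dump_model`, and `load_model` are hypothetical names, and the "model" can be any picklable object.

```python
import pickle
from typing import Any

# Hypothetical: bump whenever the feature extraction or model class changes.
MODEL_FORMAT_VERSION = 2


class StaleModelError(Exception):
    """Raised when a stored model predates the current format and needs re-training."""


def dump_model(model: Any, trained_on: str) -> bytes:
    """Serialize the model together with the metadata needed to validate it later."""
    envelope = {
        "version": MODEL_FORMAT_VERSION,
        "trained_on": trained_on,
        "model": model,
    }
    return pickle.dumps(envelope, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(blob: bytes) -> Any:
    """Load a model, rejecting blobs written under a different format version."""
    # Only unpickle data this module wrote itself: pickle can execute arbitrary code.
    envelope = pickle.loads(blob)
    found = envelope.get("version")
    if found != MODEL_FORMAT_VERSION:
        raise StaleModelError(
            "stored model has version %r, expected %r" % (found, MODEL_FORMAT_VERSION)
        )
    return envelope["model"]
```

A caller that catches `StaleModelError` could fall back to the base model and schedule re-training.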

Nice to haves:
* Model training on a user level (Cindy may like spam from `CompanyA`, but John might not) - this will require training the SVM in an incremental manner, instead of batch.
  * This is not so easy in `sklearn`, and we will be limited to the linear case using Stochastic Gradient Descent (SGD). We would need to use something like [`pegasos`](https://github.com/ejlb/pegasos) instead.
* Client/Server model, allowing for centralized & offloaded training
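For the per-user incremental training item, the following is a minimal pure-Python sketch of a Pegasos-style update (stochastic sub-gradient descent on the hinge loss of a linear SVM), learning from one message at a time. It does not use the linked `pegasos` library or `sklearn`; the class name, the token-list input, and the default `lam` value are all assumptions made for illustration, and the optional projection step of the published algorithm is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable


class IncrementalLinearSVM:
    """Pegasos-style linear SVM that learns from one example at a time."""

    def __init__(self, lam: float = 0.01) -> None:
        self.lam = lam                 # regularisation strength (assumed value)
        self.t = 0                     # number of updates seen so far
        self.w: Dict[str, float] = defaultdict(float)  # sparse weight vector

    def score(self, tokens: Iterable[str]) -> float:
        # Bag-of-words dot product; repeated tokens count once per occurrence.
        return sum(self.w.get(tok, 0.0) for tok in tokens)

    def partial_fit(self, tokens: Iterable[str], is_spam: bool) -> None:
        tokens = list(tokens)
        y = 1.0 if is_spam else -1.0
        self.t += 1
        eta = 1.0 / (self.lam * self.t)   # Pegasos step size
        margin = y * self.score(tokens)   # computed before the update
        shrink = 1.0 - eta * self.lam     # equals 1 - 1/t
        for tok in list(self.w):
            self.w[tok] *= shrink
        if margin < 1.0:                  # hinge loss is active for this example
            for tok in tokens:
                self.w[tok] += eta * y

    def predict(self, tokens: Iterable[str]) -> bool:
        return self.score(tokens) > 0.0


# One model per user, so Cindy and John can disagree about the same sender.
models: Dict[str, IncrementalLinearSVM] = defaultdict(IncrementalLinearSVM)
models["cindy"].partial_fit(["offer", "companya"], is_spam=False)
models["john"].partial_fit(["offer", "companya"], is_spam=True)
```

In practice each user's model would probably start from the shared, batch-trained weights rather than from zero, so that a single correction does not dominate.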

Additional reading for anyone interested:
* http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.455.7614&rep=rep1&type=pdf
* http://trec.nist.gov/pubs/trec15/papers/hit.spam.final.final.pdf
* http://ats.cs.ut.ee/u/kt/hw/spam/spam.pdf
* http://cse-wiki.unl.edu/wiki/index.php/Machine_Learning_for_Email_Spam_Filtering
* http://airccse.org/journal/jcsit/0211ijcsit12.pdf
* http://www.wseas.us/e-library/conferences/2014/Florence/CSCCA/CSCCA-23.pdf
