We are currently planning a new module to combat spam messages in Odoo. The plan at the moment is to implement a basic backpropagation algorithm similar to this.
The algorithm will be trained initially using some open source datasets like Enron, SpamAssassin, LingSpam, etc.
The module will create a private Junk channel for each user and automatically filter identified spam into it. Any item moved out of Junk would be marked as a false positive, and anything moved into Junk would be considered a false negative. TBD how this will play into the learning algorithm.
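As a rough illustration of the Junk feedback rule, here is a minimal sketch that turns a user's move into a corrected label. The `Feedback` record, the `feedback_from_move` helper and the `SPAM`/`HAM` constants are hypothetical names, not part of Odoo, and how these records feed the learner is still open.

```python
from dataclasses import dataclass

SPAM, HAM = 1, 0  # hypothetical label encoding


@dataclass
class Feedback:
    message_id: int
    label: int   # corrected label: SPAM or HAM
    kind: str    # "false_positive" or "false_negative"


def feedback_from_move(message_id, from_junk, to_junk):
    """Translate a user move into a corrected training label.

    Returns None when the move does not cross the Junk boundary.
    """
    if from_junk and not to_junk:
        # The filter said spam, the user disagreed.
        return Feedback(message_id, HAM, "false_positive")
    if to_junk and not from_junk:
        # The filter let it through, the user flagged it.
        return Feedback(message_id, SPAM, "false_negative")
    return None
```

Keeping the corrected label and the error kind side by side would let the records serve both as training examples and as accuracy statistics.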
Messages would be evaluated in the create method of mail.message.
Spam scanning will need to be strictly opt-in. I'm thinking it makes sense to put this at the model level; something similar to the ir.model.website_form_access used for forms, with the mail.message.res_model dictating the model.
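To make the opt-in idea concrete, here is a minimal sketch in plain Python, with no Odoo imports. The `SPAM_SCAN_ENABLED` mapping stands in for a per-model flag, `create_message` stands in for the `create` override, and `is_spam` is a hypothetical field name. The `res_model` key follows the naming used above.

```python
# Hypothetical registry: model name -> whether spam scanning is enabled.
# In Odoo this would presumably be a boolean on the model record instead.
SPAM_SCAN_ENABLED = {"project.task": True, "crm.lead": False}


def should_scan(vals, enabled=SPAM_SCAN_ENABLED):
    """Return True only if the message's res_model has opted in."""
    return bool(enabled.get(vals.get("res_model"), False))


def create_message(vals, classify, enabled=SPAM_SCAN_ENABLED):
    """Stand-in for the create hook: tag the values before storing them.

    `classify` is any callable taking the message body and returning
    a truthy value for spam.
    """
    vals = dict(vals)  # do not mutate the caller's dict
    vals["is_spam"] = bool(
        should_scan(vals, enabled) and classify(vals.get("body", ""))
    )
    return vals
```

Models that are missing from the mapping are never scanned, which keeps the behaviour strictly opt-in.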
Challenges I currently foresee:
- The obvious - training the algorithm
- Support of multiple languages
- Handling of spam not directed to specific users (projects, help desk, blog, crm)
- Storage and subsequent search of data. pickle looks like it may work, but it brings challenges around upgrades: we will need to add versioning, re-training, and A/B testing.
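For the pickle versioning concern, one possible approach is to store a format version next to the model and refuse to load anything written by a different version. This is only a sketch: `FORMAT_VERSION`, `StaleModelError`, `dump_model` and `load_model` are hypothetical names, and the re-training step itself is left out.

```python
import pickle

FORMAT_VERSION = 2  # hypothetical; bump when features or model layout change


class StaleModelError(Exception):
    """Raised when a stored model was written by another format version."""


def dump_model(model, version=FORMAT_VERSION):
    """Serialize the model together with the format version."""
    return pickle.dumps({"version": version, "model": model})


def load_model(blob, expected=FORMAT_VERSION):
    """Deserialize, rejecting payloads from a different format version."""
    payload = pickle.loads(blob)
    found = payload.get("version")
    if found != expected:
        raise StaleModelError(
            "stored version %r, expected %r; re-training needed" % (found, expected)
        )
    return payload["model"]
```

A caller could catch `StaleModelError` after a module upgrade and schedule re-training. Note that pickle should only ever be used on data the server wrote itself, since unpickling untrusted bytes can execute arbitrary code.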
Nice to haves:
- Model training on a user level (Cindy may like spam from CompanyA, but John might not). This will require training the SVM in an incremental manner, instead of in batch.
  - This is not so easy in sklearn, where we would be limited to the linear case using Stochastic Gradient Descent (SGD). We would need to use something like Pegasos instead.
- Client/Server model, allowing for centralized & offloaded training
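To show what incremental training could look like, here is a minimal pure-Python sketch of a linear SVM updated one message at a time with Pegasos-style sub-gradient steps. The optional projection step is left out. The `PegasosSVM` class, its `partial_fit` method and the `bag_of_words` helper are hypothetical names, and the whitespace tokenizer is deliberately naive.

```python
from collections import defaultdict


class PegasosSVM:
    """Linear SVM trained one example at a time with Pegasos-style updates."""

    def __init__(self, lam=0.01):
        self.lam = lam                # regularization strength
        self.t = 0                    # number of examples seen so far
        self.w = defaultdict(float)   # sparse weights: token -> weight

    def score(self, features):
        return sum(self.w.get(tok, 0.0) * val for tok, val in features.items())

    def partial_fit(self, features, label):
        """Update on a single example; label is +1 (spam) or -1 (ham)."""
        self.t += 1
        eta = 1.0 / (self.lam * self.t)          # step size 1 / (lambda * t)
        margin = label * self.score(features)    # uses weights before the update
        shrink = 1.0 - 1.0 / self.t              # equals 1 - eta * lambda
        for tok in list(self.w):
            self.w[tok] *= shrink
        if margin < 1.0:
            # Hinge loss is active: move towards the example.
            for tok, val in features.items():
                self.w[tok] += eta * label * val

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1


def bag_of_words(text):
    """Naive whitespace tokenizer returning token counts."""
    counts = defaultdict(float)
    for tok in text.lower().split():
        counts[tok] += 1.0
    return counts
```

Under this sketch, per-user training would amount to keeping one `PegasosSVM` instance per user and calling `partial_fit` whenever that user moves a message into or out of Junk.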
Additional reading for anyone interested: