This code classifies a message as either spam or ham (non-spam).
You can install Conda for Python, which handles the dependencies needed for machine learning.
Naive Bayes classifiers work by correlating the use of tokens (typically words, sometimes other features) with spam and non-spam emails, and then using Bayes' theorem to calculate the probability that an email is spam.
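As a rough illustration of the idea above, here is a minimal sketch of a Naive Bayes spam score using only the standard library. It is not the code in spam.py: the function names (`tokenize`, `train`, `spam_probability`), the whitespace tokenizer, and the add-one (Laplace) smoothing are all assumptions made for this example, and it expects the training data to contain at least one spam and one ham message.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: tokens are lowercase, whitespace-separated words.
    return text.lower().split()


def train(messages):
    """Count tokens per class from a list of (label, text) pairs.

    Labels are assumed to be the strings "spam" and "ham".
    """
    token_counts = {"spam": Counter(), "ham": Counter()}
    message_counts = Counter()
    for label, text in messages:
        message_counts[label] += 1
        token_counts[label].update(tokenize(text))
    return token_counts, message_counts


def spam_probability(text, token_counts, message_counts):
    """Estimate P(spam | text) with Bayes' theorem and add-one smoothing."""
    vocabulary = set(token_counts["spam"]) | set(token_counts["ham"])
    total_messages = sum(message_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior: the share of training messages in this class.
        log_score = math.log(message_counts[label] / total_messages)
        total_tokens = sum(token_counts[label].values())
        for token in tokenize(text):
            # Add-one smoothing keeps unseen tokens from zeroing the product.
            smoothed = token_counts[label][token] + 1
            log_score += math.log(smoothed / (total_tokens + len(vocabulary)))
        log_scores[label] = log_score
    # Convert the two log scores into a normalized probability of spam.
    highest = max(log_scores.values())
    weights = {label: math.exp(score - highest) for label, score in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


if __name__ == "__main__":
    # Tiny made-up training set, for illustration only.
    training = [
        ("spam", "win free money now"),
        ("spam", "free prize claim now"),
        ("ham", "are we meeting for lunch"),
        ("ham", "see you at lunch tomorrow"),
    ]
    token_counts, message_counts = train(training)
    print(spam_probability("free money", token_counts, message_counts))
    print(spam_probability("lunch tomorrow", token_counts, message_counts))
```

Working in log space avoids numerical underflow when a message contains many tokens. A score above 0.5 means the message looks more like the spam examples than the ham examples.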
Naive Bayes spam filtering is a baseline technique for dealing with spam. It can tailor itself to the email needs of individual users and keeps false-positive rates low enough to be generally acceptable to users. It is one of the oldest approaches to spam filtering, with roots in the 1990s.
- Extracted occurrences of spam and ham.
- Stored the messages in a list.
- Skipped some spam messages that had encoding issues, using try/except to handle them.
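The loading steps above might look roughly like the sketch below. The helper name `load_messages`, the one-message-per-file layout, and the UTF-8 encoding are assumptions made for illustration; the actual data format used by spam.py may differ.

```python
def load_messages(paths, label):
    """Read each file as one message and collect (label, text) pairs.

    Hypothetical helper: assumes one message per file, encoded as UTF-8.
    Files that cannot be decoded are skipped and counted.
    """
    messages = []
    skipped = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8") as handle:
                messages.append((label, handle.read()))
        except UnicodeDecodeError:
            # Ignore messages with encoding problems instead of stopping.
            skipped += 1
    return messages, skipped
```

Returning the number of skipped files makes it easy to check how much data was dropped because of encoding problems.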
To run the code, type:
python spam.py