This is a simple application of Bayes' Theorem to a spam filter classifier.
Bayes' Theorem is at the base of conditional probability and is defined as:

P(h|D) = P(D|h) * P(h) / P(D)

Where:

- P(h|D) is the posterior probability: what we are trying to estimate.
- P(D|h) is the likelihood: a conditional probability that can be estimated from data we can obtain from some process.
- P(h) is the prior probability: the probability we already know, which is updated into the posterior probability.
- P(D) is the evidence: the new piece of data that we take into consideration to update the posterior probability.
Note that the notations 'h' and 'D' could be anything, but in the context of machine learning they are usually chosen to indicate hypothesis and data.
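As a quick sanity check of the formula above, here is a minimal sketch of the posterior computation with made-up numbers (the prior, likelihood, and evidence values are invented for illustration only):

```python
def posterior(likelihood, prior, evidence):
    """Bayes' Theorem: P(h|D) = P(D|h) * P(h) / P(D)."""
    return likelihood * prior / evidence

# Toy values: P(D|h) = 0.7, P(h) = 0.4, P(D) = 0.5 (all assumed).
p_h_given_d = posterior(likelihood=0.7, prior=0.4, evidence=0.5)
print(p_h_given_d)  # 0.56
```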
For the spam filter classifier, Bayes' Theorem becomes:

P(spam|word) = P(word|spam) * P(spam) / P(word)
Here our hypothesis is the occurrence of a word in spams and hams (spam or ham), and the data is each word in a given email (word).
We are trying to find the probability of the hypothesis given the data (P(spam|word)) by multiplying the probability of the data given the hypothesis (P(word|spam)) by the probability of the hypothesis (P(spam)).
The probability of the data given the hypothesis (P(word|spam)) is the part we can 'train' with our dataset in the classifier, while the probability of the hypothesis (P(spam)) is the one we assume. We compute this for both cases, spam and ham, and compare the resulting probabilities to give a final classification for a new message.
Note that the denominator is ignored here. It would be the probability of a word being contained in an email regardless of it being spam or ham (P(word)). It is not taken into consideration because it is the same for both classes: it is just a normalization constant that does not depend on the hypothesis.
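The comparison described above can be sketched as follows. The word likelihoods P(word|spam) and P(word|ham) would normally be estimated from a labelled dataset; the numbers and the smoothing value for unseen words here are invented for illustration:

```python
# Assumed, made-up likelihoods estimated from a hypothetical dataset.
p_word_given_spam = {"free": 0.30, "meeting": 0.02}
p_word_given_ham  = {"free": 0.05, "meeting": 0.20}

p_spam = 0.5  # assumed prior P(spam)
p_ham = 0.5   # assumed prior P(ham)

def score(words, likelihoods, prior):
    """Unnormalised posterior: P(h) * product of P(word|h).

    The evidence P(word) is skipped, as in the text, because it is
    the same constant for both classes and cancels in the comparison.
    """
    s = prior
    for w in words:
        s *= likelihoods.get(w, 1e-6)  # tiny fallback for unseen words
    return s

email = ["free", "meeting"]
spam_score = score(email, p_word_given_spam, p_spam)
ham_score = score(email, p_word_given_ham, p_ham)
print("spam" if spam_score > ham_score else "ham")  # prints "ham"
```

Whichever class has the larger unnormalised score is chosen as the label, which is exactly the comparison the classifier performs.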
The sample dataset provided is from this Kaggle dataset. The classifier is very basic and can be improved greatly; it is meant to demonstrate how Bayes' Theorem is applicable to machine learning.
- numpy
- pandas
- sklearn
Install these using pip.

Run `python sample_code.py` to execute the code.