This is a spam/ham detector using a Naive Bayes classifier implemented from scratch in Python 3. It is currently trained on the Enron dataset.

ahmadshafique/Spam-Detector

Spam Detector

This is a spam/ham email detector using a Naive Bayes classifier implemented from scratch in Python 3.

Spam detection is a text classification problem. Naive Bayes makes two assumptions:

  • the bag-of-words assumption: word positions do not matter, only which words occur and how often.
  • conditional independence: feature probabilities are independent of one another given the class (e.g. spam or ham).
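
To make the bag-of-words assumption concrete, here is a minimal sketch. The helper name `bag_of_words` and the whitespace tokenizer are illustrative assumptions, not the notebook's actual code.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count tokens.

    Word order is discarded: only the counts survive.
    (Hypothetical helper; the notebook may tokenize differently.)
    """
    return Counter(text.lower().split())

# Two messages with the same words in a different order
# produce identical representations.
a = bag_of_words("win money now")
b = bag_of_words("now win money")
print(a == b)  # True
```

Under conditional independence, the score of a message for a class is then just the product (or, in log space, the sum) of per-word probabilities for that class.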

The following image shows the Naive Bayes algorithm for training and testing in text classification:
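
In code, the two phases might look roughly like the sketch below. It assumes add-one (Laplace) smoothing and log-space scoring, which are common choices for Naive Bayes text classification but are not confirmed by this README; `train_nb` and `classify_nb` are hypothetical names, and the tiny training set is made up for illustration.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label) pairs.

    Returns (log_prior, log_likelihood, vocab).
    Assumes add-one smoothing; the notebook may differ.
    """
    labels = [label for _, label in docs]
    vocab = {tok for tokens, _ in docs for tok in tokens}
    log_prior, log_likelihood = {}, {}
    for c in set(labels):
        # Prior: fraction of training documents in class c.
        log_prior[c] = math.log(labels.count(c) / len(docs))
        # Word counts over all documents of class c.
        counts = Counter(tok for tokens, label in docs if label == c for tok in tokens)
        total = sum(counts.values()) + len(vocab)  # add-one denominator
        log_likelihood[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return log_prior, log_likelihood, vocab

def classify_nb(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest log score; unknown words are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in vocab
        )
    return max(scores, key=scores.get)

# Toy data, invented for this example only.
train = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train_nb(train)
print(classify_nb("free money".split(), *model))     # spam
print(classify_nb("lunch meeting".split(), *model))  # ham
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long emails.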


At the end, a classification performance report is generated showing the confusion matrix, accuracy, precision, recall, and F1-score. The detector is currently trained on the Enron dataset, but it can be trained on any other email dataset by changing the respective paths.
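
As a rough illustration of how these metrics relate to the confusion matrix, the sketch below computes them from lists of true and predicted labels. The function name `evaluate` and the choice of "spam" as the positive class are assumptions; the notebook's own report code may be organized differently.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute confusion-matrix counts and derived metrics.

    Hypothetical helper; treats `positive` as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up labels for demonstration.
y_true = ["spam", "spam", "spam", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "spam"]
report = evaluate(y_true, y_pred)
print(report["confusion"])  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 1}
print(report["accuracy"])   # 0.6
```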


Usage 🔧

The program requires paths to the train and test folders, each of which contains spam and ham subfolders holding the respective email files used to build the datasets.

In Spam Ham Email Classification.ipynb, cell #5 contains the following code:

makeDatasets('train/spam', 'train/ham', 'test/spam', 'test/ham')

These are the paths to dataset files. Change these paths to train on any other dataset.
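
The README does not show what `makeDatasets` does internally. As a sketch of the folder layout it expects, a loader along the following lines would read one file per email and attach a label from the folder it came from. The names `load_folder` and `make_datasets_sketch`, the lenient `latin-1` decoding, and the returned `(text, label)` pairs are all assumptions made for illustration.

```python
from pathlib import Path

def load_folder(folder, label):
    """Read every file in `folder` and pair its text with `label`.

    Hypothetical helper, not the notebook's implementation.
    """
    docs = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # Email corpora often contain odd byte sequences, so decode leniently:
            # latin-1 maps every byte to a character and never raises.
            docs.append((path.read_text(encoding="latin-1"), label))
    return docs

def make_datasets_sketch(train_spam, train_ham, test_spam, test_ham):
    """Build labeled train and test lists from four folders."""
    train = load_folder(train_spam, "spam") + load_folder(train_ham, "ham")
    test = load_folder(test_spam, "spam") + load_folder(test_ham, "ham")
    return train, test
```

With a different dataset, only the four folder paths passed in would need to change, provided each folder holds one email per file.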


Author 👋

You can get in touch with me on my LinkedIn Profile:

Ahmad Shafique

LinkedIn Link

You can also follow my GitHub Profile to stay updated about my latest projects: GitHub Follow

If you liked the repo, please support it by giving it a star ⭐!


Contributions Welcome ✨

If you find a bug in the code or have an improvement in mind, feel free to open a pull request.


License 📄

MIT

Copyright (c) 2020, Ahmad Shafique
