This project demonstrates how to classify emails as spam or ham (not spam) using the Naive Bayes algorithm.
It is a beginner-friendly project that shows the core ideas of Natural Language Processing (NLP) and Machine Learning using simple Python libraries.
The notebook converts message text into TF-IDF features and applies a Multinomial Naive Bayes classifier to detect spam emails.
- Learn how Naive Bayes works for text classification.
- Preprocess text data using TF-IDF (Term Frequency-Inverse Document Frequency).
- Build a spam classifier using Scikit-learn.
- Evaluate model accuracy and visualize the confusion matrix.
- Test the model on unseen text samples.
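As a rough illustration of the TF-IDF objective above, the sketch below turns three messages into a numeric feature matrix with scikit-learn's `TfidfVectorizer`. The messages are hypothetical and are not taken from the notebook; the default vectorizer settings are an assumption as well.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical messages, used only to illustrate the transformation.
messages = [
    "win a free prize now",
    "free gift card waiting",
    "are you coming to the meeting",
]

# Learn the vocabulary and IDF weights, then build the document-term matrix.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(messages)

# vocabulary_ maps each token to its column index in the matrix.
free_col = vectorizer.vocabulary_["free"]
prize_col = vectorizer.vocabulary_["prize"]

print("matrix shape (messages, tokens):", features.shape)
print("idf of 'free': ", vectorizer.idf_[free_col])
print("idf of 'prize':", vectorizer.idf_[prize_col])
```

Because "free" appears in two of the three messages and "prize" in only one, "free" receives the lower IDF weight: words that are common across messages count for less than rare ones.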
A small demo dataset is included directly inside the notebook for simplicity.
It contains 10 example messages labeled as "spam" or "ham".
You can replace it with the UCI SMS Spam Collection Dataset for real-world training.
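A minimal sketch of how a larger dataset could be loaded in place of the demo data. It assumes the file is tab-separated with the label in the first column, the message in the second, and no header row, which is how the SMS Spam Collection is commonly distributed. The helper name `load_sms_spam` and the column names are hypothetical, and the in-memory sample stands in for a real file path.

```python
import csv
import io

import pandas as pd


def load_sms_spam(source):
    """Read a tab-separated 'label<TAB>message' file into a DataFrame.

    `source` may be a file path or a file-like object.
    """
    return pd.read_csv(
        source,
        sep="\t",
        header=None,                  # assumed: the file has no header row
        names=["label", "message"],   # hypothetical column names
        quoting=csv.QUOTE_NONE,       # keep quote characters inside messages
    )


# In-memory stand-in for a real file, so the sketch runs without a download.
sample_file = io.StringIO(
    "ham\tAre we still on for lunch?\n"
    "spam\tWin cash now, reply YES\n"
)
sample_df = load_sms_spam(sample_file)
print(sample_df)
```

With a downloaded copy of the dataset, the same helper would be called with the path to the file instead of the `io.StringIO` object.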
| Step | Description |
|---|---|
| 1 | Import Libraries: Load scikit-learn, pandas, and visualization tools. |
| 2 | Load Dataset: Create or import a labeled dataset of messages. |
| 3 | Preprocess Text: Convert messages into numerical features using TF-IDF. |
| 4 | Train Model: Apply Multinomial Naive Bayes to learn patterns. |
| 5 | Evaluate Model: Check accuracy, confusion matrix, and classification report. |
| 6 | Predict New Messages: Test the model on new unseen inputs. |
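The steps in the table can be sketched end to end as follows. This is an illustrative outline and not the notebook's exact code: the ten messages are hypothetical stand-ins for the demo dataset, and the 70/30 split, the `random_state`, and the use of `make_pipeline` are assumptions.

```python
# Step 1: import libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Step 2: load dataset (hypothetical messages, 5 spam and 5 ham)
data = pd.DataFrame({
    "message": [
        "Win a free prize now",
        "Claim your free gift card today",
        "Congratulations, you won a cash reward",
        "Urgent: your account has won a lottery",
        "Free entry to win a new phone",
        "Are you coming to the meeting tomorrow",
        "Please send me the project report",
        "Lunch at noon works for me",
        "Can we reschedule our call to Friday",
        "Thanks for your help with the homework",
    ],
    "label": ["spam"] * 5 + ["ham"] * 5,
})

# Hold out part of the data so the score reflects unseen messages.
X_train, X_test, y_train, y_test = train_test_split(
    data["message"], data["label"],
    test_size=0.3, random_state=42, stratify=data["label"],
)

# Steps 3 and 4: TF-IDF features feeding a Multinomial Naive Bayes model
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

# Step 5: evaluate on the held-out messages
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("accuracy on held-out messages:", accuracy)

# Step 6: predict new messages
new_messages = ["You have won a free gift card worth $500!"]
new_predictions = model.predict(new_messages)
print(list(zip(new_messages, new_predictions)))
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary is learned from the training split only and is reused unchanged when predicting new messages.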
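The Naive Bayes algorithm applies Bayes' Theorem under the assumption that features are conditionally independent given the class.
For text, this means each word contributes to the spam or ham score independently of the other words, a simplification that tends to work well in practice for word-frequency features.
Bayes' Theorem:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$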
In this project:
- $A$: Email is spam
- $B$: Words in the message
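To make the formula concrete, here is a small worked example for a single word. All probabilities are hypothetical numbers chosen for illustration, and the function name `posterior_spam` is not from the notebook.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """Return P(spam | word) using Bayes' Theorem.

    P(word) is expanded over the two classes:
    P(word) = P(word | spam) P(spam) + P(word | ham) P(ham)
    """
    p_ham = 1.0 - p_spam
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word


# Hypothetical inputs: 40% of emails are spam, the word "free" appears
# in 60% of spam emails and in 5% of ham emails.
result = posterior_spam(p_word_given_spam=0.6, p_word_given_ham=0.05, p_spam=0.4)
print(round(result, 4))  # 0.24 / 0.27, about 0.8889
```

Under these assumed numbers, seeing the word raises the probability of spam from the prior of 0.4 to roughly 0.89. A Naive Bayes classifier repeats this reasoning across all words in a message, multiplying the per-word likelihoods because of the independence assumption.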
✅ Model Accuracy: ~90-100% (on the 10-message sample data, so treat the figure as illustrative)
✅ Successfully predicts unseen messages like:
"You have won a free gift card worth $500!" → SPAM
"Are you coming to the meeting tomorrow?" → HAM
Confusion Matrix and Classification Report are included in the notebook.
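For readers who want to see what these two outputs contain, the sketch below computes them with scikit-learn on a handful of made-up labels. The `y_true` and `y_pred` lists are hypothetical and do not reproduce the notebook's results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true labels and model predictions for six messages.
y_true = ["spam", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "ham", "spam"]

# Fix the label order so rows and columns are [ham, spam].
labels = ["ham", "spam"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[3 0]
#  [1 2]]

accuracy = accuracy_score(y_true, y_pred)
print("accuracy:", accuracy)

# Per-class precision, recall, and F1 as a formatted text table.
report = classification_report(y_true, y_pred, labels=labels)
print(report)
```

In this made-up case one spam message is misclassified as ham, which shows up as the `1` in the second row, first column. The matrix can be passed to a plotting function such as Seaborn's `heatmap` to visualize it.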
- Python 3.x
- Pandas
- NumPy
- Scikit-learn
- Matplotlib / Seaborn
- Clone this repository

      git clone https://github.com/asimsheikh-coder/spam-email-detector-using-naive-bayes.git
      cd spam-email-detector-using-naive-bayes

- Install dependencies

      pip install pandas numpy scikit-learn matplotlib seaborn

- Run the notebook

      jupyter notebook Spam_Email_Detector_Naive_Bayes.ipynb
Asim Sheikh
12th Grade Student | Aspiring AI Engineer
Email: asimusmansheikh@gmail.com
GitHub: @asimsheikh-coder
Sheikh, A. Spam Email Detector using Naive Bayes. 2025. GitHub Repository.
https://github.com/asimsheikh-coder/spam-email-detector-using-naive-bayes
This project introduces the basics of text classification with Naive Bayes, a fundamental technique in Natural Language Processing.
It serves as an ideal starting point for beginners exploring AI, NLP, and Machine Learning.