Welcome to the Spam Detection Model project! This repository provides a machine learning solution for classifying email text as "Spam" or "Not Spam (Ham)" using natural language processing (NLP) preprocessing and a Logistic Regression classifier. It includes a Streamlit web application for real-time predictions and a Jupyter Notebook for experimentation and model exploration.
- Intuitive Web Interface: Easily interact with the model using a modern Streamlit app.
- Comprehensive Text Preprocessing: Utilizes NLTK for tokenization, stopword removal, and lemmatization to ensure high-quality input for the model.
- Accurate Machine Learning Model: Logistic Regression trained on vectorized email data for reliable spam detection.
- Jupyter Notebook Support: `spam_detection.ipynb` allows you to explore data, experiment with preprocessing, train models, and visualize results.
- Instant Results: Get immediate feedback on your email text classification.
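
The three preprocessing stages named in the feature list can be sketched without any dependencies. This is a minimal illustration, not the project's actual code: `preprocess`, `naive_lemmatize`, and `DEFAULT_STOPWORDS` are hypothetical names, and the tiny stopword set and plural-stripping rule are crude stand-ins for NLTK's stopword corpus and lemmatizer.

```python
import re

# Hypothetical, deliberately tiny stopword set (assumption for illustration);
# NLTK's English stopword corpus is much larger.
DEFAULT_STOPWORDS = frozenset(
    {"a", "an", "and", "are", "have", "is", "of", "the", "to", "you", "your"}
)


def naive_lemmatize(token):
    """Crude stand-in for a real lemmatizer: strip a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text, stop_words=DEFAULT_STOPWORDS, lemmatize=naive_lemmatize):
    """Tokenize, drop stopwords, and lemmatize; return a space-joined string."""
    # 1. Tokenization: lowercase the text and keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # 2. Stopword removal.
    kept = [token for token in tokens if token not in stop_words]
    # 3. Lemmatization (pluggable, so a real lemmatizer can be passed in).
    return " ".join(lemmatize(token) for token in kept)
```

With NLTK and its `stopwords` and `wordnet` data installed, the stand-ins could be swapped for `set(stopwords.words("english"))` and `WordNetLemmatizer().lemmatize` by passing them as `stop_words` and `lemmatize`.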
- Python 3.7 or higher
- Required Python packages (see `requirement.txt`)
- Clone the repository or download the source code:

      git clone <repository_url>

- Install dependencies:

      pip install -r requirement.txt

- Ensure `spam_detection_model.pkl` and `vectorizer.pkl` are present in the project directory.
- Launch the Streamlit app:

      streamlit run spam_detection.py

- Enter your email text in the provided text area and click Predict to view the classification result.
- Open `spam_detection.ipynb` in JupyterLab or Jupyter Notebook.
- Explore data preprocessing, model training, evaluation, and predictions step by step.
- Modify parameters or try different ML algorithms directly in the notebook.
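
As one way to try different algorithms in the notebook, the sketch below compares Logistic Regression with Multinomial Naive Bayes using cross-validated accuracy. The function name `compare_classifiers`, the choice of Naive Bayes as the alternative, and the use of `CountVectorizer` here are assumptions for illustration; the notebook itself may be organized differently.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def compare_classifiers(texts, labels, cv=3):
    """Return the mean cross-validated accuracy for each candidate model."""
    # Candidate models; Naive Bayes is an illustrative alternative (assumption).
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": MultinomialNB(),
    }
    scores = {}
    for name, classifier in candidates.items():
        # Vectorizing inside the pipeline keeps each fold's vocabulary
        # restricted to that fold's training texts.
        pipeline = make_pipeline(CountVectorizer(), classifier)
        fold_scores = cross_val_score(
            pipeline, texts, labels, cv=cv, scoring="accuracy"
        )
        scores[name] = float(fold_scores.mean())
    return scores
```

Calling `compare_classifiers(texts, labels)` with a list of email strings and their `Spam`/`Ham` labels returns a dictionary of mean accuracies, which makes it easy to add further candidates.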
If you want to train the spam detection model from scratch or update it with new data, follow these steps:
- Open `spam_detection.ipynb`.
- Load your dataset containing emails and their labels (`Spam` or `Ham`).
- Preprocess the text:
  - Tokenization
  - Stopword removal
  - Lemmatization
- Vectorize the text using `CountVectorizer` or `TfidfVectorizer`.
- Train a machine learning model, e.g., Logistic Regression, on the vectorized features.
- Evaluate the model using metrics like accuracy, precision, recall, and F1-score.
- Save the trained model and vectorizer using `joblib`:

      import joblib
      # model and vectorizer are the fitted objects from the previous steps
      joblib.dump(model, 'spam_detection_model.pkl')
      joblib.dump(vectorizer, 'vectorizer.pkl')

- Use the saved files in the Streamlit app for real-time predictions.
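
The vectorize, train, evaluate, and save steps above can be sketched as a single function. This is an illustration under stated assumptions rather than the notebook's actual code: the name `train_and_save`, its parameters, the `TfidfVectorizer` choice, the 75/25 split, and the `"Spam"` positive label are all hypothetical, and `texts` is assumed to be already preprocessed.

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def train_and_save(texts, labels,
                   model_path="spam_detection_model.pkl",
                   vectorizer_path="vectorizer.pkl",
                   pos_label="Spam", test_size=0.25, seed=42):
    """Fit a vectorizer and classifier, evaluate on a hold-out set, save both."""
    # Hold out part of the data; stratify to keep the Spam/Ham ratio.
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )

    # Fit the vectorizer on training texts only, then reuse it for the test set.
    vectorizer = TfidfVectorizer()
    train_features = vectorizer.fit_transform(x_train)
    test_features = vectorizer.transform(x_test)

    model = LogisticRegression(max_iter=1000)
    model.fit(train_features, y_train)

    # Accuracy, precision, recall, and F1 with pos_label as the positive class.
    predictions = model.predict(test_features)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, predictions, average="binary",
        pos_label=pos_label, zero_division=0,
    )
    metrics = {
        "accuracy": float(accuracy_score(y_test, predictions)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }

    # Persist both artifacts; the app needs the fitted vectorizer as well.
    joblib.dump(model, model_path)
    joblib.dump(vectorizer, vectorizer_path)
    return metrics
```

Saving the vectorizer alongside the model matters because predictions must use the same vocabulary the classifier was trained on.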
Subject: Congratulations! You have won a prize. Click here to claim your reward now.
Prediction: SPAM
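
A prediction like the one above could be reproduced outside the app roughly as follows. This is a sketch, not the contents of `spam_detection.py`: `load_artifacts` and `predict_label` are hypothetical helpers, and `spam_label="Spam"` is an assumption about how the training labels were encoded. The input text should go through the same preprocessing that was used during training.

```python
import joblib


def load_artifacts(model_path="spam_detection_model.pkl",
                   vectorizer_path="vectorizer.pkl"):
    """Load the fitted classifier and vectorizer saved with joblib.dump."""
    return joblib.load(model_path), joblib.load(vectorizer_path)


def predict_label(text, model, vectorizer, spam_label="Spam"):
    """Return "SPAM" or "HAM" for a single email text.

    spam_label is an assumption about the label encoding used in training.
    """
    # transform (not fit_transform) reuses the vocabulary learned in training.
    features = vectorizer.transform([text])
    predicted = model.predict(features)[0]
    return "SPAM" if predicted == spam_label else "HAM"
```

With both `.pkl` files in the working directory, `model, vectorizer = load_artifacts()` followed by `predict_label(email_text, model, vectorizer)` returns the label for `email_text`.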
- `spam_detection.py`: Streamlit application script
- `spam_detection.ipynb`: Jupyter Notebook for experimentation and model development
- `spam_detection_model.pkl`: Pre-trained machine learning model
- `vectorizer.pkl`: Text vectorizer for feature extraction
- `requirement.txt`: List of required Python packages
- `Spam_detection_model/README.md`: Project documentation
- `Spam_detection_model/LICENSE`: License information
This project is licensed under the MIT License. See the LICENSE file for details.
- Developed by LovishTech
- Built with Streamlit, scikit-learn, NLTK, pandas, joblib, and Jupyter Notebook