Email-Spam-Detection-Using-Machine-Learning

Overview

This project aims to classify emails as spam or non-spam (ham) using machine learning techniques. It involves preprocessing email data, engineering features, training a classification model, and evaluating its performance.

Dataset

The dataset used in this project consists of 5,728 emails obtained from various sources. It includes both spam and non-spam (ham) emails. The dataset has been preprocessed to remove duplicates and handle missing values.

Preprocessing

Duplicates Removal: Removed duplicate emails to ensure the integrity of the dataset.
Missing Values Handling: Handled missing values to maintain data quality.
Text Preprocessing: Removed punctuation and stopwords using the Natural Language Toolkit (NLTK) library.

Feature Engineering

The text data was transformed into a matrix of token counts using the CountVectorizer from scikit-learn. This process converts the text into numerical features suitable for machine learning algorithms.

Model Training

A Multinomial Naive Bayes classifier was trained on the preprocessed and feature-engineered dataset. The model was trained to learn patterns and relationships between features and target labels (spam or non-spam).

Model Evaluation

The performance of the trained model was evaluated using various metrics such as accuracy, precision, recall, and F1-score. Additionally, confusion matrices were generated to visualize the model's performance on both the training and testing datasets.

Results

Accuracy on Training Dataset: 99.2%
Accuracy on Testing Dataset: 98.8%

Technologies Used

Python
NumPy
pandas
NLTK
scikit-learn

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
email_spam_detection.ipynb		email_spam_detection.ipynb
emails.csv		emails.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Email-Spam-Detection-Using-Machine-Learning

Overview

Dataset

Preprocessing

Feature Engineering

Model Training

Model Evaluation

Results

Technologies Used

About

Uh oh!

Releases

Packages

Languages

Uh oh!

Uh oh!

DarshanVignesh/Email-Spam-Detection-Using-Machine-Learning

Folders and files

Latest commit

History

Repository files navigation

Email-Spam-Detection-Using-Machine-Learning

Overview

Dataset

Preprocessing

Feature Engineering

Model Training

Model Evaluation

Results

Technologies Used

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages