Spam-Detection-Text-Processing

This repository contains a Jupyter notebook pipeline for email spam detection using classical text preprocessing, feature extraction, and classification.

Contents

spam_NLP.csv - Original email messages labelled "spam" or "ham".
spam_NLP_cleaned.csv - Cleaned version of the text (lowercased, punctuation removed, etc.).
TEST_DATA_spam.csv - Smaller dataset used for quick testing and validation.
main.ipynb - Main notebook that loads the data, cleans it, extracts features, trains the classifier, and evaluates it.
logic.ipynb - Supporting notebook for experiments with alternative preprocessing and feature setups.

Features & What It Does

Text preprocessing: tokenization, lowercasing, stopword removal, punctuation removal, and possibly further cleaning (stemming or lemmatization, if included).
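
The preprocessing steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the code in main.ipynb: the function name clean_text and the tiny STOPWORDS set are assumptions made for the example, and stemming or lemmatization is left out.

```python
import string

# Tiny illustrative stopword list; an assumption, not the list used in the notebooks.
STOPWORDS = {"a", "an", "and", "the", "to", "is", "in", "of", "for", "you", "your"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = no_punct.split()  # simple whitespace tokenization
    kept = [token for token in tokens if token not in STOPWORDS]
    return " ".join(kept)


print(clean_text("Congratulations! You have WON a FREE prize, claim it now."))
# -> congratulations have won free prize claim it now
```

Whitespace splitting is the simplest possible tokenizer; a library tokenizer would handle contractions and URLs more carefully.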
Feature extraction:
- Bag of Words (count vectorization)
- TF-IDF vectorization
Classification:
- Multinomial Naive Bayes classifier
Evaluation:
- Accuracy
- Precision, Recall, F1-score
- Confusion matrix
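
As a rough sketch of how the feature extraction, classification, and evaluation steps above fit together in scikit-learn, the example below trains a Multinomial Naive Bayes model once on Bag of Words counts and once on TF-IDF weights. The toy messages, the evaluate helper, and the choice of "spam" as the positive class are assumptions for illustration; the notebooks may structure this differently.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy messages invented for illustration; not taken from spam_NLP.csv.
train_texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free reward today",
    "meeting moved to monday morning",
    "please review the attached report",
    "lunch tomorrow with the team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_texts = ["free prize offer", "report for the monday meeting"]
test_labels = ["spam", "ham"]


def evaluate(vectorizer):
    """Fit vectorizer + Multinomial Naive Bayes, then score on the test messages."""
    model = make_pipeline(vectorizer, MultinomialNB())
    model.fit(train_texts, train_labels)
    predicted = model.predict(test_texts)
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predicted, pos_label="spam", average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(test_labels, predicted),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true labels, columns are predictions, ordered ham then spam.
        "confusion_matrix": confusion_matrix(
            test_labels, predicted, labels=["ham", "spam"]
        ),
    }


results = {
    "bag_of_words": evaluate(CountVectorizer()),
    "tfidf": evaluate(TfidfVectorizer()),
}
for name, metrics in results.items():
    print(name, "accuracy:", metrics["accuracy"], "f1:", metrics["f1"])
    print(metrics["confusion_matrix"])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on training data only, so the test messages never influence the features.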

Requirements

Python 3.8 or later (recommended)
Required packages:
- pandas
- numpy
- scikit-learn
- matplotlib

Dataset

The dataset files (spam_NLP.csv, spam_NLP_cleaned.csv, TEST_DATA_spam.csv) contain email messages labeled as spam or ham.
The data is a processed version of an email spam dataset (5,796 rows).
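
A quick way to check the row count and the spam/ham balance is sketched below with pandas. The column name "label" and the helper label_summary are assumptions, since the CSV headers are not documented here; the in-memory sample stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd


def label_summary(csv_source, label_col="label"):
    """Return (row count, {label: count}) for a CSV path or file-like object.

    label_col is a hypothetical column name; adjust it to the real header.
    """
    frame = pd.read_csv(csv_source)
    counts = frame[label_col].value_counts()
    return len(frame), counts.to_dict()


# Stand-in data invented for illustration; not rows from the repository's CSVs.
sample_csv = io.StringIO(
    "label,text\n"
    "ham,see you at the meeting\n"
    "spam,win a free prize now\n"
    "ham,report attached\n"
)
n_rows, counts = label_summary(sample_csv)
print(n_rows, counts)
```

To inspect the real data, pass a path such as "spam_NLP.csv" along with the actual label column name, and compare the row count with the figure quoted above.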

Run

Clone the repository

   git clone https://github.com/Splendorius/Spam-Detection-Text-Processing.git
   cd Spam-Detection-Text-Processing

Install dependencies

   pip install pandas numpy scikit-learn matplotlib notebook

Launch Jupyter Notebook

   jupyter notebook

Open the notebook main.ipynb and run the cells top-to-bottom.
