This repository was archived by the owner on Sep 19, 2025. It is now read-only.

Splendorius/Spam-Detection-Text-Processing


Spam-Detection-Text-Processing

This repository contains a Jupyter notebook pipeline for email spam detection using classical text preprocessing, feature extraction, and classification.


Contents

  • spam_NLP.csv: original email messages labeled "spam" or "ham".
  • spam_NLP_cleaned.csv: cleaned version of the text (lowercased, punctuation removed, etc.).
  • TEST_DATA_spam.csv: smaller dataset used for quick testing and validation.
  • main.ipynb: main notebook that loads the data, cleans it, extracts features, trains the classifier, and evaluates it.
  • logic.ipynb: supporting notebook for experiments with alternative preprocessing and feature setups.
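
The cleaning that turns the raw messages into the cleaned file could look roughly like the sketch below. It is an illustration only, not the notebook's actual code: the function name `clean_text` and the tiny `STOPWORDS` set are assumptions made for this example, and a fuller stopword list (for instance the built-in English list in scikit-learn) could be swapped in.

```python
import string

# Hypothetical, deliberately tiny stopword set used only for illustration.
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "in"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    lowered = text.lower().translate(_PUNCT_TABLE)
    tokens = lowered.split()
    return " ".join(tok for tok in tokens if tok not in STOPWORDS)


print(clean_text("Claim the PRIZE and a Gift!"))  # prints: claim prize gift
```

Applied to a pandas column, this would be something like `df["text"].astype(str).map(clean_text)`, where the column name `text` is again an assumption.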

Features & What It Does

  • Text preprocessing: tokenization, lowercasing, stopword removal, and punctuation removal; further cleaning steps such as stemming or lemmatization may also be applied.
  • Feature extraction:
    • Bag of Words (count vectorization)
    • TF-IDF vectorization
  • Classification:
    • Multinomial Naive Bayes classifier
  • Evaluation:
    • Accuracy
    • Precision, Recall, F1-score
    • Confusion matrix
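
The feature extraction, classification, and evaluation steps listed above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the code in main.ipynb: the toy messages, the `build_model` helper, and the `use_tfidf` switch are all invented for the example, and the real notebook works on the CSV files rather than inline lists.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data standing in for the real dataset (assumption for illustration).
train_texts = [
    "win a free prize now",
    "claim your free cash reward",
    "cheap loans click here",
    "are we still meeting for lunch",
    "please review the attached report",
    "see you at the office tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_texts = ["free cash prize", "lunch at the office"]
test_labels = ["spam", "ham"]


def build_model(use_tfidf: bool = True):
    """Return a vectorizer + Multinomial Naive Bayes pipeline."""
    vectorizer = TfidfVectorizer() if use_tfidf else CountVectorizer()
    return make_pipeline(vectorizer, MultinomialNB())


model = build_model(use_tfidf=True)
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)

# Evaluation: accuracy, per-class precision/recall/F1, confusion matrix.
labels = ["ham", "spam"]
accuracy = accuracy_score(test_labels, predictions)
cm = confusion_matrix(test_labels, predictions, labels=labels)
report = classification_report(
    test_labels, predictions, labels=labels, zero_division=0
)

print(f"accuracy: {accuracy:.2f}")
print(cm)
print(report)
```

Passing `use_tfidf=False` switches the same pipeline to plain Bag of Words counts, which makes it easy to compare the two feature setups on identical data.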

Requirements

  • Python 3.8 or later (recommended)
  • Required packages:
    • pandas
    • numpy
    • scikit-learn
    • matplotlib

Dataset

The dataset files (spam_NLP.csv, spam_NLP_cleaned.csv, TEST_DATA_spam.csv) contain email messages labeled as spam or ham.
The data is a processed version of an email spam dataset (5,796 rows).
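
A quick way to inspect such a file is to load it with pandas and count the labels. The sketch below is illustrative only: the column names `text` and `label` and the helper `load_messages` are assumptions (check the actual CSV header before reusing them), and an in-memory sample stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd


def load_messages(source, text_col="text", label_col="label"):
    """Read a CSV, drop incomplete rows, and normalize the label values."""
    df = pd.read_csv(source)
    df = df.dropna(subset=[text_col, label_col]).copy()
    df[label_col] = df[label_col].astype(str).str.strip().str.lower()
    return df


# In-memory stand-in for a file such as spam_NLP.csv (hypothetical columns).
sample_csv = io.StringIO(
    "label,text\n"
    "ham,See you at lunch\n"
    "spam,Win a free prize now\n"
    "Ham ,Meeting moved to 3pm\n"
)

messages = load_messages(sample_csv)
counts = messages["label"].value_counts()
print(counts)
```

For the real data, a file path such as `"spam_NLP.csv"` would be passed in place of `sample_csv`.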


Run

  1. Clone the repository
     git clone https://github.com/Splendorius/Spam-Detection-Text-Processing.git
     cd Spam-Detection-Text-Processing
  2. Install dependencies
     pip install pandas numpy scikit-learn matplotlib
     If Jupyter is not already installed, also run: pip install notebook
  3. Launch Jupyter Notebook
     jupyter notebook
  4. Open the notebook main.ipynb and run the cells top to bottom.

About

Jupyter with spam detection
