ayush-agarwal-0502/Spam-Destroyer-NLP

Spam-Destroyer-NLP

Spam SMS/E-Mail Detection using Natural Language Processing

About the project / Summary :

Used NLP (Natural Language Processing) techniques in ML (Machine Learning) to detect whether an SMS/e-mail is spam or not. Preprocessing used tokenization, lemmatization, stop word removal and punctuation removal, with tools such as NLTK and regex. Models such as Multinomial Naive Bayes and Logistic Regression achieved an overall F1 score of 0.99. Also performed feature engineering with handcrafted features such as number of digits, e-mail length and number of punctuation marks, which further helped the predictions. Also generated word clouds for the different prediction classes.
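A rough sketch of the cleaning and handcrafted-feature steps described above, using only the Python standard library so it runs without downloads. The function names `clean_text` and `handcrafted_features`, the tiny `STOP_WORDS` set and the sample message are assumptions made for illustration; the project itself uses NLTK for stop words and lemmatization, and lemmatization is left out here.

```python
import re
import string

# Tiny illustrative stop-word set (assumption); the project uses NLTK's list.
STOP_WORDS = {"a", "an", "the", "is", "to", "you", "your", "and", "of", "for"}


def clean_text(message):
    """Lowercase, replace punctuation with spaces, split on whitespace,
    and drop stop words. Lemmatization is not performed in this sketch."""
    lowered = message.lower()
    no_punct = re.sub(f"[{re.escape(string.punctuation)}]", " ", lowered)
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


def handcrafted_features(message):
    """Count-based features computed on the raw (uncleaned) message."""
    return {
        "length": len(message),
        "num_digits": sum(ch.isdigit() for ch in message),
        "num_punctuation": sum(ch in string.punctuation for ch in message),
    }


sample = "WINNER!! Call 0800 now to claim your prize."
print(clean_text(sample))
# ['winner', 'call', '0800', 'now', 'claim', 'prize']
print(handcrafted_features(sample))
# {'length': 43, 'num_digits': 4, 'num_punctuation': 3}
```

The handcrafted features are computed before cleaning, since digits and punctuation are exactly what the cleaning step tends to remove.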

  • Field : NLP (Natural Language Processing)
  • Tools : NLTK, regex, scikit-learn, Python
  • Concepts : tokenization, lemmatization, stop words, Logistic Regression, Naive Bayes
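As a minimal sketch of how these pieces might fit together in scikit-learn, the snippet below trains a bag-of-words Multinomial Naive Bayes classifier. The six toy messages and their labels are made up for illustration and are not taken from the dataset, and the exact pipeline used in this project may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (assumption, not from the real dataset).
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim free reward now",
    "are we meeting for lunch today",
    "see you at home tonight",
    "can you call me after lunch",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns each message into word counts;
# MultinomialNB models those counts per class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

predictions = model.predict(["claim your free prize", "lunch at home today"])
print(list(predictions))
```

Swapping `MultinomialNB()` for `sklearn.linear_model.LogisticRegression()` in the same pipeline gives the second model family mentioned above.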

Results :

image image

Other Visualizations :

image image

Dataset :

https://www.kaggle.com/datasets/bagavathypriya/spam-ham-dataset (originally taken from the UCI Machine Learning Repository). Note that although the dataset says SMS, it closely resembles e-mail spam as well, and hence can also be used to train a model to detect spam e-mails :)