harshdarji23/Sms-spam-detector

Sms-spam-detector

Project Description:

In this repository, I have developed an SMS spam detector using logistic regression and PySpark. The main goal is to predict whether an SMS message is spam or not. Spam detection was one of the first use cases of data science and is still widely used, for example to filter emails.

Libraries Required:

  1. PySpark
  2. SQL
  3. Pandas
  4. NumPy
  5. Matplotlib
  6. Spark ML

Data Description:

The .csv file contains each message (text) and a label indicating whether it is spam or not (type).
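
As a rough illustration of that layout, the sketch below reads a two-column CSV with the standard library and turns each row into a (text, is_spam) pair. The column names text and type come from the description above; the sample rows, the "ham"/"spam" label values, and the helper name load_messages are assumptions made for this example, not taken from the repository's data.

```python
import csv
import io

# Made-up rows for illustration only. The column names follow the README;
# the "ham"/"spam" label values are an assumption.
SAMPLE_CSV = """type,text
ham,See you at lunch tomorrow
spam,WINNER! Claim your free prize now
"""


def load_messages(csv_file):
    """Return a list of (text, is_spam) pairs from a CSV with 'text' and 'type' columns."""
    reader = csv.DictReader(csv_file)
    return [
        (row["text"], row["type"].strip().lower() == "spam")
        for row in reader
    ]


messages = load_messages(io.StringIO(SAMPLE_CSV))
for text, is_spam in messages:
    print("spam" if is_spam else "ham", "|", text)
```

To read a real file, pass an open file handle instead of the in-memory string.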

Modeling:

  1. Feature Engineering: Tokenizer, CountVectorizer, TF-IDF, has_uppercase
  2. ML: Logistic Regression with cross-validation
  3. Feature Importance: Most positive/negative words
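
To make the feature engineering step more concrete, here is a minimal plain-Python sketch of what these features compute. It is not the repository's PySpark code: it only mirrors the idea of tokenizing, counting terms, and weighting counts by a smoothed inverse document frequency of the form log((N + 1) / (df + 1)), which is the form Spark ML documents for its IDF stage. The exact definition of has_uppercase is not given here, so the version below (a run of at least three capital letters) and the function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, similar in spirit to Spark ML's Tokenizer."""
    return text.lower().split()


def has_uppercase(text, min_run=3):
    """Return 1 if the text has a run of at least `min_run` capital letters, else 0.

    The threshold of 3 is an assumption made for this sketch.
    """
    return int(re.search(r"[A-Z]{%d,}" % min_run, text) is not None)


def tfidf_features(texts):
    """Return one {term: tf-idf weight} dict per input text."""
    # Term counts per document (the role CountVectorizer plays in the pipeline).
    docs = [Counter(tokenize(t)) for t in texts]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: log((N + 1) / (df + 1)).
    idf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in doc.items()} for doc in docs]


texts = ["free prize now", "see you now"]
features = tfidf_features(texts)
print(features[0])
print(has_uppercase("WINNER! call now"), has_uppercase("Hello there"))
```

A term that appears in every message, such as "now" in the two made-up texts, gets a weight of zero, which is why TF-IDF helps the classifier focus on words that separate spam from non-spam.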

Blog:

Read the entire article on Medium: https://towardsdatascience.com/sms-spam-detector-499f31515f14
