In this repository, I have developed an SMS spam detector using logistic regression and PySpark. The main goal is to predict whether an SMS text is spam or not. Spam detection was one of the first use cases of data science and is still widely used to filter emails.
- PySpark
- SQL
- Pandas
- NumPy
- Matplotlib
- Spark ML
The .csv file contains each message (text) and whether it is spam or not (type).
- Feature Engineering: Tokenizer, CountVectorizer, TF-IDF, has_uppercase
- ML: Logistic Regression with cross-validation
- Feature Importance: Most positive/negative words
Read the entire article on Medium: https://towardsdatascience.com/sms-spam-detector-499f31515f14