Spam Buster 3000

About

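This is a mini-project for SC1015 (Introduction to Data Science and Artificial Intelligence) that focuses on detecting spam messages in an SMS spam dataset. For a detailed walkthrough, please view the notebooks in numerical order:

1.SMS_spam_Data_Extraction.ipynb
2.SMS_spam_Data_Visualization.ipynb
3.SMS_spam_Machine_Learning.ipynb
4.SMS_spam_Resampling_and_Analysis.ipynb
5.SMS_spam_Product.ipynb
6.SMS_spam_Compiled.ipynb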

Contributors

@dylansiew - Model training, final product, slides and script
@integr8ti0n - Video
@ruochee723 - Data Cleaning and Extraction, slides and script

Problem Definition

How can we effectively identify spam messages using the attributes of text messages?
Which model predicts spam best, and can every model be used for this task?
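
As a rough illustration of what "attributes of text messages" can look like, the sketch below derives a few simple features from a raw message. The function name `extract_features`, the feature names, and the phone-number pattern (any run of 10 or more digits) are assumptions made for this example and are not taken from the project's extraction notebook.

```python
import re

# Assumption: treat any run of 10+ consecutive digits as a phone number.
PHONE_PATTERN = re.compile(r"\d{10,}")


def extract_features(message: str) -> dict:
    """Derive simple, hypothetical attributes from one text message."""
    return {
        "num_chars": len(message),
        "num_digits": sum(ch.isdigit() for ch in message),
        "num_uppercase": sum(ch.isupper() for ch in message),
        "has_phone": bool(PHONE_PATTERN.search(message)),
    }


features = extract_features("URGENT! Call 09061701461 to claim your prize")
```

Features like these can sit alongside the vectorized message text as inputs to a classifier.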

Models Used

Naive Bayes
Support Vector Machine
Random Forest Classifier
Logistic Regression
Model Ensemble
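
A minimal sketch of how these four models could be combined with scikit-learn is shown below. It assumes TF-IDF vectorization, hard (majority) voting, and a handful of made-up messages; the project's notebooks may use a different vectorizer, different hyperparameters, or a different way of combining the models.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy messages for illustration only; they are not rows from spam.csv.
texts = [
    "WINNER! Claim your free prize now",
    "Free entry in a weekly competition, text WIN to enter",
    "Urgent: your account has won a cash award, call now",
    "Are we still meeting for lunch today?",
    "I will call you when I get home",
    "Can you send me the notes from class?",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Vectorize the text, then let the four classifiers vote on each message.
ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("svm", SVC(kernel="linear")),
            ("rfc", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="hard",  # assumption: simple majority vote
    ),
)
ensemble.fit(texts, labels)

predictions = ensemble.predict(
    ["Claim your free prize now", "Are we still on for dinner?"]
)
```

Hard voting is used here because it needs no probability estimates from the SVM; soft voting would require `SVC(probability=True)`.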

Product

Spam Buster 3000 is a spam detection tool designed to keep your inbox free of unwanted and harmful messages. It uses a model ensemble of Naive Bayes, Support Vector Machine, Random Forest Classifier, and Logistic Regression, and it updates its dataframe with every new user input to improve accuracy and robustness over time. Combining the four models gives a more comprehensive analysis of incoming messages than any single model. Say goodbye to spam with Spam Buster 3000.
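
The behaviour described above (several classifiers vote, and every new input is recorded for later retraining) can be sketched in plain Python. Everything named here is hypothetical: the `SpamFilter` class, the `majority_vote` helper, the keyword rules standing in for trained models, and the choice to break ties in favour of "ham" are assumptions for illustration, not the product's actual code.

```python
from collections import Counter
from typing import Callable, List, Tuple

Classifier = Callable[[str], str]  # maps a message to "spam" or "ham"


def majority_vote(labels: List[str], tie_label: str = "ham") -> str:
    """Return the most common label; fall back to tie_label on a tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


class SpamFilter:
    """Hypothetical wrapper: vote over classifiers and log every input."""

    def __init__(self, classifiers: List[Classifier]) -> None:
        self.classifiers = list(classifiers)
        self.rows: List[Tuple[str, str]] = []  # stand-in for the dataframe

    def classify(self, message: str) -> str:
        label = majority_vote([clf(message) for clf in self.classifiers])
        self.rows.append((message, label))  # kept for later retraining
        return label


# Toy rules standing in for trained models.
def mentions_free(message: str) -> str:
    return "spam" if "free" in message.lower() else "ham"


def many_digits(message: str) -> str:
    return "spam" if sum(ch.isdigit() for ch in message) >= 5 else "ham"


def all_caps(message: str) -> str:
    return "spam" if message.isupper() else "ham"


spam_filter = SpamFilter([mentions_free, many_digits, all_caps])
label = spam_filter.classify("FREE entry! Call 09061701461 now")
```

With four real models a 2-2 split is possible, so some tie-breaking rule is needed; defaulting to "ham" favours fewer false positives, but that is a design choice made for this sketch.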

Conclusion

All four models performed well at predicting spam messages, with low false negative and false positive rates.
Support Vector Machine performed the best of the four individual models (97.7% accuracy), and there is a logistic correlation between the presence of phone numbers and a message being spam.
Running K-Fold resampling on the models produced a more accurate measure of their performance.
The Model Ensemble performed better than each of the four individual models (98.4% accuracy), as it combines their predictions.
It is possible to predict spam messages, given sufficiently large datasets for the models to train on.
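
The K-Fold idea mentioned above can be sketched in plain Python: the data is split into k folds, each fold serves once as the test set, and the scores are averaged. The helper names (`k_fold_indices`, `cross_val_accuracy`, `fit_majority`) and the majority-class baseline are assumptions for illustration. This is a plain shuffled split, not a stratified one, and the source does not say which variant the project used.

```python
import random
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_val_accuracy(X: List[str], y: List[str], fit: Callable, k: int = 5) -> float:
    """Average test accuracy over k folds; fit returns a predict function."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(X_train: List[str], y_train: List[str]) -> Callable[[str], str]:
    """Hypothetical baseline: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda message: majority


messages = [f"message {i}" for i in range(10)]
labels = ["ham"] * 8 + ["spam"] * 2
baseline_accuracy = cross_val_accuracy(messages, labels, fit_majority, k=5)
```

Averaging over folds means every message is used for testing exactly once, which gives a steadier estimate than a single train/test split.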

What did we learn from this project?

Handling imbalanced datasets using resampling methods like K-Fold
Logistic Regression, Naive Bayes, SVC and RandomForestClassifier from sklearn
Other packages such as tqdm, Figlet and Wordcloud
Collaborating using GitHub
Concepts such as accuracy, vectorizing, and F1 score
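
To make the accuracy and F1 score concepts concrete, the sketch below computes both from true and predicted labels, treating "spam" as the positive class. The function name `classification_metrics` and the sample labels are made up for this example; in practice sklearn.metrics provides equivalent functions.

```python
from typing import Dict, List


def classification_metrics(
    y_true: List[str], y_pred: List[str], positive: str = "spam"
) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["spam", "spam", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam"]
metrics = classification_metrics(y_true, y_pred)
```

On an imbalanced dataset, accuracy alone can look high even when most spam is missed, which is why F1 is reported alongside it.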
