This project tackles the prevalent problem of spam messages in the telecommunications sector, specifically for Connect5G. The goal was to use machine learning to build an efficient system that distinguishes spam from genuine messages, improving the user experience without compromising privacy.
Despite offering premium services, Connect5G faced rising customer dissatisfaction due to spam. The objective of this project was to devise a machine learning solution that automatically classifies messages as spam or ham (legitimate), giving users a smoother communication experience.
- Data Preparation: Cleaning, exploring, and tokenizing a comprehensive dataset of spam and genuine messages.
- Modeling Techniques: Training K-Nearest Neighbors (KNN) and Decision Tree classifiers to classify messages. Class imbalance in the dataset was addressed using the Synthetic Minority Over-sampling Technique (SMOTE).
- The implementation of SMOTE significantly improved model balance and accuracy.
- Decision Tree classifiers, when combined with SMOTE, offered rapid prediction times suitable for real-time spam detection.
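
The steps above (tokenizing, oversampling the minority class, classifying) can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the toy messages, the function names (`tokenize`, `build_vocab`, `to_vector`, `smote`, `knn_predict`), and the parameter values are all assumptions made for illustration. The `smote` function shows only the core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours; a real pipeline would more likely rely on established libraries such as scikit-learn and imbalanced-learn.

```python
import math
import random
import re
from collections import Counter


def tokenize(message):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def build_vocab(messages):
    """Sorted list of every distinct token seen in the training messages."""
    return sorted({tok for m in messages for tok in tokenize(m)})


def to_vector(message, vocab):
    """Bag-of-words counts over vocab; tokens outside vocab are ignored."""
    counts = Counter(tokenize(message))
    return [float(counts[word]) for word in vocab]


def smote(minority, n_new, k=2, seed=0):
    """Core SMOTE idea: each synthetic point lies on the line segment between
    a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data (not from the project's dataset): 6 ham, 2 spam.
messages = [
    "Are we still meeting for lunch today",
    "Can you call me when you get home",
    "See you at the gym tomorrow morning",
    "Thanks for the notes from class",
    "Happy birthday hope you have a great day",
    "I will be late for dinner tonight",
    "WIN a FREE prize now, claim your cash",
    "FREE entry, text WIN to claim your prize",
]
labels = ["ham"] * 6 + ["spam"] * 2

vocab = build_vocab(messages)
X = [to_vector(m, vocab) for m in messages]

# Oversample spam until both classes have the same number of samples.
spam_vectors = [v for v, y in zip(X, labels) if y == "spam"]
n_needed = labels.count("ham") - len(spam_vectors)
synthetic = smote(spam_vectors, n_new=n_needed, k=1)

balanced_X = X + synthetic
balanced_y = labels + ["spam"] * len(synthetic)

print(Counter(balanced_y))
print(knn_predict(balanced_X, balanced_y, to_vector("claim your free prize", vocab)))
print(knn_predict(balanced_X, balanced_y, to_vector("see you at lunch tomorrow", vocab)))
```

In this sketch only the training vectors are oversampled; new messages are vectorized with the same vocabulary and classified as they arrive. A Decision Tree could replace `knn_predict` at the same point in the flow.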
This project underscores the potential of machine learning in solving real-world problems, specifically in improving digital communication security. It highlights the necessity of continuous model monitoring and updating to adapt to evolving spam techniques.
Special thanks to RMIT University and Connect5G for the opportunity to work on this impactful project.