
The objective of the project was to build a model that accurately differentiates spam messages from ham ones. In doing so, I tried to showcase the ability of the Naive Bayes algorithm to accurately predict whether a message is spam or ham.


S-B-Iqbal/spam-filter


Spam Filter

Distinguishing spam messages from ham using Naive Bayes classification.

1. Introduction

This project demonstrates the use of the Naive Bayes algorithm to separate spam messages from ham (legitimate) ones.

2. Data Collection

 URL : http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/

 Source : Gómez JM, Almeida TA, Yamakami A. On the validity of a new SMS spam collection. Proceedings of the 11th IEEE International Conference on Machine Learning and Applications. 2012.

3. Implementation

1. Step-1 : Exploring and preparing the data. The data is loaded into the R environment from the csv file.
2. Step-2 : Cleaning and standardizing the data. The popular “tm” package is used for cleaning and standardizing the data. This includes creating an SMS corpus. Then, the following cleaning steps are executed:
            i.	Transformation to lower case.
            ii.	Removal of numbers.
            iii.	Removal of stop words such as ‘to’, ‘and’, ‘but’ and ‘or’ using the removeWords() function. stopwords() supplies the list of ‘stop words’.
            iv.	Removal of punctuation.
            v.	Stemming, i.e., transforming words into their base form using the “SnowballC” package. Ex: learning, learned, learns become learn.
3. Step-3 : Training a model on the data.
4. Step-4 : Evaluation of model performance.
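
The cleaning steps in Step-2 can be sketched outside R as well. The block below is a minimal, standard-library Python illustration of the same five transformations, not the project's R code. The names `STOP_WORDS`, `SUFFIXES`, `crude_stem` and `clean_message` are hypothetical, the stop-word list is deliberately tiny, and the suffix stripping is only a rough stand-in for a real Snowball stemmer.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"to", "and", "but", "or"}

# Crude suffix stripping as a stand-in for a real stemmer (an assumption).
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_message(text):
    """Apply the five cleaning steps in order and return a list of tokens."""
    text = text.lower()                      # i. lower case
    text = re.sub(r"\d+", "", text)          # ii. remove numbers
    words = [
        w for w in text.split()              # iii. remove stop words
        if w.strip(string.punctuation) not in STOP_WORDS
    ]
    text = " ".join(words)
    # iv. remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [crude_stem(w) for w in text.split()]  # v. stemming


tokens = clean_message("Learning, learned and learns 2 WIN!")
print(tokens)
```

Each message becomes a list of normalized tokens, which is the form a word-count based classifier needs as input.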
4. Summary
 The use of the Naive Bayes classifier is demonstrated for text classification. 
 The text data was prepared for analysis using specialized R packages for text processing and visualization. 
 Finally, the model correctly classified 98 percent of all the messages as spam or ham.
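
To make the training and evaluation steps concrete, here is a small standard-library Python sketch of a Naive Bayes text classifier and an accuracy check. It assumes a multinomial variant with add-one (Laplace) smoothing, which may differ from what the project's R code does. The function names and the four toy messages are hypothetical and are not taken from the SMS collection.

```python
import math
from collections import Counter


def train_naive_bayes(messages, labels):
    """Count messages per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in zip(messages, labels):
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_messages in class_counts.items():
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(n_messages / total_messages)
        total_words = sum(word_counts[label].values())
        for word in tokens:
            if word not in vocabulary:
                continue  # skip words never seen during training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, messages, labels):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(predict(model, m) == y for m, y in zip(messages, labels))
    return correct / len(labels)


# Toy, made-up training data (an assumption for illustration only).
train_messages = [
    ["win", "free", "prize"],
    ["free", "cash", "now"],
    ["see", "you", "at", "lunch"],
    ["call", "me", "at", "home"],
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_messages, train_labels)
print(predict(model, ["free", "prize"]))
print(accuracy(model, train_messages, train_labels))
```

In practice accuracy should be measured on messages held out from training, since scoring the training data itself overstates performance.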
