MelihGulum/Email-Spam-Detection

Email Spam Detection

We all receive a lot of emails every day, and many of them are meaningless or irrelevant. We call such emails "spam"; legitimate emails are called "ham". So, would you like to know which emails are spam and which are ham?

DATASET

The dataset consists of two classes, "ham" and "spam". It contains 4825 ham messages and 747 spam messages (5572 in total), so it is heavily imbalanced: about 87% of the messages are ham.
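
To make the imbalance concrete, the short sketch below computes the class shares from the counts above, along with the accuracy of a trivial classifier that always answers "ham". The variable names are illustrative and not taken from the repository.

```python
# Class counts as stated in this README.
counts = {"ham": 4825, "spam": 747}
total = sum(counts.values())

# Share of each class in the dataset.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

for label, share in shares.items():
    print(f"{label}: {counts[label]} messages ({share:.1%})")
print(f"always-'{majority_label}' baseline accuracy: {baseline_accuracy:.1%}")
```

Any model's accuracy should be read against this baseline. With classes this uneven, precision and recall on the spam class are often more informative than accuracy alone.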

The following two figures show word cloud representations of the spam and ham messages.

TRAINING

We trained the following machine learning algorithms on the dataset:

  • Naive Bayes
  • Support Vector Machine
  • KNN
  • Decision Tree
  • Random Forest
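
As a rough illustration of how these five algorithms could be trained and compared, here is a minimal scikit-learn sketch. It is not the repository's actual training code: the toy corpus, the bag-of-words features from `CountVectorizer`, the 75/25 split, and the `random_state` values are all assumptions made for the example. The `SVC` and `KNeighborsClassifier` settings follow the model names printed in the prediction output of this README.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus standing in for the real dataset.
ham = [
    "are we still meeting for lunch today",
    "please send me the report by friday",
    "thanks for your help with the project",
    "can you call me when you get home",
    "the meeting is moved to three pm",
    "happy birthday hope you have a great day",
    "see you at the gym tomorrow morning",
    "i will be late for dinner tonight",
]
spam = [
    "win a free prize now click here",
    "congratulations you won a cash reward claim now",
    "free entry to win tickets text now",
    "urgent your account won a free bonus",
    "claim your free gift card today click",
    "cheap loans approved instantly apply now free",
    "you have been selected for a cash prize",
    "limited offer win money now call free",
]
texts = ham + spam
labels = ["ham"] * len(ham) + ["spam"] * len(spam)

# Stratify so both classes appear in the training and test sets.
train_txt, test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Assumption: bag-of-words counts as features.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_txt)
X_test = vectorizer.transform(test_txt)

models = [
    MultinomialNB(),
    SVC(C=1000, gamma=0.001),
    KNeighborsClassifier(n_neighbors=3),
    DecisionTreeClassifier(random_state=0),  # random_state added for reproducibility
    RandomForestClassifier(random_state=0),  # random_state added for reproducibility
]

# Fit every model on the same features and record its test accuracy.
accuracies = {}
for model in models:
    model.fit(X_train, y_train)
    accuracies[type(model).__name__] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}")
```

Accuracies from a corpus this small say nothing about the real dataset; the sketch only shows the shape of the comparison.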

Below, you can see the accuracy of each algorithm.

You can also run a prediction with every algorithm, or choose a single algorithm for prediction.

MultinomialNB() This is a Real email 

SVC(C=1000, gamma=0.001) This is a Real email 

KNeighborsClassifier(n_neighbors=3) This is a Real email 

DecisionTreeClassifier() This is a Real email 

RandomForestClassifier() This is a Real email
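
Output of this shape could be produced by a loop like the one sketched below. This is an assumed reconstruction, not the repository's code: the helper `predict_with_all`, the toy training corpus, the `CountVectorizer` features, and the wording of the spam message are hypothetical. Only the "This is a Real email" text and the model settings come from the output above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus standing in for the real dataset.
texts = [
    "are we still meeting for lunch today",
    "please send me the report by friday",
    "thanks for your help with the project",
    "can you call me when you get home",
    "the meeting is moved to three pm",
    "see you at the gym tomorrow morning",
    "win a free prize now click here",
    "congratulations you won a cash reward claim now",
    "free entry to win tickets text now",
    "urgent your account won a free bonus",
    "claim your free gift card today click",
    "limited offer win money now call free",
]
labels = ["ham"] * 6 + ["spam"] * 6

# "Real" wording is from the README output; the spam wording is an assumption.
MESSAGES = {"ham": "This is a Real email", "spam": "This is a Spam email"}

classifiers = [
    MultinomialNB(),
    SVC(C=1000, gamma=0.001),
    KNeighborsClassifier(n_neighbors=3),
    DecisionTreeClassifier(),
    RandomForestClassifier(),
]

# Each pipeline vectorizes raw text and then applies one classifier.
pipelines = [make_pipeline(CountVectorizer(), clf).fit(texts, labels) for clf in classifiers]


def predict_with_all(pipelines, text):
    """Classify one email with every pipeline and return {model name: message}."""
    results = {}
    for pipe in pipelines:
        clf = pipe.steps[-1][1]  # the classifier is the last pipeline step
        label = pipe.predict([text])[0]
        results[type(clf).__name__] = MESSAGES[label]
        print(clf, MESSAGES[label])
    return results


results = predict_with_all(pipelines, "can we meet for lunch tomorrow")
```

To predict with a single algorithm instead, pass a one-element list such as `pipelines[:1]`.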