# 📌 Introduction to This Notebook
#### The aim of this study is to classify spam emails using different machine learning algorithms. For this purpose, I will use the following classifiers:
### * Multinomial Naive Bayes Classifier
### * Support Vector Machine Classifier with a Radial Basis Function (RBF) kernel
### * k-Nearest Neighbors Classifier (KNN)
### * Decision Tree Classifier
### * Random Forest Classifier
#### I will also give a short description of each algorithm.

# 📑 About the Dataset:
### - The used dataset is a CSV file.
### - It contains 5573 individual emails.
### - Each email is labeled as either ham (legitimate) or spam.


# 📥 Download Email Datasets:
## Here are the dataset download links: 👇
### - From Kaggle => "https://www.kaggle.com/ashfakyeafi/spam-email-classification"
### - From my GitHub => "https://github.com/AshfakYeafi/Spam-Email-Classifier/blob/main/emai.csv"

# 📩 Importing the Libraries


In [None]:
# linear algebra
import numpy as np 

# data processing
import pandas as pd 

# data visualization
import seaborn as sns
%matplotlib inline
from matplotlib import pyplot as plt

# Algorithms
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Preprocessing, splitting and evaluation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, confusion_matrix


# 💻 Load and Read the Dataset


In [None]:
#read the CSV file
df=pd.read_csv("../input/spam-email-classification/email.csv")

In [None]:
#Show the first 5 rows
df.head()

In [None]:
df.info()

# Categorize the Dataframe

In [None]:
#Encode the labels: 1 = spam, 0 = ham
df['spam']=df['Category'].apply(lambda x: 1 if x=='spam' else 0)
df.head()

In [None]:
X=df['Message']
Y=df['spam']

# Split train and test data


In [None]:
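#Hold out 25% of the emails (the default) for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size=0.25,random_state=42)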

# Naive Bayes Algorithm

![HyPUqwF.png](attachment:e9b19d10-e044-4d35-8295-33152f1d3245.png)

![download (9).png](attachment:d07cee17-9c9f-4b0b-ba54-14d1f017ae67.png)
<br>
## <b>What is the Naive Bayes algorithm?</b><br>
### It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
## <b>What is a Naive Bayes classifier?</b>
### Naive Bayes is a statistical classification technique based on Bayes' Theorem and one of the simplest supervised learning algorithms. Naive Bayes classifiers are fast to train and to apply, and they scale well to large datasets. Despite their simple assumptions, they often perform well on text classification tasks such as spam filtering.

### A Naive Bayes classifier assumes that the effect of a particular feature on a class is independent of the other features. For example, whether a loan applicant is desirable may depend on his/her income, previous loan and transaction history, age, and location. Even if these features are interdependent, they are still treated as independent. This assumption simplifies computation, which is why the method is called naive. The assumption is known as class conditional independence.
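
The following is a minimal from-scratch sketch of a multinomial Naive Bayes classifier that makes the independence assumption concrete. It runs on a tiny, made-up set of messages, not the notebook's dataset. Each word contributes its own likelihood P(word | class), and these likelihoods are simply multiplied (added in log space) as if the words were independent. Laplace smoothing with alpha=1 keeps an unseen word from zeroing out a class; alpha=1 is also the default of scikit-learn's MultinomialNB. The whitespace tokenizer and the helper names are assumptions for illustration.

```python
import math
from collections import Counter

# Tiny hypothetical training set of (message, label) pairs, NOT taken from email.csv
train = [
    ("win a free prize now", "spam"),
    ("free offer just for you", "spam"),
    ("are we meeting for lunch", "ham"),
    ("see you at the game tomorrow", "ham"),
]

def tokenize(text):
    # Lowercase and split on whitespace (a simplification of CountVectorizer)
    return text.lower().split()

# Count words per class and messages per class
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    word_counts[label].update(tokenize(text))
    class_counts[label] += 1

vocab = set(w for counts in word_counts.values() for w in counts)

def log_posterior(text, label, alpha=1.0):
    """Unnormalised log P(label | text) under the naive independence assumption."""
    # Log prior: fraction of training messages that carry this label
    score = math.log(class_counts[label] / sum(class_counts.values()))
    total = sum(word_counts[label].values())
    for word in tokenize(text):
        # Laplace-smoothed P(word | label); each word is scored independently,
        # which is exactly the "naive" assumption
        likelihood = (word_counts[label][word] + alpha) / (total + alpha * len(vocab))
        score += math.log(likelihood)
    return score

def predict(text):
    # Pick the class with the highest (log) posterior
    return max(("spam", "ham"), key=lambda label: log_posterior(text, label))

print(predict("free prize offer"))  # spam
print(predict("lunch tomorrow"))    # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.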

<center>
<img src="https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1543836882/image_3_ijznzs.png">
</center>
<ul>
<li>P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior probability of h.</li>
<li>P(D): the probability of the data (regardless of the hypothesis). This is known as the evidence or marginal likelihood.</li>
<li>P(h|D): the probability of hypothesis h given the data D. This is known as the posterior probability.</li>
<li>P(D|h): the probability of data D given that hypothesis h was true. This is known as the likelihood.</li>
</ul>
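
Here is a worked example of the formula. Suppose, purely hypothetically, that 13% of emails are spam, and that the word "free" appears in 30% of spam emails and in 2% of ham emails. The evidence P(D) comes from the law of total probability over both classes. The posterior then follows directly from Bayes' Theorem. These numbers are assumptions, not statistics measured on email.csv.

```python
# Hypothetical numbers for illustration only (not measured on email.csv)
p_spam = 0.13             # P(h): prior probability that an email is spam
p_word_given_spam = 0.30  # P(D|h): likelihood of seeing "free" in a spam email
p_word_given_ham = 0.02   # P(D|not h): likelihood of seeing "free" in a ham email

# P(D): evidence, via the law of total probability over both classes
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(h|D): posterior probability that the email is spam, given it contains "free"
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | 'free') = {p_spam_given_word:.3f}")  # P(spam | 'free') = 0.691
```

Under these assumed numbers, seeing "free" raises the probability of spam from 13% to roughly 69%.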

<br>
<br>

![5-Figure1-1.png](attachment:4730ebe9-0b2a-4c57-a0ec-efd3de366f63.png)





# Create Classifier for Naive Bayes
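
Every pipeline in this notebook starts with CountVectorizer, which turns each email into a vector of word counts. The pure-Python sketch below approximates what it does under its default settings. Those defaults are lowercasing and the token pattern `r"(?u)\b\w\w+\b"`, which keeps tokens of two or more word characters. Columns are ordered by the alphabetically sorted vocabulary. The helper names `tokenize`, `fit_vocabulary` and `transform` are hypothetical and not part of scikit-learn.

```python
import re
from collections import Counter

# Same default token pattern as CountVectorizer: words of two or more characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    # Lowercase first, as CountVectorizer does by default
    return TOKEN_PATTERN.findall(text.lower())

def fit_vocabulary(docs):
    # Map each word to a column index, using alphabetical order
    words = sorted({w for doc in docs for w in tokenize(doc)})
    return {w: i for i, w in enumerate(words)}

def transform(docs, vocabulary):
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # One column per vocabulary word; words unseen during fitting are ignored
        rows.append([counts.get(w, 0) for w in vocabulary])
    return rows

docs = ["Free prize! Claim your free prize now", "See you at the game"]
vocab = fit_vocabulary(docs)
print(list(vocab))
# ['at', 'claim', 'free', 'game', 'now', 'prize', 'see', 'the', 'you', 'your']
print(transform(["free game, free lunch"], vocab))
# [[0, 0, 2, 1, 0, 0, 0, 0, 0, 0]]
```

Because the vocabulary is fixed when the pipeline is fitted, words that appear only in test emails (such as "lunch" here) do not add new columns.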


In [None]:
#Define the Naive Bayes pipeline: word counts followed by a multinomial Naive Bayes model
clf_NaiveBaised= Pipeline([
    ('vectorizer', CountVectorizer()),
    ('nd', MultinomialNB())
])

In [None]:
#Fit the pipeline on the training data
clf_NaiveBaised.fit(X_train,y_train)

In [None]:
#Make predictions on X_test
y_pred_NB=clf_NaiveBaised.predict(X_test)

In [None]:
conf_mat_NB=confusion_matrix(y_test, y_pred_NB)

## Plot Confusion Matrix
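
scikit-learn's `confusion_matrix` places the true labels on the rows and the predicted labels on the columns. With labels 0 = ham and 1 = spam, the heatmap therefore reads [[TN, FP], [FN, TP]]. The short pure-Python sketch below rebuilds that layout on a few made-up labels; the function `confusion_counts` is hypothetical. It also shows that accuracy is just the diagonal divided by the total, which is what `accuracy_score` reports.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = ham, 1 = spam).

    Rows are true labels and columns are predicted labels, the same layout
    as sklearn.metrics.confusion_matrix for labels [0, 1].
    """
    matrix = [[0, 0], [0, 0]]
    for true_label, predicted in zip(y_true, y_pred):
        matrix[true_label][predicted] += 1
    return matrix

# Hypothetical labels, not results from this notebook
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

(tn, fp), (fn, tp) = confusion_counts(y_true, y_pred)
accuracy = (tn + tp) / len(y_true)  # diagonal / total, as accuracy_score computes
print(tn, fp, fn, tp, accuracy)     # 4 1 1 2 0.75
```

The off-diagonal cells are the mistakes. FP counts ham emails wrongly flagged as spam, and FN counts spam emails that slipped through.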

In [None]:
plt.figure(figsize=(10,8))
sns.heatmap(conf_mat_NB,annot=True,fmt='d')

In [None]:
naive_acc=accuracy_score(y_test,y_pred_NB)
naive_acc

# Support Vector Machine Classifier

![HyPUqwF.png](attachment:3dc7fe6f-1982-4969-ac78-421cf666765a.png)

### Support Vector Machines
Support Vector Machines are generally considered a classification approach, but they can be employed for both classification and regression problems. They can handle multiple continuous and categorical variables, although categorical variables must first be encoded numerically. SVM constructs a hyperplane in multidimensional space to separate different classes. It finds the optimal hyperplane iteratively, by solving an optimization problem that minimizes classification error. The core idea of SVM is to find the maximum marginal hyperplane (MMH) that best divides the dataset into classes.

![index3_souoaz.png](attachment:4060dd34-14a4-47b6-b4eb-5b7756120359.png)

## Support Vectors
Support vectors are the data points closest to the hyperplane. They determine the margin, and the position of the separating hyperplane depends on them. This makes them the points most relevant to the construction of the classifier.

## Hyperplane
A hyperplane is a decision plane that separates objects belonging to different classes.

## Margin
The margin is the gap between the two lines that pass through the closest points of each class. It is calculated as the perpendicular distance from the separating line to the support vectors (the closest points). A larger margin between the classes is considered a good margin; a smaller margin is considered a bad one.
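
The margin can be computed directly. This sketch uses four made-up 2-D points and the hyperplane x1 + x2 - 6 = 0, that is, w = (1, 1) and b = -6. For these four points, that line happens to be the maximum-margin hyperplane. The perpendicular distance of a point x to the hyperplane w . x + b = 0 is |w . x + b| / ||w||. The closest points are the support vectors, and the full margin is twice that distance.

```python
import math

# Made-up 2-D points labelled -1 (ham) or +1 (spam)
points = [((1.0, 1.0), -1), ((2.0, 2.0), -1), ((4.0, 4.0), 1), ((5.0, 5.0), 1)]

# Hyperplane w . x + b = 0, here x1 + x2 - 6 = 0
w, b = (1.0, 1.0), -6.0

def distance(x, w, b):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0."""
    return abs(w[0] * x[0] + w[1] * x[1] + b) / math.hypot(*w)

dists = [distance(x, w, b) for x, _ in points]
closest = min(dists)

# Support vectors: the points lying at the minimum distance from the hyperplane
support_vectors = [x for (x, _), d in zip(points, dists) if math.isclose(d, closest)]
print(support_vectors)        # [(2.0, 2.0), (4.0, 4.0)]
print(round(2 * closest, 3))  # full margin width: 2.828
```

Moving the farther points (1, 1) and (5, 5) would not change this hyperplane, because only the support vectors determine it.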


## How does SVM work?
The main objective is to separate the given dataset in the best possible way. The distance between the hyperplane and the nearest points of either class is known as the margin. The objective is to select a hyperplane with the maximum possible margin between support vectors in the given dataset. SVM searches for the maximum marginal hyperplane in the following steps:

1. Generate hyperplanes that separate the classes in the best way. The left-hand figure shows three hyperplanes: black, blue, and orange. The blue and orange hyperplanes have higher classification error, while the black one separates the two classes correctly.

2. Select the hyperplane with the maximum distance from the nearest data points of either class, as shown in the right-hand figure.<br>
![index2_ub1uzd.png](attachment:827ffa0e-00d6-42cb-bb22-c8a565e03fa4.png)
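
The two steps above can be sketched on toy data. First, discard candidate hyperplanes that put any point on the wrong side. Then keep the one whose nearest point is farthest away. The points and candidate hyperplanes here are hand-picked and hypothetical. A real SVM solver, such as the one behind scikit-learn's SVC, optimizes w and b directly instead of scoring a fixed list.

```python
import math

# Made-up 2-D points labelled -1 (ham) or +1 (spam)
points = [((1.0, 1.0), -1), ((2.0, 2.0), -1), ((4.0, 4.0), 1), ((5.0, 5.0), 1)]

# Hypothetical candidate hyperplanes (w, b) for w . x + b = 0
candidates = {
    "A": ((1.0, 1.0), -6.0),
    "B": ((1.0, 1.0), -5.0),
    "C": ((1.0, 0.0), -3.0),
    "D": ((0.0, 1.0), -7.0),
}

def margin(w, b):
    """Smallest distance from any point to the hyperplane, or None if the
    hyperplane puts some point on the wrong side (step 1 filter)."""
    dists = []
    for x, label in points:
        score = w[0] * x[0] + w[1] * x[1] + b
        if label * score <= 0:  # wrong side of (or on) the hyperplane
            return None
        dists.append(abs(score) / math.hypot(*w))
    return min(dists)

# Step 1: keep only hyperplanes that separate the classes correctly
margins = {name: margin(w, b) for name, (w, b) in candidates.items()}
valid = {name: m for name, m in margins.items() if m is not None}

# Step 2: choose the hyperplane with the largest margin
best = max(valid, key=valid.get)
print(best, {name: round(m, 3) for name, m in valid.items()})
# A {'A': 1.414, 'B': 0.707, 'C': 1.0}
```

Hyperplane D is rejected in step 1 because it puts the spam point (4, 4) on the ham side. B and C both separate the data, but A leaves the widest gap.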

## SVM Kernels
In practice, the SVM algorithm is implemented using a kernel. SVM uses a technique called the kernel trick. The kernel computes similarities as if the low-dimensional input space had been transformed into a higher-dimensional space, without computing that transformation explicitly. In other words, it can turn a non-separable problem into a separable one by adding more dimensions. This is most useful for non-linear separation problems, where the kernel trick helps you build a more accurate classifier.

* <b>Linear Kernel</b> A linear kernel is simply the normal dot product of two given observations: the sum of the products of each pair of input values.
> K(x, xi) = sum(x * xi)

* <b>Radial Basis Function Kernel</b> The radial basis function (RBF) kernel is a popular kernel function commonly used in support vector machine classification. RBF can map an input space into an infinite-dimensional space. Here gamma is a positive parameter that controls how quickly the similarity decays with distance.
> K(x, xi) = exp(-gamma * sum((x - xi)^2))
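
Both kernel formulas translate directly into code, as in this plain-Python sketch. scikit-learn's SVC computes the RBF kernel in the same form, exp(-gamma * ||x - xi||^2). The gamma=0.001 used in this notebook's SVM pipeline therefore plugs into the same expression. The function names are hypothetical.

```python
import math

def linear_kernel(x, xi):
    """K(x, xi) = sum(x * xi): the ordinary dot product."""
    return sum(a * b for a, b in zip(x, xi))

def rbf_kernel(x, xi, gamma=0.001):
    """K(x, xi) = exp(-gamma * sum((x - xi)^2)); gamma must be positive."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * squared_distance)

x = [1.0, 2.0, 3.0]
xi = [2.0, 0.0, 1.0]
print(linear_kernel(x, xi))          # 1*2 + 2*0 + 3*1 = 5.0
print(rbf_kernel(x, x))              # identical points -> 1.0
print(rbf_kernel(x, xi, gamma=0.5))  # squared distance is 9, so exp(-4.5)
```

The RBF value is always between 0 and 1 and equals 1 only for identical points. A small gamma such as 0.001 makes the similarity decay slowly with distance, while larger values make the kernel more local.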


## Generating Model

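Let's build a support vector machine model. First, import the SVM module. Then convert the raw email text into word-count vectors with CountVectorizer, because SVC works on numeric features rather than strings. Next, create a support vector classifier object by passing the argument kernel='linear' to the SVC() function.

Finally, fit your model on the training set using fit() and perform prediction on the test set using predict().

> #Import svm model and the text vectorizer<br>
> from sklearn import svm<br>
> from sklearn.feature_extraction.text import CountVectorizer<br>
> 
> #Convert the raw email text into word-count vectors<br>
> vectorizer = CountVectorizer()<br>
> X_train_vec = vectorizer.fit_transform(X_train)<br>
> X_test_vec = vectorizer.transform(X_test)<br>
> 
> #Create a svm Classifier<br>
> clf = svm.SVC(kernel='linear') # Linear Kernel<br>
> 
> #Train the model using the training sets<br>
> clf.fit(X_train_vec, y_train)<br>
> 
> #Predict the response for test dataset<br>
> y_pred = clf.predict(X_test_vec)<br>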

![Graphical-presentation-of-the-support-vector-machine-classifier-with-a-non-linear-kernel_W640.jpg](attachment:fd2106f5-6fa4-46e4-8e09-1cca1b5875a9.jpg)

# Create Classifier for Support Vector Machine

In [None]:
clf_svm= Pipeline([
    ('vectorizer', CountVectorizer()),
    ('svc', SVC(kernel="rbf",C=1000,gamma=0.001))
])

In [None]:
clf_svm.fit(X_train,y_train)

In [None]:
y_pred_SVM=clf_svm.predict(X_test)

## Plot Confusion Matrix

In [None]:
conf_mat_SVM=confusion_matrix(y_test, y_pred_SVM)

In [None]:
plt.figure(figsize=(10,8))
sns.heatmap(conf_mat_SVM,annot=True,fmt='d')

In [None]:
svm_acc=accuracy_score(y_test,y_pred_SVM)
svm_acc

# k-Nearest Neighbors Classifier

![HyPUqwF.png](attachment:8d64edb7-eae0-4b29-99e6-3ea107cf81eb.png)

![images.jpeg](attachment:bfa9b02b-a12f-40e2-9b71-eedbaa7e2546.jpeg)

<b>K-Nearest Neighbors</b>

KNN is a non-parametric, lazy learning algorithm. Non-parametric means it makes no assumption about the underlying data distribution; in other words, the model structure is determined from the dataset. This is very helpful in practice, because most real-world datasets do not follow theoretical mathematical assumptions. Lazy means the algorithm does not build a model during training. Instead, it simply stores the training data and uses all of it in the testing (prediction) phase. This makes training fast, but prediction slower and costlier in both time and memory. In the worst case, KNN must scan every stored data point for each prediction, and it must keep the whole training set in memory.


<b>How does the KNN algorithm work?</b>

In KNN, K is the number of nearest neighbors, and it is the core deciding factor. For a two-class problem, K is usually chosen to be odd so that the vote cannot end in a tie. When K=1, the algorithm is known as the nearest neighbor algorithm; this is the simplest case. Suppose P1 is the point whose label we need to predict. First, you find the single point closest to P1, and then you assign the label of that nearest point to P1.<br>
![Knn_k1_z96jba.png](attachment:6fd4b885-c751-4596-97e2-05ad14c46143.png)<br>
For K greater than 1, you first find the K points closest to P1 and then classify P1 by a majority vote of those K neighbors. Each neighbor votes for its class, and the class with the most votes is taken as the prediction. To find the closest points, you compute the distance between points using a distance measure such as Euclidean, Hamming, Manhattan, or Minkowski distance. KNN has the following basic steps:

1. Calculate distance
2. Find closest neighbors
3. Vote for labels <br>

![KNN_final1_ibdm8a.png](attachment:2b609647-42c6-4209-a52c-e9afae728ebc.png)
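
The three steps can be written out in a few lines of Python. This minimal sketch assumes Euclidean distance and made-up 2-D points rather than word-count vectors. Note that there is no training step at all, which is exactly what "lazy" means: the stored data is the model. scikit-learn's KNeighborsClassifier, used below with n_neighbors=3, defaults to the Minkowski metric with p=2, which is Euclidean distance.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    # 1. Calculate the distance from the query to every stored training point
    distances = [(euclidean(p, query), label) for p, label in zip(train_points, train_labels)]
    # 2. Find the k closest neighbors
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Vote for labels: the most common label among the neighbors wins
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_points, train_labels, (2, 2)))       # ham
print(knn_predict(train_points, train_labels, (5, 5), k=1))  # spam
```

Every prediction loops over all stored points, which is why the testing phase is the expensive part of KNN.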



# Create Classifier for KNeighborsClassifier


In [None]:
clf_knn= Pipeline([
    ('vectorizer', CountVectorizer()),
    ('knn', KNeighborsClassifier(n_neighbors=3))
])

In [None]:
clf_knn.fit(X_train,y_train)

In [None]:
y_pred_KNN=clf_knn.predict(X_test)

## Plot Confusion Matrix

In [None]:
conf_mat_KNN=confusion_matrix(y_test, y_pred_KNN)

In [None]:
plt.figure(figsize=(10,8))
sns.heatmap(conf_mat_KNN,annot=True,fmt='d')

In [None]:
knn_acc=accuracy_score(y_test,y_pred_KNN)
knn_acc

# Decision Tree Classifier
![HyPUqwF.png](attachment:4a986081-b213-43c7-acee-cf05aeefded0.png)

![maxresdefault.jpg](attachment:fb1ed5a3-2cd7-4f57-a087-78c9a3fa3fe6.jpg)


<b>Decision Tree Algorithm</b>

A decision tree is a flowchart-like tree structure in which an internal node represents a feature (or attribute), a branch represents a decision rule, and each leaf node represents an outcome. The topmost node in a decision tree is known as the root node. The tree learns to partition the data on the basis of attribute values, and it does so recursively, a process called recursive partitioning. Because it looks like a flowchart, a decision tree mimics human-level reasoning, which is why decision trees are easy to understand and interpret.
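
The structure described above can be represented as nested dictionaries. Each internal node names the feature it tests, each branch ("yes"/"no") is a decision rule, and a leaf is just a label. This hand-built tree and its feature names are hypothetical. It only shows how a prediction walks from the root to a leaf, not how a tree is learned.

```python
# A hand-built, hypothetical tree: internal nodes test a feature, leaves hold a label
tree = {
    "feature": "contains_free",           # root node
    "yes": {
        "feature": "from_known_contact",  # internal node
        "yes": "ham",                     # leaf
        "no": "spam",                     # leaf
    },
    "no": "ham",                          # leaf
}

def predict(node, sample):
    # Follow the decision rule at each node until a leaf (a plain label) is reached
    while isinstance(node, dict):
        node = node["yes"] if sample[node["feature"]] else node["no"]
    return node

print(predict(tree, {"contains_free": True, "from_known_contact": False}))   # spam
print(predict(tree, {"contains_free": True, "from_known_contact": True}))    # ham
print(predict(tree, {"contains_free": False, "from_known_contact": False}))  # ham
```

Reading the path that a sample takes gives the "white box" explanation of each prediction.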

![1_r5ikdb.jpeg](attachment:8ac5e896-41a7-4176-aa63-ea570131edda.jpeg)

A decision tree is a white-box ML algorithm: it exposes its internal decision-making logic, which is not available in black-box algorithms such as neural networks. It typically trains faster than a neural network. The time complexity of building a decision tree depends on the number of records and the number of attributes in the given data. The decision tree is a distribution-free (non-parametric) method, which does not rely on assumptions about the probability distribution. Decision trees can handle high-dimensional data with good accuracy.


![2_btay8n.jpeg](attachment:56fe881f-59a7-4bac-b82f-4147e2061335.jpeg)

<b>Entropy</b>


A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to measure the homogeneity of a sample. If the sample is completely homogeneous, its entropy is zero; if it is equally divided between two classes, its entropy is one.

![Entropy.png](attachment:2a5ea16a-47b2-4ae8-b4c3-c27607a726f6.png)
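
Entropy and the resulting information gain of a split are short functions. The sketch below uses base-2 entropy, so a pure sample scores 0 and an evenly divided two-class sample scores 1, matching the description above. The split is made up. Keep in mind that scikit-learn's DecisionTreeClassifier, as used in this notebook, splits on Gini impurity by default; pass criterion="entropy" to use entropy instead.

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 Shannon entropy of a list of class labels."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

print(entropy(["spam"] * 4))  # completely homogeneous -> 0.0
parent = ["spam"] * 4 + ["ham"] * 4
print(entropy(parent))        # equally divided -> 1.0

# Hypothetical split, e.g. on whether an email contains the word "free"
children = [["spam"] * 3 + ["ham"], ["spam"] + ["ham"] * 3]
print(round(information_gain(parent, children), 3))  # 0.189
```

ID3 evaluates every candidate split this way and chooses the one with the highest information gain.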



# Create Classifier for DecisionTreeClassifier


In [None]:
clf_DecisionTree= Pipeline([
    ('vectorizer', CountVectorizer()),
    ('dt',DecisionTreeClassifier())
])

In [None]:
clf_DecisionTree.fit(X_train,y_train)

In [None]:
y_pred_DT=clf_DecisionTree.predict(X_test)

## Plot Confusion Matrix

In [None]:
conf_mat_DT=confusion_matrix(y_test, y_pred_DT)

In [None]:
plt.figure(figsize=(10,8))
sns.heatmap(conf_mat_DT,annot=True,fmt='d')

In [None]:
dt_acc=accuracy_score(y_test,y_pred_DT)
dt_acc

# Random Forest Classifier

![HyPUqwF.png](attachment:79fece57-b42c-4000-99ec-e0985995a7ce.png)

<b>What is a random forest?</b>

A random forest is a machine learning technique that’s used to solve regression and classification problems. It utilizes ensemble learning, which is a technique that combines many classifiers to provide solutions to complex problems.

A random forest algorithm consists of many decision trees. The ‘forest’ generated by the random forest algorithm is trained through bagging or bootstrap aggregating. Bagging is an ensemble meta-algorithm that improves the accuracy of machine learning algorithms.

The random forest algorithm establishes the outcome based on the predictions of its decision trees. For regression, it takes the average of the trees' outputs; for classification, it takes a majority vote across the trees. Increasing the number of trees generally makes the outcome more stable, although the improvement levels off as more trees are added.

A random forest mitigates several limitations of a single decision tree: it reduces overfitting and usually increases accuracy.

## Features of a Random Forest Algorithm
* It is usually more accurate than a single decision tree.
* It provides an effective way of handling missing data.
* It can produce a reasonable prediction without hyper-parameter tuning.
* It reduces the overfitting seen in individual decision trees.
* In every tree of the forest, a random subset of features is considered at each node's splitting point.

![decision-tree-nodes.png](attachment:ba1bd596-5363-4f7b-b6ed-dc6521c69cf8.png)

## Classification in random forests

Classification in random forests uses an ensemble approach to reach the outcome. The training data is used to train many different decision trees. Each tree sees a bootstrap sample of the observations, and a random subset of the features is considered at each node split.

A random forest relies on many decision trees. Every decision tree consists of a root node, decision nodes, and leaf nodes. The leaf node reached in each tree is the final output produced by that specific tree. The final output is selected by majority vote: the class chosen by the majority of the decision trees becomes the final output of the random forest. The diagram below shows a simple random forest classifier.
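
Bagging and majority voting can be sketched without any library. This minimal example uses made-up 1-D data, where the single feature is an assumed count of "spammy" words. Each "tree" is a one-split decision stump trained on its own bootstrap sample, and the forest predicts by majority vote. Random feature selection at each split is omitted because there is only one feature. Note also that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than by hard voting.

```python
import random
from collections import Counter

# Made-up 1-D data: feature = number of "spammy" words, label 1 = spam, 0 = ham
toy_x = [0, 1, 1, 2, 3, 5, 6, 6, 7, 8]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

def fit_stump(xs, ys):
    """A one-split 'tree': choose the threshold with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        errors = sum((x > t) != bool(label) for x, label in zip(xs, ys))
        if best is None or errors < best[1]:
            best = (t, errors)
    threshold = best[0]
    return lambda x: int(x > threshold)

def fit_forest(xs, ys, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bagging: each tree is trained on a bootstrap sample (drawn with replacement)
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict_forest(trees, x):
    # Majority vote across all trees
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

forest = fit_forest(toy_x, toy_y)
print([predict_forest(forest, x) for x in (0, 4, 9)])
```

Because each stump sees a slightly different sample, their thresholds differ. The vote smooths out the errors that any single stump would make.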

![random-forest-classifier.png](attachment:a4639a70-0952-4d92-8e17-d22f57ce0466.png)


![example-of-random-forest-classifier.png](attachment:7cc7765d-2f69-4702-b3ee-8d85dfed1950.png)

# Create Classifier for Random Forest

In [None]:
clf_rf= Pipeline([
    ('vectorizer', CountVectorizer()),
    ('rf', RandomForestClassifier(n_estimators=100))
])

In [None]:
clf_rf.fit(X_train,y_train)

In [None]:
y_pred_RF=clf_rf.predict(X_test)

## Plot Confusion Matrix

In [None]:
conf_mat_RF=confusion_matrix(y_test, y_pred_RF)

In [None]:
plt.figure(figsize=(10,8))
sns.heatmap(conf_mat_RF,annot=True,fmt='d')

In [None]:
rf_acc=accuracy_score(y_test,y_pred_RF)
rf_acc

# We Have Covered the Basic Concepts of Our Algorithms

![hedgehog-jumping-cartoon-happy-hurray-porcupine_125446-345.jpg](attachment:1e709e5e-5109-4c67-a820-f176192f548b.jpg)

# Comparing the Accuracy of the Algorithms

In [None]:
accuracies = np.array([naive_acc,svm_acc,knn_acc,dt_acc,rf_acc])*100
models = ['Naive Bayes','SVM','KNN','DT','Random Forest']
fig, ax = plt.subplots(figsize = (20,8))
ax.bar(models,accuracies,width=0.3,color ='red')
# Write each accuracy value just above its bar
for index,data in enumerate(accuracies):
    ax.text(x=index , y =data+1 , s="{:.2f}".format(data) , fontdict=dict(fontsize=20))
ax.set_ylabel('Accuracy (%)')
plt.tight_layout()
plt.show()

# Fun Fact
## Let's test our model with custom emails

In [None]:
#Function for testing a custom email
def spam_dect(clf,txt):
    # predict() expects a list of documents and returns an array, so take its first element
    a=clf.predict([txt])[0]
    if a==1:
        print("This is a Spam email")
    else:
        print("This is a Real email")

In [None]:
#Demo email
test_email_1="Upto 20% discount on parking, exclusive offer just for you. Dont miss this reward!" #Spam Email from my mail box
test_email_2="Hey Ashfak, can we get together to watch footbal game tomorrow?"   #Real Email from my mail box

In [None]:
#Predict with Naive Bayes
spam_dect(clf_NaiveBaised,test_email_1)

In [None]:
#Predict with Naive Bayes
spam_dect(clf_NaiveBaised,test_email_2)

![Thankyouforreading.gif](attachment:a96ef779-c7e6-450e-9ada-cc703f6cceaf.gif)