 <div class="alert alert-block alert-info" style="margin-top: 20px">
      <h1 align=center>SVM AND NAIVE BAYES ALGORITHM IMPLEMENTATION</h1>
</div>

<h3>Objective</h3>

<p>
To determine the classification accuracy of Support Vector Machine (SVM) and Naive Bayes classifiers on the drug dataset (predicting the Drug column).
</p>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <h4 align=center><a name=p1> DATA PRE-PROCESSING FOR IMPLEMENTING THE SVM AND NAIVE BAYES ALGORITHMS </a></h4>
</div>

In [1]:
# Load libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [6]:
# Load the Dataset
drugdata = pd.read_csv('C:/Users/DELL/Documents/PythonData1200/Datasets/drugdataset.csv')
drugdata.head()

   Age  Sex  BP  Cholesterol  Na_to_K   Drug
0   23    1   2            1   25.355  drugY
1   47    0   1            1   13.093  drugC
2   47    0   1            1   10.114  drugC
3   28    1   0            1    7.798  drugX
4   61    1   1            1   18.043  drugY


<div class="alert alert-block alert-info" style="margin-top: 20px">
    <h4 align=center><a name=p1> EXPLORING THE DATASET USING THE DESCRIBE() AND UNIQUE() FUNCTIONS </a></h4>
</div>

In [8]:
# Summary statistics of the numeric variables
drugdata.describe()

             Age       Sex        BP  Cholesterol    Na_to_K
count      200.0     200.0     200.0        200.0      200.0
mean      44.315      0.48      1.09        0.515  16.084485
std    16.544315  0.500854  0.821752     0.501029   7.223956
min         15.0       0.0       0.0          0.0      6.269
25%         31.0       0.0       0.0          0.0    10.4455
50%         45.0       0.0       1.0          1.0    13.9365
75%         58.0       1.0       2.0          1.0      19.38
max         74.0       1.0       2.0          1.0     38.247
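The std row of describe() is the sample standard deviation (ddof=1), whereas NumPy's np.std defaults to the population version (ddof=0). A minimal sketch contrasting the two, assuming only the five Age values shown in drugdata.head() (an illustrative sample, so the numbers will not match the 200-row table):

```python
import numpy as np
import pandas as pd

# The five Age values shown in drugdata.head(): an illustrative sample, not all 200 rows
age = pd.Series([23, 47, 47, 28, 61], name="Age")

n = len(age)
mean = age.sum() / n                                       # arithmetic mean
sample_std = np.sqrt(((age - mean) ** 2).sum() / (n - 1))  # ddof=1, what describe() reports
population_std = np.std(age.to_numpy())                    # NumPy's default ddof=0

print(age.describe()[["mean", "std"]])
print(mean, sample_std, population_std)
```

The sample version divides by n - 1 instead of n, so it is always slightly larger; the gap shrinks as n grows.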


In [5]:
# List the distinct classes of the target variable (Drug)
drugdata.Drug.unique()

array(['drugY', 'drugC', 'drugX', 'drugA', 'drugB'], dtype=object)
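unique() lists the labels but not how many rows belong to each; nunique() gives the number of classes and value_counts() the rows per class. Because the split later in the notebook uses stratify=y, each class keeps the same proportion in the training and test sets. A sketch on a hypothetical, imbalanced toy label vector (not the real Drug column):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels standing in for the Drug column (10/5/5 rows)
y_toy = pd.Series(["drugY"] * 10 + ["drugX"] * 5 + ["drugA"] * 5)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)  # dummy single feature

print(y_toy.nunique())       # number of classes
print(y_toy.value_counts())  # rows per class

# Stratified 80/20 split with the same parameters the notebook uses later
_, _, y_tr, y_te = train_test_split(
    X_toy, y_toy, stratify=y_toy, test_size=0.2, random_state=100
)
print(y_te.value_counts())   # the 50/25/25 class proportions are preserved in the test set
```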

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <h4 align=center><a name=p1> VALIDATING THE KEY MEASURES OF THE DATASET </a></h4>
</div>

In [7]:
# Extract the mean of each numeric variable from the summary statistics
drugdata.describe().loc[['mean']]

         Age   Sex    BP  Cholesterol    Na_to_K
mean  44.315  0.48  1.09        0.515  16.084485


In [9]:
# Extract the 25th, 50th (median) and 75th percentiles
drugdata.describe().loc[['25%','50%','75%']]

      Age  Sex   BP  Cholesterol  Na_to_K
25%  31.0  0.0  0.0          0.0  10.4455
50%  45.0  0.0  1.0          1.0  13.9365
75%  58.0  1.0  2.0          1.0    19.38
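By default pandas computes these percentiles with linear interpolation: for a sorted sample x_(0) <= ... <= x_(n-1), the q-quantile sits at fractional position h = (n - 1)q and is interpolated between the two neighbouring values. With 200 rows the 25% row lies at h = 199 x 0.25 = 49.75, so interpolation matters. A sketch of that rule (the helper linear_quantile is hypothetical, written for illustration), using the five Age values from drugdata.head(); 0.1 and 0.9 are included because on five values the quartiles land exactly on data points:

```python
import math
import numpy as np
import pandas as pd

def linear_quantile(values, q):
    """Hypothetical helper: q-quantile with linear interpolation (pandas/NumPy default)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q            # fractional position in the sorted sample
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)    # clamp so q = 1.0 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

age = [23, 47, 47, 28, 61]  # first five Age values from drugdata.head()
for q in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(q, linear_quantile(age, q), pd.Series(age).quantile(q), np.percentile(age, q * 100))
```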


In [8]:
# Covariance matrix of the numeric variables (the Drug label column is excluded)
cov_matrix = drugdata.cov(numeric_only=True)
print(cov_matrix)

                    Age       Sex        BP  Cholesterol    Na_to_K
Age          273.714347 -0.845427 -0.737035     0.565603  -7.543752
Sex           -0.845427  0.250854 -0.003216    -0.002211   0.452299
BP            -0.737035 -0.003216  0.675276    -0.056633   0.886358
Cholesterol    0.565603 -0.002211 -0.056633     0.251030  -0.036196
Na_to_K       -7.543752  0.452299  0.886358    -0.036196  52.185533
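DataFrame.cov() returns the sample covariance, cov(X, Y) = sum((x_i - x_mean)(y_i - y_mean)) / (n - 1), so its diagonal holds each column's variance (16.544315^2 is about 273.71 for Age, matching the std row of describe()). Because Age and Na_to_K are on much larger scales than the 0/1 columns, the correlation cov(X, Y) / (sigma_X sigma_Y) is often easier to compare. A sketch of both computations on a hypothetical toy frame (not the drug data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the drug dataset)
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 5.0, 9.0]})

n = len(df)
centered = df - df.mean()                     # subtract each column's mean
cov_manual = centered.T @ centered / (n - 1)  # sample covariance (ddof=1), like df.cov()

std = np.sqrt(np.diag(cov_manual))            # per-column sample standard deviations
corr_manual = cov_manual / np.outer(std, std) # correlation derived from the covariance

print(cov_manual)
print(np.allclose(cov_manual, df.cov()), np.allclose(corr_manual, df.corr()))
```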


<div class="alert alert-block alert-info" style="margin-top: 20px">
    <h4 align=center><a name=p1> BUILDING THE SVM AND NAIVE BAYES ALGORITHM </a></h4>
</div>

In [10]:
# Create x and y variables
x = drugdata.drop('Drug', axis=1).to_numpy()
y = drugdata['Drug'].to_numpy()

# Create Training and Test Datasets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, test_size=0.2, random_state=100)

# Scale the Data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train2 = sc.fit_transform(x_train)
x_test2 = sc.transform(x_test)
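StandardScaler learns each column's mean and standard deviation from the training set only (using the population standard deviation, ddof=0) and applies z = (x - mu_train) / sigma_train to both sets, so no test-set information leaks into the scaling. A sketch of the equivalent arithmetic, assuming a few Age/Na_to_K pairs taken from drugdata.head() as toy training and test rows:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy rows (Age, Na_to_K) taken from drugdata.head(); illustrative only
x_train = np.array([[23.0, 25.355], [47.0, 13.093], [28.0, 7.798]])
x_test = np.array([[61.0, 18.043]])

mu = x_train.mean(axis=0)     # per-column training mean
sigma = x_train.std(axis=0)   # per-column training std, ddof=0 like StandardScaler

x_train_manual = (x_train - mu) / sigma
x_test_manual = (x_test - mu) / sigma   # the test set reuses the training statistics

sc = StandardScaler()
print(np.allclose(sc.fit_transform(x_train), x_train_manual))
print(np.allclose(sc.transform(x_test), x_test_manual))
```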

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <h4 align=center><a name=p1> EXECUTING THE SVM AND NAIVE BAYES CLASSIFICATION ALGORITHM AND INTERPRETING THE CONFUSION MATRIX </a></h4>
</div>
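Both confusion_matrix and classification_report order the classes by their sorted unique labels (drugA, drugB, drugC, drugX, drugY for this dataset) unless labels= is passed, so any target_names list must follow that same order. Accuracy is the diagonal of the confusion matrix (correct predictions) divided by the number of test rows. A sketch with hypothetical toy predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels
y_true = np.array(["drugY", "drugC", "drugA", "drugY", "drugX"])
y_pred = np.array(["drugY", "drugA", "drugA", "drugY", "drugX"])

labels = np.unique(np.concatenate([y_true, y_pred]))  # sorted: sklearn's default row/column order
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true class, columns = predicted

print(labels)
print(cm)
# Accuracy = correctly classified (trace) / total rows
print(np.trace(cm) / cm.sum(), accuracy_score(y_true, y_pred))
```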

In [11]:
# Train and evaluate SVM and Naive Bayes on the scaled data
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix

# Sorted class labels: the row/column order used by confusion_matrix and classification_report
labels = np.unique(y)

for name, method in [('Support Vector Machine (SVM)', SVC(kernel='linear', random_state=100)),
                     ('Naive Bayes Algorithm', GaussianNB())]:
    method.fit(x_train2, y_train)
    predict = method.predict(x_test2)
    print('\nEstimator: {}'.format(name))
    print(confusion_matrix(y_test, predict, labels=labels))
    print(classification_report(y_test, predict, labels=labels))


Estimator: Support Vector Machine (SVM)
[[ 5  0  0  0  0]
 [ 0  2  0  0  1]
 [ 0  0  3  0  0]
 [ 0  0  0 11  0]
 [ 0  0  0  1 17]]
              precision    recall  f1-score   support

       drugA       1.00      1.00      1.00         5
       drugB       1.00      0.67      0.80         3
       drugC       1.00      1.00      1.00         3
       drugX       0.92      1.00      0.96        11
       drugY       0.94      0.94      0.94        18

    accuracy                           0.95        40
   macro avg       0.97      0.92      0.94        40
weighted avg       0.95      0.95      0.95        40


Estimator: Naive Bayes Algorithm
[[ 5  0  0  0  0]
 [ 0  3  0  0  0]
 [ 0  0  3  0  0]
 [ 0  0  0 10  1]
 [ 1  1  3  1 12]]
              precision    recall  f1-score   support

       drugA       0.83      1.00      0.91         5
       drugB       0.75      1.00      0.86         3
       drugC       0.50      1.00      0.67         3
       drugX       0.91      0.91    