
# Contents

1. SVM
2. Kernel SVM
3. Naive Bayes

# Pre-processing Data

In [0]:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

In [0]:
# Importing the dataset
df = pd.read_csv('heart.csv')

In [0]:
# Features: every column except the last; target: the last column
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

In [0]:
# Splitting the dataset into the Training set (80%) and Test set (20%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [0]:
# Feature Scaling: fit the scaler on the training set only, then apply it to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Analyzing Data using SVM

Documentation: https://scikit-learn.org/stable/modules/svm.html

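An SVM looks for the decision boundary (a hyperplane) with the largest possible margin, that is, the boundary that lies as far as possible from the closest training points of each class. Those closest points are called the support vectors.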
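As a minimal sketch of the maximum-margin idea, the snippet below fits a linear SVM on a tiny hand-made 2-D dataset. The data and the names `X_toy`, `y_toy` and `svm_toy` are assumptions made up for illustration and are not part of the heart dataset. A very large `C` is used so that the fit approximates a hard margin on this separable data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data (illustrative only, not the heart dataset)
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                  [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data
svm_toy = SVC(kernel='linear', C=1e6)
svm_toy.fit(X_toy, y_toy)

w = svm_toy.coef_[0]        # normal vector of the separating hyperplane
b = svm_toy.intercept_[0]   # offset of the hyperplane

# The margin boundaries are w.x + b = -1 and w.x + b = +1,
# so the distance between them is 2 / ||w||
margin_width = 2.0 / np.linalg.norm(w)

print('Support vectors:\n', svm_toy.support_vectors_)
print('Margin width: {:.3f}'.format(margin_width))
```

The support vectors printed here are the only points that determine the boundary; moving any other point (without crossing the margin) would leave the fitted hyperplane unchanged.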

In [0]:
# Fitting classifier to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel='sigmoid',random_state=0)
classifier.fit(X_train,y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='sigmoid',
    max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,
    verbose=False)

In [0]:
print('Test accuracy {:.2f}%'.format(classifier.score(X_test,y_test)*100))

Test accuracy 83.61%


In [0]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)

In [0]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[20  7]
 [ 3 31]]


In [0]:
# Accuracy = correct predictions (diagonal of cm) / all predictions
print("Accuracy:", round(cm.trace() / cm.sum() * 100, 2), "%")

Accuracy: 83.61 %


Good, so an SVM with a sigmoid kernel performs well on this dataset too. The SVM parameters deserve a closer look, but let's first go through the other types of classification and come back to tuning later.

# Analyzing Data using Kernel SVM

Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

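Unlike a linear SVM, a kernel SVM can handle data that is not linearly separable: the kernel function implicitly maps the features into a higher-dimensional space in which a separating hyperplane may exist.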

## Kernel Functions
1. Polynomial
2. Gaussian, also known as RBF (radial basis function)
3. Sigmoid
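To get a feel for how the choice of kernel matters, here is a minimal sketch that compares the values accepted by the `kernel` parameter of `SVC` using 5-fold cross-validation. It runs on a synthetic two-moons dataset from `make_moons`, which is an assumption chosen for illustration because it is not linearly separable; the names `X_demo`, `y_demo` and `kernel_scores` are hypothetical, and the ranking of kernels may well differ on the heart data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (illustrative only)
X_demo, y_demo = make_moons(n_samples=300, noise=0.2, random_state=0)

kernel_scores = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    # Scaling lives inside the pipeline so it is re-fitted on each training fold
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, random_state=0))
    scores = cross_val_score(model, X_demo, y_demo, cv=5)
    kernel_scores[kernel] = scores.mean()
    print('{:>8}: mean CV accuracy {:.2f}%'.format(kernel, scores.mean() * 100))
```

Cross-validation averages over several train/test splits, so it gives a steadier comparison between kernels than a single test split does.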

In [0]:
# Fitting classifier to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel='rbf',random_state =0)
classifier.fit(X_train,y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,
    verbose=False)

In [0]:
print('Test accuracy {:.2f}%'.format(classifier.score(X_test,y_test)*100))

Test accuracy 86.89%


In [0]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)

In [0]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[21  6]
 [ 2 32]]


In [0]:
# Accuracy = correct predictions (diagonal of cm) / all predictions
print("Accuracy:", round(cm.trace() / cm.sum() * 100, 2), "%")

Accuracy: 86.89 %


Great, looks like the RBF kernel fits better than the sigmoid kernel on this test split (86.89% vs. 83.61%). Let's move ahead and try Naive Bayes classification.

# Analyzing Data using Naive Bayes

Documentation: 

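A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. It is built on Bayes' theorem and is called "naive" because it assumes that the features are conditionally independent given the class. The Gaussian variant used below additionally models each feature within a class as normally distributed.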
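The snippet below is a small sketch of that idea: it applies Bayes' theorem by hand (prior times the product of per-feature Gaussian likelihoods, then normalised) and compares the result with `GaussianNB.predict_proba`. The toy data and the names `X_nb`, `y_nb`, `x_new` and `gaussian_pdf` are assumptions for illustration. The per-class variances are stored in `var_` in recent scikit-learn releases and in `sigma_` in older ones, so the code checks for both.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data: two features, two classes (illustrative only, not the heart dataset)
X_nb = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0],
                 [4.0, 5.0], [5.0, 4.0], [6.0, 6.0]])
y_nb = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB().fit(X_nb, y_nb)
x_new = np.array([3.0, 3.5])

# Per-class feature variances: `var_` in recent releases, `sigma_` in older ones
variances = nb.var_ if hasattr(nb, 'var_') else nb.sigma_

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution, evaluated feature by feature."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Bayes' theorem with the naive independence assumption:
# P(class | x) is proportional to P(class) * product over features of P(x_j | class)
unnormalised = np.array([
    nb.class_prior_[k] * np.prod(gaussian_pdf(x_new, nb.theta_[k], variances[k]))
    for k in range(len(nb.classes_))
])
manual_posterior = unnormalised / unnormalised.sum()
sklearn_posterior = nb.predict_proba(x_new.reshape(1, -1))[0]

print('Manual posterior: ', manual_posterior)
print('GaussianNB output:', sklearn_posterior)
```

The two printed vectors should agree up to floating-point error, which shows that the classifier is doing nothing more than this calculation.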

In [0]:
# Fitting classifier to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train,y_train)

GaussianNB(priors=None, var_smoothing=1e-09)

In [0]:
print('Test accuracy {:.2f}%'.format(classifier.score(X_test,y_test)*100))

Test accuracy 85.25%


In [0]:
# Predicting the Test set results
y_pred = classifier.predict(X_test)

In [0]:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[21  6]
 [ 3 31]]


In [0]:
# Accuracy = correct predictions (diagonal of cm) / all predictions
print("Accuracy:", round(cm.trace() / cm.sum() * 100, 2), "%")

Accuracy: 85.25 %


Well, Naive Bayes performs almost as well as Kernel SVM. The gap of 1.64 percentage points comes from a single test example out of 61. More training data might improve the model, unless it suffers from high bias, in which case extra data would help little.
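The arithmetic behind that gap can be checked directly from the two confusion matrices printed earlier. This is only a sketch; the names `cm_rbf`, `cm_nb` and `accuracy` are introduced here for illustration.

```python
import numpy as np

# Confusion matrices reported above (rows: true class, columns: predicted class)
cm_rbf = np.array([[21, 6], [2, 32]])   # Kernel SVM (RBF)
cm_nb = np.array([[21, 6], [3, 31]])    # Gaussian Naive Bayes

def accuracy(cm):
    """Share of correct predictions, which sit on the diagonal."""
    return np.trace(cm) / cm.sum()

n_test = cm_rbf.sum()
gap = accuracy(cm_rbf) - accuracy(cm_nb)

print('Test set size:', n_test)
print('Accuracy gap: {:.2f} percentage points'.format(gap * 100))
print('One test example is worth {:.2f} percentage points'.format(100 / n_test))
```

With a test set this small, a difference of one example is well within what a different random split could produce, so the two models should be treated as roughly on par.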