<a href="https://colab.research.google.com/github/dharineeshramtp2000/Support-Vector-Machine-Classification---Liver-Disease-Prediction/blob/master/Liver_Disease_Predictor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Import the Libraries

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Importing the Dataset. Here we use the dataset from [ML Data](https://www.mldata.io/datasets/).
![](https://www.health.harvard.edu/media/content/images/p3_InjuredLiver_W1907_gi937610338.jpg)
---
The dataset consists of the following columns (name, type, description):
1.  age (integer): Age of the patient in years
2.  gender (string): Patient gender, Male or Female
3.  TB (float): Total Bilirubin
4.  DB (float): Direct Bilirubin
5.  alkphos (float): Alkaline Phosphatase
6.  sgpt (float): Alanine Aminotransferase
7.  sgot (float): Aspartate Aminotransferase
8.  TP (float): Total Proteins
9.  ALB (float): Albumin
10. A_G (float): Ratio of Albumin to Globulin
11. class (float): Target class, 1 if the patient has liver disease and 2 if they do not






In [13]:
Dataset = pd.read_csv("indian_liver_patient_dataset.csv")
X = Dataset.iloc[:,:-1].values
y = Dataset.iloc[:, -1].values
Dataset.describe()

| statistic | age | TB | DB | alkphos | sgpt | sgot | TP | ALB | A_G | class |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 583.0 | 583.0 | 583.0 | 583.0 | 583.0 | 583.0 | 583.0 | 583.0 | 583.0 | 583.0 |
| mean | 44.746141 | 3.298799 | 1.486106 | 290.576329 | 80.713551 | 109.910806 | 6.48319 | 3.141852 | -685.16578 | 1.286449 |
| std | 16.189833 | 6.209522 | 2.808498 | 242.937989 | 182.620356 | 288.918529 | 1.085451 | 0.795519 | 8261.856 | 0.45249 |
| min | 4.0 | 0.4 | 0.1 | 63.0 | 10.0 | 10.0 | 2.7 | 0.9 | -100000.0 | 1.0 |
| 25% | 33.0 | 0.8 | 0.2 | 175.5 | 23.0 | 25.0 | 5.8 | 2.6 | 0.7 | 1.0 |
| 50% | 45.0 | 1.0 | 0.3 | 208.0 | 35.0 | 42.0 | 6.6 | 3.1 | 0.92 | 1.0 |
| 75% | 58.0 | 2.6 | 1.3 | 298.0 | 60.5 | 87.0 | 7.2 | 3.8 | 1.1 | 2.0 |
| max | 90.0 | 75.0 | 19.7 | 2110.0 | 2000.0 | 4929.0 | 9.6 | 5.5 | 2.8 | 2.0 |
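The summary above shows a minimum of -100000 for A_G and a mean of about -685, which looks like a placeholder for missing values rather than a real measurement. That reading is an assumption; the source does not say what the value means. As a minimal sketch of one way to handle it, the snippet below uses a small made-up frame (not the real file), converts the placeholder to NaN, and fills the gaps with the column median:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; the values are made up.
toy = pd.DataFrame({"A_G": [0.9, 1.1, -100000.0, 0.7, -100000.0]})

# Assumption: -100000 marks a missing albumin/globulin ratio.
SENTINEL = -100000.0
toy["A_G"] = toy["A_G"].replace(SENTINEL, np.nan)

# Fill the gaps with the median of the observed values.
median_ag = toy["A_G"].median()
toy["A_G"] = toy["A_G"].fillna(median_ag)

print(toy)
```

Doing this before scaling matters because a single -100000 entry would otherwise dominate the mean and standard deviation that `StandardScaler` computes for that column.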


One of the attributes in the dataset is gender, given as 'Male' or 'Female'. We need to convert these strings into binary values (0, 1).

In [0]:
from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:,1] = labelencoder_X.fit_transform(X[:,1])
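To see which gender ends up as 0 and which as 1, it can help to run `LabelEncoder` on a tiny example. The sketch below uses a made-up array rather than the real column; `LabelEncoder` sorts the labels, so the code assigned to each label is its position in `classes_`:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up sample of the gender column.
genders = np.array(["Male", "Female", "Female", "Male"])

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

# classes_ holds the sorted labels; each label's index is its integer code.
print(encoder.classes_)
print(codes)

# inverse_transform maps the integer codes back to the original strings.
print(encoder.inverse_transform(codes))
```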

Now let's standardize the numeric independent variables with `StandardScaler` from the [scikit-learn](https://scikit-learn.org/stable/) library. Column 1 (the encoded gender) is left unscaled.

In [0]:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X[:,[0,2,3,4,5,6,7,8,9]] = sc_X.fit_transform(X[:,[0,2,3,4,5,6,7,8,9]])

Split the dataset into training (70%) and test (30%) sets, shuffling the rows first.

In [0]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42, shuffle = True)
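The cells above fit the scaler on the full dataset before splitting, so the test rows influence the scaling statistics. One common alternative, sketched here, is to split first and wrap the scaler and classifier in a pipeline so the scaler is fitted on the training rows only. The data comes from `make_classification` as a stand-in for the liver dataset, and `stratify=` is an addition that is not in the original split:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, mildly imbalanced stand-in data (not the liver dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, weights=[0.7, 0.3], random_state=42
)

# Split first; stratify keeps the class ratio similar in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, shuffle=True, stratify=y_demo
)

# The pipeline fits StandardScaler on the training rows only,
# then applies the same transformation when predicting on test rows.
model = make_pipeline(StandardScaler(), SVC(C=10.0, kernel="rbf"))
model.fit(X_tr, y_tr)

test_accuracy = model.score(X_te, y_te)
print(test_accuracy)
```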

Now let's define our SVM classification model.
We use the Gaussian (RBF) kernel, which is a common default choice.
We also set the regularization constant C, which trades off a wide margin against misclassified training points: a larger C penalizes training errors more heavily.

![](https://miro.medium.com/max/1400/1*c_JJszZ8GlnQ7kx88Z2TeA.png)

Feel free to consult the [Documentation for Support Vector Classification](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

In [14]:
from sklearn.svm import SVC
# decision_function_shape accepts 'ovo' or 'ovr'; it is ignored for binary classification
classifier = SVC(C = 10.0, kernel = 'rbf', decision_function_shape='ovr')
classifier.fit(X_train, y_train)


SVC(C=10.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
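The value C = 10.0 is fixed by hand here. As a hedged sketch of how C and the RBF kernel's `gamma` could instead be chosen by cross-validation, the snippet below runs `GridSearchCV` over a small grid. The synthetic data, the grid values, and the step names `scale` and `svc` are all illustrative assumptions, not settings from this notebook:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not the liver dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Illustrative grid; the "svc__" prefix routes each parameter to the SVC step.
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1_weighted")
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.best_score_)
```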

Now let's predict on our test set.

In [0]:
y_pred = classifier.predict(X_test)

It's time to evaluate our model.

![](https://www.thinkebiz.net/wp-content/uploads/2018/01/74228032_s.jpg)

---
Let's first check how well our model fits the training set.
We evaluate the model with two widely used metrics:
1.   F1 score (weighted average over both classes)
2.   accuracy





In [0]:
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

In [10]:
acc_train = accuracy_score(y_train, classifier.predict(X_train))
f1_train = f1_score(y_train, classifier.predict(X_train), average= 'weighted')

print("Training set results")
print("ACCURACY ---------------------->",acc_train)
print("F1 SCORE ---------------------->",f1_train)


Training set results
ACCURACY ----------------------> 0.7573529411764706
F1 SCORE ----------------------> 0.7105043102792938


Now let's see how well our model generalizes by evaluating it on the test set.

In [11]:
acc_test = accuracy_score(y_test, y_pred)
f1_test = f1_score(y_test, y_pred, average= 'weighted')

print("Test set results")
print("ACCURACY ---------------------->",acc_test)
print("F1 SCORE ---------------------->",f1_test)

Test set results
ACCURACY ----------------------> 0.7257142857142858
F1 SCORE ----------------------> 0.6859216878240657


Now let's print the confusion matrix to see where the predictions go right and wrong.

In [12]:
cm = confusion_matrix(y_test,y_pred)
print(cm)

[[117  11]
 [ 37  10]]
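The reported test metrics can be recomputed by hand from this matrix, which also makes the per-class behaviour visible. The sketch below relies on scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are illustrative:

```python
import numpy as np

# Confusion matrix printed above: rows = true class (1, 2), columns = predicted class.
cm = np.array([[117, 11],
               [37, 10]])

support = cm.sum(axis=1)      # true samples per class
predicted = cm.sum(axis=0)    # predicted samples per class
correct = np.diag(cm)         # correctly classified samples per class

accuracy = correct.sum() / cm.sum()
precision = correct / predicted
recall = correct / support
f1 = 2 * precision * recall / (precision + recall)

# average='weighted' weights each class's F1 by its number of true samples.
weighted_f1 = np.average(f1, weights=support)

# Accuracy obtained by always predicting the most frequent class.
majority_baseline = support.max() / cm.sum()

print("accuracy:", accuracy)
print("recall per class:", recall)
print("F1 per class:", f1)
print("weighted F1:", weighted_f1)
print("majority-class baseline:", majority_baseline)
```

The recomputed accuracy and weighted F1 match the test set results printed earlier, while the per-class recall shows how differently the two classes are treated.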


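The model performs similarly on the training and test sets (**accuracy** 0.757 vs. 0.726, weighted **f1 score** 0.711 vs. 0.686), so it is not badly overfitting.
The confusion matrix shows the weak spot: only 10 of the 47 class 2 samples in the test set are classified correctly, and always predicting class 1 would already reach an accuracy of 128/175 ≈ 0.731.
It is always worth asking ourselves: is this the best model?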

---

Obviously not!
We can still improve the results by trying other models, changing the independent variables, choosing different metrics, and so on.
Still, it's great that we have successfully implemented SVM classification on our Liver Disease dataset.
