# Heart Disease Support Vector Machines (SVM) 
### [Heart Disease](https://archive.ics.uci.edu/ml/datasets/Heart+Disease)

* **age**--age in years
* **sex**--(1 = male; 0 = female)
* **cp**--chest pain type (1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic)
* **trestbps**--resting blood pressure
* **chol**--serum cholesterol in mg/dl
* **fbs**--fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
* **restecg**--resting ECG (electrocardiographic) results
* **thalach**--maximum heart rate achieved
* **exang**--exercise-induced angina (1 = yes; 0 = no)
* **oldpeak**--ST depression induced by exercise relative to rest
* **slope**--the slope of the peak exercise ST segment (1: upsloping, 2: flat, 3: downsloping); this column is named `slop` in the data file
* **ca**--number of major vessels (0-3) colored by fluoroscopy
* **thal**--3 = normal; 6 = fixed defect; 7 = reversible defect
* **target**--original values 0, 1, 2, 3, 4, recoded as 'N' for 0 and 'Y' for 1, 2, 3, and 4
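
The recoding of `target` described in the last bullet can be sketched as below. This is only an illustration of the rule, not the preprocessing that produced the cleaned CSV: the column name `num` and the sample values are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical raw values in the original 0-4 coding (assumed column name: "num").
raw = pd.DataFrame({"num": [0, 2, 1, 0, 4, 3]})

# Recode: 0 -> 'N', any of 1, 2, 3, 4 -> 'Y'.
raw["target"] = raw["num"].map(lambda v: "N" if v == 0 else "Y")

print(raw["target"].tolist())
```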

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
df = pd.read_csv("HD_Cleveland_Data_Clean.csv")

In [3]:
df.head()

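|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slop | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|------|----|------|--------|
| 0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | 6 | N |
| 1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | 3 | Y |
| 2 | 67 | 1 | 4 | 120 | 229 | 0 | 2 | 129 | 1 | 2.6 | 2 | 2 | 7 | Y |
| 3 | 37 | 1 | 3 | 130 | 250 | 0 | 0 | 187 | 0 | 3.5 | 3 | 0 | 3 | N |
| 4 | 41 | 0 | 2 | 130 | 204 | 0 | 2 | 172 | 0 | 1.4 | 1 | 0 | 3 | N |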


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 297 entries, 0 to 296
Data columns (total 14 columns):
age         297 non-null int64
sex         297 non-null int64
cp          297 non-null int64
trestbps    297 non-null int64
chol        297 non-null int64
fbs         297 non-null int64
restecg     297 non-null int64
thalach     297 non-null int64
exang       297 non-null int64
oldpeak     297 non-null float64
slop        297 non-null int64
ca          297 non-null int64
thal        297 non-null int64
target      297 non-null object
dtypes: float64(1), int64(12), object(1)
memory usage: 32.6+ KB


## EDA

In [5]:
df['target'].value_counts()

N    160
Y    137
Name: target, dtype: int64

### Train Test Split

In [6]:
from sklearn.model_selection import train_test_split
# Features: every column except the label; label: the 'target' column
X = df.drop('target', axis=1)
y = df['target']

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
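
As a quick check on the split sizes, the arithmetic behind `test_size=0.30` can be worked out by hand. The sketch below assumes the scikit-learn behaviour of rounding a fractional test size up and giving the remaining rows to the training set. The row count comes from the `df.info()` output above.

```python
import math

n_samples = 297   # rows reported by df.info()
test_size = 0.30  # fraction passed to train_test_split

# Assumed rounding rule: the test set is rounded up,
# and the training set receives whatever is left.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

The 90 test rows agree with the total support shown in the classification reports.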

In [8]:
from sklearn.svm import SVC
# RBF-kernel SVC with default hyperparameters (see the repr below)
svm_model = SVC()
svm_model.fit(X_train, y_train)



SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

### Predictions 

In [9]:
predictions = svm_model.predict(X_test)

### Evaluation

In [10]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, predictions))

[[49  0]
 [40  1]]


In [11]:
print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

           N       0.55      1.00      0.71        49
           Y       1.00      0.02      0.05        41

   micro avg       0.56      0.56      0.56        90
   macro avg       0.78      0.51      0.38        90
weighted avg       0.76      0.56      0.41        90
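
The report above follows directly from the confusion matrix: the default model labels all but one test patient as `N`, so recall for `Y` collapses. A minimal sketch that recomputes the per-class figures by hand, with the matrix values copied from the output above (rows are true labels, columns are predicted labels, both in the order N, Y):

```python
import numpy as np

# Confusion matrix from the default SVC: rows = true (N, Y), columns = predicted (N, Y).
cm = np.array([[49, 0],
               [40, 1]])

recall = np.diag(cm) / cm.sum(axis=1)     # correct / all true members of the class
precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predictions of the class
accuracy = np.trace(cm) / cm.sum()        # correct / all test rows

for label, p, r in zip(["N", "Y"], precision, recall):
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Only 1 of the 41 true `Y` cases is recovered, which is why the overall accuracy stays close to the share of `N` cases in the test set.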



### Grid Search

In [12]:
param_grid = {'C': [0.01,0.1, 10, 100, 1000],
              'gamma': [1,0.1,0.01,0.001,0.0001]}
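
To make the size of the search concrete, the grid can be expanded by hand. The sketch below only enumerates the candidates; it is not how `GridSearchCV` is implemented internally, and the name `candidates` is introduced just for this illustration.

```python
from itertools import product

param_grid = {'C': [0.01, 0.1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001]}

# Every (C, gamma) pair is one candidate model for the search.
candidates = [dict(zip(param_grid, values))
              for values in product(*param_grid.values())]

print(len(candidates))
print(candidates[0])
```

Each of these candidates is fit once per cross-validation fold, so the total number of fits is the candidate count times the number of folds.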

In [13]:
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(SVC(), param_grid)
grid.fit(X_train, y_train)



GridSearchCV(cv='warn', error_score='raise-deprecating',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False),
       fit_params=None, iid='warn', n_jobs=None,
       param_grid={'C': [0.01, 0.1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

In [14]:
grid.best_params_

{'C': 100, 'gamma': 0.0001}

In [15]:
grid.best_estimator_

SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.0001, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
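
The search settles on the smallest `gamma` in the grid, which may reflect the fact that the features are on very different scales (for example `chol` versus `sex`), since the RBF kernel is sensitive to feature scale. One common variation, not used in this notebook, is to standardize the features inside a pipeline before the SVC. The sketch below is an assumption-laden illustration: it runs on synthetic stand-in data from `make_classification` rather than the heart disease file, and the step names `scale` and `svc` are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): same shape as the notebook's feature matrix.
X, y = make_classification(n_samples=297, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

# Scaling lives inside the pipeline, so each CV fold is scaled
# using statistics from its own training portion only.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Same grid as above; the "svc__" prefix routes parameters to the SVC step.
param_grid = {"svc__C": [0.01, 0.1, 10, 100, 1000],
              "svc__gamma": [1, 0.1, 0.01, 0.001, 0.0001]}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.score(X_test, y_test))
```

Whether this would change the results on the actual data is not shown here; the sketch only illustrates how the same grid would be wired through a scaling step.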

### Predictions and Evaluation

In [16]:
grid_predictions = grid.predict(X_test)

In [17]:
print(confusion_matrix(y_test, grid_predictions))
print(classification_report(y_test, grid_predictions))

[[43  6]
 [ 8 33]]
              precision    recall  f1-score   support

           N       0.84      0.88      0.86        49
           Y       0.85      0.80      0.83        41

   micro avg       0.84      0.84      0.84        90
   macro avg       0.84      0.84      0.84        90
weighted avg       0.84      0.84      0.84        90

