
# Support Vector Machine (SVM)

## Data preprocessing

### Data loading

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('Tumors.csv')
df.head()

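| | Sample code number | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 |
| 1 | 1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 2 |
| 2 | 1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 2 |
| 3 | 1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 2 |
| 4 | 1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2 |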


In [2]:
len(df)

683

The dataset contains 683 samples.

### Preparing features and labels

In [3]:
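# Features: every column except the first (Sample code number, a sample ID) and the last (Class)
X = df.iloc[:, 1:-1].values
# Labels: the Class column (2 = benign, 4 = malignant)
y = df.iloc[:, -1].values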
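To see exactly which columns the `iloc` slices keep, here is a small sketch on a two-row frame built from the first two rows of `df.head()`. It assumes that the first column of `Tumors.csv` is `Sample code number` (the leftmost column in the output above being only the row index); `toy`, `X_toy` and `y_toy` are illustrative names.

```python
import pandas as pd

# The nine cell features of the dataset, in file order.
feature_names = [
    "Clump Thickness", "Uniformity of Cell Size", "Uniformity of Cell Shape",
    "Marginal Adhesion", "Single Epithelial Cell Size", "Bare Nuclei",
    "Bland Chromatin", "Normal Nucleoli", "Mitoses",
]

# Two rows copied from df.head(); assumed layout: ID column, 9 features, Class label.
toy = pd.DataFrame(
    [[1000025, 5, 1, 1, 1, 2, 1, 3, 1, 1, 2],
     [1002945, 5, 4, 4, 5, 7, 10, 3, 2, 1, 2]],
    columns=["Sample code number", *feature_names, "Class"],
)

X_toy = toy.iloc[:, 1:-1].values  # drops the ID column (first) and the label (last)
y_toy = toy.iloc[:, -1].values    # keeps only the Class label
print(X_toy.shape, y_toy)
```

Dropping the sample ID matters: it is an arbitrary identifier, so letting the SVM use it as a feature could only add noise.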

#### Splitting the dataset into the Training set and the Test set

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
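With a float `test_size`, `train_test_split` rounds the number of test samples up and gives the rest to the training set. A quick check of the resulting sizes for 683 samples, which matches the 137 samples counted in each confusion matrix later on:

```python
import math

n_samples = 683
test_size = 0.2

# Test count is rounded up; the training set gets the remainder.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 546 137
```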

#### Feature scaling

In [6]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
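`fit_transform` learns the per-feature mean and standard deviation from the training set, and `transform` reuses those same statistics on the test set, so no information from the test samples leaks into preprocessing. A minimal NumPy sketch of that computation on made-up numbers (it ignores the special handling `StandardScaler` applies to zero-variance features):

```python
import numpy as np

# Hypothetical feature values (rows = samples, columns = features).
X_train_toy = np.array([[5.0, 1.0], [3.0, 4.0], [6.0, 8.0], [4.0, 1.0]])
X_test_toy = np.array([[5.0, 4.0]])

# "fit": statistics come from the training set only (population std, ddof=0).
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)

# "transform": the same mean/std are applied to both sets.
X_train_scaled = (X_train_toy - mean) / std
X_test_scaled = (X_test_toy - mean) / std
print(X_test_scaled)
```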

## Model training and predictions

### Performing Grid Search to find the best hyperparameters for each kernel
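`GridSearchCV` expands each dictionary in `param_grid` into the Cartesian product of its value lists and fits every candidate once per cross-validation fold. The sketch below counts the fits for a grid shaped like the polynomial one used below, assuming only strictly positive `C` values (SVC rejects `C <= 0`):

```python
from itertools import product

# Polynomial-kernel grid with strictly positive C values only.
grid = {
    "C": [0.1, 0.25, 0.5, 0.75, 1],
    "kernel": ["poly"],
    "gamma": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "degree": [2, 3, 4, 5, 6, 7],
}
n_folds = 10

# One candidate per element of the Cartesian product of the value lists.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
n_fits = len(candidates) * n_folds
print(len(candidates), n_fits)  # 270 2700
```

With the default `refit=True`, one more fit of the best candidate on the whole training set follows, which is why the polynomial search is by far the slowest of the four.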

#### Linear

In [7]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# SVC requires C > 0, so C=0 is left out of the grid (it could only produce failed fits)
parameters = [{'C': [0.1, 0.25, 0.5, 0.75, 1], 'kernel': ['linear']}]
grid_search = GridSearchCV(estimator=SVC(), param_grid=parameters, scoring='accuracy', n_jobs=-1, cv=10)
grid_search.fit(X_train, y_train)
print('----- Linear SVM model -----')
print('Best mean accuracy: {:.2f}%'.format(grid_search.best_score_ * 100))
print('Best Standard deviation: {:.2f}%'.format(grid_search.cv_results_['std_test_score'][grid_search.best_index_] * 100))
print(f"Best parameters: {grid_search.best_params_}")

----- Linear SVM model -----
Best mean accuracy: 97.07%
Best Standard deviation: 2.19%
Best parameters: {'C': 0.1, 'kernel': 'linear'}
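The two numbers printed above summarize the 10 held-out fold accuracies of the best candidate: `best_score_` is their mean and `cv_results_['std_test_score']` their population standard deviation. A sketch with made-up fold scores (not the notebook's actual folds):

```python
import numpy as np

# Hypothetical accuracies of one candidate on 10 CV folds (illustrative only).
fold_scores = np.array([1.0, 0.96, 0.98, 0.94, 1.0, 0.96, 0.98, 0.96, 0.98, 0.94])

# Mean of the held-out fold scores, as reported by best_score_ for the best candidate.
mean_score = fold_scores.mean()
# Population standard deviation (ddof=0) of the same scores, as in std_test_score.
std_score = fold_scores.std()

print('Mean accuracy: {:.2f}%'.format(mean_score * 100))
print('Standard deviation: {:.2f}%'.format(std_score * 100))
```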


#### RBF

In [8]:
# SVC requires C > 0, so C=0 is left out of the grid
parameters = [{'C': [0.1, 0.25, 0.5, 0.75, 1], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}]
grid_search = GridSearchCV(estimator=SVC(), param_grid=parameters, scoring='accuracy', n_jobs=-1, cv=10)
grid_search.fit(X_train, y_train)
print('----- Kernel SVM model -----')
print('Best mean accuracy: {:.2f}%'.format(grid_search.best_score_ * 100))
print('Best Standard deviation: {:.2f}%'.format(grid_search.cv_results_['std_test_score'][grid_search.best_index_] * 100))
print(f"Best parameters: {grid_search.best_params_}")

----- Kernel SVM model -----
Best mean accuracy: 97.07%
Best Standard deviation: 2.34%
Best parameters: {'C': 0.75, 'gamma': 0.1, 'kernel': 'rbf'}
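For reference, scikit-learn's RBF kernel is $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$: a smaller `gamma` makes the kernel wider and the decision boundary smoother. A direct NumPy sketch (`rbf_kernel` here is a hand-written helper, not the scikit-learn function of the same name):

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = np.array([0.5, -1.0, 2.0])
z = np.array([0.0, -1.0, 1.0])
print(rbf_kernel(x, x, gamma=0.1))  # identical points give 1.0
print(rbf_kernel(x, z, gamma=0.1))  # exp(-0.1 * 1.25)
```

Since the selected `gamma=0.1` is the smallest value in the searched range, extending the grid below 0.1 might be worth trying.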


#### Sigmoid

In [9]:
# SVC requires C > 0, so C=0 is left out of the grid
parameters = [{'C': [0.1, 0.25, 0.5, 0.75, 1], 'kernel': ['sigmoid'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}]
grid_search = GridSearchCV(estimator=SVC(), param_grid=parameters, scoring='accuracy', n_jobs=-1, cv=10)
grid_search.fit(X_train, y_train)
print('----- Sigmoid SVM model -----')
print('Best mean accuracy: {:.2f}%'.format(grid_search.best_score_ * 100))
print('Best Standard deviation: {:.2f}%'.format(grid_search.cv_results_['std_test_score'][grid_search.best_index_] * 100))
print(f"Best parameters: {grid_search.best_params_}")

----- Sigmoid SVM model -----
Best mean accuracy: 97.43%
Best Standard deviation: 2.04%
Best parameters: {'C': 0.1, 'gamma': 0.1, 'kernel': 'sigmoid'}
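The sigmoid kernel is $K(x, z) = \tanh(\gamma \langle x, z \rangle + r)$, where $r$ is SVC's `coef0` parameter (default `0.0`, which the grid above leaves unchanged). A small sketch with a hand-written helper:

```python
import numpy as np

def sigmoid_kernel(x, z, gamma, coef0=0.0):
    # K(x, z) = tanh(gamma * <x, z> + coef0)
    return float(np.tanh(gamma * np.dot(x, z) + coef0))

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(sigmoid_kernel(x, z, gamma=0.1))  # tanh(0.1 * 1)
```

Unlike the RBF kernel, the sigmoid kernel is not positive semi-definite for every choice of parameters, so it is often less predictable; here it happened to score best in cross-validation, with both `C` and `gamma` at the lower edge of their searched ranges.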


#### Poly

In [10]:
# SVC requires C > 0, so C=0 is left out of the grid
parameters = [{'C': [0.1, 0.25, 0.5, 0.75, 1], 'kernel': ['poly'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 'degree': [2, 3, 4, 5, 6, 7]}]
grid_search = GridSearchCV(estimator=SVC(), param_grid=parameters, scoring='accuracy', n_jobs=-1, cv=10)
grid_search.fit(X_train, y_train)
print('----- Poly SVM model -----')
print('Best mean accuracy: {:.2f}%'.format(grid_search.best_score_ * 100))
print('Best Standard deviation: {:.2f}%'.format(grid_search.cv_results_['std_test_score'][grid_search.best_index_] * 100))
print(f"Best parameters: {grid_search.best_params_}")

----- Poly SVM model -----
Best mean accuracy: 96.33%
Best Standard deviation: 2.45%
Best parameters: {'C': 1, 'degree': 3, 'gamma': 0.2, 'kernel': 'poly'}
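The polynomial kernel is $K(x, z) = (\gamma \langle x, z \rangle + r)^{d}$, with $r$ = `coef0` (default `0.0`) and $d$ = `degree`. A sketch evaluating it with the selected `gamma=0.2` and `degree=3` on two made-up vectors:

```python
import numpy as np

def poly_kernel(x, z, gamma, degree, coef0=0.0):
    # K(x, z) = (gamma * <x, z> + coef0) ** degree
    return float((gamma * np.dot(x, z) + coef0) ** degree)

x = np.array([1.0, 2.0])
z = np.array([2.0, 1.0])
print(poly_kernel(x, z, gamma=0.2, degree=3))  # (0.2 * 4) ** 3 = 0.512 (up to rounding)
```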


### Training the models on the Training set

In [None]:
from sklearn.svm import SVC

linear_classifier = SVC(C=0.1, kernel='linear')
linear_classifier.fit(X_train, y_train)

kernel_classifier = SVC(C=0.75, kernel='rbf', gamma=0.1)
kernel_classifier.fit(X_train, y_train)

sigmoid_classifier = SVC(C=0.1, kernel='sigmoid', gamma=0.1)
sigmoid_classifier.fit(X_train, y_train)

poly_classifier = SVC(C=1, kernel='poly', gamma=0.2, degree=3)
poly_classifier.fit(X_train, y_train)
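Retyping the best hyperparameters by hand works, but `GridSearchCV` with its default `refit=True` already retrains the best candidate on the full training data and exposes it as `best_estimator_`. A sketch on synthetic data from `make_classification` (a stand-in, not the tumor dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data: 200 samples, 9 features.
X_demo, y_demo = make_classification(n_samples=200, n_features=9, random_state=0)

search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 0.5, 1], "kernel": ["linear"]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_demo, y_demo)

# With refit=True (the default) the best candidate has been retrained on all of X_demo.
best_model = search.best_estimator_
print(search.best_params_, best_model.predict(X_demo[:3]))
```

In this notebook, keeping a reference to each `grid_search.best_estimator_` would avoid copying values such as `C=0.75, gamma=0.1` by hand.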

#### Predicting the Test set results

In [12]:
linear_y_pred = linear_classifier.predict(X_test)
kernel_y_pred = kernel_classifier.predict(X_test)
sigmoid_y_pred = sigmoid_classifier.predict(X_test)
poly_y_pred = poly_classifier.predict(X_test)

#### Creating the confusion matrices and computing accuracy and recall

In [15]:
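from sklearn.metrics import confusion_matrix, accuracy_score, recall_score

# Class labels: 2 = benign, 4 = malignant. Recall is computed for the malignant class (pos_label=4).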

cm = confusion_matrix(y_test, linear_y_pred)
print(f'Confusion Matrix for linear model: \n{cm}')
print('Accuracy: {:.2f}%'.format(accuracy_score(y_test, linear_y_pred) * 100))
print('Recall: {:.2f}%\n'.format(recall_score(y_test, linear_y_pred, pos_label=4) * 100))

cm = confusion_matrix(y_test, kernel_y_pred)
print(f'Confusion Matrix for rbf kernel model: \n{cm}')
print('Accuracy: {:.2f}%'.format(accuracy_score(y_test, kernel_y_pred) * 100))
print('Recall: {:.2f}%\n'.format(recall_score(y_test, kernel_y_pred, pos_label=4) * 100))

cm = confusion_matrix(y_test, sigmoid_y_pred)
print(f'Confusion Matrix for sigmoid model: \n{cm}')
print('Accuracy: {:.2f}%'.format(accuracy_score(y_test, sigmoid_y_pred) * 100))
print('Recall: {:.2f}%\n'.format(recall_score(y_test, sigmoid_y_pred, pos_label=4) * 100))

cm = confusion_matrix(y_test, poly_y_pred)
print(f'Confusion Matrix for poly model: \n{cm}')
print('Accuracy: {:.2f}%'.format(accuracy_score(y_test, poly_y_pred) * 100))
print('Recall: {:.2f}%\n'.format(recall_score(y_test, poly_y_pred, pos_label=4) * 100))

Confusion Matrix for linear model: 
[[83  4]
 [ 3 47]]
Accuracy: 94.89%
Recall: 94.00%

Confusion Matrix for rbf kernel model: 
[[82  5]
 [ 1 49]]
Accuracy: 95.62%
Recall: 98.00%

Confusion Matrix for sigmoid model: 
[[84  3]
 [ 3 47]]
Accuracy: 95.62%
Recall: 94.00%

Confusion Matrix for poly model: 
[[86  1]
 [ 4 46]]
Accuracy: 96.35%
Recall: 92.00%
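`confusion_matrix` sorts the labels, so rows are the true classes and columns the predicted classes, both in the order [2, 4] (benign, malignant). Accuracy and recall can be read straight off that layout; the sketch recomputes them for the RBF model's matrix printed above:

```python
def metrics_from_confusion(cm):
    # Layout [[TN, FP], [FN, TP]], with malignant (4) as the positive class.
    (tn, fp), (fn, tp) = cm
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    recall = tp / (tp + fn)  # share of malignant tumors that were detected
    return accuracy, recall

acc, rec = metrics_from_confusion([[82, 5], [1, 49]])  # RBF kernel model
print('Accuracy: {:.2f}%'.format(acc * 100))  # Accuracy: 95.62%
print('Recall: {:.2f}%'.format(rec * 100))    # Recall: 98.00%
```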



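I would choose the RBF kernel SVM as the best model. It makes 6 incorrect predictions, one more than the polynomial SVM (5), but only 1 of them is a malignant tumor classified as benign (a false negative), compared with 4 for the polynomial model. This gives it the highest recall (98.00%), which is the metric we want to maximize here, since missing a malignant tumor is more costly than a false alarm.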
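Because all four models are evaluated on the same test set, each one sees the same 50 malignant samples, so maximizing recall is equivalent to minimizing false negatives. A sketch of that selection rule using the confusion matrices printed above, breaking ties by the total number of errors:

```python
# Confusion matrices from the output above, laid out as [[TN, FP], [FN, TP]].
results = {
    "linear": [[83, 4], [3, 47]],
    "rbf": [[82, 5], [1, 49]],
    "sigmoid": [[84, 3], [3, 47]],
    "poly": [[86, 1], [4, 46]],
}

def selection_key(cm):
    (tn, fp), (fn, tp) = cm
    # Fewest missed malignant tumors first, then fewest errors overall.
    return (fn, fp + fn)

best = min(results, key=lambda name: selection_key(results[name]))
print(best)  # rbf
```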