# Support Vector Machine and Image Recognition
This is a learning project aimed at understanding supervised learning and classification using SVM. SVM is well known for its performance in high-dimensional spaces. The project applies SVM to different types of data for prediction. In addition, PCA (a dimensionality reduction technique) is used to improve image classification.


## Breast Cancer Simple Binary Classification

In [8]:
# Load Wisconsin Breast Cancer Dataset
from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np

data_cancer = load_breast_cancer()

# Assign X  and Y (target) values

X = data_cancer.data
Y = data_cancer.target

# Print Data Size
print('Input data size:', X.shape)
print('Output data size: ', Y.shape)

Input data size: (569, 30)
Output data size:  (569,)


In [9]:
# Get more descriptive information about the data
print('Labels:', data_cancer.target_names)


# Positive and Negative samples in the data
pos_samples = (Y == 1).sum()
neg_samples = (Y == 0).sum()

print(f'{pos_samples} positive samples and {neg_samples} negative samples')

Labels: ['malignant' 'benign']
357 positive samples and 212 negative samples


The dataset contains 569 samples and 30 features. Of these, 357 samples (about 63%) are positive (label 1, `benign`) and 212 (about 37%) are negative (label 0, `malignant`). It is important to take time to understand the data before solving the classification problem. In this case, the classes are only mildly imbalanced, so accuracy remains a reasonable first metric.
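
The class balance described above can be checked directly from the target array. The following is a minimal sketch, not part of the original notebook; the `stratify=Y` argument is an optional assumption shown only to illustrate how the class proportions could be preserved in the split, whereas the notebook itself uses an unstratified split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data_cancer = load_breast_cancer()
X = data_cancer.data
Y = data_cancer.target

# Count samples per class; index i corresponds to target_names[i]
counts = np.bincount(Y)
proportions = counts / counts.sum()

for name, count, share in zip(data_cancer.target_names, counts, proportions):
    print(f'{name}: {count} samples ({share:.1%})')

# Optional (assumption, not used in the notebook): a stratified split keeps
# the class proportions roughly equal in the training and testing sets
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=42, stratify=Y)
print(f'Share of label 1 in train: {Y_tr.mean():.3f}, in test: {Y_te.mean():.3f}')
```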

In [10]:
# Split Training and Testing Data
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=42)

In [11]:
# Initialize SVM classifier
from sklearn.svm import SVC 

model = SVC(kernel='linear', C=1.0, random_state=42)


# Fit Classifier Model
model.fit(X_train, Y_train)

SVC(kernel='linear', random_state=42)

In [12]:
# check model accuracy
score_accuracy = model.score(X_test, Y_test)

print(f'Accuracy: {score_accuracy*100:.1f}%')

Accuracy: 95.8%


The model achieved an accuracy of 95.8% on the held-out test set (143 samples, the default 25% split of `train_test_split`).
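
To put the accuracy figure in context, it may help to compare it against a trivial baseline that always predicts the most frequent class. The sketch below is an addition rather than part of the original notebook. It uses scikit-learn's `DummyClassifier` with the same split settings as above; the variable names `baseline` and `baseline_accuracy` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, Y = load_breast_cancer(return_X_y=True)

# Same split settings as the notebook (default test_size, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=42)

# Baseline that ignores the features and always predicts the majority class
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_train, Y_train)

baseline_accuracy = baseline.score(X_test, Y_test)
print(f'Baseline accuracy: {baseline_accuracy*100:.1f}%')
```

A classifier is only useful to the extent that it beats this baseline, so the gap between the two scores says more than the raw accuracy alone.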

In [13]:
from sklearn.metrics import classification_report

pred = model.predict(X_test)

print(classification_report(Y_test, pred))

              precision    recall  f1-score   support

           0       0.96      0.93      0.94        54
           1       0.96      0.98      0.97        89

    accuracy                           0.96       143
   macro avg       0.96      0.95      0.96       143
weighted avg       0.96      0.96      0.96       143
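

The precision and recall values in the report can be traced back to the confusion matrix. The following sketch is an addition under the same setup as the cells above: it refits the linear SVM and derives precision and recall for label 1 by hand. The names `cm`, `precision_1`, and `recall_1` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=42)

model = SVC(kernel='linear', C=1.0, random_state=42)
model.fit(X_train, Y_train)
pred = model.predict(X_test)

# Rows are true labels, columns are predicted labels (order: 0, 1)
cm = confusion_matrix(Y_test, pred)
tn, fp, fn, tp = cm.ravel()

# Precision: share of predicted label-1 samples that are truly label 1
precision_1 = tp / (tp + fp)
# Recall: share of true label-1 samples that the model recovered
recall_1 = tp / (tp + fn)

print(cm)
print(f'Label 1 precision: {precision_1:.2f}, recall: {recall_1:.2f}')
```

Since label 0 is `malignant` in this dataset, the recall for label 0 (the first row of the matrix) is the figure to watch when missed malignant cases are the main concern.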

