# Support Vector Machine (SVM) For Detecting Defective Parts

#### Erin Rasmussen, Kevin Thomas, Mike Wells, Sam Little

This notebook uses the [`scikit-learn`](http://scikit-learn.org/) library to train a Support Vector Machine (SVM) classifier with several kernels. The classifier detects defective mechanical parts in a manufacturing process. The image data comes from the ["Casting Product Image Data For Quality Inspection"](https://www.kaggle.com/ravirajsinh45/real-life-industrial-dataset-of-casting-product) dataset, which contains 8646 images for quality inspection. We use images from this dataset to train and evaluate the SVM.

First, we import the necessary libraries.

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import os
import shutil
import json
from tqdm.auto import tqdm

We declare the global constants below.

In [None]:
DATA_PATH = './casting_512x512/casting_512x512/'
OUTPUT_PATH = './out/'
TRAIN_TEST_SPLIT = 0.8
RESIZE_SHAPE = (64, 64)

Next, we load the image data. Each image is resized to `RESIZE_SHAPE` and labelled 0 (OK) or 1 (defective).

In [None]:
data = []
labels = []

# Label 0 = OK part, label 1 = defective part
print('Loading OK images...')
ok_dir = os.path.join(DATA_PATH, 'ok_front')
for filename in tqdm(os.listdir(ok_dir)):
  img = Image.open(os.path.join(ok_dir, filename))
  img = img.resize(RESIZE_SHAPE)
  data.append(np.array(img))
  labels.append(0)

print('Loading defective images...')
def_dir = os.path.join(DATA_PATH, 'def_front')
for filename in tqdm(os.listdir(def_dir)):
  img = Image.open(os.path.join(def_dir, filename))
  img = img.resize(RESIZE_SHAPE)
  data.append(np.array(img))
  labels.append(1)
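
The length of each feature vector depends on both the resize shape and the image mode: a single-channel image resized to 64 × 64 yields 4096 values, while an RGB image yields three times as many. The notebook keeps whatever mode the files have. A minimal sketch, assuming we wanted to force one channel before resizing (the helper `to_gray_array` and the stand-in image are hypothetical and not part of the notebook):

```python
import numpy as np
from PIL import Image

def to_gray_array(img, size=(64, 64)):
  """Hypothetical helper: convert to 8-bit grayscale, resize, return an array."""
  return np.array(img.convert('L').resize(size))

# Stand-in image; a real call would pass the result of Image.open(...)
demo_img = Image.new('RGB', (512, 512), color=(128, 128, 128))

rgb_array = np.array(demo_img.resize((64, 64)))
gray_array = to_gray_array(demo_img)

print(rgb_array.shape, rgb_array.size)    # three channels per pixel
print(gray_array.shape, gray_array.size)  # one channel per pixel
```

Fewer features mean faster SVM training, at the cost of discarding any colour information the images may carry.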

Flatten the images, shuffle them, and split them into training and test sets.

In [None]:
data = np.array(data)
labels = np.array(labels)
original_shape = data[0].shape
# Flatten each image into a single feature vector
data = data.reshape(data.shape[0], -1)

# Shuffle images and labels together using one random index permutation
indices = np.random.permutation(len(data))
split = int(len(data) * TRAIN_TEST_SPLIT)
train_idx, test_idx = indices[:split], indices[split:]

train_data, train_labels = data[train_idx], labels[train_idx]
test_data, test_labels = data[test_idx], labels[test_idx]
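
The manual shuffle and split above could also be done with `train_test_split` from `sklearn.model_selection`. The sketch below is one possible alternative, not what the notebook runs: it uses synthetic arrays (`demo_data`, `demo_labels`) in place of the real images, and the `stratify` and `random_state` settings are assumptions added for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the flattened images: 100 samples, 64 * 64 features
rng = np.random.default_rng(0)
demo_data = rng.integers(0, 256, size=(100, 64 * 64), dtype=np.uint8)
demo_labels = np.array([0] * 40 + [1] * 60)

demo_train_data, demo_test_data, demo_train_labels, demo_test_labels = train_test_split(
  demo_data,
  demo_labels,
  train_size=0.8,        # same ratio as TRAIN_TEST_SPLIT
  shuffle=True,
  stratify=demo_labels,  # keep the OK/defective ratio the same in both sets
  random_state=0,        # fixed seed so the split is reproducible
)

print(f'Train data shape: {demo_train_data.shape}')
print(f'Test data shape: {demo_test_data.shape}')
```

Stratifying keeps the class proportions equal in both sets, and a fixed seed makes results comparable between runs.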

In [None]:
print(f'Train data shape: {train_data.shape}')
print(f'Train labels shape: {train_labels.shape}')
print(f'Test data shape: {test_data.shape}')
print(f'Test labels shape: {test_labels.shape}')
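
Besides the shapes, it can help to check how many samples of each class landed in a split, since accuracy alone is hard to interpret when classes are imbalanced. A small sketch using `np.bincount`, with a made-up label array (`demo_train_labels`) standing in for `train_labels`:

```python
import numpy as np

# Made-up labels for illustration: 0 = OK, 1 = defective
demo_train_labels = np.array([0, 1, 1, 0, 1, 1, 1, 0])

# Count the occurrences of each label value (index 0 = OK, index 1 = defective)
counts = np.bincount(demo_train_labels, minlength=2)
for name, count in zip(['OK', 'DEFECTIVE'], counts):
  print(f'{name}: {count} ({count / counts.sum():.1%})')
```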

Show a random training image as a sanity check that the data loaded correctly.

In [None]:
random_image = np.random.randint(0, len(train_data))
image = train_data[random_image].reshape(original_shape)
plt.imshow(image)
label = train_labels[random_image]
# avoid naming the variable `type`, which would shadow the Python builtin
part_type = 'OK' if label == 0 else 'DEFECTIVE'
plt.title(f'TYPE: {part_type}')
plt.show()

We declare the kernels we wish to test and fit an `sklearn` SVM with each one to the training data. For each kernel, we compute the accuracy, confusion matrix, and classification report on the test set.

In [None]:
kernels_to_test = [
  'linear',
  'poly',
  'rbf',
  #'sigmoid',
]

results = {}

for kernel in kernels_to_test:
  print(f'Testing kernel: {kernel}')
  svm = SVC(kernel=kernel, gamma='scale', verbose=True)
  svm.fit(train_data, train_labels)
  predictions = svm.predict(test_data)
  accuracy = accuracy_score(test_labels, predictions)
  conf_matrix = confusion_matrix(test_labels, predictions)
  report = classification_report(test_labels, predictions)
  results[kernel] = {
    'accuracy': accuracy,
    'confusion_matrix': conf_matrix,
    'classification_report': report
  }
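
SVMs are not scale invariant, and the loop above passes raw pixel values to `SVC`. One common option, shown here only as a sketch and not as part of the notebook's method, is to standardize the features inside a pipeline so that the scaler is fitted on the training data only. The data below is synthetic (`demo_data`, `demo_labels`), chosen so the two classes are easy to separate.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for flattened images: one dark class, one bright class
rng = np.random.default_rng(0)
dark = rng.normal(loc=50, scale=10, size=(100, 64))
bright = rng.normal(loc=200, scale=10, size=(100, 64))
demo_data = np.vstack([dark, bright])
demo_labels = np.array([0] * 100 + [1] * 100)

# Take alternating rows so both halves contain both classes
demo_train_data, demo_train_labels = demo_data[::2], demo_labels[::2]
demo_test_data, demo_test_labels = demo_data[1::2], demo_labels[1::2]

# The scaler's mean and variance are learned from the training data only
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel='rbf', gamma='scale'))
scaled_svm.fit(demo_train_data, demo_train_labels)

demo_predictions = scaled_svm.predict(demo_test_data)
demo_accuracy = (demo_predictions == demo_test_labels).mean()
print(f'Accuracy on synthetic data: {demo_accuracy}')
```

Whether scaling helps on the casting images would need to be checked by comparing it against the unscaled results.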

For each kernel, we save the confusion matrix, accuracy, and other metrics to the output directory.

In [None]:
for kernel, result in results.items():

  # create the output directory if it doesn't exist
  os.makedirs(f'{OUTPUT_PATH}{kernel}', exist_ok=True)

  # save confusion matrix
  plt.figure(figsize=(10, 10))
  plt.imshow(result['confusion_matrix'], interpolation='nearest', cmap=plt.cm.Blues)
  plt.title(f'Confusion matrix for {kernel} kernel')
  plt.colorbar()
  tick_marks = np.arange(2)
  plt.xticks(tick_marks, ['OK', 'DEFECTIVE'], rotation=45)
  plt.yticks(tick_marks, ['OK', 'DEFECTIVE'])
  plt.ylabel('True label')
  plt.xlabel('Predicted label')
  # call tight_layout after adding the axis labels so they are not clipped
  plt.tight_layout()
  plt.savefig(f'{OUTPUT_PATH}{kernel}/confusion_matrix.png')
  plt.show()

  print(f'Accuracy for {kernel} kernel: {result["accuracy"]}')
  print(f'Classification report for {kernel} kernel:')
  print(result['classification_report'])

  # save results to JSON (the confusion matrix array is converted to a plain list)
  serializable = {
    'accuracy': result['accuracy'],
    'confusion_matrix': result['confusion_matrix'].tolist(),
    'classification_report': result['classification_report']
  }
  with open(f'{OUTPUT_PATH}{kernel}/results.json', 'w') as f:
    json.dump(serializable, f)
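
Once the per-kernel `results.json` files exist, they can be read back to compare kernels. A minimal sketch, assuming the directory layout written above (one subdirectory per kernel); the helper `load_accuracies` is hypothetical, and the accuracy values in the demo are placeholders, not results from the notebook.

```python
import json
import os
import tempfile

def load_accuracies(output_path):
  """Hypothetical helper: return {kernel: accuracy} for each results.json found."""
  accuracies = {}
  for kernel in sorted(os.listdir(output_path)):
    results_file = os.path.join(output_path, kernel, 'results.json')
    if os.path.isfile(results_file):
      with open(results_file) as f:
        accuracies[kernel] = json.load(f)['accuracy']
  return accuracies

# Demo with placeholder numbers written to a temporary directory
with tempfile.TemporaryDirectory() as demo_out:
  for kernel, placeholder_accuracy in {'linear': 0.5, 'rbf': 0.75}.items():
    os.makedirs(os.path.join(demo_out, kernel))
    with open(os.path.join(demo_out, kernel, 'results.json'), 'w') as f:
      json.dump({'accuracy': placeholder_accuracy}, f)
  demo_accuracies = load_accuracies(demo_out)

best_kernel = max(demo_accuracies, key=demo_accuracies.get)
print(f'Best kernel in demo data: {best_kernel}')
```

With the real output, the call would be `load_accuracies(OUTPUT_PATH)`.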