# Casting Defects with One-Class SVM

We will apply a One-Class SVM to casting images in two different ways: outlier detection and novelty detection.

In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt
import cv2
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.svm import OneClassSVM

### Feature Engineering

In [2]:
def quantify_image(img):
    # compute a 256-bin histogram over channel 0 of the image and normalize it
    # (for a greyscale image loaded as 3-channel BGR, all channels are identical)
    hist = cv2.calcHist([img],[0],None,[256],[0,256])
    hist = cv2.normalize(hist, hist).flatten()
    return hist
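
To make the feature concrete, here is a minimal NumPy-only sketch of what `quantify_image` appears to compute, assuming `cv2.normalize` is called with its default arguments (L2 norm scaled to 1). The helper name `quantify_image_np` and the synthetic 4x4 image are hypothetical and only for illustration; unlike the original, the sketch takes a single-channel array directly.

```python
import numpy as np

def quantify_image_np(img):
    """NumPy-only stand-in for quantify_image (hypothetical helper).

    Counts how many pixels take each intensity 0..255, then scales the
    256 counts so that the vector has unit L2 norm.
    """
    counts = np.bincount(img.ravel(), minlength=256).astype(np.float32)
    norm = np.linalg.norm(counts)
    # Guard against an empty image, where the norm would be zero
    return counts / norm if norm > 0 else counts

# Synthetic 4x4 single-channel image: left half black (0), right half white (255)
img = np.zeros((4, 4), dtype=np.uint8)
img[:, 2:] = 255

features = quantify_image_np(img)
print(features.shape)               # (256,)
print(np.flatnonzero(features))     # only bins 0 and 255 are non-zero
```

Because the vector is normalised, two images of different sizes but the same intensity distribution map to the same 256 features. The histogram discards all spatial information, which is worth keeping in mind when reading the scores further down.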

In [3]:
def transform_data_from_path(imagePath):
    # convert all images in the imagePath
    # to greyscale histogram values (normalised)
    data = []

    # loop over the imagePath directory
    for imageName in os.listdir(imagePath):
        # load the image; os.path.join keeps the path portable across operating systems
        image = cv2.imread(os.path.join(imagePath, imageName))
        # cv2.imread returns None for files it cannot read, so skip those
        if image is None:
            continue
        # quantify the image and update the data list
        features = quantify_image(image)
        data.append(features)
    return np.array(data)

In [4]:
# The paths to the images
defect_path = os.path.join('casting_512x512', 'def_front')
normal_path = os.path.join('casting_512x512', 'ok_front')

In [5]:
defect = transform_data_from_path(defect_path)
normal = transform_data_from_path(normal_path)

print('Defect:', len(defect))
print('Normal:', len(normal))

Defect: 781
Normal: 519


In [6]:
normal_train, normal_test = train_test_split(normal, test_size=0.2, random_state=42)
defect_train, defect_test = train_test_split(defect, test_size=0.15, random_state=42)

print('Normal')
print('Train:', len(normal_train))
print('Test:', len(normal_test))

print()
print('Defective')
print('Train:', len(defect_train))
print('Test:', len(defect_test))

Normal
Train: 415
Test: 104

Defective
Train: 663
Test: 118


## Outlier Detection

For outlier detection, we will train the model with both normal and defective casting images.

In [7]:
train = np.concatenate((normal_train, defect_train), axis=0)
test = np.concatenate((normal_test, defect_test), axis=0)

# 1 means normal casting while -1 means defective casting
test_labels = np.ones(len(test), dtype=int)
test_labels[-len(defect_test):] = -1

print(len(test))

222
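
The labelling step relies on the defective rows being the last rows of `test`. A small sketch with hypothetical stand-in arrays (`normal_demo`, `defect_demo`) shows the same pattern on five rows:

```python
import numpy as np

# Hypothetical tiny stand-ins for normal_test and defect_test (3 and 2 rows)
normal_demo = np.zeros((3, 4))
defect_demo = np.ones((2, 4))

test_demo = np.concatenate((normal_demo, defect_demo), axis=0)

# Label every row as normal (1), then overwrite the trailing defect rows
# with -1, matching the order used in the concatenation above.
labels_demo = np.ones(len(test_demo), dtype=int)
labels_demo[-len(defect_demo):] = -1

print(labels_demo.tolist())              # [1, 1, 1, -1, -1]
print(np.unique(labels_demo).tolist())   # [-1, 1]
```

The sorted label order is `[-1, 1]`, which is why `target_names` in the reports lists 'Defective' before 'Normal'. One caveat of the slicing idiom: if the defect array were empty, `-0:` would select every row and label all of them -1.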


### Model Training

In [8]:
model = OneClassSVM(kernel='rbf', gamma='scale', nu=0.5, max_iter=-1)

In [9]:
model.fit(train)
preds = model.predict(test)
print(classification_report(test_labels, preds, target_names=['Defective', 'Normal']))

              precision    recall  f1-score   support

   Defective       0.49      0.51      0.50       118
      Normal       0.42      0.40      0.41       104

    accuracy                           0.46       222
   macro avg       0.46      0.46      0.46       222
weighted avg       0.46      0.46      0.46       222



The accuracy for outlier detection is very low at 0.46. Let's try to tune the hyperparameters.

### Hyperparameter Tuning

We will change the kernel type to sigmoid and loop through nu values of 0.1, 0.2, ..., 0.9.

In [12]:
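optimal_nu = 0
max_accuracy = -1
# np.arange accumulates floating-point error, so round each nu to one decimal
# (np.round replaces np.round_, which was removed in NumPy 2.0)
for nu_value in np.round(np.arange(0.1, 1, 0.1), decimals=1):
    model = OneClassSVM(kernel='sigmoid', gamma='scale', nu=nu_value, max_iter=-1)
    model.fit(train)
    preds = model.predict(test)
    accuracy = accuracy_score(test_labels, preds)
    # keep the nu value with the highest accuracy seen so far
    if accuracy > max_accuracy:
        max_accuracy = accuracy
        optimal_nu = nu_value
    print(f'nu: {nu_value: .1f}, accuracy: {accuracy}')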

nu:  0.1, accuracy: 0.4594594594594595
nu:  0.2, accuracy: 0.5135135135135135
nu:  0.3, accuracy: 0.5405405405405406
nu:  0.4, accuracy: 0.6036036036036037
nu:  0.5, accuracy: 0.6306306306306306
nu:  0.6, accuracy: 0.6171171171171171
nu:  0.7, accuracy: 0.581081081081081
nu:  0.8, accuracy: 0.5585585585585585
nu:  0.9, accuracy: 0.545045045045045


After changing the kernel type to sigmoid, the best outlier detection accuracy is 0.63, which occurs at nu = 0.5.
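
To see why nu has such a strong effect, recall that scikit-learn documents it as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The sketch below is an illustration on synthetic Gaussian data (not the casting features); it counts how many training points the fitted model flags as -1 for a few nu values.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for a training set: 500 points from one Gaussian blob
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

flagged = {}
for nu in (0.1, 0.5, 0.9):
    model = OneClassSVM(kernel='rbf', gamma='scale', nu=nu).fit(X)
    # Fraction of the training points that the fitted model itself rejects
    flagged[nu] = float(np.mean(model.predict(X) == -1))
    print(f'nu: {nu:.1f}, fraction of training points flagged: {flagged[nu]:.2f}')
```

On data like this the flagged fraction tends to track nu closely, so choosing nu amounts to stating roughly how much of the training set should be treated as outliers.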

In [13]:
# np.round replaces np.round_, which was removed in NumPy 2.0
model = OneClassSVM(kernel='sigmoid', gamma='scale', nu=np.round(optimal_nu, decimals=1), max_iter=-1)
model.fit(train)
preds = model.predict(test)
print(classification_report(test_labels, preds, target_names=['Defective', 'Normal']))

              precision    recall  f1-score   support

   Defective       0.66      0.62      0.64       118
      Normal       0.60      0.64      0.62       104

    accuracy                           0.63       222
   macro avg       0.63      0.63      0.63       222
weighted avg       0.63      0.63      0.63       222



The recalls for defective and normal castings are 0.62 and 0.64 respectively. Next, we will apply novelty detection to see whether accuracy and recall improve.

## Novelty Detection

For novelty detection, we train the model on normal casting images only; defective castings are then treated as 'novel'.
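
As a minimal sketch of this setup, the example below fits a One-Class SVM on inliers only and then scores unseen inliers and unseen novel points. The two Gaussian blobs are synthetic assumptions standing in for normal and defective features, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
inliers_train = rng.normal(0, 1, size=(300, 2))   # stand-in for normal_train
inliers_test = rng.normal(0, 1, size=(50, 2))     # stand-in for normal_test
novel_test = rng.normal(6, 1, size=(50, 2))       # stand-in for defect_test

# The model only ever sees inliers during fitting
model = OneClassSVM(kernel='rbf', gamma='scale', nu=0.1).fit(inliers_train)

# predict returns 1 for points inside the learned region and -1 outside it
inlier_recall = float(np.mean(model.predict(inliers_test) == 1))
novel_recall = float(np.mean(model.predict(novel_test) == -1))
print(f'inlier recall: {inlier_recall:.2f}, novel recall: {novel_recall:.2f}')
```

The blobs here are deliberately well separated; the casting histograms overlap far more, so the real recalls should not be expected to look like this toy case.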

In [22]:
test = np.concatenate((normal_test, defect_test), axis=0)

# 1 means normal casting while -1 means defective casting
test_labels = np.ones(len(test), dtype=int)
test_labels[-len(defect_test):] = -1

print(len(test))

222


### Model Training

In [23]:
model = OneClassSVM(kernel='rbf', gamma='scale', nu=0.5, max_iter=-1)

In [24]:
model.fit(normal_train)
preds = model.predict(test)
print(classification_report(test_labels, preds, target_names=['Defective', 'Normal']))

              precision    recall  f1-score   support

   Defective       0.69      0.92      0.79       118
      Normal       0.86      0.52      0.65       104

    accuracy                           0.73       222
   macro avg       0.77      0.72      0.72       222
weighted avg       0.77      0.73      0.72       222



We see that the overall accuracy has increased to 0.73. However, the recall for normal castings is only 0.52, which means nearly half of the normal castings are misclassified as defective.

### Hyperparameter Tuning

In [28]:
optimal_nu = 0
max_accuracy = -1
# np.arange accumulates floating-point error, so round each nu to one decimal
# (np.round replaces np.round_, which was removed in NumPy 2.0)
for nu_value in np.round(np.arange(0.1, 1, 0.1), decimals=1):
    model = OneClassSVM(kernel='rbf', gamma='scale', nu=nu_value, max_iter=-1)
    model.fit(normal_train)
    preds = model.predict(test)
    accuracy = accuracy_score(test_labels, preds)
    # keep the nu value with the highest accuracy seen so far
    if accuracy > max_accuracy:
        max_accuracy = accuracy
        optimal_nu = nu_value
    print(f'nu: {nu_value: .1f}, accuracy: {accuracy}')

nu:  0.1, accuracy: 0.8243243243243243
nu:  0.2, accuracy: 0.8108108108108109
nu:  0.3, accuracy: 0.7702702702702703
nu:  0.4, accuracy: 0.7612612612612613
nu:  0.5, accuracy: 0.7342342342342343
nu:  0.6, accuracy: 0.6711711711711712
nu:  0.7, accuracy: 0.5990990990990991
nu:  0.8, accuracy: 0.581081081081081
nu:  0.9, accuracy: 0.5540540540540541


The best novelty detection model occurs at nu = 0.1 with accuracy of 0.82.
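
Note that nu is chosen here by its accuracy on the test set, so the reported 0.82 may be somewhat optimistic. One common alternative, sketched below on synthetic data, selects nu on a separate validation split and reports accuracy on a test split that was never used for selection. The blobs, split sizes, and the `stack` helper are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(400, 2))   # synthetic stand-in for normal features
defect = rng.normal(4, 1, size=(200, 2))   # synthetic stand-in for defect features

# Normal data: 60% fit, 20% validation, 20% test; defects: half validation, half test
normal_fit, normal_rest = train_test_split(normal, test_size=0.4, random_state=42)
normal_val, normal_test = train_test_split(normal_rest, test_size=0.5, random_state=42)
defect_val, defect_test = train_test_split(defect, test_size=0.5, random_state=42)

def stack(normal_part, defect_part):
    """Concatenate features and build labels: 1 = normal, -1 = defective."""
    X = np.concatenate((normal_part, defect_part), axis=0)
    y = np.concatenate((np.ones(len(normal_part), dtype=int),
                        -np.ones(len(defect_part), dtype=int)))
    return X, y

X_val, y_val = stack(normal_val, defect_val)
X_test, y_test = stack(normal_test, defect_test)

# Select nu using validation accuracy only
best_nu, best_acc = None, -1.0
for nu in np.round(np.arange(0.1, 1.0, 0.1), 1):
    model = OneClassSVM(kernel='rbf', gamma='scale', nu=nu).fit(normal_fit)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_nu, best_acc = float(nu), acc

# Report accuracy once on the untouched test split
final = OneClassSVM(kernel='rbf', gamma='scale', nu=best_nu).fit(normal_fit)
test_acc = accuracy_score(y_test, final.predict(X_test))
print(f'selected nu: {best_nu:.1f}, validation accuracy: {best_acc:.3f}, test accuracy: {test_acc:.3f}')
```

With only 519 normal images available, a three-way split leaves less data for fitting, so this is a trade-off rather than a strict improvement.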

In [29]:
# np.round replaces np.round_, which was removed in NumPy 2.0
model = OneClassSVM(kernel='rbf', gamma='scale', nu=np.round(optimal_nu, decimals=1), max_iter=-1)
model.fit(normal_train)
preds = model.predict(test)
print(classification_report(test_labels, preds, target_names=['Defective', 'Normal']))

              precision    recall  f1-score   support

   Defective       0.92      0.73      0.82       118
      Normal       0.75      0.93      0.83       104

    accuracy                           0.82       222
   macro avg       0.84      0.83      0.82       222
weighted avg       0.84      0.82      0.82       222



The novelty detection method performs better on the images, reaching 0.82 overall accuracy compared with 0.63 for the best outlier detection model. The recalls for defective and normal castings have also increased, from 0.62 and 0.64 to 0.73 and 0.93 respectively.