# Casting Defects with Local Outlier Factor (LOF)

LOF is typically used for outlier detection, but scikit-learn's LocalOutlierFactor also supports novelty detection through its novelty parameter. We will therefore apply the model in two ways: outlier detection and novelty detection.

In [1]:
import os
import numpy as np
import matplotlib.pyplot as plt
import cv2
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.neighbors import LocalOutlierFactor

### Feature Engineering

In [2]:
def quantify_image(img):
    # compute a 256-bin intensity histogram over channel 0 of the image
    # (for a greyscale image all channels match) and normalize it
    hist = cv2.calcHist([img],[0],None,[256],[0,256])
    hist = cv2.normalize(hist, hist).flatten()
    return hist
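
To make the feature vector concrete, here is a minimal NumPy-only sketch of the same idea. The function name quantify_array and the random test image are hypothetical, and the unit L2 scaling is an assumption about what cv2.normalize does with its default arguments. Whatever the image resolution, the result is a 256-dimensional vector.

```python
import numpy as np

def quantify_array(img):
    # Hypothetical NumPy-only counterpart of quantify_image:
    # count how often each intensity 0..255 occurs in an 8-bit image
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float32)
    # scale to unit L2 norm (assumed to match cv2.normalize's defaults)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Synthetic 8-bit image standing in for a casting photo
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

features = quantify_array(img)
print(features.shape)  # (256,)
```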

In [3]:
def transform_data_from_path(imagePath):
    # convert all images in the imagePath
    # to greyscale histogram values (normalised)
    data = []

    # loop over the imagePath directory
    for imageName in os.listdir(imagePath):
        # load the image (os.path.join keeps the path portable across OSes)
        image = cv2.imread(os.path.join(imagePath, imageName))
        # cv2.imread returns None for files it cannot decode; skip those
        if image is None:
            continue
        # quantify the image and update the data list
        features = quantify_image(image)
        data.append(features)
    return np.array(data)

In [4]:
# The paths to the images
defect_path = os.path.join('casting_512x512', 'def_front')
normal_path = os.path.join('casting_512x512', 'ok_front')

In [5]:
defect = transform_data_from_path(defect_path)
normal = transform_data_from_path(normal_path)

print('Defect:', len(defect))
print('Normal:', len(normal))

Defect: 781
Normal: 519


In LOF, the contamination parameter (the expected proportion of outliers) must lie in the range (0, 0.5]. We therefore drop some defective images so that the number of defective castings is no greater than the number of normal castings.
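
The arithmetic behind this step can be checked directly from the image counts printed above. The variable names below are illustrative only.

```python
# Image counts reported above
n_defect, n_normal = 781, 519

# Share of defective images if every image were kept
share_all = n_defect / (n_defect + n_normal)

# Share after truncating the defective set to the size of the normal set
n_kept = min(n_defect, n_normal)
share_kept = n_kept / (n_kept + n_normal)

print(f"all defects kept: {share_all:.3f}")   # 0.601, outside (0, 0.5]
print(f"after truncation: {share_kept:.3f}")  # 0.500, the upper bound
```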

In [6]:
defect = defect[:519]
print('Defect:', len(defect))

Defect: 519


In [7]:
normal_train, normal_test = train_test_split(normal, test_size=0.2, random_state=42)
defect_train, defect_test = train_test_split(defect, test_size=0.2, random_state=42)

print('Normal')
print('Train:', len(normal_train))
print('Test:', len(normal_test))

print()
print('Defective')
print('Train:', len(defect_train))
print('Test:', len(defect_test))

Normal
Train: 415
Test: 104

Defective
Train: 415
Test: 104


## Outlier Detection

For outlier detection, we fit the model on both normal and defective casting images.
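
As a rough illustration of this mode, the sketch below runs fit_predict on synthetic 2-D points rather than on the casting features. The arrays and the contamination value are assumptions chosen for the example. The model is fitted and scored on the same data, and the number of flagged points closely follows the contamination setting.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-in for the histogram features: a dense cluster plus
# a few scattered points (these arrays are assumptions, not casting data)
inliers = rng.normal(loc=0.0, scale=1.0, size=(190, 2))
scattered = rng.uniform(low=6.0, high=10.0, size=(10, 2))
X = np.concatenate((inliers, scattered), axis=0)

# contamination is the expected share of outliers in X
lof = LocalOutlierFactor(n_neighbors=20, contamination=10 / len(X))

# In the default (outlier detection) mode the model is fitted and scored
# on the same data: fit_predict returns 1 for inliers and -1 for outliers
labels = lof.fit_predict(X)
n_flagged = int((labels == -1).sum())
print("flagged as outliers:", n_flagged, "of", len(X))
```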

In [8]:
train = np.concatenate((normal, defect), axis=0)

# 1 means normal casting while -1 means defective casting
true_labels = np.ones(len(train), dtype=int)
true_labels[-len(defect):] = -1

### Model Training

In [9]:
model = LocalOutlierFactor(n_neighbors=20, 
                           contamination=len(defect)/len(train), 
                           n_jobs=-1)

In [10]:
preds = model.fit_predict(train)
print(classification_report(true_labels, preds, target_names=['Defective', 'Normal']))

              precision    recall  f1-score   support

   Defective       0.67      0.67      0.67       519
      Normal       0.67      0.67      0.67       519

    accuracy                           0.67      1038
   macro avg       0.67      0.67      0.67      1038
weighted avg       0.67      0.67      0.67      1038
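
One way to read the identical columns in this report: contamination equals the defective share (0.5) and the two classes are the same size, so the model flags about as many samples as there are defects. Every false positive is then matched by a false negative, and precision, recall, and accuracy coincide. The sketch below checks this arithmetic with a hypothetical helper and arbitrary counts, assuming exactly half of the samples are flagged.

```python
import math

n_per_class = 519  # images per class in the balanced set above

def report(n_defects_caught):
    # Hypothetical helper: derive the metrics when exactly n_per_class
    # samples are flagged and n_defects_caught of them are true defects
    tp = n_defects_caught              # defects flagged as defective
    fp = n_per_class - tp              # normals flagged as defective
    fn = n_per_class - tp              # defects left unflagged
    tn = n_per_class - fp              # normals left unflagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (2 * n_per_class)
    return precision, recall, accuracy

# Arbitrary illustrative counts, not results from the notebook
for caught in (300, 400, 500):
    precision, recall, accuracy = report(caught)
    print(caught, round(precision, 3), round(recall, 3), round(accuracy, 3))
```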



The accuracy for outlier detection is only 0.67. Let's try tuning the n_neighbors hyperparameter.

### Hyperparameter Tuning

In [11]:
contamination_value = len(defect)/len(train)
for i in range(21, 36):
    model = LocalOutlierFactor(n_neighbors=i, 
                               contamination=contamination_value, 
                               n_jobs=-1)
    preds = model.fit_predict(train)
    print(f'n_neighbors: {i}, accuracy: {accuracy_score(true_labels, preds)}')

n_neighbors: 21, accuracy: 0.6647398843930635
n_neighbors: 22, accuracy: 0.6724470134874759
n_neighbors: 23, accuracy: 0.6782273603082851
n_neighbors: 24, accuracy: 0.6955684007707129
n_neighbors: 25, accuracy: 0.7052023121387283
n_neighbors: 26, accuracy: 0.697495183044316
n_neighbors: 27, accuracy: 0.6917148362235067
n_neighbors: 28, accuracy: 0.6763005780346821
n_neighbors: 29, accuracy: 0.653179190751445
n_neighbors: 30, accuracy: 0.630057803468208
n_neighbors: 31, accuracy: 0.6204238921001927
n_neighbors: 32, accuracy: 0.605009633911368
n_neighbors: 33, accuracy: 0.5934489402697495
n_neighbors: 34, accuracy: 0.5780346820809249
n_neighbors: 35, accuracy: 0.5741811175337187


The best outlier detection model uses n_neighbors = 25, with an accuracy of 0.71.
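
Rather than reading the best value off the printout, the search can also return it. The sketch below is one possible way to do that; the helper best_n_neighbors and the synthetic data are assumptions for illustration and are not part of the notebook.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import LocalOutlierFactor

def best_n_neighbors(X, y_true, candidates, contamination):
    # Hypothetical helper: score each candidate and return the best one
    scores = {}
    for k in candidates:
        lof = LocalOutlierFactor(n_neighbors=k, contamination=contamination)
        scores[k] = accuracy_score(y_true, lof.fit_predict(X))
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in data (an assumption, not the casting features)
rng = np.random.default_rng(0)
X = np.concatenate((rng.normal(0.0, 1.0, size=(180, 2)),
                    rng.uniform(5.0, 9.0, size=(20, 2))), axis=0)
y_true = np.ones(len(X), dtype=int)
y_true[-20:] = -1

best_k, scores = best_n_neighbors(X, y_true, range(21, 36), contamination=0.1)
print(best_k, round(scores[best_k], 3))
```

On ties, max returns the first candidate with the highest score, which matches the strict greater-than comparison used when tracking a running maximum.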

In [12]:
model = LocalOutlierFactor(n_neighbors=25, 
                           contamination=len(defect)/len(train), 
                           n_jobs=-1)
preds = model.fit_predict(train)
print(classification_report(true_labels, preds, target_names=['Defective', 'Normal']))

              precision    recall  f1-score   support

   Defective       0.71      0.71      0.71       519
      Normal       0.71      0.71      0.71       519

    accuracy                           0.71      1038
   macro avg       0.71      0.71      0.71      1038
weighted avg       0.71      0.71      0.71      1038



Next, we will apply novelty detection to see whether there is an increase in accuracy and recall.

## Novelty Detection

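For novelty detection, the model is fitted on a training set and then used to predict on unseen test images. In the usual setup the training set contains only normal castings, so that defective castings are 'novel'. The code below instead fits on both the normal and defective training images, with contamination set to the defective share.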
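
A minimal sketch of novelty detection on synthetic data, fitted on 'normal' points only, may help show how the mode is used. All arrays below are assumptions for illustration. With novelty=True the model exposes predict for unseen data, while fit_predict is not available.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumptions, not the casting features)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # "normal" only
X_near = rng.normal(loc=0.0, scale=0.5, size=(5, 2))     # resembles training data
X_far = rng.normal(loc=8.0, scale=0.5, size=(5, 2))      # unlike training data

# novelty=True enables predict() on unseen data; fit_predict() is not
# available in this mode
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)

# predict returns 1 for points that resemble the training data, -1 otherwise
print("near:", lof.predict(X_near))
print("far: ", lof.predict(X_far))
```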

In [13]:
train = np.concatenate((normal_train, defect_train), axis=0)
test = np.concatenate((normal_test, defect_test), axis=0)

# 1 means normal casting while -1 means defective casting
test_labels = np.ones(len(test), dtype=int)
test_labels[-len(defect_test):] = -1

print(len(test))

208


### Model Training

In [14]:
model = LocalOutlierFactor(n_neighbors=20, 
                           contamination=len(defect_train)/len(train),
                           novelty=True,
                           n_jobs=-1)

In [15]:
model.fit(train)
preds = model.predict(test)
print(classification_report(test_labels, preds, target_names=['Defective', 'Normal']))

              precision    recall  f1-score   support

   Defective       0.69      0.74      0.71       104
      Normal       0.72      0.66      0.69       104

    accuracy                           0.70       208
   macro avg       0.70      0.70      0.70       208
weighted avg       0.70      0.70      0.70       208



The initial overall accuracy is 0.70. Let's tune the n_neighbors hyperparameter.

### Hyperparameter Tuning

In [16]:
optimal_ngh = 0
max_accuracy = -1
contamination_value = len(defect_train)/len(train)
for n_neighbors in range(21, 35):
    model = LocalOutlierFactor(n_neighbors=n_neighbors, 
                               contamination=contamination_value,
                               novelty=True,
                               n_jobs=-1)
    model.fit(train)
    preds = model.predict(test)
    accuracy = accuracy_score(test_labels, preds)
    if accuracy > max_accuracy:
        max_accuracy = accuracy
        optimal_ngh = n_neighbors
    print(f'n_neighbors: {n_neighbors}, accuracy: {accuracy}')

n_neighbors: 21, accuracy: 0.6875
n_neighbors: 22, accuracy: 0.6778846153846154
n_neighbors: 23, accuracy: 0.7019230769230769
n_neighbors: 24, accuracy: 0.6778846153846154
n_neighbors: 25, accuracy: 0.6442307692307693
n_neighbors: 26, accuracy: 0.6442307692307693
n_neighbors: 27, accuracy: 0.6105769230769231
n_neighbors: 28, accuracy: 0.5961538461538461
n_neighbors: 29, accuracy: 0.5865384615384616
n_neighbors: 30, accuracy: 0.5721153846153846
n_neighbors: 31, accuracy: 0.5673076923076923
n_neighbors: 32, accuracy: 0.5721153846153846
n_neighbors: 33, accuracy: 0.5625
n_neighbors: 34, accuracy: 0.5576923076923077


In [17]:
model = LocalOutlierFactor(n_neighbors=optimal_ngh, 
                           contamination=len(defect_train)/len(train),
                           novelty=True,
                           n_jobs=-1)
model.fit(train)
preds = model.predict(test)
print(classification_report(test_labels, preds, target_names=['Defective', 'Normal']))

              precision    recall  f1-score   support

   Defective       0.70      0.71      0.70       104
      Normal       0.71      0.69      0.70       104

    accuracy                           0.70       208
   macro avg       0.70      0.70      0.70       208
weighted avg       0.70      0.70      0.70       208



We see that outlier detection and novelty detection perform similarly, with accuracies of about 0.71 and 0.70 respectively.