# Rail defects: squats in eddy-current data

For early detection of rail damage, we use an [eddy-current](https://en.wikipedia.org/wiki/Eddy-current_testing) lorry to measure the resistance of the rail tracks. From this resistance data we can identify squats (cracks) in the steel. There are three types of squats, A, B and C, which are classified by their depth and size. Can we find the relationship between these two quantities and the squat type?

![A squat in a rail track](squat.jpg "A squat in a rail track")

In [None]:
# Import libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.svm import LinearSVC, SVC
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

In [None]:
# Read data.
df = pd.read_csv('squats.csv', delimiter=',')[['max_depth', 'size', 'type_vid']]
df.head()
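
Before choosing a class weighting in the model cell, it can help to check how many squats of each type the file contains and whether any values are missing. A minimal sketch, using a small hypothetical DataFrame with the same column names in place of `squats.csv` (the numbers are made up):

```python
import pandas as pd

# Hypothetical stand-in for squats.csv; all values are invented for illustration.
df = pd.DataFrame({
    'max_depth': [0.4, 0.9, 1.5, 0.5, 1.1, 0.3],
    'size': [10.0, 25.0, 40.0, 12.0, 30.0, 8.0],
    'type_vid': ['Squat - A', 'Squat - B', 'Squat - C',
                 'Squat - A', 'Squat - B', 'Squat - A'],
})

# Number of squats per type; strongly uneven counts are a reason to set class_weight.
counts = df['type_vid'].value_counts()
print(counts)

# Missing values would cause errors later: NaN features break the SVM fit,
# and a NaN label is not a key of type_dict.
print(df.isna().sum())
```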

In [None]:
# Organize into train and test data.
type_dict = {'Squat - A': 0, 'Squat - B': 1, 'Squat - C': 2}
X = np.array(df[['max_depth', 'size']])
y = np.array([type_dict[tp] for tp in df['type_vid']])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print("Data size: ", X.shape, y.shape)
print("Train size: ", X_train.shape, y_train.shape)
print("Test size: ", X_test.shape, y_test.shape)

In [None]:
# Define model.
# -------------
# See https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
# and https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
#
# Hint: class_weight can be None or 'balanced', or you can define individual class weights
# in a dictionary keyed by the labels from type_dict (0 = A, 1 = B, 2 = C),
# e.g. class_weight={0: 5.0, 1: 1.0, 2: 4.0}.

clf = LinearSVC(max_iter=100000, class_weight='balanced', random_state=0)
# clf = SVC(C=1.0, kernel='poly', gamma=0.05, random_state=0, class_weight='balanced')

# Fit training set and score test set.
clf.fit(X_train, y_train)
print("Accuracy on test set: ", clf.score(X_test, y_test))

In [None]:
# Plot the decision boundary.
def plot_decision_boundary(model, X_data, y_data):
    x_min, x_max = X_data[:, 0].min() - 1, X_data[:, 0].max() + 1
    y_min, y_max = X_data[:, 1].min() - 1, X_data[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, (x_max-x_min)/500),
                         np.arange(y_min, y_max, (y_max-y_min)/500))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.figure(figsize=(8, 6))
    cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
    cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])
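    # Fix vmin/vmax so class k always gets colour k, even if a class is missing
    # from Z or y_data; shading='auto' avoids the pcolormesh shape warning.
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light, vmin=0, vmax=2, shading='auto')
    plt.scatter(X_data[:, 0], X_data[:, 1], c=y_data, cmap=cmap_bold,
                vmin=0, vmax=2, edgecolor='k', s=20)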
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("Classification of squats - red: A, green: B, blue: C.")
    plt.xlabel("max depth")
    plt.ylabel("size")
    plt.show()
    
print("Train data:")
plot_decision_boundary(clf, X_train, y_train)

print("Test data:")
plot_decision_boundary(clf, X_test, y_test)
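
The coloured regions come from `model.predict` evaluated on a grid. For `LinearSVC`, which by default trains one linear one-vs-rest classifier per class, `predict` returns the class with the largest score `coef_ @ x + intercept_`, so each boundary in the plot is a straight segment where two class scores are equal. A quick check on hypothetical 2-D data (the commented-out `SVC` combines one-vs-one classifiers instead, so this sketch applies to `LinearSVC` only):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical 2-D data with three classes (stand-in for max_depth and size).
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 30)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = centers[y] + rng.normal(scale=0.5, size=(y.size, 2))

clf = LinearSVC(max_iter=100000, random_state=0).fit(X, y)

# One-vs-rest: one linear score w_k . x + b_k per class, shape (n_samples, 3).
scores = clf.decision_function(X)
manual_scores = X @ clf.coef_.T + clf.intercept_

# The predicted class is the one with the highest score.
pred_from_scores = clf.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(pred_from_scores, clf.predict(X)))
```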

In [None]:
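# Print the confusion matrix on the test set.
# sklearn returns rows = actual class, columns = predicted class; after the
# transpose below, the columns are the actual class and the rows the predicted class.
classes = ['Squat - A', 'Squat - B', 'Squat - C']
conf_matrix = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1, 2])
pd.DataFrame(data=conf_matrix.T,
             columns=pd.Index(classes, name='actual'),
             index=pd.Index(classes, name='predicted'))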
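
When the squat types are imbalanced, overall accuracy can hide a type that is rarely detected. Per-class recall and precision can be read off the transposed matrix: recall divides each diagonal entry by its column sum (all actual members of the class), precision by its row sum (all predictions of the class). A sketch with made-up labels, cross-checked against scikit-learn's `recall_score` and `precision_score`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels for classes 0 (A), 1 (B), 2 (C).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0])

# Transposed as in the notebook: rows = predicted class, columns = actual class.
cm_t = confusion_matrix(y_true, y_pred, labels=[0, 1, 2]).T

# Recall: correct / all actual members of the class (column sums).
recall = np.diag(cm_t) / cm_t.sum(axis=0)
# Precision: correct / all predictions of the class (row sums).
precision = np.diag(cm_t) / cm_t.sum(axis=1)

print("recall:   ", recall)     # [0.75, 0.667, 0.667]
print("precision:", precision)  # [0.75, 0.667, 0.667]
```

The mean of the per-class recall values is what `sklearn.metrics.balanced_accuracy_score` reports, which may be a more informative single number than accuracy when `class_weight='balanced'` is used.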