# Defect Detection from Scratch

In this and subsequent notebooks we are going to incrementally construct a defect detection model using PyTorch and the current version of FastAI. The approach will be single pass and efficient.

In these initial notebooks we are going to use a dataset containing images of defects in aircraft glass (AGDD). Each sample consists of two images of the same sample area, one with front-side and one with backside illumination. This is a reasonable starting point, as it allows us to understand how to load multiple images into the model.

The notebooks build up from scratch. In this first pass we detect only the bounding box of the largest defect using the front-side illumination. Later notebooks will infer the categories, handle multiple defects, and use both input images.

This first cell downloads the data. Only run it once.

In [None]:
import fsspec
from pathlib import Path

path = Path.cwd() / "data"

if not path.exists():
    path.mkdir(exist_ok=True, parents=True)
    fs = fsspec.filesystem("github", org="core128", repo="AGDD")
    
    for subfolder in ["image", "images", "labels_rect"]:
        for split in ["train", "val"]:
            destination = path / subfolder / split
            destination.mkdir(exist_ok=True, parents=True)
            fs.get(fs.ls(f"data/{subfolder}/{split}"), destination.as_posix())

In [None]:
from fastai.vision.all import *
path.ls()

# Exploring
The data is in three subfolders. The `images` subfolder contains the back-side illuminated images, the `image` subfolder contains the front-side illuminated images, and `labels_rect` contains the labels. Each of these folders contains a `train` and a `val` subfolder: the data is pre-split for us.

The first job is to get the data into an appropriate form. Generally vision problems like this use data in a COCO format, so it would be sensible to try and convert the data into a similar format. Then we can use the usual FastAI processing pipelines. We can create this JSON for the training and validation set:

In [None]:
labels_dir = path / 'labels_rect'

def box_cxcywh_to_xywh(x):
    x_c, y_c, w, h = x[0], x[1], x[2], x[3]
    b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
         w, h]
    return np.array(b)


def create_coco_format_json(classes, filepaths):
    """
    This function creates a COCO dataset.
    :param classes: list of strings where each string is a class.
    :param filepaths: a list of strings containing all images paths
    :return dataset_coco_format: COCO dataset (JSON).
    """
    images = []
    annotations = []
    categories = []
    count = 0
    
    # Creates a categories list, i.e: [{'id': 0, 'name': 'a'}, {'id': 1, 'name': 'b'}, {'id': 2, 'name': 'c'}] 
    for idx, class_ in enumerate(classes):
        categories.append(
            { 
                "id": idx,
                "name": class_
            }
        )
    
    # Iterate over image filepaths
    for each in filepaths:
        # Get the image id, e.g: "10044"
        file_name = each.name
        file_id = Path(file_name).stem
        image = Image.open(each)
        height, width = image.shape

        images.append(
            {
                "id": file_id,
                "width": width,
                "height": height,
                "file_name": file_name,
                "date_captured": "2013-11-18 02:53:27"
            }
        )

        parent = each.parent.name
        label_path = labels_dir / parent / f"{file_id}.txt"
        labels = np.genfromtxt(label_path)


        # If there are labels:
        if labels.size != 0:
            # expand the dim so we can iterate
            if labels.ndim == 1:
                labels = np.expand_dims(labels, axis=0)


            # Use a separate name so the outer loop variable is not shadowed
            for row in labels:
                bb = box_cxcywh_to_xywh(row[1:])
                # Scale to the image size
                bb[0] *= width
                bb[2] *= width
                bb[1] *= height
                bb[3] *= height
                seg = {
                    'bbox': bb.astype(int).tolist(),
                    'image_id': file_id,
                    'category_id': int(row[0]),  # plain int so it serialises as 0, not 0.0
                    'id': count
                }
                annotations.append(seg)
                count += 1



    # Create the dataset
    dataset_coco_format = {
        "categories": categories,
        "images": images,
        "annotations": annotations,
    }
    
    return dataset_coco_format
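
To make the coordinate conversion concrete, here is a minimal standalone sketch in plain Python. It assumes, as the code above does, that each label row is `class cx cy w h` with coordinates normalised to the range 0 to 1. The helper name `label_row_to_coco_bbox` and the sample values are made up for illustration.

```python
def label_row_to_coco_bbox(row, width, height):
    """Convert one normalised `class cx cy w h` row to a COCO-style record.

    Hypothetical helper for illustration only; it mirrors the steps in
    `create_coco_format_json` without NumPy.
    """
    cls, cx, cy, w, h = row
    # Centre format -> top-left format, still normalised
    x, y = cx - 0.5 * w, cy - 0.5 * h
    # Scale to pixels and truncate to integers, as `astype(int)` does
    bbox = [int(x * width), int(y * height), int(w * width), int(h * height)]
    return {"category_id": int(cls), "bbox": bbox}


# Made-up example: a box centred in a 640x640 image,
# a quarter of the image wide and half of it tall
record = label_row_to_coco_bbox([2, 0.5, 0.5, 0.25, 0.5], width=640, height=640)
print(record)
```

The resulting `bbox` is in COCO's `[x, y, width, height]` pixel convention, measured from the top-left corner of the image.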

This code creates the COCO-format JSON for the training and validation sets, for both illuminations:

In [None]:
import json

classes = ["contusion", "scratch", "crack", "spot"]

for subdir in ['image', 'images']:
    subdir = path / subdir
    for splitdir in ['train', 'val']:
        splitdir = subdir / splitdir
        image_paths = splitdir.glob('**/*.png')
        coco = create_coco_format_json(classes, image_paths)
        with open(splitdir/'data.json', 'w') as f:
            json.dump(coco, f)

FastAI has a `get_annotations` function which can parse COCO style bounding boxes similar to those in the dataset. Here we just load the data associated with the training set. This function returns a tuple of lists, the first containing the image file names in the training set and the second the corresponding bounding boxes and categories. We can check the training set:

In [None]:
imgs, lbl_bbox = get_annotations(path/'image'/'train'/'data.json')
len(imgs), len(lbl_bbox), imgs[1], lbl_bbox[1]

And the validation set:

In [None]:
val_imgs, val_lbl_bbox = get_annotations(path/'image'/'val'/'data.json')
len(val_imgs), len(val_lbl_bbox), val_imgs[1], val_lbl_bbox[1]

Each bounding box contains four numbers: the first pair are the xy coordinates of the upper-left corner and the second pair are those of the lower-right corner of the box.
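
The step from COCO's `[x, y, width, height]` boxes to `[x1, y1, x2, y2]` corner coordinates, grouped per image, can be sketched in plain Python. This illustrates the idea and is not FastAI's actual implementation; the function name `group_annotations` and the sample records are hypothetical.

```python
from collections import defaultdict


def group_annotations(coco):
    """Group COCO annotations per image as (corner-format boxes, class names)."""
    id2name = {c["id"]: c["name"] for c in coco["categories"]}
    boxes, names = defaultdict(list), defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        # [x, y, w, h] -> [x1, y1, x2, y2]
        boxes[ann["image_id"]].append([x, y, x + w, y + h])
        names[ann["image_id"]].append(id2name[ann["category_id"]])
    return {image_id: (boxes[image_id], names[image_id]) for image_id in boxes}


# Made-up sample with two defects on one image
sample = {
    "categories": [{"id": 0, "name": "contusion"}, {"id": 1, "name": "scratch"}],
    "annotations": [
        {"image_id": "10044", "category_id": 1, "bbox": [240, 160, 160, 320]},
        {"image_id": "10044", "category_id": 0, "bbox": [10, 20, 30, 40]},
    ],
}
grouped = group_annotations(sample)
print(grouped["10044"])
```

Each value is a `(boxes, labels)` pair, the same shape as the entries of `lbl_bbox`.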

# Visualisation
We can use `matplotlib` to visualise:

In [None]:
import matplotlib.colors as mcolors
import matplotlib.cm as cmx
from matplotlib import patches, patheffects

img_file, img_bbox = imgs[1], lbl_bbox[1]
img = Image.open(path/'image'/'train'/img_file)
h, w = img.shape
h, w

These images are square and uniform in size, so we could probably proceed as is. For clarity, though, we define the size explicitly, since all images fed to the model must have the same dimensions:

In [None]:
SIZE = 640

img_scaled = img.resize((SIZE, SIZE))
img_scaled

In object detection the independent variable is the image, and the dependent ones are the classes and bounding boxes. Given an image we want to predict a class label for each object present in the image, in addition to a bounding box for each. As the bounding box is defined in the coordinate space of the image (independent) scaling must be applied consistently. 

In [None]:
xscale, yscale = w/SIZE, h/SIZE
img_bbox_scaled = [[x1//xscale, y1//yscale, x2//xscale, y2//yscale] for x1, y1, x2, y2 in img_bbox[0]]
img_bbox_scaled = (img_bbox_scaled, img_bbox[1])
img_bbox_scaled
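
The same scaling can be written as a small standalone helper. This is a sketch under the assumption that boxes are in `[x1, y1, x2, y2]` pixel coordinates; `scale_boxes` is a hypothetical name, and unlike the cell above it keeps fractional values rather than flooring them.

```python
def scale_boxes(boxes, src_size, dst_size):
    """Rescale [x1, y1, x2, y2] boxes from src_size=(w, h) to dst_size=(w, h)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    # x coordinates scale with the width ratio, y coordinates with the height ratio
    return [[x1 * sx, y1 * sy, x2 * sx, y2 * sy] for x1, y1, x2, y2 in boxes]


# Made-up example: a 1280x1280 image shrunk to 640x640
scaled = scale_boxes([[100, 200, 300, 400]], (1280, 1280), (640, 640))
print(scaled)
```

Keeping the two scale factors separate matters as soon as the resize is not uniform, for example when a non-square image is squished to a square.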

This is a small utility function to display an image with some overlaid annotations:

In [None]:
def show_img(im, figsize=None, ax=None):
    if not ax: fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im)
    ax.set_xticks(np.linspace(0, SIZE, 8))
    ax.set_yticks(np.linspace(0, SIZE, 8))
    ax.grid()
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    return ax

show_img(img_scaled)

We want to overlay the class labels and bounding boxes in a useful way. We can use the utility functions:

In [None]:
# draw an outline around the shape; used to add contrast to the text so we can read it easily
def draw_outline(o, lw):
    o.set_path_effects([patheffects.Stroke(
        linewidth=lw, foreground='black'), patheffects.Normal()])

# draw text in the specified location along with an outline so that there is some contrast between the text and the image
def draw_text(ax, xy, txt, sz=14, color='white'):
    text = ax.text(*xy, txt,
        verticalalignment='top', color=color, fontsize=sz, weight='bold')
    draw_outline(text, 1)

def draw_rect(ax, b, color='white'):
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False, edgecolor=color, lw=2))
    draw_outline(patch, 4)

def get_cmap(N):
    color_norm  = mcolors.Normalize(vmin=0, vmax=N-1)
    return cmx.ScalarMappable(norm=color_norm, cmap='Set3').to_rgba

# generate a list of different colors for rendering our bounding boxes
num_colr = 12
cmap = get_cmap(num_colr)
colr_list = [cmap(float(x)) for x in range(num_colr)]

# draw an image along with its associated bounding boxes and class labels
def show_item(im, lbl_bbox, figsize=None, ax=None):
    if not ax: fig,ax = plt.subplots(figsize=figsize)
    ax = show_img(im, ax=ax)
    for i,(b,c) in enumerate(zip(lbl_bbox[0], lbl_bbox[1])):
        b = (*b[:2],b[2]-b[0]+1,b[3]-b[1]+1)
        draw_rect(ax, b, color=colr_list[i%num_colr])
        draw_text(ax, b[:2], c, color=colr_list[i%num_colr])

show_item(img_scaled,img_bbox_scaled)

# Decomposition
Let's start with a simple problem: a model that takes an image as input and predicts a single object class, that of the *largest* object present in the image. We can derive this target from the bounding boxes and labels we already have.

This function takes a labelled bounding box sample and returns the largest single bounding box along with its class label:

In [None]:
def area(box): return (box[2] - box[0]) * (box[3] - box[1])

def get_largest(boxes):
    return sorted(L(zip(*boxes)), key=lambda each: -area(each[0]))[0]

img_bbox_scaled, get_largest(img_bbox_scaled)
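
The selection logic is easy to check in isolation. The following plain-Python sketch uses `max` rather than a full sort; the names `box_area` and `largest_box` and the sample boxes are made up for illustration.

```python
def box_area(box):
    """Area of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def largest_box(boxes, labels):
    """Return the (box, label) pair with the greatest area."""
    return max(zip(boxes, labels), key=lambda pair: box_area(pair[0]))


# Made-up boxes with areas 100, 1125 and 200
boxes = [[0, 0, 10, 10], [5, 5, 50, 30], [20, 20, 25, 60]]
labels = ["spot", "scratch", "crack"]
print(largest_box(boxes, labels))
```

When two boxes tie on area, `max` returns the first one, which matches what a stable sort on the negated area would pick.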

Now we can use a list comprehension to make a new training set with only the largest object:

In [None]:
lrg_bbox = [get_largest(boxes) for boxes in lbl_bbox]
img2lrgbbox = dict(zip(imgs, lrg_bbox))
k = L(img2lrgbbox)[1]; k, img2lrgbbox[k]

# Training a Classifier
We begin by creating a dataloader. To do this we want to merge the validation and training sets:

In [None]:
all_imgs = [f"train/{each}" for each in imgs] + [f"val/{each}" for each in val_imgs]
all_lbl_bbox = lbl_bbox + val_lbl_bbox
all_imgs, all_lbl_bbox

And of course we need to remake the largest item data:

In [None]:
all_lrg_bbox = [get_largest(boxes) for boxes in all_lbl_bbox]
allimg2lrgbbox = dict(zip(all_imgs, all_lrg_bbox))
k = L(allimg2lrgbbox)[1]; k, allimg2lrgbbox[k]

The getters map each item to its image path and to the class label of its largest defect:

In [None]:
getters = [lambda o: path/'image'/o, lambda o: allimg2lrgbbox[o][1]]
item_tfms = [Resize(SIZE, method='squish')]
batch_tfms = [Rotate(10), Flip(), Dihedral()]
dblock = DataBlock(blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
                   getters=getters,
                   item_tfms=item_tfms,
                   batch_tfms=batch_tfms,
                   # compare the folder *name*: a Path never equals a plain string
                   splitter = FuncSplitter(lambda o: Path(o).parent.name == 'val'))
dls = dblock.dataloaders(all_imgs, bs=12)
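
The split is decided from the folder prefix of each item, so it is worth checking that logic on its own. In the sketch below, `is_validation` is a hypothetical helper and the file names are made up. Note that `Path(...).parent` is itself a `Path`, so it is the `.name` attribute that should be compared with a string.

```python
from pathlib import Path


def is_validation(item):
    """True when the item's parent folder is named 'val'."""
    return Path(item).parent.name == "val"


items = ["train/10044.png", "val/20001.png"]  # made-up file names
flags = [is_validation(o) for o in items]
print(flags)

# Comparing the Path object itself with a string is always False
path_equals_str = Path("val/20001.png").parent == "val"
print(path_equals_str)
```

A splitter whose predicate is always `False` would silently leave the validation set empty, so a check like this is cheap insurance.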

FastAI looks at the dataset, and collects all the classes into a vocabulary:

In [None]:
dls.vocab, len(dls.vocab)

And we can inspect a batch:

In [None]:
dls.show_batch()

Now we can use the usual vision learner API, specifying the dataset, architecture, loss and metrics. Here we choose a `resnet34`, which offers a reasonable balance between capacity and computational cost.

In [None]:
learn = vision_learner(dls, resnet34, metrics=[accuracy, error_rate])

Look at the backbone:

In [None]:
backbone = learn.model[0]; backbone

FastAI has selected the cross entropy loss. We can find a learning rate suitable for the problem:

In [None]:
lr = learn.lr_find()
lr
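
For reference, the cross-entropy loss for a single sample can be computed by hand. This is a minimal sketch of the formula, not FastAI's or PyTorch's implementation, and the logits are made up.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Four classes, as in this dataset
uniform = cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
confident = cross_entropy([0.0, 0.0, 5.0, 0.0], target=2)
print(uniform, confident)
```

A model that spreads its probability evenly over four classes scores ln 4, about 1.386, which is a useful baseline when reading the loss column during training.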

# Training
Now we can fit using the selected loss function. Note that we use `fine_tune` here as the `resnet34` has pretrained weights.

In [None]:
learn.fine_tune(10, base_lr=lr.valley)

In [None]:
learn.show_results(dl=dls)

These results seem quite bad. One possible reason is that we are only using one of the two images as input, so let's try again with a composite of both:

# Composite Images
To combine the two illuminations, the simplest option is to merge them into a single image. A model that ingests two separate image inputs is more complex and will be covered at a later date. We make a three-channel image from the two grayscale views: in the BGR channel order that OpenCV uses, the first channel holds their average, the second holds the back-lit image and the third holds the front-lit image:

In [None]:
import cv2
import shutil


destination = Path.cwd() / "data" / "composite"
destination.mkdir(exist_ok=True, parents=True)
(destination / "train").mkdir(exist_ok=True, parents=True)
(destination / "val").mkdir(exist_ok=True, parents=True)

shutil.copyfile(path / "image" / "train" / "data.json", destination / "train" / "data.json")
shutil.copyfile(path / "image" / "val" / "data.json", destination / "val" / "data.json")

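for each in all_imgs:
    first_path = path / "image" / each
    second_path = path / "images" / each
    out_path = path / "composite" / each

    # Convert paths to str: not every OpenCV version accepts Path objects
    first = cv2.imread(str(first_path), cv2.IMREAD_GRAYSCALE)
    second = cv2.imread(str(second_path), cv2.IMREAD_GRAYSCALE)

    # Average in a wider integer type to avoid uint8 overflow
    avg_value = (first.astype(int) + second.astype(int)) // 2
    avg_value = avg_value.astype('uint8')

    # OpenCV channel order is BGR: B=average, G=back-lit, R=front-lit
    bgr_img = cv2.merge((avg_value, second, first))
    cv2.imwrite(str(out_path), bgr_img)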
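
The averaging and channel stacking can be checked on tiny arrays without touching any files. This NumPy-only sketch uses two made-up 2x2 images in place of the real front- and back-lit views.

```python
import numpy as np

# Two made-up 2x2 grayscale images standing in for the two illuminations
front = np.array([[200, 100], [0, 255]], dtype=np.uint8)
back = np.array([[100, 100], [50, 255]], dtype=np.uint8)

# Adding uint8 arrays directly wraps around at 256, so widen first
avg = ((front.astype(np.uint16) + back.astype(np.uint16)) // 2).astype(np.uint8)

# Stack along the last axis in the order (average, back-lit, front-lit)
composite = np.stack([avg, back, front], axis=-1)
print(composite.shape, avg.tolist())
```

The widening step is the important part: `200 + 100` does not fit in a `uint8`, so averaging without it would give a wrong pixel value.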



In [None]:
getters = [
    lambda o: path/'composite'/o, 
    lambda o: allimg2lrgbbox[o][1]
]
item_tfms = [Resize(SIZE, method='squish')]
batch_tfms = [Rotate(10), Flip(), Dihedral()]
dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   n_inp=1,
                   getters=getters,
                   item_tfms=item_tfms,
                   batch_tfms=batch_tfms,
                   # compare the folder *name*: a Path never equals a plain string
                   splitter = FuncSplitter(lambda o: Path(o).parent.name == 'val'))
dls = dblock.dataloaders(all_imgs, bs=16)

FastAI looks at the dataset, and collects all the classes into a vocabulary:

In [None]:
dls.vocab, len(dls.vocab)

And we can inspect a batch:

In [None]:
def show_batch(dls):
    b = dls.one_batch()
    print(len(b), b[0].shape)

    axs = subplots(3, 3)[1].flat
    for img, ax in zip(b[0][:9], axs):
        # imshow expects a (height, width, channels) array on the CPU
        show_img(img.permute(1, 2, 0).cpu().numpy(), ax=ax)

show_batch(dls)

Now we can use the usual vision learner API, specifying the dataset, architecture, loss and metrics. Here we choose a `resnet34`, which offers a reasonable balance between capacity and computational cost.

In [None]:
learn = vision_learner(dls, resnet34, metrics=accuracy)

Look at the backbone:

In [None]:
backbone = learn.model[0]; backbone

FastAI has selected the cross entropy loss. We can find a learning rate suitable for the problem:

In [None]:
lr = learn.lr_find()
lr

# Training
Now we can fit using the selected loss function. Note that we use `fine_tune` here as the `resnet34` has pretrained weights.

In [None]:
learn.fine_tune(10, base_lr=lr.valley)

In [None]:
learn.show_results(dl=dls)

This seems like a better result!