## Cropping using AI Object Detection

Based on the code found in <a href="https://www.pyimagesearch.com/2017/09/11/object-detection-with-deep-learning-and-opencv/">this article</a>.

Using a pre-trained model with OpenCV, we'll locate dogs within images and use the detected bounding boxes as cropping coordinates, rather than relying on pre-defined crops provided for us. In this manner we'll be able to take images from outside of the Stanford Dogs dataset and have them cropped without human intervention.

The model used here was originally trained to detect 20 object classes, one of which is dogs, so we simply disregard any detection that is not a dog. It also means this Notebook could be reused for other purposes in future. If a better model exists or is made, one need only swap the model used.

The network architecture is a <a href="https://arxiv.org/abs/1704.04861">"MobileNet"</a> combined with <a href="https://arxiv.org/abs/1512.02325">SSDs</a> (Single Shot Detectors). Alternatives include R-CNNs and YOLO; however, the former tend to be slower and the latter less accurate, so SSDs strike a nice balance between the two. "MobileNet" is useful as it is designed for use on smaller devices, e.g. mobile phones. It is therefore much smaller than the alternatives in terms of file size (the network used here is barely over 20 MB) and a lot more efficient. This comes at a cost to accuracy, but it should still be good enough to identify objects in the vast majority of our images.

#### The Model

The Caffe model here was trained by GitHub user <a href="https://github.com/chuanqi305/MobileNet-SSD">chuanqi305</a>. It was pre-trained on the <a href="http://cocodataset.org/">COCO dataset</a> and then fine-tuned on PASCAL VOC, which is where its 20 object classes come from.

More details on the workings of the original code and network can be found in the original article. 

#### This Implementation

We'll be recycling some of the logic and the network from the article. That code was designed for command-line use on a single image at a time. Here the detection logic is refitted to work automatically through all of the images in our dataset, organising the output similarly to the previous dataset processor.

It requires that you have the images and MATLAB list files from the <a href="http://vision.stanford.edu/aditya86/ImageNetDogs/">Stanford dog dataset</a> unpacked into folders named `images` and `lists`, adjacent to this Notebook file. Note this does _not_ require the annotation files. Simply unpack the folders from the provided .tar files into the same directory as this notebook.
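
Before running anything, it can help to confirm the expected folders are actually in place. The following is a minimal sketch rather than part of the original notebook; the helper name `missing_inputs` is hypothetical, and it only assumes the `images` and `lists` folder names described above.

```python
from pathlib import Path


def missing_inputs(base_dir, required=("images", "lists")):
    """Return the names of required sub-folders that are absent from base_dir.

    `missing_inputs` is a hypothetical helper; the folder names are the ones
    this notebook expects next to the notebook file.
    """
    base = Path(base_dir)
    return [name for name in required if not (base / name).is_dir()]


# Check the current working directory (assumed to be the notebook's folder).
missing = missing_inputs(".")
if missing:
    print(f"Missing folders: {', '.join(missing)}")
else:
    print("All input folders found.")
```

Running this from the notebook's directory lists whichever of the two folders still needs unpacking.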

In [1]:
import cv2

# for displaying images 
import matplotlib.pyplot as plt

import numpy as np
from scipy import io
from os import path, mkdir, rename
from random import random, seed, choices
from IPython.display import display, clear_output
import xmltodict

########## Customisation ##########
# fractions (sum=1):
frac_train = 0.8
frac_dev = 0.1
frac_test = 0.1

# Our output folder:
image_path_sorted = "images_sorted_auto"

# minimum confidence (a fraction between 0 and 1) the network must report
# for a detection before we accept it as correct:
confidence_requirement = 0.4
###################################

image_path_train = path.join(image_path_sorted, "train")
image_path_dev = path.join(image_path_sorted, "dev")
image_path_test = path.join(image_path_sorted, "test")
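# create the output folder and each split folder individually, so a partially
# created output tree from an earlier run is completed rather than skipped
print("Creating missing directories... ", end='', flush=True)
for folder in (image_path_sorted, image_path_train, image_path_dev, image_path_test):
    if not path.exists(folder):
        mkdir(folder)
print("Done")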

Creating missing directories... Done


## Load pre-trained network files

In [2]:
########### customisation #############
network_f_folder = "detectionNetwork" # directory the network files are in
model = "MobileNetSSD_deploy.caffemodel" # Caffe model file
prototxt = "MobileNetSSD_deploy.prototxt.txt" # model prototxt
#######################################

# classes the network has been trained to detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]
#COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))
interested = [12] # CLASSES[12]="dog"

print("loading model... ", end="", flush=True)
model_path = path.join(network_f_folder, model)
prototxt_path = path.join(network_f_folder, prototxt)
net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)
print("Done!")

loading model... Done!
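
Since the notebook is meant to be reusable for classes other than dogs, the index list could be derived from class names instead of being hard-coded. This is only a sketch: `class_indices` is a hypothetical helper, and the class list is copied from the cell above so the example runs on its own.

```python
# Class list copied from the cell above so this example is self-contained.
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]


def class_indices(names, classes=CLASSES):
    """Map class names to their indices in `classes`.

    Hypothetical helper; list.index raises ValueError for an unknown name,
    which catches typos early.
    """
    return [classes.index(name) for name in names]


interested = class_indices(["dog"])
print(interested)  # [12]
```

Swapping `["dog"]` for, say, `["cat", "dog"]` would keep detections of either class.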


### Split function

In [3]:
# auto split function
pop = [image_path_train, image_path_dev, image_path_test]
prob = [frac_train, frac_dev, frac_test]
def split_loc():
    """Randomly chooses which category to put the image in. Returns directory path."""
    res = choices(population=pop, weights=prob, k=1)
    return res[0]
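
Because the split is random, the realised proportions only approximate the requested fractions. The sketch below illustrates this with a seeded generator so that the result is repeatable; `split_counts` and its parameters are hypothetical names, not part of the notebook. Note that `random.choices` treats the weights as relative, so it normalises them itself.

```python
from collections import Counter
from random import Random


def split_counts(n_images, labels=("train", "dev", "test"),
                 weights=(0.8, 0.1, 0.1), seed_value=0):
    """Simulate assigning n_images to splits and count how many land in each.

    Hypothetical helper: uses its own seeded Random instance so repeated
    calls with the same seed_value give the same counts.
    """
    rng = Random(seed_value)
    picks = rng.choices(population=labels, weights=weights, k=n_images)
    return Counter(picks)


counts = split_counts(10_000)
for label in ("train", "dev", "test"):
    print(f"{label}: {counts[label]} ({counts[label] / 10_000:.1%})")
```

Seeding the module-level generator with `seed` (already imported in the first cell) before the processing loop would make the notebook's own split repeatable in the same way.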

## Load list of images
This section could be replaced with different code for a different set of labelled images, so the rest of the notebook can work with any image set. The only requirement is that `file_list` is defined as a list of list-like objects, where each object within `file_list` has the path of a file relative to `image_path` at index 0 and the object's Y-label at index 1.
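
As an illustration of such a replacement, the sketch below builds a compatible `file_list` from a directory tree with one sub-folder per class. The layout, the helper name `build_file_list` and the accepted extensions are all assumptions made for this example. Paths are joined with `/` because the processing loop later splits on that character.

```python
import os


def build_file_list(image_root, extensions=(".jpg", ".jpeg", ".png")):
    """Build [[relative_path, label], ...] from a folder-per-class layout.

    Hypothetical helper: assumes image_root contains one sub-folder per
    class and uses the sub-folder name as the label.
    """
    file_list = []
    for label in sorted(os.listdir(image_root)):
        class_dir = os.path.join(image_root, label)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files at the top level
        for name in sorted(os.listdir(class_dir)):
            if name.lower().endswith(extensions):
                file_list.append([f"{label}/{name}", label])
    return file_list


# Only attempt this if the assumed folder exists.
if os.path.isdir("images"):
    example = build_file_list("images")
    print(f"Found {len(example)} images")
```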

In [4]:
# Load our list of original Stanford images and labels
image_path = "images"
lists_path = "lists"


files_mat = io.loadmat(path.join(lists_path, "file_list.mat"))
file_list = [[item[0][0], item[0][0].split('/')[0]] for item in files_mat["file_list"]]
#print(file_list)



### Progress tracker setup

In [5]:
# Progress tracking
######## Customisation #########
prog_track = True # toggle tracker
freq_track = 100 # how often to update
############################


processed = 0
state = freq_track
total_images = len(file_list)

errors = 0
no_det = 0

## Image Processing
Here we finally perform the detection, cropping and sorting of the images.
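
Each row of the network output holds, in order, an image id, a class index, a confidence, and four box coordinates expressed as fractions of the image width and height. The sketch below shows the box arithmetic in isolation, using a made-up detection row. `box_to_pixels` is a hypothetical helper, and unlike the loop in this notebook it clips all four edges explicitly.

```python
import numpy as np


def box_to_pixels(row, width, height):
    """Convert one detection row to a clipped (startX, startY, endX, endY) box.

    Hypothetical helper; `row` is assumed to follow the layout
    [image_id, class_id, confidence, x1, y1, x2, y2] with normalised coords.
    """
    scale = np.array([width, height, width, height])
    box = (np.asarray(row[3:7], dtype=float) * scale).astype(int)
    start_x, start_y, end_x, end_y = (int(v) for v in box)
    # Detections can extend slightly outside the image, so clamp every edge.
    start_x = min(max(start_x, 0), width)
    end_x = min(max(end_x, 0), width)
    start_y = min(max(start_y, 0), height)
    end_y = min(max(end_y, 0), height)
    return start_x, start_y, end_x, end_y


# Made-up detection: class 12 ("dog"), confidence 0.9, box partly off-image.
row = [0, 12, 0.9, -0.25, 0.125, 0.5, 1.25]
print(box_to_pixels(row, width=200, height=80))  # (0, 10, 100, 80)
```

Clamping the start coordinates matters most: a negative start index would be interpreted by NumPy slicing as counting from the far edge of the image.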

In [None]:
for file in file_list:
    # reset the per-image detection count (needed even when tracking is off)
    interests_detected = 0

    # progress tracking
    if prog_track:
        if state >= freq_track:
            clear_output(wait=True)
            state = 0
            print(f"Done {processed} | {100*(processed/total_images):.3f}%")
            print(f"Non-detections: {no_det} | Errors: {errors}")
        processed += 1
        state += 1

    sub_folder = file[1]
    image = cv2.imread(path.join(image_path, file[0]))
    if image is None:
        # cv2.imread returns None instead of raising when a file can't be read
        print(f"Could not read {file[0]}, skipping.")
        errors += 1
        continue
    (h, w) = image.shape[:2]

    # choose where we're putting this image (train/dev/test) once per image,
    # so that all crops from the same image land in the same split
    loc = path.join(split_loc(), sub_folder)

    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()
    for i in np.arange(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with the
        # prediction and filter out weak detections
        confidence = detections[0, 0, i, 2]
        if confidence <= confidence_requirement:
            continue

        # extract the index of the class label and skip anything not of interest
        idx = int(detections[0, 0, i, 1])
        if idx not in interested:
            continue
        interests_detected += 1

        # compute the (x, y)-coordinates of the bounding box in pixels
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        # negative starts would wrap around when slicing, so clamp them to 0;
        # ends beyond the image are already truncated by the slice
        startX = max(startX, 0)
        startY = max(startY, 0)
        img_crop = image[startY:endY, startX:endX]

        # make the class folder only once we have something to save
        if not path.exists(loc):
            mkdir(loc)
        final_path = path.join(loc, f"{file[0].split('/')[1]}_{i}_c{confidence*100:.2f}.jpg")
        try:
            cv2.imwrite(final_path, img_crop)
            # Note that pyplot expects RGB while OpenCV uses BGR, so the colours
            # look strange when displayed by plt.imshow()
            #plt.imshow(img_crop)
            #plt.show()
        except Exception as e:
            print(f"Encountered an error, skipping.\n{e}")
            errors += 1

    # progress tracking
    if prog_track and interests_detected == 0:
        no_det += 1

if prog_track:
    clear_output(wait=True)
    print(f"Done {processed} | {100*(processed/total_images):.3f}%")
    print(f"Non-detections: {no_det} | Errors: {errors}")

Done 15500 | 75.316%
Non-detections: 2371 | Errors: 0
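
The constants passed to `cv2.dnn.blobFromImage` in the loop above (scale factor 0.007843 and mean 127.5) map 8-bit pixel values to roughly the range -1 to 1, since the function subtracts the mean and then multiplies by the scale factor. The sketch below reproduces just that arithmetic with NumPy. It is an illustration only and leaves out the resizing and channel reordering the real function also performs; `normalise_pixels` is a hypothetical name.

```python
import numpy as np

SCALE = 0.007843  # approximately 1 / 127.5
MEAN = 127.5


def normalise_pixels(pixels):
    """Apply (pixel - mean) * scale, the per-pixel arithmetic used for the blob.

    Hypothetical helper for illustration; it does not replace blobFromImage.
    """
    return (np.asarray(pixels, dtype=np.float32) - MEAN) * SCALE


# Darkest, middle and brightest 8-bit values.
print(normalise_pixels([0, 127.5, 255]))
```

The three printed values come out very close to -1, 0 and 1.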


<u>Current issues</u>: 
- Naming files is a bit of a mess and the whole thing could be made more sensible in this regard. Using `path.join(loc, f"{file[0].split('/')[1]}` when `loc` already contains the first half of `file[0]` is needlessly convoluted.
- I've since discovered the `split-folders` Python package, which would sort the images into folders automatically without my having to write that code myself. As I only found out about it at a late stage, I have not implemented it here.
- The network we're using here is of course not designed for *just* dogs but 20 object classes in total. It would likely see an improvement were we to train a new model from scratch for this purpose, but as this is a proof-of-concept, the existing model is sufficient to show it has promise of being highly effective. 
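
As one possible direction for the first issue, the output file name could be built by a small dedicated function. This is a sketch, not the notebook's current behaviour: `crop_filename` is a hypothetical helper, and unlike the existing code it drops the original `.jpg` extension from the middle of the name.

```python
import posixpath


def crop_filename(relative_path, detection_index, confidence):
    """Build an output name such as 'n02085620_10074_3_c87.50.jpg'.

    Hypothetical helper; `relative_path` is assumed to use '/' separators,
    as the entries in file_list do.
    """
    stem = posixpath.splitext(posixpath.basename(relative_path))[0]
    return f"{stem}_{detection_index}_c{confidence * 100:.2f}.jpg"


print(crop_filename("n02085620-Chihuahua/n02085620_10074.jpg", 3, 0.875))
# n02085620_10074_3_c87.50.jpg
```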