# Purpose:

Can I design a simple fish object recognizer?

In [8]:
import os
import sys
import glob

import dlib
from skimage import io


In [31]:
faces_folder = "./train_exp/"
test_faces_folder = "./test_exp/"
# Now let's do the training.  The train_simple_object_detector() function has a
# bunch of options, all of which come with reasonable default values.  The next
# few lines go over some of these options.
options = dlib.simple_object_detector_training_options()
# Setting add_left_right_image_flips to True would also train on mirrored copies
# of every image, which helps for left/right symmetric objects such as faces.
# It is left off here.
options.add_left_right_image_flips = False
# The trainer is a kind of support vector machine and therefore has the usual
# SVM C parameter.  In general, a bigger C encourages it to fit the training
# data better but might lead to overfitting.  You must find the best C value
# empirically by checking how well the trained detector works on a test set of
# images you haven't trained on.  Don't just leave the value set at 5.  Try a
# few different C values and see what works best for your data.
options.C = 5
# Tell the code how many CPU cores your computer has for the fastest training.
options.num_threads = 4
options.be_verbose = True


training_xml_path = os.path.join(faces_folder, "fish_train_ds_01.xml")
testing_xml_path = os.path.join(test_faces_folder, "fish_test_ds_01.xml")
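The comment above recommends trying several values of C rather than settling on 5. A minimal sketch of such a sweep follows; `sweep_c` and `make_dlib_evaluator` are hypothetical helpers, not part of dlib. The evaluator reuses only the dlib calls from this notebook and reads `average_precision` from the test result, which matches the printed output but should be checked against the dlib docs. Strictly, C should be picked on a separate validation XML, since choosing it on the final test set makes the reported test numbers optimistic.

```python
def sweep_c(c_values, evaluate):
    """Return (best_c, scores), where scores maps each C to evaluate(C).

    `evaluate` is any callable that trains a detector with the given C and
    returns a score where higher is better (e.g. average precision).
    """
    scores = {c: evaluate(c) for c in c_values}
    best_c = max(scores, key=scores.get)
    return best_c, scores


def make_dlib_evaluator(train_xml, eval_xml, num_threads=4):
    """Hypothetical helper: wrap dlib training and testing as evaluate(C)."""
    import dlib  # imported here so sweep_c works without dlib installed

    def evaluate(c):
        options = dlib.simple_object_detector_training_options()
        options.add_left_right_image_flips = False
        options.C = c
        options.num_threads = num_threads
        model_path = "detector_C{}.svm".format(c)
        dlib.train_simple_object_detector(train_xml, model_path, options)
        result = dlib.test_simple_object_detector(eval_xml, model_path)
        return result.average_precision

    return evaluate


if __name__ == "__main__":
    # Toy scorer standing in for real training, only to show the mechanics.
    toy_scores = {1: 0.02, 5: 0.04, 10: 0.05, 50: 0.03}
    best, scores = sweep_c([1, 5, 10, 50], toy_scores.get)
    print(best, scores)
```

In the notebook, `sweep_c([1, 5, 10, 50], make_dlib_evaluator(training_xml_path, testing_xml_path))` would train one model per C value, which can take a while with many images.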

In [26]:
dlib.train_simple_object_detector(training_xml_path, "detector.svm", options)

In [32]:
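# This evaluates on the held-out test XML, so despite the label these are
# test-set metrics rather than training accuracy.
print("Training accuracy: {}".format(
    dlib.test_simple_object_detector(testing_xml_path, "detector.svm")))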

Training accuracy: precision: 1, recall: 0.0444444, average precision: 0.0444444


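The first trial was not very successful: every detection was correct (precision 1), but only about 4% of the fish were found (recall 0.044). Let me reduce complexity by converting the images to greyscale and giving them all the same resolution.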

In [51]:
from PIL import Image
import sys
from os.path import isfile, join
from os import listdir

In [67]:
train_path = "../Anthony/train_exp_2"
test_path = "../Anthony/test_exp_2"

In [74]:
train_files = [f for f in listdir(train_path) if isfile(join(train_path,f))]
test_files = [f for f in listdir(test_path) if isfile(join(test_path,f))]
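`listdir` returns every regular file in the folder. If the imglab annotation XML sits next to the images (the dlib cells suggest these may be the same folders), `Image.open` in `to_grey_scale` will fail on it. A minimal sketch that keeps only image files; `list_images`, `is_image_name` and the extension set are my own assumptions, not something the dataset defines.

```python
import os

# Extensions treated as images; adjust to whatever the dataset actually contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def is_image_name(name):
    """True if the file name ends in one of IMAGE_EXTENSIONS (case-insensitive)."""
    return os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS


def list_images(folder):
    """Sorted image file names in `folder`, skipping subdirectories and
    non-image files such as the imglab XML."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name)) and is_image_name(name)
    )


names = ["img_001.jpg", "fish_train_ds_01.xml", "IMG_2.JPG"]
print([n for n in names if is_image_name(n)])
```

With this, `train_files = list_images(train_path)` would replace the list comprehension above.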

In [75]:
def to_grey_scale(path, files):
    """Convert each image to 8-bit greyscale ('L' mode), overwriting the original file."""
    for file in files:
        cur_file_loc = join(path, file)
        img = Image.open(cur_file_loc).convert('L')
        img.save(cur_file_loc)
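`convert('L')` produces 8-bit greyscale rather than pure black and white. According to the Pillow documentation for `Image.convert`, RGB is mapped to a single channel with the ITU-R 601-2 luma transform, L = R·299/1000 + G·587/1000 + B·114/1000. A pure-Python sketch of that formula for a single pixel; `luma_601` is a hypothetical name, and Pillow's own integer rounding may differ by one grey level.

```python
def luma_601(r, g, b):
    """ITU-R 601-2 luma, the weighting Pillow documents for convert('L').

    Returns a float in [0, 255]; Pillow itself stores an integer.
    """
    return r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000


# Green contributes most to perceived brightness, blue the least.
for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)),
                  ("blue", (0, 0, 255)), ("white", (255, 255, 255))]:
    print(name, round(luma_601(*rgb), 1))
```

Because the three weights sum to 1, pure white stays at 255 while a saturated blue object ends up fairly dark.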

In [76]:
to_grey_scale(test_path, test_files)

In [77]:
faces_folder = "./train_exp_2/"
test_faces_folder = "./test_exp_2/"
# Now let's do the training.  The train_simple_object_detector() function has a
# bunch of options, all of which come with reasonable default values.  The next
# few lines goes over some of these options.
options = dlib.simple_object_detector_training_options()
# Since faces are left/right symmetric we can tell the trainer to train a
# symmetric detector.  This helps it get the most value out of the training
# data.
options.add_left_right_image_flips = False
# The trainer is a kind of support vector machine and therefore has the usual
# SVM C parameter.  In general, a bigger C encourages it to fit the training
# data better but might lead to overfitting.  You must find the best C value
# empirically by checking how well the trained detector works on a test set of
# images you haven't trained on.  Don't just leave the value set at 5.  Try a
# few different C values and see what works best for your data.
options.C = 5
# Tell the code how many CPU cores your computer has for the fastest training.
options.num_threads = 4
options.be_verbose = True


training_xml_path = os.path.join(faces_folder, "fish_train_ds_01.xml")
testing_xml_path = os.path.join(test_faces_folder, "fish_test_ds_01.xml")

In [81]:
dlib.train_simple_object_detector(training_xml_path, "detector.svm", options)

In [82]:
# Evaluated on the held-out test XML: these are test-set metrics.
print("Training accuracy: {}".format(
    dlib.test_simple_object_detector(testing_xml_path, "detector.svm")))

Training accuracy: precision: 1, recall: 0.0540541, average precision: 0.0540541


Still abysmal, but ever so slightly better. I bet this has to do with the different resolutions: I remember some ML people talking about how cameras with different resolutions gave their team a lot of trouble with SVM-based image recognition. Also, maybe whole fish are too complex and varied; perhaps using just the fish face, which looks more consistent from image to image, will lead to a better outcome.

In [92]:
faces_folder = "./train_exp_3/"
test_faces_folder = "./test_exp_3/"
# Now let's do the training.  The train_simple_object_detector() function has a
# bunch of options, all of which come with reasonable default values.  The next
# few lines go over some of these options.
options = dlib.simple_object_detector_training_options()
# Setting add_left_right_image_flips to True would also train on mirrored copies
# of every image, which helps for left/right symmetric objects such as faces.
# It is left off here.
options.add_left_right_image_flips = False
# The trainer is a kind of support vector machine and therefore has the usual
# SVM C parameter.  In general, a bigger C encourages it to fit the training
# data better but might lead to overfitting.  You must find the best C value
# empirically by checking how well the trained detector works on a test set of
# images you haven't trained on.  Don't just leave the value set at 5.  Try a
# few different C values and see what works best for your data.
options.C = 5
# Tell the code how many CPU cores your computer has for the fastest training.
options.num_threads = 4
options.be_verbose = True


training_xml_path = os.path.join(faces_folder, "fish_train_ds_01.xml")
testing_xml_path = os.path.join(test_faces_folder, "fish_test_ds_01.xml")

In [93]:
dlib.train_simple_object_detector(training_xml_path, "detector.svm", options)

In [94]:
# Evaluated on the held-out test XML: these are test-set metrics.
print("Training accuracy: {}".format(
    dlib.test_simple_object_detector(testing_xml_path, "detector.svm")))

Training accuracy: precision: 1, recall: 0.030303, average precision: 0.030303
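With precision 1 and recall below 0.06 in every run, each box the detector reported was a fish, but it missed nearly all of them. Recall is TP / (TP + FN), so the printed values can be turned back into plausible hit counts. The sketch below uses the standard library's `Fraction.limit_denominator`; the cap of 100 boxes per test set is my assumption, and `recall` and `simplest_ratio` are hypothetical helper names. The three recalls come out as 2/45, 2/37 and 1/33, which would mean the detector found only one or two fish per run; if that reading is right, the small gain in the greyscale run may come from a test set with fewer boxes rather than from a better detector. These are only the simplest fractions consistent with the rounded output (4/90 prints the same as 2/45), so treat them as a hint, not a measurement.

```python
from fractions import Fraction


def recall(true_positives, false_negatives):
    """Share of the labelled boxes that the detector actually found."""
    return true_positives / (true_positives + false_negatives)


def simplest_ratio(value, max_boxes=100):
    """Closest fraction to `value` whose denominator is at most `max_boxes`."""
    return Fraction(value).limit_denominator(max_boxes)


# Recall values printed by dlib in the three runs of this notebook.
printed_recalls = [("whole fish", 0.0444444),
                   ("greyscale", 0.0540541),
                   ("fish faces", 0.030303)]

for label, printed in printed_recalls:
    ratio = simplest_ratio(printed)
    found, total = ratio.numerator, ratio.denominator
    # Round-trip check: the guessed counts reproduce the printed recall.
    print(label, ratio, round(recall(found, total - found), 6))
```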


Maybe my dataset is too small, but using only fish faces did not lead to an improvement (recall actually dropped to 0.03). However, I no longer get the error messages about an impossible set of bounding boxes. Perhaps for this rather dirty dataset it is better to skip HOG-SVM-based fish recognition and either use a pre-trained model directly, or use deep learning to locate a fish first and then use that result to identify which kind of fish it is.
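As far as I understand dlib, `train_simple_object_detector` learns a single sliding window with a fixed aspect ratio, and the "impossible set of object boxes" error means some labelled boxes cannot be matched by that window; the message asks for boxes with similar aspect ratios that are not smaller than roughly 400 pixels in area. Whole fish, photographed at different angles and lengths, vary far more in shape than fish faces, which would fit with the error disappearing in the third run. A minimal sketch that audits an imglab-style XML file before training; `audit_boxes` and both thresholds are hypothetical and only approximate whatever dlib checks internally.

```python
import statistics
import xml.etree.ElementTree as ET


def audit_boxes(xml_text, min_area=400, max_ratio_factor=2.0):
    """Flag imglab boxes likely to trip dlib's 'impossible set of boxes' check.

    A box is flagged if its area is below `min_area` pixels, or if its
    width/height ratio differs from the median ratio by more than
    `max_ratio_factor`. Both thresholds are illustrative guesses.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for image in root.iter("image"):
        for box in image.findall("box"):
            w, h = int(box.get("width")), int(box.get("height"))
            boxes.append((image.get("file"), w, h))
    if not boxes:
        return None, []

    median_ratio = statistics.median(w / h for _, w, h in boxes)
    flagged = []
    for file, w, h in boxes:
        ratio = w / h
        if w * h < min_area:
            flagged.append((file, "too small", w, h))
        elif max(ratio, median_ratio) / min(ratio, median_ratio) > max_ratio_factor:
            flagged.append((file, "odd aspect ratio", w, h))
    return median_ratio, flagged


example = """<dataset><images>
  <image file='a.jpg'><box top='10' left='10' width='300' height='100'/></image>
  <image file='b.jpg'><box top='5' left='5' width='280' height='110'/></image>
  <image file='c.jpg'><box top='0' left='0' width='90' height='200'/></image>
  <image file='d.jpg'><box top='0' left='0' width='15' height='15'/></image>
</images></dataset>"""

median_ratio, flagged = audit_boxes(example)
print(round(median_ratio, 2))
for row in flagged:
    print(row)
```

Running it on `fish_train_ds_01.xml` from the whole-fish folder and from the fish-face folder would show whether the aspect-ratio spread explains the difference between the runs.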

It is quite possible that my whole approach of using only 50 or so images was simply too little data. I wanted to limit the number of boxes I had to draw. Tong mentioned that there is a kernel on Kaggle where someone labeled all the fish pictures, so it may still be possible to use that dataset.

This much is clear: The quick win that the blog post referred to for human face recognition is not to be had with fish images.

For tonight, I will focus on exploring a different network topology for deep learning. To lower the barrier to entry, I will probably build on a kernel that is already out there.

In [85]:
train_path = "../Anthony/train_exp_3"
test_path = "../Anthony/test_exp_3"
train_files = [f for f in listdir(train_path) if isfile(join(train_path,f))]
test_files = [f for f in listdir(test_path) if isfile(join(test_path,f))]

In [88]:
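def to_set_size(path, files, base_size):
    # Exploratory step: for now this only prints each image's (width, height);
    # base_size is not used yet and nothing is resized.
    for file in files:
        cur_file_loc = join(path, file)
        img = Image.open(cur_file_loc)
        print(img.size)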

In [89]:
to_set_size(train_path, train_files, 300)

(1280, 720)
(1280, 974)
(1280, 974)
(1280, 720)
(1280, 720)
(1518, 854)
(1280, 720)
(1280, 974)
(1280, 720)
(1280, 750)
(1280, 720)
(1280, 720)
(1280, 720)
(1280, 720)
(1280, 720)
(1280, 720)
(1280, 720)
(1280, 720)
(1280, 720)
(1280, 974)
(1280, 750)
(1280, 974)
(1280, 720)
(1280, 720)
(1280, 720)
(1280, 974)
(1280, 750)
(1280, 720)
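`to_set_size` only prints the sizes so far. Before actually resizing, note that the boxes in the imglab XML are stored in pixel coordinates of the original image, so they must be scaled by the same factor or the labels will no longer line up with the fish. A minimal sketch of both steps using only the standard library; the rule "width becomes `base_size`, aspect ratio preserved" and the names `scale_for_width` and `scale_boxes` are my own assumptions. The pixel resize itself could then be done with Pillow's `Image.resize`.

```python
import xml.etree.ElementTree as ET


def scale_for_width(size, base_size):
    """Scale factor and new (width, height) so that the width becomes
    base_size while the aspect ratio is kept."""
    width, height = size
    factor = base_size / width
    return factor, (base_size, round(height * factor))


def scale_boxes(xml_text, factors):
    """Return imglab XML with every box scaled by factors[image file name].

    Images missing from `factors` are left unchanged.
    """
    root = ET.fromstring(xml_text)
    for image in root.iter("image"):
        factor = factors.get(image.get("file"), 1.0)
        for box in image.findall("box"):
            for attr in ("top", "left", "width", "height"):
                box.set(attr, str(round(int(box.get(attr)) * factor)))
    return ET.tostring(root, encoding="unicode")


# A few of the sizes from the listing above.
for size in [(1280, 720), (1280, 974), (1518, 854)]:
    print(size, scale_for_width(size, 300)[1])
```

A `base_size` of 300 shrinks 1280-pixel-wide images, and their boxes, by a factor of about 4 in each dimension, so small boxes such as fish faces may fall under the roughly 400-pixel minimum area dlib asks for. ElementTree also does not keep the original XML declaration line when writing the file back.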

