# Overview:

1. I used Caffe to train a small neural net that can detect vehicles. <br/>
2. My classifier's accuracy on the training set is 100%. <br/>

# End-to-end training for vehicle detection

The training set contains only 1050 images. Because the data set is small, a simple neural network architecture is sufficient. 
I chose a neural network because it scales: compared to a HOG+SVM implementation, a neural network can easily be scaled up to classify RGB images or to detect more kinds of objects. 
Several neural net frameworks are available. I'm using Caffe for this project because it is optimized for training on images. 

For a visualization of the network and some thoughts on its parameters, see section 4 at the end of this file :)

First, we train the network. This involves the following steps:
(1) Write all training image filenames and their labels into a file.<br/>
(2) Define the network architecture.<br/>
(3) Start training.<br/>

Training produces a Caffe model file, `weights.vehicle.caffemodel`, which we save locally and use to predict labels on the test set.

### 1. Read image filenames and labels into a file

In [None]:
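import glob

# Collect the image paths; the class is encoded in each file name.
test_imgs = glob.glob("images/test/*.jpg")
train_imgs = glob.glob("images/train/*.jpg")

# Write one "<path> <label>" line per training image (1 = positive, 0 = negative).
with open("train.txt", 'w') as outfile:
    for f in train_imgs:
        if 'pos-' in f:
            outfile.write(f + " 1\n")
        elif 'neg-' in f:
            outfile.write(f + " 0\n")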
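
As a quick sanity check on the list file, the sketch below counts how many entries carry each label. It assumes the `<path> <label>` line format written above; the helper name `count_labels` is hypothetical and not part of the original project.

```python
from collections import Counter


def count_labels(list_path):
    """Count entries per label in a '<path> <label>' list file."""
    counts = Counter()
    with open(list_path) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # The label is the last whitespace-separated field.
            path, label = line.rsplit(None, 1)
            counts[int(label)] += 1
    return counts
```

Calling `count_labels("train.txt")` returns a `Counter` keyed by label; the counts should add up to the number of images whose names matched `pos-` or `neg-`.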

### 2.  Defining and running the nets

We'll start by defining `build_net`, a function which initializes the vehicle net architecture (a minor variant on *LeNet*), taking arguments specifying the data and number of output classes.

In [5]:
import tempfile

import caffe
from caffe import layers as L
from caffe import params as P

weight_param = dict(lr_mult=1, decay_mult=1)
bias_param   = dict(lr_mult=2, decay_mult=0)
learned_param = [weight_param, bias_param]
frozen_param = [dict(lr_mult=0)] * 2

def conv(bottom, ks, nout, stride=1, pad=0, group=1,
              param=learned_param,
              weight_filler=dict(type='xavier', std=0.01),
              bias_filler=dict(type='constant', value=0.1)):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad, group=group,
                         param=param, weight_filler=weight_filler,
                         bias_filler=bias_filler)
    return conv

def fc_relu(bottom, nout, param=learned_param,
            weight_filler=dict(type='xavier', std=0.005),
            bias_filler=dict(type='constant', value=0.1)):
    fc = L.InnerProduct(bottom, num_output=nout, param=param,
                        weight_filler=weight_filler,
                        bias_filler=bias_filler)
    return fc, L.ReLU(fc, in_place=True)

def max_pool(bottom, ks, stride=1):
    return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)

def build_net(data, label=None, train=True, num_classes=2,
             classifier_name='fc8', learn_all=False):
    n = caffe.NetSpec()
    n.data = data
    param = learned_param if learn_all else frozen_param
    n.conv1 = conv(n.data, 5, 4, stride=1, param=param)
    n.pool1 = max_pool(n.conv1, 2, stride=2)
    n.conv2 = conv(n.pool1, 5, 2, stride=1, param=param)
    n.pool2 = max_pool(n.conv2, 2, stride=2)
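    # NetSpec only serializes layers that lead to a named top, so the ReLU
    # has to sit on the path to the classifier to end up in the net.
    ip1, relu1 = fc_relu(n.pool2, nout=10, param=learned_param)
    ip2 = L.InnerProduct(relu1, num_output=num_classes, param=learned_param)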

    if not train:
        n.probs = L.Softmax(ip2)

    if label is not None:
        n.label = label
        n.loss = L.SoftmaxWithLoss(ip2, n.label)
        n.acc = L.Accuracy(ip2, n.label)
    # write the net to a temporary file and return its filename
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
        f.write(str(n.to_proto()))
        return f.name

Define a function `vehicle_net` which calls `build_net` on data from the training dataset.

The new network will also have the vehicle net architecture:

- the input is the vehicle training data in the `images/train/` directory, provided by an `ImageData` layer
- the output is a distribution over 2 classes
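
To make "a distribution over 2 classes" concrete: at test time `build_net` appends a `Softmax` layer, which maps the two raw scores of the last layer to probabilities that sum to 1. Below is a minimal pure-Python sketch of that mapping; the function name `softmax` and the example scores are illustrative and not taken from the project.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([0.5, 2.0])  # hypothetical scores for (class 0, class 1)
predicted = probs.index(max(probs))  # index of the most probable class
```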

In [7]:
def vehicle_net(train=True, learn_all=False, subset=None, batch_size=1050):
    if subset is None:
        subset = 'train' if train else 'test'
    source = '%s.txt' % subset
    vehicle_data, vehicle_label = L.ImageData(
        source=source,
        batch_size=batch_size,  ntop=2, is_color=False, shuffle=False)
    if train:
        label = vehicle_label
    else:
        # At test time nothing is learned and no loss/accuracy layers are attached.
        learn_all = False
        label = None
    return build_net(data=vehicle_data, label=label, train=train,
                    num_classes=2,
                    classifier_name='vehicle',
                    learn_all=learn_all)

Use the `vehicle_net` function defined above to initialize the vehicle net, with input images from the training dataset.

Call `forward` to get a batch of vehicle training data.

#### Running the net

In [9]:
import os

import numpy as np
from caffe.proto import caffe_pb2

def train_vehicle_net():
    niter = 200  # number of iterations to train

    vehicle_solver_filename = solver(vehicle_net(train=True))
    vehicle_solver = caffe.get_solver(vehicle_solver_filename)

    print('Running solvers for %d iterations...' % niter)
    solvers = [('scratch', vehicle_solver)]
    loss, acc, weights = run_solvers(niter, solvers)
    print('Done.')

    train_loss = loss['scratch']
    train_acc = acc['scratch']
    vehicle_weights = weights['scratch']

    print('train loss: %s' % (train_loss,))
    print('train acc: %s' % (train_acc,))
    print('vehicle weights: %s' % (vehicle_weights,))

def solver(train_net_path, test_net_path=None, base_lr=0.001):
    s = caffe_pb2.SolverParameter()

    s.train_net = train_net_path
    if test_net_path is not None:
        s.test_net.append(test_net_path)
        s.test_interval = 100  # Test after every 100 training iterations.
        s.test_iter.append(100) # Test on 100 batches each time we test.

    # The number of iterations over which to average the gradient.
    # Effectively boosts the training batch size by the given factor, without
    # affecting memory utilization.
    s.iter_size = 1
    s.max_iter = 100000     # # of times to update the net (training iterations)

    # Solve using the stochastic gradient descent (SGD) algorithm.
    s.type = 'SGD'

    # Set the initial learning rate for SGD.
    s.base_lr = base_lr

    s.lr_policy = 'inv'
    s.gamma = 0.0001
    s.power = 0.75

    s.momentum = 0.9
    s.weight_decay = 5e-4

    # Display the current training loss and accuracy every 10 iterations.
    s.display = 10

    # Snapshots are files used to store networks we've trained.  Here, we'll
    # snapshot every 10K iterations -- ten times during training
    # -- as long as we have that much data to train.
    s.snapshot = 10000
    # NOTE: `caffe_root` is not defined in this file; set it before calling `solver`.
    s.snapshot_prefix = caffe_root + 'models/finetune_vehicle/finetune_vehicle'

    # Train on the GPU.  Using the CPU to train large networks is very slow.
    s.solver_mode = caffe_pb2.SolverParameter.GPU

    # Write the solver to a temporary file and return its filename.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
        f.write(str(s))
        return f.name

def run_solvers(niter, solvers, disp_interval=10):
    blobs = ('loss', 'acc')
    loss, acc = ({name: np.zeros(niter) for name, _ in solvers}
                 for _ in blobs)
    for it in range(niter):
        for name, s in solvers:
            s.step(1)  # run a single SGD step in Caffe
            loss[name][it], acc[name][it] = (s.net.blobs[b].data.copy()
                                             for b in blobs)
            loss_disp = '; '.join('%s: loss=%.3f, acc=%2d%%' %
                                  (n, loss[n][it], np.round(100*acc[n][it]))
                                  for n, _ in solvers)
            print('%3d) %s' % (it, loss_disp))
    # Save the learned weights from nets.
    weight_dir = tempfile.mkdtemp()
    weights = {}
    for name, s in solvers:
        filename = 'weights.%s.caffemodel' % name
        weights[name] = os.path.join(weight_dir, filename)
        s.net.save(weights[name])
    return loss, acc, weights

`run_solvers` steps each solver for `niter` iterations. A solver holds the network, runs the training steps, and saves the learned weights. I experimented with several solvers, which is why `run_solvers` takes a list of them as a parameter.
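
The `solver` function above selects Caffe's `inv` learning-rate policy, under which the rate at iteration `it` is `base_lr * (1 + gamma * it) ^ (-power)`. The sketch below evaluates that formula with the values set above (`base_lr=0.001`, `gamma=0.0001`, `power=0.75`); the helper name `inv_lr` is only illustrative.

```python
def inv_lr(it, base_lr=0.001, gamma=0.0001, power=0.75):
    """Learning rate at iteration `it` under Caffe's 'inv' policy."""
    return base_lr * (1.0 + gamma * it) ** (-power)


# Learning rate at the start and at the end of the 200 training iterations.
lr_start = inv_lr(0)
lr_end = inv_lr(199)
```

With these settings the rate drops by only about 1.5% over 200 iterations, so training runs at an almost constant learning rate.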

After 200 iterations, the accuracy of the network on the training set is 100%.


### 3. Running the classifier on test data

Now we'll run the classifier on the test set and write the results to a CSV file.

In [15]:
import csv

def eval_vehicle_net(weights):
    # Read the test image paths (first field of every line in test.txt).
    file_names = []
    with open('test.txt', 'r') as sourcefile:
        for line in sourcefile:
            file_names.append(line.split(' ')[0])
    # Run the whole test set through the net as a single batch.
    batch_size = len(file_names)
    test_net = caffe.Net(vehicle_net(train=False, batch_size=batch_size), weights, caffe.TEST)
    test_net.forward()
    data_batch = test_net.blobs['data'].data.copy()
    classifications = []
    for i in range(batch_size):
        image = data_batch[i]
        # `disp_vehicle_preds` is not defined in this file; it is expected to
        # return the predicted label for one image.
        label = disp_vehicle_preds(test_net, image)
        classifications.append([file_names[i].split('/')[-1], label])
    # Write one "<file name>,<label>" row per test image.
    with open('classifications.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(classifications)

weights = 'weights.vehicle.caffemodel'
eval_vehicle_net(weights)
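
The helper `disp_vehicle_preds` is not shown in this file. As a rough, Caffe-free sketch of the last step, the code below takes per-image class probabilities (such as the `probs` output of the test net), picks the most probable class, and writes the same two-column CSV. The function name `write_classifications` and its inputs are assumptions for illustration only, and the sketch is written for Python 3.

```python
import csv
import os


def write_classifications(file_names, probs, out_path):
    """Write one '<file name>,<label>' row per image and return the rows."""
    rows = []
    for path, p in zip(file_names, probs):
        label = max(range(len(p)), key=lambda k: p[k])  # argmax over classes
        rows.append([os.path.basename(path), label])
    # newline='' stops the csv module from emitting blank lines on Windows.
    with open(out_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    return rows
```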

### 4. Visualize the network architecture

![](vehicle_network_structure.jpg?raw=true)

I'm using only two convolutional layers because the input data set is small. Every input image is 40\*100 pixels, so each image has 4000 input values. With kernel size 5 and 4 output channels, the first convolutional layer produces (40-5+1)\*(100-5+1)\*4 = 36\*96\*4 = 13824 output values per image, while the layer itself has only 5\*5\*4 weights plus 4 biases, i.e. 104 learnable parameters. The same calculation applies to the next conv layer. We have to make sure that the total number of parameters stays smaller than the number of input pixels over the whole training set, or else we'll overfit.
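
The shape and parameter arithmetic can be checked with a few lines of Python. This sketch assumes a single-channel 40x100 input (the `ImageData` layer uses `is_color=False`), unpadded stride-1 convolutions, 2x2 max pooling with stride 2, and the layer sizes from `build_net`; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded convolution or pooling window."""
    # Caffe rounds pooling sizes up; here every division is exact.
    return (size - kernel) // stride + 1


def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel convolution layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


h, w = 40, 100
h, w = conv_out(h, 5), conv_out(w, 5)        # conv1 -> 36 x 96, 4 channels
conv1_values = h * w * 4                     # activations produced by conv1
h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)  # pool1 -> 18 x 48
h, w = conv_out(h, 5), conv_out(w, 5)        # conv2 -> 14 x 44, 2 channels
h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)  # pool2 -> 7 x 22
flat = h * w * 2                             # inputs to the first FC layer

total_params = (conv_params(5, 1, 4) + conv_params(5, 4, 2)
                + fc_params(flat, 10) + fc_params(10, 2))
train_pixels = 1050 * 40 * 100
print(conv1_values, flat, total_params, train_pixels)
```

Under these assumptions the network has 3418 learnable parameters, far fewer than the 4,200,000 pixels in the training set.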