# AIM-SR

Sign recognition demo using the BrainForge library

In [1]:
import numpy as np

from brainforge import Backpropagation, LayerStack
from brainforge import layers, optimizers

## Dataset

The dataset can either be downloaded and extracted into this project's data/ folder with the data/get.sh script, or, if it is stored elsewhere, its root can be specified below.

In [2]:
DATASET_ROOT = "data/train-52x52"

In [3]:
import streamer

In [4]:
stream = streamer.Stream(root=DATASET_ROOT, split_validation=0.2, image_format="channels_first")

 [Streamer] - Num train samples: 48000
 [Streamer] - Num val samples: 12000


## Model

The data will be fit by an Artificial Neural Network, more specifically a Fully Convolutional Neural Network, which has a relatively low number of parameters and thus (hopefully) generalizes better than a vanilla CNN with a Dense head.

The network weights will be optimized by gradient descent (here the Adam variant), using the gradients computed by backpropagation.

The task is a 12-class classification problem.

In [5]:
stack = LayerStack(stream.input_shape, layers=[
    
    layers.ConvLayer(nfilters=16, filterx=5, filtery=5, compiled=True),
    layers.PoolLayer(filter_size=2, compiled=True),
    layers.Activation("relu"),

    layers.ConvLayer(nfilters=32, filterx=5, filtery=5, compiled=True),
    layers.Activation("relu"),

    layers.ConvLayer(nfilters=32, filterx=5, filtery=5, compiled=True),
    layers.PoolLayer(filter_size=2, compiled=True),
    layers.Activation("relu"),

    layers.ConvLayer(nfilters=stream.NUM_CLASSES, filterx=5, filtery=5, compiled=True),

    layers.GlobalAveragePooling(),
    layers.Activation("softmax"),
])
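As a rough sanity check on the claim about the parameter count, the sketch below traces the spatial size and counts the parameters of the stack above. It assumes 3-channel 52x52 inputs, 12 output classes, unpadded stride-1 convolutions with one bias per filter, and non-overlapping 2x2 pooling. None of these are confirmed for BrainForge here, and `conv_out` and `conv_params` are hypothetical helpers, not library functions.

```python
def conv_out(size, kernel):
    """Spatial size after an unpadded, stride-1 convolution (assumed)."""
    return size - kernel + 1


def conv_params(in_channels, out_channels, kernel):
    """Kernel weights plus one bias per filter (assumed layout)."""
    return in_channels * out_channels * kernel * kernel + out_channels


# Assumptions: 3 input channels, 52x52 images, 12 classes.
channels, size, num_classes = 3, 52, 12

# (filters, pool_after) for each conv layer, mirroring the stack above.
spec = [(16, True), (32, False), (32, True), (num_classes, False)]

total_params = 0
sizes = [size]
for nfilters, pool_after in spec:
    total_params += conv_params(channels, nfilters, 5)
    size = conv_out(size, 5)
    if pool_after:
        size //= 2  # non-overlapping 2x2 pooling halves each side
    channels = nfilters
    sizes.append(size)

print(sizes)         # side length after each conv(+pool) block
print(total_params)  # trainable parameters under the assumptions above
```

Under these assumptions the last convolution produces one 4x4 map per class, which global average pooling reduces to a single score, and the whole network has roughly 49 thousand parameters.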

## Training

Training takes about 2-3 hours, and the network reaches over 99.9% accuracy on the validation set. This is unnaturally high and is probably caused by the validation set being highly similar to the training set.

Below are the parameters for the training. Previous experiments showed that 10 epochs are sufficient to reach convergence on this dataset. The relatively low batch size and high learning rate help the network jump out of smaller local minima and find a good optimum that generalizes well. Together with the fully convolutional nature of the architecture, this provides sufficient regularization, so no additional regularization was required.

In [6]:
EPOCHS = 10
BATCH_SIZE = 10
LEARNING_RATE = 1e-3
VALIDATION_INCREASE_FACTOR = 4  # divides steps per epoch and multiplies epochs
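To make the effect of `VALIDATION_INCREASE_FACTOR` concrete, here is the arithmetic as a small sketch. It assumes that the number of steps per epoch is the number of training samples divided by the batch size (the actual behaviour of `stream.steps_per_epoch` is not shown in this notebook) and uses the 48000 training samples reported by the Streamer output above.

```python
EPOCHS = 10
BATCH_SIZE = 10
VALIDATION_INCREASE_FACTOR = 4
NUM_TRAIN_SAMPLES = 48000  # from the Streamer output above

# Assumed definition: one step consumes one batch.
steps_per_full_epoch = NUM_TRAIN_SAMPLES // BATCH_SIZE

# Shorter "epochs", so validation runs four times as often...
steps_per_short_epoch = steps_per_full_epoch // VALIDATION_INCREASE_FACTOR
# ...and proportionally more of them, so the total work is unchanged.
short_epochs = EPOCHS * VALIDATION_INCREASE_FACTOR

total_steps = steps_per_short_epoch * short_epochs
print(steps_per_short_epoch, short_epochs, total_steps)
```

If the assumption holds, the factor only changes how often validation is run; the total number of weight updates equals that of 10 full passes over the training set.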

In [None]:
net = Backpropagation(layerstack=stack, cost="cxent", optimizer=optimizers.Adam(LEARNING_RATE))

net.fit_generator(stream.iter_subset("train", BATCH_SIZE),
                  lessons_per_epoch=stream.steps_per_epoch("train", BATCH_SIZE) // VALIDATION_INCREASE_FACTOR,
                  epochs=EPOCHS * VALIDATION_INCREASE_FACTOR,
                  metrics=["acc"],
                  validation=stream.iter_subset("val", BATCH_SIZE),
                  validation_steps=stream.steps_per_epoch("val", BATCH_SIZE))

# Save the weights as a NumPy vector.
weights = net.get_weights(unfold=True)
np.save("AIM-SR-weights.npy", weights)

Epoch  1/40
Training Progress:   1.2%  cost: 2.4997 accuracy: 0.0533

## Testing

Below we set up some functions to aid testing the network on arbitrary input images.

In [None]:
NETWORK_WEIGHTS = "AIM-SR-weights.npy"

stack.set_weights(np.load(NETWORK_WEIGHTS), fold=True)

def preprocess_image(image):
    x = image / 255.  # Scale pixel values to the range [0, 1]
    x = x.transpose((2, 0, 1))  # Convert to channels first
    return x[None, ...]  # Add a batch dimension

def execute_detection(image):
    x = preprocess_image(image)
    output = stack.feedforward(x)[0]  # eliminate batch dim
    prediction = np.argmax(output)  # output is 1-D once the batch dim is gone
    return prediction
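The array shapes involved can be checked without the network. The sketch below applies the same preprocessing steps to a dummy image and takes the argmax of a made-up score vector. The 52x52 RGB input, the 12 classes and the scores themselves are assumptions for illustration, not network output, and `preprocess_image_sketch` is a standalone copy of the function above.

```python
import numpy as np


def preprocess_image_sketch(image):
    x = image / 255.0           # scale pixel values to [0, 1]
    x = x.transpose((2, 0, 1))  # (H, W, C) -> (C, H, W)
    return x[None, ...]         # add a leading batch dimension


# Dummy channels-last image standing in for a real input.
dummy = np.zeros((52, 52, 3), dtype=np.uint8)
batch = preprocess_image_sketch(dummy)
print(batch.shape)

# Made-up softmax-like scores with shape (batch, classes).
scores = np.full((1, 12), 0.01)
scores[0, 7] = 0.89

output = scores[0]  # drop the batch dimension -> 1-D vector of class scores
predicted_class = int(np.argmax(output))
print(predicted_class)
```

Once the batch dimension is dropped, the score vector is one-dimensional, so the argmax is taken over its only axis and yields a single class index.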
