# AIM-SR

*Sign recognition demo using the BrainForge library*

**Author**: Csaba Gór

This notebook illustrates the usage of the *BrainForge* Neural Network library. The library can be obtained by issuing the following *pip* command:

```pip install git+https://github.com/csxeba/brainforge.git```

Since this demonstration also depends on other packages, a *conda* environment descriptor *YAML* file (*env.yml*) is supplied. The environment can be created by issuing the following *conda* command:

```conda env create -f env.yml```

In [1]:
import numpy as np

from brainforge import Backpropagation, LayerStack
from brainforge import layers, optimizers

## Dataset

The dataset to be fit is a **road sign recognition** dataset. It can either be downloaded and extracted into this project's *data/* folder using the *data/get.sh* script, or, if the dataset is already available, its root directory can be specified below.

Model performance is monitored on a validation subset, which is a 20% split of the training set. The validation set always consists of the last 20% of the images in every class, with the images ordered by their file names in increasing order.
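
The split rule described above can be sketched in a few lines of plain Python. This is only an illustration: the `streamer` module's actual implementation is not shown in this notebook, and the function name `split_last_fraction`, the example file names, and the rounding behaviour are all assumptions.

```python
def split_last_fraction(paths_by_class, validation_fraction=0.2):
    """Split each class so that the last fraction of its sorted files is held out.

    Hypothetical helper; rounding to the nearest integer is an assumption.
    """
    train, val = {}, {}
    for label, paths in paths_by_class.items():
        ordered = sorted(paths)  # increasing file-name order
        n_val = int(round(len(ordered) * validation_fraction))
        cut = len(ordered) - n_val
        train[label] = ordered[:cut]
        val[label] = ordered[cut:]  # the tail of every class goes to validation
    return train, val


# Made-up file names, only to show the behaviour of the split.
example = {
    "class_a": ["img_%03d.png" % i for i in range(10)],
    "class_b": ["img_%03d.png" % i for i in range(5)],
}
train_files, val_files = split_last_fraction(example)
```

Because the split is deterministic and taken per class, every class is represented in the validation set in roughly the same proportion as in the training set.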

In [2]:
DATASET_ROOT = "data/train-52x52"

In [3]:
import streamer

stream = streamer.Stream(root=DATASET_ROOT, split_validation=0.2, image_format="channels_first")

## Model

The data is fit by an *Artificial Neural Network*, more specifically a *Fully Convolutional Neural Network*, which has a relatively low number of parameters and thus (hopefully) generalizes better than a classic CNN with a Dense head.

The network weights are optimized by *Stochastic Gradient Descent* on the gradients determined by *Backpropagation*. The model output activation and loss functions are chosen so that they reflect the *multiclass classification* nature of the problem. The optimizer is chosen to be the *Adam* optimizer ([Kingma & Ba, 2015](https://arxiv.org/abs/1412.6980)), which is more or less a default choice and tends to perform adequately.
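
For reference, the softmax output activation and the categorical cross-entropy loss used for multiclass classification can be written in plain NumPy as below. This is a minimal sketch and not BrainForge's own implementation; in particular, averaging the loss over the batch and the `eps` term are assumptions.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(probabilities, onehot_targets, eps=1e-12):
    """Mean cross-entropy over the batch (batch averaging is an assumption)."""
    per_sample = -np.sum(onehot_targets * np.log(probabilities + eps), axis=-1)
    return float(np.mean(per_sample))


# Two made-up samples with three classes each.
logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_cross_entropy(probs, targets)
```

Softmax turns the per-class scores into a probability distribution, and the cross-entropy penalizes low probability assigned to the correct class.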

In [5]:
stack = LayerStack(stream.input_shape, layers=[
    
    layers.ConvLayer(nfilters=16, filterx=5, filtery=5, compiled=True),
    layers.PoolLayer(filter_size=2, compiled=True),
    layers.Activation("relu"),

    layers.ConvLayer(nfilters=32, filterx=5, filtery=5, compiled=True),
    layers.Activation("relu"),

    layers.ConvLayer(nfilters=32, filterx=5, filtery=5, compiled=True),
    layers.PoolLayer(filter_size=2, compiled=True),
    layers.Activation("relu"),

    layers.ConvLayer(nfilters=stream.NUM_CLASSES, filterx=5, filtery=5, compiled=True),

    layers.GlobalAveragePooling(),
    layers.Activation("softmax"),
])
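
To get a feeling for the size of this architecture, the sketch below traces the feature map shapes and counts the trainable parameters. It rests on several assumptions that are not confirmed by this notebook: 52x52 inputs with 3 channels (suggested by the dataset folder name and the BGR input), unpadded ("valid") convolutions with stride 1, one bias per filter, and a hypothetical class count of 12 standing in for `stream.NUM_CLASSES`.

```python
def trace_architecture(input_size=52, in_channels=3, num_classes=12, kernel=5):
    """Return (shapes, parameter_count) for the layer stack, under the stated assumptions."""
    # (layer kind, number of filters or pooling size); activations do not change shapes.
    spec = [("conv", 16), ("pool", 2),
            ("conv", 32),
            ("conv", 32), ("pool", 2),
            ("conv", num_classes)]
    size, channels, params = input_size, in_channels, 0
    shapes = []
    for kind, value in spec:
        if kind == "conv":
            # weights: filters * input channels * kernel area, plus one bias per filter
            params += value * channels * kernel * kernel + value
            size = size - kernel + 1  # "valid" convolution, stride 1 (assumed)
            channels = value
        else:
            size //= value  # non-overlapping pooling
        shapes.append((channels, size, size))
    return shapes, params


shapes, parameter_count = trace_architecture()
for shape in shapes:
    print(shape)
print("trainable parameters:", parameter_count)
```

Under these assumptions the last convolution produces one small feature map per class, which global average pooling then reduces to a single score per class. No dense layer is involved, which is what keeps the parameter count low.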

## Training

The training takes about 1.5 hours and the network reaches over 99.9% accuracy on the validation set. This is unnaturally high and is probably caused by the fact that the validation set is highly similar to the training set.

Below are the parameters for the training. Previous experiments showed that 6 epochs are sufficient to reach convergence on this dataset. The relatively low batch size and high learning rate help the network jump out of smaller local minima and find a good optimum that generalizes well. Together with the fully convolutional nature of the architecture, this produces sufficient regularization, so no additional regularization was required.

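A validation increase factor is applied to better monitor the development of the target KPI, which is the classification accuracy on the validation set. The factor divides the number of steps per epoch and multiplies the number of epochs, so validation runs more often while the total number of training steps stays roughly the same.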
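
The arithmetic behind the validation increase factor can be illustrated as follows. The helper name `rescale_schedule` and the value of 1000 steps per epoch are hypothetical; the real step count depends on the dataset size and the batch size.

```python
def rescale_schedule(steps_per_epoch, epochs, factor):
    """Shorten every epoch by `factor` and run `factor` times as many of them."""
    return steps_per_epoch // factor, epochs * factor


# Hypothetical numbers: 1000 steps per full pass over the training data.
short_steps, short_epochs = rescale_schedule(steps_per_epoch=1000, epochs=6, factor=4)

total_before = 1000 * 6
total_after = short_steps * short_epochs
print(short_steps, short_epochs, total_before, total_after)
```

With 6 epochs and a factor of 4, the training loop reports 24 shorter epochs, each followed by a validation pass. Integer division may drop a few steps when the step count is not divisible by the factor.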

In [6]:
EPOCHS = 6
BATCH_SIZE = 10
LEARNING_RATE = 1e-3
VALIDATION_INCREASE_FACTOR = 4  # divides steps per epoch and multiplies epochs

In [7]:
net = Backpropagation(layerstack=stack, cost="cxent", optimizer=optimizers.Adam(LEARNING_RATE))

net.fit_generator(stream.iter_subset("train", BATCH_SIZE),
                  lessons_per_epoch=stream.steps_per_epoch("train", BATCH_SIZE) // VALIDATION_INCREASE_FACTOR,
                  epochs=EPOCHS * VALIDATION_INCREASE_FACTOR,
                  metrics=["acc"],
                  validation=stream.iter_subset("val", BATCH_SIZE),
                  validation_steps=stream.steps_per_epoch("val", BATCH_SIZE))

# Save the weights as a single flat NumPy vector.
weights = net.get_weights(unfold=True)
np.save("AIM-SR-weights.npy", weights)

Epoch  1/24
Training Progress: 100.0%  cost: 0.8197 accuracy: 0.6938 Validation cost: 0.4722 accuracy: 0.8012
 took 3.0 minutes
Epoch  2/24
Training Progress: 100.0%  cost: 0.3428 accuracy: 0.8802 Validation cost: 0.2664 accuracy: 0.9108
 took 3.0 minutes
Epoch  3/24
Training Progress: 100.0%  cost: 0.2280 accuracy: 0.9245 Validation cost: 0.2057 accuracy: 0.9420
 took 3.0 minutes
Epoch  4/24
Training Progress: 100.0%  cost: 0.1499 accuracy: 0.9583 Validation cost: 0.1402 accuracy: 0.9607
 took 3.0 minutes
Epoch  5/24
Training Progress: 100.0%  cost: 0.1073 accuracy: 0.9710 Validation cost: 0.0856 accuracy: 0.9761
 took 3.0 minutes
Epoch  6/24
Training Progress: 100.0%  cost: 0.0777 accuracy: 0.9828 Validation cost: 0.0753 accuracy: 0.9806
 took 3.0 minutes
Epoch  7/24
Training Progress: 100.0%  cost: 0.0619 accuracy: 0.9844 Validation cost: 0.0518 accuracy: 0.9879
 took 3.0 minutes
Epoch  8/24
Training Progress: 100.0%  cost: 0.0475 accuracy: 0.9898 Validation cost: 0.0366 accuracy: 0

KeyboardInterrupt: 

## Testing

Below we set up some functions to aid testing the network on arbitrary input images.

In [None]:
NETWORK_WEIGHTS = "AIM-SR-weights.npy"

stack.set_weights(np.load(NETWORK_WEIGHTS), fold=True)

def preprocess_image(image):
    x = image / 255.  # Scale pixel values to the range [0, 1]
    x = x.transpose((2, 0, 1))  # Convert to channels first
    return x[None, ...]  # Add a batch dimension

def execute_detection(image: np.ndarray) -> int:
    """
    Runs preprocessing, executes the network and returns an integer label.
    Returned labels are indexed from 1, just like in the dataset.
    
    image: np.ndarray
        Single BGR image as a 3D numpy array in channels last format.
    """
    
    x = preprocess_image(image)
    output = stack.feedforward(x)[0]  # eliminate batch dim
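    # output is 1D (one probability per class) after dropping the batch dim,
    # so argmax is taken over the whole vector and converted to a plain int
    prediction = int(np.argmax(output)) + 1
    return prediction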
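
The preprocessing and the label conversion can be checked without the trained network. The sketch below repeats `preprocess_image` so that it runs on its own, feeds it a random dummy image, and uses a made-up probability vector in place of the real network output. The 52x52 image size is an assumption based on the dataset folder name.

```python
import numpy as np


def preprocess_image(image):
    x = image / 255.  # scale pixel values to the range [0, 1]
    x = x.transpose((2, 0, 1))  # channels last -> channels first
    return x[None, ...]  # add a batch dimension


# Random stand-in for a 52x52 BGR image in channels last format.
dummy_image = np.random.randint(0, 256, size=(52, 52, 3), dtype=np.uint8)
batch = preprocess_image(dummy_image)
print(batch.shape, batch.dtype)

# Made-up network output for a single image with three classes.
fake_probabilities = np.array([0.1, 0.7, 0.2])
label = int(np.argmax(fake_probabilities)) + 1  # labels are indexed from 1
print(label)
```

The batch has the channels first layout that the stream was configured with, and the label is shifted by one because `argmax` is zero-based while the dataset labels start at 1.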
