Enhancement: Convolution networking #14

Closed
davidrmiller opened this Issue Apr 15, 2015 · 1 comment

@davidrmiller
Owner

davidrmiller commented Apr 15, 2015

The purpose of this issue is to discuss implementing convolution neural networking in neural2d. Comments are appreciated.

Neural2d currently implements convolution filtering, but not convolution networking.

Definitions


A convolution filter is an image processing operation such as edge detection, low pass filtering (smoothing), etc. In neural2d, any layer of neurons can be configured as a convolution filter. In a convolution filter layer, all the neurons share the same convolution kernel, which is just a matrix of weights. The weights do not undergo training; they remain constant throughout the life of the net.

A convolution neural network is somewhat similar to convolution filtering, except that the kernel weights undergo training, and the layer is replicated N times so that N separate kernels can be trained.
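
In both cases the forward pass is identical; the difference is only whether the kernel weights are updated during training. As a sketch of the math (my own notation, not taken from the neural2d code), for a k x k kernel K applied to a source layer S, and ignoring edge handling, the input sum of the neuron at (x, y) is:

 \text{sum}(x, y) = \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} K(i, j) \, S(x+i,\; y+j)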

To be more precise, when we speak of a convolution neural net, we are usually talking about a regular neural net with one or more convolution network layers that form special subnets within the larger neural network.

It's unfortunate that the terms are so similar -- "convolution filtering" vs. "convolution networking." Is there a better terminology we can use?


Purpose

The purpose of convolution neural networking is to uncover a number of image features during training that help with pattern recognition. Commonly, image features are little patches of pixels that look like bits of edges, corners, or other shapes. Ideally, all the features found during training are highly uncorrelated, forming a set of orthogonal mathematical bases that can help partition the classification space.

The output from one convolution network layer can become the input to another convolution network layer, so that each convolution stage can find higher level features.

Implementation plan

Here are the minimum required changes to implement convolution neural networking in neural2d:

  • Devise a syntax to specify a convolution network layer in the topology configuration file.
  • Expand the Layer data structure to include the concept of layer depth.
  • Extend the backprop code to update the kernel weights in each convolution network layer.

Additionally, it would be useful to implement the following features:

  • Rectified linear units (ReLU) -- this is a type of transfer function that goes with convolution networking like chocolate and peanut butter.
  • Pooling -- this is the reduction of the outputs of a regular layer or convolution network layer into a new layer with fewer neurons, by passing a windowed filter (max or avg) over the source layer.

These sub-tasks are discussed below.

Topology configuration file syntax

A convolution filter layer can be thought of as a degenerate convolution network layer with a depth of 1, and no weight updates during backprop.

The existing topology configuration syntax for convolution filtering could be expanded to allow a depth parameter. For example, for a convolution network layer of 10 kernels:

 input size 32x32
 layerConv size 10*32x32 from input convolve 2x2

The 10* prefix specifies that this is a convolution network layer with 10 kernels rather than a convolution filter layer with a single kernel. The 2x2 is the size of the convolution kernels to be trained. Specifying the kernel size (instead of the curly-brace syntax used for fixed kernels) implies that the kernels are to be included in backprop weight learning.

Expand the Layer data structure

Currently a single layer contains a two-dimensional set of neurons. That can be expanded to implement a layer depth. If a layer is configured as a convolution network layer, then the layer instance will be replicated N times to train N kernels. We'll refer to N as the layer depth. (Regular layers of neurons and convolution filter layers have a default depth of 1.)

Adding a depth dimension to the layer structure will make it a little more complicated to loop over neurons during forward or back propagation. Perhaps the Layer class should provide a variety of iterators to hide those details. That would be a nice little refactoring project that could be done before the other changes.
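
As a rough sketch of what that might look like (hypothetical member names, not the actual neural2d declarations), the depth could be an outer index over per-kernel planes, with a helper that hides the triple loop from callers:

 #include <cstdint>
 #include <functional>
 #include <vector>

 // Hypothetical sketch of a Layer with a depth dimension. Each element
 // of planes is one 2D sheet of neuron outputs, flattened row-major.
 struct Layer {
     uint32_t sizeX, sizeY;                   // 2D extent of each plane
     uint32_t depth;                          // number of kernels; 1 for regular layers
     std::vector<std::vector<float>> planes;  // planes[d][y * sizeX + x]

     Layer(uint32_t x, uint32_t y, uint32_t d)
         : sizeX(x), sizeY(y), depth(d),
           planes(d, std::vector<float>(x * y, 0.0f)) { }

     // One possible iterator-style helper: forward- and back-prop code
     // can visit every neuron without knowing the layer's shape.
     void forEachNeuron(const std::function<void(uint32_t, uint32_t, uint32_t, float &)> &f) {
         for (uint32_t d = 0; d < depth; ++d)
             for (uint32_t y = 0; y < sizeY; ++y)
                 for (uint32_t x = 0; x < sizeX; ++x)
                     f(d, x, y, planes[d][y * sizeX + x]);
     }
 };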

Backprop weight updates

Currently, neural2d does not update input weights on layers flagged as convolution filter layers. For convolution network layers, we would need to add a backprop function to update the weights in each convolution network layer.
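
To make the shared-weight math concrete, here is a minimal sketch (my own derivation with hypothetical names; edge handling, momentum, and regularization are omitted). Because each kernel weight is applied at every position in the plane, its gradient is the sum of the per-position contributions:

 #include <cstddef>
 #include <vector>

 // Hypothetical sketch of the shared-kernel weight update. delta holds
 // the backpropagated error term of each neuron in one plane of the
 // convolution network layer; src holds the source layer's outputs.
 // Every neuron in the plane shares kernel[i][j], so the gradient for
 // that weight accumulates over every position where it was applied.
 void updateKernel(std::vector<std::vector<float>> &kernel,
                   const std::vector<std::vector<float>> &delta,
                   const std::vector<std::vector<float>> &src,
                   float eta)  // learning rate
 {
     for (size_t i = 0; i < kernel.size(); ++i) {
         for (size_t j = 0; j < kernel[0].size(); ++j) {
             float grad = 0.0f;
             for (size_t y = 0; y < delta.size() && y + i < src.size(); ++y)
                 for (size_t x = 0; x < delta[0].size() && x + j < src[0].size(); ++x)
                     grad += delta[y][x] * src[y + i][x + j];
             kernel[i][j] -= eta * grad;  // simple gradient-descent step
         }
     }
 }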

Rectified linear units

Here is one possible ReLU-style transfer function (the smooth "softplus" approximation) and its derivative:

 f(x) = ln(1 + exp(x))
 f'(x) = 1 / (1 + exp(-x))

This smooth form is differentiable everywhere, including at zero. If the hard ReLU f(x) = max(0, x) is used instead, care must be given to the derivative at zero, where it is undefined.

This would be just another transfer function specified by the "tf" parameter in the topology config file.
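
A minimal sketch of that pair in code (function names are illustrative, not neural2d's actual tf hooks):

 #include <cmath>

 // Smooth "softplus" approximation of ReLU and its derivative (the
 // logistic function); both are well-defined everywhere, including x = 0.
 float reluTransfer(float x)      { return std::log(1.0f + std::exp(x)); }
 float reluTransferDeriv(float x) { return 1.0f / (1.0f + std::exp(-x)); }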

Pooling the output

To pool the output of a convolution network layer, define a new layer, smaller than the convolution layer, that takes the convolution layer as its input, and give it the new "pool" parameter. The pool parameter takes a pooling method (max or avg) and a window size over which to apply the pooling method:

 layerPool size 10*8x8 from layerConv pool max 2x2

A pooling layer may also take its input from a regular layer of depth 1.
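
For illustration, here is a minimal max-pooling sketch over one flattened plane (my own version, assuming the plane extent is an exact multiple of the window size); the avg variant would accumulate a sum and divide by winX * winY instead:

 #include <algorithm>
 #include <vector>

 // Max-pool one 2D plane (flattened row-major) with a winX x winY window.
 // Assumes srcW % winX == 0 and srcH % winY == 0.
 std::vector<float> maxPool(const std::vector<float> &src,
                            size_t srcW, size_t srcH,
                            size_t winX, size_t winY)
 {
     const size_t outW = srcW / winX, outH = srcH / winY;
     std::vector<float> out(outW * outH);
     for (size_t oy = 0; oy < outH; ++oy) {
         for (size_t ox = 0; ox < outW; ++ox) {
             float best = src[oy * winY * srcW + ox * winX];
             for (size_t y = 0; y < winY; ++y)
                 for (size_t x = 0; x < winX; ++x)
                     best = std::max(best, src[(oy * winY + y) * srcW + ox * winX + x]);
             out[oy * outW + ox] = best;
         }
     }
     return out;
 }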

Example

Following is an example of a convolution network pipeline using the syntax proposed above:

input size 64x64
layerConvolve1 size 10*64x64 from input convolve 7x7
layerPool1 size 10*16x16 from layerConvolve1 pool max 4x4
layerConvolve2 size 10*16x16 from layerPool1 convolve 5x5
layerPool2 size 10*8x8 from layerConvolve2 pool max 2x2
layerReduce size 10 from layerPool2 pool
layerHidden size 10 from layerReduce
output size 3 from layerHidden
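
Reading off the sizes above (my interpretation of the proposed syntax): each convolution preserves the plane extent, each pool divides it by the window size, and the final bare pool collapses each plane to a single neuron:

 64x64 --convolve 7x7--> 10*64x64 --pool max 4x4--> 10*16x16
 --convolve 5x5--> 10*16x16 --pool max 2x2--> 10*8x8
 --pool whole plane--> 10 --> hidden 10 --> output 3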
@davidrmiller
Owner

davidrmiller commented May 14, 2015

Here are some design notes collected during the implementation of #14. These changes will soon be checked in:

  • The Layer class is now polymorphic, corresponding to the four layer types: regular, convolution filtering, convolution networking, and pooling. Much of the layer-specific code that used to be in class Net or class Neuron is now in the appropriate class derived from Layer.
  • In the Layer class, the container named neurons was changed from a vector of float to a vector of vector of float. The outer container now corresponds to the depth of the layer. Each element of the inner container is a 2D plane of neurons flattened into a 1D array. Depth is one additional dimension of topology within a layer. Convolution filter layers and regular layers have a depth of 1; convolution network and pooling layers have a depth > 1.
  • Convolution networking layers have been implemented and a few superficial tests added to unitTest.cpp. Convolution nets do slowly converge on a solution, but I'm not sure that I implemented the math correctly in the kernel weight updates during back propagation. I hope somebody will review that.

A convolution network layer can be defined in the topology config file by specifying a depth on the size parameter and specifying the convolution kernel size with a convolve parameter. For example, to train 40 kernels of size 7x7 on an input image of 64x64 pixels:

  input size 64x64
  layerConv size 40*64x64 from input convolve 7x7
  . . .
  • Pooling layers have been implemented for operators avg and max, and tests added to unitTest.cpp. A pooling layer is defined in the topology config file by specifying a pool parameter on a layer. Pooling layers can take their input from any other kind of layer of equal depth, or from a regular layer of depth 1.

In the topology config syntax, the pool parameter requires the argument "avg" or "max" followed by the operator size. For example, to pool a 40-kernel layer of 64x64 neurons into a layer of 16x16 neurons:

  input size 64x64
  layerConv size 40*64x64 from input convolve 7x7
  layerPool size 40*16x16 from layerConv pool max 4x4
  . . .
  • Previously, the constant weights associated with a convolution filter layer were copied into all the relevant Connection records. That has changed so that now all convolution filter layers and convolution network layers store their convolution matrix (weights) in a container in the Layer object.

There are some performance considerations in addition to the extra level of indirection of the Connection records. In convolution filtering and convolution networking layers, the weight and gradient members of the Connection records are unused, reducing cache efficiency when looping over the connections. We could define different kinds of connection records derived from a base class Connection, but making class Connection virtual immediately adds space overhead to every instance for the vtable, negating any advantages we hoped to achieve. So, I'm keeping class Connection a non-virtual POD object even though it is not as cache-efficient as possible for convolution layers.
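
To illustrate the tradeoff (hypothetical member layout, not the actual neural2d declaration): a plain POD record stays compact, while adding any virtual function forces a vtable pointer into every instance:

 #include <cstdint>

 // Hypothetical POD connection record: compact and cache-friendly.
 struct Connection {
     uint32_t fromIdx;  // index of the source neuron
     uint32_t toIdx;    // index of the destination neuron
     float weight;      // unused in convolution layers (kernel lives in the Layer)
     float gradient;    // likewise unused in convolution layers
 };                     // 16 bytes, no hidden overhead

 // The same members behind a virtual interface: the vtable pointer alone
 // typically grows each instance to 24 bytes on a 64-bit platform.
 struct VirtualConnection {
     virtual ~VirtualConnection() = default;
     uint32_t fromIdx, toIdx;
     float weight, gradient;
 };

 static_assert(sizeof(Connection) == 16, "POD layout stays compact");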

  • The BMP files had been using a different coordinate system (inverted Y) than the coordinates of the neurons in the input layer. The result was as if the input images were flipped top-to-bottom. I don't think that would have any effect on training a net, except in the case where a convolution filter layer had been defined with a kernel asymmetric in the Y axis. Now the upper left corner of the image is the origin (0, 0) for the image data as well as the input layer of neurons.
  • In unitTest.cpp, added a flag NNet::StopAtFirstError, which defaults to false.
  • Various other minor refactoring. No operational or functional changes to the console program or to the public interface of class Net.