
Initial commit including sources and README.

davidstutz committed Mar 31, 2014
0 parents commit 141b44e03afa8fd18cf57359bd4b7cfd9c065b79
README.md
@@ -0,0 +1,93 @@
# Recognizing Handwritten Digits using a Two-layer Perceptron

In the course of a seminar on “Selected Topics in Human Language Technology and Pattern Recognition”, I wrote a seminar paper on neural networks: "Introduction to Neural Networks". The seminar paper and the slides of the corresponding talk can be found in my blog article: [Seminar Paper “Introduction to Neural Networks”](http://davidstutz.de/seminar-paper-introduction-neural-networks/). Background on neural networks and the two-layer perceptron can be found there.

## MNIST Dataset

The [MNIST dataset](http://yann.lecun.com/exdb/mnist/) provides a training set of 60,000 handwritten digits and a validation set of 10,000 handwritten digits. The images are 28 x 28 pixels in size. Therefore, when using a two-layer perceptron, we need 28 x 28 = 784 input units and 10 output units (one for each digit).
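
To illustrate the encoding: each image is flattened into a 784-dimensional column vector, and each label is turned into a 10-dimensional one-hot target vector. A minimal sketch with made-up data (the variable names are my own):

```matlab
image = rand(28, 28);                  % stand-in for one MNIST image
inputVector = reshape(image, 784, 1);  % 28 x 28 = 784 input units

label = 3;                             % a digit in 0..9
targetVector = zeros(10, 1);
targetVector(label + 1) = 1;           % one output unit per digit (MATLAB is 1-indexed)
```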

The methods `loadMNISTImages` and `loadMNISTLabels` are used to load the MNIST dataset, as it is stored in a special binary file format. The methods can be found online at [http://ufldl.stanford.edu/wiki/index.php/Using_the_MNIST_Dataset](http://ufldl.stanford.edu/wiki/index.php/Using_the_MNIST_Dataset).
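
A minimal loading sketch (the file names are the ones used in this repository; the files must be in MATLAB's working directory):

```matlab
% Images come back as a 784 x 60000 matrix with values in [0, 1],
% labels as a 60000 x 1 vector of digits 0..9.
inputValues = loadMNISTImages('train-images.idx3-ubyte');
labels = loadMNISTLabels('train-labels.idx1-ubyte');
```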

## Methods and Usage

The main method for training the two-layer perceptron is `trainStochasticSquaredErrorTwoLayerPerceptron`. The method applies stochastic training (or, to be precise, a stochastic variant of mini-batch training) using the sum-of-squares error function and the error backpropagation algorithm.

```matlab
function [hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate)
% trainStochasticSquaredErrorTwoLayerPerceptron Creates a two-layer perceptron
% and trains it on the MNIST dataset.
%
% INPUT:
% activationFunction : Activation function used in both layers.
% dActivationFunction : Derivative of the activation
% function used in both layers.
% numberOfHiddenUnits : Number of hidden units.
% inputValues : Input values for training (784 x 60000).
% targetValues : Target values for training (10 x 60000).
% epochs : Number of epochs to train.
% batchSize : Number of input values selected per epoch; the error
% is plotted after each batch.
% learningRate : Learning rate to apply.
%
% OUTPUT:
% hiddenWeights : Weights of the hidden layer.
% outputWeights : Weights of the output layer.
% error : Mean error over the last batch.
%
```
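
A typical call, as a sketch (the parameter values mirror those used in `applyStochasticSquaredErrorTwoLayerPerceptronMNIST` further below; `inputValues` and `targetValues` are assumed to be loaded already):

```matlab
% 700 hidden units, 500 epochs, batches of 100, learning rate 0.1.
[hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron( ...
    @logisticSigmoid, @dLogisticSigmoid, 700, inputValues, targetValues, 500, 100, 0.1);
```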

The above method requires the activation function used for both the hidden and the output layer to be passed as a parameter. I used the logistic sigmoid activation function:

```matlab
function y = logisticSigmoid(x)
% logisticSigmoid Logistic sigmoid activation function.
%
% INPUT:
% x : Input vector.
%
% OUTPUT:
% y : Output vector where the logistic sigmoid was applied element by
% element.
%
```
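
For reference, the logistic sigmoid is y = 1 / (1 + exp(-x)); it squashes any real input into the interval (0, 1) and is applied element-wise, as the implementation further below shows.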

In addition, the error backpropagation algorithm needs the derivative of the chosen activation function:

```matlab
function y = dLogisticSigmoid(x)
% dLogisticSigmoid Derivative of the logistic sigmoid.
%
% INPUT:
% x : Input vector.
%
% OUTPUT:
% y : Output vector where the derivative of the logistic sigmoid was
% applied element by element.
%
```
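
The derivative can be expressed in terms of the sigmoid itself: dy/dx = y * (1 - y) with y = logisticSigmoid(x), which is exactly what the implementation below computes. A quick finite-difference check (my own sketch, not part of the repository):

```matlab
% Compare the analytic derivative against a central difference.
x = linspace(-5, 5, 11);
h = 1e-6;
numeric = (logisticSigmoid(x + h) - logisticSigmoid(x - h)) / (2*h);
assert(max(abs(dLogisticSigmoid(x) - numeric)) < 1e-8, 'Derivative mismatch.');
```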

The method `applyStochasticSquaredErrorTwoLayerPerceptronMNIST` uses both the training method seen above and the method `validateTwoLayerPerceptron` to evaluate the performance of the two-layer perceptron:

```matlab
function [correctlyClassified, classificationErrors] = validateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputValues, labels)
% validateTwoLayerPerceptron Validate the two-layer perceptron using the
% validation set.
%
% INPUT:
% activationFunction : Activation function used in both layers.
% hiddenWeights : Weights of the hidden layer.
% outputWeights : Weights of the output layer.
% inputValues : Input values for validation (784 x 10000).
% labels : Labels for validation (1 x 10000).
%
% OUTPUT:
% correctlyClassified : Number of correctly classified values.
% classificationErrors : Number of classification errors.
%
```
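
Internally, the validation method assigns each image to the digit whose output unit has the largest activation; an equivalent one-liner for this decision rule would be `[~, class] = max(outputVector);`.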

## License

Copyright 2013 - 2014 David Stutz

The application is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

The application is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

See <http://www.gnu.org/licenses/>.
applyStochasticSquaredErrorTwoLayerPerceptronMNIST.m
@@ -0,0 +1,45 @@
function [] = applyStochasticSquaredErrorTwoLayerPerceptronMNIST()
%applyStochasticSquaredErrorTwoLayerPerceptronMNIST Train the two-layer
%perceptron using the MNIST dataset and evaluate its performance.

% Load MNIST.
inputValues = loadMNISTImages('train-images.idx3-ubyte');
labels = loadMNISTLabels('train-labels.idx1-ubyte');

% Transform the labels into one-hot target values.
targetValues = zeros(10, size(labels, 1));
for n = 1: size(labels, 1)
targetValues(labels(n) + 1, n) = 1;
end;

% Choose form of MLP:
numberOfHiddenUnits = 700;

% Choose appropriate parameters.
learningRate = 0.1;

% Choose activation function.
activationFunction = @logisticSigmoid;
dActivationFunction = @dLogisticSigmoid;

% Choose batch size and epochs. Remember there are 60k input values.
batchSize = 100;
epochs = 500;

fprintf('Train two-layer perceptron with %d hidden units.\n', numberOfHiddenUnits);
fprintf('Learning rate: %f.\n', learningRate);

[hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate);

% Load validation set.
inputValues = loadMNISTImages('t10k-images.idx3-ubyte');
labels = loadMNISTLabels('t10k-labels.idx1-ubyte');

% Validate the trained perceptron on the validation set.
fprintf('Validation:\n');

[correctlyClassified, classificationErrors] = validateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputValues, labels);

fprintf('Classification errors: %d\n', classificationErrors);
fprintf('Correctly classified: %d\n', correctlyClassified);
end
dLogisticSigmoid.m
@@ -0,0 +1,12 @@
function y = dLogisticSigmoid(x)
% dLogisticSigmoid Derivative of the logistic sigmoid.
%
% INPUT:
% x : Input vector.
%
% OUTPUT:
% y : Output vector where the derivative of the logistic sigmoid was
% applied element by element.
%
y = logisticSigmoid(x).*(1 - logisticSigmoid(x));
end
loadMNISTImages.m
@@ -0,0 +1,26 @@
function images = loadMNISTImages(filename)
%loadMNISTImages returns a [number of pixels] x [number of MNIST images]
%matrix containing the raw MNIST images

fp = fopen(filename, 'rb');
assert(fp ~= -1, ['Could not open ', filename, '']);

magic = fread(fp, 1, 'int32', 0, 'ieee-be');
assert(magic == 2051, ['Bad magic number in ', filename, '']);

numImages = fread(fp, 1, 'int32', 0, 'ieee-be');
numRows = fread(fp, 1, 'int32', 0, 'ieee-be');
numCols = fread(fp, 1, 'int32', 0, 'ieee-be');

images = fread(fp, inf, 'unsigned char');
images = reshape(images, numCols, numRows, numImages);
images = permute(images,[2 1 3]);

fclose(fp);

% Reshape to #pixels x #examples
images = reshape(images, size(images, 1) * size(images, 2), size(images, 3));
% Convert to double and rescale to [0,1]
images = double(images) / 255;

end
loadMNISTLabels.m
@@ -0,0 +1,19 @@
function labels = loadMNISTLabels(filename)
%loadMNISTLabels returns a [number of MNIST images]x1 matrix containing
%the labels for the MNIST images

fp = fopen(filename, 'rb');
assert(fp ~= -1, ['Could not open ', filename, '']);

magic = fread(fp, 1, 'int32', 0, 'ieee-be');
assert(magic == 2049, ['Bad magic number in ', filename, '']);

numLabels = fread(fp, 1, 'int32', 0, 'ieee-be');

labels = fread(fp, inf, 'unsigned char');

assert(size(labels,1) == numLabels, 'Mismatch in label count');

fclose(fp);

end
logisticSigmoid.m
@@ -0,0 +1,13 @@
function y = logisticSigmoid(x)
% logisticSigmoid Logistic sigmoid activation function.
%
% INPUT:
% x : Input vector.
%
% OUTPUT:
% y : Output vector where the logistic sigmoid was applied element by
% element.
%

y = 1./(1 + exp(-x));
end
saveMNISTImages.m
@@ -0,0 +1,9 @@
function [] = saveMNISTImages(images, n, k)
% saveMNISTImages Saves every k-th image of the MNIST training data set,
% up to n images in total.

for i = 1: n
imwrite(reshape(images(:,i*k), 28, 28), strcat('MNIST/', num2str(i*k), '.png'));
end;
end

BIN +7.48 MB t10k-images.idx3-ubyte
Binary file not shown.
BIN +9.77 KB t10k-labels.idx1-ubyte
Binary file not shown.
Binary file not shown.
Binary file not shown.
trainStochasticSquaredErrorTwoLayerPerceptron.m
@@ -0,0 +1,74 @@
function [hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate)
% trainStochasticSquaredErrorTwoLayerPerceptron Creates a two-layer perceptron
% and trains it on the MNIST dataset.
%
% INPUT:
% activationFunction : Activation function used in both layers.
% dActivationFunction : Derivative of the activation
% function used in both layers.
% numberOfHiddenUnits : Number of hidden units.
% inputValues : Input values for training (784 x 60000).
% targetValues : Target values for training (10 x 60000).
% epochs : Number of epochs to train.
% batchSize : Number of input values selected per epoch; the error is
% plotted after each batch.
% learningRate : Learning rate to apply.
%
% OUTPUT:
% hiddenWeights : Weights of the hidden layer.
% outputWeights : Weights of the output layer.
% error : Mean error over the last batch.
%

% The number of training vectors.
trainingSetSize = size(inputValues, 2);

% Input vector has 784 dimensions.
inputDimensions = size(inputValues, 1);
% We have to distinguish 10 digits.
outputDimensions = size(targetValues, 1);

% Initialize the weights for the hidden layer and the output layer.
hiddenWeights = rand(numberOfHiddenUnits, inputDimensions);
outputWeights = rand(outputDimensions, numberOfHiddenUnits);

hiddenWeights = hiddenWeights./size(hiddenWeights, 2);
outputWeights = outputWeights./size(outputWeights, 2);

% Indices of the input vectors selected for the current batch.
n = zeros(batchSize, 1);

figure; hold on;

for t = 1: epochs
for k = 1: batchSize
% Select which input vector to train on.
n(k) = floor(rand(1)*trainingSetSize + 1);

% Propagate the input vector through the network.
inputVector = inputValues(:, n(k));
hiddenActualInput = hiddenWeights*inputVector;
hiddenOutputVector = activationFunction(hiddenActualInput);
outputActualInput = outputWeights*hiddenOutputVector;
outputVector = activationFunction(outputActualInput);

targetVector = targetValues(:, n(k));

% Backpropagate the errors.
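% outputDelta: derivative of the squared error w.r.t. the output layer's
% net input; hiddenDelta: the same error signal propagated back through
% the output weights (chain rule).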
outputDelta = dActivationFunction(outputActualInput).*(outputVector - targetVector);
hiddenDelta = dActivationFunction(hiddenActualInput).*(outputWeights'*outputDelta);

outputWeights = outputWeights - learningRate.*outputDelta*hiddenOutputVector';
hiddenWeights = hiddenWeights - learningRate.*hiddenDelta*inputVector';
end;

% Calculate the error for plotting.
error = 0;
for k = 1: batchSize
inputVector = inputValues(:, n(k));
targetVector = targetValues(:, n(k));

error = error + norm(activationFunction(outputWeights*activationFunction(hiddenWeights*inputVector)) - targetVector, 2);
end;
error = error/batchSize;

plot(t, error,'*');
end;
end
validateTwoLayerPerceptron.m
@@ -0,0 +1,69 @@
function [correctlyClassified, classificationErrors] = validateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputValues, labels)
% validateTwoLayerPerceptron Validate the two-layer perceptron using the
% validation set.
%
% INPUT:
% activationFunction : Activation function used in both layers.
% hiddenWeights : Weights of the hidden layer.
% outputWeights : Weights of the output layer.
% inputValues : Input values for validation (784 x 10000).
% labels : Labels for validation (1 x 10000).
%
% OUTPUT:
% correctlyClassified : Number of correctly classified values.
% classificationErrors : Number of classification errors.
%

testSetSize = size(inputValues, 2);
classificationErrors = 0;
correctlyClassified = 0;

for n = 1: testSetSize
inputVector = inputValues(:, n);
outputVector = evaluateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputVector);

class = decisionRule(outputVector);
if class == labels(n) + 1
correctlyClassified = correctlyClassified + 1;
else
classificationErrors = classificationErrors + 1;
end;
end;
end

function class = decisionRule(outputVector)
% decisionRule Model based decision rule.
%
% INPUT:
% outputVector : Output vector of the network.
%
% OUTPUT:
% class : Class the vector is assigned to.
%

maximum = 0;
class = 1;
for i = 1: size(outputVector, 1)
if outputVector(i) > maximum
maximum = outputVector(i);
class = i;
end;
end;
end

function outputVector = evaluateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputVector)
% evaluateTwoLayerPerceptron Evaluate two-layer perceptron given by the
% weights using the given activation function.
%
% INPUT:
% activationFunction : Activation function used in both layers.
% hiddenWeights : Weights of hidden layer.
% outputWeights : Weights for output layer.
% inputVector : Input vector to evaluate.
%
% OUTPUT:
% outputVector : Output of the perceptron.
%

outputVector = activationFunction(outputWeights*activationFunction(hiddenWeights*inputVector));
end

3 comments on commit 141b44e

@masoudtala20

replied Nov 22, 2014

Hi my dear David, I want to use these functions in a main file to execute my project. Please write this .m file in MATLAB as a main function that uses these functions, in order to run handwritten digit recognition on the 60,000 samples in the MNIST file. Thanks.

@masoudtala20

replied Nov 22, 2014

Please send the answer to my email: masoud.talabigi@gmail.com or ms_tala@aut.ac.ir

@davidstutz

Owner Author

replied Nov 24, 2014

Excuse me, I do not fully understand what you are trying to achieve. Some more details would be helpful. For your convenience, you can of course add all the functions to a single .m file and use this file for your project. However, in my opinion this would not make a big difference.
