# How to Manually Optimize Perceptron Models

In this tutorial, we will learn:
1. How to develop the forward inference pass for neural network models from scratch.
2. How to optimize the weights of a Perceptron model for binary classification.
3. How to optimize the weights of a Multilayer Perceptron model using stochastic hill climbing.

## Optimize a Perceptron Model

The Perceptron is a model of a single neuron that can be used for two-class classification problems. It provides the foundation for developing much larger networks later.

In [1]:
# define a binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# summarize the shape of the dataset
print(X.shape, y.shape)

(1000, 5) (1000,)


The Perceptron model has a single node that has one input weight for each column in the dataset. Each input is multiplied by its corresponding weight to give a weighted sum and a bias weight is then added, like an intercept coefficient in a regression model. This weighted sum is called the activation. Finally, the activation is interpreted and used to predict the class label, 1 for a positive activation and 0 for a negative activation.

In [2]:
# transfer function
def transfer(activation):
    if activation >= 0.0:
        return 1
    return 0

The above transfer() function takes the activation of the model and returns a class label, class=1 for a positive or zero activation and class=0 for a negative activation. This is called a step transfer function.

In [3]:
# activation function
def activate(row, weights):
    # add the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

The activate() function takes a row of data and the weights for the model, and returns the weighted sum of the inputs plus the bias weight.
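
As a quick check of the arithmetic, the sketch below runs activate() and transfer() on one hand-picked row. The row and weight values are hypothetical, chosen only so the sum is easy to verify by hand. The two functions are repeated so the block runs on its own.

```python
# activation function, as defined in the tutorial
def activate(row, weights):
    # start from the bias, stored as the last weight
    activation = weights[-1]
    # add each input multiplied by its weight
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# step transfer function, as defined in the tutorial
def transfer(activation):
    if activation >= 0.0:
        return 1
    return 0

# hypothetical example values: three inputs, three input weights plus one bias
row = [1.0, 2.0, -1.0]
weights = [0.5, -0.25, 1.0, 0.25]

# 0.25 + (0.5 * 1.0) + (-0.25 * 2.0) + (1.0 * -1.0) = -0.75
activation = activate(row, weights)
# a negative activation maps to class 0
label = transfer(activation)
print(activation, label)
```

The bias is read from weights[-1], so the weight vector is always one element longer than the row.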

In [4]:
# use model weights to predict 0 or 1 for a given row of data
def predict_row(row, weights):
    # activate for input
    activation = activate(row, weights)
    # transfer for activation
    return transfer(activation)

In [5]:
# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, weights):
    yhats = list()
    for row in X:
        yhat = predict_row(row, weights)
        yhats.append(yhat)
    return yhats
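
The row-by-row loop can also be written as one NumPy expression. The sketch below is an illustration under that assumption, not part of the original tutorial. The name predict_dataset_vectorized and the small demo arrays are hypothetical. A loop-based version is included so the two results can be compared.

```python
import numpy as np

# hypothetical vectorized helper: same result as the loop-based version
def predict_dataset_vectorized(X, weights):
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # weighted sum for every row, plus the bias stored as the last weight
    activations = X @ weights[:-1] + weights[-1]
    # step transfer: 1 where the activation is >= 0, otherwise 0
    return (activations >= 0.0).astype(int)

# loop-based reference that mirrors the tutorial functions
def predict_dataset_loop(X, weights):
    yhats = list()
    for row in X:
        activation = weights[-1]
        for i in range(len(row)):
            activation += weights[i] * row[i]
        yhats.append(1 if activation >= 0.0 else 0)
    return yhats

# hypothetical demo data: four rows, two features, two weights plus a bias of 0
X_demo = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, 0.0], [2.0, -3.0]])
w_demo = [0.5, -0.25, 0.0]
fast = predict_dataset_vectorized(X_demo, w_demo)
slow = predict_dataset_loop(X_demo, w_demo)
print(fast.tolist(), slow)
```

The vectorized form avoids the Python-level loops, which may matter because the objective function is evaluated once per search iteration.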

In [7]:
from numpy.random import rand
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of weights
n_weights = X.shape[1] + 1
# generate random weights
weights = rand(n_weights)

In [8]:
# generate predictions for dataset
yhat = predict_dataset(X, weights)

In [9]:
from sklearn.metrics import accuracy_score
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)

0.343


We can now optimize the weights of the model to achieve good accuracy on this dataset.

In [11]:
from sklearn.model_selection import train_test_split
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

The optimization algorithm requires an objective function to optimize. It must take a set of weights and return a score to be minimized or maximized, where a better score corresponds to a better model. Here the score is classification accuracy, so the search maximizes it.

In [12]:
# objective function
def objective(X, y, weights):
    # generate predictions for dataset
    yhat = predict_dataset(X, weights)
    # calculate accuracy
    score = accuracy_score(y, yhat)
    return score

### Stochastic Hill Climbing Algorithm
The algorithm will require an initial solution (e.g. random weights) and will iteratively keep making small changes to the solution and checking if it results in a better performing model. The amount of change made to the current solution is controlled by a step_size hyperparameter. This process will continue for a fixed number of iterations, also provided as a hyperparameter.
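
To see the mechanics in isolation, here is a minimal sketch of the same loop on a one-dimensional toy problem. The objective, the function names, the seed, and the starting point are all hypothetical and exist only for this illustration.

```python
from numpy.random import default_rng

# hypothetical 1-D objective with its maximum at x = 2
def toy_objective(x):
    return -(x - 2.0) ** 2

# hypothetical toy version of stochastic hill climbing
def toy_hillclimb(objective, start, n_iter, step_size, seed=1):
    rng = default_rng(seed)
    solution, solution_eval = start, objective(start)
    for _ in range(n_iter):
        # propose a small Gaussian perturbation of the current solution
        candidate = solution + rng.normal() * step_size
        candidate_eval = objective(candidate)
        # keep the candidate only if it is at least as good
        if candidate_eval >= solution_eval:
            solution, solution_eval = candidate, candidate_eval
    return solution, solution_eval

best_x, best_score = toy_hillclimb(toy_objective, start=-5.0, n_iter=2000, step_size=0.1)
print(best_x, best_score)
```

Because a candidate is accepted only when its score is at least as good, the score of the current solution never decreases over the run.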

In [15]:
from numpy.random import randn
# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = solution + randn(len(solution)) * step_size
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %.5f' % (i, solution_eval))
    return [solution, solution_eval]

We can then call this function, passing in a set of weights as the initial solution and the training dataset as the dataset to optimize the model against.

In [16]:
# define the total iterations
n_iter = 1000
# define the step size (scale of the Gaussian perturbation)
step_size = 0.05
# determine the number of weights
n_weights = X.shape[1] + 1
# define the initial solution
solution = rand(n_weights)
# perform the hill climbing search
weights, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('Done!')
print('f(%s) = %f' % (weights, score))

>1 0.58955
>7 0.59851
>8 0.63881
>10 0.68806
>11 0.70896
>12 0.72388
>13 0.72388
>16 0.77761
>20 0.79552
>22 0.80299
>30 0.80896
>31 0.81642
>32 0.82985
>34 0.83284
>39 0.84328
>41 0.84328
>46 0.84328
>47 0.84627
>70 0.84776
>75 0.84925
>76 0.85075
>77 0.85075
>78 0.85373
>94 0.85672
>145 0.85821
>166 0.85821
>215 0.85821
>265 0.85970
>394 0.85970
>414 0.86119
>629 0.86119
>708 0.86119
>961 0.86269
Done!
f([ 0.0198308   0.07148138  1.0748972   0.22823332  0.3180039  -0.09307344]) = 0.862687


Finally, we can evaluate the best model on the test dataset and report the performance.

In [17]:
# generate predictions for the test dataset
yhat = predict_dataset(X_test, weights)
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

Test Accuracy: 86.36364


## Optimize a Multilayer Perceptron
A Multilayer Perceptron (MLP) model is a neural network with one or more layers, where each layer has one or more nodes.

In [24]:
from math import exp
# transfer function
def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))
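
One practical caveat: math.exp() raises OverflowError for very large arguments, so the sigmoid as written can fail if an activation becomes strongly negative. The sketch below shows one common way to guard against that. The name stable_transfer is hypothetical and the variant is not part of the original tutorial.

```python
from math import exp

# hypothetical overflow-safe variant of the sigmoid transfer function
def stable_transfer(activation):
    if activation >= 0.0:
        # exp(-activation) is at most 1 here, so it cannot overflow
        return 1.0 / (1.0 + exp(-activation))
    # for negative inputs, rewrite the sigmoid so exp() gets a negative argument
    z = exp(activation)
    return z / (1.0 + z)

print(stable_transfer(0.0))
print(stable_transfer(-1000.0))
print(stable_transfer(1000.0))
```

With weights drawn from rand() and small step sizes the activations in this tutorial are likely to stay small, so the guard is optional here.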

In [25]:
# activation function for a network
def predict_row(row, network):
    inputs = row
    # enumerate the layers in the network from input to output
    for layer in network:
        new_inputs = list()
        # enumerate nodes in the layer
        for node in layer:
            # activate the node
            activation = activate(inputs, node)
            # transfer activation
            output = transfer(activation)
            # store output
            new_inputs.append(output)
        # output from this layer is input to the next layer
        inputs = new_inputs
    return inputs[0]

In [26]:
# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, network):
    yhats = list()
    for row in X:
        yhat = predict_row(row, network)
        yhats.append(yhat)
    return yhats

In [27]:
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of inputs
n_inputs = X.shape[1]
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
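
The network is a plain list of layers, where each layer is a list of nodes and each node is an array of weights with the bias last. The sketch below rebuilds the same structure and counts the weights the search will have to tune. The variable n_params is a name introduced only for this illustration.

```python
from numpy.random import rand

# same sizes as in the tutorial: 5 inputs, 10 hidden nodes, 1 output node
n_inputs, n_hidden = 5, 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]

# nodes per layer and weights per node (inputs plus one bias)
for i, layer in enumerate(network):
    print('layer %d: %d nodes, %d weights each' % (i, len(layer), len(layer[0])))

# total number of weights: 10 * 6 for the hidden layer plus 1 * 11 for the output
n_params = sum(len(node) for layer in network for node in layer)
print(n_params)
```

The search space therefore has 71 dimensions, compared with 6 for the single Perceptron.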

In [28]:
# generate predictions for dataset
yhat = predict_dataset(X, network)

In [29]:
# round the predictions to class labels (loop variable renamed to avoid shadowing y)
yhat = [round(p) for p in yhat]
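
Note that Python 3's built-in round() rounds halves to the nearest even integer, so an output of exactly 0.5 becomes class 0. If a different tie rule is wanted, an explicit threshold makes the choice visible. The helper name to_labels and the sample outputs below are hypothetical.

```python
# hypothetical helper: outputs at or above the threshold become class 1
def to_labels(probabilities, threshold=0.5):
    return [1 if p >= threshold else 0 for p in probabilities]

# hypothetical sigmoid outputs, including the tie value 0.5
outputs = [0.1, 0.49, 0.5, 0.51, 0.9]
rounded = [round(p) for p in outputs]
thresholded = to_labels(outputs)
print(rounded)      # [0, 0, 0, 1, 1]
print(thresholded)  # [0, 0, 1, 1, 1]
```

The two approaches differ only at exactly 0.5, which a sigmoid output rarely hits in practice.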

In [30]:
# calculate accuracy
score = accuracy_score(y, yhat)
print(score)

0.499


Next, we can apply the stochastic hill climbing algorithm to optimize the weights of the network.

For this, we will develop a new function that creates a copy of the network and mutates each weight in the network while making the copy.

In [31]:
# take a step in the search space
def step(network, step_size):
    new_net = list()
    # enumerate layers in the network
    for layer in network:
        new_layer = list()
        # enumerate nodes in this layer
        for node in layer:
            # mutate the node
            new_node = node.copy() + randn(len(node)) * step_size
            # store node in layer
            new_layer.append(new_node)
        # store layer in network
        new_net.append(new_layer)
    return new_net
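
A property worth checking is that step() returns a new network and leaves the current solution untouched, since the hill climber must be able to discard a worse candidate. The sketch below checks this on a tiny network. The network sizes, the seed, and the names before, unchanged, and moved are hypothetical. The step() function is repeated so the block runs on its own.

```python
from numpy.random import rand, randn, seed

# take a step in the search space, as defined in the tutorial
def step(network, step_size):
    new_net = list()
    for layer in network:
        new_layer = list()
        for node in layer:
            # copy the node's weights and add Gaussian noise
            new_node = node.copy() + randn(len(node)) * step_size
            new_layer.append(new_node)
        new_net.append(new_layer)
    return new_net

seed(1)
# hypothetical tiny network: two hidden nodes and one output node
network = [[rand(3) for _ in range(2)], [rand(3)]]
# snapshot of the weights before taking a step
before = [[node.copy() for node in layer] for layer in network]
candidate = step(network, 0.1)

# the original network still matches its snapshot
unchanged = all((a == b).all() for la, lb in zip(network, before) for a, b in zip(la, lb))
# the candidate differs from the original in at least one weight
moved = any((a != b).any() for la, lb in zip(network, candidate) for a, b in zip(la, lb))
print(unchanged, moved)
```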

In [32]:
# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %f' % (i, solution_eval))
    return [solution, solution_eval]

In [35]:
# stochastic hill climbing to optimize a multilayer perceptron for classification
from math import exp
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# transfer function
def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

# activation function
def activate(row, weights):
    # add the bias, the last weight
    activation = weights[-1]
    # add the weighted input
    for i in range(len(row)):
        activation += weights[i] * row[i]
    return activation

# activation function for a network
def predict_row(row, network):
    inputs = row
    # enumerate the layers in the network from input to output
    for layer in network:
        new_inputs = list()
        # enumerate nodes in the layer
        for node in layer:
            # activate the node
            activation = activate(inputs, node)
            # transfer activation
            output = transfer(activation)
            # store output
            new_inputs.append(output)
        # output from this layer is input to the next layer
        inputs = new_inputs
    return inputs[0]

# use model weights to generate predictions for a dataset of rows
def predict_dataset(X, network):
    yhats = list()
    for row in X:
        yhat = predict_row(row, network)
        yhats.append(yhat)
    return yhats

# objective function
def objective(X, y, network):
    # generate predictions for dataset
    yhat = predict_dataset(X, network)
    # round the predictions to class labels (loop variable renamed to avoid shadowing y)
    yhat = [round(p) for p in yhat]
    # calculate accuracy
    score = accuracy_score(y, yhat)
    return score

# take a step in the search space
def step(network, step_size):
    new_net = list()
    # enumerate layers in the network
    for layer in network:
        new_layer = list()
        # enumerate nodes in this layer
        for node in layer:
            # mutate the node
            new_node = node.copy() + randn(len(node)) * step_size
            # store node in layer
            new_layer.append(new_node)
        # store layer in network
        new_net.append(new_layer)
    return new_net

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
    # evaluate the initial point
    solution_eval = objective(X, y, solution)
    # run the hill climb
    for i in range(n_iter):
        # take a step
        candidate = step(solution, step_size)
        # evaluate candidate point
        candidate_eval = objective(X, y, candidate)
        # check if we should keep the new point
        if candidate_eval >= solution_eval:
            # store the new point
            solution, solution_eval = candidate, candidate_eval
            # report progress
            print('>%d %f' % (i, solution_eval))
    return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 1000
# define the step size (scale of the Gaussian perturbation)
step_size = 0.1
# determine the number of inputs
n_inputs = X.shape[1]
# one hidden layer and an output layer
n_hidden = 10
hidden1 = [rand(n_inputs + 1) for _ in range(n_hidden)]
output1 = [rand(n_hidden + 1)]
network = [hidden1, output1]
# perform the hill climbing search
network, score = hillclimbing(X_train, y_train, objective, network, n_iter, step_size)
print('Done!')
print('Best: %f' % (score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, network)
# round the predictions to class labels (loop variable renamed to avoid shadowing y)
yhat = [round(p) for p in yhat]
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

>0 0.486567
>1 0.486567
>2 0.486567
>3 0.486567
>4 0.486567
>5 0.486567
>6 0.486567
>7 0.486567
>8 0.486567
>9 0.486567
>10 0.486567
>11 0.486567
>12 0.486567
>13 0.486567
>14 0.486567
>15 0.486567
>16 0.486567
>17 0.486567
>18 0.486567
>19 0.486567
>20 0.486567
>21 0.486567
>22 0.486567
>23 0.486567
>24 0.488060
>25 0.489552
>27 0.491045
>28 0.495522
>30 0.498507
>34 0.498507
>35 0.500000
>41 0.501493
>42 0.516418
>43 0.519403
>44 0.544776
>50 0.546269
>54 0.547761
>55 0.559701
>59 0.573134
>60 0.600000
>61 0.610448
>62 0.614925
>63 0.629851
>67 0.647761
>68 0.652239
>70 0.667164
>71 0.667164
>74 0.689552
>75 0.705970
>77 0.723881
>78 0.732836
>81 0.737313
>82 0.746269
>83 0.761194
>85 0.795522
>141 0.798507
>142 0.798507
>144 0.808955
>152 0.808955
>158 0.814925
>161 0.831343
>185 0.831343
>190 0.834328
>196 0.838806
>202 0.841791
>209 0.847761
>212 0.850746
>228 0.852239
>248 0.852239
>272 0.856716
>334 0.856716
>378 0.856716
>389 0.856716
>410 0.856716
>427 0.856716
>457 0.856716
>