# Manually Optimize Neural Network Models

From Machine Learning Mastery.  

https://machinelearningmastery.com/manually-optimize-neural-networks/

The code below optimizes a single Perceptron and a multi-layer Perceptron using the Hill Climb algorithm.  

Coding up your own Perceptron and Neural Network is impractical for real work, but it is still a really good exercise.  We then apply the Hill Climb algorithm to train the network, first on a single Perceptron and then on a Multi Layer Perceptron (i.e. a Neural Network).

I don't see any advantage of Hill Climb over training with back propagation, but Hill Climb is an interesting and fairly simple way to find a local maximum.


In [1]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from math import exp
import random
import numpy as np


## Make Data

In [2]:
DATA_SIZE = 1000
FEATURES = 5

In [3]:
# Make data with the given number of features: 1 redundant and 2 useless, so the number of features must be > 3
def make_data(data_size, features):
    assert(features > 3)
    X, y = make_classification(n_samples=data_size, 
                           n_features=features, 
                           n_informative=features-3, 
                           n_redundant=1)
    return(np.array(X), np.array(y))



## Initialize weight and bias

In [4]:
W = np.random.randn(FEATURES)  # one weight per feature, not per sample
b = np.random.rand(1)

## Perceptron Model
The Perceptron model takes a row of data and:
* Activate: multiplies the inputs by the weights and adds the bias, A = dot(X[i,:], W) + b
* Transfer: classifies the output, T = 1 if A >= 0.0, otherwise 0

The code below uses NumPy arrays, so it handles all rows of X at once.

In [5]:
def activate(X, W, b):
    activation = np.dot(X, W)
    activation = np.add(activation, b)
    return(activation)

def transfer(activation):
    yhat = np.greater_equal(activation, 0.0)
    yhat = yhat.astype(int)
    return yhat

def predict(X, W, b):
    yhat = transfer(activate(X, W, b))
    return yhat

def objective(X, y, W, b):
    yhat = predict(X, W, b)
    score = accuracy_score(y, yhat)
    return score
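
To make the activate and transfer steps concrete, here is a small hand-checkable sketch. The input rows `X_demo`, the weights `W_demo`, and the bias `b_demo` are made-up values chosen for easy arithmetic, not values from the data above, and the two functions are restated so the snippet runs on its own.

```python
import numpy as np

def activate(X, W, b):
    # Weighted sum of each row plus the bias
    return np.dot(X, W) + b

def transfer(activation):
    # Step function: 1 when the activation is >= 0, otherwise 0
    return np.greater_equal(activation, 0.0).astype(int)

# Hypothetical values, chosen so the arithmetic is easy to follow
X_demo = np.array([[1.0, 2.0],
                   [-1.0, -2.0],
                   [0.5, 1.0]])
W_demo = np.array([1.0, 0.5])
b_demo = -1.0

A_demo = activate(X_demo, W_demo, b_demo)   # activations: 1.0, -3.0, 0.0
yhat_demo = transfer(A_demo)                # labels: 1, 0, 1
print(A_demo, yhat_demo)
```

The third row lands exactly on the decision boundary (activation 0.0) and is classified as 1 because the comparison is `>=`.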

### Test Perceptron model

Make predictions LOOP times with random weights, creating new data each time, and record the accuracy score of each run.  
The mean accuracy should be about 50%.

In [6]:
LOOP = 100
scores = np.zeros(LOOP)
for i in range(LOOP):  # start at 0 so that scores[0] is filled in as well
    W = np.random.randn(FEATURES)
    X, y = make_data(DATA_SIZE, FEATURES)    
    scores[i] = objective(X, y, W, b)

score = np.mean(scores)
print(score)

0.47986999999999996


# Stochastic Hill Climb

Stochastic Hill Climb tries to find a maximum by repeatedly taking a small step in a random direction and keeping that step only if it scores higher than the current position.
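
Before applying this to a perceptron, the accept-only-if-better loop can be sketched on a toy problem. Everything here is an assumption made for illustration: `toy_objective` is a hypothetical one-dimensional function with a single peak at x = 2.0, and `toy_hill_climb` is a minimal version of the idea, not the function used in the rest of this notebook.

```python
import random

def toy_objective(x):
    # Hypothetical 1-D objective with a single peak at x = 2.0
    return -(x - 2.0) ** 2

def toy_hill_climb(objective, x, n_iter, step_size, rng):
    score = objective(x)
    history = [score]
    for _ in range(n_iter):
        # Propose a random move in either direction
        x_next = x + rng.gauss(0.0, 1.0) * step_size
        score_next = objective(x_next)
        # Keep the move only if it is strictly better
        if score_next > score:
            x, score = x_next, score_next
        history.append(score)
    return x, score, history

rng = random.Random(0)  # fixed seed so repeated runs match
x_best, score_best, history = toy_hill_climb(toy_objective, -5.0, 2000, 0.1, rng)
print(x_best, score_best)
```

Because a move is kept only when it improves the score, `history` never decreases. On a function with several peaks the same loop would stop at whichever peak it reaches first.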

Note: I'm unclear about changing b together with W, i.e. changing the slope and the offset at the same time.  Should b be changed separately from W to see whether that improves the score?  


In [7]:
def hill_climb(X, y, objective, W, b, n_iter, step_size):
    # print('X shape %s, Y shape %s, W shape %s' % (X.shape, y.shape, W.shape))
    score = objective(X, y, W, b)
    
    for i in range(n_iter):
        W_next = W + (np.random.randn(W.shape[0]) * step_size)
        # randn (not rand) so that the bias can move down as well as up
        b_next = b + (np.random.randn() * step_size)
        score_next = objective(X, y, W_next, b_next)
        
        if score_next > score:
            W, b, score = W_next, b_next, score_next
            #print('>%d %.5f' % (i, score))
    return [W, b, score]
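
One way to explore the question raised in the note above is to propose changes to `W` and to `b` in alternating steps, so that each accepted move can be attributed to one of them. The sketch below is a hypothetical variant, not the method from the source article: `hill_climb_separate`, `accuracy`, and the synthetic data are names and values made up for illustration, and accuracy is computed with plain NumPy so the snippet is self-contained.

```python
import numpy as np

def predict(X, W, b):
    # Step-function perceptron, as in the cells above
    return (np.dot(X, W) + b >= 0.0).astype(int)

def accuracy(X, y, W, b):
    # Fraction of rows classified correctly
    return float(np.mean(predict(X, W, b) == y))

def hill_climb_separate(X, y, W, b, n_iter, step_size, rng):
    score = accuracy(X, y, W, b)
    for i in range(n_iter):
        if i % 2 == 0:
            # Even iterations: perturb only the weights
            W_next, b_next = W + rng.standard_normal(W.shape) * step_size, b
        else:
            # Odd iterations: perturb only the bias
            W_next, b_next = W, b + rng.standard_normal(b.shape) * step_size
        score_next = accuracy(X, y, W_next, b_next)
        if score_next > score:
            W, b, score = W_next, b_next, score_next
    return W, b, score

# Hypothetical synthetic data: labels come from a known linear rule
rng = np.random.default_rng(0)
X_syn = rng.standard_normal((200, 3))
y_syn = (X_syn[:, 0] - X_syn[:, 1] + 0.5 >= 0.0).astype(int)

W0 = rng.standard_normal(3)
b0 = rng.random(1)
start_score = accuracy(X_syn, y_syn, W0, b0)
W_sep, b_sep, end_score = hill_climb_separate(X_syn, y_syn, W0, b0, 500, 0.05, rng)
print(start_score, end_score)
```

Comparing `end_score` from this variant against the joint update on the same data and seed would be one way to see whether separating the two makes any practical difference.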

## Test Hill Climb to train single perceptron

In [8]:
NITER = 1000
STEP_SZ = 0.05
TRAIN_TEST_SPLIT = 0.33

X, y = make_data(DATA_SIZE, FEATURES)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TRAIN_TEST_SPLIT)

W = np.random.randn(X_train.shape[1])
b = np.random.rand(1)


In [9]:
W_trained, b_trained, score_trained = hill_climb(X_train, y_train, objective, W, b, NITER, STEP_SZ)
print('W_trained: %s \nb_trained: %s \n\nscore: %f' % (W_trained, b_trained, score_trained))


W_trained: [ 1.55605377  0.15436247 -0.88840323 -0.12033405  0.21669997] 
b_trained: [1.52066144] 

score: 0.631343


In [10]:
yhat = predict(X_test, W_trained, b_trained)
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %.5f' % (score * 100))

Test Accuracy: 62.72727


# Multi Layer Perceptron 


An MLP is a set of stacked (layered) perceptrons, i.e. a Neural Network. 

When passing the output of one layer to the next, we want to pass a probability between 0 and 1 rather than a binary value (1 or 0).  For this, we use the sigmoid as the transfer function.
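
As a quick illustration of the difference, the sketch below compares the hard step transfer with the sigmoid on a few hand-picked activation values. The names `hard_step` and `sigmoid` and the sample activations are chosen here for illustration only.

```python
from math import exp

def hard_step(activation):
    # Hard threshold used by the single perceptron
    return 1 if activation >= 0.0 else 0

def sigmoid(activation):
    # Squashes any real activation into the open interval (0, 1)
    return 1.0 / (1.0 + exp(-activation))

for a in (-4.0, -0.5, 0.0, 0.5, 4.0):
    print('%5.1f  step=%d  sigmoid=%.4f' % (a, hard_step(a), sigmoid(a)))
```

The step function throws away how far an activation is from the boundary, while the sigmoid keeps that information: an activation of 0.0 maps to exactly 0.5, and values further from zero move smoothly toward 0 or 1.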

In addition, there are multiple layers and each layer is made up of multiple nodes, so the forward pass is evaluated in a nested loop.
Each perceptron is represented by its weights and offset (W, b), so the forward pass through the network is just a matter of:

* For each layer in the network:
  * Iterate over each node in the layer, calculating transfer(input * W + b), and pass the collected results as input to the next layer

I vectorized only the per-node calculation, to make it easier to follow the shapes of the weights and outputs, but the entire layer can actually be calculated in one shot.
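
The "one shot" layer calculation could look something like the sketch below, which stacks the per-node weights into a matrix and checks the result against a node-by-node loop. The helper names `layer_forward_per_node` and `layer_forward_vectorized` and the random demo data are assumptions for illustration; the layer format (a list of `(W, b)` tuples) follows this notebook.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def layer_forward_per_node(inputs, layer):
    # Node-by-node: one output column per node
    columns = [sigmoid(np.dot(inputs, W_node) + b_node) for W_node, b_node in layer]
    return np.column_stack(columns)

def layer_forward_vectorized(inputs, layer):
    # Stack node weights into an (n_nodes, n_inputs) matrix and biases into (n_nodes,)
    W_layer = np.vstack([W_node for W_node, _ in layer])
    b_layer = np.concatenate([b_node for _, b_node in layer])
    # One matrix product computes every node in the layer at once
    return sigmoid(inputs @ W_layer.T + b_layer)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((4, 5))                     # 4 rows, 5 features
layer_demo = [(rng.standard_normal(5), rng.random(1))    # 3 nodes
              for _ in range(3)]

out_loop = layer_forward_per_node(X_demo, layer_demo)
out_fast = layer_forward_vectorized(X_demo, layer_demo)
print(out_loop.shape, np.allclose(out_loop, out_fast))
```

Both versions return an array with one row per input row and one column per node, so either can feed the next layer directly.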

In [11]:
def mlp_transfer(activation):
    return 1.0/(1.0+exp(-activation))

def mlp_predict(X, network):
    transfer_vectorized = np.vectorize(mlp_transfer)
    inputs = X
    for layer in network:
        to_next_layer = None
        for W_node, b_node in layer:
            output = transfer_vectorized(activate(inputs, W_node, b_node))
            to_next_layer = output if to_next_layer is None else np.vstack((to_next_layer, output))
        inputs = np.transpose(to_next_layer)
    return inputs

def mlp_objective(X, y, network):
    yhat = mlp_predict(X, network)
    # Round the sigmoid outputs to 0/1 labels and flatten (n, 1) to (n,)
    yhat = np.round(yhat).astype(int).ravel()
    score = accuracy_score(y, yhat)
    return score

## Test multi layer perceptron (network)

In [12]:
def test_mlp():
    X, y = make_classification(n_samples=1000, n_features=FEATURES, n_informative=2, n_redundant=1, random_state=1)

    # construct network
    n_hidden = 10
    hidden1 = [ (np.random.randn(FEATURES), np.random.rand(1)) for _ in range(n_hidden)]
    hidden2 = [ (np.random.randn(n_hidden), np.random.rand(1)) for _ in range(n_hidden)]
    output1 = [ (np.random.randn(n_hidden), np.random.rand(1)) ]
    network = [hidden1, hidden2, output1]

    score = mlp_objective(X, y, network)
    print('Test MLP score - should be near 0.5:  %.5f' % (score))
    
test_mlp()

Test MLP score - should be near 0.5:  0.49900


# Apply Stochastic Hill Climb to Multi Layer Perceptron

To take a step, we randomly change network weights and look at the results.

Instead of changing all the weights in the network, we randomly pick one layer and change the weights and bias of every node in that layer.

In [13]:
def step(network, step_size):
    layer_to_change = random.choice(network)
    new_nw = list()
    for layer in network:
        new_layer = list()
        for node in layer:
            if layer is layer_to_change:
                new_W = node[0] + (np.random.randn(node[0].shape[0]) * step_size)
                new_b = node[1] + (np.random.randn(node[1].shape[0]) * step_size)
                new_node = (new_W, new_b)
            else:
                new_node = node
            new_layer.append(new_node)
        new_nw.append(new_layer)
    return new_nw
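
A quick sanity check of this behaviour is sketched below: it restates `step` in a compact form so the snippet runs on its own, applies it to a hypothetical two-layer network, and confirms that exactly one layer differs in the candidate while the original network is left untouched. The names `network_demo`, `snapshot`, and `layer_changed` are made up for this check.

```python
import random
import numpy as np

def step(network, step_size):
    # Same logic as the cell above, restated compactly
    layer_to_change = random.choice(network)
    new_nw = []
    for layer in network:
        if layer is layer_to_change:
            new_layer = [(W + np.random.randn(W.shape[0]) * step_size,
                          b + np.random.randn(b.shape[0]) * step_size)
                         for W, b in layer]
        else:
            new_layer = list(layer)
        new_nw.append(new_layer)
    return new_nw

def layer_changed(old_layer, new_layer):
    # True if any node's weights or bias differ between the two layers
    return any(not np.array_equal(W0, W1) or not np.array_equal(b0, b1)
               for (W0, b0), (W1, b1) in zip(old_layer, new_layer))

random.seed(0)
np.random.seed(0)
# Hypothetical tiny network: 3 inputs -> 2 hidden nodes -> 1 output node
hidden = [(np.random.randn(3), np.random.rand(1)) for _ in range(2)]
output = [(np.random.randn(2), np.random.rand(1))]
network_demo = [hidden, output]
snapshot = [[(W.copy(), b.copy()) for W, b in layer] for layer in network_demo]

candidate_demo = step(network_demo, 0.1)

changed = [layer_changed(o, n) for o, n in zip(network_demo, candidate_demo)]
untouched = not any(layer_changed(s, o) for s, o in zip(snapshot, network_demo))
print(changed, untouched)
```

Leaving the original network unmodified matters for hill climbing, because a rejected candidate must not alter the current solution.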
            


The hill climb logic simply takes a step by creating a randomly changed network and checks whether the score improves.  

If it does, we keep the new network and repeat from there.


In [14]:
def mlp_hill_climb(X, y, objective, network, n_iter, step_size):
    # Score the starting network before taking any steps
    score = objective(X, y, network)
    
    for i in range(n_iter):
        candidate = step(network, step_size)
        score_next = objective(X, y, candidate)
        
        if score_next > score:
            network, score = candidate, score_next
            print('>%d %f' % (i, score))
    return [network, score]

## Train MLP using Hill Climb

In [15]:
X, y = make_data(DATA_SIZE, FEATURES)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

# construct network
n_hidden = 20
hidden1 = [ (np.random.randn(FEATURES), np.random.rand(1)) for _ in range(n_hidden)]
hidden2 = [ (np.random.randn(n_hidden), np.random.rand(1)) for _ in range(n_hidden)]
output1 = [ (np.random.randn(n_hidden), np.random.rand(1)) ]
network = [hidden1, hidden2, output1]

n_iter = 3000
step_size = 0.2
network, score = mlp_hill_climb(X_train, y_train, mlp_objective, network, n_iter, step_size)
print('Training score: %f' % (score))

>0 0.502985
>1 0.532836
>3 0.619403
>5 0.632836
>7 0.641791
>9 0.655224
>11 0.691045
>13 0.708955
>18 0.735821
>22 0.737313
>30 0.765672
>32 0.800000
>37 0.801493
>46 0.810448
>47 0.828358
>53 0.847761
>56 0.852239
>62 0.855224
>69 0.859701
>70 0.865672
>87 0.870149
>113 0.873134
>117 0.877612
>124 0.880597
>129 0.882090
>131 0.888060
>132 0.897015
>221 0.904478
>222 0.910448
>251 0.916418
>339 0.917910
>352 0.922388
>375 0.928358
>417 0.932836
>439 0.940299
>512 0.941791
>535 0.947761
>1543 0.949254
>1575 0.950746
>1735 0.952239
>1765 0.953731
>1770 0.959701
>2114 0.962687
>2124 0.964179
>2268 0.965672
>2311 0.971642
>2471 0.973134
>2481 0.976119
>2570 0.977612
>2606 0.979104
>2621 0.982090
Training score: 0.982090


In [16]:
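test_score = mlp_objective(X_test, y_test, network)
# Print test_score; printing `score` would repeat the training score
print('Test score: %.5f ' % (test_score))

Test score: 0.98209 


Note: the 0.98209 recorded above is identical to the final training score because the cell printed `score` rather than `test_score` when it was run; the corrected cell reports the accuracy on the held-out test data.

We see that Hill Climb has trained the network, reaching about 98% accuracy on the training data, and it should be able to return >85% accuracy on the test data.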