# Neural Network Foundations

In [1]:
import tensorflow as tf

In [2]:
tf.logging.set_verbosity(tf.logging.ERROR)

In [3]:
from keras.models import Sequential

Using TensorFlow backend.


In [4]:
from keras.layers import Dense

In [5]:
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='random_uniform'))

The network is dense (fully connected): each neuron in a layer is
connected to every neuron in the previous layer and to every neuron
in the following layer.
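
To make the dense connectivity concrete, here is a minimal NumPy sketch of a layer with the same shape as the one above (8 inputs, 12 neurons). It is not how Keras implements `Dense` internally; the names `W`, `b`, and `x` and the initialization range are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_units = 8, 12            # same shape as Dense(12, input_dim=8)

# One weight per (input, neuron) pair: this is what "dense" means.
# The range (-0.05, 0.05) is an assumed stand-in for 'random_uniform'.
W = rng.uniform(-0.05, 0.05, size=(n_inputs, n_units))
b = np.zeros(n_units)                # one bias per neuron

x = rng.random((1, n_inputs))        # a single example with 8 features
y = x @ W + b                        # linear output, no activation applied

n_params = W.size + b.size           # 8 * 12 weights + 12 biases
print(y.shape, n_params)
```

Every input feeds every neuron, so the layer has 8 × 12 weights plus 12 biases.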

### Problems in Training the Perceptron and a Solution

We cannot simply use accuracy as the metric to optimize, because it is not continuous. We need a function that changes progressively from 0 to 1 with no discontinuity.

Mathematically, this means that we need
a continuous function whose derivative we can compute, which gradient-based training requires.
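
The point about continuity can be illustrated with a toy one-weight classifier. The sketch below is an assumption-laden illustration, not part of the original notebook: the data `x`, `t` and the helpers `accuracy` and `log_loss` are hypothetical. A small change in the weight leaves accuracy unchanged, so it gives no signal, while a smooth loss responds to it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data: the label is 1 when x > 0.
x = np.array([-2.0, -1.0, 1.0, 2.0])
t = np.array([0.0, 0.0, 1.0, 1.0])

def accuracy(w):
    # Thresholding makes this a step function of w.
    pred = (sigmoid(w * x) > 0.5).astype(float)
    return float(np.mean(pred == t))

def log_loss(w):
    # Binary cross-entropy: smooth and differentiable in w.
    p = sigmoid(w * x)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

for w in (0.5, 0.6):
    print(w, accuracy(w), log_loss(w))
```

Both weights classify every point correctly, so accuracy cannot tell them apart, whereas the loss is lower for the larger weight.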

Common activation functions:
- sigmoid
- rectified linear unit (ReLU)
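
As a rough sketch, the two activation functions listed above and their derivatives can be written in NumPy as follows. The function names are illustrative, and the value of the ReLU derivative at exactly zero is a convention chosen here, not something the notebook specifies.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    # Passes positive inputs through and clamps negatives to zero.
    return np.maximum(0.0, z)

def relu_grad(z):
    # 0 for z < 0 and 1 for z > 0; at z == 0 we pick 0 by convention.
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(relu(z))
```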

### A real example — recognizing handwritten digits

In [6]:
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
from keras.utils import np_utils

In [7]:
np.random.seed(1671)

In [8]:
nb_epoch = 20
batch_size = 10000
verbose = 1
nb_classes = 10
optimizer = SGD()
validation_split = 0.2

In [9]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [10]:
X_train = X_train.reshape(60000, 784).astype('float32')
X_test = X_test.reshape(10000, 784).astype('float32')
X_train = X_train / 255
X_test = X_test / 255
print(X_train.shape[0], 'training samples')
print(X_test.shape[0], 'testing samples')

60000 training samples
10000 testing samples
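
The preprocessing in the cell above flattens each 28×28 image into a 784-element vector and rescales pixel values from [0, 255] to [0, 1]. The sketch below reproduces those two steps on a small random array standing in for MNIST, so it runs without downloading the dataset; the array `images` is synthetic, not real data.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 grayscale images, 28x28, values 0..255.
images = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-element row and convert to float32.
flat = images.reshape(images.shape[0], 28 * 28).astype('float32')

# Rescale pixel intensities to the range [0, 1].
flat /= 255.0

print(flat.shape, flat.dtype)
```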


In [11]:
y_train = np_utils.to_categorical(y_train, num_classes=10)
y_test = np_utils.to_categorical(y_test, num_classes=10)
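
For readers unfamiliar with one-hot encoding, the following NumPy sketch shows roughly what the `to_categorical` call above produces. The helper `one_hot` is a hypothetical re-implementation written for illustration, not the Keras function itself.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Each label becomes a row of zeros with a single 1 at the label's index.
    labels = np.asarray(labels, dtype=int)
    out = np.zeros((labels.shape[0], num_classes), dtype='float32')
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

encoded = one_hot([5, 0, 9], num_classes=10)
print(encoded.shape)
print(encoded[0])   # the row for digit 5
```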

In [12]:
model = Sequential()
model.add(Dense(nb_classes, input_shape=(784,)))
model.add(Activation('softmax'))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 10)                7850      
_________________________________________________________________
activation_1 (Activation)    (None, 10)                0         
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
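
The parameter count in the summary can be checked by hand: one weight for every (input, class) pair plus one bias per class.

```python
n_inputs, n_classes = 784, 10

weights = n_inputs * n_classes   # one weight per input pixel per class
biases = n_classes               # one bias per output neuron

total = weights + biases
print(total)  # 7850, matching the summary above
```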


In [13]:
model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
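
As a rough guide to what the softmax output and the `categorical_crossentropy` loss compute, here is a NumPy sketch on two made-up examples. The functions are illustrative re-implementations (including the assumed clipping constant `eps`), not the Keras code, and the `logits` values are invented.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean over examples of -log(probability assigned to the true class).
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
acc = float(np.mean(probs.argmax(axis=1) == y_true.argmax(axis=1)))
print(loss, acc)
```

The loss shrinks smoothly as the probability on the correct class grows, which is why it is optimized while accuracy is only reported.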

In [16]:
%%time
# batch_size is not passed here, so Keras uses its default of 32
history = model.fit(X_train, y_train, epochs=nb_epoch, verbose=True, validation_split=validation_split)

Train on 48000 samples, validate on 12000 samples
Epoch 1/20
...
Epoch 20/20
CPU times: user 1min 4s, sys: 10 s, total: 1min 14s
Wall time: 46.5 s
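
The "48000 samples / 12000 samples" line follows from `validation_split = 0.2`: Keras holds out the last fraction of the arrays passed to `fit`, before any shuffling. The sketch below mimics that split on an index array; `split_at` and the other variable names are illustrative.

```python
import numpy as np

n_samples = 60000
validation_split = 0.2

# Index where the training portion ends and the validation portion begins.
split_at = int(n_samples * (1.0 - validation_split))

indices = np.arange(n_samples)
train_idx = indices[:split_at]   # first 80% of the rows
val_idx = indices[split_at:]     # last 20% of the rows

print(len(train_idx), len(val_idx))
```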


In [17]:
model.evaluate(X_test, y_test)



[0.2939101231336594, 0.9189]

### Add more hidden layers to improve the performance

In [18]:
nb_epoch = 20
batch_size = 128
verbose = 1
nb_classes = 10
optimizer = SGD()
validation_split = 0.2
n_hidden = 128

In [19]:
model = Sequential(layers=[
    Dense(n_hidden, input_shape=(784,)),
    Activation('relu'),
    Dense(n_hidden),
    Activation('relu'),
    Dense(nb_classes),
    Activation('softmax')
])
model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=batch_size, epochs=nb_epoch, validation_split=validation_split)

Train on 48000 samples, validate on 12000 samples
Epoch 1/20
...
Epoch 20/20


<keras.callbacks.History at 0xb2acb5c18>

In [21]:
model.evaluate(X_test, y_test)



[0.1923598583072424, 0.945]
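
The two hidden layers make the model much larger than the single-layer version. A quick count of weights and biases per `Dense` layer, using the layer sizes from the cell above (the helper `dense_params` is hypothetical):

```python
def dense_params(n_in, n_out):
    # n_in * n_out weights plus n_out biases.
    return n_in * n_out + n_out

layer_sizes = [784, 128, 128, 10]   # input, two hidden layers, output

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

This gives 118,282 trainable parameters, compared with 7,850 for the model without hidden layers.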

### Further improving the model with dropout

In [22]:
from keras.layers.core import Dropout

In [23]:
nb_epoch = 250
batch_size = 128
nb_classes = 10
optimizer = SGD()
validation_split = 0.2
n_hidden = 128
dropout = 0.3

model = Sequential(layers=[
    Dense(n_hidden, input_shape=(784,)),
    Activation('relu'),
    Dropout(dropout),
    Dense(n_hidden),
    Activation('relu'),
    Dropout(dropout),
    Dense(nb_classes),
    Activation('softmax')
])

model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=batch_size, epochs=nb_epoch, validation_split=validation_split)

Train on 48000 samples, validate on 12000 samples
Epoch 1/250
...
Epoch 74/2

<keras.callbacks.History at 0xb2d95be80>

In [24]:
model.evaluate(X_test, y_test)



[0.07172030183279421, 0.9789]
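
To show what a dropout layer does during training, here is a minimal NumPy sketch of "inverted" dropout: a random fraction of activations is zeroed, and the survivors are scaled up so the expected activation is unchanged. At test time the layer is a no-op. The function `dropout_forward` is an illustrative assumption, not the Keras implementation.

```python
import numpy as np

def dropout_forward(activations, rate, training, rng):
    # At inference time (or with rate 0) dropout does nothing.
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # Keep each unit independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected value matches the input.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 128))             # stand-in for hidden-layer activations

train_out = dropout_forward(h, rate=0.3, training=True, rng=rng)
test_out = dropout_forward(h, rate=0.3, training=False, rng=rng)

dropped_fraction = float((train_out == 0).mean())
print(dropped_fraction, float(train_out.mean()))
```

With `rate=0.3`, roughly 30% of the units are zeroed on each training pass, which discourages the network from relying on any single neuron.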

### Testing different optimizers

### Increasing the number of epochs

### Increasing the size of batch computation