In [1]:
import numpy as np
from keras.datasets import mnist 
from keras import Model
from keras.layers import Input, Dense
from keras.optimizers import SGD, Adam

import keras

Using TensorFlow backend.


## Loading data

In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [3]:
X_train = X_train.reshape(60000, 28 * 28).astype('float32') / 255
X_test = X_test.reshape(10000, 28 * 28).astype('float32') / 255
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)
n_classes = 10

print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

60000 train samples
10000 test samples
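
For reference, `to_categorical` above turns integer class labels into one-hot rows. Here is a minimal NumPy sketch of the same idea, using made-up labels; the helper name `one_hot` is hypothetical and not part of Keras.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Return a (len(labels), n_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], n_classes), dtype='float32')
    # put a 1 in the column given by each label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example_labels = [5, 0, 4]             # made-up labels, not read from MNIST
encoded = one_hot(example_labels, n_classes=10)
print(encoded.shape)                   # (3, 10)
print(np.argmax(encoded, axis=1))      # [5 0 4]
```

Taking `np.argmax` along the last axis recovers the original integer labels.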


# Input as a matrix
Our input is stored as a matrix of shape **(samples, N_parameters)**

In [4]:
X_train.shape

(60000, 784)

In [5]:
y_train.shape

(60000, 10)

# Slicing Data to have a batch

In [6]:
batch_size = 128

In [7]:
batch = X_train[:batch_size]
batch.shape

(128, 784)
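
The slice above takes only the first batch. A small sketch of walking over a whole dataset in batches could look like this; the generator name `iterate_batches` and the stand-in array are assumptions made for illustration.

```python
import numpy as np

def iterate_batches(X, batch_size):
    """Yield consecutive slices of X with at most batch_size rows each."""
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size]

X_demo = np.zeros((300, 784), dtype='float32')   # stand-in data, not MNIST
batch_shapes = [b.shape for b in iterate_batches(X_demo, batch_size=128)]
print(batch_shapes)   # [(128, 784), (128, 784), (44, 784)]
```

Note that the last batch is smaller when the number of samples is not a multiple of the batch size.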

# Creating Input for NN
Let's create an input for the NN and check its shape

In [8]:
from keras.layers import Input

input = Input(shape=(784,))
input.shape

TensorShape([Dimension(None), Dimension(784)])

You see **Dimension(None)** as the first dimension because Keras assumes that the input will be a batch, and **None** is how it represents a batch size that is not known yet.

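You can also specify the full shape, batch dimension included, with `batch_shape`. Here the batch size is still left as `None`; passing a number instead (for example `batch_shape=(128, 784)`) would fix it.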

In [9]:
input_with_predefined_batch_size = Input(batch_shape=(None, 784))
input_with_predefined_batch_size.shape

TensorShape([Dimension(None), Dimension(784)])

# Creating a Dense layer
Dense is a Python object that holds parameters such as `activation`, `weights` and others.

Let's create one

In [10]:
dense_layer = Dense(units=20, activation='relu')

In [11]:
dense_layer.activation

<function keras.activations.relu>

In [12]:
dense_layer.name

'dense_1'

**The weights don't exist yet because Keras doesn't know the size of the input**

In [13]:
dense_layer.get_weights()

[]

**But if we apply the layer to our *input*, it will immediately create the weights**

In [14]:
output = dense_layer(input) 
W, b = dense_layer.get_weights()
print(W.shape)
print(b.shape)

(784, 20)
(20,)
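
To make these shapes concrete, here is a rough NumPy sketch of what a Dense layer with a `relu` activation computes. The weights are random stand-ins, not the ones Keras created, and `dense_relu` is a hypothetical helper name. The same `(784, 20)` matrix works for any number of rows, which is why the batch dimension can stay unknown.

```python
import numpy as np

rng = np.random.default_rng(0)
W_demo = rng.uniform(-0.05, 0.05, size=(784, 20)).astype('float32')  # stand-in weights
b_demo = np.zeros(20, dtype='float32')                                # zero biases

def dense_relu(X, W, b):
    """Compute relu(X @ W + b) for a batch X of shape (batch, n_inputs)."""
    return np.maximum(X @ W + b, 0.0)

# the same weights accept batches of different sizes
for batch in (1, 64, 128):
    X_demo = rng.random((batch, 784), dtype=np.float32)
    print(dense_relu(X_demo, W_demo, b_demo).shape)
```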


In [15]:
input.shape

TensorShape([Dimension(None), Dimension(784)])

Let's check out the weights

In [16]:
W

array([[ 0.00068538, -0.04857699, -0.01828426, ...,  0.07748862,
        -0.06513265, -0.05795071],
       [-0.00205623, -0.00357658,  0.07572918, ...,  0.01060609,
        -0.05427582,  0.02552512],
       [ 0.01380771, -0.08358473,  0.00861149, ...,  0.05522162,
         0.04254219,  0.06668249],
       ..., 
       [ 0.0543088 ,  0.03764609, -0.05569163, ..., -0.02976143,
        -0.02610113,  0.07869057],
       [ 0.0007116 , -0.07004897,  0.00306445, ...,  0.05331257,
         0.08208139, -0.002878  ],
       [-0.05740255,  0.04877225, -0.03829492, ..., -0.05502157,
        -0.02743964, -0.00472681]], dtype=float32)

In [17]:
b

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)

The way **W** and **b** are initialized is specified by `kernel_initializer` and `bias_initializer` parameters

In [18]:
dense_layer = Dense(20, kernel_initializer='glorot_uniform')
output = dense_layer(input)
W, b = dense_layer.get_weights()

In [19]:
W

array([[ 0.01439472, -0.08547954, -0.02212189, ..., -0.05895409,
        -0.01558902, -0.01598257],
       [ 0.00842161, -0.00561894,  0.07029867, ...,  0.04761259,
        -0.0202677 , -0.05342303],
       [ 0.02258281,  0.07764402, -0.07556458, ...,  0.00333908,
        -0.01894531, -0.01562199],
       ..., 
       [-0.01779757, -0.08238944, -0.04750065, ..., -0.05237376,
         0.01987977,  0.04878347],
       [-0.05955774,  0.08158387, -0.02777416, ..., -0.03329681,
        -0.0443873 , -0.00208463],
       [ 0.01618721, -0.05800885, -0.00615563, ..., -0.01020557,
        -0.05010811,  0.04970358]], dtype=float32)
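
As a sketch of what `glorot_uniform` does: it draws each weight uniformly from `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`. The function below is a NumPy illustration with a hypothetical name and seed, not the Keras implementation.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    """Sample a (fan_in, fan_out) matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)   # seed chosen only for reproducibility
    return rng.uniform(-limit, limit, size=(fan_in, fan_out)).astype('float32')

W_demo = glorot_uniform(784, 20)
limit = np.sqrt(6.0 / (784 + 20))
print(W_demo.shape)             # (784, 20)
print(round(float(limit), 4))   # 0.0864
```

For 784 inputs and 20 units the limit is about 0.0864, which is consistent with the magnitude of the values printed above.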

# Keras Model
In Keras, to train a Neural Network we need to create a model from the inputs of the NN and its outputs

In [20]:
input = Input(shape=(784,))

# creating the dense layer object
dense_layer = Dense(10, kernel_initializer='zeros', activation='softmax')

# applying dense layer to input
output = dense_layer(input)

model = Model(inputs=input, outputs=output)

## Keras model main methods

We can use even an untrained NN to predict something. The predictions carry no information yet: with the weights initialized to zeros, every class gets the same probability.

In [21]:
y_prediction = model.predict(X_train)
y_prediction.shape

(60000, 10)

Let's check how accurate that was

In [22]:
'Accuracy', np.mean(np.equal(np.argmax(y_prediction, axis=-1), np.argmax(y_train, axis=-1)))

('Accuracy', 0.098716666666666661)
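
An accuracy of about 0.0987 is what we would expect here. With all-zero weights the softmax output is uniform, and `np.argmax` returns the first index on ties, so every sample should be predicted as class 0 and the accuracy becomes the share of samples labelled 0. A small NumPy sketch with stand-in inputs:

```python
import numpy as np

X_demo = np.random.default_rng(0).random((4, 784))   # stand-in inputs, not MNIST
W_zero = np.zeros((784, 10))                         # mimics kernel_initializer='zeros'
b_zero = np.zeros(10)

logits = X_demo @ W_zero + b_zero                    # all zeros
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(probs[0])                    # every entry is 0.1
print(np.argmax(probs, axis=1))    # [0 0 0 0]
```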

To use the model more thoroughly we need to compile it with **model.compile()**

<img src="modelcompile.jpg"/>

In [23]:
from keras.metrics import categorical_accuracy

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=[categorical_accuracy])

You can actually compute accuracy right away using Keras

In [24]:
model.evaluate(X_train, y_train, batch_size=2000)



[2.3025989532470703, 0.098716666301091507]
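
The first number is the loss and the second is the accuracy. The loss value can be checked by hand: for uniform predictions of 0.1, categorical cross-entropy equals $-\ln(0.1) = \ln(10) \approx 2.3026$. Below is a minimal NumPy sketch with made-up labels; it is not the Keras implementation, which works in float32 and therefore differs in the last digits.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.eye(10)[[3, 1, 4]]     # three made-up one-hot labels
y_pred = np.full((3, 10), 0.1)     # uniform predictions, as with zero weights
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))              # 2.3026
```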

# Training the model

We can train our model using the **model.fit** function

<img src="modelfit.png"/>

In [25]:
model.fit(x=X_train, y=y_train, batch_size=16, epochs=4, validation_split=0.1, shuffle=True)

Train on 54000 samples, validate on 6000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1f268666d8>
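
The "54000 / 6000" split in the log follows from `validation_split=0.1`: Keras holds out the last fraction of the arrays, taken before shuffling, for validation. The sketch below reproduces that arithmetic with plain indices; the variable names are illustrative only.

```python
import math
import numpy as np

n_samples, validation_split, batch_size = 60000, 0.1, 16

# the last 10% of the samples are held out for validation
split_at = int(round(n_samples * (1 - validation_split)))
indices = np.arange(n_samples)
train_idx, val_idx = indices[:split_at], indices[split_at:]

# number of weight updates in one epoch
steps_per_epoch = math.ceil(len(train_idx) / batch_size)
print(len(train_idx), len(val_idx))   # 54000 6000
print(steps_per_epoch)                # 3375
```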

# Computing Neural Network via Matrix Operations

Let's make a batch of data and pass it through the network using only plain matrix operations

In [26]:
# Setting new batch size
batch_size = 64

# Slicing training data to obtain our batch
X_batch = X_train[:batch_size]
y_batch = y_train[:batch_size]

Let's check the shapes. Both arrays should have `batch_size` rows

In [27]:
print(X_batch.shape)
print(y_batch.shape)

(64, 784)
(64, 10)


We need to obtain the weights of our trained Keras NN as NumPy arrays

In [28]:
W, b = dense_layer.get_weights()  # obtaining W (weights) and b (biases)

Let's check out the shapes of the parameters.
They should be consistent with the number of inputs (784) and outputs (10)

In [29]:
print(W.shape)
print(b.shape)

(784, 10)
(10,)


Let's compute output of our Dense layer using pure matrix operations

In [30]:
output = X_batch @ W + b  # sign '@' stands for matrix-matrix (and matrix-vector) multiplications

Let's check the shape of the output, it should be consistent with the batch size and the number of outputs from the network

In [31]:
output.shape

(64, 10)
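
In `X_batch @ W + b`, the product has shape `(64, 10)` while `b` has shape `(10,)`. NumPy broadcasts the bias, adding it to every row. A tiny sketch with made-up numbers standing in for the real arrays:

```python
import numpy as np

XW = np.arange(6, dtype='float32').reshape(2, 3)   # stands in for X_batch @ W
bias = np.array([10, 20, 30], dtype='float32')     # stands in for b

# the (3,) bias is added to each of the 2 rows
print(XW + bias)
# [[10. 21. 32.]
#  [13. 24. 35.]]
```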

Now we need to apply the `softmax` function to our output. Unfortunately, NumPy doesn't provide it, so we need to write it ourselves.

Let's recap what it looks like:

$$\sigma(x)_i=\frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}}$$

NumPy provides the $e^x$ function as `np.exp`. We can apply it directly to a matrix

In [32]:
numerator = np.exp(output)

Let's compute the denominator, which is a sum. Note that we need to sum over each row of `output`. For that we use the `axis` parameter, which specifies the dimension to sum along.

In [33]:
denominator = np.sum(np.exp(output), axis=1)

To divide the numerator by the denominator directly, we need to reshape the denominator from a vector of shape `(64,)` into a matrix of shape `(64, 1)` so that broadcasting works. This is a NumPy feature, so you don't need to worry about it now, but if you are curious check it out [here](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html).

In [34]:
denominator = np.expand_dims(denominator, axis=1)

Final computation of the softmax function:

In [35]:
nn_output = numerator / denominator
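
The steps above can be wrapped into one function. The sketch below is a common variation rather than the exact code from the cells: it subtracts the row maximum before exponentiating, which leaves the result unchanged mathematically but avoids overflow for large inputs, and it uses `keepdims=True` instead of `np.expand_dims`.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in np.exp."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

logits_demo = np.array([[1.0, 2.0, 3.0],
                        [1000.0, 1000.0, 1000.0]])   # naive np.exp would overflow here
probs = softmax(logits_demo)
print(probs.sum(axis=1))   # each row sums to 1
```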

Let's check the outputs from the first item in batch

In [36]:
nn_output[0]

array([  5.01630246e-04,   5.39006294e-07,   1.08486309e-03,
         2.01682478e-01,   4.75699977e-07,   7.95593858e-01,
         9.43825569e-07,   4.44057310e-04,   5.38417837e-04,
         1.52730106e-04], dtype=float32)

We can check that it matches the output from Keras directly (**the values agree up to floating-point rounding**)

In [37]:
keras_output = model.predict(X_batch)
keras_output[0]

array([  5.01629838e-04,   5.39007431e-07,   1.08486309e-03,
         2.01682359e-01,   4.75699608e-07,   7.95594037e-01,
         9.43826592e-07,   4.44057194e-04,   5.38417429e-04,
         1.52730077e-04], dtype=float32)
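
The two rows differ only in their last digits, which is normal for float32 arithmetic. A tolerance-based comparison such as `np.allclose` is the usual way to check this. The sketch below copies the two printed rows; the tolerance values are an arbitrary choice for illustration.

```python
import numpy as np

# values copied from the two outputs printed above
manual = np.array([5.01630246e-04, 5.39006294e-07, 1.08486309e-03,
                   2.01682478e-01, 4.75699977e-07, 7.95593858e-01,
                   9.43825569e-07, 4.44057310e-04, 5.38417837e-04,
                   1.52730106e-04], dtype='float32')
keras_row = np.array([5.01629838e-04, 5.39007431e-07, 1.08486309e-03,
                      2.01682359e-01, 4.75699608e-07, 7.95594037e-01,
                      9.43826592e-07, 4.44057194e-04, 5.38417429e-04,
                      1.52730077e-04], dtype='float32')

exactly_equal = bool(np.array_equal(manual, keras_row))
close = bool(np.allclose(manual, keras_row, rtol=1e-5, atol=0.0))
print(exactly_equal)   # False
print(close)           # True
```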

To obtain the actual predicted labels we need to compute the index of the max value of each output row

In [38]:
nn_predictions = np.argmax(nn_output, axis=1)
batch_predictions = np.argmax(y_batch, axis=1)  # true labels, decoded from the one-hot rows

Let's check accuracy on this batch

In [39]:
'Manual NN Accuracy', np.mean(nn_predictions == batch_predictions)

('Manual NN Accuracy', 0.96875)

This is consistent with the Keras evaluation accuracy on this batch

In [40]:
'Keras Accuracy', model.evaluate(X_batch, y_batch)[1]



('Keras Accuracy', 0.96875)
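
An accuracy of 0.96875 on a batch of 64 samples corresponds to 62 correct predictions. A short sketch of the same computation with made-up labels, where two predictions are deliberately wrong:

```python
import numpy as np

true_labels = np.arange(64) % 10                     # made-up labels for 64 samples
predicted = true_labels.copy()
predicted[[3, 17]] = (predicted[[3, 17]] + 1) % 10   # corrupt two predictions

# the mean of a boolean array is the fraction of True values
accuracy = np.mean(predicted == true_labels)
print(accuracy)   # 0.96875
```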