# Creating a Neural Network from Scratch with Autograd

---
## Introduction

---
## Our Network
We will build a neural network from scratch using `numpy` and `autograd`. This post is intended as a fun experiment that gets us a bit closer to the underlying implementation than we usually are in Keras. In fact, when I first started this project I thought it would be cool to do everything from scratch. I realized, however, that doing so is incredibly time-consuming and that being able to iterate quickly is far more important.

There is one other thing that ~~sucks~~ is unpleasant when creating neural networks from scratch: backpropagation. In my opinion, backpropagation is the trickiest part to get right when implementing a neural network. Fortunately, we can use gradient checking to get an idea of whether our implementation is correct, but that means we also have to implement gradient checking. Today, instead of implementing backpropagation from scratch, we are going to use [`autograd`](https://github.com/HIPS/autograd), a neat little tool that can automatically compute derivatives of native Python and NumPy code.

To install `autograd`, run the following:
```bash
pip install autograd
```

---
## Loading the Data
We are going to use the Keras MNIST data. The following steps should look familiar.

```python
from keras.datasets import mnist
from keras.utils import to_categorical

(train_data, train_labels), (test_data, test_labels) = mnist.load_data()

# Reshape each 28x28 image into a flat vector of 784 pixels
train_data = train_data.reshape(60000, 28*28)
test_data = test_data.reshape(10000, 28*28)

# Scale pixel values from [0, 255] to [0, 1]
train_data = train_data.astype('float32') / 255
test_data = test_data.astype('float32') / 255

# One-hot encode the labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Check shapes
print(train_data.shape)
print(test_data.shape)
print(train_labels.shape)
print(test_labels.shape)

# Train / val / test: hold out the first 10,000 training examples for validation
x_val = train_data[:10000]
x_train = train_data[10000:]
y_val = train_labels[:10000]
y_train = train_labels[10000:]
```

```
(60000, 784)
(10000, 784)
(60000, 10)
(10000, 10)
```
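
For intuition, one-hot encoding turns each integer label into a row of zeros with a single 1 at the label's index. The sketch below reproduces that idea in plain NumPy. `one_hot` is a hypothetical helper written for illustration, and it is not meant to mirror how Keras implements `to_categorical` internally.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a 1 at each label's index."""
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4])
encoded = one_hot(labels, num_classes=10)

print(encoded.shape)               # (3, 10)
print(np.argmax(encoded, axis=1))  # [5 0 4]
```

Taking `argmax` along each row recovers the original integer labels, which is what the accuracy function later in this post relies on.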


---
## Experiment

Imports.

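```python
import autograd.numpy as np
# Recent autograd releases expose logsumexp under autograd.scipy.special
# (older releases had it under autograd.scipy.misc).
from autograd.scipy.special import logsumexp
from autograd import grad
from autograd.misc.optimizers import adam
```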

Initialize parameters.

```python
def init_parameters(layer_sizes, scale):
    """Returns a list of (weights, bias) tuples representing NN."""
    # Weights are small random values; biases start at zero.
    return [(scale * np.random.randn(m, n), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
```
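
To make the `zip(layer_sizes[:-1], layer_sizes[1:])` idiom concrete, here is a small sketch that lists the parameter shapes it produces for the layer sizes used later in this post. The `shapes` and `num_params` names are introduced only for this illustration.

```python
layer_sizes = [784, 128, 128, 10]

# Each layer maps m inputs to n outputs, so W has shape (m, n) and b has shape (n,).
shapes = [((m, n), (n,)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
for w_shape, b_shape in shapes:
    print(w_shape, b_shape)

# Total number of trainable values across all weights and biases.
num_params = sum(m * n + n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))
print(num_params)  # 118282
```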

Forward propagation.

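```python
def feed_forward(params, inputs):
    """Performs forward propagation for the given inputs."""
    for W, b in params:
        outputs = np.dot(inputs, W) + b  # affine transform for this layer
        inputs = np.tanh(outputs)        # activation fed to the next layer
    # The last layer's raw outputs are normalized into log-probabilities
    # (log-softmax); tanh is not applied to them.
    return outputs - logsumexp(outputs, axis=1, keepdims=True)
```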
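
The last line of `feed_forward` turns the raw outputs of the final layer into log-probabilities: subtracting `logsumexp` from each row is a log-softmax. The sketch below uses a hand-rolled NumPy version (not autograd's own `logsumexp`) to show that exponentiating the result gives rows that sum to 1, even for very large inputs.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax: logits - logsumexp(logits)."""
    # Subtracting the row maximum first keeps exp() from overflowing.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # a naive exp() would overflow here

log_probs = log_softmax(logits)
probs = np.exp(log_probs)
print(probs.sum(axis=1))  # each row sums to 1 (up to floating-point error)
```

Working in log space is what lets the training objective be written as a simple sum of log-probabilities.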

Make predictions.

```python
def predict(params, inputs):
    """Return the predicted classes for the given inputs."""
    outputs = feed_forward(params, inputs)
    return np.argmax(outputs, axis=1)
```

Accuracy of predictions.

```python
def accuracy(params, inputs, targets):
    """Return the network's accuracy for the given inputs and targets."""
    # Convert one-hot targets back to integer class labels.
    targets = np.argmax(targets, axis=1)
    predictions = predict(params, inputs)
    assert targets.shape == predictions.shape
    return np.mean(predictions == targets)
```

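Log-likelihood of the targets.

```python
def log_posterior(params, inputs, targets):
    """Return the summed log-probability the network assigns to the true classes."""
    # Use the `params` argument, not a global, so autograd can differentiate
    # with respect to it.
    return np.sum(feed_forward(params, inputs) * targets)
```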
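
Because the targets are one-hot, multiplying them by the log-probabilities and summing keeps only the log-probability of each example's true class. Here is a small worked example with made-up numbers; the probabilities are invented for illustration and do not come from the network.

```python
import numpy as np

# Made-up log-probabilities for 2 examples over 3 classes.
log_probs = np.log(np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.3, 0.6]]))

# One-hot targets: the first example is class 0, the second is class 2.
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])

# Only the entries at the true classes survive the multiplication.
log_likelihood = np.sum(log_probs * targets)
expected = np.log(0.7) + np.log(0.6)
print(np.isclose(log_likelihood, expected))  # True
```

Negating this quantity gives the summed cross-entropy loss, which is what gets minimized during training.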

The big test.

```python
# Model
layer_sizes = [784, 128, 128, 10]

# Training parameters
batch_size = 128
scale = 0.1
epochs = 20
lr = 0.001

parameters = init_parameters(layer_sizes, scale)

# Minibatches
num_batches = int(np.ceil(len(x_train) / batch_size))
def batch_indices(i):
    """Return the slice of training rows used at iteration i."""
    idx = i % num_batches
    return slice(idx * batch_size, (idx + 1) * batch_size)

# Define training objective: negative log-likelihood of the current minibatch
def objective(params, i):
    idx = batch_indices(i)
    return -log_posterior(params, x_train[idx], y_train[idx])

# Get gradient of the objective with respect to params using autograd
objective_grad = grad(objective)

def print_perf(params, i, gradient):
    """Callback for adam: report accuracy once per epoch."""
    if i % num_batches == 0:
        train_acc = accuracy(params, x_train, y_train)
        val_acc = accuracy(params, x_val, y_val)
        print("{:15}|{:20}|{:20}".format(i // num_batches, train_acc, val_acc))

# Optimize parameters
optimized_params = adam(objective_grad, parameters, step_size=lr,
                        num_iters=epochs * num_batches, callback=print_perf)
```
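
To see how the minibatch slicing behaves, here is a standalone sketch of `batch_indices` using the 50,000 training examples left after the validation split. It uses `math.ceil` instead of `np.ceil` only so that it runs without any dependencies.

```python
import math

num_examples = 50000  # size of x_train after holding out 10,000 for validation
batch_size = 128
num_batches = math.ceil(num_examples / batch_size)

def batch_indices(i):
    """Return the slice of training rows used at iteration i."""
    idx = i % num_batches
    return slice(idx * batch_size, (idx + 1) * batch_size)

print(num_batches)                 # 391
print(batch_indices(0))            # slice(0, 128, None)
print(batch_indices(num_batches))  # wraps around to slice(0, 128, None)

# The last batch's slice runs past the end of the data; slicing clips it,
# so that batch holds only the 80 remaining rows.
last = batch_indices(num_batches - 1)
print(len(range(num_examples)[last]))  # 80
```

Since the iteration counter wraps around modulo `num_batches`, running `epochs * num_batches` iterations makes exactly `epochs` passes over the training set.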