# Implementation of Multilayer Perceptrons from Scratch

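In this notebook we train a multilayer perceptron (MLP) with one hidden layer to classify images from the Fashion-MNIST dataset. The hidden layer uses the ReLU activation function, and the output logits are trained with a softmax cross-entropy loss.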

In [None]:
import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
from dl import tensorflow as dl

In [None]:
batch_size = 256
train_iter, test_iter = dl.load_data_fashion_mnist(batch_size)

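## Initializing Model Parameters

Each Fashion-MNIST image has 28 × 28 = 784 grayscale pixels and belongs to one of 10 classes. We use a single hidden layer with 256 units, so the model has one weight matrix and one bias vector per layer.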

In [None]:
num_inputs, num_outputs, num_hiddens = 784, 10, 256

W1 = tf.Variable(
    tf.random.normal(shape=(num_inputs, num_hiddens), mean=0, stddev=0.01))
b1 = tf.Variable(tf.zeros(num_hiddens))
W2 = tf.Variable(
    tf.random.normal(shape=(num_hiddens, num_outputs), mean=0, stddev=0.01))
b2 = tf.Variable(tf.random.normal([num_outputs], stddev=.01))

params = [W1, b1, W2, b2]
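
As a quick sanity check on the shapes above, the number of trainable scalars can be worked out by hand. The sketch below is plain Python and does not touch the TensorFlow variables; the helper name `count_mlp_params` is made up for illustration.

```python
def count_mlp_params(num_inputs, num_hiddens, num_outputs):
    """Count weights and biases of a one-hidden-layer MLP."""
    hidden = num_inputs * num_hiddens + num_hiddens    # W1 and b1
    output = num_hiddens * num_outputs + num_outputs   # W2 and b2
    return hidden + output


total = count_mlp_params(784, 256, 10)
print(total)  # 203530
```

Almost all of these parameters sit in the first weight matrix (784 × 256 = 200704 entries), so changing `num_hiddens` changes the model size roughly in proportion.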

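## Activation Function

We implement ReLU ourselves as an elementwise maximum with zero instead of calling the built-in `tf.nn.relu`.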

In [None]:
def relu(X):
    return tf.math.maximum(X, 0)

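## Model

The model flattens each image into a vector of length `num_inputs`, applies the hidden layer followed by ReLU, and then applies the output layer, which returns unnormalized logits.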

In [None]:
def net(X):
    X = tf.reshape(X, (-1, num_inputs))
    H = relu(tf.matmul(X, W1) + b1)
    return tf.matmul(H, W2) + b2
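
To make the computation in `net` concrete, here is a minimal pure-Python sketch of the same forward pass on a toy network with 2 inputs, 3 hidden units, and 2 outputs. The weights are arbitrary numbers chosen so the arithmetic can be checked by hand, and the names `affine`, `relu_list`, `tiny_net`, and the `toy_*` parameters are illustrative only; they are not part of `dl` or TensorFlow.

```python
def affine(x, W, b):
    """Compute x @ W + b for a single input vector x (W is a list of rows)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def relu_list(v):
    """Elementwise max(v, 0)."""
    return [max(a, 0.0) for a in v]


def tiny_net(x, W1, b1, W2, b2):
    h = relu_list(affine(x, W1, b1))   # hidden layer with ReLU
    return affine(h, W2, b2)           # output layer returns raw logits


# Toy parameters (arbitrary values, chosen for easy hand-checking)
toy_W1 = [[1.0, -1.0, 0.5],
          [0.0, 2.0, -2.0]]
toy_b1 = [0.0, 0.0, 0.0]
toy_W2 = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]
toy_b2 = [0.5, -0.5]

# Hidden pre-activations are [2.0, 0.0, -1.0]; ReLU clips the last one to 0
logits = tiny_net([2.0, 1.0], toy_W1, toy_b1, toy_W2, toy_b2)
print(logits)  # [2.5, -0.5]
```

The TensorFlow version does the same thing for a whole minibatch at once, with `tf.matmul` replacing the explicit sums.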

## Loss Function

In [None]:
def loss(y_hat, y):
    return tf.losses.sparse_categorical_crossentropy(y, y_hat,
                                                     from_logits=True)
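
With `from_logits=True`, the loss treats the outputs of `net` as unnormalized scores and applies the softmax internally. A minimal pure-Python sketch of that computation for a single example is shown below, using the log-sum-exp trick for numerical stability. It illustrates the formula only and is not TensorFlow's actual implementation; the function name `cross_entropy_from_logits` is made up for this example.

```python
import math


def cross_entropy_from_logits(logits, label):
    """Return -log softmax(logits)[label] for one example."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Uniform logits over 10 classes give a loss of log(10), about 2.3026
print(cross_entropy_from_logits([0.0] * 10, 3))

# Very large logits do not overflow thanks to the max subtraction
print(cross_entropy_from_logits([1000.0, 0.0], 0))
```

The value log(10) is a useful reference point: a freshly initialized 10-class model with near-zero logits should start with a loss close to it.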

## Training

We train for 10 epochs with a learning rate of 0.1 using the `dl.train_ch3` helper.

In [None]:
num_epochs, lr = 10, 0.1
updater = dl.Updater([W1, W2, b1, b2], lr)
dl.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)
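
The role of the learning rate `lr` can be illustrated with a plain gradient-descent step. The sketch below assumes the updater performs an update of the form "parameter minus learning rate times gradient", which the notebook does not show; `sgd_step` is a hypothetical helper and not the `dl.Updater` implementation.

```python
def sgd_step(params, grads, lr):
    """Return updated parameters after one gradient-descent step."""
    return [p - lr * g for p, g in zip(params, grads)]


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = [0.0]
for _ in range(50):
    grad = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grad, lr=0.1)

print(w[0])  # close to 3.0
```

With `lr=0.1`, each step shrinks the distance to the minimum by a factor of 0.8 in this toy problem. A much larger rate would overshoot, and a much smaller one would need many more steps.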

To evaluate the learned model, we apply it to some test data.


In [None]:
dl.predict_ch3(net, test_iter)

## Summary

* We saw that implementing a simple MLP is easy, even when done manually.
* However, with a large number of layers, implementing MLPs from scratch can still get messy (e.g., naming and keeping track of our model's parameters).


## Exercises

1. Change the value of the hyperparameter `num_hiddens` and see how this hyperparameter influences your results. Determine the best value of this hyperparameter, keeping all others constant.
1. Try adding an additional hidden layer to see how it affects the results.
1. How does changing the learning rate alter your results? Fixing the model architecture and other hyperparameters (including number of epochs), what learning rate gives you the best results?
1. What is the best result you can get by optimizing over all the hyperparameters (learning rate, number of epochs, number of hidden layers, number of hidden units per layer) jointly?
1. Describe why it is much more challenging to deal with multiple hyperparameters.
1. What is the smartest strategy you can think of for structuring a search over multiple hyperparameters?
