# Exploring Layers

In this notebook, we'll use the MNIST dataset to explore how the width and depth of a network's layers affect its accuracy.

In [23]:
# get the MNIST data functions
# matplotlib for plotting
from keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

## Let's start with the MNIST Dataset again!

In [24]:
## mnist.load_data() will automatically download the dataset if you don't have it
(MNIST_train_X, MNIST_train_y), (MNIST_test_X, MNIST_test_y) = mnist.load_data()

### A bit of preprocessing on the data before we can train (teach) our network

For today, just ignore this. Consider it a "necessary evil." It's really not evil, but it is necessary.

In [25]:
MNIST_train_X = MNIST_train_X.reshape((60000, 28 * 28))
MNIST_train_X = MNIST_train_X.astype('float32') / 255

MNIST_test_X = MNIST_test_X.reshape((10000, 28 * 28))
MNIST_test_X = MNIST_test_X.astype('float32') / 255

from keras.utils import to_categorical

MNIST_train_y = to_categorical(MNIST_train_y)
MNIST_test_y = to_categorical(MNIST_test_y)
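If you are curious what the "necessary evil" does, here is a NumPy-only sketch on a tiny fake batch. The random images and labels are made up for illustration. It mirrors the three steps above: flatten each 28×28 image into a 784-long vector, scale pixel values from 0-255 into 0-1, and one-hot encode the labels. `np.eye(10)[labels]` stands in for `to_categorical` here; for integer labels 0-9 it produces the same one-hot layout.

```python
import numpy as np

# A fake batch of 3 "images" shaped like MNIST: uint8 pixels in 0..255
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)
labels = np.array([5, 0, 4])

# 1. Flatten each 28x28 image into a single 784-long row
flat = images.reshape((images.shape[0], 28 * 28))

# 2. Convert to float32 and scale pixels into the range [0, 1]
scaled = flat.astype('float32') / 255

# 3. One-hot encode: label k becomes a length-10 vector with a 1 at index k
one_hot = np.eye(10, dtype='float32')[labels]

print(flat.shape, scaled.min(), scaled.max())
print(one_hot[0])  # label 5 -> a single 1.0 at index 5
```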

In [26]:
from keras import models
from keras import layers

# Build the Topology of the network

OK! We are finally going to change this up. Let's experiment with different numbers of layers.
First we'll try more and less width (on a single layer). Then we'll try extra depth.
Let's try a few values for the width, such as `128`, `512`, `1024`, and `2048`, and see what happens.
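Before training anything, it helps to see what width costs. A Dense layer with n inputs and u units learns an n × u weight matrix plus u biases, so it has (n + 1) · u parameters. The plain-Python sketch below counts the parameters of a 784 → width → 10 network for each width in the list. `dense_param_count` is a hypothetical helper written for this example, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Total weights + biases for a stack of Dense layers.

    layer_sizes lists the input size first, then each layer's units,
    e.g. [784, 128, 10] is 784 inputs -> 128 hidden units -> 10 outputs.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += (n_in + 1) * n_out  # n_in * n_out weights + n_out biases
    return total

for width in [128, 512, 1024, 2048]:
    print(width, dense_param_count([784, width, 10]))
```

Each doubling of the width roughly doubles the parameter count (101,770 at 128 units versus 1,628,170 at 2048), which means more computation per training step.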


In [27]:
network = models.Sequential() #we'll stick to sequential for this course

# the first argument of layers.Dense() is the number of units, i.e. how wide the layer is. Start with 128, then try other widths (this run uses 2)
network.add(layers.Dense(2, activation='relu', input_shape=(784,)))  # Dense is the same as fully connected.
network.add(layers.Dense(10, activation='softmax'))

network.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

network.fit(MNIST_train_X, MNIST_train_y, epochs=5, batch_size=128)


test_loss, test_acc = network.evaluate(MNIST_test_X, MNIST_test_y)
print('test_acc:', test_acc)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
test_acc: 0.5759000182151794
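The code comment above says `Dense` is the same as fully connected. Per the Keras documentation, a `Dense` layer computes `activation(dot(input, kernel) + bias)`. Here is a NumPy sketch of one forward pass through the network above, assuming random, untrained weights (so the predictions are meaningless); it only shows the shapes involved and what `relu` and `softmax` do.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    # negative values become 0, positive values pass through unchanged
    return np.maximum(z, 0.0)

def softmax(z):
    # subtract each row's max for numerical stability; each row then sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

width = 2                              # same width as the run above
x = rng.random((4, 784))               # a batch of 4 flattened "images"
W1, b1 = rng.normal(size=(784, width)), np.zeros(width)
W2, b2 = rng.normal(size=(width, 10)), np.zeros(10)

hidden = relu(x @ W1 + b1)             # shape (4, width)
probs = softmax(hidden @ W2 + b2)      # shape (4, 10), one row per image
print(hidden.shape, probs.shape, probs.sum(axis=1))
```

With `width = 2`, every image is squeezed into just two numbers before the output layer has to pick one of ten digits, which is one way to think about the low accuracy of that run.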


# Record your results here

| Hidden units | Test accuracy |
|---|---|
| 5000 | 0.9815 |
| 1024 | 0.9804 |
| 128 | 0.9728 |
| 12 | 0.9329 |
| 6 | 0.8993 |
| 2 | 0.5476 |
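The notebook imports matplotlib but never uses it. One way to look for a pattern in the width results is to plot them; a log-scale x-axis keeps the small widths from being squashed together, since the widths range from 2 to 5000. The numbers below are copied from the results recorded above; swap in your own if your runs differ.

```python
import matplotlib.pyplot as plt

# Widths and test accuracies copied from the recorded results
widths = [2, 6, 12, 128, 1024, 5000]
accuracies = [0.5476, 0.8993, 0.9329, 0.9728, 0.9804, 0.9815]

fig, ax = plt.subplots()
ax.plot(widths, accuracies, marker='o')
ax.set_xscale('log')  # spread the widths out evenly by order of magnitude
ax.set_xlabel('units in the hidden layer (log scale)')
ax.set_ylabel('test accuracy')
ax.set_title('MNIST: test accuracy vs. hidden-layer width')
plt.show()
```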

# Can you draw any conclusions about the width?

your answer here:



## More experiments, don't be shy!

Let's also try something really small, like `5` or `16`, or something really large, like `10000`. What do you notice?

your answer here:

# Now let's adjust the depth

We adjust the depth by creating *more* layers.
We always need an input layer and an output layer, but we can add more in the middle. (In Keras, the input layer is created for us from the `input_shape` given to the first `Dense` layer.)
I'll add one below for you. Then you should try it with `2`, `3`, and `4` hidden layers.
I'm not sure what the best number of nodes per layer is! Let's try slowly reducing it as we go.

`512 >> 256 >> 128 >> 64` 
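Rewriting the cell by hand for every depth gets tedious. The sketch below uses two hypothetical helpers written for this example: `halving_widths` produces the `512 >> 256 >> 128 >> 64` pattern for any number of hidden layers, and `build_network` builds and compiles a `Sequential` model from a list of widths using the same Keras calls as the cell below. It defaults to `'rmsprop'` to match that cell.

```python
def halving_widths(first, n_hidden):
    """Hidden-layer widths that halve each time, e.g. 512, 256, 128, 64."""
    return [first // 2**i for i in range(n_hidden)]

def build_network(hidden_units, optimizer='rmsprop'):
    """Sequential MLP for flattened MNIST with the given hidden widths."""
    # imported here so halving_widths() can be used on its own
    from keras import models, layers

    network = models.Sequential()
    for i, units in enumerate(hidden_units):
        if i == 0:
            # the first Dense layer also declares the 784-long input
            network.add(layers.Dense(units, activation='relu', input_shape=(784,)))
        else:
            network.add(layers.Dense(units, activation='relu'))
    network.add(layers.Dense(10, activation='softmax'))  # output layer
    network.compile(optimizer=optimizer,
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])
    return network

for n_hidden in [1, 2, 3, 4]:
    print(n_hidden, halving_widths(512, n_hidden))
```

For example, `network = build_network(halving_widths(512, 3))` gives `512 >> 256 >> 128 >> 10`, which you can then `fit` and `evaluate` exactly as in the cells below.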


In [37]:
network = models.Sequential() #we'll stick to sequential for this course

# first hidden layer; input_shape tells Keras each input is a 784-long vector
network.add(layers.Dense(256, activation='relu', input_shape=(784,)))  # Dense is the same as fully connected.

# let's add another layer -- keep the activation function as 'relu'
network.add(layers.Dense(64, activation='relu'))
#network.add(layers.Dense(32, activation='relu'))
#network.add(layers.Dense(1, activation='relu'))

#output layer
network.add(layers.Dense(10, activation='softmax'))

network.compile(optimizer='rmsprop',  # note: the width experiments above used 'adam'
                loss='categorical_crossentropy',
                metrics=['accuracy'])

In [38]:
network.summary()

Model: "sequential_14"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_40 (Dense)             (None, 256)               200960    
_________________________________________________________________
dense_41 (Dense)             (None, 64)                16448     
_________________________________________________________________
dense_42 (Dense)             (None, 10)                650       
Total params: 218,058
Trainable params: 218,058
Non-trainable params: 0
_________________________________________________________________
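The `Param #` column can be checked by hand. A Dense layer with $n_{\text{in}}$ inputs and $n_{\text{out}}$ units has $n_{\text{in}} \cdot n_{\text{out}}$ weights plus $n_{\text{out}}$ biases:

$$
\begin{aligned}
\text{params} &= (n_{\text{in}} + 1)\, n_{\text{out}} \\
\text{dense\_40:}\quad (784 + 1) \times 256 &= 200{,}960 \\
\text{dense\_41:}\quad (256 + 1) \times 64 &= 16{,}448 \\
\text{dense\_42:}\quad (64 + 1) \times 10 &= 650 \\
\text{total} &= 218{,}058
\end{aligned}
$$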


In [39]:
network.fit(MNIST_train_X, MNIST_train_y, epochs=5, batch_size=128)

test_loss, test_acc = network.evaluate(MNIST_test_X, MNIST_test_y)
print('test_acc:', test_acc)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
test_acc: 0.9771000146865845


# Record your results 

| Architecture (units per layer) | Test accuracy |
|---|---|
| `512 >> 256 >> 128 >> 10` | 0.9764 |
| `12 >> 6 >> 3 >> 10` | 0.8815 |
| `12 >> 6 >> 3 >> 1 >> 10` | |
| `6 >> 3 >> 10` | 0.347 |
| `6 >> 128 >> 10` | 0.9144 |

# Conclusions about depth in layers?

your answer here:

# Keep experimenting.

What happens if you invert the direction and make the layers get larger? `10 >> 50 >> 500 >> output`

