## Neural Net Tutorial

Fashion MNIST is a dataset of grayscale images of clothing and accessories in ten categories. The goal of this tutorial is to build a network that assigns each image to the correct category.

In [1]:
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

In [2]:
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


## Perceptron with a single hidden layer
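Each image is 28x28 pixels, so it is two-dimensional. The Dense layers of a perceptron expect one-dimensional input, so let’s flatten each image into a vector of 784 values. We also scale the pixel values to the range 0 to 1 and one-hot encode the labels.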

In [3]:
x_train = x_train.reshape(x_train.shape[0], -1) / 255.0
x_test = x_test.reshape(x_test.shape[0], -1) / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

In [5]:
print(y_train.shape)
print(y_test.shape)

(60000, 10)
(10000, 10)
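
To make the preprocessing above concrete, here is a minimal plain-Python sketch of the same three steps on a toy 2x2 "image": flattening, scaling to the range 0 to 1, and one-hot encoding. The helper names `flatten`, `scale`, and `one_hot` and the toy pixel values are made up for illustration. They mimic what `reshape`, the division by 255.0, and `to_categorical` produce, and are not how NumPy or Keras implement them.

```python
def flatten(image):
    """Turn a 2-D list of pixel rows into one flat list, row by row."""
    return [pixel for row in image for pixel in row]

def scale(pixels, max_value=255.0):
    """Map integer pixel values in [0, 255] to floats in [0, 1]."""
    return [p / max_value for p in pixels]

def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# A made-up 2x2 grayscale image (illustrative values only).
toy_image = [[0, 255],
             [51, 102]]

flat = scale(flatten(toy_image))
print(flat)        # 4 values, one per pixel
print(one_hot(3))  # label 3 as a 10-element vector
```

A real 28x28 image flattens to 28 * 28 = 784 values in the same way, and each label becomes a vector of length 10, which matches the shapes printed above.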


- For the hidden layer, let’s pick an arbitrary number of neurons. The number should be simple and small enough to follow our step number 1, so let’s choose 10 neurons.
- For the output layer, we need 10 neurons because we have 10 categories. The one-hot target for each image is 1 at the correct category and 0 everywhere else; with softmax, each output neuron gives the predicted probability of its category.
- Use categorical_crossentropy as the loss for multi-category problems with one-hot labels like ours, and binary_crossentropy for two categories. adam and rmsprop are both solid default optimizers. Use accuracy as the metric to check your network's performance.
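
As a rough illustration of what the softmax output and the categorical cross-entropy loss compute for a single image, here is a small sketch using only the standard library. The function names and the 3-class scores are invented for the example; Keras computes the same quantities on whole batches with its own implementation.

```python
import math

def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    """Loss for one sample: minus the log of the probability given to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))

scores = [2.0, 1.0, 0.1]   # made-up raw outputs for a 3-class example
target = [1.0, 0.0, 0.0]   # one-hot target: class 0 is the correct one

probs = softmax(scores)
loss = categorical_crossentropy(target, probs)
print(probs)  # the largest probability goes to class 0
print(loss)   # shrinks toward 0 as the true class probability approaches 1
```

Because the target is one-hot, only the probability assigned to the correct category contributes to the loss, so training pushes that probability up.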

In [7]:
# Sequential model
model = Sequential()
model.add(Dense(10, input_dim=784, activation='relu')) # hidden layer
model.add(Dense(10, activation='softmax')) # output layer

model.compile(loss='categorical_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

In [8]:
# Train model
model.fit(x_train, y_train, 
          epochs=10, # number of full passes over the training data
          validation_split=0.1) # hold out 10% of the training data for validation

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x144f4b908>

Pretty good. We get 85% accuracy on validation data.
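
The `validation_split=0.1` argument in the training call explains the "Train on 54000 samples, validate on 6000 samples" line. The sketch below reproduces that arithmetic on a stand-in list; `split_for_validation` is a hypothetical helper, not a Keras function. In Keras the held-out part is taken from the end of the arrays before any shuffling, which the sketch imitates.

```python
def split_for_validation(samples, validation_split):
    """Hold out the last fraction of `samples` for validation."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]

samples = list(range(60000))  # stand-in for the 60,000 training images
train_part, val_part = split_for_validation(samples, 0.1)

print(len(train_part), len(val_part))  # sizes of the two parts
print(val_part[0])                     # index of the first held-out sample
```

The validation samples are never used to update the weights, so validation accuracy is an estimate of how the network does on unseen images.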

In [9]:
# Testing performance
_, test_acc = model.evaluate(x_test, y_test)
print(test_acc)

0.8449


- 84% accuracy on the test data means the network classified about 8,400 of the 10,000 test images correctly.
- Higher accuracy on the test data means the network does better on images it has never seen. If you think the accuracy should be higher, try the next step(s) in building your neural network.
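
For reference, accuracy is just the fraction of images whose most probable predicted category matches the true one. A minimal sketch with made-up predictions for four images and three categories (the helper names are illustrative, not part of Keras):

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predicted_probs, one_hot_targets):
    """Fraction of samples where the most probable class is the true class."""
    correct = sum(argmax(p) == argmax(t)
                  for p, t in zip(predicted_probs, one_hot_targets))
    return correct / len(one_hot_targets)

# Made-up network outputs and one-hot targets for 4 images, 3 categories.
predicted = [[0.7, 0.2, 0.1],
             [0.1, 0.8, 0.1],
             [0.3, 0.4, 0.3],   # wrong: predicts class 1, truth is class 0
             [0.2, 0.2, 0.6]]
targets = [[1, 0, 0],
           [0, 1, 0],
           [1, 0, 0],
           [0, 0, 1]]

print(accuracy(predicted, targets))

# The same idea applied to the test result printed above.
correct_test_images = round(0.8449 * 10000)
print(correct_test_images)
```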

## Wider network: more hidden layer cells

Change the number of cells in the hidden layer. Here we increase it from 10 to 50.

In [10]:
model2 = Sequential()
model2.add(Dense(50, input_dim=784, activation='relu'))
model2.add(Dense(10, activation='softmax'))
model2.compile(loss='categorical_crossentropy', 
               optimizer='adam', 
               metrics=['accuracy'])
model2.fit(x_train, y_train, epochs=10, validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x145839438>

A whopping 88% accuracy on validation data, up from 85%. Good! It suggests that making the network bigger can increase performance.

In [11]:
_, test_acc = model2.evaluate(x_test, y_test)
print(test_acc)

0.8732


## Deeper network: more hidden layers

In [12]:
# Add one more hidden layer with 50 cells
model3 = Sequential()
model3.add(Dense(50, input_dim=784, activation='relu'))
model3.add(Dense(50, activation='relu'))
model3.add(Dense(10, activation='softmax'))
model3.compile(loss='categorical_crossentropy', 
               optimizer='adam', 
               metrics=['accuracy'])
model3.fit(x_train, y_train, epochs=10, validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1460f4f60>

In [13]:
_, test_acc = model3.evaluate(x_test, y_test)
print(test_acc)

0.8773


The improvement is not so big this time. Maybe using a perceptron on images is not the right approach. How about we change the network model?
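
One way to put the three perceptrons side by side is to count their trainable parameters. A Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below applies that formula to the layer sizes used so far and pairs each count with the test accuracy printed earlier; `dense_params` and `mlp_params` are hypothetical helper names.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def mlp_params(layer_sizes):
    """Total trainable parameters for Dense layers of the given sizes."""
    return sum(dense_params(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))

# (name, [input, hidden..., output], test accuracy from the runs above)
results = [
    ("model",  [784, 10, 10],     0.8449),
    ("model2", [784, 50, 10],     0.8732),
    ("model3", [784, 50, 50, 10], 0.8773),
]

for name, sizes, test_acc in results:
    print(name, mlp_params(sizes), test_acc)
```

Going from 10 to 50 hidden cells multiplies the parameter count by about five, while the extra hidden layer adds only 2,550 parameters, which fits the smaller gain seen for the deeper network.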

## Convolutional neural network (CNN)

A convolutional neural network (CNN) is a neural network that “sees” a small patch of the image at a time instead of all the pixels at once. This lets it detect patterns in images better than a perceptron.
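
To give a feel for what looking at a small patch means, here is a plain-Python sketch that slides one 2x2 kernel over a tiny image and sums the elementwise products at each position. The image and kernel values are hand-picked for illustration and no padding is used. A real `Conv2D` layer learns its kernel values during training and applies many kernels at once.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Tiny made-up image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The output is nonzero only where the patch straddles the edge, which is the kind of local pattern a flattened perceptron input cannot pick out as directly.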

In [14]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten
import numpy as np

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train[:,:,:,np.newaxis] / 255.0
x_test = x_test[:,:,:,np.newaxis] / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

In [15]:
x_train.shape # 60,000 x 28 x 28 x 1 data

(60000, 28, 28, 1)

- The data a CNN reads must have this shape: total_data x height x width x channels.
- Height and width are self-explanatory. Channels are like the red, green, and blue components of an RGB image, which has 3 channels. Because we work with grayscale images, the red, green, and blue values of each pixel would all be the same, so we keep a single channel.
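
The effect of adding the trailing channels axis can be imitated on nested lists. In this sketch, `shape_of` and `add_channel_axis` are illustrative helpers and the pixel values are made up; with NumPy arrays the indexing with `np.newaxis` shown earlier does the same job.

```python
def shape_of(nested):
    """Shape of a regular nested list, read from the first element at each level."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)

def add_channel_axis(images):
    """Wrap every pixel in its own one-element list: a single grayscale channel."""
    return [[[[pixel] for pixel in row] for row in image] for image in images]

# Made-up batch: 2 grayscale images, each 2 rows by 3 columns.
batch = [
    [[0, 128, 255], [64, 32, 16]],
    [[255, 255, 0], [0, 0, 0]],
]

print(shape_of(batch))          # (2, 2, 3)
with_channel = add_channel_axis(batch)
print(shape_of(with_channel))   # (2, 2, 3, 1)
```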

In [16]:
model4 = Sequential()
model4.add(Conv2D(filters=64, kernel_size=2, 
                  padding='same', activation='relu', 
                  input_shape=(28,28, 1))) 

model4.add(MaxPooling2D(pool_size=2))
model4.add(Flatten())
model4.add(Dense(10, activation='softmax'))
model4.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [17]:
model4.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 28, 28, 64)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 12544)             0         
_________________________________________________________________
dense_9 (Dense)              (None, 10)                125450    
Total params: 125,770
Trainable params: 125,770
Non-trainable params: 0
_________________________________________________________________


- conv2d changes your 28x28x1 image to 28x28x64: each of the 64 filters produces its own 28x28 feature map. You can think of the filters as 64 pattern detectors.
- MaxPooling2D keeps only the largest value in each 2x2 block, which halves the width and height so that later layers have less to compute. It reduces the size to 14x14x64.
- Flatten unrolls the output of MaxPooling2D into a vector of 14 x 14 x 64 = 12544 values, which feeds the final Dense layer.
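
The numbers in the summary above can be checked by hand. The sketch below recomputes the parameter counts and output sizes for this model and shows 2x2 max pooling on a tiny made-up grid; the helper names are illustrative and not part of Keras.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Each filter has kernel_size * kernel_size * in_channels weights plus one bias."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def max_pool_2x2(grid):
    """Keep the largest value in each non-overlapping 2x2 block."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]) - 1, 2)]
        for r in range(0, len(grid) - 1, 2)
    ]

conv = conv2d_params(kernel_size=2, in_channels=1, filters=64)
pooled_side = 28 // 2                       # pooling halves height and width
flat_size = pooled_side * pooled_side * 64  # length of the Flatten output
dense = dense_params(flat_size, 10)

print(conv, flat_size, dense, conv + dense)  # 320 12544 125450 125770

toy = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(max_pool_2x2(toy))  # [[6, 8], [14, 16]]
```

Almost all of the parameters sit in the final Dense layer; the convolution itself needs only 320.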

In [18]:
model4.fit(x_train, y_train, epochs=10, validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x14725b390>

In [19]:
_, test_acc = model4.evaluate(x_test, y_test)
print(test_acc)

0.9019


Better accuracy! The CNN reaches about 90% on the test data, compared with about 88% for the deeper perceptron.