In [4]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

Our goal is to classify an input image into one of the 10 classes of clothing. We will therefore define our neural network to take as input a matrix of shape (28, 28) and to output a vector of size 10, where the index of the largest value corresponds to the integer label for the class of clothing in the image. For example, if we use an image of an ankle boot as input, we might get an output vector $y'$ like this:

$$
y' =
\begin{bmatrix}
  0.0000 \\
  5.3003 \\
  2.1616 \\
  1.9145 \\
  0.0000 \\
  5.1698 \\
  0.0000 \\
  2.2152 \\
  0.0000 \\
  7.0417 \\
\end{bmatrix}
$$

In this particular example, the largest value appears at index 9 (counting from zero), and as we showed in the previous module, index 9 corresponds to the "Ankle Boot" category. This indicates that our neural network correctly classified the image of an ankle boot.
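To make the "index of the largest value" step concrete, here is a minimal sketch in plain Python that recovers the predicted class from the example vector above. The `class_names` list uses the standard Fashion MNIST label names and is assumed to match the mapping from the previous module; the variable names are ours, chosen for illustration.

```python
# Output vector y' from the example above.
y_prime = [0.0000, 5.3003, 2.1616, 1.9145, 0.0000,
           5.1698, 0.0000, 2.2152, 0.0000, 7.0417]

# Standard Fashion MNIST label names, indexed by integer label
# (assumed to match the mapping shown in the previous module).
class_names = ['T-Shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']

# The prediction is the index holding the largest value.
predicted_index = max(range(len(y_prime)), key=lambda i: y_prime[i])
predicted_label = class_names[predicted_index]

print(predicted_index, predicted_label)
```

When working with tensors rather than Python lists, `tf.argmax` performs the same index lookup.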

Here's a visualization of the structure of the neural network we chose for this scenario:

![Diagram of Fashion MNIST classification neural network](images/1-fashion-nn.png)

Because each image has 28 &times; 28 = 784 pixels, we need 784 nodes in the input layer (one for each pixel value). We decided to add two hidden layers with 512 nodes each, and each hidden layer is followed by a ReLU (rectified linear unit) activation function. We want the output of our network to be a vector of size 10, so our output layer needs to have 10 nodes.

Here's the Keras code that defines this neural network:

In [5]:
model = tf.keras.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(512, activation='relu'),
  tf.keras.layers.Dense(10)
])

The [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer turns our input matrix of shape (28, 28) into a vector of size 784. The [`Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layers are also known as "fully connected" or "linear" layers because they connect all nodes from the previous layer with each of their own nodes using a linear function. Notice that the two hidden layers specify "ReLU" as the activation: we want the results of the linear mathematical operation to get passed as input to a "Rectified Linear Unit" function, which adds non-linearity to the calculations. The final `Dense` layer has no activation, so it outputs the raw results of its linear operation. Finally, [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) combines all the other layers into a model.
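As a rough illustration of what the flattening step and one hidden layer compute, the sketch below reproduces the arithmetic with NumPy. It is not how Keras implements these layers internally. The image is random data standing in for a real input, and `W` and `b` are hypothetical random values standing in for the weights and biases that training would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one 28x28 grayscale image (random values, not real data).
image = rng.random((28, 28))

# Flatten: reshape the (28, 28) matrix into a vector of 784 values.
x = image.reshape(-1)

# Hypothetical parameters for one Dense(512) layer.
W = rng.normal(scale=0.01, size=(784, 512))  # one weight per input-output pair
b = np.zeros(512)                            # one bias per output node

# Linear operation followed by ReLU, which replaces negatives with zero.
z = x @ W + b
a = np.maximum(z, 0.0)

print(x.shape, a.shape)
```

Without the `np.maximum` step, stacking several of these layers would collapse into a single linear function of the input, which is why the non-linearity matters.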

We can print a description of our model using the `summary` method:

In [6]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 512)               401920    
_________________________________________________________________
dense_4 (Dense)              (None, 512)               262656    
_________________________________________________________________
dense_5 (Dense)              (None, 10)                5130      
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
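
The parameter counts in the summary can be checked by hand. A fully connected layer has one weight for every input-output pair plus one bias per output node, so its parameter count is inputs &times; outputs + outputs. The short sketch below applies that formula to our layer sizes; the variable names are ours, chosen for illustration.

```python
# Number of nodes in each layer: flattened input, two hidden layers, output.
layer_sizes = [784, 512, 512, 10]

# Each Dense layer has (inputs * outputs) weights plus one bias per output.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [401920, 262656, 5130]
print(total_params)      # 669706
```

These values match the `Param #` column and the total reported by `model.summary()`. The `Flatten` layer only reshapes its input, so it contributes no parameters.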


This is all the code needed to define our neural network. Now that we have a neural network and some data, it's time to train the neural network using that data. 