# My first NN

Import MNIST.

In [1]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [2]:
train_images.shape

(60000, 28, 28)

In [7]:
test_images.shape

(10000, 28, 28)

In [3]:
train_images.dtype

dtype('uint8')

In [4]:
len(train_labels)

60000

In [5]:
test_images.dtype

dtype('uint8')

In [8]:
train_labels

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [9]:
test_images.shape

(10000, 28, 28)

In [10]:
len(test_labels)

10000

In [11]:
test_labels

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

The workflow will be as follows:
1.   We’ll feed the neural network the training data, `train_images` and `train_labels`. The network will then learn to associate images with labels.
2.   We’ll ask the network to produce predictions for `test_images`, and we’ll verify whether these predictions match the labels in `test_labels`.

In [17]:
# the network architecture
from tensorflow.keras import models
from tensorflow.keras import layers
my_network = models.Sequential([
  layers.Dense(512, activation='relu'),
  layers.Dense(10, activation='softmax')
])

A **layer** is a data-processing module that you can think of as a filter for data. Layers extract *representations* out of the data fed into them.


In this specific case, our NN consists of a sequence of two `Dense` layers, which are *densely connected* (also called *fully connected*) neural layers:

*   the first layer has 512 units and uses the `relu` activation
*   the second (and last) layer is a bit special: it is a 10-way `softmax` layer, which returns an array of 10 *probability scores* (summing to 1), one for each of the 10 digit classes.
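
To illustrate what a `softmax` output looks like, here is a minimal NumPy sketch. It is not the Keras implementation: the `softmax` helper and the raw scores are made up for illustration, and the only point is that the outputs are non-negative and sum to 1.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for the 10 digit classes (made-up values).
scores = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, -1.0, 0.3, 1.5, 0.2])
probs = softmax(scores)

print(probs.shape)             # (10,)
print(probs.sum())             # very close to 1.0
print(int(np.argmax(probs)))   # index of the most likely class: 4
```

The class with the largest raw score (index 4 here) also gets the largest probability, which is why the predicted digit is usually read off with `argmax`.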

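To make the network ready for training, we need to pick three more things as part of the compilation step:

*   A **loss function** (`loss` below): how the network measures its performance on the training data, and thus how it steers itself in the right direction.
*   An **optimizer** (`optimizer` below): the mechanism through which the network updates itself, based on the data it sees and on the loss function.
*   Some **metrics** (`metrics` below): the quantities to monitor during training and testing. Here we only track accuracy, the fraction of images that are correctly classified.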
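
To make the idea of a loss function more concrete, here is a simplified NumPy sketch of what a sparse categorical cross-entropy computes: the mean negative log-probability that the network assigns to the correct class. The helper name mirrors the Keras loss string, but this is an illustrative re-implementation with made-up predictions, not the library code (which, among other things, guards against taking the log of 0).

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the correct class.

    probs:  (batch, classes) array of predicted probabilities
    labels: (batch,) array of integer class indices
    """
    picked = probs[np.arange(len(labels)), labels]   # probability of the true class
    return float(np.mean(-np.log(picked)))

# Made-up predictions for 2 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])

loss = sparse_categorical_crossentropy(probs, labels)
print(round(loss, 4))   # 0.2899
```

The more probability the network puts on the correct class, the closer the loss gets to 0.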


In [18]:
# the compilation step
my_network.compile(optimizer='rmsprop',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])

Before training, we’ll preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval. 



In [12]:
# Preparing the image data
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
#
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

In [13]:
train_images.shape

(60000, 784)

In [14]:
train_images.dtype

dtype('float32')

In [15]:
test_images.shape

(10000, 784)

In [16]:
test_images.dtype

dtype('float32')

We’re now ready to train the network, via the `fit` method. 

In [19]:
# Fit the NN
my_network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f7370f402d0>

Two quantities are displayed during training:

*   the loss of the network over the training data
*   the accuracy of the network over the training data.

We quickly reach a high accuracy on the training data, e.g. 0.98 or 0.99 (98% or 99%).

Now let’s check that the model performs well on the test set, too:

In [20]:
test_loss, test_acc = my_network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

test_acc: 0.9799000024795532


The test set accuracy should turn out to be a bit lower than the training set accuracy. This gap between training accuracy and test accuracy is an example of overfitting: machine learning models tend to perform worse on new data than on the data they were trained on.
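
The accuracy figures discussed above are fractions of samples whose most likely class matches the true label. A minimal NumPy sketch of that computation, using made-up probability scores rather than actual network output:

```python
import numpy as np

# Made-up probability scores for 4 samples and 3 classes.
probs = np.array([[0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.5, 0.4, 0.1]])
labels = np.array([1, 0, 2, 1])

predicted = np.argmax(probs, axis=1)             # most likely class per sample
accuracy = float(np.mean(predicted == labels))   # fraction of correct predictions

print(predicted)   # [1 0 2 0]
print(accuracy)    # 0.75
```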

# Tensors

**0D tensors**: rank=0, scalars


In [21]:
import numpy as np

In [22]:
# 0d tensors
x = np.array(123)
x.ndim

0

**1D tensors**: rank=1, vectors


In [23]:
#1d tensors
x = np.array([1,2,3])
x.ndim

1

**2D tensors**: rank=2, matrices


In [24]:
#2d tensors
x = np.array([[1, 2, 3],
              [4, 5, 5]])
x.ndim

2

**3D tensors**: rank=3


In [25]:
#3d tensors
x = np.array([[[1, 2, 3],
              [4, 5, 5]],
             [[1, 2, 3],
              [4, 5, 5]],
             [[1, 2, 3],
              [4, 5, 5]],
             [[1, 2, 3],
              [4, 5, 5]]])
x.ndim

3

Of course, this can continue indefinitely, up to generic **ND tensors**: rank=N, i.e. higher-dimensional tensors. 

**Summary of key attributes of a tensor**

A tensor is defined by 3 key attributes:

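*   **number of axes** (**rank**): for instance, a 3D tensor has three axes. NumPy exposes this as `ndim`.
*   **shape**: a tuple of integers giving the number of dimensions along each axis, e.g. `(60000, 28, 28)` for the MNIST training images.
*   **data type**: the type of the values stored in the tensor, e.g. `uint8` or `float32`. NumPy exposes this as `dtype`.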
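
As a quick illustration that needs no download, the three attributes can be read off any NumPy array. The toy array below is an assumption made for the example, not MNIST data:

```python
import numpy as np

# A toy batch of 4 "images" of 28 x 28 pixels (all zeros), stored as 8-bit integers.
x = np.zeros((4, 28, 28), dtype=np.uint8)

print(x.ndim)    # 3
print(x.shape)   # (4, 28, 28)
print(x.dtype)   # uint8
```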

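Let's check all this on MNIST. Reload the dataset first: the preprocessing step above flattened `train_images` to shape (60000, 784), and here we want the original 28 x 28 images back.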

In [None]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [None]:
print(train_images.ndim)

In [None]:
print(train_images.shape)

In [None]:
print(train_images.dtype)

In [None]:
digit = train_images[0]
import matplotlib.pyplot as plt
plt.imshow(digit, cmap=plt.cm.binary)
plt.show()

In [None]:
train_labels[0]

Let's briefly manipulate tensors.



The three equivalent examples below select digits #10 (included) to #100 (excluded) and put them in an array of shape (90, 28, 28).

In [None]:
my_slice = train_images[10:100]
print(my_slice.shape)

In [None]:
my_slice = train_images[10:100, :, :] 
print(my_slice.shape)

In [None]:
my_slice = train_images[10:100, 0:28, 0:28]
print(my_slice.shape)

In [None]:
# bottom-right 14 x 14 pixels of every image
my_slice_bottomright = train_images[:, 14:, 14:]

In [None]:
# 14 x 14 pixels centred in the middle of every image
my_slice_middle = train_images[:, 7:-7, 7:-7]
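
The last two slices do not print their shapes, so here is a small sketch that checks them. It uses a random stand-in array (200 fake images, an assumption for the example) instead of the real MNIST tensor, so it runs without downloading anything:

```python
import numpy as np

# Stand-in for the image tensor: 200 random 28 x 28 "images" (not real MNIST data).
images = np.random.randint(0, 256, size=(200, 28, 28), dtype=np.uint8)

first_slice = images[10:100]         # samples 10..99 along the first axis
bottom_right = images[:, 14:, 14:]   # bottom-right 14 x 14 patch of every image
middle = images[:, 7:-7, 7:-7]       # centred 14 x 14 patch of every image

print(first_slice.shape)    # (90, 28, 28)
print(bottom_right.shape)   # (200, 14, 14)
print(middle.shape)         # (200, 14, 14)
```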

**Concept of data batches**


In [None]:
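batch_1 = train_images[:128]                      # first batch of 128 samples

batch_2 = train_images[128:256]                   # second batch of 128 samples

n = 3                                             # example batch index (arbitrary choice)
batch_n = train_images[128 * n:128 * (n + 1)]     # batch number n, counting from 0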
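
The same slicing pattern can be wrapped in a small generator that walks over a whole dataset in batches. This is a sketch under stated assumptions: `iterate_batches` is a hypothetical helper, and the data is a small stand-in array with as many samples as the MNIST training set but only 4 features.

```python
import numpy as np

def iterate_batches(data, batch_size):
    """Yield consecutive slices of `data` along its first axis."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# Stand-in data: 60000 samples, 4 features each (kept small on purpose).
data = np.zeros((60000, 4), dtype=np.float32)

batches = list(iterate_batches(data, batch_size=128))
print(len(batches))        # 469
print(batches[0].shape)    # (128, 4)
print(batches[-1].shape)   # (96, 4)
```

Since 60000 is not a multiple of 128, the last batch is smaller than the others.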

# Tensor operations

See slides for intro.

In [None]:
def naive_relu(x):
    assert len(x.shape) == 2                  # x is a 2D tensor
    x = x.copy()                              # avoid overwriting the input tensor

    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

In [None]:
def naive_add(x, y):
    assert len(x.shape) == 2                  # x is a 2D tensor
    assert x.shape == y.shape                 # y same as x
    x = x.copy()                              # avoid overwriting the input tensor

    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

In [None]:
import numpy as np

In [None]:
x = np.array([[1, 1],
              [1, 1]])
y = np.array([[2, 2],
              [2, 2]])

In [None]:
z = x + y 
z

In [None]:
%%time
z = x + y 
z

In [None]:
# For a more reliable timing, repeat the operations many times and measure the total
import time

t0 = time.time()
for _ in range(1000):
    z = naive_add(x, y)
    z = naive_relu(z)
print('Took: %.2f s' % (time.time() - t0))
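
For comparison, the same add-then-relu computation can be written with NumPy's vectorized operations. The sketch below re-implements the loops in a single hypothetical helper, `loop_add_relu`, so that it is self-contained; the array sizes and repetition count are arbitrary, and the timings will vary from machine to machine.

```python
import time
import numpy as np

x = np.random.random((20, 100)) - 0.5   # values in [-0.5, 0.5)
y = np.random.random((20, 100)) - 0.5

def loop_add_relu(x, y):
    """Element-wise add followed by relu, written with explicit Python loops."""
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = max(x[i, j] + y[i, j], 0.0)
    return out

t0 = time.perf_counter()
for _ in range(100):
    z_loop = loop_add_relu(x, y)
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    z_vec = np.maximum(x + y, 0.0)      # same computation, vectorized
vec_time = time.perf_counter() - t0

print(np.allclose(z_loop, z_vec))       # True
print('loops: %.4f s, vectorized: %.4f s' % (loop_time, vec_time))
```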

**Broadcasting**: element-wise operations on tensors of different shapes.

In [None]:
def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2             # x is a 2D tensor
    assert len(y.shape) == 1             # y is a vector
    assert x.shape[1] == y.shape[0]      # each row of x must have as many entries as y
    x = x.copy()                         # avoid overwriting the input tensor

    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x
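
Conceptually, adding a vector to a matrix can be thought of as two steps: give the vector an extra axis, then repeat it along that axis until it matches the matrix. The sketch below spells those steps out on small made-up arrays and checks the result against NumPy's built-in broadcasting. In practice NumPy does not materialise the repeated copy; the explicit version is only a mental model.

```python
import numpy as np

x = np.arange(6, dtype=np.float64).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
y = np.array([10., 20., 30.])

# Step 1: add a leading axis to y, giving shape (1, 3).
y_row = np.expand_dims(y, axis=0)
# Step 2: repeat y along that new axis so that it matches x's shape (2, 3).
y_full = np.repeat(y_row, x.shape[0], axis=0)

explicit = x + y_full
implicit = x + y            # NumPy broadcasts y automatically

print(y_full.shape)                         # (2, 3)
print(explicit)
print(np.array_equal(explicit, implicit))   # True
```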

In [None]:
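x = np.random.random((64, 3, 32, 10))     # x is a random tensor with shape (64, 3, 32, 10)
y = np.random.random((32, 10))            # y is a random tensor with shape (32, 10)
z = np.maximum(x, y)                      # y is broadcast over the first two axes; z has the same shape as x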

In [None]:
x.shape

In [None]:
y.shape

In [None]:
z.shape

In [None]:
# You can inspect the tensors, if interested.
# x
# y
# z

**Dot product**

In [None]:
x = np.random.random((32,))
y = np.random.random((32,))
z = np.dot(x, y)

In [None]:
z

In [None]:
z.shape

**Dot product of 2 vectors**



In [None]:
def naive_vector_dot(x, y):
    assert len(x.shape) == 1              # x is a 1D tensor (a vector)
    assert len(y.shape) == 1              # y is a 1D tensor (a vector)
    assert x.shape[0] == y.shape[0]

    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

In [None]:
x = np.random.random((32,))
y = np.random.random((32,))

In [None]:
z = np.dot(x, y)
z

In [None]:
z = naive_vector_dot(x, y)
z

**Dot product of a matrix and a vector**



In [None]:
def naive_matrix_vector_dot(x, y):
    assert len(x.shape) == 2               # x is a 2D tensor (i.e. a matrix) 
    assert len(y.shape) == 1               # y is a 1D tensor (i.e. a vector)
    assert x.shape[1] == y.shape[0]        # the 1st dimension of x must be the same as the 0th dimension of y

    z = np.zeros(x.shape[0])               # prepare z as a vector of 0s with one entry per row of x

    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i] += x[i, j] * y[j]
    return z

Alternatively, reuse `naive_vector_dot`, which we wrote previously. This version highlights the relationship between a matrix-vector product and a vector dot product.

In [None]:
#def naive_matrix_vector_dot(x, y):
#    z = np.zeros(x.shape[0])
#    for i in range(x.shape[0]):
#        z[i] = naive_vector_dot(x[i, :], y)
#    return z

**Dot product of matrices**



In [None]:
def naive_matrix_dot(x, y):
    assert len(x.shape) == 2                  # x is a 2D tensor (i.e. matrix)
    assert len(y.shape) == 2                  # y is a 2D tensor (i.e. matrix)
    assert x.shape[1] == y.shape[0]           # the 1st dimension of x must be the same as the 0th dimension of y

    z = np.zeros((x.shape[0], y.shape[1]))    # prepare z as a matrix of 0s with the desired shape
    
    for i in range(x.shape[0]):               # iterate over the rows of x ..
        for j in range(y.shape[1]):           # .. and the columns of y
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z
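
As a sanity check of the shape rule, the sketch below multiplies a (3, 4) matrix by a (4, 5) matrix and compares an explicit double loop with `np.dot` and the `@` operator. The shapes are arbitrary, and the loop is written inline so that the block runs on its own.

```python
import numpy as np

x = np.random.random((3, 4))
y = np.random.random((4, 5))

# Entry (i, j) of the product is the dot product of row i of x and column j of y.
z_loop = np.zeros((x.shape[0], y.shape[1]))
for i in range(x.shape[0]):
    for j in range(y.shape[1]):
        z_loop[i, j] = np.sum(x[i, :] * y[:, j])

z_dot = np.dot(x, y)
z_at = x @ y                # the @ operator computes the same matrix product

print(z_dot.shape)                  # (3, 5)
print(np.allclose(z_loop, z_dot))   # True
print(np.allclose(z_dot, z_at))     # True
```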

And so on, towards higher ranks.

**Tensor reshaping**




In [26]:
x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])
print(x.shape)

(3, 2)


In [27]:
x = x.reshape((6, 1))
x

array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [5.]])

In [28]:
x = x.reshape((2, 3))
x

array([[0., 1., 2.],
       [3., 4., 5.]])
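
Reshaping is what turns each 28 x 28 image into a vector of 784 values during preprocessing. The sketch below shows the same operation on a toy array (consecutive integers, not MNIST data) and uses `-1` to let NumPy infer one axis from the total number of elements.

```python
import numpy as np

# A toy batch of 5 "images" of 28 x 28 pixels, filled with consecutive integers.
images = np.arange(5 * 28 * 28).reshape((5, 28, 28))

# -1 asks NumPy to infer that axis so that the total number of elements is preserved.
flat = images.reshape((5, -1))

print(flat.shape)                                   # (5, 784)
print(np.array_equal(flat[0], images[0].ravel()))   # True
print(images.size == flat.size)                     # True
```

Each row of `flat` contains the pixels of one image read row by row, so no values are lost or reordered across images.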

**Transposition**



In [29]:
x = np.zeros((300, 20))
x = np.transpose(x)        # basically, x[i, :] becomes x[:, i]
print(x.shape)

(20, 300)
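
With an all-zero matrix it is hard to see what transposition does to the values, so here is a small sketch on a matrix with distinct entries. It also shows that transposing is not the same as reshaping to the swapped shape.

```python
import numpy as np

x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])
xt = np.transpose(x)        # same as x.T

print(xt.shape)             # (2, 3)
print(xt)
# Transposition swaps indices; reshape keeps the reading order instead.
print(np.array_equal(xt, x.reshape((2, 3))))   # False
print(xt[1, 2] == x[2, 1])                     # True
```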


# My first NN (reloaded)

First, we loaded the dataset. This step is just I/O.

In [30]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

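This was the input data. We then preprocessed it by flattening each image and scaling the pixel values to the [0, 1] interval: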

In [31]:
# Preparing the image data
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
#
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

This was our NN architecture:

In [32]:
# the network architecture
from tensorflow.keras import models
from tensorflow.keras import layers
my_network = models.Sequential([
  layers.Dense(512, activation='relu'),
  layers.Dense(10, activation='softmax')
])

This was the network-compilation step:

In [33]:
# the compilation step
my_network.compile(optimizer='rmsprop',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])

Finally, this was the training loop:

In [34]:
# Fit the NN
my_network.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f736a290310>

In [35]:
test_loss, test_acc = my_network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

test_acc: 0.9785000085830688
