## Chapter 2: The mathematical building blocks of neural networks

### 2.1 Simple MNIST

The MNIST dataset comes preloaded in Keras, in the form of a set of four NumPy arrays.

*I guess that most of the datasets bundled with Keras by default are NumPy arrays.*

In [None]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [8]:
# Let's take a look at the data

print(f"Shape of the train data: {train_images.shape}")
print(f"Number of samples: {len(train_labels)}")

print(f"Train labels snippet: {train_labels}")

Shape of the train data: (60000, 28, 28)
Number of samples: 60000
Train labels snippet: [5 0 4 ... 5 6 8]


The core building block of neural networks is the layer. Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation.

**Our model** -> a sequence of two Dense layers, which are densely connected (also called fully connected).
**Last layer** -> a 10-way softmax classification layer: it returns an array of 10 probability scores (summing to 1), where each score is the probability that the current digit image belongs to one of the 10 digit classes.
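
To make the "10 scores summing to 1" idea concrete, here is a minimal NumPy sketch of softmax. The `softmax` helper and the `logits` values are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores for the 10 digit classes (illustrative values only).
logits = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, 0.2, 6.0, 1.5, 0.3])
probs = softmax(logits)

print(probs.shape)     # (10,)
print(probs.argmax())  # 7, the class with the largest raw score
```

The largest raw score always gets the largest probability, so the predicted class is the same before and after softmax.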

**optimizer** -> the mechanism through which the model updates itself, based on the training data it sees, to improve its performance.
**loss function** -> how the model measures its performance on the training data, and thus how it can steer itself in the right direction.

**metrics** -> what to monitor during training and testing. Here, we’ll only care about accuracy (the fraction of the images that were correctly classified).
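
As a rough illustration of what the loss and the metric measure, the sketch below computes a sparse categorical crossentropy and an accuracy by hand in NumPy. The `probs` and `labels` arrays are made-up values, and the computation is a simplified stand-in for what Keras does internally.

```python
import numpy as np

# Hypothetical predictions for 3 samples over 3 classes (each row sums to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])  # integer class labels, like train_labels

# Loss: negative log of the probability assigned to the true class, averaged.
true_class_probs = probs[np.arange(len(labels)), labels]
loss = -np.log(true_class_probs).mean()

# Accuracy: fraction of samples whose most probable class equals the label.
accuracy = (probs.argmax(axis=1) == labels).mean()

print(f"loss: {loss:.4f}")
print(f"accuracy: {accuracy:.2f}")  # 2 of the 3 samples are classified correctly
```

The loss keeps shrinking as the model grows more confident in the right class, whereas accuracy only changes when a prediction flips. That is why the loss, not the accuracy, is used to drive training.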

In [9]:
from tensorflow import keras 
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax")
])

In [10]:
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Preprocess the data by reshaping it into the shape the model expects and scaling it so that all values are in the [0, 1] interval.
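
The same two steps can be tried on a small synthetic array without downloading MNIST. This is a sketch only: the `images` array is random data standing in for real digits.

```python
import numpy as np

# Synthetic stand-in for image data: 4 grayscale images of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each image into a vector; -1 lets NumPy infer 28 * 28 = 784.
flat = images.reshape((len(images), -1))

# Cast to float32 and rescale from [0, 255] to [0, 1].
scaled = flat.astype("float32") / 255

print(scaled.shape)  # (4, 784)
print(scaled.dtype)  # float32
```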

In [11]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255 
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

In [12]:
# Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x2538a093400>

In [13]:
# Make the model output some predictions
test_digits = test_images[0:10]
predictions = model.predict(test_digits)
print(predictions[0])

[7.5498967e-08 1.4373538e-09 8.7867323e-07 6.6553468e-05 6.7920322e-12
 6.8914566e-08 1.6641861e-12 9.9993145e-01 2.6962686e-08 9.5122914e-07]


numpy.argmax -> returns the index of the maximum value along an axis.
Here it finds the index of the highest probability score, which in this case is the same as the predicted digit.
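
For a whole batch of predictions, the `axis` argument decides along which dimension the maximum is searched. A small sketch with made-up probability rows:

```python
import numpy as np

# Hypothetical probability rows for 3 samples over 4 classes.
batch = np.array([[0.1, 0.6, 0.2, 0.1],
                  [0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.2, 0.1, 0.5]])

# axis=1 takes the argmax within each row, giving one class index per sample.
predicted = batch.argmax(axis=1)
print(predicted)  # [1 0 3]
```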

In [17]:
print(f"Max index: {predictions[0].argmax()}")
print(f"True value: {test_labels[0]}")

Max index: 7
True value: 7


In [18]:
# Evaluate the model on all the data from the test dataset
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")

Test accuracy: 0.9796000123023987


### 2.2 Data types - Tensors

At its core, a tensor is a container for data, usually numerical data. So, it’s a container for numbers. You may already be familiar with matrices, which are rank-2 tensors: tensors are a generalization of matrices to an arbitrary number of dimensions (note that in the context of tensors, a dimension is often called an axis).

Tensor -> defined by its number of axes (rank), its shape, and its data type
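
All three attributes can be read directly from a NumPy array. The sketch below uses a zero-filled array with the layout of a batch of 128 grayscale 28x28 images; it is a synthetic stand-in, not real data.

```python
import numpy as np

# Synthetic stand-in with the layout of a small batch of 28x28 images.
batch = np.zeros((128, 28, 28), dtype=np.uint8)

print(batch.ndim)   # 3 -> number of axes (rank)
print(batch.shape)  # (128, 28, 28)
print(batch.dtype)  # uint8
```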

In [22]:
import numpy as np
x = np.array(12)
print(f"{x} with shape {x.shape} and dim {x.ndim}\n")

x = np.array([12, 3, 6, 14, 7])
print(f"{x} with shape {x.shape} and dim {x.ndim}\n")

x = np.array([[5, 78, 2, 34, 0],
                  [6, 79, 3, 35, 1],
                  [7, 80, 4, 36, 2]])
print(f"{x} with shape {x.shape} and dim {x.ndim}\n")

x = np.array([[[5, 78, 2, 34, 0],
                   [6, 79, 3, 35, 1],
                   [7, 80, 4, 36, 2]],
                  [[5, 78, 2, 34, 0],
                   [6, 79, 3, 35, 1],
                   [7, 80, 4, 36, 2]],
                  [[5, 78, 2, 34, 0],
                   [6, 79, 3, 35, 1],
                   [7, 80, 4, 36, 2]]])
print(f"{x} with shape {x.shape} and dim {x.ndim}\n")

12 with shape () and dim 0

[12  3  6 14  7] with shape (5,) and dim 1

[[ 5 78  2 34  0]
 [ 6 79  3 35  1]
 [ 7 80  4 36  2]] with shape (3, 5) and dim 2

[[[ 5 78  2 34  0]
  [ 6 79  3 35  1]
  [ 7 80  4 36  2]]

 [[ 5 78  2 34  0]
  [ 6 79  3 35  1]
  [ 7 80  4 36  2]]

 [[ 5 78  2 34  0]
  [ 6 79  3 35  1]
  [ 7 80  4 36  2]]] with shape (3, 3, 5) and dim 3



### 2.3 Tensor operations

naive implementation -> element-wise operations written as Python for loops


fast implementation  -> the same operations as vectorized NumPy calls
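
Both approaches compute the same values and differ only in speed. The sketch below checks that on a tiny matrix; `relu_loop` is a hypothetical name used here for a loop-based ReLU.

```python
import numpy as np

def relu_loop(x):
    # Loop-based ReLU over a rank-2 array; works on a copy of the input.
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

x = np.array([[-1.0, 2.0],
              [3.0, -4.0]])

# The vectorized call gives the same result as the explicit loops.
same = np.array_equal(relu_loop(x), np.maximum(x, 0.0))
print(same)  # True
```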

In [1]:
def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

In [2]:
def naive_add(x, y):
    assert len(x.shape) == 2
    assert x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

In [6]:
import time
import numpy as np

x = np.random.random((20, 100))
y = np.random.random((20, 100))
  
t0 = time.time() 
for _ in range(1000):
    z = x + y
    z = np.maximum(z, 0.) 
print("Took: {0:.2f} s".format(time.time() - t0))

Took: 0.00 s


In [7]:
t0 = time.time() 
for _ in range(1000):
    z = naive_add(x, y)
    z = naive_relu(z) 
print("Took: {0:.2f} s".format(time.time() - t0))

Took: 1.16 s


Same comparison for the dot product: the NumPy implementation (np.dot) first, then the naive for-loop implementation.

In [14]:
t0 = time.time() 
x = np.random.random((320000,))
y = np.random.random((320000,))
z = np.dot(x, y)
print(z)
print("Took: {0:.2f} s".format(time.time() - t0))

79893.34877696962
Took: 0.01 s


In [15]:
def naive_vector_dot(x, y):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    assert x.shape[0] == y.shape[0]
    z = 0. 
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

In [16]:
t0 = time.time() 
z = naive_vector_dot(x, y)
print(z)
print("Took: {0:.2f} s".format(time.time() - t0))

79893.34877696872
Took: 0.07 s


Reshaping tensors

In [19]:
x = np.array([[0., 1.],
                  [2., 3.],
                  [4., 5.]])
print(f"x :\n{x} \n x.shape={x.shape}\n\n")

x = x.reshape((6, 1))
print(f"x :\n{x} \n x.shape={x.shape}\n\n")

x = x.reshape((2, 3))
print(f"x :\n{x} \n x.shape={x.shape}\n\n")

x :
[[0. 1.]
 [2. 3.]
 [4. 5.]] 
 x.shape=(3, 2)


x :
[[0.]
 [1.]
 [2.]
 [3.]
 [4.]
 [5.]] 
 x.shape=(6, 1)


x :
[[0. 1. 2.]
 [3. 4. 5.]] 
 x.shape=(2, 3)




Transposing a matrix -> exchanging its rows and its columns

In [20]:
x = np.zeros((300, 20))
x = np.transpose(x)
x.shape

(20, 300)
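
Transposing and reshaping can produce the same shape but not the same contents. A small sketch on the 3x2 matrix used earlier:

```python
import numpy as np

x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])  # shape (3, 2)

transposed = np.transpose(x)  # rows become columns
reshaped = x.reshape((2, 3))  # same element order, new row length

print(transposed)
# [[0. 2. 4.]
#  [1. 3. 5.]]
print(reshaped)
# [[0. 1. 2.]
#  [3. 4. 5.]]
```

Both results have shape (2, 3), but only the transpose keeps each original column together as a row.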

Reimplementing the first MNIST example from scratch
![image.png](attachment:image.png)

In [24]:
import tensorflow as tf
  
class NaiveDense:
    def __init__(self, input_size, output_size, activation):
        self.activation = activation
 
        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)
  
        b_shape = (output_size,)
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)
  
    def __call__(self, inputs):
        return self.activation(tf.matmul(inputs, self.W) + self.b)
  
    @property
    def weights(self):
        return [self.W, self.b]
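
The NaiveDense layer above computes activation(inputs · W + b). To trace the arithmetic by hand, here is a minimal sketch that swaps TensorFlow for NumPy and hardcodes ReLU as the activation. `dense_forward` and the small weight values are illustrative assumptions.

```python
import numpy as np

def dense_forward(inputs, W, b):
    # Affine transform followed by ReLU: relu(inputs @ W + b).
    return np.maximum(inputs @ W + b, 0.0)

inputs = np.array([[1.0, 2.0]])      # 1 sample, 2 features
W = np.array([[1.0, -1.0, 0.5],
              [0.0, 1.0, -2.0]])     # shape (input_size, output_size) = (2, 3)
b = np.array([0.0, 0.5, 0.0])        # shape (output_size,)

out = dense_forward(inputs, W, b)
print(out.shape)     # (1, 3)
print(out.tolist())  # [[1.0, 1.5, 0.0]]
```

The third output is 0 because the pre-activation value (0.5 - 4.0 = -3.5) is negative and ReLU clips it.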

Here we chain the layers together: the model keeps a list of NaiveDense layers in self.layers and calls them one after another on the inputs.

In [25]:
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights

In [34]:
# Instantiate the model
model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
]) 
assert len(model.weights) == 4

Next, we need a way to iterate over the data in mini-batches. With a fixed batch size, each call to next() returns the slice of images and labels that belongs to the next batch.

In [27]:
import math
  
class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)
 
    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels
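
When the dataset size is not a multiple of the batch size, the last batch is smaller, which is why the number of batches is computed with `math.ceil`. A small sketch on a hypothetical dataset of 10 samples:

```python
import math
import numpy as np

samples = np.arange(10)  # hypothetical dataset of 10 samples
batch_size = 4

num_batches = math.ceil(len(samples) / batch_size)
batch_sizes = []
for i in range(num_batches):
    start = i * batch_size
    batch = samples[start : start + batch_size]  # slicing clips at the end
    batch_sizes.append(len(batch))

print(num_batches)  # 3
print(batch_sizes)  # [4, 4, 2]
```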

At each step we:

(1) take the batch

(2) make a forward pass -> predictions = model(images_batch), which calls `NaiveSequential.__call__` and passes the inputs through each layer in turn

(3) compute the loss

(4) compute the gradients of the loss with respect to the weights

(5) update the weights
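
The five steps can be sketched without TensorFlow on a toy problem. The example below is an assumption-laden illustration: it fits a single weight to synthetic data following y = 3x, and it uses a hand-derived gradient of the mean squared error in place of `tf.GradientTape`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: targets follow y = 3 * x exactly.
x = rng.random((256, 1))
y = 3.0 * x

w = np.zeros((1, 1))  # single trainable weight
learning_rate = 0.5
batch_size = 32
losses = []

for epoch in range(20):
    for start in range(0, len(x), batch_size):
        x_batch = x[start : start + batch_size]          # (1) take the batch
        y_batch = y[start : start + batch_size]
        predictions = x_batch @ w                        # (2) forward pass
        error = predictions - y_batch
        loss = (error ** 2).mean()                       # (3) compute the loss
        gradient = 2 * x_batch.T @ error / len(x_batch)  # (4) compute the gradient
        w -= learning_rate * gradient                    # (5) update the weight
        losses.append(loss)

print(f"learned weight: {w[0, 0]:.3f}")
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.8f}")
```

The learned weight approaches 3 and the loss shrinks toward 0. The full model follows the same pattern, with automatic differentiation replacing the hand-written gradient.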

In [28]:
def one_training_step(model, images_batch, labels_batch):
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)
        average_loss = tf.reduce_mean(per_sample_losses)
    gradients = tape.gradient(average_loss, model.weights)
    update_weights(gradients, model.weights)
    return average_loss

In [29]:
learning_rate = 1e-3 
  
def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate)

In [30]:
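from tensorflow.keras import optimizers

# In practice, you would rarely write the update rule by hand; an optimizer
# instance does it for you. This definition replaces the manual version above.
optimizer = optimizers.SGD(learning_rate=1e-3)

def update_weights(gradients, weights):
    optimizer.apply_gradients(zip(gradients, weights))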

In [31]:
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
        batch_generator = BatchGenerator(images, labels, batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f"loss at batch {batch_counter}: {loss:.2f}")

In [32]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
  
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255  
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255 
  
fit(model, train_images, train_labels, epochs=10, batch_size=128)

Epoch 0
loss at batch 0: 4.47
loss at batch 100: 2.22
loss at batch 200: 2.17
loss at batch 300: 2.06
loss at batch 400: 2.23
Epoch 1
loss at batch 0: 1.86
loss at batch 100: 1.86
loss at batch 200: 1.80
loss at batch 300: 1.69
loss at batch 400: 1.84
Epoch 2
loss at batch 0: 1.54
loss at batch 100: 1.56
loss at batch 200: 1.48
loss at batch 300: 1.40
loss at batch 400: 1.51
Epoch 3
loss at batch 0: 1.29
loss at batch 100: 1.32
loss at batch 200: 1.22
loss at batch 300: 1.19
loss at batch 400: 1.27
Epoch 4
loss at batch 0: 1.09
loss at batch 100: 1.15
loss at batch 200: 1.02
loss at batch 300: 1.03
loss at batch 400: 1.11
Epoch 5
loss at batch 0: 0.95
loss at batch 100: 1.01
loss at batch 200: 0.89
loss at batch 300: 0.92
loss at batch 400: 0.99
Epoch 6
loss at batch 0: 0.85
loss at batch 100: 0.91
loss at batch 200: 0.79
loss at batch 300: 0.83
loss at batch 400: 0.90
Epoch 7
loss at batch 0: 0.77
loss at batch 100: 0.82
loss at batch 200: 0.71
loss at batch 300: 0.76
loss at batch 40

In [33]:
predictions = model(test_images)
predictions = predictions.numpy()
predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == test_labels
print(f"accuracy: {matches.mean():.2f}")

accuracy: 0.82
