<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction-to-tensorflow" data-toc-modified-id="Introduction-to-tensorflow-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction to tensorflow</a></span></li><li><span><a href="#Implementing-a-model-with-tf.keras.models.Sequential" data-toc-modified-id="Implementing-a-model-with-tf.keras.models.Sequential-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Implementing a model with <code>tf.keras.models.Sequential</code></a></span></li><li><span><a href="#Inspecting-intermediate-activations-of-a-particular-layer" data-toc-modified-id="Inspecting-intermediate-activations-of-a-particular-layer-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Inspecting intermediate activations of a particular layer</a></span></li><li><span><a href="#Custom-train-loop" data-toc-modified-id="Custom-train-loop-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Custom train loop</a></span></li><li><span><a href="#CategoricalCrossentropy()" data-toc-modified-id="CategoricalCrossentropy()-5"><span class="toc-item-num">5&nbsp;&nbsp;</span><code>CategoricalCrossentropy()</code></a></span></li><li><span><a href="#SparseCategoricalCrossentropy()" data-toc-modified-id="SparseCategoricalCrossentropy()-6"><span class="toc-item-num">6&nbsp;&nbsp;</span><code>SparseCategoricalCrossentropy()</code></a></span></li><li><span><a href="#Using-next-in-the-iterators" data-toc-modified-id="Using-next-in-the-iterators-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Using next in the iterators</a></span></li></ul></div>

## Introduction to tensorflow

In [2]:
import numpy as np
import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel intensities from the integer range [0, 255] to floats in [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [7]:
type(x_train), type(x_test)

(numpy.ndarray, numpy.ndarray)

## Implementing a model with `tf.keras.models.Sequential`


Many models can be built by stacking different types of layers, chained together with the **`tf.keras.models.Sequential`** class.

- **`tf.keras.models.Sequential`**

    - groups a linear stack of layers into a `tf.keras.Model`.
    
    -  provides training and inference features on this model.
    
    
Let us build a feedforward neural network with a single hidden layer of 128 ReLU units, followed by dropout and a 10-unit softmax output layer (one unit per digit class).

In [30]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])
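To make the shapes concrete, here is a minimal NumPy sketch of the computation this model performs at inference time, when `Dropout` passes its input through unchanged. The weights are random placeholders with the same shapes as the Keras layers, not the trained weights, and `forward` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random weights with the same shapes as the Keras Dense layers
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)


def forward(images):
    """Flatten -> Dense(128, relu) -> Dropout (identity at inference) -> Dense(10, softmax)."""
    x = images.reshape(len(images), -1)          # (n, 28, 28) -> (n, 784)
    h = np.maximum(0.0, x @ W1 + b1)             # Dense(128) + ReLU -> (n, 128)
    z = h @ W2 + b2                              # Dense(10) logits -> (n, 10)
    z = z - z.max(axis=1, keepdims=True)         # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)      # softmax -> (n, 10), rows sum to 1


n_params = W1.size + b1.size + W2.size + b2.size
probs = forward(rng.random((3, 28, 28)))
print(n_params)       # 101770
print(probs.shape)    # (3, 10)
```

The parameter count, 784·128 + 128 + 128·10 + 10 = 101,770, is the number of trainable parameters this architecture has; `Flatten` and `Dropout` add none.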

Once the model is built, it has to be compiled with an optimizer and a loss; the `metrics` argument lists additional quantities to report during training and evaluation.

In [508]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
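The loss `'sparse_categorical_crossentropy'` expects integer class labels such as `y_train`, rather than one-hot vectors. A small NumPy sketch (with made-up probabilities) showing that picking the true-class probability directly gives the same value as the one-hot cross-entropy:

```python
import numpy as np

probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])     # made-up predicted class probabilities
labels = np.array([1, 0])               # integer class indices, as in y_train

# Sparse form: index the probability of the true class in each row
sparse_ce = -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# One-hot form: -sum over classes of y_true * log(y_pred), averaged over rows
onehot = np.eye(probs.shape[1])[labels]
dense_ce = -np.mean(np.sum(onehot * np.log(probs), axis=1))

print(sparse_ce, dense_ce)              # both equal -(log 0.7 + log 0.6) / 2
```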


In [503]:
?model.fit

In [505]:
model.fit(x_train, y_train, epochs=2, batch_size=16)
model.evaluate(x_test, y_test, verbose=0)

Epoch 1/2
Epoch 2/2


[0.08194960653781891, 0.9797000288963318]

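We can use `model.predict` to generate predictions for a single example or for a batch. Even a single example must keep the leading batch dimension, which is why we pass `x_train[0:1]` (shape `(1, 28, 28)`) rather than `x_train[0]` (shape `(28, 28)`).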

In [47]:
model.predict(x_train[0:1])

array([[3.2092703e-14, 3.8814462e-11, 6.4564543e-12, 7.1231596e-04,
        3.8425109e-25, 9.9928766e-01, 3.3170677e-17, 1.4139238e-13,
        3.1058227e-15, 8.6925600e-10]], dtype=float32)

In [133]:
batch = x_train[0:3] 
model.predict(batch)

array([[3.2092703e-14, 3.8814462e-11, 6.4564660e-12, 7.1231532e-04,
        3.8425109e-25, 9.9928766e-01, 3.3170677e-17, 1.4139238e-13,
        3.1058227e-15, 8.6925600e-10],
       [9.9999869e-01, 6.5247768e-12, 1.3151504e-06, 6.4917568e-12,
        9.2095436e-15, 1.4805157e-10, 1.1843172e-09, 3.6655631e-10,
        3.7651793e-10, 2.3940161e-09],
       [1.5755981e-10, 2.2667291e-06, 9.3890430e-06, 5.4509592e-08,
        9.9942857e-01, 1.5296407e-10, 8.9152143e-09, 5.2883424e-04,
        9.6269346e-08, 3.0765470e-05]], dtype=float32)

To get the predicted class, we pick the index of the highest score in each row:

In [134]:
np.argmax(model.predict(batch), axis=1)

array([5, 0, 4])

In [136]:
y_train[0:3]

array([5, 0, 4], dtype=uint8)
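Averaging this comparison over a batch gives the accuracy. A NumPy sketch using the three probability rows printed above (rounded to four significant digits) and the labels `y_train[0:3]`:

```python
import numpy as np

# The three probability rows printed above, rounded to 4 significant digits
probs = np.array([
    [3.209e-14, 3.881e-11, 6.456e-12, 7.123e-04, 3.843e-25,
     9.993e-01, 3.317e-17, 1.414e-13, 3.106e-15, 8.693e-10],
    [1.000e+00, 6.525e-12, 1.315e-06, 6.492e-12, 9.210e-15,
     1.481e-10, 1.184e-09, 3.666e-10, 3.765e-10, 2.394e-09],
    [1.576e-10, 2.267e-06, 9.389e-06, 5.451e-08, 9.994e-01,
     1.530e-10, 8.915e-09, 5.288e-04, 9.627e-08, 3.077e-05],
])
labels = np.array([5, 0, 4])              # y_train[0:3]

predicted = np.argmax(probs, axis=1)      # index of the highest score in each row
batch_accuracy = np.mean(predicted == labels)
print(predicted, batch_accuracy)          # [5 0 4] 1.0
```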

## Inspecting intermediate activations of a particular layer

The layers of a model created with `tf.keras.models.Sequential` are stored in its `.layers` attribute:

In [79]:
model.layers

[<tensorflow.python.keras.layers.core.Flatten at 0x1455a2d90>,
 <tensorflow.python.keras.layers.core.Dense at 0x1455a2490>,
 <tensorflow.python.keras.layers.core.Dropout at 0x142cff6d0>,
 <tensorflow.python.keras.layers.core.Dense at 0x1455a2810>]

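We can retrieve the activations of all layers by building a second `tf.keras.Model` that reuses the layers (and weights) of `model` but returns the output of every layer:

In [137]:
output_names = [l.name for l in model.layers]
# A model that shares `model`'s layers and returns every layer's output
activation_model = tf.keras.Model(inputs=model.inputs,
                                  outputs=[l.output for l in model.layers])
batch = x_train[0:2]
output_values = activation_model(batch)
layer_name_to_output_value = dict(zip(output_names, output_values))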

In [139]:
batch.shape

(2, 28, 28)

In [113]:
output_values[0].shape

TensorShape([2, 784])

In [114]:
output_values[1].shape

TensorShape([2, 128])

In [115]:
output_values[2].shape

TensorShape([2, 128])
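`output_values[2]`, the output of the `Dropout` layer, has the same shape as its input. A NumPy sketch of inverted dropout, the scheme Keras' `Dropout` layer uses: during training each unit is zeroed with probability `rate` and the surviving units are scaled by `1 / (1 - rate)`; at inference the layer passes its input through unchanged. The `dropout` function is a hypothetical helper and the all-ones array is a stand-in for a `Dense(128)` activation.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: in training, zero each unit with probability `rate`
    and rescale the survivors so the expected activation is unchanged."""
    if not training:
        return x                                  # identity at inference
    keep = rng.random(x.shape) >= rate            # keep each unit with prob 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
h = np.ones((2, 128))                             # stand-in for a Dense(128) output
train_out = dropout(h, 0.2, training=True, rng=rng)
infer_out = dropout(h, 0.2, training=False, rng=rng)

print(train_out.shape, np.unique(train_out))      # zeros and values of 1.25
print(np.array_equal(infer_out, h))               # True
```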

In [116]:
output_values[3].shape

TensorShape([2, 10])

In [119]:
output_values[3]

<tf.Tensor: shape=(2, 10), dtype=float32, numpy=
array([[3.2092703e-14, 3.8814462e-11, 6.4564786e-12, 7.1231596e-04,
        3.8425109e-25, 9.9928766e-01, 3.3170677e-17, 1.4139238e-13,
        3.1058346e-15, 8.6925600e-10],
       [9.9999869e-01, 6.5247768e-12, 1.3151516e-06, 6.4917568e-12,
        9.2095783e-15, 1.4805157e-10, 1.1843194e-09, 3.6655701e-10,
        3.7651790e-10, 2.3940208e-09]], dtype=float32)>

Note that the output of the last layer is exactly the model's prediction, as returned by `model.predict`:

In [140]:
model.predict(batch)

array([[3.2092703e-14, 3.8814462e-11, 6.4564786e-12, 7.1231596e-04,
        3.8425109e-25, 9.9928766e-01, 3.3170677e-17, 1.4139238e-13,
        3.1058346e-15, 8.6925600e-10],
       [9.9999869e-01, 6.5247768e-12, 1.3151516e-06, 6.4917568e-12,
        9.2095783e-15, 1.4805157e-10, 1.1843194e-09, 3.6655701e-10,
        3.7651790e-10, 2.3940208e-09]], dtype=float32)

## Custom train loop

In [474]:
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
cce(y_true, y_pred).numpy()

0.9868951
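`from_logits=True` tells the loss that `y_pred` holds unnormalized scores (logits), so a softmax is applied first; here `y_pred` already contains probabilities, so the result differs from the plain cross-entropy of those probabilities. A NumPy sketch of both computations using the standard definitions (`categorical_ce` is a hypothetical helper; the clip only guards against `log(0)`):

```python
import numpy as np

y_true = np.array([[0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])


def categorical_ce(y_true, y_pred, from_logits=False):
    """Batch mean of -sum(y_true * log p) over classes."""
    if from_logits:
        # Treat y_pred as logits: numerically stable log-softmax per row
        z = y_pred - y_pred.max(axis=1, keepdims=True)
        log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    else:
        # Treat y_pred as probabilities
        log_p = np.log(np.clip(y_pred, 1e-7, 1.0))
    return np.mean(-np.sum(y_true * log_p, axis=1))


print(categorical_ce(y_true, y_pred, from_logits=True))   # ≈ 0.98690, as in the cell above
print(categorical_ce(y_true, y_pred, from_logits=False))  # ≈ 1.17694
```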

In [475]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [476]:
batch = x_train[0:3]
output_batch = model(batch)

In [477]:
output_batch

<tf.Tensor: shape=(3, 10), dtype=float32, numpy=
array([[0.11727271, 0.1837656 , 0.09804658, 0.06271391, 0.09450665,
        0.16868013, 0.09000888, 0.06515361, 0.04264797, 0.07720402],
       [0.23056899, 0.1483027 , 0.08469895, 0.05953145, 0.08288971,
        0.1659654 , 0.0652004 , 0.03859435, 0.04766007, 0.07658791],
       [0.11386129, 0.11304498, 0.10339391, 0.05170143, 0.10766418,
        0.05228689, 0.10536055, 0.1355347 , 0.09026898, 0.12688312]],
      dtype=float32)>

In [519]:
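def get_batch(X, Y, batch_size):
    """Yield consecutive (X, Y) mini-batches of at most `batch_size` samples.

    Samples are visited in their original order (no shuffling); the last
    batch is smaller when the number of samples is not a multiple of batch_size.
    """
    n_samples = X.shape[0]
    indices = np.arange(n_samples)
    for start in range(0, n_samples, batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], Y[batch_idx]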
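`get_batch` visits the samples in the same order in every epoch. A common variation, sketched here as an assumption rather than what this notebook uses, draws a random permutation of the indices each time the generator is created; `get_shuffled_batch` and its `seed` parameter are hypothetical names, and leaving `seed=None` gives a different order on every call.

```python
import numpy as np


def get_shuffled_batch(X, Y, batch_size, seed=None):
    """Like get_batch, but visits the samples in a random order."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(X.shape[0])          # fresh random order of all samples
    for start in range(0, X.shape[0], batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], Y[batch_idx]           # X and Y rows stay aligned


# Usage: 10 samples in batches of 4 give batch sizes 4, 4, 2
X = np.arange(10).reshape(10, 1)
Y = np.arange(10)
batches = list(get_shuffled_batch(X, Y, batch_size=4, seed=0))
print([len(yb) for _, yb in batches])  # [4, 4, 2]
```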
    
    

## `CategoricalCrossentropy()`

In [517]:
n_epochs  = 2
n_classes = len(np.unique(y_train))
n_batch   = 16
n_samples = x_train.shape[0]
n_batches_per_epoch = int(np.ceil(n_samples/n_batch))

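# NOTE: CategoricalAccuracy expects one-hot labels and per-class scores
accuracy  = tf.keras.metrics.CategoricalAccuracy()
# NOTE: the model below ends in a softmax, so from_logits=True makes the
# loss apply a second softmax; from_logits=False would match this model
loss_fn   = tf.keras.losses.CategoricalCrossentropy(from_logits=True)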
optimizer = tf.keras.optimizers.Adam()
batch_generator = get_batch(x_train, y_train, n_batch)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              #loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


In [518]:
acc = 0
for epoch in range(n_epochs):
    print(f"epoch {epoch}")
    batch_generator = get_batch(x_train, y_train, n_batch)
    for step in range(n_batches_per_epoch):
        x_batch, y_batch = next(batch_generator)

        with tf.GradientTape() as tape:
            # Called without training=True, so the Dropout layer is inactive here
            out_batch = model(x_batch)
            # CategoricalCrossentropy expects one-hot targets, so the
            # integer labels have to be one-hot encoded first
            y_batch_onehot = tf.one_hot(y_batch, n_classes)
            loss_value     = loss_fn(y_batch_onehot, out_batch)

        # Update the `accuracy` metric with the integer labels and the
        # predicted class indices
        accuracy.update_state(y_batch, np.argmax(out_batch, 1))

        # Get a list of gradients for each layer in the model
        gradients = tape.gradient(loss_value, model.trainable_weights)

        # Update the weights of the model to minimize the loss value.
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))
        
        # Logging the current accuracy value so far.
        if step % 1000 == 0:
            print('Step:', step)        
            print(f'step: {step}, accuracy so far: %.3f' % accuracy.result())
        

epoch 0
Step: 0
step: 0, accuracy so far: 0.000
Step: 1000
step: 1000, accuracy so far: 0.734
Step: 2000
step: 2000, accuracy so far: 0.772
Step: 3000
step: 3000, accuracy so far: 0.794
epoch 1
Step: 0
step: 0, accuracy so far: 0.809
Step: 1000
step: 1000, accuracy so far: 0.822
Step: 2000
step: 2000, accuracy so far: 0.829
Step: 3000
step: 3000, accuracy so far: 0.839
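In the loop above, the last `Dense` layer already applies softmax, so `CategoricalCrossentropy(from_logits=True)` applies softmax a second time to values that are already probabilities. A NumPy sketch of the effect on one confident, correct prediction (the 10-class vector is made up for illustration): the second softmax pulls the probabilities toward uniform, so the loss stays high even when the prediction is right. Using `from_logits=False` with this model, or removing the final softmax and keeping `from_logits=True`, avoids the double application.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


n_classes = 10
p = np.full(n_classes, 0.01)
p[9] = 0.91                       # a confident, correct prediction for class 9
p_twice = softmax(p)              # what from_logits=True does to probabilities

loss_once = -np.log(p[9])         # cross-entropy on the probabilities themselves
loss_twice = -np.log(p_twice[9])  # cross-entropy after the extra softmax
# Even a perfect one-hot prediction cannot push the extra-softmax loss below
# -log(e / (e + 9)), since softmax of values in [0, 1] stays close to uniform
floor = -np.log(np.e / (np.e + n_classes - 1))

print(p_twice[9], loss_once, loss_twice, floor)
```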


## `SparseCategoricalCrossentropy()`

In [None]:
n_epochs  = 2
n_classes = len(np.unique(y_train))
n_batch   = 16
n_samples = x_train.shape[0]
n_batches_per_epoch = int(np.ceil(n_samples/n_batch))

# Integer labels: use the "sparse" variants of the metric and of the loss.
# The model ends in a softmax, so the loss receives probabilities
# (from_logits=False, the default).
accuracy  = tf.keras.metrics.SparseCategoricalAccuracy()
loss_fn   = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')])


In [None]:
for epoch in range(n_epochs):
    print(f"epoch {epoch}")
    batch_generator = get_batch(x_train, y_train, n_batch)
    for step in range(n_batches_per_epoch):
        x_batch, y_batch = next(batch_generator)

        with tf.GradientTape() as tape:
            # training=True activates the Dropout layer during training
            out_batch = model(x_batch, training=True)

            # we DO NOT HAVE to pass the one-hot encoding of the classes
            # in the batch because loss_fn = SparseCategoricalCrossentropy()
            loss_value = loss_fn(y_batch, out_batch)

        # Update the `accuracy` metric: integer labels vs. class probabilities
        accuracy.update_state(y_batch, out_batch)

        # Get a list of gradients for each layer in the model
        gradients = tape.gradient(loss_value, model.trainable_weights)

        # Update the weights of the model to minimize the loss value.
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))

        # Logging the current accuracy value so far.
        if step % 1000 == 0:
            print(f'step: {step}, accuracy so far: {float(accuracy.result()):.3f}')
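Per-example accuracy for integer labels compares each label with the argmax of its probability row, which is what `tf.keras.metrics.SparseCategoricalAccuracy` measures. `CategoricalAccuracy` instead takes the argmax along the last axis of both of its arguments, because it expects one-hot labels. A NumPy sketch, assuming that argmax-based definition (as in `tf.keras` for TF 2.x), of what happens when it receives 1-D integer labels and 1-D predicted classes, as in the `CategoricalCrossentropy()` loop: the last axis is then the batch axis. The labels and predictions below are made up.

```python
import numpy as np

labels = np.array([3, 1, 4, 1, 5])        # made-up integer labels for one batch
probs = np.eye(10)[labels]                # made-up confident predictions...
probs[1] = np.eye(10)[7]                  # ...with example 1 predicted as class 7

# Per-example accuracy (what SparseCategoricalAccuracy measures)
predicted = np.argmax(probs, axis=1)
per_example = np.mean(labels == predicted)

# CategoricalAccuracy-style with 1-D inputs: argmax over the batch axis, so it
# only checks whether the largest label and the largest predicted class sit
# at the same position in the batch
categorical_style = float(np.argmax(labels) == np.argmax(predicted))

print(per_example, categorical_style)     # 0.8 0.0
```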

            

## Using next in the iterators

In [522]:
n_epochs  = 2
n_classes = len(np.unique(y_train))
n_batch   = 16
n_samples = x_train.shape[0]
n_batches_per_epoch = int(np.ceil(n_samples/n_batch))

accuracy = tf.keras.metrics.CategoricalAccuracy()
# The model ends in a softmax, so from_logits=True applies softmax a second time
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

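Now we iterate directly over the generator instead of calling `next`: the `for` loop stops by itself once the generator is exhausted, so `n_batches_per_epoch` is not needed.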

In [523]:
for epoch in range(n_epochs):
    print(f"epoch {epoch}")
    batch_generator = get_batch(x_train, y_train, n_batch)
    for step, (x_batch, y_batch) in enumerate(batch_generator):
        with tf.GradientTape() as tape:
            out_batch = model(x_batch)

            # Compute the loss value for this batch (one-hot targets).
            y_batch_onehot = tf.one_hot(y_batch, n_classes)
            loss_value     = loss_fn(y_batch_onehot, out_batch)

        # Update the state of the `accuracy` metric.
        accuracy.update_state(y_batch, np.argmax(out_batch, 1))

        # Get a list of gradients for each layer in the model
        gradients = tape.gradient(loss_value, model.trainable_weights)

        # Update the weights of the model to minimize the loss value.
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))

        # Logging the current accuracy value so far.
        if step % 1000 == 0:
            print(f'batch index: {step}, accuracy so far: %.3f' % accuracy.result())

epoch 0
batch index: 0, accuracy so far: 0.000
batch index: 1000, accuracy so far: 0.731
batch index: 2000, accuracy so far: 0.772
batch index: 3000, accuracy so far: 0.793
epoch 1
batch index: 0, accuracy so far: 0.809
batch index: 1000, accuracy so far: 0.822
batch index: 2000, accuracy so far: 0.831
batch index: 3000, accuracy so far: 0.841
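The difference between the two iteration styles comes from how Python generators end: a `for` loop catches the end of the generator automatically, whereas calling `next` on an exhausted generator raises `StopIteration`, so a loop built on `next` has to know the number of batches in advance. A small sketch with a toy generator (`count_up` is a hypothetical stand-in for `get_batch`):

```python
def count_up(n):
    """Toy generator standing in for get_batch: yields 0, 1, ..., n - 1."""
    for i in range(n):
        yield i


# A for loop stops by itself when the generator is exhausted
seen = [(step, value) for step, value in enumerate(count_up(3))]
print(seen)  # [(0, 0), (1, 1), (2, 2)]

# With next() the caller must know how many items to request
gen = count_up(3)
values = [next(gen) for _ in range(3)]
try:
    next(gen)          # one call too many
    exhausted = False
except StopIteration:
    exhausted = True
print(values, exhausted)  # [0, 1, 2] True
```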
