Answers
-------
1. Computation graph framework that allows building complex functions efficiently and in parallel.
  TF serves as the backend for Keras. Another deep learning framework like TF is PyTorch

2. TF and NP share many functions, especially around mathematical computation. However, TF serves as a heavy duty computation
  framework that is meant for large mathematical models. It offers distributed computation for example and has a tight 
  integration with the GPU. Therefore, for simple operations, numpy is preferred.

3. No, numpy works with 64 bit variable by default whereas tf works with 32 bit variables by default. The second one
  will generate a tensor of type int64 and the first one a tensor of int32

4. String, arrays, sparse tensors, ragged tensors, sets and queues

5. Using Keras' Loss as a parent class offers built-in facilities such as tracking and serialization which are useful when
  building a model.

6. For tracking (?)

7. Custom layers can be used when you need to define a custom operation inside a given layer and then use it in a model. It 
  does not control the flow of the model. Using a custom model is useful when you need to control the flow of information 
  between layers. For example, you may want to call a given layer 3 times during training.

8. If we need to have more control over which layers affect the loss function or which parameters should be taken
  into account during backpropagation.

9. Yes, Keras components can contain arbitrary python code. It'll get converted to tf functions automatically.

10. ?

11. ? 

In [1]:
import tensorflow as tf
import numpy as np

In [3]:
print (tf.range(10))
print (tf.constant(np.arange(10)))


tf.Tensor([0 1 2 3 4 5 6 7 8 9], shape=(10,), dtype=int32)
tf.Tensor([0 1 2 3 4 5 6 7 8 9], shape=(10,), dtype=int64)


In [31]:
fashion_mnist = tf.keras.datasets.fashion_mnist.load_data()

# train, validation and test split
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist
X_train, y_train = X_train_full[:-5000], y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:], y_train_full[-5000:]

X_train, X_valid, X_test = X_train / 255, X_valid / 255, X_test / 255

num_classes = len(np.unique(y_train_full))

### Exercise 12

Custom layer that performs layer normalization

In [21]:
class NormalizedLayer(tf.keras.layers.Layer):
  def __init__(self, **kwargs):
    super().__init__(**kwargs)
    self.eps = 0.001
  
  def build(self, input_shape):
    super().build(input_shape)
    self.alpha = self.add_weight("alpha", input_shape[-1:], dtype="float32", initializer="ones")
    self.beta = self.add_weight("beta", input_shape[-1:], dtype="float32", initializer="zeros")
  
  def call(self, X):
    mean, var = tf.nn.moments(X, axes=-1, keepdims=True)
    std = tf.sqrt(var)

    return self.alpha * ((X - mean) / (std + self.eps)) + self.beta
    

In [22]:
model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=[28, 28]),
        NormalizedLayer(),
        NormalizedLayer(),
        tf.keras.layers.Dense(10, activation="softmax")
])

model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_5 (Flatten)         (None, 784)               0         
                                                                 
 normalized_layer_2 (Normal  (None, 784)               1568      
 izedLayer)                                                      
                                                                 
 normalized_layer_3 (Normal  (None, 784)               1568      
 izedLayer)                                                      
                                                                 
 dense_3 (Dense)             (None, 10)                7850      
                                                                 
Total params: 10986 (42.91 KB)
Trainable params: 10986 (42.91 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [10]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=5, validation_data=(X_valid, y_valid))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [16]:
norm_0 = model.get_layer("normalized_layer")
norm_1 = model.get_layer("normalized_layer_1")

weights_0, biases_0 = norm_0.get_weights()
print ("Norm 0: ", weights_0.sum(), biases_0.sum())

weights_1, biases_1 = norm_1.get_weights()
print ("Norm 1: ", weights_1.sum(), biases_1.sum())

Norm 0:  783.2976 4.2915344e-06
Norm 1:  785.9258 -0.009297669


In [26]:
# Comparing it to Keras LayerNormalization layer which does the same thing

model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=[28, 28]),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.Dense(10, activation="softmax")
])

model.summary()



Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_6 (Flatten)         (None, 784)               0         
                                                                 
 layer_normalization_4 (Lay  (None, 784)               1568      
 erNormalization)                                                
                                                                 
 layer_normalization_5 (Lay  (None, 784)               1568      
 erNormalization)                                                
                                                                 
 dense_4 (Dense)             (None, 10)                7850      
                                                                 
Total params: 10986 (42.91 KB)
Trainable params: 10986 (42.91 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [23]:

model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])
history = model.fit(X_train, y_train, epochs=5, validation_data=(X_valid, y_valid))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [27]:
# Comparing the sum of weights and biases we got to the custom normalization layer

norm_0 = model.get_layer("layer_normalization_4")
norm_1 = model.get_layer("layer_normalization_5")

weights_0, biases_0 = norm_0.get_weights()
print ("Norm 0: ", weights_0.sum(), biases_0.sum())

weights_1, biases_1 = norm_1.get_weights()
print ("Norm 1: ", weights_1.sum(), biases_1.sum())

Norm 0:  784.0 0.0
Norm 1:  784.0 0.0


### Exercise 13

Custom training loop

In [51]:
model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=[28, 28]),
        tf.keras.layers.Dense(300, activation="relu"),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax")
])

def random_batch(X, y, batch_size=32):
  idx = np.random.randint(len(X), size=batch_size)
  X_batch = X[idx]
  y_batch = y[idx]
  y_batch_one_hot = tf.keras.utils.to_categorical(y_batch, num_classes=num_classes)
  return X_batch, y_batch_one_hot

def print_status_bar(step, total, loss, metrics=None):
  metrics = " - ".join([f"{m.name}: {m.result()}" for m in [loss] + (metrics or [])])
  end = "" if step < total else "\n"
  print (f"\r{step}/{total} - " + metrics, end=end)

n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.categorical_crossentropy
mean_loss = tf.keras.metrics.Mean(name="mean_loss")
valid_loss = tf.keras.metrics.Mean(name="valid_loss")
metrics = [tf.keras.metrics.Accuracy()]

for epoch in range(1, n_epochs + 1):
  print ("Epoch {}/{}".format(epoch, n_epochs))
  for step in range(1, n_steps + 1):
    X_batch, y_batch = random_batch(X_train, y_train)
    with tf.GradientTape() as tape:
      # Forward pass
      y_pred = model(X_batch, training=True)
      main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
      loss = tf.add_n([main_loss] + model.losses)
    
    # Backpropagation
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    mean_loss(loss)
    for metric in metrics:
      metric(np.argmax(y_batch, axis=1), np.argmax(y_pred.numpy(), axis=1))
    
    print_status_bar(step, n_steps, mean_loss, metrics)
  

  # Print validation accuracy
  X_valid_batch, y_valid_batch = random_batch(X_valid, y_valid)
  y_valid_pred = model(X_valid_batch, training=True)
  valid_main_loss = tf.reduce_mean(loss_fn(y_valid_batch, y_valid_pred))
  valid_loss_n = tf.add_n([valid_main_loss] + model.losses)
  valid_loss(valid_loss_n)

  for metric in metrics:
    metric(np.argmax(y_batch, axis=1), np.argmax(y_pred.numpy(), axis=1)) 

  print ("Validation metrics: ")
  print_status_bar(step, n_steps, valid_loss, metrics)

  for metric in [mean_loss] + metrics:
    metric.reset_states()

Epoch 1/5
1718/1718 - mean_loss: 0.711555004119873 - accuracy: 0.77004510164260869
Validation metrics: 
1718/1718 - valid_loss: 0.4127207100391388 - accuracy: 0.7701061367988586
Epoch 2/5
1436/1718 - mean_loss: 0.4920012354850769 - accuracy: 0.82990944385528568

KeyboardInterrupt: 