<a id = 'Top'></a> 

# Keras 

In [1]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers # here, this module is referred to as layers and an instance of a Layer subclass is referred to as l

In [2]:
#Will be used later on
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255

y_train = y_train.astype("float32")
y_test = y_test.astype("float32")

x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

## Table of Contents
#### <a href = '#Making a Model (Built-in)'>Section: Making a Model (Built-in)</a>
#### <a href = '#Making a Model (Subclassing)'>Section: Making a Model (Subclassing)</a>
#### <a href = '#Training and Evaluation (Built-in)'>Section: Training and Evaluation (Built-in)</a>
#### <a href = '#Customizing Training'>Section: Customizing Training (Built-in)</a>
#### <a href = '#Index'>The Index</a>

<a id = 'Making a Model (Built-in)'></a>

## Making a Model (Built-in)

### Sequential Model 

#### Making Layers

This is the easiest model to make but it is also the least flexible. You should use sequential when you just have a plain stack of layers, one after the other, where each layer has one input tensor and one output tensor. Creating one is extremely simple. 

First note that you can get layers from tensorflow.keras.layers. You don't need to pass in the input dimensions because the first time you call a layer it internally calls a build() method that uses the input's shape to create the variables. One possible layer is:
* <b>layers.Dense(output_features, name = None, ...)</b> => Creates a classic fully connected layer with output_features neurons (the actual argument is called units). 

* NOTE: There are a ton more arguments you can pass here. Visit $ \hspace{2mm} $
<a href='https://keras.io/api/layers/regularizers/'>regularization</a> $\hspace{2mm}$ 
<a href = 'https://keras.io/api/layers/activations/'>activation functions</a> $ \hspace{2mm} $
<a href = 'https://keras.io/api/layers/initializers/'>initializers</a>


A couple of attributes of a specific layer (see <a href = 'https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer#attributes_1'>attr</a> for more):

* <b>l.activation</b> => Returns the activation function used by the layer (for layers such as Dense that have one)
* <b>l.trainable_weights</b> => Returns a list of all the variables that gradient descent can be applied to
* <b>l.non_trainable_weights</b> => Returns a list of all the variables not being trained
* <b>l.weights</b> => Returns the concatenation of the two lists above
* <b>l.name</b> => Returns the name of the layer
* <b>l.output</b> => Returns the symbolic output tensor of that layer (only available once the layer has been connected to an input, as in a Functional model)

Note here we use:
* <b>tf.random.normal(shape, name = None)</b> => Creates a tensor of the given shape filled with samples from a normal distribution; name optionally names the operation

In [45]:
dense_l = layers.Dense(2, name = 'dense_l', kernel_initializer = 'random_normal')
dense_l(tf.random.normal([1,5]))

<tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[ 0.17495963, -0.03159937]], dtype=float32)>
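
The deferred build() behaviour described above can be checked directly. The following is a minimal sketch, assuming a standard TensorFlow 2.x install; the layer name `demo_dense` and the variable names are only illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Before the first call the layer has not been built, so it owns no variables.
dense = layers.Dense(2, name="demo_dense")
weights_before = len(dense.weights)

# Calling the layer on a (1, 5) input triggers build(), which creates
# a kernel of shape (5, 2) and a bias of shape (2,).
_ = dense(tf.random.normal([1, 5]))
weights_after = len(dense.weights)
kernel_shape = tuple(dense.kernel.shape)

print(weights_before, weights_after, kernel_shape)
print([tuple(w.shape) for w in dense.trainable_weights])
```

The same pattern is a quick way to confirm which shapes a layer inferred from its first input.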

#### Making a Sequential Model 

You can create a Sequential model in one go with the first option, or build it up incrementally with the other two:
* <b>pass a list of layers/models to the Sequential constructor </b>
* <b>smodel.add(l)</b> => Appends l as a layer to the end
* <b>smodel.pop()</b> => Removes the layer at the end

You can also check what layers currently exist with 
* <b>smodel.layers</b> => Returns list of current layers 

Note that simply instantiating Dense layers won't create the weights and biases (because of how build() works). Normally you won't notice, since the first input you pass is used to create them. However, this also means that until then you can't use things like <b>model.summary()</b> or <b>model.weights</b>. If you want to specify the input shape up front, either of these works:
* <b>keras.Input(shape, name = None, ...)</b> => tells the model the shape of one sample, e.g. shape=(5,) for 5 input features. Don't include the batch size as that is assumed to be variable. dtype defaults to tf.float32
* <b>layers.Dense(..., input_shape, ...)</b> => passing in a tuple with the input shape to the first layer in a model automatically builds it as well

If you want to see a summary of what the model looks like you can use
* <b>smodel.summary()</b> => Prints a summary of what the model looks like. Useful in debugging when used with <b>add()</b>. Note that a None dimension simply means that dimension can have any size. For us this is because batch sizes can be variable
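
The incremental route with add() and pop() can be sketched as follows. This is an illustrative example rather than part of the original notebook; the model name `incremental` and the layer sizes are arbitrary choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Start from an empty Sequential model and grow it one layer at a time.
demo = keras.Sequential(name="incremental")
demo.add(keras.Input(shape=(5,)))          # fixes the input shape so the model is built
demo.add(layers.Dense(4, activation="relu"))
demo.add(layers.Dense(3, activation="relu"))
demo.add(layers.Dense(2))
n_before = len(demo.layers)                # the input is not counted as a layer here

# Remove the last Dense layer again and inspect the result.
demo.pop()
n_after = len(demo.layers)

demo.summary()
```

Calling summary() after every add() is a cheap way to watch the output shapes change while stacking layers.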

In [5]:
smodel = keras.Sequential([
        keras.Input(shape=(5,), name = 'inp'),
        layers.Dense(5, activation="relu", name = 'dense_1'),
        layers.Dense(3, activation="relu", name = 'dense_2'),
        layers.Dense(2, name = 'dense_3'),
], name = 'MySequential')

smodel.summary()

Model: "MySequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 5)                 30        
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 18        
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 8         
Total params: 56
Trainable params: 56
Non-trainable params: 0
_________________________________________________________________


### Functional API

#### The main idea
This API is a way to create models that are more flexible than Sequential. It can handle non-linear topologies, shared layers, and multiple inputs and outputs. It is based on the assumption that the model is a directed acyclic graph (no closed loops), so the API builds a graph of layers. Every time you call a layer on some tensor it makes a connection between that tensor and the layer. When you have multiple calls they all build up and form the graph. 

Specifically, you start with an input tensor (usually keras.Input). Then you call your layers (or models ;) ) in order on the input. Finally, you build the model with the first function below. You can also see what you have so far by (again) using the second one
* <b>keras.Model(inputs, outputs, name = None)</b> => creates a model from the graph of layers that leads from inputs to outputs
* <b>model.summary()</b> => Prints out what layers and output shapes the model has


In [4]:
inp = keras.Input(shape=(5,), name='inp')
x = layers.Dense(5, activation="relu", name = 'dense_1')(inp)
x = layers.Dense(3, activation="relu", name = 'dense_2')(x)
out = layers.Dense(2, name = 'dense_3')(x)

model = keras.Model(inp, out, name = 'my sequential')
model.summary()

Model: "my sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
inp (InputLayer)             [(None, 5)]               0         
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 30        
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 18        
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 8         
Total params: 56
Trainable params: 56
Non-trainable params: 0
_________________________________________________________________


#### Some things you can do 

You can reuse layers/models to create models, that is, you can proceed normally, but create another model using some input and output from within what you already had. See <a href='https://www.tensorflow.org/guide/keras/functional#all_models_are_callable_just_like_layers'>this</a> to see how this can be used to create an autoencoder model

You can also have multiple inputs and outputs and have layers/models just enter in the middle of your structure (still a graph). With these you can also assign your own losses to each output and weights on the losses too! See <a href='https://www.tensorflow.org/guide/keras/functional#all_models_are_callable_just_like_layers'>this</a> to see how this can be used to create something cool

By using layer.output you can also create a feature extraction model in one line of code. 
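
A feature extraction model built from layer.output might look like the sketch below. It rebuilds a small functional model like the one above so that it is self-contained; the names `base` and `extractor` are illustrative.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# A small functional model to extract features from.
inp = keras.Input(shape=(5,), name="inp")
x = layers.Dense(5, activation="relu", name="dense_1")(inp)
x = layers.Dense(3, activation="relu", name="dense_2")(x)
out = layers.Dense(2, name="dense_3")(x)
base = keras.Model(inp, out)

# The "one line": a model that returns the output of every non-input layer.
extractor = keras.Model(inputs=base.inputs,
                        outputs=[l.output for l in base.layers[1:]])

features = extractor(tf.random.normal([4, 5]))
shapes = [tuple(f.shape) for f in features]
print(shapes)
```

Because the extractor shares its layers with `base`, training one changes the activations the other returns.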

<a id = 'Making a Model (Subclassing)'></a>

## Making a Model (Subclassing)


#### Making layers for models

Layers are a key part of Keras. They encapsulate both a state (consisting of the layer's variables) and a transformation from inputs to outputs. Creating a layer looks the same as it would in tensorflow, except for the way building variables is done. To start, subclass <b>tf.keras.layers.Layer</b>. You will need to implement the following methods:

1. <b>\_\_init__(self)</b> => call super and store basic information about the layer like the output dimension. Hold off on specifying the input dimension. 
2. <b>build(self, input_shape)</b> => takes in an input_shape and creates the layer's variables accordingly through the method 
    * <b>self.add_weight(shape, initializer=None)</b> => Adds a trainable variable with the given shape and initializer. The initializer can be a string identifier or an initializer object (like <b>tf.random_normal_initializer()</b>)
3. <b>call(self, x)</b> => Put in whatever logic you use to transform the input to the output. If you are writing a layer that behaves differently in training and inference (like dropout) you can also accept the boolean argument training and keras will pass it in for you.
    
Note that if you assign a layer instance as an attribute of another layer (to create a block), the outer layer automatically tracks the variables of the inner layer. It is a good idea to create these inner layers in the <b>\_\_init__()</b> method. 


In [13]:
class MyDense(layers.Layer):
    
    def __init__(self, output_dimension, **kwargs):
        super(MyDense, self).__init__(**kwargs)
        self.output_dim = output_dimension
    
    def build(self, input_shape):
        self.W = self.add_weight(shape=(input_shape[-1], self.output_dim), initializer= tf.random_normal_initializer())
        self.B = self.add_weight(shape= (self.output_dim,), initializer = 'zeros')
    
    def call(self, x): 
        return tf.nn.tanh(x @ self.W + self.B)
    
class MyMLPBlock(layers.Layer):
    
    def __init__(self): #kwargs not needed but helpful for using all of keras
        super(MyMLPBlock, self).__init__()
        self.l1 = MyDense(300)
        self.l2 = MyDense(100)
        self.l3 = MyDense(10)
    
    def call(self, x):
        x = self.l1(x)
        x = self.l2(x)
        x = self.l3(x)
        return x
        
m = MyMLPBlock()
m(keras.layers.Input(784)) #This is optional but put here to demonstrate it works
m(tf.random.normal((1,784)))

<tf.Tensor: shape=(1, 10), dtype=float32, numpy=
array([[-0.14019845,  0.23774377, -0.3617659 ,  0.33904552,  0.26466587,
         0.22282706,  0.21561241, -0.11508423,  0.40910867,  0.12016662]],
      dtype=float32)>
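
The training argument mentioned for call() is not used in the block above. Here is a minimal sketch of a layer that does use it; `NoisyIdentity` is a made-up layer for illustration, not a Keras built-in.

```python
import tensorflow as tf
from tensorflow.keras import layers

class NoisyIdentity(layers.Layer):  # hypothetical layer, for illustration only
    """Adds Gaussian noise during training and does nothing at inference."""

    def __init__(self, stddev=0.1, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev

    def call(self, x, training=False):
        # Keras passes training=True inside fit() and False in predict()/evaluate().
        if training:
            return x + tf.random.normal(tf.shape(x), stddev=self.stddev)
        return x

noise_layer = NoisyIdentity(stddev=0.5)
x = tf.ones([2, 3])
same = noise_layer(x, training=False)   # unchanged input
noisy = noise_layer(x, training=True)   # input plus random noise
print(same.numpy())
print(noisy.numpy())
```

When such a layer is nested inside another layer or model, forwarding the outer call's training value to it keeps both modes consistent.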

#### Adding losses and metrics to your layers

Losses are quantities your model tries to minimize, and metrics are values you track during training. Note that every time you call the layer its losses are reset, so they only reflect the most recent forward pass. Additionally, any outer layers also track their inner layers' losses. To add a loss use the following.
* <b>self.add_loss(value)</b> => Adds value to this layer's losses. 

To retrieve the losses later on use 
* <b>layer.losses</b> => Returns a list of the losses that layer has

The way metrics work is that when you add a metric it adds a metric object that keeps track of whatever you make it keep track of. To add a metric use
* <b>self.add_metric(value, name)</b> => Makes specific layer track this metric. Name is useful to know which number is from which metric. 

To then retrieve the result you want to use this method of a metric
* <b>metric.result()</b> => Returns the current value of the metric



In [26]:
class MyDense(layers.Layer):
    
    def __init__(self, output_dimension, **kwargs):
        super(MyDense, self).__init__(**kwargs)
        self.output_dim = output_dimension
    
    def build(self, input_shape):
        self.W = self.add_weight(shape=(input_shape[-1], self.output_dim), initializer='random_normal')
        self.B = self.add_weight(shape= (self.output_dim,), initializer = 'zeros')
        self.B.assign_add(tf.fill(self.B.shape, 0.5)) #Increase bias so that activations are more likely above 0.5
    
    def call(self, x): 
        output = tf.nn.tanh(x @ self.W + self.B)
    
        #New stuff here:
        self.add_loss(tf.norm(self.W)**2)     
        num_under_half = tf.cast(tf.math.count_nonzero(tf.math.greater(0.5, output)), dtype=tf.float32)
        self.add_metric(num_under_half, name='num_under_half')
        
        return output

layer = MyDense(5)
layer(keras.layers.Input(4))
output = layer(tf.constant([[1,1,0,0]], dtype = tf.float32))

print(layer.losses)                
print(output)
print(layer.metrics[0].result())

[<tf.Tensor: shape=(), dtype=float32, numpy=0.035382714>]
tf.Tensor([[0.4227244  0.51615036 0.39882913 0.44017404 0.40596846]], shape=(1, 5), dtype=float32)
tf.Tensor(4.0, shape=(), dtype=float32)


#### Making models using layers

Generally layers are used to describe the inner computation blocks and models are used to define the outer model you will be training. The Model class has the same API as the Layer class but has a couple of extra features: you can use the built-in training methods <b>model.fit(), model.predict(), model.evaluate()</b>, you can use <b>model.layers</b> to see the layers the model has, and you can save and serialize your model. Basically, if you're training or saving you should use Model

Don't worry about the code that follows. That will be explained in the next section. For now just note the l2 loss gets factored in and the metric is seen in fit()!

In [13]:
class MyDense(layers.Layer):
    
    def __init__(self, output_dimension, activation=tf.nn.relu, **kwargs):
        super(MyDense, self).__init__(**kwargs)
        self.output_dim = output_dimension
        self.regular = 0.005
        self.activation = activation
    
    def build(self, inp):
        
        self.W = self.add_weight(shape=(inp[-1], self.output_dim))
        init = tf.math.divide(tf.random.normal([inp[-1], self.output_dim]), tf.sqrt(tf.cast(inp[-1], dtype = tf.float32)))
        self.W.assign(init)
        self.B = self.add_weight(shape= (self.output_dim,), initializer = 'zeros')
    
    def call(self, x): 
        l2 = self.regular*tf.norm(self.W)**2
        self.add_loss(l2)   
        self.add_metric(l2, name = 'l2_loss')
        return self.activation(x @ self.W + self.B)
    
class MyModel(keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.l1 = MyDense(300)
        self.l2 = MyDense(100)
        self.l3 = MyDense(10, tf.nn.softmax)
    
    def call(self, inputs):
        x = self.l1(inputs)
        x = self.l2(x)
        x = self.l3(x)
        return x
    
digit_predictor = MyModel()

optimizer = tf.keras.optimizers.SGD(learning_rate=7e-2)

digit_predictor.compile(
        optimizer=optimizer,
        loss="sparse_categorical_crossentropy",
        metrics=["sparse_categorical_accuracy"],
    )
digit_predictor.fit(x_train, y_train, epochs=2, batch_size = 64)

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x201861c7e50>

<a id = 'Training and Evaluation (Built-in)'></a>

## Training and Evaluation (Built-in)

In [4]:
#Code will be used frequently!

inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

#### Basic Info 

This section looks at how you can use built-in Keras methods to train and see your model's performance. When passing data to the built-in methods make sure it is either NumPy arrays or a <b>tf.data.Dataset</b> instance. This section is long so here is what you can expect:
* Specifying optimizers, losses and metrics 
* Custom losses
* Custom metrics
* Training and evaluating the model 
* Sample/Class Weighting 
* Models with multiple inputs and outputs
* Using callbacks 
* Learning Rate Schedules
* Using the tensorboard callback

#### Specifying optimizers, losses and metrics

Making a model is great but you also need to train it. Before training you need to specify the optimizer, loss, and optional metrics you want to track. These are all arguments you can pass into this method:
* <b>model.compile(optimizer, loss, metrics = None)</b> => Makes the model use the specified optimizer, loss, and metrics when training with built-in methods

You can find options for <a href = 'https://keras.io/api/optimizers'>optimizers</a>, <a href = 'https://keras.io/api/losses'>losses</a>, and <a href = 'https://keras.io/api/metrics'>metrics</a> in the keras docs. Since we're using built-in methods we can also just pass in the string identifier instead of the object. 

Here are some of the most useful ones:
1. Optimizers: <b>SGD()</b> (with or without momentum), <b>RMSprop()</b>, <b>Adam()</b>
2. Losses: <b>MeanSquaredError()</b> - regression, <b>CosineSimilarity()</b> - regression, <b>KLDivergence()</b> - probabilistic, <b>CategoricalCrossentropy()</b> - probabilistic
3. Metrics: <b>BinaryAccuracy()</b> - binary (0/1) targets, <b>CategoricalAccuracy()</b> - one-hot encoded targets, <b>SparseCategoricalAccuracy()</b> - integer targets

In [6]:
model.compile(optimizer = 'rmsprop', loss="sparse_categorical_crossentropy", metrics=["sparse_categorical_accuracy"])

#### Custom losses

To create custom losses you can do 2 things. The first way is to create a function that accepts two inputs, y_true and y_pred, and returns the loss.

If you need to accept other arguments you can subclass <b>tf.keras.losses.Loss</b>. You will need to implement
* <b>\_\_init__(self)</b> => accept parameters to use during call. Remember to call super
* <b>call(self, y_true, y_pred)</b> => Use the targets and predictions to compute the model's loss

Note that certain losses like regularization don't require targets and predictions. For those kinds of losses you can add losses from within a custom layer (as seen above)

In [7]:
def custom_mean_squared_error(y_true, y_pred):
    return tf.math.reduce_mean(tf.square(y_true-y_pred))

#model.compile(optimizer=keras.optimizers.Adam(), loss=custom_mean_squared_error, metrics = ['accuracy'])

class CustomMSE(keras.losses.Loss):
    def __init__(self, reg_factor=0.05, **kwargs):
        super().__init__(**kwargs)
        self.reg = reg_factor
    
    def call(self, y_true, y_pred):
        mse = tf.math.reduce_mean(tf.square(y_true-y_pred))
        confidence_unbooster = tf.math.reduce_mean(tf.square(0.5-y_pred)) #Reduce overfitting by making it less confident?
        return mse + self.reg * confidence_unbooster

model.compile(optimizer='Adam', loss = CustomMSE(name='nice'), metrics = ['accuracy'])

#Note that both need one-hot encoded labels so we would need to convert the sparse y_train. 
#y_train_one_hot = tf.one_hot(y_train, depth=10)

#### Custom Metrics

To create a custom metric you have to subclass <b>tf.keras.metrics.Metric</b>. You will need to implement 
1. <b>\_\_init__(self)</b> => Create state variables that you will need to track for your metric. You might need to use 
    * self.add_weight()
2. <b>update_state(self, y_true, y_pred, sample_weight=None)</b> => Use the targets, predictions, and optional weights to update the state variables keeping track of the metric
3. <b>result(self)</b> => Use the state variables to return the final result
4. <b>reset_states(self)</b> => Reinitialize the state of the metric (newer Keras releases call this method reset_state())

Note that certain metrics like the one we made above don't require targets and predictions. For those kinds of metrics you can use add_metric() from within a custom layer (as seen above). 

In [9]:
# Count how many samples were correctly identified as belonging to the right class
class CategoricalTruePositives(keras.metrics.Metric):
    def __init__(self, name = 'categorical_true_positives'):
        super(CategoricalTruePositives, self).__init__(name=name)
        self.true_positives = self.add_weight(name='ctp', initializer='zeros')
        
    def update_state(self, y_true, y_pred, sample_weight=None): 
        y_pred = tf.reshape(tf.argmax(y_pred, axis = 1), shape = (-1,1)) 
        values = tf.cast(y_true, "int32") == tf.cast(y_pred, "int32")
        values = tf.cast(values, "float32")
        
        if sample_weight is not None:
            sample_weight = tf.cast(sample_weight, tf.float32)
            values *= sample_weight
            
        self.true_positives.assign_add(tf.reduce_sum(values))
    
    def result(self):
        return self.true_positives
    
    def reset_states(self):
        self.true_positives.assign(0.0)
        
model.compile(optimizer='Adam', loss = 'sparse_categorical_crossentropy', metrics=[CategoricalTruePositives(), 'accuracy'])

#### Training and evaluating the model

Keras makes things super easy. All you need to do is call one function. If you're using <b>np.array</b> data use it like this
* <b>model.fit(x_train, y_train, batch_size = None, validation_split = 0.0, epochs = 1, validation_data = None)</b> => Trains the model for the specified number of epochs. If validation_split is given it holds out that fraction of the training data for validation. If validation_data (a tuple containing x_val and y_val) is given it uses that for validation instead. Note that validation_split should only be used with NumPy data 

You can also use a <b>tf.data.Dataset</b> object. To start we use the following methods to get the input into the form of a Dataset (note that although there are many different dataset classes they can all be used with fit). Finally, we use fit in a different way
* <b>tf.data.Dataset.from_tensor_slices((x_train, y_train))</b> => Takes arrays (here a tuple of inputs and targets) and creates a Dataset whose elements are slices along the first axis, i.e. one (x, y) pair per sample 
* <b>dataset.shuffle(buffer_size)</b> => Creates a buffer with the specified size. When getting data it chooses a random element in the buffer and then replaces it with the next element in the dataset. (So a buffer_size of 1 actually gets no shuffling). This returns a ShuffleDataset object
* <b>dataset.batch(batch_size)</b> => Returns a BatchDataset. This contains the same data but in batches of the size that you specify
* <b>model.fit(dataset, epochs=1, steps_per_epoch=None, validation_data=None, validation_steps=None)</b> => Trains the model on the dataset for the specified number of epochs. If you don't want to use the entire dataset per epoch you can use steps_per_epoch to specify how many batches you want to see per epoch. You can also pass in validation data and limit its batches the same way with validation_steps

You should use these depending on the circumstances. Usually you should use:
* NumPy data when your data is small and fits in memory 
* Dataset objects if your data is large and you need to do distributed training
* Sequence objects if you have large datasets and you need to use a lot of custom Python processing. See <a href='https://www.tensorflow.org/guide/keras/train_and_evaluate#other_input_formats_supported'>this</a> to learn more
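
The buffer behaviour described for dataset.shuffle() is easier to see in plain Python. The sketch below is a simplified model of the idea, not TensorFlow's actual implementation; `buffered_shuffle` and its `seed` parameter are hypothetical names introduced here.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Simplified model of shuffling a stream through a fixed-size buffer."""
    rng = random.Random(seed)
    stream = iter(items)

    # Fill the buffer with the first buffer_size elements.
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            break

    # Emit a random buffered element, then refill its slot from the stream.
    out = []
    for item in stream:
        idx = rng.randrange(len(buffer))
        out.append(buffer[idx])
        buffer[idx] = item

    # The stream is exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    out.extend(buffer)
    return out

print(buffered_shuffle(range(10), buffer_size=1))   # a buffer of 1 cannot reorder anything
print(buffered_shuffle(range(10), buffer_size=4))   # elements only move a limited distance
print(buffered_shuffle(range(10), buffer_size=10))  # buffer covers the data: full shuffle
```

This is why the earlier cell passes x_train.shape[0] as the buffer size: only a buffer at least as large as the dataset gives a uniform shuffle.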

In [10]:
data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
data = data.shuffle(x_train.shape[0]).batch(64)
model.fit(data, epochs=2, steps_per_epoch=100)

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x1f2e493eee0>

In [11]:
# Use one of the above compiling schemes
model.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=1)



<tensorflow.python.keras.callbacks.History at 0x1f28c286d30>

#### Sample/Class weighting

Sometimes your data doesn't have a lot of samples of one class. In these cases it is a good idea to weight them. By default every sample counts equally, so a class's influence on the loss simply follows its frequency in the dataset, but you can set the weights yourself as well. To weight by class use:
* <b>model.fit(..., class_weight=None, ...)</b> => Weights the data during training based on class_weight. This argument should be a dictionary with keys being the indexes of the classes (0 to n-1) and the values being the weight of the corresponding class.

When weighting by samples it depends on whether you are using NumPy or Dataset. With NumPy use the former in fit() and with Dataset use the latter when creating the dataset:
* <b>model.fit(..., sample_weight=None, ...)</b> => Gives each specific sample a weight based on sample_weight. This argument should be an array with one number per sample in your dataset. Each index should hold the weight of the corresponding sample. 
* <b>tf.data.Dataset.from_tensor_slices((x_train, y_train, sample_weight))</b> => Here sample_weight should be the exact same as above

EXAMPLES NOT SHOWN HERE. CHECK OUT <a href='https://www.tensorflow.org/guide/keras/train_and_evaluate#using_sample_weighting_and_class_weighting'>DOCS</a> TO LEARN MORE
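
The linked docs cover the fit() calls themselves. As a small complement, here is one way to build the weights in plain Python. The inverse-frequency rule used below is a common heuristic chosen for illustration, not something Keras applies on its own, and `balanced_class_weights` is a hypothetical helper.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(int(label) for label in labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}

# Toy labels: class 0 is four times as frequent as class 1.
labels = [0] * 8 + [1] * 2

class_weight = balanced_class_weights(labels)
# The equivalent per-sample weights, one number per sample.
sample_weight = [class_weight[label] for label in labels]

print(class_weight)
print(sample_weight)

# Either form could then be handed to training, for example:
#   model.fit(x, y, class_weight=class_weight)
#   model.fit(x, y, sample_weight=np.array(sample_weight))
```

With these weights each class contributes the same total weight to the loss, which is the point of the heuristic.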

#### Models with multiple inputs and outputs

Sometimes your models have multiple inputs and outputs and they might need different losses and different metrics that they need to follow. Furthermore, each loss might need to be weighted differently. To accomplish this use:
* <b>model.compile(loss, metrics, loss_weights)</b> => instead of passing one object pass in a list for loss and metrics. If you gave the output layers a name you could also pass in a dict and have the keys be the names. If you don't want to train on one output layer you can also make its corresponding loss None. loss_weights can be a list or a dictionary containing the real valued weights.

Similarly, when using fit(), pass in lists (or dicts keyed by input/output name) of NumPy arrays, or a Dataset object that yields such structures. 

In [65]:
inp1 = keras.Input(shape=(9,), name='in1')
inp2 = keras.Input(shape=(4,), name='in2')

x = layers.concatenate([inp1, inp2])

out1 = layers.Dense(4, name='out1')(x)
out2 = layers.Dense(2, name='out2')(x)

model = keras.Model(inputs=[inp1, inp2], outputs=[out1, out2])

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
    loss= {'out1' : keras.losses.MeanSquaredError(), 'out2' : keras.losses.CategoricalCrossentropy()},
    loss_weights = [0.3,1],
    metrics=[[keras.metrics.MeanAbsolutePercentageError(), keras.metrics.MeanAbsoluteError()], 
             [keras.metrics.CategoricalAccuracy()]])

# Dummy NumPy data
x_1 = np.random.random_sample(size=(1000,9))
x_2 = np.random.random_sample(size=(1000, 4))
y_1 = np.random.random_sample(size=(1000, 4))
y_2 = np.random.random_sample(size=(1000, 2))

# Fit with inputs as a dict keyed by input name and targets as a list
model.fit({'in1':x_1, 'in2':x_2}, [y_1, y_2], batch_size=32, epochs=2)

Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x182c31314c0>

#### Using Callbacks 
Callbacks are objects that are called at different points during training (start/end of epoch/batch etc). They have many applications like checkpointing, implementing learning rate schedules, sending an email when training is done, basically anything dynamic. They get passed as a list to the callbacks argument of fit(), much like metrics are passed to compile(). Here are some useful callbacks:
* <b>keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=1e-2, patience=2, verbose=1)</b> => Implements early stopping for your model. monitor is the name of the quantity you are monitoring. min_delta is the smallest change that counts as an improvement. patience is how many epochs without improvement it waits before stopping. verbose decides how much detail is in the updates it provides. 
* <b>keras.callbacks.ModelCheckpoint(filepath="mymodel_{epoch}", save_best_only=True, monitor="val_loss", verbose=1, save_freq="epoch")</b> => Saves the model. filepath is where you save the model. save_best_only and monitor make it so that a new checkpoint is only saved if whatever you are monitoring has improved. verbose once again decides how much it displays. save_freq decides how often it saves: "epoch" saves at the end of every epoch, and an integer saves after that many batches. Check out the <a href='https://www.tensorflow.org/guide/keras/train_and_evaluate#checkpointing_models'>DOCS</a> to see it in action being more organized (having its own directory, etc.)

If you want to create your own custom callback you would need to subclass <b>keras.callbacks.Callback</b> and use functions you can check out <a href='https://www.tensorflow.org/guide/keras/custom_callback/'>here</a>
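
A custom callback can be very small. The sketch below records the loss after every epoch; `LossHistory`, the tiny model, and the random data are all made up for illustration.

```python
import numpy as np
from tensorflow import keras

class LossHistory(keras.callbacks.Callback):  # hypothetical example callback
    """Collects the training loss reported at the end of each epoch."""

    def on_train_begin(self, logs=None):
        self.epoch_losses = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.epoch_losses.append(logs.get("loss"))

# A throwaway regression model and random data, just to drive the callback.
tiny_model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])
tiny_model.compile(optimizer="sgd", loss="mse")

history_cb = LossHistory()
tiny_model.fit(np.random.rand(32, 4), np.random.rand(32, 1),
               epochs=3, verbose=0, callbacks=[history_cb])
print(history_cb.epoch_losses)
```

Other hooks such as on_train_batch_end follow the same pattern of receiving an index and a logs dictionary.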

In [12]:
callbacks = [keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=1e-2, patience=2, verbose=1), 
            keras.callbacks.ModelCheckpoint(filepath="mymodel_{epoch}", save_best_only=True, monitor="val_loss", verbose=1)] 

model.fit(x_train, y_train, epochs=20, batch_size=64, callbacks=callbacks, validation_split=0.2)

Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.15724, saving model to mymodel_1
INFO:tensorflow:Assets written to: mymodel_1\assets
Epoch 2/20

Epoch 00002: val_loss improved from 0.15724 to 0.14180, saving model to mymodel_2
INFO:tensorflow:Assets written to: mymodel_2\assets
Epoch 3/20

Epoch 00003: val_loss improved from 0.14180 to 0.13010, saving model to mymodel_3
INFO:tensorflow:Assets written to: mymodel_3\assets
Epoch 4/20

Epoch 00004: val_loss did not improve from 0.13010
Epoch 5/20

Epoch 00005: val_loss improved from 0.13010 to 0.12584, saving model to mymodel_5
INFO:tensorflow:Assets written to: mymodel_5\assets
Epoch 00005: early stopping


<tensorflow.python.keras.callbacks.History at 0x1f28e68e820>

#### Learning Rate Schedules

You can either have a static schedule which has all the options preset, or you can have a dynamic schedule that uses callbacks to decide what to do. 

If you are using a static schedule, first create the object using the function below (other built-in schedules include: PiecewiseConstantDecay, PolynomialDecay, and InverseTimeDecay). Then pass that object in as the argument learning_rate to the optimizer: 
* <b>keras.optimizers.schedules.ExponentialDecay(initial_learning_rate, decay_steps, decay_rate, staircase=False)</b> => Creates a learning rate schedule object which calculates the rate with the following formula: initial_learning_rate * decay_rate ^ (step / decay_steps). Making staircase True makes that exponent use integer division, so the rate drops in discrete jumps every decay_steps steps. Think about what the formula means.
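
To get a feel for the formula, it can be evaluated by hand. The function below is a plain-Python restatement of the formula in the bullet above, not the Keras class itself, and the numbers are arbitrary.

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step, staircase=False):
    """Learning rate after `step` optimizer steps under exponential decay."""
    if staircase:
        exponent = step // decay_steps   # integer division: rate changes in jumps
    else:
        exponent = step / decay_steps    # smooth decay
    return initial_lr * decay_rate ** exponent

# Halve the rate every 100 steps, starting from 0.1.
for step in (0, 50, 100, 150, 200):
    smooth = exponential_decay(0.1, 0.5, 100, step)
    stepped = exponential_decay(0.1, 0.5, 100, step, staircase=True)
    print(step, round(smooth, 5), round(stepped, 5))
```

With staircase=True the rate stays flat between multiples of decay_steps, so steps 100 and 150 share the same value.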

If you are using a dynamic schedule you can make your own callback or even just use the ReduceLROnPlateau callback

In [13]:
lr_schedule = keras.optimizers.schedules.ExponentialDecay(0.1, decay_steps=100000, decay_rate=0.96, staircase=True)
optimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)

#### Using the Tensorboard Callback

Tensorboard is epic for visualizing your model so this is nice to use. To start, since you have installed tensorflow with pip (right?), you can launch tensorboard from inside the notebook by first running <b>%load_ext tensorboard</b> and then <b>%tensorboard --logdir=/full_path_to_your_logs</b>, or from the command line by taking out the %

Then just use the tensorboard callback:
* <b>keras.callbacks.TensorBoard(log_dir, histogram_freq=0, embeddings_freq=0,  update_freq="epoch")</b> => Lets you use TensorBoard. log_dir should be the FULL PATH to your logs. The frequencies tell it how often (in epochs) to log those components. update_freq decides how often it writes logs in the first place.  

CURRENTLY ON THIS MACHINE SOME ERROR OCCURS THAT PREVENTS TENSORBOARD FROM PROPERLY RUNNING

In [None]:
path = "C:/Users/megat/Yash/yashcoding/Python Coding/jupyter/Reinforcement Learning"
callbacks = [keras.callbacks.TensorBoard(log_dir=path, histogram_freq=0, embeddings_freq=0, update_freq="epoch")]
model.fit(x_train, y_train, epochs=2, batch_size=64, callbacks=callbacks, validation_split=0.2)

<a id = 'Customizing Training'></a>

## Customizing Training

#### Basic info

Using fit() is nice and easy but sometimes you need additional functionality. There are basically two different things you can do: either control what happens inside fit() or write a training loop completely from scratch. Both methods are relatively easy to use, the latter allowing you more control over what is happening. 

#### Customizing fit() and evaluate()

To customize what happens in fit() you need to create a class that subclasses <b>keras.Model</b>. With that in hand all that is left to do is to override this function:
* <b>train_step(self, data)</b> => A function that gets run in fit to train your model. This function should use the data passed into it to train the model and update its variables (and metrics). At the very end it should return a dictionary with the metric names as keys and their current results as values. 

It is useful to use the following attributes and methods in your code: 
* <b>self.compiled_loss(target, pred, sample_weight = None, regularization_losses = None)</b> => Calculates the loss using the loss function passed in compile. sample_weight is the importance of each sample, and regularization_losses are extra losses that get added on top, like the losses from each layer 
* <b>self.losses</b> => Returns a list of all the layer losses
* <b>self.trainable_variables</b> => Returns a list containing all the variables that are trainable
* <b>self.optimizer.apply_gradients(g_v_tuples)</b> => Using the optimizer passed in compile updates variables based on input. Input should be a list containing tuples of gradient - variable pairs. It is common to zip() tape.gradient() and trainable variables to accomplish this.  
* <b>self.compiled_metrics</b> => An object containing all the metrics passed in compile. Useful to call update_state() and result() from it during the train_step.
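To see what the zip() of gradients and variables actually does, here is a tiny sketch outside of any model: one Dense layer, one SGD step, with numbers small enough to check by hand. The layer setup and the squared-output loss are made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# One weight per input feature, all starting at 1.0, no bias
layer = keras.layers.Dense(1, use_bias=False, kernel_initializer="ones")
optimizer = keras.optimizers.SGD(learning_rate=0.1)
x = tf.constant([[1.0, 2.0]])
layer(x)  # the first call builds the layer so its weights exist

with tf.GradientTape() as tape:
    out = layer(x)                  # 1*1 + 1*2 = 3
    loss = tf.reduce_sum(out ** 2)  # 3^2 = 9

# d(loss)/d(kernel) = 2 * out * x, one gradient per trainable variable
grads = tape.gradient(loss, layer.trainable_weights)

# Pair every gradient with its variable and let the optimizer apply the update
optimizer.apply_gradients(zip(grads, layer.trainable_weights))

kernel = layer.get_weights()[0]
print(kernel.ravel())  # old kernel - 0.1 * gradient
```

The gradient here is [6, 12], so plain SGD moves the kernel from [1, 1] to about [0.4, -0.2].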

If for some reason you don't want to pass metrics in compile you can also create and update them directly in train_step. For reset_states() to still be called on them automatically, expose them through a property called metrics and leave them out of compile. (The CustomModel cell below passes its metrics in compile instead.)

Customizing evaluate also works in the exact same way. The only difference would be that you are not updating your trainable_variables and you are instead overriding test_step()
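Here is a sketch of both ideas together: a model that owns its loss and metrics (exposed through the metrics property) and overrides test_step() next to train_step(). The class name `TrackedModel`, the random regression data and the layer sizes are all made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


class TrackedModel(keras.Model):
    """Hypothetical model that keeps its own loss and metrics instead of using compile()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.loss_fn = keras.losses.MeanSquaredError()
        self.loss_tracker = keras.metrics.Mean(name="loss")
        self.mae_metric = keras.metrics.MeanAbsoluteError(name="mae")

    @property
    def metrics(self):
        # Listing the metrics here lets fit() and evaluate() reset them automatically
        return [self.loss_tracker, self.mae_metric]

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.loss_fn(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.loss_tracker.update_state(loss)
        self.mae_metric.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, data):
        # Same bookkeeping as train_step, minus the tape and the variable update
        x, y = data
        y_pred = self(x, training=False)
        self.loss_tracker.update_state(self.loss_fn(y, y_pred))
        self.mae_metric.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}


inputs = keras.Input(shape=(8,))
outputs = keras.layers.Dense(1)(inputs)
model = TrackedModel(inputs, outputs)
model.compile(optimizer="adam")  # no loss or metrics passed here

x = np.random.random((64, 8)).astype("float32")
y = np.random.random((64, 1)).astype("float32")
history = model.fit(x, y, epochs=1, batch_size=16, verbose=0)
results = model.evaluate(x, y, batch_size=16, verbose=0, return_dict=True)
print(sorted(results))
```

Only the optimizer goes through compile here, everything else lives on the model itself.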


In [47]:
import numpy as np

class CustomModel(keras.Model):
    
    def train_step(self, data):       
        if len(data) == 3:
            x, y, sample_weight = data
        else:
            x, y = data
            sample_weight = None
        
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            # Pass sample_weight along so weighted samples also affect the loss
            loss = self.compiled_loss(y, y_pred, sample_weight=sample_weight, regularization_losses=self.losses)
        grad = tape.gradient(loss, self.trainable_variables)
        
        self.optimizer.apply_gradients(zip(grad, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
        
        return {m.name: m.result() for m in self.metrics}

        
# Construct and compile an instance of CustomModel
inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(100)(inputs)
outputs = keras.layers.Dense(10)(x)
model = CustomModel(inputs, outputs)
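# The targets are integer class labels and the last layer outputs raw scores (logits),
# so use a sparse categorical loss and accuracy rather than regression ones
model.compile(optimizer="adam", loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=["sparse_categorical_accuracy"])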

# Just use `fit` as usual
model.fit(x_train, y_train, epochs=3, batch_size = 64)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x1af0a049100>

#### Creating a custom training loop

To write your own training loop you will have to use <b>tf.GradientTape()</b>.

You can handle metrics by calling the following three functions of a metric at the right time:
* <b>metric.update_state(y_true, y_pred)</b> => Call after every batch is done
* <b>metric.result()</b> => Get the results to display the metric when you want to (typically at the end of epoch)
* <b>metric.reset_states()</b> => Clear the data the metric has stored (typically also at the end of epoch)
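A quick sketch of that update / result / reset cycle on four hand-made predictions. The data is invented, and the `getattr` line is only there because newer TensorFlow releases name the reset method reset_state() instead of reset_states().

```python
import numpy as np
from tensorflow import keras

metric = keras.metrics.SparseCategoricalAccuracy()

y_true = np.array([0, 1, 2, 1])
y_pred = np.array([
    [0.8, 0.1, 0.1],  # predicts class 0 (correct)
    [0.2, 0.7, 0.1],  # predicts class 1 (correct)
    [0.3, 0.6, 0.1],  # predicts class 1 (wrong, target is 2)
    [0.1, 0.8, 0.1],  # predicts class 1 (correct)
], dtype="float32")

# After a batch: feed targets and predictions into the metric
metric.update_state(y_true, y_pred)
accuracy = float(metric.result())  # 3 of 4 correct

# End of epoch: clear the stored counts (method name depends on the TF version)
reset = getattr(metric, "reset_state", None) or metric.reset_states
reset()
after_reset = float(metric.result())

print(accuracy, after_reset)
```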

It is a good idea to put the computation-heavy parts into a graph. To do this decorate a function with
* <b>@tf.function()</b> => Traces a graph of whatever is inside, which can speed up computation a lot. In a simple test the code below ran about 3x faster with tf.function() than without it.
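A small sketch of what the decorator does: the same function run eagerly and as a traced graph gives the same numbers, only the execution mode differs. The function `dense_relu` and the tensor shapes are made up for illustration, and how much faster the graph version is depends heavily on the model and hardware.

```python
import tensorflow as tf


def dense_relu(x, w, b):
    """Plain Python function that runs op by op (eagerly)."""
    return tf.nn.relu(tf.matmul(x, w) + b)


# Wrapping it traces the ops into a graph the first time it is called;
# this is the same as decorating the function with @tf.function
fast_dense_relu = tf.function(dense_relu)

x = tf.random.normal((64, 784))
w = tf.random.normal((784, 100))
b = tf.zeros((100,))

eager_out = dense_relu(x, w, b)
graph_out = fast_dense_relu(x, w, b)

# Both versions compute the same thing
max_diff = float(tf.reduce_max(tf.abs(eager_out - graph_out)))
print(graph_out.shape, max_diff)
```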

Your structure should basically be:
1. Make the model & Instantiate the loss, optimizer, and metrics & Get the data into the desired format
2. Create a trainstep function that 
    - runs the model under a tape context
    - updates the metrics and does gradient descent
    - is decorated by tf.function()
3. Iterate over the epochs
    - iterate over batches and run trainstep/ display any tracking information
    - display metrics and other information at end of epoch

In [24]:
@tf.function()
def train_step(x, y):
    # Open a tape context to record operations so we can do gradient descent 
    with tf.GradientTape() as tape:
        pred = model(x, training=True)  # training=True so layers like dropout run in training mode
        loss = loss_fn(y, pred)
        loss += sum(model.losses) # Add in the regularization losses from the layers as well!
        
    # Update metric
    train_acc_metric.update_state(y, pred)

    # Do Gradient Descent!
    gradient = tape.gradient(loss, model.trainable_weights)
    optimizer.apply_gradients(zip(gradient, model.trainable_weights))

# Make the model 
inp = keras.Input(shape=(784,))
x = layers.Dense(100)(inp)
out = layers.Dense(10, activation=tf.nn.softmax)(x) #Our loss function is probabilistic 
model = keras.Model(inputs=inp, outputs=out)

# Instantiate the loss, optimizer, metrics
optimizer = keras.optimizers.SGD(learning_rate=0.05, momentum=0.04, name="SGD")
loss_fn = keras.losses.SparseCategoricalCrossentropy()
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()

# Get the data into a dataset (easier to shuffle and take batches)
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size=64)
num_steps = len(train_dataset)

# Iterate over epochs and batches
for epoch in range(3):
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        
        # Use the fast training step
        train_step(x_batch_train, y_batch_train)
        
        # Progress check
        print(f"Epoch {epoch}, {step}/{num_steps} steps done", end='\r')
    
    # Get the results of the metric and display them
    acc = train_acc_metric.result()
    train_acc_metric.reset_states()
    print(f"Finished Epoch {epoch} with accuracy {acc:.4f}")

Finished Epoch 0 with accuracy 0.8713
Finished Epoch 1 with accuracy 0.9084
Finished Epoch 2 with accuracy 0.9143


<a id = 'Index'> </a>

## Index

<a href = '#Top'>Back to top?</a>

#### <a href = '#Making a Model (Built-in)'>Section: Making a Model(Built-in)</a>
* <b>layers.Dense(output_features, name = 'None'...)</b> => Creates your favorite classic layer with output_features neurons at the end.  
NOTE: There are many more arguments you can pass here; visit $ \hspace{2mm} $
<a href='https://keras.io/api/layers/regularizers/'>regularization</a> $\hspace{2mm}$ 
<a href = 'https://keras.io/api/layers/activations/'>activation functions</a> $ \hspace{2mm} $
<a href = 'https://keras.io/api/layers/initializers/'>initializers</a>
* <b>l.activation</b> => Returns the activation function used by the specific layer
* <b>l.trainable_weights</b> => Returns a list of all the variables that gradient descent can be applied on
* <b>l.non_trainable_weights</b> => Returns a list of all the variables not being trained
* <b>l.weights</b> => Returns the concatenation of both of the lists above
* <b>l.name</b> => Returns name of layer
* <b>l.output</b> => Returns output of that layer
* <b>tf.random.normal(shape, name = None)</b> => creates an array from a normal distribution with shape shape and name name
* <b>pass a list of layers/models to the Sequential constructor </b>
* <b>smodel.add(l)</b> => Appends l as a layer to the end
* <b>smodel.pop()</b> => Pops layer at the end
* <b>smodel.layers</b> => Returns list of current layers 
* <b>layers.Input(input_features, name = 'None'...)</b> => tells the model the number of features in the input
* <b>layers.Dense(..., input_shape, ...)</b> => passing in a tuple with the input shape to the first layer in a model automatically builds it as well
* <b>smodel.summary()</b> => Prints a summary of what the model looks like. Useful in debugging when used with <b>add()</b>. Note that the None dimension simply means that dimension can have any shape. For us this is because batch sizes can be variable
* <b>keras.Model(inputs, outputs, name = None)</b> => creates a model using the functions used to get from input to output
* <b>model.summary()</b> => Prints out what layers and output shapes the model has

#### <a href = '#Making a Model (Subclassing)'>Section: Making a Model (Subclassing)</a>
* <b>tf.keras.layers.Layer</b> => Class to deal with layers
* <b>\_\_init__()</b> => Method to initialize object. Should be implemented 
* <b>build()</b> => Method to initialize variables when first input is passed. Should be implemented 
* <b>call()</b> => Method to transform input to output of layer. Should be implemented
* <b>self.add_weight(shape, initializer=None)</b> => Adds a trainable variable with the specific shape and initializer provided.
* <b>tf.random_normal_initializer()</b> => Initializer object. Many other exist, also possible to use string identifiers
* <b>self.add_loss(value)</b> => Makes specific layer also have this loss. 
* <b>layer.losses</b> => Returns the loss that layer has
* <b>self.add_metric(value, name)</b> => Makes specific layer track this metric. Name is useful to know which number is from which metric. 
* <b>metric.result()</b> => Returns the value it stored  

<b>model.layers</b>

#### <a href = '#Training and Evaluation (Built-in)'>Section: Training and Evaluation (Built-in)</a>

* <b>model.compile(optimizer, loss, metrics = None)</b> => Makes the model use specified functions when training with built-in methods
* <b>model.compile(loss, metrics, loss_weights)</b> => instead of passing one object pass in a list for loss and metrics. If you gave the output layers a name you could also pass in a dict and have the keys be the names. If you don't want to train on one output layer you can also make its corresponding loss None. loss_weights can be a list or a dictionary containing the real valued weights.

<b>tf.keras.losses.Loss</b>
* <b>\_\_init__(self)</b> => accept parameters to use during call. Remember to call super
* <b>call(self, y_true, y_pred)</b> => Use the targets and predictions to compute the model's loss

<b>tf.keras.metrics.Metric</b> 
1. <b>\_\_init__(self)</b> => Create state variables that you will need to track for your metric. You might need to use 
    * self.add_weight()
2. <b>update_state(self, y_true, y_pred, sample_weight=None)</b> => Use the targets, predictions, and optional weights to update the state variables keeping track of the metric
3. <b>result(self)</b> => Use the state variables to return the final result
4. <b>reset_states(self)</b> => Reinitialize the state of the metric

1. Optimizers: <b>SGD()</b> (with or without momentum), <b>RMSprop()</b>, <b>Adam()</b>
2. Losses: <b>MeanSquaredError()</b> - regression, <b>CosineSimilarity()</b> - regression, <b>KLDivergence()</b> - probabilistic, <b>CategoricalCrossentropy</b> - probabilistic
3. Metrics: <b>BinaryAccuracy()</b> - 2 targets, <b>CategoricalAccuracy()</b> - one-hot encoded targets, <b>SparseCategoricalAccuracy()</b> - integer targets  

<b>np.array</b> 
* <b>model.fit(x_train, y_train, batch_size = None, validation_split = 0.0, epochs = 1, validation_data = None)</b> => Trains the model for specified epochs. If validation_split is given it takes away that fraction of the data to reserve for validation. If validation_data (a tuple containing x_val and y_val) is given it uses that for validation. Note that validation_split should only be used with numpy data 

<b>tf.data.Dataset</b> 
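* <b>tf.data.Dataset.from_tensor_slices((x_train, y_train))</b> => Takes a tuple of arrays and slices them along the first dimension, so each element of the dataset is one (x, y) pair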
* <b>dataset.shuffle(buffer_size)</b> => Creates a buffer with the specified size. When getting data it chooses a random element in the buffer and then replaces it with the next element in the dataset. (So buffer_size of 1 actually gets no shuffling). This returns a ShuffleDataset Object
* <b>dataset.batch(batch_size)</b> => Returns BatchDataset. This contains data but in batches with size that you specify
* <b>model.fit(dataset, epochs=1, steps_per_epoch=None, validation_data=None, validation_steps=None)</b> => Trains the model on the dataset for specified number of epochs. If you don't want to use the entire dataset per epoch you can use steps_per_epoch to specify how many batches you want to see per epoch. You can also pass in validation data and do the same thing with steps with the last two arguments

<b>keras.callbacks.Callback</b>
* <b>keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=1e-2, patience=2, verbose=1)</b> => Implements Early stopping into your model. monitor is a string that refers to the name of what you are monitoring. min_delta is the threshold below which it doesn't count as improving. patience is how many epochs you want to give it a chance. verbose decides how much detail is in the updates it provides. 
* <b>keras.callbacks.ModelCheckpoint(filepath="mymodel_{epoch}", save_best_only=True, monitor="val_loss", verbose=1, save_freq=None)</b> => Saves the model. filepath is where you save the model. save_best_only and monitor make it so that the current checkpoint gets overwritten only if whatever you are monitoring has improved. verbose once again decides how much it displays. save_freq decides how many batches pass before it saves. Check out the <a href='https://www.tensorflow.org/guide/keras/train_and_evaluate#checkpointing_models'>DOCS</a> to see it in action being more organized (having its own directory, etc.)

<b>Other</b>
* <b>model.fit(..., class_weight=None, ...)</b> => Weights the data during training based on class_weight. This argument should be a dictionary with keys being the class indices (0 to n-1) and the values being the weight of the corresponding class.
* <b>model.fit(..., sample_weight=None, ...)</b> => Gives each specific sample weightage based on sample_weight. This argument should be an array of numbers with one entry per sample in your dataset. Each index should hold the weight of the corresponding sample. 
* <b>keras.optimizers.schedules.ExponentialDecay(initial_learning_rate, decay_steps, decay_rate, staircase=False)</b> => Creates a learning rate schedule object which calculates the rate with the following formula: initial_learning_rate * decay_rate ^ (step / decay_steps). Making staircase True makes that exponent use integer division. In other words, the rate gets multiplied by decay_rate once every decay_steps steps.

NOTE TENSORBOARD DOESN'T CURRENTLY WORK ON THIS MACHINE

#### <a href = '#Customizing Training'>Section: Customizing Training (Built-in)</a>

* <b>train_step(self, data)</b> => A function that gets run in fit to train your model. This function should use the data passed into it to train the model and update its variables (and metrics). At the very end it should return a dictionary with the metric names as keys and their current results as values. 
* <b>self.compiled_loss(target, pred, sample_weight = None, regularization_losses = None)</b> => Calculates the loss using the loss function passed in compile. sample_weight is the importance of each sample, and regularization_losses are extra losses that get added on top, like the losses from each layer 
* <b>self.losses</b> => Returns a list of all the layer losses
* <b>self.trainable_variables</b> => Returns a list containing all the variables that are trainable
* <b>self.optimizer.apply_gradients(g_v_tuples)</b> => Using the optimizer passed in compile updates variables based on input. Input should be a list containing tuples of gradient - variable pairs. It is common to zip() tape.gradient() and trainable variables to accomplish this.  
* <b>self.compiled_metrics</b> => An object containing all the metrics passed in compile. Useful to call update_state() and result() from it during the train_step.
* <b>metric.update_state(y_true, y_pred)</b> => Call after every batch is done
* <b>metric.result()</b> => Get the results to display the metric when you want to (typically at the end of epoch)
* <b>metric.reset_states()</b> => Clear the data the metric has stored (typically also at the end of epoch)
* <b>@tf.function()</b> => Traces a graph of whatever is inside, which can speed up computation a lot. In a simple test the training loop in that section ran about 3x faster with tf.function() than without it.