In [63]:
import tensorflow as tf
from tensorflow import keras

In [64]:
import numpy as np
import pandas as pd
import scipy as sp

***
***
# Quick Overview of TensorFlow

High level summary:
- Highly efficient computational library (like NumPy)
- Can optimize via a JIT (just-in-time) compiler by creating a "computation graph"
- Graphs can be exported in one environment (Python on Linux) and run in another (Java on Android)
- Lowest level is implemented in highly efficient C++ code
- Distribute computations across multiple devices and servers

Some important features/packages:
- AI front-end... tf.keras
- Data loading and pre-processing... tf.data, tf.io
- Image processing... tf.image
- Signal processing... tf.signal




## Using TensorFlow like Numpy

**Tensor**: Multidimensional array (like [ndarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html)). Could also hold a scalar.

***
## Tensors and Operations

We can create **constant** tensors:

In [3]:
tfmatrixexample = tf.constant([[1.0, 2.0, 3.0],[3.0, 4.0, 5.0]])
print(tfmatrixexample)

tf.Tensor(
[[1. 2. 3.]
 [3. 4. 5.]], shape=(2, 3), dtype=float32)


In [4]:
tfscalarexample = tf.constant([42])
print(tfscalarexample)

tf.Tensor([42], shape=(1,), dtype=int32)


We can **index** tensors like we can with NumPy arrays:

In [5]:
tfmatrixexample[:, 1:]

<tf.Tensor: id=5, shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [4., 5.]], dtype=float32)>

In [6]:
tfmatrixexample[-1, 0]

<tf.Tensor: id=9, shape=(), dtype=float32, numpy=3.0>

In [7]:
tfmatrixexample[..., 1, tf.newaxis]

<tf.Tensor: id=13, shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [4.]], dtype=float32)>

We can perform **tensor operations** with some simple calls:

In [8]:
tfmatrixexample + 10.0

<tf.Tensor: id=15, shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [13., 14., 15.]], dtype=float32)>

In [9]:
tf.square(tfmatrixexample)

<tf.Tensor: id=16, shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [ 9., 16., 25.]], dtype=float32)>

We can use the **@** symbol for matrix multiplication:

In [10]:
tfmatrixexample @ tf.transpose(tfmatrixexample)

<tf.Tensor: id=19, shape=(2, 2), dtype=float32, numpy=
array([[14., 26.],
       [26., 50.]], dtype=float32)>

## TensorFlow and NumPy Interchange

TensorFlow defaults to 32-bit floats, since that is usually precise enough and it uses less memory and compute. NumPy defaults to 64-bit floats. So, when using the two together, create the NumPy arrays with 32-bit precision:

> dtype=float32

Some interchangeable operations are seen below:

In [11]:
a = np.array([2., 4., 5.])
tf.constant(a)

<tf.Tensor: id=20, shape=(3,), dtype=float64, numpy=array([2., 4., 5.])>

In [12]:
tfmatrixexample.numpy()

array([[1., 2., 3.],
       [3., 4., 5.]], dtype=float32)

In [13]:
tf.square(a)

<tf.Tensor: id=22, shape=(3,), dtype=float64, numpy=array([ 4., 16., 25.])>

In [14]:
np.square(tfmatrixexample)

array([[ 1.,  4.,  9.],
       [ 9., 16., 25.]], dtype=float32)

In [15]:
a = a.astype(np.float32)  # converts the values; assigning to a.dtype only reinterprets the raw bytes
tf.square(a)

<tf.Tensor: id=24, shape=(3,), dtype=float32, numpy=array([ 4., 16., 25.], dtype=float32)>

## Type Conversions

Type conversions are computationally expensive and easy to overlook, so avoid them where you can. They're done automatically in NumPy, but NOT in TensorFlow. See the error below:

In [16]:
tf.constant(2.0) + tf.constant(40)

InvalidArgumentError: cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a int32 tensor [Op:AddV2] name: add/
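
When a conversion really is needed, it has to be requested explicitly. A minimal sketch using `tf.cast` (the variable names here are just for illustration):

```python
import tensorflow as tf

x = tf.constant(2.0)   # float32
n = tf.constant(40)    # int32

# Cast explicitly so both operands share a dtype before adding.
total = x + tf.cast(n, tf.float32)
print(total)  # a float32 scalar tensor holding 42.0
```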

## TensorFlow Variables

tf.**constant**: Immutable, so not good for a weight structure.

tf.**Variable**: Mutable.

In [None]:
tfvar_example = tf.Variable([[1., 2., 3.], [3., 4., 5.]])
print(tfvar_example)

Update the values or slices in place with the **.assign()** method. 

In [17]:
tfvar_example.assign(2 * tfvar_example)


In [18]:
tfvar_example[0, 1].assign(42)


In [19]:
tfvar_example[:, 2].assign([0., 1.])


In [20]:
tfvar_example.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])

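
A self-contained run of the same update calls, assuming TensorFlow 2.x in eager mode. The short name `v` is only for this sketch, and the comments track the variable's contents after each step:

```python
import tensorflow as tf

v = tf.Variable([[1., 2., 3.], [3., 4., 5.]])

v.assign(2 * v)             # [[2, 4, 6], [6, 8, 10]]
v[0, 1].assign(42)          # [[2, 42, 6], [6, 8, 10]]
v[:, 2].assign([0., 1.])    # [[2, 42, 0], [6, 8, 1]]

# Update individual cells: (0, 0) becomes 100 and (1, 2) becomes 200.
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])
print(v.numpy())            # [[100, 42, 0], [6, 8, 200]]
```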

### Other Data Structures

| Data Structure | TensorFlow | Description |
| --- | --- | --- |
| Sparse Tensor | tf.SparseTensor | Tensor that contains mostly zeroes. |
| Tensor Array | tf.TensorArray | List of Tensors of the same shape and data type. Length can be made dynamic. |
| Ragged Tensor | tf.RaggedTensor | Static list of lists of Tensors of same shape and data types. |
| String Tensor | tf.string | Byte strings (encoded in utf8). Can convert to Unicode. |
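| Sets | tf.sets package | Sets are represented as regular (or sparse) Tensors and manipulated with the operations in tf.sets. |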
| Queues | tf.queue package | Implementing FIFO, Random, Priority, etc. queues of Tensors. |
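
A quick sketch of two of these structures, assuming TensorFlow 2.x. The values are made up for illustration:

```python
import tensorflow as tf

# Sparse tensor: store only the non-zero entries and their positions.
sparse = tf.SparseTensor(indices=[[0, 1], [1, 0]],
                         values=[1., 2.],
                         dense_shape=[2, 3])
dense = tf.sparse.to_dense(sparse)
print(dense.numpy())         # [[0, 1, 0], [2, 0, 0]]

# Ragged tensor: rows may have different lengths.
ragged = tf.ragged.constant([[1, 2], [3]])
row_lengths = ragged.row_lengths()
print(row_lengths.numpy())   # [2, 1]
```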



***
***
# Customizing Models and Training Algorithms

Often, a custom component can be defined as a...

- function (def func():)
- class (class MyFunc(): )

The rule of thumb is that when a hyperparameter or aspect of the class needs to be saved to the model, the class implementation is probably the way to go. Plus, it looks fancier.

***
## Custom Implementations - Loss Functions

Pretending the Huber Loss function is not available in...

> keras.losses.Huber

... we can create a function that takes labels and predictions as arguments and uses Tensors to compute the Huber loss function. Keep a few things in mind:

- Use TensorFlow operations to take advantage of graph features
- Vectorize inputs when you can
- Preferable to return Tensor w/ one loss per instance

Implementation below:

In [21]:
def huber_fn(y_true, y_pred):
    """Per-instance Huber loss with a fixed threshold of 1.
    """
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss  = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

An example on how to call this new loss function is seen with the line below...

> model.compile(loss=huber_fn, optimizer="nadam")
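
Before plugging the function into a model, it can be checked directly on a few values. A small sketch with made-up inputs, chosen so that one error falls in the quadratic region and two in the linear region:

```python
import tensorflow as tf

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

y_true = tf.constant([0.0, 0.0, 0.0])
y_pred = tf.constant([0.5, 1.0, 3.0])

# One loss per instance: 0.5**2 / 2, then 1.0 - 0.5, then 3.0 - 0.5.
losses = huber_fn(y_true, y_pred)
print(losses.numpy())  # [0.125, 0.5, 2.5]
```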



***
## Saving Models that Contain Custom Components

- When saving the model, the model will save the _name_ of the custom function. 
- When loading the model, provide a dict that maps the function name to the function definition.

An example is seen below:

> model = keras.models.load_model("my_model_with_a_custom_loss.h5",
>                                 custom_objects={"huber_fn": huber_fn})

### Passing Hyperparameters to the Custom Function

#### Via Nested Methods

If we want to pass something to this loss function, it is a good idea to encapsulate that loss function definition in an initializing function. For example, if we wanted to change the threshold of the Huber function, we can do so like this:

In [22]:
def create_huber(threshold=1.0):
    """Huber loss function initializer.
    """
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss  = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

Then it can be called when fitting to data as...

> model.compile(loss=create_huber(threshold=2.0), optimizer="nadam")

When loading this sort of function, the hyperparameter is not going to be saved, so it has to be passed again. Also, what Keras saves is the nested function inside the "constructor", which in the case above is the "huber_fn". So, loading a model that was built with the "create_huber" function as the loss function would be done as...

> model = keras.models.load_model("my_model_with_a_custom_loss_threshold_2.h5",
>                                 custom_objects={"huber_fn": create_huber(2.0)})

#### Via Class Definition

Keras API supports custom class definition for... 

- layers
- models
- callbacks
- regularizers

Other components such as...

- losses
- metrics
- initializers
- constraints

may not port to other Keras implementations. But subclassing them can still be useful for your targeted tasks.

The reason it can still be useful is it can solve the problem of saving the configuration (including the hyperparameters you use) by creating a custom "keras.losses.Loss" class:

In [23]:
class HuberLoss(keras.losses.Loss):
    """Custom HuberLoss class.
    """
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss  = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

- \_\_init__:
 - Accepts "**kwargs" to pass to the parent constructor (keras.losses.Loss.\_\_init__(**kwargs)). Includes all the args [here](https://www.tensorflow.org/api_docs/python/tf/keras/losses/Loss)
 
- call:
 - Takes labels and predictions, computes the instance losses, and returns them
 
- get_config:
 - Returns a dict mapping each hyperparameter name to its value. It starts from the parent's config and adds whatever other hyperparameters your custom class uses, which in this case is "threshold".

Use the above class definition as...

> model.compile(loss=HuberLoss(2.), optimizer="nadam")

Load the model that uses the above class as...

> model = keras.models.load_model("my_model_with_a_custom_loss_class.h5",
>                                 custom_objects={"HuberLoss": HuberLoss})

Saving the model automatically calls the loss instance's "get_config()" method and saves the configuration as JSON in the HDF5 file. Loading the model calls the class's "from_config()" class method, which by default passes that saved configuration to the class constructor as keyword arguments.
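
The save/load round trip can be imitated by hand without writing a file. This is only a sketch of the idea, assuming TensorFlow 2.x; the sample values are made up, and the real loading code does more than this:

```python
import tensorflow as tf
from tensorflow import keras

class HuberLoss(keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

loss = HuberLoss(threshold=2.0)
config = loss.get_config()                 # dict that includes "threshold": 2.0
restored = HuberLoss.from_config(config)   # rebuilds the loss from that dict

y_true = tf.constant([[0.0], [0.0]])
y_pred = tf.constant([[0.5], [3.0]])
mean_loss_value = float(restored(y_true, y_pred))
print(restored.threshold, mean_loss_value)  # 2.0 and the mean of 0.125 and 4.0
```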

***
## Custom Components - Activation Functions, Initializers, Regularizers, and Constraints

Examples seen below:

In [24]:
def my_softplus(z): # return value is just tf.nn.softplus(z)
    return tf.math.log(tf.exp(z) + 1.0)

def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

def my_positive_weights(weights): # return value is just tf.nn.relu(weights)
    return tf.where(weights < 0., tf.zeros_like(weights), weights)

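The arguments depend on the type of function: the activation function receives the layer's pre-activation outputs, the initializer receives the shape and dtype of the weights to create, and the regularizer and constraint each receive the weights.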

Use them as...:

In [25]:
layer = keras.layers.Dense(30, activation=my_softplus,
                           kernel_initializer=my_glorot_initializer,
                           kernel_regularizer=my_l1_regularizer,
                           kernel_constraint=my_positive_weights)

Some notes...
- Activation function ("my_softplus()") applied to output of the Dense layer; result is passed to the next layer
- Weights initialized with the value returned by the initializer ("my_glorot_initializer()")
- At each training step, weights are passed to the regularizer ("my_l1_regularizer()") and the output is added to the loss
- Constraint function ("my_positive_weights()") is called after each training step, and the layer's weights are replaced by the constrained weights it returns

To create custom classes, have your class inherit...
- Initializer: "keras.initializers.Initializer"
- Regularizer: "keras.regularizers.Regularizer"
- Constraint: "keras.constraints.Constraint"
- Arbitrary Layer (including activation functions): "keras.layers.Layer"

Again, classes can save and retrieve hyperparameters when a model is saved. An example is seen below:

In [26]:
class MyL1Regularizer(keras.regularizers.Regularizer):
    """L1 regularizer with a configurable factor.
    """
    def __init__(self, factor):
        self.factor = factor
    def __call__(self, weights):
        return tf.reduce_sum(tf.abs(self.factor * weights))
    def get_config(self):
        return {"factor": self.factor}
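
A short sketch of what the class does when it is called directly, assuming TensorFlow 2.x. The weight values are made up:

```python
import tensorflow as tf
from tensorflow import keras

class MyL1Regularizer(keras.regularizers.Regularizer):
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, weights):
        return tf.reduce_sum(tf.abs(self.factor * weights))

    def get_config(self):
        return {"factor": self.factor}

regularizer = MyL1Regularizer(0.01)
weights = tf.constant([[1.0, -2.0], [3.0, -4.0]])

penalty = regularizer(weights)      # 0.01 * (1 + 2 + 3 + 4)
config = regularizer.get_config()   # what gets saved with the model
print(float(penalty), config)
```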

Note...

- Use "\_\_call__():" for...
 - initializers
 - regularizers
 - constraints
 
- Use "call():" for..
 - losses
 - arbitrary layers (including activation functions)
 - models

***
## Custom Components - Metrics

Losses and metrics are not the same. Losses need to be differentiable so that the model can be trained. Metrics have fewer constraints (for instance, don't have to be differentiable for Gradient Descent, like losses do) and are used to evaluate the final product. 

That said, creation is a similar process. Could even use the Huber Loss function we defined above as a metric:

> model.compile(loss="mse", optimizer="nadam", metrics=[create_huber(2.0)])

Some notes on this:
- For each metric, Keras keeps track of its mean since the beginning of the epoch, which is good but not always what we need.
- Precision is one such case: averaging the per-batch precisions gives the wrong answer. Instead, we need to keep track of the running totals of true positives and false positives and compute the ratio from those, which is what this class does...


> keras.metrics.Precision()



In [27]:
# Example of using the Precision class
precision = keras.metrics.Precision()
precision([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1])

<tf.Tensor: id=93, shape=(), dtype=float32, numpy=0.8>

In [28]:
precision([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0])

<tf.Tensor: id=141, shape=(), dtype=float32, numpy=0.5>

The above is an example of a _streaming metric_ (also called a stateful metric): the precision (the numpy=__ field in the output) is updated batch after batch from the accumulated counts of true and false positives, rather than by averaging the precisions of the batches together. It's a subtle difference.
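
To make the difference concrete, the two batches from the cells above can be worked through with plain NumPy. This is a sketch of the arithmetic only, not of how Keras implements the metric:

```python
import numpy as np

batches = [
    ([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1]),
    ([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0]),
]

tp_total, fp_total = 0, 0
per_batch_precision = []
for y_true, y_pred in batches:
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    per_batch_precision.append(tp / (tp + fp))
    tp_total += tp
    fp_total += fp

streaming = tp_total / (tp_total + fp_total)             # 4 / (4 + 4) = 0.5
mean_of_batches = float(np.mean(per_batch_precision))    # (0.8 + 0.0) / 2 = 0.4
print(streaming, mean_of_batches)
```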

We can see the result of the current metric, get its stored variables, or reset its state:

In [29]:
precision.result()

<tf.Tensor: id=151, shape=(), dtype=float32, numpy=0.5>

In [30]:
precision.variables

[<tf.Variable 'true_positives:0' shape=(1,) dtype=float32, numpy=array([4.], dtype=float32)>,
 <tf.Variable 'false_positives:0' shape=(1,) dtype=float32, numpy=array([4.], dtype=float32)>]

In [31]:
precision.reset_states()

In [32]:
precision.result()

<tf.Tensor: id=167, shape=(), dtype=float32, numpy=0.0>

In [33]:
precision.variables

[<tf.Variable 'true_positives:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>,
 <tf.Variable 'false_positives:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]

For a custom _streaming metric_, inherit from the Keras class...

> keras.metrics.Metric 

An example for the Huber Loss can be seen below:

In [34]:
class HuberMetric(keras.metrics.Metric):
    """Streaming Huber metric: mean Huber loss over all instances seen so far.
    """
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs) # handles base args (e.g., dtype)
        self.threshold = threshold
        self.huber_fn = create_huber(threshold)
        # pass the name as a keyword so it is never mistaken for another argument
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")
    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        self.total.assign_add(tf.reduce_sum(metric))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))
    def result(self):
        return self.total / self.count
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

Walkthrough:
- self.add_weight() - creates the variables that keep track of the metric's state (sum of all Huber losses in self.total, and number of instances seen so far in self.count). That's why it's in the \_\_init_\_ constructor.
- update_state(*args) - method that is called when an instance of the class is called as a function (did this with the Precision() class)
- result() - computes and returns the final result (update_state() is called first when the metric is called as a function)
- get_config() - method to save any hyperparameters (in this case the threshold as a member of the JSON dict, which again, gets passed into the constructor when it is loaded in)
- reset_states() - inherited in the example above; by default it resets all of the metric's variables to 0.

For any metric that can't simply be averaged over batches, a class implementation is best (or required).
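
A usage sketch for the class, assuming TensorFlow 2.x. The two small batches are made up, and the state is created with keyword `name=` arguments:

```python
import tensorflow as tf
from tensorflow import keras

def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

class HuberMetric(keras.metrics.Metric):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold
        self.huber_fn = create_huber(threshold)
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        metric = self.huber_fn(y_true, y_pred)
        self.total.assign_add(tf.reduce_sum(metric))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))

    def result(self):
        return self.total / self.count

m = HuberMetric(2.0)
# Batch 1: losses 0.125 and 4.0, so total = 4.125 over 2 instances.
first = float(m(tf.constant([[0.0], [0.0]]), tf.constant([[0.5], [3.0]])))
# Batch 2: loss 0.5, so total = 4.625 over 3 instances.
second = float(m(tf.constant([[0.0]]), tf.constant([[1.0]])))
print(first, second)
```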

*** 
## Custom Components - Layers

Will need this if...
- using an architecture with a very contemporary and exotic layer
- for convenience with repetitive architectures (A B C A B C A B C A B C becomes D D D D, where D is a custom layer containing A, B, and C)

### Layers with No Weights

Examples we've seen:
- keras.layers.Flatten
- keras.layers.ReLU

Solution:
- Write a function.
- Wrap it in keras.layers.Lambda

In [35]:
# Example of using the Keras Lambda layer for layers 
# without weights.
exponential_layer = keras.layers.Lambda(lambda x: tf.exp(x))

Can use these layers like any other layer, or as activation functions. The exponential is sometimes used at the output of regression models whose targets span very different scales, for example when predicting an RF output power or any other signal usually measured in logarithmic units like dB.
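
Calling the Lambda layer directly on a tensor shows that it simply applies the wrapped function. A small sketch with made-up inputs, assuming TensorFlow 2.x in eager mode:

```python
import tensorflow as tf
from tensorflow import keras

exponential_layer = keras.layers.Lambda(lambda x: tf.exp(x))

# One instance with three features; the layer applies exp() elementwise.
outputs = exponential_layer(tf.constant([[0.0, 1.0, 2.0]]))
print(outputs.numpy())  # approximately [[1.0, 2.718, 7.389]]
```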



### Layers with Weights

Create a subclass of...

> keras.layers.Layer

A simplified version of the Dense layer is seen below:


In [36]:
class MyDense(keras.layers.Layer):
    """Simplified version of Keras Dense layer.
    """
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)

    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=[batch_input_shape[-1], self.units],
            initializer="glorot_normal")
        self.bias = self.add_weight(
            name="bias", shape=[self.units], initializer="zeros")
        super().build(batch_input_shape) # must be at the end

    def call(self, X):
        return self.activation(X @ self.kernel + self.bias)

    def compute_output_shape(self, batch_input_shape):
        return tf.TensorShape(batch_input_shape.as_list()[:-1] + [self.units])

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units,
                "activation": keras.activations.serialize(self.activation)}

Walkthrough:
- **\_\_init__**: takes the hyperparameters (in the example above, units and activation) as inputs along with the special kwargs dict, which carries any other arguments through to the parent class, keras.layers.Layer.
- **build**: Called when the layer is first used. It creates the layer's variables by calling add_weight() for each weight. Keras passes the shape of the layer inputs to the build method so that it knows how many weights are needed (for example, the size of the connection weights matrix, known as the "kernel"). Must call the parent's build method at the end to let Keras know the layer has been built.
- **call**: performs the operations needed for that layer. In this case, it multiplies the inputs by the kernel, adds the bias, and applies the activation function. Other layers could do other things, like 2D convolution, or filtering, etc.
- **compute_output_shape**: returns the shape of the layer outputs.
- **get_config**: just like in the other custom classes, returns the hyperparameters of the layer so that they are saved with the model
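
The lazy behavior of build can be observed by checking the layer before and after its first call. The sketch below is a trimmed copy of the layer (compute_output_shape and get_config are left out to keep it short) and assumes TensorFlow 2.x in eager mode:

```python
import tensorflow as tf
from tensorflow import keras

class MyDense(keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)

    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=[batch_input_shape[-1], self.units],
            initializer="glorot_normal")
        self.bias = self.add_weight(
            name="bias", shape=[self.units], initializer="zeros")
        super().build(batch_input_shape)

    def call(self, X):
        return self.activation(tf.matmul(X, self.kernel) + self.bias)

layer = MyDense(3, activation="relu")
built_before = layer.built          # no weights exist yet

outputs = layer(tf.ones((2, 4)))    # first call triggers build((2, 4))
built_after = layer.built

print(built_before, built_after)
print(tuple(layer.kernel.shape), tuple(outputs.shape))  # (4, 3) and (2, 3)
```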

Multiple inputs to a layer (like Concatenate) require special attention:
- The argument to **call** is a tuple containing all the inputs, and the argument to **compute_output_shape** is a tuple containing each input's batch shape
- Recall that this must be used with Functional or Subclassing Keras API (Sequential takes one input and one output).

Example of this can be seen below:

In [37]:
class MyMultiLayer(keras.layers.Layer):
    def call(self, X):
        X1, X2 = X
        return [X1 + X2, X1 * X2, X1 / X2]

    def compute_output_shape(self, batch_input_shape):
        b1, b2 = batch_input_shape
        return [b1, b1, b1] # should probably handle broadcasting rules

For different behavior between training and testing (like with some RNNs or with Dropout/Batch Normalization):
- Add training argument to **call** method. 
- Example is a layer to add Gaussian noise during training but not during testing

In [38]:
class MyGaussianNoise(keras.layers.Layer):
    def __init__(self, stddev, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev

    def call(self, X, training=None):
        if training:
            noise = tf.random.normal(tf.shape(X), stddev=self.stddev)
            return X + noise
        else:
            return X

    def compute_output_shape(self, batch_input_shape):
        return batch_input_shape
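
The effect of the training flag can be checked by calling the layer both ways. A sketch assuming TensorFlow 2.x in eager mode, with a made-up input and a trimmed copy of the layer:

```python
import tensorflow as tf
from tensorflow import keras

class MyGaussianNoise(keras.layers.Layer):
    def __init__(self, stddev, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev

    def call(self, X, training=None):
        if training:
            noise = tf.random.normal(tf.shape(X), stddev=self.stddev)
            return X + noise
        else:
            return X

layer = MyGaussianNoise(stddev=1.0)
X = tf.ones((2, 3))

noisy = layer(X, training=True)    # noise added, as during fit()
clean = layer(X, training=False)   # inputs returned unchanged, as during predict()
print(noisy.numpy())
print(clean.numpy())
```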

*** 
## Custom Components - Models

### Subclassing API

Just like above, we will create our own class that inherits the keras.Model parent class.

![Custom Model](custom_model_tensorflow.PNG)

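We'll give an example of building a small block of a model and then connecting the entire thing together. The model contains residual blocks: each block passes its input through a couple of dense layers and then adds the block's input back to the result. In the code below, the data travels through the first residual block four times (the loop runs over range(1 + 3)) before heading to the next one.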

In [39]:
class ResidualBlock(keras.layers.Layer):
    def __init__(self, n_layers, n_neurons, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(n_neurons, activation="elu",
                                          kernel_initializer="he_normal")
                       for _ in range(n_layers)]

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        return inputs + Z

In [40]:
class ResidualRegressor(keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden1 = keras.layers.Dense(30, activation="elu",
                                          kernel_initializer="he_normal")
        self.block1 = ResidualBlock(2, 30)
        self.block2 = ResidualBlock(2, 30)
        self.out = keras.layers.Dense(output_dim)

    def call(self, inputs):
        Z = self.hidden1(inputs)
        for _ in range(1 + 3):
            Z = self.block1(Z)
        Z = self.block2(Z)
        return self.out(Z)

For the **ResidualBlock** class, Keras automatically detects that the self.hidden attribute contains layers, so their variables are tracked as part of the block.


Using the **Subclassing API**, we can define the model itself. The constructor creates the layers, which are then used in the call method. Z holds the intermediate output as it flows from one layer to the next. It is fed through the first ResidualBlock four times (the loop runs over range(1 + 3)) before it moves on to the second. We must implement **get_config** in both ResidualBlock and ResidualRegressor if we want to save the model with its save() method and load it with the keras.models.load_model() function.



### Losses and Metrics Based on Model Internals

Some losses are not going to be dependent on just the predictions or the outputs. Some losses need to be defined by the weights or the activations of hidden layers.

These types of losses can be created using the inherited self.add_loss() method. 

Example below creates a model with..
- 5 hidden layers
- 1 regular output layer and 1 aux output layer
- Loss will be associated with aux output layer 
 - **reconstruction loss**, used in generative modeling
 - Mean Square Difference b/w reconstruction and inputs 
 - Encourages preserving info through the layers



In [41]:
class ReconstructingRegressor(keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(30, activation="selu",
                                          kernel_initializer="lecun_normal")
                       for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)

    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = keras.layers.Dense(n_inputs)
        super().build(batch_input_shape)

    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        self.add_loss(0.05 * recon_loss)
        return self.out(Z)

Walkthrough:
- **\_\_init__**: Creates 5 hidden layers, 30 neurons each, with 1 output layer
- **build**: Creates an extra dense layer to reconstruct the inputs. It is created in build so that its number of units can match the number of inputs, which is unknown until build is called.
- **call**: 
 - Processes the input through all 5 hidden layers and passes the result through the reconstruction layer.
 - Computes the reconstruction loss and adds it to the model's list of losses with the add_loss() method, scaled down by a tunable hyperparameter (0.05 here)
 - Passes the output of the hidden layers to the output layer and returns the output
 
 

For custom metrics:
- Create a metric object (an instance of a keras.metrics class, built-in or custom) in the **\_\_init__** method.
- Call that metric in the **call** method, passing it whatever aspect of the model is needed to compute said metric.
- Add it to the model with the inherited self.add_metric() method such that Keras will display said metric during training.

### Computing Gradients using Autodiff

Consider the equation below:

$$ f(w_1 , w_2) = 3w_{1}^{2} + 2w_1 w_2 $$

We can analytically perform these partial derivatives quickly:

$$ \frac{\partial f}{\partial w_1} = 6w_1 + 2w_2 $$

$$ \frac{\partial f}{\partial w_2} = 2w_1 $$

And then we can immediately evaluate these partial derivatives for any given values of $w_1$ and $w_2$ (at $w_1 = 5$, $w_2 = 3$ they are 36 and 10). But neural networks span a much larger and much more complex parameter space, where deriving the gradients by hand is impractical.

To better navigate this space, TensorFlow uses **autodiff**. 
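
For contrast, the gradients can also be approximated numerically with finite differences. The sketch below is plain Python (the step size `eps` is an arbitrary choice); it needs one extra function call per parameter and only gives an approximation, which is why it does not scale to large networks:

```python
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2 = 5.0, 3.0
eps = 1e-6  # small step used for the approximation

# One extra evaluation of f per parameter.
df_dw1 = (f(w1 + eps, w2) - f(w1, w2)) / eps   # approximately 36
df_dw2 = (f(w1, w2 + eps) - f(w1, w2)) / eps   # approximately 10
print(df_dw1, df_dw2)
```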

In [42]:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

In [43]:
w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)

gradients = tape.gradient(z, [w1, w2])

In [44]:
gradients

[<tf.Tensor: id=209, shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: id=201, shape=(), dtype=float32, numpy=10.0>]

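Calling tape.gradient() more than once on the same tape results in a RuntimeError, because the tape is erased right after the first call. If it has to be done, create the tf.GradientTape with persistent=True, then delete it when you are done to free its resources: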

In [45]:
with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1) # => tensor 36.0
dz_dw2 = tape.gradient(z, w2) # => tensor 10.0, works fine now!
del tape

By default, the tape only tracks operations involving variables. To compute gradients with respect to any other tensor, explicitly tell the tape to watch it, as in the code below.

In [46]:
c1, c2 = tf.constant(5.), tf.constant(3.)
with tf.GradientTape() as tape:
    tape.watch(c1)
    tape.watch(c2)
    z = f(c1, c2)

gradients = tape.gradient(z, [c1, c2]) # returns [tensor 36., tensor 10.]

This is useful when you want to, for example, penalize activations that vary significantly when the inputs vary only a little.

Important notes:
- Typically, just taking the gradient of one value w/ respect to a set of parameters
 - Grad of loss with respect to model parameters
- Reverse-mode autodiff
 - Gets all gradients w/ one forward and one reverse pass
- Can compute the **Jacobian** (the individual gradients of each element of a vector output, rather than the gradient of the sum of its elements) with the tape's jacobian() method, and can compute **Hessians** (second-order partial derivatives), if needed.
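
Second derivatives can be obtained by nesting one tape inside another. A sketch for the same $f(w_1, w_2)$, assuming TensorFlow 2.x in eager mode (the tape names are just illustrative):

```python
import tensorflow as tf

def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2 = tf.Variable(5.), tf.Variable(3.)

with tf.GradientTape(persistent=True) as hessian_tape:
    with tf.GradientTape() as jacobian_tape:
        z = f(w1, w2)
    # First derivatives, recorded by the outer tape: [6*w1 + 2*w2, 2*w1]
    jacobians = jacobian_tape.gradient(z, [w1, w2])

# Differentiate each first derivative again with respect to both variables.
hessians = [hessian_tape.gradient(jacobian, [w1, w2])
            for jacobian in jacobians]
del hessian_tape

# Expected: [[6, 2], [2, None]]; the last entry is None because
# df/dw2 = 2*w1 does not depend on w2.
print(hessians)
```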

Can also stop gradients from flowing through part of the computation during backpropagation using tf.stop_gradient() (it returns its input unchanged on the forward pass and treats it like a constant on the backward pass).

In [47]:
def f(w1, w2):
    return 3 * w1 ** 2 + tf.stop_gradient(2 * w1 * w2)

with tf.GradientTape() as tape:
    z = f(w1, w2) # same result as without stop_gradient()

gradients = tape.gradient(z, [w1, w2]) # => returns [tensor 30., None]

If you ever run into gradients returning NaNs due to numerical precision problems or the like, you can define custom gradients. We do this for a softplus function below as an example. The decorated function returns both its regular output and a function that computes its gradients.

In [48]:
@tf.custom_gradient
def my_better_softplus(z):
    exp = tf.exp(z)
    def my_softplus_gradients(grad):
        return grad / (1 + 1 / exp)
    return tf.math.log(exp + 1), my_softplus_gradients
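
The difference shows up with a large input, where float32 `exp()` overflows. A sketch comparing the plain softplus with the custom-gradient version, assuming TensorFlow 2.x in eager mode (the input value 100 is an arbitrary choice):

```python
import tensorflow as tf

def my_softplus(z):
    return tf.math.log(tf.exp(z) + 1.0)

@tf.custom_gradient
def my_better_softplus(z):
    exp = tf.exp(z)
    def my_softplus_gradients(grad):
        return grad / (1 + 1 / exp)
    return tf.math.log(exp + 1), my_softplus_gradients

z = tf.constant(100.0)
with tf.GradientTape(persistent=True) as tape:
    tape.watch(z)                    # z is a constant, so watch it explicitly
    naive = my_softplus(z)
    better = my_better_softplus(z)

naive_grad = tape.gradient(naive, z)     # NaN: autodiff multiplies 0 by infinity
better_grad = tape.gradient(better, z)   # 1.0: 1 / (1 + 1/inf)
del tape
print(naive_grad.numpy(), better_grad.numpy())
```

Note that the forward value still overflows to infinity in both versions; only the gradient is fixed here.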

### Custom Training Loops

Prefer the Keras fit() method when you can, as custom training loops make for longer code that is harder to maintain. However, some cases (say, a network from a paper whose paths use two different optimizers) need a custom training loop.

Below is an example network we will use to build the custom loop around.


In [49]:
l2_reg = keras.regularizers.l2(0.05)
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="elu", kernel_initializer="he_normal",
                       kernel_regularizer=l2_reg),
    keras.layers.Dense(1, kernel_regularizer=l2_reg)
])

We can define custom functions to...
- randomly sample a batch of instances from the training data (though TensorFlow Data API has a function that will do the same thing)
- display the training status with various custom metrics

In [50]:
def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

In [51]:
def print_status_bar(iteration, total, loss, metrics=None):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result())
                         for m in [loss] + (metrics or [])])
    end = "" if iteration < total else "\n"
    print("\r{}/{} - ".format(iteration, total) + metrics,
          end=end)

Using these custom functions, we can define the custom training loop. 



In [53]:
mnist = keras.datasets.mnist.load_data()

X_train, y_train = mnist[0]
X_test, y_test = mnist[1]

In [54]:
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(learning_rate=0.01)  # "lr" is a deprecated alias
loss_fn = keras.losses.mean_squared_error
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.MeanAbsoluteError()]

In [56]:
# Scale pixel values to [0, 1] for Gradient Descent
X_train_scaled = X_train / 255.0

In [57]:
for epoch in range(1, n_epochs + 1):
    print("Epoch {}/{}".format(epoch, n_epochs))
    for step in range(1, n_steps + 1):
        X_batch, y_batch = random_batch(X_train_scaled, y_train)
        with tf.GradientTape() as tape:
            y_pred = model(X_batch, training=True)
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            loss = tf.add_n([main_loss] + model.losses)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        mean_loss(loss)
        for metric in metrics:
            metric(y_batch, y_pred)
        print_status_bar(step * batch_size, len(y_train), mean_loss, metrics)
    print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
    for metric in [mean_loss] + metrics:
        metric.reset_states()

Epoch 1/5


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

60000/60000 - mean: 8.7057 - mean_absolute_error: 2.5428
60000/60000 - mean: 8.7057 - mean_absolute_error: 2.5428
Epoch 2/5
60000/60000 - mean: 8.4753 - mean_absolute_error: 2.5338
60000/60000 - mean: 8.4753 - mean_absolute_error: 2.5338
Epoch 3/5
60000/60000 - mean: 8.4229 - mean_absolute_error: 2.5312
60000/60000 - mean: 8.4229 - mean_absolute_error: 2.5312
Epoch 4/5
60000/60000 - mean: 8.3815 - mean_absolute_error: 2.5265
60000/60000 - mean: 8.3815 - mean_absolute_error: 2.5265
Epoch 5/5
60000/60000 - mean: 8.4046 - mean_absolute_error: 2.5292
60000/60000 - mean: 8.4046 - mean_absolute_error: 2.5292


Walkthrough:
- 1 loop for epochs, 1 loop for batches within the epochs
- Sample a random batch from the training set
- The tf.GradientTape() block makes a prediction for one batch and computes the loss: loss_fn returns one loss per instance, so reduce_mean() averages them over the batch (this is where you would apply different weights to each instance), and the regularization losses in model.losses are added to this loss.
- Compute the gradient of the loss with respect to each trainable variable, and apply the gradients with the optimizer
- Update mean loss, the metrics, and display status bar
- Update the status bar at the end of each epoch, then reset the states of the mean loss and the metrics
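
If different instances should count differently, the per-instance losses can be weighted before they are reduced. A minimal sketch with made-up losses and hypothetical weights, assuming TensorFlow 2.x:

```python
import tensorflow as tf

per_instance_loss = tf.constant([1.0, 4.0, 9.0])   # one loss per instance
sample_weights = tf.constant([1.0, 0.5, 0.0])      # hypothetical instance weights

# Plain mean, as in the training loop.
plain_loss = tf.reduce_mean(per_instance_loss)

# Weighted mean: scale each loss, then normalize by the total weight.
weighted_loss = (tf.reduce_sum(per_instance_loss * sample_weights)
                 / tf.reduce_sum(sample_weights))   # (1 + 2 + 0) / 1.5 = 2.0
print(float(plain_loss), float(weighted_loss))
```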

To add constraints to the model weights, add a loop that applies them just after the apply_gradients() call:

In [58]:
for variable in model.variables:
    if variable.constraint is not None:
        variable.assign(variable.constraint(variable))

To handle training and testing differently, like we do during BatchNormalization or Dropout, call the model with training=True and propagate this to each layer.



***
## TensorFlow Functions and Graphs

### Introduction

Starting with a simple function:

$$ f(x) = x^3 $$

In [59]:
def cube(x):
    return x ** 3

Can call this with either a Python value or a tensor:

In [60]:
cube(3.0)

27.0

In [61]:
cube(tf.constant(3.0))

<tf.Tensor: id=2738037, shape=(), dtype=float32, numpy=27.0>

We can add a decorator to make this a TensorFlow function:

In [62]:
@tf.function
def tf_cube(x):
    return x ** 3

A TF function simplifies expressions where it can, and runs operations in parallel where it can. It will usually run faster than the plain Python function, especially for complex computations, so use these when you can.

>> Do NOT pass Python values (as opposed to tensors) to these TensorFlow functions unless they will only take a few distinct values. A new graph is generated for each distinct Python value, so reserve these for stuff like hyperparameters
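
The retracing can be observed with a Python side effect, which only runs while a graph is being traced. A sketch assuming TensorFlow 2.x; `trace_log` is a hypothetical helper list used only for counting:

```python
import tensorflow as tf

trace_log = []  # grows by one entry each time the Python body actually runs

@tf.function
def tf_cube(x):
    trace_log.append("traced")  # Python side effect: happens only during tracing
    return x ** 3

tf_cube(tf.constant(2.0))
tf_cube(tf.constant(3.0))   # same dtype and shape: the first graph is reused
traces_after_tensors = len(trace_log)

tf_cube(2)
tf_cube(3)                  # each new Python value triggers a new trace
traces_after_python_values = len(trace_log)

print(traces_after_tensors, traces_after_python_values)
```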

### Autograph and Tracing

TensorFlow graph creation process:

1. _AutoGraph_ analyzes the function's source code to find all control statements:
 - for
 - while
 - if, elif, else
 - break
 - continue
 - return
    
2. It outputs an upgraded version of said function, replacing those control statements with their TensorFlow equivalents (tf.while_loop(), tf.cond(), etc.).

![Example of TensorFlow Graph Creation](tensorflow_graph_creation.PNG)

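3. TensorFlow calls this upgraded function, passing it a "symbolic tensor" (a tensor with a name, data type, and shape, but no actual value), such that each TensorFlow operation adds a node to the graph to represent itself and its outputs. Nodes represent operations, and arrows represent Tensors.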
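
The code generated by AutoGraph can be inspected with `tf.autograph.to_code()`. A sketch assuming TensorFlow 2.x; `sum_squares` is an illustrative function, and the exact generated source varies between TensorFlow versions, so it is printed here rather than quoted:

```python
import tensorflow as tf

@tf.function
def sum_squares(n):
    s = tf.constant(0)
    for i in tf.range(n + 1):   # loop over a tensor, so it is captured in the graph
        s += i ** 2
    return s

# Source code of the upgraded function that AutoGraph produces.
generated = tf.autograph.to_code(sum_squares.python_function)
print(generated)

result = sum_squares(tf.constant(3))   # 0 + 1 + 4 + 9
print(int(result))                     # 14
```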

### TF Function Rules

- Regarding the @tf.function decorator and external libraries:
 - TensorFlow graphs can only use TensorFlow constructs, so use tf.reduce_sum() instead of np.sum(), tf.sort() instead of a built-in sorted(), etc.
 - If the TensorFlow function is updating other Python elements, like a counter or something, those will only be updated when the function is traced, not every time the graph is run. So try to compartmentalize these TensorFlow functions.
- Can call other Python or TensorFlow functions (they need not have the @tf.function decorator), and their operations will be captured in the graph, but they must follow the same rules
- If a function creates a TensorFlow variable (or other stateful object), it must do so only on the very first call, or else it will throw an error. It is generally better to create them outside of TensorFlow functions (such as in the build method of a custom layer) and then use the assign method inside the function.
- A for loop over the Python range() function is unrolled when the function is traced, so it is not captured in the graph as a loop. Iterate over tf.range() to get a dynamic loop in the graph.
- Prefer vectorized implementations over iterative loops.

