# Chapter 12 - Custom Models and Training with TensorFlow


Keras is TF's high-level API - and it has already got us really far. Good enough for about 95% of use cases.


# Quick tour of TensorFlow

What does it offer:
- similar core to numpy, with GPU support
- supports distributed computing
- just-in-time (JIT) compiler to optimize computations for speed/memory usage - extracts the computation graph from a Python function, then optimizes it
- can train a TF model in one environment and run it in another
- implements autodiff - with great optimizers like RMSProp, Nadam, FTRL, etc
- most important is tf.keras - but also data loading & preprocessing ops (tf.data, tf.io), image processing ops (tf.image), signal processing ops (tf.signal), and more.
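
To make the graph-extraction point above concrete, here is a minimal sketch using `tf.function`. The function name `cube` is just an illustration, not something from these notes.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow graph on first call
def cube(x):
    return x ** 3

result = cube(tf.constant(2.0))
print(result)

# The traced graph can be inspected through a concrete function.
concrete = cube.get_concrete_function(tf.constant(2.0))
print([op.name for op in concrete.graph.get_operations()])
```

The second print lists the ops TF recorded for the graph; exact op names may differ between TF versions.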


<div>
    <img src="attachment:image.png" width=500 />
</div>


At the low level, it is implemented in highly efficient C++. Operations (ops) have implementations called kernels - each kernel is dedicated to a device type: CPUs, GPUs, or TPUs (Tensor Processing Units), etc. GPUs can speed up computation by splitting it into many smaller chunks and running them in parallel across many GPU threads. TPUs are even faster.
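
A quick way to see which device types TF can use on a machine, and to pin an op to one of them. This is only a sketch; the GPU list will simply be empty on a CPU-only machine.

```python
import tensorflow as tf

cpus = tf.config.list_physical_devices("CPU")
gpus = tf.config.list_physical_devices("GPU")
print("CPUs:", cpus)
print("GPUs:", gpus)

# Explicitly place a small computation on the CPU.
with tf.device("/CPU:0"):
    t = tf.constant([[1., 2.], [3., 4.]])
    product = t @ t

print(product.device)  # device string of the tensor holding the result
```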

Can purchase your own GPU devices; TPUs are available through Google Cloud Machine Learning Engine.

Tensorflow Architecture:

<div>
    <img src="attachment:image-2.png" width=500 />
</div>


Can use on mobile or with Javascript.


Also more to it than the library - it is the centre of an ecosystem of libraries: TensorBoard for visualization, TFX to productionize TensorFlow projects, TensorFlow Hub, etc. Easy to find existing code on GitHub, and https://paperswithcode.com is a good site.

In [1]:
%autosave 120

import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras
 
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

np.random.seed(24)

# Ignore useless warnings (see SciPy issue #5998)
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

Autosaving every 120 seconds


# Using TensorFlow - like numpy

API revolves around tensors - a tensor is usually a multidimensional array (like a numpy ndarray), but it can also hold a scalar (a simple value like 24).

## Operations

In [2]:
tf.constant([[1., 2., 3.], [4., 5., 6.]])

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [3]:
tf.constant(24)

<tf.Tensor: shape=(), dtype=int32, numpy=24>

In [5]:
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])
t.shape, t.dtype

(TensorShape([2, 3]), tf.float32)

In [6]:
tf.square(t)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>

In [7]:
t @ tf.transpose(t) # @ is for matrix multiplication - same as tf.matmul()

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

#### Keras low level API

Keras API has its own low-level API as well - in keras.backend. 

## Tensors and numpy

Can create a tensor from a numpy array, and vice versa - can even apply TensorFlow operations to numpy arrays and numpy operations to tensors.

! - numpy uses 64-bit precision by default, but TF uses 32-bit - because 32-bit precision is generally enough for DNNs, runs faster and uses a lot less RAM. When creating a tensor from a numpy array, set dtype=tf.float32
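
A small sketch of the round trip between numpy and TF, including the float64 default mentioned above. Variable names are illustrative only.

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])               # numpy defaults to float64
t64 = tf.constant(a)                      # dtype is inherited: float64
t32 = tf.constant(a, dtype=tf.float32)    # explicitly request float32

back_to_numpy = t32.numpy()               # tensor -> ndarray
squared_by_tf = tf.square(a)              # TF op on a numpy array gives a tf.Tensor
squared_by_np = np.square(t32)            # numpy op on a tensor gives an ndarray

print(t64.dtype, t32.dtype)
print(type(squared_by_tf).__name__, type(squared_by_np).__name__)
```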

## Type Conversions

TF does not convert types automatically, since silent conversions can significantly hurt performance and easily go unnoticed - it just raises an exception when operating on tensors with incompatible types. Can use tf.cast() to convert.
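
A minimal sketch of that behaviour under default TF settings. The exact exception class is an assumption that may vary by version, so both likely candidates are caught.

```python
import tensorflow as tf

floats = tf.constant(2.0)   # float32
ints = tf.constant(40)      # int32

try:
    floats + ints           # mixing float32 and int32 is rejected
    mixed_ok = True
except (tf.errors.InvalidArgumentError, TypeError) as exc:
    mixed_ok = False
    print("raised:", type(exc).__name__)

# Explicit conversion makes the operation valid.
total = floats + tf.cast(ints, tf.float32)
print(total)
```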


## Variables

tf.Tensor values are immutable, but weights in a DNN need to be tweaked by backpropagation, and other parameters may also change over time. So use tf.Variable:

In [10]:
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [None]:
# Updating

v.assign(2 * v)    # => [[2., 4., 6.], [8., 10., 12.]]
v[0, 1].assign(42) # => [[2., 42., 6.], [8., 10., 12.]]
v[:, 2].assign([0., 1.]) # => [[2., 42., 0.], [8., 10., 1.]]
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.]) # => [[100., 42., 0.], [8., 10., 200.]]

# In practice - rarely create variables manually. Keras layers provide add_weight(), which takes care of it.
# Model parameters are updated directly by the optimizer, so no need to update them manually.

## Other data structures

- sparse tensors - efficiently represent tensors containing mostly 0s
- tensor arrays - lists of tensors
- ragged tensors - static lists of lists of tensors
- string tensors - regular tensors of type tf.string - represent byte strings, not Unicode strings
- sets - regular tensors containing one or more sets
- queues - store tensors across steps: First In, First Out queues, queues that prioritize some items, etc
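
A short sketch of three of the structures listed above (sparse, ragged and string tensors). The values are made up for illustration.

```python
import tensorflow as tf

# Sparse tensor: only the non-zero entries are stored.
sparse = tf.SparseTensor(indices=[[0, 1], [1, 0], [2, 3]],
                         values=[1., 2., 3.],
                         dense_shape=[3, 4])
dense = tf.sparse.to_dense(sparse)

# Ragged tensor: rows of different lengths.
ragged = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
row_lengths = ragged.row_lengths()

# String tensor: holds bytes (UTF-8 encoded here), not Unicode code points.
text = tf.constant("café")
n_bytes = tf.strings.length(text)                     # counts bytes
n_chars = tf.strings.length(text, unit="UTF8_CHAR")   # counts characters

print(dense)
print(row_lengths, n_bytes, n_chars)
```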



# Customizing Models and Training Algorithms

## Custom loss function

Say we want to train a regression model, but the training set is a bit noisy. Start by cleaning up the dataset (removing/fixing the outliers), but that may not be enough.

Mean squared error might penalize large errors too much and make the model imprecise; mean absolute error does not penalize outliers as much, but training can take a while to converge. We can then use the Huber loss - implementing it from scratch here.
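
For reference, the Huber loss can be written as follows, where $\delta$ is the threshold (a symbol introduced here for illustration; `huber_fn` in the next cell corresponds to $\delta = 1$) and $e$ is the error:

$$
L_\delta(e) =
\begin{cases}
\frac{1}{2} e^2 & \text{if } |e| < \delta \\
\delta \, |e| - \frac{1}{2} \delta^2 & \text{otherwise}
\end{cases}
\qquad \text{with } e = y_\text{true} - y_\text{pred}
$$

The two branches agree at $|e| = \delta$, so the loss is quadratic for small errors and linear for large ones without a jump in between.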

In [16]:
def huber_fn(y_true, y_pred):
    
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)


# model.compile(loss=huber_fn, optimizer="nadam")
# model.fit(X_train, y_train, [...])

# Vectorized implementation like above is recommended. Also, use only TF operations if want to use its graph features.
# Also better to return tensor containing one loss per instance
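
A small check of the "one loss per instance" point, repeating `huber_fn` so the snippet runs on its own. The comparison with `tf.keras.losses.Huber` is only a sanity check on made-up values.

```python
import tensorflow as tf

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

y_true = tf.constant([[0.], [0.]])
y_pred = tf.constant([[0.5], [3.]])

per_instance = huber_fn(y_true, y_pred)   # shape (2, 1): one loss per instance
reference = tf.keras.losses.Huber(delta=1.0)(y_true, y_pred)  # built-in, averaged

print(per_instance)
print(tf.reduce_mean(per_instance), reference)
```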

## Saving/Loading models with custom components

Saving usually works. For loading, however, need to provide a dictionary mapping the function name to the actual function

model = keras.models.load_model("my_model_with_a_custom_loss.h5",
                                custom_objects={"huber_fn": huber_fn})
                               
We can also create a function which creates a configured loss function (to change threshold):

In [None]:
def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold  # compare against the configured threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold ** 2 / 2  # linear part depends on threshold
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

model.compile(loss=create_huber(2.0), optimizer="nadam")

# Threshold will not be saved, however, so need to specify threshold value when loading model

model = keras.models.load_model("my_model_with_a_custom_loss.h5",
                                custom_objects={"huber_fn": create_huber(2.0)})

# Solve this by creating a subclass of keras.losses.Loss and implementing its get_config() method,
# so the threshold is saved along with the model. See the book for details.
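
A minimal sketch of such a subclass, assuming the usual `call()` / `get_config()` pattern of `keras.losses.Loss`. The class name `HuberLoss` and the sample values are illustrative.

```python
import tensorflow as tf

class HuberLoss(tf.keras.losses.Loss):
    def __init__(self, threshold=1.0, **kwargs):
        super().__init__(**kwargs)
        self.threshold = threshold

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold ** 2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)

    def get_config(self):
        # Include the threshold so it can be restored when the model is loaded.
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

loss = HuberLoss(threshold=2.0)
value = loss(tf.constant([[0.], [0.]]), tf.constant([[0.5], [3.]]))
config = loss.get_config()
restored = HuberLoss.from_config(config)
print(value, config)
```

When loading a model compiled with this loss, the class itself would be passed through `custom_objects` (e.g. `{"HuberLoss": HuberLoss}`) instead of a configured function.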

## Custom activation functions/initializers/regularizers/constraints

Most of the time, just need to write a simple function with the appropriate inputs/outputs.

E.g. (equivalent to keras.activations.softplus or tf.nn.softplus)



In [27]:
def my_softplus(z): # returns tf.nn.softplus(z)
    return tf.math.log(tf.exp(z) + 1.0)

def my_glorot_initializer(shape, dtype=tf.float32): # equivalent to keras.initializers.glorot_normal
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

def my_l1_regularizer(weights): # equivalent to keras.regularizers.l1(0.01)
    return tf.reduce_sum(tf.abs(0.01 * weights))

def my_positive_weights(weights): # equivalent to keras.constraints.nonneg() or tf.nn.relu
    return tf.where(weights < 0, tf.zeros_like(weights), weights)
    # tf.where takes (condition, x, y): picks x where condition is True, y elsewhere
    
layer = keras.layers.Dense(30, activation=my_softplus,
                           kernel_initializer=my_glorot_initializer,
                           kernel_regularizer=my_l1_regularizer,
                           kernel_constraint=my_positive_weights)

Quick reminder:

- activation is applied to the output of this layer, and the result is passed on to the next layer
- this layer's weights are initialized using the initializer
- at each training step the weights are passed to the regularization function to compute the regularization loss
- constraint is called after each training step - replacing the layer's weights with the constrained weights
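
A quick numerical check of two of the custom functions above, on made-up values: the `tf.where` constraint behaves like `tf.nn.relu`, and the hand-written softplus matches `tf.nn.softplus`.

```python
import tensorflow as tf

# Constraint: negative weights are replaced by 0, others are kept.
weights = tf.constant([[-1.5, 0.0], [2.0, -0.1]])
clipped = tf.where(weights < 0, tf.zeros_like(weights), weights)
same_as_relu = tf.reduce_all(clipped == tf.nn.relu(weights))

# Activation: log(exp(z) + 1) compared with the built-in softplus.
z = tf.constant([-2.0, 0.0, 3.0])
manual_softplus = tf.math.log(tf.exp(z) + 1.0)
builtin_softplus = tf.nn.softplus(z)
max_diff = tf.reduce_max(tf.abs(manual_softplus - builtin_softplus))

print(clipped)
print(same_as_relu, max_diff)
```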


## Custom Metrics

Losses are used by Gradient Descent to train a model - so they must be differentiable (at least where they are evaluated), and they do not need to be easy for humans to interpret (cross-entropy, for example).

Metrics are used to evaluate a model, so they should be easily interpretable; they can be non-differentiable or have 0 gradients everywhere.
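
A minimal sketch of a custom streaming metric, assuming the `keras.metrics.Metric` subclassing pattern (`update_state()` / `result()`). The class name `StreamingMAE` is hypothetical; it keeps a running mean absolute error across batches.

```python
import tensorflow as tf

class StreamingMAE(tf.keras.metrics.Metric):
    def __init__(self, name="streaming_mae", **kwargs):
        super().__init__(name=name, **kwargs)
        # Running sum of absolute errors and number of values seen so far.
        self.total = self.add_weight(name="total", shape=(), initializer="zeros")
        self.count = self.add_weight(name="count", shape=(), initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # sample_weight is ignored in this sketch.
        abs_error = tf.abs(tf.cast(y_true, tf.float32) - tf.cast(y_pred, tf.float32))
        self.total.assign_add(tf.reduce_sum(abs_error))
        self.count.assign_add(tf.cast(tf.size(abs_error), tf.float32))

    def result(self):
        return self.total / self.count

metric = StreamingMAE()
metric.update_state(tf.constant([0., 0.]), tf.constant([1., 3.]))  # first batch
metric.update_state(tf.constant([0.]), tf.constant([2.]))          # second batch
print(metric.result())
```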


## Custom Layers

Might want exotic layer, or a layer containing multiple other layers to save the repetition.

exponential_layer = keras.layers.Lambda(lambda x: tf.exp(x)) - applies the exponential function to its inputs. Has no weights.

For a layer with weights, need to create a subclass of the keras.layers.Layer class

See book for overview.
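
A minimal sketch of a layer with weights, assuming the usual `build()` / `call()` pattern of `keras.layers.Layer`. The name `MyDense` and its arguments are illustrative; this is a simplified stand-in for a dense layer, not a full reimplementation.

```python
import tensorflow as tf

class MyDense(tf.keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # Weights are created lazily, once the input size is known.
        self.kernel = self.add_weight(name="kernel",
                                      shape=(input_shape[-1], self.units),
                                      initializer="glorot_normal")
        self.bias = self.add_weight(name="bias",
                                    shape=(self.units,),
                                    initializer="zeros")

    def call(self, inputs):
        return self.activation(tf.matmul(inputs, self.kernel) + self.bias)

layer = MyDense(5, activation="relu")
out = layer(tf.ones((4, 3)))   # first call triggers build()
print(out.shape, len(layer.trainable_weights))
```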

Remember - the Sequential API only accepts layers with a single input and a single output; layers with multiple inputs/outputs (e.g. Concatenate) only work with the functional and subclassing APIs.


## Custom Models

Can build any models you want - even with loops and skip connections. 

Need a ResidualBlock layer - containing 2 dense layers and an addition operation

In [None]:
class ResidualBlock(keras.layers.Layer):
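
A minimal sketch of what such a residual block could look like, assuming two dense layers followed by adding the inputs back. The constructor argument, activation and initializer are assumptions for illustration; see the book for the actual definition.

```python
import tensorflow as tf

class ResidualBlock(tf.keras.layers.Layer):
    def __init__(self, n_neurons, **kwargs):
        super().__init__(**kwargs)
        # Two dense layers; n_neurons must match the input size for the addition.
        self.hidden = [tf.keras.layers.Dense(n_neurons, activation="elu",
                                             kernel_initializer="he_normal")
                       for _ in range(2)]

    def call(self, inputs):
        z = inputs
        for layer in self.hidden:
            z = layer(z)
        return inputs + z   # skip connection: add the inputs to the output

block = ResidualBlock(n_neurons=8)
outputs = block(tf.ones((2, 8)))
print(outputs.shape)
```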