In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
import numpy as np
from tensorflow import keras

In [22]:
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # Fit the scaler on the training set only
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

# TensorFlow:
TensorFlow is a library developed by the Google Brain team. Its purpose is to optimize large-scale numerical computations and to make them portable, so that a model defined in Python can be run in another environment (e.g., from Java or C++). It does this primarily by extracting a **computation graph** from the Python function being run. The computation graph consists of nodes (the operations to perform) and edges (the tensors that flow from one operation to the next). Unused nodes are pruned from the graph, which speeds up execution. Because the graph does not depend on Python, it can be exported and executed from other languages, so the essence of a program carries over. Finally, independent paths of the graph can be run in parallel, which further speeds up execution. I visualize, initially, a graph with many unused nodes on the periphery, which then fade away (are pruned), and independent paths, which are traversed (computed) at the same time. TensorFlow can also run its operations on GPUs.
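
As a rough illustration of graph extraction, the sketch below wraps a small Python function with `tf.function` and lists the operations in the traced graph. The function name `scaled_sum` and its inputs are made up for this example, and the exact operation names may vary between TensorFlow versions.

```python
import tensorflow as tf

@tf.function  # Asks TensorFlow to trace this Python function into a graph
def scaled_sum(x, y):
    return x + 2.0 * y

# Trace the function for scalar float32 inputs to obtain a concrete graph.
concrete = scaled_sum.get_concrete_function(
    tf.TensorSpec(shape=[], dtype=tf.float32),
    tf.TensorSpec(shape=[], dtype=tf.float32),
)

# Each node in the graph is an operation; list their types.
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)

# Calling the decorated function runs the traced graph.
result = scaled_sum(tf.constant(1.0), tf.constant(3.0))
print(result)
```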

# Using TensorFlow like NumPy:
TensorFlow provides many operations that closely mirror their NumPy counterparts.

In [36]:
t = tf.constant([[1, 2, 3], [4, 5, 6]], dtype=tf.float32)
t

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [37]:
t.dtype, t.shape

(tf.float32, TensorShape([2, 3]))

In [38]:
tf.square(t)

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>

In [39]:
tf.transpose(t)

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[1., 4.],
       [2., 5.],
       [3., 6.]], dtype=float32)>

In [40]:
t @ tf.transpose(t) == tf.matmul(t, tf.transpose(t))  # The @ operator is matrix multiplication

<tf.Tensor: shape=(2, 2), dtype=bool, numpy=
array([[ True,  True],
       [ True,  True]])>

In [41]:
(t + t)**2 == (2*t * 2*t)

<tf.Tensor: shape=(2, 3), dtype=bool, numpy=
array([[ True,  True,  True],
       [ True,  True,  True]])>

In [43]:
tf.reduce_sum(t), np.sum(t) == 21, 21

(<tf.Tensor: shape=(), dtype=float32, numpy=21.0>, True, 21)

In [47]:
tf.reduce_mean(t), np.mean(t) == 3.5, 3.5

(<tf.Tensor: shape=(), dtype=float32, numpy=3.5>, True, 3.5)

In [51]:
tf.reduce_max(t),  np.max(t)   # Notice that both find the max in the entire data structure

(<tf.Tensor: shape=(), dtype=float32, numpy=6.0>, 6.0)
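
Tensors and NumPy arrays can also be converted into each other. The sketch below uses made-up values; one thing to watch is that NumPy defaults to 64-bit floats, while TensorFlow typically works with 32-bit floats, so it can be worth setting the dtype explicitly when creating a tensor from an array.

```python
import numpy as np
import tensorflow as tf

a = np.array([2.0, 4.0, 5.0])            # NumPy defaults to float64

t64 = tf.constant(a)                      # Keeps the array's dtype (float64)
t32 = tf.constant(a, dtype=tf.float32)    # Explicitly request float32

back = t32.numpy()                        # Tensor -> NumPy array
squared_np = np.square(t32)               # NumPy function applied to a tensor
squared_tf = tf.square(a)                 # TensorFlow op applied to an array

print(t64.dtype, t32.dtype)
print(type(squared_np), type(squared_tf))
```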

**Type Conversions**: Converting between data types is slow, and automatic conversions can easily go unnoticed, so TensorFlow does not perform them implicitly and does not allow operations involving two tensors of different data types. For example, if x and y are tensors such that x is float16 and y is float32 (or int16, int8, or any other integer type), then any operation on x and y raises an error. This protects performance. If you do need to change the data type of a tensor, you can call tf.cast(x, dtype).

In [13]:
# For example,
x = tf.constant([[5, 6], [2, 3.0]], dtype='float32')
y = tf.constant([[8, 9], [12, 10]], dtype='int8')
print(x @ tf.cast(y, 'float32'))

tf.Tensor(
[[112. 105.]
 [ 52.  48.]], shape=(2, 2), dtype=float32)
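
To see the failure itself, here is a small sketch with made-up tensors that tries the mixed-dtype operation first and then repeats it with an explicit cast. The exception type caught here is what I would expect in eager mode with default settings; it may differ in other configurations, so several types are caught.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0], dtype=tf.float32)
y = tf.constant([3, 4], dtype=tf.int32)

try:
    x + y  # float32 + int32: no implicit conversion by default
    mixed_dtypes_allowed = True
except (tf.errors.InvalidArgumentError, TypeError, ValueError) as exc:
    mixed_dtypes_allowed = False
    print("Mixed dtypes rejected:", type(exc).__name__)

# An explicit cast makes the dtypes match, so the operation succeeds.
total = x + tf.cast(y, x.dtype)
print(total)
```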


**Modifying Tensor contents**: A regular tensor (such as one created with tf.constant) is immutable, so its contents cannot be modified in place. A tf.Variable, by contrast, holds a tensor that can be modified in place with its assign() method.

In [17]:
x = tf.Variable([[1,2,3], [4,5,6]])
x

<tf.Variable 'Variable:0' shape=(2, 3) dtype=int32, numpy=
array([[1, 2, 3],
       [4, 5, 6]])>

In [21]:
x[:,1].assign([100, 100])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=int32, numpy=
array([[  1, 100,   3],
       [  4, 100,   6]])>
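
The contrast between the two can be sketched as follows. The values are made up; `assign` on a slice and `scatter_nd_update` are two of the in-place update methods that `tf.Variable` offers.

```python
import tensorflow as tf

c = tf.constant([[1, 2, 3], [4, 5, 6]])
try:
    c[0, 1] = 100  # Regular tensors do not support item assignment
    constant_is_mutable = True
except TypeError:
    constant_is_mutable = False

v = tf.Variable([[1, 2, 3], [4, 5, 6]])
v.assign(2 * v)                                       # Replace the whole value
v[0, 1].assign(42)                                    # Update a single cell
v.scatter_nd_update(indices=[[1, 2]], updates=[100])  # Update cells by index
print(constant_is_mutable)
print(v.numpy())
```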

# Customizing Models and Training Algorithms:
Say you are training a regression MLP and your dataset, even after being cleaned, has many outliers. One way to handle this is to choose a suitable loss function. MSE penalizes large errors quadratically, so it is too sensitive to the outliers. MAE is less sensitive to the outliers, but it takes longer to converge and yields a less precise model. A good solution lies somewhere in between MSE and MAE, which is what the Huber loss function provides: it is quadratic (like MSE) for errors smaller than a pre-selected threshold and linear (like MAE) for larger errors.
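
For reference, the Huber loss is commonly written as follows, where $e = y_{\text{true}} - y_{\text{pred}}$ is the error and $\delta$ is the threshold (the symbols are my own notation, not from the text above):

$$
L_\delta(e) =
\begin{cases}
\frac{1}{2} e^2 & \text{if } |e| \le \delta \\
\delta \left( |e| - \frac{1}{2} \delta \right) & \text{otherwise}
\end{cases}
$$

With $\delta = 1$ the two branches reduce to $\frac{1}{2} e^2$ and $|e| - \frac{1}{2}$, and they agree at $|e| = 1$ (both give $\frac{1}{2}$), so the loss is continuous at the threshold.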

**Notice** that the loss function is implemented with TensorFlow operations applied to whole tensors (vectorized) rather than with Python loops. This lets TensorFlow apply its performance enhancements, such as GPU support and the computation graph.

In [20]:
def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1  # Boolean tensor: True where |error| is below the threshold (here, 1)
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    # Where the error is small, use the squared loss; otherwise, use the linear loss.
    # The linear branch grows only linearly with the error, so its gradient has constant magnitude (1)
    # and an outlier cannot dominate the gradient the way it does under MSE. The squared branch keeps
    # the smooth, MSE-like behavior near zero, which helps the model converge precisely.
    return tf.where(is_small_error, squared_loss, linear_loss)
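
To make the effect on outliers concrete, the sketch below compares the gradient of the Huber loss with that of a plain squared loss, using `tf.GradientTape`. The sample values are made up and `huber_fn` is repeated so the block runs on its own; the expected values in the comments are approximate because of float32 rounding.

```python
import tensorflow as tf

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

y_true = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = tf.Variable([1.2, 4.0, 2.8, 40.0, 12.0])  # 40.0 is a large outlier

with tf.GradientTape(persistent=True) as tape:
    huber_total = tf.reduce_sum(huber_fn(y_true, y_pred))
    squared_total = tf.reduce_sum(tf.square(y_true - y_pred) / 2)

# Gradient of each total loss with respect to every prediction.
huber_grads = tape.gradient(huber_total, y_pred)
mse_grads = tape.gradient(squared_total, y_pred)

print(huber_grads.numpy())  # approximately [0.2, 1, -0.2, 1, 1]
print(mse_grads.numpy())    # approximately [0.2, 2, -0.2, 36, 7]
```

Under the squared loss the outlier contributes a gradient of about 36, whereas under the Huber loss no single example contributes more than 1 in magnitude.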

In [21]:
a = tf.constant([1, 2, 3, 4, 5], dtype='float16')
b = tf.constant([1.2, 4, 2.8, 40, 12], dtype='float16')

print(huber_fn(a, b))

tf.Tensor([2.003e-02 1.500e+00 1.984e-02 3.550e+01 6.500e+00], shape=(5,), dtype=float16)
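
The threshold is hard-coded to 1 in the function above. One possible way to make it configurable is a small factory function that returns a loss for a given threshold; the name `create_huber` and the sample values are assumptions for this sketch.

```python
import tensorflow as tf

def create_huber(threshold=1.0):
    """Return a Huber loss function that uses the given threshold."""
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        # Linear branch, shifted so both branches meet at |error| == threshold.
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

y_true = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = tf.constant([1.2, 4.0, 2.8, 40.0, 12.0])

losses = create_huber(2.0)(y_true, y_pred)
print(losses.numpy())  # approximately [0.02, 2, 0.02, 70, 12]
```

The returned function can be passed to compile() in the same way as huber_fn, for example loss=create_huber(2.0). Keras also ships a built-in keras.losses.Huber class with a delta argument, which may be preferable when no customization is needed.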


Now we can use the Huber loss function to train a network:

In [29]:
model = keras.Sequential([
    keras.layers.Dense(30, activation='elu', input_shape=(X_train.shape[1:]), kernel_initializer='he_normal'),
    keras.layers.Dense(100, activation='elu', kernel_initializer='he_normal'),
    keras.layers.Dense(100, activation='elu', kernel_initializer='he_normal'),
    keras.layers.Dense(100, activation='elu', kernel_initializer='he_normal'),
    keras.layers.Dense(1)
])

In [30]:
model.compile(loss=huber_fn, optimizer=keras.optimizers.RMSprop())

In [31]:
model.fit(X_train, y_train, validation_data=(X_valid, y_valid), callbacks=[keras.callbacks.EarlyStopping(patience=6)], epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100


<tensorflow.python.keras.callbacks.History at 0x254b5cba3a0>

In [33]:
preds = model.predict(X_test)

In [35]:
model.evaluate(X_test, y_test)



0.12845680117607117

**Custom Activation Functions, Initializers, Regularizers, and Constraints** are implemented in the same way as custom loss functions: we write the function and then pass it to the corresponding argument of the layer or model, just as we passed "loss=huber_fn" when compiling the model.
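
A minimal sketch of what this could look like for a single Dense layer is shown below. All four function names (`my_softplus`, `my_glorot_initializer`, `my_l1_regularizer`, `my_positive_weights`) are illustrative choices, and the layer size and input shape are arbitrary.

```python
import tensorflow as tf
from tensorflow import keras

def my_softplus(z):
    # Custom activation: log(exp(z) + 1), always positive
    return tf.math.log(tf.exp(z) + 1.0)

def my_glorot_initializer(shape, dtype=tf.float32):
    # Custom initializer: normal distribution scaled by fan-in + fan-out
    stddev = tf.sqrt(2.0 / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=tf.as_dtype(dtype))

def my_l1_regularizer(weights):
    # Custom regularizer: L1 penalty with factor 0.01
    return tf.reduce_sum(tf.abs(0.01 * weights))

def my_positive_weights(weights):
    # Custom constraint: replace negative weights with zero
    return tf.where(weights < 0.0, tf.zeros_like(weights), weights)

layer = keras.layers.Dense(
    30,
    activation=my_softplus,
    kernel_initializer=my_glorot_initializer,
    kernel_regularizer=my_l1_regularizer,
    kernel_constraint=my_positive_weights,
)

outputs = layer(tf.ones((2, 8)))  # Builds the layer and runs a forward pass
print(outputs.shape)
```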