# TensorFlow

On macOS Monterey and later, TensorFlow can use the GPU through Apple's Metal plugin. See https://developer.apple.com/metal/tensorflow-plugin/

### Acknowledgments & Credits

This lesson is adapted largely from the excellent curriculum materials by Cliburn Chan (2021) at https://github.com/cliburn/bios-823-2021/ under the MIT License.

### References

- TensorFlow: https://www.tensorflow.org/
- TensorFlow Guide: https://www.tensorflow.org/guide
- TensorFlow Tutorials: https://www.tensorflow.org/tutorials


In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [None]:
import tensorflow as tf

In [None]:
tf.__version__

In [None]:
import keras
keras.__version__

## Working with tensors

Tensors behave almost exactly like NumPy arrays.

In [None]:
tf.constant([1., 2., 3.])
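
To illustrate how closely tensors interoperate with NumPy, here is a minimal sketch (the names `a`, `t`, `s`, and `back` are illustrative, not part of the original lesson). It builds a tensor from a NumPy array, mixes the two in arithmetic, and converts the result back.

```python
import numpy as np
import tensorflow as tf

a = np.array([1., 2., 3.])      # float64 NumPy array
t = tf.constant(a)              # the tensor inherits NumPy's float64 dtype
s = t + a                       # the NumPy operand is converted to a tensor
back = s.numpy()                # convert the result back to a NumPy array

print(t.dtype, back)
```

One difference worth noticing: a tensor built from Python floats, as in the cell above, defaults to `float32`, whereas one built from a NumPy array keeps the array's dtype.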

Variables are often used as weights in networks, as they need to be updated.

In [None]:
x = tf.Variable([[1.,2.,3.], [4.,5.,6.]])

In [None]:
x.shape

In [None]:
x.dtype

### Conversion to numpy

In [None]:
x.numpy()

### Indexing

In [None]:
x[:, :2]

### Assignment

_Only_ tensors created via `tf.Variable()` are mutable.

Tensors created any other way, for example with `tf.constant()`, `tf.identity()`, or `tf.convert_to_tensor()` (so-called `EagerTensor`s), are immutable.

In [None]:
t1 = tf.convert_to_tensor([[1., 2.], [3., 4.]])
t2 = tf.constant([[5., 6.], [7., 8.]])
t3 = tf.Variable([[9., 10.], [11., 12.]]) + 1.0
tx = tf.Variable([[1., 2.], [3., 4.]])
t4 = tf.identity(tx)
for z in [t1, t2, t3, t4]:
    print(type(z))
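
The practical consequence of immutability can be sketched as follows, assuming eager execution (the TensorFlow 2 default). The names `c`, `v`, and `constant_mutated` are illustrative only.

```python
import tensorflow as tf

c = tf.constant([1., 2., 3.])
try:
    c[0] = 10.                  # EagerTensors do not support item assignment
    constant_mutated = True
except TypeError:
    constant_mutated = False

with tf.device('CPU:0'):        # pin to the CPU so the sliced assignment runs on any machine
    v = tf.Variable([1., 2., 3.])
    v[0].assign(10.)            # sliced assignment on a Variable updates it in place

print(constant_mutated, v.numpy())
```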

`tf.Variable` tensors have the methods `.assign()`, `.assign_sub()`, and `.assign_add()`. They modify the tensor in place.

In [None]:
x2 = tf.Variable(x)
x2.assign_add([[1., 1., 1.], [1., 1., 1.]])

In [None]:
x3 = tf.Variable(x)
x3.assign(x2)

However, in-place assignment to index slices via `.assign()` may not work on some GPU devices, for example the `metal` device on Apple Silicon (it does seem to work on NVIDIA GPUs). For compatibility, we therefore move the tensor to the CPU before assigning. This necessarily creates a copy of the tensor, so it is not efficient.

In [None]:
with tf.device('CPU:0'):
    row_tensor = tf.constant([3., 3., 3.])
    x_cpu = tf.Variable(x)
    x_cpu[1].assign(row_tensor)
x_cpu, x_cpu.device

Then it will need to be moved (= copied) back to the GPU.

In [None]:
# This won't work without a GPU available
with tf.device('GPU:0'):
    x = tf.Variable(x_cpu)
x, x.device
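
A possible alternative to slice assignment is to build the updated tensor functionally with `tf.tensor_scatter_nd_update` and then overwrite the whole variable with `.assign()`. This is only a sketch; whether it sidesteps the device issue on a particular GPU backend is an assumption that should be checked on that hardware.

```python
import tensorflow as tf

x = tf.Variable([[1., 2., 3.], [4., 5., 6.]])
row_tensor = tf.constant([3., 3., 3.])

# Build a new tensor in which row 1 is replaced, then overwrite the whole Variable.
updated = tf.tensor_scatter_nd_update(
    x,                               # source values (the Variable is read, not modified)
    indices=[[1]],                   # replace the row at index 1
    updates=row_tensor[tf.newaxis],  # shape (1, 3): one replacement row
)
x.assign(updated)
print(x.numpy())
```

This also creates a copy, so it trades the CPU round trip for an extra tensor on the current device.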

### Reductions

In [None]:
tf.reduce_mean(x, axis=1).numpy()

In [None]:
keras.ops.mean(x, axis=0)

In [None]:
tf.reduce_sum(x, axis=1).numpy()

### Broadcasting

In [None]:
x + 10

In [None]:
z = tf.reduce_mean(x, axis=1)
z

In [None]:
tf.reshape(z, (-1, 1))

In [None]:
z[:, tf.newaxis]

In [None]:
x - z[:, tf.newaxis]
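
The reshape step can also be folded into the reduction itself. A minimal sketch, using the same 2 x 3 values as earlier in this lesson: `keepdims=True` retains the reduced axis with length 1, so the result broadcasts against `x` directly.

```python
import tensorflow as tf

x = tf.constant([[1., 2., 3.], [4., 5., 6.]])
row_means = tf.reduce_mean(x, axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
centered = x - row_means                               # broadcasts across the columns

print(row_means.shape, centered.numpy())
```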

### Matrix operations

In [None]:
x @ tf.transpose(x)

### Ufuncs

In [None]:
tf.exp(x)

In [None]:
tf.sqrt(x)

In [None]:
keras.ops.sqrt(x)

### Random numbers

In [None]:
X = tf.random.normal(shape=(10,4))
y = tf.random.normal(shape=(10,1))

In [None]:
X

In [None]:
y

### Linear algebra

In [None]:
tf.linalg.lstsq(X, y)

### Vectorization

In [None]:
X = tf.random.normal(shape=(1000,10,4))
y = tf.random.normal(shape=(1000,10,1))

In [None]:
X

In [None]:
tf.linalg.lstsq(X, y)
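
To make explicit what the vectorized call computes, the sketch below compares it with a plain Python loop over the leading (batch) axis using `np.linalg.lstsq`. The smaller batch of 5 and the `float64` dtype are choices made here to keep the comparison quick and numerically tight; they are not part of the original example.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
X = tf.random.normal(shape=(5, 10, 4), dtype=tf.float64)
y = tf.random.normal(shape=(5, 10, 1), dtype=tf.float64)

# One call solves all 5 least-squares problems at once.
batched = tf.linalg.lstsq(X, y).numpy()

# Equivalent loop: solve each (10, 4) problem separately and stack the solutions.
looped = np.stack([
    np.linalg.lstsq(Xi, yi, rcond=None)[0]
    for Xi, yi in zip(X.numpy(), y.numpy())
])

max_diff = np.abs(batched - looped).max()
print(batched.shape, max_diff)
```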

### Automatic differentiation

Consider the simple function
$$
f = x^2 + 2y^2 + 3xy
$$

What are the partial derivatives with respect to $x$ and $y$ at $(1,2)$?

We have 
$$
\frac{\partial f}{\partial x} = 2x + 3y
$$

and 
$$
\frac{\partial f}{\partial y} = 4y + 3x
$$

Evaluated at $(1,2)$, this gives $\frac{\partial f}{\partial x} = 8$ and $\frac{\partial f}{\partial y} = 11$.

We can also calculate the Hessian, which in this case is the constant matrix
$$
\begin{bmatrix}
2 & 3 \\
3 & 4
\end{bmatrix}
$$

In [None]:
def f(x,y):
    return x**2 + 2*y**2 + 3*x*y

#### Gradient

In [None]:
x, y = tf.Variable(1.0), tf.Variable(2.0)
x, y

In [None]:
with tf.GradientTape() as tape:
    z = f(x, y)

In [None]:
tape.gradient(z, [x,y])
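
As an independent sanity check on the values 8 and 11, the partial derivatives can be approximated with central differences in plain Python. The helper `central_difference` and the step size `h` are illustrative choices, not part of TensorFlow.

```python
def f(x, y):
    return x**2 + 2*y**2 + 3*x*y

def central_difference(func, x, y, h=1e-5):
    """Approximate both partial derivatives of func at (x, y)."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy

df_dx, df_dy = central_difference(f, 1.0, 2.0)
print(df_dx, df_dy)
```

Unlike finite differences, automatic differentiation evaluates the exact derivative expressions, so it needs no step size and carries no truncation error.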

#### Hessian

In [None]:
with tf.GradientTape(persistent=True) as H_tape:
    with tf.GradientTape() as J_tape:
        z = f(x, y)
    Js = J_tape.gradient(z, [x,y])
Hs = [H_tape.gradient(J, [x,y]) for J in Js]
del H_tape  # release the resources held by the persistent tape

In [None]:
np.array(Hs)
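
Another way to obtain the Hessian as a single matrix is to collect the inputs in one vector variable and take the Jacobian of the gradient. The sketch below rewrites $f$ in the algebraically equivalent form $\frac{1}{2} v^T A v$ with $v = (x, y)$ and $A$ the matrix shown above; this rewriting and the names `A`, `v`, `grad`, and `hessian` are choices made here for illustration.

```python
import tensorflow as tf

A = tf.constant([[2., 3.], [3., 4.]])
v = tf.Variable([1.0, 2.0])               # v[0] plays the role of x, v[1] of y

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        # 0.5 * v^T A v = x^2 + 2y^2 + 3xy
        z = 0.5 * tf.reduce_sum(v * tf.linalg.matvec(A, v))
    grad = inner.gradient(z, v)           # computed inside `outer`, so it is recorded
hessian = outer.jacobian(grad, v)         # shape (2, 2)

print(grad.numpy(), hessian.numpy())
```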

Also see the corresponding [TensorFlow guide](https://www.tensorflow.org/guide/autodiff).

## Regression

In [None]:
import tensorflow_probability as tfp
tfd = tfp.distributions

In [None]:
xs = tf.Variable([0., 1., 2., 5., 6., 8.])
ys = tf.sin(xs) + tfd.Normal(loc=0, scale=0.5).sample(xs.shape[0])

In [None]:
xs.shape, ys.shape

In [None]:
xs.numpy()

In [None]:
ys.numpy()

In [None]:
xp = tf.linspace(-1., 9., 100)[:, None]
plt.scatter(xs.numpy(), ys.numpy())
plt.plot(xp, tf.sin(xp))

In [None]:
kernel = tfp.math.psd_kernels.ExponentiatedQuadratic(length_scale=1.5)
reg = tfd.GaussianProcessRegressionModel(
    kernel, xp[:, tf.newaxis], xs[:, tf.newaxis], ys
)

In [None]:
ub, lb = reg.mean() + [2*reg.stddev(), -2*reg.stddev()]
plt.fill_between(np.ravel(xp), np.ravel(ub), np.ravel(lb), alpha=0.2)
plt.plot(xp, reg.mean(), c='red', linewidth=2)
plt.scatter(xs[:], ys[:], s=50, c='k')

## TensorFlow Data

TensorFlow provides a data API (`tf.data`) that lets it work seamlessly with large data sets that may not fit into memory. The resulting TensorFlow Dataset (TFDS) objects, which are instances of `tf.data.Dataset`, handle multi-threading, queuing, batching, and pre-fetching.

You can think of a TFDS as a smart generator over data. Generally, you first create a TFDS from in-memory data using `from_tensor_slices`, or from data in the file system or a relational database. Then you apply transformations to process the data before handing it off to, say, a deep learning method.

### Using `from_tensor_slices`

You can pass in a list, dict, NumPy array, or TensorFlow tensor.

In [None]:
x = np.arange(6)
ds = tf.data.Dataset.from_tensor_slices(x)
ds

In [None]:
for item in ds.take(3):
    print(item)
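
The dict case is worth a quick look, since it is how named features are usually passed. A minimal sketch with made-up feature names `'x'` and `'y'`: the dataset slices every array along its first axis and yields one dict per row.

```python
import numpy as np
import tensorflow as tf

features = {'x': np.arange(3), 'y': np.array([10., 20., 30.])}
ds_dict = tf.data.Dataset.from_tensor_slices(features)

rows = list(ds_dict.as_numpy_iterator())   # each element is a dict of scalars
for row in rows:
    print(row)
```

All arrays in the dict need the same length along the first axis, because that axis defines the elements of the dataset.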

### Transformations

Once you have a TFDS, you can chain its transformation methods to process the data.

In [None]:
ds = ds.map(lambda x: x**2).repeat(3)

In [None]:
for item in ds.take(3):
    print(item)

In [None]:
ds = ds.shuffle(buffer_size=4, seed=0).batch(5)

In [None]:
for item in ds.take(3):
    print(item)
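
Put together, the pipeline above can be sketched end to end as follows. The final `prefetch` step is an addition made here to show where pre-fetching typically goes; the variable names `batch_sizes` and `all_values` are illustrative.

```python
import numpy as np
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices(np.arange(6))
ds = ds.map(lambda x: x**2).repeat(3)             # 6 squared values, repeated: 18 elements
ds = ds.shuffle(buffer_size=4, seed=0).batch(5)   # a buffer of 4 gives only a local shuffle
ds = ds.prefetch(tf.data.AUTOTUNE)                # prepare upcoming batches in the background

batch_sizes = [int(batch.shape[0]) for batch in ds]
all_values = sorted(int(v) for batch in ds for v in batch.numpy())
print(batch_sizes)
```

The 18 elements do not divide evenly by 5, so the last batch is smaller; `batch(5, drop_remainder=True)` would discard it instead. Because `buffer_size` is smaller than the dataset, each element can only move a limited distance from its original position.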

## TensorFlow Probability

### Distributions

In [None]:
[str(x).split('.')[-1][:-2] for x in tfd.distribution.Distribution.__subclasses__()]

In [None]:
dist = tfd.Normal(loc=100, scale=15)

In [None]:
x = dist.sample((3,4))
x

In [None]:
n = 100
xs = dist.sample(n)
plt.hist(xs, density=True)
xp = tf.linspace(50., 150., 100)
plt.plot(xp, dist.prob(xp))
pass

### Broadcasting

In [None]:
means = [3,4,5,6]
dist = tfd.Normal(loc=means, scale=0.5)

In [None]:
dist.sample(5)

In [None]:
xp = tf.linspace(0., 9., 100)[:, tf.newaxis]
plt.plot(np.tile(xp, dist.batch_shape), dist.prob(xp),
         label=[rf'$\mu=${m}' for m in means])  # raw f-string avoids the invalid escape '\m'
plt.legend()

### Transformations

In [None]:
[x for x in dir(tfp.bijectors) if x[0].isupper()]

In [None]:
lognormal = tfp.bijectors.Exp()(tfd.Normal(0, 0.5))

In [None]:
import seaborn as sns

In [None]:
xs = lognormal.sample(1000)
sns.displot(xs, kde=True)
xp = np.linspace(tf.reduce_min(xs), tf.reduce_max(xs), 100)
plt.plot(xp, tfd.LogNormal(loc=0, scale=0.5).prob(xp))