# TensorFlow Overview

In [1]:
#!pip install tensorflow_datasets
#!pip install pydot
#!pip install graphviz 
#!pip install ipywidgets
#!pip install jupyter

import numpy as np
import tensorflow as tf
import os
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

from tensorflow import keras
from tensorflow.keras import layers

TensorFlow is a relatively young machine learning framework, created by Google in 2015, with its first stable release in 2017. In essence, TensorFlow lets you build dataflow graphs that define how data moves through a series of operations, where the data takes the form of multi-dimensional arrays called tensors. If this doesn't make sense yet, don't worry! We will break it down step by step so you can see the big picture.

Before we get into the nitty-gritty, however, it is important to understand how a TensorFlow project is organized. A typical TensorFlow workflow has three steps:
- Clean and preprocess the data
- Build the model
- Train and evaluate the model

## Cleaning and preprocessing the data
Cleaning and preprocessing the data is one of the most difficult parts of machine learning. Luckily, TensorFlow provides a plethora of built-in tools, such as the `tf.data` API and the TensorFlow Datasets (`tfds`) package, that make it much easier to clean data and load it properly.

Check out the 'TensorFlow Loading Data' notebook for more information on the best practices to load in data. 
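As a rough sketch of what such a pipeline can look like, the block below uses the `tf.data` API on small, randomly generated in-memory arrays. The arrays and the `normalize` function are hypothetical placeholders for a real dataset and real preprocessing. The pipeline maps a preprocessing function over each example, then shuffles, batches, and prefetches.

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory data: 100 samples with 4 features and integer labels in [0, 3)
features = np.random.rand(100, 4).astype("float32")
labels = np.random.randint(0, 3, size=100)

def normalize(x, y):
    # Illustrative preprocessing step: scale each feature vector to unit L2 norm
    return tf.math.l2_normalize(x, axis=-1), y

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))  # one element per sample
    .map(normalize, num_parallel_calls=tf.data.AUTOTUNE)   # apply preprocessing
    .shuffle(buffer_size=100)                               # shuffle the samples
    .batch(32)                                              # group into batches of 32
    .prefetch(tf.data.AUTOTUNE)                             # overlap loading with training
)

for batch_x, batch_y in dataset.take(1):
    print(batch_x.shape, batch_y.shape)  # (32, 4) (32,)
```

Because `drop_remainder` is left at its default (`False`), the last batch holds the remaining 4 samples.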

## Building the model
> Once your data is packaged up and ready to go, you need to develop a model that best suits your needs. TensorFlow (through Keras) offers three ways to structure a model: the Sequential API, the Functional API, and model subclassing.

### Sequential API
> This is the simplest model structure in TensorFlow and is typically used for simple networks. As the name implies, a Sequential model is a plain stack of layers in which each layer has **exactly one input tensor and one output tensor.** This is sufficient for many simple projects, but more complex models cannot be expressed with Sequential.
As a rule of thumb, you should not use a Sequential model if:
- Your model has multiple inputs and outputs
- Your model has layers that have multiple inputs and outputs
- You need to do layer sharing
- You want to implement a model with a non-linear topology (e.g. residual connections, multi-branch models)

https://www.tensorflow.org/guide/keras/sequential_model

#### Example of Sequential API

In [2]:
def create_model():
  return tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
  ])

seq_model = create_model()

keras.utils.plot_model(seq_model, "basic_model.png", show_shapes=True)



### Functional API
> The Functional API was created to overcome the limitations of the Sequential API. It can do everything the Sequential API can, and it also supports models whose layers form a directed acyclic graph (DAG), such as models with multiple inputs or outputs, shared layers, or skip connections. This makes it a natural fit for more advanced architectures, such as GANs and autoencoders, whose components are wired together as graphs of layers.
When using the Functional API you still have access to the built-in methods found in Sequential models, such as `fit` and `evaluate`, but you can also [customize](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit/) what happens inside these loops when you are implementing routines outside of classical supervised learning.

https://www.tensorflow.org/guide/keras/functional

#### Example of Functional API with multiple input and output layers

In [15]:
num_tags = 12  # Number of unique issue tags
num_words = 10000  # Size of vocabulary obtained when preprocessing text data
num_departments = 4  # Number of departments for predictions

title_input = keras.Input(
    shape=(None,), name="title"
)  # Variable-length sequence of ints
body_input = keras.Input(shape=(None,), name="body")  # Variable-length sequence of ints
tags_input = keras.Input(
    shape=(num_tags,), name="tags"
)  # Binary vectors of size `num_tags`

# Embed each word in the title into a 64-dimensional vector
title_features = layers.Embedding(num_words, 64)(title_input)
# Embed each word in the text into a 64-dimensional vector
body_features = layers.Embedding(num_words, 64)(body_input)

# Reduce sequence of embedded words in the title into a single 128-dimensional vector
title_features = layers.LSTM(128)(title_features)
# Reduce sequence of embedded words in the body into a single 32-dimensional vector
body_features = layers.LSTM(32)(body_features)

# Merge all available features into a single large vector via concatenation
x = layers.concatenate([title_features, body_features, tags_input])

# Stick a logistic regression for priority prediction on top of the features
priority_pred = layers.Dense(1, name="priority")(x)
# Stick a department classifier on top of the features
department_pred = layers.Dense(num_departments, name="department")(x)

# Instantiate an end-to-end model predicting both priority and department
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred],
)

keras.utils.plot_model(model, "multi_input_and_output_model.png", show_shapes=True)



### Subclassing
> Keras provides a `tf.keras.Model` base class that you can subclass and fully customize. This lets you implement your model's forward pass yourself, with the downside that subclassed models are often harder to write, debug, and maintain than Sequential or Functional models. However, some complex custom models and layers are difficult or impossible to implement with the Sequential or Functional API alone, which is the justification for subclassing. This extra control allows for very specific implementations and makes it easier to replicate research papers.

https://www.tensorflow.org/guide/keras/custom_layers_and_models
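Here is a minimal sketch of a subclassed model. It assumes a small classifier on 4 input features. The `ResidualMLP` name, the layer sizes, and the skip connection are hypothetical and were chosen only to illustrate a custom forward pass. Layers are created in `__init__` and wired together in `call`, where ordinary Python code defines the forward pass.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class ResidualMLP(keras.Model):
    """Hypothetical subclassed model: a custom forward pass with a skip connection."""

    def __init__(self, hidden_units=16, num_classes=3):
        super().__init__()
        # Layers are created once, here...
        self.dense_in = layers.Dense(hidden_units, activation="relu")
        self.dense_hidden = layers.Dense(hidden_units, activation="relu")
        self.classifier = layers.Dense(num_classes)

    def call(self, inputs, training=False):
        # ...and connected here, in plain Python
        h = self.dense_in(inputs)
        h = h + self.dense_hidden(h)  # residual (skip) connection
        return self.classifier(h)     # unnormalized logits

model = ResidualMLP()
logits = model(tf.random.normal((5, 4)))  # the first call builds the weights
print(logits.shape)  # (5, 3)
```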

#### Functional vs Subclassing
Both models created using the Functional API and Subclassing are very powerful, but it is important to know when to use which model over the other.

Functional API Strengths:
- Functional API is **less verbose** and often cleaner in code than equivalent subclassed code
- Functional API **provides more protection by validating the model as it is built**, unlike subclassed alternatives. Every time you call a layer through the Functional API, built-in checks verify that the input specification (typically its shape and dtype) matches the layer's assumptions, and an error is raised if it does not. As a result, errors surface while the model is being defined rather than when it is first run. With subclassing you lose this up-front validation.
- A functional model can be **plotted and easily inspected.**
- A functional model can **easily be serialized and safely saved as a single file**. To do the same with a subclassed model, you must implement `get_config()` and `from_config()` methods at the model level. In other words, you have to write the serialization code yourself.
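To make the validation and serialization points concrete, here is a small sketch built around a hypothetical 4-feature functional model. `get_config()` and `from_config()` rebuild the same architecture with freshly initialized weights. An input with the wrong feature dimension is rejected by the model's built-in shape checks.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical functional model: 4 input features -> 3 output logits
inputs = keras.Input(shape=(4,), name="features")
x = layers.Dense(8, activation="relu")(inputs)
outputs = layers.Dense(3)(x)
func_model = keras.Model(inputs, outputs, name="small_functional")

# Serialization: the config fully describes the architecture,
# so the model can be rebuilt without the original Python code that defined it
clone = keras.Model.from_config(func_model.get_config())
print(len(clone.layers), clone.count_params())

# Validation: an input with 5 features instead of 4 is rejected
try:
    func_model(np.zeros((2, 5), dtype="float32"))
    rejected = False
except ValueError as err:
    rejected = True
    print("Rejected:", type(err).__name__)
```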

Functional API Weaknesses:
- The Functional API **does not support dynamic architectures**. It assumes that a model is a DAG of layers. This is true for most architectures but not all. Recursive networks and Tree-RNNs, for example, do not follow this assumption and cannot be implemented with the Functional API.

## Training and evaluating the model
> Once your model is built and your data is ready, the last step is to train and evaluate the model. This is typically done with the built-in [.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) and [.evaluate](https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate) methods. These are available whether the model was built with the Sequential API, the Functional API, or subclassing. However, if you need finer control over training, or your model is not trained in a typical supervised fashion, you can define a custom training loop instead.
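Here is a minimal sketch of the built-in workflow. It uses synthetic random data in place of a real dataset, so the resulting accuracy is meaningless. `compile` attaches an optimizer, a loss, and metrics. `fit` trains the model, and `evaluate` returns the loss followed by each metric.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data (assumption): random 28x28 "images" with 10 classes
rng = np.random.default_rng(0)
x_train = rng.random((256, 28, 28), dtype=np.float32)
y_train = rng.integers(0, 10, size=256)
x_test = rng.random((64, 28, 28), dtype=np.float32)
y_test = rng.integers(0, 10, size=64)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),  # logits
])

# Attach the optimizer, loss, and metrics used by fit/evaluate
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

history = model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"test loss={test_loss:.3f}, test accuracy={test_acc:.3f}")
```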

### Creating custom training functions
> TensorFlow offers a powerful suite of mathematical tools, most notably automatic differentiation with `tf.GradientTape`, that make it easy to perform operations such as gradient descent in your own training functions. This gives you strong flexibility and control over how your model is trained.
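As a minimal illustration of that machinery, the sketch below uses `tf.GradientTape` to differentiate the toy function y = x^2 at x = 3. It then takes one manual gradient-descent step. The function and the learning rate are arbitrary choices for illustration.

```python
import tensorflow as tf

x = tf.Variable(3.0)  # trainable variables are watched by the tape automatically

with tf.GradientTape() as tape:
    y = x ** 2  # forward computation recorded on the tape

dy_dx = tape.gradient(y, x)  # dy/dx = 2x = 6.0 at x = 3
print(dy_dx.numpy())

# One manual gradient-descent step: x <- x - learning_rate * dy/dx
learning_rate = 0.1
x.assign_sub(learning_rate * dy_dx)
print(x.numpy())  # approximately 2.4
```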

Typically, the process of defining a custom training function is as follows:
1. Define the loss and gradient functions
2. Pick an optimizer
3. Put it all together in a training loop

#### Example of a custom training loop on the Penguin dataset
https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough

In [30]:
## ---------- PREPROCESS AND LOAD THE DATA ---------- ##
ds_split, info = tfds.load("penguins/processed", split=['train[:20%]', 'train[20%:]'], as_supervised=True, with_info=True)

ds_test = ds_split[0]
ds_train = ds_split[1]
assert isinstance(ds_test, tf.data.Dataset)

print(info.features)
df_test = tfds.as_dataframe(ds_test.take(5), info)
print("Test dataset sample: ")
print(df_test)

df_train = tfds.as_dataframe(ds_train.take(5), info)
print("Train dataset sample: ")
print(df_train)

ds_train_batch = ds_train.batch(32)

## ---------- DEFINE THE MODEL ---------- ##

model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(4,)),  # input shape required
  tf.keras.layers.Dense(10, activation=tf.nn.relu),
  tf.keras.layers.Dense(3)
])

## ---------- TRAIN THE MODEL ---------- ##

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

## Defines loss function needed for training
def loss(model, x, y, training):
    # training=training is needed only if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    y_ = model(x, training=training)

    return loss_object(y_true=y, y_pred=y_)

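# Grab one batch of training data to sanity-check the loss function
features, labels = next(iter(ds_train_batch))
l = loss(model, features, labels, training=False)
print("Loss test: {}".format(l))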

## Defines the gradient descent used to update the weights of the model
def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss(model, inputs, targets, training=True)
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Keep results for plotting
train_loss_results = []
train_accuracy_results = []

num_epochs = 201

for epoch in range(num_epochs):
    epoch_loss_avg = tf.keras.metrics.Mean()
    epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

    # Training loop - using batches of 32
    for x, y in ds_train_batch:
        # Optimize the model
        loss_value, grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # Track progress
        epoch_loss_avg.update_state(loss_value)  # Add current batch loss
        # Compare predicted label to actual label
        # training=True is needed only if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        epoch_accuracy.update_state(y, model(x, training=True))

    # End epoch
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())

    if epoch % 50 == 0:
        print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch,
                                                                epoch_loss_avg.result(),
                                                                epoch_accuracy.result()))

## Understanding how TensorFlow works
Now that we have gone over a high-level overview of TensorFlow, it is time to dive deeper into how it works and how it is organized.

### Tensor
> TensorFlow is named after the tensor: a multi-dimensional array (a generalization of vectors and matrices to n dimensions) whose elements all share a single data type (dtype). Tensors are immutable: they cannot be updated in place, only created. Tensors are similar to NumPy arrays and let you perform highly optimized, efficient mathematical operations.

Tensors have the following associated attributes:
 - **Shape**: The length (number of elements) of each axis of a tensor
 - **Rank**: The number of tensor axes. A scalar has rank 0, a vector has rank 1, and a matrix has rank 2
 - **Axis/Dimension**: A particular axis of a tensor
 - **Size**: The total number of items in a tensor
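Here is a short sketch of these attributes on a rank-4 tensor of zeros. The shape `[3, 2, 4, 5]` is arbitrary. The block also checks that tensors cannot be modified in place.

```python
import tensorflow as tf

rank_4_tensor = tf.zeros([3, 2, 4, 5])

print("Shape:", rank_4_tensor.shape)               # (3, 2, 4, 5)
print("Rank:", rank_4_tensor.ndim)                  # 4
print("Size:", int(tf.size(rank_4_tensor)))         # 3 * 2 * 4 * 5 = 120
print("Length of axis 0:", rank_4_tensor.shape[0])  # 3

# Tensors are immutable: item assignment is not supported...
try:
    rank_4_tensor[0, 0, 0, 0] = 1.0
    immutable = False
except TypeError:
    immutable = True

# ...so "updating" a tensor really means creating a new one
updated = rank_4_tensor + 1.0
print("Immutable:", immutable)
```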

##### Sample mathematical operations on Tensors

In [27]:
a = tf.constant([[1, 2],
                 [3, 4]])

b = tf.constant([[1, 1],
                 [1, 1]])  # Could also have used `tf.ones([2, 2], dtype=tf.int32)`

print("Addition of a and b\n", tf.add(a, b), "\n")
print("Element-wise Multiplication of a and b\n",tf.multiply(a, b), "\n")
print("Matrix Multiplication of a and b\n",tf.matmul(a, b), "\n")

Addition of a and b
 tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32) 

Element-wise Multiplication of a and b
 tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32) 

Matrix Multiplication of a and b
 tf.Tensor(
[[3 3]
 [7 7]], shape=(2, 2), dtype=int32) 
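The same operations are also available through Python's overloaded operators. The brief sketch below shows this. Note that `tf.ones` defaults to `float32`, so `dtype=tf.int32` is passed here to match `a`. TensorFlow does not implicitly cast between dtypes.

```python
import tensorflow as tf

a = tf.constant([[1, 2],
                 [3, 4]])
b = tf.ones([2, 2], dtype=tf.int32)  # must match a's int32 dtype

print(a + b)  # equivalent to tf.add(a, b)
print(a * b)  # equivalent to tf.multiply(a, b), element-wise
print(a @ b)  # equivalent to tf.matmul(a, b), matrix product
```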



### Graphs
> If every operation were executed exclusively in Python, running TensorFlow models would be impractical. It would be slow, and it would not be very portable, since users would need Python installed to run their models. This was the inspiration for graphs. A graph in TensorFlow is a data structure that contains `tf.Operation` objects, which represent computations (addition, subtraction, etc.), and `tf.Tensor` objects, which represent the data that flows between operations. In other words, the operations are the nodes of the graph and the tensors flow along its edges (hence the name of the library). Because graphs are data structures, they are portable and can be saved and executed without the original Python code.

Because graphs are language-agnostic, they give users a great deal of flexibility and can run on a wide variety of machines. Graphs are also optimized: TensorFlow can remove redundant or unnecessary operations, so graphs typically execute quickly. It is also worth noting that, by default, Keras traces a model's training step into a graph (via `tf.function`) when you train it with `fit`. This is part of what allows these models to be trained across multiple devices.
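Here is a minimal sketch of graph tracing built around a toy affine function. `affine` is a hypothetical example. Decorating it with `tf.function` makes TensorFlow trace it into a graph. The graph's `tf.Operation` nodes can then be inspected through a concrete function.

```python
import tensorflow as tf

@tf.function  # trace this Python function into a TensorFlow graph
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.constant([[1.0, 2.0]])
w = tf.constant([[3.0], [4.0]])
b = tf.constant([0.5])

result = affine(x, w, b)  # the first call traces the graph, then runs it
print(result.numpy())     # [[11.5]]  (1*3 + 2*4 + 0.5)

# Inspect the traced graph: each node is a tf.Operation
concrete = affine.get_concrete_function(x, w, b)
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)
```

Later calls with inputs of the same shape and dtype reuse the traced graph instead of re-running the Python function.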

You can visualize a model's graph with TensorBoard (see Unit 2).