# COSC440 Exercise 01: Introduction to TensorFlow #

In this exercise, we will introduce [TensorFlow](http://tensorflow.org/), a cutting-edge library for developing and evaluating deep neural network models, along with Keras, a high-level framework that provides pre-built TensorFlow components. For this class, **assume all labs and projects will use TensorFlow version 2.X.0**. **[PyTorch](https://pytorch.org/) is an alternative deep learning framework to TensorFlow, but it will not be supported in this class**.

We will walk through how TensorFlow works and some of its basic functionality.

**Make sure to run each section and discuss in your group.**

### Import TensorFlow ###

Run the following Python code, and check in with your group if there are any errors.

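```python
# Select TensorFlow 2.x on Colab (this does not install anything)
try:
  # %tensorflow_version only exists in Colab; elsewhere this line fails and is skipped.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
import numpy as np

print(tf.__version__)
```

```
2.17.0
```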


## TensorFlow 101 - The Basics ##

Writing code in TensorFlow 2 feels much like writing ordinary Python. With eager execution (enabled by default in TensorFlow 2), operations are evaluated immediately, in an imperative style, just as plain Python statements are. Operations therefore return concrete values, which makes code much easier to inspect and debug.
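A minimal sketch of what eager execution means in practice: each line runs as soon as it is reached, so its result can be printed right away. The names `a` and `b` are just for illustration.

```python
import tensorflow as tf

# In TensorFlow 2, eager execution is on by default
print(tf.executing_eagerly())  # True

a = tf.constant(3.0)
b = a * 2.0        # computed immediately: no graph building or session needed
print(b)           # tf.Tensor(6.0, shape=(), dtype=float32)
print(b.numpy())   # the same value as a NumPy scalar
```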

### Tensors ###
Tensors in TensorFlow are objects that generalize vectors and matrices to any number of dimensions: each one is effectively an n-dimensional array. Every tensor has two properties: a shape and a data type (for example float32, int32, or string).
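To make shape and data type concrete, here is a small sketch (the tensor names are illustrative) that builds tensors of several ranks and prints both properties of each:

```python
import tensorflow as tf

scalar = tf.constant(7)                                   # rank 0, int32 by default
vector = tf.constant([1.0, 2.0, 3.0])                     # rank 1, float32 by default
matrix = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)  # rank 2, dtype given explicitly
cube = tf.zeros((2, 3, 4))                                # rank 3: beyond a matrix
words = tf.constant(["hello", "tensor"])                  # rank 1, string dtype

for t in [scalar, vector, matrix, cube, words]:
    print(t.shape, t.dtype)
```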

Consider the following example:

```python
x = [[2.]]
m = tf.matmul(x, x)
print("x = {}".format(x))
print("m = {}".format(m))
print("hello, {}".format(m))
```

```
x = [[2.0]]
m = [[4.]]
hello, [[4.]]
```


Here, `m` is a Tensor (`tf.matmul` converted the Python list `x` to a tensor automatically). Because tensors hold concrete values in TensorFlow 2's eager mode, we can inspect the result of the multiplication with a simple `print()` statement. Doing so doesn't interfere with training a network either!

As for actually setting up a model, we need to introduce a few important TensorFlow constructs:

### Variables ###

Variables are a convenient way to represent the trainable weights in your model, which are updated repeatedly during training. A variable's value persists throughout program execution, but it can be changed by running operations on it. Every variable must be initialized with a starting value when it is created. The following are all valid ways of doing this:

```python
# Using a Python list
my_variable = tf.Variable([[1.,0.]])
print(my_variable)

# Initializing variables with a NumPy array (NumPy's default float64 dtype carries over)
my_variable_from_np_array = tf.Variable(np.zeros((3,3)))
print(my_variable_from_np_array)

# You can also use TensorFlow's built-in random ops to generate starting values
gaussian_initialization = tf.Variable(tf.random.normal(shape=[3,3], stddev=.1))
print(gaussian_initialization)

# To get a variable's current value as a NumPy array, use the .numpy() method
my_np_variable = tf.Variable([[1., 2., 5.]]).numpy()
print(my_np_variable)
```

```
<tf.Variable 'Variable:0' shape=(1, 2) dtype=float32, numpy=array([[1., 0.]], dtype=float32)>
<tf.Variable 'Variable:0' shape=(3, 3) dtype=float64, numpy=
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])>
<tf.Variable 'Variable:0' shape=(3, 3) dtype=float32, numpy=
array([[-0.13662481,  0.08411361,  0.04706925],
       [ 0.10552702,  0.06126437, -0.09235079],
       [ 0.12096814,  0.06565663,  0.05438525]], dtype=float32)>
[[1. 2. 5.]]
```


By default, TensorFlow variables are trainable. This means they are watched automatically and accounted for during automatic differentiation and backpropagation. For more information on TensorFlow variables, check out the documentation:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/Variable?hl=en
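Variables can be changed in place with methods such as `assign` and `assign_add`, and a variable created with `trainable=False` is not watched automatically by a gradient tape. A small sketch, with illustrative names `v` and `frozen`:

```python
import tensorflow as tf

v = tf.Variable([1.0, 2.0])
v.assign([3.0, 4.0])        # overwrite the stored value in place
v.assign_add([1.0, 1.0])    # add to it in place: v is now [4.0, 5.0]
print(v.numpy())

# trainable=False opts a variable out of automatic gradient tracking
frozen = tf.Variable(0.5, trainable=False)
print(v.trainable, frozen.trainable)   # True False

with tf.GradientTape() as tape:
    y = tf.reduce_sum(v) * frozen      # y = (4 + 5) * 0.5

# v was watched automatically; frozen was not, so its gradient comes back as None
grads = tape.gradient(y, [v, frozen])
print(grads)
```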


### Gradient Tape ###

To compute the gradients of the loss with respect to our model's variables, we use `tf.GradientTape()`. The tape records the operations executed inside its context so that they can be replayed for reverse-mode automatic differentiation. Any tensor computations inside the tape are accounted for and used during differentiation/backpropagation:

```python
w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w + w

grad = tape.gradient(loss, w)
print(grad)
```

```
tf.Tensor([[2.]], shape=(1, 1), dtype=float32)
```


Here the loss is calculated as `w + w` inside the `tf.GradientTape()` context, so we can differentiate the loss with respect to the variable `w`; the gradient itself is computed by the call to `tape.gradient()`. The tape can also compute gradients with respect to non-trainable variables or plain tensors if you tell it to track them with `tape.watch()`:

```python
x = tf.ones((1, 1))  # Not a trainable variable!
with tf.GradientTape() as tape:
  tape.watch(x)
  loss = x + x

grad = tape.gradient(loss, x)
print(grad)
```

```
tf.Tensor([[2.]], shape=(1, 1), dtype=float32)
```


For tips and tricks on automatic differentiation, check out the documentation: https://www.tensorflow.org/tutorials/eager/automatic_differentiation
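For a slightly less trivial case, here is a sketch that differentiates a squared error for a single data point, loss(w) = (w * x - y)^2. The values x = 2, y = 10 and w = 3 are made up for illustration. By the chain rule the derivative is 2 * (w * x - y) * x, which the tape reproduces:

```python
import tensorflow as tf

# Illustrative data point and starting weight
x, y = 2.0, 10.0
w = tf.Variable(3.0)

with tf.GradientTape() as tape:
    loss = (w * x - y) ** 2      # (6 - 10)^2 = 16

# d(loss)/dw = 2 * (w*x - y) * x = 2 * (-4) * 2 = -16
grad = tape.gradient(loss, w)
print(grad.numpy())
```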


### Activation functions: ReLU ###

Here is an example of applying the ReLU activation function, which replaces every negative entry with 0, to our TensorFlow variable:

```python
gaussian_initialization = tf.Variable(tf.random.normal(shape=[3,3], stddev=.1))
print(gaussian_initialization)

print(tf.nn.relu(gaussian_initialization))
```

```
<tf.Variable 'Variable:0' shape=(3, 3) dtype=float32, numpy=
array([[-0.06526665,  0.21471539,  0.06197277],
       [ 0.07615405, -0.05479568, -0.02306583],
       [ 0.02092689,  0.03617748,  0.0339775 ]], dtype=float32)>
tf.Tensor(
[[0.         0.21471539 0.06197277]
 [0.07615405 0.         0.        ]
 [0.02092689 0.03617748 0.0339775 ]], shape=(3, 3), dtype=float32)
```
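ReLU is defined elementwise as ReLU(z) = max(0, z). A quick check on hand-picked values (chosen only for illustration) shows that `tf.nn.relu` agrees with `tf.maximum(z, 0.0)`:

```python
import tensorflow as tf

z = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
relu_out = tf.nn.relu(z)
manual = tf.maximum(z, 0.0)   # ReLU(z) = max(0, z), applied elementwise

# Negative entries become 0; non-negative entries pass through unchanged
print(relu_out.numpy())
print(manual.numpy())
```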


### Optimizers ###
Once we have computed the gradients with the GradientTape, we must use them to update our model's weights so that it trains. For this we need an optimizer. In this class, you can use the Adam optimizer, which is a standard choice. Examples of different types of optimizers are as follows:


```python
import tensorflow as tf

adam_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
sgd_optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)  # Stochastic Gradient Descent
adagrad_optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.001)
rms_prop_optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)
```

For now, don't worry about the details of how they work. You'll learn more about them in a future exercise!
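Even without the details, the basic contract is easy to see on a toy problem. This sketch uses illustrative values and the SGD optimizer from the list above, because its update rule is simply w <- w - learning_rate * gradient. It takes one optimization step on loss = w^2 starting from w = 1.0:

```python
import numpy as np
import tensorflow as tf

w = tf.Variable(1.0)
sgd = tf.keras.optimizers.SGD(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = w ** 2            # d(loss)/dw = 2w, which is 2.0 at w = 1.0

grads = tape.gradient(loss, [w])
sgd.apply_gradients(zip(grads, [w]))   # pairs each gradient with its variable

# Plain SGD: w <- w - learning_rate * grad = 1.0 - 0.1 * 2.0 = 0.8
print(np.isclose(w.numpy(), 0.8))   # True
```

Adam, Adagrad and RMSprop are used the same way (`apply_gradients` on gradient/variable pairs); they differ only in how they turn the gradient into an update.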



## Putting It All Together ##
Now let's use everything we've learned to create a TensorFlow 2 neural network. Note that because you are still working on Assignment 1, I have directly created a one-layer dense neural network using Keras's built-in implementation. Some of the early exercises and assignments in this course require you to construct each underlying method of your model yourself instead of using built-in components.
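As a rough picture of what the Keras `Dense` layer does: for a batch of inputs `x` it computes `activation(x @ kernel + bias)`. This sketch uses small, illustrative sizes (4 inputs, 3 units) and checks the layer's output against the same formula written out in NumPy:

```python
import numpy as np
import tensorflow as tf

# Small sizes for illustration: 4 input features -> 3 output units
layer = tf.keras.layers.Dense(3, activation="relu")
x = tf.random.normal((2, 4))   # a batch of 2 examples
out = layer(x)                 # the first call creates kernel (4x3) and bias (3,)

# The same computation written out by hand: relu(x @ kernel + bias)
kernel = layer.kernel.numpy()
bias = layer.bias.numpy()
manual = np.maximum(x.numpy() @ kernel + bias, 0.0)
print(np.allclose(out.numpy(), manual, atol=1e-5))   # True
```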


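```python
# Instantiate our model: one dense layer mapping the 784 pixel values of a
# flattened 28x28 image to 10 class scores (logits). There is no activation on
# this output layer; the loss function applies softmax to the scores itself.
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(784,)))
model.add(tf.keras.layers.Dense(10))
```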

### tf.one_hot() ###

Next, we load our data and choose our loss function, optimizer, and metric. We'd like to turn our labels into one-hot vectors. A one-hot encoding is a vector in which every element is 0 except one, which is 1. For example, [0 0 0 1 0 0] is a one-hot vector. Check out [this](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/one_hot) documentation.
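A tiny sketch of `tf.one_hot` on made-up labels, together with recovering the integer labels from the one-hot rows with `tf.argmax`:

```python
import tensorflow as tf

labels = tf.constant([3, 0, 2])          # illustrative class indices
one_hot = tf.one_hot(labels, depth=4)    # one row per label, with a 1 at that label's index
print(one_hot.numpy())

# argmax along each row gives back the original integer labels
recovered = tf.argmax(one_hot, axis=1)
print(recovered.numpy())   # [3 0 2]
```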

```python
# Loading in and preprocessing the data
mnist = tf.keras.datasets.mnist
# x_train is your train inputs
# y_train is your train labels
# x_test is your test inputs
# y_test is your test labels
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Normalize pixel values from [0, 255] to [0, 1] and cast to float32 to match the model's weights
x_train = (x_train / 255.0).astype(np.float32)
x_test = (x_test / 255.0).astype(np.float32)

# Make labels one hot vectors
y_train = tf.one_hot(y_train, depth=10)
y_test = tf.one_hot(y_test, depth=10)

# Choosing our loss function, optimizer, and metric.
# CategoricalCrossentropy suits one-hot labels over 10 classes; from_logits=True
# because the model outputs raw scores, so the loss applies softmax itself.
loss_function = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# CategoricalAccuracy checks whether the highest-scoring class matches the label's class
metric = tf.keras.metrics.CategoricalAccuracy()
```

```
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
```


Finally, we train using examples from our dataset:

```python
# Flatten each 28x28 image into a 784-long vector once, up front
x_train_flat = x_train.reshape(-1, 784)

# Loop through the first 10000 training images, one image per step
for i in range(10000):
  # Every 500 steps, measure accuracy on the full training set. This runs
  # before this step's update and outside the tape, so it is not recorded.
  if i % 500 == 0:
    metric.reset_state()  # count from scratch rather than averaging over earlier checks
    metric.update_state(y_train, model(x_train_flat))  # Keras metrics take (y_true, y_pred)
    train_acc = metric.result().numpy()
    print("Accuracy on training set after {} training steps: {}".format(i, train_acc))

  image = x_train_flat[i:i+1]  # shape (1, 784)
  label = y_train[i:i+1]       # shape (1, 10)

  # Implement backprop:
  with tf.GradientTape() as tape:
    predictions = model(image)  # calling the model runs its forward pass
    loss = loss_function(label, predictions)  # Keras losses take (y_true, y_pred)

  # The Keras Model class has the computed property trainable_variables to conveniently
  # return all the trainable variables you'd want to adjust based on the gradients
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```

### Note about model.trainable_variables ###
In the training loop above, we used `model.trainable_variables` to get a list of all learnable variables in our model (here, the dense layer's weight matrix and bias vector). This property is available because our model is a `tf.keras.Model` (`Sequential` is a subclass of `tf.keras.Model`). In some exercises in this course, we will compute and apply gradients without subclassing `tf.keras.Model`, by manually passing in a list of the model's trainable variables. For a model that stores its weights as attributes `W1`, `b1`, `W2`, and `b2`, that looks like:

`gradients = tape.gradient(loss, [model.W1, model.b1, model.W2, model.b2])`

`optimizer.apply_gradients(zip(gradients, [model.W1, model.b1, model.W2, model.b2]))`

**Training accuracy should be near 87%**

**Check in with your group to ensure you have training accuracy at or near 87%**
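The manual approach described in the note above can be sketched as follows. `TwoLayerNet` is a hypothetical plain Python class (not a `tf.keras.Model`) that holds `W1`, `b1`, `W2`, and `b2`; the layer sizes are illustrative, and random arrays stand in for real MNIST data. Like the note, it assumes a Keras optimizer can update plain `tf.Variable` objects:

```python
import numpy as np
import tensorflow as tf

# Hypothetical two-layer model written as a plain Python class, NOT a tf.keras.Model,
# so it has no trainable_variables property and we list W1, b1, W2, b2 ourselves.
class TwoLayerNet:
    def __init__(self, n_in=784, n_hidden=32, n_out=10):
        self.W1 = tf.Variable(tf.random.normal((n_in, n_hidden), stddev=0.1), name="W1")
        self.b1 = tf.Variable(tf.zeros((n_hidden,)), name="b1")
        self.W2 = tf.Variable(tf.random.normal((n_hidden, n_out), stddev=0.1), name="W2")
        self.b2 = tf.Variable(tf.zeros((n_out,)), name="b2")

    def __call__(self, x):
        hidden = tf.nn.relu(tf.matmul(x, self.W1) + self.b1)
        return tf.matmul(hidden, self.W2) + self.b2   # unnormalized class scores (logits)

net = TwoLayerNet()
net_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
net_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

# Random stand-in data: one batch of 8 flattened 28x28 "images" with random labels
images = np.random.rand(8, 784).astype(np.float32)
labels = tf.one_hot(np.random.randint(0, 10, size=8), depth=10)

W1_before = net.W1.numpy().copy()
with tf.GradientTape() as tape:
    loss = net_loss(labels, net(images))

# Pass the variables explicitly instead of relying on model.trainable_variables
params = [net.W1, net.b1, net.W2, net.b2]
gradients = tape.gradient(loss, params)
net_optimizer.apply_gradients(zip(gradients, params))
```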


## Under the Hood: The Computation Graph ##

Back in TensorFlow 1, the only way to write code was to first build a computation graph that defines the "flow" of Tensors (hence the name TensorFlow) and then run it later. This is the opposite of the eager execution that is now the default in TensorFlow 2. Computation graphs can still be used in TensorFlow 2 (for example through `tf.function`), but you will never be required to build one in this class, since eager execution is now the standard. However, it may help your understanding of deep learning to visualize this computation graph:
  + The graph consists of nodes and edges. The nodes are **Operations** (e.g. addition, subtraction, matrix multiplication), each of which takes one or more tensors as input and produces new, resulting tensors. The edges are the **Tensors**, arrays of varying dimensions (e.g. 3D, 4D), that flow from one operation to the next.
  + All of these Tensors and Operations are executed by TensorFlow's separate, high-performance runtime rather than line by line in Python. That means you can't print or peek into the Tensors the way you can in TensorFlow 2: until the graph is run, a tensor is only a symbolic handle with a shape and a data type.
  + Data entered the graph through special tensors called **placeholders**. A placeholder marks a point of input that is filled in later, when the computation is actually run. This allows the same graph to be reused for many train/test iterations.
  + Each iteration was performed by calling the `run` method of a **Session** object with a feed dictionary (`feed_dict`) that maps each placeholder to the value to use for that iteration.
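For comparison, the TensorFlow 1 workflow described above can still be reproduced in TensorFlow 2 through the `tf.compat.v1` module. A minimal sketch with made-up values: build a graph containing a placeholder, then run it twice in a Session with different feed dictionaries.

```python
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():          # build a graph instead of executing eagerly
    # Placeholder: an input slot whose value is only supplied at run time
    x = tf.compat.v1.placeholder(tf.float32, shape=(None, 2), name="x")
    w = tf.constant([[1.0], [2.0]], name="w")
    y = tf.matmul(x, w, name="y")  # matmul is an operation node; y is its output tensor
    print(y)                       # symbolic tensor: name, shape and dtype, but no values

# Reuse the same graph for two runs with different feed dictionaries
with tf.compat.v1.Session(graph=graph) as sess:
    out1 = sess.run(y, feed_dict={x: np.array([[1.0, 1.0]], dtype=np.float32)})
    out2 = sess.run(y, feed_dict={x: np.array([[3.0, 0.5]], dtype=np.float32)})

print(out1, out2)   # [[3.]] [[4.]]
```

In TensorFlow 2 you would normally just call `tf.matmul` eagerly, or wrap a Python function in `tf.function` to have TensorFlow trace it into a graph for you, so sessions and placeholders are not needed in this course.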

<img src="https://drive.google.com/uc?export=view&id=1BpmERwt0-dIqVipEu7wozEJLb2fBGHIY" alt="gif.jpg" width="400" align="center">


## Acknowledgements & Sources ##

This exercise is modified from one written by Tim Ossowski and Rohin Bhushan, with edits by the HTAs and James Okun.