## 🔮 Deep Learning Frameworks

Another well-known deep learning framework is TensorFlow, developed by Google.

- PyTorch has a more Pythonic syntax (i.e., it feels more natural to write)

- Besides that edge over TensorFlow, PyTorch was built around a dynamic computational graph (which TensorFlow only adopted later, in 2.x)

  - I.e., the computation graph can be modified at runtime without incurring a performance penalty

- PyTorch is overall [more popular](https://trends.google.com/trends/explore?date=today%205-y&q=%2Fg%2F11gd3905v1,Tensorflow&hl=en) and is widely preferred by researchers

- That said, TensorFlow has a longer history of success with deployment

- Note that the following tutorials assume you have already gone through the PyTorch version (we will draw parallels and point out differences between the two).

## Tensors

- A `Tensor` is an array type just like a NumPy array, but it can run on the GPU

- The other distinction from NumPy arrays is that tensors support differentiation (we will see why later)

- It follows that most, if not all, operations we can perform in NumPy have equivalents in TensorFlow

Code in the following is adapted from PyTorch's official tutorial and [this article](https://medium.com/codex/tensor-basics-in-pytorch-252a34288f2).

In [1]:
import tensorflow as tf
import numpy as np

### 1. Initializing Tensors

In [2]:
# from a list
z = tf.Variable([[1, 2],[3, 4]])        # Gradients can be tracked for this (technically a tf.Variable, not a tf.Tensor)
z = tf.constant([[1, 2],[3, 4]])        # Gradients are not tracked for this; neither of the two supports item assignment (z[0, 0] = 5)

# from a Numpy array 
z = tf.constant(np.array(z))            # Can alternatively wrap in a variable

# a zero-filled 3-D (2 * 3 * 2) tensor
z = tf.zeros((2, 3, 2))

# a vector of 12 elements (0 to 11), shape (12,)
z = tf.range(12)

# a random 1*2 matrix drawn from a uniform distribution over [0, 1)
z = tf.random.uniform((1, 2))

# a random 1*2 matrix drawn from a standard Gaussian distribution
z = tf.random.normal((1, 2))

# a zero-filled 1*2 matrix 
z = tf.zeros((1, 2))

# a 1*2 matrix filled with only 1
z = tf.ones((1, 2))

# Specifying the type of the elements of the tensor
z = tf.ones((2, 2), dtype=tf.int32)

# From another tensor
z = tf.zeros_like(z)

print(z)

tf.Tensor(
[[0 0]
 [0 0]], shape=(2, 2), dtype=int32)
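
Since neither type accepts NumPy-style item assignment, updates have to go through dedicated calls. The following is a minimal sketch of two options that TensorFlow provides, `assign` on a variable (or on a slice of it) and `tf.tensor_scatter_nd_update` for an immutable tensor. The names `v`, `c`, and `c_updated` are only illustrative.

```python
import tensorflow as tf

# A variable can be updated in place, either as a whole or through a slice.
v = tf.Variable([[1, 2], [3, 4]])
v.assign([[5, 6], [7, 8]])      # replace every element
v[0, 1].assign(0)               # replace a single element

# A constant is immutable, so an "update" builds a new tensor instead.
c = tf.constant([[1, 2], [3, 4]])
c_updated = tf.tensor_scatter_nd_update(c, indices=[[0, 1]], updates=[0])

print(v.numpy())
print(c_updated.numpy())
print(c.numpy())                # the original constant is unchanged
```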


### 2. Attributes of a Tensor

In [3]:
import tensorflow as tf

# Creating a sample tensor
Z = tf.constant([[1, 2], [3, 4]])       

# Shape of tensor
print(f"Shape of tensor: {Z.shape}")

# Datatype of tensor
print(f"Datatype of tensor: {Z.dtype}")

# Device tensor is stored on
print(f"Device tensor is stored on: {Z.device}")

Shape of tensor: (2, 2)
Datatype of tensor: <dtype: 'int32'>
Device tensor is stored on: /job:localhost/replica:0/task:0/device:CPU:0


### 3. Operations

Also mostly just like NumPy: indexing/slicing, masking, element-wise operations, mathematical operations, broadcasting, aggregation, etc.

There will be some differences, but as argued above, they are easy to work around.

In [4]:
x = tf.constant([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# reshape
y = tf.reshape(x, (1, -1))
print(y)

# slicing
print(x[0, 1])                  # returns a scalar tensor; there is no .item() like in PyTorch (use .numpy())
print(x[:, 1])

# masking
mask = x > 5
print(x[mask])

# element-wise operations
y = tf.constant([[2, 2, 2],
                 [3, 3, 3],
                 [4, 4, 4]])
print(x + y)
print(x * y)

# broadcasting
scalar = tf.constant(2)
print(x + scalar)

# aggregation
print(tf.reduce_mean(x, axis=0))                # same `axis` argument as NumPy, but the function is named reduce_mean instead of mean

tf.Tensor([[1 2 3 4 5 6 7 8 9]], shape=(1, 9), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor([2 5 8], shape=(3,), dtype=int32)
tf.Tensor([6 7 8 9], shape=(4,), dtype=int32)
tf.Tensor(
[[ 3  4  5]
 [ 7  8  9]
 [11 12 13]], shape=(3, 3), dtype=int32)
tf.Tensor(
[[ 2  4  6]
 [12 15 18]
 [28 32 36]], shape=(3, 3), dtype=int32)
tf.Tensor(
[[ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]], shape=(3, 3), dtype=int32)
tf.Tensor([4 5 6], shape=(3,), dtype=int32)


TensorFlow is extending its support for NumPy-style operations via `import tensorflow.experimental.numpy as tnp`. See more [here](https://www.tensorflow.org/guide/tf_numpy).

### 4. GPU Support

Unlike PyTorch, TensorFlow places tensors on the GPU by default whenever possible (i.e., when a GPU is available and the operation supports it). However, manual device placement is also possible.

In [5]:
with tf.device('/CPU:0'):               # /GPU:0 would be your first (or only) GPU
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
  print("a lives on", a.device)

# The operation supports GPU and is outside the `with` scope: if a GPU is visible, the tensors are
# copied to it and the operation runs there; otherwise everything stays on the CPU
c = tf.matmul(a, b)
print("c lives on", c.device)

a lives on /job:localhost/replica:0/task:0/device:CPU:0
c lives on /job:localhost/replica:0/task:0/device:CPU:0


- Conversion to NumPy is possible and automatically brings the tensor to the CPU

In [6]:
z.numpy()                        

array([[0, 0],
       [0, 0]], dtype=int32)

### 5. Storing Gradients

Consider
$$Z_{1 \times 1} = Y_{1 \times 1}^2$$
where
$$Y_{1 \times 1} = X_{4 \times 1}^T W_{4 \times 1}$$

We can implement this with:

In [7]:
import tensorflow as tf

# Define the tensors with actual values
x = tf.constant([2.0, 3.0, 4.0, 5.0], dtype=tf.float32)      # No tracking here
w = tf.Variable([1.0, 0.0, -1.0, 2.0], dtype=tf.float32)

# Need to explicitly define a gradient tape to record the operations involving variables
with tf.GradientTape() as tape:
    y = tf.tensordot(tf.transpose(x), w, 1)                 # axes=1 contracts one axis: a dot product here (transposing a 1-D tensor is a no-op)
    z = y ** 2

# Compute gradients
მzⳆმy = tape.gradient(z, y)                                 # works whether the source is a leaf or not (try y, w, and x). What happens for x?

print("Gradients:", მzⳆმy)

Gradients: tf.Tensor(16.0, shape=(), dtype=float32)
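
To follow up on the question in the comment, here is a sketch under the same values. By the chain rule, $\partial z / \partial w = 2y \cdot x$ and $\partial z / \partial x = 2y \cdot w$. The tape only watches trainable variables automatically, so the gradient for the constant `x` comes back as `None` unless `tape.watch(x)` is called. The names `dz_dw`, `dz_dx`, and `tape2` are only illustrative.

```python
import tensorflow as tf

x = tf.constant([2.0, 3.0, 4.0, 5.0])
w = tf.Variable([1.0, 0.0, -1.0, 2.0])

# persistent=True allows calling tape.gradient more than once on the same tape
with tf.GradientTape(persistent=True) as tape:
    y = tf.tensordot(x, w, 1)
    z = y ** 2

dz_dw = tape.gradient(z, w)     # 2 * y * x, since w is a variable and is watched automatically
dz_dx = tape.gradient(z, x)     # None: x is a constant the tape never watched
del tape                        # release the resources held by the persistent tape

# Explicitly watching the constant makes its gradient available
with tf.GradientTape() as tape2:
    tape2.watch(x)
    z2 = tf.tensordot(x, w, 1) ** 2
dz_dx_watched = tape2.gradient(z2, x)   # 2 * y * w

print(dz_dw, dz_dx, dz_dx_watched)
```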


**We covered in this notebook:**

- Why GPUs are important for deep learning

- How deep learning frameworks (here, TensorFlow) make up for NumPy's lack of GPU support

- Initialization, attributes, and operations of TensorFlow tensors

- GPU and automatic differentiation support!

### ⚔️ PyTorch VS. TensorFlow Revisited

Looking at the trend graph linked above, we see that TensorFlow was initially more popular than PyTorch and that PyTorch later overtook it:

<img src="https://i.imgur.com/r11hV5i.png">

The problem with TensorFlow that made PyTorch much more favorable was that it used a static computational graph back then (and that its syntax was less friendly). At the time, we would build a graph in TensorFlow as follows:
```python
# This is TensorFlow 1.x (should no longer be used)
import tensorflow as tf

# Define the graph
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = x * x * y + y + 2

# Compute gradients
gradients = tf.gradients(z, [x, y])

# Create a session and run the graph
with tf.Session() as sess:
    # The graph is optimized and compiled the first time this runs; the placeholders are fed 3.0 and 4.0
    z_value, gradients_value = sess.run([z, gradients], feed_dict={x: 3.0, y: 4.0})
    # We passed to sess.run the tensors we want evaluated from the graph
    print("Result in TensorFlow 1.0:", z_value)
    print("Gradients in TensorFlow 1.0:", gradients_value)
```

The problem with this is that, since the graph is compiled (hence, a static graph): (i) it can't be changed at runtime (which is useful when the model's input size isn't constant), and (ii) it's very hard to debug, which is a big restriction for research.

Meanwhile, PyTorch uses a dynamic graph: it doesn't compile the graph, so debugging an error midway through it is easy, and the graph can change at runtime.

It took TensorFlow quite a while to reimplement graphs to support this as well (which is what we covered in this notebook: TensorFlow 2.x).

In [12]:
import tensorflow as tf                     # import torch

# Define the variables, just like in PyTorch
x = tf.Variable(3.0)                          # torch.tensor(3.0, requires_grad=True)
y = tf.Variable(4.0)                          # torch.tensor(4.0, requires_grad=True)

# Perform operations directly
with tf.GradientTape() as tape:
    z = x * x * y + y + 2

# Compute the gradients
gradients = tape.gradient(z, [x, y])        # computed in PyTorch with z.backward(), then read from x.grad, y.grad

# Get the values
z_value = z.numpy()
gradients_value = [grad.numpy() for grad in gradients]

But with this, TensorFlow is slower (compiling the graph allows optimizing it first, whereas dynamic execution means the graph is traced again on every single run). For this reason, TensorFlow added the `@tf.function` decorator, which compiles a function into a static graph.

In [13]:
@tf.function                                # compiles the function into a static graph on its first call
def compute_gradients(x, y):
    with tf.GradientTape() as tape:
        # Perform the computation
        z = x * x * y + y + 2
    # Compute the gradients
    gradients = tape.gradient(z, [x, y])
    return z, gradients

# Define the variables
x = tf.Variable(3.0, dtype=tf.float32)
y = tf.Variable(4.0, dtype=tf.float32)

# Call the function
z, gradients = compute_gradients(x, y)

Sensibly, you should only add it once you are done developing the function, as you won't be able to debug effectively with it in place.
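
A small sketch of why debugging gets harder: inside a `tf.function`, plain Python statements run only while the function is being traced, not on every call. The helper list `trace_log` and the function `square_plus_one` are hypothetical names used for illustration. `tf.config.run_functions_eagerly(True)` is one switch TensorFlow offers to temporarily run decorated functions eagerly while debugging.

```python
import tensorflow as tf

trace_log = []                               # hypothetical helper: records each Python-level run

@tf.function
def square_plus_one(x):
    trace_log.append("python body ran")      # Python side effect, not part of the graph
    return x * x + 1.0

a = square_plus_one(tf.constant(2.0))        # first call: traces the body, then runs the graph
b = square_plus_one(tf.constant(3.0))        # same dtype and shape: reuses the traced graph
runs_in_graph_mode = len(trace_log)

# While debugging, tf.function can be switched off globally so the body runs eagerly again.
tf.config.run_functions_eagerly(True)
c = square_plus_one(tf.constant(4.0))
tf.config.run_functions_eagerly(False)       # restore the default
runs_after_eager_call = len(trace_log)

print(runs_in_graph_mode, runs_after_eager_call)
print(a.numpy(), b.numpy(), c.numpy())
```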

Analogously, PyTorch also supports a static mode, although it is used less often, possibly because its dynamic implementation is already efficient:

In [None]:
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b

optimized_foo = torch.compile(foo)          # requires PyTorch 2.0 or later
optimized_foo(torch.tensor(0.0), torch.tensor(0.0))