# Deep Learning with Python (2nd ed.)
[Website](https://www.manning.com/books/deep-learning-with-python-second-edition)

# 2.3 Tensor Operations
1. [Element-Wise Operations](#elementWiseOperations)
2. [Broadcasting](#broadcasting)
3. [Tensor Product](#tensorProduct)
4. [Tensor Reshaping](#tensorReshaping)
5. [Geometric Interpretation of Tensor Operations](#geometricInterpretation)

Much as __any computer program can be ultimately reduced to a small set of binary operations on binary inputs__ (AND, OR, NOR, and so on), all __transformations learned by deep NNs can be reduced to a handful of tensor operations__ (or tensor functions) applied to tensors of numeric data.

In [55]:
from tensorflow import keras
from tensorflow.keras import layers

# create a Dense layer instance
keras.layers.Dense(512, activation='relu')

<keras.layers.core.dense.Dense at 0x18333479670>

This layer can be interpreted as a __function__, which __takes as input a matrix and returns another matrix__—a new representation for the input tensor. 

Specifically, the function is as follows (where `W` is a matrix and `b` is a vector, both attributes of the layer):

`output = relu(dot(input, W) + b)`

We have __three tensor operations__ here:

1. A __dot product__ between the input tensor and a tensor named `W`
2. An __addition__ between the resulting matrix and a vector `b`
3. A __ReLU__ (rectified linear unit) operation: `relu(x)` is `max(x, 0)`

> "_ReLU is a piecewise linear function that will output the input directly if it is positive, otherwise, it will output zero._"

__Source__: [Link](https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/)
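
The three operations listed above can be sketched in plain NumPy. This is only an illustration, not Keras's actual implementation: the batch size, feature count, and the values of `W` and `b` are made-up assumptions.

```python
import numpy as np

def relu(x):
    # element-wise max(x, 0)
    return np.maximum(x, 0.)

# hypothetical sizes: 4 samples, 8 input features, 3 output units
inputs = np.random.random((4, 8))
W = np.random.random((8, 3)) - 0.5   # made-up weights, some negative
b = np.zeros(3)                      # made-up bias vector

# dot product, then broadcast addition of b, then element-wise ReLU
output = relu(np.dot(inputs, W) + b)
print(output.shape)
```

Each row of `output` is the new representation of the corresponding row of `inputs`.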

<a name="elementWiseOperations"></a>
## 2.3.1 Element-Wise Operations

The __ReLU operation__ and __addition__ are __element-wise operations__: operations that are __applied independently to each entry in the tensors being considered__. 

This means these operations are highly amenable to __massively parallel implementations__ (__vectorized implementations__). 

The code below shows naive Python implementations of the element-wise ReLU and addition operations, using nested `for` loops.

In [54]:
# relu operation
def naive_relu(x):
    # check that x is a matrix (rank-2 tensor)
    assert len(x.shape) == 2
    # avoid overwriting the input tensor
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

# addition
def naive_add(x, y):
    # check that x is a matrix (rank-2 tensor)
    assert len(x.shape) == 2
    # check that x and y have identical shapes
    assert x.shape == y.shape
    # avoid overwriting the input tensor
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

These operations are available as __well-optimized built-in NumPy functions__ that run in compiled code; for linear-algebra routines such as the dot product, NumPy delegates the heavy lifting to a __Basic Linear Algebra Subprograms__ (BLAS) implementation. 

BLAS are __low-level, highly parallel, efficient tensor-manipulation routines that are typically implemented in Fortran or C__.

So, in NumPy, you can do the following element-wise operation, and it will be blazing fast.

In [61]:
import numpy as np

x = np.random.random((20, 100))
y = np.random.random((20, 100))

# addition
z_add = x + y
# ReLU (applied to the sum computed above)
z_relu = np.maximum(z_add, 0.)

In [62]:
import time
  
t0 = time.time() 
for _ in range(1000):
    # addition
    z = x + y
    # ReLU
    z = np.maximum(z, 0.) 
print("Took: {0:.2f} s".format(time.time() - t0))

Took: 0.01 s


In [63]:
t0 = time.time() 
for _ in range(1000):
    # addition
    z = naive_add(x, y)
    # ReLU
    z = naive_relu(z) 
print("Took: {0:.2f} s".format(time.time() - t0))

Took: 2.33 s


Likewise, when running TensorFlow code on a __GPU__, element-wise operations are __executed via fully vectorized CUDA implementations__ that can best utilize the highly parallel GPU chip architecture.

<a name="broadcasting"></a>
## 2.3.2 Broadcasting

Our earlier `naive_add` implementation only supports the addition of __rank-2 tensors with identical shapes__. 

But in the Dense layer introduced earlier, we added a __rank-2 tensor with a vector__. What happens with addition when the shapes of the two tensors being added differ?

When possible, and if there’s no ambiguity, __the smaller tensor will be broadcast to match the shape of the larger tensor__. Broadcasting consists of two steps:

1. Axes (called __broadcast axes__) are added to the smaller tensor to match the `ndim` of the larger tensor.
2. The smaller tensor is __repeated alongside these new axes__ to match the full shape of the larger tensor.


Consider `X` with shape `(32, 10)` and `y` with shape `(10,)`:

In [10]:
X = np.random.random((32, 10))
y = np.random.random((10,))
X.shape, y.shape

((32, 10), (10,))

In [11]:
# add an empty first axis to y
y = np.expand_dims(y, axis=0)
y.shape

(1, 10)

In [12]:
# repeat y 32 times alongside new axis
Y = np.concatenate([y] * 32, axis=0)
Y.shape

(32, 10)

In terms of implementation, __no new rank-2 tensor is created__, because that would be terribly inefficient. 

__The repetition operation is entirely virtual__: it happens at the algorithmic level rather than at the memory level. 

But thinking of the vector being repeated 32 times alongside a new axis is a helpful mental model. 
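
One way to see that the repetition is virtual is to look at a broadcast view directly. The sketch below uses `np.broadcast_to`, which is not part of the original notes; it returns a read-only view in which the repeated axis has a stride of 0 bytes, so every "copy" of `y` points at the same memory.

```python
import numpy as np

X = np.random.random((32, 10))
y = np.random.random((10,))

# read-only view of y with the shape of X; no data is copied
Y_view = np.broadcast_to(y, X.shape)

print(Y_view.shape)                  # same shape as X
print(Y_view.strides)                # first stride is 0: rows are not stored separately
print(np.shares_memory(Y_view, y))   # the view reuses y's buffer

# adding the view gives the same result as letting NumPy broadcast implicitly
same = np.array_equal(X + y, X + Y_view)
print(same)
```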

Here’s what a naive implementation would look like.

In [13]:
def naive_add_matrix_and_vector(x, y):
    # check that x is a matrix (rank-2 tensor)
    assert len(x.shape) == 2
    # check that y is a vector (rank-1 tensor)
    assert len(y.shape) == 1
    # the second axis of x must match the length of y
    assert x.shape[1] == y.shape[0]
    # avoid overwriting the input tensor
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x

With broadcasting, you can generally perform element-wise operations that take two input tensors if one tensor has shape `(a, b, ... n, n + 1, ... m)` and the other has shape `(n, n + 1, ... m)`. 

The broadcasting will then automatically happen for axes `a` through `n - 1`.

The following example applies the element-wise maximum operation to two tensors of different shapes via broadcasting.

In [65]:
x = np.random.random((64, 3, 32, 10))
print(f"x.shape: {x.shape} -> rank-4 tensor")
y = np.random.random((32, 10))
print(f"y.shape: {y.shape} -> matrix (rank-2 tensor)")
z = np.maximum(x, y)
print(f"z.shape: {z.shape} -> rank-4 tensor")

x.shape: (64, 3, 32, 10) -> rank-4 tensor
y.shape: (32, 10) -> matrix (rank-2 tensor)
z.shape: (64, 3, 32, 10) -> rank-4 tensor
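
The shape rule stated earlier in this section can be probed with a few small cases. The sketch below is illustrative only; the shapes are arbitrary. Note that NumPy's actual rule is slightly more general than the one given above: an axis of size 1 is also stretched to match.

```python
import numpy as np

x = np.zeros((64, 3, 32, 10))

# trailing axes (32, 10) match the end of x's shape: broadcasting works
ok = x + np.ones((32, 10))

# an axis of size 1 is also stretched (NumPy's more general rule)
ok_size_one = x + np.ones((32, 1))

# (32,) does not line up with the last axis of x (size 10): this fails
try:
    x + np.ones((32,))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(ok.shape, ok_size_one.shape, mismatch_raised)
```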


<a name="tensorProduct"></a>
## 2.3.3 Tensor Product

The tensor product, or __dot product__ (not to be confused with an element-wise product, the `*` operator), is __one of the most common, most useful tensor operations__.

In NumPy, a tensor product is done using the `np.dot` function (because the mathematical notation for tensor product is usually a dot).

In [22]:
x = np.random.random((32,))
print(f"x.shape: {x.shape} -> vector (rank-1 tensor)")
y = np.random.random((32,))
print(f"y.shape: {y.shape} -> vector (rank-1 tensor)")
z = np.dot(x, y)
print(f"z.shape: {z.shape} -> scalar!")

x.shape: (32,) -> vector (rank-1 tensor)
y.shape: (32,) -> vector (rank-1 tensor)
z.shape: () -> scalar!


Mathematically, what does the dot operation do? 

Let’s start with the dot product of two vectors, `x` and `y`.

In [20]:
def naive_vector_dot(x, y):
    # check that x is a rank-1 tensor (vector)
    assert len(x.shape) == 1
    # check that y is a rank-1 tensor (vector)
    assert len(y.shape) == 1
    # check that they have the same number of elements
    assert x.shape[0] == y.shape[0]
    # start the accumulator as a float (0. is the float literal 0.0)
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

naive_vector_dot(x,y)

7.824177458558538

__Info on `z = 0.`__: [Link](https://stackoverflow.com/questions/26476352/what-does-a-dot-after-an-integer-mean-in-python)

You’ll have noticed that __the dot product between two vectors is a scalar__ and that only __vectors with the same number of elements__ are compatible for a dot product.
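
To make the contrast with the element-wise product concrete, here is a small sketch on hand-picked vectors (the values are arbitrary): `*` keeps the vector shape, while `np.dot` sums the products into a single scalar and rejects vectors of different lengths.

```python
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])

elementwise = x * y      # array([ 4., 10., 18.]), still a vector
dot = np.dot(x, y)       # 4 + 10 + 18 = 32.0, a scalar

# vectors with different numbers of elements are not compatible
try:
    np.dot(x, np.ones(4))
    length_mismatch_raised = False
except ValueError:
    length_mismatch_raised = True

print(elementwise, dot, length_mismatch_raised)
```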

You can also take the __dot product between a matrix `x` and a vector `y`__, which __returns a vector where the coefficients are the dot products between `y` and the rows of `x`__.

In [69]:
def naive_matrix_vector_dot(x, y):
    # check that x is a rank-2 tensor (matrix)
    assert len(x.shape) == 2
    # check that y is a rank-1 tensor (vector)
    assert len(y.shape) == 1
    # the second axis of x must match the length of y
    assert x.shape[1] == y.shape[0]
    # z starts as a vector of 0s with one entry per row of x
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i] += x[i, j] * y[j]
    return z

In [35]:
x = np.random.random((32, 10))
print(f"x.shape: {x.shape} -> matrix (rank-2 tensor)")
y = np.random.random((10,))
print(f"y.shape: {y.shape} -> vector (rank-1 tensor)")

z = naive_matrix_vector_dot(x, y)
print(f"z.shape: {z.shape} -> vector (rank-1 tensor)")

x.shape: (32, 10) -> matrix (rank-2 tensor)
y.shape: (10,) -> vector (rank-1 tensor)
z.shape: (32,) -> vector (rank-1 tensor)


You could also reuse the code we wrote previously, which highlights the __relationship between a matrix-vector product and a vector product__.

In [36]:
def naive_matrix_vector_dot(x, y):
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        z[i] = naive_vector_dot(x[i, :], y)
    return z

Note that as soon as one of the two tensors has an `ndim` greater than 1, __dot is no longer symmetric__, which is to say that __`dot(x, y)` $\neq$ `dot(y, x)`__.
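
A quick sketch of this asymmetry, using arbitrary small matrices: swapping the operands changes the result, and for a matrix and a vector the swapped call may not even be shape-compatible.

```python
import numpy as np

x = np.array([[1., 2.],
              [3., 4.]])
y = np.array([[0., 1.],
              [1., 0.]])

xy = np.dot(x, y)   # [[2., 1.], [4., 3.]]: columns of x swapped
yx = np.dot(y, x)   # [[3., 4.], [1., 2.]]: rows of x swapped

# matrix-vector works one way only when the shapes are (32, 10) and (10,)
m = np.random.random((32, 10))
v = np.random.random((10,))
mv = np.dot(m, v)
try:
    np.dot(v, m)
    swapped_raised = False
except ValueError:
    swapped_raised = True

print(np.array_equal(xy, yx), mv.shape, swapped_raised)
```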

Of course, a dot product generalizes to tensors with an arbitrary number of axes. 

__The most common application may be the dot product between two matrices__. 

You can take the dot product of two matrices `x` and `y` (`dot(x, y)`) if and only if `x.shape[1] == y.shape[0]`. 

The result is a matrix with shape `(x.shape[0], y.shape[1])`, where the coefficients are the vector products between the rows of `x` and the columns of `y`.

In [39]:
def naive_matrix_dot(x, y):
    # check that x is a rank-2 tensor (matrix)
    assert len(x.shape) == 2
    # check that y is a rank-2 tensor (matrix)
    assert len(y.shape) == 2
    # the second axis of x must match the first axis of y
    assert x.shape[1] == y.shape[0]
    # z starts as a matrix of 0s with shape (rows of x, columns of y)
    z = np.zeros((x.shape[0], y.shape[1]))
    # iterate over the rows of x ...
    for i in range(x.shape[0]):
        # ... and over the columns of y
        for j in range(y.shape[1]):
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z
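
As a sanity check on the shape rule, the sketch below uses `np.dot` on two small non-square matrices (arbitrary values built with `np.arange`) and confirms that one output coefficient equals the vector product of a row of `x` and a column of `y`. Non-square inputs are a useful test because they expose any mix-up between `x.shape[1]` and `y.shape[1]` in the loop bounds.

```python
import numpy as np

x = np.arange(6.).reshape((2, 3))    # shape (2, 3)
y = np.arange(12.).reshape((3, 4))   # shape (3, 4)

z = np.dot(x, y)
print(z.shape)                       # (x.shape[0], y.shape[1])

# coefficient (1, 2) is the dot product of row 1 of x and column 2 of y
i, j = 1, 2
coefficient = np.dot(x[i, :], y[:, j])
print(z[i, j] == coefficient)
```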

To understand __dot-product shape compatibility__, it helps to visualize the input and output tensors by aligning them as shown below.

![](https://drek4537l1klr.cloudfront.net/chollet2/Figures/02-05.png)

In the figure, `x`, `y`, and `z` are pictured as rectangles (literal boxes of coefficients). 

Because __the rows of `x` and the columns of `y` must have the same size__, it follows that __the width of `x` must match the height of `y`__. 

If you go on to develop new machine learning algorithms, you’ll likely be drawing such diagrams often.

More generally, __you can take the dot product between higher-dimensional tensors__, following the same rules for shape compatibility as outlined earlier for the 2D case.

`(a, b, c, d) • (d,) → (a, b, c)` <br>
`(a, b, c, d) • (d, e) → (a, b, c, e)`
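
These two shape patterns can be checked with `np.dot`. In the sketch below, the concrete sizes `a=2, b=3, c=4, d=5, e=6` are arbitrary choices.

```python
import numpy as np

t = np.ones((2, 3, 4, 5))            # (a, b, c, d)

# (a, b, c, d) . (d,) -> (a, b, c)
with_vector = np.dot(t, np.ones((5,)))

# (a, b, c, d) . (d, e) -> (a, b, c, e)
with_matrix = np.dot(t, np.ones((5, 6)))

print(with_vector.shape, with_matrix.shape)
```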

<a name="tensorReshaping"></a>
## 2.3.4 Tensor Reshaping

A third type of tensor operation that’s essential to understand is __tensor reshaping__. 

Although it wasn’t used in the Dense layers in our first NN example, __we used it when we preprocessed the digits data before feeding it into our model__.

`train_images = train_images.reshape((60000, 28 * 28))`

Reshaping a tensor means __rearranging its rows and columns to match a target shape__. 

Naturally, __the reshaped tensor has the same total number of coefficients as the initial tensor__. 

Reshaping is best understood via simple examples.

In [70]:
x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])
              
print(f"x.shape: {x.shape}")

x.shape: (3, 2)


In [71]:
x = x.reshape((6, 1))
print(f"x.shape: {x.shape}")
x

x.shape: (6, 1)


array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [5.]])

In [72]:
x = x.reshape((2, 3))
print(f"x.shape: {x.shape}")
x

x.shape: (2, 3)


array([[0., 1., 2.],
       [3., 4., 5.]])
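
The requirement that the number of coefficients stays the same can be seen in a short sketch (values are arbitrary). It also uses `-1` as a dimension, a NumPy convenience not covered above that asks NumPy to infer that axis from the total size.

```python
import numpy as np

x = np.arange(6.)            # 6 coefficients

a = x.reshape((3, 2))        # 3 * 2 == 6: fine
b = x.reshape((2, -1))       # -1 is inferred as 3

# 4 * 2 == 8 != 6: the target shape is rejected
try:
    x.reshape((4, 2))
    bad_shape_raised = False
except ValueError:
    bad_shape_raised = True

print(a.shape, b.shape, bad_shape_raised)
```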

A special case of reshaping that’s commonly encountered is __transposition__. 

Transposing a matrix means __exchanging its rows and its columns__, so that `x[i, :]` becomes `x[:, i]`.

In [74]:
x = np.zeros((300, 20))
print(f"x.shape: {x.shape}")
x = np.transpose(x)
print(f"x transposed: {x.shape}")

x.shape: (300, 20)
x transposed: (20, 300)
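
Since the example above uses an all-zeros matrix, it only shows the change in shape. The sketch below, on a small matrix with distinct values, suggests how transposition differs from calling `reshape` with the transposed shape: both give shape `(2, 3)`, but only the transpose exchanges rows and columns.

```python
import numpy as np

x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])

transposed = np.transpose(x)    # rows become columns: x[i, j] -> result[j, i]
reshaped = x.reshape((2, 3))    # same element order, just regrouped into 2 rows

print(transposed)
print(reshaped)
print(np.array_equal(transposed, reshaped))
```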
