This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.

**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**

This notebook was generated for TensorFlow 2.6.

In [5]:
import numpy as np
from tensorflow import keras

# The mathematical building blocks of neural networks

To provide sufficient context for introducing tensors and gradient descent, we’ll begin the chapter with a practical example of a neural network. Then we’ll go over every new concept that’s been introduced, point by point.

## The gears of neural networks: tensor operations

In [2]:
# what happens here?
keras.layers.Dense(512, activation="relu")

<keras.layers.core.Dense at 0x280f0915b88>

### Element-wise operations

In [6]:
# relu(x) is max(x, 0), relu stands for "REctified Linear Unit"
def naive_relu(x):
    assert len(x.shape) == 2
    
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

In [8]:
# A 1D input such as np.array([-1, -1, 2, 2]) would fail the rank-2 assertion.
naive_relu(np.array([[-1, 2], [3, -2]]))

array([[0, 2],
       [3, 0]])

In [10]:
def naive_add(x, y):
    assert len(x.shape) == 2
    assert x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

In [11]:
naive_add(np.array([[1,1], [2,2]]), np.array([[1,2], [3,4]]))

array([[2, 3],
       [5, 6]])

In practice, when dealing with NumPy arrays, these operations are available as well-optimized built-in NumPy functions, which themselves delegate the heavy lifting to a Basic Linear Algebra Subprograms (BLAS) implementation. BLAS implementations provide low-level, highly parallel, efficient tensor-manipulation routines, typically written in Fortran or C.

In [16]:
import time

x = np.random.random((20, 100))
y = np.random.random((20, 100))

t0 = time.perf_counter()
for _ in range(1000):
    z = x + y
    z = np.maximum(z, 0.)
print("Took: {0:.2f} s".format(time.perf_counter() - t0))

Took: 0.01 s


In [17]:
t0 = time.perf_counter()
for _ in range(1000):
    z = naive_add(x, y)
    z = naive_relu(z)
print("Took: {0:.2f} s".format(time.perf_counter() - t0))

Took: 2.40 s
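
The two timings above are only comparable if both versions compute the same thing. The sketch below is a quick sanity check under a couple of assumptions: the loop is an inlined equivalent of `naive_add` followed by `naive_relu`, and the inputs are shifted to the range `[-0.5, 0.5)` (not part of the original cells) so that the ReLU actually clips some values. With inputs drawn from `[0, 1)`, as in the timing cells, the sum is never negative and the ReLU leaves it unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((20, 100)) - 0.5   # shifted so that some sums are negative
y = rng.random((20, 100)) - 0.5

# Vectorized version: one NumPy call per operation.
z_fast = np.maximum(x + y, 0.)

# Loop version, equivalent to naive_add followed by naive_relu.
z_slow = np.empty_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        z_slow[i, j] = max(x[i, j] + y[i, j], 0.)

print(np.allclose(z_fast, z_slow))
```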


Likewise, when running TensorFlow code on a GPU, element-wise operations are executed via fully vectorized CUDA implementations that can best utilize the highly parallel GPU chip architecture.

### Broadcasting

With broadcasting, you can generally apply two-tensor element-wise operations if one tensor has shape `(a, b, … n, n + 1, … m)` and the other has shape `(n, n + 1, … m)`. The broadcasting will then automatically happen for axes `a` through `n - 1`.

In [18]:
import numpy as np
X = np.random.random((32, 10))
y = np.random.random((10,))

In [23]:
y = np.expand_dims(y, axis=0)

In [26]:
Y = np.concatenate([y] * 32, axis=0)

In [27]:
def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x
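
The cells above build the broadcast by hand: add a leading axis to `y`, repeat it 32 times, then add element by element. As a minimal sketch, the following compares that explicit construction with what NumPy does when you simply write `X + y`. The variable names mirror the cells above, and the values are random, so only the shapes and the equality check are meaningful.

```python
import numpy as np

X = np.random.random((32, 10))
y = np.random.random((10,))

# Explicit version: add a leading axis, then repeat y 32 times along it.
Y = np.concatenate([np.expand_dims(y, axis=0)] * 32, axis=0)
assert Y.shape == X.shape

# Broadcast version: NumPy aligns y with the last axis of X on its own.
z = X + y

print(np.allclose(z, X + Y))
```

The explicit `Y` is useful as a mental model; in the broadcast version you never have to materialize the repeated copy yourself.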

In [28]:
import numpy as np
x = np.random.random((64, 3, 32, 10))
y = np.random.random((32, 10))
z = np.maximum(x, y)

In [29]:
z.shape

(64, 3, 32, 10)
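
The shape rule also tells you when broadcasting is not possible. The following sketch, using zero-filled arrays chosen purely for illustration, shows one pair of shapes whose trailing axes match and one pair whose trailing axes do not; NumPy raises a `ValueError` in the second case.

```python
import numpy as np

x = np.zeros((64, 3, 32, 10))

# Trailing axes match (32, 10), so broadcasting works.
ok = np.maximum(x, np.zeros((32, 10)))
print(ok.shape)

# Trailing axes do not match: (32, 10) against (3, 32).
try:
    np.maximum(x, np.zeros((3, 32)))
    failed = False
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```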

### Tensor product

The tensor product, or **dot product** (not to be confused with an element-wise product, the `*` operator), is one of the most common, most useful tensor operations.

In mathematical notation, you’d note the operation with a dot ($\cdot{}$).
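
To make the distinction from the `*` operator concrete, here is a small sketch with hand-picked vectors (the values are illustrative only). The element-wise product keeps the shape of its inputs, while the dot product of two vectors sums those products into a single scalar, $x \cdot y = \sum_i x_i y_i$.

```python
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])

elementwise = x * y      # [4., 10., 18.], same shape as the inputs
dot = np.dot(x, y)       # 32.0, a scalar: 4 + 10 + 18

print(elementwise, dot)
```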

In [30]:
x = np.random.random((32,))
y = np.random.random((32,))
z = np.dot(x, y)

z

7.836457769167991

In [14]:
def naive_vector_dot(x, y):
    assert len(x.shape) == 1
    assert len(y.shape) == 1
    assert x.shape[0] == y.shape[0]
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

naive_vector_dot(np.array([1, 2, 3]), np.array([4, 5, 6]))

32.0

You can also take the dot product between a matrix $X$ and a vector $y$, which returns a vector where the coefficients are the dot products between $y$ and the rows of $X$. You implement it as follows:

In [17]:
def naive_matrix_vector_dot(X, y):
    assert len(X.shape) == 2
    assert len(y.shape) == 1
    assert X.shape[1] == y.shape[0]
    z = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            z[i] += X[i, j] * y[j]
    return z

naive_matrix_vector_dot(np.array([[1, 2, 3], [4, 5, 6]]), np.array([1, 2, 3]))

array([14., 32.])

In [0]:
# Same operation as above, reusing naive_vector_dot for each row of x.
def naive_matrix_vector_dot(x, y):
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        z[i] = naive_vector_dot(x[i, :], y)
    return z

![](https://drek4537l1klr.cloudfront.net/chollet2/v-7/Figures/ch02-matrix_dot_box_diagram.png)

In [18]:
def naive_matrix_dot(X, Y):
    assert len(X.shape) == 2
    assert len(Y.shape) == 2
    assert X.shape[1] == Y.shape[0]
    z = np.zeros((X.shape[0], Y.shape[1]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[1]):
            row_x = X[i, :]
            column_y = Y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z

naive_matrix_dot(np.array([[1, 2, 3], [4, 5, 6]]), np.array([[1,2], [3, 4], [5, 6]]))

array([[22., 28.],
       [49., 64.]])
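
The assertion `X.shape[1] == Y.shape[0]` in `naive_matrix_dot` encodes the shape rule for matrix products: `(a, b)` times `(b, c)` gives `(a, c)`. As a rough sketch, the built-in `np.dot` applied to the same inputs gives the same values (as integers here, because the inputs are integer arrays), and swapping the operands gives a result of a different shape.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
Y = np.array([[1, 2], [3, 4], [5, 6]])    # shape (3, 2)

Z = np.dot(X, Y)    # (2, 3) . (3, 2) -> (2, 2)
print(Z)
# [[22 28]
#  [49 64]]

W = np.dot(Y, X)    # (3, 2) . (2, 3) -> (3, 3): the order matters
print(W.shape)
```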

### Tensor reshaping

Reshaping a tensor means rearranging its rows and columns to match a target shape. Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor. Reshaping is best understood via simple examples:

In [0]:
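# Assumes train_images holds the MNIST training set with shape (60000, 28, 28); it is not loaded in this notebook.
train_images = train_images.reshape((60000, 28 * 28))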

In [19]:
x = np.array([[0., 1.],
             [2., 3.],
             [4., 5.]])
x.shape

(3, 2)

In [20]:
x = x.reshape((6, 1))
x

array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [5.]])
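
Because a reshape must preserve the total number of coefficients, NumPy can infer one axis for you and will reject a target shape that does not fit. The sketch below illustrates both behaviors on the same six-element array; the shapes `(2, -1)` and `(4, 2)` are arbitrary choices for demonstration.

```python
import numpy as np

x = np.arange(6.).reshape((3, 2))

# -1 asks NumPy to infer that axis from the total size (6 / 2 = 3).
a = x.reshape((2, -1))
print(a.shape)

# Reading either array in row-major order gives the same sequence.
print(np.array_equal(x.ravel(), a.ravel()))

# A target shape with a different number of elements is rejected.
try:
    x.reshape((4, 2))
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```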

In [21]:
x = np.zeros((300, 20))
x = np.transpose(x)
x.shape

(20, 300)
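
Transposing and reshaping to the same target shape are not interchangeable. With an array of zeros the difference is invisible, so this sketch uses distinct values: `np.transpose` swaps rows and columns, whereas `reshape` keeps the row-major order of the elements and only changes where the rows break.

```python
import numpy as np

x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])          # shape (3, 2)

t = np.transpose(x)               # rows become columns: x[i, j] -> t[j, i]
r = x.reshape((2, 3))             # same target shape, row-major order kept

print(t)
# [[0. 2. 4.]
#  [1. 3. 5.]]
print(r)
# [[0. 1. 2.]
#  [3. 4. 5.]]
```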