### The Gears of Neural Networks: Tensor Operations

#### Element-wise operations

In [12]:
import numpy as np

The ReLU function in mathematical form:

$$
\operatorname{ReLU}(x) = \max(0, x) =
\begin{cases}
0, & x < 0 \\
x, & x \ge 0
\end{cases}
$$

In [13]:
def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

In [14]:
x = np.array([
    [-2.0, -1.0, 0.0],
    [1.0, 2.0, -3.0]
])

naive_relu(x)

array([[0., 0., 0.],
       [1., 2., 0.]])

#### What `naive_relu` is doing

The `naive_relu(x)` function applies the ReLU (Rectified Linear Unit) activation element-wise to a 2D NumPy array `x`. It first makes a copy of `x` so the caller's array is not modified, then loops over every element and replaces each negative value with `0`, leaving zero and positive values unchanged. The result is a new array of the same shape in which every entry is `>= 0`.
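
To illustrate why the copy matters, here is a minimal sketch comparing two loop-based variants. The names `relu_with_copy` and `relu_without_copy` are hypothetical and exist only for this comparison; the first mirrors the logic of `naive_relu`, the second skips the copy step.

```python
import numpy as np

def relu_with_copy(x):
    # Same logic as naive_relu: work on a copy, leave the input alone
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

def relu_without_copy(x):
    # Hypothetical variant without x.copy(): writes into the caller's array
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

a = np.array([[-2.0, 1.0]])
out_a = relu_with_copy(a)
print(a)      # input still contains -2.0

b = np.array([[-2.0, 1.0]])
out_b = relu_without_copy(b)
print(b)      # input was overwritten; -2.0 became 0.0
```

Because NumPy arrays are passed by reference, any in-place assignment inside a function is visible to the caller unless the function copies first.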

In [15]:
def naive_add(x, y):
    assert len(x.shape) == 2
    assert x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

In [16]:
x = np.array([[1.0, -2.0]])
y = np.array([[0.5, -2.0]])

naive_add(x, y)

array([[ 1.5, -4. ]])

#### What `naive_add` is doing

The `naive_add(x, y)` function performs element-wise addition on two 2D NumPy arrays of the same shape. It first checks that both inputs are 2D and have identical shapes, then makes a copy of `x` so the original isn’t modified. It loops over every index `(i, j)` and adds `y[i, j]` to `x[i, j]`, returning the resulting summed array.
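
As a quick sanity check, the sketch below restates `naive_add` so that it runs on its own, compares its output with NumPy's `+` operator, and shows that the shape assertion rejects inputs of different shapes. The arrays `a` and `b` are made-up example values.

```python
import numpy as np

def naive_add(x, y):
    # Restated from the cell above so this snippet is self-contained
    assert len(x.shape) == 2
    assert x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

a = np.array([[1.0, -2.0], [3.0, 4.0]])
b = np.array([[0.5, -2.0], [1.0, 1.0]])

# The loop and the built-in operator should give identical results
matches_numpy = np.array_equal(naive_add(a, b), a + b)
print("Matches a + b:", matches_numpy)

# Inputs with different shapes trip the second assertion
try:
    naive_add(a, np.array([[1.0, 2.0]]))
    rejected = False
except AssertionError:
    rejected = True
print("Mismatched shapes rejected:", rejected)
```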

#### NumPy implementation

In [17]:
z = x + y 
print("Element-wise Addition:", z)

z = np.maximum(z, 0.0)
print("Element-wise ReLU:", z)

Element-wise Addition: [[ 1.5 -4. ]]
Element-wise ReLU: [[1.5 0. ]]


In [18]:
import time

x = np.random.random((20, 100))
y = np.random.random((20, 100))

t0 = time.time()
for _ in range(1000):
    z = x + y
    z = np.maximum(z, 0.0)
print("Took: {0:.3f}s".format(time.time() - t0))

Took: 0.003s


In [19]:
t0 = time.time()
for _ in range(1000):
    z = naive_add(x, y)
    z = naive_relu(z)
print("Took: {0:.3f}s".format(time.time() - t0))

Took: 0.807s
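
A single `time.time()` measurement can be noisy, so the comparison can also be sketched with the standard-library `timeit` module, taking the best of several repeats. The helper names `vectorized` and `looped` are hypothetical, and the repeat counts are arbitrary choices kept small so the snippet finishes quickly; absolute timings will differ from machine to machine.

```python
import timeit
import numpy as np

x = np.random.random((20, 100))
y = np.random.random((20, 100))

def vectorized():
    # Element-wise add followed by ReLU, both done inside NumPy
    return np.maximum(x + y, 0.0)

def looped():
    # The same computation with explicit Python loops
    z = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            z[i, j] = max(x[i, j] + y[i, j], 0.0)
    return z

# Best of 3 repeats, 20 calls per repeat
t_vec = min(timeit.repeat(vectorized, number=20, repeat=3))
t_loop = min(timeit.repeat(looped, number=20, repeat=3))
print("vectorized: {0:.4f}s, looped: {1:.4f}s".format(t_vec, t_loop))
```

Taking the minimum over repeats reduces the influence of background activity on the measurement.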


### Broadcasting

#### What broadcasting means

Broadcasting is NumPy’s way of automatically expanding arrays with smaller shapes so they can participate in element-wise operations with larger arrays, without actually copying data. For example, adding an array of shape `(32, 10)` and a vector of shape `(10,)` works because NumPy “stretches” the `(10,)` vector across the 32 rows, effectively treating it like a `(32, 10)` array during the computation.
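
One way to see that the "stretching" does not copy data is `np.broadcast_to`, which returns a read-only view of the vector with the target shape. The sketch below uses a small made-up vector; a stride of `0` along the first axis means every row points at the same memory.

```python
import numpy as np

y = np.arange(10, dtype=np.float64)       # vector with shape (10,)
Y_view = np.broadcast_to(y, (32, 10))     # read-only view with shape (32, 10)

print(Y_view.shape)                   # (32, 10)
print(Y_view.strides)                 # (0, 8): stepping to the next row moves 0 bytes
print(np.shares_memory(y, Y_view))    # True: no data was copied

# Adding the view gives the same result as adding the vector directly
X = np.random.random((32, 10))
same_result = np.array_equal(X + y, X + Y_view)
print(same_result)
```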

In [20]:
np.set_printoptions(threshold=np.inf, linewidth=np.inf)

X = np.random.random((32, 10)) # random matrix with shape (32, 10)
y = np.random.random((10,))    # random vector with shape (10,) 

In [21]:
print("First 5 rows of X matrix:\n", X[:5])
print("X shape:", X.shape)
print("\ny vector:", y)
print("y shape:", y.shape)

First 5 rows of X matrix:
 [[0.4668844  0.00446609 0.31608552 0.89809409 0.4061209  0.89185013 0.10866038 0.6905944  0.15461498 0.5542997 ]
 [0.76314184 0.30692388 0.91176441 0.20310135 0.96389516 0.80601232 0.7329317  0.18900094 0.67967066 0.08348004]
 [0.99196206 0.74050732 0.7724089  0.23737587 0.69801694 0.09258226 0.85991988 0.3732307  0.20065603 0.3536911 ]
 [0.57321585 0.46676595 0.02723593 0.51138373 0.27724907 0.35320421 0.5075838  0.32570316 0.60831363 0.66416643]
 [0.8293787  0.276974   0.4998897  0.51040721 0.44912471 0.17869222 0.82557196 0.22330327 0.52839297 0.38989602]]
X shape: (32, 10)

y vector: [0.61132975 0.95197178 0.00909996 0.96146604 0.98594171 0.40637014 0.72522149 0.45516538 0.84224167 0.2915603 ]
y shape: (10,)


In [22]:
# Add an empty first axis to y, so its shape becomes (1, 10)
Y = np.expand_dims(y, axis=0) 

In [23]:
print("Y:", Y)
print("Y shape:", Y.shape)

Y: [[0.61132975 0.95197178 0.00909996 0.96146604 0.98594171 0.40637014 0.72522149 0.45516538 0.84224167 0.2915603 ]]
Y shape: (1, 10)


In [24]:
# Repeat the single row of Y 32 times along the new first axis, so that we end up
# with a tensor Y of shape (32, 10), where Y[i, :] == y for i in range(0, 32)
Y = np.tile(Y, (32, 1))

In [25]:
print("Y first 5 rows of matrix:\n", Y[:5])
print("Y shape:", Y.shape)

Y first 5 rows of matrix:
 [[0.61132975 0.95197178 0.00909996 0.96146604 0.98594171 0.40637014 0.72522149 0.45516538 0.84224167 0.2915603 ]
 [0.61132975 0.95197178 0.00909996 0.96146604 0.98594171 0.40637014 0.72522149 0.45516538 0.84224167 0.2915603 ]
 [0.61132975 0.95197178 0.00909996 0.96146604 0.98594171 0.40637014 0.72522149 0.45516538 0.84224167 0.2915603 ]
 [0.61132975 0.95197178 0.00909996 0.96146604 0.98594171 0.40637014 0.72522149 0.45516538 0.84224167 0.2915603 ]
 [0.61132975 0.95197178 0.00909996 0.96146604 0.98594171 0.40637014 0.72522149 0.45516538 0.84224167 0.2915603 ]]
Y shape: (32, 10)


In [27]:
print("Adding first 5 rows of X and Y:", X[:5] + Y[:5])

Adding first 5 rows of X and Y: [[1.07821415 0.95643786 0.32518548 1.85956013 1.39206261 1.29822027 0.83388187 1.14575978 0.99685665 0.84586   ]
 [1.37447159 1.25889566 0.92086437 1.16456739 1.94983687 1.21238246 1.45815319 0.64416631 1.52191232 0.37504033]
 [1.60329181 1.6924791  0.78150887 1.19884191 1.68395865 0.49895239 1.58514137 0.82839608 1.0428977  0.6452514 ]
 [1.1845456  1.41873773 0.03633589 1.47284978 1.26319078 0.75957434 1.23280529 0.78086853 1.4505553  0.95572673]
 [1.44070845 1.22894577 0.50898966 1.47187325 1.43506642 0.58506235 1.55079345 0.67846864 1.37063464 0.68145632]]


In [None]:
def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2    # rank-2 NumPy tensor
    assert len(y.shape) == 1    # NumPy vector
    assert x.shape[1] == y.shape[0]
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]    # add the j-th vector entry to every row
    return x
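
A loop like this can be checked against NumPy's built-in broadcasting. The sketch below restates the loop under the hypothetical name `loop_add_matrix_and_vector` so that it runs on its own, then compares its result with `X + y` on random inputs.

```python
import numpy as np

def loop_add_matrix_and_vector(x, y):
    # Hypothetical stand-alone restatement of the matrix-plus-vector loop
    assert x.ndim == 2                  # rank-2 tensor
    assert y.ndim == 1                  # vector
    assert x.shape[1] == y.shape[0]     # vector length matches column count
    out = x.copy()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] += y[j]
    return out

X = np.random.random((32, 10))
y = np.random.random((10,))

# Broadcasting X + y should agree with the explicit loop
agrees = np.allclose(loop_add_matrix_and_vector(X, y), X + y)
print("Loop agrees with X + y:", agrees)
```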

In [29]:
x = np.random.random((64, 3, 32, 10)) 
y = np.random.random((32, 10))

#### What `x` and `y` represent here

- `x = np.random.random((64, 3, 32, 10))`  
  This is a 4D tensor. You can read the shape as:  
  - 64: number of samples in the batch  
  - 3: channels per sample (e.g., RGB or feature maps)  
  - 32: height (number of rows)  
  - 10: width (number of columns)  

- `y = np.random.random((32, 10))`  
  This is a 2D tensor whose shape matches the last two axes of `x` (`32 × 10`). Because broadcasting aligns shapes starting from the trailing axes, NumPy treats `y` as a single “feature map” that is reused for every sample and every channel of `x` in an element-wise operation.

In [34]:
z = np.maximum(x, y)

In [36]:
z.shape

(64, 3, 32, 10)
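
The output shape follows from aligning the two shapes from the right. As a sketch, `np.broadcast_shapes` (assumed available, which requires NumPy 1.20 or later) computes the result shape without doing any arithmetic, and raises `ValueError` when the shapes are incompatible. The indices `5` and `2` below are arbitrary picks for the spot check.

```python
import numpy as np

x = np.random.random((64, 3, 32, 10))
y = np.random.random((32, 10))
z = np.maximum(x, y)

# Shapes are aligned from the trailing axes: (64, 3, 32, 10) vs (32, 10)
result_shape = np.broadcast_shapes(x.shape, y.shape)
print(result_shape)    # (64, 3, 32, 10)

# Every (sample, channel) slice is compared against the same y
slice_matches = np.array_equal(z[5, 2], np.maximum(x[5, 2], y))
print(slice_matches)

# A shape whose trailing axes do not line up is rejected
try:
    np.broadcast_shapes((64, 3, 32, 10), (3, 10))
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```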