## Memory usage optimization

In general, NumPy applies its operations elementwise, which requires the array shapes to be compatible:

In [None]:
import numpy as np

# Define two arrays
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[10, 20, 30],
              [40, 50, 60]])

# Perform elementwise addition
C = A + B

# Perform elementwise multiplication
D = A * B

print("Array A:\n", A)
print("Array B:\n", B)
print("Elementwise Addition (A + B):\n", C)
print("Elementwise Multiplication (A * B):\n", D)

In practice, however, NumPy can operate on arrays of different shapes under certain conditions. It broadcasts the arrays, conceptually expanding them, so that their shapes become compatible:

In [None]:
import numpy as np
a = np.array([1, 2, 3])
b = 4
a + b

![img](../assets/broadcast.svg)

Broadcasting Rules

- Shapes are compared dimension by dimension, starting from the trailing one. Two dimensions match when they are equal, or when either is 1 or missing (None).

- In the latter case, the dimension of the output array is expanded to the larger of the two.

- Broadcasted arrays are never physically constructed, which saves memory.

Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations.
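The rules above can be checked directly. The following is a minimal sketch, assuming NumPy 1.20 or newer for `np.broadcast_shapes`; the array names `row` and `expanded` are illustrative only. It shows which shapes are compatible, and that a broadcast view has a stride of 0 along the expanded axis, so no data is copied.

```python
import numpy as np

# Shapes are aligned from the trailing dimension; a missing dimension acts like 1.
print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)
print(np.broadcast_shapes((4, 1), (1, 5)))  # (4, 5)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails.
try:
    np.broadcast_shapes((2, 3), (2,))
except ValueError as exc:
    print("incompatible shapes:", exc)

# A broadcast view reuses the original buffer instead of copying it.
row = np.array([1, 2, 3], dtype=np.int64)
expanded = np.broadcast_to(row, (4, 3))
print(expanded.shape)                    # (4, 3)
print(expanded.strides)                  # (0, 8): stride 0 along the expanded axis
print(np.shares_memory(row, expanded))   # True
```

The view returned by `np.broadcast_to` is read-only, which is why it is used here only for inspection.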

### Cache effects

Memory access is cheaper when it is grouped: accessing a big array contiguously is much faster than accessing it at random.

This implies amongst other things that **smaller** strides are faster:

In [None]:
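c = np.random.random((1, 10000000))

# The timings in the comments are illustrative; actual values depend on
# the machine, the NumPy version, and the array shape.
%timeit c.sum(axis=0)
# 1 loops, best of 3: 3.89 s per loop

%timeit c.sum(axis=1)
# 1 loops, best of 3: 188 ms per loop

# Number of bytes to step in memory along each axis
c.strides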
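To make the link between strides and access cost more concrete, here is a small sketch that compares summing a contiguous row with summing a strided column of the same length. The array size, the names `row_view` and `col_view`, and the repeat count are arbitrary choices for illustration. No timings are quoted, since they vary between machines.

```python
import timeit
import numpy as np

c = np.zeros((2000, 2000))   # float64, C (row-major) order by default
print(c.strides)             # (16000, 8): 8 bytes between columns, 16000 between rows

f = np.asfortranarray(c)     # same values, column-major layout
print(f.strides)             # (8, 16000)

row_view = c[0, :]   # elements are adjacent in memory (stride 8)
col_view = c[:, 0]   # elements are 16000 bytes apart (stride 16000)

# Sum the same number of elements with a small and a large stride.
t_row = timeit.timeit(lambda: row_view.sum(), number=2000)
t_col = timeit.timeit(lambda: col_view.sum(), number=2000)
print(f"contiguous row: {t_row:.4f} s, strided column: {t_col:.4f} s")
```

On most hardware the strided column is expected to be slower, because each element sits on a different cache line, but the size of the gap depends on the CPU and the array size.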

### Temporary arrays

- In complex expressions, NumPy stores intermediate values in temporary arrays

- Memory consumption can be higher than expected


In [None]:
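# Each array holds 10240 * 1024 * 50 float64 values, about 4 GB.
# Reduce the shape on machines with less memory.
a = np.random.random((10240, 1024, 50))
b = np.random.random((10240, 1024, 50))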


%timeit c = (2.0 * a - 4.5 * b) + (np.sin(a) + np.cos(b))

# Four temporary arrays are created, two of which are due to the unnecessary parentheses.

In [None]:
%%timeit

c = 2.0 * a
c = c - 4.5 * b
c = c + np.sin(a)
c = c + np.cos(b)
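Temporaries can be reduced further with in-place operators and the `out=` argument of NumPy ufuncs. The sketch below is one possible variant, not the only way to write it. It uses smaller arrays than the cell above so that it runs quickly, and the scratch buffer `tmp` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1000, 1000))
b = rng.random((1000, 1000))

# Reference result computed with the straightforward expression.
expected = (2.0 * a - 4.5 * b) + (np.sin(a) + np.cos(b))

# Allocate the result and a single scratch buffer, then reuse both.
c = np.multiply(a, 2.0)      # c = 2.0 * a
tmp = np.multiply(b, 4.5)    # tmp = 4.5 * b
c -= tmp                     # in place: c = 2.0 * a - 4.5 * b
np.sin(a, out=tmp)           # overwrite the scratch buffer with sin(a)
c += tmp
np.cos(b, out=tmp)           # overwrite the scratch buffer with cos(b)
c += tmp

print(np.allclose(c, expected))  # True
```

Only two full-size arrays are allocated besides the inputs: the result `c` and the scratch buffer `tmp`. The trade-off is that the code is more verbose than the single expression.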

Broadcasting can also lead to hidden temporary arrays. In the pairwise-distance example below:

- The input data is an M x 3 array.

- The output data is an M x M array.

- The intermediate result is a temporary M x M x 3 array.


In [None]:
M = 10000
X = np.random.random((M, 3))
D = np.sqrt(((X[:, np.newaxis, :] - X) ** 2).sum(axis=-1))
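With M = 10000, the temporary M x M x 3 array of float64 values takes about 2.4 GB. One way to limit this is to process the rows in blocks, so the temporary is only chunk x M x 3. The function below is a minimal sketch of that idea. The name `pairwise_distances_chunked` and the `chunk_size` parameter are hypothetical and not part of NumPy.

```python
import numpy as np

def pairwise_distances_chunked(X, chunk_size=256):
    """Euclidean distances between all rows of X, computed block by block."""
    M = X.shape[0]
    D = np.empty((M, M))  # the M x M output is still allocated in full
    for start in range(0, M, chunk_size):
        stop = min(start + chunk_size, M)
        # Temporary of shape (stop - start, M, 3) instead of (M, M, 3)
        diff = X[start:stop, np.newaxis, :] - X[np.newaxis, :, :]
        np.square(diff, out=diff)  # square in place to avoid another temporary
        D[start:stop] = np.sqrt(diff.sum(axis=-1))
    return D

# Small example so that the direct version is cheap to compute for comparison.
M = 500
X = np.random.default_rng(0).random((M, 3))
D = pairwise_distances_chunked(X, chunk_size=128)
D_direct = np.sqrt(((X[:, np.newaxis, :] - X) ** 2).sum(axis=-1))
print(np.allclose(D, D_direct))  # True
```

A smaller `chunk_size` lowers peak memory at the cost of more Python-level loop iterations, so the best value depends on the available memory.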