<a href="https://colab.research.google.com/github/cadyngo/EAS-Math-for-AI/blob/main/MatrixMultiplication.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Matrix Multiplication and Factorization Methods

In this notebook, we explore different ways to multiply matrices. We begin with the standard inner-product (row-by-column) method. We then move on to the column-wise and row-wise views, in which the product is built from linear combinations of columns and rows, respectively. Our final method is the outer-product decomposition (i.e., representing the product as a sum of rank-1 matrices). To wrap up, we extend these ideas to the multiplication of 3D tensors (batched matrix multiplication).

These exercises are inspired by linear algebra concepts often used in machine learning. For example, we simulate the multiplication of a data matrix (features) by a weight matrix (model coefficients) to produce predictions. This notebook also includes solutions for the challenge exercises.

## Setup

We will use NumPy for our matrix and tensor computations. Run the cell below to import the necessary library.

In [11]:
import numpy as np

print('NumPy version:', np.__version__)

NumPy version: 1.26.4


We’ll initialize small matrices to explore different multiplication views in the context of machine learning.  
Let $X \in \mathbb{R}^{4\times 3}$ be a **data (design) matrix** with 4 samples and 3 features per sample, and let $W \in \mathbb{R}^{3\times 1}$ be a **weight matrix** (one weight per feature) for a **single-output** linear model. The forward pass prediction is
$$
\hat{\mathbf y} \;=\; XW \;\in\; \mathbb{R}^{4\times 1},
$$
i.e., each prediction is a dot product between a sample’s feature row and the weight vector.


In [12]:
# ML-oriented test example
# Data matrix: 4 samples, 3 features (e.g., features of houses, patients, etc.)
X = np.array([
    [2.5, 3.0, 3.5],
    [3.0, 3.5, 4.0],
    [3.5, 4.0, 4.5],
    [4.0, 4.5, 5.0]
])

# Weight matrix: 3 features, 1 output (e.g., regression coefficients)
W = np.array([
    [0.5],
    [1.0],
    [1.5]
])
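
As a quick sanity check on the setup, the sketch below confirms the shapes and computes the first prediction by hand as a dot product. The names `X_demo`, `W_demo`, and `first_pred` are illustrative and not part of the notebook; the values are copied from the cell above so the snippet runs on its own.

```python
import numpy as np

# Same values as X and W above, repeated so this snippet is self-contained.
X_demo = np.array([
    [2.5, 3.0, 3.5],
    [3.0, 3.5, 4.0],
    [3.5, 4.0, 4.5],
    [4.0, 4.5, 5.0],
])
W_demo = np.array([[0.5], [1.0], [1.5]])

print(X_demo.shape, W_demo.shape)  # (4, 3) (3, 1)

# First prediction: dot product of the first sample row with the weight vector.
first_pred = sum(X_demo[0, k] * W_demo[k, 0] for k in range(3))
print(first_pred)  # 2.5*0.5 + 3.0*1.0 + 3.5*1.5 = 9.5
```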

## Exercise 1: Standard Matrix Multiplication (Dot Product Approach)

This is the standard approach, and the one most commonly taught in a first linear algebra course. It is defined as follows.

Let $A$ be an $m \times n$ matrix and $B$ be an $n \times p$ matrix. Then the product $C = AB$ can be computed element-wise. For all integers $i, j$ with $1\le i\le m$ and $1\le j\le p$, we have:

$c_{ij} =
\begin{bmatrix}
a_{i1} & a_{i2} & \dots & a_{in}
\end{bmatrix}
\begin{bmatrix}
b_{1j}\\[6pt]
b_{2j}\\[6pt]
\vdots\\[6pt]
b_{nj}
\end{bmatrix}
= \sum_{k=1}^{n} a_{ik} b_{kj}$

where $c_{ij}$ represents the element at the $i$th row and the $j$th column within the matrix $C$.

**Task:** Implement a function `matrix_multiply_inner(A, B)` that multiplies two matrices using three nested loops (i.e. without using `np.dot` or similar functions).

In [13]:
def matrix_multiply_inner(A, B):
    """
    Multiply two matrices A and B using the standard inner-product method.
    A should be of shape (m, n) and B of shape (n, p).
    Returns the product matrix of shape (m, p).
    """
    m, n = A.shape
    nB, p = B.shape
    if n != nB:
        raise ValueError('Incompatible dimensions for multiplication')
    # Initialize result matrix with zeros
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

inner_fct = matrix_multiply_inner(X, W)
np_fct = np.dot(X, W)
np_fct1 = np.matmul(X, W)
np_fct2 = X @ W

print('Inner-product Multiplication:')
print(inner_fct)

print('\nNumpy Matrix Multiplication using three different methods: np.dot, np.matmul and @ ')
print("np.dot:\n", np_fct,"\n\n np.matmul: \n",  np_fct1, "\n\n @:\n", np_fct2)

Inner-product Multiplication:
[[ 9.5]
 [11. ]
 [12.5]
 [14. ]]

Numpy Matrix Multiplication using three different methods: np.dot, np.matmul and @ 
np.dot:
 [[ 9.5]
 [11. ]
 [12.5]
 [14. ]] 

 np.matmul: 
 [[ 9.5]
 [11. ]
 [12.5]
 [14. ]] 

 @:
 [[ 9.5]
 [11. ]
 [12.5]
 [14. ]]
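
Comparing printed arrays by eye does not scale to larger matrices. A minimal sketch of a programmatic check follows, assuming the same data values as above (repeated as `X_demo` and `W_demo`). The helper `triple_loop_matmul` is a stand-in for `matrix_multiply_inner`, included only so the snippet runs on its own.

```python
import numpy as np

def triple_loop_matmul(A, B):
    """Stand-in for matrix_multiply_inner: c_ij = sum_k a_ik * b_kj."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("Incompatible dimensions for multiplication")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

X_demo = np.array([
    [2.5, 3.0, 3.5],
    [3.0, 3.5, 4.0],
    [3.5, 4.0, 4.5],
    [4.0, 4.5, 5.0],
])
W_demo = np.array([[0.5], [1.0], [1.5]])

reference = X_demo @ W_demo

# np.allclose compares within a floating-point tolerance instead of exact equality.
loops_agree = np.allclose(triple_loop_matmul(X_demo, W_demo), reference)
apis_agree = (np.allclose(np.dot(X_demo, W_demo), reference)
              and np.allclose(np.matmul(X_demo, W_demo), reference))
print(loops_agree, apis_agree)
```

For 2D inputs, `np.dot`, `np.matmul`, and `@` all compute the ordinary matrix product, which is why the three printed results coincide.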


## Exercise 2: Column-Wise Matrix Multiplication


**Preconditions**  
Let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times p}$. Write the columns of $A$ as $\mathbf{a}_1,\dots,\mathbf{a}_n \in \mathbb{R}^m$ and the columns of $B$ as $\mathbf{b}_1,\dots,\mathbf{b}_p \in \mathbb{R}^n$.

For every column index $j \in \{1,\dots,p\}$, the $j$-th column of $C=AB$ is:
$\mathbf{c}_j \;=\; A\,\mathbf{b}_j \in \mathbb{R}^m.$


**Interpretation**  
Each $\mathbf{c}_j$ is a linear combination of the columns of $A$ with coefficients from the $j$th column of $B$:
$$\boxed{\;\mathbf{c}_j = b_{1j}\,\mathbf{a}_1 + b_{2j}\,\mathbf{a}_2 + \cdots + b_{nj}\,\mathbf{a}_n = A\,\mathbf{b}_j, \ j=1,\dots,p.\;}$$


**Block/column picture**
$
AB \;=\;
\begin{bmatrix}
\,| & | & & |\,\\[2pt]
A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_p\\[2pt]
\,| & | & & |\,
\end{bmatrix}
\quad\text{with}\quad
A\mathbf{b}_j
=
\begin{bmatrix}
\,| & | & & |\,\\[2pt]
\mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n\\[2pt]
\,| & | & & |\,
\end{bmatrix}
\begin{bmatrix}
b_{1j}\\[2pt]
b_{2j}\\[2pt]
\vdots\\[2pt]
b_{nj}
\end{bmatrix}.
$

**Task:** Implement a function `matrix_multiply_cw(A, B)` that multiplies two matrices via the column-wise operation described above, using two nested loops. **Please do not 'cheat' by using powerful NumPy functions, or functions from other libraries, that would greatly simplify the process**.

In [14]:
def matrix_multiply_cw(A, B):
  """
  Multiply two matrices A and B using the column-wise method.
  A should be of shape (m, n) and B of shape (n, p).
  Returns the product matrix of shape (m, p).
  """
  # SOLUTION
  m, n = A.shape
  nB, p = B.shape
  if n != nB:
    raise ValueError('Incompatible dimensions for multiplication')

  # initialize empty soln matrix
  C = np.empty((m, p))

  # loop through cols of B
  for i in range(p):

    # get column of interest
    curr = B[:, i]

    # accumulator for the ith column of C (named to avoid shadowing the built-in sum)
    col_sum = np.zeros(m)

    # loop through columns of A and calculate weighted sum
    for j in range(n):
      col_A = A[:, j]
      col_sum += curr[j] * col_A

    # store the final sum in the ith column of C - initialized as empty m x p matrix
    C[:, i] = col_sum

  return C

# verify functionality
cw_fct = matrix_multiply_cw(X, W)

print("Column-Wise Multiplication:")
print(cw_fct)
print("\nNumpy Matrix Multiplication:")
print(np_fct)

Column-Wise Multiplication:
[[ 9.5]
 [11. ]
 [12.5]
 [14. ]]

Numpy Matrix Multiplication:
[[ 9.5]
 [11. ]
 [12.5]
 [14. ]]
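
Because `W` has a single column, the test above only exercises the case $p = 1$. The sketch below spells out the column view for a product with two output columns. The matrices `A_demo` and `B_demo` are made-up illustrative values, not data from this notebook.

```python
import numpy as np

A_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])     # shape (3, 2), columns a_1 and a_2
B_demo = np.array([[1.0, 0.0],
                   [2.0, -1.0]])    # shape (2, 2), hypothetical two-output weights

# Column j of the product is b_1j * a_1 + b_2j * a_2.
col_0 = B_demo[0, 0] * A_demo[:, 0] + B_demo[1, 0] * A_demo[:, 1]
col_1 = B_demo[0, 1] * A_demo[:, 0] + B_demo[1, 1] * A_demo[:, 1]

# Place the two columns side by side to form the (3, 2) product.
C_demo = np.column_stack([col_0, col_1])
print(C_demo)
# [[ 5. -2.]
#  [11. -4.]
#  [17. -6.]]
```

The second column of `B_demo` is $(0, -1)^\top$, so the second column of the product is simply $-\mathbf{a}_2$.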


## Exercise 3: Row-Wise Matrix Multiplication

**Preconditions**  
Let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times p}$. Write the **rows** of $A$ as $\mathbf{a}_1^\top,\dots,\mathbf{a}_m^\top \in \mathbb{R}^{1\times n}$, the **rows** of $B$ as $\mathbf{b}_1^\top,\dots,\mathbf{b}_n^\top \in \mathbb{R}^{1\times p}$, and the **rows** of $C$ as $\mathbf{c}_1^\top,\dots,\mathbf{c}_m^\top \in \mathbb{R}^{1\times p}$.

For every row index $i \in \{1,\dots,m\}$, the $i$-th row of $C=AB$ is obtained by multiplying the $i$-th row of $A$ by $B$:
$$
\mathbf{c}_i^\top \;=\; \mathbf{a}_i^\top\, B \;\in\; \mathbb{R}^{1\times p}.
$$



**Interpretation**  
Each $\mathbf{c}_i^\top$ is a **linear combination of the rows of $B$**, with coefficients from the entries of the $i$-th row of $A$:
$$
\boxed{\;\mathbf{c}_i^\top
= a_{i1}\,\mathbf{b}_1^\top \;+\; a_{i2}\,\mathbf{b}_2^\top \;+\; \cdots \;+\; a_{in}\,\mathbf{b}_n^\top
= \mathbf{a}_i^\top B,\qquad i=1,\dots,m.\;}
$$

**Block/row picture.**
$$
AB \;=\;
\begin{bmatrix}
\text{--- } \mathbf{a}_1^\top \text{ ---}\\[2pt]
\text{--- } \mathbf{a}_2^\top \text{ ---}\\
\vdots\\
\text{--- } \mathbf{a}_m^\top \text{ ---}
\end{bmatrix}
B
\;=\;
\begin{bmatrix}
\mathbf{a}_1^\top B\\[2pt]
\mathbf{a}_2^\top B\\
\vdots\\
\mathbf{a}_m^\top B
\end{bmatrix}.
$$

**Task:** Implement a function `matrix_multiply_rw(A, B)` that multiplies two matrices via the row-wise operation described above, using two nested loops. **Please do not 'cheat' by using powerful NumPy functions, or functions from other libraries, that would greatly simplify the process**.

In [15]:
def matrix_multiply_rw(A, B):
  """
  Multiply two matrices A and B using the row-wise method.
  A should be of shape (m, n) and B of shape (n, p).
  Returns the product matrix of shape (m, p).
  """
  # SOLUTION
  m, n = A.shape
  nB, p = B.shape
  if n != nB:
    raise ValueError('Incompatible dimensions for multiplication')

  # initialize empty soln matrix
  C = np.empty((m, p))

  # loop through rows of A
  for i in range(m):

    # get row of interest
    curr = A[i, :]

    # accumulator for the ith row of C (named to avoid shadowing the built-in sum)
    row_sum = np.zeros(p)

    # loop through rows of B and calculate weighted sum
    for j in range(n):
      row_B = B[j, :]
      row_sum += curr[j] * row_B

    # store the final sum in the ith row of C - initialized as empty m x p matrix
    C[i, :] = row_sum

  return C

# verify functionality
rw_fct = matrix_multiply_rw(X, W)

print("Row-Wise Multiplication:")
print(rw_fct)
print("\nNumpy Matrix Multiplication:")
print(np_fct)

Row-Wise Multiplication:
[[ 9.5]
 [11. ]
 [12.5]
 [14. ]]

Numpy Matrix Multiplication:
[[ 9.5]
 [11. ]
 [12.5]
 [14. ]]
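
With a single-column `W`, each row of $B$ is just one scalar, so the row view is hard to see in the test above. The sketch below uses made-up matrices `A_demo` and `B_demo` (illustrative values, not from this notebook) in which each row of the product is visibly a combination of the rows of `B_demo`.

```python
import numpy as np

A_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])     # shape (3, 2)
B_demo = np.array([[1.0, 0.0],
                   [2.0, -1.0]])    # shape (2, 2), rows b_1^T and b_2^T

rows = []
for i in range(A_demo.shape[0]):
    # Row i of the product is a_i1 * b_1^T + a_i2 * b_2^T.
    row = np.zeros(B_demo.shape[1])
    for k in range(A_demo.shape[1]):
        row += A_demo[i, k] * B_demo[k, :]
    rows.append(row)

# Stack the rows on top of each other to form the (3, 2) product.
C_demo = np.vstack(rows)
print(C_demo)
# [[ 5. -2.]
#  [11. -4.]
#  [17. -6.]]
```

For example, the first row is $1\cdot(1, 0) + 2\cdot(2, -1) = (5, -2)$.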


## Exercise 4: Outer-Product Matrix Multiplication


**Preconditions**  
Let $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times p}$.  
Write the **columns** of $A$ as $\mathbf{a}_1,\dots,\mathbf{a}_n \in \mathbb{R}^{m}$ and the **rows** of $B$ as $\mathbf{b}_1^\top,\dots,\mathbf{b}_n^\top \in \mathbb{R}^{1\times p}$.  
Let $C=AB \in \mathbb{R}^{m\times p}$.

The product $C=AB$ can be formed as a **sum of rank-1 outer products**:
$$
C \;=\; \sum_{k=1}^{n} \mathbf{a}_k\,\mathbf{b}_k^\top \;\in\; \mathbb{R}^{m\times p}.
$$

**Interpretation**  
Each term $\mathbf{a}_k\,\mathbf{b}_k^\top$ is an $m\times p$ matrix obtained by multiplying the $k$-th **column** of $A$ by the $k$-th **row** of $B$. Adding these $n$ rank-1 matrices reconstructs $C$:
$$
\boxed{\;C
= \mathbf{a}_1\,\mathbf{b}_1^\top \;+\; \mathbf{a}_2\,\mathbf{b}_2^\top \;+\; \cdots \;+\; \mathbf{a}_n\,\mathbf{b}_n^\top\;}
$$
Element-wise, this matches the standard formula since for every $i\in\{1,\dots,m\}$ and $j\in\{1,\dots,p\}$,
$$
c_{ij} \;=\; \sum_{k=1}^{n} a_{ik}\,b_{kj}.
$$

**Block/outer picture.**
$$
AB \;=\;
\underbrace{\begin{bmatrix} | & & | \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n \\ | & & | \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} \text{--- }\mathbf{b}_1^\top\text{ ---} \\ \vdots \\ \text{--- }\mathbf{b}_n^\top\text{ ---} \end{bmatrix}}_{B}
\;=\;
\sum_{k=1}^{n}
\underbrace{\begin{bmatrix} | \\ \mathbf{a}_k \\ | \end{bmatrix}}_{\in\,\mathbb{R}^{m\times 1}}
\underbrace{\begin{bmatrix} \text{--- }\mathbf{b}_k^\top\text{ ---} \end{bmatrix}}_{\in\,\mathbb{R}^{1\times p}}
\;=\;
\sum_{k=1}^{n} \mathbf{a}_k\,\mathbf{b}_k^\top.
$$

**Task:** Implement a function `matrix_multiply_outer(A, B)` that multiplies two matrices via the **outer-product** method described above. **Please do not 'cheat' using powerful numpy functions or those from other libraries that would greatly simplify the process**.

In [16]:
def matrix_multiply_outer(A, B):
    """
    Multiply two matrices A and B using an outer-product approach.
    A is of shape (m, n) and B is of shape (n, p).
    Returns the product matrix of shape (m, p).
    """
    # SOLUTION
    m, n = A.shape
    nB, p = B.shape
    if n != nB:
        raise ValueError('Incompatible dimensions for multiplication')

    # Initialize the result matrix with zeros
    C = np.zeros((m, p))

    for k in range(n):

        # Extract the k-th column of A as a column vector
        a_k = A[:, k].reshape(m, 1)

        # Extract the k-th row of B as a row vector
        b_k = B[k, :].reshape(1, p)

        # Outer product of a_k and b_k adds a rank-1 matrix
        C += a_k @ b_k

    return C

# Test functionality
outer_fct = matrix_multiply_outer(X, W)

print('Predictions from outer-product method:')
print(outer_fct)

print('\nNumpy Matrix Multiplication:')
print(np_fct)

Predictions from outer-product method:
[[ 9.5]
 [11. ]
 [12.5]
 [14. ]]

Numpy Matrix Multiplication:
[[ 9.5]
 [11. ]
 [12.5]
 [14. ]]
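
To make the "sum of rank-1 matrices" claim concrete, the sketch below builds each term separately and checks its rank. It uses `np.outer` and `np.linalg.matrix_rank` purely for illustration (the exercise itself asks you to avoid such shortcuts), and `A_demo` and `B_demo` are made-up values rather than data from this notebook.

```python
import numpy as np

A_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])          # shape (3, 2), columns a_1 and a_2
B_demo = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, -1.0]])    # shape (2, 3), rows b_1^T and b_2^T

# One (3, 3) term per inner index k: column k of A times row k of B.
terms = [np.outer(A_demo[:, k], B_demo[k, :]) for k in range(A_demo.shape[1])]

# Each term should have rank 1; their sum should equal the full product.
ranks = [int(np.linalg.matrix_rank(T)) for T in terms]
C_demo = sum(terms)

print(ranks)  # [1, 1]
print(C_demo)
# [[1. 2. 0.]
#  [3. 4. 2.]
#  [5. 6. 4.]]
```

Note that the sum itself generally has rank greater than 1; here it is a sum of two rank-1 terms, so its rank is at most 2.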


## Exercise 5: Extending Multiplication to 3D Tensors (Batched Matrix Multiplication)

In many ML applications (e.g., deep learning), we deal with batched operations. Suppose you have a batch of data matrices and a corresponding batch of weight matrices. For instance, let tensor `A` be of shape `(batch, m, n)` and tensor `B` be of shape `(batch, n, p)`. The batched product is a tensor of shape `(batch, m, p)`, where the slice at each batch index is the product of the corresponding matrices in `A` and `B`.

**Task:** Implement a function `tensor_multiply(A, B)` that performs batched matrix multiplication using either `np.matmul` or `np.einsum`.

*ML-inspired Example:* Imagine processing 4 mini-batches. For each batch, we transform a 4×3 data matrix with a 3×2 weight matrix, yielding a 4×2 output (e.g., feature transformation in a neural network).

In [17]:
def tensor_multiply(A, B):
    """
    Multiply two 3D tensors A and B in a batched manner.
    A should have shape (batch, m, n) and B should have shape (batch, n, p).
    Returns a tensor of shape (batch, m, p).
    """
    # Using np.matmul which supports batched multiplication
    return np.matmul(A, B)

# ML-inspired batched example
batch = 4
m, n, p = 4, 3, 2

# Use same random seed for reproducibility
np.random.seed(0)

# Initialize tensors
A_tensor = np.random.rand(batch, m, n)
B_tensor = np.random.rand(batch, n, p)
C_tensor = tensor_multiply(A_tensor, B_tensor)

# Compare one batch result with np.dot
i = 2

C_slice_np = np.dot(A_tensor[i], B_tensor[i])

print('\nDifference between batched result and np.dot for batch 2 (should be near zero):')
print(np.abs(C_tensor[i] - C_slice_np))


Difference between batched result and np.dot for batch 2 (should be near zero):
[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
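
The task also allows `np.einsum`. A minimal sketch of that alternative follows, assuming tensors of the same shapes as above. The names `A_demo` and `B_demo` are illustrative, and the values come from `np.random.default_rng(0)`, so they differ from `A_tensor` and `B_tensor`.

```python
import numpy as np

rng = np.random.default_rng(0)
A_demo = rng.random((4, 4, 3))   # (batch, m, n)
B_demo = rng.random((4, 3, 2))   # (batch, n, p)

# 'bij,bjk->bik': keep the batch index b, sum over the shared inner index j.
C_einsum = np.einsum('bij,bjk->bik', A_demo, B_demo)
C_matmul = np.matmul(A_demo, B_demo)

print(C_einsum.shape)  # (4, 4, 2)
print(np.allclose(C_einsum, C_matmul))
```

The subscript string makes the summed index explicit, which can be easier to read than relying on the broadcasting rules of `np.matmul`.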


## Challenge Exercises (Solutions Included)


Modify the outer product multiplication function so that it can handle batched matrices (i.e. 3D tensors), where for each batch you sum the outer products over the inner dimension.

In [18]:
def matrix_multiply_outer_batched(A, B):
    """
    Multiply two 3D tensors A and B in a batched manner using an outer-product approach.
    A has shape (batch, m, n) and B has shape (batch, n, p).
    Returns a tensor of shape (batch, m, p).
    """
    # SOLUTION

    batch, m, n = A.shape
    batch_B, nB, p = B.shape
    if batch != batch_B or n != nB:
        raise ValueError('Incompatible dimensions for batched multiplication')

    # Initialize the result tensor
    C = np.zeros((batch, m, p))
    for b in range(batch):
        for k in range(n):
            a_k = A[b, :, k].reshape(m, 1)   # (m, 1)
            b_k = B[b, k, :].reshape(1, p)   # (1, p)
            C[b] += a_k @ b_k
    return C

# Test the batched outer product multiplication using the same A_tensor and B_tensor
C_tensor_outer = matrix_multiply_outer_batched(A_tensor, B_tensor)

print('\nDifference with np.matmul (should be near zero):')
print(np.abs(C_tensor_outer - np.matmul(A_tensor, B_tensor)))


Difference with np.matmul (should be near zero):
[[[2.22044605e-16 0.00000000e+00]
  [2.22044605e-16 0.00000000e+00]
  [0.00000000e+00 1.11022302e-16]
  [0.00000000e+00 0.00000000e+00]]

 [[0.00000000e+00 5.55111512e-17]
  [0.00000000e+00 0.00000000e+00]
  [0.00000000e+00 0.00000000e+00]
  [1.11022302e-16 0.00000000e+00]]

 [[5.55111512e-17 0.00000000e+00]
  [0.00000000e+00 5.55111512e-17]
  [0.00000000e+00 0.00000000e+00]
  [2.77555756e-17 0.00000000e+00]]

 [[2.22044605e-16 0.00000000e+00]
  [0.00000000e+00 0.00000000e+00]
  [0.00000000e+00 0.00000000e+00]
  [0.00000000e+00 0.00000000e+00]]]
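
The nonzero entries above are on the order of double-precision machine epsilon (about 2.22e-16), which suggests rounding differences from summing the same products in a different order rather than a logic error. The sketch below shows one way to check this with a tolerance instead of inspecting the raw differences. The tensors `A_demo` and `B_demo` are illustrative random values, not the `A_tensor` and `B_tensor` used above.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, m, n, p = 4, 4, 3, 2
A_demo = rng.random((batch, m, n))
B_demo = rng.random((batch, n, p))

# Accumulate the rank-1 terms per batch, as in matrix_multiply_outer_batched.
C_outer = np.zeros((batch, m, p))
for b in range(batch):
    for k in range(n):
        C_outer[b] += A_demo[b, :, k].reshape(m, 1) @ B_demo[b, k, :].reshape(1, p)

C_ref = np.matmul(A_demo, B_demo)

# Largest absolute discrepancy, shown next to machine epsilon for scale.
max_diff = np.max(np.abs(C_outer - C_ref))
eps = np.finfo(np.float64).eps
print(max_diff, eps)
print(np.allclose(C_outer, C_ref))
```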
