# Matrix Multiplication with a Vector

Multiplying a matrix $A$ with a vector $x$ proceeds as follows:
\begin{align}
    A &\in \mathbb{R}^{m \times n} \\
    x &\in \mathbb{R}^{n} \\
    Ax &=
    \begin{bmatrix}
        A_{1, 1} & A_{1, 2} & A_{1, 3} & \ldots & A_{1, n} \\
        A_{2, 1} & A_{2, 2} & A_{2, 3} & \ldots & A_{2, n} \\
        A_{3, 1} & A_{3, 2} & A_{3, 3} & \ldots & A_{3, n} \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        A_{m, 1} & A_{m, 2} & A_{m, 3} & \ldots & A_{m, n}
    \end{bmatrix}
    \begin{bmatrix}
        x_{1} \\
        x_{2} \\
        x_{3} \\
        \vdots \\
        x_{n}
    \end{bmatrix} =
    \begin{bmatrix}
        A_{1,:} \cdot x \\
        A_{2,:} \cdot x \\
        A_{3,:} \cdot x \\
        \vdots \\
        A_{m,:} \cdot x \\
    \end{bmatrix} =
    \begin{bmatrix}
        A_{1, 1}x_{1} + A_{1, 2}x_{2} + A_{1, 3}x_{3} + \ldots + A_{1, n}x_{n} \\
        A_{2, 1}x_{1} + A_{2, 2}x_{2} + A_{2, 3}x_{3} + \ldots + A_{2, n}x_{n} \\
        A_{3, 1}x_{1} + A_{3, 2}x_{2} + A_{3, 3}x_{3} + \ldots + A_{3, n}x_{n} \\
        \vdots \\
        A_{m, 1}x_{1} + A_{m, 2}x_{2} + A_{m, 3}x_{3} + \ldots + A_{m, n}x_{n}
    \end{bmatrix}
\end{align}

Multiplying a matrix $A$ and a vector $x$ can be seen as a linear combination of the columns of $A$ using the elements of $x$ as coefficients.
\begin{align}
    Ax &=
    \begin{bmatrix}
        A_{1, 1} \\
        A_{2, 1} \\
        A_{3, 1} \\
        \vdots \\
        A_{m, 1}
    \end{bmatrix}x_{1} +
    \begin{bmatrix}
        A_{1, 2} \\
        A_{2, 2} \\
        A_{3, 2} \\
        \vdots \\
        A_{m, 2}
    \end{bmatrix}x_{2} +
    \begin{bmatrix}
        A_{1, 3} \\
        A_{2, 3} \\
        A_{3, 3} \\
        \vdots \\
        A_{m, 3}
    \end{bmatrix}x_{3} + \ldots +
    \begin{bmatrix}
        A_{1, n} \\
        A_{2, n} \\
        A_{3, n} \\
        \vdots \\
        A_{m, n}
    \end{bmatrix}x_{n}
\end{align}
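
The row view and the column view above compute the same vector. A minimal sketch in plain Python lists, assuming the hypothetical helper names `matvec_rows` and `matvec_columns` and example values that are not from the original:

```python
def matvec_rows(A, x):
    """Row view: entry i is the dot product of row i of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_columns(A, x):
    """Column view: add up the columns of A, each scaled by the matching x_j."""
    m = len(A)
    result = [0] * m
    for j, xj in enumerate(x):
        for i in range(m):
            result[i] += A[i][j] * xj
    return result


# Illustrative values (assumed): A is 2 x 3, x has 3 elements.
A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, -1]

by_rows = matvec_rows(A, x)
by_columns = matvec_columns(A, x)
print(by_rows, by_columns)  # [-2, -2] [-2, -2]
```

Both functions return a vector with $m$ elements, one per row of $A$.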

# Matrix Multiplication

Matrix multiplication $AB=C$ places certain requirements on the operand and resulting product matrices.

If $A$ has $i$ rows and $j$ columns, then $B$ must have $j$ rows and may have any number $k$ of columns. $C$ will then have $i$ rows and $k$ columns. Matrix dimensions are expressed mathematically as:
\begin{align}
    A &\in \mathbb{R} ^ {i \times j} \\
    B &\in \mathbb{R} ^ {j \times k} \\
    C &\in \mathbb{R} ^ {i \times k} \\
    AB &= C \\
    \begin{bmatrix}
        A_{1, 1} & A_{1, 2} & A_{1, 3} & \ldots & A_{1, j} \\
        A_{2, 1} & A_{2, 2} & A_{2, 3} & \ldots & A_{2, j} \\
        A_{3, 1} & A_{3, 2} & A_{3, 3} & \ldots & A_{3, j} \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        A_{i, 1} & A_{i, 2} & A_{i, 3} & \ldots & A_{i, j}
    \end{bmatrix}
    \begin{bmatrix}
        B_{1, 1} & B_{1, 2} & B_{1, 3} & \ldots & B_{1, k} \\
        B_{2, 1} & B_{2, 2} & B_{2, 3} & \ldots & B_{2, k} \\
        B_{3, 1} & B_{3, 2} & B_{3, 3} & \ldots & B_{3, k} \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        B_{j, 1} & B_{j, 2} & B_{j, 3} & \ldots & B_{j, k} \\
    \end{bmatrix}
    &=
    \begin{bmatrix}
        C_{1, 1} & C_{1, 2} & C_{1, 3} & \ldots & C_{1, k} \\
        C_{2, 1} & C_{2, 2} & C_{2, 3} & \ldots & C_{2, k} \\
        C_{3, 1} & C_{3, 2} & C_{3, 3} & \ldots & C_{3, k} \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        C_{i, 1} & C_{i, 2} & C_{i, 3} & \ldots & C_{i, k} \\
    \end{bmatrix}
\end{align}

Each element of $C$ is the dot product of a row of $A$ with a column of $B$. The element of $C$ in row $x$ and column $y$ is computed by multiplying row $x$ of $A$ with column $y$ of $B$. Mathematically,
\begin{align}
    C_{x, y} &= A_{x,:} \cdot B_{:,y} \\
    \begin{bmatrix}
        C_{1, 1} & C_{1, 2} & C_{1, 3} & \ldots & C_{1, k} \\
        C_{2, 1} & C_{2, 2} & C_{2, 3} & \ldots & C_{2, k} \\
        C_{3, 1} & \cellcolor{red!20} C_{3, 2} & C_{3, 3} & \ldots & C_{3, k} \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        C_{i, 1} & C_{i, 2} & C_{i, 3} & \ldots & C_{i, k} \\
    \end{bmatrix}
    &=
    \begin{bmatrix}
        A_{1, 1} & A_{1, 2} & A_{1, 3} & \ldots & A_{1, j} \\
        A_{2, 1} & A_{2, 2} & A_{2, 3} & \ldots & A_{2, j} \\
        A_{3, 1} & A_{3, 2} & A_{3, 3} & \ldots & A_{3, j} \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        A_{i, 1} & A_{i, 2} & A_{i, 3} & \ldots & A_{i, j}
    \end{bmatrix}
    \begin{bmatrix}
        B_{1, 1} & B_{1, 2} & B_{1, 3} & \ldots & B_{1, k} \\
        B_{2, 1} & B_{2, 2} & B_{2, 3} & \ldots & B_{2, k} \\
        B_{3, 1} & B_{3, 2} & B_{3, 3} & \ldots & B_{3, k} \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        B_{j, 1} & B_{j, 2} & B_{j, 3} & \ldots & B_{j, k} \\
    \end{bmatrix}
\end{align}
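
The rule $C_{x, y} = A_{x,:} \cdot B_{:,y}$ translates almost directly into code. A minimal sketch in plain Python, assuming the hypothetical helper name `matmul` and example matrices that are not from the original:

```python
def matmul(A, B):
    """Return C = AB, where C[x][y] is the dot product of row x of A and column y of B."""
    i, j, k = len(A), len(B), len(B[0])
    if any(len(row) != j for row in A):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    return [[sum(A[x][t] * B[t][y] for t in range(j)) for y in range(k)]
            for x in range(i)]


# Illustrative values (assumed): A is 2 x 3 and B is 3 x 2, so C is 2 x 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [2, 2]]

C = matmul(A, B)
print(C)  # [[7, 8], [16, 17]]
```

The shape check mirrors the dimension requirement: the inner dimension $j$ must match, and the result takes its row count from $A$ and its column count from $B$.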

# Hadamard Product

The Hadamard product of two matrices of the same shape is their element-wise product.

\begin{align}
    A \odot B &= C \\
    A_{x, y} \times B_{x, y} &= C_{x, y}
\end{align}
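
A minimal sketch of the element-wise rule, assuming the hypothetical helper name `hadamard` and example values that are not from the original:

```python
def hadamard(A, B):
    """Element-wise product of two matrices that have the same shape."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("A and B must have the same shape")
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Illustrative values (assumed).
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

C = hadamard(A, B)
print(C)  # [[5, 12], [21, 32]]
```

Unlike matrix multiplication, no dot products are involved: each output element depends on exactly one element of each operand.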

# Span of a Matrix

The span of a matrix $A$ with $n$ columns is the set of all linear combinations of $A$'s columns with arbitrary real coefficients $c_{i}$.
\begin{align}
    \mathrm{span} \left(A\right) &= \{ c_{1}A_{:,1} + c_{2}A_{:,2} + c_{3}A_{:,3} + \ldots + c_{n}A_{:,n} \mid c_{1}, c_{2}, \ldots, c_{n} \in \mathbb{R} \}
\end{align}

# Systems of Linear Equations
A system of linear equations can be written as follows:
\begin{align}
    A &\in \mathbb{R} ^ {m \times n} \\
    x &\in \mathbb{R} ^ {n} \\
    b &\in \mathbb{R} ^ {m} \\
    Ax &= b \\
    \begin{bmatrix}
        A_{1, 1} & A_{1, 2} & A_{1, 3} & \ldots & A_{1, n} \\
        A_{2, 1} & A_{2, 2} & A_{2, 3} & \ldots & A_{2, n} \\
        A_{3, 1} & A_{3, 2} & A_{3, 3} & \ldots & A_{3, n} \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        A_{m, 1} & A_{m, 2} & A_{m, 3} & \ldots & A_{m, n}
    \end{bmatrix}
    \begin{bmatrix}
        x_{1} \\
        x_{2} \\
        x_{3} \\
        \vdots \\
        x_{n}
    \end{bmatrix} &=
    \begin{bmatrix}
        b_{1} \\
        b_{2} \\
        b_{3} \\
        \vdots \\
        b_{m}
    \end{bmatrix}
\end{align}

Expanding the matrix equation yields the following system of $m$ linear equations in $n$ unknowns:

\begin{align}
    A_{1,1}x_{1} + A_{1,2}x_{2} + A_{1,3}x_{3} + \ldots + A_{1,n}x_{n} &= b_{1} \\
    A_{2,1}x_{1} + A_{2,2}x_{2} + A_{2,3}x_{3} + \ldots + A_{2,n}x_{n} &= b_{2} \\
    A_{3,1}x_{1} + A_{3,2}x_{2} + A_{3,3}x_{3} + \ldots + A_{3,n}x_{n} &= b_{3} \\
    &\vdots \\
    A_{m,1}x_{1} + A_{m,2}x_{2} + A_{m,3}x_{3} + \ldots + A_{m,n}x_{n} &= b_{m}
\end{align}

When $A$ is invertible, the system can be solved by multiplying both sides by $A^{-1}$:

\begin{align}
    A^{-1}Ax &= A^{-1}b \\
    I_{n}x &= A^{-1}b \\
    x &= A^{-1}b
\end{align}

For $A^{-1}$ to exist, $A$ must be square and its columns must be linearly independent. These conditions guarantee exactly one solution to the system of linear equations for every $b$.
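
For a $2 \times 2$ matrix the inverse has a closed form, so the derivation $x = A^{-1}b$ can be traced by hand. A minimal sketch, assuming the hypothetical helper name `inverse_2x2` and example values of `A` and `b` that are not from the original. Exact `Fraction` arithmetic is used so that no rounding noise appears:

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via the closed form (1 / (ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A is not invertible: its columns are linearly dependent")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def matvec(A, x):
    """Matrix-vector product using one dot product per row."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Illustrative system (assumed): 2*x1 + x2 = 3 and x1 + 3*x2 = 5.
A = [[2, 1],
     [1, 3]]
b = [3, 5]

x = matvec(inverse_2x2(A), b)
print(x == [Fraction(4, 5), Fraction(7, 5)])  # True
print(matvec(A, x) == b)                      # True
```

The second check substitutes the solution back into $Ax = b$. In numerical code one would normally call a linear solver rather than form the inverse explicitly.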

The system is **overspecified** when $m > n$, that is, when $A$ has more rows than columns. In this case, the system has more equations than unknowns.

The system is **underspecified** when $m < n$, that is, when $A$ has more columns than rows. In this case, the system has more unknowns than equations.

# Diagonal Matrices

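If a matrix $A$ is diagonal, all its elements $A_{i,j}$ where $i \ne j$ are 0. The notation $\mathrm{diag}\left(v\right)$ denotes the square diagonal matrix whose diagonal elements are the elements of the vector $v$.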

\begin{align}
    v &\in \mathbb{R}^{n} \\
    \mathrm{diag} \left( v \right) &=
    \begin{bmatrix}
        v_{1} & 0 & 0 & \ldots & 0 \\
        0 & v_{2} & 0 & \ldots & 0 \\
        0 & 0 & v_{3} & \ldots & 0 \\
        \vdots & \vdots & \vdots & \vdots & \vdots \\
        0 & 0 & 0 & \ldots & v_{n}
    \end{bmatrix}
\end{align}

Multiplying a diagonal matrix by a vector is simply the Hadamard product of the diagonal and the vector.
$$
    \mathrm{diag}\left( v \right)x = v \odot x
$$

Left-multiplying a matrix $A$ by a diagonal matrix takes the Hadamard product of the diagonal with each column of $A$. Equivalently, it scales row $i$ of $A$ by $v_{i}$.
\begin{align}
    v &\in \mathbb{R}^n \\
    A &\in \mathbb{R}^{n \times n} \\
    \mathrm{diag}\left( v \right)A &= 
    \begin{bmatrix}
        v \odot A_{:,1} & v \odot A_{:,2} & v \odot A_{:,3} & \ldots & v \odot A_{:,n}
    \end{bmatrix} =
    \begin{bmatrix}
        v_{1} A_{1,:} \\
        v_{2} A_{2,:} \\
        v_{3} A_{3,:} \\
        \vdots \\
        v_{n} A_{n,:}
    \end{bmatrix}
\end{align}

The inverse of a diagonal matrix $\mathrm{diag}\left(v\right)$ is also easy to compute. It exists only if every $v_{i}$ is nonzero, in which case it is the diagonal matrix whose diagonal elements are the reciprocals of the elements of $v$.
$$
    \mathrm{diag}\left( v \right)^{-1} = \mathrm{diag}\left( \left[ \frac{1}{v_{1}}, \frac{1}{v_{2}}, \frac{1}{v_{3}}, \ldots, \frac{1}{v_{n}} \right] ^{T} \right)
$$
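
The three diagonal-matrix shortcuts can be checked numerically. A minimal sketch in plain Python, assuming the hypothetical helper names `diag`, `matvec` and `matmul` and example values that are not from the original:

```python
from fractions import Fraction


def diag(v):
    """Square matrix with the elements of v on the diagonal and 0 elsewhere."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Illustrative values (assumed).
v = [2, 4, 5]
x = [1, 2, 3]
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# 1. diag(v) x equals the element-wise product of v and x.
dx = matvec(diag(v), x)
hx = [vi * xi for vi, xi in zip(v, x)]
print(dx, hx)  # [2, 8, 15] [2, 8, 15]

# 2. diag(v) A scales row i of A by v[i].
dA = matmul(diag(v), A)
scaled_rows = [[vi * a for a in row] for vi, row in zip(v, A)]
print(dA == scaled_rows)  # True

# 3. The inverse is the diagonal matrix of reciprocals (every v[i] must be nonzero).
v_inv = [Fraction(1, vi) for vi in v]
product = matmul(diag(v), diag(v_inv))
print(product == diag([1, 1, 1]))  # True
```

Each shortcut costs on the order of $n$ or $n^{2}$ multiplications instead of the $n^{2}$ or $n^{3}$ needed by the general routines, which is the computational saving that makes diagonal matrices attractive.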

A common pattern in machine learning is to define algorithms in terms of arbitrary matrices and constrain some of them to be diagonal matrices to reduce computational cost.

# Symmetric Matrices

A symmetric matrix $A$ is equal to its own transpose.
$$
    A = A^{T}
$$
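
A minimal sketch of a symmetry check, assuming the hypothetical helper names `transpose` and `is_symmetric` and example matrices that are not from the original:

```python
def transpose(A):
    """Swap rows and columns: element (i, j) moves to (j, i)."""
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    """A matrix is symmetric when it equals its own transpose."""
    return A == transpose(A)


# Illustrative values (assumed).
S = [[1, 2, 3],
     [2, 5, 6],
     [3, 6, 9]]
N = [[1, 2],
     [3, 4]]

print(is_symmetric(S), is_symmetric(N))  # True False
```

A non-square matrix can never pass this check, because its transpose has a different shape.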

# Orthogonal Matrix

An orthogonal matrix $A$ is a square matrix whose columns are mutually orthonormal. Its rows are also mutually orthonormal.

The inverse of an orthogonal matrix $A$ is easy to compute. It is simply the transpose.

$$
    A^{-1} = A^{T}
$$

This implies
$$
    A^{T}A = AA^{T} = I  
$$
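
A 2D rotation matrix is a standard example of an orthogonal matrix. A minimal sketch that checks $A^{T}A = AA^{T} = I$ up to floating-point tolerance, assuming the rotation angle, the helper names and the tolerance, none of which come from the original:

```python
import math


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_identity(M, tol=1e-12):
    """True when M matches the identity matrix within an absolute tolerance."""
    n = len(M)
    return all(math.isclose(M[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(n) for j in range(n))


# Rotation by 30 degrees (assumed example); its columns are orthonormal.
theta = math.pi / 6
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

left = matmul(transpose(Q), Q)
right = matmul(Q, transpose(Q))
print(is_identity(left), is_identity(right))  # True True
```

The comparison uses a tolerance because the sines and cosines are rounded floating-point values.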

# Matrix Multiplication as Linear Transformations

Multiplying a vector by a matrix can be seen as a transformation: a vector is transformed into another vector by multiplication. Take for example the following:
\begin{align}
    x &=
        \begin{bmatrix}
            3 \\
            2
        \end{bmatrix}\\
    A &=
        \begin{bmatrix}
            3 & 2 \\
            1 & 0
        \end{bmatrix} \\
    Ax &=
        \begin{bmatrix}
            3 & 2 \\
            1 & 0
        \end{bmatrix}
        \begin{bmatrix}
            3 \\
            2
        \end{bmatrix} = 
        3 \begin{bmatrix}
            3 \\
            1
        \end{bmatrix} + 
        2 \begin{bmatrix}
            2 \\
            0
        \end{bmatrix} =
        \begin{bmatrix}
            13 \\
            3
        \end{bmatrix}
\end{align}

After multiplication by $A$, $\left[ 3, 2 \right]^{T}$ is transformed into $\left[ 13, 3 \right]^{T}$.

To get an intuition for the transformation a matrix $A$ effects on a vector, observe the transformation on unit directional vectors. 
\begin{align}
    \hat{i} &= 
        \begin{bmatrix}
            1 \\
            0
        \end{bmatrix} \\
    \hat{j} &= 
        \begin{bmatrix}
            0 \\
            1
        \end{bmatrix} \\
    \hat{i}^{\prime} &= A \hat{i} =
        \begin{bmatrix}
            3 & 2 \\
            1 & 0
        \end{bmatrix}
        \begin{bmatrix}
            1 \\
            0
        \end{bmatrix} =
        \begin{bmatrix}
            3 \\
            1
        \end{bmatrix} \\
    \hat{j}^{\prime} &= A \hat{j} =
        \begin{bmatrix}
            3 & 2 \\
            1 & 0
        \end{bmatrix}
        \begin{bmatrix}
            0 \\
            1
        \end{bmatrix} =
        \begin{bmatrix}
            2 \\
            0
        \end{bmatrix}
\end{align}
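
Because $\hat{i}^{\prime}$ and $\hat{j}^{\prime}$ are the columns of $A$, knowing where the basis vectors land is enough to transform any vector. A minimal sketch using the values from the example above, with the helper name `matvec` as an assumption:

```python
def matvec(A, x):
    """Matrix-vector product using one dot product per row."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


A = [[3, 2],
     [1, 0]]
x = [3, 2]

i_prime = matvec(A, [1, 0])  # image of i-hat: the first column of A
j_prime = matvec(A, [0, 1])  # image of j-hat: the second column of A
print(i_prime, j_prime)  # [3, 1] [2, 0]

# x = 3 * i-hat + 2 * j-hat, so Ax = 3 * i_prime + 2 * j_prime.
combined = [x[0] * a + x[1] * b for a, b in zip(i_prime, j_prime)]
print(combined, matvec(A, x))  # [13, 3] [13, 3]
```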

Linear transformations can stretch or shrink space. Observe what happens to the square formed by the basis vectors $\hat{i}$ and $\hat{j}$ when each is multiplied by a matrix $A$. If the columns of $A$ are linearly dependent, then $A$ collapses space down to a lower dimension. For example:

\begin{align}
    A &=
        \begin{bmatrix}
            1 & 2 \\
            1 & 2
        \end{bmatrix} \\
    A \hat{i} &=
        \begin{bmatrix}
            1 & 2 \\
            1 & 2
        \end{bmatrix}
        \begin{bmatrix}
            1 \\
            0
        \end{bmatrix} =
        \begin{bmatrix}
            1 \\
            1
        \end{bmatrix} \\
    A \hat{j} &=
        \begin{bmatrix}
            1 & 2 \\
            1 & 2
        \end{bmatrix}
        \begin{bmatrix}
            0 \\
            1
        \end{bmatrix} =
        \begin{bmatrix}
            2 \\
            2
        \end{bmatrix}
\end{align}

The vectors $\left[1, 1\right]^{T}$ and $\left[2, 2\right]^{T}$ are linearly dependent, since the second is twice the first, so they point in the same direction. Computing the unit vector in the direction of $\left[1, 1\right]^{T}$:

\begin{align}
    \left\lVert
        \begin{bmatrix}
            1 \\
            1
        \end{bmatrix}
    \right\rVert &= \sqrt{1^{2} + 1^{2}} = \sqrt{2} \\
    \frac{
        \begin{bmatrix}
                1 \\
                1
        \end{bmatrix}}{\sqrt{2}
    } &=
    \begin{bmatrix}
            \frac{1}{\sqrt{2}} \\
            \frac{1}{\sqrt{2}}
    \end{bmatrix}
\end{align}

Similarly,

\begin{align}
    \left\lVert
        \begin{bmatrix}
            2 \\
            2
        \end{bmatrix}
    \right\rVert &= \sqrt{2^{2} + 2^{2}} = \sqrt{8} = 2\sqrt{2} \\
    \frac{
        \begin{bmatrix}
                2 \\
                2
        \end{bmatrix}}{2\sqrt{2}
    } &=
    \begin{bmatrix}
            \frac{2}{2\sqrt{2}} \\
            \frac{2}{2\sqrt{2}}
    \end{bmatrix} =
    \begin{bmatrix}
            \frac{1}{\sqrt{2}} \\
            \frac{1}{\sqrt{2}}
    \end{bmatrix}
\end{align}

We see that the unit vectors of the transformed $\hat{i}$ and $\hat{j}$ are the same. Both basis vectors have been mapped onto a single direction, so $A$ collapses the 2D plane onto a 1D line.
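
The same collapse can be checked in code. A minimal sketch, assuming the hypothetical helper names `matvec` and `unit`. The determinant check at the end uses the $2 \times 2$ formula $ad - bc$:

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def unit(v):
    """Scale v to length 1 by dividing by its Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


A = [[1, 2],
     [1, 2]]

u_i = unit(matvec(A, [1, 0]))  # direction of the transformed i-hat
u_j = unit(matvec(A, [0, 1]))  # direction of the transformed j-hat

same_direction = all(math.isclose(a, b) for a, b in zip(u_i, u_j))
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(same_direction, det)  # True 0
```

A determinant of 0 is the algebraic signature of this collapse: the unit square is mapped to a line segment, which has zero area.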

Watch this YouTube video: https://www.youtube.com/watch?v=kYB8IZa5AuE

# Matrix Multiplication as Multiple Transforms

If multiplying a vector $x$ by a matrix $A$ transforms the vector, then multiplying a matrix $A$ by a matrix $B$ composes transformations. Take for example the vector $x$ being transformed by the matrix $A$:

\begin{align}
    Ax &= b
\end{align}

If we then multiply $b$ by a matrix $B$, we are transforming $b$ into $c$.

\begin{align}
    Bb &= BAx = Cx = c
\end{align}

The matrix product $BA$ can be assigned to a single matrix $C$, which can be viewed as a single transformation. In effect, multiplying $B$ by $A$ composes two transformations into the single transformation $C = BA$, which applies $A$ first and then $B$.
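
A minimal sketch that checks $B(Ax) = (BA)x$ on a small example. The matrix `B` (a 90-degree counter-clockwise rotation) and the helper names are assumptions that are not from the original:

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[3, 2],
     [1, 0]]
B = [[0, -1],
     [1,  0]]   # assumed example: rotate 90 degrees counter-clockwise
x = [3, 2]

step_by_step = matvec(B, matvec(A, x))  # transform by A, then by B
C = matmul(B, A)                        # compose once ...
all_at_once = matvec(C, x)              # ... then apply the single transform

print(step_by_step, all_at_once)  # [-3, 13] [-3, 13]
print(matmul(A, B) == C)          # False
```

The last line shows that order matters: applying $B$ first and then $A$ gives a different transformation.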

# Determinant

<ul>
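    <li>The determinant measures how much a matrix stretches or shrinks space: its absolute value is the factor by which areas (in 2D) or volumes (in higher dimensions) are scaled.</li>
    <li>The sign of the determinant indicates orientation: a negative determinant means the transformation flips the orientation of space.</li>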
</ul>

In the system of equations $Ax = b$ with a square matrix $A$, if $\mathrm{det}\left(A\right) \ne 0$, then the system has exactly one solution. Otherwise, it has either no solutions or infinitely many solutions.

A matrix whose determinant is 0 is called a **singular** matrix.

The determinant is also equal to the product of all the eigenvalues of a matrix.
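
For a $2 \times 2$ matrix these statements are easy to verify. A minimal sketch, assuming the hypothetical helper names `det_2x2` and `real_eigenvalues_2x2`. The eigenvalues are taken as the roots of the characteristic polynomial $\lambda^{2} - \mathrm{tr}(A)\lambda + \mathrm{det}(A) = 0$, and only the real-root case is handled:

```python
import math


def det_2x2(A):
    """Determinant of [[a, b], [c, d]] is ad - bc."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def real_eigenvalues_2x2(A):
    """Roots of lambda^2 - trace * lambda + det = 0, assuming they are real."""
    trace = A[0][0] + A[1][1]
    disc = trace * trace - 4 * det_2x2(A)
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2


A = [[3, 2],
     [1, 0]]
S = [[1, 2],
     [1, 2]]

print(det_2x2(A))  # -2
print(det_2x2(S))  # 0

lam1, lam2 = real_eigenvalues_2x2(A)
print(math.isclose(lam1 * lam2, det_2x2(A)))  # True
```

Here `A` doubles areas and flips orientation, while `S` is singular because its columns are linearly dependent.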

Watch this video: https://www.youtube.com/watch?v=Ip3X9LOh2dk&t=195s

# Decompositions

Matrix decompositions factor a matrix into a product of other matrices. They are useful for examining the functional properties of a matrix. See [Eigendecomposition](./eigendecomposition.ipynb).

# Useful Definitions and Properties

\begin{align}
    A(B+C) &= AB+AC \\
    A(BC) &= (AB)C \\
    AB &\ne BA \quad \text{(in general)} \\
    x^{T}y &= y^{T}x \\
    (AB)^{T} &= B^{T}A^{T} \\
    I &= \mathrm{diag}([1, 1, \ldots, 1]), I \in \mathbb{R} ^ {n \times n} \\
    IA &= A \\
    A^{-1}A &= I
\end{align}
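
Several of these identities can be spot-checked on small integer matrices. A minimal sketch, assuming the helper names and example values, none of which come from the original. A check on one example illustrates the identities but does not prove them:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Illustrative values (assumed).
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]
I = [[1, 0], [0, 1]]

distributive = matmul(A, matadd(B, C)) == matadd(matmul(A, B), matmul(A, C))
associative = matmul(A, matmul(B, C)) == matmul(matmul(A, B), C)
commutes = matmul(A, B) == matmul(B, A)
transpose_rule = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
identity_rule = matmul(I, A) == A

print(distributive, associative, commutes, transpose_rule, identity_rule)
# True True False True True
```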

# Other Notes
<ul>
    <li>On a digital computer, the inverse $A^{-1}$ of a matrix $A$ can be represented with only limited precision, so computing it explicitly can lose accuracy. Methods that solve $Ax = b$ directly, without forming $A^{-1}$, usually give more accurate results.</li>
    <li>In the system of linear equations $Ax = b$, $b$ can be thought of as a linear combination of the columns of $A$ with the elements of $x$ as coefficients.</li>
    <li>$\mathrm{span}(A)$ is the set of all vectors that are a linear combination of the columns of $A$.</li>
    <li>The $L^{p}$ norm of a vector $x$, written $\left\Vert x \right\Vert_{p}$, maps the vector $x$ to a non-negative number.</li>
</ul>
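
The $L^{p}$ norm mentioned in the last note can be sketched in a few lines. This assumes the standard definition $\left\Vert x \right\Vert_{p} = \left( \sum_{i} \lvert x_{i} \rvert^{p} \right)^{1/p}$ for $p \ge 1$, which is not spelled out in the notes above. The helper name `lp_norm` is hypothetical:

```python
def lp_norm(x, p):
    """L^p norm of a vector for p >= 1: (sum of |x_i|^p) raised to the power 1/p."""
    if p < 1:
        raise ValueError("this sketch only handles p >= 1")
    return sum(abs(c) ** p for c in x) ** (1 / p)


# Illustrative vector (assumed).
x = [3, -4]

print(lp_norm(x, 1))  # 7.0
print(lp_norm(x, 2))  # 5.0
```

The result is never negative, and it is 0 only for the zero vector.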

# References

https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

https://en.wikipedia.org/wiki/Unit_vector

https://www.deeplearningbook.org/contents/linear_algebra.html