$$
\newcommand{theorem}{\textbf{Theorem: }}
\newcommand{proof}{\textbf{Proof: }}
\newcommand{lemma}{\textbf{Lemma: }}
\newcommand{corollary}{\textbf{Corollary: }}
\newcommand{prop}{\textbf{Proposition: }}
$$

$$
\newcommand{arr}{\mathbf}
\newcommand{inv}{^{-1}}
\newcommand\mat[1]{\begin{pmatrix}#1\end{pmatrix}} 
\newcommand\det[1]{\left| #1\right|} 
$$

In [1]:
import sys
sys.path.append('..')

In [2]:
import numpy as np
from module.elimination import gauss_jordan_elim, gaussian_elim
from common.utility import show_implementation

np.set_printoptions(edgeitems=10, linewidth=180)

# Matrix

A **matrix** is a rectangular array of numbers.

The **size** of the matrix is given by $m \times n$, where $m$ is the number of rows and $n$ is the number of columns.

The $(i, j)$-entry of the matrix is the number at the $i$-th row and $j$-th column, represented as $a_{ij}$.

Matrices are denoted by uppercase bold letters.

## Special matrices

### Row matrix

A $1 \times n$ matrix is a row matrix.

### Column matrix

Similarly, an $m \times 1$ matrix is a column matrix.

### Square matrix

An $n \times n$ matrix, which has the same number of rows and columns, is a square matrix.
We call $n$ the order of a square matrix.

$a_{ii}$ are the **diagonal entries** of a square matrix.

### Diagonal matrix

A **diagonal matrix** is a square matrix whose non-diagonal entries are all $0$.

### Identity matrix

An **identity matrix** is a diagonal matrix whose diagonal entries are all $1$.

We denote it as $\mathbf I_n$, or simply $\mathbf I$ if the order is unambiguous.

### Zero matrix

A **zero matrix** is a matrix whose entries are all $0$.

We denote it as $\mathbf 0_n$ or simply $\mathbf 0$.

### Triangular  matrix

If a matrix has all 0's below its diagonal entries, it is **upper triangular**.
If it has all 0's above its diagonal, it is **lower triangular**.

If the diagonal entries are also 0, then it is **strictly upper/lower triangular** respectively.

### Symmetric matrix

A **symmetric matrix** has $a_{ij} = a_{ji}$ for all $i, j$.
That is, the matrix is symmetric about its diagonal.

## Matrix operations

### Equality

Two matrices are equal if and only if they have the same size and all the entries are equal.

### Scalar multiplication

For some $c \in \mathbb R$,

$$
c \mathbf{A} = \left( c a_{ij} \right)
$$

### Addition

Given two matrices $\bf {A, B}$ of **equal size**, we define:
$$
\mathbf A + \mathbf B = \left( a_{ij} + b_{ij} \right)
$$

### Subtraction

Using scalar multiplication and addition, we get matrix subtraction:

$$
\mathbf A - \mathbf B = \mathbf A + (-1)  \mathbf B = \left( a_{ij} - b_{ij} \right)
$$

#### Properties of matrix addition and scalar multiplication

* Commutative: $\mathbf A + \mathbf B = \mathbf B + \mathbf A$
* Associative: $\mathbf A + (\mathbf B + \mathbf C) = (\mathbf A + \mathbf B) + \mathbf C$
* Additive identity: $\mathbf 0 _{m \times n} + \mathbf A = \mathbf A$
* Additive inverse: $\mathbf A + (-\mathbf A) = \mathbf 0 _{m \times n}$
* Distributive: $k(\mathbf A + \mathbf B) = k \mathbf A + k \mathbf B$
* Scalar addition: $(a + b) \mathbf A = a \mathbf A + b \mathbf A$

### Matrix multiplication

Given two matrices $\mathbf A_{m \times p}, \mathbf B _{p \times n}$, the product $\mathbf {AB}$ is the matrix with $(i, j)$-entries defined as:

$$
\sum _{k=1} ^p a_{ik}b_{kj}  = a_{i1} b_{1j} + a_{i2} b_{2j} + \dots + a_{ip}b_{pj}
$$

For example,
$$
\mathbf A = 
\begin{pmatrix}
1 & 2& 3\\
4 & 5 & 6
\end{pmatrix} \\
\mathbf B = 
\begin{pmatrix}
1 & 1 \\
2 & 3\\
-1 & -2
\end{pmatrix} \\
\mathbf {AB} =
\begin{pmatrix}
1 & 2& 3\\
4 & 5 & 6
\end{pmatrix}
\begin{pmatrix}
1 & 1 \\
2 & 3\\
-1 & -2
\end{pmatrix}
=
\begin{pmatrix}
1(1) + 2(2) + 3(-1) = 2 & 1(1) + 2(3) + 3(-2) = 1 \\
4(1) + 5(2) + 6(-1) = 8 & 4(1) + 5(3) + 6(-2) = 7\\
\end{pmatrix}
$$

Since the sizes of $\mathbf A, \mathbf B$ are $m \times p$ and $p \times n$ respectively, the size of $\mathbf {AB}$ is $m \times n$.

We say that $\mathbf A$ is **pre-multiplied** to $\mathbf B$ or equivalently, $\mathbf B$ is **post-multiplied** to $\mathbf A$.

#### Properties

* **Not commutative**: $\mathbf {AB} \neq \mathbf {BA}$ in general
* Associative: $(\mathbf {AB}) \mathbf C = \mathbf A (\mathbf{BC})$
* Distributive: $\mathbf A (\mathbf B + \mathbf C) = \mathbf {AB} + \mathbf {AC}$ and $(\mathbf A + \mathbf B) \mathbf C = \mathbf {AC} + \mathbf {BC}$
* Commutative with scalar multiplication: $c(\mathbf {AB}) = (c \mathbf A) \mathbf B = \mathbf A (c \mathbf B)$
* Identity: $\mathbf {I_m A} = \mathbf A = \mathbf {A I_n}$
    * here $\mathbf A$ is $m \times n$, so the identity matrices on the two sides have different orders
* Zero: $\mathbf {0_m A} = \mathbf 0 = \mathbf {A 0_n}$ 

In [3]:
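def mult(A, B):
    # The product is only defined when A has as many columns as B has rows
    assert A.shape[1] == B.shape[0]

    result = np.zeros((A.shape[0], B.shape[1]))

    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            # (i, j)-entry: sum over k of A[i, k] * B[k, j]
            result[i, j] = np.sum(A[i] * B[:, j])

    return result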


A, B = np.arange(1, 7).reshape((2, 3)), np.array([[1, 1], [2, 3], [-1, -2]])
print("A", A, sep=':\n')
print("B", B, sep=':\n')
print("AB", mult(A, B), sep=':\n')

A:
[[1 2 3]
 [4 5 6]]
B:
[[ 1  1]
 [ 2  3]
 [-1 -2]]
AB:
[[2. 1.]
 [8. 7.]]
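For comparison, NumPy's built-in `@` operator computes the same product as the loop-based `mult` above. This self-contained sketch re-declares the example matrices and also illustrates non-commutativity: $\mathbf{BA}$ here is $3 \times 3$, and even for two square matrices `P`, `Q` (arbitrary choices) the two orders differ.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])      # 2x3
B = np.array([[1, 1], [2, 3], [-1, -2]])  # 3x2

AB = A @ B  # 2x2, same values as mult(A, B)
BA = B @ A  # 3x3, so AB and BA are not even the same size
print(AB.tolist())  # [[2, 1], [8, 7]]

# Even for square matrices of the same order, the order of multiplication matters
P = np.array([[1, 1], [0, 1]])
Q = np.array([[1, 0], [1, 1]])
print((P @ Q).tolist())  # [[2, 1], [1, 1]]
print((Q @ P).tolist())  # [[1, 1], [1, 2]]
```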


### Matrix exponentiation

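Since we have defined multiplication, we can define exponentiation for a **square** matrix $\mathbf A$ (the product $\mathbf{AA}$ is only defined when $\mathbf A$ has as many columns as rows):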

$$
\mathbf A^2 = \mathbf {AA} \\
\mathbf A^n = \mathbf{A A}^{n-1}
$$
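A minimal sketch of the recursive definition, assuming a square NumPy array and an exponent $n \geq 1$; `mat_pow` is a hypothetical helper name, and its result is compared with NumPy's `np.linalg.matrix_power`. The example matrix `F` happens to produce Fibonacci numbers.

```python
import numpy as np

def mat_pow(A: np.ndarray, n: int) -> np.ndarray:
    """Compute A^n for a square matrix A and integer n >= 1,
    following the recursive definition A^n = A A^(n-1)."""
    assert A.ndim == 2 and A.shape[0] == A.shape[1], "A must be square"
    assert n >= 1
    if n == 1:
        return A.copy()
    return A @ mat_pow(A, n - 1)

F = np.array([[1, 1], [1, 0]])
print(mat_pow(F, 5).tolist())  # [[8, 5], [5, 3]]
print(np.array_equal(mat_pow(F, 5), np.linalg.matrix_power(F, 5)))  # True
```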

### Transpose

Given a matrix $\mathbf A_{m \times n}$, the transpose of the matrix, denoted as $\mathbf A ^T$, is an $n \times m$ matrix whose $(i, j)$-entry is the $(j, i)$-entry of $\mathbf A$.

In other words, the rows of $\mathbf A$ are the columns of $\mathbf A^T$ and vice versa.

In [4]:
A = np.arange(9).reshape((3, 3))
print("A:", A, sep='\n')
print("A transposed:", A.T, sep='\n')

A:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
A transposed:
[[0 3 6]
 [1 4 7]
 [2 5 8]]


#### Properties

* For $c \in \mathbb R, (c \mathbf A)^T = c \mathbf A^T$
* $(\mathbf A + \mathbf B)^T = \mathbf A ^T + \mathbf B ^T$
* $(\mathbf {AB})^T = \mathbf B ^T \mathbf A^T$
    * note the change of order from $\mathbf {AB}$ to $\mathbf {BA}$.
* $(\mathbf A^T)^T = \mathbf A$
* A matrix $\mathbf A$ is symmetric if and only if $\mathbf A^T = \mathbf A$
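The sketch below spot-checks these properties on arbitrary example matrices. With a $2 \times 3$ matrix $\mathbf A$ and a $3 \times 2$ matrix $\mathbf B$, $\mathbf A^T \mathbf B^T$ is $3 \times 3$ while $(\mathbf{AB})^T$ is $2 \times 2$, which is why the order must reverse.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])      # 2x3
B = np.array([[1, 1], [2, 3], [-1, -2]])  # 3x2
C = np.array([[0, 1, 2], [3, 4, 5]])      # 2x3, same size as A
c = 4

assert np.array_equal((c * A).T, c * A.T)
assert np.array_equal((A + C).T, A.T + C.T)
assert np.array_equal((A @ B).T, B.T @ A.T)  # order reverses
assert np.array_equal(A.T.T, A)
print((A.T @ B.T).shape)  # (3, 3), so it cannot equal the 2x2 (AB)^T

S = np.array([[1, 7], [7, 2]])
print(np.array_equal(S, S.T))  # True: S is symmetric
```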


## Block multiplication

Consider $\mathbf {AB}$.
Notice that the first row of $\mathbf {AB}$ is determined by $\mathbf{B}$ and **only the first row** of $\mathbf A$.
Indeed, this is true for all the rows, where the $i$-th row of $\mathbf {AB}$ only requires the $i$-th row of $\mathbf{A}$ to be computed.

For example,
$$
\begin{pmatrix}
1 & 2& 3\\
4 & 5 & 6
\end{pmatrix}
\begin{pmatrix}
1 & 1 \\
2 & 3\\
-1 & -2
\end{pmatrix}
=
\begin{pmatrix}
1(1) + 2(2) + 3(-1) = 2 & 1(1) + 2(3) + 3(-2) = 1 \\
4(1) + 5(2) + 6(-1) = 8 & 4(1) + 5(3) + 6(-2) = 7\\
\end{pmatrix}
$$

We don't need the $\begin{pmatrix} 4 & 5 & 6\end{pmatrix}$ row to determine the first row of $\mathbf {AB}$.
We only need the first row of $\mathbf A$ and the whole of $\mathbf B$.

Therefore, we can treat $\mathbf A$ as a stack of rows $\mathbf a_i, 1 \leq i \leq m$.
Then $$
\mathbf A = 
\begin{pmatrix}
\mathbf a_1 \\
\mathbf a_2 \\
\vdots \\
\mathbf a_m \\
\end{pmatrix}\\
\mathbf {AB} = 
\begin{pmatrix}
\mathbf a_1 \mathbf B\\
\mathbf a_2 \mathbf B\\
\vdots \\
\mathbf a_m \mathbf B\\
\end{pmatrix}
$$

The same argument applies to columns: the $j$-th column of $\mathbf {AB}$ only requires the $j$-th column of $\mathbf B$.
Thus, we can treat $\mathbf B$ as a sequence of columns $\mathbf b_j, 1 \leq j \leq n$.
Then $$
\mathbf {B} = 
\begin{pmatrix}
\mathbf b_1 &
\mathbf b_2 &
\cdots &
\mathbf b_n 
\end{pmatrix} \\
\mathbf {AB} = 
\begin{pmatrix}
\mathbf A \mathbf b_1 & 
\mathbf A \mathbf b_2 &
\cdots  &
\mathbf A \mathbf b_n
\end{pmatrix}
$$

In general, we can treat $\mathbf A$ as a stack of rows $\mathbf a_i, 1 \leq i \leq m$, and $\mathbf B$ as a sequence of columns $\mathbf b_j, 1 \leq j \leq n$, at the same time.
Each $\mathbf a_i \mathbf b_j$ is a $1 \times 1$ product, namely the $(i, j)$-entry of $\mathbf {AB}$.
Then $$
\mathbf A = 
\begin{pmatrix}
\mathbf a_1 \\
\mathbf a_2 \\
\vdots \\
\mathbf a_m \\
\end{pmatrix}\\
\mathbf {B} = 
\begin{pmatrix}
\mathbf b_1 &
\mathbf b_2 &
\cdots &
\mathbf b_n 
\end{pmatrix} \\
\mathbf {AB} =
\begin{pmatrix}
\mathbf a_1 \mathbf b_1  & \mathbf a_1 \mathbf b_2  & \cdots   & \mathbf a_1 \mathbf b_n \\
\mathbf a_2 \mathbf b_1  & \mathbf a_2 \mathbf b_2  & \cdots  & \mathbf a_2 \mathbf b_n \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf a_m \mathbf b_1  & \mathbf a_m \mathbf b_2  & \cdots  & \mathbf a_m \mathbf b_n \\
\end{pmatrix}
$$

The same idea works with "bigger blocks": split $\mathbf A$ into blocks of consecutive rows and $\mathbf B$ into blocks of consecutive columns:

$$
\mathbf A = 
\begin{pmatrix}
\mathbf A_1\\
\mathbf A_2
\end{pmatrix}\\
\mathbf {B} = 
\begin{pmatrix}
\mathbf B_1 & \mathbf B_2
\end{pmatrix} \\
\mathbf {AB} =
\begin{pmatrix}
\mathbf A_1 \mathbf B_1 & \mathbf A_1 \mathbf B_2 \\
\mathbf A_2 \mathbf B_1 & \mathbf A_2 \mathbf B_2 \\
\end{pmatrix}
$$
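The row, column and block views can all be checked with NumPy slicing. In this sketch the random integer matrices and the split points are arbitrary assumptions; `np.block` reassembles the four block products into one matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(4, 3))
B = rng.integers(-5, 5, size=(3, 5))

# Split A into a top and bottom block of 2 rows each,
# and B into a left block of 3 columns and a right block of 2 columns
A1, A2 = A[:2], A[2:]
B1, B2 = B[:, :3], B[:, 3:]

blocked = np.block([[A1 @ B1, A1 @ B2],
                    [A2 @ B1, A2 @ B2]])
print(np.array_equal(blocked, A @ B))  # True

# Row view: row i of AB only needs row i of A
print(np.array_equal(A[1] @ B, (A @ B)[1]))  # True
# Column view: column j of AB only needs column j of B
print(np.array_equal(A @ B[:, 2], (A @ B)[:, 2]))  # True
```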

## Linear system

Recall that a linear system of $m$ equations in $n$ unknowns has the form:
$$
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n}x_n = b_1\\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2n}x_n = b_2\\
\vdots\\
a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn}x_n = b_m\\
$$

With the knowledge of matrices, we can redefine this as a matrix equation: $\mathbf {Ax} = \mathbf b$.

$$
  \begin{pmatrix}
    a_{11} & a_{12} & \dots & a_{1n} \\
    a_{21} & a_{22} & \dots & a_{2n} \\
    \vdots & \vdots & \ddots & \vdots  \\
    a_{m1} & a_{m2} & \dots & a_{mn} \\
  \end{pmatrix}
  \begin{pmatrix}
  x_1 \\ x_2 \\ \vdots \\ x_n
  \end{pmatrix}
  =
  \begin{pmatrix}
  b_1 \\ b_2 \\ \vdots \\ b_m
  \end{pmatrix}
$$
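In NumPy, the left-hand side $\mathbf{Ax}$ is simply `A @ x`. This sketch builds $\mathbf b$ from a chosen solution and then recovers $\mathbf x$ with `np.linalg.solve`, which assumes a square, invertible coefficient matrix (the values are illustrative).

```python
import numpy as np

# Coefficient matrix and a chosen solution vector (illustrative values)
A = np.array([[1, 2, 1], [3, 2, 1], [2, 3, 1]])
x = np.array([1, -1, 2])

b = A @ x  # entry i is the left-hand side of equation i: sum_j a_ij x_j
print(b.tolist())  # [1, 3, 1]

# Recover x from A and b (valid here because this A is invertible)
print(np.allclose(np.linalg.solve(A, b), x))  # True
```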

A **homogeneous** system has $\mathbf b = \mathbf 0$, that is, $\mathbf {Ax} = \mathbf 0$.

Such a system is always consistent, since it always has the **trivial solution** $\mathbf x = \mathbf 0$.
Solutions with $\mathbf x \neq \mathbf 0$ are called **nontrivial solutions**.

### Multiple linear systems

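Suppose that we have $p$ linear systems rather than only one, all with the same coefficient matrix $\mathbf A$, where the $j$-th system is $\mathbf A \mathbf x_j = \mathbf b_j$.

We can represent all of them at once as $\mathbf {AX} = \mathbf B$, where the columns of $\mathbf X$ are the $\mathbf x_j$ and the columns of $\mathbf B$ are the $\mathbf b_j$: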

$$
  \begin{pmatrix}
    a_{11} & a_{12} & \dots & a_{1n} \\
    a_{21} & a_{22} & \dots & a_{2n} \\
    \vdots & \vdots & \ddots & \vdots  \\
    a_{m1} & a_{m2} & \dots & a_{mn} \\
  \end{pmatrix}
  \begin{pmatrix}
    x_{11} & x_{12} & \dots & x_{1p} \\
    x_{21} & x_{22} & \dots & x_{2p} \\
    \vdots & \vdots & \ddots & \vdots  \\
    x_{n1} & x_{n2} & \dots & x_{np} \\
  \end{pmatrix}
  =
  \begin{pmatrix}
    b_{11} & b_{12} & \dots & b_{1p} \\
    b_{21} & b_{22} & \dots & b_{2p} \\
    \vdots & \vdots & \ddots & \vdots  \\
    b_{m1} & b_{m2} & \dots & b_{mp} \\
  \end{pmatrix}
$$

In fact, we can combine all $p$ systems into a single augmented matrix:

$$
  \begin{pmatrix}
  \begin{matrix}
    a_{11} & a_{12} & \dots & a_{1n} \\
    a_{21} & a_{22} & \dots & a_{2n} \\
    \vdots & \vdots & \ddots & \vdots  \\
    a_{m1} & a_{m2} & \dots & a_{mn} \\
  \end{matrix}
  \left |
  \begin{matrix}
    b_{11} \\ b_{21} \\ \vdots \\ b_{m1}
  \end{matrix}
  \right .
  \left |
  \begin{matrix}
    b_{12} \\ b_{22} \\ \vdots \\ b_{m2}
  \end{matrix}
  \right .
  \left |
  \begin{matrix}
    \cdots \\ \cdots \\ \cdots \\\cdots 
  \end{matrix}
  \right .
  \left |
  \begin{matrix}
    b_{1p} \\ b_{2p} \\ \vdots \\ b_{mp}
  \end{matrix}
  \right .
  \end{pmatrix}
$$

Then, we can perform [Gaussian/Gauss-Jordan elimination](./linear_systems.ipynb#Gaussian-Elimination) on all the columns $\mathbf b_i$ simultaneously to obtain the solutions to all $p$ systems together.

This works because the row operations chosen during elimination depend only on $\mathbf A$.
Every $\mathbf b_i$ therefore undergoes the same sequence of operations, so we can act on all of them at once.

For example, given the following augmented matrix:

$$
  \begin{pmatrix}
  \begin{matrix}
   1 & 2\\
   3 & 6
  \end{matrix}
  \left |
  \begin{matrix}
  2 \\ 6
  \end{matrix}
  \right .
  \left |
  \begin{matrix}
  3 \\ 2
  \end{matrix}
  \right .
  \end{pmatrix}
$$

Performing $R_2 - 3R_1$ gives its row-echelon form:

$$
  \begin{pmatrix}
  \begin{matrix}
   1 & 2\\
   0 & 0
  \end{matrix}
  \left |
  \begin{matrix}
  2 \\ 0
  \end{matrix}
  \right .
  \left |
  \begin{matrix}
  3 \\ -7
  \end{matrix}
  \right .
  \end{pmatrix}
$$

In [5]:
A = np.array([1, 2, 3, 6]).reshape((2, 2))
b = np.array([2, 3, 6, 2]).reshape((2, 2))
print('Simultaneous:', np.hstack(gaussian_elim(A, b)), sep='\n')
print('Individual:', np.hstack(gaussian_elim(A, b[:, 0:1])), np.hstack(
    gaussian_elim(A, b[:, 1:])), sep='\n')

Simultaneous:
[[ 1.  2.  2.  3.]
 [ 0.  0.  0. -7.]]
Individual:
[[1. 2. 2.]
 [0. 0. 0.]]
[[ 1.  2.  3.]
 [ 0.  0. -7.]]


Hence, the first system is consistent, while the second is inconsistent, since its last row reads $0 = -7$.

We would have reached the same result by performing elimination on the two systems separately, as the individual outputs above show.
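When the shared coefficient matrix is square and invertible, `np.linalg.solve` also accepts a right-hand side with several columns and solves all the systems at once. The example above has a singular coefficient matrix, so this sketch uses an invertible one instead (values are illustrative) and checks that each column of $\mathbf X$ matches solving that system on its own.

```python
import numpy as np

A = np.array([[1, 2, 1], [3, 2, 1], [2, 3, 1]])  # invertible coefficient matrix
B = np.array([[1, 0], [3, 1], [1, 2]])           # p = 2 right-hand sides, one per column

X = np.linalg.solve(A, B)  # solves AX = B, one column of X per system

# Column j of X is the solution of the j-th system on its own
for j in range(B.shape[1]):
    assert np.allclose(X[:, j], np.linalg.solve(A, B[:, j]))
print(np.allclose(A @ X, B))  # True
print(np.round(X[:, 0], 10).tolist())  # [1.0, -1.0, 2.0]
```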

## Inverse

In the real number system, the multiplicative inverse of a number $c$ is the number $k$ such that $ck = 1$.
Hence, $k = \frac{1}{c}$, which exists only if $c \neq 0$ (i.e. $0$ has no inverse).

Since $\mathbf I$ plays a role similar to $1$ among matrices, given a matrix $\mathbf A$, we call $\mathbf B$ a **left inverse** of $\mathbf A$ if:
$$
\mathbf {BA} = \mathbf I
$$
and a **right inverse** of $\mathbf A$ if:
$$
\mathbf {AB} = \mathbf I
$$

Note that since matrix multiplication is [not commutative](#Properties), the left inverse may be different from the right inverse.

### Properties

Not all matrices have inverses; for example, $\begin{pmatrix} 0 & 1 \\ 0 & 0\end{pmatrix}$ has no inverse.

Inverses need not be unique; for example, both $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$
are left inverses of $\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$.
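A quick check of this example with NumPy. It also shows that neither left inverse is a right inverse: $\mathbf A$ times either one is a $3 \times 3$ matrix with a zero last row.

```python
import numpy as np

A = np.array([[1, 0], [0, 1], [0, 0]])  # 3x2
L1 = np.array([[1, 0, 0], [0, 1, 0]])   # 2x3
L2 = np.array([[1, 0, 1], [0, 1, 1]])   # 2x3

print(np.array_equal(L1 @ A, np.eye(2)))  # True
print(np.array_equal(L2 @ A, np.eye(2)))  # True
# A @ L1 has a zero last row, so it cannot be I_3
print(np.array_equal(A @ L1, np.eye(3)))  # False
```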

However, the properties become stricter (and thus more interesting) when we consider the inverses of [square matrices](#Square-matrix).

#### Properties of inverses of square matrices

For a square matrix $\mathbf A$ of order $n$, we define its inverse as a matrix $\mathbf B$ such that both $\mathbf {AB} = \mathbf I_n$ and $\mathbf {BA} = \mathbf I_n$.

In fact, for square matrices, if $\mathbf {AB} = \mathbf I_n$, then it is guaranteed that $\mathbf {BA} = \mathbf I_n$, and vice versa.
[Proof of this](#inverse-equality) is postponed till later.

$\prop$:
The inverse is **unique**

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        Let $\mathbf B, \mathbf C$ be two inverses of $\mathbf A$, so in particular $\mathbf {BA} = \mathbf I$ and $\mathbf {AC} = \mathbf I$.
        Then $$\mathbf B = \mathbf {BI} = \mathbf B (\mathbf {AC}) = (\mathbf {BA}) \mathbf C = \mathbf {IC} = \mathbf C$$
        $$QED$$
    </div>
</details>

Since the inverse is unique, we can simply denote the inverse as $\mathbf A^{-1}$.

If such an inverse exists, we deem $\mathbf A$ **invertible**, otherwise it is **singular**.

* $(\mathbf A ^{-1}) ^{-1} = \mathbf A$
* $(a \mathbf A) ^{-1} = \frac{1}{a} \mathbf A ^{-1}, \quad a \neq 0$
* $(\mathbf A ^{T}) ^{-1} = (\mathbf A ^{-1}) ^{T} $ 
* $(\mathbf{AB}) ^{-1} = \mathbf B ^{-1} \mathbf A ^{-1}$
    * This implies that if two matrices are invertible, their product is also invertible
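A numerical spot check of these properties using `np.linalg.inv`, on two matrices assumed invertible (both have nonzero determinant). Floating-point results are compared with `np.allclose` rather than exact equality.

```python
import numpy as np

A = np.array([[1., 2., 1.], [3., 2., 1.], [2., 3., 1.]])  # |A| = 2
B = np.array([[2., 0., 1.], [1., 1., 0.], [0., 3., 1.]])  # |B| = 5
a = 2.5
inv = np.linalg.inv

assert np.allclose(inv(inv(A)), A)
assert np.allclose(inv(a * A), inv(A) / a)
assert np.allclose(inv(A.T), inv(A).T)
assert np.allclose(inv(A @ B), inv(B) @ inv(A))  # order reverses, as with transposes
print("all inverse properties hold for this example")
```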

For an order 2 matrix of the form $\begin{pmatrix} a & b \\ c & d\end{pmatrix}$ with $ad - bc \neq 0$, the inverse is:
$$
\frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a\end{pmatrix}
$$
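A small sketch of this formula; `inverse_2x2` is a hypothetical helper. It uses Python's `fractions.Fraction` so the result is exact, and raises an error when $ad - bc = 0$.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via the formula above, with exact arithmetic."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    k = Fraction(1, det)
    return [[k * d, -k * b], [-k * c, k * a]]

A = [[1, 2], [3, 4]]
inv = inverse_2x2(1, 2, 3, 4)

# Check: multiplying back gives the identity
prod = [[sum(A[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(prod == [[1, 0], [0, 1]])  # True
```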

If $\mathbf A$ is invertible and:
* $\mathbf {AB} = \mathbf {AC}$, then $\mathbf B = \mathbf C$.
* $\mathbf {BA} = \mathbf {CA}$, then $\mathbf B = \mathbf C$.

## Elementary matrix

A square matrix $\arr E$ is an **elementary matrix** if it is obtained from $\arr I$ by performing a single [elementary row operation](./linear-systems.ipynb#Elementary-Row-Operations) on it.

The following are some examples:

$$
\begin{matrix}
2R_1 & R_1 \leftrightarrow R_2 & R_2 + 2R_1 \\
\begin{pmatrix}
2 & 0 & 0\\ 
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix} & 
\begin{pmatrix}
0 & 1 & 0 \\
1 & 0 & 0\\ 
0 & 0 & 1 
\end{pmatrix} & 
\begin{pmatrix}
1 & 0 & 0\\ 
2 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix} 
\end{matrix}
$$

Notice that $\arr {EA}$ is the same as performing the corresponding elementary row operation on $\arr A$.

In [6]:
A = np.arange(9).reshape((3, 3))
E = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 1]])
EA = mult(E, A)
print('A:', A, sep='\n')
print('E:', E, sep='\n')
print('EA:', EA, sep='\n')

A:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
E:
[[2 0 0]
 [0 1 0]
 [0 0 1]]
EA:
[[0. 2. 4.]
 [3. 4. 5.]
 [6. 7. 8.]]


In [7]:
E = np.array([[1, 0, 0], [2, 1, 0], [0, 0, 1]])
EA = mult(E, A)
print('A:', A, sep='\n')
print('E:', E, sep='\n')
print('EA:', EA, sep='\n')

A:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
E:
[[1 0 0]
 [2 1 0]
 [0 0 1]]
EA:
[[0. 1. 2.]
 [3. 6. 9.]
 [6. 7. 8.]]




$\corollary$ If $\arr B$ is obtained from $\arr A$ by a series of row operations $r_1, r_2, \dots r_k$
$$
\arr A \xrightarrow{r_1} \xrightarrow{r_2} \cdots \xrightarrow{r_k} \arr B
$$
then 
$$
\arr B = \arr E_k \cdots \arr E_2 \arr E_1 \arr A
$$
where $\arr E_i$ is the elementary matrix corresponding to $r_i$.

$\lemma$ Every elementary matrix $\arr E$ is invertible, and its inverse is the elementary matrix that undoes $\arr E$'s row operation:

$$
\begin{matrix}
\arr E & \leftrightarrow & \arr E \inv \\
c R_i & & \frac{1}{c} R_i \\
R_i \leftrightarrow R_j & &  R_i \leftrightarrow R_j \\
R_i + a R_j & & R_i - a R_j 
\end{matrix}
$$
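To check the lemma, this sketch builds each elementary matrix by applying a row operation to a copy of $\arr I$ and confirms that the "undo" operation gives its inverse. The helper names (`elementary_from`, `scale`, `swap`, ...) and the constants `c`, `a` are illustrative assumptions.

```python
import numpy as np

I = np.eye(3)
c, a = 5.0, 2.0

def elementary_from(row_op):
    """Return the elementary matrix obtained by applying row_op to a copy of I."""
    E = I.copy()
    row_op(E)  # row_op modifies its argument in place
    return E

def scale(M):
    M[0] *= c              # c R_1

def unscale(M):
    M[0] /= c              # (1/c) R_1

def swap(M):
    M[[0, 1]] = M[[1, 0]]  # R_1 <-> R_2

def add(M):
    M[1] += a * M[0]       # R_2 + a R_1

def subtract(M):
    M[1] -= a * M[0]       # R_2 - a R_1

for op, undo in [(scale, unscale), (swap, swap), (add, subtract)]:
    E, E_inv = elementary_from(op), elementary_from(undo)
    assert np.allclose(E @ E_inv, I) and np.allclose(E_inv @ E, I)
print("each elementary matrix is undone by its inverse operation")
```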

$\theorem$ A square matrix is invertible if and only if its reduced row-echelon form is $\arr I$.

$\corollary$ A square matrix is invertible if and only if it is a product of elementary matrices

This also means that if $\arr A = \arr E_k \cdots \arr E_2  \arr E_1$, then $\arr A \inv = \arr E_1 \inv \arr E_2 \inv \cdots \arr E_k \inv$.

These are statements 2 and 3 in our [pool of equivalence](pool_of_equivalence.ipynb).

## Finding inverse

Using the above theorem, we have the following algorithm to obtain the inverse of a matrix:
1. Create an augmented matrix $\mat{\arr A \left | \arr I \right .}$
2. Perform Gauss-Jordan elimination until we get $\mat{\arr R \left | \arr B \right .}$ where $\arr R$ is the reduced row-echelon form of $\arr A$
3. If $\arr R = \arr I$, then $\arr B = \arr A \inv$ , otherwise $\arr A$ is singular
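Below is a minimal, self-contained sketch of this algorithm. It is not the `gauss_jordan_elim` from the notebook's `module.elimination` (whose code is not shown here); `inverse_via_gauss_jordan` is a hypothetical name. It uses `fractions.Fraction` for exact arithmetic and, whenever a pivot is needed, swaps in the first row with a nonzero entry in that column.

```python
from fractions import Fraction

def inverse_via_gauss_jordan(A):
    """Invert a square matrix by row-reducing [A | I] to [R | B].
    Returns B if R == I, otherwise None (A is singular)."""
    n = len(A)
    # Build the augmented matrix [A | I] with exact Fraction entries
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below `col` with a nonzero entry in this column
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # this column has no pivot, so R != I
        M[col], M[pivot] = M[pivot], M[col]        # R_col <-> R_pivot
        p = M[col][col]
        M[col] = [x / p for x in M[col]]           # (1/p) R_col
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]  # R_r - f R_col
    # The left half is now I, so the right half is the inverse
    return [row[n:] for row in M]

inv = inverse_via_gauss_jordan([[1, 2, 1], [3, 2, 1], [2, 3, 1]])
print([[float(x) for x in row] for row in inv])
# [[-0.5, 0.5, 0.0], [-0.5, -0.5, 1.0], [2.5, 0.5, -2.0]]
print(inverse_via_gauss_jordan([[0, 1, 2], [3, 4, 5], [6, 7, 8]]))  # None
```

Exact fractions avoid the tiny round-off values (such as `-0.`) that appear in floating-point output, at the cost of speed.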

In [9]:
A = np.arange(9).reshape((3, 3))
R, B = gauss_jordan_elim(A, np.identity(3))
np.hstack((R, B))

array([[ 3.,  0., -3., -4.,  1.,  0.],
       [ 0.,  1.,  2.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1., -2.,  1.]])

Since $\arr R \neq \arr I$, $\arr A$ is singular.

In [10]:
A = np.array([1, 2, 1, 3, 2, 1, 2, 3, 1]).reshape((3, 3))
R, B = gauss_jordan_elim(A, np.identity(3))
np.hstack((R, B))

array([[ 1. ,  0. ,  0. , -0.5,  0.5,  0. ],
       [ 0. ,  1. ,  0. , -0.5, -0.5,  1. ],
       [-0. , -0. ,  1. ,  2.5,  0.5, -2. ]])

In this case, we have found that $\mat{1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 1} \inv = \mat{-0.5 & 0.5 & 0 \\ -0.5 & -0.5 & 1 \\ 2.5 & 0.5 & -2}$

In [11]:
mult(A, B)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

## Inverse and linear system

$\theorem$:
A square matrix $\arr A$ is invertible if and only if the *homogeneous* system $\arr A \arr x = \arr 0$ has only the trivial solution.

This is statement 4 in our [pool of equivalence](pool_of_equivalence.ipynb).

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        Suppose $\arr A$ is invertible, and $\arr u$ is a solution to the system.
        Then $\arr {Au} = \arr 0 \Rightarrow \arr u = \arr A \inv \arr 0 = \arr 0$.
        Hence, only the trivial solution exists.
        <br>
        Suppose that $\arr A$ is not invertible.
        We perform elimination from $\mat{\arr A \left | \arr 0 \right .} \rightarrow \mat{\arr R \left | \arr 0 \right .}$.
        Since $\arr A$ is not invertible, $\arr R \neq \arr I$, so $\arr R$ must have a non-pivot column.
        The variable for that column is free, so the system has infinitely many solutions, and in particular a nontrivial one.
        $$QED$$
    </div>
</details>

Using our previous example, we know that the first matrix is singular, while the second is invertible.

In [12]:
A = np.arange(9).reshape((3, 3))
np.hstack(gauss_jordan_elim(A, np.zeros((3, 1))))

array([[ 3.,  0., -3.,  0.],
       [ 0.,  1.,  2.,  0.],
       [ 0.,  0.,  0.,  0.]])

In [13]:
A = np.array([1, 2, 1, 3, 2, 1, 2, 3, 1]).reshape((3, 3))
np.hstack(gauss_jordan_elim(A, np.zeros((3, 1))))

array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [-0., -0.,  1., -0.]])

And indeed, the first system has nontrivial solutions (its third column is a non-pivot column), while the second has only the trivial solution.
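From the reduced form of the first matrix, $3x_1 - 3x_3 = 0$ and $x_2 + 2x_3 = 0$, so setting the free variable $x_3 = 1$ gives the nontrivial solution $\arr x = (1, -2, 1)^T$. The sketch below checks it, and confirms that for the second (invertible) matrix the solver returns only $\arr x = \arr 0$.

```python
import numpy as np

A = np.arange(9).reshape((3, 3))  # the singular matrix from above
x = np.array([1, -2, 1])          # nontrivial solution read off from the reduced form
print((A @ x).tolist())  # [0, 0, 0]

A2 = np.array([[1, 2, 1], [3, 2, 1], [2, 3, 1]])  # the invertible matrix from above
print(np.allclose(np.linalg.solve(A2, np.zeros(3)), 0))  # True
```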

$\theorem$:
A square matrix $\arr A$ is invertible if and only if the system $\arr A \arr x = \arr b$ has a unique solution for all $\arr b$.

This is statement 5 in our [pool of equivalence](pool_of_equivalence.ipynb).

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        Similar to the previous proof, we consider $\mat{\arr A \left | \arr b \right.} \rightarrow \mat{\arr R \left | \arr u\right.}$.
        If $\arr A$ is invertible, then $\arr R = \arr I$ which means $\arr u$ is the unique solution to the system.
        <br>
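        Otherwise, if $\arr A$ is singular, then $\arr R$ has a non-pivot column, so for any $\arr b$ the system has either no solution or infinitely many solutions; in particular, it never has a unique solution.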
        $$QED$$
    </div>
</details>

With the link between linear systems and inverses established, we can now give the proof we postponed earlier.

<span id="inverse-equality">$\theorem$:</span>
The left inverse and right inverse of a square matrix $\arr A$ are both $\arr A \inv$.

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        Let $\arr {BA} = \arr I$.
        Consider the system $\arr {Ax} = \arr 0$, where $\arr u$ is a solution to the system.
        Then
        $$
        \arr u = \arr {BAu} = \arr{B0} = \arr 0
        $$
        and hence there is only the trivial solution, and thus $\arr A$ is invertible.
        Therefore,
        $$
        \arr B = \arr B (\arr {AA} \inv) = (\arr{BA}) \arr A \inv = \arr A \inv
        $$, which means the left inverse is $\arr A \inv$.
        <br>
        Let $\arr {AC} = \arr I$.
        We transpose both sides to get $\arr C^T \arr A^T = \arr I$.
        Using the same process as above, we would arrive at
        $$
        \arr C ^T = (\arr A ^ T) \inv = (\arr A \inv) ^T
        $$
        Therefore $\arr C = \arr A \inv$, which means the right inverse is also $\arr A \inv$
        $$QED$$
    </div>
</details>

## Minor

The $(i, j)$ matrix minor of $\arr A$ is the matrix obtained by deleting the $i$-th row and $j$-th column of $\arr A$.

For example, given $\arr A = \mat{1 & 2& 3 \\ 4& 5& 6 \\ 7 & 8 & 9}$, $\arr M_{23} = \mat {1 & 2 \\ 7 & 8}$.

## Cofactor

The $(i, j)$-cofactor of $\arr A$ is defined as:
$$
A_{ij} = (-1)^{i+j} | \arr M_{ij} |
$$

where $| \arr M_{ij} |$ is the determinant of the $(i,j)$ minor.
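A sketch of both definitions with NumPy; `minor_matrix` and `cofactor` are hypothetical helper names that use the text's 1-indexed $(i, j)$. The cofactor relies on `np.linalg.det`, so it is a floating-point value.

```python
import numpy as np

def minor_matrix(A: np.ndarray, i: int, j: int) -> np.ndarray:
    """The (i, j) matrix minor M_ij: delete row i and column j (1-indexed)."""
    return np.delete(np.delete(A, i - 1, axis=0), j - 1, axis=1)

def cofactor(A: np.ndarray, i: int, j: int) -> float:
    """The (i, j)-cofactor A_ij = (-1)^(i+j) |M_ij|."""
    return (-1) ** (i + j) * np.linalg.det(minor_matrix(A, i, j))

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(minor_matrix(A, 2, 3).tolist())  # [[1, 2], [7, 8]]
print(int(round(cofactor(A, 2, 3))))   # 6, since |M_23| = 8 - 14 = -6
```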

## Determinant

The **determinant** of square matrix $\arr A$ of order $n$ is defined as:

$$
| \arr A | = 
\begin{cases}
a & n=1, \arr A = \mat{a}\\ 
ad - bc & n=2, \arr A = \mat{a & b \\ c & d}\\ 
\sum \limits_{k=1} ^ n a_{1k} A_{1k} & n \geq 3
\end{cases}
$$

The summation is called the *cofactor expansion* along row 1.

For an order 3 matrix, it would be:

$$
\left | \mat{a & b & c \\ d & e &f \\ g& h & i}\right |
=
a \left | \mat{e & f \\ h & i}\right |
- b \left | \mat{d & f \\ g & i}\right |
+ c \left | \mat{d & e \\ g & h}\right |
=
aei - afh - bdi + bfg + cdh - ceg
$$

In fact, the determinant is the same regardless of which row or column we use for the cofactor expansion.

$\theorem$ The determinant of a square matrix can be obtained by performing cofactor expansion along any row or column

For example, expanding along the 2nd column,
$$
\left | \mat{a & b & c \\ d & e &f \\ g& h & i}\right |
=
- b \left | \mat{d & f \\ g & i}\right |
+ e \left | \mat{a & c \\ g &i}\right |
- h \left | \mat{a & c \\ d & f}\right |
=
-bdi + bfg + eai - ecg - haf + hcd
$$

Rearranging the terms, we can see that the two expansions are equal.
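The theorem can be spot-checked with a small recursive sketch; `det_expand` is a hypothetical helper that expands along any chosen row or column (0-indexed). It uses plain Python lists and only `+`, `-`, `*`, so integer inputs give exact results. The example matrix is arbitrary, with determinant 9.

```python
def det_expand(A, index=0, along="row"):
    """Determinant by cofactor expansion along row or column `index` (0-indexed)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):
        # (i, j) is the entry we expand on: walk along the chosen row or column
        i, j = (index, k) if along == "row" else (k, index)
        # Minor: drop row i and column j; it is expanded along its first row
        minor = [r[:j] + r[j + 1:] for m, r in enumerate(A) if m != i]
        total += (-1) ** (i + j) * A[i][j] * det_expand(minor)
    return total

A = [[1, 2, 3], [4, 5, 6], [3, 4, 2]]
print([det_expand(A, r, "row") for r in range(3)])     # [9, 9, 9]
print([det_expand(A, c, "column") for c in range(3)])  # [9, 9, 9]
```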

In [14]:
from module.matrix import det
show_implementation(det)

def det(A: np.ndarray) -> float:
    if A.ndim == 1:
        return A[0]

    assert A.ndim == 2
    assert A.shape[0] == A.shape[1]
    n = A.shape[0]

    if n == 1:
        return A[0][0]
    if n == 2:
        a, b, c, d = A.ravel()
        return a * d - b * c

    return sum(A[0][i] * det(A[1:, [j for j in range(n) if j != i]]) * (-1) ** i for i in range(n))


$\corollary$:
$\left | \arr A \right | = \left | \arr A ^T \right |$

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        Simply notice that a cofactor expansion across the first row of $\arr A$ is the same as the cofactor expansion down the first column of $\arr A^T$.
        $$QED$$
    </div>
</details>

$\corollary$:
If a square matrix $\arr A$ has a zero row or column, its determinant is 0

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        Simply expand along the zero row (or zero column): every term of the sum contains a factor of 0.
        $$QED$$
    </div>
</details>

$\corollary$:
The determinant of a [triangular matrix](#Triangular--matrix) is the product of the diagonal entries.

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        For an upper triangular matrix, we expand along the first column, where only $a_{11}$ can be nonzero.
        The remaining minor is again upper triangular, so repeating this multiplies the diagonal entries together, giving our result.
        For lower triangular matrices, we expand along the first row to get the same conclusion.
        $$QED$$
    </div>
</details>
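A numerical spot check with an arbitrary upper triangular matrix and its transpose (which is lower triangular):

```python
import numpy as np

U = np.array([[2., 5., -1.], [0., 3., 4.], [0., 0., -0.5]])  # upper triangular
L = U.T                                                    # lower triangular

print(np.prod(np.diag(U)))                                 # -3.0
print(np.isclose(np.linalg.det(U), np.prod(np.diag(U))))  # True
print(np.isclose(np.linalg.det(L), np.prod(np.diag(L))))  # True
```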

$\theorem$:
$\left | \arr {A B} \right | = \left | \arr A \right | \left | \arr B \right|$

$\corollary$:
$\left | \arr {A}\inv \right | = \frac{1}{\left | \arr A \right |}$

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        $$
        \arr A \arr A \inv = \arr I \\
        \Rightarrow \left|  \arr A \arr A \inv \right | = \left | \arr I \right| \\
        \Rightarrow \left|  \arr A \right| \left | \arr A \inv \right | = 1 \\
        \Rightarrow  \left | \arr A \inv \right | = \frac{1}{\left|  \arr A \right|} \\
        $$
        $$QED$$
    </div>
</details>
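A floating-point spot check of both results with `np.linalg.det`, using two arbitrary matrices with $|\arr A| = 9$ and $|\arr B| = 5$:

```python
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.], [3., 4., 2.]])  # |A| = 9
B = np.array([[2., 0., 1.], [1., 1., 0.], [0., 3., 1.]])  # |B| = 5

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True (about 45)
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))      # True (about 1/9)
```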

### Effects of elementary row operation

Suppose that we perform some elementary row operation on a square matrix.
We wish to know its effect on the determinant of the matrix.

Suppose $\arr B$ is the new matrix after applying the row operation on $\arr A$.
Then the relationship between their determinant would be:

| Operation | Determinant of $\arr B$ |
| --- | --- |
| $cR_i$ ($c \neq 0$) | $c \det{\arr A}$ |
| $R_i \leftrightarrow R_j$ | $-\det{\arr A}$ |
| $R_i + aR_j$ | $\det{\arr A}$ |

In [15]:
A = np.array([1, 2, 3, 4, 5, 6, 3, 4, 2]).reshape((3, 3))
B = A.copy()
B[1] *= 3
print('A:', A, sep='\n')
print('det(A):', det(A))
print('B:', B, sep='\n')
print('det(B):', det(B))

B = A.copy()
B[1], B[0] = A[0], A[1]
print('B:', B, sep='\n')
print('det(B):', det(B))

B = A.copy()
B[2] += 2 * B[0]
print('B:', B, sep='\n')
print('det(B):', det(B))

A:
[[1 2 3]
 [4 5 6]
 [3 4 2]]
det(A): 9
B:
[[ 1  2  3]
 [12 15 18]
 [ 3  4  2]]
det(B): 27
B:
[[4 5 6]
 [1 2 3]
 [3 4 2]]
det(B): -9
B:
[[1 2 3]
 [4 5 6]
 [5 8 8]]
det(B): 9


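Since an elementary matrix $\arr E$ is obtained by applying one row operation to $\arr I$, and $\det{\arr I} = 1$, the table above gives the determinant of each type of elementary matrix.
This is consistent with $\det{\arr {EA}} = \det{\arr E} \det{\arr A}$.

| Operation | Determinant of $\arr E$ |
| --- | --- |
| $cR_i$ ($c \neq 0$) | $c$ |
| $R_i \leftrightarrow R_j$ | $-1$ |
| $R_i + aR_j$ | $1$ |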
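A quick check of the table, building one elementary matrix of each kind from $\arr I$ with arbitrary constants `c` and `a`:

```python
import numpy as np

I = np.eye(3)
c, a = 4.0, -2.0

E_scale = I.copy()
E_scale[1] *= c              # c R_2
E_swap = I[[1, 0, 2]]        # R_1 <-> R_2 (fancy indexing returns a copy)
E_add = I.copy()
E_add[2] += a * E_add[0]     # R_3 + a R_1

for name, E, expected in [("c R_i", E_scale, c),
                          ("R_i <-> R_j", E_swap, -1.0),
                          ("R_i + a R_j", E_add, 1.0)]:
    print(name, np.isclose(np.linalg.det(E), expected))  # True for each
```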

$\corollary$:
$\det{c \arr A} = c^n \det{\arr A}$, where $n$ is the order of $\arr A$

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        Multiplying $\arr A$ by $c$ is the same as applying the row operation $cR_i$ to each of its $n$ rows, and each application multiplies the determinant by $c$.
        Our conclusion follows.
        $$QED$$
    </div>
</details>
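A numerical spot check with an arbitrary $3 \times 3$ matrix; note that the factor is $c^n$, not $c$:

```python
import numpy as np

A = np.array([[1., 2., 3.], [4., 5., 6.], [3., 4., 2.]])  # order n = 3, |A| = 9
c = 2.0
n = A.shape[0]

print(np.isclose(np.linalg.det(c * A), c ** n * np.linalg.det(A)))  # True: 8 * 9 = 72
print(np.isclose(np.linalg.det(c * A), c * np.linalg.det(A)))       # False: 18 != 72
```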

$\theorem$:
$\arr A$ is invertible if and only if $\det {\arr A} \neq 0$

This is statement 6 in our [pool of equivalence](pool_of_equivalence.ipynb).

<details>
<summary style="color: blue">$\proof$ (Click to expand)</summary>
    <div style="background: aliceblue">
        Let $\arr R$ be the reduced row-echelon form of $\arr A$, so $\arr R = \arr E_k \cdots \arr E_1 \arr A$ for some elementary matrices $\arr E_i$.
        If $\arr A$ is not invertible, then $\arr R \neq \arr I$, so $\arr R$ has a zero row and thus $\det{\arr R} = 0$.
        Since $\det{\arr R} = \det{\arr E_k} \cdots \det{\arr E_1} \det{\arr A}$ and every $\det{\arr E_i} \neq 0$, the determinant of $\arr A$ must be 0.
        <br>
        Suppose that $\arr A$ is invertible.
        Then $\arr A$ can be expressed as a product of elementary matrices.
        Its determinant is the product of their determinants.
        Since none of these determinants is 0, the determinant of $\arr A$ is non-zero.
        $$QED$$
    </div>
</details>