# Matrices

A matrix is an array of numbers organized as a table:

$$
\bm{A} = 
\begin{bmatrix}
3.14159 & 2.71828 \\
1.41421 & 42
\end{bmatrix}
$$

We will use boldface capital letters for matrices. This particular matrix has $2$ rows and $2$ columns. If a matrix $\bm{M}$ has $m$ rows and $n$ columns we indicate this as follows:

- $\bm{M}_{m \times n}$
- $\bm{M} \in \mathbb{R}^{m \times n}$ ($\bm{M}$ belongs to the set of real-valued matrices of size $m$ by $n$)
- $\bm{M} \in \mathcal{M}_{m, n}(\mathbb{R})$ (same as above)

So, for instance, to indicate the size of the matrix $\bm{A}$ above we can write $\bm{A}_{2 \times 2}$, $\bm{A} \in \mathbb{R}^{2 \times 2}$, or $\bm{A} \in \mathcal{M}_{2, 2}(\mathbb{R})$.

Each element position can be identified by two integer indices starting at one and indicating the row and column of the element. They are written as subscripts to the matrix name:

$$
\begin{align*}
\bm{A}_{1, 1} & = 3.14159 \text{ (a few digits of } \pi \text{ at row } 1 \text{ and column } 1 \text{)}\\
\bm{A}_{1, 2} & = 2.71828 \text{ (a few digits of } e \text{ at row } 1 \text{ and column } 2 \text{)}\\
\bm{A}_{2, 1} & = 1.41421 \text{ (a few digits of } \sqrt{2} \text{ at row } 2 \text{ and column } 1 \text{)}\\
\bm{A}_{2, 2} & = 42 \text{ (``The Answer to the Ultimate Question of Life, The Universe, and Everything'' at row } 2 \text{ and column } 2 \text{)}
\end{align*}
$$

Sometimes we use lowercase letters corresponding to the matrix name as well:

$$
\begin{align*}
a_{1, 1} & = 3.14159 \\
a_{1, 2} & = 2.71828 \\
a_{2, 1} & = 1.41421 \\
a_{2, 2} & = 42
\end{align*}
$$

It's the same thing.



---

**Exercise** For the matrix below identify what is asked.

$$
\bm{C} = 
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
$$

(a) The size, using one of the notations above

(b) The element $\bm{C}_{2, 2}$

(c) The element $c_{1, 3}$

---

Sometimes we want to be able to specify a matrix in a more abstract way, by defining its elements with formulas. For example, if I define the matrix $\bm{A}$ like this:

$$
\bm{A}_{m \times n} = 
\begin{bmatrix}
a_{i, j}
\end{bmatrix} \text{ such that } a_{i, j} = 10 i + j^2
$$

then, for $m = 2$ and $n = 3$, that means

$$
\bm{A}_{2 \times 3} = 
\begin{bmatrix}
11 & 14 & 19 \\
21 & 24 & 29
\end{bmatrix}
$$
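
The formula-based definition translates almost directly into code. Here is a minimal sketch using plain Python lists (the variable names are my own choices for illustration); note how the loops run over the one-based indices $i$ and $j$ used in the formula:

```python
m, n = 2, 3

# Entry at math indices (i, j) is 10*i + j^2, with i and j starting at one.
A = [[10 * i + j**2 for j in range(1, n + 1)] for i in range(1, m + 1)]

for row in A:
    print(row)
# [11, 14, 19]
# [21, 24, 29]
```

Changing the expression inside the comprehension gives any other formula-defined matrix.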


---

**Exercise** Write the following matrices:

(a) $\bm{A}_{2 \times 2} = [a_{i,j}] \text{ such that } a_{i,j} = 7$

(b) $\bm{A}_{4 \times 4} = [a_{i,j}] \text{ such that } a_{i,j} = 1 \text{ if } i = j \text{ and zero otherwise}$

---

**Exercise** For

$$
\bm{A}_{2 \times 2} = \left[a_{i,j}\right] =
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
$$

write $\bm{B}_{2 \times 2} = \left[b_{i, j}\right]$ such that $b_{i,j} = a_{j, i}$

---

Some terminology:

- A matrix of just one row is called a "row-matrix". Row-matrices have just one effective index (the column index), and therefore are very analogous to standard $\mathbb{R}^{n}$ vectors. For this reason, sometimes we use lowercase letters for row-matrices (breaking the rule of uppercase letters for matrices). For instance:

$$
\bm{u} =
\begin{bmatrix}
1 & 2 & 3
\end{bmatrix}
$$

- A matrix of just one column is called a "column-matrix". For the same reasons as those of the row-matrix, a column-matrix is usually seen as a vector, and written with lowercase letters. For example:

$$
\bm{v} = 
\begin{bmatrix}
1 \\
2 \\
3
\end{bmatrix}
$$

- A matrix with the same number of rows and columns is called a "square matrix"

Matrices are very useful:

- You can store data: spreadsheets, images, etc
- You can represent connectivity data, such as networks
- You can represent linear transformations

For instance, if we have the data as follows:

- Tarfield eats 5 lasagnas and 3 apple pies a day.
- Ton eats 1 lasagna and 1 salad a day.

We can represent this information in a matrix:

$$
\bm{H} = 
\begin{bmatrix}
5 & 3 & 0 \\
1 & 0 & 1
\end{bmatrix}
$$

where $\bm{H}_{2 \times 3}$ is the matrix of eating habits of Tarfield and Ton, the rows represent the characters (row $1$: Tarfield, row $2$: Ton) and the columns represent the dishes (column $1$: lasagna, column $2$: apple pie, column $3$: salad).


---

**Exercise** Bananas have $420$ mg of potassium, $27$ g of carbohydrates, and $1.3$ g of protein. Apples have $200$ mg of potassium, $25$ g of carbs and $0.5$ g of protein. Represent this information in a matrix where rows are nutrients and columns are fruits.

---

In Python we can use the library `numpy` to handle matrices:

In [1]:
import numpy as np

For a review of `numpy` consult the documentation: https://numpy.org/doc/stable/ (hint: try the "Getting Started" guide, it is really good!)

and this useful notebook by Aurélien Géron (the author of this course's textbook): https://github.com/ageron/handson-ml3/blob/main/tools_pandas.ipynb

Covering library details here would be too much effort and too redundant with the good references above. So, from now on I'll give a few code examples, but by and large I'll assume that you went through the references I provided to get acquainted with the library, ok?

Our little example matrix $\bm{A}$ above can be made in several ways in `numpy`. We could start with a matrix of zeros and fill it up:

In [2]:
# Make a 2x2 matrix of zeros.
A = np.zeros((2, 2))
print(A)

# Fill it up.
A[0, 0] = 3.14159
A[0, 1] = 2.71828
A[1, 0] = 1.41421
A[1, 1] = 42
print(A)


[[0. 0.]
 [0. 0.]]
[[ 3.14159  2.71828]
 [ 1.41421 42.     ]]


Notice an important detail: while mathematicians prefer to start numbering their rows and columns from one, computer programmers often prefer to start at zero. So, in Mathematics we have $\bm{A}_{1,1}$, in `numpy` we have `A[0, 0]`.

A more direct way of making a matrix of known values is shown below:

In [3]:
A = np.array([[3.14159, 2.71828], [1.41421, 42]])
print(A)

[[ 3.14159  2.71828]
 [ 1.41421 42.     ]]



---

**Exercise** Write the matrix $\bm{C}$ in `numpy`:

$$
\bm{C} = 
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
$$

---

Numpy allows us to make matrices with random values easily. For instance, here is a $3 \times 4$ matrix with random integers from $1$ to $99$ (the `high` value is excluded):

In [4]:
X = np.random.randint(low=1, high=100, size=(3, 4))
X

array([[63, 92, 22, 49],
       [41, 28, 18, 63],
       [83, 87, 55,  2]], dtype=int32)

And here is a $4 \times 2$ matrix of normal-distributed values with $\mu = 5.2$ and $\sigma = 2.3$:

In [5]:
X = np.random.normal(loc=5.2, scale=2.3, size=(4, 2))
X

array([[7.88696632, 5.03365202],
       [4.47656039, 2.4458209 ],
       [5.36613614, 4.05562301],
       [6.65951866, 4.87429513]])

Sometimes we have to select a "submatrix" from the larger one. In Numpy we can use the familiar "slicing" notation of Python:

In [6]:
X = np.linspace(start=1, stop=20, num=20).reshape(4, 5)
X

array([[ 1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10.],
       [11., 12., 13., 14., 15.],
       [16., 17., 18., 19., 20.]])

In [7]:
i_1 = 1  # Take from row index 1 (the second row).
i_2 = 4  # Take up to the row BEFORE index 4 (that is, go up to row index 3, the last one).
j_1 = 0  # Take from column index 0 (the first column).
j_2 = 2  # Take up to the column BEFORE index 2 (that is, go up to column index 1).

X[i_1:i_2, j_1:j_2]

array([[ 6.,  7.],
       [11., 12.],
       [16., 17.]])

When you want to go all the way to the end, or start from the very beginning, you don't need to write the corresponding index:

In [8]:
X[i_1:, :j_2]

array([[ 6.,  7.],
       [11., 12.],
       [16., 17.]])

If you want the entire range, you can skip both indices:

In [9]:
X[:, 1:2]  # Take the entire second column.

array([[ 2.],
       [ 7.],
       [12.],
       [17.]])

CAREFUL: the operation below will reduce the dimensionality of the array:

In [10]:
X[:, 1]  # Take the entire second column as a 1D array.

array([ 2.,  7., 12., 17.])

so a matrix becomes a vector! If you want to select the column and keep it as a column matrix (which is what you want to do most of the time), one option is to `.reshape()` the result:

In [11]:
X[:, 1].reshape(-1, 1)  # Take the entire second column and reshape it to be a column vector.

array([[ 2.],
       [ 7.],
       [12.],
       [17.]])

It is usually simpler to slice with a range, as in `X[:, 1:2]` above, which keeps the result as a matrix.
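
To see the difference side by side, here is a small sketch comparing the shapes produced by each way of selecting the second column. The variable names are mine; the third option (indexing with a list) is an extra alternative not used elsewhere in these notes:

```python
import numpy as np

X = np.linspace(start=1, stop=20, num=20).reshape(4, 5)

col_1d = X[:, 1]        # integer index: the column axis is dropped
col_slice = X[:, 1:2]   # range slice: the column axis is kept
col_list = X[:, [1]]    # list of indices: the column axis is kept as well

print(col_1d.shape)     # (4,)
print(col_slice.shape)  # (4, 1)
print(col_list.shape)   # (4, 1)
```

Checking `.shape` like this is a quick way to find out whether you are holding a matrix or a 1D array.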

We will use a similar notation for slicing matrices in pure Mathematics, but with the following differences:

- We start indexing at one
- We include both ends of the indexing (Numpy excludes the final index value)

So, for example, if 

$$
\bm{X} = 
\begin{bmatrix}
 1 &  2 &  3 &  4 &  5 \\
 6 &  7 &  8 &  9 & 10 \\
11 & 12 & 13 & 14 & 15 \\
16 & 17 & 18 & 19 & 20 \\
\end{bmatrix}
$$

then

$$
\bm{X}_{2:3, 2:4} = 
\begin{bmatrix}
 7 &  8 &  9 \\
12 & 13 & 14 \\
\end{bmatrix},
\bm{X}_{:, :2} = 
\begin{bmatrix}
 1 &  2 \\
 6 &  7 \\
11 & 12 \\
16 & 17 \\
\end{bmatrix}, \text{ etc.}
$$
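
The two conventions differ only by a shift of one at the start of each range. A minimal sketch of the conversion, assuming a hypothetical helper named `math_slice` (it is not part of Numpy):

```python
import numpy as np

def math_slice(X, row_start, row_end, col_start, col_end):
    """Return the submatrix X_{row_start:row_end, col_start:col_end},
    where indices are one-based and both ends are included."""
    # Start: subtract one to move from one-based to zero-based indexing.
    # End: unchanged, because Numpy already excludes the final index.
    return X[row_start - 1:row_end, col_start - 1:col_end]

X = np.arange(1, 21).reshape(4, 5)
print(math_slice(X, 2, 3, 2, 4))
# [[ 7  8  9]
#  [12 13 14]]
```

So $\bm{X}_{2:3, 2:4}$ in the mathematical notation corresponds to `X[1:3, 1:4]` in Numpy.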


---

**Exercise** Write both the slicing notation for mathematical matrices and the Numpy notation for `numpy` arrays for the following submatrices of a given matrix $\bm{M} \in \mathbb{R}^{m \times n}$:

(a) The range of data between rows $3$ and $7$, and columns $2$ and $5$

(b) The first column

(c) The last row

---

## Matrix operations

### Matrix transposition

The transpose of a matrix $\bm{A}_{m \times n}$ is a matrix $\bm{B}_{n \times m}$ that is obtained by exchanging row and column indices: $b_{i, j} = a_{j, i}$. There is a special notation for the transpose of a matrix: apply the superscript $T$ to it, so instead of giving the transpose of $\bm{A}$ some other name $\bm{B}$, it is better to write $\bm{A}^{T}$.

$$
\bm{A} = 
\begin{bmatrix}
\color{red}{a_{1,1}} & \color{red}{a_{1,2}} & \color{red}{\cdots} & \color{red}{a_{1,n}} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n} \\
\end{bmatrix}
\Rightarrow
\bm{A}^{T} = 
\begin{bmatrix}
\color{red}{a_{1,1}} & a_{2,1} & \cdots & a_{m,1} \\
\color{red}{a_{1,2}} & a_{2,2} & \cdots & a_{m,2} \\
\color{red}{\vdots} & \vdots & \ddots & \vdots \\
\color{red}{a_{1,n}} & a_{2,n} & \cdots & a_{m,n} \\
\end{bmatrix}
$$
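
In Numpy the transpose of an array `A` is available as `A.T`. A short sketch with an arbitrary example matrix, checking the rule $b_{i, j} = a_{j, i}$ entry by entry:

```python
import numpy as np

A = np.array([[10, 20, 30],
              [40, 50, 60]])
A_T = A.T

print(A.shape)    # (2, 3)
print(A_T.shape)  # (3, 2)

# Every entry of the transpose is the entry of A with swapped indices.
rule_holds = all(
    A_T[j, i] == A[i, j]
    for i in range(A.shape[0])
    for j in range(A.shape[1])
)
print(rule_holds)  # True
```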

### Matrix sum

Two matrices can be added together if they have the same shape. The sum of two matrices is the matrix formed by adding the corresponding elements of each matrix. For example:

$$
\bm{A} = 
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{bmatrix}, \;
\bm{B} = 
\begin{bmatrix}
-1 & -2 \\
3 & 4 \\
0 & 0
\end{bmatrix}
\Rightarrow
\bm{A} + \bm{B} = 
\begin{bmatrix}
0 & 0 \\
6 & 8 \\
5 & 6
\end{bmatrix}
$$

### Scalar multiplication

A matrix can be multiplied by a scalar, which multiplies each element of the matrix by that scalar. For example:

$$
\bm{A} = 
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{bmatrix}, \; 
\alpha = 10
\Rightarrow
\alpha \bm{A} = 
\begin{bmatrix}
10 & 20 \\
30 & 40 \\
50 & 60 \\
\end{bmatrix}
$$
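
Both operations map directly to the `+` and `*` operators on Numpy arrays. A minimal sketch reproducing the two examples above (the names `S` and `P` are mine):

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[-1, -2], [3, 4], [0, 0]])
alpha = 10

S = A + B      # element-by-element sum
P = alpha * A  # every element multiplied by the scalar

print(S)
print(P)

# Incompatible shapes are rejected: A is 3x2 and its transpose is 2x3.
try:
    A + A.T
    shapes_accepted = True
except ValueError:
    shapes_accepted = False
print(shapes_accepted)  # False
```

Be aware that Numpy's *broadcasting* rules accept some shape combinations that the mathematical definition does not (for example, a $3 \times 2$ array plus a 1D array of length $2$), so it pays to check your shapes.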

By the way, with scalar multiplication and matrix sum we fit the criteria for a *vector space*: matrices are vectors too! We have the following properties for matrix sum and scalar multiplication (assume compatible shapes whenever necessary):

(Sum properties)

1. Matrix sum is commutative: $\bm{A} + \bm{B} = \bm{B} + \bm{A}$

2. Matrix sum is associative: $(\bm{A} + \bm{B}) + \bm{C} = \bm{A} + (\bm{B} + \bm{C})$

3. There exists the *null matrix* $\bm{0} \in \mathbb{R}^{m \times n}$ such that $\bm{A} + \bm{0} = \bm{A}$. It is just the matrix made of zeros.

4. For every matrix $\bm{A}$ there is an *additive inverse* (or simply, its *negative*), denoted $-\bm{A}$, such that $\bm{A} + (-\bm{A}) = \bm{0}$

(Scalar multiplication properties)

5. Scalar multiplication is distributive with respect to the scalar sum: $(\alpha + \beta) \bm{A} = \alpha \bm{A} + \beta \bm{A}$

6. Scalar multiplication is distributive with respect to the matrix sum: $\alpha (\bm{A} + \bm{B}) = \alpha \bm{A} + \alpha \bm{B}$

7. Scalar multiplication is associative: $\alpha (\beta \bm{A}) = (\alpha \beta) \bm{A}$

8. Multiplying by $1$ does nothing: $1 \bm{A} = \bm{A}$


---

**Exercise** For the matrices $\bm{A}$ and $\bm{B}$ above, write:

(a) $5 \bm{A} + 3 \bm{B}$

(b) $\bm{A} - \bm{A}$

(c) $0 \bm{B}$

---

**Exercise** Repeat the previous exercise with Numpy.

---

**Exercise** Construct numerical examples of each property above using Numpy.

---

**Exercise** My online store sells pens and notebooks (the paper kind) through two different marketplaces: Rainforest and oBey. In January we sold products as follows:

|           | Rainforest | oBey  |
|-----------|------------|-------|
| pens      | 12,345     | 5,432 |
| notebooks | 2,345      | 1,234 |

Curiously, every month after that my sales increased $10\%$ with respect to the previous month! I had to close the store by the end of December, that's how afraid I am of exponential success! :)

(a) Represent January sales as a matrix.

(b) Represent as a matrix expression (scalar multiplication and matrix summations) the entire set of sales for the $12$ months.

(c) How much did I sell of each product in each marketplace after my short run as an entrepreneur?

---

### Matrix multiplication

Matrix multiplication has a very weird definition. Let's start with the definition, then get a feeling for the meaning.

Two matrices $\bm{A} \in \mathbb{R}^{m \times p}$ and $\bm{B} \in \mathbb{R}^{p \times n}$ can be multiplied if the shapes are compatible: the number of columns of the first matrix matches the number of rows of the second matrix. The resulting matrix $\bm{C} = \bm{A} \bm{B}$ will have shape $m \times n$, that is, the number of rows of the first for its own number of rows, and the number of columns of the second for its own number of columns: $\bm{C} \in \mathbb{R}^{m \times n}$.

<img src="matprod.png" width=25%/>

The matrix multiplication operation is defined as:

$$
\bm{C}_{m \times n} = \bm{A}_ {m \times p} \bm{B}_{p \times n}
\Rightarrow
c_{i,j} = \sum_{k=1}^{p} a_{i, k} b_{k, j}
$$

This is the familiar "row times column" rule that you learned in high school.

<img src="matprod_dot.png" width=50%/>

It is like doing the dot product between the $i$-th row of $\bm{A}$ and the $j$-th column of $\bm{B}$.
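
To make the definition concrete, here is a minimal sketch that computes the product entry by entry, exactly as the summation reads. The function name `matmul_by_definition` and the example matrices are my own choices for illustration; in practice you would let Numpy do the work (the sketch compares against `np.matmul`):

```python
import numpy as np

def matmul_by_definition(A, B):
    """Compute C = A B using c_{i,j} = sum_k a_{i,k} b_{k,j}."""
    m, p = A.shape
    p_b, n = B.shape
    if p != p_b:
        raise ValueError("columns of A must match rows of B")
    C = np.zeros((m, n))
    for i in range(m):          # rows of A
        for j in range(n):      # columns of B
            for k in range(p):  # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])        # 3 x 2

C = matmul_by_definition(A, B)
print(C)
print(np.allclose(C, np.matmul(A, B)))  # True
```

The three nested loops are slow in pure Python, so treat this as a way to read the formula, not as something to use on real data.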

But what is the real meaning of a matrix multiplication? In order to explore this, let's consider the multiplication of a matrix $\bm{A} \in \mathbb{R}^{m \times n}$ by a column-matrix $\bm{v} \in \mathbb{R}^{n}$:

<img src="matvec_1.png" width=50%/>

If we look at the columns of $\bm{A}$ as vectors $\bm{a}^{(i)}$ (we will drop the arrows for vectors and use boldface instead, so row and column matrices are interchangeable with vectors), and look at $\bm{v}$ as a column-matrix, then the result $\bm{u} = \bm{A} \bm{v}$ is a linear combination of the columns of $\bm{A}$:

<img src="matvec_2.png" width=50%/>

So, in this view, matrix multiplication is the same as *forming linear combinations*. If instead of a column-matrix $\bm{v}$ we had a full matrix $\bm{V}$ the same logic applies, once per column of $\bm{V}$.
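
The "linear combination of columns" view can be checked numerically. In this sketch (example values are arbitrary) the product $\bm{A} \bm{v}$ is computed once with Numpy's matrix multiplication and once by explicitly adding up the columns of $\bm{A}$ weighted by the entries of $\bm{v}$:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([[2.0],
              [0.0],
              [-1.0]])

# Standard matrix-times-column product.
u_product = A @ v

# Same result as a weighted sum of the columns of A.
u_combination = np.zeros((A.shape[0], 1))
for i in range(A.shape[1]):
    u_combination += v[i, 0] * A[:, i:i + 1]  # v_i times the i-th column

print(u_product)
print(np.allclose(u_product, u_combination))  # True
```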


---

**Exercise** In a previous exercise we had the information that:

- Bananas have $420$ mg of potassium, $27$ g of carbohydrates, and $1.3$ g of protein. 
- Apples have $200$ mg of potassium, $25$ g of carbs and $0.5$ g of protein.

Kevin likes to eat $5$ bananas and $3$ apples a day, while Stuart likes to eat $2$ bananas and $4$ apples a day. 

(a) You already wrote the nutritional information of the fruits as a matrix of nutrients (rows) per fruits (columns). Now write the eating preferences (quantity of fruit per day) of Kevin and Stuart as a matrix of fruits (rows) per character (columns)

(b) You now have two matrices: $\bm{F}_{\text{nutrients} \times \text{fruits}}$ and $\bm{P}_{\text{fruits} \times \text{characters}}$. Write the matrix product that allows us to obtain the amount of nutrients per character.

---

In Numpy the matrix multiplication is performed by the operator `@`:

In [12]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B
D = B @ A

print(C)
print(D)

[[19 22]
 [43 50]]
[[23 34]
 [31 46]]


### Properties of matrix multiplication

**Matrix multiplication is *NOT* commutative**

Often one cannot even form the reverse product due to the shapes! For example, if $\bm{A} \in \mathbb{R}^{2 \times 3}$ and $\bm{B} \in \mathbb{R}^{3 \times 4}$ then $\bm{A} \bm{B}$ is doable, but $\bm{B} \bm{A}$ is not due to the shapes of the matrices.

Even if the reverse product is doable, generally we have $\bm{A} \bm{B} \neq \bm{B} \bm{A}$. First of all, the shapes of $\bm{A} \bm{B}$ and $\bm{B} \bm{A}$ may not match: if $\bm{A} \in \mathbb{R}^{2 \times 3}$ and $\bm{B} \in \mathbb{R}^{3 \times 2}$ then $(\bm{A} \bm{B}) \in \mathbb{R}^{2 \times 2}$ and $(\bm{B} \bm{A}) \in \mathbb{R}^{3 \times 3}$

Second, even if the reverse product is doable and the shapes match (only possible for square matrices $\bm{A}$ and $\bm{B}$), we still usually do not have commutativity. For instance:

$$
\bm{A} = 
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}, \;
\bm{B} = 
\begin{bmatrix}
5 & 6 \\
7 & 8
\end{bmatrix}
\Rightarrow
\bm{A} \bm{B} = 
\begin{bmatrix}
19 & 22 \\
43 & 50
\end{bmatrix}, \;
\bm{B} \bm{A} = 
\begin{bmatrix}
23 & 34 \\
31 & 46 \\
\end{bmatrix}
\Rightarrow
\bm{A} \bm{B} \neq \bm{B} \bm{A}
$$
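
The shape arguments above can be checked with arrays of ones, since only the shapes matter here. This is a small sketch and the variable names are mine:

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((3, 4))

print((A @ B).shape)  # (2, 4)

# The reverse product is not defined: 4 columns do not match 2 rows.
try:
    B @ A
    reverse_defined = True
except ValueError:
    reverse_defined = False
print(reverse_defined)  # False

# Both products exist here, but their shapes differ.
B_alt = np.ones((3, 2))
print((A @ B_alt).shape)  # (2, 2)
print((B_alt @ A).shape)  # (3, 3)
```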

**Matrix multiplication is associative**

$\bm{A} (\bm{B} \bm{C}) = (\bm{A} \bm{B}) \bm{C}$

**Matrix multiplication is distributive with the matrix sum**

And it is distributive on both the left and right sides: $\bm{A} (\bm{B} + \bm{C}) = \bm{A} \bm{B} + \bm{A} \bm{C}$ and $(\bm{A} + \bm{B}) \bm{C} = \bm{A} \bm{C} + \bm{B} \bm{C}$.

**Scalar multiplication is associative with matrix multiplication**

$\alpha (\bm{A} \bm{B}) = (\alpha \bm{A}) \bm{B} = \bm{A} (\alpha \bm{B})$

**The transpose of a product is the reverse product of the transposes**

$(\bm{A} \bm{B})^T = \bm{B}^T \bm{A}^T$

**There is a square matrix that does nothing in multiplication: the identity matrix**

The square matrix below is a *diagonal* matrix (that is, a square matrix where all elements are zero except those with the same row and column index - the elements in the main diagonal) with ones in the diagonal:

$$
\bm{I}_{n \times n} = 
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
$$

This is the *identity matrix* of size $n$. Any matrix $\bm{M} \in \mathbb{R}^{n \times p}$ that gets multiplied by the identity matrix results in the very same matrix $\bm{M}$!

$$
\bm{I} \bm{M} = \bm{M}
$$


---

**Exercise** Construct numerical examples of each property using Numpy. For transposition of a matrix `X` use `X.T`. For the identity matrix, use `np.eye(n)`.

---

### Block matrix multiplication

An extremely useful property of matrix multiplication is that you can do it "in blocks"! What does it mean? Suppose you have big matrices $\bm{A}$ and $\bm{B}$ that you wish to multiply together:

<img src="block_matrices_01.png" width=50%/>

Let's partition the matrices into blocks in any way we want, as long as:

- The number of blocks in the row direction and in the column direction of each matrix obey the rule for the shapes of matrices being multiplied. For instance, we can partition $\bm{A}$ into $3$ blocks vertically and $2$ blocks horizontally, and $\bm{B}$ into $2$ blocks vertically (so that we respect the matrix multiplication shape rules) and $4$ blocks horizontally.

<img src="block_matrices_02.png" width=50%>

- The blocks themselves have shapes that allow the inner block multiplication to happen. See the grid in the image above, and how it respects the shapes of the inner blocks!

Now construct the blocks by applying the matrix multiplication to the partitioned matrix! So, in the example:

$$
\begin{align*}
\bm{C}^{(1,1)} & = \bm{A}^{(1,1)} \bm{B}^{(1,1)} + \bm{A}^{(1,2)} \bm{B}^{(2,1)} \\
\bm{C}^{(1,2)} & = \bm{A}^{(1,1)} \bm{B}^{(1,2)} + \bm{A}^{(1,2)} \bm{B}^{(2,2)} \\
& \vdots \\
\bm{C}^{(3,4)} & = \bm{A}^{(3,1)} \bm{B}^{(1,4)} + \bm{A}^{(3,2)} \bm{B}^{(2,4)} \\
\end{align*}
$$

Let's make a numerical example: suppose we want to multiply matrices $\bm{A}$ and $\bm{B}$ as below:

<img src="block_matrices_03.png" width=33%>

We can partition these matrices into blocks as follows:

<img src="block_matrices_04.png" width=33%>

Now the block multiplications are easier:

$$
\begin{align*}
\bm{C}^{(1,1)} & =
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
\end{bmatrix}
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
\end{bmatrix}
+
\begin{bmatrix}
0 \\
0 \\
\end{bmatrix}
\begin{bmatrix}
0 & 0
\end{bmatrix}
= 
\begin{bmatrix}
7 & 10 \\
15 & 22 \\
\end{bmatrix}
+
\begin{bmatrix}
0 & 0 \\
0 & 0 \\
\end{bmatrix}
=
\begin{bmatrix}
7 & 10 \\
15 & 22 \\
\end{bmatrix} \\
\bm{C}^{(1,2)} & =
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
\end{bmatrix}
\begin{bmatrix}
0 \\
0 \\
\end{bmatrix}
+
\begin{bmatrix}
0 \\
0 \\
\end{bmatrix}
\begin{bmatrix}
5 \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
\end{bmatrix}
+
\begin{bmatrix}
0 \\
0 \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
\end{bmatrix} \\
\bm{C}^{(2,1)} & =
\begin{bmatrix}
0 & 0
\end{bmatrix}
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
\end{bmatrix}
+
\begin{bmatrix}
5 \\
\end{bmatrix}
\begin{bmatrix}
0 & 0
\end{bmatrix}
= 
\begin{bmatrix}
0 & 0
\end{bmatrix}
+
\begin{bmatrix}
0 & 0
\end{bmatrix}
=
\begin{bmatrix}
0 & 0
\end{bmatrix} \\
\bm{C}^{(2,2)} & =
\begin{bmatrix}
0 & 0
\end{bmatrix}
\begin{bmatrix}
0 \\
0 \\
\end{bmatrix}
+
\begin{bmatrix}
5 \\
\end{bmatrix}
\begin{bmatrix}
5 \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
\end{bmatrix}
+
\begin{bmatrix}
25 \\
\end{bmatrix}
=
\begin{bmatrix}
25 \\
\end{bmatrix}
\end{align*}
$$

And then we put the result together:

<img src="block_matrices_05.png" width=40%>
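
The same computation can be sketched in Numpy. Here I assume that $\bm{A}$ and $\bm{B}$ are both the $3 \times 3$ matrix assembled from the blocks used in the equations above; `np.block` glues the resulting blocks back together:

```python
import numpy as np

# Assumption: both matrices are the ones implied by the blocks above.
A = np.array([[1, 2, 0],
              [3, 4, 0],
              [0, 0, 5]])
B = A.copy()

# Partition: rows/columns 1-2 form one block, row/column 3 the other.
A11, A12, A21, A22 = A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]
B11, B12, B21, B22 = B[:2, :2], B[:2, 2:], B[2:, :2], B[2:, 2:]

# Multiply "in blocks", treating each block as if it were a number.
C11 = A11 @ B11 + A12 @ B21
C12 = A11 @ B12 + A12 @ B22
C21 = A21 @ B11 + A22 @ B21
C22 = A21 @ B12 + A22 @ B22

C = np.block([[C11, C12],
              [C21, C22]])
print(C)
print(np.array_equal(C, A @ B))  # True
```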

Block multiplication is very useful in several scenarios:

- It allows easy multiplication of matrices in particular (and quite common) cases, such as "block-diagonal" matrices.

- We can multiply massive matrices that may not even fit in memory, by loading smaller blocks, performing operations on them, and recomposing the result later.

- In many cases it is useful to look at matrices from a block standpoint, to get some interesting insights - more on this later.


---

**Exercise** In a previous exercise we had the information that:

- Bananas have $420$ mg of potassium, $27$ g of carbohydrates, and $1.3$ g of protein. 
- Apples have $200$ mg of potassium, $25$ g of carbs and $0.5$ g of protein.

Kevin likes to eat $5$ bananas and $3$ apples a day, while Stuart likes to eat $2$ bananas and $4$ apples a day. 

Representing all this information as a matrix multiplication we have:

<img src="minions_matprod.png" width=66%>

(a) Split $\bm{F}$ by columns and $\bm{P}$ by rows and express $\bm{N}$ as a block-matrix product. This way of computing $\bm{N}$ emphasizes the contribution of each fruit separately, by providing the nutrients per character as one term of a matrix sum. Do the computations using Numpy.

(b) Split $\bm{F}$ by rows and $\bm{P}$ by columns and express $\bm{N}$ as a block-matrix product. This way of computing $\bm{N}$ is the standard matrix product computation, and emphasizes the separate contributions of nutrients per character by abstracting the fruit itself. Do the computations using Numpy.

## Matrices as linear transformations

A linear transformation is a function $T: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ that maps vectors in $\mathbb{R}^{n}$ to vectors in $\mathbb{R}^{m}$ with the following properties:

1. Scaling a vector scales its transform: $T(\alpha \bm{v}) = \alpha T(\bm{v})$

2. The transform of the sum is the sum of the transforms: $T(\bm{u} + \bm{v}) = T(\bm{u}) + T(\bm{v})$

Linear transformations are **everywhere** - that's why Linear Algebra is so popular! Some examples:

- Consider the following electrical circuit:

<img src="circuit.png" width=33%/>

The relationship between the source voltages $(V_1, V_2)$ and the branch currents $(i_1, i_2, i_3)$ is given by:

$$
\begin{bmatrix}
i_1 \\
i_2 \\
i_3
\end{bmatrix}
=
\begin{bmatrix}
\phantom{-}\frac{2}{3} & -\frac{1}{3} \\[4pt]
-\frac{1}{3} & \phantom{-}\frac{2}{3} \\[4pt]
\phantom{-}\frac{1}{3} & \phantom{-}\frac{1}{3} \\
\end{bmatrix}
\begin{bmatrix}
V_1 \\
V_2
\end{bmatrix}
$$

So the vector of currents is given by a matrix times the vector of voltages: $\bm{i} = \bm{G} \bm{v}$. As such, the transformation from voltages to currents is a linear transformation.
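
As a quick numerical check, the sketch below applies the matrix $\bm{G}$ from the equation above to some sample voltages (the values of `v` and `w` are arbitrary choices of mine) and verifies the two linearity properties:

```python
import numpy as np

G = np.array([[ 2/3, -1/3],
              [-1/3,  2/3],
              [ 1/3,  1/3]])

v = np.array([[3.0], [6.0]])  # sample source voltages (V_1, V_2)
w = np.array([[1.0], [1.0]])  # another sample, to test additivity

currents = G @ v
print(currents)

# Property 1: scaling the voltages scales the currents.
print(np.allclose(G @ (2 * v), 2 * (G @ v)))    # True
# Property 2: the currents of a sum are the sum of the currents.
print(np.allclose(G @ (v + w), G @ v + G @ w))  # True
```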

In fact, ***any*** linear transformation can be represented as a matrix multiplication operation! 

Let $\bm{e}^{(i)} \in \mathbb{R}^n$, $i \in \{1, 2, \cdots, n\}$ be the vector with all zeros except for a $1$ in the $i$-th position, that is, if $n = 3$ for example we have:

$$
\bm{e}^{(1)} =
\begin{bmatrix}
1 \\
0 \\
0
\end{bmatrix}, \;
\bm{e}^{(2)} =
\begin{bmatrix}
0 \\
1 \\
0
\end{bmatrix}, \;
\bm{e}^{(3)} =
\begin{bmatrix}
0 \\
0 \\
1
\end{bmatrix}
$$

These special vectors have a name: *canonical basis vectors*. Now every vector in $\mathbb{R}^{n}$ is easily written as a linear combination of these guys: if a vector $\bm{v}$ has components $v_{i}$, $i \in \{1, 2, \cdots, n\}$, then $\bm{v} = v_1 \bm{e}^{(1)} + v_2 \bm{e}^{(2)} + \cdots + v_n \bm{e}^{(n)}$:

$$
\bm{v} = 
\begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n
\end{bmatrix}
=
v_1
\begin{bmatrix}
1 \\
0 \\
\vdots \\
0
\end{bmatrix} +
v_2
\begin{bmatrix}
0 \\
1 \\
\vdots \\
0
\end{bmatrix} +
\cdots +
v_n
\begin{bmatrix}
0 \\
0 \\
\vdots \\
1
\end{bmatrix}
=
v_1 \bm{e}^{(1)} + v_2 \bm{e}^{(2)} + \cdots + v_n \bm{e}^{(n)}
$$

Consider a linear transformation $T: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ (we don't know that we can use a matrix for this, we just know that the transformation is linear). What do we get for $\bm{u} = T(\bm{v})$? Since the transformation is linear:

$$
\begin{align*}
\bm{u} & = T(\bm{v}) \\
& = T(v_1 \bm{e}^{(1)} + v_2 \bm{e}^{(2)} + \cdots + v_n \bm{e}^{(n)}) \\
& = v_1 T(\bm{e}^{(1)}) + v_2 T(\bm{e}^{(2)}) + \cdots + v_n T(\bm{e}^{(n)})
\end{align*}
$$

Let's call $\bm{m}^{(i)} = T(\bm{e}^{(i)})$ the vector that we obtain transforming the $i$-th canonical basis vector. So the expression above becomes:

$$
\begin{align*}
\bm{u} & = v_1 T(\bm{e}^{(1)}) + v_2 T(\bm{e}^{(2)}) + \cdots + v_n T(\bm{e}^{(n)}) \\
& = v_1 \bm{m}^{(1)} + v_2 \bm{m}^{(2)} + \cdots + v_n \bm{m}^{(n)}
\end{align*}
$$

Now stack all $\bm{m}^{(i)}$ as columns of a matrix $\bm{M}$:

$$
\bm{M} = 
\begin{bmatrix}
\rule[-2ex]{0.5pt}{4ex} & \rule[-2ex]{0.5pt}{4ex} & & \rule[-2ex]{0.5pt}{4ex} \\[1pt]
\bm{m}^{(1)} & \bm{m}^{(2)} & \cdots & \bm{m}^{(n)} \\[1pt]
\rule[-2ex]{0.5pt}{4ex} & \rule[-2ex]{0.5pt}{4ex} & & \rule[-2ex]{0.5pt}{4ex} \\[1pt]
\end{bmatrix}
$$

Now we can write $\bm{u} = \bm{M} \bm{v}$ because:

$$
\begin{bmatrix}
\rule[-2ex]{0.5pt}{4ex} \\[1pt]
\bm{u} \\[1pt]
\rule[-2ex]{0.5pt}{4ex} \\[1pt]
\end{bmatrix}
=
\begin{bmatrix}
\rule[-2ex]{0.5pt}{4ex} & \rule[-2ex]{0.5pt}{4ex} & & \rule[-2ex]{0.5pt}{4ex} \\[1pt]
\bm{m}^{(1)} & \bm{m}^{(2)} & \cdots & \bm{m}^{(n)} \\[1pt]
\rule[-2ex]{0.5pt}{4ex} & \rule[-2ex]{0.5pt}{4ex} & & \rule[-2ex]{0.5pt}{4ex} \\[1pt]
\end{bmatrix}
\begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n
\end{bmatrix}
$$

And we are done! Through the process above, any linear transformation can be converted into a matrix multiplication!
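
The recipe above is easy to turn into code: apply the transformation to each canonical basis vector and stack the results as columns. This is a minimal sketch; the transformation `T` below is a made-up example from $\mathbb{R}^{3}$ to $\mathbb{R}^{2}$ and `matrix_of` is a hypothetical helper name:

```python
import numpy as np

def T(v):
    """A made-up linear transformation from R^3 to R^2."""
    v1, v2, v3 = v
    return np.array([v1 + 2 * v2, 3 * v3 - v1])

def matrix_of(transformation, n):
    """Stack T(e^(1)), ..., T(e^(n)) as the columns of a matrix."""
    # The rows of the identity matrix are the canonical basis vectors.
    columns = [transformation(e) for e in np.eye(n)]
    return np.column_stack(columns)

M = matrix_of(T, 3)
print(M)

# The matrix reproduces the transformation on an arbitrary vector.
v = np.array([1.0, -2.0, 0.5])
print(np.allclose(M @ v, T(v)))  # True
```

Here vectors are plain 1D arrays instead of column matrices, just to keep the sketch short.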


---

**Exercise** Consider a series of linear transformations that takes a vector $\bm{v} \in \mathbb{R}^{2}$ and applies the following steps:

($T_1$) Rotates the vector counter-clockwise by $30$ degrees, and

($T_2$) Scales the resulting vector by a factor of $2$ in the first coordinate and $0.5$ in the second coordinate, and

($T_3$) Rotates the vector clockwise by $30$ degrees.

For instance, follow the transformations below:

$$
\bm{v} = 
\begin{bmatrix}
1 \\
1
\end{bmatrix}
\overset{T_1}{\longrightarrow}
\begin{bmatrix}
0.366\\
1.366
\end{bmatrix}
\overset{T_2}{\longrightarrow}
\begin{bmatrix}
0.732\\
0.683
\end{bmatrix}
\overset{T_3}{\longrightarrow}
\begin{bmatrix}
0.975\\
0.225
\end{bmatrix}
$$

<img src="linear_transf.png" width=100%>

(a) Express $T_1$, $T_2$, and $T_3$ in matrix form.

(b) Express $T = (T_3 \circ T_2 \circ T_1)$ in matrix form.

**HINT**: For every transformation, think about what that particular transformation does to the *canonical basis vectors*:

$$
\bm{e}^{(1)} = 
\begin{bmatrix}
1 \\
0
\end{bmatrix}, \;
\bm{e}^{(2)} = 
\begin{bmatrix}
0 \\
1
\end{bmatrix}
$$

then assemble the transformation matrix as shown above.

---

## When is a linear transformation invertible?

A transformation $T: \mathbb{R}^{n} \to \mathbb{R}^{m}$ is invertible if every time I transform an arbitrary vector $\bm{v}$ to get $\bm{u} = T(\bm{v})$ then I can always find $\bm{v}$ again through another transformation, called the *inverse* transformation: $\bm{v} = T^{-1}(\bm{u})$.

Many times we will be talking about transformations from $\mathbb{R}^{n}$ to itself. In that case, our transformation matrices will be square matrices. Can linear transformations mapping between spaces of different dimensionality be invertible? Not really: going to a smaller space loses information, and going to a larger space cannot reach every vector there. The only way out is to restrict the domain and co-domain, and then they are not truly of different dimensionality.

So for this short course on linear algebra we will focus on $T: \mathbb{R}^{n} \to \mathbb{R}^{n}$, that is, transformations whose matrices are square. Now let's consider the question: when are such transformations invertible?

*Example 1:* Consider the identity transformation $T: \mathbb{R}^{n} \to \mathbb{R}^{n}$, that is, for any $\bm{v}$ we have $T(\bm{v}) = \bm{v}$. Is this invertible? Sure is, and the inverse transformation $T^{-1}$ is $T$ itself! What is the corresponding transformation matrix? The identity matrix!

*Example 2:* Consider the transformation $T: \mathbb{R}^{3} \to \mathbb{R}^{3}$ such that $T(v_1, v_2, v_3) = (v_1, 2 v_2, 3 v_3)$. This transformation is equivalent to a matrix multiplication by the matrix below:

$$
\bm{M} = 
\begin{bmatrix}
1 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 3 \\
\end{bmatrix}
$$

Is this transformation invertible? Let's see: any vector $\bm{v}$ that gets transformed by it results in the following:

$$
\bm{v} = 
\begin{bmatrix}
v_1 \\
v_2 \\
v_3
\end{bmatrix}
\Rightarrow
\bm{u} = T(\bm{v}) = \bm{M} \bm{v} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 3 \\
\end{bmatrix}
\begin{bmatrix}
v_1 \\
v_2 \\
v_3
\end{bmatrix}
=
\begin{bmatrix}
v_1 \\
2 v_2 \\
3 v_3
\end{bmatrix}
$$

So, given the transformed vector $\bm{u}$, can we find another transform that recreates the original vector $\bm{v}$? Yes, we can. The inverse transformation is simply dividing the components by the factors that got applied by $T$, that is:

$$
T(\bm{v}) = (v_1, 2 v_2, 3 v_3) \Rightarrow T^{-1}(\bm{u}) = (u_1, \frac{1}{2} u_2, \frac{1}{3} u_3)
$$

Composing $T^{-1}$ with $T$ gives the following transform:

$$
(T^{-1} \circ T)(\bm{v}) = T^{-1}(T(\bm{v})) =  T^{-1}(\underbrace{v_1, 2 v_2, 3 v_3}_{\bm{u} = T(\bm{v})}) = (v_1, \frac{1}{2} \cdot 2 v_2, \frac{1}{3} \cdot 3 v_3) = \bm{v}
$$


The matrix that reverses the work of $T$ is called the *inverse matrix* of $\bm{M}$, written $\bm{M}^{-1}$:

$$
\bm{M}^{-1} = 
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1/2 & 0 \\
0 & 0 & 1/3
\end{bmatrix}
$$

When we compose $T$ with $T^{-1}$ we are multiplying $\bm{M}$ by $\bm{M}^{-1}$ and we get the identity matrix:


$$
\bm{M}^{-1} \bm{M} = 
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1/2 & 0 \\
0 & 0 & 1/3
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 3 \\
\end{bmatrix}
= 
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
= \bm{I}_{3 \times 3}
$$
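
Numpy can compute the inverse for us with `np.linalg.inv`. A short sketch for the matrix of Example 2 (the test vector `v` is an arbitrary choice):

```python
import numpy as np

M = np.diag([1.0, 2.0, 3.0])  # the matrix of Example 2
M_inv = np.linalg.inv(M)
print(M_inv)

# Transform a vector, then undo the transformation.
v = np.array([[1.0], [2.0], [3.0]])
u = M @ v
print(np.allclose(M_inv @ u, v))          # True

# The product of the inverse and the matrix is the identity.
print(np.allclose(M_inv @ M, np.eye(3)))  # True
```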

*Example 3:* Now what about this: $T(\bm{v}) = (v_1, v_2, 0)$? Imagine that we transform the vector $\bm{v} = (1, 2, 3)$ - we get $\bm{u} = T(\bm{v}) = (1, 2, 0)$. Can we recover the last component of $\bm{v}$ from $\bm{u}$? Not at all! The information is lost! This transformation is *not invertible*.
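
The loss of information in Example 3 shows up clearly in code. In this sketch, `P` is the matrix of $T(\bm{v}) = (v_1, v_2, 0)$, and `w` is a second vector of my own choosing that differs from `v` only in the last component:

```python
import numpy as np

P = np.diag([1.0, 1.0, 0.0])  # matrix of T(v) = (v_1, v_2, 0)

v = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0, -7.0])

# Two different inputs give the same output, so no inverse can tell them apart.
print(P @ v)
print(P @ w)

# Numpy refuses to invert this matrix.
try:
    np.linalg.inv(P)
    inverse_exists = True
except np.linalg.LinAlgError:
    inverse_exists = False
print(inverse_exists)  # False
```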

## Determinants

The determinant of a matrix is a function that ~~was created to make highschoolers miserable~~ assigns a real number to a real square matrix such that:

- The determinant of the identity matrix is 1.
- The exchange of two rows multiplies the determinant by −1.
- Multiplying a row by a number multiplies the determinant by this number.
- Adding a multiple of one row to another row does not change the determinant.

This is a somewhat bizarre list of properties, but with some geometrical intuition we will finally understand the meaning and utility of the determinant!
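
Before the geometry, we can at least check these properties numerically with `np.linalg.det`. The matrix `A` below is an arbitrary example of mine; `np.isclose` is used because the determinant is computed in floating point:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
det_A = np.linalg.det(A)

# The determinant of the identity matrix is 1.
print(np.isclose(np.linalg.det(np.eye(3)), 1.0))

# Exchanging two rows multiplies the determinant by -1.
swapped = A[[1, 0, 2], :]
print(np.isclose(np.linalg.det(swapped), -det_A))

# Multiplying a row by a number multiplies the determinant by that number.
scaled = A.copy()
scaled[0, :] *= 5.0
print(np.isclose(np.linalg.det(scaled), 5.0 * det_A))

# Adding a multiple of one row to another row does not change the determinant.
sheared = A.copy()
sheared[2, :] += 3.0 * sheared[0, :]
print(np.isclose(np.linalg.det(sheared), det_A))
```

All four checks should print `True`.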