Alright. Now it's time to talk about the bread and butter of Linear Algebra: the $\textbf{matrix}$. If you've seen a vector before, you've actually seen a matrix. A vector is just a special kind of matrix. Let's generalize first though.

A $\textbf{matrix}$ is effectively a $2$ dimensional array as we saw in $LA1$. If $A$ is a matrix it will have a number of rows, $\color{red}{m}$, and a number of columns, $\color{blue}{n}$, where $\color{red}{m},\color{blue}{n}\in \mathbb{N}$. It may take a little practice to get used to the numbering system, but effectively $A$ looks like this:

$$A = \begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}$$

For the above I used $\color{red}{m},\color{blue}{n}\geq 4$ but this is not necessary. As long as neither is zero, the matrix is well defined. $\textbf{Vectors}$ coincide with either $\color{red}{m}=1$ or $\color{blue}{n}=1$. We have very little interest in the case where both are one so we will omit it.

Enough chatter! Let's just create a matrix in code. Say $\color{red}{m}=4$ and $\color{blue}{n}=5$:

In [None]:
import math
import numpy as np
import random
random.seed(1)

In [None]:
m = 4
n = 5
# Allocate the whole m x n array of 4-character labels at once
A = np.empty((m, n), dtype='<U4')

for i in range(m):
    for j in range(n):
        # Assign the label directly instead of appending to uninitialized memory
        A[i, j] = 'a_'+str(i+1)+str(j+1)

print(A)

Okay, so $A$ is a matrix of dimension $\color{red}{m}\times \color{blue}{n}=\color{red}{4}\times\color{blue}{5}$. Sometimes you'll hear dimension referred to as "size" or "shape" as well. Since all of the components of $A$ are $\textbf{real numbers}$ we say that $A\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$ by convention. All this is saying is that $A$ is a member of the set of matrices with real coefficients and has shape $\color{red}{m}\times \color{blue}{n}$.

In order to reference a single component of the matrix we generally see the following notation:

$$(A)_{ij}=a_{ij}$$

Okay, now that we have $A$, let's just fill it with some random things. Let's also create a $B$ of the same size. Notice that because $A$ and $B$ are both $\color{red}{m}\times \color{blue}{n}$ matrices we can add them together pointwise. Also, given any scalar values $s,t\in \mathbb{R}$, we can scale our respective matrices by them in a pointwise manner. I'll explain in a second, but first set $s=\frac{1}{2}$ and $t=3$. We're going to build $A$ and $B$ by randomizing the values in them:

In [None]:
A = np.zeros((m,n))
B = A.copy()
s = (1/2)
t = 3
for i in range(m):
    for j in range(n):
        A[i][j] += random.randint(0,9)
        B[i][j] += random.randint(0,9)

print('A=\n',A,'\n')
print('B=\n',B,'\n')
print('A+B=\n',A+B,'\n')
print('sA+tB=\n', s*A+t*B,'\n')

You can check that any given component of $sA+tB$ matches the scaled sum of the component-wise pieces of $A$ and $B$. And you should check.

Basically for each $i\in [1,\ldots,m]$ and each $j\in [1,\ldots,n]$ we see that $s(A)_{ij}+t(B)_{ij}=(sA+tB)_{ij}$. We refer to this as the $\textbf{pointwise operation}$ $(A)_{ij}+(B)_{ij}=(A+B)_{ij}$ as dictated by the precise location of the pair $ij$ with respect to each matrix. You can also see this in the unscaled addition of matrices of the same size, $A+B$. Scaling on the other hand just scales each $ij$ member of the matrix by that value.
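
If you'd rather let the computer do the component-by-component check, here is a minimal sketch. The names `A_demo`, `B_demo`, `s_demo`, and `t_demo` are made up for this illustration (they are not the random matrices above), so treat it as one way to convince yourself rather than anything official:

```python
import numpy as np

# Small hypothetical matrices, chosen only for this illustration.
A_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
B_demo = np.array([[6.0, 5.0, 4.0],
                   [3.0, 2.0, 1.0]])
s_demo, t_demo = 0.5, 3.0

# NumPy scales and adds pointwise for us.
combined = s_demo * A_demo + t_demo * B_demo

# Rebuild the same matrix one ij component at a time.
by_hand = np.zeros_like(A_demo)
rows, cols = A_demo.shape
for i in range(rows):
    for j in range(cols):
        by_hand[i, j] = s_demo * A_demo[i, j] + t_demo * B_demo[i, j]

print('pointwise match:', np.array_equal(combined, by_hand))
```

The double loop is exactly the statement $s(A)_{ij}+t(B)_{ij}=(sA+tB)_{ij}$ written out for every $ij$ pair.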

Okay, with that little intro out of the way, let me lay out the rest of the ground rules.

$$\textbf{MATH WARNING!!!}$$

Let $A$, $B$, and $C$ be $\color{red}{m}\times \color{blue}{n}$ matrices, and let $s,t\in \mathbb{R}$ as scalars.

$\hphantom{abcdefghijklmnop}\bullet A+B=B+A$

$\hphantom{abcdefghijklmnop}\bullet (A+B)+C=A+(B+C)$

Because addition and subtraction are pointwise, these respective properties of $\textbf{commutativity}$ and $\textbf{associativity}$ are easily proven. Now let $O$ be the $\color{red}{m}\times \color{blue}{n}$ matrix of all zeros. Component-wise addition provides that $O$ is the $\textbf{additive identity}$:

$\hphantom{abcdefghijklmnop}\bullet A+O=A=O+A$

As a consequence of the above, we also have an $\textbf{additive inverse}$:

$\hphantom{abcdefghijklmnop}\bullet A+(-A)=O$

Finally, for scalar multiplication we have an $\textbf{associativity of scalar multiplication}$ and a pair of $\textbf{distributive properties of scalar multiplication}$:

$\hphantom{abcdefghijklmnop}\bullet (st)A=s(tA)$

$\hphantom{abcdefghijklmnop}\bullet s(A+B)=sA+sB$

$\hphantom{abcdefghijklmnop}\bullet (s+t)A=sA+tA$

We can confirm most of these with just the $A$, $B$, $s$, and $t$ from above. Notice:

In [None]:
# Compare the matrices entry by entry; calling .all() on each side separately only compares two booleans
print('A+B = B+A:', np.array_equal(A+B, B+A))
print('A+O = A:', np.array_equal(A+np.zeros((m,n)), A))
print('A+(-A) = O:', np.array_equal(A+(-1)*A, np.zeros((m,n))))
# np.allclose tolerates floating-point rounding in the scaled versions
print('(st)A = s(tA):', np.allclose((s*t)*A, s*(t*A)))
print('s(A+B) = sA+sB:', np.allclose(s*(A+B), s*A+s*B))

These seem like a lot of rules!! I promise though, they're pretty natural. They extend from a lot of the same addition and scalar multiplication you're used to. I think you should try to prove some of these things for yourself.

$$\textbf{MATH WARNING!!!}$$

A proof is just a formal presentation of why something is or isn't true. We won't be super strict, but let's create something that will convince anyone reading that $(A+B)+C=A+(B+C)$. Notice I conveniently left it out of the things that we checked above.

Okay let's start! We'll put some training wheels on, and I may over-explain some concepts. That's okay. This isn't meant to throw you into the deep end.

Consider $A,B,C\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$. For $i\in [1,\ldots,\color{red}{m}]$ and $j\in [1,\ldots,\color{blue}{n}]$ we denote the $ij$-component of $A$, $B$, or $C$ by $(A)_{ij}$, $(B)_{ij}$, and $(C)_{ij}$ respectively.

Recall that $(A)_{ij}+(B)_{ij}=(A+B)_{ij}$ and $(A+B)_{ij}+(C)_{ij}=((A+B)+C)_{ij}$ by the properties of pointwise addition on matrices of the same size. Moreover by the associative properties of addition in $\mathbb{R}$, $((A+B)+C)_{ij}=(A+B+C)_{ij}=(A+(B+C))_{ij}$. Basically it doesn't matter how we add in $\mathbb{R}$ as the results are the same. By a symmetric argument in the other direction we complete the equality with:

$$\big((A)_{ij}+(B)_{ij}\big) + (C)_{ij}=(A+B)_{ij}+(C)_{ij}=((A+B)+C)_{ij}=(A+(B+C))_{ij}=(A)_{ij}+(B+C)_{ij}=(A)_{ij}+\big((B)_{ij}+(C)_{ij}\big)$$

Since our choice of $ij$ pair was arbitrary, this holds for all members of $A,B,C\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$. Consequently, for matrices of equivalent size we know $ (A+B)+C=A+(B+C) $.

$$\textbf{and...done!}$$

Alright. It's okay if that moved quickly. I think you should try to think about what every phrase means. Then once you're comfortable, try to apply the same ideas to proving one of the things we verified above.
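
A numerical check is no substitute for the proof, but it's a decent sanity check. Here is a minimal sketch, assuming three made-up integer matrices `A_demo`, `B_demo`, and `C_demo` (hypothetical names, not the $A$ and $B$ from earlier) generated with NumPy's own random generator:

```python
import numpy as np

# Hypothetical 4x5 integer matrices; the seed is arbitrary.
rng = np.random.default_rng(0)
A_demo = rng.integers(0, 10, size=(4, 5))
B_demo = rng.integers(0, 10, size=(4, 5))
C_demo = rng.integers(0, 10, size=(4, 5))

# Group the additions in the two different ways.
left = (A_demo + B_demo) + C_demo
right = A_demo + (B_demo + C_demo)

print('(A+B)+C = A+(B+C):', np.array_equal(left, right))
```

One passing example only shows the claim holds for these particular matrices. The proof is what covers every choice of $A$, $B$, and $C$.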

Now, we need to discuss some more things. There will be a lot of discussion today...sorry.

Next, we need to talk about the $\textbf{transpose}$ of a matrix. We denote the transpose of $A\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$ as $A^{T}\in \mathbb{R}^{\color{blue}{n}\times \color{red}{m}}$. Effectively the first column of $A$ becomes the top row in $A^{T}$. The second column becomes the second row. This process continues to exhaustion of columns. So, if $A\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$:

$$A^{T}= \begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}^{T}=\begin{pmatrix}
a_{11}&a_{21}&a_{31}&\ldots&a_{m1}\\
a_{12}&a_{22}&a_{32}&\ldots&a_{m2}\\
a_{13}&a_{23}&a_{33}&\ldots&a_{m3}\\
\vdots&&&\ddots&\vdots\\
a_{1n}&a_{2n}&a_{3n}&\ldots&a_{mn}\\
\end{pmatrix}$$

Put simply, the columns become the rows and the rows become the columns. Look:

In [None]:
print('A=\n',A,'\n')

print('A shape:', A.shape,'\n')

print('A^T=\n', A.transpose(), '\n')

print('A^T shape:', A.transpose().shape)

The transpose also has a few properties we will examine:

$\hphantom{abcdefghijklmnop}\bullet (sA)^{T}=sA^{T}$

$\hphantom{abcdefghijklmnop}\bullet (A+B)^{T}=A^{T}+B^{T}$

$\hphantom{abcdefghijklmnop}\bullet (A^{T})^{T}=A$

I will leave the proofs of these points up to you at your pace. They are not that difficult and fall similarly to the proof given above when you become aware of one key fact: given the $ij$ component of $A$, the transpose forces $(A)_{ij}=(A^{T})_{ji}$ meaning the location is swapped to become the $ji$ component of $A^{T}$. Though I won't prove them for you, I will verify them:

In [None]:
# Compare the matrices themselves rather than the truthiness of their entries
print('(sA)^T = sA^T:', np.array_equal((s*A).transpose(), s*(A.transpose())))
print('(A+B)^T = A^T+B^T:', np.array_equal((A+B).transpose(), A.transpose()+B.transpose()))
print('(A^T)^T = A:', np.array_equal((A.transpose()).transpose(), A))
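
The key fact $(A)_{ij}=(A^{T})_{ji}$ is also easy to poke at directly. This is just a sketch with a made-up matrix `A_demo` (a hypothetical name) whose entries are all different, so a swapped index would be noticed:

```python
import numpy as np

# Hypothetical 4x5 matrix with entries 0..19 so every component is distinct.
A_demo = np.arange(20).reshape(4, 5)
A_T = A_demo.transpose()

# Check that the ij component of A_demo is the ji component of its transpose.
swapped = all(
    A_demo[i, j] == A_T[j, i]
    for i in range(A_demo.shape[0])
    for j in range(A_demo.shape[1])
)

print('A shape:', A_demo.shape, ' A^T shape:', A_T.shape)
print('(A)_ij = (A^T)_ji for every ij:', swapped)
```

Keep in mind that NumPy indexes from $0$ while the math indexes from $1$, so `A_demo[0, 0]` plays the role of $a_{11}$.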

Okay, okay, okay. Let's prove the second one, just to get a better hang of things.

$$\textbf{MATH WARNING!!!}$$

Consider $A,B\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$. Recall that by pointwise addition of matrices, for $i\in [1,\ldots,\color{red}{m}]$ and $j\in [1,\ldots,\color{blue}{n}]$ we know:

$$(A)_{ij}+(B)_{ij}=(A+B)_{ij}$$

Coupled with the properties of the transpose we also know:

$$((A+B)^{T})_{ij}=(A+B)_{ji}=(A)_{ji}+(B)_{ji}=(A^{T})_{ij}+(B^{T})_{ij}$$

As this holds for all $ij$ pairs we conclude that $(A+B)^{T}=A^{T}+B^{T}$ for all matrices of equivalent size.

$$\textbf{and...done!}$$

See? Not so bad. You can stop shaking, crying, and rocking in the corner now. Let's straighten ourselves up and keep going.

We're building to $\textbf{matrix multiplication}$ which is very different from the scalar multiplication above. In order to do that, we need to talk about the special case of matrices I alluded to earlier. Let's break down the $\textbf{vector}$. Say we have a vector $v\in \mathbb{R}^{1\times \color{red}{m}}$ and another one $u\in \mathbb{R}^{\color{blue}{n}\times 1}$. Keep $A,B\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$. We're going to bring back our fixed numbers with $\color{red}{m}=4$ and $\color{blue}{n}=5$ as before:

In [None]:
v = np.zeros((1,m))
u = np.zeros((n,1))

# Use the seeded `random` module, as for A and B, so the draws are reproducible and include 9
for i in range(len(v[0])):
    v[0][i] += random.randint(0,9)

for j in range(len(u)):
    u[j][0] += random.randint(0,9)
print('v:\n',v,'\n')
print('u:\n',u,'\n')

Okay. Let's focus for a second. Matrix multiplication is a little tricky when you first see it. I'm going to try to demystify everything, but you need to pay careful attention. Maybe even go through this part more than once. I'm going to go really slow. Let's start with $v\in \mathbb{R}^{1\times \color{red}{m}}$. We know we can take the transpose of $v$, so let's take a look at that:

In [None]:
print('v^T:\n',v.transpose(),'\n')

Notice that $v^{T}\in \mathbb{R}^{\color{red}{m}\times 1}$ which is what you should expect given what the transpose does. Now don't get too confused. We're going to define the $\textbf{dot product}$ between two $\textbf{vectors}$ first.

In order to do so we need two vectors with the same number of components. Well, why don't we just take the dot product between a vector and itself!? That guarantees they have the same number of entries! Great idea! So the dot product is a form of multiplication between matrices. Often these two matrices are actually vectors.

There is some confusion though, so let's try to unravel a little bit of the problem.

Often the dot product between two vectors $a$ and $b$ of $k\in \mathbb{N}$ components is thought of as:

$$ a\cdot b= \begin{pmatrix}
a_{1}&a_{2}&a_{3}&\ldots& a_{k}
\end{pmatrix}\cdot \begin{pmatrix}
b_{1}&b_{2}&b_{3}&\ldots& b_{k}
\end{pmatrix} = a_{1}b_{1}+a_{2}b_{2}+a_{3}b_{3}+\ldots+a_{k}b_{k}=\sum\limits_{\ell=1}^{k}a_{\ell}b_{\ell}$$

Instead of talking about them in terms of size $k\times 1$ or $1\times k$, these are generally just thought of as vectors $a,b\in \mathbb{R}^{k}$ with the $1$ omitted. In future courses, perhaps even in a past course, this is how you will encounter or have encountered the dot product between two vectors. Unfortunately we need to be a bit more careful here in this course. That's not to say that this is wrong per se, but it glosses over something very important that's happening underneath the shorthand. You can't really use this idea when multiplying matrices, and so when thinking about vectors as a type of matrix the sleight of hand above won't work. To show you really quick, watch what happens when I take $v\cdot v$. They both have the same number of components, so this should work out just fine:

In [None]:
print('v.v:\n',v.dot(v),'\n')

Oooops! There was an error. What went wrong!?!?!

Let's see $v\cdot v^{T}$, both again with the same number of components:

In [None]:
print('v.v^T:\n',v.dot(v.transpose()),'\n')

This one worked! Wait. But why?

Notice importantly that $v\neq v^{T}$ as $v\in \mathbb{R}^{1\times \color{red}{m}}$ and $v^{T}\in \mathbb{R}^{\color{red}{m}\times 1}$. The point is that multiplying two vectors from the same space doesn't really work when you think of them as matrices. In other classes we just gloss over this idea which, for a number of reasons, isn't actually a problem. It does cause some confusion for those unfamiliar with what's going on, though. So for instance, if I say:

$$ v\cdot v = \begin{pmatrix}
v_{1}&v_{2}&v_{3}&v_{4}
\end{pmatrix}\cdot \begin{pmatrix}
v_{1}&v_{2}&v_{3}&v_{4}
\end{pmatrix}=\sum\limits_{\ell=1}^{4}v_{\ell}^{2} $$

...What I'm really doing is:

$$ v\cdot v^{T} = \begin{pmatrix}
v_{1}&v_{2}&v_{3}&v_{4}
\end{pmatrix}\begin{pmatrix}
v_{1}\\
v_{2}\\
v_{3}\\
v_{4}
\end{pmatrix}= v_{1}v_{1}+v_{2}v_{2}+v_{3}v_{3}+v_{4}v_{4}=\sum\limits_{\ell=1}^{4}v_{\ell}^{2}$$
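
To see the row-times-column version in numbers, here is a minimal sketch. The vector `v_demo` is a made-up $1\times 4$ example (not the random $v$ from above), picked so the arithmetic is easy to follow:

```python
import numpy as np

# Hypothetical 1x4 row vector.
v_demo = np.array([[1.0, 2.0, 3.0, 4.0]])

# The shorthand: add up the squares of the components.
sum_of_squares = sum(v_demo[0, l] ** 2 for l in range(v_demo.shape[1]))

# What is really happening: a (1x4) row times a (4x1) column.
product = v_demo.dot(v_demo.transpose())

print('sum of squares:', sum_of_squares)
print('v.v^T:', product, 'with shape', product.shape)
```

Both routes give the same number, but the matrix product hands it back as a $1\times 1$ matrix rather than a bare scalar.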

The kind of abuse of notation present in the first example is common when discussing the dot product between vectors. It is very important though that we hammer the correct mechanisms for the $\textbf{dot product}$ home right now, because it will matter when we expand to general matrix multiplication. Feel free to abuse notation at your leisure elsewhere, but as we build to matrix multiplication I'm going to be rather strict. You will see why in a second. Let's bring back an $\color{red}{m}\times \color{blue}{n}$ matrix and a vector $w\in \mathbb{R}^{\color{blue}{n}\times 1}$:

$$A = \begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}\hphantom{abcdefg} w=\begin{pmatrix}
w_{1}\\
w_{2}\\
w_{3}\\
\vdots\\
w_{n}
\end{pmatrix}$$

We're going to do this in two distinct ways. The first way, we will break $A$ into $\textbf{rows}$. The second way, we will break $A$ into $\textbf{columns}$. The thought process behind each is extremely useful, so it's nice to have both ways broken down.

$\textbf{The First Idea}$

Each of the $\color{red}{m}$ rows in $A$ we can think of as a row vector, $r_{i}\in \mathbb{R}^{1\times\color{blue}{n}}$ for $i\in [1,\ldots,\color{red}{m}]$. With this convention, we can find the dot product of $A\cdot w$ or $Aw$ for short:

$$r_{i}=\begin{pmatrix}
a_{i1}&a_{i2}&a_{i3}&\ldots&a_{in}
\end{pmatrix}$$

$$\text{in:}$$

$$A\cdot w=Aw = \begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}w=\begin{pmatrix}
r_{1}\\
r_{2}\\
r_{3}\\
\vdots\\
r_{m}\\
\end{pmatrix}w$$

Now don't be confused. Each one of these rows is individually a vector as denoted above. We then proceed by taking the dot product of each individual row vector with $w$ as the normal dot product between vectors:

$$Aw =\begin{pmatrix}
r_{1}\\
r_{2}\\
r_{3}\\
\vdots\\
r_{m}\\
\end{pmatrix}w=\begin{pmatrix}
r_{1}\cdot w\\
r_{2}\cdot w\\
r_{3}\cdot w\\
\vdots\\
r_{m}\cdot w\\
\end{pmatrix} =\begin{pmatrix}
\sum\limits_{\ell=1}^{n}a_{1\ell}w_{\ell}\\
\sum\limits_{\ell=1}^{n}a_{2\ell}w_{\ell}\\
\sum\limits_{\ell=1}^{n}a_{3\ell}w_{\ell}\\
\vdots\\
\sum\limits_{\ell=1}^{n}a_{m\ell}w_{\ell}\\
\end{pmatrix}$$

The final result of $Aw$ is $\color{red}{m}$ rows of $1$ element each meaning $Aw\in \mathbb{R}^{\color{red}{m}\times 1}$. $\textbf{IMPORTANT!}$ I repeat. The end result of a dot product between a matrix of size $\color{red}{m}\times\color{blue}{n}$ and of size $\color{blue}{n}\times 1$ is a matrix of size $\color{red}{m}\times 1$. It's a vector! Something similar will happen when we multiply matrices by other matrices. Notice how easy it is to find individual components in the resulting vector, namely $\sum\limits_{\ell=1}^{n}a_{i\ell}w_{\ell}$ for $i\in [1,\ldots,\color{red}{m}]$. This will also carry over when we generalize the dot product.
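
The row idea translates to code almost word for word. This is only a sketch, assuming a tiny made-up matrix `A_demo` and vector `w_demo` (hypothetical names) rather than anything we've built so far:

```python
import numpy as np

# Hypothetical 2x3 matrix and 3x1 column vector.
A_demo = np.array([[1, 2, 3],
                   [4, 5, 6]])
w_demo = np.array([[1],
                   [0],
                   [2]])

# Dot each row vector r_i (shape 1x3) with w (shape 3x1), one row at a time.
rows_dotted = np.zeros((A_demo.shape[0], 1), dtype=int)
for i in range(A_demo.shape[0]):
    r_i = A_demo[i:i+1, :]                  # slicing keeps r_i as a 1x3 matrix
    rows_dotted[i, 0] = r_i.dot(w_demo)[0, 0]

print('row by row:\n', rows_dotted)
print('matches A.dot(w):', np.array_equal(rows_dotted, A_demo.dot(w_demo)))
```

Slicing with `i:i+1` instead of plain `i` is a deliberate choice here: it keeps each row as a $1\times n$ matrix, which matches the strict row-times-column picture.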

$\textbf{The Second Idea}$

Sometimes it will be useful to break $A$ into columns instead of rows. We're going to look at this thing called a $\textbf{linear combination}$ a little differently. Consider our product $A\cdot w$ again:

$$A\cdot w=Aw = \begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}\begin{pmatrix}
w_{1}\\
w_{2}\\
w_{3}\\
\vdots\\
w_{n}
\end{pmatrix} $$

This time we're going to break $A$ into column vectors $c_{j}\in \mathbb{R}^{\color{red}{m}\times 1}$ for $j\in [1,\ldots,\color{blue}{n}]$:

$$Aw = \begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}\begin{pmatrix}
w_{1}\\
w_{2}\\
w_{3}\\
\vdots\\
w_{n}
\end{pmatrix}=\begin{pmatrix}
c_{1}&c_{2}&c_{3}&\ldots&c_{n}
\end{pmatrix}\begin{pmatrix}
w_{1}\\
w_{2}\\
w_{3}\\
\vdots\\
w_{n}
\end{pmatrix}$$

Remember that the components of $w$ denoted $w_{j}$ for $j\in [1,\ldots,\color{blue}{n}]$ are just numbers whereas each $c_{j}$ is a column vector from $A$. We can do this maneuver similar to the dot product between vectors:

$$Aw=\begin{pmatrix}
c_{1}&c_{2}&c_{3}&\ldots&c_{n}
\end{pmatrix}\begin{pmatrix}
w_{1}\\
w_{2}\\
w_{3}\\
\vdots\\
w_{n}
\end{pmatrix}=(w_{1})c_{1}+(w_{2})c_{2}+(w_{3})c_{3}+\ldots+(w_{n})c_{n}$$
$$=w_{1}\begin{pmatrix}
a_{11}\\
a_{21}\\
a_{31}\\
\vdots\\
a_{m1}
\end{pmatrix}+w_{2}\begin{pmatrix}
a_{12}\\
a_{22}\\
a_{32}\\
\vdots\\
a_{m2}
\end{pmatrix}+w_{3}\begin{pmatrix}
a_{13}\\
a_{23}\\
a_{33}\\
\vdots\\
a_{m3}
\end{pmatrix}+\ldots+w_{n}\begin{pmatrix}
a_{1n}\\
a_{2n}\\
a_{3n}\\
\vdots\\
a_{mn}
\end{pmatrix}$$

Wait. That's cool! That looks like the linear combination we were introduced to in $LA1$!
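
Here is the same product built as a linear combination of columns. Again this is just a sketch with made-up values: `A_demo` and `w_demo` are hypothetical names chosen for the illustration:

```python
import numpy as np

# Hypothetical 2x3 matrix and 3x1 column vector.
A_demo = np.array([[1, 2, 3],
                   [4, 5, 6]])
w_demo = np.array([[1],
                   [0],
                   [2]])

# Accumulate w_1*c_1 + w_2*c_2 + ... + w_n*c_n, where c_j is the j-th column.
combo = np.zeros((A_demo.shape[0], 1), dtype=int)
for j in range(A_demo.shape[1]):
    c_j = A_demo[:, j:j+1]          # slicing keeps c_j as an m x 1 column
    combo += w_demo[j, 0] * c_j     # scale the column by the number w_j

print('linear combination of columns:\n', combo)
print('matches A.dot(w):', np.array_equal(combo, A_demo.dot(w_demo)))
```

Each pass through the loop adds one scaled column, so the running total is the linear combination written out term by term.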

ALRIGHT! Enough chatter! We've seen it both ways, so let's do some stuff! Recall our $A\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$. Since $v\in \mathbb{R}^{1\times \color{red}{m}}$ and $v^{T}\in \mathbb{R}^{\color{red}{m}\times 1}$, neither will work for $Av$ or $Av^{T}$. If you can't think of why immediately, try to convince yourself before we move on. Instead let's look at $u\in \mathbb{R}^{\color{blue}{n}\times 1}$:

In [None]:
print('A:\n', A,'\n')

print('u:\n', u,'\n')

This should work right?!?!? Well, without further ado:

In [None]:
print('Au:\n', A.dot(u), '\n')

$Au\in \mathbb{R}^{\color{red}{m}\times 1}$ where $\color{red}{m}=4$, just as prescribed. Wait! But because $Au\in \mathbb{R}^{\color{red}{m}\times 1}$ we should be able to compute $v(Au)$ since $v\in \mathbb{R}^{1\times \color{red}{m}}$. Notice:

In [None]:
print('vAu:\n', v.dot(A.dot(u)),'\n')

Going the other way, recall $A^{T}\in \mathbb{R}^{\color{blue}{n}\times \color{red}{m}}$ and $v^{T}\in \mathbb{R}^{\color{red}{m}\times 1}$:

In [None]:
print('A^T:\n', A.transpose(), '\n')
print('v^T:\n', v.transpose(), '\n')

Let's see $A^{T}v^{T}$:

In [None]:
print('A^Tv^T:\n', (A.transpose()).dot(v.transpose()), '\n')

Now $u^{T}A^{T}v^{T}$:

In [None]:
print('u^TA^Tv^T:\n', (u.transpose()).dot((A.transpose()).dot(v.transpose())), '\n')

Notice $u^{T}A^{T}v^{T} = vAu$. There's a reason that this works under special conditions, but we won't get into that in this lesson sadly. Still...know there are sooooooo many cool things on the horizon.

In [None]:
# Compare the two 1x1 results directly instead of comparing their truthiness
print('u^TA^Tv^T = vAu:', np.array_equal((u.transpose()).dot((A.transpose()).dot(v.transpose())), v.dot(A.dot(u))))

That was a neat little game, but we can also do something else pretty slick using a very special vector. Imagine a vector where one of the components is $1$ and the rest are $0$. This idea will be incredibly useful later when we're discussing geometric properties, but for now it's a cool little toy we can use to isolate specific columns of a matrix. Let's define this vector $e_{j}\in \mathbb{R}^{\color{blue}{n}\times 1}$ where $j\in [1,\ldots,\color{blue}{n}]$ in such a way where the $j$-th element is a $1$. We call these the $\textbf{standard unit vectors}$; unit because their length is $1$ (a concept we will talk more about in the geometry lesson), and standard sort of because only one component has a value (another concept we will talk more about in the geometry lesson). For instance:


$$e_{1}=\begin{pmatrix}
1\\
0\\
0\\
\vdots\\
0
\end{pmatrix}\hphantom{abcdefg}e_{2}=\begin{pmatrix}
0\\
1\\
0\\
\vdots\\
0
\end{pmatrix}\hphantom{abcdefg}e_{3}=\begin{pmatrix}
0\\
0\\
1\\
\vdots\\
0
\end{pmatrix}\hphantom{abcdefg}\ldots\hphantom{abcdefg}e_{n}=\begin{pmatrix}
0\\
0\\
0\\
\vdots\\
1
\end{pmatrix}$$

So, if we take $Ae_{j}$ we return the $j$th column of $A$ using the $\textbf{linear combination}$ idea:

$$Ae_{j} = \begin{pmatrix}
a_{11}&a_{12}&\ldots&a_{1j}&\ldots&a_{1n}\\
a_{21}&a_{22}&\ldots&a_{2j}&\ldots&a_{2n}\\
a_{31}&a_{32}&\ldots&a_{3j}&\ldots&a_{3n}\\
\vdots&&&\ddots&&\vdots\\
a_{m1}&a_{m2}&\ldots&a_{mj}&\ldots&a_{mn}\\
\end{pmatrix}\begin{pmatrix}
0_{1}\\
0_{2}\\
\vdots\\
1_{j}\\
\vdots\\
0_{n}
\end{pmatrix}$$
$$=(0)\begin{pmatrix}
a_{11}\\
a_{21}\\
a_{31}\\
\vdots\\
a_{m1}
\end{pmatrix}+(0)\begin{pmatrix}
a_{12}\\
a_{22}\\
a_{32}\\
\vdots\\
a_{m2}
\end{pmatrix}+\ldots+(1)\begin{pmatrix}
a_{1j}\\
a_{2j}\\
a_{3j}\\
\vdots\\
a_{mj}
\end{pmatrix}+\ldots+(0)\begin{pmatrix}
a_{1n}\\
a_{2n}\\
a_{3n}\\
\vdots\\
a_{mn}
\end{pmatrix}=\begin{pmatrix}
a_{1j}\\
a_{2j}\\
a_{3j}\\
\vdots\\
a_{mj}
\end{pmatrix}$$

Let's create a few of these so we can use them to isolate our column-vectors of $A$:

In [None]:
units = [np.zeros((n,1)) for _ in range(n)]

print('A:\n',A,'\n')

for i in range(len(units)):
    units[i][i] = 1
    print('Ae'+str(i+1)+':\n', A.dot(units[i]),'\n')

Alright, that was cool I guess. Let's now move onto multiplying matrices by things larger than a vector. Again we will do it in two ways, but they are effectively the same as before.

Pick a random $k\in \mathbb{N}$. We'll start with $A\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$ again, but let's say $B\in \mathbb{R}^{\color{blue}{n}\times \color{red}{k}}$. We want to multiply them together, so let's have a look:

$\textbf{The First Idea}$

Let's take this slowly.

$$ A\cdot B = AB=\begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}
\begin{pmatrix}
b_{11}&b_{12}&\ldots&b_{1k}\\
b_{21}&b_{22}&\ldots&b_{2k}\\
b_{31}&b_{32}&\ldots&b_{3k}\\
\vdots&&\ddots&\vdots\\
b_{n1}&b_{n2}&\ldots&b_{nk}\\
\end{pmatrix}$$

Let's write the rows of $A$ in vector form as before with $r_{i}\in \mathbb{R}^{1\times\color{blue}{n}}$ for $i\in [1,\ldots,\color{red}{m}]$, but we will also write the columns of $B$ in vector form with $c_{j}\in \mathbb{R}^{\color{blue}{n}\times 1}$ for $j\in [1,\ldots,\color{red}{k}]$. The product expands on the idea of matrix-vector multiplication and exhausts every column-vector in $B$ dotted with every row-vector in $A$:

$$ AB=\begin{pmatrix}
r_{1}\\
r_{2}\\
r_{3}\\
\vdots\\
r_{m}\\
\end{pmatrix}
\begin{pmatrix}
c_{1}&c_{2}&\ldots&c_{k}
\end{pmatrix}=\begin{pmatrix}
r_{1}\cdot c_{1}&r_{1}\cdot c_{2}&\ldots & r_{1}\cdot c_{k}\\
r_{2}\cdot c_{1}&r_{2}\cdot c_{2}&\ldots & r_{2}\cdot c_{k}\\
r_{3}\cdot c_{1}&r_{3}\cdot c_{2}&\ldots & r_{3}\cdot c_{k}\\
\vdots&&\ddots&\vdots\\
r_{m}\cdot c_{1}&r_{m}\cdot c_{2}&\ldots&r_{m}\cdot c_{k}\\
\end{pmatrix}=\begin{pmatrix}
\sum\limits_{\ell=1}^{n}a_{1\ell}b_{\ell1}&\sum\limits_{\ell=1}^{n}a_{1\ell}b_{\ell2}&\ldots & \sum\limits_{\ell=1}^{n}a_{1\ell}b_{\ell k}\\
\sum\limits_{\ell=1}^{n}a_{2\ell}b_{\ell1}&\sum\limits_{\ell=1}^{n}a_{2\ell}b_{\ell2}&\ldots & \sum\limits_{\ell=1}^{n}a_{2\ell}b_{\ell k}\\
\sum\limits_{\ell=1}^{n}a_{3\ell}b_{\ell1}&\sum\limits_{\ell=1}^{n}a_{3\ell}b_{\ell2}&\ldots & \sum\limits_{\ell=1}^{n}a_{3\ell}b_{\ell k}\\
\vdots&&\ddots&\vdots\\
\sum\limits_{\ell=1}^{n}a_{m\ell}b_{\ell1}&\sum\limits_{\ell=1}^{n}a_{m\ell}b_{\ell2}&\ldots&\sum\limits_{\ell=1}^{n}a_{m\ell}b_{\ell k}\\
\end{pmatrix}
$$

This is a LOT. I promise we will do an example soon. We have to go through a few things first. To start, if we want to pinpoint a specific entry in $AB$ we can do so for any $i\in[1,\ldots,\color{red}{m}]$ and $j\in [1,\ldots,\color{red}{k}]$ with:

$$(AB)_{ij}=\sum\limits_{\ell=1}^{n}a_{i\ell}b_{\ell j}$$

The other consequence is also pretty cool. $A$ is just a construction of row vectors in this form. Instead of worrying so much about breaking it up, why don't we just pass it through as $A$? We still break up $B$ of course, but look:

$$ AB = A\begin{pmatrix}
c_{1}&c_{2}&\ldots&c_{k}
\end{pmatrix}=\begin{pmatrix}
Ac_{1}&Ac_{2}&\ldots&Ac_{k}
\end{pmatrix}$$

From here each $c_{j}$ is just a column vector of $B$, but importantly it's a vector. All the same things we discussed when multiplying a matrix and vector apply fully here. Since $Ac_{j}$ as a product results in a vector of size $\color{red}{m}\times 1$, we can think of it as such: $Ac_{j}\in \mathbb{R}^{\color{red}{m}\times1}$ for each $j\in [1,\ldots \color{red}{k}]$. Moreover, since there are $\color{red}{k}$ of these vectors making up the final matrix product of $AB$ we know that $AB\in \mathbb{R}^{\color{red}{m}\times \color{red}{k}}$. WOAH! Hold on a sec. It's easy to just flip through this without digging deeper, but we need to press on a particular point here.

$$ A\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$$
$$ B\in \mathbb{R}^{\color{blue}{n}\times\color{red}{k}}$$
$$ AB\in \mathbb{R}^{\color{red}{m}\times \color{red}{k}}$$

Even in the case of the vector result $Ac_{j}\in \mathbb{R}^{\color{red}{m}\times1}$, we view it as the product of $A\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$ and $c_{j}\in \mathbb{R}^{\color{blue}{n}\times1}$. Valid products between matrices will be based on the two inner dimensions matching, and the result will be in the space of the two outer dimensions.
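
The inner/outer rule is easy to watch in action. Below is a minimal sketch, assuming made-up sizes `m_demo`, `n_demo`, `k_demo` and matrices `A_demo`, `B_demo` (all hypothetical names). It also spot-checks one entry against the sum formula for $(AB)_{ij}$:

```python
import numpy as np

# Hypothetical sizes: A_demo is 4x5 and B_demo is 5x3.
m_demo, n_demo, k_demo = 4, 5, 3
A_demo = np.arange(m_demo * n_demo).reshape(m_demo, n_demo)
B_demo = np.arange(n_demo * k_demo).reshape(n_demo, k_demo)

# Inner dimensions (5 and 5) match, so the product exists with the outer shape.
product = A_demo.dot(B_demo)
print('AB shape:', product.shape)

# One entry from the sum formula (0-based indices: row 1, column 2).
entry = sum(A_demo[1, l] * B_demo[l, 2] for l in range(n_demo))
print('formula matches:', entry == product[1, 2])

# Swapping the order gives (5x3)(4x5): inner dimensions 3 and 4 disagree.
try:
    B_demo.dot(A_demo)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print('BA rejected:', mismatch_raised)
```

The `try`/`except` is only there so the cell keeps running. NumPy refuses the second product for the same reason the math does.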

$\textbf{The Second Idea}$

Let's split $A$ into columns:

$$ A\cdot B = AB=\begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}
\begin{pmatrix}
b_{11}&b_{12}&\ldots&b_{1k}\\
b_{21}&b_{22}&\ldots&b_{2k}\\
b_{31}&b_{32}&\ldots&b_{3k}\\
\vdots&&\ddots&\vdots\\
b_{n1}&b_{n2}&\ldots&b_{nk}\\
\end{pmatrix}=\begin{pmatrix}
c_{1}&c_{2}&c_{3}&\ldots&c_{n}
\end{pmatrix}\begin{pmatrix}
b_{11}&b_{12}&\ldots&b_{1k}\\
b_{21}&b_{22}&\ldots&b_{2k}\\
b_{31}&b_{32}&\ldots&b_{3k}\\
\vdots&&\ddots&\vdots\\
b_{n1}&b_{n2}&\ldots&b_{nk}\\
\end{pmatrix}$$

Now, for each column vector in $B$, we will use it to make a $\textbf{linear combination}$ out of its components and the column vectors in $A$, like so:

$$\mathcal{L}_{1}=(b_{11})c_{1}+(b_{21})c_{2}+(b_{31})c_{3}+\ldots+(b_{n1})c_{n}=\sum\limits_{\ell=1}^{n}(b_{\ell 1})c_{\ell}$$
$$\mathcal{L}_{2}=(b_{12})c_{1}+(b_{22})c_{2}+(b_{32})c_{3}+\ldots+(b_{n2})c_{n}=\sum\limits_{\ell=1}^{n}(b_{\ell 2})c_{\ell}$$
$$\vdots\hphantom{abcdefghijklmn}\vdots\hphantom{abcdefghijklmn}\vdots$$
$$\mathcal{L}_{k}=(b_{1k})c_{1}+(b_{2k})c_{2}+(b_{3k})c_{3}+\ldots+(b_{nk})c_{n}=\sum\limits_{\ell=1}^{n}(b_{\ell k})c_{\ell}$$

Just as if we did this with a matrix product with an individual vector, these linear combinations become column vectors in a new matrix:

$$ AB=\begin{pmatrix}
\mathcal{L}_{1}&\mathcal{L}_{2}&\ldots&\mathcal{L}_{k}
\end{pmatrix} $$
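
The column-by-column construction can be sketched in a few lines. The matrices `A_demo` and `B_demo` below are made up for the illustration (hypothetical names), and `L_j` plays the role of $\mathcal{L}_{j}$:

```python
import numpy as np

# Hypothetical 3x2 and 2x4 matrices.
A_demo = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])
B_demo = np.array([[1, 0, 2, 1],
                   [0, 1, 1, 3]])

# For each column j of B_demo, combine the columns of A_demo using
# the entries b_1j, b_2j, ... as the coefficients.
columns = []
for j in range(B_demo.shape[1]):
    L_j = sum(B_demo[l, j] * A_demo[:, l] for l in range(A_demo.shape[1]))
    columns.append(L_j)

# Line the linear combinations up side by side as the columns of AB.
AB_by_columns = np.column_stack(columns)

print('AB from linear combinations:\n', AB_by_columns)
print('matches A.dot(B):', np.array_equal(AB_by_columns, A_demo.dot(B_demo)))
```

Notice the result is $3\times 4$: the outer dimensions again.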

Okay! Enough! Let's do a concrete example, and we're going to use a special kind of matrix, a $\textbf{square}$ matrix. This is what happens when $\color{red}{m}=\color{blue}{n}$. We're also going to demonstrate the multiplication from both ideas. You will eventually gravitate to the one you like the best. Pick $F,D\in \mathbb{R}^{\color{red}{m}\times \color{red}{m}}$, remembering that we originally set $\color{red}{m}=4$:

In [None]:
F = np.array([
    [1,2,3,4],
    [4,0,2,1],
    [0,2,2,0],
    [2,0,2,0]
])
D = np.array([
    [0,1,1,0],
    [1,0,0,1],
    [1,0,1,0],
    [0,1,0,1]
])

print('F:\n',F,'\n')
print('D:\n',D,'\n')

$\textbf{The First Idea}$

Okay, let's do one step-by-step just once to see the internal mechanisms.


$$FD = \begin{pmatrix}
1&2&3&4\\
4&0&2&1\\
0&2&2&0\\
2&0&2&0
\end{pmatrix}\begin{pmatrix}
0&1&1&0\\
1&0&0&1\\
1&0&1&0\\
0&1&0&1
\end{pmatrix}=\begin{pmatrix}
r_{1}\\
r_{2}\\
r_{3}\\
r_{4}
\end{pmatrix}\begin{pmatrix}
c_{1}&c_{2}&c_{3}&c_{4}
\end{pmatrix}=\begin{pmatrix}
r_{1}c_{1}&r_{1}c_{2}&r_{1}c_{3}&r_{1}c_{4}\\
r_{2}c_{1}&r_{2}c_{2}&r_{2}c_{3}&r_{2}c_{4}\\
r_{3}c_{1}&r_{3}c_{2}&r_{3}c_{3}&r_{3}c_{4}\\
r_{4}c_{1}&r_{4}c_{2}&r_{4}c_{3}&r_{4}c_{4}
\end{pmatrix}$$

Let's run down the first column vector of $FD$:

$$(FD)_{11}=\begin{pmatrix}
1&2&3&4
\end{pmatrix}\begin{pmatrix}
0\\
1\\
1\\
0
\end{pmatrix} = 0+2+3+0=5$$

$$(FD)_{21}=\begin{pmatrix}
4&0&2&1
\end{pmatrix}\begin{pmatrix}
0\\
1\\
1\\
0
\end{pmatrix} = 0+0+2+0=2$$

$$(FD)_{31}=\begin{pmatrix}
0&2&2&0
\end{pmatrix}\begin{pmatrix}
0\\
1\\
1\\
0
\end{pmatrix} = 0+2+2+0=4$$

$$(FD)_{41}=\begin{pmatrix}
2&0&2&0
\end{pmatrix}\begin{pmatrix}
0\\
1\\
1\\
0
\end{pmatrix} = 0+0+2+0=2$$

That gives us a first column of $FD$ which is $\begin{pmatrix}
5\\
2\\
4\\
2
\end{pmatrix}$. Honestly though, I don't have the patience to carry the rest of this out. You should though. Make sure you fully understand the mechanics.

$\textbf{The Second Idea}$

Let's fully work out the mechanics using the other method of $\textbf{linear combination}$ of column vectors from $F$ where the coefficients are from each respective column vector from $D$ and see that they agree:

$$FD = \begin{pmatrix}
1&2&3&4\\
4&0&2&1\\
0&2&2&0\\
2&0&2&0
\end{pmatrix}\begin{pmatrix}
0&1&1&0\\
1&0&0&1\\
1&0&1&0\\
0&1&0&1
\end{pmatrix}=\begin{pmatrix}
c_{1}&c_{2}&c_{3}&c_{4}
\end{pmatrix}\begin{pmatrix}
d_{11}&d_{12}&d_{13}&d_{14}\\
d_{21}&d_{22}&d_{23}&d_{24}\\
d_{31}&d_{32}&d_{33}&d_{34}\\
d_{41}&d_{42}&d_{43}&d_{44}
\end{pmatrix}$$

$$=\begin{pmatrix}
\big((d_{11})c_{1}+(d_{21})c_{2}+(d_{31})c_{3}+(d_{41})c_{4}\big)&\big((d_{12})c_{1}+(d_{22})c_{2}+(d_{32})c_{3}+(d_{42})c_{4}\big)&\big((d_{13})c_{1}+(d_{23})c_{2}+(d_{33})c_{3}+(d_{43})c_{4}\big)&\big((d_{14})c_{1}+(d_{24})c_{2}+(d_{34})c_{3}+(d_{44})c_{4}\big)
\end{pmatrix}
=\begin{pmatrix}
\big((0)c_{1}+(1)c_{2}+(1)c_{3}+(0)c_{4}\big)&\big((1)c_{1}+(0)c_{2}+(0)c_{3}+(1)c_{4}\big)&\big((1)c_{1}+(0)c_{2}+(1)c_{3}+(0)c_{4}\big)&\big((0)c_{1}+(1)c_{2}+(0)c_{3}+(1)c_{4}\big)
\end{pmatrix}
$$

Let's worry only about the first column vector in the $FD$ result and write the others as $\mathcal{L}_{2}$, $\mathcal{L}_{3}$, and $\mathcal{L}_{4}$:

$$ FD =\begin{pmatrix}
\big((0)\begin{pmatrix}
1\\
4\\
0\\
2
\end{pmatrix}+(1)\begin{pmatrix}
2\\
0\\
2\\
0
\end{pmatrix}+(1)\begin{pmatrix}
3\\
2\\
2\\
2
\end{pmatrix}+(0)\begin{pmatrix}
4\\
1\\
0\\
0
\end{pmatrix}\big)&\mathcal{L}_{2}&\mathcal{L}_{3}&\mathcal{L}_{4}
\end{pmatrix} $$

$$ =\begin{pmatrix}\begin{pmatrix}
2+3\\
0+2\\
2+2\\
0+2
\end{pmatrix}&\mathcal{L}_{2}&\mathcal{L}_{3}&\mathcal{L}_{4}
\end{pmatrix} = \begin{pmatrix}
5&&&\\
2&\mathcal{L}_{2}&\mathcal{L}_{3}&\mathcal{L}_{4}\\
4&&&\\
2&&&
\end{pmatrix}$$

Notice that it matches the result that we got before. When multiplying matrices, this method is probably less desirable than the first. When thinking about a matrix multiplied into a vector, though, linear combination is sometimes the better approach. It is up to you in the end how you think about things. All of this will become more natural with practice. You should finish out the matrix multiplication and be sure you agree with:

In [None]:
print('FD:\n',F.dot(D),'\n')
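
The second idea can also be sketched in code. This is only an illustration under the assumption that $F$ and $D$ hold the values from the example; the names `cols` and `FD_by_columns` are made up here.

```python
import numpy as np

F = np.array([[1, 2, 3, 4],
              [4, 0, 2, 1],
              [0, 2, 2, 0],
              [2, 0, 2, 0]])
D = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])

# Column j of FD is a linear combination of the columns of F,
# with coefficients taken from column j of D.
cols = []
for j in range(D.shape[1]):
    col = np.zeros(F.shape[0], dtype=F.dtype)
    for k in range(F.shape[1]):
        col = col + D[k, j] * F[:, k]
    cols.append(col)

FD_by_columns = np.column_stack(cols)
print(FD_by_columns)
print('matches F.dot(D):', np.array_equal(FD_by_columns, F.dot(D)))
```

Both ideas build the same matrix; they only differ in whether you fill it in one entry at a time or one column at a time.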

One of the cool things you'll notice about two $\color{red}{m}\times \color{blue}{m}$ matrices dotted together is that their product is the same size. Of course:

$$ F\in \mathbb{R}^{\color{red}{m}\times\color{blue}{m}}$$
$$ D\in \mathbb{R}^{\color{blue}{m}\times\color{red}{m}}$$
$$ FD\in \mathbb{R}^{\color{red}{m}\times\color{red}{m}}$$
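
A quick way to see the size rule in action is to look at shapes only. The sketch below uses matrices of ones as hypothetical stand-ins (the values do not matter here), and `p` is an extra dimension introduced just for illustration.

```python
import numpy as np

m, n, p = 4, 5, 3  # p is a hypothetical third dimension

# (m x n) times (n x p) gives (m x p): the inner sizes must match
X = np.ones((m, n))
Y = np.ones((n, p))
rect_shape = X.dot(Y).shape
print(rect_shape)

# Two (m x m) matrices give another (m x m) matrix
S = np.ones((m, m))
T = np.ones((m, m))
square_shape = S.dot(T).shape
print(square_shape)
```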

One question should pop into your head now, and if not I'm going to put it there. If I can define matrix multiplication like this, why didn't I just include it with the other properties above? Well, remember the $F+D=D+F$ rule for same size matrices? Do you think $FD=DF$, even for square matrices? I urge you to check the multiplication yourself, but if you need a break just look at this:

In [None]:
print('DF:\n',D.dot(F),'\n')
print('FD = DF:', np.array_equal(F.dot(D), D.dot(F)))

Okay. This is a big deal. $FD\neq DF$! We need to be careful now because the ordering of our dot product on matrices absolutely matters. Sometimes it won't, but more often than not ordering will change the outcome.
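
If the $4\times 4$ case feels like too much arithmetic, a smaller example makes the same point. The matrices $P$ and $Q$ below are made up for illustration and do not appear elsewhere in these notes:

$$P=\begin{pmatrix}
1&1\\
0&1
\end{pmatrix},\qquad Q=\begin{pmatrix}
1&0\\
1&1
\end{pmatrix}$$

$$PQ=\begin{pmatrix}
(1)(1)+(1)(1)&(1)(0)+(1)(1)\\
(0)(1)+(1)(1)&(0)(0)+(1)(1)
\end{pmatrix}=\begin{pmatrix}
2&1\\
1&1
\end{pmatrix}$$

$$QP=\begin{pmatrix}
(1)(1)+(0)(0)&(1)(1)+(0)(1)\\
(1)(1)+(1)(0)&(1)(1)+(1)(1)
\end{pmatrix}=\begin{pmatrix}
1&1\\
1&2
\end{pmatrix}$$

Same two matrices, different order, different answer.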

Let's talk real quick about an instance where it doesn't though. First a short build-up. Drop the $\color{red}{m}=\color{blue}{n}$ assumption and treat $\color{red}{m}$ and $\color{blue}{n}$ as separate numbers again.

Remember when we discussed all of those special vectors comprised of a single $1$ and the rest $0$'s? The $\textbf{standard unit vectors}$. What would happen if I lined them up as the columns of a matrix? Notice first that it would have to be square. Moreover, the $1$'s only exist along the diagonal. We call this the identity matrix $I_{\color{blue}{n}}\in \mathbb{R}^{\color{blue}{n}\times \color{blue}{n}}$:

$$ I_{\color{blue}{n}}=\begin{pmatrix}
1&0&0&\ldots&0\\
0&1&0&\ldots&0\\
0&0&1&\ldots&0\\
\vdots&&&\ddots&\vdots\\
0&0&0&\ldots&1
\end{pmatrix}= \begin{pmatrix}
e_{1}&e_{2}&e_{3}&\ldots&e_{n}
\end{pmatrix}$$

Remember $A\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$ when we broke it up into column vectors? What happens when we take $AI_{\color{blue}{n}}$? We should know immediately that $AI_{\color{blue}{n}}\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$ by our rule. Feel free to use whichever $\textbf{idea}$ you're comfortable with to multiply.

$$AI_{\color{blue}{n}} = \begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}\begin{pmatrix}
1_{1}&0&0&\ldots&0\\
0&1_{2}&0&\ldots&0\\
0&0&1_{3}&\ldots&0\\
\vdots&&&\ddots&\vdots\\
0&0&0&\ldots&1_{n}
\end{pmatrix}$$

As the first column vector in $I_{\color{blue}{n}}$, $e_{1}$, multiplies over to create the first column of $AI_{\color{blue}{n}}$, you'll notice that it leaves only the first column vector of $A$ intact. This process continues from $e_{2}$ to $e_{n}$, leaving all of $A$'s components unchanged. Consequently:

$$ AI_{\color{blue}{n}} = A $$
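
Here is a small sketch of that column-picking behavior. `A_demo` is a hypothetical numeric stand-in for $A$ (just the numbers $1$ through $20$ arranged in a $4\times 5$ grid), so it will not clobber anything defined elsewhere in the notebook.

```python
import numpy as np

m, n = 4, 5
A_demo = np.arange(1, m * n + 1).reshape(m, n)  # hypothetical stand-in for A
I_n = np.identity(n, dtype=int)

# Multiplying by the j-th standard unit vector picks out column j of A_demo
picks_column = all(
    np.array_equal(A_demo.dot(I_n[:, j]), A_demo[:, j]) for j in range(n)
)
print('A e_j is column j for every j:', picks_column)

# Doing this for every column at once reproduces the whole matrix
print('A I_n = A:', np.array_equal(A_demo.dot(I_n), A_demo))
```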

From the other direction we're not so lucky when $\color{red}{m}\neq \color{blue}{n}$. $I_{n}A$ is undefined because the inner dimensions do not match:

$$I_{\color{blue}{n}}\in \mathbb{R}^{\color{blue}{n}\times\color{blue}{n}}$$
$$A\in \mathbb{R}^{\color{red}{m}\times\color{blue}{n}}$$

So if $\color{red}{m}\neq \color{blue}{n}$, meaning $A$ is not square, the identity matrix we need on the left has a different size from the one we need on the right. You should check via whatever $\textbf{idea}$ you like that:

$$I_{\color{red}{m}}A= \begin{pmatrix}
1_{1}&0&0&\ldots&0\\
0&1_{2}&0&\ldots&0\\
0&0&1_{3}&\ldots&0\\
\vdots&&&\ddots&\vdots\\
0&0&0&\ldots&1_{m}
\end{pmatrix}\begin{pmatrix}
a_{11}&a_{12}&a_{13}&\ldots&a_{1n}\\
a_{21}&a_{22}&a_{23}&\ldots&a_{2n}\\
a_{31}&a_{32}&a_{33}&\ldots&a_{3n}\\
\vdots&&&\ddots&\vdots\\
a_{m1}&a_{m2}&a_{m3}&\ldots&a_{mn}\\
\end{pmatrix}=A=AI_{\color{blue}{n}}$$

It turns out though that square matrices are nice! This is just one of the many reasons we like square matrices, but recall $D\in \mathbb{R}^{\color{red}{m}\times \color{red}{m}}$.

Recalling that our original $\color{red}{m}=4$ it should be that $I_{\color{red}{m}}D=D=DI_{\color{red}{m}}$. We can check:

In [None]:
In = np.identity(n)
Im = np.identity(m)

# np.array_equal compares the arrays entry by entry
print('ImA = A:', np.array_equal(Im.dot(A), A))
print('AIn = A:', np.array_equal(A.dot(In), A))

print('ImD = D:', np.array_equal(Im.dot(D), D))
print('DIm = D:', np.array_equal(D.dot(Im), D))

Trying $I_{n}A$ though we should get an error:

In [None]:
In.dot(A)

Continuing our comparison with the rules for matrix addition, when the matrices are square $I_{\color{red}{m}}$ acts as an identity element for multiplication just as the zero matrix, $O$, does for addition: $D+O=D=O+D$ and $I_{\color{red}{m}}D=D=DI_{\color{red}{m}}$. It is for that reason we call $I_{\color{red}{m}}$ the $\textbf{identity matrix}$.

ALRIGHT! ENOUGH! We've done a lot. I'm sure you're exhausted. Let's limp through one last important point.

Drop the square matrix thing for a minute, and go back to general matrices with vectors. More about matrix on matrix multiplication will come later. Let $A,B\in \mathbb{R}^{\color{red}{m}\times \color{blue}{n}}$, let $u,w\in \mathbb{R}^{\color{blue}{n}\times 1}$, and let $s\in \mathbb{R}$ be a scalar:

$\hphantom{abcdefghijklmnop}\bullet A(u+w)=Au+Aw$

$\hphantom{abcdefghijklmnop}\bullet s(Au)=(sA)u=A(su)$

$\hphantom{abcdefghijklmnop}\bullet (A+B)u=Au+Bu$

You should convince yourself, with some sort of argument or proof, that these all check out. I will show you the first.

$$\textbf{MATH WARNING!!!}$$

Notice that by matrix addition $(u+w)\in \mathbb{R}^{\color{blue}{n}\times 1}$ can be thought of as a single vector. Writing $A$ in column form:

$$A(u+w)=\begin{pmatrix}
c_{1}&c_{2}&c_{3}&\ldots&c_{n}
\end{pmatrix}(u+w)$$

But since $(u+w)$ is itself a vector we can write its components as $(u+w)_{i}$ for $i\in \{1,\ldots,\color{blue}{n}\}$. Then we take the $\textbf{linear combination}$ of column vectors of $A$ with the components of $(u+w)$ as scalars:

$$A(u+w)=(u+w)_{1}c_{1}+(u+w)_{2}c_{2}+(u+w)_{3}c_{3}+\ldots+(u+w)_{n}c_{n} $$

From our scalar multiplication properties though we can take this form of $(u+w)_{i}$, separate it back into $u_{i}+w_{i}$ because they're just numbers, and then distribute via scalar multiplication of vectors:

$$A(u+w)=(u_{1}+w_{1})c_{1}+(u_{2}+w_{2})c_{2}+(u_{3}+w_{3})c_{3}+\ldots+(u_{n}+w_{n})c_{n} $$
$$=(u_{1}c_{1}+w_{1}c_{1})+(u_{2}c_{2}+w_{2}c_{2})+(u_{3}c_{3}+w_{3}c_{3})+\ldots+(u_{n}c_{n}+w_{n}c_{n})$$

Then regrouping by $u_{i}$ and $w_{i}$:

$$A(u+w)=(u_{1}c_{1}+u_{2}c_{2}+u_{3}c_{3}+\ldots+u_{n}c_{n})+(w_{1}c_{1}+w_{2}c_{2}+w_{3}c_{3}+\ldots+w_{n}c_{n})$$

Working backwards though, each of these two groups is just a linear combination of the column vectors of $A$, one with the components of $u$ as coefficients and the other with the components of $w$, exactly as when $(u+w)$ was being treated as a single vector. Consequently:

$$A(u+w)=\begin{pmatrix}
c_{1}&c_{2}&c_{3}&\ldots&c_{n}
\end{pmatrix}u+\begin{pmatrix}
c_{1}&c_{2}&c_{3}&\ldots&c_{n}
\end{pmatrix}w=Au+Aw$$

$$\textbf{and...done!}$$

Nice. Any proofs you do on these points should look similar. The last thing I'll leave you with is some confirmation:

In [None]:
w = np.zeros((n,1))
for j in range(len(w)):
    w[j][0] += np.random.randint(0,4)

print('A:\n',A,'\n')
print('B:\n',B,'\n')
print('u:\n',u,'\n')
print('w:\n',w,'\n')
print('s:\n',s,'\n')

In [None]:
# np.allclose compares entry by entry and tolerates floating point round-off
print('A(u+w) = Au+Aw:', np.allclose(A.dot(u+w), A.dot(u)+A.dot(w)))
print('(A+B)u = Au+Bu:', np.allclose((A+B).dot(u), A.dot(u)+B.dot(u)))
print('s(Au) = (sA)u:', np.allclose(s*(A.dot(u)), (s*A).dot(u)))
print('(sA)u = A(su):', np.allclose((s*A).dot(u), A.dot(s*u)))
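
One example is not a proof, and neither are a hundred, but trying many random inputs is a decent sanity check while you work on the real arguments. The sketch below is just one way to do it. All of the `t_` names are hypothetical trial variables, chosen so they do not overwrite the notebook's own `A`, `B`, `u`, `w`, and `s`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the run is repeatable
m, n = 4, 5

results = {
    'A(u+w) = Au+Aw': True,
    '(A+B)u = Au+Bu': True,
    's(Au) = (sA)u = A(su)': True,
}

for _ in range(100):
    # Fresh random matrices, vectors, and scalar for each trial
    t_A = rng.integers(0, 4, size=(m, n)).astype(float)
    t_B = rng.integers(0, 4, size=(m, n)).astype(float)
    t_u = rng.integers(0, 4, size=(n, 1)).astype(float)
    t_w = rng.integers(0, 4, size=(n, 1)).astype(float)
    t_s = rng.uniform(-2.0, 2.0)

    ok_1 = np.allclose(t_A.dot(t_u + t_w), t_A.dot(t_u) + t_A.dot(t_w))
    ok_2 = np.allclose((t_A + t_B).dot(t_u), t_A.dot(t_u) + t_B.dot(t_u))
    ok_3 = (np.allclose(t_s * t_A.dot(t_u), (t_s * t_A).dot(t_u))
            and np.allclose((t_s * t_A).dot(t_u), t_A.dot(t_s * t_u)))

    results['A(u+w) = Au+Aw'] = results['A(u+w) = Au+Aw'] and bool(ok_1)
    results['(A+B)u = Au+Bu'] = results['(A+B)u = Au+Bu'] and bool(ok_2)
    results['s(Au) = (sA)u = A(su)'] = results['s(Au) = (sA)u = A(su)'] and bool(ok_3)

for rule, held in results.items():
    print(rule, 'held in every trial:', held)
```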

Go back and work on some of the proofs if you have time. It will help you immensely. Generalizing and working through the problems is the best way to understand the internal mechanics. You might even have some new inspirations/realizations of your own. It is also fair to work on the mechanics via concrete examples until you are comfortable. Figure out what works best for you, and until next time, have fun!