# Basic Linear Algebra

Linear algebra is an important field of mathematics for artificial intelligence. Many AI algorithms that we will cover in this course are expressed or framed in terms of vectors and matrices, the primary objects of study of linear algebra.

## Terminology
* A **tensor** is an $n$-dimensional array. The entries in a tensor, also called its components or elements, are drawn from some set. For the most part in this course, we will manipulate tensors defined over the set of real numbers, $\mathbb{R}$, or the set of integers, $\mathbb{Z}$
* The number of dimensions of a tensor is called its **rank**.
    * A rank 0 tensor is called a scalar, e.g. $2$, $3.8989$. By convention, scalars are denoted by common letter variables, e.g. $x = 3.8989$
    * A rank 1 tensor is called a (column) vector, e.g. $\begin{bmatrix}2.1 \\ 3.4\end{bmatrix}$. By convention, we denote vectors by common letters, or by common letters with arrows overhead, e.g. $x = \begin{bmatrix}2.1 \\ 3.4\end{bmatrix}$ or $\vec{x} = \begin{bmatrix}2.1 \\ 3.4\end{bmatrix}$. Vectors are often used to represent directions through space or points in space.
    * A rank 2 tensor is called a matrix, e.g. $\begin{bmatrix}2 & 3 \\ 4 & 5\end{bmatrix}$. By convention, we use capital letters to denote matrices, e.g. $M = \begin{bmatrix}2 & 3 \\ 4 & 5\end{bmatrix}$
* While rank tells us the number of dimensions, each dimension has a size. We need to describe the set of tensors drawn from a particular set with each dimension having a particular size. By convention, a vector of size $n$ over the set of real numbers is said to be an element of the set $\mathbb{R}^n$, and a matrix with $n$ rows and $m$ columns over the set of real numbers is said to be an element of the set $\mathbb{R}^{n \times m}$. 
* Note that vectors of size $n$ can be described as matrices of size $n \times 1$ and, at least computationally, are sometimes treated as such. Note, however, that vectors are usually used to express concepts different from those expressed by matrices. Matrices are usually used (including in an AI context) as a way of expressing a special type of function called a linear mapping, while vectors are used to describe directions or "points" in space.
*  The $i$<sup>th</sup> component of a vector $x$ is denoted as $x_i$
* The component of a matrix $M$ at row $i$ and column $j$ is denoted as either $M_{ij}$ or $m_{ij}$
* The $j$<sup>th</sup> column of a matrix $M$ is denoted as $M_{:,j}$ or $m_{j}$
* The $j$<sup>th</sup> row of a matrix $M$ is denoted as $M_{j,:}$ or $m^{T}_{j}$
* Suppose that we have numbers $x_1, x_2, x_3, \ldots, x_n$. If $y = a_1x_1 + a_2x_2 + a_3x_3 + \ldots + a_nx_n$, where $a_1, a_2, a_3, \ldots, a_n$ are some other numbers, then $y$ is a linear combination of $x_1, x_2, x_3, \ldots, x_n$
* A matrix is square if it has the same number of rows and columns
* The vector $\mathbf{0}$ is a special vector containing only 0s and $\mathbf{1}$ is a special vector containing only 1s
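
The definition of a linear combination above can be checked numerically. The following is a minimal sketch; the particular values of `x` and the coefficients `a` are illustrative choices, not part of the notes.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])    # the numbers x_1, x_2, x_3 (illustrative values)
a = np.array([0.5, -1.0, 2.0])   # the coefficients a_1, a_2, a_3 (illustrative values)

# y = a_1*x_1 + a_2*x_2 + a_3*x_3, written term by term
y = sum(a_i * x_i for a_i, x_i in zip(a, x))
print(y)  # 4.5
```

Here $y = 0.5 \cdot 1 - 1 \cdot 2 + 2 \cdot 3 = 4.5$ is a linear combination of $1, 2, 3$.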

## Exercises
1. Give an example of a tensor from the following sets:
    1. $\mathbb{R}^2$
    2. $\mathbb{Z}^{3 \times 3}$
    3. $\mathbb{R}^{5 \times 1}$
    4. $\mathbb{R}^{1 \times 5}$
    5. $\mathbb{R}$
2. To what set do the following tensors belong?
    1. $\begin{bmatrix}2 & 5 & 5.34 \\ 3 & 4.5 & -1.2\end{bmatrix}$
    2. $\begin{bmatrix}2 \\ 5\end{bmatrix}$
    3. $\begin{bmatrix}2 & 5\end{bmatrix}$
3. Consider the matrix $$M = \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
    1. What is $M_{11}$ ?
    2. What is $m_{1}$ ?
    3. What is $m^{T}_{1}$ ?
4. Consider the vector $$x = \begin{bmatrix}1 \\ 2 \\ 3 \end{bmatrix}$$
    1. What is $x_{1}$ ?
    2. What is $x_{3}$ ?

## Code
The `numpy` and `torch` libraries give us facilities to store and manipulate tensors. We shall look at how to create them with `numpy` here.

In [42]:
import numpy as np # import numpy library

x_row = np.array([1, 2, 3]) # a 1-D array of size 3; NumPy does not store it as a 1 x 3 matrix
x_col = np.array([[1], [2], [3]]) # a column vector, stored as a 3 x 1 matrix
M = np.array([[1, 2, 3], [4, 5, 6]]) # a 2 x 3 matrix

n, m = M.shape
print(n)
print(m)

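M_11 = M[1][1] # Access component; NumPy indices start at 0, so this is M_{22} in the math notation
M_11_another = M[1,1] # Same component, using a single index tuple
print(M_11)
print(M_11_another)

# column vectors are stored as n x 1 matrices, so they take two indices
x_1 = x_col[1][0] # second component (x_2 in the math notation)
x_2 = x_col[2][0] # third component (x_3 in the math notation)

M_1 = M[:,1] # column slice: the second column
M_T_1 = M[1,:] # row slice: the second row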

ones_3_3 = np.ones((3, 3)) # 3 x 3 matrix of ones
zeros_3_3 = np.zeros((3, 3)) # 3 x 3 matrix of zeros
ones_3 = np.ones((3, 1)) # column vector of ones
zeros_3 = np.zeros((3, 1)) # column vector of zeros
M_with_ones = np.ones_like(M) # create a matrix with all ones, with the same size as M
M_with_zeros = np.zeros_like(M) # create a matrix with all zeroes, with the same size as M

sep = "========================="

2
3
5
5


We can also generate random vectors and matrices.

In [21]:
M1 = np.random.random((2, 3)) # a random 2 x 3 matrix


print(M1)

print(sep)

v1 = np.random.random((1, 3)) # a random row vector, stored as a 1 x 3 matrix

print(v1)

print(sep)
v2 = np.random.random(3) # a random 1-D array of size 3

print(v2)
print(sep)

# can also reshape matrices and vectors

v3 = np.random.random(10)
v3_col = v3.reshape((10, 1))
v3_col_another_way = v3.reshape((10, -1)) # -1 autofills
M2 = v3.reshape((5, 2))

print(v3)
print(v3_col)
print(v3_col_another_way)
print(M2)

print(sep)


[[0.63285569 0.59211819 0.01850163]
 [0.45281911 0.26457497 0.00597591]]
[[0.79794057 0.93664194 0.58856676]]
[0.38144335 0.21832541 0.10261183]
[0.0485009  0.71376719 0.70346129 0.89056215 0.25419393 0.49464767
 0.08818572 0.67597223 0.87221094 0.80835031]
[[0.0485009  0.71376719 0.70346129 0.89056215 0.25419393 0.49464767
  0.08818572 0.67597223 0.87221094 0.80835031]]
[[0.0485009  0.71376719 0.70346129 0.89056215 0.25419393 0.49464767
  0.08818572 0.67597223 0.87221094 0.80835031]]
[[0.0485009  0.71376719]
 [0.70346129 0.89056215]
 [0.25419393 0.49464767]
 [0.08818572 0.67597223]
 [0.87221094 0.80835031]]
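
Since NumPy distinguishes a 1-D array from a row vector ($1 \times n$) and a column vector ($n \times 1$), it can help to inspect `ndim` and `shape` directly. The sketch below assumes a seeded generator (`np.random.default_rng`) so that reruns give the same numbers; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded generator, so reruns give the same numbers

v = rng.random(6)            # 1-D array: shape (6,), neither a row nor a column
v_row = v.reshape((1, 6))    # row vector stored as a 1 x 6 matrix
v_col = v.reshape((6, 1))    # column vector stored as a 6 x 1 matrix
M = v.reshape((2, -1))       # -1 asks NumPy to infer the missing size: shape (2, 3)

# Print the rank (ndim) and the size of each dimension (shape)
for name, arr in [("v", v), ("v_row", v_row), ("v_col", v_col), ("M", M)]:
    print(name, arr.ndim, arr.shape)
```

In the terminology of these notes, `ndim` is the rank of the tensor and `shape` lists the size of each dimension.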


## Operations on Vectors and Matrices
There are several operations that can be performed on vectors and matrices, and many familiar operations have vector and matrix equivalents

### Addition and Hadamard Product

Addition and Hadamard Product are examples of component-wise binary operations. In such operations, both operands must be of the same size. The output is the same size as the inputs. They are defined as follows:

#### Addition
$
\begin{bmatrix} 
a_{11} & a_{12} & \ldots & a_{1m} \\ 
a_{21} & a_{22} & \ldots & a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
a_{n1} & a_{n2} & \ldots & a_{nm} 
\end{bmatrix}
$ $+$ $
\begin{bmatrix} 
b_{11} & b_{12} & \ldots & b_{1m} \\ 
b_{21} & b_{22} & \ldots & b_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
b_{n1} & b_{n2} & \ldots & b_{nm} 
\end{bmatrix}
$ $=$ $\begin{bmatrix} 
a_{11} + b_{11} & a_{12} + b_{12} & \ldots & a_{1m}  + b_{1m}\\ 
a_{21} + b_{21} & a_{22} + b_{22} & \ldots & a_{2m} + b_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
a_{n1} + b_{n1} & a_{n2} + b_{n2} & \ldots & a_{nm} + b_{nm} 
\end{bmatrix}$

Or more succinctly, $(A + B)_{ij} = a_{ij} + b_{ij}$

#### Hadamard Product
$
\begin{bmatrix} 
a_{11} & a_{12} & \ldots & a_{1m} \\ 
a_{21} & a_{22} & \ldots & a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
a_{n1} & a_{n2} & \ldots & a_{nm} 
\end{bmatrix}
$ $\odot$
$\begin{bmatrix} 
b_{11} & b_{12} & \ldots & b_{1m} \\ 
b_{21} & b_{22} & \ldots & b_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
b_{n1} & b_{n2} & \ldots & b_{nm} 
\end{bmatrix}$
 $=$ $\begin{bmatrix} 
a_{11} \cdot b_{11} & a_{12} \cdot b_{12} & \ldots & a_{1m}  \cdot b_{1m}\\ 
a_{21} \cdot b_{21} & a_{22} \cdot b_{22} & \ldots & a_{2m} \cdot b_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
a_{n1} \cdot b_{n1} & a_{n2} \cdot b_{n2} & \ldots & a_{nm} \cdot b_{nm} 
\end{bmatrix}$

Or more succinctly, $(A \odot B)_{ij} = a_{ij} \cdot b_{ij}$


The operations on vectors are defined accordingly. Note that these operations are commutative and associative.




In [22]:
# numpy code

A = np.random.random((3, 2))
B = np.random.random((3, 2))

print(A)
print('+')
print(B)
print('=')
print(A + B)

print(sep)

print(A)
print('*')
print(B)
print('=')
print(A * B)


[[0.84155044 0.10350133]
 [0.67842369 0.02882484]
 [0.09323786 0.69758931]]
+
[[0.22109236 0.51558658]
 [0.72054356 0.70079357]
 [0.95047241 0.36641164]]
=
[[1.0626428  0.61908791]
 [1.39896725 0.72961841]
 [1.04371027 1.06400095]]
[[0.84155044 0.10350133]
 [0.67842369 0.02882484]
 [0.09323786 0.69758931]]
*
[[0.22109236 0.51558658]
 [0.72054356 0.70079357]
 [0.95047241 0.36641164]]
=
[[0.18606037 0.0533639 ]
 [0.48883382 0.02020026]
 [0.08862002 0.25560484]]
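
Random inputs make the results hard to verify by eye, so here is a small sketch with fixed integer matrices (illustrative values) that also checks the commutativity claim for both operations.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[10, 20], [30, 40], [50, 60]])

S = A + B   # component-wise sum
H = A * B   # Hadamard (component-wise) product; in NumPy, * is NOT matrix multiplication

print(S)
print(H)

# Both operations are commutative
print(np.array_equal(A + B, B + A))  # True
print(np.array_equal(A * B, B * A))  # True
```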


## Scalar Multiplication and Addition

You can also multiply each component of a vector or matrix by a scalar, or add a scalar to each component.

### Scalar Multiplication

$\alpha\begin{bmatrix} 
a_{11} & a_{12} & \ldots & a_{1m} \\ 
a_{21} & a_{22} & \ldots & a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
a_{n1} & a_{n2} & \ldots & a_{nm} 
\end{bmatrix} = \begin{bmatrix} 
\alpha \cdot a_{11} & \alpha \cdot a_{12} & \ldots & \alpha \cdot a_{1m} \\ 
\alpha \cdot a_{21} & \alpha \cdot a_{22} & \ldots & \alpha \cdot a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
\alpha \cdot a_{n1} & \alpha \cdot a_{n2} & \ldots &  \alpha \cdot a_{nm} 
\end{bmatrix}$

Or more succinctly, $(\alpha A)_{ij} = \alpha \cdot A_{ij}$

### Scalar Addition
$b + \begin{bmatrix} 
a_{11} & a_{12} & \ldots & a_{1m} \\ 
a_{21} & a_{22} & \ldots & a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
a_{n1} & a_{n2} & \ldots & a_{nm} 
\end{bmatrix} = \begin{bmatrix} 
b + a_{11} & b + a_{12} & \ldots & b + a_{1m} \\ 
b + a_{21} & b + a_{22} & \ldots & b + a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
b + a_{n1} & b + a_{n2} & \ldots &  b + a_{nm} 
\end{bmatrix}$

Or more succinctly, $(b + A)_{ij} = b + A_{ij}$. By convention, we tend to write it as $A + b$.

The operations on vectors are defined accordingly. Note that these operations are commutative and associative.

In [23]:
M1 = np.random.random((5, 2))

print(M1)
print(sep)
print(2 * M1)
print(sep)
print(2 + M1)

[[0.37619272 0.34653864]
 [0.30175993 0.43102818]
 [0.55410546 0.60680518]
 [0.22476439 0.15351535]
 [0.17032058 0.11558036]]
[[0.75238544 0.69307728]
 [0.60351987 0.86205636]
 [1.10821093 1.21361035]
 [0.44952879 0.3070307 ]
 [0.34064116 0.23116071]]
[[2.37619272 2.34653864]
 [2.30175993 2.43102818]
 [2.55410546 2.60680518]
 [2.22476439 2.15351535]
 [2.17032058 2.11558036]]
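
The same scalar operations on a fixed matrix (illustrative values) make the component-wise behaviour easy to verify by hand.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

scaled = 2 * A     # every component is multiplied by 2
shifted = A + 10   # 10 is added to every component

print(scaled)   # [[2 4]
                #  [6 8]]
print(shifted)  # [[11 12]
                #  [13 14]]
```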


### Transpose

The transpose operation flips columns and rows

Suppose $M = \begin{bmatrix}2 & 3 & 4\\ 5 & 6 & 7\end{bmatrix}$, then its transpose, denoted as $M^T$ is
$M^{T} = \begin{bmatrix}2 & 5 \\ 3 & 6\\ 4 & 7\end{bmatrix}$

Sometimes for convenience, we write column vectors as the transpose of row vectors. For example, instead of writing $v = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$, we write $v = \begin{bmatrix} 2 & 3 & 4 \end{bmatrix}^{T}$

In [24]:
M1 = np.random.random((2, 3))
print(M1)
print(sep)
print(M1.transpose())

[[0.13983724 0.95807682 0.17524652]
 [0.88026113 0.24211687 0.52199909]]
[[0.13983724 0.88026113]
 [0.95807682 0.24211687]
 [0.17524652 0.52199909]]
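
As a quick sketch using the matrix $M$ from the worked example above: `.T` is NumPy's shorthand for `.transpose()`. One caveat worth knowing is that transposing a 1-D array does nothing, so a column vector has to be built from a 2-D row.

```python
import numpy as np

M = np.array([[2, 3, 4], [5, 6, 7]])
M_T = M.T                      # shorthand for M.transpose(); shape (3, 2)
print(M_T)

# Transposing twice gives back the original matrix
print(np.array_equal(M_T.T, M))  # True

# A 1-D array has no rows or columns to swap, so .T leaves it unchanged
v_1d = np.array([2, 3, 4])
print(v_1d.T.shape)              # (3,)

# Writing a column vector as the transpose of a 1 x 3 row
v_col = np.array([[2, 3, 4]]).T
print(v_col.shape)               # (3, 1)
```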


### Norm

The norm of a vector or matrix is a notion of its length or size. The most familiar norm is the Euclidean norm or $\ell_{2}$ norm. The Euclidean norm of a vector $v$ of size $n$ is denoted as either $||v||_{2}$ or as just $||v||$, and is defined as follows:
$$||v||_{2} = \sqrt{\sum_{i=1}^{n} v_{i}^{2}}$$
In general, the $\ell_{p}$ norm of a vector, for $p \geq 1$, is
$$||v||_{p} = \left({\sum_{i=1}^{n} |v_{i}|^{p}}\right)^{\frac{1}{p}}$$

The Euclidean distance between two vectors is the Euclidean norm of their component-wise difference, i.e. $$d(u, v) = ||u - v||$$

In [28]:
v = np.random.random((3, 1))
u = np.random.random((3, 1))
norm_v = np.linalg.norm(v, 2)
norm_u = np.linalg.norm(u, 2)
dist_uv = np.linalg.norm(u - v, 2)
print(v)
print(sep)
print(u)
print(sep)
print(norm_v)
print(norm_u)
print(dist_uv)

[[0.26524953]
 [0.55767317]
 [0.89475216]]
[[0.88371673]
 [0.44631044]
 [0.04857381]]
1.0871697679555463
0.9912152565510658
1.0540024360138136
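
To connect the formula to the library call, the sketch below writes the $\ell_p$ norm directly from its definition and compares it with `np.linalg.norm` on 1-D arrays. The helper name `lp_norm` is hypothetical, and the absolute value is included so that negative components are handled.

```python
import numpy as np

def lp_norm(v, p):
    """l_p norm of a 1-D array, written directly from the definition (assumes p >= 1)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = np.array([3.0, -4.0])
u = np.array([0.0, 0.0])

print(lp_norm(v, 2))             # 5.0, the familiar 3-4-5 triangle
print(lp_norm(v, 1))             # 7.0
print(np.linalg.norm(v, 2))      # same value as lp_norm(v, 2)
print(np.linalg.norm(u - v, 2))  # Euclidean distance between u and v: 5.0
```

Note that for 2-D arrays `np.linalg.norm` interprets the second argument as a matrix norm, which coincides with the vector norm for a single column when the order is 1 or 2.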


### Matrix multiplication
If we have matrices of compatible dimensions, we can multiply them. For two matrices to be compatible for matrix multiplication (abbrev. matmul), the number of columns of the first must be the same as the number of rows of the second. The output has the number of rows of the first and the number of columns of the second.

Suppose we have two matrices $A$ and $B$ of dimensions $m \times n$ and $n \times p$ respectively. Then their matrix product $C = AB$ is defined as:

$$C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}$$

For example, suppose we have two matrices $A$ and $B$ defined as follows

$$A = \begin{bmatrix}2 & 3 \\ 4 & 5\end{bmatrix}$$

and 

$$B = \begin{bmatrix}1 & 2 & 3\\ 4 & 5 & 6\end{bmatrix}$$

Hence, $C = AB$, is 
$$C = \begin{bmatrix}14 & 19 & 24 \\ 24 & 33 & 42\end{bmatrix}$$

Note that matmul is not commutative, but is associative and distributive over addition. Matrix multiplication is actually a form of function composition over linear mappings!

In [19]:
x = np.array([[2, 3], [4, 5]])
y = np.array([[1, 2, 3], [4, 5, 6]])
print(x @ y) # @ is the symbol for matrix multiplication in Python

[[14 19 24]
 [24 33 42]]
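
To make the summation formula concrete, here is a sketch that implements $C_{ij} = \sum_{k} A_{ik} B_{kj}$ with explicit loops and compares it against `@`. The function name `matmul_naive` is hypothetical, and this version is for illustration only; in practice `@` is far faster. The last lines show with two small illustrative matrices that matmul is not commutative.

```python
import numpy as np

def matmul_naive(A, B):
    """Matrix product computed directly from C_ij = sum_k A_ik * B_kj."""
    m, n = A.shape
    n2, p = B.shape
    if n != n2:
        raise ValueError("columns of A must match rows of B")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # shared dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[2, 3], [4, 5]])
B = np.array([[1, 2, 3], [4, 5, 6]])
C = matmul_naive(A, B)
print(C)
print(np.array_equal(C, A @ B))  # True

# Order matters: PQ and QP differ in general
P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])
print(P @ Q)  # [[2 1]
              #  [4 3]]
print(Q @ P)  # [[3 4]
              #  [1 2]]
```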


## Vector-Vector Products

Suppose that we have two vectors $u, v \in \mathbb{R}^{n}$. There are two main ways we can define a product of $u$ and $v$: the inner product and the outer product.

### Inner Product
The inner product is also called the dot product. Treating vectors $u$ and $v$ as matrices, their dot product is defined as follows:

$$u \bullet v = u^{T}v = u_{1}v_{1} + u_{2}v_{2} + \ldots + u_{n}v_{n}$$ 

The dot product is useful as it has a geometric interpretation:
$$\cos \theta = \frac{u \bullet v}{\vert\vert u \vert\vert \, \vert\vert v \vert\vert}$$

where $\theta$ is the angle between vectors $u$ and $v$

### Outer Product
The outer product of vectors $u$ and $v$ is defined as $u \otimes v = uv^{T}$. One way to remember the difference between the inner and outer product is that the adjective indicates the placement of the transpose operation. If it sits between the two vectors ($u^{T}v$), it is an inner product; if it is on the outside, on the right vector ($uv^{T}$), it is an outer product.

In [41]:
u = np.random.random((3, 1))
v = np.random.random((3, 1))
ones = np.ones_like(u)

print(u)
print(sep)
print(v)
print(sep)
print(np.transpose(u) @ v)
print(sep)
print(u @ np.transpose(v))
print(sep)
print(u @ np.transpose(ones))

[[0.29590262]
 [0.68956713]
 [0.30488254]]
[[0.42884117]
 [0.98008657]
 [0.35413682]]
[[0.91070084]]
[[0.12689522 0.29001018 0.10479001]
 [0.29571477 0.67583549 0.24420111]
 [0.13074618 0.29881129 0.10797013]]
[[0.29590262 0.29590262 0.29590262]
 [0.68956713 0.68956713 0.68956713]
 [0.30488254 0.30488254 0.30488254]]
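
A fixed example (illustrative values) makes both products easy to check by hand, and shows how the angle between two vectors can be recovered from the geometric interpretation of the dot product. This is a sketch using column vectors, as in the cell above.

```python
import numpy as np

u = np.array([[1.0], [2.0], [3.0]])
v = np.array([[4.0], [5.0], [6.0]])

inner = (u.T @ v).item()   # u^T v is a 1 x 1 matrix; .item() extracts the scalar
outer = u @ v.T            # u v^T is a 3 x 3 matrix

print(inner)  # 32.0, i.e. 1*4 + 2*5 + 3*6
print(outer)

# Angle between u and v from cos(theta) = (u . v) / (||u|| ||v||)
cos_theta = inner / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(cos_theta)   # in radians
print(np.degrees(theta))
```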


## Matrix-Vector Multiplication

We can also multiply matrices and vectors of appropriate dimensions. Recall that computationally, we can treat vectors as though they are matrices with a single column. Suppose that we have an $m \times n$ matrix $M$ and a vector of size $n$ called $x$, i.e. $M \in \mathbb{R}^{m \times n} \text{ and } x \in \mathbb{R}^{n}$. The operation $Mx$ will yield a vector in $\mathbb{R}^{m}$.

For example, suppose

$$M = \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6\end{bmatrix}$$

and 

$$x = \begin{bmatrix}1 \\ 2 \\ 3\end{bmatrix}$$

Then $Mx$ is
$$\begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6\end{bmatrix} \begin{bmatrix}1 \\ 2 \\ 3\end{bmatrix} = $$
$$\begin{bmatrix}1 + 4 + 9 \\ 4 + 10 + 18\end{bmatrix} = \begin{bmatrix}14 \\ 32\end{bmatrix}$$

Notice that the input has as many components as $M$ has columns, and the output has as many components as $M$ has rows. Recall that we mentioned that matrices encode functions; hence an $m \times n$ matrix encodes a function with domain $\mathbb{R}^{n}$ and codomain $\mathbb{R}^{m}$, i.e. $M: \mathbb{R}^{n} \to \mathbb{R}^{m}$.

In [34]:
M = np.random.random((2, 3))
x = np.random.random((3, 1))
print(M)
print(sep)
print(x)
print(sep)
print(M @ x)

[[0.8512076  0.86231019 0.69342643]
 [0.39889822 0.84434217 0.41829395]]
[[0.85895431]
 [0.16611442]
 [0.87856824]]
[[1.48361303]
 [0.85039254]]
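
The worked example above can be reproduced exactly with fixed inputs. The sketch also shows an equivalent view that ties back to the terminology section: $Mx$ is the linear combination of the columns of $M$ with the components of $x$ as coefficients.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])
x = np.array([[1], [2], [3]])   # column vector, shape (3, 1)

y = M @ x
print(y)         # [[14]
                 #  [32]]
print(y.shape)   # (2, 1): a vector in R^2

# Mx as a linear combination of the columns of M
y_cols = sum(x[j, 0] * M[:, j] for j in range(M.shape[1]))
print(y_cols)    # [14 32]
```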


### Affine Transformation
An affine transformation is a special transformation of the form $Ax + b$ where $A \in \mathbb{R}^{m \times n}, x \in \mathbb{R}^{n}, b \in \mathbb{R}^{m}$. Many AI algorithms use affine transformations.
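
A minimal sketch of an affine transformation in NumPy, assuming $b$ is a vector with $m$ components (one per row of $A$); the values are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])   # A in R^{2 x 3}
x = np.array([[1], [2], [3]])          # x in R^3, as a column vector
b = np.array([[10], [20]])             # b in R^2 (assumed to match the rows of A)

y = A @ x + b   # linear map followed by a shift
print(y)        # [[24]
                #  [52]]
```

Unlike the purely linear map $Ax$, the affine map does not send the zero vector to zero: with $x = \mathbf{0}$ the result is $b$.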

### Matrix Inverse
Suppose we have $Ax = y$. The matrix $A^{-1}$, called the inverse of $A$, is a special matrix for which $A^{-1}y = x$. Note that $A$ must be square, and $A^{-1}$ has the same dimensions as $A$. Since matrices are functions, you can think of it as an inverse function. Just as with other functions, not every matrix has an inverse. Computing inverses is beyond the scope of the course, but you should be aware that they exist.

For non-square matrices, there are pseudo-inverses such as the Moore-Penrose pseudo-inverse.

In [37]:
M = np.array([[2, 3], [5, 6]])
print(M)
print(sep)
print(np.linalg.inv(M))

[[2 3]
 [5 6]]
[[-2.          1.        ]
 [ 1.66666667 -0.66666667]]
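
The sketch below checks the defining property $A^{-1}y = x$ on the same matrix, shows a matrix that has no inverse, and computes a Moore-Penrose pseudo-inverse with `np.linalg.pinv`. The vectors and the singular matrix `S` are illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 3.0], [5.0, 6.0]])
x = np.array([[1.0], [2.0]])
y = A @ x                          # y = Ax = [[8], [17]]

A_inv = np.linalg.inv(A)
x_recovered = A_inv @ y            # A^{-1} y gives back x (up to rounding)
x_solved = np.linalg.solve(A, y)   # solves Ax = y without forming A^{-1} explicitly
print(x_recovered)
print(x_solved)

# A matrix whose second row is a multiple of the first has no inverse
S = np.array([[1.0, 2.0], [2.0, 4.0]])
try:
    np.linalg.inv(S)
    singular_detected = False
except np.linalg.LinAlgError:
    singular_detected = True
print(singular_detected)

# Moore-Penrose pseudo-inverse of a non-square (2 x 3) matrix
B = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
B_pinv = np.linalg.pinv(B)         # shape (3, 2)
print(B_pinv.shape)
```

When the goal is only to solve $Ax = y$, `np.linalg.solve` is usually preferred over computing the inverse explicitly.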


## Identity Matrices

Identity matrices are square matrices with 1s on the leading diagonal and 0s everywhere else. An identity matrix of size $n \times n$ is denoted by $I_{n}$. For example,

$$I_{2} = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}$$

Identity matrices are special as $AI_{n} = I_{n}A = A$ for all $n \times n$ matrices $A$.

In [38]:
I_3 = np.eye(3)
print(I_3)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
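
A short sketch verifying the identity property, and its link to the inverse, on an illustrative $2 \times 2$ matrix.

```python
import numpy as np

A = np.array([[2.0, 3.0], [5.0, 6.0]])
I_2 = np.eye(2)

# Multiplying by the identity on either side leaves A unchanged
print(np.array_equal(A @ I_2, A))  # True
print(np.array_equal(I_2 @ A, A))  # True

# A matrix times its inverse gives the identity (up to floating-point rounding)
print(np.allclose(A @ np.linalg.inv(A), I_2))  # True
```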
