# Basic Linear Algebra

Linear algebra is an important field of mathematics for artificial intelligence. Many AI algorithms that we will cover in this course are expressed in terms of vectors and matrices, the primary objects of study in linear algebra.

## Terminology
* A **tensor** is an $n$-dimensional array. The entries in a tensor, also called its components or elements, are drawn from some set. For the most part in this course, we will manipulate tensors defined over the set of real numbers, $\mathbb{R}$, or the set of integers, $\mathbb{Z}$
* The number of dimensions of a tensor is called its **rank**.
    * A rank 0 tensor is called a scalar, e.g. $2$, $3.8989$. By convention, scalars are denoted by common letter variables, e.g. $x = 3.8989$
    * A rank 1 tensor is called a (column) vector, e.g. $\begin{bmatrix}2.1 \\ 3.4\end{bmatrix}$. By convention, vectors are denoted by common letters, optionally with an arrow overhead, e.g. $x = \begin{bmatrix}2.1 \\ 3.4\end{bmatrix}$ or $\vec{x} = \begin{bmatrix}2.1 \\ 3.4\end{bmatrix}$. Vectors are often used to represent directions through space or points in space.
    * A rank 2 tensor is called a matrix, e.g. $\begin{bmatrix}2 & 3 \\ 4 & 5\end{bmatrix}$. By convention, we use capital letters to denote matrices, e.g. $M = \begin{bmatrix}2 & 3 \\ 4 & 5\end{bmatrix}$
* While rank tells us the number of dimensions, each dimension also has a size, so we need a way to describe the set of tensors whose components come from a particular set and whose dimensions have particular sizes. By convention, a vector of size $n$ over the set of real numbers is said to be an element of the set $\mathbb{R}^n$, and a matrix with $n$ rows and $m$ columns over the set of real numbers is said to be an element of the set $\mathbb{R}^{n \times m}$. Note that vectors of size $n$ can be described as matrices of size $n \times 1$.
*  The $i$<sup>th</sup> component of a vector $x$ is denoted as $x_i$
* The component of a matrix $M$ at row $i$ and column $j$ is denoted as either $M_{ij}$ or $m_{ij}$
* The $j$<sup>th</sup> column of a matrix $M$ is denoted as $M_{:,j}$ or $m_{j}$
* The $j$<sup>th</sup> row of a matrix $M$ is denoted as $M_{j,:}$ or $m^{T}_{j}$
* Suppose that we have numbers $x_1, x_2, x_3, \ldots, x_n$. If $y = a_1x_1 + a_2x_2 + a_3x_3 + \ldots + a_nx_n$, where $a_1, a_2, a_3, \ldots, a_n$ are some other numbers, then $y$ is a linear combination of $x_1, x_2, x_3, \ldots, x_n$.
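
As a small illustration of a linear combination, the sketch below uses NumPy with made-up values. The names `xs` and `coeffs` are not from the text above; they are chosen only for this example.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
xs = np.array([1.0, 2.0, 3.0])       # the numbers x_1, x_2, x_3
coeffs = np.array([2.0, -1.0, 0.5])  # the coefficients a_1, a_2, a_3

# y = a_1*x_1 + a_2*x_2 + a_3*x_3, written out term by term
y_explicit = coeffs[0] * xs[0] + coeffs[1] * xs[1] + coeffs[2] * xs[2]

# The same sum: multiply component-wise, then add up the products
y = np.sum(coeffs * xs)

print(y_explicit)  # 1.5
print(y)           # 1.5
```

Both lines compute $2 \cdot 1 - 1 \cdot 2 + 0.5 \cdot 3 = 1.5$.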

## Exercises
1. Give an example of a tensor from the following sets:
    1. $\mathbb{R}^2$
    2. $\mathbb{Z}^{3 \times 3}$
    3. $\mathbb{R}^{5 \times 1}$
    4. $\mathbb{R}^{1 \times 5}$
    5. $\mathbb{R}$
2. To what set do the following tensors belong?
    1. $\begin{bmatrix}2 & 5 & 5.34 \\ 3 & 4.5 & -1.2\end{bmatrix}$
    2. $\begin{bmatrix}2 \\ 5\end{bmatrix}$
    3. $\begin{bmatrix}2 & 5\end{bmatrix}$
3. Consider the matrix $$M = \begin{bmatrix}1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
    1. What is $M_{11}$ ?
    2. What is $m_{1}$ ?
    3. What is $m^{T}_{1}$ ?
4. Consider the vector $$x = \begin{bmatrix}1 \\ 2 \\ 3 \end{bmatrix}$$
    1. What is $x_{1}$ ?
    2. What is $x_{3}$ ?

## Code
The `numpy` and `torch` libraries give us facilities to store and manipulate tensors. Here we look at how to create them with `numpy`.

In [9]:
import numpy as np # import numpy library

x_row = np.array([1, 2, 3]) # a 1-D array of shape (3,); NumPy does not mark it as a row or a column
x_col = np.array([[1], [2], [3]]) # a column vector, stored as a 3 x 1 matrix
M = np.array([[1, 2, 3], [4, 5, 6]]) # a 2 x 3 matrix

n, m = M.shape
print(n)
print(m)

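# NumPy indices start at 0, so index [1][1] is row 2, column 2, i.e. M_22 in the notation above
M_22 = M[1][1] # access a component by chained indexing
M_22_another = M[1,1] # access the same component with a single index tuple
print(M_22)
print(M_22_another)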

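# column vectors are stored as n x 1 matrices, so a second index of 0 is needed
x_1 = x_col[0][0] # first component (NumPy indices start at 0)
x_2 = x_col[1][0] # second component

M_1 = M[:,0] # column slice: the first column
M_T_1 = M[0,:] # row slice: the first row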

2
3
5
5


We can also generate random vectors and matrices.

In [13]:
M1 = np.random.random((2, 3)) # a random 2 x 3 matrix


print(M1)

print("===========================")

v1 = np.random.random((1, 3)) # a random row vector, stored as a 1 x 3 matrix

print(v1)

print("===========================")
v2 = np.random.random(3) # a random 1-D array of length 3

print(v2)
print("===========================")

# can also reshape matrices and vectors

v3 = np.random.random(10)
v3_row = v3.reshape((1, 10)) # a 1 x 10 matrix, i.e. a row vector
v3_row_another_way = v3.reshape((1, -1)) # -1 lets NumPy infer the remaining size
M2 = v3.reshape((5, 2))

print(v3)
print(v3_row)
print(v3_row_another_way)
print(M2)

print("===========================")


[[0.63681631 0.80692745 0.05084435]
 [0.1490703  0.64540222 0.71097893]]
[[0.07824615 0.2147977  0.16358979]]
[0.78096241 0.21063345 0.90759046]
[0.67805556 0.70589363 0.02504688 0.25919887 0.27971133 0.93691473
 0.76819015 0.95574815 0.93567279 0.91218531]
[[0.67805556 0.70589363 0.02504688 0.25919887 0.27971133 0.93691473
  0.76819015 0.95574815 0.93567279 0.91218531]]
[[0.67805556 0.70589363 0.02504688 0.25919887 0.27971133 0.93691473
  0.76819015 0.95574815 0.93567279 0.91218531]]
[[0.67805556 0.70589363]
 [0.02504688 0.25919887]
 [0.27971133 0.93691473]
 [0.76819015 0.95574815]
 [0.93567279 0.91218531]]


## Operations on Vectors and Matrices
There are several operations that can be performed on vectors and matrices, and many familiar operations on numbers have vector and matrix equivalents.

### Addition and Hadamard Product

Addition and Hadamard Product are examples of component-wise binary operations. In such operations, both operands must be of the same size. The output is the same size as the inputs. They are defined as follows:

#### Addition
$$
\begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1m} \\
a_{21} & a_{22} & \ldots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \ldots & a_{nm}
\end{bmatrix}
+
\begin{bmatrix}
b_{11} & b_{12} & \ldots & b_{1m} \\
b_{21} & b_{22} & \ldots & b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \ldots & b_{nm}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} + b_{11} & a_{12} + b_{12} & \ldots & a_{1m} + b_{1m} \\
a_{21} + b_{21} & a_{22} + b_{22} & \ldots & a_{2m} + b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} + b_{n1} & a_{n2} + b_{n2} & \ldots & a_{nm} + b_{nm}
\end{bmatrix}
$$

Or more succinctly, $(A + B)_{ij} = a_{ij} + b_{ij}$

#### Hadamard Product
$$
\begin{bmatrix}
a_{11} & a_{12} & \ldots & a_{1m} \\
a_{21} & a_{22} & \ldots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \ldots & a_{nm}
\end{bmatrix}
\odot
\begin{bmatrix}
b_{11} & b_{12} & \ldots & b_{1m} \\
b_{21} & b_{22} & \ldots & b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \ldots & b_{nm}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} \cdot b_{11} & a_{12} \cdot b_{12} & \ldots & a_{1m} \cdot b_{1m} \\
a_{21} \cdot b_{21} & a_{22} \cdot b_{22} & \ldots & a_{2m} \cdot b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} \cdot b_{n1} & a_{n2} \cdot b_{n2} & \ldots & a_{nm} \cdot b_{nm}
\end{bmatrix}
$$

Or more succinctly, $(A \odot B)_{ij} = a_{ij} \cdot b_{ij}$


The operations on vectors are defined accordingly. Note that these operations are commutative and associative.
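
The commutativity and associativity claims can be checked numerically. The following is a minimal sketch with small integer matrices; the values and the third matrix `C` are made up for illustration and do not come from the text above.

```python
import numpy as np

# Hypothetical small matrices so the results are easy to verify by hand
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[0, 1], [1, 0]])

print(A + B)  # component-wise sum: [[6, 8], [10, 12]]
print(A * B)  # Hadamard product: [[5, 12], [21, 32]]

# Commutativity: swapping the operands does not change the result
sum_commutes = np.array_equal(A + B, B + A)
hadamard_commutes = np.array_equal(A * B, B * A)

# Associativity: the grouping of the operands does not change the result
sum_associates = np.array_equal((A + B) + C, A + (B + C))
hadamard_associates = np.array_equal((A * B) * C, A * (B * C))

print(sum_commutes, hadamard_commutes, sum_associates, hadamard_associates)
```

Note that on NumPy arrays the `*` operator is the component-wise (Hadamard) product, not matrix multiplication.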




In [16]:
# numpy code

A = np.random.random((3, 2))
B = np.random.random((3, 2))

print(A)
print('+')
print(B)
print('=')
print(A + B)

print("===========================")

print(A)
print('*')
print(B)
print('=')
print(A * B)


[[0.10415958 0.17669042]
 [0.67719257 0.16923912]
 [0.13625725 0.39413645]]
+
[[0.89634298 0.90858737]
 [0.42194416 0.68144016]
 [0.01336825 0.79445802]]
=
[[1.00050255 1.08527779]
 [1.09913673 0.85067928]
 [0.1496255  1.18859448]]
[[0.10415958 0.17669042]
 [0.67719257 0.16923912]
 [0.13625725 0.39413645]]
*
[[0.89634298 0.90858737]
 [0.42194416 0.68144016]
 [0.01336825 0.79445802]]
=
[[0.0933627  0.16053868]
 [0.28573745 0.11532633]
 [0.00182152 0.31312487]]


### Scalar Multiplication and Addition

You can also multiply each component of a vector or matrix by a scalar, or add a scalar to each component.

#### Scalar Multiplication

$\alpha\begin{bmatrix} 
a_{11} & a_{12} & \ldots & a_{1m} \\ 
a_{21} & a_{22} & \ldots & a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
a_{n1} & a_{n2} & \ldots & a_{nm} 
\end{bmatrix} = \begin{bmatrix} 
\alpha \cdot a_{11} & \alpha \cdot a_{12} & \ldots & \alpha \cdot a_{1m} \\ 
\alpha \cdot a_{21} & \alpha \cdot a_{22} & \ldots & \alpha \cdot a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
\alpha \cdot a_{n1} & \alpha \cdot a_{n2} & \ldots &  \alpha \cdot a_{nm} 
\end{bmatrix}$

Or more succinctly, $(\alpha A)_{ij} = \alpha \cdot a_{ij}$

#### Scalar Addition
$b + \begin{bmatrix} 
a_{11} & a_{12} & \ldots & a_{1m} \\ 
a_{21} & a_{22} & \ldots & a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
a_{n1} & a_{n2} & \ldots & a_{nm} 
\end{bmatrix} = \begin{bmatrix} 
b + a_{11} & b + a_{12} & \ldots & b + a_{1m} \\ 
b + a_{21} & b + a_{22} & \ldots & b + a_{2m} \\
\vdots & \vdots & \ldots & \vdots \\ 
b + a_{n1} & b + a_{n2} & \ldots &  b + a_{nm} 
\end{bmatrix}$

Or more succinctly, $(b + A)_{ij} = b + a_{ij}$

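The operations on vectors are defined accordingly. Note that these operations are commutative ($\alpha A = A \alpha$ and $b + A = A + b$) and associative ($\alpha(\beta A) = (\alpha\beta)A$ and $b + (c + A) = (b + c) + A$).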
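
The two definitions above can be tried on a small matrix. This is only a sketch: the values and the names `M_small`, `alpha`, and `b` are chosen for illustration. NumPy calls the mechanism that stretches the scalar across every component *broadcasting*.

```python
import numpy as np

M_small = np.array([[1.0, 2.0], [3.0, 4.0]])  # hypothetical values
alpha = 2.0
b = 10.0

scaled = alpha * M_small  # every component multiplied by alpha
shifted = b + M_small     # every component increased by b

# b + M gives the same result as adding a same-shaped matrix filled with b
same_as_full = np.array_equal(shifted, np.full(M_small.shape, b) + M_small)

print(scaled)        # [[2. 4.] [6. 8.]]
print(shifted)       # [[11. 12.] [13. 14.]]
print(same_as_full)  # True
```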

In [17]:
M1 = np.random.random((5, 2))

print(M1)
print("====================")
print(2 * M1)
print("====================")
print(2 + M1)

[[0.44742846 0.94016838]
 [0.30012059 0.88776268]
 [0.95213562 0.47838518]
 [0.19786747 0.7840602 ]
 [0.25802859 0.73093425]]
[[0.89485691 1.88033675]
 [0.60024119 1.77552536]
 [1.90427124 0.95677035]
 [0.39573494 1.5681204 ]
 [0.51605718 1.4618685 ]]
[[2.44742846 2.94016838]
 [2.30012059 2.88776268]
 [2.95213562 2.47838518]
 [2.19786747 2.7840602 ]
 [2.25802859 2.73093425]]


### Transpose

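The transpose operation swaps rows and columns: the entry at row $i$, column $j$ of $M$ becomes the entry at row $j$, column $i$ of $M^T$, i.e. $(M^T)_{ij} = M_{ji}$.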

Suppose $M = \begin{bmatrix}2 & 3 & 4\\ 5 & 6 & 7\end{bmatrix}$, then its transpose, denoted as $M^T$ is
$M^{T} = \begin{bmatrix}2 & 5 \\ 3 & 6\\ 4 & 7\end{bmatrix}$

Sometimes, for convenience, we write column vectors as the transpose of row vectors. For example, instead of writing $v = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$, we write $v = \begin{bmatrix} 2 & 3 & 4 \end{bmatrix}^{T}$
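
In NumPy, the transpose of an array is available as the `.T` attribute. The sketch below reuses the matrix from the example above; the variable names are chosen for illustration.

```python
import numpy as np

M = np.array([[2, 3, 4], [5, 6, 7]])  # the 2 x 3 matrix from the example above
M_T = M.T                             # its 3 x 2 transpose

print(M.shape)    # (2, 3)
print(M_T.shape)  # (3, 2)
print(M_T)

# A column vector written as the transpose of a 1 x 3 row vector
v = np.array([[2, 3, 4]]).T
print(v.shape)    # (3, 1)

# Caveat: a 1-D array has only one axis, so .T leaves it unchanged
w = np.array([2, 3, 4])
print(w.T.shape)  # (3,)
```

The last two lines are a common stumbling block: to get a column vector, the array has to be two-dimensional first.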

### Matrix multiplication
If we have matrices of compatible dimensions, we can multiply them. For two matrices to be compatible for matrix multiplication (abbrev. matmul), the number of columns of the first must be the same as the number of rows of the second. The output has the number of rows of the first and the number of columns of the second.

Suppose we have two matrices $A$ and $B$ of dimensions $m \times n$ and $n \times p$ respectively. Then their matrix product $C = AB$ has dimensions $m \times p$ and is defined as:

$$C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}$$

For example, suppose we have two matrices $A$ and $B$ defined as follows

$$A = \begin{bmatrix}2 & 3 \\ 4 & 5\end{bmatrix}$$

and 

$$B = \begin{bmatrix}1 & 2 & 3\\ 4 & 5 & 6\end{bmatrix}$$

Then $C = AB$ is
$$C = \begin{bmatrix}14 & 19 & 24 \\ 24 & 33 & 42\end{bmatrix}$$
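
As a check on this result, each entry of $C$ can be worked out directly from the definition. The first row, for instance, pairs the first row of $A$ with each column of $B$ in turn:

$$
\begin{aligned}
C_{11} &= 2 \cdot 1 + 3 \cdot 4 = 14 \\
C_{12} &= 2 \cdot 2 + 3 \cdot 5 = 19 \\
C_{13} &= 2 \cdot 3 + 3 \cdot 6 = 24
\end{aligned}
$$

The second row follows the same pattern with the second row of $A$, e.g. $C_{21} = 4 \cdot 1 + 5 \cdot 4 = 24$.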

Note that matmul is associative but not commutative: in general, $AB \neq BA$.

In [19]:
x = np.array([[2, 3], [4, 5]])
y = np.array([[1, 2, 3], [4, 5, 6]])
print(x @ y)

[[14 19 24]
 [24 33 42]]
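
To connect the `@` operator back to the summation formula, here is a minimal sketch that spells the definition out as three nested loops. The helper name `matmul_from_definition` and the matrices `P`, `Q`, and `R` are made up for this illustration; in practice `@` should be preferred, as explicit loops are much slower.

```python
import numpy as np

def matmul_from_definition(A, B):
    """Multiply A (m x n) by B (n x p) using C_ij = sum_k A_ik * B_kj."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[2, 3], [4, 5]])
B = np.array([[1, 2, 3], [4, 5, 6]])
C = matmul_from_definition(A, B)
print(C)
print(np.array_equal(C, A @ B))  # True

# Hypothetical square matrices for checking the two properties
P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])
R = np.array([[1, 1], [0, 1]])

commutes = np.array_equal(P @ Q, Q @ P)
associates = np.array_equal((P @ Q) @ R, P @ (Q @ R))
print(commutes)    # False
print(associates)  # True
```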


## Vector-Vector Products

Suppose that we have two vectors $u, v \in \mathbb{R}^{n}$. There are two main ways we can define a product of $u$ and $v$: the inner product and the outer product.

### Inner Product
The inner product is also called the dot product. Treating vectors $u$ and $v$ as matrices, their dot product is defined as follows:

$$u \bullet v = u^{T}v = u_{1}v_{1} + u_{2}v_{2} + \ldots + u_{n}v_{n}$$

The dot product is useful as it has a geometric interpretation:
$$\cos \theta = \frac{u \bullet v}{\lVert u \rVert \lVert v \rVert}$$

where $\theta$ is the angle between vectors $u$ and $v$, and $\lVert u \rVert = \sqrt{u \bullet u}$ is the length of $u$.
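
The angle formula can be tried out numerically. The sketch below uses two made-up vectors and assumes the norm in the denominator is the usual Euclidean length, which is what `np.linalg.norm` computes for a 1-D array by default.

```python
import numpy as np

# Hypothetical vectors: one along the x-axis, one along the diagonal
u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

dot = np.dot(u, v)  # u_1*v_1 + u_2*v_2
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))

# Clipping guards against rounding that lands slightly outside [-1, 1]
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
angle_degrees = np.degrees(theta)

print(dot)            # 1.0
print(angle_degrees)  # approximately 45
```

The result matches the geometry: the diagonal of the unit square makes a 45 degree angle with the x-axis.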

### Outer Product
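The outer product of vectors $u$ and $v$ is defined as $uv^{T}$. Whereas the inner product is a scalar, the outer product is an $n \times n$ matrix with entries $(uv^{T})_{ij} = u_{i}v_{j}$. One way to remember the difference between the inner and outer product is that the adjective indicates the placement of the transpose operation: if it sits between the two vectors, as in $u^{T}v$, it is an inner product; if it is on the outside, on the right-hand vector as in $uv^{T}$, it is an outer product.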
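
The contrast between the two products shows up clearly in the shapes of the results. This is a minimal sketch with made-up vectors; it computes each product once with NumPy's vector functions and once by treating the vectors as $n \times 1$ matrices.

```python
import numpy as np

# Hypothetical vectors, chosen only for illustration
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

inner = u @ v           # a scalar: 1*4 + 2*5 + 3*6 = 32
outer = np.outer(u, v)  # a 3 x 3 matrix with outer[i, j] = u[i] * v[j]

print(inner)
print(outer)

# The same products with the vectors stored as 3 x 1 column matrices
u_col = u.reshape((-1, 1))
v_col = v.reshape((-1, 1))

inner_mat = u_col.T @ v_col  # (1 x 3) @ (3 x 1) gives a 1 x 1 matrix
outer_mat = u_col @ v_col.T  # (3 x 1) @ (1 x 3) gives a 3 x 3 matrix

print(inner_mat.shape)  # (1, 1)
print(outer_mat.shape)  # (3, 3)
```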