# Linear Algebra in Deep Learning with the NumPy Library
This notebook covers the basic linear algebra used in a deep learning context. Most of the content comes from [Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Chapter 2](https://www.deeplearningbook.org/contents/linear_algebra.html) and is combined with the corresponding NumPy functions.

```python
# potentially useful packages
import numpy as np
```

## 1. Scalars, Vectors, Matrices and Tensors

The study of linear algebra involves several types of mathematical objects:

- **Scalars** $a$: A scalar is a single number, usually written in lowercase italics. It can be an integer, a real number, a natural number, etc. In Python code, a scalar can simply be written as a number:

```python
a = 16
b = 1.5
```
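
NumPy can also store a scalar as a zero-dimensional array, whose shape is the empty tuple `()`. A small sketch comparing it with a plain Python number (the name `s` is only for illustration):

```python
import numpy as np

a = 16                 # plain Python int
s = np.array(16)       # 0-D NumPy array holding the same scalar

print(type(a).__name__)   # int
print(s.ndim, s.shape)    # 0 ()
print(s.item() == a)      # True: .item() converts the 0-D array back to a Python scalar
```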

- **Vectors** $\mathbf{x}$: A vector is an array of numbers, denoted by a lowercase name in bold. The elements of the vector are written in italics with the same name and a subscript indicating their position. By default, vectors are treated as column vectors with shape $(n,1)$. A vector can be written as follows:

$$\mathbf{x} = \begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

> The first element of $\mathbf{x}$ is $x_1$, the second element is $x_2$, and so on. If each element is in $\mathbb{R}$ and the vector has $n$ elements, then the vector lies in the set formed by taking the Cartesian product of $\mathbb{R}$ $n$ times, denoted $\mathbb{R}^n$.

```python
x = np.array([1, 2, 3])
print(f'''Assume that x is the vector [1, 2, 3]. We can create it as a NumPy array:
{x}
with the shape of {x.shape}. Note that (3,) is a 1-D shape: NumPy does not distinguish
row vectors from column vectors, so reshape to (3, 1) or (1, 3) when a 2-D shape is needed.''')
```

```
Assume that x is the vector [1, 2, 3]. We can create it as a NumPy array:
[1 2 3]
with the shape of (3,). Note that (3,) is a 1-D shape: NumPy does not distinguish
row vectors from column vectors, so reshape to (3, 1) or (1, 3) when a 2-D shape is needed.
```
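
When an explicit column or row vector is needed, a 1-D array can be reshaped into a 2-D one. A minimal sketch; `col`, `row`, and `col_alt` are illustrative names:

```python
import numpy as np

x = np.array([1, 2, 3])       # 1-D array, shape (3,)

col = x.reshape(3, 1)         # explicit column vector, shape (3, 1)
row = x.reshape(1, 3)         # explicit row vector, shape (1, 3)
col_alt = x[:, np.newaxis]    # the same column vector, built by inserting a new axis

print(x.shape, col.shape, row.shape)   # (3,) (3, 1) (1, 3)
print(np.array_equal(col, col_alt))    # True
```

Writing `x.reshape(-1, 1)` lets NumPy infer the length, which is convenient when $n$ is not known in advance.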


- **Matrices** $\mathbf{A}$: A matrix is a 2-D array of numbers, so each element is identified by two indices instead of just one. A matrix is denoted by an uppercase name in bold.

$$\mathbf{A} = \begin{bmatrix}A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}$$

> If each element of $\mathbf{A}$ is in $\mathbb{R}$ and the matrix has a height of $m$ and a width of $n$, then we say $\mathbf{A} \in \mathbb{R}^{m \times n}$. An element is written with the matrix name in italics and the subscripts $(i,j)$: $A_{i,j}$ is the element in the $i$-th row and $j$-th column. $A_{i,:}$ denotes the $i$-th row (a row vector), and $A_{:,j}$ denotes the $j$-th column (a column vector). Finally, $f(\mathbf{A})_{i,j}$ gives element $(i,j)$ of the matrix computed by applying the function $f$ to $\mathbf{A}$.

```python
A = np.array([[1,2],[3,4]])
print(f'''Assume that A is a 2x2 matrix. Then we can convert it to NumPy array as follows
{A}
with the shape of {A.shape}.''')
```

```
Assume that A is a 2x2 matrix. Then we can convert it to NumPy array as follows
[[1 2]
 [3 4]]
with the shape of (2, 2).
```
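
NumPy indices start at 0 while the notation above starts at 1, so $A_{i,j}$ corresponds to `A[i-1, j-1]`. A minimal sketch with the same 2x2 matrix; the element-wise `np.exp` is used here only as one example of a function $f$:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# A_{1,2} in 1-based math notation -> A[0, 1] in 0-based NumPy indexing
print(A[0, 1])        # 2

# A_{2,:} (second row) and A_{:,1} (first column)
print(A[1, :])        # [3 4]
print(A[:, 0])        # [1 3]

# f(A)_{i,j}: compute f(A) first, then take element (i, j)
fA = np.exp(A)
print(np.isclose(fA[1, 0], np.exp(3)))   # True
```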


- **Tensors** $\mathbf{A}$: A tensor is an array with more than two axes. For example, the element of a three-dimensional tensor $\mathbf{A}$ at coordinates $(i,j,k)$ is denoted $\mathbf{A}_{i,j,k}$.

```python
A = np.random.rand(2,3,4)
print(f'''Assume that A is a 2x3x4 tensor. Then we can convert it to NumPy array as follows
{A}
with the shape of {A.shape}.''')
```

```
Assume that A is a 2x3x4 tensor. Then we can convert it to NumPy array as follows
[[[0.90429045 0.51346806 0.51058203 0.77640878]
  [0.55438188 0.38068479 0.74452339 0.76922769]
  [0.43752228 0.18290167 0.91413351 0.14065467]]

 [[0.74316206 0.20264338 0.16512099 0.75602439]
  [0.93776713 0.93973664 0.85236632 0.81140926]
  [0.63777136 0.15546842 0.65456628 0.02069073]]]
with the shape of (2, 3, 4).
```
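
Because the values above are random, here is a sketch with a deterministic tensor built from `np.arange` and `reshape`, which makes the indexing $\mathbf{A}_{i,j,k}$ easy to check (NumPy indices start at 0):

```python
import numpy as np

# Deterministic 2x3x4 tensor holding the values 0..23
A = np.arange(24).reshape(2, 3, 4)

print(A.ndim, A.shape)   # 3 (2, 3, 4)
print(A[1, 2, 3])        # 23: element at (i, j, k) = (1, 2, 3), 0-based
print(A[0].shape)        # (3, 4): fixing the first index leaves a 3x4 matrix
print(A[:, 1, :].shape)  # (2, 4)
```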


## Transpose
Next, we look at the **transpose**. The transpose of a matrix is its mirror image across a diagonal line called the **main diagonal**, which runs down and to the right starting from the upper-left corner. For the matrix below, the main diagonal consists of $A_{1,1}$ and $A_{2,2}$.

$$\mathbf{A} = \begin{bmatrix}A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2}\\ A_{3,1} & A_{3,2} \end{bmatrix}$$

The transpose of a matrix $\mathbf{A}$ is denoted $\mathbf{A}^T$ and is defined element-wise by

$$(\mathbf{A}^T)_{i,j} = A_{j,i}$$

Therefore, the transpose of an $m \times n$ matrix is an $n \times m$ matrix. For the $3 \times 2$ matrix $\mathbf{A}$ above:
$$\mathbf{A} = \begin{bmatrix}A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2}\\ A_{3,1} & A_{3,2} \end{bmatrix}$$
$$\mathbf{A}^T = \begin{bmatrix}A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \end{bmatrix}$$
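
The definition $(\mathbf{A}^T)_{i,j} = A_{j,i}$ can be checked directly. The helper `transpose_by_definition` below is a hypothetical name used only for illustration; it builds the transpose with explicit loops (much slower than `A.T`) and compares the result with NumPy's:

```python
import numpy as np

def transpose_by_definition(A):
    """Hypothetical helper: build A^T element by element using (A^T)_{i,j} = A_{j,i}."""
    m, n = A.shape
    At = np.empty((n, m), dtype=A.dtype)   # an m x n matrix has an n x m transpose
    for i in range(n):
        for j in range(m):
            At[i, j] = A[j, i]
    return At

A = np.array([[1, 2], [3, 4], [5, 6]])    # 3 x 2, like the matrix above
At = transpose_by_definition(A)
print(At)
# [[1 3 5]
#  [2 4 6]]
print(np.array_equal(At, A.T))            # True
```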

Now, let's apply the transpose to different types of objects in NumPy, using the function [transpose](https://numpy.org/doc/stable/reference/generated/numpy.transpose.html).

The function ```numpy.transpose``` behaves as follows: <br>
For a scalar, it returns the same scalar value. <br>
For a 1-D array, such as ```x=[1,2,3]```, it returns an unchanged view of the original array. <br>
For a 2-D array, it is the standard matrix transpose. <br>
For an n-D array, if ```axes``` is given, it specifies how the axes are permuted: axis ```i``` of the result corresponds to axis ```axes[i]``` of the input. If ```axes``` is not provided, the order of the axes is reversed, so ```transpose(a).shape == a.shape[::-1]```. <br>
A transpose can be computed with ```numpy.transpose(a, axes)```, ```a.transpose(*axes)```, or ```a.T``` (which always reverses the axes, like ```a.transpose()```).
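
A short sketch of the `axes` rule on a deterministic tensor (the names `A`, `B`, and `axes` are illustrative). With `axes=(1, 2, 0)`, the result satisfies `B[j, k, i] == A[i, j, k]`:

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
axes = (1, 2, 0)
B = np.transpose(A, axes)   # same as A.transpose(1, 2, 0)

# Axis i of the result is axis axes[i] of the input
print(B.shape)                                        # (3, 4, 2)
print(B.shape == tuple(A.shape[ax] for ax in axes))   # True

# Element correspondence: B[j, k, i] == A[i, j, k]
print(B[2, 3, 1] == A[1, 2, 3])                       # True

# Without axes, the order of the axes is reversed
print(A.T.shape, A.transpose().shape)                 # (4, 3, 2) (4, 3, 2)
```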


```python
# Scalar: a scalar is its own transpose
print('=========Scalar==========')
a = np.array(1)
print(f'scalar a is {a}, with the shape of {a.shape}. Its transpose is {a.T}.')

# Vector: a 1-D array is unchanged by transposition
print('========Vector==========')
x = np.array([1, 2, 3])
print(f'vector x is {x}, with the shape of {x.shape}. Its transpose is {x.T} with the shape of {x.T.shape}')

# Matrix: a standard transpose
print('========Matrix==========')
A = np.random.rand(2, 3)
print(f'Matrix A is \n {A}\nwith the shape of {A.shape}. Its transpose is \n{A.T}\nwith the shape of {A.T.shape}')

# Tensor: `axes` defines how the axes are permuted
print('========Tensor==========')
A = np.random.rand(2, 3, 4)
print(f'''Tensor A is
{A}
with the shape of {A.shape}. Its transpose with axes {(1, 2, 0)} is
{A.transpose(1, 2, 0)}
with the shape of {A.transpose(1, 2, 0).shape}
If no axes are specified, its transpose is
{A.transpose()}
with the shape of {A.transpose().shape}, which equals {A.shape[::-1]} as described above.
''')
```

```
scalar a is 1, with the shape of (). Its transpose is 1.
vector x is [1 2 3], with the shape of (3,). Its transpose is [1 2 3] with the shape of (3,)
Matrix A is 
 [[0.62973542 0.48580326 0.00715737]
 [0.83683936 0.45504935 0.69064289]]
with the shape of (2, 3). Its transpose is 
[[0.62973542 0.83683936]
 [0.48580326 0.45504935]
 [0.00715737 0.69064289]]
with the shape of (3, 2)
Tensor A is
[[[0.42537071 0.64978069 0.56510938 0.11906118]
  [0.691248   0.00104459 0.55017754 0.90614339]
  [0.87149028 0.00828296 0.4460087  0.06388437]]

 [[0.84921842 0.27944867 0.62701422 0.67875341]
  [0.59908552 0.64888357 0.04152135 0.17798288]
  [0.39932848 0.37564803 0.8194944  0.19858984]]]
with the shape of (2, 3, 4). Its transpose with axes (1, 2, 0) is
[[[0.42537071 0.84921842]
  [0.64978069 0.27944867]
  [0.56510938 0.62701422]
  [0.11906118 0.67875341]]

 [[0.691248   0.59908552]
  [0.00104459 0.64888357]
  [0.55017754 0.04152135]
  [0.90614339 0.17798288]]

 [[0.87149028 0.39932848]
  [0.008
```
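
Transposing a 1-D array with `.T` leaves it unchanged, as the vector case above shows. When an explicit column vector $\mathbf{x}$ and its transpose $\mathbf{x}^T$ are needed, for example to form an outer or inner product, one option is to promote `x` to 2-D first. A minimal sketch; `x_col` and `x_row` are illustrative names:

```python
import numpy as np

x = np.array([1, 2, 3])
print(x.T.shape)                  # (3,): transposing a 1-D array changes nothing

x_col = x.reshape(-1, 1)          # column vector, shape (3, 1)
x_row = x_col.T                   # its transpose, a row vector of shape (1, 3)
print(x_col.shape, x_row.shape)   # (3, 1) (1, 3)

# x x^T is a 3x3 outer product; x^T x is a 1x1 inner product
print((x_col @ x_row).shape)      # (3, 3)
print(x_row @ x_col)              # [[14]]
```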

## Addition and Broadcasting

Two matrices can be added as long as they have the same shape: $\mathbf{C}=\mathbf{A}+\mathbf{B}$ means $C_{i,j}=A_{i,j}+B_{i,j}$. <br>
> In NumPy, element-wise addition is performed with the [numpy add function](https://numpy.org/doc/stable/reference/generated/numpy.add.html), ```np.add(x1, x2)```, or simply with ```x1 + x2```.
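
A small sketch showing that `np.add` and `+` give the same result (the matrices are illustrative). In NumPy, operands whose shapes are neither equal nor compatible under broadcasting raise a `ValueError`:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[10, 20, 30], [40, 50, 60]])

C1 = np.add(A, B)
C2 = A + B
print(C2)
# [[11 22 33]
#  [44 55 66]]
print(np.array_equal(C1, C2))   # True

# A (2, 3) matrix and a (3, 2) matrix cannot be added element-wise
try:
    A + np.ones((3, 2))
except ValueError as err:
    print("ValueError:", err)
```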

A scalar can also be added to, or multiplied by, a matrix; the operation is applied to each element: $\mathbf{D}=a \cdot \mathbf{B} + c$ means $D_{i,j}=a \cdot B_{i,j}+c$.

In the context of deep learning, we also allow a vector to be added to a matrix: $\mathbf{C}=\mathbf{A}+\mathbf{b}$ means $C_{i,j}=A_{i,j}+b_j$. In other words, the vector $\mathbf{b}$ is added to each row of the matrix. This implicit copying of $\mathbf{b}$ to many locations is called **broadcasting**. Note that $\mathbf{b}$ must have the same length as the number of columns of $\mathbf{A}$.
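
To make the implicit copying concrete, the sketch below (illustrative names `A`, `b`, `d`) compares broadcasting with an explicit copy made by `np.broadcast_to` or `np.tile`. The last part goes slightly beyond the definition above: under NumPy's general broadcasting rules, an array of shape $(m,1)$ is added to every column instead, which adds a different value to each row.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # 2x3 matrix [[0 1 2], [3 4 5]]
b = np.array([10, 20, 30])       # length 3 = number of columns of A

# Broadcasting: b is added to every row of A
C = A + b

# The same result with the copying made explicit
b_copies = np.broadcast_to(b, A.shape)            # read-only 2x3 view of b repeated per row
print(np.array_equal(C, A + b_copies))            # True
print(np.array_equal(C, A + np.tile(b, (2, 1))))  # True

# A column of shape (2, 1) is broadcast across the columns instead
d = np.array([[100], [200]])
print(A + d)
# [[100 101 102]
#  [203 204 205]]
```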


```python
# Matrix plus matrix (same shape)
print('=========Matrix Addition==========')
A = np.random.rand(2, 3)
B = np.random.rand(2, 3)
C = A + B
print(f'''Matrix Addition: C = A + B where A is
{A}
with A's shape of {A.shape} and B is
{B}
with B's shape of {B.shape}. Therefore, C is
{C}
with C's shape {C.shape}''')

# Scalar times matrix plus scalar (applied to every element)
print('=========Scalar Addition or Multiplication with Matrix==========')
B = np.random.rand(2, 3)
a = 10
c = 1
D = a * B + c
print(f'''Scalar Addition or Multiplication with Matrix: D = aB + c where B is
{B}
with B's shape of {B.shape} and a is {a}, c is {c}. Therefore, D is
{D}
with D's shape {D.shape}''')

# Vector broadcast-added to every row of a matrix
print('=========Broadcasting Vector Addition to Matrix==========')
A = np.random.rand(2, 3)
b = np.arange(3) + 1   # b = [1, 2, 3], one entry per column of A
C = A + b
print(f'''Broadcasting Vector Addition to Matrix: C = A + b where A is
{A}
with A's shape of {A.shape} and b is
{b}
with b's shape of {b.shape}. Therefore, C is
{C}
with C's shape {C.shape}''')
```

```
Matrix Addition: C = A + B where A is
[[0.09594255 0.70736944 0.93644153]
 [0.17836344 0.71707368 0.57324095]]
with A's shape of (2, 3) and B is
[[0.70167053 0.72664501 0.52842336]
 [0.317407   0.07947515 0.05825339]]
with B's shape of (2, 3). Therefore, C is
[[0.79761308 1.43401445 1.46486489]
 [0.49577044 0.79654883 0.63149435]]
with C's shape (2, 3)
Scalar Addition or Multiplication with Matrix: D = aB + c where B is
[[0.64285344 0.95243844 0.4865877 ]
 [0.53803726 0.16630698 0.29865821]]
with B's shape of (2, 3) and a is 10, c is 1. Therefore, D is
[[ 7.42853437 10.52438439  5.86587701]
 [ 6.38037262  2.66306981  3.98658208]]
with D's shape (2, 3)
Broadcasting Vector Addition to Matrix: C = A + b where A is
[[0.41230429 0.37491798 0.57669525]
 [0.59736341 0.35173594 0.79662354]]
with A's shape of (2, 3) and b is
[1 2 3]
with b's shape of (3,). Therefore, C is
[[1.41230429 2.37491798 3.57669525]
 [1.59736341 2.35173594 3.79662354]]
with C's shape (2, 3)
```
