## Segment 1: Data Structures for Algebra

### Scalars (Rank 0 Tensors) in Base Python
* No Dimensions
* Single Number
* Denoted in <font color=blue>_lowercase, italics_</font> , e.g.: <font color=green>_x_</font>

In [1]:
x = 7
y = 3.5
py_sum = x + y

print(f"x = {x}\n   type(x) = {type(x)}\ny = {y}\n   type(y) = {type(y)}\nsum = {py_sum}\n   type(sum) = {type(py_sum)}")


x = 7
   type(x) = <class 'int'>
y = 3.5
   type(y) = <class 'float'>
sum = 10.5
   type(sum) = <class 'float'>


### Scalars in PyTorch

* PyTorch and TensorFlow are the two most popular *automatic differentiation* libraries in Python, itself the most popular programming language in ML
* PyTorch tensors are designed to be pythonic, i.e., to feel and behave like NumPy arrays
* The advantage of PyTorch tensors relative to NumPy arrays is that they can easily be used for operations on GPU (see [here](https://pytorch.org/tutorials/beginner/examples_tensor/two_layer_net_tensor.html) for example)
* Documentation on PyTorch tensors, including available data types, is [here](https://pytorch.org/docs/stable/tensors.html)

In [2]:
import torch

In [3]:
x_pt = torch.tensor(7)
x_pt

tensor(7)

In [4]:
x_pt.shape

torch.Size([])

### Vectors (Rank 1 Tensors) in NumPy
* One-dimensional array
* Arranged in an order, so each element can be accessed by its index
   * Elements are scalars, so they are not written in bold
* Representing a point in space
   * Vector of length two represents location in 2D matrix
   * Length of three represents location in 3d cube
   * Length of n represents location in n-dimensional tensor
* Denoted in <font color=blue>_lowercase, italics, Bold_</font> , e.g.: <font color=green><b>_x_</b></font>

In [5]:
import numpy as np

In [6]:
x = np.array([7, 13, 42])
x

array([ 7, 13, 42])

In [7]:
len(x)

3

In [8]:
x.shape

(3,)

In [9]:
type(x)

numpy.ndarray

In [10]:
x[0]

7

In [11]:
type(x[0])

numpy.int64

In [12]:
# Transposing a regular 1-D array has no effect...
x_t = x.T
x_t

array([ 7, 13, 42])

In [13]:
x_t.shape

(3,)

In [14]:
# ...but it does if we use nested "matrix-style" brackets:
y = np.array([[7, 13, 42]])
y

array([[ 7, 13, 42]])

In [15]:
y.shape

(1, 3)

In [16]:
y_t = y.T
y_t

array([[ 7],
       [13],
       [42]])

In [17]:
y_t.shape

(3, 1)

In [18]:
y_t.T

array([[ 7, 13, 42]])

In [19]:
y_t.T.shape

(1, 3)

### Zero Vectors

A zero vector has no effect when added to another vector.

In [20]:
z = np.zeros(3)
z

array([0., 0., 0.])
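
A quick check of the statement above, as a minimal sketch: adding a zero vector of matching length leaves a vector unchanged. The names `v`, `zero`, and `v_plus_zero` are illustrative and do not come from the original notebook.

```python
import numpy as np

v = np.array([7, 13, 42])   # illustrative vector
zero = np.zeros(3)          # zero vector of the same length

v_plus_zero = v + zero      # element-wise addition
print(v_plus_zero)
print(np.array_equal(v_plus_zero, v))  # compares values, ignoring int vs. float dtype
```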

### Vectors in PyTorch

In [21]:
x_pt = torch.tensor([7, 13, 42])
x_pt

tensor([ 7, 13, 42])

### Norm
* Norms are functions that quantify vector magnitude

### $L^2$ Norm
* Described by: <font color=red>$||x||_{2}=\sqrt{\sum_{i} x_i^2}$</font>

* Measures simple (Euclidean) distance from origin

* Most common norm in machine learning

* Instead of <font color=blue>$||x||_{2}$</font>, it can be denoted as <font color=blue>$||x||$</font>

* If ||__x__|| = 1, __x__ is a <font color=blue>_"unit vector"_</font>



In [22]:
x

array([ 7, 13, 42])

In [23]:
(7**2 + 13**2 + 42**2)**(1/2)

44.51965857910413

In [24]:
np.linalg.norm(x)

44.51965857910413
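
To illustrate the unit-vector bullet, here is a small sketch that rescales a vector by its $L^2$ norm so that the result has norm 1. The variable names `v` and `v_unit` are assumptions made for this example.

```python
import numpy as np

v = np.array([7, 13, 42])

# Dividing a non-zero vector by its L2 norm yields a unit vector
v_unit = v / np.linalg.norm(v)

print(v_unit)
print(np.linalg.norm(v_unit))  # approximately 1, up to floating-point rounding
```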

### $L^1$ Norm

* Described by: <font color=red>$||x||_{1}=\sum_{i} |x_i|$</font>
* Another common norm in ML
* Used whenever the difference between zero and non-zero is key

In [25]:
x

array([ 7, 13, 42])

In [26]:
np.abs(7) + np.abs(13) + np.abs(42)

62
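
The same value can be obtained with `np.linalg.norm` by passing `ord=1`. The sketch below uses a vector with a negative entry (an illustrative choice, not from the notebook) to show that the absolute values are what get summed.

```python
import numpy as np

w = np.array([7, -13, 42])           # illustrative vector with a negative entry

l1_manual = np.sum(np.abs(w))        # sum of absolute values
l1_numpy = np.linalg.norm(w, ord=1)  # NumPy's L1 norm for 1-D input

print(l1_manual, l1_numpy)
```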

### Generalized $L^p$ Norm

* Described by: <font color=red>$||x||_{p}=(\sum_{i} |x_i|^p)^{1/p}$</font>
* *p* must be:
   * A real number
   * Greater than or equal to one
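
As a minimal sketch, the formula $(\sum_{i} |x_i|^p)^{1/p}$ can be written directly in NumPy and compared against `np.linalg.norm` with the `ord` argument. The helper `lp_norm` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def lp_norm(v, p):
    """Hypothetical helper: L^p norm of a 1-D array for real p >= 1."""
    if p < 1:
        raise ValueError("p must be greater than or equal to 1")
    return np.sum(np.abs(v) ** p) ** (1 / p)

v = np.array([7, 13, 42])

for p in (1, 2, 3):
    # Both columns should agree up to floating-point rounding
    print(p, lp_norm(v, p), np.linalg.norm(v, ord=p))
```

With `p=1` and `p=2` this reduces to the $L^1$ and $L^2$ norms shown earlier.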

### Squared $L^2$ Norm

Described by: <font color=red>$||x||_{2}^2=\sum_{i} x_i^2 = x^Tx$</font>

In [27]:
x

array([ 7, 13, 42])

In [28]:
(7**2 + 13**2 + 42**2)

1982

In [29]:
# We'll cover tensor multiplication in more detail soon, but to prove the point quickly:
np.dot(x, x)

1982

### Max Norm

In [30]:
x

array([ 7, 13, 42])

In [31]:
np.max([np.abs(7), np.abs(13), np.abs(42)])

42
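
The max norm is the largest absolute value among the elements, $||x||_{\infty} = \max_i |x_i|$. As a short sketch, NumPy exposes it through `np.linalg.norm` with `ord=np.inf`; the vector `w` below is an illustrative choice with a negative entry.

```python
import numpy as np

w = np.array([7, 13, -42])                  # illustrative vector

max_manual = np.max(np.abs(w))              # largest absolute value
max_numpy = np.linalg.norm(w, ord=np.inf)   # same quantity via NumPy

print(max_manual, max_numpy)
```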

### Orthogonal Vectors

* x and y are orthogonal vectors if <font color=red>$x^Ty = 0$</font>
* n-dimensional space has at most n mutually orthogonal (non-zero) vectors
* <font color=blue>_"Orthonormal vectors"_</font> are orthogonal and all have unit norm
   * Standard basis vectors are an example, e.g., i = (1, 0) and j = (0, 1)

In [32]:
i = np.array([1, 0])
i

array([1, 0])

In [33]:
j = np.array([0, 1])
j

array([0, 1])

In [34]:
np.dot(i, j) # detail on the dot operation coming up...

0
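
One way to check orthonormality for a whole set of vectors at once, sketched under the assumption that the vectors are stacked as columns of a matrix $Q$: the set is orthonormal exactly when $Q^TQ$ is the identity. The helper `is_orthonormal` is hypothetical and only for illustration.

```python
import numpy as np

def is_orthonormal(vectors, tol=1e-12):
    """Hypothetical helper: True if the vectors are mutually orthogonal with unit norm."""
    Q = np.column_stack(vectors).astype(float)  # one vector per column
    # Entry (a, b) of Q.T @ Q is the dot product of vector a with vector b
    return np.allclose(Q.T @ Q, np.eye(Q.shape[1]), atol=tol)

i = np.array([1, 0])
j = np.array([0, 1])

print(is_orthonormal([i, j]))                 # standard basis vectors
print(is_orthonormal([i, np.array([1, 1])]))  # not orthogonal, not unit norm
```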

### Matrices (Rank 2 Tensors) in NumPy

* Two-dimensional array of numbers
* Denoted in <font color=blue>_Uppercase, Italics, Bold_</font> , e.g.: <font color=green><b>_X_</b></font>
* Individual scalar elements denoted in uppercase, italics only
    * Element in top-left corner of matrix <font color=green><b>_X_</b></font> above would be <font color=green>$X_{0, 0}$</font>
    

In [35]:
# Use array() with nested brackets: 
X = np.array([[25, 2], [5, 26], [3, 7]])
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [36]:
X.shape

(3, 2)

In [37]:
X.size

6

In [38]:
# Select left column of matrix X (zero-indexed)
X[:,0]

array([25,  5,  3])

In [39]:
# Select middle row of matrix X: 
X[1,:]

array([ 5, 26])

In [40]:
# Another slicing-by-index example: 
X[0:2, 0:2]

array([[25,  2],
       [ 5, 26]])

### Matrices in PyTorch


In [41]:
X_pt = torch.tensor([[25, 2], [5, 26], [3, 7]])
X_pt

tensor([[25,  2],
        [ 5, 26],
        [ 3,  7]])

In [42]:
X_pt.shape

torch.Size([3, 2])

In [43]:
X_pt[1,:]

tensor([ 5, 26])

### Higher-Rank Tensors

* Denoted in <font color=blue>_Uppercase, Italics, Bold, Sans Serif_</font> 

As an example, rank 4 tensors are common for images, where each dimension corresponds to: 

1. Number of images in training batch, e.g., 32
2. Image height in pixels, e.g., 28
3. Image width in pixels, e.g., 28
4. Number of color channels, e.g., 3 for full-color images (RGB)

In [44]:
images_pt = torch.zeros([32, 28, 28, 3])

In [46]:
#images_pt
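
A brief sketch of how the four axes of such a batch can be inspected and indexed. It uses NumPy rather than PyTorch so that it runs without extra dependencies; the name `images` and the batch dimensions are illustrative.

```python
import numpy as np

# Batch of 32 images, 28 x 28 pixels, 3 color channels
images = np.zeros((32, 28, 28, 3))

print(images.ndim)     # number of axes (the rank of the tensor)
print(images.shape)    # size along each axis
print(images.size)     # total number of scalar elements

first_image = images[0]              # rank 3: height x width x channels
first_channel = images[0, :, :, 0]   # rank 2: height x width for one channel
print(first_image.shape, first_channel.shape)
```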

## Segment 2: Common Tensor Operations

### Tensor Transposition

In [47]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [48]:
X.shape

(3, 2)

In [49]:
X.T

array([[25,  5,  3],
       [ 2, 26,  7]])

In [50]:
X_pt.T

tensor([[25,  5,  3],
        [ 2, 26,  7]])

In [51]:
X.T.shape

(2, 3)

### Basic Arithmetical Properties
Adding or multiplying with a scalar applies the operation to every element, and the tensor shape is retained:

In [52]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [53]:
X*2

array([[50,  4],
       [10, 52],
       [ 6, 14]])

In [54]:
X+2

array([[27,  4],
       [ 7, 28],
       [ 5,  9]])

In [55]:
X*2+2

array([[52,  6],
       [12, 54],
       [ 8, 16]])

In [56]:
X_pt*2+2 # could alternatively use torch.mul() or torch.add()

tensor([[52,  6],
        [12, 54],
        [ 8, 16]])

In [57]:
torch.add(torch.mul(X_pt, 2), 2)

tensor([[52,  6],
        [12, 54],
        [ 8, 16]])

### Hadamard product
If two tensors have the same size, operations are often by default applied element-wise. This is **not matrix multiplication**, which we'll cover later, but is rather called the <font color=red>**Hadamard product**</font> or simply the <font color=red>**element-wise product**. </font>

The mathematical notation is $A \odot X$

In [58]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [59]:
A = X+2
A

array([[27,  4],
       [ 7, 28],
       [ 5,  9]])

In [60]:
A * X

array([[675,   8],
       [ 35, 728],
       [ 15,  63]])

In [61]:
A_pt = X_pt + 2

In [62]:
A_pt * X_pt

tensor([[675,   8],
        [ 35, 728],
        [ 15,  63]])

### Reduction

Calculating the sum across all elements of a tensor is a common operation. For example: 

* For vector ***x*** of length *n*, we calculate $\sum_{i=1}^{n} x_i$
* For matrix ***X*** with *m* by *n* dimensions, we calculate $\sum_{i=1}^{m} \sum_{j=1}^{n} X_{i,j}$

In [63]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [64]:
X.sum()

68

In [65]:
torch.sum(X_pt)

tensor(68)

In [66]:
# Can also be done along one specific axis alone, e.g.:
X.sum(axis=0) # summing over all rows

array([33, 35])

In [67]:
X.sum(axis=1) # summing over all columns

array([27, 31, 10])

In [68]:
torch.sum(X_pt, 0)

tensor([33, 35])

In [69]:
torch.sum(X_pt, 1)

tensor([27, 31, 10])

Many other operations can be applied with reduction along all or a selection of axes, e.g.:

* maximum
* minimum
* mean
* product

They're fairly straightforward and used less often than summation, so you're welcome to look them up in library docs if you ever need them.
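
For reference, a minimal sketch of these reductions in NumPy, applied both over all elements and along a single axis. The matrix `M` repeats the values used earlier under a new, illustrative name.

```python
import numpy as np

M = np.array([[25, 2], [5, 26], [3, 7]])

col_max = M.max(axis=0)     # maximum of each column
row_min = M.min(axis=1)     # minimum of each row
overall_mean = M.mean()     # mean over all elements
row_prod = M.prod(axis=1)   # product of each row

print(col_max, row_min, overall_mean, row_prod)
```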

### The Dot Product

For two vectors with the same length *n*, $x, y \in \mathbb{R}^n$, the dot product can be denoted in any of the following ways:
* $x \cdot y$
* $x^Ty$
* $\langle x,y \rangle$

Regardless of which notation you use (I prefer the first), the calculation is the same; we calculate products in an element-wise fashion and then sum reductively across the products to a scalar value.

$x \cdot y = x^T y = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\[0.3em] y_2 \\[0.3em] \vdots \\[0.3em] y_n \end{bmatrix} = \sum_{i=1}^{n}{x_i y_i} \in \mathbb{R}$

The inner product is a special case of matrix multiplication.

It is always the case that $x^T y = y^T x$.
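
A small sketch of both points: the dot product equals the sum of the element-wise products, and swapping the operands gives the same scalar. The names `a` and `b` are illustrative.

```python
import numpy as np

a = np.array([7, 13, 42])
b = np.array([0, 1, 2])

elementwise_then_sum = np.sum(a * b)  # multiply element-wise, then reduce by summing
a_dot_b = np.dot(a, b)
b_dot_a = np.dot(b, a)

print(elementwise_then_sum, a_dot_b, b_dot_a)
```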

In [70]:
x

array([ 7, 13, 42])

In [71]:
y = np.array([0, 1, 2])
y

array([0, 1, 2])

In [72]:
7*0 + 13*1 + 42*2

97

In [73]:
np.dot(x, y)

97

In [74]:
x_pt

tensor([ 7, 13, 42])

In [75]:
y_pt = torch.tensor([0, 1, 2])
y_pt

tensor([0, 1, 2])

In [76]:
np.dot(x_pt, y_pt)

97

In [79]:
torch.dot(torch.tensor([7, 13, 42.]), torch.tensor([0, 1, 2.]))

tensor(97.)

## Segment 3: Matrix Properties

### Frobenius Norm

* Described by: <font color=red>$||X||_{F}=\sqrt{\sum_{i, j} X_{i, j}^2}$</font>
* Analogous to $L^2$ norm of vector
* Measures the size of matrix in terms of Euclidean distance

In [80]:
X = np.array([[1, 2], [3, 4]])
X

array([[1, 2],
       [3, 4]])

In [81]:
(1**2 + 2**2 + 3**2 + 4**2)**(1/2)

5.477225575051661

In [82]:
np.linalg.norm(X) # same function as for vector L2 norm

5.477225575051661

In [83]:
X_pt = torch.tensor([[1, 2], [3, 4.]]) # torch.norm() supports floats only

In [84]:
torch.norm(X_pt)

tensor(5.4772)
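
The analogy with the vector $L^2$ norm can be made concrete: as a sketch, the Frobenius norm of a matrix matches the $L^2$ norm of the same entries flattened into a vector. The name `M` is illustrative.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

fro = np.linalg.norm(M, ord='fro')     # Frobenius norm, requested explicitly
l2_of_flat = np.linalg.norm(M.ravel()) # L2 norm of the flattened entries

print(fro, l2_of_flat)
```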

### Matrix Multiplication (with a Vector)

In [85]:
A = np.array([[3, 4], [5, 6], [7, 8]])
A

array([[3, 4],
       [5, 6],
       [7, 8]])

In [86]:
b = np.array([1, 2])
b

array([1, 2])

In [87]:
np.dot(A, b) # even though technically dot products are between vectors only

array([11, 17, 23])

In [88]:
A_pt = torch.tensor([[3, 4], [5, 6], [7, 8]])
A_pt

tensor([[3, 4],
        [5, 6],
        [7, 8]])

In [89]:
b_pt = torch.tensor([1, 2])
b_pt

tensor([1, 2])

In [90]:
torch.matmul(A_pt, b_pt)

tensor([11, 17, 23])

### Matrix Multiplication (with Two Matrices)
The result of multiplying two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is the matrix:

$C = AB \in \mathbb{R}^{m \times p}$

That is, we take the dot product of each row of $A$ with each column of $B$:

$C_{ij}=\sum_{k=1}^n{A_{ik}B_{kj}}$

The number of columns in $A$ must be equal to the number of rows in $B$.
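
To make the index bookkeeping explicit, here is a minimal sketch of matrix multiplication written as plain loops, where entry $(i, j)$ of the product is the sum over $k$ of $A_{ik} B_{kj}$. The function `matmul_loops` is a hypothetical helper for illustration; in practice `np.dot` or the `@` operator is far faster.

```python
import numpy as np

def matmul_loops(A, B):
    """Hypothetical helper: multiply an (m x n) matrix by an (n x p) matrix with explicit loops."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[3, 4], [5, 6], [7, 8]])  # shape (3, 2)
B = np.array([[1, 9], [2, 0]])          # shape (2, 2)

C = matmul_loops(A, B)
print(C)
print(C.shape)  # (m, p), here (3, 2)
```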

In [91]:
A

array([[3, 4],
       [5, 6],
       [7, 8]])

In [92]:
B = np.array([[1, 9], [2, 0]])
B

array([[1, 9],
       [2, 0]])

In [93]:
np.dot(A, B)

array([[11, 27],
       [17, 45],
       [23, 63]])

Note that matrix multiplication is not "commutative" (i.e., in general $AB \neq BA$). Here $BA$ is not even defined, because $B$ has two columns but $A$ has three rows, so uncommenting the following line will throw a size mismatch error:

In [95]:
#np.dot(B, A)
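
A short sketch of both failure modes of commutativity. With two square matrices (the illustrative `P` and `Q` below), both products exist but differ; with the shapes used above, the reversed product raises a `ValueError` in NumPy.

```python
import numpy as np

# Square case: both products are defined but are not equal
P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])
PQ = P @ Q
QP = Q @ P
print(PQ)
print(QP)
print(np.array_equal(PQ, QP))

# Shape-mismatch case: (2 x 2) times (3 x 2) is not defined
A = np.array([[3, 4], [5, 6], [7, 8]])
B = np.array([[1, 9], [2, 0]])
try:
    np.dot(B, A)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```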

In [96]:
B_pt = torch.from_numpy(B)
B_pt

tensor([[1, 9],
        [2, 0]])

In [97]:
# another neat way to create the same tensor with transposition: 
B_pt = torch.tensor([[1, 2], [9, 0]]).T
B_pt

tensor([[1, 9],
        [2, 0]])

In [98]:
torch.matmul(A_pt, B_pt)

tensor([[11, 27],
        [17, 45],
        [23, 63]])

## Special Matrices

### Symmetric Matrices

* Square
* $X^T = X$

\begin{pmatrix}
0 & 1 & 2\\
1 & 7 & 8\\
2 & 8 & 9
\end{pmatrix}

In [99]:
X_sym = np.array([[0, 1, 2], [1, 7, 8], [2, 8, 9]])

In [100]:
X_sym.T == X_sym

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
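
The element-wise comparison above can be collapsed into a single boolean. A minimal sketch, where `is_symmetric` is a hypothetical helper name:

```python
import numpy as np

def is_symmetric(M):
    """Hypothetical helper: True if M is a square 2-D array equal to its transpose."""
    return M.ndim == 2 and M.shape[0] == M.shape[1] and np.array_equal(M, M.T)

S = np.array([[0, 1, 2], [1, 7, 8], [2, 8, 9]])
N = np.array([[25, 2], [5, 26], [3, 7]])  # not square, so not symmetric

print(is_symmetric(S))
print(is_symmetric(N))
```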

### Identity Matrices

\begin{pmatrix}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{pmatrix}

In [101]:
I = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

In [102]:
x_pt = torch.tensor([25, 2, 5])
x_pt

tensor([25,  2,  5])

In [103]:
torch.matmul(I, x_pt)

tensor([25,  2,  5])

### Hilbert matrix
In linear algebra, a Hilbert matrix, introduced by Hilbert (1894), is a ***square matrix*** with entries being the unit fractions:

$$ H_{i,j} = \frac{1}{i + j - 1}$$

\begin{pmatrix}
1 & 1/2 & 1/3\\
1/2 & 1/3 & 1/4\\
1/3 & 1/4 & 1/5
\end{pmatrix}

The Hilbert matrix can be regarded as derived from the integral

$$H_{i,j} = \int_{0}^1 x^{i+j-2}dx$$

Hilbert matrices are examples of "ill-conditioned" matrices that are difficult to use in numerical calculations.

In [104]:
from scipy.linalg import hilbert

In [105]:
hilbert(3)

array([[1.        , 0.5       , 0.33333333],
       [0.5       , 0.33333333, 0.25      ],
       [0.33333333, 0.25      , 0.2       ]])
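
As a sketch, the same matrix can be built directly from $H_{i,j} = \frac{1}{i + j - 1}$ with NumPy broadcasting, and `np.linalg.cond` gives a sense of how quickly the conditioning worsens as the size grows. The helper `hilbert_matrix` is a hypothetical name; no specific condition-number values are claimed here.

```python
import numpy as np

def hilbert_matrix(n):
    """Hypothetical helper: n x n Hilbert matrix using 1-based indices i, j."""
    idx = np.arange(1, n + 1)
    # Broadcasting a column of i against a row of j gives every i + j - 1
    return 1.0 / (idx[:, None] + idx[None, :] - 1)

H3 = hilbert_matrix(3)
print(H3)

# A larger condition number means solutions are more sensitive to small errors
for n in (3, 6, 9):
    print(n, np.linalg.cond(hilbert_matrix(n)))
```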

### Hermitian matrix
In mathematics, a **Hermitian matrix** (or **self-adjoint matrix**) is a **complex square matrix** that is equal to its own **conjugate transpose**; that is, the element in the i-th row and j-th column is equal to the complex conjugate of the element in the j-th row and i-th column for all indices i and j:

$$ A \text{ Hermitian} \iff a_{ij} = \overline{a_{ji}} $$

\begin{pmatrix}
3 & 3 + i\\
3 - i & 2
\end{pmatrix}

Complex conjugate of each element:

\begin{pmatrix}
3 & 3 - i\\
3 + i & 2
\end{pmatrix}

Transpose of the complex conjugate (equal to the original matrix):

\begin{pmatrix}
3 & 3 + i\\
3 - i & 2
\end{pmatrix}
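
The same two steps can be sketched in NumPy, which writes the imaginary unit as `1j`. The variable names are illustrative.

```python
import numpy as np

A = np.array([[3, 3 + 1j], [3 - 1j, 2]])

A_conj = A.conj()          # complex conjugate of every element
A_conj_T = A_conj.T        # then transpose

print(A_conj)
print(A_conj_T)
print(np.array_equal(A, A_conj_T))  # Hermitian if equal to its conjugate transpose
```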


### Jacobian matrix

$$ J_{i, j} = \displaystyle \frac{\partial f_i}{\partial x_{j}} $$

\begin{equation}
J_{f}=\begin{bmatrix}\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_{2}} & \cdots & \frac{\partial f_1}{\partial x_{n}}\\
\frac{\partial f_2}{\partial x_{1}} & \frac{\partial f_2}{\partial x_{2}} & \cdots & \frac{\partial f_2}{\partial x_{n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial f_m}{\partial x_{1}} & \frac{\partial f_m}{\partial x_{2}} & \cdots & \frac{\partial f_m}{\partial x_{n}}
\end{bmatrix}
\end{equation}
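
A minimal numerical sketch of the definition: each column of the Jacobian is approximated with a central finite difference along one input coordinate. The function `f`, the helper `numerical_jacobian`, the step size `eps`, and the evaluation point are all assumptions chosen for illustration; automatic differentiation libraries compute this exactly instead.

```python
import numpy as np

def f(x):
    """Illustrative function from R^2 to R^2."""
    return np.array([x[0] ** 2 * x[1], 5 * x[0] + np.sin(x[1])])

def numerical_jacobian(func, x, eps=1e-6):
    """Hypothetical helper: central-difference approximation of J[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    m = func(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (func(x + step) - func(x - step)) / (2 * eps)
    return J

x0 = np.array([1.0, 2.0])
J_num = numerical_jacobian(f, x0)

# Hand-derived partial derivatives of f for comparison
J_exact = np.array([[2 * x0[0] * x0[1], x0[0] ** 2],
                    [5.0, np.cos(x0[1])]])
print(J_num)
print(J_exact)
```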

### Hessian matrix

\begin{equation}
H_{f}=\begin{bmatrix}\frac{\partial^{2}f}{\partial x_{1}^{2}} & \frac{\partial^{2}f}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{1}\partial x_{n}}\\
\frac{\partial^{2}f}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{2}^{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{2}\partial x_{n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial^{2}f}{\partial x_{n}\partial x_{1}} & \frac{\partial^{2}f}{\partial x_{n}\partial x_{2}} & \cdots & \frac{\partial^{2}f}{\partial x_{n}^{2}}
\end{bmatrix}
\end{equation}

That is, the entry of the i-th row and the j-th column is:

$$ (H_f)_{i,j} = \displaystyle \frac{\partial^{2}f}{\partial x_{i}\partial x_{j}} $$

The Hessian matrix of a function $f$ is the ***Jacobian matrix*** of the gradient of the function $f$; that is:

$$ H(f(x)) = J(\nabla f(x))$$
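
The identity above suggests a simple numerical sketch: differentiate the gradient with finite differences to approximate the Hessian. The scalar function $f(x) = x_0^2 x_1 + x_1^3$, its hand-derived gradient `grad_f`, and the helper `numerical_jacobian` are assumptions made for this example.

```python
import numpy as np

def grad_f(x):
    """Gradient of the illustrative function f(x) = x0^2 * x1 + x1^3."""
    return np.array([2 * x[0] * x[1], x[0] ** 2 + 3 * x[1] ** 2])

def numerical_jacobian(func, x, eps=1e-6):
    """Hypothetical helper: central-difference Jacobian of a vector-valued function."""
    x = np.asarray(x, dtype=float)
    m = func(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (func(x + step) - func(x - step)) / (2 * eps)
    return J

x0 = np.array([1.0, 2.0])

# Hessian of f approximated as the Jacobian of its gradient
H_num = numerical_jacobian(grad_f, x0)

# Hand-derived second partial derivatives for comparison
H_exact = np.array([[2 * x0[1], 2 * x0[0]],
                    [2 * x0[0], 6 * x0[1]]])
print(H_num)
print(H_exact)
```

For this smooth function the mixed partial derivatives agree, so the Hessian is symmetric.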


## Matrix Inversion

$X^{-1}X = I_n$

In [106]:
X = np.array([[4, 2], [-5, -3]])
X

array([[ 4,  2],
       [-5, -3]])

In [107]:
Xinv = np.linalg.inv(X)
Xinv

array([[ 1.5,  1. ],
       [-2.5, -2. ]])
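
A quick sketch to confirm that the computed inverse satisfies the defining property, multiplying on either side to recover the identity (up to floating-point rounding). The names `M` and `M_inv` are illustrative.

```python
import numpy as np

M = np.array([[4, 2], [-5, -3]])
M_inv = np.linalg.inv(M)

left = M_inv @ M    # should be close to the 2 x 2 identity
right = M @ M_inv   # the inverse works from both sides

print(left)
print(np.allclose(left, np.eye(2)), np.allclose(right, np.eye(2)))
```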

### Matrix Inversion Where No Solution
The inverse can only be calculated if:
* Matrix isn't "singular"
   * that is, all columns of the matrix must be linearly independent
* Matrix is square $n_{row} = n_{col}$ (i.e. "vector span" = "matrix range")
   * avoids "Overdetermination" $n_{row} > n_{col}$ (i.e. $n_{equations} > n_{dimensions}$)
   * avoids "Underdetermination" $n_{row} < n_{col}$ (i.e. $n_{equations} < n_{dimensions}$)

Note that solving the equation may still be possible by other means if the matrix can't be inverted.

In [108]:
X = np.array([[-4, 1], [-8, 2]])
X

array([[-4,  1],
       [-8,  2]])

In [None]:
# Uncommenting the following line results in a "singular matrix" error
# Xinv = np.linalg.inv(X)
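
A hedged sketch of how the singular case can be detected and handled. The second column of this matrix is a multiple of the first, so the rank is 1 and `np.linalg.inv` raises `LinAlgError`. As one example of the "other means" mentioned above, the Moore-Penrose pseudoinverse (`np.linalg.pinv`) still yields a solution when the right-hand side lies in the column space; the names `S`, `y`, and `w` are illustrative, and this technique is not part of the original notebook.

```python
import numpy as np

S = np.array([[-4, 1], [-8, 2]])

print(np.linalg.matrix_rank(S))  # fewer than 2 means the columns are linearly dependent

try:
    np.linalg.inv(S)
    inverted = True
except np.linalg.LinAlgError as err:
    inverted = False
    print("LinAlgError:", err)

# Assumption for illustration: y is chosen to lie in the column space of S
y = S @ np.array([1.0, 1.0])
w = np.linalg.pinv(S) @ y        # minimum-norm solution via the pseudoinverse
print(w)
print(np.allclose(S @ w, y))
```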