<a href="https://colab.research.google.com/github/2k177/ML/blob/main/ML_foundation/notebooks/1_linear_algebra_into.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Segment 1: Data Structures for Algebra

### Scalars (Rank 0 Tensors) in Base Python

In [1]:
x = 25
x

25

In [2]:
type(x)

int

In [3]:
y = 3

In [4]:
xy_sum = x + y  # named xy_sum to avoid shadowing Python's built-in sum()
xy_sum

28

In [5]:
x_float = 25.0
float_sum = x_float + y
float_sum

28.0

In [6]:
type(float_sum)

float

### Scalars in PyTorch


1. PyTorch and TensorFlow are the two most popular automatic differentiation libraries in Python (automatic differentiation is a focus of the Calculus I and Calculus II subjects in the ML Foundations series), and Python is itself the most popular programming language in ML.
2. PyTorch tensors are designed to be Pythonic, i.e., to feel and behave like NumPy arrays.
3. The advantage of PyTorch tensors over NumPy arrays is that they can easily be used for operations on a GPU.
4. Documentation on PyTorch tensors, including the available data types, can be found in the official PyTorch docs.

In [7]:
import torch

In [8]:
x_pt = torch.tensor(25)  # type specification is optional

In [9]:
x_pt

tensor(25)

In [10]:
x_pt = torch.tensor(25, dtype=torch.float16)

In [11]:
x_pt

tensor(25., dtype=torch.float16)

In [12]:
x_pt.shape  # no dimensions, because this is a scalar tensor

torch.Size([])

### Scalars in TensorFlow (version 2.0 or later)

Tensors are created with a wrapper, all of which [you can read about here](https://www.tensorflow.org/guide/tensor):

* `tf.Variable`
* `tf.constant`
* `tf.placeholder` (TensorFlow 1.x only; in 2.x it is available as `tf.compat.v1.placeholder`)
* `tf.SparseTensor`

The most widely used is `tf.Variable`, which we'll use here.

As with PyTorch tensors, we can perform operations on TensorFlow tensors, and we can easily convert them to and from NumPy arrays.

A full list of tensor data types is available [here](https://www.tensorflow.org/api_docs/python/tf/dtypes/DType).

In [13]:
import tensorflow as tf

In [14]:
x_tf = tf.Variable(25, dtype=tf.int16)
x_tf

<tf.Variable 'Variable:0' shape=() dtype=int16, numpy=25>

In [15]:
x_tf.shape

TensorShape([])

In [16]:
y_tf = tf.Variable(3, dtype=tf.int16)

In [17]:
x_tf + y_tf

<tf.Tensor: shape=(), dtype=int16, numpy=28>

In [18]:
x_tf + 2  # a tf.Variable supports the standard Python operators

<tf.Tensor: shape=(), dtype=int16, numpy=27>

In [19]:
tf_sum = tf.add(x_tf, y_tf)
tf_sum

<tf.Tensor: shape=(), dtype=int16, numpy=28>

In [20]:
tf_sum.numpy()

28

## Vectors (Rank 1 Tensors) in NumPy

In [21]:
import numpy as np

In [22]:
x = np.array([25, 2, 5])
x

array([25,  2,  5])

In [23]:
len(x)

3

In [24]:
x.shape

(3,)

In [25]:
type(x)

numpy.ndarray

In [26]:
x[0]

25

In [27]:
type(x[0])

numpy.int64

## Vector transposition

In [28]:
# Transposing a regular 1-D array has no effect
x_t = x.T
x_t

array([25,  2,  5])

In [29]:
x_t.shape

(3,)

In [30]:
y = np.array([[12,2,5]])
y

array([[12,  2,  5]])

In [31]:
y.shape

(1, 3)

In [32]:
y_t = y.T
y_t

array([[12],
       [ 2],
       [ 5]])

In [33]:
y_t.shape

(3, 1)

In [34]:
y_t.T

array([[12,  2,  5]])

In [35]:
y_t.T.shape

(1, 3)

## Zero vectors
A zero vector has no effect when added to another vector.

In [36]:
z = np.zeros(3)
x

array([25,  2,  5])
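
To make the "no effect" property concrete, the short sketch below adds a zero vector to the `x` defined earlier. The name `x_plus_z` is introduced here purely for illustration.

```python
import numpy as np

x = np.array([25, 2, 5])
z = np.zeros(3)  # zero vector with the same length as x

x_plus_z = x + z  # element-wise addition

# The values are unchanged; only the dtype is promoted, because np.zeros
# returns float64 by default.
print(x_plus_z)
print(np.array_equal(x_plus_z, x))
```

Passing `dtype=int` to `np.zeros` would keep the result as an integer array.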

## Vectors in PyTorch and TensorFlow

In [37]:
x_pt = torch.tensor([25,5,3])
x_pt

tensor([25,  5,  3])

In [38]:
x_tf = tf.Variable([25, 2, 5])
x_tf

<tf.Variable 'Variable:0' shape=(3,) dtype=int32, numpy=array([25,  2,  5], dtype=int32)>

## $L^2$ Norm

In [39]:
x

array([25,  2,  5])

In [40]:
(25**2 + 2**2 + 5**2) ** (1/2)  # Euclidean distance from the origin

25.573423705088842

In [41]:
np.linalg.norm(x)

25.573423705088842

## $L^1$ Norm

In [42]:
x

array([25,  2,  5])

In [43]:
np.abs(25) + np.abs(2) + np.abs(5)

32

## Squared $L^2$ Norm

In [44]:
x

array([25,  2,  5])

In [45]:
(25**2 + 2**2 + 5**2)

654

In [46]:
np.dot(x, x)  # squared L2 norm = x^T x

654

## Max norm

In [47]:
x

array([25,  2,  5])

In [48]:
np.max([np.abs(25), np.abs(2), np.abs(5)])

25
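
The norms computed by hand above are $\|x\|_1 = \sum_i |x_i|$, $\|x\|_2 = \sqrt{\sum_i x_i^2}$, $\|x\|_2^2 = x^T x$ and $\|x\|_\infty = \max_i |x_i|$. As a minimal sketch, the same values can be obtained from `np.linalg.norm` through its `ord` argument. The helper `lp_norm` is a hypothetical name, added only to show the general $L^p$ definition.

```python
import numpy as np

x = np.array([25, 2, 5])

def lp_norm(v, p):
    """L^p norm computed directly from the definition (sum |v_i|^p)^(1/p)."""
    return np.sum(np.abs(v) ** p) ** (1 / p)

l1 = np.linalg.norm(x, ord=1)          # sum of absolute values
l2 = np.linalg.norm(x)                 # default for vectors is the L2 norm
l2_squared = np.dot(x, x)              # squared L2 norm, no square root needed
l_max = np.linalg.norm(x, ord=np.inf)  # largest absolute value

print(l1, l2, l2_squared, l_max)
print(lp_norm(x, 1), lp_norm(x, 2))
```

The squared $L^2$ norm is often preferred in practice because it avoids the square root and has a simpler derivative.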

## Orthogonal vectors

In [49]:
i = np.array([1, 0])
i

array([1, 0])

In [50]:
j = np.array([0, 1])
j

array([0, 1])

In [51]:
np.dot(i, j)

0
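
A dot product of zero means the two vectors are at 90 degrees to each other. The sketch below recovers the angle from $\cos\theta = \frac{x \cdot y}{\|x\|_2 \|y\|_2}$; the function name `angle_degrees` is a hypothetical helper, not part of NumPy.

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between two non-zero vectors, from cos(theta) = u.v / (|u| |v|)."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clipping guards against values slightly outside [-1, 1] caused by rounding.
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

i = np.array([1, 0])
j = np.array([0, 1])

print(angle_degrees(i, j))                 # orthogonal basis vectors
print(angle_degrees(i, np.array([1, 1])))  # not orthogonal
```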

## Matrices (Rank 2 Tensors) in NumPy

In [52]:
# Use array() with nested brackets
X = np.array([[25, 2], [5,26], [3,7]])
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [53]:
X.shape

(3, 2)

In [54]:
X.size

6

In [55]:
# Select the left column of matrix X (zero-indexed)
X[:,0]

array([25,  5,  3])

In [56]:
# middle row
X[1,:]

array([ 5, 26])

In [57]:
# Another example of slicing by index
X[0:2, 0:2]

array([[25,  2],
       [ 5, 26]])

## Matrices in PyTorch

In [58]:
X_pt = torch.tensor([[25,2], [5,26], [3,7]])
X_pt

tensor([[25,  2],
        [ 5, 26],
        [ 3,  7]])

In [59]:
X_pt.shape

torch.Size([3, 2])

In [60]:
X_pt[1,:]

tensor([ 5, 26])

## Matrices in TensorFlow

In [61]:
X_tf = tf.Variable([[25,3], [2,5], [3,11]])
X_tf

<tf.Variable 'Variable:0' shape=(3, 2) dtype=int32, numpy=
array([[25,  3],
       [ 2,  5],
       [ 3, 11]], dtype=int32)>

In [62]:
X_tf.shape

TensorShape([3, 2])

In [63]:
X_tf[0,:]

<tf.Tensor: shape=(2,), dtype=int32, numpy=array([25,  3], dtype=int32)>

In [64]:
tf.rank(X_tf)  # rank 2, i.e., two-dimensional

<tf.Tensor: shape=(), dtype=int32, numpy=2>

### Higher-Rank Tensors

As an example, rank 4 tensors are common for images, where each dimension corresponds to: 

1. Number of images in training batch, e.g., 32
2. Image height in pixels, e.g., 28 for [MNIST digits](http://yann.lecun.com/exdb/mnist/)
3. Image width in pixels, e.g., 28
4. Number of color channels, e.g., 3 for full-color images (RGB)
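
As a rough illustration of the four dimensions listed above, the NumPy sketch below builds an all-zero batch with the same example sizes and indexes into it. The variable names are hypothetical, and calling channel 0 "red" assumes RGB ordering.

```python
import numpy as np

# Hypothetical batch: 32 images, 28 x 28 pixels, 3 color channels.
images = np.zeros((32, 28, 28, 3))

print(images.ndim)   # number of dimensions (the tensor's rank)
print(images.shape)

first_image = images[0]           # one image: height x width x channels
red_channel = images[0, :, :, 0]  # channel 0 of that image (assumed to be red)

print(first_image.shape, red_channel.shape)
```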

In [65]:
image_pt = torch.zeros([32, 28, 28, 3])
image_pt

tensor([[[[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.],
          ...
(output truncated; all 32 x 28 x 28 x 3 entries are 0.)

In [66]:
image_tf = tf.zeros([32, 28, 28, 3])
image_tf

<tf.Tensor: shape=(32, 28, 28, 3), dtype=float32, numpy=
array([[[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         ...
(output truncated; all 32 x 28 x 28 x 3 entries are 0.)

## Segment 2: Common Tensor Operations

### Tensor Transposition

In [67]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [68]:
X.T

array([[25,  5,  3],
       [ 2, 26,  7]])

In [69]:
X_pt.T

tensor([[25,  5,  3],
        [ 2, 26,  7]])

In [70]:
tf.transpose(X_tf)

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[25,  2,  3],
       [ 3,  5, 11]], dtype=int32)>

## Basic Arithmetic Operations

In [71]:
X*2

array([[50,  4],
       [10, 52],
       [ 6, 14]])

In [72]:
X+2

array([[27,  4],
       [ 7, 28],
       [ 5,  9]])

In [73]:
X*2+2

array([[52,  6],
       [12, 54],
       [ 8, 16]])

In [74]:
X_pt*2+2  # Python operators are overloaded

tensor([[52,  6],
        [12, 54],
        [ 8, 16]])

In [75]:
torch.add(torch.mul(X_pt,2),2)

tensor([[52,  6],
        [12, 54],
        [ 8, 16]])

In [76]:
X_tf*2+2

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[52,  8],
       [ 6, 12],
       [ 8, 24]], dtype=int32)>

In [77]:
tf.add(tf.multiply(X_tf,2),2)

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[52,  8],
       [ 6, 12],
       [ 8, 24]], dtype=int32)>

If two tensors have the same size, operations are often by default applied element-wise. This is **not matrix multiplication**, which we'll cover later, but is rather called the **Hadamard product** or simply the **element-wise product**. 

The mathematical notation is $A \odot X$
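
To make the distinction concrete, here is a minimal NumPy sketch contrasting the Hadamard product (`*`) with matrix multiplication (`@`). It reuses the values of `X` from above and takes `A = X + 2` for illustration.

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])
A = X + 2

hadamard = A * X  # element-wise: shapes must match, result is 3 x 2
matmul = A.T @ X  # matrix product: (2 x 3) @ (3 x 2) gives 2 x 2

print(hadamard)
print(matmul)
```

Note that `A @ X` itself would fail here, because the inner dimensions (2 and 3) do not match.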

In [78]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [79]:
A = X+2
A

array([[27,  4],
       [ 7, 28],
       [ 5,  9]])

In [80]:
A+X

array([[52,  6],
       [12, 54],
       [ 8, 16]])

In [81]:
A * X

array([[675,   8],
       [ 35, 728],
       [ 15,  63]])

In [82]:
A_pt = X_pt + 2

In [83]:
A_pt

tensor([[27,  4],
        [ 7, 28],
        [ 5,  9]])

In [84]:
A_pt * X_pt

tensor([[675,   8],
        [ 35, 728],
        [ 15,  63]])

In [85]:
A_pt + X_pt

tensor([[52,  6],
        [12, 54],
        [ 8, 16]])

In [86]:
A_tf = X_tf + 2

In [87]:
A_tf + X_tf

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[52,  8],
       [ 6, 12],
       [ 8, 24]], dtype=int32)>

In [88]:
A_tf * X_tf

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[675,  15],
       [  8,  35],
       [ 15, 143]], dtype=int32)>

## Reduction

Calculating the sum across all elements of a tensor is a common operation. For example: 

* For vector ***x*** of length *n*, we calculate $\sum_{i=1}^{n} x_i$
* For matrix ***X*** with *m* by *n* dimensions, we calculate $\sum_{i=1}^{m} \sum_{j=1}^{n} X_{i,j}$
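
The double sum for a matrix can be spelled out with two explicit loops, as in the sketch below. This is only meant to mirror the formula; in practice the built-in reductions are much faster.

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])
m, n = X.shape

total = 0
for i in range(m):      # outer sum over rows
    for j in range(n):  # inner sum over columns
        total += X[i, j]

print(total, X.sum())
```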

In [89]:
X

array([[25,  2],
       [ 5, 26],
       [ 3,  7]])

In [90]:
X.sum()

68

In [91]:
torch.sum(X_pt)

tensor(68)

In [92]:
tf.reduce_sum(X_tf)

<tf.Tensor: shape=(), dtype=int32, numpy=49>

In [93]:
X.sum(axis=0)  # sum over the rows (axis 0), giving one total per column

array([33, 35])

In [94]:
X.sum(axis=1)  # sum over the columns (axis 1), giving one total per row

array([27, 31, 10])

In [95]:
torch.sum(X_pt, 0)

tensor([33, 35])

In [96]:
torch.sum(X_pt, 1)

tensor([27, 31, 10])

In [97]:
tf.reduce_sum(X_tf,1)

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([28,  7, 14], dtype=int32)>

Many other operations can be applied with reduction along all or a selection of axes, e.g.:

* maximum
* minimum
* mean
* product

They're fairly straightforward and used less often than summation, so you're welcome to look them up in library docs if you ever need them.
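
For reference, a brief NumPy sketch of these reductions follows, using the same `X` as in the summation examples. PyTorch and TensorFlow offer analogous functions, which are not shown here.

```python
import numpy as np

X = np.array([[25, 2], [5, 26], [3, 7]])

col_max = X.max(axis=0)    # largest value in each column
overall_min = X.min()      # smallest value in the whole matrix
row_mean = X.mean(axis=1)  # mean of each row
product = X.prod()         # product of all elements

print(col_max, overall_min, row_mean, product)
```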

## The Dot Product

If we have two vectors (say, ***x*** and ***y***) with the same length *n*, we can calculate the dot product between them. This is denoted in several different ways, including the following:

* $x \cdot y$
* $x^Ty$
* $\langle x,y \rangle$

Regardless of which notation you use (I prefer the first), the calculation is the same: we calculate products in an element-wise fashion and then sum reductively across the products to a scalar value. That is, $x \cdot y = \sum_{i=1}^{n} x_i y_i$

The dot product is ubiquitous in deep learning: It is performed at every artificial neuron in a deep neural network, which may be made up of millions (or orders of magnitude more) of these neurons.
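
As a rough sketch of where the dot product appears in a single artificial neuron, the example below computes a weighted sum of inputs plus a bias and passes it through an activation. The function name `neuron`, the weights, the bias, and the choice of ReLU are all assumptions made for illustration.

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum of inputs plus a bias, passed through a ReLU activation."""
    z = np.dot(w, x) + b       # the dot product does most of the work
    return np.maximum(z, 0.0)  # ReLU: negative values are clipped to zero

x = np.array([25.0, 2.0, 5.0])  # inputs
w = np.array([0.0, 1.0, 2.0])   # hypothetical weights
b = -2.0                        # hypothetical bias

print(neuron(x, w, b))
```

A full layer applies many such neurons at once, which is why matrix multiplication follows naturally from the dot product.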

In [98]:
x

array([25,  2,  5])

In [99]:
y = np.array([0,1,2])
y

array([0, 1, 2])

In [100]:
25*0 + 2*1 + 5*2

12

In [101]:
np.dot(x,y)

12

In [102]:
x_pt

tensor([25,  5,  3])

In [103]:
y_pt = torch.tensor([0,1,2])
y_pt

tensor([0, 1, 2])

In [104]:
torch.dot(x_pt, y_pt)

tensor(11)

In [105]:
np.dot(x_pt, y_pt)

11

In [106]:
torch.dot(torch.tensor([25,2,5.]),torch.tensor([0,1,2.]))

tensor(12.)

In [107]:
x_tf

<tf.Variable 'Variable:0' shape=(3,) dtype=int32, numpy=array([25,  2,  5], dtype=int32)>

In [108]:
y_tf = tf.Variable([0,1,2])
y_tf

<tf.Variable 'Variable:0' shape=(3,) dtype=int32, numpy=array([0, 1, 2], dtype=int32)>

In [109]:
tf.reduce_sum(tf.multiply(x_tf, y_tf))

<tf.Tensor: shape=(), dtype=int32, numpy=12>

### Matrix Multiplication (with a Vector)

In [110]:
A = np.array([[3,4], [5,6], [7,8]])
A

array([[3, 4],
       [5, 6],
       [7, 8]])

In [111]:
b = np.array([1,2])
b

array([1, 2])

In [112]:
np.dot(A, b)  # np.dot performs both dot products and matrix multiplication

array([11, 17, 23])

In [113]:
A_pt = torch.tensor([[3,4], [5,6], [7,8]])
A_pt

tensor([[3, 4],
        [5, 6],
        [7, 8]])

In [114]:
b_pt = torch.tensor([1,2])
b_pt

tensor([1, 2])

In [115]:
torch.matmul(A_pt, b_pt)

tensor([11, 17, 23])

In [116]:
A_tf = tf.Variable([[3,4], [5,6], [7,8]])
A_tf

<tf.Variable 'Variable:0' shape=(3, 2) dtype=int32, numpy=
array([[3, 4],
       [5, 6],
       [7, 8]], dtype=int32)>

In [117]:
b_tf = tf.Variable([1,2])
b_tf

<tf.Variable 'Variable:0' shape=(2,) dtype=int32, numpy=array([1, 2], dtype=int32)>

In [118]:
tf.linalg.matvec(A_tf, b_tf)

<tf.Tensor: shape=(3,), dtype=int32, numpy=array([11, 17, 23], dtype=int32)>

## Matrix Multiplication (with Two Matrices)


In [119]:
A

array([[3, 4],
       [5, 6],
       [7, 8]])

In [120]:
B = np.array([[1,9], [2,0]])
B

array([[1, 9],
       [2, 0]])

In [121]:
np.dot(A,B)

array([[11, 27],
       [17, 45],
       [23, 63]])

Note that matrix multiplication is not commutative (i.e., $AB \neq BA$). Here, $BA$ is not even defined, because the 2 columns of $B$ do not match the 3 rows of $A$:

In [122]:
np.dot(B,A)

ValueError: ignored
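
The error above comes from incompatible shapes, but even when both products are defined they generally differ. A small sketch with two hypothetical square matrices `P` and `Q`:

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])

PQ = P @ Q  # multiplying by Q on the right swaps the columns of P
QP = Q @ P  # multiplying by Q on the left swaps the rows of P

print(PQ)
print(QP)
print(np.array_equal(PQ, QP))
```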

In [123]:
B_pt = torch.from_numpy(B)
B_pt

tensor([[1, 9],
        [2, 0]])

In [124]:
# Another neat way to create the same tensor, using transposition
B_pt = torch.tensor([[1,2],[9,0]]).T
B_pt

tensor([[1, 9],
        [2, 0]])

In [125]:
torch.matmul(A_pt, B_pt)

tensor([[11, 27],
        [17, 45],
        [23, 63]])

In [126]:
B_tf = tf.convert_to_tensor(B, dtype=tf.int32)
B_tf

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 9],
       [2, 0]], dtype=int32)>

In [127]:
tf.matmul(A_tf, B_tf) 

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[11, 27],
       [17, 45],
       [23, 63]], dtype=int32)>

## Symmetric Matrices

In [128]:
X_sym = np.array([[0,1,2],[1,7,8], [2,8,9]])
X_sym

array([[0, 1, 2],
       [1, 7, 8],
       [2, 8, 9]])

In [129]:
X_sym.T

array([[0, 1, 2],
       [1, 7, 8],
       [2, 8, 9]])

In [130]:
X_sym.T == X_sym

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

## Identity Matrices

In [131]:
I_pt = torch.tensor([[1,0,0], [0,1,0], [0,0,1]])
I_pt

tensor([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]])

In [132]:
x_pt = torch.tensor([25,2,5])
x_pt

tensor([25,  2,  5])

In [133]:
torch.matmul(I_pt, x_pt)

tensor([25,  2,  5])

## Frobenius Norm

In [134]:
X = np.array([[1,2],[3,4]])
X

array([[1, 2],
       [3, 4]])

In [135]:
(1**2 + 2**2 + 3**2 + 4**2) ** (1/2)

5.477225575051661

In [136]:
np.linalg.norm(X)  # the same function we used for the L2 norm

5.477225575051661

In [137]:
X_pt = torch.tensor([[1,2], [3,4.]])
X_pt

tensor([[1., 2.],
        [3., 4.]])

In [138]:
torch.norm(X_pt)

tensor(5.4772)

In [139]:
X_tf = tf.Variable([[1,2], [3,4.]])
X_tf

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[1., 2.],
       [3., 4.]], dtype=float32)>

In [140]:
tf.norm(X_tf)

<tf.Tensor: shape=(), dtype=float32, numpy=5.477226>

## Matrix Inversion

In [142]:
X = np.array([[4,2],[-5,-3]])
X

array([[ 4,  2],
       [-5, -3]])

In [149]:
y = np.array([4,-7])
y

array([ 4, -7])

In [150]:
Xinv = np.linalg.inv(X)
Xinv

array([[ 1.5,  1. ],
       [-2.5, -2. ]])

In [151]:
w = np.dot(Xinv, y)
w

array([-1.,  4.])

In [153]:
# Show that y = Xw
y = np.dot(X,w)
y

array([ 4., -7.])
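
The cells above solve $y = Xw$ by computing $X^{-1}$ explicitly. As an alternative (a different technique from the one used above), `np.linalg.solve` solves the system directly without forming the inverse. A minimal sketch with the same `X` and `y`:

```python
import numpy as np

X = np.array([[4, 2], [-5, -3]])
y = np.array([4, -7])

w = np.linalg.solve(X, y)  # solves Xw = y directly

print(w)
print(np.allclose(X @ w, y))  # check that the solution reproduces y
```

Avoiding the explicit inverse is generally preferred for numerical stability and speed when only the solution `w` is needed.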

In PyTorch and TensorFlow:

In [154]:
torch.inverse(torch.tensor([[4,2],[-5,-3.]]))  # torch.inverse needs a floating-point tensor

tensor([[ 1.5000,  1.0000],
        [-2.5000, -2.0000]])

In [155]:
tf.linalg.inv(tf.Variable([[4,2],[-5,-2.]]))  # note: the lower-right entry here is -2, not -3 as in X above, so the inverse differs

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-1.0000001, -1.0000001],
       [ 2.5000002,  2.0000002]], dtype=float32)>

## Matrix Inversion Where No Solution Exists

In [156]:
X = np.array([[-4,1],[-8,2]])
X

array([[-4,  1],
       [-8,  2]])

In [157]:
Xinv = np.linalg.inv(X) 

LinAlgError: ignored
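
The inversion fails because this matrix is singular: its second row is twice its first, so its determinant is zero. The sketch below checks this and catches the error instead of letting it propagate. The flag `inverted` is a hypothetical name used for illustration.

```python
import numpy as np

X = np.array([[-4, 1], [-8, 2]])

print(np.linalg.det(X))          # (-4)(2) - (1)(-8) = 0
print(np.linalg.matrix_rank(X))  # rank 1: the second row is twice the first

try:
    np.linalg.inv(X)
    inverted = True
except np.linalg.LinAlgError as err:
    inverted = False
    print("Inversion failed:", err)
```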

## Trace Operator

In [160]:
A = np.array([[25,2],[5,4]])
A

array([[25,  2],
       [ 5,  4]])

In [161]:
25+4

29

In [163]:
np.trace(A)

29

In [182]:
A_pt = torch.tensor([[25,2],[5,4.]])
A_pt

tensor([[25.,  2.],
        [ 5.,  4.]])

In [183]:
torch.trace(A_pt)

tensor(29.)

In [184]:
A_tf = tf.Variable([[25,2],[5,4.]])
A_tf

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[25.,  2.],
       [ 5.,  4.]], dtype=float32)>

In [185]:
tf.linalg.trace(A_tf)

<tf.Tensor: shape=(), dtype=float32, numpy=29.0>

In [186]:
# Check numerically that the Frobenius norm equals trace(A A^T) ** 0.5

In [187]:
A_pt

tensor([[25.,  2.],
        [ 5.,  4.]])

In [188]:
A_T = A_pt.T
A_T

tensor([[25.,  5.],
        [ 2.,  4.]])

In [192]:
result = torch.matmul(A_pt, A_pt.T)  # A A^T
torch.trace(result) ** 0.5  # square root of the trace

tensor(25.8844)

In [190]:
A_pt = torch.tensor([[25,2],[5,4.]])
torch.norm(A_pt)

tensor(25.8844)
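
The two results agree because each diagonal entry of $AA^T$ is the sum of squares of one row of $A$. Written out, with the indexing conventions assumed to match those used for reductions earlier:

$$
(AA^T)_{i,i} = \sum_{j} A_{i,j} A_{i,j} = \sum_{j} A_{i,j}^2
\quad\Longrightarrow\quad
\operatorname{Tr}(AA^T) = \sum_{i} \sum_{j} A_{i,j}^2 = \|A\|_F^2
$$

For the matrix above, $25^2 + 2^2 + 5^2 + 4^2 = 670$ and $\sqrt{670} \approx 25.8844$.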