<a href="https://colab.research.google.com/github/bharaniabhishek123/ML-Introduction/blob/main/Linear_Algebra_and_Pytorch_Tensor_Operations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Arguably the most important data structure from linear algebra for machine learning is the matrix: a 2-d array of numbers in which each entry is indexed by its row and column.


Think of an Excel spreadsheet in which offers from Company X and Company Y form two rows, and the columns represent characteristics of each offer, such as starting salary, bonus, or position.



``` 
              Salary        Bonus            Position

Google        150,000       23,000           Software Engineer

Facebook      180,000       27,000           Data Scientist

```
The table format is especially suited to keeping track of such data, because you can index by row and column to find, for example, Company X’s starting position. Matrices are similarly a multipurpose tool for holding all kinds of data; the data we work with in this book are numerical.




**In deep learning, matrices are often used to represent both datasets and weights in a neural network. A dataset, for example, has many individual data points with any number of associated features.**

$X_{m,n} =
 \begin{pmatrix}
  x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\
  x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\
  \vdots  & \vdots  & \ddots & \vdots  \\
  x_{m,1} & x_{m,2} & \cdots & x_{m,n}
 \end{pmatrix}$
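
As a rough illustration of the notation above, a dataset with m data points and n features can be stored as an m-by-n tensor. The values below are made up for the example and are not from any real dataset; note that PyTorch indexes from 0, while the formula above indexes from 1.

```python
import torch

# Hypothetical toy dataset: m = 4 data points (rows), n = 3 features (columns).
X = torch.tensor([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
])

m, n = X.shape        # number of data points, number of features
second_point = X[1]   # row index 1 holds x_{2,1} ... x_{2,n} in 1-based notation
entry = X[1, 2]       # the single entry x_{2,3}
```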

**Matrix Operations**

Matrices can be added, subtracted, and multiplied - there is no division of matrices, but there exists a similar concept called inversion. 

 $ \begin{pmatrix}
  2 & 3 & 4  \\
  5 & 6 & 7  \\       
 \end{pmatrix} 
       +
\begin{pmatrix}
  5 & 6 & 7  \\
  8 & 9 & 1  \\       
 \end{pmatrix}  
 = 
\begin{pmatrix}
  7 & 9 & 11  \\
  13 & 15 & 8  \\       
 \end{pmatrix}  \\ $


$2 * \begin{pmatrix}
  2 & 3 & 4  \\
  5 & 6 & 7  \\       
 \end{pmatrix} = 
 \begin{pmatrix} 2*2 & 2*3 & 2*4  \\ 2*5 & 2*6 & 2*7  \\   \end{pmatrix}
 = \begin{pmatrix} 4 & 6 & 8  \\ 10 & 12 & 14  \\       \end{pmatrix}  $

$ \begin{pmatrix}
  2 & 3   \\
  4 & 5   \\  \end{pmatrix} * \begin{pmatrix}
  5 & 6   \\
  7 & 8   \\       
 \end{pmatrix} = \begin{pmatrix} 2*5 + 3*7 & 2*6 + 3*8 \\ 4*5 + 5*7 & 4*6 + 5 * 8 \\       \end{pmatrix}= \begin{pmatrix} 31 &  36  \\ 55  & 64  \\       \end{pmatrix}  $
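
The three worked examples can be checked with a few lines of PyTorch. This is only a sketch; the variable names (`A`, `B`, `P`, `Q`) are chosen here for illustration.

```python
import torch

A = torch.tensor([[2, 3, 4], [5, 6, 7]])
B = torch.tensor([[5, 6, 7], [8, 9, 1]])

total = A + B     # elementwise sum of two 2x3 matrices
scaled = 2 * A    # every entry multiplied by the scalar 2

P = torch.tensor([[2, 3], [4, 5]])
Q = torch.tensor([[5, 6], [7, 8]])
product = P @ Q   # matrix product, equivalent to torch.matmul(P, Q)

print(total)
print(scaled)
print(product)
```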

Note: Two matrices can be multiplied only if their dimensions align, i.e. A is of dimension m by k and B is of dimension k by n.

In other words, each row of A and each column of B must have the same length k; equivalently, the number of columns of A must equal the number of rows of B.


If this weren’t the case, the formula for matrix multiplication would give us an indexing error. 
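
To make the formula concrete, here is a minimal pure-Python sketch of it, assuming matrices are given as lists of rows. The helper name `matmul` is chosen for this example only; in practice `torch.matmul` would be used instead.

```python
def matmul(A, B):
    """Multiply A (m x k) by B (k x n), both given as lists of rows."""
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        # Without this check, the sum below would run past the end of a row or column.
        raise ValueError("each row of A must have as many entries as B has rows")
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

C = matmul([[2, 3], [4, 5]], [[5, 6], [7, 8]])
print(C)
```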

We’ll call the formula for matrix multiplication presented above the dot product interpretation of matrix multiplication, which will make more sense after reading the Vector Operations section.

However, there are a few exceptions to this rule, which we will look at more closely in Broadcasting Rules. 



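Matrix multiplication is not commutative in general \\
$ A \cdot B \neq B \cdot A $ \\
Matrix multiplication is associative \\
$ A \cdot (B \cdot C) = (A \cdot B) \cdot C $ \\
Matrix multiplication is distributive over addition \\
$ A \cdot (B + C) = A \cdot B + A \cdot C $ \\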
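
These properties can be spot-checked numerically. The sketch below uses three small matrices chosen arbitrarily for illustration; a numerical check on one example is of course not a proof.

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[0., 1.], [1., 0.]])
C = torch.tensor([[2., 0.], [1., 3.]])

# For this particular pair, A @ B and B @ A differ.
commutes = torch.equal(A @ B, B @ A)

# Grouping of products does not matter.
associative = torch.allclose(A @ (B @ C), (A @ B) @ C)

# Multiplication distributes over addition.
distributive = torch.allclose(A @ (B + C), A @ B + A @ C)

print(commutes, associative, distributive)
```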



In [None]:
import numpy as np 
import torch 

In [None]:
n_arr = np.random.rand(3,3)
n_arr

In [None]:
py_tensor = torch.tensor(n_arr)
py_tensor


In [None]:
c = 10
new_py_tensor = py_tensor * c  # multiply every entry by the scalar c

**Broadcasting Rules** \\
Broadcasting describes how a tensor operation is carried out when the operands have different shapes. 
For example, an image input of 256x256x3 is a 3-d tensor, and stacking such images into mini-batches for SGD adds one more dimension. 

input_ = torch.empty(8, 1, 6, 1)
bias = torch.empty(7, 1, 5)

If we add, subtract, or multiply these two tensors elementwise,

the result has shape (8, 7, 6, 5).


### Starting from the last dimension and moving toward the first, two dimensions are broadcastable only when

## Rule Condition 1 : they are equal 
## or 
## Rule Condition 2 : either of them is 1
## or 
## Rule Condition 3 : either of them does not exist
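
The three conditions above can be sketched as a small helper that computes the resulting shape. `broadcast_shape` is a hypothetical name written for this example; it is compared against PyTorch's own `torch.broadcast_shapes`, which performs the same computation.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the last dimension toward the first.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-i] if i <= len(shape_a) else 1  # a missing dimension acts like 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"dimensions {a} and {b} are not broadcastable")
        result.append(b if a == 1 else a)
    return tuple(reversed(result))

shape = broadcast_shape((8, 1, 6, 1), (7, 1, 5))
matches_torch = shape == tuple(torch.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))
print(shape, matches_torch)
```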


In [1]:
input_ = torch.empty(8, 1, 6, 1)
bias = torch.empty(7, 1, 5)


In [2]:
y = input_ + bias
z = input_ * bias  # elementwise product, same as torch.mul(input_, bias)
u = input_ - bias


In [None]:
y.shape == z.shape == u.shape

In [3]:
y.shape


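## Whereas the following shapes are not broadcastable: the last dimensions (4 and 5) differ and neither is 1

In [None]:
input_ = torch.empty(8, 7, 6, 4)
bias = torch.empty(7, 6, 5)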
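
With these shapes, an elementwise operation is expected to fail. The sketch below catches the error so the cell still runs; the flag name `failed` is just for illustration.

```python
import torch

input_ = torch.empty(8, 7, 6, 4)
bias = torch.empty(7, 6, 5)

try:
    input_ + bias          # last dimensions 4 and 5 cannot be broadcast
    failed = False
except RuntimeError as err:
    failed = True
    print(err)             # PyTorch reports which dimension does not match
```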

## Tensor Multiplication

In [None]:
tensor1 = torch.tensor([[1,2],[3,4]])
tensor2 = torch.tensor([[1,2,3],[4,5,6]])

In [None]:
tensor3 = torch.matmul(tensor1,tensor2)
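
As a quick check of the shapes involved, a 2x2 matrix times a 2x3 matrix gives a 2x3 result, while the reverse order does not line up. The name `reversed_ok` below is illustrative only.

```python
import torch

tensor1 = torch.tensor([[1, 2], [3, 4]])          # shape (2, 2)
tensor2 = torch.tensor([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)

tensor3 = torch.matmul(tensor1, tensor2)          # shape (2, 3)
print(tensor3)

try:
    torch.matmul(tensor2, tensor1)                # (2, 3) times (2, 2): inner sizes 3 and 2 differ
    reversed_ok = True
except RuntimeError:
    reversed_ok = False
```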

## Gradients in PyTorch



In [None]:
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = torch.tensor(1.5, requires_grad=True)

In [None]:
f = x**2 + y**2 + z**2

In [None]:
f.backward()

In [None]:
x.grad

In [None]:
y.grad

In [None]:
z.grad
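
For $f = x^2 + y^2 + z^2$ the partial derivatives are $2x$, $2y$ and $2z$, so at the values used above the gradients should come out as 4, 6 and 3. A self-contained sketch that compares autograd against these analytic values:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = torch.tensor(1.5, requires_grad=True)

f = x**2 + y**2 + z**2
f.backward()   # populates x.grad, y.grad and z.grad

# Analytic partial derivatives: df/dx = 2x, df/dy = 2y, df/dz = 2z.
expected = [2 * x.item(), 2 * y.item(), 2 * z.item()]
grads = [x.grad.item(), y.grad.item(), z.grad.item()]
print(grads, expected)
```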

In [None]:
import torch.nn as nn

In [None]:
in_dim, out_dim = 256, 10 
vec = torch.randn(256)
layer = nn.Linear(in_dim, out_dim, bias=True)
out = layer(vec)

In [None]:
# Equivalent to: out = torch.matmul(layer.weight, vec) + layer.bias
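
The equivalence in the comment above can be checked directly, since `nn.Linear` exposes its parameters as `layer.weight` (shape `(out_dim, in_dim)`) and `layer.bias` (shape `(out_dim,)`). This is a sketch; the seed and tolerance are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability
in_dim, out_dim = 256, 10
vec = torch.randn(in_dim)
layer = nn.Linear(in_dim, out_dim, bias=True)

out = layer(vec)
# Recompute the layer output by hand from its weight matrix and bias vector.
manual = torch.matmul(layer.weight, vec) + layer.bias
same = torch.allclose(out, manual, atol=1e-5)
print(out.shape, same)
```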

In [None]:
class BaseClassifier(nn.Module):
    def __init__(self, in_dim, feature_dim, out_dim):
        super().__init__()
        # Two fully connected layers with a ReLU nonlinearity in between.
        self.layer1 = nn.Linear(in_dim, feature_dim, bias=True)
        self.layer2 = nn.Linear(feature_dim, out_dim, bias=True)
        self.relu = nn.ReLU()

    def forward(self, inp):
        int_out = self.layer1(inp)
        int_out = self.relu(int_out)
        out = self.layer2(int_out)
        return out
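
A possible way to use such a classifier on a mini-batch is sketched below. The class is repeated so the block runs on its own, and the sizes (784 inputs, 256 hidden units, 10 classes, batch of 32) are hypothetical values picked for the example.

```python
import torch
import torch.nn as nn

class BaseClassifier(nn.Module):
    def __init__(self, in_dim, feature_dim, out_dim):
        super().__init__()
        self.layer1 = nn.Linear(in_dim, feature_dim, bias=True)
        self.layer2 = nn.Linear(feature_dim, out_dim, bias=True)
        self.relu = nn.ReLU()

    def forward(self, inp):
        return self.layer2(self.relu(self.layer1(inp)))

# Hypothetical sizes: 784 input features, 256 hidden units, 10 output classes.
model = BaseClassifier(784, 256, 10)
batch = torch.randn(32, 784)   # mini-batch of 32 random examples
logits = model(batch)          # one row of 10 scores per example
print(logits.shape)
```

Because `nn.Linear` acts on the last dimension, the same model accepts a single vector of length 784 or a batch of any size.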
        