<a href="https://colab.research.google.com/github/bharaniabhishek123/ML-Introduction/blob/main/01_Linear_Algebra_and_PyTorch_Tensor_Operations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Data Structure

Arguably the most important linear-algebra data structure for machine learning is the matrix, a 2-d array of numbers in which each entry is indexed by its row and column.


Think of an Excel spreadsheet in which offers from two companies, Google and Facebook, form the rows, and the columns represent characteristics of each offer such as starting salary, bonus, or position.



``` 
              Salary        Bonus            Position

Google        150,000       23,000           Software Engineer

Facebook      180,000       27,000           Data Scientist

```
The table format is especially suited to keeping track of such data: you can index by row and column to find, for example, Google's position. Matrices are similarly a multipurpose tool for holding all kinds of data; the data we work with in this book is numerical.
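The row-and-column lookup described above can be sketched with a plain nested list. This is only an illustration: the variable name `offers` and the column order are assumptions, and only the numeric columns of the table are kept.

```python
# Numeric columns of the offer table, one row per company.
# Assumed column order for this sketch: 0 = salary, 1 = bonus.
offers = [
    [150_000, 23_000],  # row 0: Google
    [180_000, 27_000],  # row 1: Facebook
]

row, col = 1, 0              # Facebook's salary
print(offers[row][col])      # 180000

# The "shape" of the table: number of rows and number of columns.
n_rows, n_cols = len(offers), len(offers[0])
print((n_rows, n_cols))      # (2, 2)
```

Note that code indexes from 0, while the matrix notation $x_{i,j}$ conventionally starts at 1.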




**A dataset, for example, has many individual data points, each with any number of associated features, and is naturally stored as a matrix. In PyTorch such multi-dimensional arrays are called tensors, and they can run on a GPU or TPU for faster execution.**


$X_{m,n} =
 \begin{pmatrix}
  x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\
  x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\
  \vdots  & \vdots  & \ddots & \vdots  \\
  x_{m,1} & x_{m,2} & \cdots & x_{m,n}
 \end{pmatrix}$

In [None]:
import torch 

In [None]:
raw_data = [[1,2,3],[4,5,6],[7,8,9]]
py_tensor_1 = torch.tensor(raw_data)

In [None]:
py_tensor_2 = torch.ones_like(py_tensor_1)
print("Tensor with all Ones : \n {}".format(py_tensor_2))

Tensor with all Ones : 
 tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])


In [None]:
py_tensor_3 = torch.rand_like(py_tensor_1, dtype=torch.float)
print("Tensor with random values : \n {}".format(py_tensor_3))


Tensor with random values : 
 tensor([[0.1809, 0.0916, 0.2173],
        [0.1591, 0.6418, 0.4377],
        [0.8001, 0.7870, 0.9866]])


In [None]:
zeros = torch.zeros(2,3)
print(zeros)

ones = torch.ones(2,3)
print(ones)

torch.manual_seed(2508)
random = torch.rand(2,3)

print(random)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[0.7136, 0.0417, 0.6294],
        [0.8594, 0.3111, 0.8449]])


In [None]:
torch.manual_seed(2508) # seed the random number generator so the same "random" values are reproduced on every run
torch.rand(2,3)

tensor([[0.7136, 0.0417, 0.6294],
        [0.8594, 0.3111, 0.8449]])

In [None]:
torch.rand(2,3)

tensor([[0.4358, 0.3457, 0.0320],
        [0.6301, 0.5709, 0.6966]])

In [None]:
py_tensor = torch.rand((3,3), dtype=torch.float64) * 20 
print(py_tensor)

py_tensor_int = py_tensor.to(torch.int32)
print(py_tensor_int)

tensor([[17.9249,  5.6975, 14.5636],
        [ 0.7927, 19.0194,  4.2181],
        [ 5.5401, 12.4066, 11.8336]], dtype=torch.float64)
tensor([[17,  5, 14],
        [ 0, 19,  4],
        [ 5, 12, 11]], dtype=torch.int32)


## Creating an ndarray, NumPy's counterpart to a PyTorch tensor

In [None]:
import numpy as np 
n_arr = np.random.rand(3,3)
n_arr

array([[0.83128801, 0.63010425, 0.3429538 ],
       [0.58333915, 0.70321775, 0.72309351],
       [0.18792482, 0.43513872, 0.87896944]])

## Creating a Tensor directly from an ndarray. There are a number of ways to create a tensor.

In [None]:
py_tensor = torch.tensor(n_arr)
py_tensor


tensor([[0.8313, 0.6301, 0.3430],
        [0.5833, 0.7032, 0.7231],
        [0.1879, 0.4351, 0.8790]], dtype=torch.float64)

## Properties of Tensor such as Shape, DataType and Device



In [None]:
print(f"Shape of tensor: {py_tensor_3.shape}")
print(f"Datatype of tensor: {py_tensor_3.dtype}")
print(f"Device tensor is stored on: {py_tensor_3.device}")

Shape of tensor: torch.Size([3, 3])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


**Matrix Operations**

Matrices can be added, subtracted, and multiplied - there is no division of matrices, but there exists a similar concept called inversion. 

 $ \begin{pmatrix}
  2 & 3 & 4  \\
  5 & 6 & 7  \\       
 \end{pmatrix} 
       +
\begin{pmatrix}
  5 & 6 & 7  \\
  8 & 9 & 1  \\       
 \end{pmatrix}  
 = 
\begin{pmatrix}
  7 & 9 & 11  \\
  13 & 15 & 8  \\       
 \end{pmatrix}  \\ $


$2 \times \begin{pmatrix}
  2 & 3 & 4  \\
  5 & 6 & 7  \\       
 \end{pmatrix} = 
 \begin{pmatrix} 2 \cdot 2 & 2 \cdot 3 & 2 \cdot 4  \\ 2 \cdot 5 & 2 \cdot 6 & 2 \cdot 7  \\   \end{pmatrix}
 = \begin{pmatrix} 4 & 6 & 8  \\ 10 & 12 & 14  \\       \end{pmatrix}  $

$ \begin{pmatrix}
  2 & 3   \\
  4 & 5   \\  \end{pmatrix} \times \begin{pmatrix}
  5 & 6   \\
  7 & 8   \\       
 \end{pmatrix} = \begin{pmatrix} 2 \cdot 5 + 3 \cdot 7 & 2 \cdot 6 + 3 \cdot 8 \\ 4 \cdot 5 + 5 \cdot 7 & 4 \cdot 6 + 5 \cdot 8 \\       \end{pmatrix}= \begin{pmatrix} 31 &  36  \\ 55  & 64  \\       \end{pmatrix}  $

Note: two matrices can only be multiplied if their dimensions align, i.e. A is of dimension m by k and B is of dimension k by n.

In other words, the rows of A and the columns of B must have the same length: A must have as many columns as B has rows.


If this weren't the case, the formula for matrix multiplication would give us an indexing error.

We'll call the formula for matrix multiplication presented above the dot product interpretation of matrix multiplication, which will make more sense after reading the Vector Operations section.

However, there are a few exceptions to this rule, which we will look at more closely in Broadcasting Rules.



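Matrix multiplication is not commutative in general \\
$ A \cdot B \neq B \cdot A $ \\
Matrix multiplication is associative \\
$ (A \cdot B) \cdot C = A \cdot (B \cdot C) $ \\
and distributive over addition \\
$ A \cdot (B + C) = A \cdot B + A \cdot C $ \\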
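
The dimension rule and the algebraic properties of matrix multiplication can be checked numerically. The following is a minimal sketch in plain Python: `matmul` and `add` are helper functions written for this illustration (not library calls), and `C` is an arbitrary third matrix chosen as an assumption.

```python
def matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix (both given as lists of rows)."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


A = [[2, 3], [4, 5]]
B = [[5, 6], [7, 8]]
C = [[1, 0], [2, 1]]  # arbitrary matrix, assumed for this sketch

print(matmul(A, B))  # [[31, 36], [55, 64]]
print(matmul(B, A))  # [[34, 45], [46, 61]], so A.B differs from B.A

# Associativity and distributivity hold for these matrices.
print(matmul(matmul(A, B), C) == matmul(A, matmul(B, C)))        # True
print(matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C)))   # True
```

Passing a 2 x 3 matrix as `a` together with a 2 x 2 matrix as `b` raises the `ValueError`, which is the "indexing error" case mentioned above made explicit.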



## Tensor Operations



1.   Matrix Transpose
2.   Multiplication
3.   Move to GPU
4.   In-place Operations
5.   Joining



## Transpose

In [None]:
tensor = torch.tensor([[2, 3, 4], [5, 6, 7]])
tensor_T = tensor.T # .T returns the transpose, here of shape (3, 2)

## Scalar Multiplication

In [None]:
c = 10
new_tensor = torch.tensor([[2, 3, 4], [5, 6, 7]])
new_tensor = new_tensor * c  # multiply every element by the scalar c
print("New Tensor after multiplication : \n {}".format(new_tensor))


New Tensor after multiplication : 
 tensor([[20, 30, 40],
        [50, 60, 70]])


## Tensor Multiplication

In [None]:
a = torch.tensor([[2, 3], [4, 5]])

b  = torch.tensor([[5, 6], [7, 8]])

out1 = a @ b  
out2 = a.matmul(b)


print("result of multiplication a x b {}".format(out1))
out1 == out2 

result of multiplication a x b tensor([[31, 36],
        [55, 64]])


tensor([[True, True],
        [True, True]])

## Element-wise Product

In [None]:
out1 = a * b
out2 = a.mul(b)

print("result of elementwise multiplication a * b {}".format(out1))

out1 == out2 

result of elementwise multiplication a * b tensor([[10, 18],
        [28, 40]])


tensor([[True, True],
        [True, True]])

**Broadcasting Rules** \\

Broadcasting describes how an operation is carried out on tensors of different sizes; it only works when the shapes are compatible.
For example, an image input has shape 256x256x3 (a 3-d tensor), and stacking 10 such images into a mini-batch for SGD adds a leading batch dimension.

`input = torch.empty(10, 256, 256, 3)` \\
`bias = torch.empty(1, 3)`

If we add, multiply, or subtract these two tensors,

the result has shape (10, 256, 256, 3).


Compare the shapes starting from the last dimension and moving toward the first. Two dimensions are broadcastable when:

1.   they are equal, or
2.   one of them is 1, or
3.   one of them does not exist.
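
The three conditions can be turned into a small shape calculator. This is a sketch in plain Python, not PyTorch's own implementation; the function name `broadcast_shape` is an assumption made for this illustration.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    rev_a, rev_b = list(reversed(shape_a)), list(reversed(shape_b))
    result = []
    # Walk from the last dimension toward the first.
    for i in range(max(len(rev_a), len(rev_b))):
        dim_a = rev_a[i] if i < len(rev_a) else None
        dim_b = rev_b[i] if i < len(rev_b) else None
        if dim_a is None:            # condition 3: dimension missing in a
            result.append(dim_b)
        elif dim_b is None:          # condition 3: dimension missing in b
            result.append(dim_a)
        elif dim_a == dim_b:         # condition 1: equal
            result.append(dim_a)
        elif dim_a == 1:             # condition 2: a is stretched to match b
            result.append(dim_b)
        elif dim_b == 1:             # condition 2: b is stretched to match a
            result.append(dim_a)
        else:
            raise ValueError(f"cannot broadcast {dim_a} with {dim_b}")
    return tuple(reversed(result))


print(broadcast_shape((10, 256, 256, 3), (1, 3)))  # (10, 256, 256, 3)
print(broadcast_shape((2, 4), (1, 4)))             # (2, 4)
```

Calling `broadcast_shape((10, 256, 256, 3), (1, 2, 3))` raises `ValueError`, because 256 and 2 are neither equal nor 1.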


A very common operation in ML is multiplying a tensor of weights (parameters) by a batch of input (a subset of the input). For example, a weight of shape (2, 4) multiplied element-wise by a tensor of shape (1, 4) returns a tensor of shape (2, 4).

In [None]:
ones = torch.zeros(2,2) + 1 
ones

tensor([[1., 1.],
        [1., 1.]])

In [None]:
a = torch.ones(10, 256, 256, 3) # compare dims from last to first
b = a * torch.rand(256, 3) # last two dims are identical to those of a

c = a * torch.rand(1, 3) # last dim is 3, second-to-last dim is 1



In [None]:

input_ = torch.empty(10, 256,256,3) # Compare dims last to first 
bias = torch.empty(1,1,3) 


y = input_ + bias 
z = input_ * bias # or z = torch.mul(input_, bias)
u = input_ - bias

In [None]:
y.shape == z.shape == u.shape

True

In [None]:
y.shape

torch.Size([10, 256, 256, 3])

## Whereas a bias of shape (1, 2, 3) does not broadcast

In [None]:
input_ = torch.empty(10, 256,256,3)
bias = torch.empty(1,2,3) 

## Note: the error below is intentional, to show the size mismatch.

In [None]:
y = input_ + bias 
z = input_ * bias # or z = torch.mul(input_, bias)
u = input_ - bias

RuntimeError: ignored

In [None]:

# bitwise operations
print('\nBitwise XOR:')
b = torch.tensor([1, 5, 11])
c = torch.tensor([2, 7, 10])
print(torch.bitwise_xor(b, c))

# comparisons:
print('\nBroadcasted, element-wise equality comparison:')
d = torch.tensor([[1., 2.], [3., 4.]])
e = torch.ones(1, 2)  # many comparison ops support broadcasting!
print(torch.eq(d, e)) # returns a tensor of type bool

# reductions:
print('\nReduction ops:')
print(torch.max(d))        # returns a single-element tensor
print(torch.max(d).item()) # extracts the value from the returned tensor
print(torch.mean(d))       # average
print(torch.std(d))        # standard deviation
print(torch.prod(d))       # product of all numbers
print(torch.unique(torch.tensor([1, 2, 1, 2, 1, 2]))) # filter unique elements


Bitwise XOR:
tensor([3, 2, 1])

Broadcasted, element-wise equality comparison:
tensor([[ True, False],
        [False, False]])

Reduction ops:
tensor(4.)
4.0
tensor(2.5000)
tensor(1.2910)
tensor(24.)
tensor([1, 2])


## Altering a Tensor In Place
Note: any method whose name ends with the `_` suffix changes its operand in place, so no intermediate tensor is created.
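
To make the difference concrete, the following sketch contrasts `add` with `add_` on the same tensor. The variable names are illustrative, and `data_ptr()` is used only as a simple way to see whether two tensors share memory.

```python
import torch

t = torch.tensor([[1, 2], [3, 4]])

out = t.add(5)      # out-of-place: allocates and returns a new tensor
print(t)            # t still holds its original values here

same = t.add_(5)    # in-place: writes the result into t's own memory
print(t)            # t now holds the updated values

# The in-place result shares memory with t; the out-of-place result does not.
print(same.data_ptr() == t.data_ptr())  # True
print(out.data_ptr() == t.data_ptr())   # False
```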


In [None]:
tensor1 = torch.tensor([[1,2],[3,4]])
tensor2 = torch.tensor([[1,2,3],[4,5,6]])

In [None]:
print(tensor1)
tensor1.add_(5)
print(tensor1)
tensor1.mul_(5)
print(tensor1)

tensor([[1, 2],
        [3, 4]])
tensor([[6, 7],
        [8, 9]])
tensor([[30, 35],
        [40, 45]])


## Copying a Tensor

This matters, for example, when two or more paths in a model use the same layer weights.

In [None]:
a = torch.ones(3,3)
b = a.clone() # by default the clone keeps the autograd settings of the source tensor

c = a.detach().clone() # detached first, so no computation history is stored for c

id(a) == id(b)


False
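
The difference between `clone()` and `detach().clone()` only shows up when the source tensor tracks gradients. A minimal sketch, assuming a source tensor created with `requires_grad=True` (which the cell above does not do):

```python
import torch

a = torch.ones(3, 3, requires_grad=True)
b = a.clone()            # stays attached to a's autograd graph
c = a.detach().clone()   # independent copy with no autograd history

print(b.requires_grad)   # True
print(c.requires_grad)   # False

c.add_(1)                # changing the copy ...
print(torch.equal(a, torch.ones(3, 3)))  # ... leaves a untouched: True
```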

## Moving to GPU

In [None]:
if torch.cuda.is_available():
    print('We have a GPU!')
else:
    print('Sorry, CPU only.')

Sorry, CPU only.
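
Once availability is known, a tensor can be moved to the chosen device. The following is a minimal sketch that falls back to the CPU when no GPU is present; the variable names are illustrative.

```python
import torch

# Pick the GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.rand(2, 3)                     # created on the CPU by default
t_dev = t.to(device)                     # moved (copied) to the chosen device
t_new = torch.rand(2, 3, device=device)  # or created there directly

print(t_dev.device, t_new.device)
```

Tensors must be on the same device before they can be combined in an operation, so it is common to define `device` once and reuse it.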


## Manipulating Tensor Shapes

Sometimes, you'll need to change the shape of your tensor. Below, we'll look at a few common cases, and how to handle them.

### Changing the Number of Dimensions

One case where you might need to change the number of dimensions is passing a single instance of input to your model. PyTorch models generally expect *batches* of input.

For example, imagine having a model that works on 3 x 226 x 226 images - a 226-pixel square with 3 color channels. When you load and transform it, you'll get a tensor of shape `(3, 226, 226)`. Your model, though, is expecting input of shape `(N, 3, 226, 226)`, where `N` is the number of images in the batch. So how do you make a batch of one?

In [None]:
a = torch.rand(3, 226, 226)
b = a.unsqueeze(0)

print(a.shape)
print(b.shape)

torch.Size([3, 226, 226])
torch.Size([1, 3, 226, 226])


The `unsqueeze()` method adds a dimension of extent 1. `unsqueeze(0)` adds it as a new zeroth dimension - now you have a batch of one!

So if that's *un*squeezing, what do we mean by squeezing? We're taking advantage of the fact that any dimension of extent 1 *does not* change the number of elements in the tensor.

Continuing the example above, let's say the model's output is a 20-element vector for each input. You would then expect the output to have shape `(N, 20)`, where `N` is the number of instances in the input batch. That means that for our single-input batch, we'll get an output of shape `(1, 20)`.

What if you want to do some *non-batched* computation with that output - something that's just expecting a 20-element vector?

In [None]:
a = torch.rand(1, 20)
print(a.shape)
print(a)

b = a.squeeze(0)
print(b.shape)
print(b)

c = torch.rand(2, 2)
print(c.shape)

d = c.squeeze(0)
print(d.shape)

torch.Size([1, 20])
tensor([[0.3892, 0.2961, 0.3125, 0.0115, 0.0909, 0.2101, 0.5266, 0.4698, 0.0137,
         0.4796, 0.6653, 0.5503, 0.0496, 0.8904, 0.1333, 0.6137, 0.6695, 0.8352,
         0.4915, 0.3754]])
torch.Size([20])
tensor([0.3892, 0.2961, 0.3125, 0.0115, 0.0909, 0.2101, 0.5266, 0.4698, 0.0137,
        0.4796, 0.6653, 0.5503, 0.0496, 0.8904, 0.1333, 0.6137, 0.6695, 0.8352,
        0.4915, 0.3754])
torch.Size([2, 2])
torch.Size([2, 2])


You can see from the shapes that our 2-dimensional tensor is now 1-dimensional, and if you look closely at the output of the cell above you'll see that printing `a` shows an "extra" set of square brackets `[]` due to having an extra dimension.

You may only `squeeze()` dimensions of extent 1. See above where we try to squeeze a dimension of size 2 in `c`, and get back the same shape we started with. Calls to `squeeze()` and `unsqueeze()` can only act on dimensions of extent 1 because to do otherwise would change the number of elements in the tensor.

Another place you might use `unsqueeze()` is to ease broadcasting. Consider the following code:

```
a =     torch.ones(4, 3, 2)

c = a * torch.rand(   3, 1) # 3rd dim = 1, 2nd dim identical to a
print(c)
```

The net effect is to broadcast the operation over dimensions 0 and 2, causing the random 3 x 1 tensor to be multiplied element-wise by every 3-element column in `a`.

What if the random vector had just been a 3-element vector? We'd lose the ability to do the broadcast, because the final dimensions would not match up according to the broadcasting rules. `unsqueeze()` comes to the rescue:

In [None]:
a = torch.ones(4, 3, 2)
b = torch.rand(   3)     # trying to multiply a * b will give a runtime error
c = b.unsqueeze(1)       # change to a 2-dimensional tensor, adding new dim at the end
print(b.shape)
print(c.shape)
print(a * c)             # broadcasting works again!

torch.Size([3])
torch.Size([3, 1])
tensor([[[0.0765, 0.0765],
         [0.2740, 0.2740],
         [0.5332, 0.5332]],

        [[0.0765, 0.0765],
         [0.2740, 0.2740],
         [0.5332, 0.5332]],

        [[0.0765, 0.0765],
         [0.2740, 0.2740],
         [0.5332, 0.5332]],

        [[0.0765, 0.0765],
         [0.2740, 0.2740],
         [0.5332, 0.5332]]])


In [None]:
batch_me = torch.rand(3, 226, 226)
print(batch_me.shape)
batch_me.unsqueeze_(0)
print(batch_me.shape)

torch.Size([3, 226, 226])
torch.Size([1, 3, 226, 226])


## Gradients in PyTorch



In [None]:
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = torch.tensor(1.5, requires_grad=True)

In [None]:
f = x**2 + y**2 + z**2

In [None]:
f.backward()

In [None]:
x.grad

tensor(4.)

In [None]:
y.grad

tensor(6.)

In [None]:
z.grad

tensor(3.)
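
The three values returned above can be checked by hand. Since $f = x^2 + y^2 + z^2$, each partial derivative depends on one variable only, and evaluating at $x = 2$, $y = 3$, $z = 1.5$ gives:

$ \frac{\partial f}{\partial x} = 2x = 2 \cdot 2 = 4 $ \\
$ \frac{\partial f}{\partial y} = 2y = 2 \cdot 3 = 6 $ \\
$ \frac{\partial f}{\partial z} = 2z = 2 \cdot 1.5 = 3 $ \\

These match `x.grad`, `y.grad`, and `z.grad`.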

In [None]:
import torch.nn as nn

In [None]:
in_dim, out_dim = 256, 10 
vec = torch.randn(256)
layer = nn.Linear(in_dim, out_dim, bias=True)
out = layer(vec)

In [None]:
# out = torch.matmul(W, vec) + b 
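
The commented formula can be checked against the layer's own parameters. This is a sketch that assumes `W` and `b` stand for `layer.weight` and `layer.bias`; the seed is arbitrary and the tolerance only allows for floating-point rounding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, for reproducibility only
in_dim, out_dim = 256, 10
vec = torch.randn(in_dim)
layer = nn.Linear(in_dim, out_dim, bias=True)

W, b = layer.weight, layer.bias     # shapes (10, 256) and (10,)
manual = torch.matmul(W, vec) + b   # the formula written out by hand

print(manual.shape)                                    # torch.Size([10])
print(torch.allclose(layer(vec), manual, atol=1e-5))   # True
```

Note that `nn.Linear` stores its weight with shape `(out_dim, in_dim)`, which is why `W` multiplies `vec` from the left.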

In [None]:
class BaseClassifier(nn.Module):
    def __init__(self, in_dim, feature_dim, out_dim):
        super().__init__()
        # first linear layer: input features -> hidden features
        self.layer1 = nn.Linear(in_dim, feature_dim, bias=True)
        # second linear layer: hidden features -> output classes
        self.layer2 = nn.Linear(feature_dim, out_dim, bias=True)
        self.relu = nn.ReLU()
        
    def forward(self, inp):
        int_out = self.layer1(inp)
        int_out = self.relu(int_out)  # non-linearity between the two layers
        out = self.layer2(int_out)
        return out 
        