<a href="https://colab.research.google.com/github/JayGhiya/DataScience/blob/master/01_tensor_operations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**PYTORCH101**  

### *Tensors*

A short introduction to PyTorch tensors through five chosen functions:
- function 1 - **torch.as_tensor(data, dtype=None, device=None) → Tensor**
- function 2 - **torch.stack(tensors, dim=0, out=None) → Tensor**
- function 3 - **torch.cat(tensors, dim=0, out=None) → Tensor**
- function 4 - **torch.matmul(input, other, out=None) → Tensor**
- function 5 - **torch.squeeze(input, dim=None, out=None) → Tensor**

In [0]:
# Import torch and other required modules
import torch
import numpy

## Function 1 - torch.as_tensor(data, dtype=None, device=None) → Tensor

Converts the data into a torch tensor. If the data is an ndarray of the corresponding dtype and the device is CPU, no copy will be performed.

In [17]:
# Example 1 - working 
np_array = numpy.array([[4.,5.,6.],[7.,8.,9.],[10.,11.,12.]])
t_1 = torch.as_tensor(data = np_array , device=torch.device('cpu')) 
t_1.size(),t_1

(torch.Size([3, 3]), tensor([[ 4.,  5.,  6.],
         [ 7.,  8.,  9.],
         [10., 11., 12.]], dtype=torch.float64))

In [18]:
# Example 2 - working
t_1[0][0] = 10
np_array

array([[10.,  5.,  6.],
       [ 7.,  8.,  9.],
       [10., 11., 12.]])

The torch.as_tensor API can build a tensor from a tuple, list, ndarray, or scalar, but not from a set. A set is unordered and discards duplicate values, while a tensor needs a fixed element order and may legitimately contain repeated values that later computations depend on.

In [13]:
# Example 3 - breaking (to illustrate when it breaks)
torch.as_tensor({1,23,3,4})

RuntimeError: ignored
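As a minimal sketch of the accepted inputs, the block below builds tensors from a tuple, a nested list, and a scalar, then works around the set limitation by converting the set to a list first. Sorting the set is an assumption made here only to get a reproducible element order; any explicit ordering would do.

```python
import torch

# Ordered containers and scalars are accepted directly.
from_tuple = torch.as_tensor((1, 2, 3))
from_list = torch.as_tensor([[1, 2], [3, 4]])
from_scalar = torch.as_tensor(5)  # zero-dimensional tensor

# A set has no defined element order, so convert it to a list first.
# sorted() is an assumption used here to make the order reproducible.
values = {1, 23, 3, 4}
from_set = torch.as_tensor(sorted(values))

print(from_tuple.shape, from_list.shape, from_scalar.dim(), from_set)
```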

To summarize, we used torch.as_tensor to build a tensor from a NumPy array without copying the array's data, which saves memory. We also saw that modifying the tensor modifies the NumPy array, because the two share the same memory. This is handy: after running operations on the tensor, we can visualize the result directly from the NumPy array with matplotlib.

torch.tensor() always copies data. If you have a Tensor data and just want to change its requires_grad flag, use requires_grad_() or detach() to avoid a copy. If you have a numpy array and want to avoid a copy, use torch.as_tensor().
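The difference between copying and sharing can be checked with a small experiment. This is a sketch with a hypothetical array named `source`; it also shows that requesting a different dtype forces torch.as_tensor to copy, since the data is no longer "of the corresponding dtype".

```python
import numpy
import torch

source = numpy.array([1.0, 2.0, 3.0])  # float64 array

copied = torch.tensor(source)                             # always copies
shared = torch.as_tensor(source)                          # same dtype on CPU: shares memory
converted = torch.as_tensor(source, dtype=torch.float32)  # dtype differs: copies

# Change the NumPy array after the tensors were created.
source[0] = 100.0

# Only the tensor that shares memory sees the change.
print(copied[0].item(), shared[0].item(), converted[0].item())  # 1.0 100.0 1.0
```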

```
# reference:  https://pytorch.org/docs/stable/torch.html#torch.as_tensor
```



## Function 2 - torch.stack(tensors, dim=0, out=None) → Tensor
The stack function joins a sequence of tensors along a new dimension. Let us stack a new tensor onto the existing tensor t_1: we will use torch.from_numpy() to create the new tensor and then torch.stack() to stack the two along dimension 0.




In [23]:
# Example 1 - working
t_2 = torch.from_numpy(numpy.array([[1.,8.,10.],[12.,14.,16.],[19.,20.,25.]]))
t_2
t_3 = torch.stack([t_1,t_2])
t_3,t_3.size()

(tensor([[[10.,  5.,  6.],
          [ 7.,  8.,  9.],
          [10., 11., 12.]],
 
         [[ 1.,  8., 10.],
          [12., 14., 16.],
          [19., 20., 25.]]], dtype=torch.float64), torch.Size([2, 3, 3]))

The stack function is useful when training on a batch of, say, 2D images, each of which is a two-dimensional tensor. Batching stacks n images along dimension 0, and iterating over that dimension then yields the individual images again.



In [27]:
# Example 2 - working
for image in t_3:
  #do training
  print(image)

tensor([[10.,  5.,  6.],
        [ 7.,  8.,  9.],
        [10., 11., 12.]], dtype=torch.float64)
tensor([[ 1.,  8., 10.],
        [12., 14., 16.],
        [19., 20., 25.]], dtype=torch.float64)


Because stack joins tensors along a new dimension, all tensors being stacked must have exactly the same shape. If they do not, an error is raised stating that the tensor sizes differ.

In [32]:
# Example 3 - breaking (to illustrate when it breaks)
t_4 = torch.from_numpy(numpy.array([[1.,8.,10.,11.],[12.,14.,16.,17.],[19.,20.,25.,26.]]))
t_5 = torch.stack([t_1,t_4])

RuntimeError: ignored

Here we saw how stack can be used to combine multiple examples of the same tensor size along a new dimension, which is dimension 0 by default.
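The position of the new dimension is controlled by the dim argument. The following sketch, using two placeholder 3x3 tensors, shows how the result shape changes with dim.

```python
import torch

a = torch.zeros(3, 3)
b = torch.ones(3, 3)

stacked_0 = torch.stack([a, b])         # new dimension first:  (2, 3, 3)
stacked_1 = torch.stack([a, b], dim=1)  # new dimension second: (3, 2, 3)
stacked_2 = torch.stack([a, b], dim=2)  # new dimension last:   (3, 3, 2)

print(stacked_0.shape, stacked_1.shape, stacked_2.shape)
```

Whatever the value of dim, indexing the new dimension recovers the inputs; for example, `stacked_1[:, 0]` equals `a`.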


```
# reference: https://pytorch.org/docs/master/generated/torch.stack.html
```



Use stack when training with multiple examples at once, for instance to stack 2D images into a batch.
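A rough sketch of that batching idea follows. The 28x28 image size and the batch of four random "images" are assumptions chosen for illustration, not values from a real dataset.

```python
import torch

# Four hypothetical grayscale images, each a 28x28 tensor of random pixels.
images = [torch.rand(28, 28) for _ in range(4)]

# Stack them along a new dimension 0 to form one batch tensor.
batch = torch.stack(images)
print(batch.shape)  # torch.Size([4, 28, 28])

# Iterating over dimension 0 yields the individual images again.
for index, image in enumerate(batch):
    print(index, image.shape)
```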

## Function 3 - torch.cat(tensors, dim=0, out=None) → Tensor

Concatenation joins tensors along an existing axis.
[Reference](https://deeplizard.com/learn/video/kF2AlpykJGY)

In [5]:
# Example 1 - working
# here we are creating tensors using random function of torch. What we are specifying as arguments is the size of the tensor.
t_6 = torch.randn(2, 2)
t_7 = torch.randn(2,2)
t_6,t_7,t_6.size()

(tensor([[-0.4515,  0.4627],
         [ 0.5606,  0.0564]]), tensor([[1.4285, 1.0836],
         [0.8295, 0.5393]]), torch.Size([2, 2]))

Concatenation comes in handy when we want to merge features from different data sources into one tensor. Let us look at how it is done and what effect it has on the shape of the tensor.

In [7]:
# Example 2 - Now let us do concatenation of both the tensors along second dimension
t_8 = torch.cat([t_6,t_7],dim=1)
t_8,t_8.size()

(tensor([[-0.4515,  0.4627,  1.4285,  1.0836],
         [ 0.5606,  0.0564,  0.8295,  0.5393]]), torch.Size([2, 4]))

Concatenation can only happen along an existing dimension. If dim refers to a dimension that the tensors do not have, an error is raised.

In [8]:
# Example 3 - breaking (to illustrate when it breaks)
t_8 = torch.cat([t_6,t_7],dim=2)

IndexError: ignored
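For two-dimensional tensors the valid values of dim are 0 and 1 (or their negative equivalents). The sketch below, with two placeholder 2x2 tensors, shows the shapes that cat produces for each and contrasts them with stack, which adds a new dimension instead.

```python
import torch

a = torch.zeros(2, 2)
b = torch.ones(2, 2)

rows = torch.cat([a, b], dim=0)   # appended as extra rows:    (4, 2)
cols = torch.cat([a, b], dim=1)   # appended as extra columns: (2, 4)
last = torch.cat([a, b], dim=-1)  # -1 means the last dimension, same as dim=1 here

stacked = torch.stack([a, b])     # stack creates a new dimension: (2, 2, 2)

print(rows.shape, cols.shape, last.shape, stacked.shape)
```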

In this example we saw how to concatenate PyTorch tensors along an existing dimension.

Use torch.cat wherever data from multiple sources has to be merged into a single tensor.
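As an illustration of merging data sources, the sketch below joins two hypothetical feature tensors that describe the same four samples. The names `numeric_features` and `text_features` and their sizes are assumptions made for this example.

```python
import torch

# Two hypothetical feature sources for the same 4 samples.
numeric_features = torch.rand(4, 3)  # 3 numeric features per sample
text_features = torch.rand(4, 5)     # 5 text-derived features per sample

# Concatenate along dimension 1 so each sample keeps one row with all 8 features.
combined = torch.cat([numeric_features, text_features], dim=1)
print(combined.shape)  # torch.Size([4, 8])
```

The tensors must agree in every dimension except the one being concatenated, which here means both need four rows.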

## Function 4 - torch.matmul(input, other, out=None) → Tensor

The matmul function computes the matrix product of two tensors. [Reference](https://pytorch.org/docs/stable/torch.html#torch.matmul)

In [11]:
# Example 1 - working
t_9 = torch.randn(2,2)
t_10 = torch.randn(2,2)
t_11 = torch.matmul(t_9,t_10)
t_9,t_10,t_11 


(tensor([[0.5482, 0.6253],
         [0.6028, 1.0405]]), tensor([[-0.6773, -0.1903],
         [-0.3803,  0.0977]]), tensor([[-0.6091, -0.0432],
         [-0.8040, -0.0130]]))

Now let us try to multiply a matrix and a vector.

In [14]:
# Example 2 - working
t_12 = torch.randn(4, 4)
t_13 = torch.randn(4)
t_14 = torch.matmul(t_12, t_13)
t_12, t_13, t_14 , t_14.size()

(tensor([[-5.9827e-01, -6.5229e-01,  1.9788e+00,  3.1657e-02],
         [ 3.5171e-01, -1.3246e-01,  3.5843e-01, -1.0040e+00],
         [ 1.0360e+00,  9.9712e-04,  9.0087e-01,  3.4729e-01],
         [-2.0146e-01,  1.5975e+00,  1.0360e+00, -1.2339e-01]]),
 tensor([ 1.5434,  2.3169,  1.1878, -0.3348]),
 tensor([-0.0948,  0.9978,  2.5550,  4.6622]),
 torch.Size([4]))

Matrix multiplication is only possible when the number of columns of the first matrix equals the number of rows of the second matrix.

In [15]:
# Example 3 - breaking (to illustrate when it breaks)
t_15 = torch.randn(4,5)
t_16 = torch.randn(4,6)
t_17 = torch.matmul(t_15,t_16)

RuntimeError: ignored
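The shape rule can be stated as (n, m) times (m, p) gives (n, p). The sketch below reproduces the mismatch with placeholder tensors named `left` and `right`, then shows one possible fix: transposing the first operand so the inner sizes agree. Whether a transpose is the right fix depends on what the data represents, so treat it as an illustration only.

```python
import torch

left = torch.randn(4, 5)
right = torch.randn(4, 6)

# (4, 5) x (4, 6) fails: 5 columns on the left, 4 rows on the right.
try:
    torch.matmul(left, right)
    failed = False
except RuntimeError:
    failed = True

# Transposing the left operand gives (5, 4) x (4, 6), which is valid.
product = torch.matmul(left.t(), right)
print(failed, product.shape)  # True torch.Size([5, 6])
```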

Matrix multiplication is one of the most heavily used operations in deep learning. It already appears in simple problems like linear regression, where the target variable follows the equation y=mx+b: once x holds several samples and features, the product mx becomes a matrix multiplication.
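A small worked example of that linear regression equation is sketched below. The inputs, weights, and bias are made-up numbers chosen so the result is easy to verify by hand.

```python
import torch

# Two samples with two features each (made-up values).
X = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
w = torch.tensor([0.5, -1.0])  # one weight per feature (the "m" in y = mx + b)
b = 2.0                        # bias term

# Prediction for all samples at once.
y = torch.matmul(X, w) + b
print(y)  # tensor([ 0.5000, -0.5000])
```

By hand: the first sample gives 1(0.5) + 2(-1) + 2 = 0.5, and the second gives 3(0.5) + 4(-1) + 2 = -0.5.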

Use torch.matmul whenever a matrix product is needed; it accepts vectors, matrices, and batches of matrices.
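The behavior of torch.matmul depends on the dimensionality of its arguments. The sketch below, using randomly filled placeholder tensors, lists the result shape for the common cases.

```python
import torch

vector = torch.randn(4)
matrix = torch.randn(3, 4)
batch = torch.randn(10, 3, 4)
other_batch = torch.randn(10, 4, 5)

dot = torch.matmul(vector, vector)          # 1D x 1D: dot product, zero-dimensional result
mat_vec = torch.matmul(matrix, vector)      # 2D x 1D: matrix-vector product, shape (3,)
batched = torch.matmul(batch, other_batch)  # batched matrix product, shape (10, 3, 5)
broadcast = torch.matmul(batch, vector)     # vector applied to every matrix, shape (10, 3)

print(dot.shape, mat_vec.shape, batched.shape, broadcast.shape)
```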

## Function 5 - torch.squeeze(input, dim=None, out=None) → Tensor

The squeeze function removes dimensions of size 1. Dropping them can simplify later operations on the tensor. [Reference](https://www.codementor.io/@packt/how-to-perform-basic-operations-in-pytorch-code-10al39a4c4)

In [18]:
# Example 1 - working
#we will use tensor ones to create a tensor
t_18 = torch.ones(5,1,2)
t_18

tensor([[[1., 1.]],

        [[1., 1.]],

        [[1., 1.]],

        [[1., 1.]],

        [[1., 1.]]])

Now let us get rid of the extra dimension of size 1 to simplify further operations.

In [19]:
# Example 2 - working
t_19  = torch.squeeze(t_18)
t_19

tensor([[1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.],
        [1., 1.]])

Squeeze only removes dimensions of size 1; if the tensor has none, the result has the same shape as the input. The dim argument must also refer to an existing dimension, so specifying a dimension that the tensor does not have raises an error, as below.

In [25]:
# Example 3 - breaking (to illustrate when it breaks)
t_20 = torch.squeeze(torch.ones(2,2),dim=2)
t_20

IndexError: ignored
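When dim is valid, squeeze removes that dimension only if its size is 1. The sketch below runs through the cases on a placeholder tensor of ones.

```python
import torch

x = torch.ones(5, 1, 2)

squeezed_all = torch.squeeze(x)                 # every size-1 dimension removed: (5, 2)
squeezed_dim1 = torch.squeeze(x, dim=1)         # dimension 1 has size 1, removed: (5, 2)
unchanged = torch.squeeze(x, dim=0)             # dimension 0 has size 5, kept: (5, 1, 2)
negative = torch.squeeze(x, dim=-2)             # -2 refers to dimension 1 here: (5, 2)
no_singleton = torch.squeeze(torch.ones(2, 2))  # nothing to remove: (2, 2)

print(squeezed_all.shape, squeezed_dim1.shape, unchanged.shape,
      negative.shape, no_singleton.shape)
```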

Here we saw the basic usage of squeeze to drop dimensions of size 1 that would otherwise complicate later operations.

The function comes in handy when a training approach processes samples one at a time. Say the tensor has size (200 * 1 * 100), where 200 is the number of text samples. In an NLP problem we look at one word at a time, which is where the 1 in the size comes from. To simplify further operations on the tensor, we can squeeze out that extra dimension.
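The (200 * 1 * 100) case can be sketched as follows. Reading the last size, 100, as a per-word feature size is an assumption made for this example. The sketch also shows why passing dim explicitly is safer: with a single sample, a bare squeeze would drop the sample dimension as well.

```python
import torch

# 200 samples, 1 word at a time, 100 values per word (assumed meaning of the sizes).
samples = torch.zeros(200, 1, 100)

flat = samples.squeeze(1)     # drop only dimension 1: (200, 100)
restored = flat.unsqueeze(1)  # unsqueeze puts it back: (200, 1, 100)

# With a single sample, squeeze() without dim removes the sample dimension too.
single = torch.zeros(1, 1, 100)
dropped_both = single.squeeze()   # (100,)
kept_batch = single.squeeze(1)    # (1, 100)

print(flat.shape, restored.shape, dropped_both.shape, kept_batch.shape)
```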

## Conclusion

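This notebook covered five tensor functions: torch.as_tensor to build tensors from existing data without copying, torch.stack and torch.cat to join tensors along a new or an existing dimension, torch.matmul for matrix multiplication, and torch.squeeze to drop dimensions of size 1. For next steps, the official documentation in the reference links below describes the rest of the tensor API.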

## Reference Links

* Official documentation for `torch.Tensor`: https://pytorch.org/docs/stable/tensors.html
* https://www.codementor.io/@packt/how-to-perform-basic-operations-in-pytorch-code-10al39a4c4
* https://deeplizard.com/learn/video/fCVuiW9AFzY
* https://towardsdatascience.com/building-efficient-custom-datasets-in-pytorch-2563b946fd9f



