
### 1. PyTorch Overview
#### PyTorch is a deep learning framework and a scientific computing package. It provides a flexible and efficient way to build, train, and deploy neural networks.

### Installing PyTorch with Anaconda and Conda

#### Getting started with PyTorch is easy. One convenient option is to use the Anaconda Python package manager.
##### Let's go over the steps:

   * Download and install Anaconda (choose the latest Python version). Skip this step if Anaconda is already installed.
   * Go to [PyTorch's site](https://pytorch.org/get-started/locally/) and find the section on getting started locally. (Choose the CPU version.)
   * Specify the appropriate configuration options for your particular environment.
   * Run the presented command in the terminal to install PyTorch.


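You may also run the following command in a terminal:

`conda install pytorch torchvision torchaudio cpuonly -c pytorch`

If the conda command does not work for your PyTorch version, the CPU-only pip equivalent is `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu`.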

### 2. PyTorch's tensor Library
*  A tensor is an n-dimensional array. Tensors are super important for deep learning and neural networks because they are the data structure that we ultimately use for building and training our neural networks. On top of the tensor library, PyTorch has much more to offer in terms of building and training neural networks.





### This table gives us a list of PyTorch packages and their corresponding descriptions. These are the primary PyTorch components we'll be learning about and using as we build neural networks in this course.


| Package | Description |
| --- | --- |
| torch | The top-level PyTorch package and tensor library |
| torch.nn | A subpackage that contains modules and extensible classes for building neural networks |
| torch.autograd | A subpackage that supports all the differentiable tensor operations in PyTorch (automatic differentiation) |
| torch.nn.functional | A functional interface that contains typical operations used for building neural networks, like loss functions, activation functions, and convolution operations |
| torch.optim | A subpackage that contains standard optimization algorithms like SGD and Adam |
| torch.utils | A subpackage that contains utility classes like datasets and data loaders that make data preprocessing easier |
| torchvision | A package that provides access to popular datasets, model architectures, and image transformations for computer vision |

### 3. PyTorch Basics

In [None]:
# Import the PyTorch library
import torch

###### To check the version, we use



In [None]:
print(torch.__version__)

2.1.0


### Explanation of Tensors - Data Structures of Deep Learning

#### What is a tensor?
* Conceptually, a PyTorch Tensor is similar to a NumPy array: a generic n-dimensional array used for arbitrary numeric computation. Unlike a NumPy array, a tensor can also record the operations applied to it so that gradients can be computed automatically (when it is created with requires_grad=True).

* Another key difference between a NumPy array and a PyTorch Tensor is that a PyTorch Tensor can run on either the CPU or a GPU. To run operations on a GPU, move the tensor to a CUDA device, for example with t.to('cuda').

* The inputs, outputs, and transformations within neural networks are all represented using tensors, and as a result, neural network programming utilizes tensors heavily.

### Indexes required to access an element:

| Indexes required | Computer science | Mathematics |
| --- | --- | --- |
| 0 | number | scalar |
| 1 | array | vector |
| 2 | 2d-array | matrix |


###### For example, suppose we have this array:

In [None]:
a = [1,2,3,4]

##### Now, suppose we want to access (refer to) the number in this data structure. We can do it using a single index like so:

In [None]:
a[2]

3

##### This logic works the same for a vector. Now consider a 2d-array:

In [None]:
dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

##### Now, suppose we want to access (refer to) the number in this data structure. In this case, we need two indexes to locate the specific element.

In [None]:
dd[0][2]

3

###### This logic works the same for a matrix.
###### Note that, if we have a number or scalar, we don't need an index, we can just refer to the number or scalar directly.

### Basic Tensor Operations



In [None]:
# Create two tensors
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

In [None]:
# Sum of tensors
sum_result = torch.add(x, y)

In [None]:
# Subtract tensors
sub_result = torch.sub(x, y)

In [None]:
# Divide tensors
div_result = torch.div(x, y)

In [None]:
# Multiply tensors
mul_result = torch.mul(x, y)

In [None]:
# Print the results
print("x:", x)
print("y:", y)
print("Sum:", sum_result)
print("Subtract:", sub_result)
print("Divide:", div_result)
print("Multiply:", mul_result)

x: tensor([1, 2, 3])
y: tensor([4, 5, 6])
Sum: tensor([5, 7, 9])
Subtract: tensor([-3, -3, -3])
Divide: tensor([0.2500, 0.4000, 0.5000])
Multiply: tensor([ 4, 10, 18])
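Each functional call above also has an operator form. The sketch below reuses the same `x` and `y` values and assumes a reasonably recent PyTorch version (the `rounding_mode` argument of `torch.div` is not available in very old releases). It also shows why the division result above is a float tensor: `torch.div` performs true division.

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

# Operator forms give the same results as torch.add, torch.sub, torch.mul, torch.div
print(torch.equal(x + y, torch.add(x, y)))  # True
print(torch.equal(x - y, torch.sub(x, y)))  # True
print(torch.equal(x * y, torch.mul(x, y)))  # True
print(torch.equal(x / y, torch.div(x, y)))  # True

# True division turns integer inputs into the default float dtype
print((x / y).dtype)  # torch.float32

# Floor division keeps an integer result: 4/1, 5/2, 6/3 rounded down
print(torch.div(y, x, rounding_mode='floor'))  # tensor([4, 2, 2])
```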


## Tensors are generalizations
* Let's look at what happens when there are more than two indexes required to access (refer to) a specific element within these data structures we have been considering.
*  When more than two indexes are required to access a specific element, we stop giving specific names to the structures, and we begin using more general language.

## Mathematics

In mathematics, we stop using words like scalar, vector, and matrix, and we start using the word tensor or nd-tensor. The 'n' tells us the number of indexes required to access a specific element within the structure.

## Computer science

In computer science, we stop using words like number, array, and 2d-array, and start using the words multidimensional array or nd-array. The 'n' tells us the number of indexes required to access a specific element within the structure.

| Indexes required | Computer science | Mathematics |
| --- | --- | --- |
| n | nd-array  | nd-tensor  |


##### Let's make this clear. For practical purposes in neural network programming, tensors and nd-arrays are one and the same.
* ##### Tensors and nd-arrays are the same thing!

So tensors are multidimensional arrays or nd-arrays for short. The reason we say a tensor is a generalization is because we use the word tensor for all values of *n* like so:

* A scalar is a 0-dimensional tensor
* A vector is a 1-dimensional tensor
* A matrix is a 2-dimensional tensor
* An nd-array is an n-dimensional tensor
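We can ask PyTorch for this number directly. A minimal sketch, using small example tensors chosen only for illustration: `dim()` (or the equivalent `ndim` attribute) returns the number of dimensions, which matches the number of indexes needed to reach a single element.

```python
import torch

scalar = torch.tensor(5)                 # no index needed
vector = torch.tensor([1, 2, 3])         # one index needed
matrix = torch.tensor([[1, 2], [3, 4]])  # two indexes needed
cube = torch.zeros(2, 3, 4)              # three indexes needed

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("3d-tensor", cube)]:
    # dim() and ndim both report the number of dimensions (the rank)
    print(name, value.dim(), value.ndim, tuple(value.shape))
```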

###### Tensors allow us to drop these specific terms and just use a term to identify the number of dimensions we are working with.

## Rank, Axes and Shape

The rank, axes, and shape are three tensor attributes that will concern us most when starting out with tensors in deep learning. These concepts build on one another starting with rank, then axes, and building up to shape.

### Rank of a tensor

The rank of a tensor refers to the number of dimensions present within the tensor. Suppose we are told that we have a rank-2 tensor. This means all of the following:

   * We have a matrix
   * We have a 2d-array
   * We have a 2d-tensor
   
##### A tensor's rank tells us how many indexes are needed to refer to a specific element within the tensor.


### Axes of a tensor
If we have a tensor, and we want to refer to a specific dimension, we use the word axis in deep learning.
###### An axis of a tensor is a specific dimension of a tensor.
If we say that a tensor is a rank 2 tensor, we mean that the tensor has 2 dimensions, or equivalently, the tensor has two axes.

### Length of an axis

###### The length of each axis tells us how many indexes are available along each axis.
Suppose we have a tensor called t, and we know that the first axis has a length of three while the second axis has a length of four.
 Since the first axis has a length of three, this means that we can index three positions along the first axis like so: \
`t[0] ` \
`t[1] ` \
`t[2] `



In [None]:
t = [
[1,2,3,9],
[4,5,6,8],
[7,8,9,3]
]

In [None]:
# try accessing index 0,1 and 2. Then try to access index 3
t[2]

[7, 8, 9, 3]

###### All of these indexes are valid, but we can't move past index 2.

Since the second axis has a length of four, we can index four positions along the second axis. This is possible for each index of the first axis, so we have:

`t[0][0]` \
`t[1][0]` \
`t[2][0]`

`t[0][1]` \
`t[1][1]` \
`t[2][1]`

`t[0][2]` \
`t[1][2]` \
`t[2][2]`

`t[0][3]` \
`t[1][3]` \
`t[2][3]`

In [None]:
# Index 3 is the last valid position along the second axis; t[2][4] would raise an IndexError
t[2][3]

3
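Going one step past the last valid index along either axis fails. A small sketch using the same nested list `t`; the `lambda` wrappers are only there so both out-of-range accesses can be tried in one loop:

```python
t = [
    [1, 2, 3, 9],
    [4, 5, 6, 8],
    [7, 8, 9, 3],
]

# Valid indexes run from 0 to (axis length - 1)
print(len(t), len(t[0]))  # 3 4

for label, access in [("t[3]", lambda: t[3]), ("t[2][4]", lambda: t[2][4])]:
    try:
        access()
    except IndexError as err:
        print(label, "->", type(err).__name__, err)
```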

### Tensor axes example

Let's look at some examples to make this solid. We'll consider the same tensor dd as before:

In [None]:
dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

Each element along the first axis is an array:

In [None]:
dd[0]

[1, 2, 3]

In [None]:
dd[1]

[4, 5, 6]

In [None]:
dd[2]

[7, 8, 9]

For this example, each element along the second axis is a number:

In [None]:
dd[0][0]

1

In [None]:
dd[1][0]

4

In [None]:
dd[2][1]

8

In [None]:
dd[2][2]

9

##### Note that, with tensors, the elements of the last axis are always numbers. Every other axis will contain n-dimensional arrays.

### Shape of a tensor
The shape of a tensor is determined by the length of each axis, so if we know the shape of a given tensor, then we know the length of each axis, and this tells us how many indexes are available along each axis.

##### The shape of a tensor gives us the length of each axis of the tensor.



Let's consider the same tensor dd as before:

In [None]:
dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

To work with this tensor's shape, we'll create a torch.Tensor object like so:

In [None]:
t = torch.tensor(dd)

print(t)

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])


In [None]:
type(t)

torch.Tensor

Now, we have a torch.Tensor object, and so we can ask to see the tensor's shape:

In [None]:
t.shape

torch.Size([3, 3])

This allows us to see the tensor's shape is 3 x 3. Note that, in PyTorch, size and shape of a tensor are the same thing.
The shape of 3 x 3 tells us that each axis of this rank two tensor has a length of 3 which means that we have three indexes available along each axis.

One of the types of operations we must perform frequently when programming neural networks is called reshaping.

As our tensors flow through our networks, certain shapes are expected at different points inside the network, and as neural network programmers, it is our job to understand the incoming shape and have the ability to reshape as needed.

Before we look at reshaping tensors, recall how we reshaped the list of terms we started with earlier:

* `Shape 6 x 1`

   * number
   * scalar
   * array
   * vector
   * 2d-array
   * matrix
   
* `Shape 2 x 3`

   * number, array, 2d-array
   * scalar, vector, matrix
* `Shape 3 x 2`

   * number, scalar
   * array, vector
   * 2d-array, matrix



Each of these groups of terms represents the same underlying data, only with a different shape. This is just a small example to motivate the idea of reshaping.

Let's look at our example tensor dd again:

In [None]:
t = torch.tensor(dd)
t

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

This torch.Tensor is a rank 2 tensor with a shape of [3,3] or 3 x 3.
Now, suppose we need to reshape t to be of shape [1,9]. This would give us one array along the first axis and nine numbers along the second axis:
    

In [None]:
# Reshape t into a 1 x 9 tensor
t.reshape(1,9)

tensor([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [None]:
# in order to check the shape of the reshaped tensor
t.reshape(1,9).shape

torch.Size([1, 9])
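When only one axis length is unknown, `reshape()` accepts `-1` and infers that length from the element count. A minimal sketch with the same 3 x 3 tensor; the invalid shape `(2, 5)` is an arbitrary example chosen to show that a reshape must account for exactly 9 elements:

```python
import torch

t = torch.tensor([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# -1 lets PyTorch infer this axis length from the 9 elements
print(t.reshape(1, -1).shape)  # torch.Size([1, 9])
print(t.reshape(-1).shape)     # torch.Size([9])

# 2 * 5 = 10 does not match 9 elements, so the reshape is rejected
try:
    t.reshape(2, 5)
except RuntimeError as err:
    print("RuntimeError:", err)
```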

### CNN tensor input shape and feature maps

In this section, we will look at a practical example that demonstrates the use of the tensor concepts rank, axes, and shape.  \
To do this, we'll consider a tensor input to a convolutional neural network. Without further ado, let's get started.

Convolutional neural networks are the go-to networks for image recognition tasks because they are well suited for detecting spatial patterns.

Remember that the shape of a tensor encodes all the relevant information about a tensor's axes, rank, and indexes, so we'll consider the shape in our example, and this will enable us to work out the other values. Let's begin.

### Shape of a CNN input

The shape of a CNN input typically has a length of four. This means that we have a rank-4 tensor with four axes.
For images, the raw data comes in the form of pixels that are represented by a number and are laid out using two dimensions, height and width.

#### We'll work backwards, considering the axes from right to left. Remember, the last axis, which is where we'll start, is where the actual numbers or data values are located.


###### Image height and width

To represent two dimensions, we need two axes.

The image height and width are represented on the last two axes. Possible values here are 28 x 28.

The image height and width might vary depending on the deep learning model we will be using.

##### Image color channels
The next axis represents the color channels. Typical values here are 3 for RGB images or 1 if we are working with grayscale images. This color channel interpretation only applies to the input tensor.

In terms of accessing data at this point, we need three indexes. We choose a color channel, a height, and a width to arrive at a specific pixel value.

#####  Image batches

This brings us to the first of the four axes, which represents the batch size. In neural networks, we usually work with batches of samples as opposed to single samples, so the length of this axis tells us how many samples are in our batch.

This allows us to see that an entire batch of images is represented using a single rank-4 tensor.

Suppose we have the following shape [3, 1, 28, 28] for a given tensor. Using the shape, we can determine that we have a batch of three images.

                   ` [Batch, Channels, Height, Width] `

Each image has a single color channel, and each image is 28 pixels high and 28 pixels wide.

This gives us a single rank-4 tensor that will ultimately flow through our convolutional neural network.
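To make the four axes concrete, here is a minimal sketch assuming a hypothetical batch of three 28 x 28 grayscale images filled with zeros. Each index we supply removes one axis, and four indexes reach a single pixel value:

```python
import torch

# Hypothetical batch: 3 grayscale images of 28 x 28 (zeros as placeholder pixels)
batch = torch.zeros(3, 1, 28, 28)

n, c, h, w = batch.shape  # [Batch, Channels, Height, Width]
print(n, c, h, w)         # 3 1 28 28
print(batch.dim())        # 4, a rank-4 tensor

# Four indexes pick one pixel: image 0, channel 0, row 27, column 27
pixel = batch[0, 0, 27, 27]
print(pixel.dim())        # 0, a scalar tensor

# Fewer indexes return sub-tensors
print(batch[0].shape)     # one image:   torch.Size([1, 28, 28])
print(batch[0, 0].shape)  # one channel: torch.Size([28, 28])
```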

####  NCHW vs NHWC vs CHWN

It's common when reading API documentation and academic papers to see the batch axis labeled N instead of B (for Batch). The N stands for the number of samples in a batch.

Furthermore, another difference we often encounter in the wild is a reordering of the dimensions. Common orderings are as follows:

   * NCHW
   * NHWC
   * CHWN
   
As we have seen, PyTorch uses NCHW: its convolutional layers expect input in this order.
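Data from other sources sometimes arrives channels-last (NHWC). A minimal sketch, assuming a hypothetical random NHWC batch, of rearranging it into the NCHW order with `permute()`:

```python
import torch

# Hypothetical channels-last batch: 2 RGB images of 28 x 28, shape [N, H, W, C]
nhwc = torch.rand(2, 28, 28, 3)

# permute(0, 3, 1, 2): new axis i is taken from old axis dims[i]
nchw = nhwc.permute(0, 3, 1, 2)
print(nchw.shape)  # torch.Size([2, 3, 28, 28])

# Same values, only the axis order changed
print(torch.equal(nchw[1, 2, 5, 7], nhwc[1, 5, 7, 2]))  # True
```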


####  Output channels and feature maps

Let's look at how the interpretation of the color channel axis changes after the tensor is transformed by a convolutional layer.

Suppose we have a tensor that contains data from a single 28 x 28 grayscale image. This gives us the following tensor shape: [1, 1, 28, 28]
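As a minimal sketch, assuming a single `nn.Conv2d` layer with 6 output channels and a 5 x 5 kernel (hypothetical values chosen only for illustration), we can pass this [1, 1, 28, 28] tensor through the layer. The channel axis of the output no longer holds color channels; each of its entries is a feature map produced by one filter:

```python
import torch
import torch.nn as nn

image = torch.rand(1, 1, 28, 28)  # [N, C, H, W]: one 28 x 28 grayscale image

# Hypothetical layer: 1 input channel -> 6 output channels, 5 x 5 kernel, no padding
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

with torch.no_grad():  # no gradients needed just to inspect shapes
    out = conv(image)

# 6 feature maps; without padding each side shrinks to 28 - 5 + 1 = 24
print(out.shape)  # torch.Size([1, 6, 24, 24])
```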


# More on Tensors

PyTorch tensors are the data structures we'll be using when programming neural networks in PyTorch.

When programming neural networks, data preprocessing is often one of the first steps in the overall process, and one goal of data preprocessing is to transform the raw input data into tensor form.

### Instances of the torch.Tensor class

PyTorch tensors are instances of the torch.Tensor Python class. We can create a torch.Tensor object using the class constructor like so:

In [None]:
t = torch.Tensor()

type(t)

torch.Tensor

This creates an empty tensor (tensor with no data), but we'll get to adding data in just a moment.

### Tensor attributes
First, let's look at a few tensor attributes. Every torch.Tensor has these attributes:

   * dtype (a torch.dtype)
   * device (a torch.device)
   * layout (a torch.layout)

Looking at our Tensor t, we can see the following default attribute values:


In [None]:
print(t.dtype)
print(t.device)
print(t.layout)

torch.float32
cpu
torch.strided


### Tensors have a torch.dtype

The dtype, which is torch.float32 in our case, specifies the type of the data contained within the tensor. Tensors contain uniform (same-type) numerical data; common dtypes are:

| Data type | dtype |	CPU tensor | GPU tensor |
| --- | --- | --- | ---|
| 32-bit floating point |	torch.float32 |	torch.FloatTensor 	| torch.cuda.FloatTensor|
|64-bit floating point |	torch.float64 |	torch.DoubleTensor |	torch.cuda.DoubleTensor|
|16-bit floating point |	torch.float16 |	torch.HalfTensor |	torch.cuda.HalfTensor|
|8-bit integer (unsigned) |	torch.uint8 |	torch.ByteTensor |	torch.cuda.ByteTensor|
|8-bit integer (signed) |	torch.int8 |	torch.CharTensor |	torch.cuda.CharTensor|
|16-bit integer (signed) |	torch.int16 |	torch.ShortTensor |	torch.cuda.ShortTensor|
|32-bit integer (signed) |	torch.int32 |	torch.IntTensor |	torch.cuda.IntTensor|
|64-bit integer (signed) |	torch.int64 |	torch.LongTensor |	torch.cuda.LongTensor|

Notice how each type has a CPU and a GPU version. One thing to keep in mind about tensor data types is that, in PyTorch versions before 1.3, tensor operations had to happen between tensors with the same data type.

### PyTorch Tensor Type Promotion

Arithmetic and comparison operations, as of PyTorch version 1.3, can perform mixed-type operations that promote to a common dtype.

The example below was not allowed in version 1.2. However, in version 1.3 and above, the same code returns a tensor with dtype=torch.float32

In [None]:
#let's add data of different type and store them inside result variable
result = torch.tensor([1], dtype=torch.int) + torch.tensor([1],dtype=torch.float32)
result

tensor([2.])

In [None]:
# in order to check the data type of the result

result.dtype

torch.float32
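PyTorch can report the promoted dtype without running the operation. A short sketch using `torch.promote_types()` and `torch.result_type()`; the specific dtype pairs are just examples:

```python
import torch

# promote_types reports the dtype that mixing two dtypes produces
print(torch.promote_types(torch.int32, torch.float32))  # torch.float32
print(torch.promote_types(torch.int32, torch.int64))    # torch.int64

# result_type also accounts for Python scalars
ints = torch.tensor([1, 2, 3], dtype=torch.int32)
print(torch.result_type(ints, 1))    # torch.int32 (an int scalar does not widen the tensor)
print(torch.result_type(ints, 1.5))  # torch.float32 (the default float dtype)
print((ints + 1.5).dtype)            # torch.float32
```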

### Tensors have a torch.device
The device, cpu in our case, specifies the device (CPU or GPU) where the tensor's data is allocated. This determines where tensor computations for the given tensor will be performed.
PyTorch supports multiple devices. A device is specified with a type string such as 'cpu' or 'cuda', and a GPU can also be given an index, as in 'cuda:0':

In [None]:
device = torch.device('cpu')

In [None]:
device

device(type='cpu')

One thing to keep in mind about using multiple devices is that tensor operations must happen between tensors that exist on the same device.

Using multiple devices is typically something we will do as we become more advanced users, so there's no need to worry about that now.
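When we do start using a GPU, a common pattern is to pick the device once and create or move every tensor there. A minimal sketch; it falls back to the CPU when `torch.cuda.is_available()` returns False, so it also runs on CPU-only installs:

```python
import torch

# Use a GPU if one is available, otherwise the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.tensor([1., 2., 3.], device=device)  # create directly on the device
b = torch.tensor([4., 5., 6.]).to(device)      # or move an existing tensor

# Both operands live on the same device, so the operation is allowed
c = a + b
print(c.device)

# Bring the result back to the CPU, e.g. before converting to NumPy
print(c.cpu().numpy())
```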

### Creating tensors using data
These are the primary ways of creating tensor objects (instances of the torch.Tensor class) with data (array-like) in PyTorch:

    torch.Tensor(data)
    torch.tensor(data)
    torch.as_tensor(data)
    torch.from_numpy(data)


##### We'll begin by just creating a tensor with each of the options and see what we get. We'll start by creating some data.
We can use a Python list, or sequence, but numpy.ndarrays are going to be the more common option, so we'll go with a numpy.ndarray like so:

In [None]:
import numpy as np

data = np.array([1,2,3])

type(data)

numpy.ndarray

This gives us a simple bit of data with a type of numpy.ndarray.

Now, let's create our tensors with each of these options 1-4, and have a look at what we get:

In [None]:
o1 = torch.Tensor(data)
o2 = torch.tensor(data)
o3 = torch.as_tensor(data)
o4 = torch.from_numpy(data)

In [None]:
print(o1)
print(o2)
print(o3)
print(o4)

tensor([1., 2., 3.])
tensor([1, 2, 3])
tensor([1, 2, 3])
tensor([1, 2, 3])


All of the options (o1, o2, o3, o4) appear to have produced the same tensors except for the first one. The first option (o1) has dots after the numbers, indicating that the values are floats (the default dtype, torch.float32), while the other three options have an integer dtype inferred from the NumPy array (typically int64 on Linux and macOS).

We will see later which of these options is best for creating tensors. For now, let's see some of the creation options available for creating tensors from scratch without having any data beforehand.

### Creation options without data

Here are some other creation options that are available.

##### We have the torch.eye() function, which returns a 2-D tensor with ones on the diagonal and zeros elsewhere. The name eye() refers to the identity matrix, which is a square matrix with ones on the main diagonal and zeros everywhere else.

In [None]:
print(torch.eye(2))

tensor([[1., 0.],
        [0., 1.]])


###### We have the torch.zeros() function, which creates a tensor of zeros with the specified shape.

In [None]:
print(torch.zeros([2,2]))

tensor([[0., 0.],
        [0., 0.]])


##### Similarly, we have the torch.ones() function that creates a tensor of ones:

In [None]:
print(torch.ones([2,2]))

tensor([[1., 1.],
        [1., 1.]])


##### We also have the torch.rand() function, which creates a tensor of the specified shape filled with random values drawn uniformly from the interval [0, 1).

In [None]:
print(torch.rand([2,2]))

tensor([[0.8158, 0.9623],
        [0.7792, 0.3234]])
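Because torch.rand() returns different values on each call, it is often useful to seed the random number generator. A short sketch using `torch.manual_seed()` (the seed value 0 is arbitrary):

```python
import torch

# Seeding the global generator makes torch.rand() repeatable
torch.manual_seed(0)
first = torch.rand(2, 2)

torch.manual_seed(0)
second = torch.rand(2, 2)

print(torch.equal(first, second))  # True

# Every value lies in [0, 1)
print(bool(((first >= 0) & (first < 1)).all()))  # True
```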


### Creating PyTorch Tensors - Best Options
In this section, we'll look at the differences between the primary creation options, as well as which options should be used and when.

As we have seen, there are multiple options for creating a tensor. Among them, the torch.Tensor() constructor always uses the default dtype when building the tensor. We can verify the default dtype using the torch.get_default_dtype() function:

In [None]:
torch.get_default_dtype()

torch.float32

The other calls choose a dtype based on the incoming data. This is called type inference. The dtype is inferred based on the incoming data. Note that the dtype can also be explicitly set for these calls by specifying the dtype as an argument:

In [None]:
torch.tensor(data, dtype=torch.float32)

tensor([1., 2., 3.])

In [None]:
torch.as_tensor(data, dtype=torch.float32)

tensor([1., 2., 3.])

With torch.Tensor(), we are unable to pass a dtype to the constructor. This is an example of the torch.Tensor() constructor lacking in configuration options. This is one of the reasons to go with the torch.tensor() factory function for creating our tensors.

Let's look at one more difference between these creation methods, one that is hidden from view.

### Sharing memory for performance: copy vs share

This difference is lurking under the hood. To reveal it, we need to change the original input data in the numpy.ndarray after using the ndarray to create our tensors.

Let's do this and see what we get:

In [None]:
print('old:', data)

old: [1 2 3]


In [None]:
#change the value of index 0 from the main array
data[0] = 0

In [None]:
# Now let's print the array
print('new:', data)

new: [0 2 3]


In [None]:
print(o1)

print(o2)

print(o3)

print(o4)


tensor([1., 2., 3.])
tensor([1, 2, 3])
tensor([0, 2, 3])
tensor([0, 2, 3])


Note that originally we had **data[0]=1**, and also note that we only changed the data in the original numpy.ndarray. We didn't explicitly make any changes to our tensors **(o1, o2, o3, o4)**.

However, after setting **data[0]=0**, we can see that some of our tensors have changed. The first two, o1 and o2, still have the original value of 1 at index 0, while the last two, o3 and o4, have the new value of 0 at index 0.

This happens because **torch.Tensor()** and **torch.tensor()** copy their input data while **torch.as_tensor()** and **torch.from_numpy()** share their input data in memory with the original input object.

| Share Data |	Copy Data |
| --- | --- |
|torch.as_tensor()  |	torch.tensor() |
|torch.from_numpy()  |	torch.Tensor()  |

This sharing just means that the actual data in memory exists in a single place. As a result, any changes that occur in the underlying data will be reflected in both objects, the torch.Tensor and the numpy.ndarray.
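The sharing works in both directions. A minimal sketch with a fresh array (named `arr` here so the `data` array above is left alone): a write through a `torch.from_numpy()` tensor shows up in the array, `.numpy()` on a CPU tensor returns an array that also shares memory, and a `torch.tensor()` copy is unaffected by either change:

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])

shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # owns a separate copy

# A write through the tensor is visible in the ndarray (arr becomes [100, 2, 3])
shared[0] = 100
print(arr)

# .numpy() on a CPU tensor shares memory as well (shared becomes [100, 200, 3])
back = shared.numpy()
back[1] = 200
print(shared)

# The copy still holds the original values [1, 2, 3]
print(copied)
```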

##### If we have a torch.Tensor and we want to convert it to a numpy.ndarray, we do it like so:

In [None]:
print(o3.numpy())
print(o4.numpy())

[0 2 3]
[0 2 3]


We can confirm the type of these results:

In [None]:
print(type(o3.numpy()))
print(type(o4.numpy()))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


### Best options for creating tensors in PyTorch

Given all of these details, these two are the best options:

  *  torch.tensor()
  * torch.as_tensor()

The torch.tensor() call is the go-to call, while torch.as_tensor() should be used when tuning our code for performance.

##### Some things to keep in mind about memory sharing (it works where it can):

   1. Since numpy.ndarray objects are allocated on the CPU, the as_tensor() function must copy the data from the CPU to the GPU when a GPU is being used.
   2. The memory sharing of as_tensor() doesn't work with built-in Python data structures like lists.
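Both points can be checked directly, along with one more case. A minimal sketch with illustrative values: `torch.as_tensor()` copies a Python list, and it also copies when we ask for a dtype that differs from the array's own dtype, since the values have to be converted:

```python
import numpy as np
import torch

# 1. A Python list has no buffer to share, so as_tensor copies it
values = [1, 2, 3]
from_list = torch.as_tensor(values)
values[0] = 99
print(from_list)  # still starts with 1

# 2. Requesting a different dtype forces a conversion, i.e. a copy
arr = np.array([1, 2, 3])
as_float = torch.as_tensor(arr, dtype=torch.float32)
same_dtype = torch.as_tensor(arr)  # dtype matches, so memory is shared
arr[0] = 99
print(as_float)    # still starts with 1.
print(same_dtype)  # now starts with 99
```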



# Reshaping Operations - Tensors For Deep Learning

## Tensor Operation Types

Before we dive in with specific tensor operations, let's get a quick overview of the landscape by looking at the main operation categories that encompass the operations we'll cover. We have the following high-level categories of operations:
1. Reshaping operations
2. Element-wise operations
3. Reduction operations
4. Access operations

## Reshaping Operations For Tensors

Tensor Shape Review:

In [None]:
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

To determine the shape of this tensor, we look first at the rows (3) and then the columns (4), so this tensor is a 3 x 4 rank-2 tensor. Remember, rank is a commonly used word that just means the number of dimensions present within the tensor.

In PyTorch, we have two ways to get the shape:

In [None]:
t.size()

torch.Size([3, 4])

In [None]:
t.shape

torch.Size([3, 4])

In PyTorch the size and shape of a tensor mean the same thing.

Typically, after we know a tensor's shape, we can deduce a couple of things. First, we can deduce the tensor's rank. The rank of a tensor is equal to the length of the tensor's shape.

In [None]:
len(t.shape)

2

We can also deduce the number of elements contained within the tensor. The number of elements inside a tensor (12 in our case) is equal to the product of the shape's component values.

In [None]:
torch.tensor(t.shape).prod()

tensor(12)

In PyTorch, there is a dedicated function for this:

In [None]:
t.numel()

12

The number of elements contained within a tensor is important for reshaping because the reshaping must account for the total number of elements present. Reshaping changes the tensor's shape but not the underlying data. Our tensor has 12 elements, so any reshaping must account for exactly 12 elements.

# Reshaping A Tensor In PyTorch
Let's look now at all the ways in which this tensor t can be reshaped without changing the rank:

In [None]:
t.reshape([1,12])

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])

In [None]:
t.reshape([2,6])

tensor([[1., 1., 1., 1., 2., 2.],
        [2., 2., 3., 3., 3., 3.]])

In [None]:
t.reshape([3,4])

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

In [None]:
t.reshape([4,3])

tensor([[1., 1., 1.],
        [1., 2., 2.],
        [2., 2., 3.],
        [3., 3., 3.]])

In [None]:
t.reshape(6,2)

tensor([[1., 1.],
        [1., 1.],
        [2., 2.],
        [2., 2.],
        [3., 3.],
        [3., 3.]])

In [None]:
t.reshape(12,1)

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [2.],
        [2.],
        [2.],
        [2.],
        [3.],
        [3.],
        [3.],
        [3.]])

Using the reshape() function, we can specify the row x column shape that we are seeking. Notice how all of the shapes have to account for the number of elements in the tensor. In our example this is:

**rows * columns = 12 elements**

We can use the intuitive words rows and columns when we are dealing with a rank-2 tensor. The underlying logic is the same for higher dimensional tensors even though **we may not be able to use the intuition of rows and columns in higher dimensional spaces**. For example:

In [None]:
t.reshape(2,2,3)

tensor([[[1., 1., 1.],
         [1., 2., 2.]],

        [[2., 2., 3.],
         [3., 3., 3.]]])

In this example, we increase the rank to 3, and so we lose the rows and columns concept. However, the product of the shape's components (2 * 2 * 3) still has to equal the number of elements in the original tensor (12).

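Note that PyTorch has another function you may see, called view(), that also changes a tensor's shape. The difference is that view() always returns a view that shares memory with the original tensor and fails when the tensor's memory layout does not allow it, while reshape() returns a view when possible and copies the data otherwise. No matter which deep learning framework we are using, these concepts will be the same.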
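A short sketch of this difference, using the same 3 x 4 tensor `t`. Transposing with `t.t()` produces a non-contiguous tensor, which `view()` cannot flatten while `reshape()` can:

```python
import torch

t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
], dtype=torch.float32)

# On a contiguous tensor, view() reuses the same memory
v = t.view(2, 6)
print(v.data_ptr() == t.data_ptr())  # True

# The transpose is not contiguous, so view() cannot flatten it
tt = t.t()
print(tt.is_contiguous())  # False
try:
    tt.view(12)
except RuntimeError as err:
    print("view failed:", err)

# reshape() falls back to copying the data
print(tt.reshape(12))
```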

# Changing Shape By Squeezing And Unsqueezing

The next way we can change the shape of our tensors is by squeezing and unsqueezing them.

Squeezing a tensor removes the dimensions or axes that have a length of one.
Unsqueezing a tensor adds a dimension with a length of one.

These functions allow us to expand or shrink the rank (number of dimensions) of our tensor. Let's see this in action.

In [None]:
print(t.reshape([1,12]))

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])


In [None]:
print(t.reshape([1,12]).shape)

torch.Size([1, 12])


In [None]:
print(t.reshape([1,12]).squeeze())

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])


In [None]:
print(t.reshape([1,12]).squeeze().shape)

torch.Size([12])


In [None]:
print(t.reshape([1,12]).squeeze().unsqueeze(dim=0))

tensor([[1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.]])


In [None]:
print(t.reshape([1,12]).squeeze().unsqueeze(dim=0).shape)


torch.Size([1, 12])
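Both functions also accept a `dim` argument. A minimal sketch with zero-filled example tensors: `squeeze(dim)` removes only that axis, and only if its length is one, while `unsqueeze(dim)` chooses where the new length-one axis goes:

```python
import torch

z = torch.zeros(1, 3, 1)

print(z.squeeze().shape)   # torch.Size([3]): every length-1 axis removed
print(z.squeeze(0).shape)  # torch.Size([3, 1]): only axis 0 removed
print(z.squeeze(1).shape)  # torch.Size([1, 3, 1]): axis 1 has length 3, so no change

v = torch.zeros(12)
print(v.unsqueeze(0).shape)  # torch.Size([1, 12]): new leading axis
print(v.unsqueeze(1).shape)  # torch.Size([12, 1]): new trailing axis
```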


# Flattening A Tensor

To flatten a tensor, we call the torch.flatten() function and pass it the tensor:

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
])

In [None]:
torch.flatten(t1)

tensor([1, 2, 3, 4])
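Flattening is the same as reshaping to a single axis. A short sketch: `reshape(-1)` gives the same result for `t1`, and for a hypothetical [N, C, H, W] batch, `start_dim=1` keeps the batch axis and flattens each sample separately:

```python
import torch

t1 = torch.tensor([
    [1, 2],
    [3, 4],
])

# Same result as torch.flatten(t1)
print(t1.reshape(-1))  # tensor([1, 2, 3, 4])

# Keep axis 0 (the batch axis) and flatten the rest: 1 * 28 * 28 = 784
batch = torch.zeros(2, 1, 28, 28)  # hypothetical [N, C, H, W] batch
print(torch.flatten(batch, start_dim=1).shape)  # torch.Size([2, 784])
```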

# Concatenating Tensors

We combine tensors using the cat() function, and the resulting tensor will have a shape that depends on the shape of the two input tensors.

Suppose we have two tensors:

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
])

t2 = torch.tensor([
    [5,6],
    [7,8]
])

We can combine t1 and t2 row-wise (axis-0) in the following way:

In [None]:
torch.cat((t1, t2), dim=0)

tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])

We can combine them column-wise (axis-1) like this:

In [None]:
torch.cat((t1, t2), dim=1)

tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])

When we concatenate tensors, we increase the number of elements contained within the resulting tensor. This causes the component values within the shape (lengths of the axes) to adjust to account for the additional elements.

In [None]:
torch.cat((t1, t2), dim=0).shape

torch.Size([4, 2])

In [None]:
torch.cat((t1, t2), dim=1).shape

torch.Size([2, 4])
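Only the axis we concatenate along may differ in length; every other axis must match. A minimal sketch with a hypothetical 1 x 2 tensor `t3`:

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])  # shape [2, 2]
t3 = torch.tensor([[5, 6]])          # shape [1, 2]

# Along dim 0, only axis 1 (length 2 in both) must match
print(torch.cat((t1, t3), dim=0).shape)  # torch.Size([3, 2])

# Along dim 1, the row counts (2 vs 1) differ, so cat raises an error
try:
    torch.cat((t1, t3), dim=1)
except RuntimeError as err:
    print("RuntimeError:", err)
```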

# Element-wise operations

An element-wise operation operates on corresponding elements between tensors.

Two elements are said to be corresponding if the two elements occupy the same position within the tensor. The position is determined by the indexes used to locate each element.

Suppose we have the following two tensors:

In [None]:
t1 = torch.tensor([
    [1,2],
    [3,4]
], dtype=torch.float32)

In [None]:
t2 = torch.tensor([
    [9,8],
    [7,6]
], dtype=torch.float32)

Both of these tensors are rank-2 tensors with a shape of 2 x 2.

This means that we have two axes that both have a length of two elements each. The elements of the first axis are arrays and the elements of the second axis are numbers.

## Addition Is An Element-Wise Operation

In [None]:
t1 + t2

tensor([[10., 10.],
        [10., 10.]])

This allows us to see that addition between tensors is an element-wise operation. Each pair of elements in corresponding locations is added together to produce a new tensor of the same shape.

So, addition is an element-wise operation, and in fact, all the arithmetic operations, add, subtract, multiply, and divide are element-wise operations.

# Arithmetic Operations Are Element-Wise Operations
An operation we commonly see with tensors is arithmetic with scalar values. There are two ways we can do this:

In [None]:
print(t1 + 2)
print(t1.add(2))

tensor([[3., 4.],
        [5., 6.]])
tensor([[3., 4.],
        [5., 6.]])


In [None]:
print(t1 - 2)
print(t1.sub(2))

tensor([[-1.,  0.],
        [ 1.,  2.]])
tensor([[-1.,  0.],
        [ 1.,  2.]])


In [None]:
print(t1 * 2)
print(t1.mul(2))

tensor([[2., 4.],
        [6., 8.]])
tensor([[2., 4.],
        [6., 8.]])


In [None]:
print(t1 / 2)
print(t1.div(2))

tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])
tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])


#### Both of these options work the same.

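## So how does this fit in?

Element-wise operations require both tensors to have the same shape, yet the scalar 2 has no axes at all. Let's break down how this works.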

# Broadcasting Tensors

Broadcasting is the concept whose implementation allows us to add scalars to higher-dimensional tensors.

We can see what the broadcasted scalar value looks like using NumPy's broadcast_to() function:

In [None]:
import numpy as np

np.broadcast_to(2, t1.shape)

array([[2, 2],
       [2, 2]])

Conceptually, the scalar value is transformed into a rank-2 tensor with the same shape as t1. Now the shapes match, and the element-wise rule that both operands must have the same shape is back in play. All of this happens under the hood.
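PyTorch also has its own `torch.broadcast_to()`, which is documented as equivalent to calling `expand()`. The sketch below assumes a PyTorch version recent enough to include it, and expands a rank-0 tensor to `t1`'s shape. A stride of 0 along both axes shows that the 2 is not actually copied four times: every position refers to the same stored value.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
scalar = torch.tensor(2.0)  # rank-0 tensor, shape torch.Size([])

expanded = torch.broadcast_to(scalar, t1.shape)
print(expanded)
# tensor([[2., 2.],
#         [2., 2.]])

# Stride 0 on both axes: an expanded view, not four stored copies
print(expanded.stride())  # (0, 0)

# Adding the expanded tensor matches adding the scalar directly
print(torch.equal(t1 + expanded, t1 + 2))  # True
```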



In [None]:
t1 + 2

tensor([[3., 4.],
        [5., 6.]])

is conceptually equivalent to this:

In [None]:
t1 + torch.tensor(
    np.broadcast_to(2, t1.shape),
    dtype=torch.float32
)

tensor([[3., 4.],
        [5., 6.]])
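Broadcasting is not limited to scalars. In this sketch, `row` is a made-up rank-1 tensor of shape (2,) that gets added to the 2 x 2 tensor `t1`. The shapes are compared starting from the last axis, and the missing leading axis is stretched to match. `torch.broadcast_shapes()` reports the resulting shape without doing any arithmetic, and shapes that cannot be broadcast raise an error.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
row = torch.tensor([2, 4], dtype=torch.float32)  # hypothetical, shape (2,)

print(torch.broadcast_shapes(t1.shape, row.shape))  # torch.Size([2, 2])

# row is added to each row of t1
print(t1 + row)
# tensor([[3., 6.],
#         [5., 8.]])

# A length-3 axis cannot be matched against a length-2 axis
try:
    t1 + torch.tensor([1.0, 2.0, 3.0])
    broadcast_failed = False
except RuntimeError as err:
    broadcast_failed = True
    print("broadcast failed:", err)
```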

### Comparison Operations Are Element-Wise

For a given comparison operation between two tensors, a new tensor of the same shape is returned, with each element containing a torch.bool value of either True or False. When one operand is a scalar, as in the examples below, it is broadcast to the tensor's shape first.
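The comparison methods also have operator forms (`==` for `eq()`, `>=` for `ge()`, `>` for `gt()`, `<` for `lt()`, `<=` for `le()`), and both operands can be tensors. Here is a small sketch, where `other` is a hypothetical tensor filled with 5s:

```python
import torch

t = torch.tensor([[0, 5, 0], [6, 0, 7], [0, 8, 0]], dtype=torch.float32)

mask = t.eq(0)
print(mask.dtype)                 # torch.bool
print(torch.equal(mask, t == 0))  # True: == is the operator form of eq()

# Hypothetical second tensor of the same shape, filled with 5s
other = torch.full_like(t, 5.0)

# Each element of t is compared with the element at the same position in other
print(torch.equal(t.ge(other), t >= other))  # True
print(t.ge(other).sum().item())  # 4: the elements 5, 6, 7 and 8 are >= 5
```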

### Element-Wise Comparison Operation Examples

In [None]:
t = torch.tensor([
    [0,5,0],
    [6,0,7],
    [0,8,0]
], dtype=torch.float32)

In [None]:
t.eq(0)

tensor([[ True, False,  True],
        [False,  True, False],
        [ True, False,  True]])

In [None]:
t.ge(0)

tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])

In [None]:
t.gt(0)

tensor([[False,  True, False],
        [ True, False,  True],
        [False,  True, False]])

In [None]:
t.lt(0)

tensor([[False, False, False],
        [False, False, False],
        [False, False, False]])

In [None]:
t.le(7)

tensor([[ True,  True,  True],
        [ True,  True,  True],
        [ True, False,  True]])

### Element-Wise Operations Using Functions
For element-wise operations that are functions, we can assume that the function is applied to each element of the tensor independently.

Here are some examples:

In [None]:
t.abs()

tensor([[0., 5., 0.],
        [6., 0., 7.],
        [0., 8., 0.]])

In [None]:
t.sqrt()

tensor([[0.0000, 2.2361, 0.0000],
        [2.4495, 0.0000, 2.6458],
        [0.0000, 2.8284, 0.0000]])

In [None]:
t.neg()

tensor([[-0., -5., -0.],
        [-6., -0., -7.],
        [-0., -8., -0.]])

In [None]:
t.neg().abs()

tensor([[0., 5., 0.],
        [6., 0., 7.],
        [0., 8., 0.]])
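To back up the claim that these functions act on each element independently, here is a sketch that applies Python's `math.sqrt()` to every element in a plain loop and compares the result with `t.sqrt()`. `allclose()` is used because float32 tensors and Python floats differ slightly in precision.

```python
import math
import torch

t = torch.tensor([[0, 5, 0], [6, 0, 7], [0, 8, 0]], dtype=torch.float32)

# Apply math.sqrt to one element at a time, in plain Python
manual = torch.tensor(
    [[math.sqrt(x) for x in row] for row in t.tolist()],
    dtype=torch.float32,
)

print(torch.allclose(t.sqrt(), manual))  # True
```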

### Some Terminology
There are other names for element-wise operations. All of the following mean the same thing:

* Element-wise
* Component-wise
* Point-wise

Just keep this in mind if you encounter any of these terms in the wild.

### Tensor Reduction Operations

A reduction operation on a tensor is an operation that reduces the number of elements contained within the tensor.

### Reduction Operation Example
Suppose we have the following 3 x 3 rank-2 tensor:

In [None]:
t = torch.tensor([
    [0,1,0],
    [2,0,2],
    [0,3,0]
], dtype=torch.float32)


In [None]:
t.sum()

tensor(8.)

In [None]:
t.numel()

9

In [None]:
t.sum().numel()

1

In [None]:
t.sum().numel() < t.numel()

True

### Common Tensor Reduction Operations

In [None]:
print(t.sum())

print(t.prod())

print(t.mean())

tensor(8.)
tensor(0.)
tensor(0.8889)


All of these tensor methods reduce the tensor to a single-element, scalar-valued tensor by operating on all of the tensor's elements.
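As a sanity check, the sketch below recomputes the three reductions in plain Python from the flattened list of elements. It also shows that the result of `sum()` is a rank-0 tensor with an empty shape. `math.prod()` assumes Python 3.8 or later.

```python
import math
import torch

t = torch.tensor([[0, 1, 0], [2, 0, 2], [0, 3, 0]], dtype=torch.float32)
values = t.flatten().tolist()  # all 9 elements as a Python list

print(t.sum().item() == sum(values))         # True (8.0)
print(t.prod().item() == math.prod(values))  # True (0.0)
print(math.isclose(t.mean().item(), sum(values) / t.numel(), rel_tol=1e-6))  # True

print(t.sum().shape)  # torch.Size([]), a rank-0 tensor
```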

## Do reduction operations always reduce to a tensor with a single element?

No. As the next section shows, we can also reduce a tensor along a specific axis.


### Reducing Tensors By Axes
To reduce a tensor with respect to a specific axis, we use the same methods and pass a value for the dim parameter. Let's see this in action.

Suppose we have the following tensor:

In [None]:
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

This is a 3 x 4 rank-2 tensor. Having different lengths for the two axes will help us understand these reduce operations.

Let's use the sum() method again, but this time we'll specify a dimension to reduce. The tensor has two axes, so we'll reduce along each of them in turn.

In [None]:
t.sum(dim=0)

tensor([6., 6., 6., 6.])

When I first saw this while learning how it works, I was confused. If you're confused too, I highly recommend working out what's happening here before going forward.

Remember, we are reducing this tensor across the first axis. The elements running along the first axis are arrays, and the elements running along the second axis are numbers.

Let's go over what happened here.

### Understanding Reductions By Axes

In [None]:
t[0]


tensor([1., 1., 1., 1.])

In [None]:
t[1]


tensor([2., 2., 2., 2.])

In [None]:
t[2]

tensor([3., 3., 3., 3.])

In [None]:
t[0] + t[1] + t[2]

tensor([6., 6., 6., 6.])

### Element-wise operations are in play here.

When we sum across the first axis, we add together all of the elements of the first axis, which are the arrays t[0], t[1], and t[2]. Because those elements are arrays, this sum relies on element-wise addition.

Summing across the second axis works differently. The second axis contains numbers that come in groups of four, one group per row. Since we have three groups of four numbers, we get three sums.

In [None]:
t[0].sum()

tensor(4.)

In [None]:
t[1].sum()

tensor(8.)

In [None]:
t[2].sum()

tensor(12.)

In [None]:
t.sum(dim=1)

tensor([ 4.,  8., 12.])
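A rule of thumb that follows from these examples: reducing along an axis removes that axis from the shape. The sketch below checks this and also uses the `keepdim=True` argument of `sum()`, which keeps the reduced axis with length 1 instead of dropping it. The last check rebuilds the `dim=1` result by summing each row separately.

```python
import torch

t = torch.tensor([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]], dtype=torch.float32)
print(t.shape)                           # torch.Size([3, 4])

# Reducing an axis removes it from the shape
print(t.sum(dim=0).shape)                # torch.Size([4])
print(t.sum(dim=1).shape)                # torch.Size([3])

# keepdim=True keeps the reduced axis, with length 1
print(t.sum(dim=0, keepdim=True).shape)  # torch.Size([1, 4])

# Iterating over t walks the first axis (the rows);
# summing each row and stacking the results reproduces dim=1
row_sums = torch.stack([row.sum() for row in t])
print(torch.equal(row_sums, t.sum(dim=1)))  # True
```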

# Argmax Tensor Reduction Operation

Argmax returns the index location of the maximum value inside a tensor.

When we call the argmax() method on a tensor, the tensor is reduced to a new tensor that contains an index value indicating where the max value is inside the tensor. Let's see this in code.

In [None]:
t = torch.tensor([
    [1,0,0,2],
    [0,3,3,0],
    [4,0,0,5]
], dtype=torch.float32)

In this tensor, we can see that the max value is the 5 in the last position of the last array.

In [None]:
t.max()

tensor(5.)

### argmax() returns the index

In [None]:
t.argmax()

tensor(11)

The first piece of code confirms that the max is indeed 5, but the call to the argmax() method tells us that the 5 is sitting at index 11. What's happening here?

To see why, picture this tensor flattened into a rank-1 tensor of 12 elements, read row by row. If we don't specify an axis, argmax() returns the index of the max value in the flattened tensor, which in this case is indeed 11, the last position.
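Here is a sketch that makes the flattened view explicit. The `divmod()` step converts the flat index back into a (row, column) pair, assuming the row-major order that `flatten()` uses for this tensor.

```python
import torch

t = torch.tensor([[1, 0, 0, 2], [0, 3, 3, 0], [4, 0, 0, 5]], dtype=torch.float32)

flat = t.flatten()
print(flat)
# tensor([1., 0., 0., 2., 0., 3., 3., 0., 4., 0., 0., 5.])
print(flat[11])       # tensor(5.)

# Convert the flat index back to (row, column) for a 3 x 4 tensor
idx = t.argmax().item()
row, col = divmod(idx, t.shape[1])
print(row, col)       # 2 3
print(t[row, col])    # tensor(5.)
```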

Let's see how we can work with specific axes now.

In [None]:
t.max(dim=0)

torch.return_types.max(
values=tensor([4., 3., 3., 5.]),
indices=tensor([2, 1, 1, 2]))

In [None]:
t.max(dim=1)

torch.return_types.max(
values=tensor([2., 3., 5.]),
indices=tensor([3, 1, 3]))

In [None]:
t.argmax(dim=1)

tensor([3, 1, 3])
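As the outputs above show, `max(dim=1)` returns both the maximum values and their indices, and the indices match what `argmax(dim=1)` returns. The sketch below checks this and then uses the indices to look the maxima up again with advanced indexing. Row 1 has a tie (3 appears twice). The current PyTorch documentation says both methods return the index of the first maximal value in that case.

```python
import torch

t = torch.tensor([[1, 0, 0, 2], [0, 3, 3, 0], [4, 0, 0, 5]], dtype=torch.float32)

values, indices = t.max(dim=1)  # the returned named tuple unpacks into two tensors
print(torch.equal(indices, t.argmax(dim=1)))  # True

# Pair each row number with its column index to fetch the maxima again
rows = torch.arange(t.shape[0])
print(torch.equal(t[rows, indices], values))  # True
```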

### Accessing Elements Inside Tensors

The last type of common operation that we need for tensors is the ability to access data from within the tensor. Let's see how to do this in PyTorch.

In [None]:
t = torch.tensor([
    [1,2,3],
    [4,5,6],
    [7,8,9]
], dtype=torch.float32)

In [None]:
t.mean()

tensor(5.)

In [None]:
t.mean().item()

5.0

When we call mean() on this 3 x 3 tensor, the reduced output is a scalar-valued tensor. To get the value as a Python number, we use the item() tensor method. This works only for tensors that contain a single element, such as scalar-valued tensors.
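Here is a sketch of the single-element rule. `item()` returns a plain Python `float` here. It also works on a one-element tensor of any shape, and it fails on a tensor with several elements. The exact error type and message may vary between PyTorch versions, so the example catches both `RuntimeError` and `ValueError`.

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float32)

print(t.mean().item())        # 5.0
print(type(t.mean().item()))  # <class 'float'>

# Any tensor holding exactly one element works, whatever its shape
print(torch.tensor([[7.0]]).item())  # 7.0

# A tensor with three elements cannot be turned into one number
try:
    t.mean(dim=0).item()
    item_failed = False
except (RuntimeError, ValueError) as err:
    item_failed = True
    print("item() failed:", err)
```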

### Have a look at how we do it with multiple values:

In [None]:
t.mean(dim=0).tolist()

[4.0, 5.0, 6.0]

In [None]:
t.mean(dim=0).numpy()

array([4., 5., 6.], dtype=float32)

When we compute the mean across the first axis, multiple values are returned, and we can access the numeric values by transforming the output tensor into a Python list or a NumPy array.
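One difference between the two conversions is worth knowing. `tolist()` builds an independent Python list. By contrast, `numpy()` on a CPU tensor returns an array that shares memory with the tensor, so writing to the array changes the tensor too. A tensor that requires gradients must be detached with `detach()` before calling `numpy()`. A minimal sketch:

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float32)
means = t.mean(dim=0)

as_list = means.tolist()   # independent Python list
as_array = means.numpy()   # NumPy array sharing memory with the CPU tensor

as_array[0] = 100.0        # write through the array...
print(means[0].item())     # 100.0 (...and the tensor sees the change)
print(as_list)             # [4.0, 5.0, 6.0] (the list is unaffected)
```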