We extensively use PyTorch for implementing our deep learning models. PyTorch is an
open source, community-driven deep learning framework. Unlike Theano, Caffe, and TensorFlow,
PyTorch implements a tape-based automatic differentiation method that allows us to define and
execute computational graphs dynamically. This is extremely helpful for debugging and also for
constructing sophisticated models with minimal effort.


PyTorch is an optimized tensor manipulation library that offers an array of packages for deep learning.
At the core of the library is the tensor, which is a mathematical object holding some multidimensional
data. A tensor of order zero is just a number, or a scalar. A tensor of order one (1st-order tensor) is an
array of numbers, or a vector. Similarly, a 2nd-order tensor is an array of vectors, or a matrix.
Therefore, a tensor can be generalized as an n-dimensional array of scalars.
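To make the orders concrete, here is a small sketch that builds a tensor of each order and prints Tensor.dim(), which reports the order (number of dimensions), alongside the shape. The variable names are illustrative only.

```python
import torch

# Tensors of increasing order, from a scalar up to a 3rd-order tensor
scalar = torch.tensor(3.14)                       # order 0
vector = torch.tensor([1.0, 2.0, 3.0])            # order 1
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # order 2
cube = torch.zeros(2, 3, 4)                       # order 3

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # dim() gives the order; shape lists the size of each dimension
    print(name, t.dim(), tuple(t.shape))
```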

In [1]:
# Helper used throughout: prints a tensor's type, shape, and values
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))

In [2]:
import torch
describe(torch.Tensor(2,3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1.6877e+25, 1.7612e+19, 8.1446e-33],
        [1.3563e-19, 1.3563e-19, 1.3563e-19]])
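The odd-looking values above are expected: torch.Tensor(2, 3) allocates storage without initializing it, so the contents are whatever happened to be in memory and will differ between runs. A sketch of the more explicit alternatives: torch.empty when every element will be overwritten anyway, and torch.zeros when defined starting values are needed.

```python
import torch

# torch.empty states the "uninitialized" intent explicitly; its contents are arbitrary
buf = torch.empty(2, 3)
print(buf.shape)

# Fill it before reading it, or start from defined values instead
buf.fill_(0.0)
safe = torch.zeros(2, 3)
print(torch.equal(buf, safe))  # True: both are now all zeros
```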


In [3]:
describe(torch.rand(2, 3)) # uniform random on [0, 1)
describe(torch.randn(2, 3)) # standard normal (mean 0, std 1)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0.5367, 0.4370, 0.2746],
        [0.8679, 0.7314, 0.1032]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[-1.3305, -0.0389,  0.3890],
        [ 0.5070,  1.0439,  0.1921]])


In [4]:
import torch
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 0., 0.],
        [0., 0., 0.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 1., 1.],
        [1., 1., 1.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[5., 5., 5.],
        [5., 5., 5.]])
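The trailing underscore in fill_ follows PyTorch's naming convention for in-place operations: the method modifies the tensor it is called on instead of returning a new one. A small sketch contrasting the out-of-place add with the in-place add_:

```python
import torch

x = torch.ones(2, 3)

# Out-of-place: returns a new tensor and leaves x unchanged
y = x.add(1)
print(x[0, 0].item(), y[0, 0].item())  # 1.0 2.0

# In-place: modifies x itself
x.add_(1)
print(x[0, 0].item())  # 2.0
```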


In [5]:
x = torch.Tensor([[1, 2, 3],
                  [4, 5, 6]])
describe(x)


Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])


The values can come either from a Python list, as in the preceding example, or from a NumPy array. And, of
course, we can always go from a PyTorch tensor to a NumPy array as well. Notice that the type of the
tensor below is DoubleTensor instead of the default FloatTensor (see the next section). This
corresponds to the data type of the NumPy random matrix, float64.

In [6]:
import torch
import numpy as np
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))


Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0.8765, 0.9921, 0.2820],
        [0.5108, 0.8104, 0.7614]], dtype=torch.float64)
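Going the other way uses Tensor.numpy(). For CPU tensors, both torch.from_numpy and .numpy() share the underlying memory with the NumPy array instead of copying it, so an in-place change on one side is visible on the other. A minimal sketch of the round trip:

```python
import numpy as np
import torch

npy = np.random.rand(2, 3)     # float64 array
t = torch.from_numpy(npy)      # DoubleTensor sharing npy's memory
print(t.dtype)                 # torch.float64

back = t.numpy()               # back to NumPy; no copy for a CPU tensor
t.mul_(0)                      # an in-place change on the tensor side...
print(npy.sum(), back.sum())   # ...shows up in both arrays: 0.0 0.0
```

If an independent copy is needed, call .clone() on the tensor (or np.copy on the array) before converting.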


In [7]:
x = torch.FloatTensor([[1, 2, 3],
                       [4, 5, 6]])
describe(x)


Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])


In [8]:
x = x.long()
describe(x)


Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])
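The .long() call is one of several shorthand casts (.float(), .double(), .int() are others). The same conversion can be written with .to(dtype), or the type can be fixed at creation time through the dtype argument. A brief sketch showing that these agree:

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)

# Two equivalent ways of converting to 64-bit integers
a = x.long()
b = x.to(torch.int64)
print(a.dtype, b.dtype, torch.equal(a, b))  # torch.int64 torch.int64 True

# Choosing the type up front avoids the conversion entirely
c = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int64)
print(torch.equal(a, c))  # True
```

Note that casting floats to integers truncates toward zero, so torch.tensor([2.7]).long() gives tensor([2]).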


In [9]:
import torch
x = torch.randn(2, 3)
describe(x)


Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[-1.3648,  0.0116,  0.4653],
        [-0.4047, -0.4885, -0.2205]])


In [10]:
describe(torch.add(x, x))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[-2.7296,  0.0232,  0.9305],
        [-0.8095, -0.9771, -0.4410]])


In [11]:
describe(x + x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[-2.7296,  0.0232,  0.9305],
        [-0.8095, -0.9771, -0.4410]])


In [12]:
x = torch.arange(6)
describe(x)


Type: torch.LongTensor
Shape/size: torch.Size([6])
Values: 
tensor([0, 1, 2, 3, 4, 5])


In [13]:
describe(torch.sum(x, dim=0))

Type: torch.LongTensor
Shape/size: torch.Size([])
Values: 
tensor(15)


In [14]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)


Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
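view reinterprets the same storage with a new shape, so the total number of elements must match (6 = 2 x 3 here), and one dimension may be given as -1 to have it inferred. view also requires a compatible memory layout; reshape is the more forgiving alternative and falls back to copying when needed, for example after a transpose. A short sketch:

```python
import torch

x = torch.arange(6)

# -1 lets PyTorch infer that dimension: 6 elements / 3 columns = 2 rows
print(x.view(-1, 3).shape)       # torch.Size([2, 3])

# A transposed tensor is not contiguous, so view(6) would raise; reshape still works
t = x.view(2, 3).t()
print(t.is_contiguous())         # False
print(t.reshape(6).tolist())     # [0, 3, 1, 4, 2, 5]
```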


In [15]:
describe(x[:1, :2])


Type: torch.LongTensor
Shape/size: torch.Size([1, 2])
Values: 
tensor([[0, 1]])


In [16]:
indices = torch.LongTensor([0, 2])
describe(torch.index_select(x, dim=1, index=indices))


Type: torch.LongTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[0, 2],
        [3, 5]])
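The same column selection can also be written with ordinary (advanced) indexing by passing the index tensor for the column dimension. A quick sketch checking that the two forms agree:

```python
import torch

x = torch.arange(6).view(2, 3)
indices = torch.LongTensor([0, 2])

# index_select along dim=1 and advanced indexing on the column axis
a = torch.index_select(x, dim=1, index=indices)
b = x[:, indices]
print(torch.equal(a, b))  # True
print(b.tolist())         # [[0, 2], [3, 5]]
```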


In [17]:
indices = torch.LongTensor([0, 1])
describe(torch.index_select(x, dim=1, index=indices))

Type: torch.LongTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[0, 1],
        [3, 4]])


In [18]:
# Repeating index 0 selects column 0 twice, so both columns of the
# result are copies of the first column of x
indices = torch.LongTensor([0, 0])
describe(torch.index_select(x, dim=1, index=indices))


Type: torch.LongTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[0, 0],
        [3, 3]])


In [20]:
x = torch.arange(6).view(2, 3)
describe(x)


Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


In [21]:
describe(torch.cat([x, x], dim=0))


Type: torch.LongTensor
Shape/size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])


In [22]:
describe(torch.cat([x, x], dim=1))

Type: torch.LongTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])


In [23]:
describe(torch.stack([x, x]))

Type: torch.LongTensor
Shape/size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])
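The difference between the calls above: torch.cat joins tensors along an existing dimension, so the number of dimensions stays the same, while torch.stack inserts a new dimension (at dim=0 by default) and requires all inputs to have the same shape. A sketch that checks the resulting shapes, including stacking along a different dimension:

```python
import torch

x = torch.arange(6).view(2, 3)

print(torch.cat([x, x], dim=0).shape)    # torch.Size([4, 3]): existing dim grows
print(torch.stack([x, x]).shape)         # torch.Size([2, 2, 3]): new leading dim
print(torch.stack([x, x], dim=2).shape)  # torch.Size([2, 3, 2]): new trailing dim

# stack is equivalent to unsqueezing each input and then concatenating
s = torch.cat([x.unsqueeze(0), x.unsqueeze(0)], dim=0)
print(torch.equal(s, torch.stack([x, x])))  # True
```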


In [24]:
import torch
x1 = torch.arange(6).view(2, 3)
describe(x1)


Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


In [25]:
x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)


Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[1., 2.],
        [1., 2.],
        [1., 2.]])
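These two tensors have compatible shapes for a matrix product, (2 x 3) times (3 x 2) giving 2 x 2. A sketch of that product, assuming it is the intended next step: x1 holds integers (LongTensor) while x2 holds floats, and torch.mm expects both operands to have the same dtype, so x1 is cast with .float() first.

```python
import torch

x1 = torch.arange(6).view(2, 3)   # LongTensor, shape (2, 3)
x2 = torch.ones(3, 2)
x2[:, 1] += 1                     # FloatTensor, shape (3, 2)

# Cast x1 so both operands are float32 before multiplying
product = torch.mm(x1.float(), x2)
print(product.shape)     # torch.Size([2, 2])
print(product.tolist())  # [[3.0, 6.0], [12.0, 24.0]]

# The @ operator computes the same matrix product
print(torch.equal(product, x1.float() @ x2))  # True
```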


In [26]:
x = torch.ones(2, 2, requires_grad=True)
describe(x)
print(x.grad is None)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
True


In [27]:
y = (x + 2) * (x + 5) + 3
describe(y)
print(x.grad is None)


Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[21., 21.],
        [21., 21.]], grad_fn=<AddBackward0>)
True


The print(x.grad is None) line checks whether the grad attribute of x is still None. When you perform operations on tensors with requires_grad=True, PyTorch records those operations (note the grad_fn attached to y) so that it can compute gradients during backpropagation. Recording alone computes nothing, however: x.grad stays None until backward() is called on a result that depends on x. That is why both cells above print True.

In [28]:
z = y.mean()
describe(z)
z.backward()
print(x.grad is None)

Type: torch.FloatTensor
Shape/size: torch.Size([])
Values: 
tensor(21., grad_fn=<MeanBackward0>)
False
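The value now stored in x.grad can be checked by hand. With y = (x + 2)(x + 5) + 3, the derivative is dy/dx = 2x + 7, which equals 9 at x = 1. Because z is the mean of the four elements of y, each element's contribution is scaled by 1/4, so every entry of x.grad should be 9/4 = 2.25. A sketch that recomputes this and compares against the analytical formula:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = (x + 2) * (x + 5) + 3
z = y.mean()
z.backward()

# Analytical gradient: d(mean(y))/dx = (2x + 7) / number_of_elements
expected = (2 * x.detach() + 7) / x.numel()
print(x.grad)                            # every entry is 2.25
print(torch.allclose(x.grad, expected))  # True
```

Gradients accumulate across separate backward passes, which is why training loops reset them (for example with x.grad.zero_()) between steps.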


In [29]:
torch.cuda.is_available()

False

In [30]:
import torch

# Check if CUDA (GPU support) is available
if torch.cuda.is_available():
    # Get the number of available GPUs
    num_gpus = torch.cuda.device_count()
    print("Available GPU(s):")
    for i in range(num_gpus):
        gpu_name = torch.cuda.get_device_name(i)
        print(f"GPU {i}: {gpu_name}")
else:
    print("No GPU available, using CPU instead.")


No GPU available, using CPU instead.


In [31]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


In [32]:
x = torch.rand(3, 3).to(device)
describe(x)
# .to(device) moves the tensor to the selected device (here, the CPU)
# To operate on CUDA and non-CUDA objects, we need to ensure that they are on the same device. If we don't, the computations will break

Type: torch.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.6681, 0.6807, 0.1296],
        [0.9128, 0.2427, 0.4240],
        [0.9963, 0.5990, 0.2590]])


In [33]:
y = torch.rand(3, 3)
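x + y
# On a machine with a GPU, x would be on CUDA while y is on the CPU, so this addition would
# raise a RuntimeError. Here no GPU is available, so both are on the CPU and the addition succeeds.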

tensor([[1.6219, 1.5714, 0.8200],
        [1.6162, 1.0876, 0.4605],
        [1.3038, 0.6765, 0.3624]])

In [34]:
cpu_device = torch.device("cpu")
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y


tensor([[1.6219, 1.5714, 0.8200],
        [1.6162, 1.0876, 0.4605],
        [1.3038, 0.6765, 0.3624]])

Keep in mind that it is expensive to move data back and forth between the CPU and the GPU. Therefore, the typical
procedure involves doing many of the parallelizable computations on the GPU and then transferring
just the final result back to the CPU. This allows you to fully utilize the GPUs. If you have several
CUDA-visible devices (i.e., multiple GPUs), the best practice is to select which of them the program may use with the
CUDA_VISIBLE_DEVICES environment variable when executing the program.

In [35]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # only takes effect if set before CUDA is initialized
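A minimal sketch of the workflow described above: choose the device once, create the data directly on it, do the bulk of the computation there, and move only the small final result back to the CPU with .cpu() (or .item() for a single number). It runs unchanged on a CPU-only machine; the 256 x 256 sizes and the relu/sum steps are arbitrary placeholders, not part of the original example.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the operands directly on the target device to avoid a transfer
a = torch.rand(256, 256, device=device)
b = torch.rand(256, 256, device=device)

# Heavy, parallelizable work stays on the device
c = (a @ b).relu().sum(dim=1)

# Only the small final result is copied back to host memory
result = c.cpu()
total = c.sum().item()   # .item() returns a plain Python number
print(result.device, result.shape, type(total))
```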


Exercises

In [47]:
# Create a 2D tensor and then add a dimension of size 1 inserted at dimension 0
a = torch.rand(3, 3)
a = a.unsqueeze(0)
print(a.size())

torch.Size([1, 3, 3])


In [49]:
# Remove the extra dimension you just added to the previous tensor.
a = a.squeeze(0)
print(a.size())

torch.Size([3, 3])


In [50]:
# Create a random tensor of shape 5x3 in the interval [3, 7)
rand_tensor = torch.rand(5, 3)
scaled_rand_tensor = rand_tensor * (7 - 3) + 3

print(scaled_rand_tensor)

tensor([[6.1259, 5.0900, 3.2454],
        [5.8836, 3.1604, 6.6260],
        [3.8104, 6.2639, 5.7533],
        [6.9629, 3.2211, 4.3908],
        [5.9339, 4.8109, 3.1001]])
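An alternative sketch uses the in-place Tensor.uniform_(from, to) method, which samples directly from a uniform distribution between the two bounds and so skips the manual rescaling of torch.rand. The arguments are passed positionally because from is a reserved word in Python.

```python
import torch

# Sample 5x3 values uniformly between 3 and 7 in one step
t = torch.empty(5, 3).uniform_(3, 7)
print(t.min().item() >= 3, t.max().item() <= 7)  # True True
```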


In [51]:
# Create a tensor with values from a normal distribution (mean=0, std=1).
a = torch.randn(3, 3)
a

tensor([[ 0.1838, -1.6035,  0.7309],
        [-1.0302,  1.2813,  1.8446],
        [ 0.3104, -0.2775,  0.7103]])

In [52]:
# Retrieve the indexes of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1])
b = torch.Tensor([1, 1, 1, 0, 1])
torch.nonzero(b)

tensor([[0],
        [1],
        [2],
        [4]])

In [53]:
# Create a random tensor of size (3,1) and then horizontally stack four copies together.
a = torch.rand(3, 1)
a.expand(3,4)

tensor([[0.9518, 0.9518, 0.9518, 0.9518],
        [0.4954, 0.4954, 0.4954, 0.4954],
        [0.8395, 0.8395, 0.8395, 0.8395]])
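Strictly speaking, expand does not copy any data: it returns a view in which the single column is repeated with a stride of 0. That is memory-efficient, but several elements then share one memory location, so the result should be cloned before any in-place writes. If real copies are wanted, torch.cat (literally placing four copies side by side) or repeat produce a new tensor with the same values. A sketch comparing the three:

```python
import torch

a = torch.rand(3, 1)

expanded = a.expand(3, 4)                 # view, no new memory allocated
concatenated = torch.cat([a] * 4, dim=1)  # four real copies side by side
repeated = a.repeat(1, 4)                 # same values; repeat count per dimension

print(expanded.stride())  # (1, 0): the column dimension is not actually stored
print(torch.equal(expanded, concatenated), torch.equal(concatenated, repeated))  # True True
```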

In [54]:
# Return the batch matrix-matrix product of two three-dimensional matrices
# (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).
a = torch.rand(3, 4, 5)
b = torch.rand(3, 5, 4)
torch.bmm(a, b)

tensor([[[0.9854, 1.3888, 0.8378, 1.4988],
         [0.4422, 0.5624, 0.3749, 0.7238],
         [0.4101, 0.9566, 0.3774, 1.0234],
         [0.7446, 0.8541, 0.5801, 1.7118]],

        [[1.1772, 1.6499, 1.0951, 1.0892],
         [1.4830, 1.5132, 1.1408, 1.3860],
         [0.6391, 1.0437, 0.4127, 0.5187],
         [1.1489, 1.7397, 0.9782, 1.0305]],

        [[0.2920, 0.6721, 0.6879, 0.8783],
         [0.5490, 1.1540, 1.0215, 1.2788],
         [1.6487, 2.1102, 1.5260, 1.5979],
         [0.3556, 0.9698, 0.8085, 0.3823]]])

In [55]:
# Return the batch matrix-matrix product of a 3D matrix and a 2D matrix
# (a=torch.rand(3,4,5), b=torch.rand(5,4)).
a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)
torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))

tensor([[[1.3910, 0.7831, 1.8639, 1.3191],
         [0.8465, 0.8477, 1.6496, 1.2122],
         [0.9502, 0.8119, 1.3379, 0.9255],
         [1.2643, 0.6615, 1.4453, 0.9605]],

        [[1.4117, 0.7725, 1.7482, 1.2301],
         [2.1709, 1.7766, 3.0482, 1.9570],
         [1.4136, 1.2976, 2.1164, 1.2363],
         [1.9624, 1.8256, 2.7954, 1.6524]],

        [[1.4171, 1.0198, 1.6781, 0.8490],
         [1.8919, 1.5236, 2.7651, 1.7942],
         [1.9107, 1.5996, 2.4371, 1.3646],
         [1.5211, 1.4489, 2.0934, 1.2512]]])
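torch.matmul broadcasts batch dimensions automatically, so the same batched product can be written without the explicit unsqueeze/expand. A sketch that checks the two forms agree:

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# Explicit batching: give b a batch dimension and expand it to a's batch size
explicit = torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))

# matmul treats b as a single matrix and broadcasts it across the batch
broadcast = torch.matmul(a, b)  # equivalently: a @ b

print(broadcast.shape)                      # torch.Size([3, 4, 4])
print(torch.allclose(explicit, broadcast))  # True
```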