<a href="https://colab.research.google.com/github/AsRumi/PyTorch/blob/main/PyTorch_Tutorial_00.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import torch
torch.__version__

'2.5.1+cu121'

In [None]:
my_scalar = torch.tensor(10) # A scalar is also known as a 0-dimension tensor.
print(f"Value: {my_scalar.item()}\nDimensions: {my_scalar.ndim}") # Use .item() to get the value of the tensor.

Value: 10
Dimensions: 0


In [None]:
my_vector = torch.tensor([3, 4, 6, 7])
print(f"Value: {my_vector}\nDimensions: {my_vector.ndim}\nShape: {my_vector.shape}") # Shape tells you how the elements inside the tensor are arranged.

Value: tensor([3, 4, 6, 7])
Dimensions: 1
Shape: torch.Size([4])


For this tensor: my_vector = torch.tensor([3, 4, 6, 7])

You can find its number of dimensions by counting the opening square brackets on one side.

The shape is torch.Size([4]) because the single dimension holds 4 elements.

Let us look at a matrix now:

In [None]:
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
print(f"Value: {MATRIX}\nDimensions: {MATRIX.ndim}\nShape: {MATRIX.shape}")

Value: tensor([[ 7,  8],
        [ 9, 10]])
Dimensions: 2
Shape: torch.Size([2, 2])


The shape here is torch.Size([2, 2]) because the matrix has 2 rows and each row has 2 elements.

In [None]:
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
print(f"Values: {TENSOR}\nDimensions: {TENSOR.ndim}\nShape: {TENSOR.shape}")

Values: tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])
Dimensions: 3
Shape: torch.Size([1, 3, 3])


The shape of the above tensor is torch.Size([1, 3, 3]) because the outermost dimension holds 1 matrix, that matrix holds 3 rows, and each row holds 3 elements.
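As a quick check of how brackets, dimensions and shape relate, the sketch below confirms that `ndim` is the length of `shape` and that the element count is the product of the shape entries. The variable names (`example`, `num_dims`, and so on) are illustrative and not part of the notebook.

```python
import math
import torch

# Illustrative tensor with the same layout as TENSOR above:
# 1 matrix of 3 rows x 3 columns
example = torch.tensor([[[1, 2, 3],
                         [3, 6, 9],
                         [2, 4, 5]]])

num_dims = example.ndim           # number of nested bracket levels
shape = tuple(example.shape)      # size of each level, outermost first
num_elements = example.numel()    # total number of stored values

print(num_dims, shape, num_elements)
print(num_dims == len(shape))             # ndim is the length of the shape
print(num_elements == math.prod(shape))   # numel is the product of the sizes
```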

By convention, variable names for matrices and tensors are written in uppercase (MATRIX, TENSOR), while names for scalars and vectors are written in lowercase.

Let us see how to create tensors of random values:

In [None]:
RANDOM_TENSOR = torch.rand(size = (3, 4))
print(RANDOM_TENSOR)

tensor([[0.7847, 0.6667, 0.0871, 0.7380],
        [0.9234, 0.6446, 0.4809, 0.2892],
        [0.0010, 0.7478, 0.6425, 0.4239]])


In [None]:
RANDOM_IMAGE = torch.rand(size = (3, 150, 150))
print(RANDOM_IMAGE)

tensor([[[0.7030, 0.0290, 0.7005,  ..., 0.2096, 0.5990, 0.4417],
         [0.2599, 0.7018, 0.8359,  ..., 0.9673, 0.7879, 0.2665],
         [0.6850, 0.8391, 0.8864,  ..., 0.5404, 0.9216, 0.4953],
         ...,
         [0.6295, 0.7250, 0.7176,  ..., 0.2059, 0.4555, 0.6781],
         [0.8312, 0.9165, 0.8854,  ..., 0.3438, 0.7061, 0.9960],
         [0.9499, 0.7638, 0.4621,  ..., 0.8087, 0.2263, 0.9102]],

        [[0.7198, 0.0677, 0.7822,  ..., 0.0637, 0.3627, 0.7509],
         [0.5257, 0.0658, 0.5591,  ..., 0.7355, 0.5196, 0.6017],
         [0.3362, 0.6790, 0.2397,  ..., 0.2857, 0.8156, 0.7435],
         ...,
         [0.2234, 0.6910, 0.4790,  ..., 0.6760, 0.1123, 0.8314],
         [0.6164, 0.2266, 0.7327,  ..., 0.6164, 0.8318, 0.9846],
         [0.8163, 0.2687, 0.1712,  ..., 0.8506, 0.9791, 0.8769]],

        [[0.4478, 0.1770, 0.4997,  ..., 0.5644, 0.0091, 0.2000],
         [0.7956, 0.4916, 0.3361,  ..., 0.4644, 0.0875, 0.9191],
         [0.7098, 0.1668, 0.9279,  ..., 0.1078, 0.2787, 0.

In [None]:
# Create a tensor of all zeros
zeros = torch.zeros(size=(3, 4))
print(zeros, zeros.dtype)

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]) torch.float32


In [None]:
# Create a tensor of all ones
ones = torch.ones(size=(3, 4))
print(ones, ones.dtype)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]) torch.float32


In [None]:
# Create a range of values from 0 to 9 (the end value, 10, is exclusive)
zero_to_ten = torch.arange(start = 0, end = 10, step = 1)
print(zero_to_ten)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


In [None]:
# Can also create a tensor of zeros similar to another tensor
ten_zeros = torch.zeros_like(input=zero_to_ten) # will have same shape of zero_to_ten
print(ten_zeros)

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])


Tensors also have datatypes: 16, 32, 64-bit float || 8, 16, 32, 64-bit integer

Higher-precision datatypes (more bits) represent each value more accurately, which can improve numerical accuracy, but they also use more memory and more compute time during training.

torch.float | torch.float32 - 32 bit float

torch.half | torch.float16 - 16 bit float

torch.double | torch.float64 - 64 bit float
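To make the memory cost of precision concrete, here is a minimal sketch that stores the same values at the three float precisions listed above and reports the bytes used per element. The names `values` and `bytes_per_element` are illustrative.

```python
import torch

# Same three values stored at three floating-point precisions
values = [3.0, 6.0, 9.0]
bytes_per_element = {}

for dtype in (torch.float16, torch.float32, torch.float64):
    t = torch.tensor(values, dtype=dtype)
    # element_size() is the number of bytes used by a single element
    bytes_per_element[dtype] = t.element_size()
    total_bytes = t.element_size() * t.numel()
    print(dtype, t.element_size(), "bytes per element,", total_bytes, "bytes total")

# The short names are aliases for the same dtype objects
print(torch.half == torch.float16, torch.float == torch.float32, torch.double == torch.float64)
```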

By default, PyTorch creates floating-point tensors as 32-bit floats (integer data defaults to torch.int64). To change this behavior, pass the "dtype" parameter when creating a tensor.
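The sketch below illustrates how the datatype is inferred when `dtype` is not given and how passing `dtype` overrides the inference. The variable names are illustrative, and the comparison uses `torch.get_default_dtype()` in case the default float type has been changed in your session.

```python
import torch

from_floats = torch.tensor([3.0, 6.0, 9.0])   # Python floats -> default float dtype
from_ints = torch.tensor([3, 6, 9])           # Python ints -> torch.int64
explicit = torch.tensor([3, 6, 9], dtype=torch.float64)  # dtype overrides inference

print(torch.get_default_dtype())
print(from_floats.dtype, from_ints.dtype, explicit.dtype)
```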

In [None]:
# For float inputs, the default datatype is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the dtype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the current default device (the CPU unless changed)
                               requires_grad=False) # if True, operations performed on the tensor are recorded

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

Aside from shape issues (tensor shapes don't match up), two of the other most common issues you'll come across in PyTorch are datatype and device issues.

For example, one of your tensors is torch.float32 and the other is torch.float16 (PyTorch often likes tensors to be the same format).

Or one of your tensors is on the CPU and the other is on the GPU (PyTorch likes calculations between tensors to be on the same device).
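A minimal sketch of the datatype issue, assuming a recent PyTorch release: matrix multiplication between a float32 and a float16 tensor raises a `RuntimeError`, and casting one operand resolves it. The names `a`, `b` and `mismatch_raised` are illustrative.

```python
import torch

a = torch.ones(2, 3, dtype=torch.float32)
b = torch.ones(3, 2, dtype=torch.float16)

# Matrix multiplication expects both operands to share a dtype
try:
    torch.matmul(a, b)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(f"RuntimeError: {err}")

# Casting one operand makes the dtypes agree
result = torch.matmul(a, b.to(torch.float32))
print(result.dtype, result.shape)
```

Element-wise operations such as `a + a.to(torch.float16)` behave differently: they promote to the wider type instead of raising, so the mismatch tends to surface in operations like `matmul`.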

In [None]:
# 16 bit floating tensor:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype = torch.float16) # torch.half would also work

float_16_tensor.dtype, float_16_tensor.device

(torch.float16, device(type='cpu'))

Most common attributes you'll want to find out about tensors are:

shape - what shape is the tensor? (some operations require specific shape rules)

dtype - what datatype are the elements within the tensor stored in?

device - what device is the tensor stored on? (usually GPU or CPU)

In [None]:
float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

Addition (+), subtraction (-), multiplication (*) of tensors.

In [None]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [None]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

Values inside the tensor don't change unless they're reassigned.

In [None]:
# Subtract and reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [None]:
tensor = tensor + 10
tensor

tensor([1, 2, 3])


PyTorch also has a bunch of built-in functions like torch.multiply() and torch.add() to perform basic operations.

In [None]:
ten_times_tensor = torch.multiply(tensor, 10)
ten_times_tensor

tensor([10, 20, 30])

In [None]:
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])


### Matrix Multiplication

PyTorch implements matrix multiplication in the torch.matmul() function.

The main two rules for matrix multiplication to remember are:

The inner dimensions must match:

(3, 2) @ (3, 2) won't work

(2, 3) @ (3, 2) will work

(3, 2) @ (2, 3) will work

The resulting matrix has the shape of the outer dimensions:

(2, 3) @ (3, 2) -> (2, 2)

(3, 2) @ (2, 3) -> (3, 3)
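The two rules above can be checked directly. This sketch uses random tensors with illustrative names (`a`, `b`, `shape_error`) and catches the error raised when the inner dimensions do not match.

```python
import torch

a = torch.rand(2, 3)
b = torch.rand(3, 2)

print((a @ b).shape)   # (2, 3) @ (3, 2) -> outer dimensions (2, 2)
print((b @ a).shape)   # (3, 2) @ (2, 3) -> outer dimensions (3, 3)

# (3, 2) @ (3, 2): inner dimensions 2 and 3 differ, so this raises an error
try:
    b @ b
    shape_error = None
except RuntimeError as err:
    shape_error = str(err)
    print(f"RuntimeError: {shape_error}")
```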

In [None]:
# Matrix multiplication (for two 1-D tensors, torch.matmul returns their dot product)
matrix_multiplied_tensor = torch.matmul(tensor, tensor)
print(f"{tensor} @ {tensor}\nEquals: {matrix_multiplied_tensor}")

tensor([1, 2, 3]) @ tensor([1, 2, 3])
Equals: 14


In [None]:
%%time
# Matrix multiplication by hand
# (avoid doing operations with for loops at all cost, they are computationally expensive)
value = 0
for i in range(len(tensor)):
  value += tensor[i] * tensor[i]
value

CPU times: user 708 µs, sys: 1 ms, total: 1.71 ms
Wall time: 7.76 ms


tensor(14)

In [None]:
%%time
torch.matmul(tensor, tensor)

CPU times: user 80 µs, sys: 9 µs, total: 89 µs
Wall time: 93.2 µs


tensor(14)

Because much of deep learning is multiplying and performing operations on matrices and matrices have a strict rule about what shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is shape mismatches.

One of the ways to make matrices compatible for matrix multiplication is to transpose the matrices.

In [None]:
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

print(tensor_A)
print(tensor_B)
print("These shapes are not compatible for matrix multiplication, but transposing one of them fixes that.")
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7., 10.],
        [ 8., 11.],
        [ 9., 12.]])
These shapes are not compatible for matrix multiplication, but transposing one of them fixes that.
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [None]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} @ {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output)
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) @ torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


In [None]:
# torch.mm performs the same matrix multiplication for 2-D tensors (unlike matmul, it does not broadcast)
torch.mm(tensor_A, tensor_B.T)

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Neural networks are full of matrix multiplications and dot products.

The `torch.nn.Linear()` module, also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input `x` and a weights matrix `A`.

$$ y = x\cdot{A^T} + b $$

Where:

`x` is the input to the layer (deep learning is a stack of layers like `torch.nn.Linear()` and others on top of each other).

`A` is the weights matrix created by the layer. It starts out as random numbers that get adjusted as the neural network learns to better represent patterns in the data (notice the "`T`": the weights matrix gets transposed).

Note: You might also often see `W` or another letter like `X` used to showcase the weights matrix.

`b` is the bias term used to slightly offset the weights and inputs.

`y` is the output (a manipulation of the input in the hopes to discover patterns in it).
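To connect the formula to the module, the sketch below compares the output of a `torch.nn.Linear` layer with a manual `x @ A.T + b`, reading `A` and `b` from the layer's `weight` and `bias` attributes. The names `layer_output` and `manual_output` are illustrative.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

layer_output = linear(x)
# Same computation written out: y = x . A^T + b
manual_output = x @ linear.weight.T + linear.bias

print(linear.weight.shape, linear.bias.shape)   # A is (out_features, in_features)
print(torch.allclose(layer_output, manual_output))
```

Because the weights are stored as `(out_features, in_features)`, the transpose is what makes the inner dimensions line up with an input of shape `(3, 2)`.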

In [None]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input
                         out_features=6) # out_features = describes outer value
x = torch.tensor([[1, 2],
                  [3, 4],
                  [5, 6]], dtype=torch.float32) # shape of this tensor is (3, 2)
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])


Finding the min, max, mean, sum, etc (aggregation)

In [None]:
x = torch.arange(0, 100, 10)
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error because mean expects the values to be of a float datatype.
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

print(f"Index where max value occurs: {x.argmax()}")
print(f"Index where min value occurs: {x.argmin()}")

Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450
Index where max value occurs: 9
Index where min value occurs: 0


You can change the datatypes of tensors using `torch.Tensor.type(dtype=None)` where the dtype parameter is the datatype you'd like to use.

In [None]:
# Create an int8 tensor
print(x, x.dtype)
tensor_int8 = x.type(torch.int8)
tensor_int8

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90]) torch.int64


tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=torch.int8)

Mobile-based neural networks often operate with 8-bit integers, smaller and faster to run but less accurate than their float32 counterparts.

### Reshaping, stacking, squeezing and unsqueezing

`torch.reshape(input, shape)`	 -> Reshapes input to shape (if compatible), can also use `torch.Tensor.reshape()`.

`Tensor.view(shape)` -> Returns a view of the original tensor in a different shape but shares the same data as the original tensor.

`torch.stack(tensors, dim=0)` -> Concatenates a sequence of tensors along a new dimension (dim), all tensors must be same size.

`torch.squeeze(input)` -> Squeezes input to remove all the dimensions of size 1.

`torch.unsqueeze(input, dim)` -> Returns input with a dimension value of 1 added at dim.

`torch.permute(input, dims)` -> Returns a view of the original input with its dimensions permuted (rearranged) to dims.

In [None]:
# Create a tensor
import torch
x = torch.arange(1., 9.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8.]), torch.Size([8]))

In [None]:
# Reshape to a different dimension this time
x_reshaped = x.reshape(2, 4)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4.],
         [5., 6., 7., 8.]]),
 torch.Size([2, 4]))

In [None]:
# Add an extra dimension
x_reshaped = x.reshape(1, 8)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8.]]), torch.Size([1, 8]))

In [None]:
# Change view (keeps same data as original but changes view)
# See more: https://stackoverflow.com/a/54507446/7900723
z = x.view(8, 1)
z, z.shape

(tensor([[1.],
         [2.],
         [3.],
         [4.],
         [5.],
         [6.],
         [7.],
         [8.]]),
 torch.Size([8, 1]))

Changing the view of a tensor with `Tensor.view()` only creates a new view of the same underlying data, so changing `z` changes `x`.

In [None]:
# Changing z changes x
z[3] = 0
z, x

(tensor([[1.],
         [2.],
         [3.],
         [0.],
         [5.],
         [6.],
         [7.],
         [8.]]),
 tensor([1., 2., 3., 0., 5., 6., 7., 8.]))

If we wanted to stack our new tensor on top of itself 4 times, we could do so with `torch.stack()`.

In [None]:
x_stacked = torch.stack([x, x, x, x], dim=-2) # Dimension expected to be in range of [-2, 1], 0 and -2 for rows, 1 and -1 for columns
x_stacked

tensor([[1., 2., 3., 0., 5., 6., 7., 8.],
        [1., 2., 3., 0., 5., 6., 7., 8.],
        [1., 2., 3., 0., 5., 6., 7., 8.],
        [1., 2., 3., 0., 5., 6., 7., 8.]])
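To see what the `dim` argument changes, this small sketch stacks the same 1-D tensor along `dim=0` and `dim=1` and compares the resulting shapes. The names `stacked_rows` and `stacked_cols` are illustrative.

```python
import torch

x = torch.arange(1., 9.)                          # shape (8,)
stacked_rows = torch.stack([x, x, x, x], dim=0)   # each copy becomes a row
stacked_cols = torch.stack([x, x, x, x], dim=1)   # each copy becomes a column

print(stacked_rows.shape)
print(stacked_cols.shape)
# Stacking along dim=1 gives the transpose of stacking along dim=0
print(torch.equal(stacked_cols, stacked_rows.T))
```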

`torch.squeeze()` removes all dimensions of size 1 from a tensor.

In [None]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Remove extra dimension from x_reshaped
x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[1., 2., 3., 0., 5., 6., 7., 8.]])
Previous shape: torch.Size([1, 8])

New tensor: tensor([1., 2., 3., 0., 5., 6., 7., 8.])
New shape: torch.Size([8])



And to do the reverse of `torch.squeeze()` you can use `torch.unsqueeze()` to add a dimension value of 1 at a specific index.

In [None]:
print(f"Previous tensor: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")

# Add an extra dimension with unsqueeze
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(f"\nNew tensor: {x_unsqueezed}")
print(f"New shape: {x_unsqueezed.shape}")

Previous tensor: tensor([1., 2., 3., 0., 5., 6., 7., 8.])
Previous shape: torch.Size([8])

New tensor: tensor([[1., 2., 3., 0., 5., 6., 7., 8.]])
New shape: torch.Size([1, 8])


You can also rearrange the order of axes values with `torch.permute(input, dims)`, where the input gets turned into a view with new dims.

In [None]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3))

# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


`torch.permute()` returns a view, therefore changing any values inside the permuted tensor will also change values inside the original tensor.
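A short sketch of that shared-data behaviour, using a small illustrative tensor (`original`, `permuted`) rather than the image-sized one above:

```python
import torch

original = torch.zeros(2, 3, 4)
permuted = original.permute(2, 0, 1)   # shape (4, 2, 3); no data is copied

permuted[0, 0, 0] = 99.0               # write through the permuted view
print(original[0, 0, 0])               # the change is visible in the original

# With dims (2, 0, 1), permuted[i, j, k] corresponds to original[j, k, i]
permuted[3, 1, 2] = 7.0
print(original[1, 2, 3])

# Both tensors point at the same underlying storage
print(original.data_ptr() == permuted.data_ptr())
```

If an independent copy is needed, calling `.clone()` on the permuted tensor is one way to get it.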

Indexing a tensor:

In [None]:
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [None]:
print(f"First square bracket:\n{x[0]}")
print(f"Second square bracket: {x[0][0]}")
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


You can also use `:` to specify "all values in this dimension" and then use a comma (`,`) to add another dimension.

In [None]:
# Get all values of 0th dimension and the 0 index of 1st dimension
print(x[:, 0])
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension, prints the middle column of the tensor
print(x[:, :, 1])
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
print(x[:, 1, 1])
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
print(x[0, 0, :]) # same as x[0][0]

tensor([[1, 2, 3]])
tensor([[2, 5, 8]])
tensor([5])
tensor([1, 2, 3])


NumPy to PyTorch:

`torch.from_numpy(ndarray)` - NumPy array -> PyTorch tensor.

`torch.Tensor.numpy()` - PyTorch tensor -> NumPy array.

In [None]:
# NumPy array to tensor
import torch
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

By default, NumPy creates floating-point arrays with the datatype float64, and converting one to a PyTorch tensor keeps that datatype (as above).

However, many PyTorch calculations default to using float32.

So if you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32), you can use `tensor = torch.from_numpy(array).type(torch.float32)`.
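The conversion chain described above can be sketched as follows; `tensor_f64` and `tensor_f32` are illustrative names.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)                                # float64 NumPy array
tensor_f64 = torch.from_numpy(array)                       # keeps float64
tensor_f32 = torch.from_numpy(array).type(torch.float32)   # converted to float32

print(array.dtype, tensor_f64.dtype, tensor_f32.dtype)
```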

In [None]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))

### Reproducibility

Setting a random seed lets you get the same random values across multiple runs of an experiment.

In [None]:
tensor_A = torch.rand(size = (3, 4))
tensor_B = torch.rand(size = (3, 4))

print(tensor_A)
print(tensor_B)
print(tensor_A == tensor_B)

tensor([[0.5221, 0.4282, 0.5457, 0.1361],
        [0.3974, 0.4310, 0.4273, 0.3377],
        [0.4408, 0.2849, 0.6686, 0.0769]])
tensor([[1.9903e-01, 6.1995e-01, 3.0576e-01, 1.1145e-01],
        [8.8489e-04, 8.8991e-01, 2.0506e-01, 5.3740e-01],
        [3.0945e-01, 9.8135e-01, 5.3891e-01, 7.0623e-01]])
tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])


But what if you wanted to create two random tensors with the same values?

That's where `torch.manual_seed(seed)` comes in, where seed is an integer (like `42` but it could be anything) that flavours the randomness.

In [None]:
import random

RANDOM_SEED=42 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# Have to reset the seed every time a new rand() is called
# Without this, tensor_D would be different to tensor_C
torch.manual_seed(seed=42) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])



tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])

The seed sets the starting state of the random number generator, so using the same seed produces the same sequence of "random" values.
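A minimal sketch of this behaviour: after seeding once, consecutive `torch.rand()` calls still differ from each other, but reseeding with the same value replays the same sequence. Variable names are illustrative.

```python
import torch

torch.manual_seed(42)
first = torch.rand(3)
second = torch.rand(3)       # generator has advanced, so this differs from `first`

torch.manual_seed(42)        # reset the generator to the same starting state
first_again = torch.rand(3)
second_again = torch.rand(3)

print(torch.equal(first, second))
print(torch.equal(first, first_again))
print(torch.equal(second, second_again))
```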

[The PyTorch Reproducibility Documentation](https://pytorch.org/docs/stable/notes/randomness.html)

Results may not be reproducible between CPU and GPU executions, even when using identical seeds.

## Using GPU with PyTorch to make computations

To set up PyTorch on AWS or your own PC and use an available GPU, follow [this link](https://www.learnpytorch.io/00_pytorch_fundamentals/#1-getting-a-gpu).

To access GPU on Apple devices, follow [this link](https://www.learnpytorch.io/00_pytorch_fundamentals/#21-getting-pytorch-to-run-on-apple-silicon).

In [None]:
# Check for a GPU
import torch
torch.cuda.is_available()

True

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
if torch.cuda.is_available():
    device = "cuda" # Use NVIDIA GPU (if available)
elif torch.backends.mps.is_available():
    device = "mps" # Use Apple Silicon GPU (if available)
else:
    device = "cpu" # Default to CPU if no GPU is available

print(device)

cuda


In PyTorch, it's best practice to write device agnostic code. This means code that'll run on CPU (always available) or GPU (if available).
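One way to keep code device agnostic is to pick the device once and pass it wherever tensors are created. The sketch below assumes only CUDA or CPU for brevity; the tensor names are illustrative.

```python
import torch

# Pick the device once; fall back to the CPU when no CUDA GPU is present
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create tensors directly on the chosen device
a = torch.ones(2, 3, device=device)
b = torch.ones(3, 2, device=device)
result = a @ b               # runs on whichever device was selected

print(result.device.type == device)
print(result.cpu())          # bring the result to the CPU for printing or NumPy use
```

The same script then runs unchanged on a machine with or without a GPU.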

Knowing the number of GPUs PyTorch has access to is helpful in case you want to run one process on one GPU and another process on another (PyTorch also has features to let you run a process across all GPUs).

In [None]:
# Counting the number of GPUs available:
torch.cuda.device_count()

1

You can put tensors (and models, we'll see this later) on a specific device by calling `to(device)` on them, where `device` is the target device you'd like the tensor (or model) to go to.

To change an existing tensor’s `torch.device` and/or `torch.dtype`, consider using `to()` method on the tensor.
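As a sketch of the different ways `to()` can be called, the example below changes the dtype only, the device only, and both at once. The variable names are illustrative, and the device falls back to the CPU if no CUDA GPU is available.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
tensor = torch.tensor([1, 2, 3])                       # int64 on the CPU

as_float = tensor.to(torch.float32)                    # change dtype only
on_device = tensor.to(device)                          # change device only
both = tensor.to(device=device, dtype=torch.float16)   # change both in one call

print(tensor.dtype, tensor.device)                     # the original is unchanged
print(as_float.dtype, on_device.device, both.dtype, both.device)
```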

In [None]:
tensor = torch.tensor([1, 2, 3]) # Default on CPU

print(tensor, tensor.device)

tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu


tensor([1, 2, 3], device='cuda:0')

Notice the second tensor has `device='cuda:0'`, this means it's stored on the 0th GPU available (GPUs are 0 indexed, if two GPUs were available, they'd be '`cuda:0`' and '`cuda:1`' respectively, up to '`cuda:n`').

You might want to move the tensor back to the CPU because NumPy does not work with tensors on the GPU.

In [None]:
# Copy the tensor back to cpu
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3])

In [None]:
# This tensor is still in the GPU's memory
print(tensor_on_gpu)

# To check if any tensor is on the GPU or not
print(tensor_on_gpu.is_cuda)

tensor([1, 2, 3], device='cuda:0')
True


### Creating tensors with gradients for automatic differentiation

A tensor can be created with `requires_grad=True` so that `torch.autograd` records operations on it for automatic differentiation.

In [None]:
x = torch.tensor([[1., -1.], [1., 1.]], requires_grad=True)
out = x.pow(2).sum()
out.backward()
x.grad

tensor([[ 2., -2.],
        [ 2.,  2.]])
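In the cell above, the gradient of `sum(x^2)` is `2x`, which matches the printed values. As a further illustrative check, the sketch below differentiates a slightly different function and compares the result with the derivative worked out by hand; the function and the name `expected` are my own choices, not from the notebook.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (3 * x.pow(2) + 2 * x).sum()   # y = sum(3x^2 + 2x)
y.backward()                       # fills x.grad with dy/dx

expected = 6 * x.detach() + 2      # derivative by hand: 6x + 2
print(x.grad)
print(torch.allclose(x.grad, expected))
```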

For all functions and attributes of a tensor, check [this link](https://pytorch.org/docs/stable/tensors.html#id9) out.