# PyTorch Fundamentals Cheat Sheet

### Why this notebook
- Serve as a hands-on reference for the core tensor operations you use in every PyTorch project.
- Provide executable examples you can copy into new experiments without digging through docs.
- Offer quick reminders on syntax, shapes, and device management choices.

### Learning objectives
- Create, inspect, and manipulate tensors of different ranks and datatypes.
- Understand when to reach for random, zero, one, and range initialisers.
- Practise moving tensors across CPU, MPS, and CUDA devices safely.

### Prerequisites
- PyTorch 2.x installed with optional GPU/MPS support.
- Basic Python knowledge and curiosity about tensor algebra.

### How to use this guide
1. Run the import cell to confirm your environment (version + device availability).
2. Execute each section sequentially; edit the examples to explore alternative shapes or dtypes.
3. Use the headings as anchor points when you need to revisit a concept later.
4. Capture your own notes beneath the provided cells so this becomes your personal quick reference.


In [65]:
import torch
import pandas as pd
import numpy as np
print(torch.__version__)
print(torch.mps.is_available())
#!nvidia-smi

2.9.0
True


## Introduction to Tensors
#### Creating tensors with `torch.tensor()`

In [66]:
# Scalar
scalar = torch.tensor(7)
print(scalar)
print(scalar.ndim)
print(scalar.item())

tensor(7)
0
7


In [67]:
# Vector
vector = torch.tensor([7,7])
print(vector)
print(vector.ndim)

tensor([7, 7])
1


In [68]:
# MATRIX
MATRIX = torch.tensor([[7,8],
                       [9,10]])
print(MATRIX)
print(MATRIX.ndim)
print(MATRIX[0])
print(MATRIX[1])
print(MATRIX.shape)

tensor([[ 7,  8],
        [ 9, 10]])
2
tensor([7, 8])
tensor([ 9, 10])
torch.Size([2, 2])


In [69]:
# TENSOR
TENSOR = torch.tensor([[[1,2,3],
                        [4,5,6],
                        [7,8,9]]])
print(TENSOR[0])
print(TENSOR[0][1])
print(TENSOR[0][1][2])
print(TENSOR.shape)
print(TENSOR.ndim)

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([4, 5, 6])
tensor(6)
torch.Size([1, 3, 3])
3
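
As a quick check on the four cases above, `ndim` counts the bracket levels and always equals `len(shape)`. The sketch below uses illustrative names (the `examples` dict is not part of the notebook) to print rank, shape, and element count side by side.

```python
import torch

# Illustrative names only; ranks 0 through 3, mirroring the cells above.
examples = {
    "scalar": torch.tensor(7),
    "vector": torch.tensor([7, 7]),
    "matrix": torch.tensor([[7, 8], [9, 10]]),
    "tensor3d": torch.tensor([[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]),
}

for name, t in examples.items():
    # ndim is the number of dimensions; numel() is the total element count.
    assert t.ndim == len(t.shape)
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}, numel={t.numel()}")
```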


## Random Tensors (for randomizing weights when training starts)

#### `torch.rand()`

In [70]:
random_tensor = torch.rand(1, 3, 4)
print(random_tensor)
print(random_tensor.shape)
print(random_tensor.ndim)

tensor([[[0.1332, 0.9346, 0.5936, 0.8694],
         [0.5677, 0.7411, 0.4294, 0.8854],
         [0.5739, 0.2666, 0.6274, 0.2696]]])
torch.Size([1, 3, 4])
3


In [71]:
random_image_tensor = torch.rand(size=(3, 224,224))
print(random_image_tensor.shape)
print(random_image_tensor.ndim)

torch.Size([3, 224, 224])
3
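
`torch.rand()` draws from a uniform distribution on [0, 1). Two sibling initialisers are worth knowing as well; the sketch below is only a side-by-side comparison, and the variable names are illustrative.

```python
import torch

uniform = torch.rand(2, 3)                              # uniform samples in [0, 1)
normal = torch.randn(2, 3)                              # standard normal samples (mean 0, std 1)
integers = torch.randint(low=0, high=10, size=(2, 3))   # integers in [0, 10), high is exclusive

print(uniform.dtype, normal.dtype, integers.dtype)
```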


## Zeros & Ones Tensors (for masking)

#### `torch.zeros()` and `torch.ones()`

In [72]:
zeros = torch.zeros(size=(3,4))
print(zeros)
ones = torch.ones(size=(3,4))
print(ones)
zeros.dtype, ones.dtype

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])


(torch.float32, torch.float32)
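
The heading mentions masking. One possible way to use a zeros/ones tensor as a mask is sketched below; the `data` tensor and the choice of which columns to keep are made up for illustration.

```python
import torch

data = torch.arange(1., 13.).reshape(3, 4)   # illustrative values 1..12

# Build a 0/1 mask that keeps only the first two columns.
mask = torch.zeros(size=(3, 4))
mask[:, :2] = 1

# Multiplying by the mask zeroes out everything where mask == 0.
masked = data * mask

# Equivalent selection with a boolean mask and torch.where.
masked_where = torch.where(mask.bool(), data, torch.zeros_like(data))

print(masked)
```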

## Range of Tensors / tensor-like

#### `torch.arange()`, plus `torch.zeros_like()` / `torch.ones_like()` to copy a shape

In [73]:
range_tensor = torch.arange(start=0, end=1000, step=100) # named range_tensor to avoid shadowing Python's built-in range()
print(range_tensor)
zeros_like = torch.zeros_like(input=range_tensor)
print(zeros_like)
ones_like = torch.ones_like(input=range_tensor)
print(ones_like)

tensor([  0, 100, 200, 300, 400, 500, 600, 700, 800, 900])
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
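
Two details are easy to miss here: `end` is exclusive, and the `*_like` functions copy the dtype as well as the shape unless you override it. A minimal sketch with illustrative variable names:

```python
import torch

steps = torch.arange(start=0, end=1000, step=100)   # end is exclusive, so the last value is 900

same_dtype = torch.zeros_like(steps)                         # inherits int64 from steps
float_zeros = torch.zeros_like(steps, dtype=torch.float32)   # same shape, dtype overridden

print(steps.dtype, same_dtype.dtype, float_zeros.dtype)
```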


## Tensor Datatypes (good for precision/time balance)
#### The default dtype is `torch.float32`

In [74]:
float32_tensor = torch.tensor([3.0,6.0,9.0],
                              dtype=torch.float32, # Datatype of tensor
                              device='mps', # Device the tensor will be on
                              requires_grad=False) # Whether or not to track gradients for operations on this tensor
print(float32_tensor) 

float16_tensor = float32_tensor.type(torch.float16)
print(float16_tensor) 

tensor([3., 6., 9.], device='mps:0')
tensor([3., 6., 9.], device='mps:0', dtype=torch.float16)


In [75]:
int32_tensor = torch.tensor([3,4,5], dtype=torch.int32, device='mps')
int32_tensor

tensor([3, 4, 5], device='mps:0', dtype=torch.int32)

In [76]:
float16_tensor * int32_tensor

tensor([ 9., 24., 45.], device='mps:0', dtype=torch.float16)
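
The float16 result above comes from PyTorch's type promotion rules: when a floating-point tensor meets an integer tensor, the floating-point dtype wins. The sketch below runs on CPU and uses `torch.result_type` to ask what dtype an operation would produce without running it; the variable names are illustrative.

```python
import torch

half = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)
ints = torch.tensor([3, 4, 5], dtype=torch.int32)

# dtype that an operation such as half * ints would produce
promoted = torch.result_type(half, ints)

# True division of an integer tensor yields the default float dtype,
# while floor division keeps the integer dtype.
divided_dtype = (ints / 2).dtype
floor_divided_dtype = (ints // 2).dtype

print(promoted, divided_dtype, floor_divided_dtype)
```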

In [77]:
some_tensor = torch.rand(3,3)
some_tensor, some_tensor.dtype, some_tensor.shape, some_tensor.device

(tensor([[0.9663, 0.7687, 0.4566],
         [0.5745, 0.9200, 0.3230],
         [0.8613, 0.0919, 0.3102]]),
 torch.float32,
 torch.Size([3, 3]),
 device(type='cpu'))

## Manipulating Tensors

In [78]:
tensor = torch.tensor([1,2,3,4,5])
tensor + 10, tensor - 10, tensor * 10, tensor / 10,

(tensor([11, 12, 13, 14, 15]),
 tensor([-9, -8, -7, -6, -5]),
 tensor([10, 20, 30, 40, 50]),
 tensor([0.1000, 0.2000, 0.3000, 0.4000, 0.5000]))

In [79]:
torch.add(tensor, 10), torch.sub(tensor, 10), torch.mul(tensor, 10), torch.div(tensor, 10)

(tensor([11, 12, 13, 14, 15]),
 tensor([-9, -8, -7, -6, -5]),
 tensor([10, 20, 30, 40, 50]),
 tensor([0.1000, 0.2000, 0.3000, 0.4000, 0.5000]))

#### Matrix Multiplication (Dot Product)

In [80]:
%%time
torch.matmul(tensor, tensor)
tensor @ tensor

CPU times: user 1 μs, sys: 0 ns, total: 1 μs
Wall time: 14.1 μs


tensor(55)

In [81]:
torch.matmul(torch.rand([5,10]), torch.rand([10,4])).shape

torch.Size([5, 4])
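
Two rules sit behind these cells: `*` is element-wise while `@` sums the products, and for 2-D inputs the inner dimensions must match, with the result taking the outer dimensions. A small sketch with illustrative names:

```python
import torch

v = torch.tensor([1, 2, 3, 4, 5])
elementwise = v * v        # tensor([ 1,  4,  9, 16, 25])
dot = v @ v                # 1 + 4 + 9 + 16 + 25 = 55

a = torch.rand(5, 10)
b = torch.rand(10, 4)
out = a @ b                # inner dims (10, 10) match, result is (5, 4)

try:
    torch.matmul(a, a)     # (5, 10) @ (5, 10): inner dims 10 and 5 differ
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```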

#### Transpose

In [82]:
tensor_A = torch.tensor([[1,2],
                        [3,4],
                        [5,6]])
tensor_B = torch.tensor([[7,10],
                        [8,11],
                        [9,12]])
tensor_A.shape,tensor_B.shape

torch.mm(tensor_A, tensor_B.T)

tensor([[ 27,  30,  33],
        [ 61,  68,  75],
        [ 95, 106, 117]])

### Tensor Aggregation (min, max, mean, sum)

In [83]:
x = torch.arange(0,10000, step=10)
torch.min(x), x.min(), torch.max(x), x.max(), torch.mean(x.type(torch.float32)), x.type(torch.float32).mean(), torch.sum(x), x.sum()

(tensor(0),
 tensor(0),
 tensor(9990),
 tensor(9990),
 tensor(4995.),
 tensor(4995.),
 tensor(4995000),
 tensor(4995000))
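
The cast to `float32` before `mean` is needed because `mean` is not defined for integer tensors. Aggregations also accept a `dim` argument to reduce along one axis only. The sketch below shows both on a small made-up grid.

```python
import torch

grid = torch.arange(12).reshape(3, 4)    # int64 values 0..11

col_sums = grid.sum(dim=0)               # collapse rows: tensor([12, 15, 18, 21])
row_max = grid.max(dim=1)                # named tuple with .values and .indices

try:
    grid.mean()                          # integer dtype: mean cannot infer an output dtype
    int_mean_failed = False
except RuntimeError:
    int_mean_failed = True

row_means = grid.float().mean(dim=1)     # tensor([1.5000, 5.5000, 9.5000])
print(col_sums, row_max.values, row_max.indices, row_means)
```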

### Positional min/max (`argmin`, `argmax`)

In [84]:
torch.argmin(x), x.argmin(), torch.argmax(x), x.argmax()

(tensor(0), tensor(0), tensor(999), tensor(999))

### Reshaping, viewing, stacking, squeezing, unsqueezing, permuting

In [85]:
import torch
x = torch.arange(1.,10.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [86]:
x_reshaped = x.reshape(1,9) # the new shape must hold the same number of elements (1 * 9 == 9)
x_reshaped, x_reshaped.shape 

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [87]:
z = x.view(1,9) # a view shares memory with x, so changing z changes x
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [88]:
z[:,0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.]))
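
`view` never copies, so it only works when the requested shape is compatible with the existing memory layout; `reshape` returns a view when it can and falls back to a copy otherwise. The sketch below shows the difference on a transposed (non-contiguous) tensor. Names are illustrative, and `data_ptr()` is used only to compare memory addresses.

```python
import torch

base = torch.arange(6.).reshape(2, 3)

flat_view = base.view(6)                 # same memory as base
flat_view[0] = 99.                       # also changes base[0, 0]
view_shares = flat_view.data_ptr() == base.data_ptr()

transposed = base.t()                    # non-contiguous: same data, different strides
try:
    transposed.view(6)                   # layout is incompatible with a flat view
    view_failed = False
except RuntimeError:
    view_failed = True

flat_copy = transposed.reshape(6)        # reshape falls back to copying here
copy_shares = flat_copy.data_ptr() == base.data_ptr()

print(view_shares, view_failed, copy_shares)
```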

In [89]:
torch.stack([x,x,x], dim = 1), torch.stack([x,x,x], dim = 0)

(tensor([[5., 5., 5.],
         [2., 2., 2.],
         [3., 3., 3.],
         [4., 4., 4.],
         [5., 5., 5.],
         [6., 6., 6.],
         [7., 7., 7.],
         [8., 8., 8.],
         [9., 9., 9.]]),
 tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.]]))

In [90]:
torch.vstack([x,x,x]), torch.hstack([x,x,x])

(tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 tensor([5., 2., 3., 4., 5., 6., 7., 8., 9., 5., 2., 3., 4., 5., 6., 7., 8., 9.,
         5., 2., 3., 4., 5., 6., 7., 8., 9.]))
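
A useful way to remember the difference: `torch.stack` adds a new dimension, while `torch.cat` joins along an existing one. For 1-D inputs, `vstack` behaves like `stack` on `dim=0` and `hstack` behaves like `cat` on `dim=0`. A small sketch with an illustrative `row` tensor:

```python
import torch

row = torch.tensor([1., 2., 3.])

stacked = torch.stack([row, row], dim=0)       # new leading dim: shape (2, 3)
concatenated = torch.cat([row, row], dim=0)    # existing dim grows: shape (6,)

# For 1-D inputs these pairs give identical results.
assert torch.equal(torch.vstack([row, row]), stacked)
assert torch.equal(torch.hstack([row, row]), concatenated)

print(stacked.shape, concatenated.shape)
```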

In [91]:
x_reshaped

tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])

In [92]:
x_reshaped.size()

torch.Size([1, 9])

In [93]:
x_reshaped.squeeze() # removes all dimensions of size 1

tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])

In [94]:
x_reshaped.squeeze().shape

torch.Size([9])

In [95]:
x_reshaped.squeeze().unsqueeze(dim=1).shape, x_reshaped.squeeze().unsqueeze(dim=1).unsqueeze(dim=2).shape

(torch.Size([9, 1]), torch.Size([9, 1, 1]))

In [96]:
x_image = torch.rand(size=(224,224,3)) # height width color_channel
x_perm = x_image.permute(2, 0, 1) # rearrange dimensions/axis (usually with images) by indexes
print(x_image.shape, x_perm.shape)

torch.Size([224, 224, 3]) torch.Size([3, 224, 224])


In [97]:
x_image[0,0,0] = 720
x_image[0,0,0], x_perm[0,0,0] # permute returns a view, so both tensors share the same memory

(tensor(720.), tensor(720.))
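
Putting `permute` and `unsqueeze` together gives a common image-preparation pattern: height-width-channel to channel-first, then add a batch dimension. This is a sketch under the assumption that the downstream code expects `(N, C, H, W)`; the names are illustrative. Because `permute` only changes strides, the result is non-contiguous until `.contiguous()` makes a packed copy.

```python
import torch

image_hwc = torch.rand(size=(224, 224, 3))     # height, width, color channels
image_chw = image_hwc.permute(2, 0, 1)         # view with shape (3, 224, 224)
batch = image_chw.unsqueeze(dim=0)             # add a batch dim: (1, 3, 224, 224)

shares_memory = image_chw.data_ptr() == image_hwc.data_ptr()
packed = image_chw.contiguous()                # copy laid out in row-major order

print(batch.shape, image_chw.is_contiguous(), packed.is_contiguous())
```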

### Indexing

In [98]:
import torch

In [99]:
x = torch.arange(1,10).reshape(1,3,3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [100]:
x[0][0][0], x[0][0], x[0]

(tensor(1),
 tensor([1, 2, 3]),
 tensor([[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]))

In [101]:
x[:,1] # all of dim 0, index 1 of dim 1

tensor([[4, 5, 6]])

In [102]:
x[:,:,1] # all of dims 0 and 1, but only index 1 of dim 2

tensor([[2, 5, 8]])

In [103]:
x[:,1,1] # all of dim 0, but only index 1 of both dim 1 and dim 2

tensor([5])

In [104]:
x[0,0,:] # index 0 of dims 0 and 1, and all of dim 2

tensor([1, 2, 3])
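
One pattern behind the results above: an integer index drops that dimension, while a slice keeps it. `...` (Ellipsis) stands in for "all remaining leading dimensions". A short sketch using the same `x`; the other names are illustrative.

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

dropped = x[:, 1]        # integer index removes dim 1: shape (1, 3)
kept = x[:, 1:2]         # slice keeps dim 1 with length 1: shape (1, 1, 3)
last_col = x[..., -1]    # last index of the final dim: tensor([[3, 6, 9]])

print(dropped.shape, kept.shape, last_col)
```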

### PyTorch Tensors and NumPy

NumPy array -> tensor: `torch.from_numpy(ndarray)`

In [105]:
import torch
import numpy as np
array = np.arange(1.0,8.0)
tensor = torch.from_numpy(array).type(torch.float32) # from_numpy() shares memory with the array; .type() converts float64 -> float32 and returns a copy
torch.arange(1.0,8.0), torch.arange(1.0,8.0).dtype, tensor, tensor.dtype

(tensor([1., 2., 3., 4., 5., 6., 7.]),
 torch.float32,
 tensor([1., 2., 3., 4., 5., 6., 7.]),
 torch.float32)

In [106]:
tensor = torch.ones(7)
numpy_tensor = tensor.numpy() # shares memory with the CPU tensor; in-place changes to one show up in the other
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
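
Memory sharing between NumPy and CPU tensors is easy to get wrong, so here is a small check. Both `torch.from_numpy()` and `Tensor.numpy()` share memory; a dtype conversion or an explicit `.copy()` breaks the link. Variable names below are illustrative.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)                                # float64
shared = torch.from_numpy(array)                           # shares memory, stays float64
converted = torch.from_numpy(array).type(torch.float32)    # dtype change makes a copy

array[0] = 100.0                                           # in-place edit of the array
print(shared[0].item(), converted[0].item())               # shared sees it, converted does not

ones = torch.ones(7)
as_numpy = ones.numpy()                                    # shares memory with ones
independent = ones.numpy().copy()                          # explicit copy, no sharing
ones.add_(1)                                               # in-place edit of the tensor
print(as_numpy[0], independent[0])
```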

### Reproducibility (taking the random out of random)
Set a random seed.

In [107]:
RANDOM_SEED = 42

torch.manual_seed(RANDOM_SEED) # the seed only fixes the sequence from this point; reset it before each call that should repeat the same values
random_tensor_A = torch.rand(3,3)

torch.manual_seed(RANDOM_SEED)
random_tensor_B = torch.rand(3,3)

print(random_tensor_A, random_tensor_B)
print(random_tensor_A == random_tensor_B)

tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009],
        [0.2566, 0.7936, 0.9408]]) tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009],
        [0.2566, 0.7936, 0.9408]])
tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])
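
Resetting the global seed before every call works, but it affects all random code in the process. An alternative, shown here as a sketch rather than the notebook's own approach, is to pass a dedicated `torch.Generator` so each stream of random numbers is controlled independently.

```python
import torch

# Global seed: consecutive calls continue the same sequence, so they differ.
torch.manual_seed(42)
global_first = torch.rand(3)
global_second = torch.rand(3)

# Separate generators seeded identically produce identical streams.
gen_a = torch.Generator().manual_seed(42)
gen_b = torch.Generator().manual_seed(42)
first_a = torch.rand(3, 3, generator=gen_a)
first_b = torch.rand(3, 3, generator=gen_b)
second_a = torch.rand(3, 3, generator=gen_a)   # next draw from gen_a, not a repeat

print(torch.equal(first_a, first_b), torch.equal(first_a, second_a))
```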


### Accessing a GPU

In [108]:
# !nvidia-smi

In [109]:
# device agnostic code
device = 'mps' if torch.mps.is_available() else 'cpu'
device

'mps'
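
The check above only covers Apple's MPS backend. If the same notebook may also run on an NVIDIA machine, one option is a small helper that tries CUDA first, then MPS, then falls back to CPU. `pick_device` is a hypothetical helper name, not a PyTorch function.

```python
import torch

def pick_device() -> str:
    """Hypothetical helper: prefer CUDA, then MPS, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
tensor = torch.tensor([1, 2, 3]).to(device)   # moved to whichever device was found
print(device, tensor.device)
```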

In [110]:
# putting tensors and models on gpu
tensor = torch.tensor([1,2,3], device='cpu')
print(tensor, tensor.device)
tensor = tensor.to(device)
print(tensor)

tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='mps:0')


In [111]:
tensor.numpy()

TypeError: can't convert mps:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In [None]:
# back to cpu for numpy
tensor = tensor.cpu().numpy()
tensor

array([1, 2, 3], dtype=int64)

## Where to go next
- Re-run the tensor operations on GPU/MPS to benchmark speed differences.
- Chain the primitives here into mini linear layers to rehearse forward passes.
- Jump to `2-Introduction_to_PyTorch_Workflow.ipynb` for a full training loop walkthrough.
- Bookmark this notebook and treat it as your scratchpad for tensor tricks you discover.
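
As a starting point for the "mini linear layer" idea in the list above, the sketch below chains `rand`, transpose, and `@` into a single forward pass and compares it against `torch.nn.functional.linear`. The sizes (4 samples, 3 inputs, 2 outputs) are arbitrary, and the weight layout `(out_features, in_features)` is chosen to match that function.

```python
import torch

torch.manual_seed(42)

batch = torch.rand(4, 3)     # 4 samples, 3 input features
weight = torch.rand(2, 3)    # (out_features, in_features)
bias = torch.rand(2)

# Forward pass by hand: (4, 3) @ (3, 2) + (2,) -> (4, 2)
manual = batch @ weight.T + bias

# Reference result from PyTorch's functional API.
reference = torch.nn.functional.linear(batch, weight, bias)

print(manual.shape, torch.allclose(manual, reference))
```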