# PyTorch: Learn the Basics

* Tensors
* Datasets & DataLoaders
* Transforms
* Build Model
* Autograd
* Optimization
* Save & Load Model

# Tensors

Why use tensors instead of NumPy ndarrays? Tensors are similar to ndarrays, but they can run on GPUs and other hardware accelerators, and they are built to support automatic differentiation.
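As a minimal sketch of the automatic-differentiation point, the snippet below asks PyTorch to track gradients for a small tensor. The names `w` and `loss` are illustrative and not part of the original notebook.

```python
import torch

# A tensor that records operations so gradients can be computed later.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# loss = sum(w_i ** 2), so d(loss)/d(w_i) = 2 * w_i
loss = (w ** 2).sum()
loss.backward()

print(w.grad)  # tensor([2., 4., 6.])
```

A plain NumPy array has no equivalent of `requires_grad` or `.backward()`; this is covered in more depth in the Autograd part of the series.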


In [4]:
import torch
import numpy as np

In [5]:
%load_ext watermark



## Initializing a tensor

In [6]:
# directly from data
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

In [7]:
type(x_data)

torch.Tensor

In [8]:
# from a numpy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

In [9]:
# create a new tensor from another tensor; shape and dtype are kept unless overridden
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")


Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 



In [10]:
# a random tensor, values between 0 and 1
x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Random Tensor: 
 tensor([[0.7808, 0.1112],
        [0.0457, 0.5436]]) 
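The `*_like` constructors copy shape and dtype from their argument. `torch.rand_like` samples floats in [0, 1), which is why the cell above overrides the integer dtype of `x_data`. A small sketch of the dtype behaviour, using illustrative names (`x_int`, `ones_same`, `ones_float`):

```python
import torch

x_int = torch.tensor([[1, 2], [3, 4]])  # integer data is stored as torch.int64

ones_same = torch.ones_like(x_int)                      # keeps shape and int64 dtype
ones_float = torch.ones_like(x_int, dtype=torch.float)  # same shape, dtype overridden

print(x_int.dtype, ones_same.dtype, ones_float.dtype)
```

`torch.float` is an alias for `torch.float32`, so the last tensor holds 32-bit floats.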



In [11]:
# constant or random values
shape = (2,3,)
zeros_tensor = torch.zeros(shape)
ones_tensor = torch.ones(shape)
rand_tensor = torch.rand(shape)

print(f"Zeros Tensor: \n {zeros_tensor}")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Random Tensor: \n {rand_tensor} \n")


Zeros Tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
Ones Tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

Random Tensor: 
 tensor([[0.2606, 0.9567, 0.1587],
        [0.6428, 0.4018, 0.4623]]) 



## Attributes

In [12]:
tensor = torch.rand(3,4)
print(tensor)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

tensor([[0.8739, 0.9027, 0.8701, 0.8615],
        [0.4273, 0.7591, 0.5939, 0.4719],
        [0.6431, 0.4888, 0.5075, 0.9832]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


In [13]:
# show some random attributes
from random import sample
sample([attribute for attribute in dir(tensor) if '_' not in attribute], 20)

['expand',
 'conj',
 'add',
 'dsplit',
 'renorm',
 'sin',
 'fix',
 'eig',
 'smm',
 'exp',
 'asinh',
 'mH',
 'atanh',
 'tril',
 'log1p',
 'short',
 'argmax',
 'heaviside',
 'solve',
 'gcd']

In [14]:
tensor.bernoulli()

tensor([[1., 1., 1., 1.],
        [0., 0., 1., 1.],
        [1., 1., 1., 1.]])
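`bernoulli()` treats each entry as the probability of drawing a 1, which is why the output above contains only 0s and 1s. A small sketch with probabilities chosen so that the result is deterministic (the name `probs` is illustrative):

```python
import torch

# Each entry is the probability of drawing 1 at that position.
probs = torch.tensor([[0.0, 1.0],
                      [1.0, 0.0]])

draws = probs.bernoulli()

# With probabilities of exactly 0 or 1 the draw is not random at all.
print(draws)
```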

## Operations on Tensors

Tensors are created on the CPU by default, but they can be moved to a GPU. Operations run on the CPU or on an [Accelerator](https://docs.pytorch.org/docs/stable/torch.html#accelerators), that is, a `torch.device` such as [CUDA](https://docs.pytorch.org/docs/stable/cuda.html) or [MPS](https://docs.pytorch.org/docs/stable/mps.html).

NOTE: To use GPU acceleration from a dev container:
- Remote server with an NVIDIA GPU (CUDA): connect the dev container to a remote server that has an NVIDIA GPU, configure the container to expose that GPU, and use `torch.device("cuda")` in your code. This is a common setup for ML development.
- Apple Silicon (MPS): run the code locally, outside the container. The container runs Linux, so MPS is not available inside it.

In [15]:
# Check if CUDA (NVIDIA GPU) is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
# torch.mps.is_available() failed in a dev container on a Mac (the container runs Linux),
# so check through torch.backends.mps instead
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using MPS (Apple GPU)")  # torch.mps does not expose a get_device_name() like torch.cuda does
else:
    device = torch.device("cpu")
    print("No GPU found, using CPU.")

No GPU found, using CPU.
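Once a device has been chosen, tensors can be created on it directly or moved to it afterwards. The sketch below repeats the device selection from the cell above so that it runs on its own; the names `a`, `b`, and `c` are illustrative.

```python
import torch

# Pick a device, falling back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

a = torch.ones(2, 3, device=device)  # created directly on the target device
b = torch.rand(2, 3).to(device)      # created on the CPU, then moved

# Both operands live on the same device, so the result does too.
c = a + b
print(c.device)
```

Creating a tensor with `device=` avoids the extra copy that `.to(device)` performs when the target is not the CPU.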


In [16]:
# We move our tensor to the current accelerator if available
if torch.accelerator.is_available():
    tensor = tensor.to(torch.accelerator.current_accelerator())

In [17]:
tensor

tensor([[0.8739, 0.9027, 0.8701, 0.8615],
        [0.4273, 0.7591, 0.5939, 0.4719],
        [0.6431, 0.4888, 0.5075, 0.9832]])

In [18]:
# standard indexing

print(tensor[1,0])
print(tensor[:, 2])

tensor(0.4273)
tensor([0.8701, 0.5939, 0.5075])


In [19]:
# joining tensors
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

tensor([[0.8739, 0.9027, 0.8701, 0.8615, 0.8739, 0.9027, 0.8701, 0.8615, 0.8739,
         0.9027, 0.8701, 0.8615],
        [0.4273, 0.7591, 0.5939, 0.4719, 0.4273, 0.7591, 0.5939, 0.4719, 0.4273,
         0.7591, 0.5939, 0.4719],
        [0.6431, 0.4888, 0.5075, 0.9832, 0.6431, 0.4888, 0.5075, 0.9832, 0.6431,
         0.4888, 0.5075, 0.9832]])
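`torch.cat` joins tensors along an existing dimension, whereas `torch.stack` adds a new one. A small sketch of the resulting shapes, using an illustrative tensor `t`:

```python
import torch

t = torch.arange(6).reshape(2, 3)      # shape (2, 3)

cat_rows = torch.cat([t, t], dim=0)    # rows appended: shape (4, 3)
cat_cols = torch.cat([t, t], dim=1)    # columns appended: shape (2, 6)
stacked = torch.stack([t, t], dim=0)   # new leading dimension: shape (2, 2, 3)

print(cat_rows.shape, cat_cols.shape, stacked.shape)
```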


Arithmetic operations

In [20]:
# matrix multiplication, 3 options
y1 = tensor @ tensor.T
y1

tensor([[3.0779, 1.9820, 2.2919],
        [1.9820, 1.3342, 1.4112],
        [2.2919, 1.4112, 1.8768]])

In [21]:
y2 = tensor.matmul(tensor.T)
y2

tensor([[3.0779, 1.9820, 2.2919],
        [1.9820, 1.3342, 1.4112],
        [2.2919, 1.4112, 1.8768]])

In [22]:
# randomize in the shape of y1
y3 = torch.rand_like(y1)
y3

tensor([[0.5899, 0.3826, 0.1132],
        [0.7363, 0.6440, 0.9003],
        [0.6095, 0.0242, 0.0951]])

In [23]:
# note the `out`
torch.matmul(tensor, tensor.T, out=y3)

tensor([[3.0779, 1.9820, 2.2919],
        [1.9820, 1.3342, 1.4112],
        [2.2919, 1.4112, 1.8768]])

In [24]:
# y3 gets overwritten
y3

tensor([[3.0779, 1.9820, 2.2919],
        [1.9820, 1.3342, 1.4112],
        [2.2919, 1.4112, 1.8768]])

In [25]:
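# `y1 == y2` compares element-wise and returns a boolean tensor, so it
# cannot be used directly as a single truth value (e.g. in `assert`)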

# test for content and shape
torch.equal(y1, y2)

True

In [26]:
# == is an element-wise comparison
torch.all(y1 == y2) # analogous to np.all

tensor(True)
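`torch.equal` demands exact equality, which holds above because `y1` and `y2` come from the same computation. When two results are computed by different routes, floating-point rounding can make them differ slightly, and `torch.allclose` is often the safer check. A sketch using the classic `0.1 + 0.2` example (names are illustrative):

```python
import torch

a = torch.tensor([0.1 + 0.2], dtype=torch.float64)
b = torch.tensor([0.3], dtype=torch.float64)

exact = torch.equal(a, b)     # False: the values differ in the last bits
close = torch.allclose(a, b)  # True: equal within the default tolerances

print(exact, close)
```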

In [27]:
# test if all 3 are equal
assert torch.equal(y1, y2) and torch.equal(y2, y3)

In [28]:
# Element-wise product, also 3 options
z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

tensor([[0.7638, 0.8149, 0.7570, 0.7422],
        [0.1826, 0.5763, 0.3527, 0.2226],
        [0.4135, 0.2390, 0.2575, 0.9667]])

In [29]:
assert torch.equal(z1, z2) and torch.equal(z2, z3)
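The difference between `*` and `@` is easy to miss on random data. On a small matrix with known values (the name `m` is illustrative) it can be checked by hand:

```python
import torch

m = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

elementwise = m * m  # each entry squared: [[1, 4], [9, 16]]
matrix = m @ m       # rows times columns: [[7, 10], [15, 22]]

print(elementwise)
print(matrix)
```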

In [30]:
# sum() returns a single-element tensor; .item() (next cell) converts it to a Python number
tensor.sum()

tensor(8.3830)

In [31]:
print(tensor.sum().item(), type(tensor.sum().item()))

8.383039474487305 <class 'float'>
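`.item()` only works on tensors with exactly one element. For larger tensors, `.tolist()` returns nested Python lists instead. A short sketch with illustrative names:

```python
import torch

t = torch.tensor([[1.5, 2.5],
                  [3.0, 4.0]])

total = t.sum().item()  # Python float
as_list = t.tolist()    # nested Python lists of floats

# Calling .item() on a multi-element tensor raises an error.
item_failed = False
try:
    t.item()
except (RuntimeError, ValueError):
    item_failed = True

print(total, as_list, item_failed)
```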


## Bridge with NumPy

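Convert between tensors and NumPy arrays with `.numpy()` and `torch.from_numpy(x)`. A CPU tensor and the NumPy array created from it share the same underlying memory, so changing one changes the other.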

In [32]:
# Convert tensor to numpy array (.numpy() needs a CPU tensor, so move it back first if it is on an accelerator)
tensor.cpu().numpy()

array([[0.87393975, 0.90272087, 0.87007856, 0.8614827 ],
       [0.4273432 , 0.75912356, 0.593877  , 0.4718554 ],
       [0.6430654 , 0.48883915, 0.50748605, 0.9832276 ]], dtype=float32)

In [33]:
# Convert numpy array to tensor
torch.from_numpy(np.random.random((3,4)))

tensor([[0.0788, 0.9382, 0.1134, 0.1648],
        [0.4984, 0.3085, 0.8413, 0.8414],
        [0.9974, 0.4719, 0.7612, 0.4278]], dtype=torch.float64)
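Both conversions share memory with the original object rather than copying it, which is easy to overlook. The sketch below shows the effect in each direction and contrasts it with `torch.tensor`, which copies. All variable names are illustrative.

```python
import numpy as np
import torch

# Tensor -> NumPy: an in-place change to the tensor shows up in the array.
t = torch.ones(3)
n = t.numpy()
t.add_(1)
print(n)  # [2. 2. 2.]

# NumPy -> tensor: an in-place change to the array shows up in the tensor.
arr = np.zeros(3)
shared = torch.from_numpy(arr)
arr += 5

# torch.tensor copies the data, so later changes to arr do not reach it.
independent = torch.tensor(arr)
arr += 1

print(shared)       # follows arr
print(independent)  # keeps the values from the time of the copy
```

Use `torch.tensor(arr)` or `.clone()` when an independent copy is needed.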

In [34]:
%watermark

Last updated: 2025-08-11T04:02:28.797884+00:00

Python implementation: CPython
Python version       : 3.12.11
IPython version      : 9.4.0

Compiler    : GCC 12.2.0
OS          : Linux
Release     : 6.10.14-linuxkit
Machine     : aarch64
Processor   : 
CPU cores   : 7
Architecture: 64bit



In [35]:
%watermark -iv

numpy: 2.3.1
torch: 2.7.1

