
# PyTorch Introduction 

**Prerequisites:** basic Python/NumPy  
**Goal:** Leave ready to read PyTorch code, build/train simple models, and debug common issues.

**What you'll learn**
1. Tensors & vectorization (device, dtype, shapes, broadcasting)
2. Autograd: building and differentiating computation graphs
3. `nn.Module`, losses, and optimizers
4. Input pipelines with `Dataset` / `DataLoader`
5. Canonical training & evaluation loops (+ checkpoints)
6. Mini project (MNIST)
7. Performance tips: `torch.compile`, mixed precision
8. Common gotchas and debugging patterns


## 1) Framing & Setup

In [2]:

# Environment & reproducibility
import sys, math, time, random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)

# Use the GPU if one is available; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device in use:", device)

# For reproducibility, seed the random number generators used here (PyTorch and Python's random)
torch.manual_seed(123)
random.seed(123)


Python: 3.11.5
PyTorch: 2.2.2+cu121
Device in use: cpu
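
As a quick check of what seeding buys you, the sketch below re-seeds PyTorch's generator and draws the same random numbers twice. The variable names are illustrative only, and results are only expected to match on the same machine and PyTorch build.

```python
import torch

torch.manual_seed(123)
first = torch.rand(3)

torch.manual_seed(123)    # reset the generator to the same state
second = torch.rand(3)

# Same seed, same sequence of draws
same = torch.equal(first, second)
print("identical draws after re-seeding:", same)
```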


### <mark>Why PyTorch instead of NumPy?</mark>

<mark>It can run on a GPU as well as a CPU.<br>
It can compute gradients automatically (autograd).<br>
It ships with many building blocks for deep learning (e.g. layers, loss functions, optimizers).</mark><br>

## 2) PyTorch: Where are things?

![](./pytorch_whereisit.png)<br><br>


## 3) Tensors, Vectorization, Broadcasting

![](./tensors.png)<br><br>

Key ideas: **device**, **dtype**, **shapes**, **broadcasting**, and avoiding Python loops.<br>
<mark>See pytorch_broadcasting.ipynb for further broadcasting details.</mark>


In [33]:

# Create tensors directly on the chosen device
x = torch.arange(12, dtype=torch.float32, device=device).view(3, 4)  #could also use .reshape(3,4)
w = torch.randn(4, 2, device=device)
b = torch.zeros(2, device=device)
print(b.shape)

y = x @ w + b  # broadcast bias
print(f"shapes: x.shape={x.shape}, w.shape={w.shape}, y.shape={y.shape}, b.shape={b.shape}")
print(y[:2])


torch.Size([2])
shapes: x.shape=torch.Size([3, 4]), w.shape=torch.Size([4, 2]), y.shape=torch.Size([3, 2]), b.shape=torch.Size([2])
tensor([[ 3.1669, -2.8220],
        [ 8.9699, -3.8728]])


In [35]:
# Broadcasting Demo 
a = torch.tensor([[1, 2, 3], [4, 5, 6]])  # Shape: (2, 3)
b = torch.tensor([10, 20, 30])           # Shape: (3,)
print(a)
print(b)
# Compare shapes, starting from the last dimension
#  - Dimension 1: 3 and 3 are equal. Compatible.
#  - Dimension 0: 2 and (implicitly) 1. Compatible.
# Essentially, b is treated as if it were [[10, 20, 30], [10, 20, 30]]
# The resulting shape is (max(2,1), max(3,3)) -> (2, 3).
c = a + b

print(c)
# Output:
# tensor([[11, 22, 33],
#         [14, 25, 36]])


tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([10, 20, 30])
tensor([[11, 22, 33],
        [14, 25, 36]])
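
The rule in the comments above also explains when broadcasting fails. The sketch below (with a made-up `row_offsets` tensor) tries to add a shape `(2,)` tensor to a shape `(2, 3)` tensor, which fails because the trailing dimensions are 3 and 2, and then fixes it by adding a trailing axis with `unsqueeze`.

```python
import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row_offsets = torch.tensor([100, 200])      # shape (2,), one offset per row

# (2, 3) + (2,): trailing dims are 3 and 2, and neither is 1 -> incompatible
try:
    a + row_offsets
    failed = False
except RuntimeError as err:
    failed = True
    print("broadcast error:", type(err).__name__)

# unsqueeze(1) gives shape (2, 1), which broadcasts across the 3 columns
c = a + row_offsets.unsqueeze(1)
print(c)
```

Here `row_offsets.unsqueeze(1)` is treated as if it were `[[100, 100, 100], [200, 200, 200]]`.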



**Notes**
- Prefer constructing on the right device (`device=device`) or use `.to(device)`.
- `view` vs `reshape`: `view` never copies but requires a compatible memory layout (e.g. a contiguous tensor); `reshape` returns a view when it can and silently copies otherwise.
- <mark>Use PyTorch vectorized operations; avoid explicit Python loops for math on tensors.</mark>
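
A small sketch of the last two notes, run on the CPU for simplicity. Transposing a tensor leaves its storage untouched but makes it non-contiguous, so `view` raises while `reshape` falls back to a copy. The second half compares a Python loop with the equivalent vectorized expression; the variable names are illustrative.

```python
import torch

x = torch.arange(12, dtype=torch.float32).view(3, 4)
xt = x.t()                       # transpose: same storage, non-contiguous strides
print("contiguous:", xt.is_contiguous())

try:
    xt.view(12)                  # view cannot express this layout without copying
    view_failed = False
except RuntimeError:
    view_failed = True

flat = xt.reshape(12)            # reshape copies when a view is not possible

# Sum of squares: explicit Python loop vs. one vectorized expression
loop_total = 0.0
for value in flat:
    loop_total += float(value) ** 2
vec_total = (flat ** 2).sum().item()

print("view failed:", view_failed)
print("loop:", loop_total, "vectorized:", vec_total)
```

Both totals agree; the vectorized form does the work in a single call instead of one Python-level iteration per element.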


## 4) Autograd (automatic differentiation)

<mark>PyTorch handles backpropagation for you automatically; the cell below is a quick demonstration.</mark>

In [None]:
# A scalar function and its gradient
x = torch.tensor([2.0], requires_grad=True, device=device)  # requires_grad=True: track gradients with respect to x
y = x**2 + 3*x + 1            # dy/dx = 2x + 3 = 7 when x=2
y.backward()
print("x.grad:", x.grad)
print("y.requires_grad:", y.requires_grad)

# Grads accumulate: zero them if reusing tensors (you are responsible for this)
x.grad.zero_()
print("x.grad:", x.grad)
y2 = (x * 5 + 1)
y2.backward()
print("x.grad after second backward:", x.grad)

# Detach and no_grad
x2 = (x.detach() * 10)        # breaks graph
print("x2.requires_grad:", x2.requires_grad)
y2 = (x2 * 5 + 1)
# y2.backward()            # raises RuntimeError: x2 is detached, so y2 does not require grad

# During inference or validation we don't need (or want) gradients; tracking them costs extra memory and compute.
with torch.no_grad():
    z = x * 7 + 1               # won't track gradients
    print("z.requires_grad:", z.requires_grad)
    # z.backward()                # throws exception
print("no_grad result:", z.item())

x.grad: tensor([7.])
y.requires_grad: True
x.grad: tensor([0.])
x.grad after second backward: tensor([5.])
x2.requires_grad: False
z.requires_grad: False
no_grad result: 15.0



**Pitfalls**
- <mark>Gradients **accumulate**; be sure to zero them between steps (e.g. `x.grad.zero_()` or `optimizer.zero_grad()`).</mark>
- Wrap evaluation in `torch.no_grad()` to save memory/compute.
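
To make the first pitfall concrete, here is a minimal sketch (CPU only, illustrative variable names) that calls `backward()` twice without zeroing. The second gradient is added to the first rather than replacing it.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

(3 * x).sum().backward()
first = x.grad.clone()           # d(3x)/dx = 3

(3 * x).sum().backward()         # no zeroing: the new gradient is added to the old one
accumulated = x.grad.clone()     # 3 + 3 = 6

x.grad.zero_()                   # reset before the next backward pass
(3 * x).sum().backward()
after_zero = x.grad.clone()      # back to 3

print(first, accumulated, after_zero)
```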


## 5) Input Pipelines: `Dataset` / `DataLoader`

PyTorch has two classes for managing data.<br><br>
**Dataset class:** A wrapper around your data that supports indexed access. It has two methods: `__len__`, which returns the total number of items in the dataset, and `__getitem__`, which returns the item at a particular index.<br><br>
**DataLoader class:** Wraps an iterable around a `Dataset` and yields one batch at a time. It can be configured to draw batches sequentially or in shuffled order, which keeps the model from memorizing the training order.<br><br>

PyTorch has many built-in `Dataset` types, so you can often use one of them instead of rolling your own. We will see some of them throughout the course.<br>
For a more in-depth look, see <a href="https://docs.pytorch.org/tutorials/beginner/basics/data_tutorial.html">Datasets & DataLoaders</a>.<br>

Below is an example of a custom Dataset and DataLoader.


In [27]:
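def make_y_equals_x1_times_x2(n, d, x_range=3.0):
    # n samples with d features (d >= 2), drawn uniformly between -x_range and x_range
    x = torch.empty(n, d).uniform_(-x_range, x_range)
    y = x[:, 0] * x[:, 1]   # target is the product of the first two features
    return x, y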

# A tiny synthetic dataset
class ToyDataset(Dataset):
    def __init__(self, n=1024, d=2, x_range=3.0):
        super().__init__() # Optional, but good practice
        self.x,self.y = make_y_equals_x1_times_x2(n,d,x_range)

    def __len__(self): return len(self.x)
    
    def __getitem__(self, idx): return self.x[idx], self.y[idx]

train_ds = ToyDataset(2048, d=2)
test_ds  = ToyDataset(512, d=2)


### Or just use the built-in `TensorDataset` instead of writing your own

In [28]:
from torch.utils.data import TensorDataset
x,y = make_y_equals_x1_times_x2(2048,2,3)
train_ds = TensorDataset(x,y)                   #can make this a oneliner TensorDataset(*make_y_equals_x1_times_x2(n,d,x_range))

### Now wrap the Dataset with a DataLoader and you're in

In [29]:
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)    #reshuffle the data each epoch, then split into batches
test_loader  = DataLoader(test_ds,  batch_size=256, shuffle=False)  #keep the dataset order for each batch

xb, yb = next(iter(train_loader))   # a single batch
print("Batch shapes train:", xb.shape, yb.shape, "| dtypes:", xb.dtype, yb.dtype)

Batch shapes train: torch.Size([64, 2]) torch.Size([64]) | dtypes: torch.float32 torch.float32
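
In practice you iterate over the whole loader rather than pulling a single batch. The sketch below is self-contained and uses a deliberately awkward dataset size (200 items, an arbitrary choice for illustration) to show that one pass over the loader is one epoch and that the last batch can be smaller than `batch_size`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
x = torch.empty(200, 2).uniform_(-3.0, 3.0)
y = x[:, 0] * x[:, 1]
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

batch_sizes = []
seen = 0
for xb, yb in loader:             # one full pass over the loader = one epoch
    batch_sizes.append(len(xb))
    seen += len(xb)

print("batches per epoch:", len(loader))   # ceil(200 / 64) = 4
print("batch sizes:", batch_sizes)         # the last batch holds the 8 leftover items
print("items seen:", seen)
```

If a short final batch is a problem, `DataLoader` accepts `drop_last=True` to discard it.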


## 6) `nn.Module`, Losses, Optimizers
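
Before turning to the full notebooks, here is a minimal sketch of how the three pieces fit together on the same `y = x1 * x2` data used above. The class name `TinyMLP`, the hidden size, the learning rate, and the step count are arbitrary choices for illustration, not taken from the course notebooks, and the loop trains on the full dataset at once rather than through a `DataLoader`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression target: y = x1 * x2
x = torch.empty(512, 2).uniform_(-3.0, 3.0)
y = x[:, 0] * x[:, 1]

class TinyMLP(nn.Module):         # hypothetical name, for illustration only
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, inputs):
        return self.net(inputs).squeeze(-1)   # (N, 1) -> (N,) to match y

model = TinyMLP()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(200):
    optimizer.zero_grad()         # gradients accumulate, so clear them first
    loss = loss_fn(model(x), y)   # forward pass and loss
    loss.backward()               # autograd fills .grad on every parameter
    optimizer.step()              # update the parameters
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The `squeeze(-1)` matters: without it the prediction has shape `(N, 1)` and would broadcast against a target of shape `(N,)` inside the loss.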

### see mlp_x1x2_regression.ipynb for simple MLP predicting a number (like our Value class MLP)
### see mlp_classification.ipynb for simple classification on MNIST dataset