# Lesson 2.5 — Circle Classification with PyTorch

## Goal

In this lesson, we will learn the **core PyTorch training workflow** by solving a very small
classification problem.

We will walk through each step of the workflow shown below:

![pytorchio](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-pytorch-training-loop-annotated.png)

We will focus on:
- building a neural network with `nn.Module`
- defining a loss function
- choosing an optimizer
- writing a training loop
- writing a testing (validation) loop

We deliberately avoid:
- DataLoaders
- custom Dataset classes
- CNNs
- images

This lets us focus on the *ideas* that matter. Everything you see here will reappear later
when we train CNNs.

In [None]:
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

## 1. Create a Nonlinear Classification Problem

Here, we construct a simple but instructive classification problem using synthetic data. We generate 1000 data points, where each point has two input features: 

$x = (x_1, x_2)$
   
The points are arranged as two concentric circles, as shown in the visualization below, with a small amount of noise added to make the problem more realistic. We then split the dataset into:
- Training data (80%), used to learn the model parameters.
- Test data (20%), used to evaluate how well the model generalizes to unseen data.

The visualization below shows only the training data, colored by class:
- Class 0 (outer circle), shown in red
- Class 1 (inner circle), shown in blue

This dataset is deliberately chosen because no straight line can separate the two classes. As a result, linear models such as linear regression or logistic regression will struggle. However, even a small neural network can learn a nonlinear decision boundary that separates the two circles.
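
To make the limitation of a linear model concrete, here is a minimal sketch that fits scikit-learn's `LogisticRegression` to data from the same generator. It is not part of the original lesson, and the names `X_demo`, `y_demo`, and `linear_acc` are chosen here only for illustration. Since one straight line cannot separate concentric circles, the accuracy should stay near chance level (about 0.5).

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# Same generator settings as in this lesson
X_demo, y_demo = make_circles(n_samples=1000, noise=0.03, random_state=42)

# A linear classifier: its decision boundary is a straight line in (x1, x2)
linear_model = LogisticRegression()
linear_model.fit(X_demo, y_demo)

# Fraction of points classified correctly
linear_acc = linear_model.score(X_demo, y_demo)
print(f"Logistic regression accuracy: {linear_acc:.3f}")
```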

In [None]:
n_samples = 1000

X, y = make_circles(
    n_samples=n_samples,
    noise=0.03,
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.RdBu);
plt.gca().set_aspect("equal", adjustable="box")

Use the GPU if one is available; otherwise fall back to the CPU:

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

`make_circles` returns NumPy arrays. As discussed in the introduction to PyTorch, we convert them to torch tensors and move them to the chosen device:

In [None]:
# Tensors (train/test)
X_train_t = torch.tensor(X_train, dtype=torch.float32, device=device)
y_train_t = torch.tensor(y_train, dtype=torch.long, device=device)

X_test_t  = torch.tensor(X_test,  dtype=torch.float32, device=device)
y_test_t  = torch.tensor(y_test,  dtype=torch.long, device=device)

X_train_t.shape, y_train_t.shape

In [None]:
# Confirm that both tensors are on the expected device
print(X_train_t.device, y_train_t.device)
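
The explicit `dtype` arguments in the conversion are worth a closer look. The sketch below uses small made-up arrays (`X_np` and `y_np` are stand-ins, not the lesson's data) to show that NumPy's default `float64` carries over to the tensor unless we ask for `float32`, which is the default dtype of `nn.Linear` parameters. Labels are stored as `torch.long` because `nn.CrossEntropyLoss` expects integer class indices.

```python
import numpy as np
import torch

# Toy arrays standing in for X_train and y_train (values are made up)
X_np = np.array([[0.1, -0.7], [0.9, 0.2]])      # float64, NumPy's default
y_np = np.array([1, 0], dtype=np.int64)

# Without an explicit dtype, torch keeps NumPy's float64
X_default = torch.tensor(X_np)
print(X_default.dtype)             # torch.float64

# Convert features to float32 and labels to long (int64)
X_f32 = torch.tensor(X_np, dtype=torch.float32)
y_long = torch.tensor(y_np, dtype=torch.long)
print(X_f32.dtype, y_long.dtype)   # torch.float32 torch.int64
```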

## 2. Define a simple neural network

Here, we define a simple neural network that learns to decide which class or circle a point belongs to. The network works in three steps:

- **Hidden Layer**: The network examines the input point and generates 8 new values from it. You do not need to worry yet about what these values mean; they are just intermediate numbers that help the network learn patterns.

- **ReLU activation**: This step adds non-linearity, which allows the network to learn curved boundaries instead of straight lines. This is the key reason neural networks can solve this problem.

- **Output**: The network produces two numbers (logits), one raw score per class: class 0 (outer circle) and class 1 (inner circle). The model predicts the class with the larger score. Suppose the model outputs the logits [2.3, -1.1] for a point:

  Score for class 0 = 2.3

  Score for class 1 = −1.1

  Since 2.3 is larger, the model predicts that the point belongs to class 0 (outer circle).

This network is small, but it is powerful enough to separate the two circles.
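
The logits example above can be reproduced in a few lines. This is a small sketch using the same two scores; the softmax call is an optional extra, shown only to illustrate how raw scores can be turned into probabilities.

```python
import torch

# One point, two class scores (the values from the example above)
logits = torch.tensor([[2.3, -1.1]])

# The predicted class is the index of the largest score
predicted_class = torch.argmax(logits, dim=1)

# Optional: convert raw scores to probabilities that sum to 1
probs = torch.softmax(logits, dim=1)

print(predicted_class)  # tensor([0])
print(probs)
```

Note that `argmax` gives the same answer whether it is applied to the logits or to the softmax probabilities, because softmax preserves the ordering of the scores.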

In [None]:
class CircleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 8),
            nn.ReLU(),
            nn.Linear(8, 2)
        )

    def forward(self, x):
        return self.net(x)

Create the model:

In [None]:
model = CircleNet().to(device)
print(model)
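
To get a feel for how small this network is, we can count its trainable parameters. The sketch below rebuilds the same layer sizes with a plain `nn.Sequential` so that it runs on its own; the variable names `net` and `total` are illustrative.

```python
import torch.nn as nn

# Same layer sizes as CircleNet
net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))

# Each Linear layer has a weight matrix and a bias vector
for name, p in net.named_parameters():
    print(name, tuple(p.shape), p.numel())

total = sum(p.numel() for p in net.parameters())
print("Total trainable parameters:", total)  # 42
```

The first layer contributes 2 × 8 weights plus 8 biases, and the second contributes 8 × 2 weights plus 2 biases, for 42 parameters in total.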

## 3. Loss function and optimizer

In [None]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

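- The **loss function** measures how wrong the model's predictions are. `nn.CrossEntropyLoss` takes raw logits and integer class labels.
- The **optimizer** updates the model parameters using the gradients of the loss. Here we use Adam with a learning rate of 0.01.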
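
As a rough illustration of what "how wrong" means, the sketch below evaluates `nn.CrossEntropyLoss` on a single made-up pair of logits and checks it against a manual computation: the negative log of the softmax probability assigned to the true class. The numbers are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.3, -1.1]])   # model strongly favors class 0
label = torch.tensor([0])              # true class is 0

# Built-in loss
builtin = nn.CrossEntropyLoss()(logits, label)

# Manual version: -log(softmax(logits)[true class])
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[0, label[0]]

# Same logits, but now the true class is 1: the loss is much larger
wrong = nn.CrossEntropyLoss()(logits, torch.tensor([1]))

print(f"correct label: {builtin.item():.4f}")
print(f"manual:        {manual.item():.4f}")
print(f"wrong label:   {wrong.item():.4f}")
```

A confident correct prediction gives a loss close to zero, while a confident wrong prediction is penalized heavily.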

## 4. Training loop

This loop matches the standard PyTorch training pattern.

An **epoch** means one complete pass through all of the training data. Here we have 800 training points, so in each epoch the model sees all 800 points once. We repeat this process for many epochs (500) so the model can gradually learn, adjusting the neural network's weights with each pass through the data. 

What happens with each epoch during the training loop?

- Forward pass: The model takes the input data and produces raw predictions (called logits).
- Loss computation: The loss function compares the predictions to the true labels and measures how wrong the model is.
- Zeroing gradients: Gradients from the previous step are cleared so they do not accumulate across iterations.
- Backward pass: PyTorch computes how the loss changes with respect to each model parameter using automatic differentiation.
- Optimizer step: The optimizer updates the model parameters in the direction that reduces the loss.
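
The "zeroing gradients" step is easy to overlook, so here is a small sketch of what happens without it. It uses a single hand-made tensor `w` rather than the lesson's model, and a toy loss chosen only because its gradient (`2 * w`) is easy to check by hand.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: gradient of sum(w^2) is 2 * w
loss = (w ** 2).sum()
loss.backward()
first = w.grad.clone()          # tensor([2., 4.])

# Second backward pass WITHOUT zeroing: gradients are added to the old ones
loss = (w ** 2).sum()
loss.backward()
accumulated = w.grad.clone()    # tensor([4., 8.])

# Zero the gradient, then run backward again: back to the correct value
w.grad.zero_()
loss = (w ** 2).sum()
loss.backward()
after_zero = w.grad.clone()     # tensor([2., 4.])

print(first, accumulated, after_zero)
```

In the training loop, `optimizer.zero_grad()` plays the role of `w.grad.zero_()` for every parameter the optimizer manages.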

We compute train accuracy from the same logits we just used for the loss, but test accuracy needs a separate forward pass because we never train on the test set.

In [None]:
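# NOTE: accuracy_from_logits is used in the loop below but is not defined
# elsewhere in this lesson, so a minimal version is assumed here.
def accuracy_from_logits(logits, labels):
    """Return the fraction of samples whose highest-scoring class matches the label."""
    preds = torch.argmax(logits, dim=1)
    return (preds == labels).float().mean().item()


epochs = 500
train_loss_history = []
test_loss_history  = []
train_acc_history  = []
test_acc_history   = []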

for epoch in range(epochs):
    # Put model in training mode
    model.train()

    # 1. Forward pass (TRAIN)
    train_logits = model(X_train_t)  # logits = raw model outputs (one score per class)

    # 2. Compute loss (TRAIN)
    train_loss = loss_fn(train_logits, y_train_t)

    # 3. Reset the gradients to zero
    optimizer.zero_grad()

    # 4. Backward pass
    train_loss.backward()

    # 5. Optimizer step (gradient descent update)
    optimizer.step()

    # Save TRAIN metrics
    train_loss_history.append(train_loss.item())
    train_acc = accuracy_from_logits(train_logits, y_train_t)
    train_acc_history.append(train_acc)

    # Evaluate on TEST data (no gradients)
    model.eval()
    with torch.inference_mode():
        # 6. Forward pass (TEST)
        test_logits = model(X_test_t)

        # 7. Compute loss (TEST)
        test_loss = loss_fn(test_logits, y_test_t)

        # Save TEST metrics
        test_loss_history.append(test_loss.item())
        test_acc = accuracy_from_logits(test_logits, y_test_t)
        test_acc_history.append(test_acc)

    if epoch % 50 == 0:
        print(
            f"Epoch {epoch:4d} | "
            f"Train loss: {train_loss.item():.4f} | Train acc: {train_acc:.3f} | "
            f"Test loss: {test_loss.item():.4f} | Test acc: {test_acc:.3f}"
        )

print("Done training")


In [None]:
plt.figure(figsize=(5, 3))
plt.plot(train_loss_history, color='b', label='train')
plt.plot(test_loss_history, color='r', label='test')
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Loss Over Time")
plt.grid(True)
plt.legend()

**Discussion:** What would the test loss look like if the model were overfitting?

The plot above shows training and test loss decreasing together, which indicates that the model is learning meaningful patterns and generalizing well to unseen data.

If the model were overfitting, we would expect the training loss to continue decreasing while the test loss would stop improving and begin to increase, as shown in the illustrative plot below. This would indicate that the model is memorizing the training data rather than learning general patterns.

In [None]:
# Illustrative example of overfitting (not from the model)
n_epochs = len(train_loss_history)
turn = n_epochs // 3  # epoch at which the fake test loss starts to rise

fake_train_loss = train_loss_history
fake_test_loss = [
    loss if i < turn else loss + 0.002 * (i - turn)
    for i, loss in enumerate(train_loss_history)
]

plt.figure(figsize=(5, 3))
plt.plot(fake_train_loss, label="train (illustrative)")
plt.plot(fake_test_loss, label="test (illustrative overfitting)")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Illustrative Example of Overfitting (not from the current model)")
plt.grid(True)
plt.legend()
plt.show()
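
The overfitting pattern can also be spotted numerically rather than by eye. The sketch below is one simple way to do that, under stated assumptions: `best_epoch` is a hypothetical helper written for this illustration, and the loss values are made up. It finds the epoch with the lowest test loss and reports the gap between test and train loss at the end of training.

```python
def best_epoch(test_losses):
    """Return the index of the epoch with the lowest test loss."""
    return min(range(len(test_losses)), key=lambda i: test_losses[i])

# Made-up loss values: train loss keeps falling, test loss falls and then rises
toy_train = [0.70, 0.50, 0.35, 0.25, 0.18, 0.12]
toy_test  = [0.71, 0.52, 0.40, 0.38, 0.45, 0.55]

best = best_epoch(toy_test)
final_gap = toy_test[-1] - toy_train[-1]

print(f"Lowest test loss at epoch {best}")            # epoch 3
print(f"Final test-train loss gap: {final_gap:.2f}")
```

A best epoch well before the end of training, combined with a growing gap, is the numerical signature of the curve in the illustrative plot.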

In [None]:
plt.figure(figsize=(5, 3))
plt.plot(train_acc_history, label='train')
plt.plot(test_acc_history, label='test')
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Accuracy Over Time")
plt.grid(True)
plt.legend()

The plot shows training accuracy and test accuracy as a function of training epochs.

- At the beginning, both accuracies are close to 0.5, which is what we expect from random guessing in a two-class problem.

- As training progresses, both curves rise rapidly.

- After roughly 200 epochs, both training and test accuracy approach 1.0.

The close alignment of training and test accuracy near 1.0 shows that the network has learned a correct and generalizable decision boundary for the circle classification problem.

---

## 5. Inspect predictions 

In this section, we examine a few randomly selected test samples to better understand how the trained model makes predictions. We randomly choose five points from the test set and compare:

- The predicted class produced by the model

- The true class label for each point

This helps us verify that the model’s predictions align with the ground-truth labels on unseen data.

In [None]:
# Generate predictions on the test set
model.eval()
with torch.inference_mode():
    test_logits = model(X_test_t)
    predictions = torch.argmax(test_logits, dim=1).cpu()

In [None]:
idx = torch.randperm(len(predictions))[:5]

print("\nRandom indices:", idx.tolist())

print("\nPredicted classes:")
print(predictions[idx])

print("\nTrue classes:")
print(y_test[idx.numpy()])  # y_test is a NumPy array, so index it with a NumPy array

These spot checks are consistent with the model's high test accuracy.

## 6. Decision boundaries

Let's look at the learned decision boundary. In the next cell, we visualize what the trained network has learned by evaluating it on a grid of input values. By asking the model to predict a class at every point in this input space, we can draw the boundary between the regions assigned to each class.

We then plot the decision regions together with the training and test data in separate panels.

In [None]:
# Create a grid of points (CPU tensors are fine here; we'll move the stacked grid to the selected device)
x_min, x_max = -1.2, 1.2
y_min, y_max = -1.2, 1.2

xx, yy = torch.meshgrid(
    torch.linspace(x_min, x_max, 200),
    torch.linspace(y_min, y_max, 200),
    indexing="ij"
)

grid = torch.stack([xx.flatten(), yy.flatten()], dim=1).to(device)

# Run model on grid (on the selected device)
model.eval()
with torch.inference_mode():
    logits = model(grid)
    preds = torch.argmax(logits, dim=1)

Z = preds.reshape(xx.shape).cpu().numpy()

In [None]:
# Plot decision regions + data in subplots
fig, axes = plt.subplots(1, 2, figsize=(8, 4), constrained_layout=True)

# Left: TRAIN
axes[0].contourf(xx.numpy(), yy.numpy(), Z, cmap="coolwarm", alpha=0.6)
axes[0].scatter(
    X_train[:, 0],
    X_train[:, 1],
    c=y_train,
    cmap="coolwarm",
    edgecolors="k",
    s=30
)
axes[0].set_title("Train")
axes[0].set_xlabel("Input A")
axes[0].set_ylabel("Input B")
axes[0].set_aspect("equal", adjustable="box")
axes[0].set_xlim(x_min, x_max)
axes[0].set_ylim(y_min, y_max)

# Right: TEST
axes[1].contourf(xx.numpy(), yy.numpy(), Z, cmap="coolwarm", alpha=0.6)
axes[1].scatter(
    X_test[:, 0],
    X_test[:, 1],
    c=y_test,
    cmap="coolwarm",
    edgecolors="k",
    s=30
)
axes[1].set_title("Test")
axes[1].set_xlabel("Input A")
axes[1].set_ylabel("Input B")
axes[1].set_aspect("equal", adjustable="box")
axes[1].set_xlim(x_min, x_max)
axes[1].set_ylim(y_min, y_max)

plt.show()

- The plots above show the learned decision boundary for both the training data (left) and the test data (right). The background color represents the class predicted by the model, while the points show the true data samples.

- We observe that the same smooth decision boundary correctly separates the inner and outer circles in both cases. The model does not simply memorize the training data; instead, it learns a general rule that applies to unseen test points.

This behavior indicates good generalization and a lack of overfitting. It is consistent with the training and test accuracy values, which are both close to 1.0, and with the similar training and test loss curves observed earlier.


## 7. Mapping to the PyTorch workflow

This lesson covers **Steps 1–4** of the PyTorch workflow:

1. Get data ready (tensors)
2. Build a model
3. Fit the model (training loop)
4. Evaluate the model (testing loop)

CNNs follow the **exact same structure**; they just take images as input instead of 2D points.

## 8. Summary

If you understand this lesson, you understand the *core* of PyTorch training.

Everything we do next (CNN inference, transfer learning, larger datasets) builds directly
on this pattern.