<a href="https://www.kaggle.com/kernels/welcome?src=https://github.com/Code-the-Dream-School/python-200/blob/main/lessons/04_ML_deep_learning/resources/lesson_circle_classification_pytorch.ipynb" target="_blank">
  <img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open In Kaggle" />
</a>

# Lesson 3 — Circle Classification with PyTorch

## 1. Introduction

Last week, you worked on **classification with scikit-learn**. You trained models like K-Nearest Neighbors (KNN), used metrics like accuracy, precision, and recall, and saw the basic idea of a classifier in action: the model takes inputs and assigns them to a category. That foundation is exactly what we build on here. In this notebook, we are still doing classification: given an input point, we want the model to predict which of two classes it belongs to.

What changes in this lesson is not the goal, but the tool we use to reach it. Scikit-learn is excellent for classical machine learning and gives you strong classifiers out of the box, but it is not designed for training neural networks layer-by-layer. For **deep learning**, we use a framework built for that job. In this lesson, that framework is **PyTorch**. PyTorch lets us define a neural network clearly, train it efficiently, and most importantly see the learning process step by step instead of treating the model like a black box.

Before we begin, you’ll see a diagram that summarizes the “training story” we are about to run: data goes into the model, the model makes a prediction, we measure how wrong it was, and then PyTorch updates the model so the next prediction is a little better. If that picture feels overwhelming right now, that’s completely normal. You are not expected to understand it yet. The point of this notebook is that by the end, that diagram will feel like a familiar workflow you can recognize and reuse.

![pytorchio](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-pytorch-training-loop-annotated.png)
Image adapted from learnpytorch.io - https://www.learnpytorch.io/01_pytorch_workflow/

In this notebook, we will focus on the core building blocks of training a neural network in PyTorch:
- Define a neural network using nn.Module,
- Decide how the network should measure its mistakes using a loss function,
- Choose an optimizer that helps the network improve, and
- Write the training and testing loops that allow learning to happen.

These steps form the foundation of nearly every PyTorch model, no matter how simple or how advanced.

Along the way, keep three ideas in view:
- How data flows through a network,
- How errors are measured, and
- How feedback gradually improves the model.



## 2. Importing the Tools

We start by importing the libraries used throughout this notebook. Here's what each one does:

- make_circles → creates the circle dataset for us

- train_test_split → splits the data into training + testing

- torch + torch.nn → PyTorch + neural network building blocks

- matplotlib → plotting and visualization

In [None]:
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

## 3. Create a Concentric Circle Classification Problem

To make this introduction as visual and intuitive as possible, we will use a dataset that is easy to understand. Each example is just a point on a 2D plot, meaning it has two inputs: its x-coordinate and its y-coordinate. Each point belongs to one of two classes. One class forms a small circle in the center, and the other forms a larger ring around it. Instead of calling a ready-made classifier, we will build a neural network ourselves and train it in PyTorch so you can see exactly how learning happens.

Below is the problem we are trying to solve:

![Classification_Problem](./01_Non_Linear_Circle_Classification_Problem.JPG)

This image shows the visual illustration of the dataset. The blue points in the center belong to **Class 0**, which we call the inner circle. The red points around the outside belong to **Class 1**, the outer circle. The grey-shaded region represents the space where a decision boundary could exist, and our task is to learn a boundary that separates these two classes.

Conceptually, the task is simple: given a point on this plot, the model must decide whether it belongs to the inner circle or the outer circle. While straightforward, the circular structure makes the problem visually interesting.

Each data point consists of two numbers, the x- and y-coordinates, which together describe the position of a point on a flat 2D plane. We generate 1000 such points and arrange them into two concentric circles. Each point belongs to exactly one of the two classes. To make the problem more realistic, we add a small amount of noise so the circles are not perfectly clean. This prevents the network from simply memorizing an ideal shape and instead encourages it to learn a general rule that works even when the data is slightly messy.

Once the data is created, we split it into two parts. 
- Most of the points (80 %) are used for **training**, which means the network will see them repeatedly and adjust itself based on the errors it makes.
- A smaller portion (20 %) is set aside as **test data**. These points are never shown to the network during training. They are used only to check whether the network has learned a general rule or whether it has simply memorized the training points.

This separation mirrors how machine learning is used in practice: models must perform well on data they have never seen before.

Finally, we plot the training data to make the problem visible. This simple dataset will serve as a concrete foundation for everything that follows as we build, train, and evaluate a neural network in PyTorch.

In [None]:
n_samples = 1000

# Two concentric circles with a little noise; random_state makes the data reproducible
X, y = make_circles(
    n_samples=n_samples,
    noise=0.03,
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.RdBu);
plt.gca().set_aspect("equal", adjustable="box")
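
As a quick sanity check on the split, the sketch below regenerates the same dataset and reports the split sizes, the class balance, and the average distance from the origin for each label. The radius check is a simple way to confirm which label scikit-learn's make_circles assigns to the inner ring and which to the outer ring. The variable names `radius` and `mean_radius` are illustrative and not part of the lesson's code.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Same settings as the lesson's data cell
X, y = make_circles(n_samples=1000, noise=0.03, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train/test sizes:", X_train.shape, X_test.shape)
print("Train class counts:", np.bincount(y_train))
print("Test class counts:", np.bincount(y_test))

# Mean distance from the origin for each label
radius = np.linalg.norm(X, axis=1)
mean_radius = {int(label): float(radius[y == label].mean()) for label in np.unique(y)}
print("Mean radius per label:", mean_radius)
```

Because the split is stratified, both classes should appear in equal numbers in each split. The label with the smaller mean radius is the one make_circles uses for the inner ring.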

Next, we check where the computation will happen. PyTorch can run on either the CPU or a GPU. The first line of the cell below checks whether a GPU is available and selects it automatically; otherwise, it safely falls back to the CPU. This lesson is intended to run on a GPU, so if you selected a GPU before starting this notebook, the device will show up here as cuda.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

Before we can train a neural network in PyTorch, we need to make sure our data is in the right form and in the right place. Earlier in the lesson, our data lived as NumPy arrays, which are great for exploration and plotting but cannot be used directly by PyTorch’s training machinery. PyTorch models work with tensors, so this step converts our training and test data from NumPy arrays into PyTorch tensors.

Next, we convert each dataset split into tensors with the correct data types. The input features are converted to floating-point tensors because the network performs continuous numerical computations on them. The class labels are converted to integer tensors because they represent category IDs, not numbers to be averaged. Both inputs and labels are then moved onto the selected device so that all computations happen in the same place.

Finally, we quickly check the shapes and devices of the tensors. Seeing shapes like (800, 2) confirms that we have 800 training points, each with two input values. Printing the device confirms that everything is ready for training. At this point, the data is fully prepared and compatible with the network, the loss function, and the optimizer—so we can safely begin the learning process.

In [None]:
# Tensors (train/test)
X_train_t = torch.tensor(X_train, dtype=torch.float32, device=device)
y_train_t = torch.tensor(y_train, dtype=torch.long, device=device)

X_test_t  = torch.tensor(X_test,  dtype=torch.float32, device=device)
y_test_t  = torch.tensor(y_test,  dtype=torch.long, device=device)

X_train_t.shape, y_train_t.shape

In [None]:
# Confirm that inputs and labels live on the selected device
print(X_train_t.device, y_train_t.device)
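
The explicit dtype arguments above matter because NumPy defaults to 64-bit floats, while PyTorch layers such as nn.Linear create 32-bit float weights by default. The short sketch below uses a made-up two-point array (`points`, a hypothetical stand-in for the lesson's data) to show what happens to the dtype with and without the explicit conversion.

```python
import numpy as np
import torch

# Hypothetical stand-in for the lesson's NumPy features
points = np.array([[0.1, 0.2], [0.3, 0.4]])
print(points.dtype)  # NumPy's default float type

as_is = torch.tensor(points)                           # keeps NumPy's dtype
converted = torch.tensor(points, dtype=torch.float32)  # matches nn.Linear weights
labels = torch.tensor([0, 1], dtype=torch.long)        # integer class IDs

print(as_is.dtype, converted.dtype, labels.dtype)
```

Passing dtype explicitly, as the lesson does, avoids a mismatch between the input tensors and the model's weights later on.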

## 4. Define a simple neural network

Now that we understand the classification problem we want to solve, we are ready to define the neural network itself. This is where we move from diagrams and intuition into actual PyTorch code. Everything we discussed conceptually in the introduction to neural networks—layers, activations, and learning from data—will now be expressed in just a few lines of Python.

The diagram below shows the **architecture of the neural network** we will build in this lesson. It focuses on structure rather than math: how inputs enter the network, how they move through layers, and what the network produces as output. You do not need to understand every detail yet. As we walk through the code, each part of this diagram will map directly to what we write in PyTorch.

![Network_Architecture](./02_Network_Architecture.JPG)

In PyTorch, every neural network is defined as a Python class that subclasses **nn.Module**. This is one of the most important ideas in PyTorch. By inheriting from nn.Module, we are telling PyTorch that this object has learnable parameters and knows how to turn inputs into outputs. Once a model is defined this way, PyTorch can automatically track its parameters, compute gradients, and update them during training.

Inside this class, we define the structure of the network using **nn.Sequential**. You can think of nn.Sequential as a simple container that stacks layers in order. Data flows through these layers one at a time, exactly as shown from left to right in the diagram. This makes the network easy to read and easy to reason about, especially when you are just getting started.

The main building block inside this stack is **nn.Linear**. A linear layer is a fully connected layer, meaning every input connects to every neuron in the next layer. It performs a weighted combination of its inputs and adds a bias term. These layers are extremely common in neural networks and are often used at the beginning or end of a model to transform data from one size to another.
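
To make the "weighted combination plus bias" idea concrete, the sketch below builds a standalone nn.Linear(2, 8) layer and reproduces its output by hand. The input point is an arbitrary example, and the layer here is separate from the lesson's model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights

layer = nn.Linear(2, 8)            # 2 inputs -> 8 outputs
x = torch.tensor([[0.5, -0.25]])   # one arbitrary 2D point, shape (1, 2)

out = layer(x)

# The same computation written out: x times the transposed weights, plus bias
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape, layer.bias.shape)  # torch.Size([8, 2]) torch.Size([8])
print(out.shape)                             # torch.Size([1, 8])
print(torch.allclose(out, manual, atol=1e-6))
```

The weight matrix has one row per output neuron and one column per input, which is why two inputs and eight neurons give a weight of shape (8, 2).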

Our network takes two input values—the x and y coordinates of a point—and passes them through a hidden layer with eight neurons. After this hidden layer, we apply **ReLU**, which is an **activation function**. Activation functions allow neural networks to model complex patterns by introducing non-linear behavior. ReLU is one of the most commonly used activation functions because it is simple and effective. If you would like a refresher on how activation functions work or how they differ, you can revisit the activation function discussion from the introduction to deep learning lesson or explore PyTorch’s activation function documentation.

The final layer of the network produces two output values, one for each class. These outputs are called **logits**, as shown on the right side of the diagram. A logit is a raw confidence score, not a probability. The term comes from logistic regression, which you studied earlier. In both cases, higher logits indicate greater confidence in a class, while lower logits indicate less confidence. In neural networks, we work directly with these raw scores and let PyTorch handle the conversion to probabilities later when needed.

When we pass input data into the model, PyTorch performs what is called a **forward pass**. This simply means the data flows through the layers we defined to produce logits. We describe this flow inside the forward() method of the class. Importantly, we do not need to write any code for how the model learns from its mistakes. As long as the model is built using nn.Module, PyTorch automatically handles gradient computation and parameter updates during training.

What makes this powerful is how much happens behind the scenes with so little code. In just a few lines, we have built a fully functional neural network capable of learning complex decision boundaries. PyTorch handles the low-level details so we can focus on understanding and experimenting with the model.

The code below expresses everything described above in Python:

In [None]:
class CircleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 8),
            nn.ReLU(),
            nn.Linear(8, 2)
        )

    def forward(self, x):
        return self.net(x)

Create the model:

In [None]:
model = CircleNet().to(device)
print(model)
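
One way to connect the printed architecture to the idea of learnable parameters is to count them. The sketch below rebuilds the same stack of layers with nn.Sequential (without the CircleNet wrapper, so the example stands alone) and lists each parameter tensor with its shape. The name `net` is illustrative.

```python
import torch.nn as nn

# Same layer stack as CircleNet, rebuilt here so the example stands alone
net = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

for name, param in net.named_parameters():
    print(name, tuple(param.shape), param.numel())

total = sum(p.numel() for p in net.parameters())
print("Total learnable parameters:", total)  # 2*8 + 8 + 8*2 + 2 = 42
```

ReLU contributes no parameters; all 42 learnable values sit in the weights and biases of the two linear layers.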

## 5. Loss function and Optimizer

Now that we have defined a neural network, we need to answer a crucial question: how does the network know whether it is doing a good job or a bad job? A neural network can make predictions, but by itself it has no idea if those predictions are correct. To learn, it needs a way to measure its mistakes.

This is the role of the **loss function**. The loss function takes the model’s output and the true labels and produces a single number that answers the question: “How wrong was the model?” A larger loss means worse performance; a smaller loss means better performance. Learning is simply the process of trying to make this number smaller over time.

In [None]:
loss_fn = nn.CrossEntropyLoss()

**Cross-Entropy Loss**

In this lesson, our network outputs two logits for each data point, one score for each class. These logits are not probabilities; they are raw confidence scores. The question then becomes: how do we turn those two scores into a single number that tells the model how wrong it is?

This is the first time you are seeing a **loss function for classification**. In earlier lessons, you evaluated classifiers using accuracy, precision, and recall. Those tell you how good a trained model is. A loss function plays a different role: it tells the model how wrong it is during training, so it knows how to adjust its internal parameters.

Cross-entropy loss is designed specifically for classification problems. The network produces a score for each class, and cross-entropy compares those scores to the correct class. In PyTorch, nn.CrossEntropyLoss takes the correct class as an integer label (0 or 1 here), which is why we stored the labels as integer tensors earlier; you do not need to one-hot encode them yourself. If the model assigns a high score to the correct class, the loss is small. If it assigns a high score to the wrong class, the loss becomes large.

You do not need to understand the full formula to use cross-entropy effectively. Conceptually, it is a smart way of asking: *Did the model strongly prefer the correct class, or the wrong one?* Without a loss function like this, there would be no signal telling the network how to improve.

If you want a very clear, intuitive explanation of cross-entropy loss without unnecessary math, this video does an excellent job: https://www.youtube.com/watch?v=wTTYHM_DMxw
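
For readers who like to see the arithmetic, here is a small worked example in plain Python. It applies softmax to a pair of logits and takes the negative log of the probability assigned to the correct class, which is the per-example quantity that cross-entropy averages. The logit values are made up for illustration, and the helper name `cross_entropy` is ours, not PyTorch's.

```python
import math

def cross_entropy(logits, true_class):
    """Negative log of the softmax probability assigned to the true class."""
    exps = [math.exp(z) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return -math.log(probs[true_class])

logits = [2.0, 0.0]  # the model prefers class 0

loss_if_correct = cross_entropy(logits, true_class=0)
loss_if_wrong = cross_entropy(logits, true_class=1)

print(f"True class 0: loss = {loss_if_correct:.3f}")  # about 0.127
print(f"True class 1: loss = {loss_if_wrong:.3f}")    # about 2.127
```

The same logits produce a small loss when the preferred class is correct and a much larger one when it is wrong, which is the signal training relies on.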

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

**The Optimizer: Gradient Descent in Practice**

Once we can measure how wrong the model is, the next question becomes: how do we fix it? This is the job of the optimizer. The **optimizer** is the component that actually changes the model’s internal parameters. It uses information from the loss to decide how each parameter should be adjusted in order to reduce future errors. In other words, the optimizer is what turns error measurements into learning.

In this lesson, we use the **Adam optimizer**, which is a popular and effective variant of gradient descent. Gradient descent is the idea of taking small steps downhill on the loss surface, each step slightly improving the model. Adam handles the details of step size and direction for us, making training more stable and efficient. It is important to recognize this division of responsibility: The loss function decides *how wrong the model is*. The optimizer decides *how to fix it*.
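
The "small steps downhill" idea can be sketched in a few lines of plain Python. This is ordinary gradient descent on a made-up one-parameter loss, (w - 3)^2, not Adam and not the lesson's network; the learning rate and step count are arbitrary choices for illustration.

```python
def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w = 0.0    # starting guess
lr = 0.1   # learning rate (step size)

for step in range(50):
    w = w - lr * gradient(w)  # move against the gradient

print(f"w = {w:.4f}, loss = {loss(w):.6f}")
```

Each step moves w a little closer to 3, the value that minimizes this loss. Adam follows the same basic pattern but adapts the step size for each parameter.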

At this point, we have everything needed for learning to happen:
- We have built the machine (the network architecture)
- Defined how mistakes are measured (the loss function)
- Chosen how the machine improves itself (the optimizer)

The only thing left is to run this process repeatedly in a **training loop**, feeding data forward, measuring error, and letting PyTorch push improvements backward until the network learns the curved decision boundary in the data.



## 6. Training loop (fit the model)


Now that everything is set up, this training loop is where learning actually happens. Up to this point, we have only prepared the pieces: the network architecture, the loss function (the judge), and the optimizer (the mechanic). This loop is the part where we repeatedly run the full training process: the network makes a guess, the judge scores it, and the mechanic makes a small adjustment so the next guess is slightly better.

Before we go further, one quick note: this is the most “code-dense” section of the notebook. Go ahead and run the code first, and then come back and read the explanation while it runs. You don’t have to understand every detail on the first pass; what matters right now is recognizing the overall rhythm of training.

At a high level, what we’re about to do is repeat the same workflow many times. We decide how long to train using epochs = 500. An epoch means one complete pass through the entire training dataset. In this notebook, training for 500 epochs means the network sees the full training data 500 times. We also record loss and accuracy as we go, because later we will plot them and actually see learning happening, not just assume it. It's part of the same PyTorch workflow that we saw earlier:

![pytorchio](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01-pytorch-training-loop-annotated.png)
Image adapted from learnpytorch.io - https://www.learnpytorch.io/01_pytorch_workflow/

This loop is really two loops glued together: a training part, where the model is allowed to improve, and a testing part, where we measure progress honestly on data the model never trained on. We keep both because training numbers can look great even when a model is not truly learning the underlying pattern. Testing is how we check that the network is learning something real.

### 6.1 Training loop (learning mode)

In the training part, we put the model into training mode and run a forward pass on the training data to produce logits. The loss function then compares those logits to the correct labels and produces one number: how wrong the model is overall on this round. That loss becomes the signal PyTorch uses to compute how the model should change. After gradients are computed, the optimizer applies a small update. Nothing magical happens in one step—training is simply many small improvements stacked on top of each other until the network starts drawing the right kind of curved boundary.

### 6.2 Testing loop (evaluation mode)

Right after training for that epoch, we switch into evaluation mode and run the model on the test set. This part looks similar—forward pass, loss, accuracy—but with one key difference: no learning happens here. The model is not allowed to adjust itself using test data. This makes the test results a fair measure of generalization: can the network correctly classify points it did not get to practice on?
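
The "no learning happens here" rule is enforced in code by wrapping evaluation in torch.inference_mode(). The sketch below uses a small standalone layer with random inputs (both hypothetical) to show that outputs computed inside that block are not tracked for gradients.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)
x = torch.randn(4, 2)  # four random 2D points

tracked = layer(x)     # normal forward pass: gradient tracking is on
with torch.inference_mode():
    untracked = layer(x)  # evaluation: no gradient tracking

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
```

Without gradient tracking there is nothing for backward() to work with, so the test data cannot influence the model's parameters.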

Every 50 epochs, we print a short progress report so we can watch the story unfold: ideally, loss drifts downward and accuracy climbs upward on both training and test sets. When the loop finishes, we print "Done training" to mark the end of the learning process.

In [None]:
def accuracy_from_logits(logits, y_true):
    preds = torch.argmax(logits, dim=1)
    return (preds == y_true).float().mean().item()
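
Here is a small usage example for this helper with hand-written logits, so you can trace the result yourself. The function is repeated to keep the example self-contained; the logits and labels are made up.

```python
import torch

def accuracy_from_logits(logits, y_true):
    preds = torch.argmax(logits, dim=1)  # index of the largest logit per row
    return (preds == y_true).float().mean().item()

logits = torch.tensor([[2.0, 1.0],   # predicts class 0
                       [0.0, 3.0],   # predicts class 1
                       [1.0, 0.0]])  # predicts class 0
y_true = torch.tensor([0, 1, 1])     # the third prediction is wrong

acc = accuracy_from_logits(logits, y_true)
print(f"{acc:.3f}")  # 0.667
```

Two of the three predictions match the labels, so the helper returns two thirds.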


In [None]:
epochs = 500
train_loss_history = []
test_loss_history  = []
train_acc_history  = []
test_acc_history   = []

for epoch in range(epochs):
    # Put model in training mode
    model.train()

    # 1. Forward pass (TRAIN)
    train_logits = model(X_train_t)  # logits = raw model outputs (one score per class)

    # 2. Compute loss (TRAIN)
    train_loss = loss_fn(train_logits, y_train_t)

    # 3. Reset the gradients to zero
    optimizer.zero_grad()

    # 4. Backward pass
    train_loss.backward()

    # 5. Optimizer step (gradient descent update)
    optimizer.step()

    # Save TRAIN metrics
    train_loss_history.append(train_loss.item())
    train_acc = accuracy_from_logits(train_logits, y_train_t)
    train_acc_history.append(train_acc)

    # Evaluate on TEST data (no gradients)
    model.eval()
    with torch.inference_mode():
        # 6. Forward pass (TEST)
        test_logits = model(X_test_t)

        # 7. Compute loss (TEST)
        test_loss = loss_fn(test_logits, y_test_t)

        # Save TEST metrics
        test_loss_history.append(test_loss.item())
        test_acc = accuracy_from_logits(test_logits, y_test_t)
        test_acc_history.append(test_acc)

    if epoch % 50 == 0:
        print(
            f"Epoch {epoch:4d} | "
            f"Train loss: {train_loss.item():.4f} | Train acc: {train_acc:.3f} | "
            f"Test loss: {test_loss.item():.4f} | Test acc: {test_acc:.3f}"
        )

print("Done training")



**What we just did**

If you step back, you can summarize the entire loop in one sentence: **we ran the same learning cycle hundreds of times until the network improved**. Each epoch followed the same rhythm: the model made predictions on training data, we measured how wrong they were, PyTorch calculated how the model should change, and the optimizer applied a small update. That repeated practice is what turns a random network into one that captures the circular pattern.
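
One step in the loop that is easy to overlook is optimizer.zero_grad(). PyTorch adds new gradients to whatever is already stored, so without a reset each backward pass would mix in gradients from earlier epochs. The sketch below shows that accumulation on a single made-up parameter `w`.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
first = w.grad.item()        # gradient of 2*w with respect to w

(2 * w).backward()           # second backward pass without resetting
accumulated = w.grad.item()

w.grad.zero_()               # manual reset, similar in effect to optimizer.zero_grad()
(2 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 2.0 4.0 2.0
```

The second value is doubled only because the old gradient was still there, which is why the training loop clears gradients before every backward pass.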

Just as importantly, we also evaluated on the test set every epoch. This prevented us from fooling ourselves with training-only results. When training and test metrics improve together, it is a strong sign the model is learning a general rule, not just memorizing particular points. In the next sections, we’ll make this learning visible by plotting the loss and accuracy curves. Instead of guessing whether training worked, we’ll be able to see the judge’s score improving and the model’s predictions becoming more correct over time.

First, let's plot the loss over epochs:

In [None]:
plt.figure(figsize=(5, 3))
plt.plot(train_loss_history, color='b', label='train')
plt.plot(test_loss_history, color='r', label='test')
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Loss Over Time")
plt.grid(True)
plt.legend();

Now that the training loop has finished, this plot helps us visualize how the learning happens instead of just trusting the code. The horizontal axis shows the training epochs, which you can think of as learning rounds. Each step to the right means the network has seen the full training dataset one more time. The vertical axis shows **the loss, which is a measure of how wrong the network’s predictions are**. Lower values mean better predictions.

The **blue curve shows the training loss**, which tells us how well the network is doing on the data it is learning from. At the very beginning, the training loss is high. This makes sense because the network starts with random internal tunings and has no understanding of the circular pattern in the data. Its early guesses are mostly wrong, so the error is large.

As training progresses, the blue line drops steadily. This tells us that the forward-and-backward process is working. Each time data flows forward through the network, the error is measured, and feedback flows backward to make small improvements. Over many epochs, those small changes accumulate, and the network becomes better at separating the inner circle from the outer ring. The smooth downward shape of the curve shows that learning is stable and consistent rather than chaotic.

The **red curve shows the test loss**, which measures how well the network performs on data it has never seen during training. This curve is especially important because it tells us whether the network is learning real patterns or just memorizing the training data. In this plot, the test loss closely follows the training loss and also decreases steadily over time. This is a very good sign. It means the network is not just improving on the training set, but is also learning a general rule that applies to new points.

Notice that the two curves stay close together instead of drifting far apart. If the training loss kept going down while the test loss started going up, that would indicate **overfitting**; an illustrative example of overfitting is shown further below. That is not happening here. Instead, both curves move downward together, which tells us the network has found a decision boundary that generalizes well to unseen test data.

Toward the later epochs, both curves begin to flatten out. This indicates that learning is slowing down. The network has already discovered most of what it can about the data, and further improvements become smaller and smaller. This is exactly what we expect when a model is nearing its best possible performance for a given architecture and dataset.

In simple terms:
- This plot confirms that the neural network has successfully learned the curved structure in the data.
- The repeated cycle of forward propagation, error measurement, and backpropagation has shaped the network’s internal behavior so that it can reliably separate the two classes.
- What started as random guessing has turned into a stable, accurate decision-making system, and this curve is the visual proof of that learning process.

In [None]:
# Illustrative example of overfitting (not from the model)
n_epochs = len(train_loss_history)
turn = n_epochs // 3  # epoch at which the fake test loss starts to rise

fake_train_loss = train_loss_history
fake_test_loss = [
    loss if i < turn else loss + 0.002 * (i - turn)
    for i, loss in enumerate(train_loss_history)
]

plt.figure(figsize=(5, 3))
plt.plot(fake_train_loss, label="train (illustrative)")
plt.plot(fake_test_loss, label="test (illustrative overfitting)")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Illustrative Example of Overfitting (not from the current model)")
plt.grid(True)
plt.legend()
plt.show()
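
A simple numerical companion to this picture is to find the epoch where the test loss was lowest. If that epoch comes well before the end of training while the training loss keeps falling, the curves are diverging. The sketch below uses short made-up loss lists, not values from the lesson's model.

```python
# Hypothetical loss histories (one value per epoch)
train_losses = [0.70, 0.55, 0.42, 0.33, 0.26, 0.20, 0.15, 0.11]
test_losses  = [0.71, 0.57, 0.46, 0.41, 0.43, 0.48, 0.55, 0.63]

# Index of the smallest test loss
best_epoch = min(range(len(test_losses)), key=lambda i: test_losses[i])
gap_at_end = test_losses[-1] - train_losses[-1]

print("Epoch with lowest test loss:", best_epoch)             # 3
print(f"Train/test gap at the last epoch: {gap_at_end:.2f}")  # 0.52
```

In these made-up numbers the test loss bottoms out at epoch 3 and then climbs while the training loss keeps improving, which is the overfitting signature described above.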

Now, let's plot accuracy:

In [None]:
plt.figure(figsize=(5, 3))
plt.plot(train_acc_history, label='train')
plt.plot(test_acc_history, label='test')
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Accuracy Over Time")
plt.grid(True)
plt.legend();

This accuracy plot gives us a more intuitive view of the same learning process we saw in the loss curve. While loss tells us how wrong the network’s predictions are, **accuracy tells us how often the network gets the answer right**. The horizontal axis again represents training epochs, and the vertical axis shows the fraction of points classified correctly.

At the beginning of training, both the training and test accuracy are close to 50%. This is exactly what we expect. With two classes, a network that has not learned anything yet is essentially guessing, and random guessing gives correct answers about half the time. This matches what we saw earlier in the loss plot, where the loss was high at the start.

Very quickly, the accuracy rises steeply. This tells us that the network is learning the basic structure of the problem early on. After just a few epochs, it figures out that the data is not random and that there is a meaningful boundary separating the two classes. This rapid improvement corresponds to the steep drop in loss during the early part of training.

As training continues, the accuracy curves for both training and test data climb toward 100%. This means the network is correctly classifying almost every point, including points it has never seen before. The fact that the test accuracy closely follows the training accuracy is especially important. It confirms what the loss plot already suggested: the network is not simply memorizing the training data, but is learning a general rule that applies to new inputs.

Eventually, both curves flatten out near the top. This tells us that the network has essentially learned everything it can from this dataset. Additional training does not improve accuracy much because the network is already making nearly perfect predictions. This flattening matches the leveling-off behavior we saw in the loss plot and reinforces the idea that learning has stabilized.

Taken together, the loss and accuracy plots tell a consistent story:
- The loss steadily decreases, meaning the model grows more confident in the correct class.
- The accuracy steadily increases, meaning more predictions are correct.
- Both plots confirm that the neural network has successfully learned the curved decision boundary in the circles dataset and can generalize that knowledge to unseen data.


## 7. Inspect predictions 

At this point, we know from the loss and accuracy plots that the network is performing very well overall. But those plots summarize performance across many points at once. In this section, we slow things down and look at a few individual predictions to make the learning feel more concrete. Instead of thinking about averages and curves, we ask a simple question: **when the network sees a specific point, does it make the correct decision?**

We send the test data through the trained network one more time. The network produces **logits**, which are **raw scores** for each class. A logit is **not a probability**: it doesn’t have to be between 0 and 1, and the two scores don’t have to add up to 1. For now, we keep it simple: to turn logits into a prediction, we choose the class with the **largest logit**. Later, we will often convert logits into probabilities using a function called **softmax**, because probabilities are easier to interpret (values between 0 and 1 that add up to 1).

Rather than inspecting every test point, we randomly choose five examples. Random selection is important because it avoids cherry-picking “easy” cases and gives us a more honest snapshot of how the model behaves. By printing the indices of the selected points, we make it clear exactly which data points we are examining.

For those five points, we print two things side by side: the predicted classes and the true classes. The predicted classes show what the network decided after learning from the data. The true classes show the correct answers from the dataset. When these two lists match, it means the network made the correct decision for that point.

In [None]:
# Generate predictions on the test set
model.eval()
with torch.inference_mode():
    test_logits = model(X_test_t)
    predictions = torch.argmax(test_logits, dim=1).cpu()

In [None]:
idx = torch.randperm(len(predictions))[:5]
idx_np = idx.cpu().numpy()  # NumPy-compatible indices for y_test (which is a NumPy array)

print("\nRandom indices:", idx.tolist())

print("\nPredicted classes:")
print(predictions[idx])

print("\nTrue classes:")
print(y_test[idx_np])


After this code runs, compare the two printed lists:

- **Predicted classes** → what the model chose (based on the biggest logit)

- **True classes** → the correct labels from the dataset

If they match for most of these random points, that’s a concrete sign the model isn’t just doing well “on average”; it’s making correct decisions on individual, unseen examples.
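
As a preview of the softmax conversion mentioned earlier, the sketch below turns a few made-up logits into probabilities with torch.softmax and confirms that taking the argmax of either gives the same predicted class.

```python
import torch

# Hypothetical logits for three points (two classes each)
logits = torch.tensor([[ 2.0, -1.0],
                       [ 0.5,  0.7],
                       [-3.0,  1.0]])

probs = torch.softmax(logits, dim=1)  # each row now sums to 1
print(probs)
print(probs.sum(dim=1))

# Softmax preserves the ordering of the scores, so the predicted class is unchanged
same = torch.equal(torch.argmax(logits, dim=1), torch.argmax(probs, dim=1))
print(same)
```

This is why choosing the largest logit is enough for making predictions: softmax changes the scale of the scores but not which one is biggest.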

## 8. Decision boundaries

At this point, we’ve measured performance with loss and accuracy. Now we’re going to visualize what the network actually learned by drawing its **decision boundary**, the dividing line where the model switches from predicting Class 0 to Class 1. For every possible point in the input space, the network must decide which class it belongs to, and the boundary is where that decision flips.

The approach is simple: we’ll cover the 2D input space with a dense grid of points, ask the model to predict a class at every grid point, and then color the regions. This gives us a picture of the model’s “rule” for separating the classes.

**Step 1: Ask the model to classify a grid of points**

In [None]:
# Create a grid of points (CPU tensors are fine here; we'll move the stacked grid to the selected device)
x_min, x_max = -1.2, 1.2
y_min, y_max = -1.2, 1.2

xx, yy = torch.meshgrid(
    torch.linspace(x_min, x_max, 200),
    torch.linspace(y_min, y_max, 200),
    indexing="ij"
)

grid = torch.stack([xx.flatten(), yy.flatten()], dim=1).to(device)

# Run model on grid (on whichever device the model lives on)
model.eval()
with torch.inference_mode():
    logits = model(grid)
    preds = torch.argmax(logits, dim=1)

Z = preds.reshape(xx.shape).cpu().numpy()

**Step 2: Overlay the real data (train vs test)**

Now that we have the model’s predicted regions, we’ll plot them and overlay the actual data points. We’ll do it twice: once for **training data** and once for **test data** so we can visually check whether the learned boundary generalizes.

In [None]:
# Plot decision regions + data in subplots
fig, axes = plt.subplots(1, 2, figsize=(8, 4), constrained_layout=True)

# Left: TRAIN
axes[0].contourf(xx.numpy(), yy.numpy(), Z, cmap="coolwarm", alpha=0.6)
axes[0].scatter(
    X_train[:, 0],
    X_train[:, 1],
    c=y_train,
    cmap="coolwarm",
    edgecolors="k",
    s=30
)
axes[0].set_title("Train")
axes[0].set_xlabel("Input A")
axes[0].set_ylabel("Input B")
axes[0].set_aspect("equal", adjustable="box")
axes[0].set_xlim(x_min, x_max)
axes[0].set_ylim(y_min, y_max)

# Right: TEST
axes[1].contourf(xx.numpy(), yy.numpy(), Z, cmap="coolwarm", alpha=0.6)
axes[1].scatter(
    X_test[:, 0],
    X_test[:, 1],
    c=y_test,
    cmap="coolwarm",
    edgecolors="k",
    s=30
)
axes[1].set_title("Test")
axes[1].set_xlabel("Input A")
axes[1].set_ylabel("Input B")
axes[1].set_aspect("equal", adjustable="box")
axes[1].set_xlim(x_min, x_max)
axes[1].set_ylim(y_min, y_max)

plt.show()

The figures above show the most important result of the entire lesson: **what the neural network has actually learned**. Instead of looking at numbers or printed predictions, we are now seeing the model’s understanding of the problem laid out visually across the entire input space.

**What to look for in the plot**

Spend a minute just looking:

- Does the boundary form a roughly circular split (inner vs outer region)?

- Do the train and test plots look similar?

- Are the “mistakes” mostly near the border, where the points are inherently ambiguous?

If the test plot looks clean and consistent with the train plot, that’s a strong visual sign that the network learned a real pattern rather than just memorizing the training points.

At this point, the story of the lesson comes full circle.

- We began with a dataset that could not be solved by a straight line.
- We built a network capable of learning curved relationships and trained it through repeated feedback.
- We verified its performance numerically and inspected individual predictions.
- Finally, we visualized the learned decision boundary: the visual evidence that the network captured the circular pattern, and the moment where everything clicks together.


## 9. Mapping to the PyTorch workflow


At this stage, it is useful to step back and connect everything we have done to the bigger picture of how PyTorch models are built and trained. Even though this lesson focused on a simple neural network and a small synthetic dataset, the workflow you followed is not special to this example. It is the standard pattern that underlies almost every PyTorch project, from toy problems to large-scale deep learning systems. This lesson covered **Steps 1–4** of the PyTorch workflow:

1. First, we prepared the data by converting it into tensors. PyTorch works with tensors, so this step is always required, whether the data represents points on a plane, rows in a table, or pixels in an image.
2. Next, we built a model by defining a neural network class with a clear architecture and a forward pass. This is where we described how inputs should flow through the network to produce outputs.
3. Then, we fit the model using a training loop, repeatedly sending data forward, measuring error, and letting PyTorch adjust the network through backpropagation.
4. Finally, we evaluated the model by testing it on unseen data and visualizing its behavior.
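
The four steps above can be condensed into one short script. This is a minimal sketch under stated assumptions, not the lesson's exact code: the synthetic data, the small `net` architecture, the Adam optimizer, and the hyperparameters are all illustrative choices.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Step 1: prepare data as tensors (synthetic stand-in: inside vs outside a circle)
X = torch.rand(400, 2) * 2 - 1              # points in [-1, 1] x [-1, 1]
y = (X.pow(2).sum(dim=1) < 0.5).long()      # class 1 inside the circle, class 0 outside
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

# Step 2: build a model (hypothetical small network with two output logits)
net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))

# Step 3: fit the model with a training loop
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.02)
net.train()
for epoch in range(300):
    logits = net(X_tr)             # forward pass
    loss = loss_fn(logits, y_tr)   # measure error
    optimizer.zero_grad()          # clear gradients from the previous step
    loss.backward()                # backpropagation
    optimizer.step()               # update the weights

# Step 4: evaluate on unseen data
net.eval()
with torch.inference_mode():
    test_acc = (net(X_te).argmax(dim=1) == y_te).float().mean().item()

print(f"final train loss: {loss.item():.3f}, test accuracy: {test_acc:.2f}")
```

Each numbered comment lines up with one step of the workflow, which makes it a convenient template to adapt when the data or the model changes.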

What is especially important to notice is that this structure does not change as problems become more complex. **Convolutional Neural Networks (CNNs)**, for example, follow the exact same workflow. The only differences are the type of data and the layers inside the model. Instead of two input numbers, a CNN might take in an image. Instead of simple linear layers, it might use convolutional layers. But the steps (data preparation, model definition, training, and evaluation) remain the same.
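
To make that claim concrete, here is a hedged sketch of the same four steps with a convolutional model. Everything in it is an illustrative assumption: the random 8×8 "images", the random labels, the tiny `cnn` architecture, and the SGD optimizer. It runs a single training step only to show that the pattern (forward pass, loss, `zero_grad`, `backward`, `step`) carries over unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Step 1: data as tensors (random stand-in "images": 8 samples, 1 channel, 8x8 pixels)
images = torch.randn(8, 1, 8, 8)
labels = torch.randint(0, 2, (8,))

# Step 2: only the model changes: a convolutional layer comes first
cnn = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1),  # -> (8, 4, 8, 8)
    nn.ReLU(),
    nn.Flatten(),                # -> (8, 4 * 8 * 8) = (8, 256)
    nn.Linear(4 * 8 * 8, 2),     # two output logits, one per class
)

# Step 3: one training step, same pattern as for the linear network
cnn_loss_fn = nn.CrossEntropyLoss()
cnn_optimizer = torch.optim.SGD(cnn.parameters(), lr=0.1)
cnn_logits = cnn(images)                  # forward pass
cnn_loss = cnn_loss_fn(cnn_logits, labels)
cnn_optimizer.zero_grad()
cnn_loss.backward()                       # backpropagation
cnn_optimizer.step()

# Step 4: evaluation uses the same argmax-over-logits rule
cnn.eval()
with torch.inference_mode():
    cnn_preds = cnn(images).argmax(dim=1)

print(cnn_logits.shape, cnn_preds.shape)
```

Because the labels here are random, the model has nothing real to learn; the point is the structure of the code, not the result.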

This is why **understanding this lesson means understanding the core of PyTorch training.**



## 10. Summary

In summary, this lesson was not just about classifying circles. It was about learning how neural networks are trained in practice. You saw:

- why simple models can fail,
- how neural networks can learn more flexible patterns,
- how training works through forward passes and backpropagation, and
- how performance is measured and visualized.

**Everything you build next (larger networks, image models, transfer learning, or real-world datasets) will follow this same pattern.** If you understand this workflow, you have the foundation needed to move confidently into more advanced deep learning topics.