<a href="https://colab.research.google.com/github/christophergaughan/PyTorch/blob/main/Copy_of_PyTorch_cvlassification_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Steps to Build and Fit the Model
1. **Understand the Data**

* The data from question 1 needs to be recalled or recreated. For this example, I'll assume it's a simple 2D dataset (e.g., using scikit-learn's `make_moons`, `make_circles`, or `make_regression`).

2. **Define the Model**

* Subclassing `nn.Module` allows you to create a custom model by defining layers and the forward pass.
* Include non-linear activation functions (like `ReLU`, `Sigmoid`, or `Tanh`) to ensure the model can fit complex, non-linear data.
* Use a sequential or layer-by-layer structure for clarity.

3. **Train the Model**

* Define a loss function (e.g., MSELoss for regression, BCEWithLogitsLoss for binary classification).
* Choose an optimizer (e.g., Adam for adaptive learning rates).
* Use a training loop with forward passes, loss computation, backpropagation, and weight updates.


#### Build a model by subclassing nn.Module that incorporates non-linear activation functions and is capable of fitting the data you created in 1.


## Device agnostic code



### Working with NumPy Data in PyTorch: Key Points to Remember (at Least for Myself)

1. **Data Type Conversion:**
   - Data imported from scikit-learn (or other libraries) is typically in the **NumPy data format**, which needs to be converted into PyTorch tensors for compatibility.
   - Use `torch.tensor(data, dtype=torch.float)` to ensure the data is in the correct format.
   - **Example:**
     ```python
     import torch
     import numpy as np

     numpy_data = np.array([[1.0, 2.0], [3.0, 4.0]])
     torch_data = torch.tensor(numpy_data, dtype=torch.float)
     ```

2. **Device Compatibility:**
   - Ensure that any tensor you create is moved to the appropriate device (`cpu` or `cuda`) before computation.
   - A common error arises when tensors remain on the CPU but are used with a model or operation on the GPU.
   - **Example:**
     ```python
     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
     torch_data = torch.tensor(numpy_data, dtype=torch.float).to(device)
     ```

3. **Ensure Matching Data Types:**
   - PyTorch layers and most numerical operations expect `torch.float` (float32) inputs, while class labels and indices generally need an integer type, usually `torch.long`.
   - Manage data types carefully when converting from NumPy: its default `float64` becomes `torch.double`, which clashes with float32 model weights, and `int32` labels may be rejected by loss functions such as `CrossEntropyLoss` that expect `torch.long`.

4. **Gradient Computation:**
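   - Model inputs usually do not need gradients; set `requires_grad=True` only for tensors you want to differentiate with respect to. Create such a tensor directly on the target device, because calling `.to(device)` afterwards returns a non-leaf copy when the device changes, and the `.grad` of that copy is not populated.
   - **Example:**
     ```python
     torch_data = torch.tensor(numpy_data, dtype=torch.float, device=device, requires_grad=True)
     ```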

5. **Keep an Eye on CPU/GPU Transfers:**
   - Data generated during computation might end up on the CPU (e.g., results from NumPy or after detaching gradients).
   - Always check the `.device` property of tensors, especially when combining operations.
   - **Example:**
     ```python
     # Check where the tensor is located
     print(torch_data.device)

     # Move to device if necessary
     torch_data = torch_data.to(device)
     ```

6. **Avoid Implicit Data Conversion:**
   - Operations between NumPy arrays and PyTorch tensors may lead to implicit conversions that can cause errors.
   - Convert NumPy arrays to PyTorch tensors explicitly before performing operations.

7. **Random Seed Consistency:**
   - Ensure reproducibility by setting random seeds for both NumPy and PyTorch, especially when generating data or initializing models.
   - **Example:**
     ```python
     import numpy as np
     import torch

     np.random.seed(42)
     torch.manual_seed(42)
     ```

8. **Batch Dimension Awareness:**
   - Ensure that data intended for model inputs includes a batch dimension, even for single samples. Use `.unsqueeze(0)` to add the batch dimension if necessary.
   - **Example:**
     ```python
     single_sample = torch.tensor([1.0, 2.0], dtype=torch.float).to(device)
     single_sample = single_sample.unsqueeze(0)  # Shape becomes [1, 2]
     ```
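
Points 3 and 6 above can be checked directly. The following is a minimal sketch (the variable names and the tiny `nn.Linear` layer are illustrative, not taken from the exercises) showing that `torch.from_numpy` preserves NumPy's default `float64`, that passing such a tensor to a default float32 layer raises a `RuntimeError`, and that an explicit cast resolves it.

```python
import numpy as np
import torch
from torch import nn

numpy_data = np.array([[1.0, 2.0], [3.0, 4.0]])   # NumPy defaults to float64
as_double = torch.from_numpy(numpy_data)          # dtype is preserved: torch.float64
as_float = as_double.type(torch.float)            # explicit cast to torch.float32

layer = nn.Linear(2, 1)                           # parameters are float32 by default

try:
    layer(as_double)                              # float64 input vs float32 weights
    dtype_error_raised = False
except RuntimeError as err:
    dtype_error_raised = True
    print("RuntimeError:", err)

output = layer(as_float)                          # works after the cast
print(as_double.dtype, as_float.dtype, output.shape)
```

The exact wording of the error message differs between PyTorch versions, so it is safer to rely on the exception type than on the text.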

---

### Common Errors to Watch Out For

1. **Device Mismatch:**
   - **Example Error:** `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0`.
   - **Solution:** Ensure all tensors and models are on the same device.

2. **Data Type Incompatibility:**
   - **Example Error:** `RuntimeError: Expected scalar type Float but found Double`.
   - **Solution:** Convert NumPy data to the appropriate PyTorch data type (`torch.float` for numerical tensors).

3. **Missing Batch Dimension:**
   - **Example Error:** `RuntimeError: Expected 4-dimensional input for 4-dimensional weight`.
   - **Solution:** Add a batch dimension using `.unsqueeze()` or reshape the data appropriately.

---

### Key Takeaways

- Always **convert NumPy data to PyTorch tensors** with the correct data type (`torch.float` for features and binary targets, `torch.long` for class indices).
- Pay attention to **device placement** (`cpu` or `cuda`) and ensure consistency across operations.
- Set random seeds for reproducibility.
- Handle batch dimensions explicitly to avoid runtime errors.


## Convert to PyTorch tensors

### Why Use .unsqueeze(1)?
1. Shape of `y_train` and `y_test`:

* The labels (`y_train` and `y_test`) generated by `make_moons` are 1D arrays with shape `(n_samples,)` (e.g., `[0, 1, 0, 1, ...]`).
* `BCEWithLogitsLoss` requires the labels to have the same shape as the model's output logits, which for this kind of binary classifier is typically `(n_samples, 1)`.

2. What `.unsqueeze(1)` Does:

* `.unsqueeze(1)` *adds* an extra dimension, changing the shape from `(n_samples,)` to `(n_samples, 1)`.
* This aligns the labels with the model's output logits.

3. Why It’s Needed:

* Without `.unsqueeze(1)`, calculating the loss raises a shape mismatch error. The training loop in this notebook takes the equivalent alternative of calling `.squeeze()` on the logits, so that both sides have shape `(n_samples,)`.
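
A small sketch of that mismatch, using made-up logits and labels rather than the moons data: `BCEWithLogitsLoss` rejects a `(8,)` target against `(8, 1)` logits, and accepts it once `.unsqueeze(1)` has been applied.

```python
import torch
from torch import nn

loss_fn = nn.BCEWithLogitsLoss()
logits = torch.zeros(8, 1)                                  # model output shaped (n_samples, 1)
labels = torch.tensor([0., 1., 0., 1., 1., 0., 1., 0.])     # labels shaped (n_samples,)

try:
    loss_fn(logits, labels)                                 # shapes (8, 1) vs (8,) do not match
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

labels_2d = labels.unsqueeze(1)                             # labels now shaped (8, 1)
loss = loss_fn(logits, labels_2d)
print(labels_2d.shape, loss.item())
```

Squeezing the logits to `(8,)` instead of unsqueezing the labels would work just as well; what matters is that both arguments end up with the same shape.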

In [None]:
import torch
from torch import nn

# Set up device-agnostic code
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

In [None]:
!nvidia-smi

In [None]:
# Create a dataset with Scikit-Learn's make_moons()
import sklearn
from sklearn.datasets import make_moons

# Make 1000 samples
n_samples = 1000

# Create the two interleaving half-moons
X, y = make_moons(n_samples,
                  noise=0.1,  # add some Gaussian noise to the points
                  random_state=42)


In [None]:
# Turn data into a DataFrame
import pandas as pd
moons = pd.DataFrame({'X1': X[:, 0],
                        'X2': X[:, 1],
                        'label': y})
moons.head()

In [None]:
import matplotlib.pyplot as plt

plt.scatter(x=X[:, 0],
            y=X[:, 1],
            c=y,
            cmap=plt.cm.RdYlBu);

In [None]:
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn

# Split into training and test sets (this split is repeated below after the tensor conversion)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
# The data is in numpy arrays, we need to turn into pytorch tensors
import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

In [None]:
# Split data randomly
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)


In [None]:
# Define the neural network model for binary classification
class MoonModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        # Input layer to the first hidden layer (2 input features, 64 hidden units)
        self.layer_1 = nn.Linear(2, 64)
        self.relu_1 = nn.ReLU()  # Apply ReLU activation for non-linearity

        # Second hidden layer (64 hidden units)
        self.layer_2 = nn.Linear(64, 64)
        self.relu_2 = nn.ReLU()  # Apply ReLU activation for non-linearity

        # Output layer (1 unit for binary classification logits)
        self.layer_3 = nn.Linear(64, 1)

    def forward(self, x):
        # Pass the input through the layers with activations
        x = self.relu_1(self.layer_1(x))
        x = self.relu_2(self.layer_2(x))
        return self.layer_3(x)  # Return raw logits for use with BCEWithLogitsLoss



In [None]:
# Instantiate the model and move it to the appropriate device (GPU if available, otherwise CPU)
model_1a = MoonModelV2().to(device)

# Ensure training and testing data are also on the same device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)


### If you re-run cells in a Jupyter notebook, re-initializing the weights can come in handy

In [None]:
# Function to initialize weights for the linear layers
def initialize_weights(m):
    if isinstance(m, nn.Linear):
        # Xavier (Glorot) initialization for weights; Kaiming (He) initialization is the more common pairing with ReLU
        nn.init.xavier_uniform_(m.weight)
        # Set biases to zero
        nn.init.zeros_(m.bias)

# Apply the weight initialization to all layers of the model
model_1a.apply(initialize_weights)


In [None]:
# Define the Binary Cross-Entropy Loss with Logits
# This loss function is designed for binary classification and expects raw logits
loss_fn = nn.BCEWithLogitsLoss()

# Use the Adam optimizer with a learning rate of 0.01 for efficient training
# Adam dynamically adjusts learning rates for each parameter
optimizer = torch.optim.Adam(params=model_1a.parameters(), lr=0.01)


In [None]:
# Set manual seeds for reproducibility
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Define the number of epochs for training; this is a small model, so it doesn't require much computing power
epochs = 100

for epoch in range(epochs):
    # Set the model to training mode
    model_1a.train()

    # Perform a forward pass to calculate logits
    y_logits = model_1a(X_train).squeeze()  # Squeeze to ensure dimensions match

    # Calculate the loss using BCEWithLogitsLoss
    loss = loss_fn(y_logits, y_train.squeeze())

    # Zero gradients to prevent accumulation
    optimizer.zero_grad()

    # Backpropagate the loss to compute gradients
    loss.backward()

    # Update model weights using the optimizer
    optimizer.step()

    # Set the model to evaluation mode for testing
    model_1a.eval()
    with torch.no_grad():  # Disable gradient computation for efficiency
        # Forward pass for the test data
        test_logits = model_1a(X_test).squeeze()  # Logits for test data

        # Calculate test loss
        test_loss = loss_fn(test_logits, y_test.squeeze())

        # Convert logits to probabilities and round to binary predictions
        test_pred = torch.round(torch.sigmoid(test_logits))

        # Calculate test accuracy
        test_acc = (test_pred == y_test.squeeze()).float().mean().item() * 100

    # Print results every 10 epochs
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss:.4f}, Test Loss = {test_loss:.4f}, Test Acc = {test_acc:.2f}%")

In [None]:
model_1a.eval()
with torch.inference_mode():
    y_preds = torch.round(torch.sigmoid(model_1a(X_test))).squeeze()
y_preds[:10], y_test[:10]

In [None]:
import requests
from pathlib import Path

# 1. (Optional) Remove the existing (likely invalid) helper_functions.py
# !rm helper_functions.py

# 2. Use the *raw* GitHub URL
url_to_download = "https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py"

if Path("helper_functions.py").is_file():
    print("helper_functions.py already exists, skipping download")
else:
    print("Downloading helper_functions.py")
    request = requests.get(url_to_download)
    with open("helper_functions.py", "wb") as f:
        f.write(request.content)


In [None]:
from helper_functions import plot_predictions, plot_decision_boundary


In [None]:
# plot decision Boundaries
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_1a, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_1a, X_test, y_test)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


In [None]:
# Assuming 'device' is defined (e.g., device = 'cuda' or 'cpu')

# Install torchmetrics if not installed
!pip install torchmetrics

# Import necessary modules
import torch
from torchmetrics.classification import BinaryAccuracy

# Assuming y_test and y_preds are already defined

# Move tensors to the appropriate device (e.g., CUDA or CPU)
target = y_test.to(device)  # Ground truth labels
preds = y_preds.to(device)  # Model predictions (already rounded to 0/1 labels above)

# No sigmoid is needed here: y_preds holds hard 0/1 labels, not logits

# Initialize the BinaryAccuracy metric and move it to the same device as preds and target
metric = BinaryAccuracy().to(device)

# Update the metric with predictions and targets
accuracy = metric(preds, target)

# Print the accuracy
print(f"Binary Accuracy: {accuracy.item():.4f}")


### Replicate the Tanh (hyperbolic tangent) activation function in pure PyTorch.
Feel free to reference the ML cheatsheet website for the formula.


In [None]:
import torch

def custom_tanh_v2(x):
    # Built-in reference implementation; the formula-based replication is defined further below
    return torch.tanh(x)

# Testing the function
x = torch.tensor([0.0, 1.0, -1.0, 2.0])  # Example input tensor
output = custom_tanh_v2(x)

print("Input:", x)
print("Tanh Output:", output)


## Let's make the graph

In [None]:
# Create a tensor
A = torch.arange(-10, 10, 1, dtype = torch.float32)
A.dtype


In [None]:
def tanh(A):
    # tanh(x) = (e^x - e^-x) / (e^x + e^-x)
    return (torch.exp(A) - torch.exp(-A)) / (torch.exp(A) + torch.exp(-A))

In [None]:
plt.plot(tanh(A));
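
To confirm that the hand-written formula really replicates Tanh, it can be compared against the built-in `torch.tanh`. This is a small sketch; the name `tanh_manual` and the tolerance are choices made here for illustration.

```python
import torch

def tanh_manual(x):
    # tanh(x) = (e^x - e^-x) / (e^x + e^-x)
    return (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

A = torch.arange(-10, 10, 1, dtype=torch.float32)

# Largest element-wise gap between the manual formula and the built-in
max_abs_diff = (tanh_manual(A) - torch.tanh(A)).abs().max().item()
matches = torch.allclose(tanh_manual(A), torch.tanh(A), atol=1e-6)
print(matches, max_abs_diff)
```

One caveat with the direct formula: for inputs of large magnitude (roughly beyond 88 in float32) `torch.exp` overflows to `inf` and the ratio becomes `nan`, so the built-in is the safer choice outside of an exercise like this one.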

### Create a multi-class dataset using the spirals data creation function from CS231n (see below for the code).
* Construct a model capable of fitting the data (you may need a combination of linear and non-linear layers).
* Build a loss function and optimizer capable of handling multi-class data (optional extension: use the Adam optimizer instead of SGD, you may have to experiment with different values of the learning rate to get it working).
* Make a training and testing loop for the multi-class data and train a model on it to reach over 95% testing accuracy (you can use any accuracy measuring function here that you like).
* Plot the decision boundaries on the spirals dataset from your model predictions, the plot_decision_boundary() function should work for this dataset too.

In [None]:
# Code for creating a spiral dataset from CS231n
import numpy as np
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
  ix = range(N*j,N*(j+1))
  r = np.linspace(0.0,1,N) # radius
  t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
  X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
  y[ix] = j
# let's visualize the data
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show();

## Using the spiral dataset
In the case of the spiral dataset, you start with NumPy arrays (`X` and `y`) and split the dataset before converting it into tensors.

### Here’s the main difference:

* In the `make_blobs` example (and in the moons example above), the data was converted to PyTorch tensors first and then split.
* In the spiral dataset example, the split happens first, while the data is still in NumPy format, and each piece is converted afterwards. `train_test_split` accepts both NumPy arrays and tensors, so either order works; converting after the split simply lets you set the dtype of each piece explicitly.

### Proper syntax for the spiral dataset:
Splitting the dataset into training and test sets before converting the data into PyTorch tensors looks like this:

In [None]:
# Split into training and test using NumPy arrays
X_train, X_test, y_train, y_test = train_test_split(
    X,  # X is a NumPy array at this point
    y,  # y is a NumPy array at this point
    test_size=0.2,
    random_state=42
)

# Now convert the training and test sets into PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)  # For classification, use torch.long for labels
y_test = torch.tensor(y_test, dtype=torch.long)


In [None]:
# Create device agnostic code

device = "cuda" if torch.cuda.is_available() else "cpu"
device

In [None]:
import torch.nn as nn

class ImprovedSpiralModel(nn.Module):
    def __init__(self, input_features, output_features, hidden_units=8):  # Default hidden_units to 8
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(input_features, hidden_units),
            nn.BatchNorm1d(hidden_units),  # Match BatchNorm to hidden_units
            nn.LeakyReLU(),
            nn.Dropout(0.2),  # Dropout for regularization
            nn.Linear(hidden_units, hidden_units),
            nn.BatchNorm1d(hidden_units),  # Match BatchNorm to hidden_units
            nn.LeakyReLU(),
            nn.Dropout(0.2),  # Dropout for regularization
            nn.Linear(hidden_units, output_features)  # Output layer
        )

    def forward(self, x):
        return self.linear_layer_stack(x)

# Create an instance of the updated model
model_4a = ImprovedSpiralModel(
    input_features=2,  # Match input features to your dataset
    output_features=3,  # Number of classes
    hidden_units=8      # Match hidden_units to avoid conflicts
).to(device)
model_4a

In [None]:
# Function to initialize weights for the linear layers
def initialize_weights(m):
    if isinstance(m, nn.Linear):
        # Xavier (Glorot) initialization for weights; Kaiming (He) initialization is the more common pairing with ReLU-style activations
        nn.init.xavier_uniform_(m.weight)
        # Set biases to zero
        nn.init.zeros_(m.bias)

# Apply the weight initialization to all layers of the model
model_4a.apply(initialize_weights)

In [None]:
X_train.shape, y_train.shape, X_test.shape, y_test.shape

In [None]:
torch.unique(y_train)

### Use Adam optimizer as per directions above

In [None]:
# Create loss and optimizer; for multi-class problems we use cross-entropy loss
# Note: we have a balanced training set
loss_fn = nn.CrossEntropyLoss() # loss function measures how wrong our model's predictions are
optimizer = torch.optim.Adam(params=model_4a.parameters(), # optimizer updates our model's parameters to try to reduce the loss
                            lr=0.1)

### Problem Explanation
* Calling `.to(device)` on `X_test` moves the test data to a specific device (either CPU or GPU).
* If the model (`model_4a`) is not on that same device, for example the model is on the CPU while the data is on the GPU (or vice versa), PyTorch raises a device mismatch error as soon as the two are used together.

### Fixing the Issue
* Ensure that both the model and the input data are on the same device, whether that is the CPU or the GPU.
* Move both `X_test` and the model to that device before performing inference.

In [None]:
print(model_4a)


In [None]:
print(X_test.shape)


In [None]:
import torch.nn as nn

# Re-create the first layer with 2 input features (the model above was already built with input_features=2, so this mainly resets that layer's weights)
model_4a.linear_layer_stack[0] = nn.Linear(2, 8)  # Update the first layer
model_4a.to(device)  # Move the model back to the correct device


In [None]:
X_test = X_test.to(device)


In [None]:
# Ensure model is on the correct device
model_4a.to(device)

# Ensure data tensors are on the same device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)


In [None]:
print(next(model_4a.parameters()).device)


In [None]:
print("X_test device:", X_test.device)      # 'cuda:0' on a GPU runtime, otherwise 'cpu'
print("Model device:", next(model_4a.parameters()).device)  # Should match the X_test device


In [None]:
y_test[:10]

In [None]:
# Get logits directly from the model
y_logits = model_4a(X_test)

# Convert logits to probabilities
y_pred_probs = torch.softmax(y_logits, dim=1)  # Calculate probabilities across classes
print("Logits:\n", y_logits[:5])
print("Probabilities:\n", y_pred_probs[:5])


In [None]:
print("y_logits shape:", y_logits.shape)  # Should be (batch_size, num_classes)


In [None]:
# Convert our model's logit outputs to prediction probabilities
y_pred_probs = torch.softmax(y_logits, dim=1) # softmax across dim=1, the class dimension
print(y_logits[:5])
print(y_pred_probs[:5])

In [None]:
y_pred_probs[0]

In [None]:
torch.max(y_pred_probs[0])

### Convert our model's prediction probabilities to prediction labels using `argmax()`, which returns the index of the largest value

In [None]:
y_preds = torch.argmax(y_pred_probs, dim=1)
y_preds
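
As a side note, the softmax step is only needed when the probabilities themselves are of interest. The sketch below uses made-up logits (not the spiral model's output) to show that `argmax` gives the same labels whether it is applied to the logits or to the softmax probabilities, because softmax preserves the ordering within each row.

```python
import torch

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])      # made-up logits: 2 samples, 3 classes

probs = torch.softmax(logits, dim=1)          # each row sums to 1
labels_from_probs = probs.argmax(dim=1)       # index of the most probable class
labels_from_logits = logits.argmax(dim=1)     # same index, without the softmax step

print(probs.sum(dim=1), labels_from_probs, labels_from_logits)
```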

In [None]:
y_test

In [None]:
# Calculate accuracy: out of all examples, what percentage does our model get right?
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()
    acc = (correct / len(y_pred)) * 100
    return acc
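
A quick sanity check of the accuracy function on hand-made tensors (the labels and predictions here are hypothetical): with 3 of 4 predictions correct, the result should be 75%.

```python
import torch

def accuracy_fn(y_true, y_pred):
    # Count matching positions and express them as a percentage
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100

y_true = torch.tensor([0, 1, 2, 1])
y_pred = torch.tensor([0, 1, 1, 1])   # hypothetical predictions: 3 of 4 correct

acc = accuracy_fn(y_true=y_true, y_pred=y_pred)
print(f"{acc:.1f}%")
```

Both tensors should have the same shape. If one of them kept a trailing dimension of size 1, `torch.eq` would broadcast the comparison to an `(N, N)` grid and the count would be wrong.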

### Learning Rate Scheduler
A static learning rate can lead to slower convergence or overshooting. Use a learning rate scheduler to reduce the learning rate during training:

In [None]:
from torch.optim.lr_scheduler import StepLR

# Define the optimizer and scheduler
optimizer = torch.optim.Adam(model_4a.parameters(), lr=0.01)  # Start from lr=0.01 (lower than the 0.1 used above)
scheduler = StepLR(optimizer, step_size=20, gamma=0.5)  # Reduce LR by 50% every 20 epochs
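
To see what this schedule does to the learning rate, the sketch below records the rate at each epoch. It uses a stand-in `nn.Linear` model and random data purely for illustration; only the `StepLR` settings match the cell above.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import StepLR

torch.manual_seed(42)
model = nn.Linear(2, 3)                        # stand-in model, not the spiral model
X = torch.randn(8, 2)                          # random features for illustration
y = torch.randint(0, 3, (8,))                  # random class labels
loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = StepLR(optimizer, step_size=20, gamma=0.5)

lrs = []
for epoch in range(60):
    lrs.append(scheduler.get_last_lr()[0])     # learning rate used during this epoch
    loss = loss_fn(model(X), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                           # advance the schedule once per epoch

print(lrs[0], lrs[19], lrs[20], lrs[40])
```

The rate stays at 0.01 for the first 20 epochs and is halved at epochs 20 and 40. Note that `scheduler.step()` is called after `optimizer.step()`, once per epoch.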


Incorporate the scheduler step in the training loop:

In [None]:
for epoch in range(epochs):
    model_4a.train()
    y_logits = model_4a(X_train).squeeze()
    loss = loss_fn(y_logits, y_train.long())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Scheduler step
    scheduler.step()


#### Regularization
Add regularization techniques to improve generalization:

**Weight Decay:** Apply L2 regularization using the `weight_decay` parameter in the optimizer.

In [None]:
optimizer = torch.optim.Adam(model_4a.parameters(), lr=0.01, weight_decay=1e-4)


**Dropout:** Increase dropout in the model.

In [None]:
nn.Dropout(p=0.3)  # 30% dropout; to take effect, this must replace the nn.Dropout(0.2) layers in the model definition
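
It helps to remember that dropout behaves differently in training and evaluation mode, which is why the loops in this notebook call `.train()` and `.eval()`. The sketch below applies a standalone `nn.Dropout(p=0.3)` to a vector of ones (an illustrative input, not model data).

```python
import torch
from torch import nn

torch.manual_seed(42)
dropout = nn.Dropout(p=0.3)
x = torch.ones(1000)

dropout.train()                                   # training mode: dropout is active
train_out = dropout(x)
dropped_fraction = (train_out == 0).float().mean().item()
kept_value = train_out.max().item()               # survivors are scaled by 1 / (1 - p)

dropout.eval()                                    # evaluation mode: dropout is a no-op
eval_out = dropout(x)

print(dropped_fraction, kept_value, torch.equal(eval_out, x))
```

In training mode roughly 30% of the elements are zeroed and the rest are scaled up so the expected sum is unchanged; in evaluation mode the input passes through untouched.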


#### Batch Size
Use mini-batches instead of full-batch updates. Smaller batches give noisier gradient estimates but more parameter updates per epoch, and that noise can act as a mild regularizer. For example:

In [None]:
batch_size = 16  # Experiment with batch sizes like 8, 16, or 32
train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

for epoch in range(epochs):
    model_4a.train()
    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        y_logits = model_4a(X_batch).squeeze()
        loss = loss_fn(y_logits, y_batch.long())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


#### Early Stopping
Monitor the test accuracy and stop training early if it stops improving:

In [None]:
best_test_acc = 0
early_stop_count = 0
patience = 10  # Stop if no improvement for 10 epochs

for epoch in range(epochs):
    # Training and testing as before...
    if test_acc > best_test_acc:
        best_test_acc = test_acc
        early_stop_count = 0
    else:
        early_stop_count += 1

    if early_stop_count >= patience:
        print(f"Early stopping at epoch {epoch}")
        break
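
The patience logic can be traced in isolation. In the sketch below, the list of per-epoch test accuracies is hypothetical and stands in for a real evaluation loop, and `patience` is shortened to 3 so the toy run stops quickly.

```python
# Hypothetical per-epoch test accuracies, standing in for a real evaluation loop
test_accs = [60.0, 72.0, 80.0, 85.0, 85.0, 84.0, 83.0, 85.0, 82.0]

best_test_acc = 0.0
early_stop_count = 0
patience = 3          # shortened from 10 for this toy run
stopped_at = None

for epoch, test_acc in enumerate(test_accs):
    if test_acc > best_test_acc:
        best_test_acc = test_acc      # new best: reset the counter
        early_stop_count = 0
    else:
        early_stop_count += 1         # no improvement this epoch

    if early_stop_count >= patience:
        stopped_at = epoch
        print(f"Early stopping at epoch {epoch}")
        break
```

Because the comparison is strict (`>`), an epoch that only ties the best accuracy still counts as no improvement. A fuller version would probably also keep a copy of `model.state_dict()` from the best epoch and restore it after stopping, and would ideally monitor a separate validation split rather than the test set.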


#### Optimize Data Preprocessing
Ensure data is normalized to have zero mean and unit variance for optimal performance:

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = torch.tensor(scaler.fit_transform(X_train.cpu())).float().to(device)
X_test = torch.tensor(scaler.transform(X_test.cpu())).float().to(device)


In [None]:
for epoch in range(epochs):
    model_4a.train()
    y_logits = model_4a(X_train).squeeze()
    loss = loss_fn(y_logits, y_train.long())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Scheduler step
    scheduler.step()


In [None]:
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Optimizer and Scheduler
optimizer = torch.optim.Adam(model_4a.parameters(), lr=0.01, weight_decay=1e-4)
scheduler = StepLR(optimizer, step_size=20, gamma=0.5)

# Early Stopping
best_test_acc = 0
early_stop_count = 0
patience = 10

for epoch in range(epochs):
    model_4a.train()
    y_logits = model_4a(X_train).squeeze()
    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)
    loss = loss_fn(y_logits, y_train.long())
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Scheduler step
    scheduler.step()

    # Testing
    model_4a.eval()
    with torch.inference_mode():
        test_logits = model_4a(X_test)
        test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)
        test_loss = loss_fn(test_logits, y_test.long())
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    # Early stopping
    if test_acc > best_test_acc:
        best_test_acc = test_acc
        early_stop_count = 0
    else:
        early_stop_count += 1

    if early_stop_count >= patience:
        print(f"Early stopping at epoch {epoch}")
        break

    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.4f}, Acc: {acc:.2f}% | Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%")


## Meets requirements

## Helper functions

In [None]:
import requests
from pathlib import Path

# 1. (Optional) Remove the existing (likely invalid) helper_functions.py
# !rm helper_functions.py

# 2. Use the *raw* GitHub URL
url_to_download = "https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py"

if Path("helper_functions.py").is_file():
    print("helper_functions.py already exists, skipping download")
else:
    print("Downloading helper_functions.py")
    request = requests.get(url_to_download)
    with open("helper_functions.py", "wb") as f:
        f.write(request.content)


In [None]:
from helper_functions import plot_predictions, plot_decision_boundary


In [None]:
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_4a, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_4a, X_test, y_test)


## That's as well as I can do here given the synthetic dataset. The colors are hard to read, which is how the problem was written.

In [None]:
from torchmetrics import Accuracy

# Setup metric
torchmetric_accuracy = Accuracy(task="multiclass", num_classes=3).to(device)

# Pass raw logits (or predicted probabilities) instead of class indices
torchmetric_result = torchmetric_accuracy(test_logits, y_test)
print(f"Accuracy (torchmetrics): {torchmetric_result:.4f}")


## Reached 95% accuracy

## Using sklearn for Full Classification Report
If you prefer sklearn, you can use the `classification_report` function. **However, this requires moving data back to the CPU and converting tensors to NumPy arrays.**

In [None]:
from sklearn.metrics import classification_report

# Move tensors to CPU and convert to NumPy
y_pred_classes = test_logits.argmax(dim=1).cpu().numpy()
y_true = y_test.cpu().numpy()

# Generate the classification report
print(classification_report(y_true, y_pred_classes, target_names=["Class 0", "Class 1", "Class 2"]))


### Overall, I'd say this assignment got tricky once the spiral data was introduced and 95% accuracy was required, since the assignment called for a mix of linear and non-linear layers:

* Construct a model capable of fitting the data (you may need a combination of linear and non-linear layers).