<a href="https://colab.research.google.com/github/JunHL96/PyTorch-Course/blob/main/02_pytorch_classification_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 02. PyTorch Neural Network Classification

## What is a classification problem?

A [classification problem](https://en.wikipedia.org/wiki/Statistical_classification) involves predicting whether something is one thing or another.

For example, you might want to:

| Problem type | What is it? | Example |
| ----- | ----- | ----- |
| **Binary classification** | Target can be one of two options, e.g. yes or no | Predict whether or not someone has heart disease based on their health parameters. |
| **Multi-class classification** | Target can be one of more than two options | Decide whether a photo is of food, a person or a dog. |
| **Multi-label classification** | Target can be assigned more than one option | Predict what categories should be assigned to a Wikipedia article (e.g. mathematics, science & philosophy). |

<div align="center">
<img src="https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/02-different-classification-problems.png" alt="various different classification in machine learning such as binary classification, multiclass classification and multilabel classification" width=900/>
</div>
    
Classification, along with regression (predicting a number, covered in [notebook 01](https://www.learnpytorch.io/01_pytorch_workflow/)), is one of the most common types of machine learning problems.

In this notebook, we're going to work through a couple of different classification problems with PyTorch.

In other words, taking a set of inputs and predicting which class those inputs belong to.


## What we're going to cover

In this notebook we're going to revisit the PyTorch workflow we covered in [01. PyTorch Workflow](https://www.learnpytorch.io/01_pytorch_workflow/).

<img src="https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/01_a_pytorch_workflow.png" alt="a pytorch workflow flowchart" width=900/>

Except instead of trying to predict a straight line (predicting a number, also called a regression problem), we'll be working on a **classification problem**.

Specifically, we're going to cover:

| **Topic** | **Contents** |
| ----- | ----- |
| **0. Architecture of a classification neural network** | Neural networks can come in almost any shape or size, but they typically follow a similar floor plan. |
| **1. Getting binary classification data ready** | Data can be almost anything but to get started we're going to create a simple binary classification dataset. |
| **2. Building a PyTorch classification model** | Here we'll create a model to learn patterns in the data, we'll also choose a **loss function**, **optimizer** and build a **training loop** specific to classification. |
| **3. Fitting the model to data (training)** | We've got data and a model, now let's let the model (try to) find patterns in the (**training**) data. |
| **4. Making predictions and evaluating a model (inference)** | Our model's found patterns in the data, let's compare its findings to the actual (**testing**) data. |
| **5. Improving a model (from a model perspective)** | We've trained and evaluated a model but it's not working, let's try a few things to improve it. |
| **6. Non-linearity** | So far our model has only had the ability to model straight lines, what about non-linear (non-straight) lines? |
| **7. Replicating non-linear functions** | We used **non-linear functions** to help model non-linear data, but what do these look like? |
| **8. Putting it all together with multi-class classification** | Let's put everything we've done so far for binary classification together with a multi-class classification problem. |


## 0. Architecture of a classification neural network

Before we get into writing code, let's look at the general architecture of a classification neural network.

| **Hyperparameter** | **Binary Classification** | **Multiclass classification** |
| --- | --- | --- |
| **Input layer shape** (`in_features`) | Same as number of features (e.g. 5 for age, sex, height, weight, smoking status in heart disease prediction) | Same as binary classification |
| **Hidden layer(s)** | Problem specific, minimum = 1, maximum = unlimited | Same as binary classification |
| **Neurons per hidden layer** | Problem specific, generally 10 to 512 | Same as binary classification |
| **Output layer shape** (`out_features`) | 1 (one class or the other) | 1 per class (e.g. 3 for food, person or dog photo) |
| **Hidden layer activation** | Usually [ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU) (rectified linear unit) but [can be many others](https://en.wikipedia.org/wiki/Activation_function#Table_of_activation_functions) | Same as binary classification |
| **Output activation** | [Sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) ([`torch.sigmoid`](https://pytorch.org/docs/stable/generated/torch.sigmoid.html) in PyTorch)| [Softmax](https://en.wikipedia.org/wiki/Softmax_function) ([`torch.softmax`](https://pytorch.org/docs/stable/generated/torch.softmax.html) in PyTorch) |
| **Loss function** | [Binary crossentropy](https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_loss_function_and_logistic_regression) ([`torch.nn.BCELoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) in PyTorch) | Cross entropy ([`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) in PyTorch) |
| **Optimizer** | [SGD](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) (stochastic gradient descent), [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) (see [`torch.optim`](https://pytorch.org/docs/stable/optim.html) for more options) | Same as binary classification |

Of course, this ingredient list of classification neural network components will vary depending on the problem you're working on.

But it's more than enough to get started.

We're going to get hands-on with this setup throughout this notebook.
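As a rough sketch of how the table above translates into code, the snippet below builds one binary and one multi-class network. The layer sizes (5 input features, 16 hidden units, 3 classes) and the variable names are illustrative assumptions, not values used later in this notebook.

```python
import torch
from torch import nn

# Illustrative sizes only (assumptions, not values from this notebook)
NUM_FEATURES = 5   # e.g. age, sex, height, weight, smoking status
HIDDEN_UNITS = 16  # somewhere in the "10 to 512" range from the table
NUM_CLASSES = 3    # e.g. food, person, dog

# Binary classification: a single output unit
binary_model = nn.Sequential(
    nn.Linear(in_features=NUM_FEATURES, out_features=HIDDEN_UNITS),
    nn.ReLU(),  # hidden layer activation
    nn.Linear(in_features=HIDDEN_UNITS, out_features=1),
)

# Multi-class classification: one output unit per class
multiclass_model = nn.Sequential(
    nn.Linear(in_features=NUM_FEATURES, out_features=HIDDEN_UNITS),
    nn.ReLU(),
    nn.Linear(in_features=HIDDEN_UNITS, out_features=NUM_CLASSES),
)

batch = torch.randn(4, NUM_FEATURES)  # 4 random samples

# Output activations turn raw outputs into probabilities
binary_probs = torch.sigmoid(binary_model(batch))                 # shape [4, 1]
multiclass_probs = torch.softmax(multiclass_model(batch), dim=1)  # shape [4, 3], each row sums to 1

print(binary_probs.shape, multiclass_probs.shape)
```

The only structural difference between the two models is the number of output units and the output activation, which matches the last rows of the table.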

## 1. Make classification data and get it ready

Let's begin by making some data.

We'll use the [`make_circles()`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html) function from Scikit-Learn to generate two circles with different coloured dots.

> **Note**: `Scikit-Learn` is a popular Python library that provides efficient tools for machine learning and data analysis. It includes functions for data generation, such as `make_circles()` for creating synthetic datasets, as well as tools for preprocessing, training models, and evaluating performance. This helps streamline the process of developing and prototyping machine learning workflows.

In [None]:
from sklearn.datasets import make_circles  # Import the make_circles function to generate a synthetic dataset

# Number of samples to generate
n_samples = 1000

# Create circles dataset with 1000 samples, slight noise, and a fixed random state for reproducibility
X, y = make_circles(
    n_samples,         # Number of data points to generate
    noise=0.03,        # Adds slight noise to make the data points not perfectly separable, simulating real-world conditions
    random_state=42    # Sets the random seed for reproducibility so that the dataset generated is consistent each time
)

# View the first 5 rows of the dataset
print(f"First 5 samples of X:\n {X[:5]}")
print(f"First 5 samples of y:\n {y[:5]}")

Looks like there's two `X` values per one `y` value.

Let's keep following the data explorer's motto of *visualize, visualize, visualize* and put them into a pandas DataFrame.

> **Note**: `pandas` is a powerful open-source Python library used for data manipulation and analysis. It provides data structures like `DataFrame` and `Series` that allow for easy handling and analysis of structured data. With `pandas`, you can perform operations such as data cleaning, merging, reshaping, and aggregation, making it an essential tool for data science and machine learning workflows.

> **Note**: A `DataFrame` is a two-dimensional, tabular data structure provided by the `pandas` library in Python. It is similar to a spreadsheet or SQL table and consists of rows and columns. Each column in a `DataFrame` can hold different types of data (e.g., integers, strings, floats), and it allows for easy data manipulation, analysis, and visualization. `DataFrames` support a wide range of operations such as filtering, grouping, and merging data, making them versatile for data analysis tasks.

In [None]:
# Import the pandas library for data manipulation and analysis
import pandas as pd

# Create a DataFrame from the generated circle data
circles = pd.DataFrame({
    "X1": X[:, 0],   # The first feature column from the X array (x-coordinates of the points)
    "X2": X[:, 1],   # The second feature column from the X array (y-coordinates of the points)
    "label": y       # The labels indicating the circle each point belongs to (0 or 1)
})

# Display the first 10 rows of the DataFrame to check the data structure
circles.head(10)

It looks like each pair of `X` features (`X1` and `X2`) has a label (`y`) value of either 0 or 1.

This tells us that our problem is **binary classification** since there's only two options (0 or 1).

How many values of each class are there?

In [None]:
# Count the number of occurrences of each label in the 'label' column of the DataFrame
circles.label.value_counts()  # This will display the count of points labeled as 0 and 1



The output helps verify the distribution of the data by showing how many samples belong to each class (0 or 1).

In [None]:
# Import the matplotlib library for plotting
import matplotlib.pyplot as plt

# Create a scatter plot to visualize the circle data
plt.scatter(
    x=X[:, 0],          # x-coordinates of the points (X1)
    y=X[:, 1],          # y-coordinates of the points (X2)
    c=y,                # Color the points based on their labels (0 or 1)
    cmap=plt.cm.RdYlBu  # Use the 'Red-Yellow-Blue' colormap to differentiate the classes visually
);

# The plot displays the points in the feature space, with different colors representing different classes (labels).

Let's find out how we could build a PyTorch neural network to classify dots into red (0) or blue (1).

> **Note:** This dataset is often what's considered a **toy problem** (a problem that's used to try and test things out on) in machine learning.
>
> But it represents the major key of classification: you have some kind of data represented as numerical values and you'd like to build a model that's able to classify it. In our case, that means separating it into red or blue dots.

### 1.1 Input and output shapes

One of the most common errors in deep learning is shape errors.

Mismatching the shapes of tensors and tensor operations will result in errors in your models.

We're going to see plenty of these throughout the course.

And there's no surefire way to make sure they won't happen. They will.

What you can do instead is continually familiarize yourself with the shape of the data you're working with.

Bourke likes referring to it as input and output shapes.

Ask yourself:

"What shapes are my inputs and what shapes are my outputs?"

Let's find out.

In [None]:
# Check the shapes of our features and labels
X.shape, y.shape

Looks like we've got a match on the first dimension of each.

There's 1000 `X` and 1000 `y`.

But what's the second dimension on `X`?

It often helps to view the values and shapes of a single sample (features and labels).

Doing so will help you understand what input and output shapes you'd be expecting from your model.

In [None]:
# View the first example of features and labels
X_sample = X[0]  # Select the first sample from the feature array (X)
y_sample = y[0]  # Select the first label from the label array (y)

# Print the values of the first sample's features and label
print(f"Values for one sample of X: {X_sample} and the same for y: {y_sample}")

# Print the shapes of the first sample's features and label
print(f"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}")

# This code helps check the format and shape of one data sample to ensure that the data has the expected structure.


**Explanation of Input and Output Shapes**

When working with machine learning models, it's essential to understand the **shapes** of your input (`X`) and output (`y`) data to prevent errors when training or predicting. Here’s how this code relates to getting familiar with the data shapes:

- **`X_sample`** represents a single feature vector (e.g., the coordinates `[x1, x2]` of one data point). The output shows that `X_sample` has a shape of `(2,)`, indicating it is a 1D array with 2 elements (features).
- **`y_sample`** is the label associated with that feature vector (e.g., `0` or `1`). The output shows that `y_sample` has a shape of `()`, indicating it is a scalar (a single value representing the label).
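To make the earlier point about shape mismatches concrete, here's a minimal sketch of what a shape error looks like in PyTorch. The tensors are random and their names are hypothetical; they only exist to trigger (and then avoid) the error.

```python
import torch

features = torch.rand(3, 2)      # 3 samples, 2 features each
bad_weights = torch.rand(3, 2)   # (3, 2) @ (3, 2): inner dimensions 2 and 3 don't match
good_weights = torch.rand(2, 5)  # (3, 2) @ (2, 5): inner dimensions 2 and 2 match

try:
    torch.matmul(features, bad_weights)
    mismatch_raised = False
except RuntimeError as error:
    mismatch_raised = True
    print(f"Shape error: {error}")

# With matching inner dimensions the multiplication works
output = torch.matmul(features, good_weights)
print(f"Output shape: {output.shape}")
```

Printing `.shape` before an operation, as we did for `X_sample` and `y_sample`, is usually the quickest way to track down this kind of error.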



### 1.2 Turn data into tensors and create train and test splits

We've investigated the input and output shapes of our data, now let's prepare it for being used with PyTorch and for modelling.

Specifically, we'll need to:
1. Turn our data into tensors (right now our data is in NumPy arrays and PyTorch prefers to work with PyTorch tensors).
2. Split our data into training and test sets (we'll train a model on the training set to learn the patterns between `X` and `y` and then evaluate those learned patterns on the test dataset).
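A small sketch of why step 1 usually includes a cast: NumPy creates floating point arrays as `float64` by default, while PyTorch layers such as `nn.Linear` create their parameters as `float32` by default. The array values below are made up for illustration.

```python
import numpy as np
import torch

numpy_array = np.array([[0.75, 0.23], [-0.76, 0.15]])  # NumPy floats default to float64
tensor_from_numpy = torch.from_numpy(numpy_array)       # from_numpy() keeps the float64 dtype
tensor_float32 = tensor_from_numpy.type(torch.float)    # torch.float is an alias for torch.float32

print(numpy_array.dtype, tensor_from_numpy.dtype, tensor_float32.dtype)
```

Keeping data and model parameters in the same dtype avoids dtype mismatch errors when the data is passed through a layer.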

In [None]:
# Turn data into tensors, otherwise this causes issues with computations later on
import torch
torch.__version__

# Convert the feature array (X) from a NumPy array to a PyTorch tensor and cast it to type float32
X = torch.from_numpy(X).type(torch.float)

# Convert the label array (y) from a NumPy array to a PyTorch tensor and cast it to type float32
y = torch.from_numpy(y).type(torch.float)

# View the first five samples of the feature and label tensors to verify the conversion
X[:5], y[:5]  # Display the first five rows of X and corresponding labels in y


Now our data is in tensor format, let's split it into training and test sets.

To do so, let's use the helpful function [`train_test_split()`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) from Scikit-Learn.

We'll use `test_size=0.2` (80% training, 20% testing) and because the split happens randomly across the data, let's use `random_state=42` so the split is reproducible.

In [None]:
# Split data into train and test sets
from sklearn.model_selection import train_test_split  # Import function to split data into training and testing sets

# Split the data into training and testing sets, with 20% of the data reserved for testing
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,  # 20% of the data will be used as the test set, 80% of data for training set
                                                    random_state=42)  # Set a seed for reproducibility of the split

# Print the lengths of the training and testing sets to verify the split
len(X_train), len(X_test), len(y_train), len(y_test)


Nice! Looks like we've now got 800 training samples and 200 testing samples.

## 2. Building a model

Let's build a model to classify our blue and red dots.

We'll break it down into a few parts.

1. Setting up device agnostic code (so our model can run on CPU or GPU if it's available).
2. Constructing a model by subclassing `nn.Module`.
3. Defining a loss function and optimizer.
4. Creating a training loop (this'll be in the next section).

The good news is we've been through all of the above steps before in notebook 01.

Except now we'll be adjusting them so they work with a classification dataset.

Let's start by importing PyTorch and `torch.nn` as well as setting up device agnostic code.

In [None]:
# Standard PyTorch imports
import torch
from torch import nn

# Device-Agnostic Code
if torch.cuda.is_available():
    device = "cuda" # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = "mps" # Apple GPU
else:
    device = "cpu" # Defaults to CPU if NVIDIA GPU/Apple GPU aren't available

print(f"Using device: {device}")

Excellent, now `device` is set up, we can use it for any data or models we create and PyTorch will handle it on the CPU (default) or GPU if it's available.
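Here's a minimal sketch of what using `device` looks like in practice. The device selection is simplified (CUDA or CPU only) to keep the example short, and the tensor and layer are hypothetical stand-ins for real data and a real model.

```python
import torch
from torch import nn

# Simplified device selection for this sketch (the Apple MPS check is left out)
device = "cuda" if torch.cuda.is_available() else "cpu"

sample = torch.rand(4, 2).to(device)  # Tensor.to() returns a copy on the target device if needed
layer = nn.Linear(in_features=2, out_features=1).to(device)  # Module.to() moves the parameters

output = layer(sample)  # works because the data and the parameters live on the same device
print(sample.device, next(layer.parameters()).device, output.device)
```

If the data and the model end up on different devices, PyTorch raises a `RuntimeError`, so it helps to send both to `device` in the same place in your code.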

How about we create a model?

We'll want a model capable of handling our `X` data as inputs and producing something in the shape of our `y` data as outputs.

In other words, given `X` (features) we want our model to predict `y` (label).

This setup where you have features and labels is referred to as **supervised learning**, because your data is telling your model what the outputs should be given a certain input.

To create such a model it'll need to handle the input and output shapes of `X` and `y`.

Remember how I said input and output shapes are important? Here we'll see why.

Let's create a model class that:
1. Subclasses `nn.Module` (almost all PyTorch models are subclasses of `nn.Module`).
2. Creates 2 `nn.Linear` layers in the constructor capable of handling the input and output shapes of `X` and `y`.
3. Defines a `forward()` method containing the forward pass computation of the model.
4. Instantiates the model class and sends it to the target `device`.

In [None]:
X_train.shape

In [None]:
y_train[:5]

In [None]:
# 1. Construct a model class that inherits from nn.Module for creating neural networks
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()  # Initialize the base class
        # 2. Define two linear layers: input to hidden, and hidden to output

        # One method to do step 2.
        self.layer_1 = nn.Linear(in_features=2, out_features=5)  # Input layer: 2 features in, 5 features out
        self.layer_2 = nn.Linear(in_features=5, out_features=1)  # Output layer: 5 features in, 1 feature out

        # Another method to do step 2. Refer to sequential explanation below
        # self.two_linear_layers = nn.Sequential(  # Create a sequential container for the layers
        #     nn.Linear(in_features=2, out_features=5),  # First layer: 2 input features, 5 output features
        #     nn.Linear(in_features=5, out_features=1)   # Second layer: 5 input features, 1 output feature
        # )

    # 3. Implement the forward pass to define how data moves through the network
    def forward(self, x):  # x is the input data
        return self.layer_2(self.layer_1(x))  # Pass input data through layer_1, then through layer_2

# 4. Create an instance of the model and move it to the specified device (e.g., CPU or GPU)
model_0 = CircleModelV0().to(device)
model_0  # Display model architecture


**What's going on here?**

We've seen a few of these steps before.

The only major change is what's happening between `self.layer_1` and `self.layer_2`.

`self.layer_1` takes 2 input features `in_features=2` and produces 5 output features `out_features=5`.

This is known as having 5 **hidden units** or **neurons**.

This layer turns the input data from having 2 features to 5 features.

**Why do this?**

This allows the model to learn patterns from 5 numbers rather than just 2 numbers, *potentially* leading to better outputs.

I say potentially because sometimes it doesn't work.

The number of hidden units you can use in neural network layers is a **hyperparameter** (a value you can set yourself) and there's no set in stone value you have to use.

Generally more is better but there's also such a thing as too much. The amount you choose will depend on your model type and dataset you're working with.

Since our dataset is small and simple, we'll keep it small.

The only rule with hidden units is that the next layer (in our case, `self.layer_2`) has to take the same number of `in_features` as the previous layer's `out_features`.

That's why `self.layer_2` has `in_features=5`, it takes the `out_features=5` from `self.layer_1` and performs a linear computation on them, turning them into `out_features=1` (the same shape as `y`).
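The shape changes described above can be traced with a short sketch. It uses two standalone `nn.Linear` layers with the same sizes as `CircleModelV0` and a random batch of 8 samples (the batch size is an arbitrary assumption).

```python
import torch
from torch import nn

layer_1 = nn.Linear(in_features=2, out_features=5)
layer_2 = nn.Linear(in_features=5, out_features=1)  # in_features matches layer_1's out_features

batch = torch.rand(8, 2)   # 8 samples, 2 features each
hidden = layer_1(batch)    # 2 features -> 5 hidden units
output = layer_2(hidden)   # 5 hidden units -> 1 output

print(f"Input: {batch.shape} -> Hidden: {hidden.shape} -> Output: {output.shape}")

# Each nn.Linear layer has a weight matrix and a bias vector:
# layer_1: 2*5 weights + 5 biases = 15, layer_2: 5*1 weights + 1 bias = 6
num_params = sum(p.numel() for layer in (layer_1, layer_2) for p in layer.parameters())
print(f"Total parameters: {num_params}")
```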

![A visual example of what a classification neural network with linear activation looks like on the tensorflow playground](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/02-tensorflow-playground-linear-activation.png)
*A visual example of what a similar classification neural network to the one we've just built looks like. Try creating one of your own on the [TensorFlow Playground website](https://playground.tensorflow.org/).*

You can also do the same as above using [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html).

`nn.Sequential` performs a forward pass computation of the input data through the layers in the order they appear.

In [None]:
# Replicate CircleModelV0 with nn.Sequential
model_0 = nn.Sequential(  # Create a model using a sequential container
    nn.Linear(in_features=2, out_features=5),  # First layer: 2 input features, 5 output features
    nn.Linear(in_features=5, out_features=1)   # Second layer: 5 input features, 1 output feature
).to(device)  # Move the model to the specified device (e.g., CPU or GPU)

# This sequential model has the same outcome and structure as the CircleModelV0 class defined earlier.
model_0  # Display the model to verify the architecture

In [None]:
model_0.state_dict()

`nn.Sequential` is a powerful utility in PyTorch for creating neural networks that execute a series of layers in order. It simplifies the process of defining models when the computation only needs to follow a linear path from input to output.

**What is `nn.Sequential`?**
`nn.Sequential` allows you to create models where the layers are stacked sequentially, meaning each layer passes its output directly to the next layer. It’s best suited for models that do not require branching or custom layer connections.

You can learn more about it in the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html).

**When to Use `nn.Sequential`:**
- Ideal for straightforward, linear computations where the output of one layer is the input to the next.
- Fast to implement and easy to read for simple feedforward architectures.

**Limitations:**
- Runs strictly in a sequential order. If your model needs more complex behavior (e.g., multiple branches, residual connections, or conditional logic), you need to subclass `nn.Module` and define a custom `forward` method.

By understanding these points, you can choose between the simplicity of `nn.Sequential` and the flexibility of subclassing `nn.Module` based on your model's requirements.
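As an example of a model that doesn't fit `nn.Sequential`, here's a minimal sketch of a residual connection, where the input is added back to the layer's output. The class name `ResidualBlock` and its sizes are hypothetical and aren't used elsewhere in this notebook.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):  # hypothetical example class
    def __init__(self, num_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features=num_features, out_features=num_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The input takes two paths: through the layer and around it.
        # nn.Sequential can't express this because it only chains layers one after another.
        return x + self.linear(x)

block = ResidualBlock(num_features=5)
x = torch.rand(4, 5)
out = block(x)
print(out.shape)
```

Because the skip path is written directly in `forward()`, this kind of model needs the subclassing approach we used for `CircleModelV0`.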


In [None]:
# Make predictions with the model
untrained_preds = model_0(X_test.to(device))  # Generate predictions from the untrained model using the test data

# Print the length and shape of the predictions to confirm output dimensions
print(f"Length of predictions: {len(untrained_preds)}, Shape: {untrained_preds.shape}")

# Print the length and shape of the test labels to verify consistency
print(f"Length of test samples: {len(y_test)}, Shape: {y_test.shape}")

# Display the first 10 predictions from the model for inspection
print(f"\nFirst 10 predictions:\n{untrained_preds[:10]}")

# Display the first 10 actual test labels to compare with predictions
print(f"\nFirst 10 test labels:\n{y_test[:10]}")

Hmm, it seems there are the same number of predictions as there are test labels, but the predictions don't look like they're in the same form or shape as the test labels.

This shape difference is due to the model's `Linear` layer outputting a 2D tensor (e.g., [200, 1]), which is normal at this stage for compatibility with matrix operations. The test labels are stored as a 1D tensor (e.g., [200]), and this discrepancy can be adjusted in later steps.

We've got a couple of steps we can do to fix this, we'll see these later on.
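One common way to line up the two shapes is to remove the extra dimension with `squeeze()`. The sketch below uses random stand-in data (`fake_features` and `fake_labels` are hypothetical names, not the notebook's `X_test` and `y_test`), and whether this is the exact fix used later is an assumption.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1),
)

fake_features = torch.rand(200, 2)                 # stand-in for X_test
fake_labels = torch.randint(0, 2, (200,)).float()  # stand-in for y_test

with torch.inference_mode():  # no gradients needed for predictions
    raw_outputs = model(fake_features)

squeezed = raw_outputs.squeeze(dim=1)  # [200, 1] -> [200]

print(f"Raw: {raw_outputs.shape} | Squeezed: {squeezed.shape} | Labels: {fake_labels.shape}")
```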

### 2.1 Setup loss function and optimizer

We've set up a loss (also called a criterion or cost function) and optimizer before in [notebook 01](https://www.learnpytorch.io/01_pytorch_workflow/#creating-a-loss-function-and-optimizer-in-pytorch).

But different problem types require different loss functions.

For example, for a regression problem (predicting a number) you might use mean absolute error (MAE) loss.

And for a binary classification problem (like ours), you'll often use [binary cross entropy](https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a) as the loss function.

However, the same optimizer function can often be used across different problem spaces.

For example, the stochastic gradient descent optimizer (SGD, `torch.optim.SGD()`) can be used for a range of problems, and the same applies to the Adam optimizer (`torch.optim.Adam()`).

| Loss function/Optimizer | Problem type | PyTorch Code |
| ----- | ----- | ----- |
| Stochastic Gradient Descent (SGD) optimizer | Classification, regression, many others. | [`torch.optim.SGD()`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) |
| Adam Optimizer | Classification, regression, many others. | [`torch.optim.Adam()`](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) |
| Binary cross entropy loss | Binary classification | [`torch.nn.BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) or [`torch.nn.BCELoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) |
| Cross entropy loss | Multi-class classification | [`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) |
| Mean absolute error (MAE) or L1 Loss | Regression | [`torch.nn.L1Loss`](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html) |
| Mean squared error (MSE) or L2 Loss | Regression | [`torch.nn.MSELoss`](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) |  

*Table of various loss functions and optimizers, there are more but these are some common ones you'll see.*

Since we're working with a binary classification problem, let's use a binary cross entropy loss function.

> **Note:** Recall a **loss function** is what measures how *wrong* your model predictions are, the higher the loss, the worse your model.
>
> Also, PyTorch documentation often refers to loss functions as "loss criterion" or "criterion", these are all different ways of describing the same thing.

PyTorch has two binary cross entropy implementations:
1. [`torch.nn.BCELoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) - Creates a loss function that measures the binary cross entropy between the target (label) and the input (the model's predicted probabilities, that is, outputs that have already been passed through a sigmoid).
2. [`torch.nn.BCEWithLogitsLoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) - This is the same as above except it has a sigmoid layer ([`nn.Sigmoid`](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html)) built-in (we'll see what this means soon).

Which one should you use?

The [documentation for `torch.nn.BCEWithLogitsLoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) states that it's more numerically stable than using `torch.nn.BCELoss()` after a `nn.Sigmoid` layer.

So generally, implementation 2 is a better option. However for advanced usage, you may want to separate the combination of `nn.Sigmoid` and `torch.nn.BCELoss()` but that is beyond the scope of this notebook.
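To see that the two implementations compute the same quantity on well-behaved inputs, here's a minimal sketch with made-up logits and targets. It doesn't demonstrate the numerical stability difference itself, only that `nn.BCEWithLogitsLoss()` on raw outputs matches `nn.BCELoss()` on sigmoid-activated outputs.

```python
import torch
from torch import nn

# Made-up raw model outputs (logits) and target labels for illustration
logits = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
targets = torch.tensor([0.0, 0.0, 1.0, 1.0, 1.0])

# Option 2: sigmoid is applied inside the loss function
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Option 1: apply sigmoid yourself, then use BCELoss on the probabilities
loss_two_step = nn.BCELoss()(torch.sigmoid(logits), targets)

print(f"BCEWithLogitsLoss: {loss_with_logits.item():.6f}")
print(f"Sigmoid + BCELoss: {loss_two_step.item():.6f}")
```

The practical takeaway is about what you feed each loss: raw logits for `nn.BCEWithLogitsLoss()`, probabilities for `nn.BCELoss()`.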

Knowing this, let's create a loss function and an optimizer.

For the optimizer we'll use `torch.optim.SGD()` to optimize the model parameters with learning rate 0.1.

> **Note:** There's a [discussion on the PyTorch forums about the use of `nn.BCELoss` vs. `nn.BCEWithLogitsLoss`](https://discuss.pytorch.org/t/bceloss-vs-bcewithlogitsloss/33586/4). It can be confusing at first but as with many things, it becomes easier with practice.

In [None]:
# Create a loss function
# loss_fn = nn.BCELoss()  # BCELoss = Binary Cross Entropy Loss without sigmoid activation built-in
loss_fn = nn.BCEWithLogitsLoss()  # BCEWithLogitsLoss = Binary Cross Entropy Loss with sigmoid activation built-in

# Create an optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),  # Specify the model parameters to optimize
                            lr=0.1)  # Set the learning rate for gradient descent


> **Note:** Sigmoid is a mathematical function that maps input values to a range between 0 and 1, making it ideal for binary classification tasks. It is often applied to the output layer of a model to interpret raw model outputs as probabilities. In `nn.BCEWithLogitsLoss()`, the sigmoid activation is applied internally, simplifying the workflow by combining the activation function and the loss computation into a single step.

Now let's also create an **evaluation metric**.

An evaluation metric can be used to offer another perspective on how your model is going.

If a loss function measures how *wrong* your model is, I like to think of evaluation metrics as measuring how *right* it is.

Of course, you could argue both of these are doing the same thing but evaluation metrics offer a different perspective.

After all, when evaluating your models it's good to look at things from multiple points of view.

There are several evaluation metrics that can be used for classification problems but let's start out with **accuracy**.

Accuracy can be measured by dividing the number of correct predictions by the total number of predictions.

For example, a model that makes 99 correct predictions out of 100 will have an accuracy of 99%.

Let's write a function to do so.



In [None]:
# Calculate accuracy (a classification metric for what percentage of predictions are correct)
def accuracy_fn(y_true, y_pred):
    # torch.eq() checks element-wise equality between y_true and y_pred, sum() counts the
    # number of correct predictions, and item() converts the result to a Python scalar
    correct = torch.eq(y_true, y_pred).sum().item()
    acc = (correct / len(y_pred)) * 100  # Calculate the accuracy as a percentage
    return acc  # Return the calculated accuracy
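Here's a quick sanity check of `accuracy_fn()` on made-up values. The logits and labels below are hypothetical, and converting logits to labels with `torch.round(torch.sigmoid(...))` assumes a decision threshold of 0.5.

```python
import torch

def accuracy_fn(y_true, y_pred):
    # Same calculation as above, repeated so this example runs on its own
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100

y_true = torch.tensor([1.0, 0.0, 1.0, 1.0])    # made-up labels
logits = torch.tensor([2.0, -1.0, -0.5, 3.0])  # made-up raw model outputs

pred_probs = torch.sigmoid(logits)    # raw outputs -> values between 0 and 1
pred_labels = torch.round(pred_probs) # >0.5 becomes 1, <0.5 becomes 0

acc = accuracy_fn(y_true=y_true, y_pred=pred_labels)
print(f"Predicted labels: {pred_labels} | Accuracy: {acc}%")  # 3 of 4 correct -> 75.0%
```

Note that `accuracy_fn()` compares labels with labels, so raw logits need to be converted to predicted labels before being passed in.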