<a href="https://colab.research.google.com/github/christophergaughan/Bioinformatics-Code/blob/main/PyTorch_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

| Hyperparameter             | Binary Classification                                                                                              | Multiclass Classification                                                                                  |
|----------------------------|-------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| **Input layer shape (in_features)** | Same as number of features (e.g. 5 for age, sex, height, weight, smoking status in heart disease prediction) | Same as binary classification                                                                             |
| **Hidden layer(s)**        | Problem specific, minimum = 1, maximum = unlimited                                                               | Same as binary classification                                                                             |
| **Neurons per hidden layer** | Problem specific, generally 10 to 512                                                                            | Same as binary classification                                                                             |
| **Output layer shape (out_features)** | 1 (one class or the other)                                                                                  | 1 per class (e.g. 3 for food, person, or dog photo)                                                       |
| **Hidden layer activation** | Usually ReLU (rectified linear unit) but can be many others                                                      | Same as binary classification                                                                             |
| **Output activation**      | Sigmoid (`torch.sigmoid` in PyTorch)                                                                             | Softmax (`torch.softmax` in PyTorch)                                                                      |
| **Loss function**          | Binary cross entropy (`torch.nn.BCELoss`, or `torch.nn.BCEWithLogitsLoss` for raw logits, in PyTorch)            | Cross entropy (`torch.nn.CrossEntropyLoss` in PyTorch)                                                    |
| **Optimizer**              | SGD (stochastic gradient descent), Adam (see `torch.optim` for more options)                                    | Same as binary classification                                                                             |
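The table above can be sketched in PyTorch as follows. This is a minimal illustration, not a recommended architecture: the sizes (5 input features, 3 classes, 16 hidden units), the random batch and the random targets are assumptions made only so the shapes can be checked. It shows that the two setups differ only in the output layer, the output activation and the loss function.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Illustrative sizes (assumptions): 5 input features, 3 classes, 16 hidden units
n_features, n_classes, n_hidden = 5, 3, 16

# Binary classification: a single output unit (one logit per sample)
binary_model = nn.Sequential(
    nn.Linear(in_features=n_features, out_features=n_hidden),
    nn.ReLU(),  # hidden layer activation
    nn.Linear(in_features=n_hidden, out_features=1),
)

# Multiclass classification: one output unit per class
multiclass_model = nn.Sequential(
    nn.Linear(in_features=n_features, out_features=n_hidden),
    nn.ReLU(),
    nn.Linear(in_features=n_hidden, out_features=n_classes),
)

x = torch.randn(8, n_features)  # a random batch of 8 samples

binary_logits = binary_model(x)          # shape: (8, 1)
multiclass_logits = multiclass_model(x)  # shape: (8, 3)

# Output activations: sigmoid for binary, softmax over the class dimension for multiclass
binary_probs = torch.sigmoid(binary_logits)
multiclass_probs = torch.softmax(multiclass_logits, dim=1)

# Losses: BCEWithLogitsLoss takes raw logits and float targets of the same shape,
# CrossEntropyLoss takes raw logits and integer class indices
y_binary = torch.randint(0, 2, (8, 1)).float()
y_multiclass = torch.randint(0, n_classes, (8,))
binary_loss = nn.BCEWithLogitsLoss()(binary_logits, y_binary)
multiclass_loss = nn.CrossEntropyLoss()(multiclass_logits, y_multiclass)

# Optimizer: the same choice (here SGD) works for either model
optimizer = torch.optim.SGD(binary_model.parameters(), lr=0.1)
```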


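Classification is the problem of predicting which category (class) an input belongs to, for example whether a patient has heart disease or not (binary classification), or whether a photo shows food, a person or a dog (multiclass classification).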

## Make classification data and get it ready

In [None]:
import sklearn
from sklearn.datasets import make_circles

# Make 1000 samples
n_samples = 1000

# Create circles
X, y = make_circles(n_samples,
                    noise=0.03,
                    random_state=42)


In [None]:
len(X), len(y)

In [None]:
print(f'First 5 samples of X:\n {X[:5]}')
print(f'First 5 samples of y:\n {y[:5]}')

In [None]:
y

## Clearly, we have a binary classification problem here, as the target (label) column $y$ contains only 0s and 1s
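A quick way to confirm this is to count the distinct label values. The sketch below regenerates the data with the same `make_circles` settings used above so that it runs on its own; `np.unique` with `return_counts=True` lists each class and how many samples it has, which also shows whether the classes are balanced.

```python
import numpy as np
from sklearn.datasets import make_circles

# Same settings as the data-creation cell above
X, y = make_circles(1000, noise=0.03, random_state=42)

# List the distinct labels and how many samples belong to each
classes, counts = np.unique(y, return_counts=True)
print(classes, counts)
```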

In [None]:
# Make a dataframe
import pandas as pd
circles = pd.DataFrame({'X1': X[:, 0],
                        'X2': X[:, 1],
                        'label': y})
circles.head()

In [None]:
# Visualize data
import matplotlib.pyplot as plt
plt.scatter(x=X[:, 0],
            y=X[:, 1],
            c=y,
            cmap=plt.cm.RdYlBu);

### This is a *toy dataset*: small enough to experiment with quickly, yet a useful platform for practicing PyTorch code

**Our goal: separate the blue dots from the red dots**

In [None]:
# Check input and output shapes
X.shape, y.shape

In [None]:
# The data is in NumPy arrays (float64), so we need to turn it into PyTorch tensors (float32, PyTorch's default floating-point dtype)
import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)
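The `.type(torch.float)` conversion matters because `torch.from_numpy` keeps NumPy's float64 dtype, while the parameters of an `nn.Linear` layer are float32 by default. The sketch below, which uses a small random array as a stand-in for `X`, shows that passing a float64 tensor into a float32 layer raises a `RuntimeError`, while the converted tensor works.

```python
import numpy as np
import torch
from torch import nn

X_np = np.random.default_rng(0).normal(size=(4, 2))  # stand-in data; NumPy defaults to float64
X_64 = torch.from_numpy(X_np)                         # from_numpy keeps float64
X_32 = X_64.type(torch.float)                         # torch.float is float32

layer = nn.Linear(in_features=2, out_features=1)      # parameters are float32 by default

out = layer(X_32)  # works: input and weights share the same dtype

try:
    layer(X_64)    # float64 input vs float32 weights
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
```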

In [None]:
X[:5], y[:5]

In [None]:
print(f'Shape of X: {X.shape}')
print(f'Shape of y: {y.shape}')

In [None]:
print(f'Values for one sample of X: {X[0]} with shape: {X[0].shape}')
print(f'Values for one sample of y: {y[0]} with shape: {y[0].shape}')

## Create train and test splits

In [None]:
torch.__version__

In [None]:
X.dtype, y.dtype

In [None]:
# Split data randomly
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)


In [None]:
len(X_train), len(X_test), len(y_train), len(y_test)

## Build Model

1. Device-agnostic code
2. Construct a model by subclassing `nn.Module`
3. Loss function and optimizer
4. Create training and test loops

In [None]:
import torch
from torch import nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device


1. Subclass `nn.Module`
2. Create 2 `nn.Linear()` layers capable of handling the shapes of our data
3. Define a `forward()` method that outlines the forward pass
4. Create an instance of our model class and send it to the target `device`

In [None]:
# Subclass nn.Module
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        # Create nn.Linear layers capable of handling the shapes of our data
        self.layer_1 = nn.Linear(in_features=2,
                                 out_features=5)  # takes in 2 features (X1, X2), upscales to 5 hidden units
        self.layer_2 = nn.Linear(in_features=5,
                                 out_features=1)  # takes in 5 hidden units, outputs 1 logit (we're predicting a 0 or 1)

    # Define the forward pass
    def forward(self, x):
        return self.layer_2(self.layer_1(x))  # x -> layer_1 -> layer_2 -> output

# Instantiate instance of model class and send to target device
model_0 = CircleModelV0().to(device)
model_0


### Note in the code above:

The forward pass in the code above may seem "backwards": `self.layer_2` is written first, even though `self.layer_1` is the layer that runs first. This is simply how nested function calls read in Python; the data still flows from the input layer to the output layer. Let's break it down:

#### Understanding the Forward Pass
Order of operations:

* When you call `self.layer_1(x)`, the input `x` is passed through `layer_1`, which produces the intermediate output of the first layer.
* The intermediate output is then passed to `self.layer_2`, which produces the final output.

In functional terms:
`x -> layer_1 -> layer_2 -> output`

However, the Python code is written as:
`return self.layer_2(self.layer_1(x))`

Python evaluates nested calls from the inside out: the innermost call (`self.layer_1(x)`) runs first, and its result becomes the argument of the outermost call (`self.layer_2(...)`).

#### Why It Feels "Backwards":

* Neural network layers are typically thought of as a forward progression from input to output.
* In the `forward` method, the nesting reverses the reading order: the outermost call (`self.layer_2`) is written first, even though the innermost call (`self.layer_1`) is evaluated first.

#### It's Just Function Composition:

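* The code uses function composition, where one function's output is the input to the next. This is conceptually the same as `f(g(x))`, where `g` (here `self.layer_1`) is applied first and `f` (here `self.layer_2`) is applied to its result.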
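To see that the nested call, a step-by-step version and `nn.Sequential` all compute the same thing, the sketch below builds two standalone layers with the same shapes as `CircleModelV0` (the random input is an assumption for illustration) and compares the three outputs.

```python
import torch
from torch import nn

torch.manual_seed(42)
layer_1 = nn.Linear(in_features=2, out_features=5)
layer_2 = nn.Linear(in_features=5, out_features=1)
x = torch.randn(4, 2)  # random stand-in for a batch of 4 samples

# Nested form: the innermost call (layer_1) is evaluated first
nested = layer_2(layer_1(x))

# Step-by-step form: the same computation written in reading order
hidden = layer_1(x)
stepwise = layer_2(hidden)

# nn.Sequential applies the modules in the order they are listed
sequential = nn.Sequential(layer_1, layer_2)(x)
```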


In [None]:
next(model_0.parameters()).device

In [None]:
# Let's replicate the model above using nn.Sequential
model_0 = nn.Sequential(
    nn.Linear(in_features=2,
              out_features=5),
    nn.Linear(in_features=5,
              out_features=1)).to(device)

model_0


In [None]:
model_0.state_dict()

In [None]:
# Make predictions (remember to use inference mode)
with torch.inference_mode():
    untrained_preds = model_0(X_test.to(device))
print(f'Length of preds: {len(untrained_preds)}')
print(f'Shape of preds: {untrained_preds.shape}')
print(f'First 10 preds: {untrained_preds[:10]}')
print(f'First 10 y_test: {y_test[:10]}')

In [None]:
X_test[:10], y_test[:10]

### Set-up loss function and optimizer

Which loss function and optimizer should we use?

- It depends on the problem:
    - Regression: MAE, MSE
    - Classification: binary cross entropy or categorical cross entropy

# Optimizer and Loss Functions in PyTorch

While the loss function depends on the problem type, the same optimizer can often be used across different problem spaces.

For example, the stochastic gradient descent optimizer (SGD, `torch.optim.SGD()`) can be used for a range of problems, and the same applies to the Adam optimizer (`torch.optim.Adam()`).

| **Loss Function/Optimizer**               | **Problem Type**                   | **PyTorch Code**                        |
|-------------------------------------------|-------------------------------------|-----------------------------------------|
| **Stochastic Gradient Descent (SGD)**     | Classification, regression, many others. | `torch.optim.SGD()`                     |
| **Adam Optimizer**                         | Classification, regression, many others. | `torch.optim.Adam()`                    |
| **Binary Cross Entropy Loss**             | Binary classification               | `torch.nn.BCEWithLogitsLoss` or `torch.nn.BCELoss` |
| **Cross Entropy Loss**                    | Multi-class classification          | `torch.nn.CrossEntropyLoss`             |
| **Mean Absolute Error (MAE) or L1 Loss**  | Regression                          | `torch.nn.L1Loss`                       |
| **Mean Squared Error (MSE) or L2 Loss**   | Regression                          | `torch.nn.MSELoss`                      |
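The two binary cross entropy options in the table differ in what they expect as input: `nn.BCELoss` expects probabilities (after a sigmoid), while `nn.BCEWithLogitsLoss` takes raw logits and applies the sigmoid internally, which the PyTorch documentation describes as more numerically stable. A small sketch with made-up logits and targets shows that both give the same value:

```python
import torch
from torch import nn

torch.manual_seed(42)
logits = torch.randn(6)                            # made-up raw model outputs
targets = torch.tensor([0., 1., 1., 0., 1., 0.])  # made-up binary labels

# Option 1: BCEWithLogitsLoss applies the sigmoid internally
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Option 2: apply the sigmoid yourself, then use BCELoss on the probabilities
loss_manual = nn.BCELoss()(torch.sigmoid(logits), targets)
```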


In [None]:
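# Setup loss function
# BCEWithLogitsLoss = sigmoid activation built in + BCELoss (more numerically stable than applying them separately)
loss_fn = nn.BCEWithLogitsLoss()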

# Setup optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.1)



In [None]:
model_0.state_dict()

In [None]:
# Calculate accuracy: out of 100 examples, what percentage does our model get right?
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()
    acc = (correct / len(y_pred)) * 100
    return acc
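A short usage example of `accuracy_fn`, with made-up labels and predictions: three of the four predictions match, so the function should return 75.0. The function is repeated so the snippet runs on its own.

```python
import torch

def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()  # number of matching labels
    acc = (correct / len(y_pred)) * 100              # as a percentage
    return acc

y_true = torch.tensor([1., 0., 1., 1.])  # made-up true labels
y_pred = torch.tensor([1., 0., 0., 1.])  # made-up predicted labels
acc = accuracy_fn(y_true, y_pred)        # 3 of 4 correct
```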

### Train Model

1. Forward pass
2. Calculate the loss
3. Optimizer zero grad
4. Loss backward (backpropagation)
5. Optimizer step (gradient descent)
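The five steps can be sketched as a single training step. This is a minimal sketch on hypothetical toy data (8 random samples with random binary labels) and a freshly created two-layer model with the same shapes as `model_0`; it is not the full training loop, which repeats these steps over many epochs.

```python
import torch
from torch import nn

torch.manual_seed(42)
# Hypothetical toy data: 8 samples, 2 features, binary labels
X_toy = torch.randn(8, 2)
y_toy = torch.randint(0, 2, (8,)).float()

model = nn.Sequential(nn.Linear(2, 5), nn.Linear(5, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)

model.train()
# 1. Forward pass: raw logits, squeezed from shape (8, 1) to (8,) to match y_toy
y_logits = model(X_toy).squeeze()
# 2. Calculate the loss (BCEWithLogitsLoss expects raw logits)
loss = loss_fn(y_logits, y_toy)
# 3. Zero the gradients left over from any previous step
optimizer.zero_grad()
# 4. Backpropagation: compute the gradient of the loss w.r.t. every parameter
loss.backward()
# 5. Gradient descent: update the parameters using those gradients
optimizer.step()
```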

Also, we are going to perform the following:

`going from raw logits -> prediction probabilities -> prediction labels`

The raw outputs of our model are logits. We convert them into prediction probabilities by passing them through an activation function (e.g. sigmoid for binary classification or softmax for multiclass classification).

Then we convert our model's prediction probabilities to **prediction labels** by either rounding them (binary classification) or taking the `argmax()` (multiclass classification).
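Both routes from logits to labels can be sketched side by side. The logits below are made-up values for illustration: four samples for the binary case and two samples with three classes for the multiclass case.

```python
import torch

# Binary: made-up logits for 4 samples -> sigmoid -> round
binary_logits = torch.tensor([-2.0, -0.1, 0.3, 4.0])
binary_probs = torch.sigmoid(binary_logits)   # values between 0 and 1
binary_labels = torch.round(binary_probs)     # 0. or 1. per sample

# Multiclass: made-up logits for 2 samples and 3 classes -> softmax -> argmax
multi_logits = torch.tensor([[2.0, 0.5, -1.0],
                             [0.1, 0.2, 3.0]])
multi_probs = torch.softmax(multi_logits, dim=1)  # each row sums to 1
multi_labels = torch.argmax(multi_probs, dim=1)   # index of the most likely class
```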

In [None]:
model_0

In [None]:
# View the first 5 outputs of the forward pass on the test data
model_0.eval()
with torch.inference_mode():
    y_logits = model_0(X_test.to(device))[:5]
y_logits

In [None]:
y_test[:5]

In [None]:
# Since we are performing binary classification, use the sigmoid activation function
y_probs = torch.sigmoid(y_logits)
y_probs

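For our prediction probability values, we need to perform a range-style rounding on them:
* `y_pred_probs` >= 0.5: `y = 1` (class 1)
* `y_pred_probs` < 0.5: `y = 0` (class 0)

`torch.round` implements this threshold, except that a value of exactly 0.5 is rounded down to 0, because it rounds half to even.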

In [None]:
# Find the predicted labels by rounding the prediction probabilities
y_preds = torch.round(y_probs)

# In full (logits -> pred_probs -> pred_labels), in inference mode so no gradients are tracked
with torch.inference_mode():
    y_pred_labels = torch.round(torch.sigmoid(model_0(X_test.to(device))[:5]))

# Check for equality
print(torch.eq(y_preds.squeeze(), y_pred_labels.squeeze()))

# Get rid of the extra dimension: shape (5, 1) -> (5,)
y_preds.squeeze()
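The `squeeze()` matters when predictions are compared with `y_test`: the model outputs shape `(n, 1)` while the labels have shape `(n,)`, and comparing them directly broadcasts to an `(n, n)` grid of every pair. The sketch below uses made-up labels and predictions to show how this silently inflates the match count (here 12 "matches" out of 5 samples, which `accuracy_fn` would report as 240%), while the squeezed comparison counts the 4 true matches.

```python
import torch

y_true = torch.tensor([1., 0., 1., 0., 1.])            # shape (5,), like y_test[:5]
y_pred = torch.tensor([[1.], [0.], [0.], [0.], [1.]])  # shape (5, 1), like the model output

# Without squeezing, broadcasting compares every pair: shape (5, 5)
unsqueezed_eq = torch.eq(y_true, y_pred)
wrong_count = unsqueezed_eq.sum().item()

# After squeezing, the comparison is element-wise: shape (5,)
squeezed_eq = torch.eq(y_true, y_pred.squeeze())
n_correct = squeezed_eq.sum().item()
```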

In [None]:
y_test[:5]

# Building a training and test loop