<center>
<table>
  <tr>
    <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/nasa-logo.svg" width="100"/> </td>
     <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/ASTG_logo.png?raw=true" width="80"/> </td>
     <td> <img src="https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png" width="130"/> </td>
    </tr>
</table>
</center>

        
<center>
<h1><font color= "blue" size="+3">ASTG Python Course Series</font></h1>
</center>

---

<center>
    <h1><font color="red">Logistic Regression Classifier Model with PyTorch</font></h1>
</center>

<h4>
This presentation was adapted from the materials 
    (created by Sebastian Raschka) available at:

<p>

<center>
<a href="https://github.com/rasbt/pycon2024">PyCon US 2024: The Fundamentals of Modern Deep Learning with PyTorch</a>
</center>
</h4>

# <font color="red">Objectives</font>

In this presentation, we:

- Introduce the basic concepts of PyTorch
- Use a simple classification dataset to:
   - Build a PyTorch model
   - Train the model
   - Evaluate the model

We show the steps for building a Machine Learning (ML) model with PyTorch. The functions presented here can be used as a reference for other ML applications.

# <font color="red">References</font>

- [PyTorch](https://pytorch.org/)
- [What is PyTorch?](https://www.nvidia.com/en-us/glossary/pytorch/) from NVIDIA.
- [What is PyTorch](https://www.ibm.com/think/topics/pytorch) by IBM
- [Efficiently Building PyTorch Models: A Step-by-Step Guide](https://myscale.com/blog/efficient-pytorch-model-building-step-by-step-guide/) from myscale.com
- [The Good and Bad of PyTorch Machine Learning Library](https://www.altexsoft.com/blog/pytorch-library/) from altexsoft.com

# <font color="red">What is PyTorch?</font>

- Open-source deep learning framework.
- Provides a flexible and efficient platform for building and training neural networks.
   - It has a dynamic computational graph that allows users to modify the architecture during runtime, making debugging and experimentation easier.
- Written in Python and integrated with popular Python libraries like NumPy (for scientific computing), SciPy, and Cython (for compiling Python to C for better performance). 
- Supports CPU, GPU, and parallel processing, as well as distributed training.
   - PyTorch’s intuitive API and support for GPU acceleration make it ideal for building efficient feedforward networks, particularly in tasks such as image classification and digit recognition.
- An excellent tool to learn and use for creating machine learning models.

![fig_pytorch](https://www.nvidia.com/content/dam/en-zz/Solutions/glossary/data-science/pytorch/img-1.png)
Image reference: [https://pytorch.org/features/](https://pytorch.org/features/)

## <font color="blue">How PyTorch works</font>

The core components of PyTorch are:

- __Tensors__
   - A core PyTorch data type, similar to a multidimensional array, used to store and manipulate the inputs and outputs of a model, as well as the model’s parameters.
   - They are similar to NumPy’s ndarrays, except that tensors can run on GPUs to accelerate computing.
- __Graphs__
   - Graphs are data structures consisting of connected nodes (called vertices) and edges.
   - Neural Networks are represented as a graph structure of computations. They transform input data by applying a collection of nested functions to input parameters.
   - The goal of deep learning is to optimize these parameters (weights and biases) by computing their partial derivatives (gradients) with respect to a loss metric.
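
As a minimal sketch of these two ideas (the values and variable names are illustrative and not part of the course material), the snippet below creates tensors, builds a tiny computation graph, and asks PyTorch for the gradients of a scalar loss with respect to the parameters:

```python
import torch

# A tensor behaves much like a NumPy ndarray.
x = torch.tensor([1.0, 2.0, 3.0])

# Parameters to optimize; requires_grad=True asks PyTorch to record
# the operations applied to them in a computation graph.
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

# Forward pass: a nested function of the input and the parameters.
prediction = (w * x).sum() + b
loss = (prediction - 4.0) ** 2

# Backward pass: traverse the graph and compute d(loss)/d(w) and d(loss)/d(b).
loss.backward()

print(w.grad)
print(b.grad)
```

Here the gradient with respect to `w` is `2 * (prediction - 4) * x`, which is what `w.grad` holds after the call to `backward()`.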

## <font color="blue">PyTorch main modules</font>

- PyTorch uses modules as the building blocks of deep learning models, which allows for the quick and straightforward construction of neural networks without the tedious work of manually coding each algorithm.
- There are three primary classes of modules used to build and optimize deep learning models in PyTorch:
   - __nn modules__ are deployed as the layers of a neural network.
      - The `torch.nn` package contains a large library of modules that perform common operations like convolutions, pooling and regression.
   - The __autograd module__ provides a simple way to automatically compute gradients, used to optimize model parameters via gradient descent, for any function operated within a neural network.
   - __Optim modules__ apply optimization algorithms to those gradients.
      - The `torch.optim` package provides modules for various optimization methods, like stochastic gradient descent (SGD) or root mean square propagation (RMSprop), to suit specific optimization needs.
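
The sketch below shows how the three kinds of modules might cooperate in a single optimization step. The layer sizes, the random data, the learning rate, and the choice of a mean-squared-error loss are assumptions made only for this illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)

# nn module: a single fully connected layer (3 inputs -> 1 output).
layer = nn.Linear(in_features=3, out_features=1)

# optim module: plain stochastic gradient descent on the layer's parameters.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.05)

# Toy batch of 4 samples with made-up targets.
inputs = torch.randn(4, 3)
targets = torch.randn(4, 1)

loss_before = nn.functional.mse_loss(layer(inputs), targets)

optimizer.zero_grad()    # clear gradients from any previous step
loss_before.backward()   # autograd: compute the gradients of the loss
optimizer.step()         # optim: update the weights and the bias

loss_after = nn.functional.mse_loss(layer(inputs), targets)
print(f"loss before: {loss_before.item():.4f}, after: {loss_after.item():.4f}")
```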

## <font color="blue">Basics of PyTorch model</font>

A PyTorch model is constructed using two fundamental components that play crucial roles in defining and executing the neural network: 

- `__init__()` method:
   - Serves as the constructor function for your model.
   - This is where you define all the layers used in your neural network architecture. It establishes the structure of the model and initializes parameters such as weights and biases.
- `forward()` method:
   - Defines the actual computation that takes place when input data passes through each layer of the model.
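
A minimal sketch of these two methods is shown below. `TinyNet` and its layer sizes are hypothetical names chosen for this illustration only:

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical two-layer network, used only to illustrate the two methods."""

    def __init__(self, num_inputs: int, num_hidden: int, num_outputs: int):
        super().__init__()
        # The layers (and their weights and biases) are created here.
        self.hidden = nn.Linear(num_inputs, num_hidden)
        self.output = nn.Linear(num_hidden, num_outputs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The computation applied to the input, layer by layer.
        x = torch.relu(self.hidden(x))
        return self.output(x)


net = TinyNet(num_inputs=2, num_hidden=8, num_outputs=2)
batch = torch.zeros(5, 2)      # 5 samples with 2 features each
print(net(batch).shape)        # calling the model runs forward()
```

Calling `net(batch)` rather than `net.forward(batch)` is the usual convention, since it lets PyTorch run any registered hooks around `forward()`.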

## <font color="blue">Steps for building a PyTorch model workflow</font>

Following the steps below when building your first PyTorch model sets a solid foundation for creating sophisticated neural networks tailored to diverse applications.

1. Define your model architecture
   - Set the number of nodes in the input and output layers. These two values are determined by the number of features and the number of labels (classes).
   - Determine the number of layers and the type (linear, convolutional, or recurrent) of each layer.
2. Data preprocessing to prepare the data
   - Splitting into training and validation sets
   - Normalizing the data (if needed)
   - Creating data loaders
3. Train your model
   - Define the loss function
   - Define the optimizer
   - Write a loop to:
      - Feed batches of data through your model
      - Compute loss functions
      - Optimize parameters with backpropagation
      - Monitor performance metrics iteratively.
4. Evaluate the model performance
   - Test the model using unseen data and evaluate the performance metrics

![fig_workflow](https://www.scaler.com/topics/images/this-detailed-resource.webp)

# <font color="red"> Python packages used</font>

- __Matplotlib__: Creates visualizations.
- __NumPy__: Provides multidimensional arrays and numerical routines.
- __Pandas__: Data (two-dimensional labelled array) manipulation and analysis.
- __Scikit-Learn__: Provides supervised and unsupervised Machine Learning algorithms.
- __PyTorch__: Used to build, train, and evaluate deep learning models based on Neural Networks.

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import matplotlib.pyplot as plt

In [None]:
import numpy as np

In [None]:
import pandas as pd
import seaborn as sns

In [None]:
from sklearn.model_selection import train_test_split
#from sklearn import metrics
from sklearn.metrics import r2_score

In [None]:
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# <font color="red">Loading the dataset</font>

## <font color="blue">Description of the data</font>

- We have a dataset whose features are points on a plane and whose label takes two values (classes).
   - Each point is assigned a class (`0` or `1`).
- We want to build a Machine Learning model that predicts the class of a given point.
- We will use __logistic regression__, a statistical method for predicting binary classes.
   - It is closely related to linear regression, but the target variable is categorical rather than continuous.
   - It is one of the simplest and most commonly used Machine Learning algorithms for two-class classification.
   - The outcome or the target variable has only two possible classes.
   - It predicts the probability of occurrence of a binary event by applying the logistic (sigmoid) function, the inverse of the logit function, to a linear combination of the features.
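
To make the last point concrete, here is a small pure-Python sketch. The weights, the bias, and the 0.5 decision threshold are made-up values for illustration, not parameters learned from the course dataset:

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of the logistic function: the log-odds of probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical weights and bias for a model with two features.
w1, w2, b = 1.5, -0.8, 0.2


def predict_proba(x1: float, x2: float) -> float:
    """Probability that the point (x1, x2) belongs to class 1."""
    return sigmoid(w1 * x1 + w2 * x2 + b)


p = predict_proba(2.0, 1.0)
label = 1 if p >= 0.5 else 0
print(f"P(class 1) = {p:.3f} -> predicted class {label}")
```

The linear score here is `1.5 * 2.0 - 0.8 * 1.0 + 0.2 = 2.4`, and the logistic function turns it into a probability above 0.5, so the point is assigned to class `1`.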

## <font color="blue">Read the data</font>

In [None]:
file_name = "classifier_dataset.csv"

In [None]:
# Columns are separated by whitespace; the raw string avoids an invalid-escape warning
df = pd.read_csv(file_name, sep=r"\s+")
df

##  <font color="blue"> Splitting the data into training and testing sets</font>
- We split the data into training and testing sets. 
- We train the model with 70% of the samples and test with the remaining 30%. 

__Extract the train and test datasets as NumPy arrays__

In [None]:
X_train, X_test, y_train, y_test = train_test_split(df[["x1", "x2"]].values, 
                                                    df["label"].values, 
                                                    test_size=0.3, 
                                                    random_state=42)

In [None]:
#X_train = df[["x1", "x2"]].values
#y_train = df["label"].values

In [None]:
X_train

In [None]:
X_train.shape

In [None]:
y_train

In [None]:
y_train.shape

In [None]:
np.bincount(y_train)

## <font color="blue">Visualize the data</font>

__Scatterplot of $x_1$ against $x_2$__

In [None]:
plt.scatter(X_train[:,0], X_train[:,1])
plt.xlabel(r"Feature $x_1$", fontsize=10)
plt.ylabel(r"Feature $x_2$", fontsize=10)

__Scatterplot with the two classes: `y=0` and `y=1`__

In [None]:
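def plot_classes(X: np.ndarray, y: np.ndarray, boundary: tuple = None) -> None:
    """
    Plot the two classes with different markers and, if provided,
    the decision boundary given as (x1_min, x1_max, x2_min, x2_max).
    """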
    plt.plot(X[y==0, 0], X[y==0, 1],
        marker="D", markersize=10,
        linestyle="", label="Class 0",
    )

    plt.plot(X[y==1, 0], X[y==1, 1],
        marker="^", markersize=13,
        linestyle="", label="Class 1",
    )

    if boundary:
        plt.plot([boundary[0], boundary[1]], [boundary[2], boundary[3]], color="red")
    plt.legend(loc='best')
    plt.xlim([-5.5, 5.5])
    plt.ylim([-5.5, 5.5])

    plt.xlabel(r"Feature $x_1$", fontsize=12)
    plt.ylabel(r"Feature $x_2$", fontsize=12)

    plt.grid()

In [None]:
plot_classes(X_train, y_train)

## <font color="blue">Normalize the data</font> <a class="anchor" id="sec_tf_norm"></a>

- In general, variables may not be on a similar scale. Features with large values would then dominate any distance-based calculation.
- It is good practice to normalize features that use different scales and ranges. 
- Although the model might converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.

In [None]:
X_train

In [None]:
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)

In [None]:
train_mean

In [None]:
train_std

__Normalization of the train features__

In [None]:
X_train = (X_train - train_mean) / train_std

In [None]:
X_train

In [None]:
plot_classes(X_train, y_train)

__Normalization of the test features__

In [None]:
X_test = (X_test - train_mean) / train_std

# <font color="red">Creating the ML model</font>

## <font color="blue">Set the hyperparameters</font>

It is good practice to declare the following parameters before creating the model, so that they are easy to find and change.

__Dataset parameters__

These parameters are defined by the dataset used:

- number of features
- number of classes to predict

In [None]:
input_size = 2
num_classes = 2

__Model parameters__

- batch size
- number of epochs
- learning rate (optimizer step size)
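
As a small arithmetic sketch of how batch size and number of epochs interact, the snippet below counts the parameter updates performed during training. The sample count of 14 is a made-up value, not the size of the course dataset:

```python
import math

num_samples = 14     # hypothetical training-set size
batch_size = 4       # samples per parameter update
num_epochs = 20      # full passes over the training set

# The last batch of an epoch may be smaller than batch_size.
batches_per_epoch = math.ceil(num_samples / batch_size)
total_updates = batches_per_epoch * num_epochs

print(batches_per_epoch, total_updates)   # 4 80
```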

In [None]:
batch_size = 4
num_epochs = 20
learning_rate = 0.5

## <font color="blue">Building the PyTorch model</font>

__Class to create a simple model with one linear layer.__

- We define a neural network by subclassing `nn.Module`, and initialize the neural network layers in `__init__`.
- Every `nn.Module` subclass implements the operations on input data in the `forward` method.
   - The `__init__()` method defines the layers and other components of a model.
   - The `forward()` method is where the computation gets done.
- The input layer has `num_features` nodes and the output layer `num_classes` nodes.
- The most basic type of neural network layer is a linear or fully connected layer.
   - This is a layer where every input influences every output of the layer to a degree specified by the layer’s weights.
   - If a layer has `m` inputs and `n` outputs, the weights form an `m x n` matrix (PyTorch's `nn.Linear` stores it with shape `(n, m)`).
- Linear layers are very common in classifier models, which usually end with one or more linear layers. The last layer has `n` outputs, where `n` is the number of classes the classifier addresses.
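
The following sketch, with illustrative sizes `m = 3` and `n = 2`, inspects the parameters of a linear layer and checks that its output matches an explicit matrix product:

```python
import torch
from torch import nn

torch.manual_seed(0)

m, n = 3, 2                      # m inputs, n outputs (illustrative sizes)
layer = nn.Linear(m, n)

# PyTorch stores the weights with shape (out_features, in_features).
print(layer.weight.shape)        # torch.Size([2, 3])
print(layer.bias.shape)          # torch.Size([2])

x = torch.randn(5, m)            # a batch of 5 samples

# The layer output equals the explicit matrix product plus the bias.
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual, atol=1e-6))
```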

In [None]:
class LogisticRegression(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super().__init__()
        self.linear1 = torch.nn.Linear(num_features, num_classes)

    def forward(self, x):
        logits = self.linear1(x)
        return logits

Note that we do not have any activation function here because there is only one layer:
- Activation functions make deep learning possible.
   - Inserting non-linear activation functions between layers is what allows a deep learning model to approximate a wide range of functions, rather than just linear ones.
- The model defined above can be seen as a single matrix multiplication.
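
To illustrate why activation functions matter, the sketch below (layer sizes and random inputs are assumptions for this example) shows that two linear layers stacked without an activation are equivalent to one linear layer, and evaluates the same stack with a ReLU in between for comparison:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers stacked with no activation in between.
first = nn.Linear(2, 4)
second = nn.Linear(4, 2)

# Equivalent single layer: W = W2 @ W1 and b = W2 @ b1 + b2.
combined = nn.Linear(2, 2)
with torch.no_grad():
    combined.weight.copy_(second.weight @ first.weight)
    combined.bias.copy_(second.weight @ first.bias + second.bias)

x = torch.randn(6, 2)
with torch.no_grad():
    stacked_out = second(first(x))
    combined_out = combined(x)
    # Same stack, but with a non-linear activation between the layers.
    nonlinear_out = second(torch.relu(first(x)))

# The stack without activation matches the single combined layer.
print(torch.allclose(stacked_out, combined_out, atol=1e-5))
# With the ReLU, the outputs generally differ from the combined layer.
print(torch.allclose(nonlinear_out, combined_out, atol=1e-5))
```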

__Create the model__

In [None]:
torch.manual_seed(1)

model = LogisticRegression(num_features=input_size, num_classes=num_classes)

__Print model information__

In [None]:
print('\t Model information: \n')
print(model)

In [None]:
print('\t Layer information: \n')
print(model.linear1)

In [None]:
print('\t Model parameters: \n')
for param in model.parameters():
    print(param)

__Basic testing of the model with arbitrary data__

In [None]:
x = torch.tensor([[1.1, 2.1],
                  [1.1, 2.1],
                  [9.1, 4.1]])

In [None]:
with torch.no_grad():
    logits = model(x)
    probas = F.softmax(logits, dim=1)

In [None]:
print(probas)

## <font color="blue"> Defining a DataLoader</font>

- We pass the dataset and our `batch_size` hyperparameter to the data loader as initialization arguments.
- This creates an iterable data loader, so we can easily iterate over each batch using a loop.
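
As a quick illustration of how batching works, the sketch below uses PyTorch's built-in `TensorDataset` (swapped in here for brevity instead of a custom `Dataset` class) with ten made-up samples:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten made-up samples with two features each, and binary labels.
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.tensor([0, 1] * 5)

# TensorDataset is a built-in Dataset that pairs tensors row by row.
toy_dataset = TensorDataset(features, labels)
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)

# 10 samples with batch_size=4 give batches of 4, 4 and 2 samples.
batch_sizes = []
for batch_features, batch_labels in toy_loader:
    batch_sizes.append(batch_features.shape[0])
    print(batch_features.shape, batch_labels.shape)

print(len(toy_loader), batch_sizes)
```

Note that `len(toy_loader)` is the number of batches, not the number of samples.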

In [None]:
class MyDataset(Dataset):
    def __init__(self, X, y):

        self.features = torch.tensor(X, dtype=torch.float32)
        self.labels = torch.tensor(y, dtype=torch.int64)

    def __getitem__(self, index):
        x = self.features[index]
        y = self.labels[index]
        return x, y

    def __len__(self):
        return self.labels.shape[0]

In [None]:
def instantiate_data(Xdata: np.ndarray, 
                     ydata: np.ndarray, 
                     batch_size: int, 
                     shuffle: bool = False) -> DataLoader:
    """
    Take the NumPy arrays for the features and labels and
    create a PyTorch DataLoader object that serves the data
    in batches of size batch_size.
    If shuffle is set to True (for the training set only),
    the samples are reshuffled at every epoch, which helps make
    training more stable and convergence faster.
    """
    dataset = MyDataset(Xdata, ydata)
    dataloader = DataLoader(dataset=dataset, 
                            batch_size=batch_size, 
                            shuffle=shuffle)
    return dataloader

In [None]:
train_loader = instantiate_data(X_train, y_train, batch_size, shuffle=True)

In [None]:
X_train.shape

In [None]:
test_loader = instantiate_data(X_test, y_test, batch_size)

## <font color="blue">The training loop</font>

__Define the loss function__

- We use the Cross-Entropy Loss, which is primarily used for multi-class classification models (each sample belongs to exactly one class).
- It first applies softmax to the predictions (logits) and then computes the negative log-likelihood of the target labels under the predicted probabilities.
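
The computation for a single sample can be sketched in pure Python as follows. The logits are made-up values; `nn.CrossEntropyLoss` additionally averages the per-sample losses over the batch by default:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]   # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target])


logits = [2.0, 0.5]                      # made-up model outputs for one sample
print(softmax(logits))
print(cross_entropy(logits, target=0))   # smaller loss: class 0 is favoured
print(cross_entropy(logits, target=1))   # larger loss: class 1 is unlikely
```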

In [None]:
loss_function = nn.CrossEntropyLoss()

__Define the optimizer__

- We use the SGD optimizer that implements the stochastic gradient descent method.
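
The update rule behind gradient descent can be sketched on a one-parameter toy problem. The function, starting point, and step size below are assumptions for illustration; SGD applies the same rule but estimates the gradient from a mini-batch instead of the full dataset:

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
lr = 0.1     # learning rate (step size)
w = 0.0      # initial guess

for step in range(50):
    grad = 2.0 * (w - 3.0)   # gradient of the loss at the current w
    w = w - lr * grad        # move against the gradient

print(f"w after 50 steps: {w:.4f}")
```

Each step shrinks the distance to the minimum at `w = 3` by a constant factor, which is why the learning rate has such a strong effect on how quickly training converges.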

In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

__Feed train data into the model__

In [None]:
for epoch in range(num_epochs):

    model = model.train()
    for batch_idx, (features, class_labels) in enumerate(train_loader):
        # Predict outputs
        outputs = model(features)

        # Compute the loss function
        loss = loss_function(outputs, class_labels)

        # Reset the gradients left over from the previous batch
        optimizer.zero_grad()
        # Backpropagation: compute the gradients of the loss
        loss.backward()

        # Update model parameters
        optimizer.step()

        ### LOGGING
        print(f'Epoch: {epoch+1:03d}/{num_epochs:03d}'
               f' | Batch {batch_idx:03d}/{len(train_loader):03d}'
               f' | Loss: {loss:.2f}')


## <font color="blue">Evaluating the results</font>

In [None]:
def compute_accuracy(model, dataloader):
    """
    Compute the fraction of correctly classified examples.
    """

    model = model.eval()

    correct = 0.0
    total_examples = 0

    for idx, (features, class_labels) in enumerate(dataloader):

        with torch.no_grad():
            logits = model(features)

        pred = torch.argmax(logits, dim=1)

        compare = class_labels == pred
        correct += torch.sum(compare)
        total_examples += len(compare)

    return correct / total_examples

__Evaluation on the train dataset__

In [None]:
train_acc = compute_accuracy(model, train_loader)

In [None]:
print(f"Train accuracy: {train_acc*100:.2f}%")

__Evaluation on the test dataset__

In [None]:
test_acc = compute_accuracy(model, test_loader)

In [None]:
print(f"Test accuracy: {test_acc*100:.2f}%")

## <font color="blue">Visualize the decision boundary</font>
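
As a sketch of the reasoning, assuming the two-logit model defined earlier: the model outputs one logit per class, $z_k = w_{k,1}\,x_1 + w_{k,2}\,x_2 + b_k$ for $k \in \{0, 1\}$, and `argmax` selects the larger one. The predicted class therefore changes where $z_0 = z_1$, which is a straight line in the $(x_1, x_2)$ plane:

$$
(w_{0,1} - w_{1,1})\,x_1 + (w_{0,2} - w_{1,2})\,x_2 + (b_0 - b_1) = 0
\quad\Longrightarrow\quad
x_2 = -\frac{(w_{0,1} - w_{1,1})\,x_1 + (b_0 - b_1)}{w_{0,2} - w_{1,2}}
$$

Evaluating $x_2$ at two values of $x_1$ gives the two endpoints needed to draw the line.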

In [None]:
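def plot_boundary(model):
    """
    Return two points (x1_min, x1_max, x2_min, x2_max) on the decision boundary.
    """
    weight = model.linear1.weight.detach()
    bias = model.linear1.bias.detach()

    # With two output logits, the predicted class changes where both logits
    # are equal: (w_0 - w_1) . x + (b_0 - b_1) = 0
    w1, w2 = weight[0] - weight[1]
    b = bias[0] - bias[1]

    x1_min = -20
    x2_min = (-(w1 * x1_min) - b) / w2

    x1_max = 20
    x2_max = (-(w1 * x1_max) - b) / w2

    return x1_min, x1_max, x2_min, x2_max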

In [None]:
boundary = plot_boundary(model)
boundary

__Classification on the train dataset__

In [None]:
plot_classes(X_train, y_train, boundary=boundary)

__Classification on the test dataset__

In [None]:
plot_classes(X_test, y_test, boundary=boundary)