# 1. Training a Neural Network with PyTorch

Now that you've learned the key components of a neural network, you'll train one using a training loop. You'll explore potential issues like vanishing gradients and learn strategies to address them, such as alternative activation functions and tuning learning rate and momentum. 

## 1.1 Import Libraries

In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss
import torch.optim as optim
from torch.utils.data import TensorDataset
import numpy as np
import pandas as pd

## 1.2 User Variables

In [28]:
animals = pd.read_csv("../datasets/animals.csv")
animals.head(2)

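| Unnamed: 0 | animal_name | hair | feathers | eggs | milk | predator | legs | tail | type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | sparrow | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 |
| 1 | eagle | 0 | 1 | 1 | 0 | 1 | 2 | 1 | 0 |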


In [29]:
dataloader_df = pd.read_csv("../datasets/dataloader_1.csv")
dataloader_df.head(2)

| Unnamed: 0 | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.109915 |
| 1 | 3.0 | 0.0 | 1.0 | 1.0 | 0.36 |


# 2. Exercises

## 2.1 Using TensorDataset

### Description

Structuring your data into a dataset is one of the first steps in training a PyTorch neural network. ``TensorDataset`` simplifies this by wrapping tensors (for example, ones converted from NumPy arrays) so that indexing the dataset returns one sample together with its label.

In this exercise, you'll create a ``TensorDataset`` using the preloaded ``animals`` dataset and inspect its structure.
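As a small illustration of how ``TensorDataset`` behaves, the sketch below wraps two toy arrays in a dataset and reads one sample back. The array values and the names ``toy_features``, ``toy_labels``, and ``toy_dataset`` are made up for this example and are not part of the ``animals`` data. It assumes the arrays are converted with ``torch.from_numpy`` first, since ``TensorDataset`` expects tensors that share the same first dimension.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

# Toy arrays, made up for illustration: 3 samples with 2 features each
toy_features = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
toy_labels = np.array([0, 1, 2])

# TensorDataset wraps tensors, so convert the NumPy arrays first
toy_dataset = TensorDataset(torch.from_numpy(toy_features), torch.from_numpy(toy_labels))

# The dataset length is the size of the shared first dimension
num_samples = len(toy_dataset)

# Indexing returns one (features, label) tuple
sample_features, sample_label = toy_dataset[1]
print(num_samples, sample_features, sample_label)
```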

### Instructions

* Convert ``X`` and ``y`` into tensors and create a ``TensorDataset``.
* Access and print the first sample.

In [None]:
import torch
from torch.utils.data import TensorDataset

X = animals.iloc[:, 1:-1].to_numpy()  
y = animals.iloc[:, -1].to_numpy()

# Create a dataset
dataset = TensorDataset(torch.tensor(X), torch.tensor(y))

# Print the first sample
input_sample, label_sample = dataset[0]
print('Input sample:', input_sample)
print('Label sample:', label_sample)

Input sample: tensor([0, 1, 1, 0, 0, 2, 1])
Label sample: tensor(0)


In [8]:
X

array([[0, 1, 1, 0, 0, 2, 1],
       [0, 1, 1, 0, 1, 2, 1],
       [1, 0, 0, 1, 1, 4, 1],
       [1, 0, 0, 1, 0, 4, 1],
       [0, 0, 1, 0, 1, 4, 1]])

In [9]:
y

array([0, 0, 1, 1, 2])

In [10]:
dataset[0]

(tensor([0, 1, 1, 0, 0, 2, 1]), tensor(0))

## 2.2 Using DataLoader

### Description

The ``DataLoader`` class wraps a dataset and serves it in mini-batches, optionally reshuffling the samples every epoch. Batching keeps memory use bounded on large datasets and averages each gradient over several samples, which tends to give steadier updates than training on one sample at a time.

Now, you'll create a PyTorch ``DataLoader`` using the ``dataset`` from the previous exercise and see it in action.
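To make the batching behavior concrete, here is a minimal sketch on a made-up five-sample dataset (the name ``toy_dataset`` and its values are assumptions for illustration only). It shows that a batch size of two over five samples gives three batches, the last one smaller, and that ``drop_last=True`` discards that final partial batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset, made up for illustration: 5 samples with 3 features each
toy_dataset = TensorDataset(torch.arange(15.0).reshape(5, 3), torch.arange(5))

# shuffle=False keeps the order fixed so the batch sizes are easy to follow
loader = DataLoader(toy_dataset, batch_size=2, shuffle=False)
batch_sizes = [len(labels) for _, labels in loader]

# drop_last=True discards the final, smaller batch
loader_drop = DataLoader(toy_dataset, batch_size=2, shuffle=False, drop_last=True)

print(len(loader), batch_sizes, len(loader_drop))
```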

### Instructions

* Import the required module.
* Create a ``DataLoader`` using ``dataset``, setting a batch size of two and enabling shuffling.
* Iterate through the ``DataLoader`` and print each batch of inputs and labels.

In [12]:
from torch.utils.data import DataLoader

# Create a DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
)

# Iterate over the dataloader
for batch_inputs, batch_labels in dataloader:
    print('batch_inputs:', batch_inputs)
    print('batch_labels:', batch_labels)

batch_inputs: tensor([[1, 0, 0, 1, 1, 4, 1],
        [0, 0, 1, 0, 1, 4, 1]])
batch_labels: tensor([1, 2])
batch_inputs: tensor([[0, 1, 1, 0, 0, 2, 1],
        [0, 1, 1, 0, 1, 2, 1]])
batch_labels: tensor([0, 0])
batch_inputs: tensor([[1, 0, 0, 1, 0, 4, 1]])
batch_labels: tensor([1])


## 2.3 Using the MSELoss

### Description

For regression problems, you often use Mean Squared Error (MSE) as a loss function instead of cross-entropy. MSE is the mean of the squared differences between the predicted values (``y_pred``) and the actual values (``y``). Now, you'll compute MSE loss using both NumPy and PyTorch.

The ``torch``, ``numpy`` (as ``np``), and ``torch.nn`` (as ``nn``) packages are already imported.
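As a sketch of what "mean of the squared differences" means step by step, the snippet below uses the same four values as this exercise and asks ``torch.nn.functional.mse_loss`` to return the per-element squared errors before averaging them. The use of ``reduction="none"`` here is an illustrative choice and not part of the exercise itself.

```python
import torch
import torch.nn.functional as F

y_pred = torch.tensor([3.0, 5.0, 2.5, 7.0])
y = torch.tensor([3.0, 4.5, 2.0, 8.0])

# reduction="none" keeps one squared error per element
squared_errors = F.mse_loss(y_pred, y, reduction="none")

# The default reduction, "mean", averages those squared errors
mse = F.mse_loss(y_pred, y)

print(squared_errors, mse)
```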

### Instructions

* Calculate the MSE loss using NumPy.
* Create an MSE loss function using PyTorch.
* Convert ``y_pred`` and ``y`` to tensors, then calculate the MSE loss as ``mse_pytorch``.

In [15]:
y_pred = np.array([3, 5.0, 2.5, 7.0])  
y = np.array([3.0, 4.5, 2.0, 8.0])     

# Calculate MSE using NumPy
mse_numpy = np.mean((y_pred - y)**2)

# Create the MSELoss function in PyTorch
criterion = nn.MSELoss()

# Calculate MSE using PyTorch
mse_pytorch = criterion(torch.tensor(y_pred), torch.tensor(y))

print("MSE (NumPy):", mse_numpy)
print("MSE (PyTorch):", mse_pytorch)

MSE (NumPy): 0.375
MSE (PyTorch): tensor(0.3750, dtype=torch.float64)


### Practice: Manual functions

In [14]:
# MSE
def mean_squared_loss(prediction, target):
    """Return the mean of the squared element-wise differences."""
    # Relies on the notebook-level `import numpy as np`
    return np.mean((prediction - target)**2)

## 2.4 Writing a training loop

### Description

In ``scikit-learn``, the training loop is wrapped in the ``.fit()`` method; in PyTorch, you write it yourself. This takes more code, but it gives you control over every step of training.

In this exercise, you'll create a loop to train a model for salary prediction.

The ``show_results()`` function is provided to help you visualize some sample predictions.

The package imports provided are: pandas as ``pd``, ``torch``, ``torch.nn`` as ``nn``, ``torch.optim`` as ``optim``, as well as ``DataLoader`` and ``TensorDataset`` from ``torch.utils.data``.

The following variables have been created: ``num_epochs``, containing the number of epochs (set to 5); ``dataloader``, containing the dataloader; ``model``, containing the neural network; ``criterion``, containing the loss function, ``nn.MSELoss()``; ``optimizer``, containing the SGD optimizer.

### Instructions

* Write a for loop that iterates over the ``dataloader``; this should be nested within a for loop that iterates over a range equal to the number of epochs.
* Set the gradients of the optimizer to zero.
* Compute the loss using the ``criterion()`` function, then compute the gradients.
* Update the model's parameters.
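The steps above can be sketched as a reusable function that plays a role similar to ``.fit()``. This is only an illustration: the helper name ``fit``, the synthetic data (``y = 2x + 1``), and the names ``toy_loader``, ``toy_model``, and ``toy_optimizer`` are assumptions and do not come from the exercise, which uses its own salary data.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def fit(model, dataloader, criterion, optimizer, num_epochs):
    """Hypothetical helper: run the training loop and return the mean loss per epoch."""
    history = []
    for _ in range(num_epochs):
        running_loss = 0.0
        for features, targets in dataloader:
            optimizer.zero_grad()                       # clear old gradients
            loss = criterion(model(features), targets)  # forward pass and loss
            loss.backward()                             # compute gradients
            optimizer.step()                            # update parameters
            running_loss += loss.item()
        history.append(running_loss / len(dataloader))
    return history


torch.manual_seed(0)

# Synthetic data, made up for illustration: y = 2x + 1, targets shaped (N, 1)
x = torch.linspace(0, 1, 16).unsqueeze(1)
y = 2 * x + 1

toy_loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True)
toy_model = nn.Linear(1, 1)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

history = fit(toy_model, toy_loader, nn.MSELoss(), toy_optimizer, num_epochs=20)
print(history[0], history[-1])
```

Returning the per-epoch mean loss is a design choice made here so that progress can be checked afterwards; the loop itself follows the same four steps as the instructions.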

In [16]:
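def show_results(model, dataloader):
    # Switch to evaluation mode and print predictions for the first three batches
    model.eval()
    iter_loader = iter(dataloader)
    with torch.no_grad():  # gradients are not needed when only inspecting predictions
        for _ in range(3):
            feature, target = next(iter_loader)
            preds = model(feature)

            for p, t in zip(preds, target):
                print(f'Ground truth salary: {t.item():.3f}. Predicted salary: {p.item():.3f}.')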

In [20]:
# Variables

num_epochs = 5

dataloader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
)

model = nn.Sequential(
  nn.Linear(4, 2),
  nn.Sigmoid(),
  nn.Linear(2, 1)
)

optimizer = optim.SGD(model.parameters(), lr=0.001)

In [26]:
def create_datarows_from_dataloader(dataloader):
    # Prepare a list to hold converted data rows
    data_rows = []

    # Iterate over batches in the DataLoader
    for batch in dataloader:
        # Each element in batch could be a tensor; convert components to numpy arrays
        # If batch is a tensor, convert entire batch to numpy arrays row-wise
        # If batch is a tuple (e.g. inputs, labels), convert each separately
        if isinstance(batch, torch.Tensor):
            # Convert tensor batch to numpy and iterate rows
            for row in batch.numpy():
                data_rows.append(row)
        elif isinstance(batch, (list, tuple)):
            # Convert each tensor component in the tuple to numpy arrays
            # Combine them row-wise as needed (example for inputs and labels)
            inputs, labels = batch
            inputs_np = inputs.numpy()
            labels_np = labels.numpy()
            for i in range(len(inputs_np)):
                # np.atleast_1d lets this work for scalar labels as well as label vectors
                row = list(inputs_np[i]) + list(np.atleast_1d(labels_np[i]))  # concatenate as a row
                data_rows.append(row)

    return data_rows

In [44]:
X = dataloader_df.iloc[:, :-1].to_numpy()  
y = dataloader_df.iloc[:, -1].to_numpy()

# Create a dataset. Cast to float32 (torch.float) rather than float64 (torch.double);
# otherwise the dtype mismatch with the model's weights causes an error in model(feature).
dataset = TensorDataset(torch.from_numpy(X).float(), torch.from_numpy(y).float())
# Create a DataLoader
dataloader = DataLoader(dataset, batch_size = 4, shuffle=True)

In [46]:
# Loop over the number of epochs and the dataloader
for i in range(num_epochs):
  for data in dataloader:
    # Set the gradients to zero
    optimizer.zero_grad()
    # Run a forward pass
    feature, target = data
    prediction = model(feature)    
    # Compute the loss
    loss = criterion(prediction, target)    
    # Compute the gradients
    loss.backward()
    # Update the model's parameters
    optimizer.step()
show_results(model, dataloader)

  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)


Ground truth salary: 0.133. Predicted salary: 0.084.
Ground truth salary: 0.167. Predicted salary: 0.153.
Ground truth salary: 0.242. Predicted salary: 0.153.
Ground truth salary: 0.165. Predicted salary: 0.111.
Ground truth salary: 0.133. Predicted salary: 0.153.
Ground truth salary: 0.128. Predicted salary: 0.122.
Ground truth salary: 0.150. Predicted salary: 0.109.
Ground truth salary: 0.197. Predicted salary: 0.186.
Ground truth salary: 0.182. Predicted salary: 0.153.
Ground truth salary: 0.228. Predicted salary: 0.153.
Ground truth salary: 0.112. Predicted salary: 0.071.
Ground truth salary: 0.060. Predicted salary: 0.092.
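The two ``return F.mse_loss(...)`` lines in the output above look like the tail of a PyTorch ``UserWarning`` about the target and input sizes differing. This is an inference, since the warning text itself is not shown. It would fit the shapes in the loop: the model ends in ``nn.Linear(2, 1)``, so ``prediction`` has shape ``(batch, 1)``, while ``target`` comes from a 1-D tensor and has shape ``(batch,)``. The sketch below uses made-up values to show how such a mismatch broadcasts and changes the loss, and how ``target.unsqueeze(1)`` is one possible way to align the shapes.

```python
import warnings

import torch
import torch.nn as nn

criterion = nn.MSELoss()

# Made-up values: predictions shaped (4, 1), as produced by a final nn.Linear(..., 1)
prediction = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
# Targets shaped (4,), as produced by a 1-D label tensor
target = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Mismatched shapes broadcast to (4, 4), so every prediction is compared
# with every target; the warning is silenced here only to keep the output clean
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    mismatched_loss = criterion(prediction, target)

# Adding a trailing dimension to the target aligns the shapes element by element
aligned_loss = criterion(prediction, target.unsqueeze(1))

print(mismatched_loss.item(), aligned_loss.item())
```

In this toy case the predictions equal the targets, so the aligned loss is zero while the broadcast loss is not. Applying the same alignment in the training loop would change the numbers printed by ``show_results()``.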


### Notes

* Inspect `dataloader.__dict__` to see the settings, such as the batch size, that DataCamp used to build the dataloader.
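A minimal sketch of that inspection, on a made-up dataloader (``toy_loader`` and its contents are assumptions for illustration). The full ``__dict__`` is long, so the example keeps only a few of its entries.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up dataloader: 6 samples with 4 features each
toy_loader = DataLoader(
    TensorDataset(torch.zeros(6, 4), torch.zeros(6)),
    batch_size=4,
    shuffle=True,
)

# vars(obj) returns obj.__dict__; keep only a few readable settings
settings = {
    name: value
    for name, value in vars(toy_loader).items()
    if name in ("batch_size", "drop_last", "num_workers")
}
print(settings)

# The same settings are also available as ordinary attributes
print(toy_loader.batch_size, len(toy_loader.dataset))
```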

## 2.5 Implementing ReLU

### Description

The Rectified Linear Unit (ReLU) is a widely used activation function in deep learning. Because its gradient is 1 for every positive input, it helps mitigate the vanishing gradients problem.

In this exercise, you'll implement ReLU in PyTorch, apply it to both positive and negative values, and observe the results.

The ``torch.nn`` package has already been imported for you as ``nn``.
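To see why ReLU helps with vanishing gradients, the sketch below compares the gradients of sigmoid and ReLU at a few made-up inputs (the values and the names ``x_sig`` and ``x_relu`` are assumptions for illustration). Sigmoid's gradient shrinks toward zero for large positive or negative inputs, while ReLU's gradient stays at 1 for any positive input.

```python
import torch

# Made-up inputs: one strongly negative, one small, one strongly positive
x_sig = torch.tensor([-10.0, 0.5, 10.0], requires_grad=True)
x_relu = torch.tensor([-10.0, 0.5, 10.0], requires_grad=True)

# Summing gives a scalar, so backward() yields d(activation)/dx per element
torch.sigmoid(x_sig).sum().backward()
torch.relu(x_relu).sum().backward()

sigmoid_grad = x_sig.grad  # close to zero at both extremes
relu_grad = x_relu.grad    # 1 for positive inputs, 0 for negative ones

print(sigmoid_grad, relu_grad)
```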

### Instructions

* Create a ReLU function in PyTorch.
* Apply the ReLU function to both ``x_pos`` and ``x_neg``.