# Custom dataloader

### Custom DataLoader Example

This notebook illustrates how to iterate efficiently through a custom image dataset. For this, suppose that
- `mnist_train`, `mnist_valid`, and `mnist_test` are image folders you created with your own custom images
- `mnist_train.csv`, `mnist_valid.csv`, and `mnist_test.csv` are tables that store the image file names with their associated class labels
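
The dataset class defined later in this notebook reads two columns from each table, `File Name` and `Class Label`. As a minimal sketch of that layout, the snippet below writes and re-reads such a table with the standard library only. The rows are made up for illustration and are not taken from the real CSV files.

```python
import csv
import io

# Hypothetical rows; the real tables are assumed to follow the same two-column layout.
rows = [
    {"File Name": "0.png", "Class Label": 5},
    {"File Name": "1.png", "Class Label": 0},
    {"File Name": "2.png", "Class Label": 4},
]

# Write the table to an in-memory buffer instead of touching the disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["File Name", "Class Label"])
writer.writeheader()
writer.writerows(rows)

# Read it back the way a loader would: one (file name, label) pair per row.
buffer.seek(0)
pairs = [(r["File Name"], int(r["Class Label"])) for r in csv.DictReader(buffer)]
print(pairs)
```

Each pair corresponds to one sample: the file name is joined with the image folder to locate the image, and the integer is its class label.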

# 1) Inspecting the Dataset

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image

In [None]:
im = Image.open('data/mnist/mnist_train/1.png')
plt.imshow(im, cmap='binary')

In [None]:
import numpy as np

im_array = np.array(im)
print('Array Dimensions', im_array.shape)
print()
print(im_array)

In [None]:
import pandas as pd

In [None]:
df_train = pd.read_csv('data/mnist/mnist_train.csv')
print(df_train.shape)
df_train.head()

In [None]:
df_test = pd.read_csv('data/mnist/mnist_test.csv')
print(df_test.shape)
df_test.head()

# 2) Custom Dataset Class

In [None]:
import torch
from PIL import Image
from torch.utils.data import Dataset
import os



class MyDataset(Dataset):

    def __init__(self, csv_path, img_dir, transform=None):
    
        df = pd.read_csv(csv_path)
        self.img_dir = img_dir
        self.img_names = df['File Name']
        self.y = df['Class Label']
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(os.path.join(self.img_dir,
                                      self.img_names[index]))
        
        if self.transform is not None:
            img = self.transform(img)
        
        label = self.y[index]
        return img, label

    def __len__(self):
        return self.y.shape[0]

# 3) Custom Dataloader

In [None]:
from torchvision import transforms
from torch.utils.data import DataLoader


# Note that transforms.ToTensor()
# already divides pixels by 255. internally

custom_transform = transforms.Compose([#transforms.Lambda(lambda x: x/255.), # not necessary
                                       transforms.ToTensor()
                                      ])

train_dataset = MyDataset(csv_path='data/mnist/mnist_train.csv',
                          img_dir='data/mnist/mnist_train',
                          transform=custom_transform)

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=32,
                          drop_last=True,
                          shuffle=True, # want to shuffle the dataset
                          num_workers=0) # number of worker processes for loading (0 = load in the main process)
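
The `batch_size` and `drop_last` settings determine how many batches the loader yields per epoch. The sketch below works through that arithmetic in plain Python; the dataset size of 1000 is a hypothetical value chosen so that 32 does not divide it evenly.

```python
import math

def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a loader yields per epoch for a map-style dataset."""
    if drop_last:
        return num_samples // batch_size          # incomplete final batch is discarded
    return math.ceil(num_samples / batch_size)    # incomplete final batch is kept

# Hypothetical dataset size, chosen so that 32 does not divide it evenly.
n = 1000
full_only = batches_per_epoch(n, 32, drop_last=True)    # 31 full batches, 8 samples skipped
with_tail = batches_per_epoch(n, 32, drop_last=False)   # 32 batches, the last one has 8 samples
print(full_only, with_tail)
```

With `drop_last=True`, every batch has exactly `batch_size` samples, at the cost of skipping the remainder in each epoch.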

In [None]:
valid_dataset = MyDataset(csv_path='data/mnist/mnist_valid.csv',
                          img_dir='data/mnist/mnist_valid',
                          transform=custom_transform)

valid_loader = DataLoader(dataset=valid_dataset,
                          batch_size=100,
                          shuffle=False,
                          num_workers=0)



test_dataset = MyDataset(csv_path='data/mnist/mnist_test.csv',
                         img_dir='data/mnist/mnist_test',
                         transform=custom_transform)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=100,
                         shuffle=False,
                         num_workers=0)

# 4) Iterating Through the Dataset

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

num_epochs = 2
for epoch in range(num_epochs):

    for batch_idx, (x, y) in enumerate(train_loader):
        
        print('Epoch:', epoch+1, end='')
        print(' | Batch index:', batch_idx, end='')
        print(' | Batch size:', y.size()[0])
        
        x = x.to(device)
        y = y.to(device)

In [None]:
print(x.shape)

In [None]:
x_image_as_vector = x.view(-1, 28*28)
print(x_image_as_vector.shape)

In [None]:
x

# Now let's train a model

#### Import the libraries

In [None]:
import os
import glob
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms
from PIL import Image
from tqdm import tqdm
import matplotlib.pyplot as plt
import pandas as pd


#### Define Custom Dataset Class
This class creates a custom dataset. It is a subclass of `torch.utils.data.Dataset`. It requires a CSV file with the image file names and class labels, a directory with the images, and optional transformations. It has two main methods: `__len__` returns the number of samples in the dataset, and `__getitem__` loads and returns an image and its corresponding label.

### Dataset Structure

For the `MyDataset` class to work properly, the dataset should be structured in a specific way. Below is the expected directory structure:

```
data/mnist/
├── mnist_train.csv
├── mnist_valid.csv
├── mnist_test.csv
├── mnist_train/
│   ├── 0.png
│   ├── 1.png
│   └── ...
├── mnist_valid/
│   ├── 288.png
│   ├── 999.png
│   └── ...
└── mnist_test/
    ├── 100.png
    ├── 200.png
    └── ...
```


In [None]:
# Custom Dataset Class
class MyDataset(Dataset):

    def __init__(self, csv_path, img_dir, transform=None):
    
        df = pd.read_csv(csv_path)
        self.img_dir = img_dir
        self.img_names = df['File Name']
        self.y = df['Class Label']
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(os.path.join(self.img_dir,
                                      self.img_names[index]))
        
        if self.transform is not None:
            img = self.transform(img)
        
        label = self.y[index]
        return img, label

    def __len__(self):
        return self.y.shape[0]

### Initialization Method

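In the `__init__` method, when an instance of the class is created, it expects:

- `csv_path`: The path to the CSV file that lists the image file names and their class labels.
- `img_dir`: The path to the directory where images are stored.
- `transform`: Optional transformations to apply to the images (defaults to `None`).

It then stores the following attributes:

- `self.img_dir` stores the main directory where images are located.
- `self.transform` stores the transformations to be applied to each image.
- `self.img_names` stores all the file names from the `File Name` column of the CSV file.
- `self.y` stores all the class labels from the `Class Label` column of the CSV file.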

### Length Method

The `__len__` method is required by PyTorch. It returns the total number of samples in the dataset.


### Get Item Method

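The `__getitem__` method is used to retrieve a single item from the dataset given an index `index`.
- It opens the image whose file name is stored at position `index`, joining `self.img_dir` with that file name.
- `self.transform is not None`: If a transformation was provided, it is applied to the image. If it is set to `None`, no transformation is applied.
- Finally, it returns the (possibly transformed) image and its label as a tuple.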
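
To see how `__len__` and `__getitem__` interact with a `DataLoader`, here is a minimal sketch with a hypothetical in-memory dataset. `ToyDataset` and its all-zero "images" are illustrative stand-ins, so nothing is read from disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical in-memory dataset: 10 fake 1x28x28 'images' with integer labels."""

    def __init__(self, transform=None):
        self.images = torch.zeros(10, 1, 28, 28)
        self.labels = torch.arange(10)
        self.transform = transform

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[index]

    def __len__(self):
        return self.labels.shape[0]

# The loader calls __len__ to know how many indices exist and
# __getitem__ once per index, then stacks the samples into batches.
loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False)
batch_shapes = [tuple(x.shape) for x, _ in loader]
print(len(ToyDataset()), batch_shapes)
```

Because 10 is not a multiple of 4 and `drop_last` is left at its default, the final batch contains only the two remaining samples.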



#### Define CNN Model

In [None]:
# CNN Architecture
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.fc_layers = nn.Sequential(
            nn.Linear(in_features=64 * 7 * 7, out_features=128),
            nn.ReLU(),
            nn.Linear(in_features=128, out_features=10)
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = x.view(-1, 64 * 7 * 7)  # flatten the 64 x 7 x 7 feature map
        x = self.fc_layers(x)
        return x
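
The `64 * 7 * 7` input size of the first linear layer follows from the spatial size of the feature map after the two convolution and pooling stages. A small sketch of that arithmetic, assuming 28x28 input images and using a hypothetical helper `conv_out`:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                    # assumed input height/width
size = conv_out(size, kernel=3, padding=1)   # first conv keeps 28
size = conv_out(size, kernel=2, stride=2)    # first max-pool halves it to 14
size = conv_out(size, kernel=3, padding=1)   # second conv keeps 14
size = conv_out(size, kernel=2, stride=2)    # second max-pool halves it to 7

flat_features = 64 * size * size             # 64 output channels from the last conv
print(size, flat_features)
```

If the images were resized before entering the network, the same calculation would give a different feature-map size, and the `in_features` of the first linear layer would have to change accordingly.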

#### Initialize Dataset and Dataloaders

In [None]:
# Data Transformations
# Note that transforms.ToTensor() already divides pixels by 255. internally.
# The images are kept at their original 28x28 size because the first linear
# layer of SimpleCNN expects a 64 * 7 * 7 feature map.
custom_transform = transforms.Compose([
    transforms.ToTensor(),
])

train_dataset = MyDataset(csv_path='data/mnist/mnist_train.csv',
                          img_dir='data/mnist/mnist_train',
                          transform=custom_transform)

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=32,
                          drop_last=True,
                          shuffle=True, # want to shuffle the dataset
                          num_workers=0) # number of worker processes for loading (0 = load in the main process)
valid_dataset = MyDataset(csv_path='data/mnist/mnist_valid.csv',
                          img_dir='data/mnist/mnist_valid',
                          transform=custom_transform)

valid_loader = DataLoader(dataset=valid_dataset,
                          batch_size=100,
                          shuffle=False,
                          num_workers=0)



test_dataset = MyDataset(csv_path='data/mnist/mnist_test.csv',
                         img_dir='data/mnist/mnist_test',
                         transform=custom_transform)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=100,
                         shuffle=False,
                         num_workers=0)


#### Initialize the Model, Loss Function, and Optimizer

In [None]:
# Initialize the Model
model = SimpleCNN()

# Loss Function and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


#### Training Loop

In [None]:
# Training Loop
num_epochs = 10
train_losses = []
train_accuracies = []
val_accuracies = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0
    for images, labels in tqdm(train_loader, desc=f'Epoch {epoch + 1}'):
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        
        _, predicted = torch.max(outputs.data, 1)
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    train_losses.append(running_loss / len(train_loader))
    train_accuracy = 100 * correct_train / total_train
    train_accuracies.append(train_accuracy)
    
    print(f'Training Loss: {running_loss/len(train_loader)}, Training Accuracy: {train_accuracy}%')
    
    # Validation
    model.eval()
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for images, labels in tqdm(valid_loader, desc='Validation'):
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_accuracy = 100 * correct_val / total_val
    val_accuracies.append(val_accuracy)
    print(f'Validation Accuracy: {val_accuracy}%')
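
The accuracy bookkeeping in the loop relies on `torch.max` along the class dimension. The sketch below isolates that step on hypothetical logits and labels (made-up values for 4 samples and 3 classes) and checks that it matches `argmax`.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes, with made-up labels.
outputs = torch.tensor([[2.0, 0.1, 0.3],
                        [0.2, 1.5, 0.1],
                        [0.3, 0.2, 0.9],
                        [1.2, 0.4, 0.8]])
labels = torch.tensor([0, 1, 1, 0])

# torch.max along dim=1 returns (max values, their indices); the indices are the predictions.
_, predicted = torch.max(outputs, 1)
same_as_argmax = torch.equal(predicted, outputs.argmax(dim=1))

correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(predicted.tolist(), same_as_argmax, accuracy)
```

Three of the four predictions match the labels here, so the accuracy is 75%.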




In [None]:
# Plotting the results
plt.figure(figsize=(16, 8))

# Plotting the training loss
plt.subplot(2, 2, 1)
plt.plot(train_losses, label='Training Loss', color='blue')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training Loss')

# Plotting the training accuracy
plt.subplot(2, 2, 2)
plt.plot(train_accuracies, label='Training Accuracy', color='green')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.title('Training Accuracy')

# Plotting the validation accuracy
plt.subplot(2, 2, 4)
plt.plot(val_accuracies, label='Validation Accuracy', color='orange')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.legend()
plt.title('Validation Accuracy')

plt.tight_layout()
plt.show()

### Custom Loss:
Why do we need a custom loss function?

In PyTorch, a custom loss function can be useful in several scenarios:

   * **Non-standard loss:** Sometimes, the standard loss functions provided by PyTorch may not be suitable for your specific task or problem. In such cases, you can define a custom loss function that incorporates the specific requirements of your task.

   * **Domain-specific loss:** If you are working on a problem in a specific domain, you may have domain-specific knowledge that can be leveraged to design a more effective loss function. For example, in computer vision tasks, you may want to penalize certain types of errors more heavily based on their importance in the domain.

   * **Multi-objective optimization:** In some cases, you may have multiple objectives to optimize simultaneously. In such situations, you can define a custom loss function that combines multiple objectives into a single loss value. This allows you to guide the training process by balancing the importance of different objectives.

   * **Advanced loss computations:** Custom loss functions provide flexibility in implementing complex loss computations that involve additional operations or metrics beyond simple element-wise comparisons. This can include calculations involving attention mechanisms, label smoothing, or other advanced techniques.

   * **Research and experimentation:** When developing new models or exploring innovative approaches, you may need to design novel loss functions to match your proposed architectures or algorithms. Custom loss functions enable you to experiment and evaluate your ideas effectively.

By creating a custom loss function in PyTorch, you have the freedom to tailor the loss calculation according to your specific requirements, allowing you to optimize your models more effectively for the task at hand.

### PyTorch Custom Loss Function Implementation

To write a custom loss class in PyTorch, you need to create a subclass of the `torch.nn.Module` class and implement the `forward` method. The `forward` method takes the model's predicted output and the target values as input and computes the loss.

Here's an explanation and implementation of a custom loss class using a CNN on the MNIST dataset in PyTorch:

#### Importing Required Libraries
This block imports the necessary libraries for the code, including PyTorch modules (torch, nn, optim), torchvision, transforms for data preprocessing, torch.nn.functional (F), matplotlib for visualization, and tqdm for the progress bar.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F
import matplotlib.pyplot as plt
from tqdm import tqdm

#### Define Custom Loss Class
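This block defines a custom loss class `CustomCrossEntropyLoss`. The class inherits from `nn.Module`. In the `forward` method, it takes the model's raw outputs (logits) and the integer class targets as input, applies a log-softmax, and computes the cross-entropy loss as the mean negative log-probability of the true classes.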
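
Written out, the quantity computed in `forward` can be expressed as follows. The symbols are introduced here for illustration: $z_{i,c}$ is the logit of sample $i$ for class $c$, $y_i$ is the true class index, $N$ is the batch size, and $C$ is the number of classes.

```latex
\mathcal{L}_{\mathrm{CE}}
  = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{\exp\left(z_{i,y_i}\right)}{\sum_{c=1}^{C} \exp\left(z_{i,c}\right)}
```

The fraction inside the logarithm is the softmax probability assigned to the true class, so the loss is small when that probability is close to 1.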

In [None]:
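class CustomCrossEntropyLoss(nn.Module):
    def __init__(self, weight=None, reduction='mean'):
        super().__init__()
        self.weight = weight        # optional per-class weights, shape (num_classes,)
        self.reduction = reduction  # 'mean', 'sum', or 'none'

    def forward(self, predicted, target):
        # Compute the log softmax of the raw model outputs (logits)
        log_softmax_input = torch.log_softmax(predicted, dim=1)

        # Calculate the negative log probabilities of the true classes
        rows = torch.arange(target.shape[0], device=predicted.device)
        neg_log_probabilities = -log_softmax_input[rows, target]

        # You can apply additional custom modifications to the log probabilities if needed

        # Apply the optional class weights (all ones if no weights were given)
        sample_weights = torch.ones_like(neg_log_probabilities)
        if self.weight is not None:
            sample_weights = self.weight.to(predicted)[target]
        weighted = sample_weights * neg_log_probabilities

        # Apply the requested reduction
        if self.reduction == 'mean':
            return weighted.sum() / sample_weights.sum()
        if self.reduction == 'sum':
            return weighted.sum()
        return weighted  # reduction == 'none'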
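
A quick way to gain confidence in a hand-written cross-entropy is to compare it with PyTorch's built-in `F.cross_entropy` on random inputs. The sketch below does this with a standalone function, `manual_cross_entropy` (a hypothetical name), that mirrors the unweighted mean-reduced computation. The batch of 8 samples and 10 classes is made up.

```python
import math

import torch
import torch.nn.functional as F

def manual_cross_entropy(logits, targets):
    """Mean negative log-likelihood of the true class, computed from raw logits."""
    log_probs = torch.log_softmax(logits, dim=1)
    rows = torch.arange(targets.shape[0])
    return -log_probs[rows, targets].mean()

torch.manual_seed(0)
logits = torch.randn(8, 10)           # hypothetical batch: 8 samples, 10 classes
targets = torch.randint(0, 10, (8,))  # hypothetical integer class labels

ours = manual_cross_entropy(logits, targets)
reference = F.cross_entropy(logits, targets)
matches = torch.allclose(ours, reference, atol=1e-6)

# Sanity check: with all-zero logits every class has probability 1/10,
# so the loss should equal log(10) regardless of the targets.
uniform = manual_cross_entropy(torch.zeros(2, 10), torch.tensor([3, 7]))
print(matches, uniform.item(), math.log(10))
```

The uniform-logits check is a handy reference point during training: a 10-class model that has learned nothing yet should start near a loss of `log(10)`.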

In [None]:
# CNN Architecture
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.fc_layers = nn.Sequential(
            nn.Linear(in_features=64 * 7 * 7, out_features=128),
            nn.ReLU(),
            nn.Linear(in_features=128, out_features=10)
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = x.view(-1, 64 * 7 * 7)
        x = self.fc_layers(x)
        return x

In [None]:
# Define data transformation
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# Load the MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)

# Split the training dataset into training and validation sets
# You can adjust the split ratio as needed
total_train_samples = len(train_dataset)
train_ratio = 0.8  # 80% for training
train_size = int(train_ratio * total_train_samples)
val_size = total_train_samples - train_size

train_dataset, val_dataset = torch.utils.data.random_split(train_dataset, [train_size, val_size])

# Create data loaders for training, validation, and testing
batch_size = 64

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size)

# Checking dataset sizes
print(f"Training dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")
print(f"Testing dataset size: {len(test_dataset)}")
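
As written, `random_split` draws a different partition on every run. If a repeatable train/validation split is wanted, one option is to pass a seeded `torch.Generator`. The sketch below uses a hypothetical 100-sample stand-in dataset and an arbitrary seed of 42.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in dataset with 100 samples.
dataset = TensorDataset(torch.arange(100))

train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size

# Passing a seeded generator makes the split repeatable across runs.
split_a = random_split(dataset, [train_size, val_size],
                       generator=torch.Generator().manual_seed(42))
split_b = random_split(dataset, [train_size, val_size],
                       generator=torch.Generator().manual_seed(42))

same_split = list(split_a[1].indices) == list(split_b[1].indices)
print(len(split_a[0]), len(split_a[1]), same_split)
```

Fixing the split this way keeps validation metrics comparable between runs, since the held-out samples stay the same.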


In [None]:
# Initialize the Model
model = SimpleCNN()

# Loss Function and Optimizer
#criterion = nn.CrossEntropyLoss()
criterion = CustomCrossEntropyLoss() # Using the custom cross-entropy loss defined above
optimizer = optim.Adam(model.parameters(), lr=0.001)


#### Training Loop

In [None]:
# Training parameters
num_epochs = 10
train_losses = []
train_accuracies = []
val_losses = []
val_accuracies = []

# Training Loop
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0
    
    for images, labels in tqdm(train_loader, desc=f'Epoch {epoch + 1}'):
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        
        _, predicted = torch.max(outputs.data, 1)
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    train_losses.append(running_loss / len(train_loader))
    train_accuracy = 100 * correct_train / total_train
    train_accuracies.append(train_accuracy)
    
    print(f'Training Loss: {running_loss/len(train_loader)}, Training Accuracy: {train_accuracy}%')
    
    # Validation
    model.eval()
    correct_val = 0
    total_val = 0
    running_val_loss = 0.0
    
    with torch.no_grad():
        for images, labels in tqdm(val_loader, desc='Validation'):
            outputs = model(images)
            loss = criterion(outputs, labels)
            running_val_loss += loss.item()
            
            _, predicted = torch.max(outputs.data, 1)
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_loss = running_val_loss / len(val_loader)
    val_losses.append(val_loss)
    val_accuracy = 100 * correct_val / total_val
    val_accuracies.append(val_accuracy)
    
    print(f'Validation Loss: {val_loss}, Validation Accuracy: {val_accuracy}%')


## Ensemble Loss

Ensembling is a technique used in machine learning to combine the predictions of multiple models, called an ensemble, to improve overall performance and robustness. Instead of relying on a single model's predictions, an ensemble aggregates the outputs of its members to make a final decision. This approach can often lead to better generalization and more accurate predictions, especially when individual models have different strengths and weaknesses. In this notebook, the term "ensemble loss" is used by analogy: instead of combining several models, it combines several loss functions into one weighted training objective for a single model.

### Applications and Benefits

1. **Improved Performance**: Ensemble methods can enhance the predictive accuracy of models, as the ensemble's combined decision tends to be more reliable and less prone to overfitting.

2. **Robustness**: Ensembles can handle noisy or uncertain data more effectively than individual models, as errors in one model are often balanced out by other models.

3. **Model Diversity**: Ensemble techniques work best when the individual models are diverse, meaning they have different architectures or are trained on different subsets of data. This diversity helps capture different patterns in the data.

4. **Reduced Risk**: By combining multiple models, the risk of relying on a single faulty or poorly trained model is mitigated, making the overall prediction more trustworthy.

5. **Ensemble Learning**: Ensemble techniques, like bagging, boosting, and stacking, are widely used in machine learning competitions and real-world applications to achieve state-of-the-art results.

### Code with Ensemble Loss (Mean Absolute Error and Mean Squared Error)

Below is the code for training a CNN model on the CIFAR-10 dataset using ensemble loss. The ensemble loss in this case will be a combination of Mean Absolute Error (MAE) and Mean Squared Error (MSE).
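
The combined objective can be written as a weighted sum of the two terms. The notation is introduced here for illustration: $\hat{y}_{i,c}$ is the model output for sample $i$ and class $c$, $y_{i,c}$ is the one-hot target, $N$ is the batch size, $C$ is the number of classes, and $\alpha \in [0, 1]$ is the weight on the MAE term. The averages assume the default `mean` reduction of `nn.L1Loss` and `nn.MSELoss`, which averages over all elements.

```latex
\mathcal{L}_{\mathrm{ens}}
  = \alpha \, \mathrm{MAE} + (1 - \alpha) \, \mathrm{MSE},
\qquad
\mathrm{MAE} = \frac{1}{NC} \sum_{i=1}^{N} \sum_{c=1}^{C} \left| \hat{y}_{i,c} - y_{i,c} \right|,
\qquad
\mathrm{MSE} = \frac{1}{NC} \sum_{i=1}^{N} \sum_{c=1}^{C} \left( \hat{y}_{i,c} - y_{i,c} \right)^{2}
```

Setting $\alpha = 1$ recovers pure MAE and $\alpha = 0$ recovers pure MSE.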


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# Define the ensemble loss combining MAE and MSE
class EnsembleLoss(nn.Module):
    def __init__(self, alpha=0.5, reduction='mean'):
        super(EnsembleLoss, self).__init__()
        self.alpha = alpha
        self.mae_loss = nn.L1Loss(reduction=reduction)
        self.mse_loss = nn.MSELoss(reduction=reduction)

    def forward(self, output, target):
        # Ensure both output and target tensors have the same batch size
        assert output.shape[0] == target.shape[0], "Output and target batch sizes must match."
        mae_loss = self.mae_loss(output, target)
        mse_loss = self.mse_loss(output, target)
        ensemble_loss = self.alpha * mae_loss + (1 - self.alpha) * mse_loss
        return ensemble_loss
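
To make the weighting concrete, here is a small worked example that can be checked by hand. It uses the functional forms `F.l1_loss` and `F.mse_loss` rather than the class above, and the single two-class sample is hypothetical.

```python
import torch
import torch.nn.functional as F

alpha = 0.7                                         # weight on the MAE term
labels = torch.tensor([0])                          # hypothetical single sample of class 0
target = F.one_hot(labels, num_classes=2).float()   # one-hot target: [[1., 0.]]
output = torch.tensor([[0.5, 0.5]])                 # hypothetical model output

mae = F.l1_loss(output, target)              # mean(|0.5 - 1|, |0.5 - 0|) = 0.5
mse = F.mse_loss(output, target)             # mean(0.25, 0.25) = 0.25
combined = alpha * mae + (1 - alpha) * mse   # 0.7 * 0.5 + 0.3 * 0.25 = 0.425
print(mae.item(), mse.item(), combined.item())
```

Because the errors here are smaller than 1, the squared term is smaller than the absolute term; for errors larger than 1 the relationship flips and the MSE term dominates.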

In [None]:
# Define the CNN model
class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(16 * 16 * 16, 120)  # Adjust the input size based on the image dimensions
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)  # Output size matches the number of classes

    def forward(self, x):
        out = self.conv1(x)
        out = self.relu(out)
        out = self.maxpool(out)
        out = out.view(-1, 16 * 16 * 16)  # Flatten before fully connected layers
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.relu(out)
        out = self.fc3(out)
        return out

# Define the EnsembleLoss class combining MAE and MSE
class EnsembleLoss(nn.Module):
    def __init__(self, alpha=0.5):
        super(EnsembleLoss, self).__init__()
        self.alpha = alpha
        self.mae_loss = nn.L1Loss()
        self.mse_loss = nn.MSELoss()

    def forward(self, output, target):
        mae_loss = self.mae_loss(output, target)
        mse_loss = self.mse_loss(output, target)
        ensemble_loss = self.alpha * mae_loss + (1 - self.alpha) * mse_loss
        return ensemble_loss

# Set the device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Set the hyperparameters
num_epochs = 10
batch_size = 100
learning_rate = 0.001

# Load and preprocess the CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Create an instance of the CNN model and move it to the device
num_classes = 10  # Number of classes in CIFAR-10
cnn = CNN(num_classes=num_classes).to(device)

# Define the ensemble loss function and optimizer
ensemble_criterion = EnsembleLoss(alpha=0.7)
optimizer = optim.Adam(cnn.parameters(), lr=learning_rate)

# Training loop
total_step = len(train_loader)
loss_list = []
accuracy_list = []
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = cnn(images)
        loss = ensemble_criterion(outputs, nn.functional.one_hot(labels, num_classes=num_classes).float())

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            # Calculate training accuracy
            _, predicted = torch.max(outputs.data, 1)
            correct = (predicted == labels).sum().item()
            accuracy = correct / labels.size(0)
            accuracy_list.append(accuracy)
            # Track the loss
            loss_list.append(loss.item())

            print(f"Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{total_step}], Loss: {loss.item()}, Accuracy: {accuracy}")

# Testing
cnn.eval()  # Switch to evaluation mode
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = cnn(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    test_accuracy = correct / total
    print(f"Accuracy on the test set: {test_accuracy}")

# Plotting
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(loss_list, label='Loss')
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(accuracy_list, label='Accuracy', color='orange')
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.legend()

plt.show()


# Comparison: Ensemble loss vs MSE vs MAE

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# Define the CNN model
class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(16 * 16 * 16, 120)  # Adjust the input size based on the image dimensions
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)  # Output size matches the number of classes

    def forward(self, x):
        out = self.conv1(x)
        out = self.relu(out)
        out = self.maxpool(out)
        out = out.view(-1, 16 * 16 * 16)  # Flatten before fully connected layers
        out = self.fc1(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.relu(out)
        out = self.fc3(out)
        return out

# Define the EnsembleLoss class combining MAE and MSE
class EnsembleLoss(nn.Module):
    def __init__(self, alpha=0.5):
        super(EnsembleLoss, self).__init__()
        self.alpha = alpha
        self.mae_loss = nn.L1Loss()
        self.mse_loss = nn.MSELoss()

    def forward(self, output, target):
        mae_loss = self.mae_loss(output, target)
        mse_loss = self.mse_loss(output, target)
        ensemble_loss = self.alpha * mae_loss + (1 - self.alpha) * mse_loss
        return ensemble_loss

# Define the individual loss functions
class MAELoss(nn.Module):
    def __init__(self):
        super(MAELoss, self).__init__()
        self.loss = nn.L1Loss()

    def forward(self, output, target):
        return self.loss(output, target)

class MSELoss(nn.Module):
    def __init__(self):
        super(MSELoss, self).__init__()
        self.loss = nn.MSELoss()

    def forward(self, output, target):
        return self.loss(output, target)

# Set the device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Set the hyperparameters
num_epochs = 10
batch_size = 100
learning_rate = 0.001

# Load and preprocess the CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Create an instance of the CNN model and move it to the device
num_classes = 10  # Number of classes in CIFAR-10
cnn = CNN(num_classes=num_classes).to(device)

# Define the ensemble loss function and optimizer
ensemble_criterion = EnsembleLoss(alpha=0.7)
mae_criterion = MAELoss()
mse_criterion = MSELoss()
optimizer = optim.Adam(cnn.parameters(), lr=learning_rate)

# Define loss function names in a list
loss_function_names = ["Ensemble Loss", "MSE Loss", "MAE Loss"]

# Training loop
total_step = len(train_loader)
loss_lists = [[] for _ in range(len(loss_function_names))]
accuracy_lists = [[] for _ in range(len(loss_function_names))]

for loss_idx, loss_function_name in enumerate(loss_function_names):
    cnn = CNN(num_classes=num_classes).to(device)
    optimizer = optim.Adam(cnn.parameters(), lr=learning_rate)
    criterion = None

    if loss_function_name == "Ensemble Loss":
        criterion = ensemble_criterion
    elif loss_function_name == "MSE Loss":
        criterion = mse_criterion
    elif loss_function_name == "MAE Loss":
        criterion = mae_criterion

    for epoch in range(num_epochs):
        cnn.train()  # Set the model to training mode
        for i, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)

            # Convert labels to one-hot encoding
            labels_one_hot = nn.functional.one_hot(labels, num_classes=num_classes).float()

            # Forward pass
            outputs = cnn(images)
            loss = criterion(outputs, labels_one_hot)

            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Track the losses
            loss_lists[loss_idx].append(loss.item())

            if (i + 1) % 100 == 0:
                _, predicted = torch.max(outputs.data, 1)
                correct = (predicted == labels).sum().item()
                accuracy = correct / labels.size(0)
                accuracy_lists[loss_idx].append(accuracy)

                print(f"Loss Function: {loss_function_name}, Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{total_step}], Loss: {loss.item():.4f}, Accuracy: {accuracy:.4f}")

    # Test with the current loss function
    cnn.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = cnn(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        test_accuracy = correct / total
        print(f"Accuracy on the test set with {loss_function_name}: {test_accuracy:.4f}")

# Plotting
plt.figure(figsize=(10, 5))

# Loss comparison plot
plt.subplot(1, 2, 1)
for loss_idx, loss_function_name in enumerate(loss_function_names):
    plt.plot(loss_lists[loss_idx], label=loss_function_name)
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.legend()

# Accuracy comparison plot
plt.subplot(1, 2, 2)
for loss_idx, loss_function_name in enumerate(loss_function_names):
    plt.plot(accuracy_lists[loss_idx], label=loss_function_name)
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
plt.legend()

plt.show()


## Comparison of Different Loss Functions for CIFAR-10 Classification

In this experiment, we trained a Convolutional Neural Network (CNN) for the CIFAR-10 classification task using three different loss functions: Ensemble Loss, Mean Squared Error (MSE) Loss, and Mean Absolute Error (MAE) Loss. The goal was to compare their performances on the test set and understand how each loss function impacts the model's accuracy and convergence.

### Loss Functions Used:

1. **Ensemble Loss:** A custom loss function that combines MSE and MAE losses. It aims to find a balance between the two error metrics, providing a flexible approach to training the model.

2. **MSE Loss:** The traditional Mean Squared Error loss function, measuring the average squared difference between the model's raw outputs (the network has no softmax layer, so these are not probabilities) and the one-hot encoded target labels.

3. **MAE Loss:** The Mean Absolute Error loss function, measuring the average absolute difference between the model's raw outputs and the one-hot encoded target labels.

### Comparison Results:

1. **MSE Loss:** Achieved the highest accuracy on the test set, demonstrating the best overall performance for this classification task. The sensitivity to large errors due to the squared term provided a strong signal for the model to adjust its parameters effectively.

2. **Ensemble Loss:** Performed second-best among the three loss functions. While it didn't outperform MSE loss, it offered a balanced compromise between MSE and MAE, taking into account both squared and absolute errors.

3. **MAE Loss:** Achieved the lowest accuracy among the three. The MAE loss function's reduced sensitivity to large errors might lead to slower convergence or less optimal parameter adjustments.
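
The sensitivity argument above can be illustrated with the per-element penalty and its gradient for each loss. This is a plain-Python sketch with hypothetical helper names; it ignores the averaging over elements and is not a reproduction of the experiment.

```python
def squared_penalty(error: float) -> float:
    return error ** 2

def absolute_penalty(error: float) -> float:
    return abs(error)

def squared_gradient(error: float) -> float:
    return 2 * error                           # derivative of e^2 grows with the error

def absolute_gradient(error: float) -> float:
    return float((error > 0) - (error < 0))    # derivative of |e| is the sign of the error

# Compare a small, a medium, and a large error.
for e in (0.1, 1.0, 3.0):
    print(e, squared_penalty(e), absolute_penalty(e),
          squared_gradient(e), absolute_gradient(e))
```

The squared-error gradient scales with the size of the error, while the absolute-error gradient has constant magnitude, which is the difference in sensitivity to large errors that the comparison refers to.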

It's essential to note that the choice of the most suitable loss function can depend on the dataset, model architecture, and specific task. In this experiment, MSE loss demonstrated superior performance, but it's always beneficial to experiment with different loss functions and hyperparameters to find the optimal combination for a given problem. Additionally, using an ensemble loss that combines various loss functions can provide greater flexibility and adaptability to different use cases, allowing fine-tuning of the model's behavior as per specific requirements.
