# Negative Log Likelihood (NLL) Loss

In this tutorial, we'll explore the Negative Log Likelihood (NLL) Loss function in PyTorch. We'll cover the following key concepts:

1. Explanation of NLL Loss
2. Contextualizing the use of NLL Loss
3. PyTorch implementation of NLL Loss
4. Training process
5. Evaluation and interpretation
6. Practical application
7. Next steps


## Explanation of NLL Loss

Negative Log Likelihood (NLL) Loss is a common loss function for classification problems, especially multi-class classification. For each sample, it is the negative logarithm of the probability the model assigns to the true class, so it is zero when that probability is 1 and grows without bound as the probability approaches 0. A lower NLL Loss therefore means the model assigns higher probability to the true labels.

In multi-class classification, we typically use the Softmax function to convert raw logits into probabilities. The NLL Loss then computes the negative log likelihood of the true class label given the predicted probabilities. The goal is to minimize this loss during training, leading the model to assign higher probabilities to the correct class.
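
As a minimal sketch of the definition above, the snippet below computes the loss by hand for a small batch and compares it with `nn.NLLLoss`. The logits and targets are made-up illustrative values, not taken from MNIST.

```python
import torch
import torch.nn as nn

# Hypothetical logits for a batch of 3 samples and 4 classes (illustrative values only)
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0],
                       [0.2, 1.5, 0.3, 0.0],
                       [-0.5, 0.0, 0.1, 3.0]])
targets = torch.tensor([0, 1, 3])  # true class index for each sample

# Convert logits to log-probabilities
log_probs = torch.log_softmax(logits, dim=1)

# Manual NLL: pick the log-probability of the true class, negate, average over the batch
picked = log_probs[torch.arange(len(targets)), targets]
manual_nll = -picked.mean()

# Built-in NLL Loss on the same log-probabilities
builtin_nll = nn.NLLLoss()(log_probs, targets)

print(manual_nll.item(), builtin_nll.item())
```

The two printed values should agree up to floating-point rounding, since `nn.NLLLoss` with its default settings averages the negated true-class log-probabilities over the batch.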

## Contextualizing the use of NLL Loss

Let's consider a multi-class classification problem where we want to categorize images of handwritten digits (0-9) using the popular MNIST dataset. We'll use PyTorch to create a simple neural network, and employ NLL Loss as our loss function.

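Note that we'll apply log-softmax (`torch.nn.LogSoftmax`, or equivalently `nn.functional.log_softmax`) to the output of the final layer, which combines the Softmax function and the logarithm in a single operation. `nn.NLLLoss` expects log-probabilities as input, and computing them in one step is more numerically stable than applying Softmax and then taking the logarithm separately.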
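
The stability difference can be seen with a small experiment. This is a sketch with deliberately extreme, made-up logits; real networks rarely produce gaps this large, but the same underflow can occur in milder form.

```python
import torch

# Hypothetical logits with a large gap between classes (illustrative values only)
logits = torch.tensor([[1000.0, 0.0]])

# Two-step version: the smaller probability underflows to 0, so its log is -inf
naive = torch.log(torch.softmax(logits, dim=1))

# Fused version: computed directly in log space, so the result stays finite
stable = torch.log_softmax(logits, dim=1)

print(naive)
print(stable)
```

An infinite log-probability would turn into an infinite loss if that class were the target, which is why the fused version is preferred.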

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Define the neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return nn.functional.log_softmax(x, dim=1)

model = SimpleNet()
print(model)
```

## PyTorch implementation of NLL Loss

Now that we've defined our neural network, let's set up the NLL Loss function, optimizer, and data loaders for the MNIST dataset.

```python
# Set up NLL Loss, optimizer, and data loaders
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Convert images to tensors and normalize with the MNIST mean and standard deviation
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True, transform=transform),
    batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=False, transform=transform),
    batch_size=1000, shuffle=False)
```
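
By default, `nn.NLLLoss()` averages the per-sample losses over the batch; its `reduction` argument selects between `'mean'`, `'sum'`, and `'none'`. The sketch below compares the three on a random batch (the batch size of 8 and the random inputs are arbitrary choices for illustration).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 8 samples, 10 classes, random logits and labels
log_probs = torch.log_softmax(torch.randn(8, 10), dim=1)
targets = torch.randint(0, 10, (8,))

per_sample = nn.NLLLoss(reduction='none')(log_probs, targets)  # one loss per sample
batch_mean = nn.NLLLoss(reduction='mean')(log_probs, targets)  # default behavior
batch_sum = nn.NLLLoss(reduction='sum')(log_probs, targets)    # total over the batch

print(per_sample.shape, batch_mean.item(), batch_sum.item())
```

This matters when accumulating a loss over batches of different sizes: a sum of batch means is not the same as a sum of per-sample losses.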

## Training Process

With everything set up, we'll train our model for five epochs and print the training loss every 100 batches.

```python
def train(model, device, train_loader, loss_function, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_function(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for epoch in range(1, 6):
    train(model, device, train_loader, loss_function, optimizer, epoch)
```
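
Before running the full training loop, it can help to check that the same zero-grad, forward, backward, step sequence is able to drive the loss down on a single batch. The sketch below does this on synthetic data, so it needs no download. The `net` model is a stand-in with the same layer sizes as `SimpleNet`, and the batch size, learning rate, and step count are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in for SimpleNet with the same layer sizes
# (assumption: redefined here so the snippet runs on its own)
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
    nn.LogSoftmax(dim=1),
)

# Synthetic batch shaped like MNIST: 16 random "images" with random labels
data = torch.randn(16, 1, 28, 28)
target = torch.randint(0, 10, (16,))

criterion = nn.NLLLoss()
opt = optim.SGD(net.parameters(), lr=0.05)

# Repeat the training step on the same batch and record the loss each time
losses = []
for step in range(50):
    opt.zero_grad()
    loss = criterion(net(data), target)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```

If the loss on a single repeated batch does not fall, the problem usually lies in the model or the loss setup rather than in the data.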

## Evaluation and Interpretation

Now that our model is trained, let's evaluate it on the test dataset and calculate the test loss and accuracy.

```python
def test(model, device, test_loader, loss_function):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # loss_function returns the batch mean by default, so scale by the
            # batch size to accumulate the summed per-sample loss
            test_loss += loss_function(output, target).item() * data.size(0)
            pred = output.argmax(dim=1, keepdim=True)  # Get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)  # Average loss per sample
    accuracy = 100. * correct / len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset), accuracy))

test(model, device, test_loader, loss_function)
```
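
To interpret the average loss, two reference points are useful: the loss of a model that guesses uniformly among the 10 digits, and the conversion from a loss value back to a probability. The sketch below computes both. The `example_loss` value is a hypothetical number chosen for illustration, not a measured result from this tutorial.

```python
import math

num_classes = 10

# NLL of a model that assigns uniform probability 1/10 to every digit
uniform_nll = -math.log(1.0 / num_classes)

# Hypothetical average test loss (illustrative value, not a measured result)
example_loss = 0.10

# exp(-loss) is the geometric-mean probability assigned to the true class
geo_mean_prob = math.exp(-example_loss)

print(f'uniform baseline: {uniform_nll:.4f}')
print(f'geometric-mean true-class probability: {geo_mean_prob:.4f}')
```

An average loss near the uniform baseline suggests the model has learned little, while a loss close to zero means it assigns nearly all probability to the correct digit.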

## Practical Application

NLL Loss is often used in multi-class classification problems where the output is a probability distribution over classes. Examples include:

- Handwriting recognition, as shown in this tutorial
- Natural Language Processing tasks like text classification, part-of-speech tagging, or named entity recognition
- Image classification in computer vision

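When using NLL Loss, it's important to ensure that the output layer of your model produces log probabilities. The `LogSoftmax` activation is the preferred way to do this; applying the logarithm after Softmax also works but is less numerically stable. Passing plain probabilities or raw logits to `nn.NLLLoss` does not raise an error, but the resulting value is not a valid negative log likelihood.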
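
The following sketch illustrates the point with random, made-up logits. It shows that `nn.NLLLoss` on log-softmax outputs matches `nn.CrossEntropyLoss` on raw logits, and that feeding plain Softmax probabilities to `nn.NLLLoss` silently yields a negative number rather than a meaningful loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 10 classes, random logits and labels
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))

# Correct usage: NLLLoss on log-probabilities
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)

# Equivalent: CrossEntropyLoss applies log-softmax internally, so it takes raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

# Incorrect usage: probabilities instead of log-probabilities (no error is raised)
wrong = nn.NLLLoss()(torch.softmax(logits, dim=1), targets)

print(nll.item(), ce.item(), wrong.item())
```

A negative training loss is therefore a useful warning sign that the model is returning probabilities where log-probabilities are expected.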

## Next Steps

Now that you have a better understanding of the Negative Log Likelihood (NLL) Loss function in PyTorch, you can explore other loss functions that might better suit your specific problem domain. Some other common loss functions include:

- Cross-Entropy Loss (`nn.CrossEntropyLoss`) for multi-class classification on raw logits, and `nn.BCEWithLogitsLoss` for binary classification
- Mean Squared Error (MSE) Loss for regression tasks
- L1 Loss and Smooth L1 Loss for robust regression

Additionally, you can experiment with different model architectures, optimizers, and hyperparameters to improve the performance of your model.