# Logits and softmax


The logits variable holds the raw output scores from the model's final layer, before any activation function is applied. These are unnormalized scores, one per class: they can be any real number and do not sum to 1.

To obtain probabilities, we apply the softmax function to the logits using torch.softmax along the class dimension (dim=1). Softmax exponentiates each logit and divides by the sum over all classes, so every row of the result is non-negative and sums to 1. The resulting probabilities are stored in the probabilities variable.

In [1]:
import torch
import torch.nn as nn

# Define a small neural network
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)  # Fully connected layer 1
        self.fc2 = nn.Linear(5, 3)   # Fully connected layer 2

    def forward(self, x):
        x = torch.relu(self.fc1(x))   # Apply ReLU activation to layer 1
        logits = self.fc2(x)          # Output logits from layer 2
        return logits

# Create an instance of the network
model = SmallNet()

# Generate a random input tensor of size (batch_size, input_dim)
input_tensor = torch.randn(2, 10)

# Forward pass through the network
logits = model(input_tensor)

# Apply softmax to obtain probabilities
probabilities = torch.softmax(logits, dim=1)

print("Logits:")
print(logits)
print("Probabilities:")
print(probabilities)


Logits:
tensor([[-0.4022, -0.3693,  0.2588],
        [-0.5446, -0.5214,  0.1516]], grad_fn=<AddmmBackward0>)
Probabilities:
tensor([[0.2519, 0.2603, 0.4878],
        [0.2482, 0.2540, 0.4978]], grad_fn=<SoftmaxBackward0>)
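
As a quick sanity check on the output above, the following minimal sketch recomputes softmax by hand and compares it with torch.softmax. It hard-codes the rounded logits printed above, because the model's random weights will differ on every run; the names exp_logits, manual_probs, and builtin_probs are illustrative only.

```python
import torch

# Rounded logits copied from the printed output above (two examples, three classes)
logits = torch.tensor([[-0.4022, -0.3693, 0.2588],
                       [-0.5446, -0.5214, 0.1516]])

# Softmax by hand: exponentiate, then divide by the per-row sum
exp_logits = torch.exp(logits)
manual_probs = exp_logits / exp_logits.sum(dim=1, keepdim=True)

# Built-in softmax over the class dimension
builtin_probs = torch.softmax(logits, dim=1)

print(torch.allclose(manual_probs, builtin_probs))  # do the two results agree?
print(manual_probs.sum(dim=1))                      # row sums of the probabilities
```

This by-hand version is only for illustration. For very large logits, exponentiating directly can overflow, which is one reason to prefer the built-in function.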


# torch.nn.CrossEntropyLoss

In PyTorch, torch.nn.CrossEntropyLoss computes the cross-entropy loss, which is commonly used in classification tasks. It combines a log-softmax and the negative log-likelihood loss into a single operation. The pieces that go into the calculation are:

Input: CrossEntropyLoss takes two arguments: the predicted logits and the target labels.

Predicted Logits: The raw output scores from the model's final layer, one score per class, with no softmax applied. Do not apply softmax yourself before passing them in, because the loss already does that internally.

Target Labels: The ground-truth labels for the corresponding inputs. In the examples here they are integers, where each integer is the index of a class (0 to num_classes - 1).

Softmax Activation: The loss first normalizes the logits with softmax, which converts them into the model's predicted class probabilities. Internally PyTorch works with the logarithm of these probabilities (log-softmax), which is numerically more stable than computing softmax and then taking the log as separate steps.

Negative Log-Likelihood Loss: For each example, the loss is the negative log of the probability the model assigns to the true class. It is close to 0 when that probability is close to 1 and grows without bound as the probability approaches 0.

Loss Calculation: CrossEntropyLoss performs both steps in one call, so it is equivalent to applying log-softmax to the logits and passing the result to NLLLoss.

With the default reduction='mean', the returned value is the average of the per-example losses over the batch.

Here's an example of using CrossEntropyLoss in PyTorch:

In [2]:
import torch
import torch.nn as nn

# Example inputs
logits = torch.tensor([[1.2, 0.5, -1.0], [0.3, 1.8, -0.5]])
labels = torch.tensor([0, 1])  # Corresponding target labels

# Define the loss function
criterion = nn.CrossEntropyLoss()

# Compute the loss
loss = criterion(logits, labels)

print("Loss:", loss.item())


Loss: 0.3774033188819885
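
To make the averaging over the batch visible, here is a small sketch that reuses the same logits and labels and asks CrossEntropyLoss for the per-example losses via its reduction argument. The variable names per_example, mean_loss, and sum_loss are illustrative.

```python
import torch
import torch.nn as nn

# Same example inputs as in the cell above
logits = torch.tensor([[1.2, 0.5, -1.0], [0.3, 1.8, -0.5]])
labels = torch.tensor([0, 1])

# reduction="none" returns one loss value per example
per_example = nn.CrossEntropyLoss(reduction="none")(logits, labels)

# reduction="mean" (the default) averages them; reduction="sum" adds them up
mean_loss = nn.CrossEntropyLoss(reduction="mean")(logits, labels)
sum_loss = nn.CrossEntropyLoss(reduction="sum")(logits, labels)

print("Per-example losses:", per_example)
print("Mean:", mean_loss.item())
print("Sum:", sum_loss.item())
```

The mean of the per-example losses should match the single value printed by the default criterion above.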


Let's walk through the calculation step by step.


In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Example input and target labels
input_logits = torch.tensor([[1.2, -0.5, 0.8, 2.1]])
target_labels = torch.tensor([2])  # Target label for the input example 

# Compute softmax probabilities using torch.nn.functional.softmax
probabilities = F.softmax(input_logits, dim=1)
print("Probabilities (Softmax):", probabilities)

# Compute log probabilities using torch.log
log_probabilities = torch.log(probabilities)
print("Log Probabilities:", log_probabilities)

# Retrieve the log probability for the target label
target_log_probability = log_probabilities[0, target_labels]
print("Target Log Probability:", target_log_probability)

# Compute the negative log-likelihood loss: negate the mean target log probability
loss = -torch.mean(target_log_probability)
print("Loss (Negative Log-Likelihood):", loss.item())

# Compute the loss using torch.nn.CrossEntropyLoss directly
criterion = nn.CrossEntropyLoss()
loss_direct = criterion(input_logits, target_labels)
print("Loss (torch.nn.CrossEntropyLoss):", loss_direct.item())


Probabilities (Softmax): tensor([[0.2319, 0.0424, 0.1554, 0.5703]])
Log Probabilities: tensor([[-1.4615, -3.1615, -1.8615, -0.5615]])
Target Log Probability: tensor([-1.8615])
Loss (Negative Log-Likelihood): 1.8615424633026123
Loss (torch.nn.CrossEntropyLoss): 1.8615424633026123
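
The two-step route above (softmax, then torch.log) matches CrossEntropyLoss for moderate logits, but it can break down when logits are far apart. The sketch below uses a deliberately extreme, made-up logit pair to illustrate this; big_logits, naive_log_probs, and stable_log_probs are illustrative names.

```python
import torch
import torch.nn.functional as F

# Made-up extreme logits: the second class is 1000 units below the first
big_logits = torch.tensor([[1000.0, 0.0]])
target = torch.tensor([1])

# Two separate steps: the softmax probability of class 1 underflows to 0,
# so its log is -inf
naive_log_probs = torch.log(F.softmax(big_logits, dim=1))

# Fused log-softmax keeps the value finite
stable_log_probs = F.log_softmax(big_logits, dim=1)

print("log(softmax):", naive_log_probs)
print("log_softmax: ", stable_log_probs)
print("cross_entropy:", F.cross_entropy(big_logits, target).item())
```

This is why the fused functions (F.log_softmax, nn.CrossEntropyLoss) are generally preferable to composing softmax and log by hand.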


# Using NLLLoss
Another way to compute the same loss is to apply log_softmax to the logits and pass the result to torch.nn.NLLLoss:


In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Example input and target labels
input_logits = torch.tensor([[1.2, -0.5, 0.8, 2.1]])
target_labels = torch.tensor([2])  # Target label (class index 2) for the input example

# Compute log probabilities using torch.nn.functional.log_softmax
log_probabilities = F.log_softmax(input_logits, dim=1)
print("Log Probabilities (Log Softmax):", log_probabilities)

# Retrieve the log probability for the target label
target_log_probability = log_probabilities[0, target_labels]
print("Target Log Probability:", target_log_probability)

# Compute the negative log-likelihood loss using torch.nn.NLLLoss
criterion = nn.NLLLoss()
loss = criterion(log_probabilities, target_labels)
print("Loss (torch.nn.NLLLoss):", loss.item())


Log Probabilities (Log Softmax): tensor([[-1.4615, -3.1615, -1.8615, -0.5615]])
Target Log Probability: tensor([-1.8615])
Loss (torch.nn.NLLLoss): 1.8615424633026123
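
The indexing log_probabilities[0, target_labels] used above only reads from the first row, so it works for a batch of one. For larger batches, one possible approach is to pick each row's target entry with gather, as in this sketch. It reuses the two-example logits from the earlier CrossEntropyLoss cell; target_log_probs, manual_loss, and builtin_loss are illustrative names.

```python
import torch
import torch.nn.functional as F

# Two examples, three classes (same values as the earlier CrossEntropyLoss example)
logits = torch.tensor([[1.2, 0.5, -1.0], [0.3, 1.8, -0.5]])
labels = torch.tensor([0, 1])

log_probs = F.log_softmax(logits, dim=1)

# For each row i, pick log_probs[i, labels[i]]
target_log_probs = log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)

# Negative mean of the selected log probabilities
manual_loss = -target_log_probs.mean()

# Reference value from the built-in functional form
builtin_loss = F.cross_entropy(logits, labels)

print("Manual loss: ", manual_loss.item())
print("Built-in loss:", builtin_loss.item())
```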
