### nn.CrossEntropyLoss

**Summary:**

`nn.CrossEntropyLoss` is a PyTorch loss function commonly used in classification problems. It combines `nn.LogSoftmax` and `nn.NLLLoss` (negative log-likelihood loss) in a single class. Use it when training a classification model that outputs raw, unnormalized scores (logits) for each class.

**Detailed Explanation:**

#### 1. **Purpose and Usage**

`nn.CrossEntropyLoss` measures how well a classification model's predictions match the true labels. It is suited to multi-class classification problems where the model outputs a vector of raw scores (logits), one per class. The model should not apply softmax to its output: the loss function performs that normalization internally, so it expects logits rather than probabilities.
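To illustrate why the loss should receive logits, here is a minimal sketch that compares the loss computed from raw scores with the loss computed after an extra softmax. The tensor values and the names `loss_from_logits` and `loss_from_probs` are made up for illustration.

```python
import torch
import torch.nn as nn

# Hand-picked logits with shape (batch, classes) = (4, 3); illustrative values only.
logits = torch.tensor([
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 3.0],
    [-1.0, 2.5, 0.3],
    [0.0, 0.0, 4.0],
])
# Targets are class indices with shape (batch,) and dtype int64, not one-hot vectors.
targets = torch.tensor([0, 2, 1, 2])

criterion = nn.CrossEntropyLoss()

# Intended usage: pass the raw scores directly.
loss_from_logits = criterion(logits, targets)

# Common mistake: normalizing first. The loss then applies log-softmax to values
# that are already probabilities, which flattens the distribution and changes the result.
loss_from_probs = criterion(torch.softmax(logits, dim=1), targets)

print(f'from logits: {loss_from_logits.item():.4f}')
print(f'from probabilities: {loss_from_probs.item():.4f}')
```

In this example every target is the highest-scoring class, so the loss on the logits is small, while the double-softmax version reports a noticeably larger loss for the same predictions.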

The loss function computes the cross-entropy between the true labels and the predicted probabilities. Cross-entropy measures the difference between two probability distributions: the true distribution (a one-hot encoding of the true label) and the predicted distribution (the softmax of the model's logits). In PyTorch the targets are usually passed as integer class indices rather than as explicit one-hot vectors.

#### 2. **Mathematical Formulation**

The cross-entropy loss for a single instance can be expressed as:

$$
\text{Loss} = -\sum_{c=1}^{C} y_{c} \log(\hat{p}_{c})
$$

Where:
- \( C \) is the number of classes.
- \( y_{c} \) is a binary indicator that equals 1 if class \( c \) is the correct label for the current observation and 0 otherwise.
- \( \hat{p}_{c} \) is the predicted probability for class \( c \).
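
The formula can be checked by hand for a single instance. The following is a minimal sketch with made-up logits; the names `manual_loss`, `shortcut`, and `builtin` are illustrative and not part of PyTorch.

```python
import torch
import torch.nn.functional as F

# Raw scores for one observation with C = 3 classes (illustrative values).
logits = torch.tensor([1.0, 2.0, 0.5])
true_class = 1

# Predicted probabilities p_hat_c, obtained with softmax.
p_hat = torch.softmax(logits, dim=0)

# One-hot indicator y_c: 1 for the correct class, 0 elsewhere.
y = torch.zeros(3)
y[true_class] = 1.0

# Loss = -sum_c y_c * log(p_hat_c)
manual_loss = -(y * torch.log(p_hat)).sum()

# Because y is one-hot, only the true-class term survives.
shortcut = -torch.log(p_hat[true_class])

# Built-in version, given a batch of one: logits (1, C), targets (1,).
builtin = F.cross_entropy(logits.unsqueeze(0), torch.tensor([true_class]))

print(manual_loss.item(), shortcut.item(), builtin.item())
```

All three values should agree up to floating-point rounding, which shows that with one-hot labels the sum reduces to the negative log-probability of the correct class.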

In PyTorch, the function internally applies `nn.LogSoftmax` to the logits to get log-probabilities and then applies `nn.NLLLoss` to compute the negative log-likelihood. By default (`reduction='mean'`), the per-sample losses are averaged over the batch.
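
As a quick check of this equivalence, the sketch below computes the loss in three ways on random data: with `nn.CrossEntropyLoss` directly, with `nn.LogSoftmax` followed by `nn.NLLLoss`, and with an explicit log-sum-exp expression. The variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(6, 4)            # batch of 6, 4 classes
targets = torch.randint(0, 4, (6,))   # class indices in [0, 4)

# One step: the combined loss applied to raw logits.
one_step = nn.CrossEntropyLoss()(logits, targets)

# Two steps: log-softmax over the class dimension, then negative log-likelihood.
log_probs = nn.LogSoftmax(dim=1)(logits)
two_step = nn.NLLLoss()(log_probs, targets)

# Written out: log(sum_c exp(z_c)) - z_y, averaged over the batch.
true_class_logits = logits[torch.arange(logits.size(0)), targets]
by_hand = (torch.logsumexp(logits, dim=1) - true_class_logits).mean()

print(one_step.item(), two_step.item(), by_hand.item())
```

The three results should match up to floating-point rounding.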

#### 3. **Implementation in PyTorch**

Here is an example of how to use `nn.CrossEntropyLoss` in a simple neural network:




```python
import torch
import torch.nn as nn
import torch.optim as optim

# Sample data
inputs = torch.randn(10, 5)  # Batch size of 10, feature size of 5
targets = torch.randint(0, 3, (10,))  # Random target labels (3 classes)

# Define a simple model
model = nn.Sequential(
    nn.Linear(5, 10),
    nn.ReLU(),
    nn.Linear(10, 3)
)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Forward pass
outputs = model(inputs)

# Compute loss
loss = criterion(outputs, targets)

# Backward pass and optimization
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(f'Loss: {loss.item()}')
```

Output (the exact value varies from run to run because the data and initial weights are random):

```
Loss: 1.1310322284698486
```
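
The example above performs a single update. As a minimal sketch of how the same pieces fit into a training loop, the code below repeats the step on the same batch and records the loss. The seed, the learning rate of 0.1, and the 200 iterations are arbitrary choices for illustration, not recommendations.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Same toy setup as above: 10 samples, 5 features, 3 classes.
inputs = torch.randn(10, 5)
targets = torch.randint(0, 3, (10,))

model = nn.Sequential(nn.Linear(5, 10), nn.ReLU(), nn.Linear(10, 3))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)  # assumed learning rate

losses = []
for step in range(200):
    optimizer.zero_grad()                      # clear gradients from the previous step
    loss = criterion(model(inputs), targets)   # forward pass on logits
    loss.backward()                            # backpropagate
    optimizer.step()                           # update weights
    losses.append(loss.item())

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```

Because the labels here are random, the falling loss only shows the model memorizing this small batch; it says nothing about generalization.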


#### 4. **Real-World Example**

In a real-world scenario, consider a neural network designed to classify images of handwritten digits (0-9) from the MNIST dataset. `nn.CrossEntropyLoss` is a natural choice here: the task is multi-class classification, and the model outputs 10 logits (one per digit), which the loss function converts into a probability distribution over the 10 classes.

Minimizing `nn.CrossEntropyLoss` during training pushes the model to assign high probability to the correct digit and low probability to the others.
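
The shapes involved in the digit-classification setting can be sketched as follows. To keep the example self-contained, random tensors stand in for a real MNIST batch, and `digit_model` is a hypothetical minimal classifier rather than a recommended architecture.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for one MNIST batch (assumption: random data, not real images).
images = torch.rand(32, 1, 28, 28)       # 32 grayscale images of 28x28 pixels
labels = torch.randint(0, 10, (32,))     # digit labels 0-9 as class indices

# Hypothetical minimal classifier: flatten the image, then two linear layers.
digit_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),                  # 10 logits, one per digit; no softmax layer
)

logits = digit_model(images)             # shape (32, 10)
loss = nn.CrossEntropyLoss()(logits, labels)

# Softmax is only needed when probabilities are wanted for inspection or reporting.
probs = torch.softmax(logits, dim=1)
predicted_digits = probs.argmax(dim=1)   # most likely digit for each image

print(logits.shape, loss.item(), predicted_digits[:5])
```

With a real dataset, `images` and `labels` would come from a data loader instead of random tensors; the loss call itself would stay the same.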

---

This loss function is a standard choice for training classification models: minimizing it moves the predicted class probabilities toward the true labels, which typically improves the model's accuracy.