## 3. Loss function

### Finding Loss Functions in PyTorch

PyTorch offers a wide range of loss functions for different types of tasks, from regression to classification. You can find a comprehensive list in the PyTorch documentation under the [Loss Functions](https://pytorch.org/docs/stable/nn.html#loss-functions) section. It describes each loss function's purpose, usage, and parameters, so you can choose the one that best suits your application.

### Code Explanation

- **Purpose**:
  - The code defines a loss function using `torch.nn.CrossEntropyLoss()`, which is commonly used for multi-class classification tasks.

- **Functionality**:
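  - `CrossEntropyLoss` computes the cross-entropy loss between the model's raw output scores (logits) and the true class labels. It applies log-softmax to the logits internally, so the model should not apply softmax to its outputs before passing them to the loss.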

- **Use Case**:
  - This loss function is appropriate when the model outputs raw scores (logits) for each class, and you need to determine how well these scores match the actual labels.
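
To make the logits-in, loss-out behaviour concrete, here is a minimal sketch. The batch of random logits and the target indices are made up for illustration and are not part of the original code. It checks that `CrossEntropyLoss` gives the same value as applying log-softmax followed by the negative log-likelihood loss by hand.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative data (assumption): 4 samples, 3 classes
logits = torch.randn(4, 3)              # raw, unnormalized model outputs
targets = torch.tensor([0, 2, 1, 2])    # true class index for each sample

# The loss used in this section: takes logits directly
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Equivalent two-step computation: log-softmax, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.item(), manual.item())
print(torch.allclose(loss, manual))
```

Because the softmax is folded into the loss, passing already-normalized probabilities would apply it twice and distort the result.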

## 4. Optimization algorithm

### Finding Optimization Algorithms in PyTorch

PyTorch provides a variety of optimization algorithms to help train neural networks efficiently. These optimizers adjust the model parameters based on the computed gradients to minimize the loss function. You can explore the available optimization algorithms in the PyTorch documentation under the [Optimizers](https://pytorch.org/docs/stable/optim.html) section. This resource details each optimizer's functionality, parameters, and typical use cases, allowing you to select the most appropriate one for your specific needs.

### Code Explanation

- **Purpose**:
  - The code defines an optimizer using `optim.SGD`, which stands for Stochastic Gradient Descent, a popular optimization algorithm in machine learning.

- **Functionality**:
  - `SGD` updates the model's parameters using the gradients of the loss. The gradients themselves are computed by `loss.backward()`; each call to `optimizer.step()` then moves the parameters in the direction that reduces the loss.
  - `lr=0.001` specifies the learning rate, which sets the step size of each update. A smaller learning rate tends to give more stable convergence, at the cost of slower training.
  - `momentum=0.9` accelerates optimization by maintaining a velocity term, an exponentially decaying accumulation of past gradients, and stepping along it rather than along the current gradient alone. This damps oscillations and usually speeds up convergence.

- **Use Case**:
  - This optimizer is suitable for training neural networks where you want to leverage simple, yet effective gradient descent techniques with additional momentum for faster convergence.
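
The effect of `lr` and `momentum` can be checked by hand. The sketch below assumes the optimizer's default settings (`dampening=0`, `nesterov=False`); the toy parameter `w` and the quadratic loss are hypothetical and stand in for a real model. It replicates the update rule manually and compares the result with `torch.optim.SGD`.

```python
import torch

lr, momentum = 0.001, 0.9

# Toy parameter and loss (assumptions for illustration only)
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr, momentum=momentum)

# Manual copy of the parameter and its velocity buffer
w_manual = w.detach().clone()
velocity = torch.zeros_like(w_manual)

for _ in range(3):
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()              # autograd computes the gradient
    grad = w.grad.clone()
    optimizer.step()             # the optimizer applies the update

    # Same rule by hand: accumulate the gradient, then step along the velocity
    velocity = momentum * velocity + grad
    w_manual = w_manual - lr * velocity

print(torch.allclose(w.detach(), w_manual))
```

With `momentum=0` the velocity is just the current gradient, and the rule reduces to plain gradient descent.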

```python
import torch
import torch.optim as optim

from resnet18 import model

# Define the loss function
criterion = torch.nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```
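
For context, here is a rough sketch of how the loss function and optimizer typically work together in one training step. Since the `resnet18` module is not shown here, a small `torch.nn.Linear` layer stands in for the model, and the input batch and labels are random placeholders; all of these are assumptions for illustration.

```python
import torch
import torch.optim as optim

# Stand-in model (assumption): a 10-class linear classifier on 20 features
model = torch.nn.Linear(20, 10)

criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Placeholder batch (assumption): 8 samples with random features and labels
inputs = torch.randn(8, 20)
labels = torch.randint(0, 10, (8,))

optimizer.zero_grad()               # clear gradients from the previous step
outputs = model(inputs)             # forward pass: raw logits, shape (8, 10)
loss = criterion(outputs, labels)   # scalar cross-entropy loss
loss.backward()                     # backward pass: compute gradients
optimizer.step()                    # update parameters using the gradients

print(loss.item())
```

In a real training loop these five steps would run once per batch, with the data coming from a data loader rather than random tensors.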