## Cross-Entropy Loss

Cross-Entropy Loss, also known as Log Loss, is commonly used in classification problems. It measures the performance of a classification model whose output is a probability value between 0 and 1. The loss grows as the predicted probability diverges from the actual label, and it grows without bound as a confident prediction turns out to be wrong. For binary classification over $N$ samples, it is defined as:

$$L(y,\hat{y}) = -\frac{1}{N}\sum_{i=1}^{N}[y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$

where $y_i$ is the true label (0 or 1) of sample $i$, $\hat{y}_i$ is the predicted probability that $y_i = 1$, and $N$ is the number of samples.
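
To make the formula concrete, here is a minimal sketch that evaluates it in plain Python for three samples. The function name `binary_cross_entropy`, the `eps` clamp, and the example values are illustrative assumptions, not part of the original tutorial.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clamp p away from 0 and 1 so log() never receives 0 (assumed safeguard).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical labels and predicted probabilities.
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.6]

loss = binary_cross_entropy(labels, probs)
print(f"{loss:.4f}")  # 0.2798
```

The third sample contributes the most to the loss because its predicted probability (0.6) is the furthest from its label (1).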

In this tutorial, we will use a simple dataset to demonstrate how to apply Cross-Entropy Loss in PyTorch. We will create a binary classification problem where we need to classify whether a given number is even or odd.

## PyTorch Cross-Entropy Loss Parameters

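In PyTorch, the `nn.CrossEntropyLoss` class is used for cross-entropy loss. It combines `nn.LogSoftmax()` and `nn.NLLLoss()` (Negative Log Likelihood Loss) in a single class, so it expects raw, unnormalized scores (logits) of shape `(N, C)` and, in its most common usage, integer class indices as targets. For a model with a single sigmoid output, like the one built later in this tutorial, the matching criterion is `nn.BCELoss`, which implements the binary formula above directly. The main parameters of `nn.CrossEntropyLoss` are: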

- `weight` (Tensor, optional): A manual rescaling weight given to each class. If provided, it has to be a Tensor of size `C`.

- `size_average` (bool, optional): Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True`

- `ignore_index` (int, optional): Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. Default: `-100`

- `reduce` (bool, optional): Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`

- `reduction` (string, optional): Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. With `'none'`, no reduction is applied and one loss value per sample is returned. With `'mean'`, the weighted mean of the output is taken, which is the plain average when no `weight` is given. With `'sum'`, the output is summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two arguments will override `reduction`. Default: `'mean'`
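
The parameters above can be explored with a small sketch. The logits, targets, and class weights below are made-up values chosen only for illustration. The example compares the three `reduction` modes, checks the `LogSoftmax` + `NLLLoss` equivalence, and shows how `weight` and `ignore_index` change the averaged result.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits) for 3 samples and C = 2 classes.
logits = torch.tensor([[2.0, 0.5], [0.1, 1.5], [1.0, 1.0]])
targets = torch.tensor([0, 1, 0])  # integer class indices

# The three reduction modes.
per_sample = nn.CrossEntropyLoss(reduction='none')(logits, targets)
mean_loss = nn.CrossEntropyLoss(reduction='mean')(logits, targets)
sum_loss = nn.CrossEntropyLoss(reduction='sum')(logits, targets)

# Same result built from LogSoftmax followed by NLLLoss.
manual = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# Per-class weights: 'mean' becomes a weighted mean over the samples.
class_weights = torch.tensor([1.0, 3.0])
weighted = nn.CrossEntropyLoss(weight=class_weights)(logits, targets)

# Targets equal to ignore_index are skipped and excluded from the average.
masked_targets = torch.tensor([0, 1, -100])
ignored = nn.CrossEntropyLoss(ignore_index=-100)(logits, masked_targets)

print(per_sample.shape)  # torch.Size([3])
print(torch.allclose(mean_loss, manual))
```

With `weight` set, each sample's loss is scaled by the weight of its target class, and the `'mean'` reduction divides by the sum of those weights rather than by the number of samples.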

Now, let's import the necessary libraries and create a simple dataset for our binary classification problem.

## Import Libraries and Create Dataset

In [None]:
import torch
import torch.nn as nn

# Create a simple dataset
data = torch.tensor([[2, 0], [5, 1], [8, 0], [12, 0], [15, 1], [17, 1]], dtype=torch.float32)
X = data[:, 0]
y = data[:, 1]

print('Dataset:')
print(data)

## Feedforward Neural Network

We will create a simple feedforward neural network with one input layer, one hidden layer, and one output layer. The hidden layer will have two neurons, and we will use the ReLU activation function. The output layer will have a single neuron with a sigmoid activation function to produce a probability value between 0 and 1.

In [None]:
# Define the neural network model
class SimpleNN(nn.Module):

    def __init__(self):
        super().__init__()

        # Define layers
        self.hidden_layer = nn.Linear(1, 2)
        self.output_layer = nn.Linear(2, 1)

        # Define activation functions
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.hidden_layer(x)
        x = self.relu(x)
        x = self.output_layer(x)
        x = self.sigmoid(x)

        return x

# Instantiate the model
model = SimpleNN()
print(model)
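
As a quick sanity check on the architecture described above, the sketch below builds an equivalent stack with `nn.Sequential` and counts its learnable parameters. The name `sketch_model` is a hypothetical stand-in used only here; it is not the `SimpleNN` class itself.

```python
import torch
import torch.nn as nn

# Same layer sizes as the tutorial's network: 1 input -> 2 hidden -> 1 output.
sketch_model = nn.Sequential(
    nn.Linear(1, 2),  # hidden layer: 2 weights + 2 biases
    nn.ReLU(),
    nn.Linear(2, 1),  # output layer: 2 weights + 1 bias
    nn.Sigmoid(),
)

# List every learnable tensor with its shape.
for name, param in sketch_model.named_parameters():
    print(name, tuple(param.shape))

n_params = sum(p.numel() for p in sketch_model.parameters())
print(n_params)  # 7
```

The activation modules (`nn.ReLU`, `nn.Sigmoid`) hold no parameters, so all seven values come from the two linear layers.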

## Forward Propagation

Now that we have our neural network model defined, we can perform forward propagation by passing the input data through the model. This will produce an output probability value for each input data point.

In [None]:
# Perform forward propagation
X_input = X.view(-1, 1)  # Reshape input data

# Pass the input data through the model
output_probs = model(X_input)

print('Output Probabilities:')
print(output_probs)
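
To show what the forward pass computes internally, the following sketch reproduces it by hand with matrix operations and compares the result against the layer calls. The layer objects `hidden` and `output`, the fixed seed, and the three input values are assumptions made so the example runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random initial weights are repeatable

# Stand-alone layers with the same sizes as the tutorial's network.
hidden = nn.Linear(1, 2)
output = nn.Linear(2, 1)

x = torch.tensor([[2.0], [5.0], [8.0]])  # three inputs, shape (3, 1)

with torch.no_grad():
    # Forward pass through the layer modules.
    probs_module = torch.sigmoid(output(torch.relu(hidden(x))))

    # The same computation written out step by step.
    z1 = x @ hidden.weight.T + hidden.bias   # hidden pre-activation, shape (3, 2)
    a1 = torch.clamp(z1, min=0)              # ReLU
    z2 = a1 @ output.weight.T + output.bias  # output pre-activation, shape (3, 1)
    probs_manual = 1 / (1 + torch.exp(-z2))  # sigmoid

print(torch.allclose(probs_module, probs_manual))  # True
```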


## Interpretation and Evaluation

Each output is the model's estimated probability that the input belongs to class 1, which in our dataset means the number is odd. We can convert these probabilities to binary predictions by applying a threshold (e.g., 0.5). Note that the model has not been trained yet, so its weights are still at their random initial values and the predictions are likely to be incorrect at this stage.

In [None]:
# Convert probabilities to binary predictions
threshold = 0.5
predictions = (output_probs > threshold).float()

# Compare predictions with true labels
print('Predictions:')
print(predictions.view(-1))
print('True labels:')
print(y)
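
Beyond eyeballing predictions against labels, the comparison can be summarized with an accuracy score and the binary cross-entropy loss. This is a minimal sketch that uses `nn.BCELoss` because the network ends in a sigmoid. The probabilities below are hypothetical placeholders, since an untrained model's real outputs depend on its random initialization.

```python
import torch
import torch.nn as nn

# True labels from the tutorial's dataset (1 = odd, 0 = even).
y_true = torch.tensor([0., 1., 0., 0., 1., 1.])

# Hypothetical model outputs with shape (6, 1), as a sigmoid output layer produces.
probs = torch.tensor([[0.2], [0.7], [0.6], [0.1], [0.4], [0.9]])

# BCELoss needs predictions and targets to have the same shape and float dtype.
criterion = nn.BCELoss()
loss = criterion(probs.view(-1), y_true)

# Threshold at 0.5 and measure the fraction of correct predictions.
preds = (probs.view(-1) > 0.5).float()
accuracy = (preds == y_true).float().mean()

print(f"loss: {loss.item():.4f}")
print(f"accuracy: {accuracy.item():.4f}")  # accuracy: 0.6667
```

Accuracy only counts which side of the threshold each prediction falls on, while the loss also reflects how confident each prediction was.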

## Practical Application

In practical applications, forward propagation is an essential step in both training and evaluating neural networks: every training step begins with a forward pass to compute the loss, and inference consists of a forward pass alone. Once the model is trained, you can use forward propagation to make predictions on new data and assess the performance of the model. For example, you could use a similar neural network to classify images, detect anomalies in time-series data, or predict stock prices.
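
The role of the forward pass in training can be sketched as a short loop. Everything here is an assumption made for illustration: the `nn.Sequential` stand-in for the network, the SGD optimizer, the learning rate, the epoch count, and the new inputs. The sketch only shows the mechanics (forward pass, loss, backward pass, update) and does not claim that such a small network actually learns parity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # repeatable initial weights

# The tutorial's dataset, with inputs and targets both shaped (6, 1).
X = torch.tensor([[2.], [5.], [8.], [12.], [15.], [17.]])
y = torch.tensor([[0.], [1.], [0.], [0.], [1.], [1.]])

# Stand-in network with the same layer sizes as the tutorial's model.
sketch_model = nn.Sequential(nn.Linear(1, 2), nn.ReLU(), nn.Linear(2, 1), nn.Sigmoid())
criterion = nn.BCELoss()  # binary cross-entropy on sigmoid outputs
optimizer = torch.optim.SGD(sketch_model.parameters(), lr=0.01)

losses = []
for epoch in range(20):
    optimizer.zero_grad()      # clear gradients from the previous step
    probs = sketch_model(X)    # forward propagation
    loss = criterion(probs, y)
    loss.backward()            # backpropagate the loss
    optimizer.step()           # update the weights
    losses.append(loss.item())

# Inference on new inputs is a forward pass with gradients disabled.
with torch.no_grad():
    new_probs = sketch_model(torch.tensor([[4.], [7.]]))

print(losses[0], losses[-1])
print(new_probs.view(-1))
```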