# Deep Learning

# Tutorial 5: Softmax Regression for classification

In this tutorial, we will cover:

- Softmax Regression

Prerequisites:

- Python

My contact:

- Niklas Beuter (niklas.beuter@th-luebeck.de)

Course:

- Slides and notebooks will be available at https://lernraum.th-luebeck.de/course/view.php?id=5383

## Softmax Regression

Softmax regression, also known as multinomial logistic regression, is a generalization of logistic regression that can handle multiple classes, as opposed to binary classification which deals with only two classes. It is widely used for classification tasks where the classes are mutually exclusive. For example, it is used for handwriting recognition where each image only corresponds to one digit.

### The Softmax Function

The core of Softmax regression is the Softmax function. This function takes a vector of raw scores, often called logits, and transforms it into a vector of probabilities, with each value representing the probability that the input belongs to the corresponding class. Every output is positive and the outputs sum to one, so they can be read directly as a probability distribution over the classes.

The Softmax function is defined as follows for an input vector **z**:

$$ Softmax(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} $$

Here, $ e^{z_i} $ represents the exponential of the ith element of the input vector **z**, and $ K $ is the number of classes.
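To make the definition concrete, here is a minimal sketch in plain Python. The function name `softmax` and the example logits are illustrative assumptions. Subtracting the largest logit before exponentiating is a common stabilization trick that is not part of the definition above, but it leaves the result unchanged.

```python
import math

def softmax(z):
    """Map a list of logits to probabilities that sum to one."""
    # Shift by the largest logit so exp() cannot overflow; the shift
    # cancels in the ratio, so the result matches the formula above.
    z_max = max(z)
    exps = [math.exp(z_i - z_max) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for K = 3 classes
probs = softmax(logits)
print(probs)       # the largest logit receives the largest probability
print(sum(probs))  # sums to one up to floating-point rounding
```

Note that only the differences between logits matter: adding the same constant to every score gives the same probabilities.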

### Learning the Parameters

Similar to logistic regression, Softmax regression models learn weights and biases through the training process. However, instead of fitting a single linear function (a line, or a hyperplane in higher dimensions), Softmax regression fits one linear function for each class. The decision boundary between any two classes is then determined by the points where their probabilities are equal.
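To see why these boundaries are straight lines (or hyperplanes), write the score of class $ a $ as $ z_a = w_a^\top x + b_a $, where $ w_a $ and $ b_a $ denote that class's weight vector and bias (notation introduced here for illustration only). Since all classes share the same Softmax denominator, equal probabilities mean equal scores:

$$ p_a = p_b \iff e^{z_a} = e^{z_b} \iff z_a = z_b \iff (w_a - w_b)^\top x + (b_a - b_b) = 0 $$

This is a linear equation in **x**, so each pairwise boundary is a line in two dimensions.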

The parameters are typically learned by minimizing the cross-entropy loss, also known as the negative log-likelihood. This loss function measures how well the predicted probability distribution fits the true distribution (the true class labels). For a single data point **x** with a one-hot encoded label **y**, the cross-entropy loss is:

$$ L(\theta) = -\sum_{i=1}^{K} y_i \log(p_i) $$

where $ y_i $ is a binary indicator (1 if class **i** is the correct class for **x**, 0 otherwise), and $ p_i $ is the predicted probability that **x** belongs to class **i**. Because $ y_i $ is 1 only for the true class, the sum reduces to the negative log of the probability assigned to the true class.
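A small worked example may help here. The sketch below evaluates the loss for one made-up label and two made-up predictions; the function name `cross_entropy` and all numbers are assumptions chosen for illustration.

```python
import math

def cross_entropy(y, p):
    """Cross-entropy between a one-hot label y and predicted probabilities p."""
    # Only the term of the true class contributes, because y_i is 0 elsewhere.
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, p) if y_i > 0)

y = [0, 1, 0]             # hypothetical one-hot label: the second class is correct
p_good = [0.1, 0.8, 0.1]  # confident and correct prediction
p_bad = [0.7, 0.2, 0.1]   # confident but wrong prediction

print(cross_entropy(y, p_good))  # equals -log(0.8), a small loss
print(cross_entropy(y, p_bad))   # equals -log(0.2), a larger loss
```

The loss grows as the probability assigned to the true class shrinks, which is what pushes the model toward correct, confident predictions.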

### Optimization

To find the best parameters, we use optimization algorithms such as gradient descent or its variants (stochastic gradient descent, mini-batch gradient descent) to minimize the cross-entropy loss over the entire training set. The optimization is done iteratively by computing the gradient of the loss with respect to each parameter and adjusting the parameters in the direction that reduces the loss.
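For Softmax combined with cross-entropy, the gradient of the loss with respect to the logits has the simple form $ p - y $. The sketch below checks this numerically for one made-up example and then takes a single gradient descent step directly on the logits. It is a simplified illustration in plain Python: in the actual model the step is taken on the weights and biases, and the names `loss`, `analytic`, and `numeric` as well as all values are assumptions.

```python
import math

def softmax(z):
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, true_class):
    """Cross-entropy loss of softmax(z) for the given true class index."""
    return -math.log(softmax(z)[true_class])

z = [0.5, -1.0, 2.0]  # hypothetical logits
true_class = 0
y = [1.0 if i == true_class else 0.0 for i in range(len(z))]

# Analytic gradient with respect to the logits: p - y
analytic = [p_i - y_i for p_i, y_i in zip(softmax(z), y)]

# Numerical gradient by central differences, as a sanity check
eps = 1e-6
numeric = []
for i in range(len(z)):
    z_plus = z.copy()
    z_plus[i] += eps
    z_minus = z.copy()
    z_minus[i] -= eps
    numeric.append((loss(z_plus, true_class) - loss(z_minus, true_class)) / (2 * eps))

# One gradient descent step: move against the gradient
lr = 0.1
z_new = [z_i - lr * g for z_i, g in zip(z, analytic)]

print(analytic)
print(numeric)
print(loss(z, true_class), loss(z_new, true_class))  # the loss decreases
```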

### Advantages of Softmax Regression

- **Flexibility**: It can handle multiple classes directly, without the need to reduce the problem into multiple binary classification problems.
- **Probabilistic Interpretation**: The output can be interpreted as a probability distribution over classes.
- **Efficiency**: It is computationally efficient, especially with the use of vectorized operations.

Softmax regression is a powerful tool for classification problems and serves as a foundational model for more complex neural network architectures used in deep learning.

## Going into the code

First, we import all the modules we need.

In [None]:
import torch
import numpy as np
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

Then, we create a dataset to work on.

In [None]:
# Generate a synthetic dataset
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)

# Convert the dataset to PyTorch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.long)

Let's visualize the data to understand what we are working with.

In [None]:
# Visualize the dataset
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Synthetic Dataset for Softmax Regression')
plt.show()

Let us define the standard linear model and add the softmax (taken directly from torch).
We also define the cross-entropy loss.

In [None]:
def model(x):
    return torch.softmax(x @ weights + bias, dim=1)

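def cross_entropy_loss(predictions, targets):
    # Pick the predicted probability of the true class for every sample
    true_class_probs = predictions[range(targets.shape[0]), targets]
    # Negative log-likelihood per sample, averaged over the batch
    nll = -torch.log(true_class_probs)
    return torch.mean(nll)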
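Taking the logarithm of softmax outputs can become numerically unstable when a probability gets very close to zero. PyTorch also provides `torch.nn.functional.cross_entropy`, which expects raw logits and applies the log-softmax internally. The sketch below compares both versions on made-up logits and targets; it is meant as a cross-check, not as a replacement for the cell above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # hypothetical scores: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # hypothetical class labels

# Manual version, as in the cell above: softmax first, then negative log
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs[torch.arange(targets.shape[0]), targets]).mean()

# Built-in version: takes the raw logits, not the probabilities
builtin = F.cross_entropy(logits, targets)

print(manual.item(), builtin.item())  # both values agree up to rounding
```

If you switch to the built-in loss, the model must return the logits `x @ weights + bias` without the softmax, otherwise the softmax is applied twice.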

Our data has 2 input features (the two coordinates of each point) and 3 classes. So the weight matrix has shape 2 × 3 and the bias has 3 entries, which gives one linear function per class.

Let us predict the first few inputs to check that everything works.

In [None]:
# Model parameters
weights = torch.randn((2, 3), requires_grad=True)  # 2 features, 3 classes
bias = torch.zeros(3, requires_grad=True)

# Example prediction
preds = model(X_tensor)
print(preds[:5])  # Display the first 5 predictions

Now we can train our softmax regression model. We set the learning rate and the number of epochs for training.

In [None]:
# Learning rate
lr = 0.1
# Number of epochs
epochs = 100

Let us start training and optimize the weights and biases.

In [None]:
# Training loop
for epoch in range(epochs):
    preds = model(X_tensor)
    loss = cross_entropy_loss(preds, y_tensor)
    loss.backward()

    with torch.no_grad():
        weights -= lr * weights.grad
        bias -= lr * bias.grad
        weights.grad.zero_()
        bias.grad.zero_()

    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')

Now, we can visualize the result and predict a new data point.

In [None]:
# Define a function to predict the class and visualize it
def predict_and_visualize(new_point):
    # Convert the new point to a PyTorch tensor
    new_point_tensor = torch.tensor(new_point, dtype=torch.float32).unsqueeze(0)

    # Use the trained model to predict the class
    prediction = model(new_point_tensor)
    predicted_class = torch.argmax(prediction).item()

    # Now let's plot the original dataset
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')

    # And plot the new point with the predicted class
    color_map = plt.cm.viridis
    class_colors = [color_map(i) for i in [0.0, 0.5, 1.0]]
    predicted_color = class_colors[predicted_class]

    plt.scatter(new_point[0], new_point[1], color=predicted_color, edgecolor='black', s=100, zorder=3)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Synthetic Dataset with Predicted Point')

    # Create a legend for the new point
    legend_patch = mpatches.Patch(color=predicted_color, label=f'Predicted class: {predicted_class}')
    plt.legend(handles=[legend_patch])

    plt.show()
    return predicted_class

Here, you can set any point you like and see the classification result.

In [None]:
# Example usage: predict the class of a new point (0.0, -2.0)
new_data_point = [0.0, -2.0]
predicted_class = predict_and_visualize(new_data_point)
print(f"The predicted class for new data point {new_data_point} is {predicted_class}")

## Plot decision boundaries

Finally, let us look at the result of the model by plotting the decision boundaries.

In [None]:
def plot_linear_discriminants(weights, bias):
    # Generate a grid of points to plot decision boundaries
    # Use the NumPy array X so that np.arange receives plain floats
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    h = 0.01
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # Flatten the grid to pass through the model
    grid = np.c_[xx.ravel(), yy.ravel()]
    grid_tensor = torch.tensor(grid, dtype=torch.float32)

    # Compute the class scores from the linear part (before softmax)
    scores = grid_tensor @ weights + bias.unsqueeze(0)

    # Compute the predicted class labels
    _, predicted_classes = torch.max(scores, 1)
    predicted_classes = predicted_classes.reshape(xx.shape)

    # Plot the decision boundary
    plt.contourf(xx, yy, predicted_classes.numpy(), alpha=0.5, cmap='viridis')

    # Plot the original data points for reference
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap='viridis')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Linear Discriminant Functions')
    plt.show()

In [None]:
# Plot linear discriminants after training
plot_linear_discriminants(weights, bias)