# Softmax 

* **Purpose:** Softmax regression generalizes logistic regression to the **multi-class** case (more than two categories); the softmax function plays the role that the sigmoid plays in the binary case.

* **Class scores (logits):** For each class \(i\), a score (logit) is computed as
  \[
  z_i = w_i^\top x + b_i
  \]
  With a scalar input, each \(z_i\) is a line in \(x\) with its own slope and intercept; in higher dimensions, it is the **dot product** between the input vector \(x\) and the parameter vector \(w_i\), plus the bias \(b_i\).
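
As a minimal sketch of the logit computation, the snippet below evaluates one score per class in plain Python. The input `x`, the weight rows in `W`, and the biases `b` are made-up values chosen only for illustration.

```python
def logits(x, W, b):
    """Return one score per class: the dot product of w_i with x, plus b_i."""
    return [sum(w_k * x_k for w_k, x_k in zip(w_i, x)) + b_i
            for w_i, b_i in zip(W, b)]

# Hypothetical 2D input and three classes (illustrative values only).
x = [1.0, 2.0]
W = [[1.0, 0.0],    # w_0
     [0.0, 1.0],    # w_1
     [-1.0, -1.0]]  # w_2
b = [0.0, 0.5, 0.0]

z = logits(x, W, b)
print(z)  # [1.0, 2.5, -3.0]
```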

* **Decision (hard prediction):** The predicted class is the index with the highest score:
  \[
  \hat{y} = \arg\max_i z_i
  \]
  (In the examples: blue/red/green regions in 1D; in 2D, each point is assigned to the class whose parameter vector \(w_i\) produces the largest dot product.)
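
The hard prediction can be sketched in one line of plain Python. The logit values here are hypothetical; ties are broken in favor of the lowest index, which is a choice of this sketch rather than part of the definition.

```python
def predict(z):
    """Return the index of the largest logit (first index on ties)."""
    return max(range(len(z)), key=lambda i: z[i])

z = [1.0, 2.5, -3.0]  # hypothetical logits for three classes
print(predict(z))  # 1
```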

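* **Probabilities (soft prediction):** The logits are transformed into probabilities using the softmax function:
  \[
  p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}
  \]
  Each class gets a probability strictly between 0 and 1, and all probabilities sum to 1. Because the exponential is increasing, the class with the largest logit also gets the largest probability, so the soft and hard predictions agree.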
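
A possible implementation of the softmax formula is sketched below. Subtracting the largest logit before exponentiating is a common numerical-stability trick that is not part of the formula itself: it leaves the result unchanged, because the shift cancels between numerator and denominator, and it avoids overflow for large logits. The input values are hypothetical.

```python
import math

def softmax(z):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(z)                               # shift so the largest exponent is 0
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([1.0, 2.5, -3.0])   # hypothetical logits
print(p, sum(p))                # three probabilities and their sum (about 1)

# Large logits would overflow math.exp without the shift.
print(softmax([1000.0, 1001.0, 1002.0]))
```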

* **Geometric intuition (2D / MNIST example):**

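  * Each **class** is represented by a parameter vector \(w_i\).
  * Each sample \(x\) is assigned to the class whose \(w_i\) gives the highest dot product. This is the “closest” vector only in a loose sense: the dot product depends on the length of \(w_i\) as well as on its angle to \(x\).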
  * The MNIST example shows how 28×28 grayscale images (784 pixels) are flattened into vectors for classification; for visualization, the idea is simplified to 2D.
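
To make the flattening step concrete, here is a small sketch that turns a 28×28 grid into a 784-element vector in row-major order. The image is a blank placeholder with a single bright pixel, not real MNIST data, and row-major order is an assumption of this sketch.

```python
ROWS, COLS = 28, 28

# Placeholder image: all zeros except one pixel (not real MNIST data).
image = [[0.0] * COLS for _ in range(ROWS)]
image[3][5] = 1.0

# Row-major flattening: pixel (r, c) lands at index r * COLS + c.
x = [pixel for row in image for pixel in row]

print(len(x))            # 784
print(x[3 * COLS + 5])   # 1.0
```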

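* **Training:** In practice, the parameters \(w_i\) and \(b_i\) are trained by minimizing **cross-entropy loss**, the negative log of the probability assigned to the correct class. Minimizing it pushes the correct class's logit above the others.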
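
For a single example with true class `y`, the cross-entropy loss is `-log p_y`, which expands to `log(sum_j exp(z_j)) - z_y`. The sketch below computes it directly from the logits using that expanded form; the logit values are hypothetical, and the max-shift is the same stability trick commonly used for softmax.

```python
import math

def cross_entropy(z, y):
    """Loss -log p_y for logits z and true class index y."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(z_i - m) for z_i in z))
    return log_sum_exp - z[y]

z = [1.0, 2.5, -3.0]              # hypothetical logits; class 1 scores highest
loss_right = cross_entropy(z, 1)  # true class has the highest logit
loss_wrong = cross_entropy(z, 2)  # true class has the lowest logit
print(loss_right < loss_wrong)    # True
```

The loss is small when the correct class already has the highest logit and grows as that logit falls behind the others.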

* **Summary:**

  1. Compute logits \(z_i\) for each class.
  2. Use **argmax** to pick the predicted class.
  3. Use **softmax** to convert logits into class probabilities.
  4. Train the model with **cross-entropy loss** for multi-class classification.
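
The four steps can be tied together in a small end-to-end sketch. It assumes plain stochastic gradient descent and uses the standard result that the gradient of `-log p_y` with respect to logit `z_i` is `p_i - 1` for the true class and `p_i` otherwise. The toy data, the learning rate `lr`, and the number of epochs are all hypothetical choices for illustration, not a recommended training setup.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    # Step 1: one logit per class, dot(w_i, x) + b_i.
    return [sum(wk * xk for wk, xk in zip(w, x)) + bi for w, bi in zip(W, b)]

def sgd_step(x, y, W, b, lr=0.5):
    """One gradient step on -log p_y; updates W and b in place, returns the loss."""
    p = softmax(forward(x, W, b))            # Step 3: probabilities
    for i in range(len(W)):
        g = p[i] - (1.0 if i == y else 0.0)  # gradient of the loss w.r.t. z_i
        for k in range(len(x)):
            W[i][k] -= lr * g * x[k]
        b[i] -= lr * g
    return -math.log(p[y])                   # Step 4: cross-entropy loss

# Hypothetical toy data: three 2D points, one per class.
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([-1.0, -1.0], 2)]
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]

for epoch in range(100):
    for x, y in data:
        sgd_step(x, y, W, b)

for x, y in data:
    z = forward(x, W, b)
    pred = max(range(len(z)), key=lambda i: z[i])  # Step 2: argmax
    print(x, "true:", y, "predicted:", pred)
```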
