# 🧪 Lab Exercise: Implementing a Softmax Layer for Neural Networks

In this lab, you'll dive into the world of neural networks by building a **Softmax layer** from scratch! 🎯 This layer is essential for handling **multi-class classification** problems, helping models predict probabilities across multiple categories. You’ll implement both the **forward pass** (to compute probabilities) and the **backward pass** (to compute gradients) of the softmax operation.

---

## 🎓 Learning Objectives

By the end of this lab, you’ll be able to:

- 🔢 **Understand** the mathematical formulation of the softmax function  
- 📈 **Implement** the forward pass to generate a probability distribution  
- 🔁 **Implement** the backward pass to compute gradients during backpropagation  
- 🛠️ **Handle** numerical stability issues to ensure accurate results  
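
For reference, here is the standard softmax definition for a single row of scores, together with the max-shifted form used for numerical stability and the derivative needed for the backward pass. The symbols $z$, $a$, $K$, $c$, and $\delta_{ij}$ are notation introduced here for illustration and do not appear in the lab code:

$$
a_i = \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} = \frac{e^{z_i - c}}{\sum_{j=1}^{K} e^{z_j - c}}, \qquad c = \max_{j} z_j
$$

$$
\frac{\partial a_i}{\partial z_j} = a_i \left( \delta_{ij} - a_j \right)
$$

Subtracting $c$ leaves the result unchanged because the factor $e^{-c}$ cancels between numerator and denominator.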

---

Let’s get started and turn theory into code! 💻🔥


In [1]:
import numpy as np

class SoftmaxLayer:
    def __init__(self):
        pass  # No need for cache if only implementing forward pass

    def forward(self, Z):
        """Compute the softmax of input Z (forward pass only)

        Args:
            Z: Input array of shape (batch_size, num_classes)

        Returns:
            A: Probability distribution after softmax, same shape as Z
        """
        # Shift values by max for numerical stability
        Z_shifted = Z - np.max(Z, axis=1, keepdims=True)

        # Compute exponentials
        exp_Z = np.exp(Z_shifted)

        # Compute softmax probabilities
        A = exp_Z / np.sum(exp_Z, axis=1, keepdims=True)

        return A
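
The class above covers only the forward pass. As a rough sketch of what a backward pass could look like, the block below applies the softmax Jacobian row by row. The function names `softmax` and `softmax_backward`, and the assumption that the upstream gradient `dA` has the same shape as `A`, are illustrative choices and not part of the lab's reference solution.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with max-shifting (same steps as the forward pass above)."""
    Z_shifted = Z - np.max(Z, axis=1, keepdims=True)
    exp_Z = np.exp(Z_shifted)
    return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)

def softmax_backward(A, dA):
    """Hypothetical helper: gradient of the loss with respect to the softmax input.

    Args:
        A:  softmax output, shape (batch_size, num_classes)
        dA: upstream gradient dL/dA, assumed to have the same shape as A

    Returns:
        dZ: gradient dL/dZ, same shape as A
    """
    # Jacobian-vector product per row: dZ_i = A_i * (dA_i - sum_j dA_j * A_j)
    weighted_sum = np.sum(dA * A, axis=1, keepdims=True)
    return A * (dA - weighted_sum)

Z = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
A = softmax(Z)
dA = np.ones_like(A)          # a constant upstream gradient
dZ = softmax_backward(A, dA)  # each row of dZ sums to (approximately) zero
```

Each row of `dZ` sums to zero because adding a constant to every score in a row does not change the softmax output. In practice, softmax is often combined with cross-entropy loss, in which case the gradient simplifies further, but that combination is outside the scope of this sketch.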

# Assessment Note 📝

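The **unit test code** provided is designed to automatically evaluate your implementation of the softmax layer. When you run these tests against your completed code, they will verify that the output has the same shape as the input, that each row of probabilities sums to 1, that the values match a known reference case, and that large inputs do not cause overflow.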

## How to Use the Tests 🛠
1. Complete your `SoftmaxLayer` implementation ✍
2. Run the test cell ▶
3. All tests should pass if correct ✅
4. If any fail, check error messages to debug 🐞


**Happy coding!** 🚀 Let's make sure your softmax layer works perfectly! 🎉

In [2]:
import unittest
import numpy as np

class TestSoftmaxForward(unittest.TestCase):
    def setUp(self):
        self.softmax = SoftmaxLayer()
        self.tol = 1e-6

    def test_forward_shape(self):
        """Test if output shape matches input shape"""
        Z = np.array([[1, 2, 3], [0, 0, 0]])
        A = self.softmax.forward(Z)
        try:
            self.assertEqual(A.shape, Z.shape)
        except AssertionError:
            print("\n🔴 Error in output shape!")
            print(f"Expected shape: {Z.shape}, Got: {A.shape}")
            print("Make sure your softmax implementation returns an array with the same shape as the input")
            raise

    def test_forward_probabilities(self):
        """Test if output probabilities are correct"""
        Z = np.array([[1, 2, 3]])
        A = self.softmax.forward(Z)

        # Test row sums
        row_sums = np.sum(A, axis=1)
        try:
            np.testing.assert_allclose(row_sums, [1.0], atol=self.tol)
        except AssertionError:
            print("\n🔴 Error in probability sums!")
            print(f"Row sums should be approximately 1.0, Got: {row_sums}")
            print("Your softmax probabilities don't sum to 1. Check your normalization step")
            print("Remember: A = exp(Z_shifted) / sum(exp(Z_shifted)), keeping dimensions with keepdims=True")
            raise

        # Test known values
        expected = np.array([[0.09003057, 0.24472847, 0.66524096]])
        try:
            np.testing.assert_allclose(A, expected, atol=self.tol)
        except AssertionError:
            print("\n🔴 Error in probability values!")
            print(f"Expected: {expected}")
            print(f"Got: {A}")
            print("Your softmax values don't match expected probabilities")
            print("Did you remember to shift values by max(Z) before exponentiation?")
            print("The correct steps are:")
            print("1. Z_shifted = Z - np.max(Z, axis=1, keepdims=True)")
            print("2. exp_Z = np.exp(Z_shifted)")
            print("3. A = exp_Z / np.sum(exp_Z, axis=1, keepdims=True)")
            raise

    def test_forward_stability(self):
        """Test numerical stability with large inputs"""
        Z = np.array([[1000, 1001, 1002]])
        A = self.softmax.forward(Z)
        try:
            np.testing.assert_allclose(np.sum(A, axis=1), [1.0], atol=self.tol)
        except AssertionError:
            print("\n🔴 Numerical stability problem!")
            print("Your softmax fails with large input values")
            print("Did you implement the max-shifting trick for numerical stability?")
            print("Solution: Subtract np.max(Z) before exponentiation to prevent overflow")
            raise

if __name__ == '__main__':
    print("Running tests...\n")
    unittest.main(argv=['first-arg-is-ignored'], exit=False, verbosity=2)

test_forward_probabilities (__main__.TestSoftmaxForward)
Test if output probabilities are correct ... ok
test_forward_shape (__main__.TestSoftmaxForward)
Test if output shape matches input shape ... ok
test_forward_stability (__main__.TestSoftmaxForward)
Test numerical stability with large inputs ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.012s

OK


Running tests...



## 🔍 Instructor Notes: Softmax Implementation Solution

### 🎯 Key Learning Objectives
1. Numerical stability handling in exponential operations
2. Proper broadcasting in vectorized operations
3. Validation of probability distribution properties

### ⚠️ Common Pitfalls to Address
```python
A = np.exp(Z) / np.sum(np.exp(Z), axis=1)  # Overflows for large Z; also lacks keepdims=True
```
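
To make the overflow concrete, the following sketch compares a naive softmax with the max-shifted version on the same large inputs used in the stability test. The variable names are illustrative, and `np.errstate` is used only to silence the overflow warnings.

```python
import numpy as np

Z = np.array([[1000.0, 1001.0, 1002.0]])

# Naive version: np.exp(1000) overflows to inf, and inf / inf evaluates to nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(Z) / np.sum(np.exp(Z), axis=1, keepdims=True)

# Max-shifted version: the largest exponent becomes exp(0) = 1, so nothing overflows.
Z_shifted = Z - np.max(Z, axis=1, keepdims=True)
stable = np.exp(Z_shifted) / np.sum(np.exp(Z_shifted), axis=1, keepdims=True)

print(naive)   # every entry is nan
print(stable)  # matches the softmax of [[1, 2, 3]]
```

The stable result equals the softmax of `[[1, 2, 3]]` because both inputs reduce to `[[-2, -1, 0]]` after the row maximum is subtracted.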

### ⚠️ Implementation Insights
* `keepdims=True` is crucial for proper broadcasting. Without it, the subtraction fails with a shape mismatch (or silently broadcasts along the wrong axis when `batch_size == num_classes`):
```python
np.max(Z, axis=1, keepdims=True)  # Maintains (n,1) shape
```
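
A small shape check illustrates the difference. This is a minimal sketch with made-up variable names, using a batch of two rows and three classes:

```python
import numpy as np

Z = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])  # shape (2, 3)

row_max_flat = np.max(Z, axis=1)                  # shape (2,)
row_max_kept = np.max(Z, axis=1, keepdims=True)   # shape (2, 1)

try:
    Z - row_max_flat            # (2, 3) minus (2,) cannot be broadcast
    flat_error = None
except ValueError as exc:
    flat_error = exc            # NumPy reports the shape mismatch here

Z_shifted = Z - row_max_kept    # (2, 3) minus (2, 1) broadcasts across columns
print(row_max_flat.shape, row_max_kept.shape, Z_shifted.shape)
```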

### 🐞 Debugging Tips
* If probabilities do not sum to 1:
  * Verify intermediate shapes with `Z_shifted.shape`
  * Check that `np.sum()` includes `keepdims=True`
* For NaN values:
  * Overflow is the likely cause, so confirm that max-shifting is implemented
  * Test with extreme values (>> 1000)
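
The checks above could be wrapped in a small helper for students to call on their output. `check_softmax_output` is a hypothetical name introduced for this sketch and is not part of the provided test suite.

```python
import numpy as np

def check_softmax_output(A, tol=1e-6):
    """Hypothetical debugging helper: list common problems found in a softmax output."""
    problems = []
    if np.isnan(A).any():
        # NaN usually means exp() overflowed before normalization
        problems.append("NaN values found: likely overflow, confirm max-shifting")
    elif not np.allclose(np.sum(A, axis=1), 1.0, atol=tol):
        # Rows that do not sum to 1 point at the normalization step
        problems.append("Rows do not sum to 1: check keepdims=True in np.sum()")
    return problems

good = np.array([[0.2, 0.3, 0.5]])
bad = np.array([[np.nan, np.nan, np.nan]])

print(check_softmax_output(good))  # []
print(check_softmax_output(bad))   # one message about NaN values
```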







