# Chapter 4: Softmax Activation

$$S_{i,j}=\frac{e^{z_{i,j}}}{\sum^{L}_{l=1}e^{z_{i,l}}}$$

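Here $z_{i,j}$ is the $j$-th output for sample $i$, and $L$ is the number of outputs per sample. The softmax function consists of two parts: exponentiation and normalization. Exponentiation maps every value to a positive real number while preserving the ordering of the values, so the largest input still corresponds to the most likely class. Normalization divides each exponentiated value by the sum over all classes, so the outputs add up to 1 and can be read as probabilities. Let's create an input: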

In [1]:
import numpy as np
layer_outputs = [4.8, 1.21, 2.385]  # Values from earlier, when we described what a neural network is.

Calculate the exponential values:

In [2]:
exp_values = np.exp(layer_outputs)
print('exponentiated values:\n',exp_values)

exponentiated values:
 [121.51041752   3.35348465  10.85906266]


Then normalize the values so that they sum to 1:

In [3]:
norm_values = exp_values / np.sum(exp_values)
print('normalized exponentiated values:\n',norm_values)
sum_norm_values = np.sum(norm_values)
print('sum of normalized values:',sum_norm_values)

normalized exponentiated values:
 [0.89528266 0.02470831 0.08000903]
sum of normalized values: 0.9999999999999999
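
The two properties described at the start of the chapter can be checked directly. The sketch below uses a made-up set of scores (the `scores` array is an assumption for illustration, not data from this chapter) that includes negative values, and confirms that exponentiation yields positive numbers, that the ordering of the classes is preserved, and that the normalized values sum to 1.

```python
import numpy as np

# Hypothetical raw scores, including negative values (illustration only)
scores = np.array([-2.0, 0.5, 3.0])

exp_scores = np.exp(scores)               # every entry becomes strictly positive
probs = exp_scores / np.sum(exp_scores)   # entries now sum to 1

print('all positive:', np.all(exp_scores > 0))
print('ordering preserved:', np.array_equal(np.argsort(scores), np.argsort(probs)))
print('sums to 1:', np.isclose(np.sum(probs), 1.0))
```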


Before the next step, let's take a closer look at how the np.sum function behaves:

In [4]:
array_to_be_summed = np.random.random((3,3))
print('array to be summed:\n',array_to_be_summed)
summed_array = np.sum(array_to_be_summed)
print('summed array:',summed_array)
summed_array_axis_0 = np.sum(array_to_be_summed,axis=0)
print('summed array axis 0:',summed_array_axis_0)
summed_array_axis_1 = np.sum(array_to_be_summed,axis=1)
print('summed array axis 1:',summed_array_axis_1)

array to be summed:
 [[0.5904968  0.64847236 0.69945019]
 [0.55927405 0.81059468 0.61866652]
 [0.4102911  0.21477291 0.56904373]]
summed array: 5.121062337742032
summed array axis 0: [1.56006195 1.67383995 1.88716044]
summed array axis 1: [1.93841934 1.98853526 1.19410774]


Summing along axis 0 collapses the rows and yields one sum per column; summing along axis 1 collapses the columns and yields one sum per row. Now it is time to write the softmax activation function:

In [5]:
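class ActivationSoftmax:
    def forward(self,inputs):
        # Subtract each row's maximum before exponentiating to avoid overflow
        exp_values = np.exp(inputs - np.max(inputs,axis=1,keepdims=True))
        # Normalize each row so that its values sum to 1
        probabilities = exp_values / np.sum(exp_values,axis=1,keepdims=True)
        self.output = probabilities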
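
The class above passes keepdims=True to both np.max and np.sum. A minimal sketch of why that matters, using a made-up two-sample batch (the `batch` array and the variable names are assumptions for illustration): with keepdims=True the per-row result keeps shape (2, 1), which broadcasts against the (2, 3) batch so that each row is divided by its own sum.

```python
import numpy as np

# Hypothetical batch: 2 samples with 3 scores each (illustration only)
batch = np.array([[1.0, 2.0, 3.0],
                  [1.0, 1.0, 1.0]])

row_sums_flat = np.sum(batch, axis=1)                  # shape (2,)
row_sums_kept = np.sum(batch, axis=1, keepdims=True)   # shape (2, 1)
print(row_sums_flat.shape, row_sums_kept.shape)

# (2, 3) / (2, 1) broadcasts row by row.
# Dividing by row_sums_flat instead would raise an error here,
# because shapes (2, 3) and (2,) do not broadcast.
normalized = batch / row_sums_kept
print(np.sum(normalized, axis=1))
```

For a square batch the flat version would not raise an error but would silently divide each column, rather than each row, by the wrong sum, which is a further reason to keep the dimension.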

### Dead Neurons and Large Numbers
Before playing with our new activation function, one more thing needs to be mentioned. Two common problems in neural networks are dead neurons and very large ("exploding") values. Both can "wreak havoc" down the line and render a network useless over time. The exponential function used in the softmax activation is one source of exploding values:

In [6]:
print(np.exp(1))
print(np.exp(10))
print(np.exp(100))
print(np.exp(1000))

2.718281828459045
22026.465794806718
2.6881171418161356e+43
inf



Overflow does not require astronomically large inputs: in 64-bit floating point, np.exp already returns inf for inputs above roughly 709. To prevent this, we can use the fact that the exponential function tends towards 0 as its input becomes more negative and equals 1 at 0. Subtracting the largest value in each row from every input in that row shifts the inputs so that the largest becomes 0 and all others are negative. Every exponentiated value then lies between 0 and 1, so overflow cannot occur, and the softmax output is unchanged because the shift cancels out in the normalization.
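
Writing the shift out shows why it does not alter the result. Here $c_i$ is a symbol introduced only for this illustration, denoting the largest value in row $i$:

$$\frac{e^{z_{i,j}-c_i}}{\sum^{L}_{l=1}e^{z_{i,l}-c_i}}=\frac{e^{-c_i}\,e^{z_{i,j}}}{e^{-c_i}\sum^{L}_{l=1}e^{z_{i,l}}}=\frac{e^{z_{i,j}}}{\sum^{L}_{l=1}e^{z_{i,l}}}=S_{i,j}$$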

In [7]:
softmax = ActivationSoftmax()
softmax.forward([[1,2,3]])
print(softmax.output)

[[0.09003057 0.24472847 0.66524096]]
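
To see what the shift buys us, the sketch below compares a direct translation of the formula with the shifted version on inputs large enough to overflow. The function names `softmax_naive` and `softmax_shifted` are made up for this comparison; `softmax_shifted` is meant to mirror the computation in ActivationSoftmax.forward.

```python
import numpy as np

def softmax_naive(z):
    # Direct translation of the formula, with no shift
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def softmax_shifted(z):
    # Subtract each row's maximum first, as in ActivationSoftmax.forward
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])
large = small + 1000.0   # same differences between entries, much larger magnitude

# Silence the overflow/invalid warnings that the naive version triggers
with np.errstate(over='ignore', invalid='ignore'):
    naive_large = softmax_naive(large)

print('naive on large inputs:  ', naive_large)             # inf / inf gives nan
print('shifted on large inputs:', softmax_shifted(large))
print('same as small inputs:   ', np.allclose(softmax_shifted(large), softmax_shifted(small)))
```

The naive version fails because every exponential overflows to inf, while the shifted version returns the same probabilities for both inputs, since only the differences between entries matter.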


Let's put together everything we have built so far:

In [8]:
from nnfs.datasets import spiral_data

class LayerDense:
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1,n_neurons))
        
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases
        
class ActivationReLU:
    def forward(self,inputs):
        self.output = np.maximum(0,inputs)
        
X, y = spiral_data(samples=100, classes=3)

dense1 = LayerDense(2,3)
activation1 = ActivationReLU()
dense2 = LayerDense(3,3)
activation2 = ActivationSoftmax()
dense1.forward(X)
activation1.forward(dense1.output)
dense2.forward(activation1.output)
activation2.forward(dense2.output)
print(activation2.output[:5])

[[0.33333333 0.33333333 0.33333333]
 [0.333359   0.33333178 0.33330921]
 [0.33338568 0.33333017 0.33328414]
 [0.33339186 0.3333298  0.33327834]
 [0.33342196 0.33332798 0.33325006]]
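
Every row above is close to one third per class. That is what we should expect from an untrained network: the weights are small random numbers and the biases are zero, so the values reaching the softmax are all close to 0. A minimal sketch with made-up near-zero inputs (the `logits` array is an assumption for illustration, not the actual output of dense2):

```python
import numpy as np

# Hypothetical near-zero layer outputs (illustration only)
logits = np.array([[0.0,  0.0,  0.0],
                   [1e-4, 0.0, -1e-4]])

# Same computation as ActivationSoftmax.forward
exp_values = np.exp(logits - np.max(logits, axis=1, keepdims=True))
probs = exp_values / np.sum(exp_values, axis=1, keepdims=True)

print(probs)
print('close to uniform:', np.allclose(probs, 1 / 3, atol=1e-4))
```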
