[[Neural Networks from Scratch]]
##### Why Softmax?
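Our end goal is to turn the output layer's values into a probability distribution, so we can measure how right or wrong each prediction is. ReLU cannot do this. It clips every negative value to 0, so if all of a sample's outputs are negative, every output becomes 0 (0%) and the values cannot be normalised into probabilities. This is the main disadvantage of ReLU as an output activation.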

Softmax exponentiates each input value. This guarantees the result is never negative or zero, whilst retaining each value's **relative significance**: a larger input always gives a larger exponentiated value.
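Here is a minimal sketch of the difference, using made-up outputs that are all negative. ReLU turns every value into 0 and leaves nothing to normalise. Exponentiation keeps every value positive, and the largest input still gives the largest value.

```python
import numpy as np

# Hypothetical layer outputs where every value is negative
outputs = np.array([-2.0, -0.5, -1.0])

# ReLU clips every negative value to 0, so all information is lost
relu_values = np.maximum(0, outputs)
print(relu_values)        # [0. 0. 0.]
print(relu_values.sum())  # 0.0, so normalising would mean dividing 0 by 0

# Exponentiation maps every value to a positive number
exp_values = np.exp(outputs)
print(exp_values)

# The largest input still produces the largest exponentiated value
print(np.argmax(outputs) == np.argmax(exp_values))  # True
```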
##### Initialise the packages

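```python
import micropip

# micropip installs packages into the Pyodide (JupyterLite) kernel;
# in a regular Python environment, install these with pip instead
await micropip.install("numpy")
await micropip.install("nnfs")
await micropip.install("matplotlib")

import math

import matplotlib.pyplot as plt
import numpy as np
import nnfs
from nnfs.datasets import spiral_data

# Sets the random seed and default dtype so results are reproducible
nnfs.init()
```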

##### Exponentiating Values

```python
layer_outputs = [4.8, 1.21, 2.385]

# Euler's number, e = 2.71828182846...
E = math.e

# Pure-Python equivalent of np.exp:
# exp_values = []
# for output in layer_outputs:
#     exp_values.append(E ** output)

exp_values = np.exp(layer_outputs)

print(exp_values)
```


Once the values are exponentiated, the next step is to normalise them.

##### How do we normalise?
In our case, we normalise by dividing each output neuron's exponentiated value $u_j$ by the sum of the exponentiated values $u_i$ of all $n$ output neurons in that layer (including $u_j$ itself):
$$
y_j = \frac{u_j}{\sum_{i=1}^{n} u_i}
$$
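As a worked example, take the exponentiated values of `layer_outputs = [4.8, 1.21, 2.385]` from the cell above, rounded to two decimal places:

$$
y_1 = \frac{e^{4.8}}{e^{4.8} + e^{1.21} + e^{2.385}} \approx \frac{121.51}{121.51 + 3.35 + 10.86} = \frac{121.51}{135.72} \approx 0.895
$$

Repeating this for the other two outputs gives $y_2 \approx 0.025$ and $y_3 \approx 0.080$, and the three values sum to 1.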
##### Normalisation

```python
# Pure-Python equivalent:
# norm_base = sum(exp_values)
# norm_values = []
# for value in exp_values:
#     norm_values.append(value / norm_base)

# Divide each exponentiated value by the sum of all exponentiated values in the same output layer
norm_values = exp_values / np.sum(exp_values)

print(norm_values)
print(sum(norm_values))  # 1, up to floating-point rounding
```


##### Summing up our progress of Softmax so far
Input -> Exponentiate -> Normalise -> Output

The combination of exponentiation and normalisation is what makes up Softmax. For output $j$ of sample $i$, in a layer with $L$ outputs, it is given by:

$$
S_{i,j} = \frac{e^{z_{i,j}}}{\sum_{l=1}^{L} e^{z_{i,l}}}
$$
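To connect the indices in the formula to code, here is a deliberately literal sketch. The function name `softmax_loops` is hypothetical, chosen for illustration. It loops over samples $i$ and outputs $j$, sums over $l$ in the denominator, and checks the result against the vectorised NumPy version.

```python
import math

import numpy as np


def softmax_loops(z):
    """Literal translation of S[i][j] = e^z[i][j] / sum over l of e^z[i][l]."""
    result = []
    for row in z:  # i: one sample per row
        denominator = sum(math.exp(v) for v in row)  # sum over l = 1..L
        result.append([math.exp(v) / denominator for v in row])  # one entry per j
    return np.array(result)


layer_outputs = [[4.8, 1.21, 2.385],
                 [8.9, -1.81, 0.2],
                 [1.41, 1.051, 0.026]]

looped = softmax_loops(layer_outputs)

# Vectorised equivalent: exponentiate, then normalise each row
exp_values = np.exp(layer_outputs)
vectorised = exp_values / np.sum(exp_values, axis=1, keepdims=True)

print(np.allclose(looped, vectorised))  # True
print(looped.sum(axis=1))               # each row sums to 1
```

The loop version is only for reading the formula. The vectorised form is what the rest of this note uses.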

##### Exponentiating and Normalising Batches of Inputs

```python
layer_outputs = [[4.8, 1.21, 2.385],
                 [8.9, -1.81, 0.2],
                 [1.41, 1.051, 0.026]]

# Exponentiating
exp_values = np.exp(layer_outputs)
# Normalising
# axis=1 sums each row (one sample's outputs); keepdims=True keeps the result
# as a (3, 1) column so it broadcasts across each row without a transpose
norm_values = exp_values / np.sum(exp_values, axis=1, keepdims=True)

print(norm_values)
```
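To see why `keepdims=True` matters, compare the shapes of the row sums with and without it. This small sketch reuses the same batch. The `wrong` result is computed only to illustrate the broadcasting pitfall.

```python
import numpy as np

layer_outputs = np.array([[4.8, 1.21, 2.385],
                          [8.9, -1.81, 0.2],
                          [1.41, 1.051, 0.026]])
exp_values = np.exp(layer_outputs)

row_sums = np.sum(exp_values, axis=1)                      # shape (3,)
row_sums_kept = np.sum(exp_values, axis=1, keepdims=True)  # shape (3, 1)
print(row_sums.shape, row_sums_kept.shape)  # (3,) (3, 1)

# Shape (3, 1) broadcasts across columns: each row is divided by its own sum
correct = exp_values / row_sums_kept

# Shape (3,) lines up with the columns instead: column j is divided by row j's sum
wrong = exp_values / row_sums

print(correct.sum(axis=1))  # each row sums to 1
print(wrong.sum(axis=1))    # rows do not sum to 1
```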


##### What is the issue with exponentiation?
The exponential function grows very quickly. $e^{10}$ is already about 22,026, so larger inputs soon produce enormous numbers that can overflow.

##### What is an overflow?
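Overflow occurs when a computed value is too large to be represented by its data type. In NumPy, floating-point overflow does not raise an exception. By default it emits a RuntimeWarning and the value becomes `inf`, which then propagates through the rest of the calculation.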
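Here is a minimal sketch of the problem, using made-up large outputs. In 64-bit floating point, `np.exp(x)` overflows to `inf` once `x` exceeds roughly 709.8, and `inf / inf` then gives `nan`. The `np.errstate` context only silences NumPy's overflow and invalid-value warnings so the printed output stays readable.

```python
import numpy as np

# Hypothetical layer outputs with large values
layer_outputs = np.array([1000.0, 1001.0, 1002.0])

# Suppress the RuntimeWarnings NumPy would otherwise print
with np.errstate(over="ignore", invalid="ignore"):
    exp_values = np.exp(layer_outputs)       # every entry overflows to inf
    naive = exp_values / np.sum(exp_values)  # inf / inf is nan

print(exp_values)  # [inf inf inf]
print(naive)       # [nan nan nan]
```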

##### How to combat this overflow?
Subtract the maximum value of each output vector from every element of that vector. This shifts the largest value to 0 and all others below 0, so every exponentiated value lies between 0 and 1 and cannot overflow.
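Applied to the same kind of hypothetical large outputs, the shift turns `[1000, 1001, 1002]` into `[-2, -1, 0]`. Every exponentiated value then lies in (0, 1], and the probabilities match those of `[0, 1, 2]`, which is just another shift of the same inputs.

```python
import numpy as np

# Hypothetical large outputs that would overflow without the shift
layer_outputs = np.array([1000.0, 1001.0, 1002.0])

# Shift so the largest value becomes 0 and the rest are negative
shifted = layer_outputs - np.max(layer_outputs)  # [-2. -1.  0.]
exp_values = np.exp(shifted)                     # all in (0, 1], no overflow
stable = exp_values / np.sum(exp_values)

print(stable)

# A different shift of the same inputs gives the same probabilities
small = np.exp([0.0, 1.0, 2.0])
print(np.allclose(stable, small / small.sum()))  # True
```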

##### What happens to the Softmax value?
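Nothing changes. Subtracting the same value from every element of a vector leaves the Softmax output unchanged, so it still produces values between 0 and 1 that sum to 1.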
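The shift is harmless for a simple reason. Write $c$ for the value subtracted from every input of sample $i$ (here $c = \max_l z_{i,l}$). The factor $e^{-c}$ appears in both the numerator and the denominator, so it cancels:

$$
\frac{e^{z_{i,j} - c}}{\sum_{l=1}^{L} e^{z_{i,l} - c}}
= \frac{e^{z_{i,j}} \, e^{-c}}{e^{-c} \sum_{l=1}^{L} e^{z_{i,l}}}
= \frac{e^{z_{i,j}}}{\sum_{l=1}^{L} e^{z_{i,l}}}
= S_{i,j}
$$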

##### What is a vector?
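A one-dimensional array: just a list of numbers. In a batch, each row (one sample's outputs) is a vector, which is why the maximum and the sum are taken along `axis=1`.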

##### The Softmax Activation Class

```python
# Re-seed so the results below are reproducible
nnfs.init()

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        # Small random weights and zero biases
        self.weights = 0.10 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

class Activation_ReLU:
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)

class Activation_Softmax:
    def forward(self, inputs):
        # Subtract each row's maximum before exponentiating to prevent overflow
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalise each row so it sums to 1
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

# 3 classes, one per arm of the spiral: each point belongs to one of the 3 arms.
# X holds each point's (x, y) coordinates; y holds its class label.
X, y = spiral_data(samples=100, classes=3)

# 2 inputs, because each sample has 2 features: its x and y coordinates
dense1 = Layer_Dense(2, 3)
activation1 = Activation_ReLU()

# 3 inputs (the 3 outputs of dense1) and 3 outputs (one per class)
dense2 = Layer_Dense(3, 3)
activation2 = Activation_Softmax()

dense1.forward(X)
activation1.forward(dense1.output)

dense2.forward(activation1.output)
activation2.forward(dense2.output)

# Class probabilities for the first 5 samples
print(activation2.output[:5])
```