 Introducing the Softmax activation function:
 The model we are trying to build is a classifier, so we want an activation function meant
 for classification. One of these is the softmax activation function. First, why do we need
 another activation function? It depends on what our overall goals are. In this case, the
 ReLU output is unbounded, not normalized with other units, and exclusive. Not normalized
 means that the values can be anything: an output of [4.8, 1.21, 232] has no context, so we
 cannot tell how large one value is relative to the others. Exclusive means that each output
 is independent of the others. To address this lack of context, the softmax activation
 function, applied to the output data, takes non-normalized (uncalibrated) inputs and
 produces a normalized distribution of probabilities for our classes.
 In the case of classification, what we want to see is a prediction of which class the
 network "thinks" the input represents. The distribution returned by the softmax activation
 function represents CONFIDENCE SCORES for each class, and these scores add up to 1.
 The predicted class is associated with the output neuron that returned the largest
 confidence score.
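
For reference, the function described above can be written as a single formula. The symbols are assumptions introduced here, not notation from the text: z_j is the j-th input to the output layer, K is the number of classes, and S_j is the confidence score for class j.

```latex
S_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K
```

Because every e^{z_k} is positive, each S_j lies strictly between 0 and 1, and the K scores sum to 1.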
 

In [18]:
import math
# Values from a previous layer of neurons:
layer_outputs = [4.8, 1.21, -2.385]
# Softmax has two steps: exponentiate the inputs, then normalize them.
# Exponentiating first (y = e^x) guarantees that every value is positive,
# so the normalized results can be read as probabilities.
E = math.e
exp_values = []
for output in layer_outputs:
    exp_values.append(E**output)
print(exp_values)

[121.51041751873483, 3.353484652549023, 0.09208897957928121]


In [19]:
#now we normalize:
norm_den = sum(exp_values)
norm_values = []
for values in exp_values:
    norm_values.append(values/norm_den)
print(norm_values)
print("sum  of normalized exponentiated values:", sum(norm_values))

[0.9724257028383028, 0.026837325858991908, 0.0007369713027052813]
sum  of normalized exponentiated values: 1.0
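
As a small sketch of the last step described earlier (picking the predicted class), the index of the largest confidence score can be found as follows. The values are copied from the output above; the variable name `predicted_class` is only illustrative.

```python
# Confidence scores copied from the normalized output above.
norm_values = [0.9724257028383028, 0.026837325858991908, 0.0007369713027052813]

# The predicted class is the index of the largest confidence score.
predicted_class = max(range(len(norm_values)), key=lambda i: norm_values[i])
print(predicted_class)
```

Here the first neuron (index 0) holds about 97% of the total, so it is the predicted class.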


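Exponentiating and Normalizing Using NumPy
NumPy applies the exponential and the division to the whole array at once, which is shorter
and faster than looping over the values in Python.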

In [20]:
import numpy as np
lay_outputs = [ 4.8 ,1.21 , -2.385]
#For each value in the vector calculate the  exponential value
exp_numvals = np.exp(lay_outputs)# np.exp exponentiates all the elements in a single call.
print("The exponentiated values:")
print(exp_numvals)

#now we normalize the exponentiated values
norm_numvalues = exp_numvals/np.sum(exp_numvals)
print("The normalized values:")
print(norm_numvalues)
print(np.sum(exp_numvals))
print("sum of the normalized values:",np.sum(norm_numvalues))

The exponentiated values:
[1.21510418e+02 3.35348465e+00 9.20889796e-02]
The normalized values:
[9.72425703e-01 2.68373259e-02 7.36971303e-04]
124.95599115086317
sum of the normalized values: 1.0


For batches:
We need the sum of each row of the input matrix, so we sum along axis 1:
np.sum(inp, axis=1)
The shape of the result also has to align with the rows of the input, so we pass the
keepdims parameter as well:
np.sum(inp, axis=1, keepdims=True)
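
To make the shape issue concrete, here is a minimal sketch. The array `inp` is a made-up 3x4 batch used only for illustration; the NumPy calls are the same ones named above.

```python
import numpy as np

# Hypothetical 3x4 batch used only for illustration.
inp = np.array([[1.0, 2.0, 3.0, 4.0],
                [2.0, 2.0, 2.0, 2.0],
                [0.5, 1.5, 2.5, 3.5]])

row_sums_flat = np.sum(inp, axis=1)                 # shape (3,)
row_sums_kept = np.sum(inp, axis=1, keepdims=True)  # shape (3, 1)
print(row_sums_flat.shape, row_sums_kept.shape)

# (3, 4) / (3, 1) broadcasts: every row is divided by its own sum.
normalized = inp / row_sums_kept
print(np.sum(normalized, axis=1))

# (3, 4) / (3,) does not broadcast, because the trailing dimensions (4 and 3) differ.
try:
    inp / row_sums_flat
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)
```

With keepdims=True the row sums stay a column, so the division lines up row by row.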

In [21]:
inp_batch =np.array([[2.4 , 4.5 ,7.8 , 3.0],
            [2.0,3.0,8.0,4.6],
            [-1.4,3.3,4.1,0.17]])

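# Subtract each row's maximum before exponentiating. This keeps np.exp from overflowing
# and does not change the resulting probabilities.
inp_batch = inp_batch - np.max(inp_batch, axis=1, keepdims=True)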
exp_batch = np.exp(inp_batch)
probabilities = exp_batch/np.sum(exp_batch , axis=1, keepdims = True)
print(probabilities)
print(np.sum(probabilities, axis  = 1))
print(np.sum(inp_batch , axis=1 , keepdims=True))
print(np.shape(inp_batch[1]))

[[0.00430302 0.03513923 0.95271713 0.00784062]
 [0.00237749 0.0064627  0.95914984 0.03200997]
 [0.00277434 0.30503112 0.67885925 0.01333529]]
[1. 1. 1.]
[[-13.5 ]
 [-14.4 ]
 [-10.23]]
(4,)
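
Subtracting each row's maximum, as in the `inp_batch - np.max(...)` line above, matters once the inputs get large. The sketch below uses made-up values (the array `big` is an assumption, chosen only to trigger overflow) to compare the naive and shifted computations.

```python
import numpy as np

# Hypothetical large inputs chosen to trigger overflow; not taken from the text above.
big = np.array([[1000.0, 1001.0, 1002.0]])

# Naive version: np.exp(1000) overflows to inf, and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(big) / np.sum(np.exp(big), axis=1, keepdims=True)

# Shifted version: the largest exponent is 0, so np.exp never exceeds 1.
shifted = big - np.max(big, axis=1, keepdims=True)
stable = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

print(naive)
print(stable)
```

The shift cancels out in the ratio, so the shifted version returns the same probabilities the naive formula would give if it did not overflow.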


Making the Softmax object and using it in a network

In [22]:
class layer:
    def __init__(self, no_of_inputs , no_of_neurons):
        self.weights = np.random.randn(no_of_inputs , no_of_neurons)
        self.biases = np.zeros((1,no_of_neurons))
    def forward(self, inputs):
        out = np.dot(inputs, self.weights)
        self.outp = out +self.biases
class ReLU :
    def forward(self, inputs):
        self.out = np.maximum(0, inputs)
class Softmax:
    def forward(self, inputs):
        ex_batch = np.exp(inputs -  np.max(inputs , axis = 1 , keepdims = True))
        # Normalize the shifted exponentials so that each row sums to 1.
        self.prob = ex_batch / np.sum(ex_batch, axis=1, keepdims=True)
layer1 = layer(np.shape(inp_batch[1])[0], 4 )
layer1.forward(inp_batch)
relu = ReLU()
relu.forward(layer1.outp)
print(relu.out)
layer2 = layer(4,10)
layer2.forward(relu.out)
smax = Softmax()
smax.forward(layer2.outp)
print(smax.prob)
print(np.sum(smax.prob , axis = 1, keepdims =True))

[[0.         0.         4.49024915 0.        ]
 [0.         0.         2.44026358 0.        ]
 [0.         0.         4.93912335 0.        ]]
smax.prob is a 3x10 array whose values depend on the random weights; each of its rows sums to 1:
[[1.]
 [1.]
 [1.]]
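
A quick sanity check is useful after any softmax forward pass: every row should sum to 1, and row sums that differ from 1 point to a mistake in the normalization. The sketch below runs that check and reads off the predicted class per sample. It uses the probabilities printed in the batch example earlier in this section; the names `rows_ok` and `predictions` are illustrative.

```python
import numpy as np

# Probabilities copied from the batch example earlier in this section.
probabilities = np.array([[0.00430302, 0.03513923, 0.95271713, 0.00784062],
                          [0.00237749, 0.0064627,  0.95914984, 0.03200997],
                          [0.00277434, 0.30503112, 0.67885925, 0.01333529]])

# Sanity check: every row of a softmax output should sum to (approximately) 1.
rows_ok = np.allclose(np.sum(probabilities, axis=1), 1.0, atol=1e-6)

# Predicted class per sample: index of the largest confidence score in each row.
predictions = np.argmax(probabilities, axis=1)
print(rows_ok, predictions)
```

For this batch, all three samples are assigned to class index 2.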
