<h1>Softmax and Cross-Entropy</h1> 



<h3 style='color:yellow'>Softmax and cross-entropy loss are two of the most widely used functions in neural networks for classification tasks. This notebook applies both to NumPy arrays and PyTorch tensors.</h3>

<h3 style='color:yellow'>Softmax squashes the values of the output layer into the range 0 to 1 so that they sum to 1, i.e., it converts raw scores into probabilities.</h3>

<h3 style='color:yellow'>Softmax takes the exponential of each element and divides it by the sum of the exponentials of all elements:</h3>


$$
\Large{S(y)_i = \frac{e^{y_i}}{\sum_{j} e^{y_j}}}
$$


<div style="display: flex; justify-content: center;">
    <img src="softmax.png" width=400 />
</div>

<h3 style='color:yellow'>The model returns the index of the class with the highest probability. In the figure above, the highest probability is 0.7, which sits at index 0.</h3>



In [2]:
import numpy as np
import torch
import torch.nn as nn 

In [3]:
def soft_max(x):
    # Exponentiate each element, then normalize by the sum of all exponentials
    return np.exp(x)/np.sum(np.exp(x),axis=0)

X=np.array([2,1,0.1])
output=soft_max(X)
output

array([0.65900114, 0.24243297, 0.09856589])

In [8]:
X_tensor= torch.from_numpy(X.astype(np.float32))
output=torch.softmax(X_tensor,dim=0)
output

tensor([0.6590, 0.2424, 0.0986])
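
<h3 style='color:yellow'>A possible refinement, sketched here as an assumption rather than as part of the original notebook: for large scores np.exp can overflow, so a common trick is to subtract the maximum score before exponentiating. The result is unchanged because softmax is invariant to adding a constant to every input. The name stable_softmax and the axis argument are illustrative choices.</h3>

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax that subtracts the maximum along `axis` before exponentiating."""
    x = np.asarray(x, dtype=np.float64)
    shifted = x - np.max(x, axis=axis, keepdims=True)  # largest entry becomes 0
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

small = stable_softmax([2.0, 1.0, 0.1])
large = stable_softmax([1002.0, 1001.0, 1000.1])  # same scores shifted by 1000
print(small)
print(large)
```

<h3 style='color:yellow'>Both calls should print the same probabilities, whereas a direct np.exp(1002.0) would overflow to inf.</h3>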

<h3 style='color:yellow'>Softmax is the usual output function for classification tasks. For reconstruction tasks, softmax is not used; the sigmoid function is applied instead to squash each output value into the range 0 to 1.</h3>

<h3 style='color:yellow'>The sigmoid function is also used in binary classification tasks. It squashes the score into the range 0 to 1, and values above 0.5 are then assigned to one class and values below 0.5 to the other.</h3>
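
<h3 style='color:yellow'>A minimal sketch of the sigmoid thresholding described above. The scores are made-up values for illustration, and sending a probability of exactly 0.5 to class 0 is an assumption, since the text only covers values above and below 0.5.</h3>

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real score into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-2.0, 0.0, 0.3, 3.0])  # hypothetical raw model outputs
probs = sigmoid(scores)
labels = (probs > 0.5).astype(int)        # 1 if probability > 0.5, else 0 (assumed tie rule)
print(probs)
print(labels)
```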

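<h3 style='color:yellow'>The softmax function is commonly used together with the cross-entropy loss in multi-class classification tasks. This combination is often referred to as "softmax cross-entropy" or "softmax loss." In the formula below, Y is the one-hot label, Ŷ is the vector of predicted probabilities, log is the natural logarithm, and N is the number of samples over which the loss is averaged (N = 1 in the single-sample examples that follow).</h3>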

$$
\Large{\mathrm{Loss} = D(\hat{Y},Y) = -\frac{1}{N} \sum_{i} Y_i \, \log(\hat{Y}_i)}
$$


$$
\begin{aligned}
\mathrm{if} \,\, Y &= [1,0,0] \\
\hat{Y} &= [0.7,0.2,0.1] \\
D(\hat{Y},Y) &= -\log(0.7) \approx 0.357
\end{aligned}
$$


$$
\begin{aligned}
\mathrm{if} \,\, Y &= [1,0,0] \\
\hat{Y} &= [0.1,0.3,0.6] \\
D(\hat{Y},Y) &= -\log(0.1) \approx 2.303
\end{aligned}
$$


In [15]:
# NumPy cross-entropy loss for a single sample
def cross_entropy(y_true,y_predicted):
    # Sum over the classes; with a single sample N = 1, so no further averaging is needed
    loss= -np.sum(y_true * np.log(y_predicted))
    return loss

y_true=np.array([1,0,0])  # one-hot encoding of class 0
y_pred1=np.array([0.7,0.2,0.1])  # predicted probabilities, which must sum to 1
y_pred2=np.array([0.1,0.3,0.6])
loss1=cross_entropy(y_true,y_pred1)
loss2=cross_entropy(y_true,y_pred2)
print(f'The first loss {loss1:.3f}')
print(f'The second loss {loss2:.3f}')

The first loss 0.357
The second loss 2.303
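
<h3 style='color:yellow'>The 1/N factor in the loss formula matters once there is more than one sample. The sketch below assumes one-hot labels and probabilities stored as arrays of shape N x C (samples x classes). The name batch_cross_entropy and the eps clipping that guards against log(0) are illustrative additions, not part of the original notebook.</h3>

```python
import numpy as np

def batch_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy over N samples.

    y_true: one-hot labels, shape (N, C)
    y_prob: predicted probabilities, shape (N, C), each row summing to 1
    """
    y_prob = np.clip(y_prob, eps, 1.0)                      # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_prob), axis=1)   # sum over classes
    return per_sample.mean()                                # average over samples (1/N)

y_true = np.array([[1, 0, 0],
                   [1, 0, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
loss = batch_cross_entropy(y_true, y_prob)
print(f"{loss:.3f}")
```

<h3 style='color:yellow'>With these two samples the result is the average of -log(0.7) and -log(0.1).</h3>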


<h3 style='color:yellow'>PyTorch's nn.CrossEntropyLoss applies softmax internally, so there is no need to add a softmax layer at the end of the model.</h3>

$$
\text{nn.CrossEntropyLoss} = \text{nn.LogSoftmax} + \text{nn.NLLLoss} \; \text{(negative log-likelihood loss)}
$$

<h3 style='color:yellow'>When using PyTorch, Y_true should hold the class index of the correct label, not its one-hot encoding.</h3>

<h3 style='color:yellow'>When using PyTorch, Y_predicted should hold raw scores (logits). There is no need to convert them to probabilities, because nn.CrossEntropyLoss handles that internally.</h3>
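
<h3 style='color:yellow'>The decomposition into nn.LogSoftmax followed by nn.NLLLoss can be checked numerically. This is a minimal sketch with a single illustrative sample; the variable names are arbitrary.</h3>

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw scores, shape (n_samples, n_classes)
target = torch.tensor([0])                # class index, not one-hot

# One step: cross-entropy directly on the logits
ce = nn.CrossEntropyLoss()(logits, target)

# Two steps: log-softmax over the class dimension, then negative log-likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print(ce.item(), nll.item())
```

<h3 style='color:yellow'>Both values should agree up to floating-point error.</h3>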





In [25]:
# PyTorch cross-entropy loss
loss=nn.CrossEntropyLoss()
y_true=torch.tensor([0]) # class index of the correct label, not the one-hot encoding used in the NumPy version

# Good prediction: the logit at index 0 (the true class) is the largest. Shape is n_samples x n_classes = 1 x 3
y_pred_good=torch.tensor([[2.0,1.0,0.1]]) # must be 2-D (n_samples x n_classes): one sample gives [[...]], two samples give [[...],[...]], and so on

# Bad prediction: the largest logit is at index 1 instead of the true class 0
y_pred_bad=torch.tensor([[0.4,2.0,0.3]]) # same 1 x 3 shape as above

loss1=loss(y_pred_good,y_true)
loss2=loss(y_pred_bad,y_true)

print(loss1.item(),'||||', loss2.item())

0.4170299470424652 |||| 1.9253969192504883


In [24]:
# To get the index of the class or the actual prediction
_,prediction1=torch.max(y_pred_good,1)   # prediction1=torch.argmax(y_pred_good,1)
_,prediction2=torch.max(y_pred_bad,1)
print(prediction1,'||||||',prediction2)

tensor([0]) |||||| tensor([1])


In [27]:
# nn.CrossEntropyLoss also handles several samples at once (a batch) in multi-class classification
loss=nn.CrossEntropyLoss()
y_true=torch.tensor([2,0,1]) # one class index per sample, not the one-hot encoding used in the NumPy version

# Good prediction: in every row the largest logit sits at the true class index (n_samples x n_classes = 3 x 3)
y_pred_good=torch.tensor([[0.5,1.0,2.1],[2.0,1.0,0.1],[2.0,3.0,0.1]]) # three samples, so three rows: [[...],[...],[...]]
y_pred_bad=torch.tensor([[2.1,1.0,0.2],[0.3,1.0,0.1],[2.0,3.0,0.1]]) # bad prediction: the first two rows put the largest logit on the wrong class

loss1=loss(y_pred_good,y_true)
loss2=loss(y_pred_bad,y_true)

print(loss1.item(),'||||', loss2.item())
print('')
_,prediction1=torch.max(y_pred_good,1)
_,prediction2=torch.max(y_pred_bad,1)
print(prediction1,'||||||',prediction2)

0.3993692398071289 |||| 1.3299669027328491

tensor([2, 0, 1]) |||||| tensor([0, 1, 1])
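
<h3 style='color:yellow'>The single number printed for a batch is the mean of the per-sample losses. The sketch below, which reuses the good predictions from the cell above, makes that visible with the reduction="none" option of nn.CrossEntropyLoss. The variable names are illustrative.</h3>

```python
import torch
import torch.nn as nn

y_true = torch.tensor([2, 0, 1])
logits = torch.tensor([[0.5, 1.0, 2.1],
                       [2.0, 1.0, 0.1],
                       [2.0, 3.0, 0.1]])

# reduction="none" keeps one loss value per sample instead of averaging
per_sample = nn.CrossEntropyLoss(reduction="none")(logits, y_true)
mean_loss = nn.CrossEntropyLoss()(logits, y_true)  # default reduction="mean"

print(per_sample)
print(per_sample.mean().item(), mean_loss.item())
```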


<h3 style='color:yellow'>The following figure shows a complete example of integrating softmax into a neural network.</h3>
<div style="display: flex; justify-content: center;">
    <img src="nnn_softmax.png" width=500 />
</div>

In [28]:
# The following code is incomplete and is only meant to illustrate the concept of using the softmax loss
class NNET(nn.Module):
    def __init__(self,input_size, hidden_size, num_classes, *args, **kwargs):
        super(NNET,self).__init__(*args, **kwargs)
        self.linear=nn.Linear(input_size,hidden_size)
        self.relu=nn.ReLU()
        self.linear2=nn.Linear(hidden_size,num_classes)
    
    def forward(self,x):
        out=self.linear(x)
        out=self.relu(out)
        out=self.linear2(out)
        # Return raw logits: no softmax here, because nn.CrossEntropyLoss applies it
        return out

In [29]:
model_= NNET(input_size=28*28,hidden_size=5,num_classes=3)
loss=nn.CrossEntropyLoss() # This also applies softmax internally, so the model outputs logits
# The remaining pieces (learning rate, number of epochs, optimizer, and training loop) are added once the data is available.
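
<h3 style='color:yellow'>As a rough sketch of how those remaining pieces might fit together, the block below trains a network of the same shape on random synthetic data. The data, the SGD optimizer, the learning rate of 0.01, the 5 epochs, and the class name TinyNet are all assumptions made for illustration; a real setup would use an actual dataset and tuned hyperparameters.</h3>

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyNet(nn.Module):
    """Same layout as NNET above: Linear -> ReLU -> Linear, returning logits."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.linear = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.linear2(self.relu(self.linear(x)))  # raw logits, no softmax

# Synthetic stand-in data: 16 samples, 3 classes (assumption, not real data)
X = torch.randn(16, 28 * 28)
y = torch.randint(0, 3, (16,))  # class indices, not one-hot

model = TinyNet(input_size=28 * 28, hidden_size=5, num_classes=3)
criterion = nn.CrossEntropyLoss()                          # applies log-softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # assumed optimizer and LR

losses = []
for epoch in range(5):               # assumed number of epochs
    logits = model(X)                # forward pass
    loss_value = criterion(logits, y)
    optimizer.zero_grad()            # clear gradients from the previous step
    loss_value.backward()            # backpropagate
    optimizer.step()                 # update the weights
    losses.append(loss_value.item())

print(losses)
```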
