### SoftMax_CrossEntropyLoss - Theory

```python
import numpy as np
import torch
from torch import nn
```

```
                |-> 2.0 |                 |-> .66 |
    Linear----> |-> 1.0 |---> Softmax --->|-> .24 |----> CrossEntropy(y, y_pred)
                |-> 0.1 |                 |-> .10 |
```

### The softmax
* Softmax turns a vector of raw scores (logits) into probabilities: every output lies between 0 and 1, and the outputs sum to **`1`**
* It applies the exponential function to each element and normalizes by dividing by the sum of all these exponentials
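Written as a formula for a score vector $\mathbf{x}$ with $K$ classes, and worked through for the scores in the diagram above:

$$
\sigma(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}, \qquad i = 1, \dots, K
$$

$$
e^{2.0} \approx 7.389,\quad e^{1.0} \approx 2.718,\quad e^{0.1} \approx 1.105,\quad \sum_j e^{x_j} \approx 11.213
\;\Rightarrow\; \sigma(\mathbf{x}) \approx (0.659,\ 0.242,\ 0.099)
$$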

```python
def softmax(x):
    # exponentiate every score, then divide by their sum so the outputs add up to 1
    return np.exp(x) / np.sum(np.exp(x), axis=0)
```

```python
x = np.array([2., 1., 0.1], dtype='float32')
output = softmax(x)
output
```

```
array([0.6590011 , 0.24243295, 0.09856589], dtype=float32)
```

> These probabilities sum to one

```python
np.sum(output, axis=0)  # close to one; the small gap is float32 rounding
```

```
0.99999994
```
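The `softmax` above works for a single 1-D vector, but two practical issues show up in real use: `np.exp` overflows to `inf` for large scores (so the naive version returns `nan`), and `axis=0` on a 2-D batch of shape `(nSamples, nClasses)` would normalize across samples instead of across classes. A minimal sketch of a variant that handles both, assuming a hypothetical helper name `stable_softmax` and float64 inputs; it relies on the fact that subtracting a constant from every score leaves the softmax unchanged:

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax along `axis`, shifted by the max for numerical stability.

    exp(x - c) / sum(exp(x - c)) == exp(x) / sum(exp(x)) for any constant c,
    so subtracting the max gives the same result without overflowing np.exp.
    """
    x = np.asarray(x, dtype=np.float64)
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

# Same probabilities as the naive version on small inputs
print(stable_softmax(np.array([2.0, 1.0, 0.1])))

# np.exp(1000.0) overflows to inf in the naive version; here the largest exponent is exp(0)
print(stable_softmax(np.array([1000.0, 1001.0, 1002.0])))

# Batched input of shape (nSamples, nClasses): axis=-1 normalizes each row
batch = np.array([[2.0, 1.0, 0.1],
                  [0.5, 2.0, 0.3]])
print(stable_softmax(batch, axis=-1).sum(axis=-1))  # each row sums to 1 (up to rounding)
```

`keepdims=True` keeps the reduced axis so the max and the sum broadcast back against the original array for any input shape.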

### Softmax using the built-in torch function

```python
a = torch.from_numpy(x)
a
```

```
tensor([2.0000, 1.0000, 0.1000])
```

```python
output = torch.softmax(a, dim=0)
output
```

```
tensor([0.6590, 0.2424, 0.0986])
```

```python
torch.sum(output)  # displays as 1.0000 because tensors print to 4 decimal places
```

```
tensor(1.0000)
```
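With a single vector, `dim=0` is the only choice. For a batch of scores shaped `(nSamples, nClasses)`, the `dim` argument decides which values are normalized together. A small sketch with a hypothetical 2-sample batch, showing that `dim=1` gives one probability distribution per sample while `dim=0` normalizes each class column across samples instead:

```python
import torch

# Hypothetical batch of raw scores, shape (nSamples, nClasses) = (2, 3)
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.0, 0.3]])

# dim=1: normalize across the classes of each sample (each row)
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1))  # one sum per sample, each ~1

# dim=0: normalize each class across the samples (each column),
# which is usually not what a classifier needs
wrong = torch.softmax(logits, dim=0)
print(wrong.sum(dim=0))  # one sum per class, each ~1
```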

### Cross-entropy
`Cross-entropy` loss, or `log` loss, measures the performance of a classification model whose output is a probability value between 0 and 1.
- The loss increases as the predicted probability of the true class diverges from the actual label
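As a formula, for a true label vector $\mathbf{y}$ and predicted probabilities $\hat{\mathbf{y}}$ over $K$ classes (natural log, as `np.log` uses):

$$
L(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{i=1}^{K} y_i \log \hat{y}_i
$$

With a one-hot $\mathbf{y}$ whose true class is $c$, every term except $i = c$ is zero, so $L = -\log \hat{y}_c$. For example, $\mathbf{y} = (1, 0, 0)$ and $\hat{\mathbf{y}} = (0.8, 0.1, 0.1)$ give $L = -\log 0.8 \approx 0.223$.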

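```python
def cross_entropy(actual, predicted):
    EPS = 1e-15
    # clip probabilities away from 0 and 1 so np.log never returns -inf
    predicted = np.clip(predicted, EPS, 1 - EPS)
    # sum over classes of -y * log(y_hat); with one-hot labels only the true class contributes
    loss = -np.sum(actual * np.log(predicted))
    return loss
```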

> For this implementation, labels must be ``one-hot encoded``

```
       class_0: [1 0 0]
       class_1: [0 1 0]
       class_2: [0 0 1]
```
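Datasets usually store labels as integer class indices, so they need converting before they can be passed to `cross_entropy`. A minimal sketch using a hypothetical helper `one_hot`, which indexes rows of an identity matrix (row `k` of `np.eye(K)` is the one-hot vector for class `k`):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot rows (hypothetical helper)."""
    labels = np.asarray(labels)
    # Fancy indexing picks one identity-matrix row per label
    return np.eye(num_classes, dtype=int)[labels]

print(one_hot([0, 1, 2], num_classes=3))  # the three rows shown above
print(one_hot([2, 0], num_classes=3))     # class_2 then class_0
```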
**Example:**

```python
y = np.array([1, 0, 0])                  # true class is class_0
y_pred_good = np.array([0.8, 0.1, 0.1])  # most probability on class_0
y_pred_bad = np.array([0.4, 0.5, 0.1])   # most probability on the wrong class
```

```python
loss_1 = cross_entropy(y, y_pred_good)
loss_2 = cross_entropy(y, y_pred_bad)

print(loss_1, loss_2)
```

```
0.2231435513142097 0.916290731874155
```
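The two losses are just $-\log 0.8$ and $-\log 0.4$. For a whole batch, a common approach is to keep labels as integer indices, pick the true-class probability of each sample directly, and average the results. The sketch below assumes a hypothetical helper `cross_entropy_batch` and averages over samples, which mirrors the default `reduction='mean'` of PyTorch's loss functions:

```python
import numpy as np

def cross_entropy_batch(labels, probs, eps=1e-15):
    """Mean cross-entropy over a batch (hypothetical helper).

    labels: integer class indices, shape (nSamples,)
    probs:  predicted probabilities, shape (nSamples, nClasses)
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1 - eps)
    labels = np.asarray(labels)
    # With one-hot labels only the true-class term survives the sum,
    # so select that probability per row instead of multiplying by a one-hot vector
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

probs = np.array([[0.8, 0.1, 0.1],   # the good prediction from above
                  [0.4, 0.5, 0.1]])  # the bad prediction from above
print(cross_entropy_batch([0, 0], probs))  # mean of the two losses above
```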


#### PyTorch CrossEntropyLoss
* `nn.CrossEntropyLoss` applies softmax internally (as a log-softmax), so it expects raw scores (logits), not probabilities
* It combines ``nn.LogSoftmax + nn.NLLLoss``
* **NLLLoss** -> **N**egative **L**og **L**ikelihood **L**oss
```python
loss = nn.CrossEntropyLoss()
```

```python
# target is of size nSamples = 1
# each element has class label: 0, 1, or 2
# Y (=target) contains class labels, not one-hot
Y = torch.tensor([0])
```

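```python
# predictions are raw scores (logits) of size nSamples x nClasses = 1 x 3; no softmax applied
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])
Y_pred_bad = torch.tensor([[0.5, 2.0, 0.3]])
l1 = loss(Y_pred_good, Y)
l2 = loss(Y_pred_bad, Y)

print(l1, l2)
```

```
tensor(0.4170) tensor(1.8406)
```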
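These numbers can be reproduced without PyTorch: the loss for one sample is $-\log \text{softmax}(\mathbf{z})_c = \log \sum_j e^{z_j} - z_c$. The sketch below uses a hypothetical helper `cross_entropy_from_logits` and computes the log-sum-exp with the usual max shift so `np.exp` cannot overflow:

```python
import numpy as np

def cross_entropy_from_logits(logits, target):
    """-log softmax(logits)[target] via the log-sum-exp trick (hypothetical helper)."""
    z = np.asarray(logits, dtype=np.float64)
    m = np.max(z)
    # log(sum(exp(z))) = m + log(sum(exp(z - m))), stable for large scores
    log_sum_exp = m + np.log(np.sum(np.exp(z - m)))
    return log_sum_exp - z[target]

print(cross_entropy_from_logits([2.0, 1.0, 0.1], 0))  # ~0.4170, matches l1
print(cross_entropy_from_logits([0.5, 2.0, 0.3], 0))  # ~1.8406, matches l2
```

Working with log-probabilities directly, instead of calling softmax and then `np.log`, avoids taking the log of a probability that has underflowed to 0.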


### ``Binary classification`` vs. ``Multiclass problem``

* **Binary classification** - apply ``sigmoid`` as the activation function of the output layer of the ``NN``
    * ``loss = nn.BCELoss()``

* **Multiclass problem** - do not apply softmax at the last layer, because `nn.CrossEntropyLoss()` already applies (log-)softmax internally.
    * ``loss = nn.CrossEntropyLoss()``
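A minimal sketch of the two output-layer setups side by side. The models, feature sizes, and targets are hypothetical placeholders; the point is where the activation goes and what target format each loss expects. The last lines show a related option: `nn.BCEWithLogitsLoss` takes raw scores and fuses the sigmoid into the loss, which should give the same value as `nn.Sigmoid` followed by `nn.BCELoss`:

```python
import torch
from torch import nn

torch.manual_seed(0)
features = torch.randn(4, 5)  # hypothetical batch: 4 samples, 5 features

# Binary: one output unit, sigmoid at the end, BCELoss with float 0/1 targets
binary_model = nn.Sequential(nn.Linear(5, 1), nn.Sigmoid())
binary_targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
bce = nn.BCELoss()(binary_model(features), binary_targets)

# Multiclass: one output unit per class, NO softmax at the end,
# CrossEntropyLoss with integer class-index targets
multi_model = nn.Linear(5, 3)
multi_targets = torch.tensor([0, 2, 1, 0])
ce = nn.CrossEntropyLoss()(multi_model(features), multi_targets)

# Binary alternative: skip the Sigmoid layer and let the loss apply it
binary_logits = binary_model[0](features)  # output of the Linear layer only
bce_logits = nn.BCEWithLogitsLoss()(binary_logits, binary_targets)

print(bce.item(), ce.item(), bce_logits.item())
```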