In [1]:
import torch
import torch.nn.functional as F

## Softmax

Softmax rescales the scores so that every class gets a value between 0 and 1 and the values across all classes sum to 1.0.

![](https://i.imgur.com/TfRJAtc.png)

In [2]:
data = torch.FloatTensor([1, 2, 1])
F.softmax(data, dim=-1)

tensor([0.2119, 0.5761, 0.2119])

In [3]:
data = torch.FloatTensor([
    [1, 2, 1],
    [2, 3, 4]
])
F.softmax(data, dim=-1)

tensor([[0.2119, 0.5761, 0.2119],
        [0.0900, 0.2447, 0.6652]])
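As a quick check of the sum-to-one property, the sketch below sums the softmax output along the axis that was normalized. The names `probs_rows` and `probs_cols` are only illustrative; the point is that `dim` decides which axis ends up summing to 1.

```python
import torch
import torch.nn.functional as F

data = torch.FloatTensor([
    [1, 2, 1],
    [2, 3, 4]
])

# dim=-1 normalizes across the last axis, so every row sums to 1
probs_rows = F.softmax(data, dim=-1)
row_sums = probs_rows.sum(dim=-1)

# dim=0 normalizes across the first axis, so every column sums to 1
probs_cols = F.softmax(data, dim=0)
col_sums = probs_cols.sum(dim=0)

print(row_sums, col_sums)
```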

In [4]:
# Let's implement a softmax function
def calc_softmax(data, dim=-1):
    e_data = torch.exp(data)
    return e_data / torch.sum(e_data, dim=dim, keepdim=True)

data = torch.FloatTensor([1, 2, 1])
calc_softmax(data)

tensor([0.2119, 0.5761, 0.2119])

In [5]:
data = torch.FloatTensor([
    [1, 2, 1],
    [2, 3, 4]
])
calc_softmax(data)

tensor([[0.2119, 0.5761, 0.2119],
        [0.0900, 0.2447, 0.6652]])
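One caveat with the hand-written version: `torch.exp` overflows for large inputs, so `calc_softmax` would be expected to return `nan` for scores around 1000. A common workaround, sketched here under the hypothetical name `calc_softmax_stable`, is to subtract the maximum before exponentiating. The result is mathematically the same because the shift cancels out in the ratio.

```python
import torch
import torch.nn.functional as F

def calc_softmax_stable(data, dim=-1):
    # Subtracting the maximum along `dim` does not change the result,
    # but it keeps torch.exp from overflowing to inf.
    shifted = data - data.max(dim=dim, keepdim=True).values
    e_data = torch.exp(shifted)
    return e_data / e_data.sum(dim=dim, keepdim=True)

# Same pattern as [1, 2, 1], shifted up by 999
big = torch.FloatTensor([1000.0, 1001.0, 1000.0])
stable_out = calc_softmax_stable(big)
print(stable_out)
```

Since these scores are just `[1, 2, 1]` plus a constant, the output should match the earlier result for `[1, 2, 1]`.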

## Cross Entropy Loss

Here's how cross entropy loss is defined.

![](https://i.imgur.com/244C1AF.png)

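Here `y` is the one-hot-encoded version of the labels and `yhat` holds the predictions. The formula expects probabilities, so the raw scores need to go through softmax first. Note that `F.cross_entropy` applies log-softmax internally, so it takes the raw scores and the class indices directly.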

In [6]:
Y = torch.tensor([1, 0, 2])
yhat = torch.FloatTensor([
    [0.1, 0.8, 0.7],
    [0.4, 0.5, 0.5],
    [0.9, 0.1, 0.5]
])
yhat

tensor([[0.1000, 0.8000, 0.7000],
        [0.4000, 0.5000, 0.5000],
        [0.9000, 0.1000, 0.5000]])

In [7]:
y_one_hot = F.one_hot(Y, 3)
y_one_hot

tensor([[0, 1, 0],
        [1, 0, 0],
        [0, 0, 1]])

In [8]:
F.cross_entropy(yhat, Y)

tensor(1.0646)
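The built-in call can be reproduced in two steps. A minimal sketch, assuming the default `reduction="mean"`; the names `two_step` and `one_step` are only illustrative:

```python
import torch
import torch.nn.functional as F

Y = torch.tensor([1, 0, 2])
yhat = torch.FloatTensor([
    [0.1, 0.8, 0.7],
    [0.4, 0.5, 0.5],
    [0.9, 0.1, 0.5]
])

# Step 1: log-softmax over the class axis
log_probs = F.log_softmax(yhat, dim=-1)
# Step 2: negative log-likelihood of the true class, averaged over samples
two_step = F.nll_loss(log_probs, Y)

# Single call on the raw scores
one_step = F.cross_entropy(yhat, Y)

print(two_step, one_step)
```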

**Cross Entropy Implementation in Simpler Terms**

Based on what's available on [D2L.ai](https://d2l.ai/chapter_linear-classification/softmax-regression.html#log-likelihood)

![](https://i.imgur.com/ivCsEX6.png)

In [9]:
yhat

tensor([[0.1000, 0.8000, 0.7000],
        [0.4000, 0.5000, 0.5000],
        [0.9000, 0.1000, 0.5000]])

In [10]:
yhat_s = calc_softmax(yhat)
yhat_s

tensor([[0.2068, 0.4164, 0.3768],
        [0.3115, 0.3443, 0.3443],
        [0.4718, 0.2120, 0.3162]])

In [11]:
torch.log(yhat_s)

tensor([[-1.5761, -0.8761, -0.9761],
        [-1.1664, -1.0664, -1.0664],
        [-0.7513, -1.5513, -1.1513]])

In [12]:
torch.log(yhat_s) * y_one_hot

tensor([[-0.0000, -0.8761, -0.0000],
        [-1.1664, -0.0000, -0.0000],
        [-0.0000, -0.0000, -1.1513]])

In [13]:
(torch.log(yhat_s) * y_one_hot).sum(-1)

tensor([-0.8761, -1.1664, -1.1513])

In [14]:
(torch.log(yhat_s) * y_one_hot).sum(-1).mean() * -1

tensor(1.0646)

In [15]:
def calc_cross_entropy(yhat, targets):
    yhat_s = calc_softmax(yhat)
    return (torch.log(yhat_s) * targets).sum(-1).mean() * -1

calc_cross_entropy(yhat, y_one_hot)

tensor(1.0646)
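The one-hot multiplication only keeps the log-probability of the true class in each row, so the same value can be read out by indexing with the integer labels. A sketch of that variant, under the hypothetical name `calc_cross_entropy_from_indices`; it also swaps in `F.log_softmax` in place of `torch.log(calc_softmax(...))`, which is an assumption on my part rather than what the cells above do:

```python
import torch
import torch.nn.functional as F

def calc_cross_entropy_from_indices(yhat, labels):
    # log_softmax computes log(softmax(x)) in a numerically safer way
    log_probs = F.log_softmax(yhat, dim=-1)
    # For each row i, pick the log-probability at column labels[i]
    rows = torch.arange(yhat.shape[0])
    picked = log_probs[rows, labels]
    return -picked.mean()

Y = torch.tensor([1, 0, 2])
yhat = torch.FloatTensor([
    [0.1, 0.8, 0.7],
    [0.4, 0.5, 0.5],
    [0.9, 0.1, 0.5]
])
loss_from_indices = calc_cross_entropy_from_indices(yhat, Y)
print(loss_from_indices)
```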

## Binary Cross Entropy Loss

We can use this when there are only two categories or classes.

![](https://i.imgur.com/4ZyPKzs.png)

In [16]:
yhat_b = torch.FloatTensor([0, 1])
target_b = torch.FloatTensor([0, 1])
F.binary_cross_entropy(yhat_b, target_b, reduction="mean")

tensor(0.)

If the predictions match the targets exactly, the loss is 0.

In [17]:
yhat_b = torch.FloatTensor([1.0, 0])
target_b = torch.FloatTensor([0, 1])
F.binary_cross_entropy(yhat_b, target_b, reduction="mean")

tensor(100.)

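If the predictions are completely wrong, the loss is 100. It is not infinite because PyTorch clamps the log outputs so they never go below -100.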

In [18]:
yhat_b = torch.FloatTensor([0.2, 0.99])
target_b = torch.FloatTensor([0, 1])
F.binary_cross_entropy(yhat_b, target_b, reduction="mean")

tensor(0.1166)

Predictions that are close to the targets, but not exact, give a small positive loss.
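To see where 0.1166 comes from, the arithmetic can be worked through by hand. This is a plain-Python sketch of the per-sample formula using the same numbers; `per_sample` and `manual_loss` are illustrative names.

```python
import math

# Predictions and targets from the cell above
yhat_vals = [0.2, 0.99]
target_vals = [0, 1]

# Per-sample loss: -(t * ln(p) + (1 - t) * ln(1 - p))
per_sample = [
    -(t * math.log(p) + (1 - t) * math.log(1 - p))
    for p, t in zip(yhat_vals, target_vals)
]
# per_sample[0] is -ln(0.8) and per_sample[1] is -ln(0.99)

# reduction="mean" averages the per-sample losses
manual_loss = sum(per_sample) / len(per_sample)
print(manual_loss)
```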

### Let's Implement This Ourselves


**Making Safe Log**

ln(0) is -inf.
That's not helpful in our case: it makes the loss infinite, and multiplying it by 0 gives NaN.

So, here's how we define a safe log function which clamps the result so it never goes below -500.

Basically that's similar to taking the log of a tiny positive number instead of 0 (exp(-500) is about 7e-218).

In [19]:
def safe_log(k):
    # Clamp from below so that log(0) = -inf becomes -500 instead
    return torch.clamp(torch.log(k), min=-500.0)

In [20]:
def calc_binary_cross_entropy(yhat, targets):
    loss = safe_log(yhat) * targets + safe_log(1.0 - yhat) * (1.0 - targets)
    return loss.mean() * -1

In [21]:
yhat_b = torch.FloatTensor([0, 1])
target_b = torch.FloatTensor([0, 1])
calc_binary_cross_entropy(yhat_b, target_b)

tensor(-0.)

In [22]:
yhat_b = torch.FloatTensor([1.0, 0])
target_b = torch.FloatTensor([0, 1])
calc_binary_cross_entropy(yhat_b, target_b)

tensor(500.)

In [23]:
yhat_b = torch.FloatTensor([0.2, 0.99])
target_b = torch.FloatTensor([0, 1])
calc_binary_cross_entropy(yhat_b, target_b)

tensor(0.1166)

Here we got 500 instead of the 100 from PyTorch's version. That's because we chose -500 as the lower bound for the safe log, while PyTorch clamps its log outputs at -100.
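To confirm that the lower bound is the only difference, the sketch below makes it a parameter. `calc_bce_with_floor` and its `floor` argument are hypothetical names introduced for this illustration; the -100 default follows the clamp described in PyTorch's `BCELoss` documentation.

```python
import torch
import torch.nn.functional as F

def calc_bce_with_floor(yhat, targets, floor=-100.0):
    # `floor` is the lowest value a log term is allowed to take
    log_p = torch.clamp(torch.log(yhat), min=floor)
    log_not_p = torch.clamp(torch.log(1.0 - yhat), min=floor)
    loss = log_p * targets + log_not_p * (1.0 - targets)
    return -loss.mean()

# Completely wrong predictions, as in the cells above
yhat_b = torch.FloatTensor([1.0, 0])
target_b = torch.FloatTensor([0, 1])

ours_100 = calc_bce_with_floor(yhat_b, target_b)                 # floor of -100
ours_500 = calc_bce_with_floor(yhat_b, target_b, floor=-500.0)   # floor of -500
builtin = F.binary_cross_entropy(yhat_b, target_b, reduction="mean")

print(ours_100, ours_500, builtin)
```

With the floor at -100 the hand-written loss should agree with the built-in one, and with -500 it should give the 500 seen earlier.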