<a href="https://colab.research.google.com/github/arkincognito/PyTorch/blob/main/06_Softmax.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Softmax

For a multiclass classification problem, softmax turns the per-class scores $h_i(x)$ into probabilities:<br>

$$ P(class = i) = {e^{h_i(x)}\over \sum_j {e^{h_j(x)}}} $$

Since the softmax values are the probabilities of each class, they must sum to 1 over all classes.<br>

$$ \sum _i P(class = i) = 1$$

In [None]:
import torch
import torch.nn.functional as F
import torch.nn as nn

torch.manual_seed(1)

<torch._C.Generator at 0x7f8bc8489b40>

In [None]:
z = torch.FloatTensor([1,2,3])
hypothesis = F.softmax(z, dim=0)
print(hypothesis)
print(hypothesis.sum())

tensor([0.0900, 0.2447, 0.6652])
tensor(1.)
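
As a rough check of the formula above, softmax can also be computed by hand. This is a minimal sketch rather than how `F.softmax` is implemented internally; the names `exp_z` and `manual` are illustrative. Subtracting the maximum score before exponentiating is a common stability trick and does not change the result, because the factor cancels between numerator and denominator.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([1.0, 2.0, 3.0])

# Shift by the max so exp() cannot overflow for large scores;
# the shift cancels in the ratio, so the probabilities are unchanged.
exp_z = torch.exp(z - z.max())
manual = exp_z / exp_z.sum()

print(manual)
# Compare against the built-in softmax
print(torch.allclose(manual, F.softmax(z, dim=0)))
```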


# Cross Entropy Loss
The cross entropy loss averaged over $N$ samples is:
$$L = {1\over N} \sum -y\log(\hat y)$$
where $y$ is the one-hot encoding of the actual class and $\hat y$ is the vector of softmax probabilities predicted for each class.

Let's work through a quick example: a 5-class classification problem with 3 samples.

In [None]:
z = torch.rand(3,5, requires_grad=True)
hypothesis = F.softmax(z, dim=1)
print(hypothesis)

tensor([[0.2645, 0.1639, 0.1855, 0.2585, 0.1277],
        [0.2430, 0.1624, 0.2322, 0.1930, 0.1694],
        [0.2226, 0.1986, 0.2326, 0.1594, 0.1868]], grad_fn=<SoftmaxBackward>)


In [None]:
y = torch.randint(5,(3,)).long()
print(y)

tensor([0, 2, 1])


In [None]:
y_one_hot = torch.zeros_like(hypothesis)
# scatter_(dim, index, value): along dim 1, write the value 1 at the column given by each label
y_one_hot.scatter_(1, y.unsqueeze(1), 1)
print(y_one_hot)

tensor([[1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 1., 0., 0., 0.]])


In [None]:
cost = (-y_one_hot * torch.log(F.softmax(z, dim=1))).sum(dim=1).mean()
print(cost)

tensor(1.4689, grad_fn=<MeanBackward0>)


Instead of ```torch.log(F.softmax(z, dim=1))```, we can use ```F.log_softmax(z, dim=1)```.

In [None]:
cost = (-y_one_hot * F.log_softmax(z, dim=1)).sum(dim=1).mean()
print(cost)

tensor(1.4689, grad_fn=<MeanBackward0>)
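
The two forms agree on the example above, but they can differ for extreme scores. The sketch below uses a made-up pair of scores (`logits` is an illustrative name, not from the notebook) where one class's softmax value underflows to zero in float32, so taking the log afterwards yields `-inf`, while `F.log_softmax` stays finite.

```python
import torch
import torch.nn.functional as F

# Hypothetical scores with a very large gap between the two classes
logits = torch.tensor([1000.0, 0.0])

# softmax underflows to exactly 0 for the second class, so log() gives -inf
naive = torch.log(F.softmax(logits, dim=0))

# log_softmax computes the log-probabilities directly and stays finite
stable = F.log_softmax(logits, dim=0)

print(naive)
print(stable)
```

This is one reason to prefer `F.log_softmax` (or `F.cross_entropy`) over composing `log` and `softmax` by hand.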


In [None]:
# Negative Log Likelihood Loss nll_loss()
F.nll_loss(F.log_softmax(z, dim=1), y)

tensor(1.4689, grad_fn=<NllLossBackward>)
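
To see what `F.nll_loss` does with log-probabilities, it may help to reproduce it with plain indexing: pick the log-probability of the true class for each sample, negate, and average. This is a sketch under the default `reduction='mean'` with no class weights; `logits`, `targets`, and `manual_nll` are illustrative names with fresh random data, not the notebook's `z` and `y`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.rand(3, 5)            # 3 samples, 5 classes
targets = torch.tensor([0, 2, 1])    # true class index per sample

log_probs = F.log_softmax(logits, dim=1)

# For each row i, select the log-probability at column targets[i]
picked = log_probs[torch.arange(len(targets)), targets]
manual_nll = -picked.mean()

print(manual_nll)
print(torch.allclose(manual_nll, F.nll_loss(log_probs, targets)))
```

Because the labels are class indices, no one-hot matrix is needed here; the indexing plays the role of multiplying by `y_one_hot` and summing over classes.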

# F.cross_entropy()
F.cross_entropy() combines log_softmax and nll_loss, so it takes the raw scores z directly rather than softmax probabilities.

In [None]:
F.cross_entropy(z, y)

tensor(1.4689, grad_fn=<NllLossBackward>)

Let's try a 3-class classification problem with 4 input features.

In [None]:
x_train = [[1, 2, 1, 1],
           [2, 1, 3, 2],
           [3, 1, 3, 4],
           [4, 1, 5, 5],
           [1, 7, 5, 5],
           [1, 2, 5, 6],
           [1, 6, 6, 6],
           [1, 7, 7, 7]]
y_train = [2, 2, 2, 1, 1, 1, 0, 0]
x_train = torch.FloatTensor(x_train)
y_train = torch.LongTensor(y_train)

# Implementing nn.Module()

In [None]:
class SoftmaxClassifier(nn.Module):
  def __init__(self):
    super().__init__()
    # 4 input features, 3 classes
    self.linear = nn.Linear(4,3)

  def forward(self, x):
    return self.linear(x)

In [None]:
model = SoftmaxClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.08)
nb_epoch = 100
for epoch in range(nb_epoch+1):
  # Calculate z
  z = model(x_train)
  # Calculate cost
  cost = F.cross_entropy(z, y_train)

  # Initialize all the gradients to zero
  optimizer.zero_grad()
  # Backward Propagation
  cost.backward()
  # Update
  optimizer.step()
  if epoch % 10 == 0:
    # Take Argmax of softmax to predict the class
    prediction = F.softmax(z, dim=1).max(dim=1)[1]
    accuracy = (prediction == y_train).float().mean()
    print(f'Epoch: {epoch:4d}\t|accuracy: {accuracy:.4f}\t|cost: {cost.item():.4f}')
print('train finished')
print(f'Prediction:\t{prediction}')
print(f'Actual:\t\t{y_train}')

Epoch:    0	|accuracy: 0.2500	|cost: 1.5089
Epoch:   10	|accuracy: 0.5000	|cost: 0.8487
Epoch:   20	|accuracy: 0.6250	|cost: 0.7611
Epoch:   30	|accuracy: 0.7500	|cost: 0.7082
Epoch:   40	|accuracy: 0.7500	|cost: 0.6720
Epoch:   50	|accuracy: 0.8750	|cost: 0.6448
Epoch:   60	|accuracy: 0.8750	|cost: 0.6231
Epoch:   70	|accuracy: 0.8750	|cost: 0.6050
Epoch:   80	|accuracy: 0.8750	|cost: 0.5894
Epoch:   90	|accuracy: 0.8750	|cost: 0.5758
Epoch:  100	|accuracy: 0.8750	|cost: 0.5636
train finished
Prediction:	tensor([2, 2, 2, 1, 0, 1, 0, 0])
Actual:		tensor([2, 2, 2, 1, 1, 1, 0, 0])
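
Once training is done, predicting a class for a new sample could look roughly like the sketch below. To keep it self-contained it uses an untrained `nn.Linear(4, 3)` as a stand-in for the trained classifier, and `x_new` is a made-up sample, so the printed probabilities are not meaningful. It also shows that the argmax of the softmax output matches the argmax of the raw scores, since softmax preserves their ordering.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1)
model = nn.Linear(4, 3)                        # stand-in for the trained model (assumption)
x_new = torch.tensor([[1.0, 3.0, 4.0, 3.0]])   # hypothetical new sample with 4 features

model.eval()
with torch.no_grad():                          # no gradients needed for inference
    logits = model(x_new)
    probs = F.softmax(logits, dim=1)

# softmax is monotonic, so both argmax calls pick the same class
pred_from_probs = probs.argmax(dim=1)
pred_from_logits = logits.argmax(dim=1)

print(probs)
print(pred_from_probs, pred_from_logits)
```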


# Summary

### Binary Classification Problem
- Sigmoid Function
- Binary Cross Entropy: $Cost = - {1\over m} \sum _{i=1} ^m \left[ y^{(i)}\log(H(x^{(i)})) + (1-y^{(i)})\log(1-H(x^{(i)})) \right]$

### Multi Class Classification Problem
- Softmax Function
- Cross Entropy: $L = {1\over N} \sum -y\log(\hat y)$
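
The two cases in this summary are closely related: a sigmoid on a single score $z$ gives the same probability as a two-class softmax over the scores $[0, z]$. The sketch below checks this numerically on made-up values (`z`, `y`, and `two_class_logits` are illustrative names) and shows that binary cross entropy and cross entropy then give the same loss.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([-2.0, 0.0, 1.5])   # hypothetical scores for 3 samples
y = torch.tensor([0, 1, 1])          # hypothetical binary labels

# Two-class scores [0, z]: softmax gives e^z / (1 + e^z) for class 1
two_class_logits = torch.stack([torch.zeros_like(z), z], dim=1)
p_softmax = F.softmax(two_class_logits, dim=1)[:, 1]
p_sigmoid = torch.sigmoid(z)

# Binary cross entropy on sigmoid outputs vs. cross entropy on two-class scores
bce = F.binary_cross_entropy(p_sigmoid, y.float())
ce = F.cross_entropy(two_class_logits, y)

print(torch.allclose(p_softmax, p_sigmoid))
print(bce, ce)
```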

