### 10 Outputs

$ XW = \hat{y} $

$\begin{bmatrix}
  a_1 & b_1 \\
  a_2 & b_2 \\
  \vdots & \vdots \\
  a_n & b_n
\end{bmatrix}
\begin{bmatrix}
  w_1 & \cdots & w_{10} \\
  w_{11} & \cdots & w_{20}
\end{bmatrix}
=
\begin{bmatrix}
  y_1 & \cdots & y_{10} \\
  \vdots & \ddots & \vdots \\
  y_{10(n-1)+1} & \cdots & y_{10n}
\end{bmatrix}
$
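A minimal shape check for the product above, assuming a hypothetical sample count `n = 4` and random values (neither is from the notes): an `n x 2` input times a `2 x 10` weight matrix gives one row of 10 outputs per sample.

```python
import numpy as np

n = 4  # hypothetical number of samples, chosen only for illustration
rng = np.random.default_rng(0)

X = rng.normal(size=(n, 2))    # n samples with 2 features each (a_i, b_i)
W = rng.normal(size=(2, 10))   # 2 inputs mapped to 10 outputs
y_hat = X @ W                  # matrix product: one row of 10 outputs per sample

print(X.shape, W.shape, y_hat.shape)
```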

### Softmax

$
\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \forall j = 1, \dots, K
$

Use the output as a probability: softmax is like a sigmoid in that every output lies between 0 and 1, and the outputs sum to 1, so each value can be read as the probability of that class occurring

The model first produces a linear output (the logits), then softmax transforms the logits into probabilities
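The logit-to-probability step can be sketched in NumPy. The helper name `softmax` and the example logits are assumptions for illustration. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(z):
    """Map a 1-D array of logits to probabilities that sum to 1."""
    shifted = z - np.max(z)   # avoids overflow in exp; the ratio is unchanged
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical linear outputs for 3 classes
probs = softmax(logits)
print(probs, probs.sum())
```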

Cross entropy is the loss

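$D(\hat{y}, y) = -\sum_{j} y_j \log(\hat{y}_j)$ ($y$ is the one-hot label, $\hat{y}$ the softmax output)

$\alpha = \frac{1}{n} \sum_{i} D(\sigma(wx_i + b), y_i) $ (mean of the per-sample losses) (sigma is the softmax function)

$z_i = wx_i + b$ (logit), $\hat{y}_i = \sigma(z_i)$ (predicted probabilities)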

In [1]:
import numpy as np

In [2]:
Y = np.array([1, 0, 0])
Y_pred1 = np.array([0.7, 0.2, 0.1]) # generated by the sigma function, softmax values
Y_pred2 = np.array([0.1, 0.3, 0.6])

print(f"loss 1 = {np.sum(-Y * np.log(Y_pred1))}")
print(f"loss 2 = {np.sum(-Y * np.log(Y_pred2))}")

loss 1 = 0.35667494393873245
loss 2 = 2.3025850929940455


In [3]:
import torch as t
import torch.nn as nn

In [4]:
loss = nn.CrossEntropyLoss()

In [5]:
Y = t.Tensor([0]).long() # target is a class index (0, 1, or 2), not a one-hot vector
Y_pred1 = t.Tensor([[2.0, 1.0, 0.1]])
Y_pred2 = t.Tensor([[0.5, 2.0, 0.3]])
l1 = loss(Y_pred1, Y) # pass raw logits; CrossEntropyLoss applies log-softmax internally
l2 = loss(Y_pred2, Y)

In [6]:
print(f"loss 1 = {l1.item()}")
print(f"loss 2 = {l2.item()}")

loss 1 = 0.41703000664711
loss 2 = 1.840616226196289


#### Batch

In [7]:
# classes
Y = t.Tensor([2, 0, 1]).long()
# logits
Y_pred1 = t.Tensor([
    [0.1, 0.2, 0.9],
    [1.1, 0.1, 0.2],
    [0.2, 2.1, 0.1]
])
Y_pred2 = t.Tensor([
    [0.8, 0.2, 0.3],
    [0.2, 0.3, 0.5],
    [0.2, 0.2, 0.5]
])
l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)
print(f"loss 1 = {l1.item()}")
print(f"loss 2 = {l2.item()}")

loss 1 = 0.4966353476047516
loss 2 = 1.2388995885849
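As a check on the batch result, here is a NumPy sketch that recomputes `loss 1` by hand. It relies on `nn.CrossEntropyLoss` averaging over the batch by default (`reduction='mean'`). The helper name `cross_entropy_from_logits` is made up for this example.

```python
import numpy as np

def cross_entropy_from_logits(logits, targets):
    """Per-sample loss: -z[target] + log(sum(exp(z)))."""
    log_sum_exp = np.log(np.exp(logits).sum(axis=1))
    picked = logits[np.arange(len(targets)), targets]  # logit of the true class per row
    return -picked + log_sum_exp

targets = np.array([2, 0, 1])
logits = np.array([
    [0.1, 0.2, 0.9],
    [1.1, 0.1, 0.2],
    [0.2, 2.1, 0.1],
])

per_sample = cross_entropy_from_logits(logits, targets)
batch_loss = per_sample.mean()  # mean over the batch, matching the default reduction
print(per_sample, batch_loss)
```

The mean agrees with the `loss 1` value printed above up to float32 rounding.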


https://ljvmiranda921.github.io/notebook/2017/08/13/softmax-and-the-negative-log-likelihood/

https://stackoverflow.com/questions/49390842/cross-entropy-in-pytorch


https://jamesmccaffrey.wordpress.com/2016/09/25/log-loss-and-cross-entropy-are-almost-the-same/

In [8]:
t.nn.Softmax(dim=1)(t.Tensor([[0.1, 0, 0, 1]]))

tensor([[ 0.1898,  0.1717,  0.1717,  0.4668]])

In [9]:
loss(t.Tensor([[0.1, 0, 0, 1]]), t.Tensor([3]).long())

tensor(0.7619)

The linear output [0.1, 0, 0, 1] is transformed into probabilities that sum to 1. The target class says which label (0-3) the instance should have; here it is class 3 (the 4th column, since indexing starts at 0). The loss is therefore -1 + log(exp(0.1) + exp(0) + exp(0) + exp(1)) = 0.7619, where 1 is the logit of class 3

The key idea: the largest logit gets the largest log probability. The loss is low when the target class is the column with the largest logit and high otherwise. The other columns should also be much smaller than the target column; if they are close, the model's confidence is low and the loss stays higher
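To illustrate this, a small NumPy sketch compares three hypothetical logit vectors for target class 3. Only the first vector comes from the notes; the other two are invented for contrast.

```python
import numpy as np

def ce(z, target):
    """Cross entropy of one logit vector against a class index."""
    return -z[target] + np.log(np.exp(z).sum())

target = 3
original = ce(np.array([0.1, 0.0, 0.0, 1.0]), target)   # target has the largest logit, small margin
confident = ce(np.array([0.1, 0.0, 0.0, 5.0]), target)  # target has the largest logit, large margin
wrong = ce(np.array([5.0, 0.0, 0.0, 1.0]), target)      # another class has the largest logit

print(confident, original, wrong)
```

The ordering `confident < original < wrong` follows from the formula: raising the target logit relative to the others lowers the loss.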

#### Two ways to calculate the loss by hand (same result)

Using logits: 1 is the logit of class 3 (the value in the 4th column), so the first term is -1

In [10]:
-1 + np.log(np.exp(0.1) + np.exp(0) + np.exp(0) + np.exp(1))

0.7618933412552511

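Using the probability (0.4668 is the softmax output for class 3, rounded to four digits, hence the small difference)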

In [11]:
-1 * np.log(0.4668)

0.7618543785697361
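The two hand calculations differ in the fifth decimal place only because 0.4668 is a rounded probability. A short sketch with the unrounded softmax value (variable names are illustrative) shows that both routes give the same number:

```python
import numpy as np

z = np.array([0.1, 0.0, 0.0, 1.0])
target = 3

p = np.exp(z) / np.exp(z).sum()                      # unrounded softmax probabilities
from_logits = -z[target] + np.log(np.exp(z).sum())   # logit form
from_probs = -np.log(p[target])                      # probability form

print(from_logits, from_probs)
```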

http://willwolf.io/2017/05/18/minimizing_the_negative_log_likelihood_in_english/

BCE (binary cross entropy) for logistic regression (binary probabilities)

CE (cross entropy) for softmax (multi-class probabilities)
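The two losses are closely related. A sigmoid on a logit `z` equals the second entry of a softmax over the two logits `[0, z]`, so BCE is the two-class case of CE. A minimal NumPy sketch, assuming a hypothetical logit `z = 0.8` and label `y = 1`:

```python
import numpy as np

def bce(p, y):
    """Binary cross entropy for a single predicted probability p and label y."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def ce(probs, one_hot):
    """Cross entropy between a probability vector and a one-hot label."""
    return -np.sum(one_hot * np.log(probs))

z = 0.8  # hypothetical logit
y = 1    # hypothetical binary label

p = 1 / (1 + np.exp(-z))                        # sigmoid output
two_class_logits = np.array([0.0, z])           # class 0 fixed at logit 0
probs = np.exp(two_class_logits) / np.exp(two_class_logits).sum()
one_hot = np.array([1 - y, y])

print(bce(p, y), ce(probs, one_hot))
```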

https://pytorch.org/docs/master/nn.html?highlight=nllloss

LogSoftmax followed by NLLLoss is equivalent to CrossEntropyLoss
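This equivalence can be checked directly. The sketch below reuses the logits and target from the earlier example and compares `nn.CrossEntropyLoss` with `nn.LogSoftmax` followed by `nn.NLLLoss`:

```python
import torch as t
import torch.nn as nn

logits = t.tensor([[0.1, 0.0, 0.0, 1.0]])
target = t.tensor([3])  # class index, not one-hot

# One step: cross entropy applied to raw logits
ce = nn.CrossEntropyLoss()(logits, target)

# Two steps: log-softmax first, then negative log likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print(ce.item(), nll.item())
```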

In [31]:
import torch.nn.functional as F

class Net(t.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(784, 520)
        self.l2 = nn.Linear(520, 320)
        self.l3 = nn.Linear(320, 240)
        self.l4 = nn.Linear(240, 120)
        self.l5 = nn.Linear(120, 10)
        
    def forward(self, x):
        x = x.view(-1, 784) # flatten each 28 x 28 image to 784 values; -1 lets PyTorch infer the batch dimension
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x) # return raw logits; CrossEntropyLoss applies log-softmax itself
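A quick shape check of the flattening step, using a dummy all-zeros batch and a single hypothetical `Linear(784, 10)` layer as a stand-in for the full `Net` (both are assumptions for illustration):

```python
import torch as t

batch = t.zeros(64, 1, 28, 28)   # dummy batch shaped like MNIST: (N, channels, height, width)
flat = batch.view(-1, 784)       # -1 is inferred as the batch size, 64

layer = t.nn.Linear(784, 10)     # hypothetical stand-in for the five-layer Net
out = layer(flat)                # one row of 10 logits per image

print(flat.shape, out.shape)
```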

In [32]:
from torchvision import datasets, transforms

# Training settings
batch_size = 64

# MNIST Dataset
train_dataset = datasets.MNIST(root='./mnist_data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./mnist_data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = t.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = t.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

In [34]:
model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = t.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        output = model(data)  # call the module rather than .forward() so hooks run
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            # loss.item() extracts the Python float from the 0-dim loss tensor
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test():
    model.eval() # eval mode: changes layers such as dropout, does not disable gradients
    test_loss = 0
    correct = 0
    with t.no_grad(): # disable gradient tracking during evaluation
        for data, target in test_loader:
            output = model(data)
            # sum up the per-batch mean losses
            test_loss += criterion(output, target).item()
            # get the index of the max logit
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()

    # criterion already averages within each batch, so this is the sum of batch means over the sample count
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()



Test set: Average loss: 0.0261, Accuracy: 5546/10000 (55%)
Test set: Average loss: 0.0070, Accuracy: 8688/10000 (86%)
Test set: Average loss: 0.0047, Accuracy: 9107/10000 (91%)
Test set: Average loss: 0.0034, Accuracy: 9406/10000 (94%)
Test set: Average loss: 0.0027, Accuracy: 9505/10000 (95%)
Test set: Average loss: 0.0022, Accuracy: 9597/10000 (95%)
Test set: Average loss: 0.0020, Accuracy: 9619/10000 (96%)
Test set: Average loss: 0.0019, Accuracy: 9633/10000 (96%)
Test set: Average loss: 0.0017, Accuracy: 9696/10000 (96%)



In [35]:
t.save(model.state_dict(), './mnist_model')
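A saved `state_dict` holds only the parameters, so loading it requires first building a model with the same architecture. Here is a minimal round-trip sketch, using a hypothetical small `nn.Linear(4, 2)` as a stand-in for `Net` and a temporary file path (both assumptions for illustration):

```python
import os
import tempfile

import torch as t
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical small stand-in for the Net above
path = os.path.join(tempfile.mkdtemp(), "demo_model")
t.save(model.state_dict(), path)  # saves parameters only, not the class definition

restored = nn.Linear(4, 2)        # rebuild the same architecture first
restored.load_state_dict(t.load(path))
restored.eval()

x = t.ones(1, 4)
same = t.equal(model(x), restored(x))  # identical weights give identical outputs
print(same)
```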

#### ToDo Exercise 9-2