### Sigmoid

$\sigma(z) = \frac{1}{1 + e^{-z}}$
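Below is a minimal pure-Python sketch of the formula above. The notebook itself relies on PyTorch's built-in sigmoid. Branching on the sign of `z` is a common trick assumed here, not taken from the source. It keeps `math.exp` from overflowing for large-magnitude inputs.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z)), computed without overflow."""
    if z >= 0:
        # e^(-z) <= 1 here, so the direct formula is safe
        return 1.0 / (1.0 + math.exp(-z))
    # for z < 0, rewrite as e^z / (1 + e^z) so exp never sees a large positive argument
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(z, sigmoid(z))
```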

$\hat{y} = \sigma(xw + b)$

Predicted class: $1$ if $\hat{y} > 0.5$, otherwise $0$.
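The sigmoid is increasing and $\sigma(0) = 0.5$. So the rule $\hat{y} > 0.5$ is mathematically the same as $xw + b > 0$. For a single feature with $w > 0$, the model therefore predicts 1 exactly when $x > -b/w$. The small sketch below checks this. The parameter values `w = 2.0` and `b = -5.0` are hypothetical and chosen only for illustration.

```python
import math

def predict_proba(x: float, w: float, b: float) -> float:
    # y_hat = sigma(x*w + b)
    return 1.0 / (1.0 + math.exp(-(x * w + b)))

def predict_class(x: float, w: float, b: float) -> int:
    # threshold the probability at 0.5
    return 1 if predict_proba(x, w, b) > 0.5 else 0

def predict_class_linear(x: float, w: float, b: float) -> int:
    # mathematically equivalent rule: only the sign of the linear part matters
    return 1 if x * w + b > 0 else 0

w, b = 2.0, -5.0  # hypothetical parameters
print("decision boundary x =", -b / w)
for x in (1.0, 2.0, 3.0, 4.0):
    print(x, predict_proba(x, w, b), predict_class(x, w, b), predict_class_linear(x, w, b))
```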

$\text{loss} = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log(\hat{y}_n) + (1 - y_n) \log(1 - \hat{y}_n) \right]$

Logistic regression applies this (binary) cross-entropy loss to the sigmoid output. Linear regression uses the mean squared error (MSE) instead.
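The loss formula above translates directly into plain Python, as sketched below. Clipping $\hat{y}$ away from exactly 0 and 1 is an extra safeguard added here so that `math.log` never receives 0. PyTorch's `BCELoss` handles the same problem differently, by clamping its log outputs at -100. The `eps` value below is an arbitrary choice.

```python
import math

def bce_loss(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep both log() arguments strictly positive
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

print(bce_loss([1, 1, 0, 0], [0.2, 0.8, 0.1, 0.9]))
```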

### Example using cross entropy loss

| $y$ | $\hat{y}$ | loss (with $\log_{10}$) |
|-|-|-|
| 1 | 0.2 | $-(1 \cdot \log(0.2) + 0 \cdot \log(1 - 0.2)) = -\log(0.2) \approx 0.7$ |
| 1 | 0.8 | $-(1 \cdot \log(0.8) + 0 \cdot \log(1 - 0.8)) = -\log(0.8) \approx 0.1$ |
| 0 | 0.1 | $-(0 \cdot \log(0.1) + 1 \cdot \log(1 - 0.1)) = -\log(0.9) \approx 0.05$ |
| 0 | 0.9 | $-(0 \cdot \log(0.9) + 1 \cdot \log(1 - 0.9)) = -\log(0.1) = 1$ |

Confident, correct predictions (rows 2 and 3) produce a small loss. Confident, wrong predictions (rows 1 and 4) produce a large one.
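The table's numbers are what you get when the logarithm is taken base 10. PyTorch's `BCELoss` uses the natural logarithm, which makes every per-example loss larger by a factor of $\ln 10 \approx 2.30$. The short sketch below recomputes each row both ways.

```python
import math

rows = [(1, 0.2), (1, 0.8), (0, 0.1), (0, 0.9)]  # (y, y_hat) pairs from the table

def row_loss(y, p, log=math.log):
    # per-example cross-entropy; `log` selects the logarithm base
    return -(y * log(p) + (1 - y) * log(1 - p))

losses_10 = [row_loss(y, p, math.log10) for y, p in rows]  # as in the table
losses_e = [row_loss(y, p) for y, p in rows]               # as in PyTorch (natural log)

for (y, p), l10, le in zip(rows, losses_10, losses_e):
    print(f"y={y}, y_hat={p}: log10 loss = {l10:.2f}, ln loss = {le:.2f}")
```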

```python
import torch as t
import torch.nn.functional as F
# Variable has been a no-op wrapper since PyTorch 0.4; plain tensors can be used directly.
from torch.autograd import Variable as Var
```

```python
class Model(t.nn.Module):
    def __init__(self):
        super().__init__()
        # one input feature -> one output (the logit xw + b)
        self.linear = t.nn.Linear(1, 1)

    def forward(self, x):
        # squash the linear output through a sigmoid so it lies in (0, 1)
        y_pred = t.sigmoid(self.linear(x))
        return y_pred
```
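In the `forward` method above, the model applies the sigmoid, and `BCELoss` then takes logarithms of its output. This notebook does not use the alternative, but a common pattern is to return the raw linear output (the logit $z = xw + b$) and use `t.nn.BCEWithLogitsLoss`. That loss fuses both steps in a numerically stable way. It rests on the identity $-[y \log\sigma(z) + (1-y)\log(1-\sigma(z))] = \max(z, 0) - yz + \log(1 + e^{-|z|})$. The plain-Python sketch below illustrates that identity. It is not PyTorch's actual implementation.

```python
import math

def bce_naive(z: float, y: float) -> float:
    # sigmoid first, then log: overflows or hits log(0) for large |z|
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(z: float, y: float) -> float:
    # same quantity rearranged as max(z, 0) - y*z + log(1 + e^{-|z|});
    # exp only ever sees a non-positive argument, and log1p keeps precision
    return max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))

print(bce_naive(2.0, 1.0), bce_from_logit(2.0, 1.0))  # agree for moderate z
print(bce_from_logit(-800.0, 1.0))                    # stays finite
try:
    bce_naive(-800.0, 1.0)
except (OverflowError, ValueError) as err:
    print("naive version failed:", type(err).__name__)
```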

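```python
model = Model()
# binary cross-entropy averaged over the batch (replaces the deprecated size_average=True)
criterion = t.nn.BCELoss(reduction='mean')
optimizer = t.optim.SGD(model.parameters(), lr=0.01)

# inputs (hours) and binary labels; plain tensors, no Variable wrapper needed
x_data = t.tensor([[1.0], [2.0], [3.0], [4.0]])
y_data = t.tensor([[0.0], [0.0], [1.0], [1.0]])

for epoch in range(1000):
    y_pred = model(x_data)            # forward pass (calls Model.forward)
    loss = criterion(y_pred, y_data)
    print(f'epoch = {epoch}, loss = {loss.item()}')  # this deviates from the lecture
    optimizer.zero_grad()             # clear gradients left from the previous step
    loss.backward()                   # compute gradients of the loss w.r.t. w and b
    optimizer.step()                  # update w and b
```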

```
epoch = 0, loss = 1.1835718154907227
epoch = 1, loss = 1.1715904474258423
epoch = 2, loss = 1.1599231958389282
epoch = 3, loss = 1.1485670804977417
epoch = 4, loss = 1.1375192403793335
epoch = 5, loss = 1.1267762184143066
epoch = 6, loss = 1.116334319114685
epoch = 7, loss = 1.106189489364624
epoch = 8, loss = 1.096337914466858
epoch = 9, loss = 1.0867751836776733
epoch = 10, loss = 1.0774964094161987
epoch = 11, loss = 1.0684970617294312
epoch = 12, loss = 1.05977201461792
epoch = 13, loss = 1.0513163805007935
epoch = 14, loss = 1.0431244373321533
epoch = 15, loss = 1.0351910591125488
epoch = 16, loss = 1.0275105237960815
epoch = 17, loss = 1.0200772285461426
epoch = 18, loss = 1.012885570526123
epoch = 19, loss = 1.0059294700622559
epoch = 20, loss = 0.9992032647132874
epoch = 21, loss = 0.9927009344100952
epoch = 22, loss = 0.9864168167114258
epoch = 23, loss = 0.9803445935249329
epoch = 24, loss = 0.9744786620140076
epoch = 25, loss = 0.9688130617141724
epoch = 26, loss = 0.9633418

epoch = 445, loss = 0.693665087223053
epoch = 446, loss = 0.6933633089065552
epoch = 447, loss = 0.6930618286132812
epoch = 448, loss = 0.6927605271339417
epoch = 449, loss = 0.6924594640731812
epoch = 450, loss = 0.6921584606170654
epoch = 451, loss = 0.6918576955795288
epoch = 452, loss = 0.6915572285652161
epoch = 453, loss = 0.6912569403648376
epoch = 454, loss = 0.6909569501876831
epoch = 455, loss = 0.6906569600105286
epoch = 456, loss = 0.6903573274612427
epoch = 457, loss = 0.6900578141212463
epoch = 458, loss = 0.6897586584091187
epoch = 459, loss = 0.689459502696991
epoch = 460, loss = 0.6891606450080872
epoch = 461, loss = 0.6888620257377625
epoch = 462, loss = 0.6885635852813721
epoch = 463, loss = 0.6882653832435608
epoch = 464, loss = 0.6879673004150391
epoch = 465, loss = 0.6876693964004517
epoch = 466, loss = 0.6873718500137329
epoch = 467, loss = 0.6870744228363037
epoch = 468, loss = 0.6867771148681641
epoch = 469, loss = 0.6864801049232483
epoch = 470, loss = 0.68618

epoch = 861, loss = 0.5845710635185242
epoch = 862, loss = 0.5843451619148254
epoch = 863, loss = 0.5841193795204163
epoch = 864, loss = 0.5838937759399414
epoch = 865, loss = 0.5836683511734009
epoch = 866, loss = 0.5834430456161499
epoch = 867, loss = 0.5832178592681885
epoch = 868, loss = 0.5829929113388062
epoch = 869, loss = 0.5827680826187134
epoch = 870, loss = 0.5825434327125549
epoch = 871, loss = 0.5823189616203308
epoch = 872, loss = 0.582094669342041
epoch = 873, loss = 0.5818704962730408
epoch = 874, loss = 0.5816464424133301
epoch = 875, loss = 0.5814225077629089
epoch = 876, loss = 0.5811988115310669
epoch = 877, loss = 0.5809752345085144
epoch = 878, loss = 0.5807518362998962
epoch = 879, loss = 0.5805285573005676
epoch = 880, loss = 0.5803055167198181
epoch = 881, loss = 0.5800825357437134
epoch = 882, loss = 0.5798597931861877
epoch = 883, loss = 0.5796371698379517
epoch = 884, loss = 0.5794147253036499
epoch = 885, loss = 0.5791923999786377
epoch = 886, loss = 0.5789
```
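The calls `loss.backward()` and `optimizer.step()` hide the calculus. For the mean binary cross-entropy on top of a sigmoid, the gradients simplify to $\partial L/\partial w = \frac{1}{N}\sum_n (\hat{y}_n - y_n)x_n$ and $\partial L/\partial b = \frac{1}{N}\sum_n (\hat{y}_n - y_n)$. The sketch below runs plain full-batch gradient descent with the same data and learning rate. It starts from $w = b = 0$, an assumption that differs from the random initialisation of `t.nn.Linear`. Its loss values will therefore not match the printed log, but the curve should move the same way.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0]   # same inputs as x_data
ys = [0.0, 0.0, 1.0, 1.0]   # same labels as y_data
w, b = 0.0, 0.0             # assumed zero initialisation (PyTorch initialises randomly)
lr = 0.01
n = len(xs)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

losses = []
for epoch in range(1000):
    preds = [sigmoid(x * w + b) for x in xs]
    loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, preds)) / n
    losses.append(loss)
    # gradients of the mean BCE with respect to w and b
    grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    grad_b = sum(p - y for p, y in zip(preds, ys)) / n
    # plain gradient step, the update t.optim.SGD applies without momentum or weight decay
    w -= lr * grad_w
    b -= lr * grad_b

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.4f}, w = {w:.3f}, b = {b:.3f}")
```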

```python
hour_pred = t.tensor([[1.0]])
with t.no_grad():  # inference only: no gradient tracking needed
    print(f"Prediction for 1 hour {model(hour_pred)}, closer to 0 is better")
```

```
Prediction for 1 hour tensor([[ 0.4925]]), closer to 0 is better
```


```python
hour_pred = t.tensor([[7.0]])
with t.no_grad():  # inference only: no gradient tracking needed
    print(f"Prediction for 7 hours {model(hour_pred)}, closer to 1 is better")
```

```
Prediction for 7 hours tensor([[ 0.9180]]), closer to 1 is better
```
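The model computes $\hat{y} = \sigma(xw + b)$. Inverting the sigmoid with the logit $\ln(\hat{y}/(1-\hat{y}))$ therefore turns the two printed predictions into two linear equations in $w$ and $b$. The sketch below solves them. The printed values are rounded to four decimals, so the recovered parameters are only approximate. They also describe this particular training run only.

```python
import math

def logit(p: float) -> float:
    # inverse of the sigmoid: returns z such that sigma(z) = p
    return math.log(p / (1.0 - p))

# (hours, printed prediction) pairs from the two cells above
(x1, p1), (x2, p2) = (1.0, 0.4925), (7.0, 0.9180)

# logit(p) = x*w + b is linear in x, so two points determine w and b
w = (logit(p2) - logit(p1)) / (x2 - x1)
b = logit(p1) - w * x1
boundary = -b / w  # input at which the predicted probability crosses 0.5

print(f"w = {w:.3f}, b = {b:.3f}, decision boundary = {boundary:.2f} hours")
```

With these estimates the boundary sits near 1.07 hours. The training input $x = 2$ (labelled 0) would then still fall on the class-1 side. This suggests the model had not finished converging after 1000 epochs, which fits with the loss still being about 0.58 near epoch 886.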


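The sigmoid here acts as an activation function, playing the same role as ReLU: a nonlinearity applied to the output of a linear layer. Unlike ReLU, it squashes its input into the range $(0, 1)$. That is what lets its output be read as a probability.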
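Here is a plain-Python comparison of the two activations, including their derivatives. The sigmoid's derivative $\sigma(z)(1 - \sigma(z))$ never exceeds 0.25. ReLU, $\max(0, z)$, has derivative 1 for every positive input. Setting ReLU's derivative to 0 at exactly $z = 0$ is a convention assumed here.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0

def relu(z: float) -> float:
    return max(0.0, z)

def relu_grad(z: float) -> float:
    return 1.0 if z > 0 else 0.0  # derivative at exactly 0 set to 0 by convention

for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f} (grad {sigmoid_grad(z):.3f})  "
          f"relu={relu(z):.1f} (grad {relu_grad(z):.0f})")
```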