# Logistic Regression

Binary prediction (0 or 1) is very useful!

* Spent $N$ hours studying: pass or fail?
* GPA and GRE scores for the HKUST PhD program: admit or not?
* Soccer game against Japan: win or lose?
* She/he looks good: propose or not?
* ...

In these cases, we are facing binary classification problems.

A simple way to turn a linear model into a binary classifier is to keep the linear hypothesis as is, with its two parameters (a weight and a bias), and wrap its output in the sigmoid function, which has no parameters of its own. We are still optimizing the same parameters, but the updates now push the outputs toward one or zero, so the model effectively outputs probabilities. Here is a visualization:

<img src="sigmoid.png" />
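
As a quick numerical check of this squashing behaviour, here is a minimal sketch that applies the sigmoid, $\sigma(z) = \frac{1}{1 + e^{-z}}$, to a linear output $z = wx + b$. The values of `w` and `b` are arbitrary and chosen only for illustration; they are not taken from the trained model further down.

```python
import torch

# Hypothetical parameters of the linear hypothesis (illustrative values only).
w, b = 2.0, 1.0

x = torch.tensor([-4.0, -0.5, 0.0, 3.0])
z = w * x + b              # raw linear output, unbounded
y_hat = torch.sigmoid(z)   # squashed into the open interval (0, 1)

print(z)
print(y_hat)
```

Large negative values of `z` map close to 0, large positive values map close to 1, and `z = 0` maps to exactly 0.5.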

If $\hat{y} \geq 0.5$, our final prediction is $1$; otherwise it is $0$. This thresholding is applied only after the model has been trained, not during training.
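
The thresholding rule can be sketched in a couple of lines. The probabilities below are made up for illustration and do not come from a real model.

```python
import torch

# Hypothetical predicted probabilities from a trained model.
y_hat = torch.tensor([0.05, 0.49, 0.50, 0.93])

# Probabilities at or above 0.5 become class 1, everything else class 0.
y_pred = (y_hat >= 0.5).float()
print(y_pred)  # tensor([0., 0., 1., 1.])
```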

Now that we've introduced the sigmoid function, we need a loss function that reflects the classification nature of the problem. That is why we introduce cross-entropy (more precisely, binary cross-entropy).

<img src="cross_entropy.png" />

The loss is small when the predicted probability is close to the target and large when it is far from it.
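
To make this concrete, here is a small sketch that assumes the standard definition of binary cross-entropy averaged over $n$ examples, $L = -\frac{1}{n}\sum_{i}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$. The helper `bce` and the example tensors are hypothetical names introduced only for this illustration.

```python
import torch
from torch import nn

y = torch.tensor([1.0, 1.0, 0.0, 0.0])      # targets
good = torch.tensor([0.9, 0.8, 0.1, 0.2])   # predictions close to the targets
bad = torch.tensor([0.2, 0.1, 0.9, 0.8])    # predictions far from the targets

def bce(y_hat, y):
    # Mean of -[y*log(y_hat) + (1-y)*log(1-y_hat)] over all examples.
    return -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()

print(bce(good, y), bce(bad, y))

# The hand-written version should agree with PyTorch's built-in loss.
print(nn.BCELoss()(good, y))
```

The loss for `good` should come out much smaller than the loss for `bad`, and the hand-written value should match `nn.BCELoss` up to floating-point error.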

## Implementation

In [61]:
import torch
from torch import nn, Tensor
from torch.nn import Module
from torch.autograd import Variable
import numpy as np

### Data Definition

In [68]:
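# Ten 1-D inputs with binary labels: negative inputs are class 0, the rest class 1.
# Plain tensors are enough here; wrapping them in Variable is no longer needed.
X = torch.tensor([[-4.], [-3.], [-2.], [-1.], [0.], [.1], [.2], [.3], [.4], [.5]])
y = torch.tensor([[0.], [0.], [0.], [0.], [1.], [1.], [1.], [1.], [1.], [1.]])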

In [69]:
X.shape, y.shape

(torch.Size([10, 1]), torch.Size([10, 1]))

Let's implement Logistic Regression:

In [70]:
class LogisticRegression(nn.Module):
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(in_features=1, out_features=1)
        self.activation = nn.Sigmoid()
    
    def forward(self, x):
        y_hat = self.activation(self.linear(x))
        return y_hat
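
As a sanity check on what this forward pass computes, the sketch below rebuilds the same computation outside the class: a one-feature `nn.Linear` layer followed by a sigmoid, compared against the formula written out by hand. The input values and the seed are arbitrary choices made for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random initialisation repeatable

linear = nn.Linear(in_features=1, out_features=1)
x = torch.tensor([[-1.0], [0.0], [2.0]])

# What the forward pass computes: sigmoid(w * x + b).
y_hat = torch.sigmoid(linear(x))

# The same computation written out with the layer's two parameters.
w, b = linear.weight, linear.bias
manual = 1 / (1 + torch.exp(-(x @ w.T + b)))

print(y_hat.shape)  # torch.Size([3, 1])
print(torch.allclose(y_hat, manual))
```

Each input row yields one probability, so a batch of shape `(3, 1)` produces an output of shape `(3, 1)`.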

In [71]:
# Create a fresh model instance (its weights are randomly initialised).
m = LogisticRegression()

In [72]:
# Loss function and optimizer.
# reduction='mean' averages the loss over the batch (replaces the deprecated size_average=True).
criterion = nn.BCELoss(reduction='mean')
optimizer = torch.optim.RMSprop(params=m.parameters(), lr=0.01)

In [73]:
# Train the model.
for epoch in range(100000):
    # Forward propagation (calling the module invokes forward()).
    y_hat = m(X)
    
    # calculate the loss.
    loss = criterion(y_hat, y)
    
    if epoch % 10000 == 0: 
        print(epoch,'Y_hat: ', 
              str(y_hat.data.numpy().reshape((10,))), 
              ' / Y: ', str(y.data.numpy().reshape((10,))),
             ' / Weights: ', str([w.data.numpy().reshape((1,))[0] for w in m.parameters()])
             )

    # Back propagation.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

0 Y_hat:  [0.1402359  0.21750474 0.3214312  0.44666928 0.57906276 0.59199864
 0.6048082  0.6174755  0.62998515 0.6423227 ]  / Y:  [0. 0. 0. 0. 1. 1. 1. 1. 1. 1.]  / Weights:  [0.53306484, 0.3189273]
10000 Y_hat:  [0.000000e+00 0.000000e+00 3.222511e-25 2.332121e-09 1.000000e+00
 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00]  / Y:  [0. 0. 0. 0. 1. 1. 1. 1. 1. 1.]  / Weights:  [36.51798, 16.64149]
20000 Y_hat:  [0.0000000e+00 0.0000000e+00 2.9510331e-26 7.0474915e-10 1.0000000e+00
 1.0000000e+00 1.0000000e+00 1.0000000e+00 1.0000000e+00 1.0000000e+00]  / Y:  [0. 0. 0. 0. 1. 1. 1. 1. 1. 1.]  / Weights:  [37.71188, 16.6387]
30000 Y_hat:  [0.0000000e+00 0.0000000e+00 1.0223037e-26 4.1515652e-10 1.0000000e+00
 1.0000000e+00 1.0000000e+00 1.0000000e+00 1.0000000e+00 1.0000000e+00]  / Y:  [0. 0. 0. 0. 1. 1. 1. 1. 1. 1.]  / Weights:  [38.24279, 16.640425]
40000 Y_hat:  [0.0000000e+00 0.0000000e+00 5.1320995e-27 2.9450070e-10 1.0000000e+00
 1.0000000e+00 1.0000000e+00 1.00000