- Target probability: $\theta \in [0, 1]$

- Probability: $P(X = 1) = \theta,\quad P(X = 0) = 1 - \theta$

- Likelihood: $$P(x_1, x_2,\ldots, x_N) = \prod_{n=1}^N P(x_n)$$

where $x_n \in \{0, 1\}$

## Bernoulli distribution

$$P(y|\theta) = \theta^y(1 - \theta)^{1-y}$$

So

$\begin{align}
P(y = 0 |\theta) &= \theta^0(1 - \theta)^{1-0}\\
&= (1 - \theta)
\end{align}$

and

$\begin{align}
P(y = 1 |\theta) &= \theta^1(1 - \theta)^{1-1}\\
&= \theta
\end{align}$
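
As a quick numerical check of the two cases above, here is a minimal sketch in plain Python. The function name `bernoulli_pmf` and the value `theta = 0.3` are illustrative assumptions, not part of the notebook.

```python
def bernoulli_pmf(y, theta):
    """P(y | theta) = theta^y * (1 - theta)^(1 - y) for y in {0, 1}."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta ** y * (1.0 - theta) ** (1 - y)

theta = 0.3  # illustrative value
p0 = bernoulli_pmf(0, theta)  # reduces to 1 - theta
p1 = bernoulli_pmf(1, theta)  # reduces to theta
print(p0, p1)
```

The single formula covers both outcomes because the exponent that is zero turns its factor into 1.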

- Likelihood as a function of $\theta$

Consider

$$P(Y|\theta) = \prod_{n=1}^N P(y_n|\theta) = \prod_{n=1}^N \theta^{y_n}(1 - \theta)^{1-y_n}$$

and

$$\hat{\theta} = \underset{\theta}{\operatorname{argmax}} P(Y|\theta)$$
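
The argmax can be illustrated with a brute-force search over candidate values of $\theta$. This is only a sketch: the sample `ys` is made up, and the grid resolution of 0.01 is an arbitrary choice.

```python
import math

def likelihood(ys, theta):
    """Product over n of theta^y_n * (1 - theta)^(1 - y_n)."""
    return math.prod(theta ** y * (1.0 - theta) ** (1 - y) for y in ys)

# Hypothetical sample: 7 ones out of 10 observations.
ys = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

# Evaluate the likelihood on a grid of candidate theta values in [0, 1].
grid = [k / 100 for k in range(101)]
theta_hat = max(grid, key=lambda t: likelihood(ys, t))
print(theta_hat)
```

For this sample the grid maximum lands on the fraction of ones in `ys`, which is what the closed-form maximum likelihood estimate predicts.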

## How to Determine $\theta$

- The log likelihood function is given by

$$l(\theta) = \ln P(Y|\theta) = \sum_{n = 1}^N \left[ y_n \ln(\theta) + (1 - y_n)\ln(1-\theta) \right]$$
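
Because the logarithm is monotonic, maximizing the log likelihood gives the same $\hat{\theta}$ as maximizing the likelihood itself. As a sketch of the remaining step for the Bernoulli model, assuming $0 < \theta < 1$, differentiate with respect to $\theta$ and set the result to zero:

$$\frac{dl}{d\theta} = \sum_{n=1}^N \left[ \frac{y_n}{\theta} - \frac{1 - y_n}{1 - \theta} \right] = 0$$

Multiplying through by $\theta(1 - \theta)$ leaves $\sum_{n=1}^N (y_n - \theta) = 0$, so

$$\hat{\theta} = \frac{1}{N}\sum_{n=1}^N y_n$$

That is, the estimate is the fraction of samples with $y_n = 1$.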

## Cross Entropy Loss

The problem with a cost function built on the threshold function is that it is flat in some regions: the gradient there is zero, so gradient descent gets no signal about how to update the parameters.
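
The flat regions can be seen numerically. The sketch below uses a made-up 1-D dataset and a single weight `w` with no bias (both are simplifying assumptions) and compares the slope of a threshold-based cost with the slope of the cross entropy cost at the same point.

```python
import math

# Hypothetical 1-D dataset: label 1 when x is positive.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def threshold_cost(w):
    """Fraction of points misclassified by the rule yhat = 1 if w * x > 0."""
    return sum((1 if w * x > 0 else 0) != y for x, y in zip(xs, ys)) / len(xs)

def cross_entropy_cost(w):
    """Mean negative log likelihood with yhat = sigmoid(w * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(xs)

def slope(cost, w, h=1e-4):
    """Central finite-difference estimate of d cost / d w."""
    return (cost(w + h) - cost(w - h)) / (2 * h)

w = -1.0  # a poor starting weight: every point is misclassified
flat = slope(threshold_cost, w)        # zero: the cost does not change nearby
useful = slope(cross_entropy_cost, w)  # negative: increasing w lowers the cost
print(flat, useful)
```

Even though every point is misclassified at `w = -1.0`, the threshold cost gives a zero slope, while the cross entropy cost points toward larger `w`.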

- maximum likelihood:

$$P(Y|w, b) = \prod_{n=1}^N P(y_n|wx_n + b)$$

$$(w^*, b^*) = \underset{w, b}{\operatorname{argmax}} P(Y|w, b)$$

- cross entropy loss (the negative log likelihood averaged over the $N$ samples):
$$\ell(\theta) = -\frac{1}{N}\sum_{n = 1}^N \left[ y_n \ln(\theta) + (1 - y_n)\ln(1-\theta) \right]$$

In [1]:
import torch
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(1)  # fix the seed so the run is reproducible

In [2]:
class logistic_regression(nn.Module):
    def __init__(self, input_size):
        super(logistic_regression, self).__init__()
        self.linear = nn.Linear(input_size, 1)
        
    def forward(self, x):
        yhat = torch.sigmoid(self.linear(x))
        return yhat

In [3]:
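# Toy dataset: 20 two-feature points with labels from a fixed linear rule
class Data2D(Dataset):
    def __init__(self):
        self.x = torch.zeros(20, 2)
        self.y = torch.zeros(20, 1)
        # Both features sweep from -1.0 to 0.9 in steps of 0.1
        self.x[:, 0] = torch.arange(-1, 1, 0.1)
        self.x[:, 1] = torch.arange(-1, 1, 0.1)
        # Ground-truth linear score f = x @ w + b
        self.w = torch.tensor([[1.0], [1.0]])
        self.b = 1
        self.f = torch.mm(self.x, self.w) + self.b
        # Label is 1 where the score is negative, 0 elsewhere
        self.y[self.f < 0] = 1
        self.len = self.x.shape[0]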
        
    def __getitem__(self, index):
        return self.x[index], self.y[index]
    
    def __len__(self):
        return self.len

```python
criterion = nn.BCELoss()  # binary cross entropy loss for sigmoid outputs
```

In [4]:
def criterion(yhat, y):
    out = -1 * torch.mean(y * torch.log(yhat) + (1 - y) * torch.log(1 - yhat))
    return out
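
As a sanity check, the hand-written loss can be compared with PyTorch's built-in `nn.BCELoss`. This is a minimal sketch; the predictions and labels below are made up for illustration.

```python
import torch
from torch import nn

def criterion(yhat, y):
    # Same formula as the hand-written loss in the notebook.
    return -1 * torch.mean(y * torch.log(yhat) + (1 - y) * torch.log(1 - yhat))

# Hypothetical predictions and labels, chosen only for illustration.
yhat = torch.tensor([[0.9], [0.2], [0.6]])
y = torch.tensor([[1.0], [0.0], [1.0]])

manual = criterion(yhat, y)
builtin = nn.BCELoss()(yhat, y)  # default reduction is the mean
print(manual.item(), builtin.item())
```

One practical difference: `nn.BCELoss` clamps its log terms to be at least -100, while the hand-written version can produce `inf` or `nan` if `yhat` saturates to exactly 0 or 1.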

In [5]:
data_set = Data2D()
trainloader = DataLoader(dataset = data_set, batch_size = 1)
model = logistic_regression(input_size= 2)
optimizer = optim.SGD(model.parameters(), lr = 0.1)

In [6]:
for epoch in range(100):
    for x, y in trainloader:
        
        yhat = model(x)
        loss = criterion(yhat, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In [7]:
for name, p in model.named_parameters():
    if p.requires_grad:
        print(name, p.data)

linear.weight tensor([[-3.6910, -4.3674]])
linear.bias tensor([-4.3696])


In [9]:
X = torch.tensor([[1.0, 1.0]])
model(X)

tensor([[4.0047e-06]], grad_fn=<SigmoidBackward0>)
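
To turn this probability into a class label, one option is to threshold it. The sketch below assumes a 0.5 cut-off, which the notebook does not specify, and compares the result with the label rule used in `Data2D`.

```python
import torch

# Probability reported by the trained model for X = [[1.0, 1.0]].
prob = torch.tensor([[4.0047e-06]])
predicted = (prob > 0.5).float()  # 0.5 cut-off is an assumption

# Label rule from Data2D: y = 1 where x @ w + b < 0, with w = [1, 1] and b = 1.
X = torch.tensor([[1.0, 1.0]])
w = torch.tensor([[1.0], [1.0]])
b = 1
expected = (torch.mm(X, w) + b < 0).float()
print(predicted, expected)
```

Both come out as class 0: the score at this point is positive, so the data-generating rule assigns label 0, and the model's near-zero probability agrees.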