## Concept Overview

In logistic regression, we want to predict the probability that a given input $x$ belongs to the positive class (1):

$\hat{y} = P(y = 1 \mid x)$


Since probabilities must stay between $0$ and $1$, we cannot use the output of a linear function directly, because it is unbounded and can take any real value:

$$
w^T x + b.
$$

Instead, we first define the linear term $z = w^T x + b$, and then pass it through the **sigmoid function**, defined as

$$
\sigma(z) = \frac{1}{1 + e^{-z}}.
$$

This S-shaped curve acts as a squashing function:

- When $z$ is a very large positive number, $e^{-z} \to 0$, so  
  $$
  \sigma(z) \to 1.
  $$

- When $z$ is a very large negative number, $e^{-z}$ becomes very large, making the denominator grow significantly, so  
  $$
  \sigma(z) \to 0.
  $$

Thus, the sigmoid function ensures that the output always remains between $0$ and $1$, making it suitable for modeling probabilities.
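The two steps above (compute the linear term, then squash it) can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names `sigmoid` and `predict_proba` and the values of `w`, `b`, and `x` are hypothetical and chosen only for the example. The sigmoid is written in a branch form that is algebraically identical to $\frac{1}{1 + e^{-z}}$ but avoids overflow for very negative $z$.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^{z} / (1 + e^{z}).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x, b):
    """Return sigmoid(w^T x + b) for list-valued w and x (hypothetical helper)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Hypothetical parameters and input, chosen only for illustration.
w = [0.5, -1.0]
b = 0.25
x = [2.0, 1.0]

# z = 0.5 * 2.0 + (-1.0) * 1.0 + 0.25 = 0.25
p = predict_proba(w, x, b)
print(f"P(y = 1 | x) = {p:.4f}")
```

Note that in floating-point arithmetic the result can saturate to exactly `0.0` or `1.0` for extreme $z$, even though the mathematical function only approaches those limits.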


### Sigmoid
![sigmoid-curve](../images/sigmoid-curve.png)

$\sigma(z) = \frac{1}{1 + e^{-z}}$
- Squashes its input into the range (0, 1).
- Used for binary classification, where the output is the probability that the input belongs to the positive class.
- Suffers from vanishing gradients: when inputs are very positive or very negative, the gradient becomes extremely small, slowing or stopping learning.
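The vanishing-gradient point can be checked numerically using the standard identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$. The sketch below uses plain Python; the helper names `sigmoid` and `sigmoid_grad` and the sample values of $z$ are assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


# Sample inputs chosen for illustration: the gradient is largest at z = 0
# and shrinks quickly as |z| grows.
for z in (-10.0, -4.0, 0.0, 4.0, 10.0):
    print(f"z = {z:6.1f}  sigmoid = {sigmoid(z):.4f}  gradient = {sigmoid_grad(z):.6f}")
```

The gradient peaks at $0.25$ when $z = 0$ and falls below $10^{-4}$ by $|z| = 10$, which is why saturated inputs contribute almost nothing to a gradient-based update.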

In [2]:
import torch

In [21]:
z_values = torch.arange(-4, 5, 2)
z_values

tensor([-4, -2,  0,  2,  4])

In [22]:
torch.sigmoid(z_values)

tensor([0.0180, 0.1192, 0.5000, 0.8808, 0.9820])