# Logistic Regression: Improving the Perceptron Using Probabilistic Updates

---

## 00:00–01:36 - Recap: What Was the Problem with Perceptron?

We are studying **Logistic Regression**.

In the previous videos, we studied the **Perceptron Trick** and identified a **fundamental flaw** in its algorithm.

### Perceptron Logic (Recap)

- Run a loop for many iterations
- Randomly pick a data point
- Ask:
  - Is this point **misclassified**?
    - Yes → move the line toward the point
    - No → do nothing

Eventually, we get a line that separates both classes.
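
The loop described above can be sketched in plain Python. This is a minimal illustration rather than the lecture's own code: the toy data, the learning rate `lr`, the iteration count, and the function names are all assumptions made for the example.

```python
import random

def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def perceptron_trick(X, y, lr=0.1, iterations=1000, seed=0):
    """Perceptron trick: only misclassified points move the line.

    Returns the weights and the number of iterations that changed them.
    A constant 1 is prepended to each point so w[0] acts as the bias.
    """
    rng = random.Random(seed)
    rows = [[1.0] + list(x) for x in X]
    w = [0.0] * len(rows[0])
    updates = 0
    for _ in range(iterations):
        i = rng.randrange(len(rows))        # randomly pick a data point
        error = y[i] - step(dot(w, rows[i]))
        if error != 0:                      # misclassified: move the line
            w = [wj + lr * error * xj for wj, xj in zip(w, rows[i])]
            updates += 1
        # correctly classified: do nothing
    return w, updates

# Hypothetical, linearly separable toy data (labels are 0/1)
X = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]
y = [0, 0, 1, 1]

w, updates = perceptron_trick(X, y)
preds = [step(dot(w, [1.0] + x)) for x in X]
print("weights:", w)
print("predictions:", preds, "| iterations that changed w:", updates)
```

On data like this, the weights change only in the few iterations that hit a misclassified point; every other iteration leaves them untouched.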

### Problem

Once **all points are correctly classified**, the algorithm **stops learning**.

- Even if the margin is poor
- Even if the decision boundary can be improved

By contrast, **Logistic Regression** keeps adjusting the boundary and settles on a **better-balanced** one.

👉 The flaw is **not in the data**; it is **in the algorithm itself**.

---

## 01:36–02:43 - Core Idea: Change the Algorithm

Earlier:
- Only **misclassified points** affected the line

Now:
- **Every point** will affect the line

### New Strategy

- Misclassified points → **pull** the line toward themselves
- Correctly classified points → **push** the line away from themselves

So learning **never stops**, even after perfect classification.

---

## 02:43–05:00 - Distance-Based Influence (Key Insight)

The **magnitude of push or pull** depends on **distance from the line**.

### Misclassified Points

- Close to line → small pull
- Far from line → strong pull

### Correctly Classified Points

- Close to line → strong push
- Far from line → weak push

This ensures:
- Boundary moves until it reaches a **stable central position**
- Similar behavior to Logistic Regression

---

## 05:00–07:21 - Summary of the New Learning Rule

For **every point**:

- Check if point is:
  - Correctly classified → push
  - Incorrectly classified → pull
- Strength of update depends on:
  - Distance from decision boundary

This creates a **continuous learning dynamic**.

---

## 07:21–11:33 - Revisiting the Perceptron Update Equation

Original Perceptron update:

$$
\mathbf{w}_{new} = \mathbf{w}_{old} + \eta (y - \hat{y}) \mathbf{x}
$$

### Four Cases

| True $y$ | Predicted $\hat{y}$ | $(y - \hat{y})$ | Effect |
|--------|------------------|---------------|--------|
| 1 | 1 | 0 | No update |
| 0 | 0 | 0 | No update |
| 1 | 0 | 1 | Misclassified: pull (add $\eta \mathbf{x}$) |
| 0 | 1 | -1 | Misclassified: pull (subtract $\eta \mathbf{x}$) |

❌ **Problem**  
When $(y - \hat{y}) = 0$, the update becomes **zero** → learning stops.
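
To make the zero update concrete, here is a small sketch of a single perceptron step. The weights, the point, and the learning rate are hypothetical values chosen only for illustration; the point carries a leading `1` so that `w[0]` plays the role of the bias.

```python
def perceptron_update(w, x, y, lr=0.1):
    """One perceptron step: w_new = w_old + lr * (y - y_hat) * x."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    y_hat = 1 if z >= 0 else 0          # step function
    error = y - y_hat                   # one of -1, 0, +1
    return [wj + lr * error * xj for wj, xj in zip(w, x)]

w = [0.5, -1.0, 2.0]      # hypothetical weights (bias first)
x = [1.0, 1.0, 1.0]       # hypothetical point, leading 1 for the bias
# z = 0.5 - 1.0 + 2.0 = 1.5, so the step function predicts 1

same = perceptron_update(w, x, y=1)     # correct prediction: error = 0
moved = perceptron_update(w, x, y=0)    # wrong prediction: error = -1
print("y = 1:", same)    # weights unchanged
print("y = 0:", moved)   # weights shifted by -lr * x
```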

---

## 11:33–14:51 - Why Step Function Breaks Learning

Prediction was made using a **step function**:

$$
\hat{y} =
\begin{cases}
1 & z \ge 0 \\
0 & z < 0
\end{cases}
$$

Where:

$$
z = \mathbf{w} \cdot \mathbf{x}
$$

### Issue

- Output is **discrete**: $0$ or $1$
- For every correctly classified point, $\hat{y} = y$, so $(y - \hat{y}) = 0$
- The update becomes zero
- Learning halts

👉 We must **replace the step function**
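
A short sketch of why the step function hides useful information: it returns the same output whether a point is barely on the correct side or very far from the boundary. The `z` values below are hypothetical.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

y = 1                            # true label of a hypothetical positive point
z_values = (0.01, 1.0, 100.0)    # barely correct ... very confidently correct

errors = [y - step(z) for z in z_values]
for z, e in zip(z_values, errors):
    print(f"z = {z:>6}: y_hat = {step(z)}, y - y_hat = {e}")
```

All three cases give the same error term, so the update rule cannot tell a narrow margin from a comfortable one.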

---

## 14:51–17:42 - Introducing the Sigmoid Function

We replace the step function with the **sigmoid function**:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

### Properties

- Output range: $(0, 1)$
- Smooth and continuous
- Never exactly $0$ or $1$

Key values:

- $z \to +\infty \Rightarrow \sigma(z) \to 1$
- $z \to -\infty \Rightarrow \sigma(z) \to 0$
- $z = 0 \Rightarrow \sigma(z) = 0.5$
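
A minimal sketch of the sigmoid in plain Python, checking the key values listed above. The two-branch form is one common way to avoid overflow in `math.exp` for large $|z|$; it is an implementation choice for this example, not something taken from the lecture. In floating point the result can still round to exactly `0.0` or `1.0` for very large $|z|$, even though the mathematical function never reaches those values.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)            # z < 0, so this cannot overflow
    return ez / (1.0 + ez)

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"sigmoid({z:>5}) = {sigmoid(z):.5f}")
```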

---

## 17:42–20:44 - New Prediction Mechanism

Now prediction is:

$$
\hat{y} = \sigma(\mathbf{w} \cdot \mathbf{x})
$$

Instead of discrete labels, we get:

$$
\hat{y} \in (0,1)
$$

Interpretation:

- $\hat{y}$ = **probability of positive class**

---

## 20:44–23:34 - Probabilistic Interpretation

Let:

- Event $A$: Placement happens

Then:

$$
P(A) = \hat{y}
$$

And:

$$
P(\text{No Placement}) = 1 - \hat{y}
$$

### Decision Boundary Meaning

- On the line: $z = 0 \Rightarrow \hat{y} = 0.5$
- Above the line → higher probability of placement
- Below the line → lower probability

The entire space becomes a **probability gradient**.
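
The probability reading can be checked on a hypothetical line. In this sketch the weights `w = [-10, 1, 1]` (bias first) and the three points are assumptions chosen for illustration; each point carries a leading `1` so that $z = \mathbf{w} \cdot \mathbf{x}$ includes the bias.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    """P(placement) for one point x (leading 1 included) under weights w."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

w = [-10.0, 1.0, 1.0]                      # hypothetical line: x1 + x2 - 10 = 0

p_on = predict_proba(w, [1.0, 5.0, 5.0])   # z = 0, on the line
p_pos = predict_proba(w, [1.0, 7.0, 6.0])  # z = +3, positive side
p_neg = predict_proba(w, [1.0, 3.0, 4.0])  # z = -3, negative side

for name, p in (("on line", p_on), ("z = +3", p_pos), ("z = -3", p_neg)):
    print(f"{name:>8}: P(placement) = {p:.4f}, P(no placement) = {1 - p:.4f}")
```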

---

## 23:34–26:21 - Why Sigmoid Fixes the Perceptron Problem

Since $y \in \{0, 1\}$ while $\hat{y} = \sigma(z)$ lies strictly between $0$ and $1$:

$$
y - \hat{y} \neq 0
$$

So the update never vanishes.

### New Update Rule

$$
\mathbf{w}_{new} = \mathbf{w}_{old} + \eta (y - \sigma(z)) \mathbf{x}
$$

- Correctly classified points still update, with $|y - \hat{y}| < 0.5$
- Misclassified points update more strongly, with $|y - \hat{y}| > 0.5$
- The distance from the boundary sets the magnitude automatically through $\sigma(z)$
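
The new update rule can be sketched as a small training loop. As before, this is an illustrative sketch under stated assumptions: the toy data, `lr`, the iteration count, and the function names are hypothetical, and a leading `1` is prepended to each point so `w[0]` acts as the bias.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def sigmoid_perceptron(X, y, lr=0.1, iterations=1000, seed=0):
    """Apply w_new = w_old + lr * (y - sigmoid(z)) * x to a random point each step."""
    rng = random.Random(seed)
    rows = [[1.0] + list(x) for x in X]
    w = [0.0] * len(rows[0])
    nonzero_updates = 0
    for _ in range(iterations):
        i = rng.randrange(len(rows))
        error = y[i] - sigmoid(dot(w, rows[i]))   # lies strictly between -1 and 1
        w = [wj + lr * error * xj for wj, xj in zip(w, rows[i])]
        nonzero_updates += error != 0
    return w, nonzero_updates

# Hypothetical, linearly separable toy data (labels are 0/1)
X = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]
y = [0, 0, 1, 1]

w, nonzero_updates = sigmoid_perceptron(X, y)
probs = [sigmoid(dot(w, [1.0] + x)) for x in X]
preds = [1 if p >= 0.5 else 0 for p in probs]
print("weights:", [round(v, 3) for v in w])
print("probabilities:", [round(p, 3) for p in probs])
print("iterations with a non-zero update:", nonzero_updates)
```

Unlike the step-function version, every iteration here changes the weights, including those that pick an already correctly classified point.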

---

## 26:21–30:28 - Example with Four Points

Assume four points with sigmoid outputs:

| Point | $\hat{y}$ |
|-----|----------|
| P1 | 0.80 |
| P2 | 0.65 |
| P3 | 0.30 |
| P4 | 0.15 |

Then:

$$
y - \hat{y} \neq 0 \quad \forall \text{ points}
$$

So **every point contributes** to learning.
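
The error terms for the four points can be computed directly. The notes do not state the true labels, so the labels below (P1 and P2 positive, P3 and P4 negative) are an assumption made only to complete the arithmetic.

```python
y_hat = {"P1": 0.80, "P2": 0.65, "P3": 0.30, "P4": 0.15}   # sigmoid outputs from the table
y_true = {"P1": 1, "P2": 1, "P3": 0, "P4": 0}               # assumed labels, not given in the notes

errors = {p: y_true[p] - y_hat[p] for p in y_hat}
for p, e in errors.items():
    print(f"{p}: y - y_hat = {e:+.2f}")
```

Under these assumed labels all four points are correctly classified, yet none of the error terms is zero.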

---

## 30:28–34:17 - Distance-Based Magnitude (Mathematical Proof)

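- Larger $|z|$ → sigmoid closer to $0$ or $1$
- Smaller $|z|$ → sigmoid closer to $0.5$

Hence, for correctly classified points:

- Near boundary → strong updates ($|y - \hat{y}|$ close to $0.5$)
- Far from boundary → weak updates ($|y - \hat{y}|$ close to $0$)

For misclassified points the trend reverses: the farther a point lies on the wrong side, the closer $|y - \hat{y}|$ gets to $1$.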

Exactly what we wanted.
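
A quick numerical check of this behavior, using hypothetical $z$ values for a positive point ($y = 1$). Here $|z|$ stands in for distance, since the geometric distance from the line is $|z| / \lVert \mathbf{w} \rVert$ (with the bias term excluded from the norm).

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def update_size(y, z):
    """|y - sigmoid(z)|, the factor that scales the weight update."""
    return abs(y - sigmoid(z))

zs = [0.1, 1.0, 3.0, 6.0]
# Positive point on the correct side (z > 0): update shrinks with distance
correct = [update_size(1, z) for z in zs]
# Positive point on the wrong side (z < 0): update grows with distance
wrong = [update_size(1, -z) for z in zs]

for z, c, m in zip(zs, correct, wrong):
    print(f"|z| = {z:>4}: correctly classified {c:.4f} | misclassified {m:.4f}")
```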

---

## 34:17–39:40 - Experimental Result

Three models compared:

- Red line → Pure Perceptron (step function)
- Brown line → Modified Perceptron + Sigmoid
- Black line → True Logistic Regression

Result:

- Brown > Red (improvement achieved)
- Brown < Black (still not perfect)

Meaning:
- Direction is correct
- Solution is not complete yet

---

## 39:40–40:32 - Conclusion

- Replacing step function with sigmoid:
  - Prevents zero updates
  - Enables continuous learning
  - Introduces probability interpretation
- Algorithm improves significantly
- Still not fully Logistic Regression

👉 Remaining gap will be fixed using:
- Proper loss function
- Gradient Descent

**Next video completes the model.**

---
