#30 May 2025

# 📘 Naive Logistic Regression – Manual Example

## 📊 Step 1: The Dataset

We simulate a case where we predict whether a student **passes** (`1`) or **fails** (`0`) based on the number of **hours studied**.

| Hours Studied (X) | Passed (Y) |
|-------------------|------------|
| 1                 | 0          |
| 2                 | 0          |
| 3                 | 0          |
| 4                 | 1          |
| 5                 | 1          |
| 6                 | 1          |
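To follow along in Python, the table can be written as two parallel lists. The check below is an addition, not part of the original walkthrough. It confirms that every student who failed studied fewer hours than every student who passed (at most 3 vs. at least 4), so the two classes are perfectly separated. That property matters for how the parameters can be estimated. The variable names are illustrative.

```python
# The Step 1 table encoded as parallel lists (names are illustrative).
hours = [1, 2, 3, 4, 5, 6]   # X: hours studied
passed = [0, 0, 0, 1, 1, 1]  # Y: 1 = pass, 0 = fail

# Perfect-separation check: the most hours among failures is below
# the fewest hours among passes.
max_fail = max(x for x, y in zip(hours, passed) if y == 0)
min_pass = min(x for x, y in zip(hours, passed) if y == 1)
separated = max_fail < min_pass

print(f"max hours among fails: {max_fail}, min hours among passes: {min_pass}")
print("perfectly separated:", separated)
```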

---

## 🧠 Step 2: Logistic Regression Model

We assume the logistic model:

$$
P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
$$
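The model translates directly into code. The following is a minimal sketch. The names `sigmoid` and `prob_pass` are illustrative. Branching on the sign of `z` is a standard way to keep `math.exp` from overflowing when the linear predictor is large.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^{-z}), arranged so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e lies in (0, 1)
    return e / (1.0 + e)

def prob_pass(x: float, beta0: float, beta1: float) -> float:
    """P(Y = 1 | X = x) under the logistic model."""
    return sigmoid(beta0 + beta1 * x)

# With beta0 = beta1 = 0 the model is indifferent: p = 0.5 for every x.
print(prob_pass(3, 0.0, 0.0))
```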

Let:

- $n$ = number of observations (here $n = 6$)
- $p_i = P(Y_i = 1 \mid X_i)$, the predicted probability that $Y_i = 1$

The **likelihood** function:

$$
L(\beta_0, \beta_1) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}
$$
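In code, the likelihood is a running product of each observation's probability of its observed label. This sketch uses the Step 1 data and a plain logistic function without overflow protection, which is fine for the small coefficients evaluated here. The names are illustrative.

```python
import math

hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

def prob_pass(x, beta0, beta1):
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def likelihood(beta0, beta1):
    """Product over observations of p_i^{y_i} * (1 - p_i)^{1 - y_i}."""
    lik = 1.0
    for x, y in zip(hours, passed):
        p = prob_pass(x, beta0, beta1)
        # For y in {0, 1} this equals p**y * (1 - p)**(1 - y).
        lik *= p if y == 1 else (1.0 - p)
    return lik

# At beta0 = beta1 = 0 every p_i is 0.5, so L = 0.5**6.
print(likelihood(0.0, 0.0))
print(likelihood(-7.98, 2.21))
```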

And the **log-likelihood** is:

$$
\log L = \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]
$$
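The log-likelihood becomes a sum of per-observation log-probabilities. This sketch uses the Step 1 data, and its names are illustrative. It also checks numerically that the sum equals the log of the product. The sum is the usual choice in practice because a product of many probabilities can underflow to `0.0` in floating point.

```python
import math

hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

def prob_pass(x, beta0, beta1):
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def log_likelihood(beta0, beta1):
    """Sum over observations of y*log(p) + (1 - y)*log(1 - p)."""
    total = 0.0
    for x, y in zip(hours, passed):
        p = prob_pass(x, beta0, beta1)
        # Pick the one term that is non-zero for y in {0, 1}; this avoids
        # evaluating log(0) for the term that would be multiplied by 0.
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def likelihood(beta0, beta1):
    lik = 1.0
    for x, y in zip(hours, passed):
        p = prob_pass(x, beta0, beta1)
        lik *= p if y == 1 else 1.0 - p
    return lik

# The log turns the product into a sum, so both columns should agree.
for b0, b1 in [(0.0, 0.0), (-7.98, 2.21)]:
    print(b0, b1, log_likelihood(b0, b1), math.log(likelihood(b0, b1)))
```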

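To estimate $\beta_0$ and $\beta_1$, we choose the values that maximize this log-likelihood. Unlike least squares, this has no closed-form solution, so the maximum is found numerically (for example with Newton's method, which R's `glm` implements as iteratively reweighted least squares).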
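The maximization can be sketched with plain gradient ascent, using the standard gradients $\partial \log L / \partial \beta_0 = \sum_i (y_i - p_i)$ and $\partial \log L / \partial \beta_1 = \sum_i (y_i - p_i)\,x_i$. This is an illustration only, not necessarily how the Step 3 estimates were obtained. Several details are assumptions chosen for the sketch: the learning rate, the iteration counts, and the optional `l2` penalty on $\beta_1$. On this dataset the unpenalized run never settles. Instead, the log-likelihood creeps toward 0 while $\beta_1$ keeps growing. With the penalty, the estimates converge.

```python
import math

hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    # Overflow-safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(b0, b1):
    total = 0.0
    for x, y in zip(hours, passed):
        p = sigmoid(b0 + b1 * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def fit(n_iter, lr=0.01, l2=0.0):
    """Gradient ascent on log L - (l2 / 2) * beta1**2, starting from (0, 0).

    l2 = 0.0 is plain maximum likelihood; l2 > 0 adds an assumed ridge
    penalty on beta1 only (not part of the original example).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(hours, passed):
            r = y - sigmoid(b0 + b1 * x)  # residual y_i - p_i
            g0 += r                       # d logL / d beta0
            g1 += r * x                   # d logL / d beta1
        g1 -= l2 * b1                     # gradient of the penalty term
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

short_fit = fit(2_000)
long_fit = fit(20_000)
ridge_fit = fit(30_000, l2=1.0)

for name, (b0, b1) in [("2k steps", short_fit), ("20k steps", long_fit), ("ridge", ridge_fit)]:
    print(f"{name:>9}: b0={b0:8.3f} b1={b1:7.3f} "
          f"logL={log_likelihood(b0, b1):.5f} boundary={-b0 / b1:.3f}")
```

Because the data are symmetric about 3.5 hours and the penalty leaves $\beta_0$ untouched, the penalized fit places its boundary at 3.5 hours.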

---

## 🔢 Step 3: Estimate Parameters (Conceptually)

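For illustration, the rest of this example uses the estimates:

- $\beta_0 \approx -7.98$
- $\beta_1 \approx 2.21$

These values are illustrative rather than the output of a plain maximum-likelihood fit. The dataset is **perfectly separated**: every student with $X \le 3$ fails and every student with $X \ge 4$ passes. For such data the log-likelihood can always be increased by scaling both coefficients up while keeping the boundary between 3 and 4 hours. As a result, an unpenalized maximum-likelihood estimate does not exist, and software such as R's `glm` typically warns that fitted probabilities are numerically 0 or 1. Finite estimates require a penalized fit (e.g. an L2 penalty) or data in which the two classes overlap.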
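Perfectly separated data (no failing student above 3 hours, no passing student below 4) leaves an unpenalized fit without a finite optimum, and a quick numerical check shows why. This sketch is an addition with illustrative names. It takes the estimates above and multiplies both by a factor $c$, which leaves the boundary $-\beta_0/\beta_1$ unchanged. Every observation is already on the correct side of that boundary, so the log-likelihood keeps rising with $c$.

```python
import math

hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

def log_likelihood(b0, b1):
    total = 0.0
    for x, y in zip(hours, passed):
        z = b0 + b1 * x
        # log(sigmoid(z)) for a pass; log(1 - sigmoid(z)) = log(sigmoid(-z)) for a fail.
        s = z if y == 1 else -z
        total += -math.log1p(math.exp(-s))
    return total

b0, b1 = -7.98, 2.21  # the illustrative estimates above
lls = []
for c in [1, 2, 4, 8]:
    # Scaling both coefficients by c keeps the boundary -b0/b1 fixed ...
    ll = log_likelihood(c * b0, c * b1)
    lls.append(ll)
    print(c, round(-(c * b0) / (c * b1), 3), ll)
# ... yet the log-likelihood keeps rising toward 0, so no finite maximum exists.
```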

---

## 🔍 Step 4: Predict Probabilities

Use the model:

$$
p = \frac{1}{1 + e^{-(-7.98 + 2.21 \cdot X)}}
$$
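Evaluating the curve at every observed study time is a quick way to check hand calculations like the ones that follow. This sketch hard-codes the Step 3 estimates as constants. `BETA0`, `BETA1` and `prob_pass` are illustrative names.

```python
import math

BETA0, BETA1 = -7.98, 2.21  # illustrative estimates from Step 3

def prob_pass(hours):
    """P(pass | hours) = 1 / (1 + exp(-(BETA0 + BETA1 * hours)))."""
    return 1.0 / (1.0 + math.exp(-(BETA0 + BETA1 * hours)))

for h in range(1, 7):
    print(f"{h} h -> linear predictor {BETA0 + BETA1 * h:+.2f}, p = {prob_pass(h):.4f}")
```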

### For $X = 3$:

$$
p = \frac{1}{1 + e^{-(-7.98 + 2.21 \cdot 3)}} = \frac{1}{1 + e^{1.35}} \approx 0.21
$$

### For $X = 2$:

$$
p = \frac{1}{1 + e^{-(-7.98 + 2.21 \cdot 2)}} = \frac{1}{1 + e^{3.56}} \approx 0.028
$$

Interpretation:
- At 2 hours → ~2.8% chance of passing
- At 3 hours → ~21% chance of passing

---

## 📈 Step 5: Decision Boundary

To classify using a threshold of 0.5:

$$
\hat{Y} =
\begin{cases}
1 & \text{if } p \geq 0.5 \\
0 & \text{if } p < 0.5
\end{cases}
$$
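Applied to the training data, the 0.5 rule reduces to a single comparison. This sketch uses the Step 3 estimates. `classify` and its `threshold` parameter are illustrative. With these coefficients all six students in the table are classified correctly.

```python
import math

BETA0, BETA1 = -7.98, 2.21  # illustrative estimates from Step 3
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

def classify(x, threshold=0.5):
    """Return 1 (pass) if the predicted probability reaches the threshold, else 0."""
    p = 1.0 / (1.0 + math.exp(-(BETA0 + BETA1 * x)))
    return 1 if p >= threshold else 0

predictions = [classify(x) for x in hours]
accuracy = sum(p == y for p, y in zip(predictions, passed)) / len(passed)
print(predictions, accuracy)
```

Lowering `threshold` moves the cut-off to fewer hours. For example, a threshold of 0.2 would already predict a pass at 3 hours.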

Find the **decision boundary** where $p = 0.5$:

$$
0.5 = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
\Rightarrow e^{-(\beta_0 + \beta_1 X)} = 1
\Rightarrow \beta_0 + \beta_1 X = 0
\Rightarrow X = -\frac{\beta_0}{\beta_1}
$$

Substituting estimates:

$$
X = -\frac{-7.98}{2.21} \approx 3.61
$$

So: if a student studies **more than 3.6 hours**, we predict they will **pass**.
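The boundary formula and the substitution can be checked in a few lines. The sketch also confirms that the predicted probability at $X = -\beta_0/\beta_1$ is 0.5 up to rounding. `decision_boundary` is an illustrative helper. It assumes $\beta_1 \neq 0$, because otherwise $p$ does not depend on $X$ and there is no boundary.

```python
import math

BETA0, BETA1 = -7.98, 2.21  # illustrative estimates from Step 3

def decision_boundary(beta0, beta1):
    """X at which beta0 + beta1 * X = 0, i.e. where p = 0.5 (requires beta1 != 0)."""
    return -beta0 / beta1

x_star = decision_boundary(BETA0, BETA1)
p_at_boundary = 1.0 / (1.0 + math.exp(-(BETA0 + BETA1 * x_star)))
print(f"boundary = {x_star:.2f} hours, p there = {p_at_boundary:.3f}")
```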

---

## ✅ Final Summary

- **Model**:

$$
P(\text{Pass}) = \frac{1}{1 + e^{-(-7.98 + 2.21 \cdot \text{Hours})}}
$$

- Learned using **maximum likelihood estimation**.
- Can be used to **predict probabilities** or **classify outcomes**.
- **Decision boundary**: ~3.6 hours studied.
