# 30 May 2025

# 📘 Naive Manual Statistical Modeling Example (Linear Regression)

## 🎯 Goal
Model the relationship between **Hours Studied (X)** and **Exam Score (Y)** using a simple **linear statistical model**.

We assume the model:

$$
Y = \beta_0 + \beta_1 X + \varepsilon
$$

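Here \( \beta_0 \) is the intercept, \( \beta_1 \) is the slope, and \( \varepsilon \) is a random error term. We will estimate \( \beta_0 \) and \( \beta_1 \) by hand using the **least squares method**, which picks the line that minimizes the sum of squared differences between the observed and predicted scores.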
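
As a short sketch of what "least squares" means for this model (the symbol \( S \) is introduced only for this note and is not used elsewhere in the document), the estimates are the values of \( \beta_0 \) and \( \beta_1 \) that minimize:

$$
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2
$$

Setting the partial derivatives of \( S \) with respect to \( \beta_0 \) and \( \beta_1 \) to zero and solving gives the closed-form expressions used in the steps that follow:

$$
\beta_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad \beta_0 = \bar{Y} - \beta_1 \bar{X}
$$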

---

## 🔢 Step 1: Small Dataset (n = 5)

| Student | Hours Studied (X) | Exam Score (Y) |
|---------|-------------------|----------------|
| 1       | 1                 | 52             |
| 2       | 2                 | 54             |
| 3       | 3                 | 57             |
| 4       | 4                 | 59             |
| 5       | 5                 | 60             |

---

## 🧮 Step 2: Compute Means

$$
\bar{X} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3.0
$$

$$
\bar{Y} = \frac{52 + 54 + 57 + 59 + 60}{5} = 56.4
$$
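
The two means can be double-checked with a few lines of plain Python. This is only a sketch: the variable names `hours`, `scores`, `x_bar`, and `y_bar` are chosen here for illustration and are not part of the original notes.

```python
# Dataset from Step 1: hours studied (X) and exam scores (Y).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 57, 59, 60]

# The sample mean is the sum of the values divided by their count.
x_bar = sum(hours) / len(hours)
y_bar = sum(scores) / len(scores)

print(f"mean hours  = {x_bar:.1f}")  # mean hours  = 3.0
print(f"mean scores = {y_bar:.1f}")  # mean scores = 56.4
```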

---

## 🧮 Step 3: Compute Slope \( \beta_1 \)

$$
\beta_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
$$

| \( X_i \) | \( Y_i \) | \( X_i - \bar{X} \) | \( Y_i - \bar{Y} \) | \( (X_i - \bar{X})(Y_i - \bar{Y}) \) | \( (X_i - \bar{X})^2 \) |
|----------|-----------|---------------------|---------------------|-------------------------------------|------------------------|
| 1        | 52        | -2.0                | -4.4                | 8.8                                 | 4.0                    |
| 2        | 54        | -1.0                | -2.4                | 2.4                                 | 1.0                    |
| 3        | 57        |  0.0                |  0.6                | 0.0                                 | 0.0                    |
| 4        | 59        |  1.0                |  2.6                | 2.6                                 | 1.0                    |
| 5        | 60        |  2.0                |  3.6                | 7.2                                 | 4.0                    |

- Numerator: \( 8.8 + 2.4 + 0 + 2.6 + 7.2 = 21.0 \)  
- Denominator: \( 4 + 1 + 0 + 1 + 4 = 10.0 \)

$$
\beta_1 = \frac{21.0}{10.0} = 2.1
$$
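
The table arithmetic above can be reproduced in plain Python as a sanity check. This is a minimal sketch; the names `numerator`, `denominator`, and `slope` are illustrative choices, not notation from the original notes.

```python
# Dataset from Step 1.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 57, 59, 60]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Numerator: sum of the products of the deviations from the means.
numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))

# Denominator: sum of the squared deviations of X from its mean.
denominator = sum((x - x_bar) ** 2 for x in hours)

slope = numerator / denominator

print(f"numerator   = {numerator:.1f}")    # numerator   = 21.0
print(f"denominator = {denominator:.1f}")  # denominator = 10.0
print(f"slope       = {slope:.1f}")        # slope       = 2.1
```

The values are formatted to one decimal place because floating-point subtraction (for example `52 - 56.4`) can leave tiny rounding errors in the last digits.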

---

## 🧮 Step 4: Compute Intercept \( \beta_0 \)

$$
\beta_0 = \bar{Y} - \beta_1 \bar{X} = 56.4 - 2.1 \cdot 3 = 56.4 - 6.3 = 50.1
$$
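
Steps 2 to 4 can be bundled into one small helper. The function below is a hypothetical sketch (the name `fit_line` is made up for this note) that assumes two equal-length lists of numbers with at least two distinct X values.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Slope: co-deviation of X and Y divided by the squared deviation of X.
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )

    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line([1, 2, 3, 4, 5], [52, 54, 57, 59, 60])
print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
# intercept = 50.1, slope = 2.1
```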

---

## 📘 Final Estimated Model

$$
\hat{Y} = 50.1 + 2.1 \cdot X
$$

---

## 🧪 Example Prediction

**Predict the exam score for a student who studied 6 hours:**

$$
\hat{Y} = 50.1 + 2.1 \cdot 6 = 62.7
$$
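
The same prediction can be written as a tiny function. This is a sketch only: `predict_score` is a hypothetical name, and the coefficients are the hand-computed estimates from this example hard-coded as defaults.

```python
def predict_score(hours_studied, intercept=50.1, slope=2.1):
    """Predicted exam score from the fitted line: intercept + slope * hours."""
    return intercept + slope * hours_studied


print(f"{predict_score(6):.1f}")  # 62.7
```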

---

## ✅ Summary

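We manually estimated a simple linear regression model from a five-student dataset. The fitted slope means that each additional hour studied is associated with about 2.1 more exam points.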

- We assumed the form: \( Y = \beta_0 + \beta_1 X + \varepsilon \)
- Estimated:
  - Intercept: \( \beta_0 = 50.1 \)
  - Slope: \( \beta_1 = 2.1 \)

**Final model equation:**

$$
\text{Exam Score} = 50.1 + 2.1 \cdot \text{Hours Studied}
$$



---

> 📌 Want to extend this? Try calculating the residuals, the sum of squared errors (SSE), or \( R^2 \) by hand!
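
One way to check a by-hand attempt at that extension is the sketch below. It assumes the fitted line from this example and the usual definition of R² as one minus the ratio of the residual sum of squares to the total sum of squares; the names `residuals`, `sse`, `sst`, and `r_squared` are illustrative.

```python
# Dataset and hand-computed estimates from this example.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 57, 59, 60]
intercept, slope = 50.1, 2.1

# Residual = observed score minus the score predicted by the fitted line.
predicted = [intercept + slope * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

# Sum of squared errors: the variation the line leaves unexplained.
sse = sum(e ** 2 for e in residuals)

# Total sum of squares: the variation of Y around its mean.
y_bar = sum(scores) / len(scores)
sst = sum((y - y_bar) ** 2 for y in scores)

# R^2: the share of the variation in Y explained by the line.
r_squared = 1 - sse / sst

print([f"{e:.1f}" for e in residuals])  # ['-0.2', '-0.3', '0.6', '0.5', '-0.6']
print(f"SSE = {sse:.2f}")               # SSE = 1.10
print(f"R^2 = {r_squared:.4f}")         # R^2 = 0.9757
```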
