

# **Part 1: Introduction to Linear Regression**

### **What is Linear Regression?**

Linear regression is a supervised learning method used to predict a **quantitative response** (a number) based on a **predictor** (input data).

Think of it as finding the "best fit" straight line through a cloud of data points. It is one of the oldest and most widely used statistical learning methods.

### **Why use it?**

Although it is simple, it serves as a foundation for many more advanced machine learning approaches. It helps us answer critical questions like:

1. **Is there a relationship?**
   (e.g., Does advertising spending actually increase sales?)

2. **How strong is the relationship?**
   (e.g., Does a huge budget boost sales a lot, or just a little?)

3. **Prediction:**
   (e.g., If we spend $50k on TV ads, how many units will we sell?)

### **Real-World Analogy: Taxi Fare**

Imagine predicting taxi fare ($Y$) based on distance traveled ($X$):

$$Y = \beta_0 + \beta_1 X + \epsilon$$


| Term                      | Meaning                             |
| ------------------------- | ----------------------------------- |
| **Intercept ($\beta_0$)** | Base fare when you sit in the taxi. |
| **Slope ($\beta_1$)**     | Cost per kilometer.                 |
| **Error ($\epsilon$)**    | Traffic delays, route changes, etc. |
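To make the analogy concrete, here is a minimal Python sketch that simulates fares from this model. The base fare of 3.0, the rate of 1.5 per km, and the noise level are hypothetical numbers chosen for illustration, not values from any real taxi service, and the normally distributed error is an assumption of the sketch.

```python
import random

# Hypothetical parameters chosen for illustration (not real taxi prices).
BETA_0 = 3.0  # intercept: base fare at distance 0
BETA_1 = 1.5  # slope: cost per kilometer


def taxi_fare(distance_km, noise_sd=0.5, rng=None):
    """Simulate Y = beta_0 + beta_1 * X + epsilon, with epsilon ~ Normal(0, noise_sd)."""
    if rng is None:
        rng = random.Random()
    epsilon = rng.gauss(0.0, noise_sd)  # stands in for traffic delays, route changes, etc.
    return BETA_0 + BETA_1 * distance_km + epsilon


rng = random.Random(0)  # fixed seed so the run is repeatable
for km in [1, 5, 10]:
    print(km, "km ->", round(taxi_fare(km, rng=rng), 2))
```

Setting `noise_sd=0.0` removes the error term, so the fare falls exactly on the line (e.g. 10 km gives 3.0 + 1.5 × 10 = 18.0).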

---

# **Part 2: Simple Linear Regression**

Simple linear regression predicts a response $Y$ based on a **single** predictor variable $X$. We assume there is a straight-line relationship.

### **The Formula**

$$Y \approx \beta_0 + \beta_1 X$$

### **Variable Breakdown**

| Symbol        | Name               | Explanation                                                   |
| :------------ | :----------------- | :------------------------------------------------------------ |
| **$Y$**       | Response Variable  | The output you want to predict (e.g., Sales).                 |
| **$X$**       | Predictor Variable | The input you have (e.g., TV Ad Budget).                      |
| **$\beta_0$** | Intercept          | The value of $Y$ when $X = 0$.                                |
| **$\beta_1$** | Slope              | The average increase in $Y$ for every 1-unit increase in $X$. |
| **$\approx$** | Approximation      | Means "is approximately modeled as."                          |

### **Prediction Equation**

In the real world, we don't know the true $\beta_0$ and $\beta_1$. We have to estimate them using data. Once we have estimates (marked with a "hat", as in $\hat{\beta}_1$), we can make a prediction:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Where:

* **$\hat{y}$**: Predicted value
* **$\hat{\beta}_0$**: Estimated intercept
* **$\hat{\beta}_1$**: Estimated slope
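The prediction equation translates directly into code. This is a minimal sketch; the estimates 7.0 (intercept) and 0.05 (units per $1k of TV ads) are hypothetical values used only to show the arithmetic for the "$50k on TV ads" question.

```python
def predict(x, beta0_hat, beta1_hat):
    """Prediction equation: y_hat = beta0_hat + beta1_hat * x."""
    return beta0_hat + beta1_hat * x


# Hypothetical estimates: intercept 7.0 units, slope 0.05 units per $1k of TV ads.
print(predict(50, beta0_hat=7.0, beta1_hat=0.05))
```

With these made-up estimates, a $50k budget gives 7.0 + 0.05 × 50 = 9.5 units.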

---

# **Part 3: Estimating the Coefficients (Training the Model)**

How do we find the best $\beta_0$ and $\beta_1$? We choose the estimates so that the fitted line is as close as possible to the observed data points. Closeness is measured using residuals.

### **Residual ($e_i$)**

The difference between the **actual value** ($y_i$) and the **predicted value** ($\hat{y}_i$) is called the residual:

$$e_i = y_i - \hat{y}_i$$

* If $e_i > 0$: the model under-predicted (the actual value lies above the line).
* If $e_i < 0$: the model over-predicted (the actual value lies below the line).
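A short sketch of computing residuals and reading their signs. The observed values and predictions below are hypothetical numbers for illustration.

```python
# Hypothetical observed values and the model's predictions for them.
y_actual = [2.0, 4.0, 5.0, 4.0, 5.0]
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]

# Residual e_i = y_i - y_hat_i for each observation.
residuals = [y - y_hat for y, y_hat in zip(y_actual, y_pred)]

for e in residuals:
    if e > 0:
        label = "under-predicted"
    elif e < 0:
        label = "over-predicted"
    else:
        label = "exact"
    print(f"{e:+.1f}  {label}")
```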

### **Residual Sum of Squares (RSS)**

To measure how "bad" our model is, we square all the residuals (to get rid of negatives) and add them up. This is the Residual Sum of Squares.

$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Expanded form:

$$RSS = (y_1 - \hat{\beta}_0 - \hat{\beta}_1 x_1)^2 + \dots + (y_n - \hat{\beta}_0 - \hat{\beta}_1 x_n)^2$$
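The expanded form can be evaluated for any candidate line, which makes it easy to compare two lines on the same data. The data and both candidate lines below are hypothetical.

```python
def rss(x, y, beta0_hat, beta1_hat):
    """Residual Sum of Squares for the line y_hat = beta0_hat + beta1_hat * x."""
    return sum((yi - beta0_hat - beta1_hat * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data and two candidate lines.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(rss(x, y, 2.2, 0.6))  # candidate A: about 2.4
print(rss(x, y, 2.0, 0.5))  # candidate B: 3.75, a larger RSS and therefore a worse fit
```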

### **Least Squares Method**

Linear regression chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ that make RSS **as small as possible**.

Formulas:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Where:

* **$\bar{y}$** = average of all $y$ values
* **$\bar{x}$** = average of all $x$ values
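Here is a minimal pure-Python sketch of the two least squares formulas. The five data points are hypothetical; the guard against identical $x$ values is an addition of this sketch, since the slope formula divides by $\sum (x_i - \bar{x})^2$.

```python
def least_squares(x, y):
    """Closed-form least squares estimates (beta0_hat, beta1_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must contain at least two distinct values")
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"beta0_hat = {b0:.3f}, beta1_hat = {b1:.3f}")
```

For this data, $\bar{x} = 3$, $\bar{y} = 4$, the numerator is 6 and the denominator is 10, so $\hat{\beta}_1 = 0.6$ and $\hat{\beta}_0 = 4 - 0.6 \cdot 3 = 2.2$.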

---

# **Part 4: Assessing the Accuracy of Coefficients**

We have estimates ($\hat{\beta}$), but how much can we trust them?

### **True vs Estimated Relationship**

* **Population Regression Line:**
  The true (unknown) relationship in the population:
  $$Y = \beta_0 + \beta_1 X + \epsilon$$

* **Least Squares Line:**
  The relationship we estimate from our sample:
  $$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

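**Bias Concept:**
If we repeatedly drew new datasets from the population and fit a least squares line to each one, the individual estimates would vary, but their *average* would equal the true coefficients. In this sense the least squares estimates are **unbiased** (assuming the linear model with mean-zero errors is correct).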
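The repeated-sampling idea can be checked with a small simulation. This sketch assumes a hypothetical population with $\beta_0 = 2$, $\beta_1 = 3$, standard normal errors, and fixed $x$ values; each sample gives a different slope estimate, but their average lands very close to the true slope.

```python
import random


def slope_estimate(x, y):
    """Least squares slope beta1_hat for one dataset."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


TRUE_B0, TRUE_B1 = 2.0, 3.0  # hypothetical population coefficients
rng = random.Random(42)
x = [i / 10 for i in range(50)]  # the same x values in every sample

slopes = []
for _ in range(1000):
    # Draw a fresh dataset from Y = beta_0 + beta_1 * X + epsilon.
    y = [TRUE_B0 + TRUE_B1 * xi + rng.gauss(0.0, 1.0) for xi in x]
    slopes.append(slope_estimate(x, y))

print(f"smallest estimate: {min(slopes):.3f}, largest: {max(slopes):.3f}")
print(f"average estimate:  {sum(slopes) / len(slopes):.3f} (true slope {TRUE_B1})")
```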

---

### **Standard Error (SE)**

The Standard Error tells us how far our estimate ($\hat{\beta}_1$) is likely to be from the actual value ($\beta_1$) on average.

$$SE(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Where:

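* **$\sigma^2$** = variance of the error term $\epsilon$ (noise). It is usually unknown, so in practice $\sigma$ is estimated from the data by the residual standard error (RSE), covered in Part 5.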

**Key Insight:** SE decreases if:

* We have more data
* The $x$ values are more spread out
* The noise variance $\sigma^2$ is smaller
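A minimal sketch of estimating $SE(\hat{\beta}_1)$ from data. Because $\sigma^2$ is unknown, the sketch plugs in $RSS/(n-2)$ (the square of the RSE) in its place; the data is hypothetical.

```python
import math


def slope_standard_error(x, y):
    """Estimate SE(beta1_hat), plugging RSE^2 = RSS / (n - 2) in for sigma^2."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    rss = sum((yi - beta0_hat - beta1_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # estimate of the noise variance
    return math.sqrt(sigma2_hat / s_xx)


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
print(round(slope_standard_error(x, y), 4))
```

Here $RSS = 2.4$, so $\hat{\sigma}^2 = 2.4 / 3 = 0.8$, and with $\sum (x_i - \bar{x})^2 = 10$ the standard error is $\sqrt{0.08} \approx 0.283$.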

### **Confidence Intervals (CI)**

A 95% Confidence Interval is a range of values constructed so that, if we repeated the experiment over and over, 95% of those intervals would contain the true unknown parameter.

Think of it like fishing:

* **Point Estimate ($\hat{\beta}_1$):** Throwing a spear at a specific spot. You might miss the fish (the true value) slightly.
* **Confidence Interval:** Casting a net. The "95%" describes the method: nets cast this way catch the fish 95% of the time.

**The Formula**:

For Linear Regression, the 95% confidence interval is approximately:

$$\text{Lower Limit} = \hat{\beta}_1 - 2 \cdot SE(\hat{\beta}_1)$$

$$\text{Upper Limit} = \hat{\beta}_1 + 2 \cdot SE(\hat{\beta}_1)$$

Or written simply:

$$\hat{\beta}_1 \pm 2 \cdot SE(\hat{\beta}_1)$$

* $\hat{\beta}_1$: Your estimated slope.

* $SE(\hat{\beta}_1)$: The Standard Error (how much the estimate wiggles on average).

* Note: The number 2 is an approximation. The exact multiplier comes from the t-distribution with $n - 2$ degrees of freedom; for large datasets it is close to 1.96, which rounds to 2.
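The interval is simple arithmetic once $\hat{\beta}_1$ and its standard error are known. In this sketch the estimate 0.6 and standard error 0.25 are hypothetical, and the `multiplier` parameter lets you swap 2 for 1.96 or an exact t-quantile.

```python
def approx_ci_95(beta_hat, se, multiplier=2.0):
    """Approximate 95% confidence interval: beta_hat +/- multiplier * SE."""
    half_width = multiplier * se
    return beta_hat - half_width, beta_hat + half_width


# Hypothetical slope estimate and standard error.
lower, upper = approx_ci_95(0.6, 0.25)
print(f"95% CI for beta_1: [{lower:.2f}, {upper:.2f}]")
```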


---

### **Hypothesis Testing (Courtroom Analogy)**

We use Hypothesis Tests to decide if there is actually a relationship between $X$ and $Y$.

#### **Null Hypothesis ($H_0$):**

No relationship ($\beta_1 = 0$)

#### **Alternative Hypothesis ($H_a$):**

There is a relationship ($\beta_1 \neq 0$)

To test this, compute the **t-statistic**:

$$t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}$$

Interpretation:

* **t-statistic**: Measures how many standard errors our coefficient is away from zero. **Large |t|** → strong evidence against $H_0$
* **p-value**: The probability of observing a |t| at least this large by pure chance if there were no relationship ($H_0$ true). **Small p-value (< 0.05)**: The result is very unlikely to be luck. We reject the null hypothesis and conclude a relationship exists.
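The t-statistic is a single division, and a p-value can be sketched with the standard library alone. This sketch uses the standard normal distribution as a large-sample stand-in for the t-distribution (an approximation, not the exact test), via the identity $P(|Z| \ge |t|) = \text{erfc}(|t| / \sqrt{2})$. The estimate and standard error are hypothetical.

```python
import math


def t_statistic(beta1_hat, se):
    """Number of standard errors beta1_hat lies away from 0."""
    return beta1_hat / se


def p_value_normal_approx(t):
    """Two-sided p-value P(|Z| >= |t|) for a standard normal Z.

    This is a large-sample approximation; the exact test uses the
    t-distribution with n - 2 degrees of freedom.
    """
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical estimate and standard error.
t = t_statistic(0.6, 0.15)
print(f"t = {t:.2f}, approximate p-value = {p_value_normal_approx(t):.4f}")
```

For small samples the normal approximation understates the p-value; if SciPy is available, the exact two-sided value is `2 * scipy.stats.t.sf(abs(t), df=n - 2)`.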

---

# **Part 5: Assessing the Accuracy of the Model**

We know the variables are related, but how good is the model at predicting?

### **1. Residual Standard Error (RSE)**

RSE is an estimate of the standard deviation of the error term $\epsilon$: roughly, the average amount that the response ($Y$) deviates from the true regression line. It is a measure of the "lack of fit" in the units of $Y$ (e.g., if predicting sales, RSE is in "units sold").

$$RSE = \sqrt{\frac{1}{n-2} RSS}$$

Interpretation example:
If RSE = 3.26, predictions are typically off by about 3.26 units.
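A minimal sketch of the RSE formula. The observations and fitted values are hypothetical (they come from a least squares line fit to five made-up points).

```python
import math


def residual_standard_error(y, y_hat):
    """RSE = sqrt(RSS / (n - 2)) for simple linear regression."""
    n = len(y)
    rss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    return math.sqrt(rss / (n - 2))  # n - 2 because two coefficients were estimated


# Hypothetical observations and fitted values from a least squares line.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(residual_standard_error(y, y_hat), 3))
```

Here $RSS = 2.4$ and $n - 2 = 3$, so $RSE = \sqrt{0.8} \approx 0.894$: observations sit about 0.89 units from the fitted line on average.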

---

### **2. $R^2$ Statistic (R-Squared)**

Since RSE is measured in the units of $Y$ (dollars, meters, liters), it can be hard to tell whether a given value is "good." $R^2$ instead measures the fraction of the variation in $Y$ that is explained by the model, and it ranges from 0 to 1.

$$R^2 = 1 - \frac{RSS}{TSS}$$

Where:

* **TSS** = $\sum_{i=1}^{n} (y_i - \bar{y})^2$, the total variation in $Y$ before the regression is performed
* **RSS** = variation left unexplained by the model

Interpretation:

* **$R^2 = 1$**: the model explains 100% of the variability (perfect fit).
* **$R^2 = 0.61$**: the model explains 61% of the variability (decent fit).
* **$R^2 = 0$**: the model explains none of the variability.
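A minimal sketch of computing $R^2$ from TSS and RSS, reusing hypothetical observations and fitted values.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS / TSS."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    rss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # unexplained variation
    return 1 - rss / tss


# Hypothetical observations and fitted values from a least squares line.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(r_squared(y, y_hat), 3))
```

With $TSS = 6$ and $RSS = 2.4$, this gives $R^2 = 1 - 2.4/6 = 0.6$, so the line explains about 60% of the variation in this made-up data.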

### **Correlation vs $R^2$**

In simple linear regression, $R^2$ is exactly the square of the correlation ($r$) between $X$ and $Y$:

$$R^2 = (Cor(X, Y))^2$$
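The identity can be checked numerically. This sketch computes the sample correlation and the least squares $R^2$ separately on the same hypothetical data and confirms they agree.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # this is also TSS

# Sample correlation between X and Y.
r = s_xy / math.sqrt(s_xx * s_yy)

# R^2 from a least squares fit.
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar
rss = sum((yi - beta0_hat - beta1_hat * xi) ** 2 for xi, yi in zip(x, y))
r2 = 1 - rss / s_yy

print(f"r^2 = {r ** 2:.4f}, R^2 = {r2:.4f}")
```

Both values come out to 0.6 here ($r^2 = 6^2 / (10 \cdot 6)$). The identity holds only for simple linear regression with one predictor and an intercept.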

---
