# C2: BASICS OF LINEAR REGRESSION

## What is Linear Regression?

- **Definition:** A supervised learning algorithm used to predict a continuous value by finding the best-fitting straight line through the data.  
- **Equation:**  
  **Simple Linear Regression:**  
  $y = b_0 + b_1x$  
  - $y$ = Predicted value  
  - $x$ = Input feature  
  - $b_0$ = Intercept  
  - $b_1$ = Slope (change in the predicted $y$ when $x$ increases by one unit)  

## Simple Linear Regression

- Uses one input feature.  
- Example: Predicting house price based only on house size.  
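
As a minimal sketch of the house-size example, the snippet below evaluates $y = b_0 + b_1x$ for one feature. The function name `predict_price` and the coefficient values are hypothetical, chosen only for illustration; they are not fitted from real data.

```python
def predict_price(size_sqft, b0=50_000.0, b1=120.0):
    """Evaluate y = b0 + b1 * x for a single input feature.

    b0 and b1 are made-up illustrative coefficients, not fitted values.
    """
    return b0 + b1 * size_sqft


price_1000 = predict_price(1000)
price_1001 = predict_price(1001)

# Increasing x by one unit changes the prediction by exactly the slope b1.
slope_effect = price_1001 - price_1000
print(price_1000, slope_effect)
```

The difference between the two predictions equals the slope, which is the interpretation of $b_1$ given above.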

## Multiple Linear Regression

- Uses more than one input feature.  
- Equation:  
  $y = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n$  
- Example: Predicting house price based on number of bedrooms, house size, location score, etc.  
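
The multiple-regression equation is an intercept plus a weighted sum of the features. A small sketch, assuming hypothetical coefficients for bedrooms, size, and location score (the numbers and the `predict` helper are illustrative, not from any fitted model):

```python
def predict(features, coefficients, intercept):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for one observation."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical coefficients: per bedroom, per square foot, per location point.
coefficients = [10_000.0, 100.0, 5_000.0]
intercept = 20_000.0

house = [3, 1500, 8]  # bedrooms, size, location score
predicted = predict(house, coefficients, intercept)
print(predicted)
```

Each coefficient is read the same way as the slope in simple regression: the change in the prediction for a one-unit increase in that feature, holding the others fixed.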

## Covariance and Correlation

- **Covariance:** Shows the direction of the relationship (positive/negative) between two variables.  
  - Formula:  
    $\mathrm{Cov}(X,Y) = \dfrac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{n-1}$  
  - Positive: Both increase together.  
  - Negative: One increases while the other decreases.  

- **Correlation:** Standardized form of covariance, ranges from -1 to +1.  
  - Formula:  
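    $r = \dfrac{\mathrm{Cov}(X,Y)}{s_x s_y}$, where $s_x$ and $s_y$ are the sample standard deviations of $X$ and $Y$ (computed with the same $n-1$ denominator as the covariance).  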
  - $r = 1$ → Perfect positive correlation.  
  - $r = -1$ → Perfect negative correlation.  
  - $r = 0$ → No linear relationship.  
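
The two formulas can be sketched in plain Python as follows. The helper names and the small data set are assumptions made for illustration; the sample ($n-1$) denominator matches the covariance formula above.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson r: covariance divided by the product of the standard deviations."""
    s_x = sqrt(sample_covariance(xs, xs))  # Cov(X, X) is the variance of X
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Illustrative toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(xs, ys)
r = correlation(xs, ys)
print(cov_xy, round(r, 4))
```

Rescaling `xs` (for example, metres to centimetres) changes the covariance but leaves $r$ unchanged, which is why correlation is easier to compare across variable pairs.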

## Regression Analysis

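- **Goal:** Find the coefficients $b_0, b_1, \dots, b_n$ that minimize the prediction error on the observed data, i.e. the gap between the actual values $y_i$ and the predicted values $\hat{y}_i$.  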

## Ordinary Least Squares (OLS)

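- Method to find the best-fitting line by minimizing the **Sum of Squared Errors (SSE):**  
  $\mathrm{SSE} = \sum_{i=1}^n (y_i - \hat{y}_i)^2$, where $y_i$ is the observed value and $\hat{y}_i$ is the model's prediction for observation $i$.  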
- The line with the minimum SSE is chosen.  
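
For simple linear regression the SSE-minimizing line has a standard closed form, $b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1\bar{x}$. That formula is not stated in these notes, so treat the sketch below as one possible illustration on toy data; the function names are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared differences between observed and predicted y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_ols(xs, ys)
best_sse = sse(xs, ys, b0, b1)

# Any other line gives a larger SSE, e.g. after nudging the slope.
nudged_sse = sse(xs, ys, b0, b1 + 0.1)
print(round(b0, 4), round(b1, 4), round(best_sse, 4), nudged_sse > best_sse)
```

Note that $b_1$ equals the covariance of $x$ and $y$ divided by the variance of $x$, since the $n-1$ denominators cancel.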

## $R^2$ and Adjusted $R^2$

- **$R^2$ (Coefficient of Determination):**  
  - Measures how much variance in $y$ is explained by the model.  
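  - Range: 0 to 1 for an OLS fit with an intercept, evaluated on the data it was fitted to (higher means more variance explained). On new data, or for a model without an intercept, it can be negative.  
  - Formula:  
    $R^2 = 1 - \dfrac{\mathrm{SSE}}{\mathrm{SST}}$, where $\mathrm{SST} = \sum_{i=1}^n (y_i - \bar{y})^2$ is the total sum of squares.  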

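- **Adjusted $R^2$:** Corrects $R^2$ for the number of predictors.  
  - Plain $R^2$ never decreases when a predictor is added, even an irrelevant one; adjusted $R^2$ penalizes each extra predictor, so it rises only if the new variable improves the fit enough.  
  - Formula:  
    $R^2_{\text{adj}} = 1 - (1 - R^2)\dfrac{n-1}{n-p-1}$, where $n$ is the number of observations and $p$ is the number of predictors.  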
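
A short sketch of both metrics, using $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ and the standard adjustment $1 - (1 - R^2)\frac{n-1}{n-p-1}$ with $n$ observations and $p$ predictors. The data and predictions are illustrative toy values, and the function names are hypothetical.

```python
def r_squared(ys, predictions):
    """R^2 = 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p given n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Toy observations and the predictions of a fitted one-feature line.
ys = [2, 4, 5, 4, 5]
predictions = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(ys, predictions)
adj_one = adjusted_r_squared(r2, n=len(ys), p=1)

# Same R^2 with a second predictor that adds nothing: the adjusted value drops.
adj_two = adjusted_r_squared(r2, n=len(ys), p=2)
print(round(r2, 4), round(adj_one, 4), round(adj_two, 4))
```

Holding $R^2$ fixed while increasing `p` lowers the adjusted value, which is the penalty for adding variables that do not improve the fit.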

## Inferences and Slope

- Hypothesis testing for slope:  
  - $H_0: b_1 = 0$ → No linear relationship between $x$ and $y$.  
  - $H_a: b_1 \neq 0$ → Relationship exists.  
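- If **p-value < significance level** (e.g., 0.05), reject $H_0$; otherwise, fail to reject $H_0$ (the data do not provide enough evidence of a linear relationship).  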
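
The test is usually carried out with a t statistic, $t = b_1 / \mathrm{SE}(b_1)$ with $n-2$ degrees of freedom, where $\mathrm{SE}(b_1) = \sqrt{\dfrac{\mathrm{SSE}/(n-2)}{\sum (x_i - \bar{x})^2}}$. The sketch below is an assumption-laden illustration on toy data: instead of computing a p-value it compares $|t|$ with a critical value copied from a t-table (3.182 for 3 degrees of freedom, two-sided, 5% level), which leads to the same decision as checking p-value < 0.05.

```python
from math import sqrt


def slope_t_statistic(xs, ys):
    """Return (t statistic for H0: b1 = 0, degrees of freedom)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    residual_variance = sse / (n - 2)          # estimate of the error variance
    se_b1 = sqrt(residual_variance / s_xx)     # standard error of the slope
    return b1 / se_b1, n - 2


# Two-sided 5% critical value for 3 degrees of freedom, taken from a t-table.
T_CRITICAL_DF3 = 3.182

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

t_stat, df = slope_t_statistic(xs, ys)
reject_h0 = abs(t_stat) > T_CRITICAL_DF3
print(round(t_stat, 4), df, reject_h0)
```

With only five points the slope of 0.6 is not statistically significant at the 5% level, even though the fitted line trends upward; a larger sample with the same pattern could give a different result.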

## Linear Regression with Time Series (Autoregression)

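- Predicts a variable using its own past values (lags) as the input features.  
- Equation with $p$ lags:  
  $y_t = b_0 + b_1y_{t-1} + b_2y_{t-2} + \dots + b_py_{t-p}$  
- Example: Predict today’s stock price from the prices of the previous 5 days ($p = 5$).  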
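
Autoregression reuses ordinary regression once the series is rearranged into rows of lagged values. The sketch below builds those rows for any number of lags and, to stay dependency-free, fits only the one-lag case with the simple-regression closed form. The helper names and the toy series (constructed so that each value is 1 plus twice the previous one) are assumptions for illustration, not real price data.

```python
def make_lagged(series, p):
    """Return (rows, targets): each row holds the p values preceding its target."""
    rows = [series[i - p:i] for i in range(p, len(series))]
    targets = series[p:]
    return rows, targets


def fit_ar1(series):
    """Fit y_t = b0 + b1 * y_{t-1} by ordinary least squares."""
    rows, targets = make_lagged(series, 1)
    xs = [row[0] for row in rows]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(targets) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, targets))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Toy series built so that y_t = 1 + 2 * y_{t-1} holds exactly.
series = [1.0, 3.0, 7.0, 15.0, 31.0, 63.0]

b0, b1 = fit_ar1(series)
next_value = b0 + b1 * series[-1]  # one-step-ahead forecast

# With five lags, each row is the five values before the target.
rows5, targets5 = make_lagged(series, 5)
print(round(b0, 6), round(b1, 6), round(next_value, 6), rows5, targets5)
```

For the five-lag example, `rows5` and `targets5` would be passed to a multiple linear regression fit, with each lag acting as one input feature.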
