# Linear Regression

- One of the simplest and most widely used algorithms for predictive modeling
- Models a linear relationship between
    - the independent variable (input) `X`
    - and the dependent variable (output) `Y`

__Equations:__

_For simple data with single feature:_
$$Y = mX + c$$
- m: slope (or weight)  
- c: intercept (or bias term)
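
As a quick illustration of the single-feature equation, here is a minimal sketch in Python. The function name `predict_simple` and the values of `m` and `c` are made up for the example; they are not fitted from any data.

```python
def predict_simple(x, m, c):
    """Return the prediction Y = m*X + c for a single input value."""
    return m * x + c

# Hypothetical slope and intercept, chosen only for illustration.
m, c = 2.0, 1.0
y_pred = predict_simple(3.0, m, c)
print(y_pred)  # 2.0 * 3.0 + 1.0 = 7.0
```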

_For Multiple Linear Regression (multiple features):_
$$ \hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n $$

- $\hat{y}$: output (predicted value)  
- $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$: `weighted sum` of the features, plus the bias  
- $x_i$: value of the ith feature  
- $\theta_j$: model parameters; $\theta_0$ is the `bias term` and $\theta_1, \theta_2, \dots, \theta_n$ are the `feature weights`
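
The weighted sum can be written out term by term. The sketch below uses plain Python and assumes `theta` is ordered as `[bias, weight_1, ..., weight_n]`; the name `predict` and all numbers are hypothetical.

```python
def predict(theta, features):
    """Weighted sum: theta[0] is the bias, theta[1:] are the feature weights."""
    if len(theta) != len(features) + 1:
        raise ValueError("theta must have one more entry than features")
    y_hat = theta[0]
    for weight, value in zip(theta[1:], features):
        y_hat += weight * value
    return y_hat

theta = [1.0, 2.0, 3.0]   # hypothetical bias and two weights
features = [4.0, 5.0]     # x_1, x_2 of one instance
y_hat = predict(theta, features)
print(y_hat)  # 1 + 2*4 + 3*5 = 24.0
```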

_Two more representations of this function_

1. Vectorized form:  
$\theta$ and $x$ are taken as vectors, $$\hat{y} = h_\theta(x) = \theta \cdot x$$ 


$\theta$: model parameter vector ($\theta_0$ bias term + $\theta_1, \theta_2, \dots, \theta_n$ feature weights)  
$x$: feature vector holding the feature values of one instance. $x_0$ is always $1$; $x_1$ to $x_n$ contain the feature values  
$\hat{y}$: output, a scalar  
$h_\theta$: `hypothesis function`, which uses the model parameters $\theta$
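
With NumPy, the dot product covers the bias term as well once $x_0 = 1$ is prepended to the feature vector. This is only a sketch; the array values are made up for illustration.

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])   # [bias, weight_1, weight_2], hypothetical
features = np.array([4.0, 5.0])     # x_1, x_2 of one instance

# Prepend x_0 = 1 so the bias is handled by the same dot product.
x = np.concatenate(([1.0], features))

y_hat = np.dot(theta, x)            # scalar: 1*1 + 2*4 + 3*5
print(y_hat)
```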


2. Matrix form:

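For a single instance, with $\theta$ and $x$ as column vectors:

$$
\hat{y} = \theta^\top x
$$

For all $m$ instances at once, stack the feature vectors as the rows of $X$:

$$
\hat{Y} = X\theta
$$

$$
\theta = 
\begin{bmatrix}
\theta_0 \\
\theta_1 \\
\vdots \\
\theta_n
\end{bmatrix}
\quad,\quad
X = 
\begin{bmatrix}
1 & x_1^{(1)} & x_2^{(1)} & \dots & x_n^{(1)} \\
1 & x_1^{(2)} & x_2^{(2)} & \dots & x_n^{(2)} \\
1 & x_1^{(3)} & x_2^{(3)} & \dots & x_n^{(3)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_1^{(m)} & x_2^{(m)} & \dots & x_n^{(m)}
\end{bmatrix}
$$

- $\theta$: model parameters ($\theta_0$ = bias, $\theta_1 \dots \theta_n$ = weights), shape $(n+1) \times 1$  
- $x$: feature vector of one instance ($x_0 = 1$ for the bias)  
- $\theta^\top x$: matrix multiplication of $\theta^\top$ and $x$, which gives the prediction $\hat{y}$ for that instance  
- $\hat{y}$: a $1 \times 1$ result, i.e. a scalar

Note: The shape of $X$ is $m \times (n+1)$, where m is the number of rows (instances) and n is the number of features; the extra column is $x_0 = 1$.  
      The shape of $\hat{Y}$ (and of $Y$) is $m \times 1$.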
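
Assuming the row-per-instance convention, with a leading column of ones added to $X$ for the bias, the predictions for all instances can be computed with one matrix product. The sketch below uses NumPy; `X_raw` and the parameter values are hypothetical.

```python
import numpy as np

# Hypothetical raw features: m = 3 instances, n = 2 features.
X_raw = np.array([[4.0, 5.0],
                  [0.0, 1.0],
                  [2.0, 2.0]])

# Add the bias column x_0 = 1, giving shape (m, n + 1).
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

theta = np.array([[1.0], [2.0], [3.0]])   # shape (n + 1, 1)

Y_hat = X @ theta                         # shape (m, 1), one prediction per row
print(X.shape, theta.shape, Y_hat.shape)
print(Y_hat.ravel())
```

Checking the shapes before multiplying is a cheap way to catch a missing bias column: `X` must have exactly as many columns as `theta` has rows.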


### ___How do we train a model?___

What does `training` a model refer to?  
Training a model means setting its parameters so that the model best fits the training data.

What does `fitting` a model mean?  
Fitting means setting the parameters so that the model's predictions closely match the outputs in the training set.

To find the best parameters, we need a measure of how well or how poorly the model predicts. That measure is called the `cost function` (or `loss function`).  
- The most common performance measure for linear regression is RMSE (Root Mean Squared Error).
- As the cost function we use MSE instead, because it is easier to work with and gives the same result: the square root is monotonically increasing, so the $\theta$ that minimizes MSE also minimizes RMSE.
- Hence we have to find the $\theta$ that minimizes MSE.
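
The claim that MSE and RMSE share the same minimizer can be checked numerically. The sketch below is an illustration only: it uses made-up toy data, a one-parameter model `y_hat = slope * x`, and a coarse grid of candidate slopes rather than a real optimizer.

```python
import math

# Hypothetical toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def mse(slope):
    """MSE of the one-parameter model y_hat = slope * x on the toy data."""
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def rmse(slope):
    """RMSE is just the square root of the MSE."""
    return math.sqrt(mse(slope))

# Candidate slopes 0.00, 0.01, ..., 4.00.
candidates = [i / 100 for i in range(401)]

best_by_mse = min(candidates, key=mse)
best_by_rmse = min(candidates, key=rmse)
print(best_by_mse, best_by_rmse)  # both searches pick the same slope
```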

Note: This is common in ML. The function optimized during training is often not the performance metric used to evaluate the final model; an easier-to-optimize function is used instead.

### ___MSE (Mean Squared Error)___
The cost function for a linear regression model:

$$
MSE(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2
$$

- $m$: total number of instances
- $\hat{y}^{(i)}$: predicted value of the ith instance, $\hat{y}^{(i)} = h_\theta(x^{(i)})$
- $y^{(i)}$: actual value (ground truth) of that instance
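
The MSE formula translates almost directly into NumPy. This is a minimal sketch, assuming `X` already contains the bias column $x_0 = 1$ and that `theta` and `y` are 1-D arrays; the function name `mse` and the data are hypothetical.

```python
import numpy as np

def mse(theta, X, y):
    """Mean squared error of the linear model X @ theta against targets y.

    X is assumed to already contain the bias column x_0 = 1.
    """
    residuals = X @ theta - y          # y_hat^(i) - y^(i) for every instance
    return float(np.mean(residuals ** 2))

# Hypothetical data: 3 instances, 1 feature plus the bias column.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])

print(mse(np.array([1.0, 2.0]), X, y))  # exact fit y = 1 + 2x, so MSE is 0.0
print(mse(np.array([0.0, 2.0]), X, y))  # every prediction is off by 1, so MSE is 1.0
```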

_Now we have to find the parameter vector $\theta$ that minimizes this cost function._
