# Cost Function for Linear Regression

The cost function $ J(\beta) $ measures how well the linear regression model fits the data. It is defined as:

$ J(\beta) = \frac{1}{2m} \sum_{i=1}^{m} (Y^{(i)} - h_\beta(X^{(i)}))^2 $

Where:
- $ J(\beta) $: Cost function value (to be minimized)
- $ m $: Number of data points (observations)
- $ Y^{(i)} $: Actual value of the dependent variable for the $ i $-th data point
- $ h_\beta(X^{(i)}) $: Predicted value for the $ i $-th data point
    $ h_\beta(X^{(i)}) = \beta_0 + \beta_1 X_1^{(i)} + \beta_2 X_2^{(i)} + \dots + \beta_p X_p^{(i)} $
- $ \beta $: Coefficients (parameters) of the linear regression model
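
The definition above can be sketched directly in plain Python. This is a minimal illustration, not a reference implementation: the function names `predict` and `cost` and the toy data are assumptions made for the example, and `beta[0]` is taken to be the intercept $ \beta_0 $.

```python
def predict(beta, x):
    """h_beta(x) = beta_0 + beta_1*x_1 + ... + beta_p*x_p for one observation."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def cost(beta, X, Y):
    """J(beta) = 1/(2m) * sum of squared differences between actual and predicted."""
    m = len(Y)
    return sum((y - predict(beta, x)) ** 2 for x, y in zip(X, Y)) / (2 * m)


# Hypothetical toy data: one feature, three observations lying on y = 2x.
X = [[1.0], [2.0], [3.0]]
Y = [2.0, 4.0, 6.0]

perfect = cost([0.0, 2.0], X, Y)  # parameters that fit the data exactly
worse = cost([0.0, 1.0], X, Y)    # parameters that underestimate every point
```

With these values, the exact fit gives a cost of zero, while the poorer fit gives a strictly larger cost, which is the behavior the formula is meant to capture.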

## Why Include the $ \frac{1}{2} $ Factor?

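The factor $ \frac{1}{2} $ is a convenience. Differentiating the squared error produces a factor of 2 (from the power rule), which the $ \frac{1}{2} $ cancels, so the derivative used in gradient descent carries no extra constant. Scaling the cost by a positive constant does not change which $ \beta $ minimizes it.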

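To see this, differentiate the half-squared error of a single observation with respect to $ \beta_j $:

$ \frac{\partial}{\partial \beta_j} \left( \frac{1}{2}(Y^{(i)} - h_\beta(X^{(i)}))^2 \right) = (Y^{(i)} - h_\beta(X^{(i)})) \cdot \frac{\partial}{\partial \beta_j}(Y^{(i)} - h_\beta(X^{(i)})) = -(Y^{(i)} - h_\beta(X^{(i)})) X_j^{(i)} $

The $ \frac{1}{2} $ cancels with the 2 that comes from the power rule of differentiation. The last step uses $ \frac{\partial}{\partial \beta_j} h_\beta(X^{(i)}) = X_j^{(i)} $, with $ X_0^{(i)} = 1 $ for the intercept.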
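
The cancellation can be checked numerically. The sketch below compares the analytic per-observation derivative, $ -(Y^{(i)} - h_\beta(X^{(i)})) X_j^{(i)} $, against a central finite difference. The function names, the step size `eps`, and the sample values are assumptions chosen for illustration, and $ X_0^{(i)} = 1 $ is assumed for the intercept.

```python
def half_squared_error(beta, x, y):
    """(1/2) * (y - h_beta(x))^2 for a single observation."""
    h = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return 0.5 * (y - h) ** 2


def analytic_partial(beta, x, y, j):
    """-(y - h_beta(x)) * x_j, taking x_0 = 1 for the intercept term."""
    h = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    x_j = 1.0 if j == 0 else x[j - 1]
    return -(y - h) * x_j


def numeric_partial(beta, x, y, j, eps=1e-6):
    """Central finite-difference estimate of the derivative in beta_j."""
    up = list(beta)
    down = list(beta)
    up[j] += eps
    down[j] -= eps
    return (half_squared_error(up, x, y) - half_squared_error(down, x, y)) / (2 * eps)


# Hypothetical values: two features, one observation.
beta, x, y = [0.5, -1.0, 2.0], [3.0, 4.0], 10.0
pairs = [(analytic_partial(beta, x, y, j), numeric_partial(beta, x, y, j))
         for j in range(len(beta))]
```

Each pair in `pairs` should agree up to floating-point error, with no leftover factor of 2 in the analytic expression.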

## Purpose of the Cost Function

The cost function $ J(\beta) $ quantifies the error between:
- Predicted values: $ h_\beta(X^{(i)}) $
- Actual values: $ Y^{(i)} $

Minimizing this cost function yields the parameters $ \beta $ of the best-fit line (or best-fit hyperplane, when there is more than one feature).

## Matrix Form

The cost function can also be written in matrix form:

$ J(\beta) = \frac{1}{2m}(Y - X\beta)^T(Y - X\beta) $

Where:
- $ Y $ is the $ m \times 1 $ vector of target values
- $ X $ is the $ m \times (p+1) $ matrix of features (including a column of ones for the intercept)
- $ \beta $ is the $ (p+1) \times 1 $ vector of parameters
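
A minimal NumPy sketch of the matrix form follows. It assumes NumPy is available, uses 1-D arrays in place of $ m \times 1 $ and $ (p+1) \times 1 $ column vectors for simplicity, and builds $ X $ by prepending a column of ones to a hypothetical feature array; the name `cost_matrix` is illustrative.

```python
import numpy as np


def cost_matrix(beta, X, Y):
    """J(beta) = 1/(2m) * (Y - X beta)^T (Y - X beta)."""
    m = Y.shape[0]
    residual = Y - X @ beta
    return float(residual @ residual) / (2 * m)


# Hypothetical toy data: one feature (p = 1), three observations (m = 3).
features = np.array([[1.0], [2.0], [3.0]])
X = np.column_stack([np.ones(len(features)), features])  # shape (m, p + 1)
Y = np.array([2.0, 4.0, 6.0])

J = cost_matrix(np.array([0.0, 1.0]), X, Y)
```

The inner product `residual @ residual` is the sum of squared differences, so this computes the same value as the element-wise summation.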

## Gradient of the Cost Function

The partial derivative of the cost with respect to each parameter $ \beta_j $ (with $ X_0^{(i)} = 1 $ for the intercept) is:

$ \frac{\partial J}{\partial \beta_j} = -\frac{1}{m} \sum_{i=1}^{m} (Y^{(i)} - h_\beta(X^{(i)}))X_j^{(i)} $

In matrix form, stacking these partial derivatives into a vector:

$ \nabla J(\beta) = -\frac{1}{m}X^T(Y - X\beta) $
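
The matrix-form gradient translates almost directly to NumPy, and a few lines of gradient descent show how it is used to minimize the cost. This is only a sketch under stated assumptions: the learning rate, the iteration count, the zero initialization, and the toy data are all illustrative choices rather than recommended settings.

```python
import numpy as np


def cost(beta, X, Y):
    """J(beta) = 1/(2m) * (Y - X beta)^T (Y - X beta)."""
    residual = Y - X @ beta
    return float(residual @ residual) / (2 * len(Y))


def gradient(beta, X, Y):
    """grad J(beta) = -(1/m) * X^T (Y - X beta)."""
    return -(X.T @ (Y - X @ beta)) / len(Y)


# Hypothetical toy data lying on y = 2x, with a column of ones for the intercept.
X = np.column_stack([np.ones(3), np.array([1.0, 2.0, 3.0])])
Y = np.array([2.0, 4.0, 6.0])

beta = np.zeros(2)       # assumed starting point
learning_rate = 0.1      # assumed step size, small enough for this data
initial_cost = cost(beta, X, Y)

for _ in range(2000):    # assumed iteration count
    beta = beta - learning_rate * gradient(beta, X, Y)

final_cost = cost(beta, X, Y)
```

Each update moves $ \beta $ against the gradient, so the cost decreases for a suitably small step size; on this data the parameters approach an intercept near 0 and a slope near 2.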