# LINEAR REGRESSION

To explain the linear regression formula, consider a small example dataset with two features ($x_1$ and $x_2$) and two data points. In linear regression, we fit a model that predicts a dependent variable $y$ from the independent variables $x_1$ and $x_2$.

### Example Dataset:

| Data Point | $x_1$ (Feature 1) | $x_2$ (Feature 2) | $y$ (Target) |
|------------|-------------------|-------------------|--------------|
| 1          | $x_{11}$          | $x_{12}$          | $y_1$        |
| 2          | $x_{21}$          | $x_{22}$          | $y_2$        |

Here, $x_{11}$ and $x_{21}$ are the values of feature $x_1$ for the first and second data points, respectively, and $x_{12}$ and $x_{22}$ are the values of feature $x_2$ for the first and second data points, respectively. $y_1$ and $y_2$ are the target values for the first and second data points, respectively.

### Linear Regression Model:

The linear regression model for this dataset can be represented as:

$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
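Applied to both rows of the example dataset at once, the model can be written in matrix form by adding a column of ones to account for the intercept (here $\hat{y}$ denotes the vector of both predictions):

$$
X = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \end{bmatrix}, \quad
\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}, \quad
\hat{y} = X\theta = \begin{bmatrix} \theta_0 + \theta_1 x_{11} + \theta_2 x_{12} \\ \theta_0 + \theta_1 x_{21} + \theta_2 x_{22} \end{bmatrix}
$$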

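In general, for $n$ features:

$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta \cdot x$

where $\theta = (\theta_0, \theta_1, \dots, \theta_n)$ and $x = (1, x_1, \dots, x_n)$; the constant 1 in the first position makes the dot product include the intercept $\theta_0$.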
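The dot-product form of the prediction can be sketched in a few lines of Python. This is a minimal illustration, assuming the parameters are passed as a list ordered $(\theta_0, \theta_1, \dots, \theta_n)$ and the features as $(x_1, \dots, x_n)$; the function name `predict` and the example values are hypothetical.

```python
def predict(theta, features):
    """Return theta . x, where x = (1, x_1, ..., x_n).

    The leading 1 is prepended here so that theta[0] acts as the intercept.
    """
    if len(theta) != len(features) + 1:
        raise ValueError("theta needs one more entry than features (the intercept)")
    x = [1.0] + list(features)
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical parameters (theta_0, theta_1, theta_2) = (1, 2, 3) and features (x_1, x_2) = (4, 5)
print(predict([1.0, 2.0, 3.0], [4.0, 5.0]))  # 1 + 2*4 + 3*5 = 24.0
```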

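The Normal Equation can be used instead of an iterative approach such as gradient descent; it computes the optimal parameters directly in closed form:

$\theta = (X^T X)^{-1} X^T y$

Here $X$ is the matrix whose rows are the data points, each with a leading 1 for the intercept, and $y$ is the vector of target values. The formula requires $X^T X$ to be invertible, i.e. the columns of $X$ must be linearly independent, which in particular needs at least as many data points as parameters. For the two-point example above there are three parameters, so $X^T X$ is singular and, if the two rows of $X$ are linearly independent, infinitely many choices of $\theta$ fit both points exactly.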
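A dependency-free Python sketch of the Normal Equation, assuming $X$ is given as a list of rows that each start with 1.0 for the intercept. Rather than forming $(X^T X)^{-1}$ explicitly, it solves the equivalent system $(X^T X)\theta = X^T y$ by Gauss-Jordan elimination with partial pivoting; in practice a library routine such as NumPy's `numpy.linalg.lstsq` would be the usual choice. The name `normal_equation`, the absolute singularity tolerance `1e-12`, and the four data points (which lie exactly on $y = 1 + 2x_1 + 3x_2$) are illustrative assumptions.

```python
def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y.

    X is a list of rows, each starting with 1.0 for the intercept;
    y is the list of target values.
    """
    n_params = len(X[0])
    # A = X^T X, an n_params x n_params matrix
    A = [[sum(row[i] * row[j] for row in X) for j in range(n_params)]
         for i in range(n_params)]
    # b = X^T y, a vector of length n_params
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n_params)]
    # Augmented matrix [A | b]
    M = [A[i] + [b[i]] for i in range(n_params)]
    for col in range(n_params):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, n_params), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # assumed absolute tolerance for this sketch
            raise ValueError("X^T X is singular: no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry becomes 1
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row
        for r in range(n_params):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    # The last column of the reduced augmented matrix holds theta
    return [M[i][-1] for i in range(n_params)]


# Four hypothetical points lying exactly on y = 1 + 2*x1 + 3*x2
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]
print(normal_equation(X, y))  # close to [1.0, 2.0, 3.0], up to floating-point rounding
```

Calling it with only two data points and three parameters raises `ValueError`, because $X^T X$ is then singular.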

In the two-feature model $\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$:
- $\hat{y}$ is the predicted value of the target variable.
- $\theta_0$ is the intercept term.
- $\theta_1$ and $\theta_2$ are the coefficients for features $x_1$ and $x_2$, respectively, which represent the influence of each feature on the target variable.

### Objective:

The objective in linear regression is to find the values of $ \theta_0 $, $ \theta_1 $, and $ \theta_2 $ that minimize the difference between the predicted values $ \hat{y} $ and the actual target values $ y $ in the dataset. This is typically achieved by minimizing a cost function, such as the Mean Squared Error (MSE), which is defined as:

$MSE = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$

where $m$ is the number of data points.
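A minimal Python version of the MSE formula, assuming the actual and predicted values are passed as plain lists of equal length; `mse` is a hypothetical helper name.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: (1/m) * sum over i of (y_i - yhat_i)^2."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    m = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / m


print(mse([10.0, 3.0], [9.0, 4.0]))  # ((1)^2 + (-1)^2) / 2 = 1.0
```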

For our dataset with two data points, this becomes:

$MSE = \frac{1}{2} \left[ (y_1 - (\theta_0 + \theta_1 x_{11} + \theta_2 x_{12}))^2 + (y_2 - (\theta_0 + \theta_1 x_{21} + \theta_2 x_{22}))^2 \right]$
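As a worked instance of the two-point MSE, take the hypothetical values $\theta_0 = 1$, $\theta_1 = 2$, $\theta_2 = 3$ and data points $(x_{11}, x_{12}, y_1) = (1, 2, 10)$ and $(x_{21}, x_{22}, y_2) = (0, 1, 3)$:

$\hat{y}_1 = 1 + 2 \cdot 1 + 3 \cdot 2 = 9, \quad \hat{y}_2 = 1 + 2 \cdot 0 + 3 \cdot 1 = 4$

$MSE = \frac{1}{2} \left[ (10 - 9)^2 + (3 - 4)^2 \right] = \frac{1}{2} (1 + 1) = 1$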

### Solution:

The values of $ \theta_0 $, $ \theta_1 $, and $ \theta_2 $ that minimize the MSE can be found using various methods, including analytical solutions such as the Normal Equation in the case of linear regression without regularization, or iterative optimization methods such as Gradient Descent.
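For the iterative route, here is a batch gradient descent sketch on the MSE over $m$ data points. Differentiating the MSE gives $\frac{\partial\, MSE}{\partial \theta_j} = \frac{2}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)\, x_{ij}$, with $x_{i0} = 1$ for the intercept. The function name `gradient_descent`, the learning rate `0.1`, the iteration count `5000`, and the four data points (lying exactly on $y = 1 + 2x_1 + 3x_2$) are hypothetical choices suited only to this toy example.

```python
def gradient_descent(X, y, learning_rate=0.1, n_iters=5000):
    """Minimize MSE = (1/m) * sum_i (theta . x_i - y_i)^2 by batch gradient descent.

    X is a list of rows, each starting with 1.0 for the intercept.
    """
    m = len(X)
    n_params = len(X[0])
    theta = [0.0] * n_params  # start from all-zero parameters
    for _ in range(n_iters):
        # Residuals r_i = theta . x_i - y_i for the current parameters
        residuals = [sum(t * xj for t, xj in zip(theta, row)) - yi
                     for row, yi in zip(X, y)]
        # Gradient of the MSE w.r.t. theta_j: (2/m) * sum_i r_i * x_ij
        grad = [(2.0 / m) * sum(r * row[j] for r, row in zip(residuals, X))
                for j in range(n_params)]
        # Take one step against the gradient
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
    return theta


# Four hypothetical points lying exactly on y = 1 + 2*x1 + 3*x2
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]
print(gradient_descent(X, y))  # converges toward [1.0, 2.0, 3.0]
```

Unlike the Normal Equation, this needs a learning rate and an iteration budget; if the learning rate is too large the updates diverge, and features on very different scales usually call for a smaller rate or feature scaling.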

### Interpretation:

Once the optimal values of $ \theta_0 $, $ \theta_1 $, and $ \theta_2 $ are found, the linear regression model can predict the target variable $ y $ for any given values of $ x_1 $ and $ x_2 $ using the formula:

$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

This model assumes a linear relationship between the features ($x_1$ and $x_2$) and the target variable ($y$). The coefficients $\theta_1$ and $\theta_2$ quantify the strength and direction of the relationship between each feature and the target: each gives the change in $\hat{y}$ per unit increase in its feature while the other feature is held fixed.