## The theory behind the ft_linear_regression project


Linear regression with a single feature models the output as a straight line:
$$y(x) = \theta_0 + \theta_1 x$$

Where:
- $y$ is the dependent variable (the output we want to predict)
- $x$ is the independent variable (the input feature)
- $\theta_0$ is the intercept (the value of $y$ when $x = 0$)
- $\theta_1$ is the slope (the change in $y$ for a one-unit change in $x$)
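As a minimal sketch, the model above can be written as a single function. The name `predict` and the sample parameter values are illustrative assumptions, not part of the project.

```python
def predict(x: float, theta0: float, theta1: float) -> float:
    """Return the model output theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters: intercept 1.0 and slope 2.0.
print(predict(0.0, 1.0, 2.0))  # 1.0 (the intercept, since x = 0)
print(predict(3.0, 1.0, 2.0))  # 7.0
```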

These are the data points we will be working with:
<p align="center">
  <img src="static/data_plot.png" width="50%">
</p>

The goal of linear regression is to find the line that best fits the data points, that is, the line that minimizes the difference between the predicted values and the actual values. This is typically done with the method of least squares, which minimizes the sum of the squared differences between the two.

The cost function for linear regression is given by:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$

Where:
- $J(\theta)$ is the cost function
- $m$ is the number of training examples
- $h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$ is the predicted value for the $i$-th training example
- $y^{(i)}$ is the actual value for the $i$-th training example
- $\theta$ is the vector of parameters (including both $\theta_0$ and $\theta_1$)
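The cost function translates almost directly into code. The sketch below assumes the data is held in two plain Python lists; the function name `cost` and the sample data are hypothetical.

```python
def cost(xs, ys, theta0, theta1):
    """Compute J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (theta0 + theta1 * x) - y  # h_theta(x_i) - y_i
        total += error ** 2
    return total / (2 * m)


# Made-up data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost(xs, ys, 0.0, 2.0))  # 0.0, because the line fits perfectly
print(cost(xs, ys, 0.0, 0.0))  # (4 + 16 + 36) / (2 * 3)
```

A cost of zero means every prediction matches its target; any other choice of parameters on this data gives a strictly positive cost.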

The gradient descent algorithm is used to minimize the cost function by iteratively updating each parameter $\theta_j$, with all parameters updated simultaneously at each step:
$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$

Where:
- $\alpha$ is the learning rate (a hyperparameter that controls the step size in the parameter space)
- $\frac{\partial J(\theta)}{\partial \theta_j}$ is the partial derivative of the cost function with respect to the parameter $\theta_j$

The partial derivatives for the parameters $\theta_0$ and $\theta_1$ are given by:
$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})$$
$$\frac{\partial J(\theta)}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}$$
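Putting the update rule and the two partial derivatives together gives a training loop. This is a minimal sketch, not the project's actual implementation: the function name `gradient_descent`, the default `alpha` and `iterations` values, and the sample data are all assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit theta0 and theta1 by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors h_theta(x_i) - y_i for the current parameters.
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients use the old parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Made-up data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys, alpha=0.1, iterations=5000)
print(theta0, theta1)  # close to 1.0 and 2.0
```

The tuple assignment matters: computing both gradients before touching either parameter is what makes the update simultaneous. If the input feature takes large values, the same learning rate can make the updates diverge, so the feature is usually normalized first or `alpha` is reduced.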

--- 
## Evaluation of the Model

The final model can be evaluated using metrics such as Mean Squared Error (MSE) or R-squared, which measure how well it fits the data.

The MSE is calculated as:
$$MSE = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$

The R-squared value is calculated as:
$$R^2 = 1 - \frac{\sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)}))^2}{\sum_{i=1}^{m} (y^{(i)} - \bar{y})^2}$$

Where $\bar{y}$ is the mean of the actual values $y^{(i)}$.

The R-squared value indicates the proportion of variance in the dependent variable that can be explained by the independent variable. A higher R-squared value indicates a better fit of the model to the data, with 1 meaning a perfect fit.
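Both metrics can be sketched in a few lines. The function names `mse` and `r_squared` and the sample values are hypothetical; `preds` stands for the list of model outputs $h_\theta(x^{(i)})$.

```python
def mse(ys, preds):
    """Mean of the squared differences between predictions and targets."""
    m = len(ys)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / m


def r_squared(ys, preds):
    """1 minus the ratio of residual to total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions; only the last prediction is off by 1.
ys = [1.0, 2.0, 3.0, 4.0]
preds = [1.0, 2.0, 3.0, 5.0]
print(mse(ys, preds))        # 0.25
print(r_squared(ys, preds))  # approximately 0.8 (1 - 1/5)
```

Note that `r_squared` divides by the total sum of squares, so this sketch raises `ZeroDivisionError` when every actual value is identical.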