# LINEAR REGRESSION

## <font color="light blue">Simple Linear Regression </font>

#### <font color="orange">What is Simple Linear Regression? </font>

Simple Linear Regression is a statistical method for modeling the relationship between two continuous variables: an independent variable (x) and a dependent variable (y). It's called "simple" because there is only one independent variable, and "linear" because it models the relationship as a straight line.

##### **<font color="orange">Example </font>**

Let's say we want to predict the price of a house based on its size. We have data on the size of several houses (in square feet) and their corresponding prices. We can use Simple Linear Regression to find the relationship between these two variables.

#### <font color="orange">Mathematical Equation </font>

The Simple Linear Regression equation is:

Price(y) = β0 + β1 \* Size(x) + ε

Where:

* **Price (y)** is the actual price of the house (the dependent variable); the predicted price is β0 + β1 \* Size, without the error term
* **Size (x)** is the size of the house in square feet (the independent variable)
* **β0** is the intercept or constant term
* **β1** is the slope coefficient: the change in y for a one-unit increase in x
* **ε** is the error term (which represents the random variation in the data)
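
As a worked illustration, suppose the fitted coefficients were β0 = 50,000 and β1 = 150 (hypothetical values, not estimates from real data). The predicted price of a 1,200-square-foot house is then:

$$
\hat{y} = \beta_0 + \beta_1 x = 50{,}000 + 150 \times 1{,}200 = 50{,}000 + 180{,}000 = 230{,}000
$$

Each additional square foot raises the predicted price by β1 = 150.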

#### <font color="orange">How it Works </font>

Here's a step-by-step explanation:

1. **Collect data**: We collect data on the size and price of several houses.
2. **Plot the data**: We plot the data on a scatter plot to visualize the relationship between size and price.
3. **Find the best-fit line**: We find the line that minimizes the error between the predicted prices and the actual prices. The standard approach, ordinary least squares, chooses β0 and β1 to minimize the sum of squared differences between them.
4. **Interpret the results**: We interpret the results by looking at the slope coefficient (β1) and the intercept (β0). The slope coefficient tells us the change in price for a one-unit change in size. The intercept tells us the predicted price when the size is zero, which is often not meaningful on its own because no house has zero size.
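
The best-fit line in step 3 is usually found with ordinary least squares, which has a closed form for a single predictor: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. A minimal NumPy sketch, assuming NumPy is available; the function name and the house data are hypothetical:

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) that minimize the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (x_mean, y_mean)
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

# Hypothetical data: sizes in square feet, prices in dollars
sizes = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]
beta0, beta1 = fit_simple_linear_regression(sizes, prices)
print(beta0, beta1)
```

The hypothetical points lie exactly on a line, so the fit recovers an intercept of 50,000 and a slope of 150 dollars per square foot. With real, noisy data the same formulas give the line with the smallest sum of squared errors.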

#### <font color="orange">Simple Linear Regression Plot </font>
<img src="../Images/SimpleLinearRegression.jpg" width="600" height="400">

#### <font color="orange">Cost Function </font>

A cost function, also known as a loss function or objective function, is a mathematical function that measures the difference between the predicted output of a machine learning model and the actual true output.

The cost function is typically denoted as J(θ), where θ represents the model's parameters. For linear regression, the usual choice is half the mean squared error (the factor 1/2 simply cancels the 2 produced when the square is differentiated):

$$
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2
$$

* $J(\theta)$: Cost function
* $\theta$: Model parameters
* $m$: Number of training examples
* $x^{(i)}$: Input feature(s) of the $i^{th}$ training example
* $y^{(i)}$: Actual (true) output of the $i^{th}$ training example
* $h_\theta(x^{(i)})$: Predicted output of the model for the $i^{th}$ training example
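
The cost translates directly into code. A minimal NumPy sketch, assuming the single-feature hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ (the simple linear regression line written with θ in place of β); the function name `cost` is illustrative:

```python
import numpy as np

def cost(theta, x, y):
    """Halved mean squared error J(theta) for h(x) = theta[0] + theta[1] * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)                              # number of training examples
    predictions = theta[0] + theta[1] * x   # h_theta(x^(i)) for every example
    errors = predictions - y                # h_theta(x^(i)) - y^(i)
    return np.sum(errors ** 2) / (2 * m)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(cost([0.0, 2.0], x, y))  # perfect fit
print(cost([0.0, 1.0], x, y))  # under-predicts every example
```

A perfect fit gives J = 0, while predicting 1, 2, 3 for targets 2, 4, 6 gives (1 + 4 + 9) / (2 · 3) = 14/6 ≈ 2.33.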


<font color = 'red'>**Our main aim is to minimize the cost function**</font>


#### <font color="orange">Common Regression Metrics </font>

##### 1. Mean Absolute Error (MAE)
The Mean Absolute Error measures the average magnitude of the errors in a set of predictions, without considering their direction. It gives an idea of how much the predicted values deviate from the actual values.

$$
MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|
$$


##### 2. Mean Squared Error (MSE)
The Mean Squared Error is the average of the squared differences between the actual and predicted values. It penalizes larger errors more than smaller ones, making it sensitive to outliers.

$$
MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$


##### 3. Root Mean Squared Error (RMSE)
The Root Mean Squared Error is the square root of the Mean Squared Error. It is expressed in the same units as the target variable, which makes it easier to interpret than MSE, and when the errors average to zero it equals their standard deviation.

$$
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}
$$
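
To make the outlier sensitivity concrete, here is a small NumPy sketch of the three error metrics (function names are illustrative) applied to two hypothetical sets of predictions with the same total absolute error:

```python
import numpy as np

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))    # average error magnitude

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)     # average squared error

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))        # back in the target's units

y_true = [3.0, 5.0, 7.0, 9.0]
small_errors = [4.0, 4.0, 8.0, 8.0]   # every prediction off by 1
one_outlier = [3.0, 5.0, 7.0, 13.0]   # a single prediction off by 4
for name, y_pred in [("small errors", small_errors), ("one outlier", one_outlier)]:
    print(name, mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```

Both prediction sets have an MAE of 1, but the single large miss quadruples the MSE (4 versus 1) and doubles the RMSE (2 versus 1). scikit-learn's `sklearn.metrics` module provides `mean_absolute_error` and `mean_squared_error` for the same computations.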


##### 4. Coefficient of Determination (R²)
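The R² score indicates the proportion of the variance in the dependent variable that is predictable from the independent variable(s). A score of 1 indicates perfect predictions, and 0 means the model does no better than always predicting the mean $\bar{y}$. On the training data of a least-squares fit with an intercept it lies between 0 and 1, but it can be negative when a model fits worse than the mean, for example on held-out data.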

$$
R^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}
$$


##### 5. Adjusted R²
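The Adjusted R² adjusts the R² value for the number of predictors in the model. Because R² on the training data never decreases when a predictor is added, the adjustment penalizes extra predictors and prevents overestimating the model's explanatory power. In the formula below, $n$ is the number of observations and $k$ is the number of predictors.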

$$
R_{adj}^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k - 1} \right)
$$
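
A NumPy sketch of both formulas, with illustrative function names and hypothetical data. The last line illustrates two reference points: always predicting the mean gives R² = 0, and predictions worse than the mean give a negative R².

```python
import numpy as np

def r_squared(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """n = number of observations, k = number of predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [3.5, 4.5, 7.5, 8.5]   # hypothetical predictions from a one-predictor model
r2 = r_squared(y_true, y_pred)
print(r2, adjusted_r_squared(r2, n=len(y_true), k=1))
print(r_squared(y_true, [6.0] * 4), r_squared(y_true, [9.0, 7.0, 5.0, 3.0]))
```

Here R² = 1 - 1/20 = 0.95 and, with n = 4 and k = 1, adjusted R² = 1 - 0.05 × 3/2 = 0.925. The constant prediction 6.0 (the mean) scores 0, and the reversed predictions score 1 - 80/20 = -3.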



##### <font color="orange">Example: Gradient Descent </font>

Gradient Descent is a popular iterative optimization algorithm for minimizing the cost function. Here's how it works:
- Initialize: Start with an initial guess for the parameters, θ.
- Iterate: Use the current parameters to make predictions and calculate the cost, J(θ).
- Improve: Update the parameters by taking a small step (scaled by the learning rate α) in the direction opposite to the gradient, ∇J(θ).
- Repeat: Go back to the Iterate step and continue until convergence.
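
A minimal Python sketch of these four steps on a toy one-parameter cost, $J(\theta) = (\theta - 3)^2$, chosen purely for illustration (it is not a regression cost). The learning rate of 0.1 and the fixed 100 iterations are assumptions; a fixed iteration count stands in for a real convergence test.

```python
def gradient_descent_1d(cost, grad, theta, learning_rate=0.1, n_iters=100):
    """Minimize a one-parameter cost with plain gradient descent."""
    history = []
    for _ in range(n_iters):                          # Repeat a fixed number of times
        history.append(cost(theta))                   # Iterate: evaluate the current cost
        theta = theta - learning_rate * grad(theta)   # Improve: step against the gradient
    return theta, history

# Toy cost J(theta) = (theta - 3)^2 with derivative 2 * (theta - 3); minimum at theta = 3
theta_min, history = gradient_descent_1d(
    cost=lambda t: (t - 3) ** 2,
    grad=lambda t: 2 * (t - 3),
    theta=0.0,                                        # Initialize
)
print(theta_min, history[0], history[-1])
```

Each update shrinks the distance to 3 by a factor of 0.8, since θ - 0.2(θ - 3) - 3 = 0.8(θ - 3), so the recorded costs fall steadily toward 0.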

#### <font color="orange">Convergence Algorithm </font>

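A convergence algorithm drives an iterative optimization process toward a solution by repeatedly updating the parameters until a stopping criterion is met. Convergence is not guaranteed in general: with gradient descent, for example, a learning rate that is too large can make the cost grow instead of shrink.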

#### <font color="orange">Convergence Algorithm Steps </font>

1. **Initialize Parameters**:
   Start with initial values for the parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$.

2. **Compute the Objective Function**:
   Evaluate the cost or loss function $ J(\theta) $, which measures the difference between the predicted and actual values.

   Example of a cost function (half the Mean Squared Error):
   $$
   J(\theta) = \frac{1}{2m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2
   $$

3. **Compute the Gradient**:
   Calculate the gradient of the cost function with respect to the parameters:
   $$
   \frac{\partial J(\theta)}{\partial \theta_j}
   $$

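   The gradient points in the direction of steepest increase of $J(\theta)$, so moving the parameters in the opposite direction lowers the cost; its magnitude, scaled by the learning rate, sets the size of each update.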

4. **Update the Parameters**:
   Update all parameters simultaneously, using the learning rate $\alpha$ (the step size):
   $$
   \theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}
   $$

5. **Check the Stopping Criterion**:
   Stop the iterations when one of the following conditions is met:
   - The change in $ J(\theta) $ between iterations is less than a predefined threshold $ \epsilon $.
     $$
     |J(\theta^{(t)}) - J(\theta^{(t-1)})| < \epsilon
     $$
   - The number of iterations exceeds the maximum allowed iterations.

6. **Output the Optimal Parameters**:
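   Once convergence is achieved, return the final parameters $\theta^*$ as the estimate of the optimal parameters. For linear regression with the squared-error cost, $J(\theta)$ is convex, so these approximate its global minimum.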
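
The six steps can be sketched for simple linear regression in NumPy, assuming the single-feature hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$. For this hypothesis the partial derivatives of $J(\theta)$ are $\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})$ for $\theta_0$ and $\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)}$ for $\theta_1$. The function names and the values of `alpha`, `epsilon`, and `max_iters` are illustrative assumptions, not recommended settings.

```python
import numpy as np

def cost(theta, x, y):
    """J(theta) = (1 / 2m) * sum((h(x) - y)^2) with h(x) = theta[0] + theta[1] * x."""
    errors = theta[0] + theta[1] * x - y
    return np.sum(errors ** 2) / (2 * len(y))

def gradient_descent(x, y, alpha=0.1, epsilon=1e-12, max_iters=100_000):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)
    theta = np.zeros(2)                          # Step 1: initialize theta_0 and theta_1 to zero
    prev_cost = cost(theta, x, y)                # Step 2: cost for the initial parameters
    t = 0
    for t in range(1, max_iters + 1):            # Step 5: stop after at most max_iters updates
        errors = theta[0] + theta[1] * x - y     # h_theta(x^(i)) - y^(i)
        grad = np.array([errors.sum() / m, (errors * x).sum() / m])  # Step 3: gradient
        theta = theta - alpha * grad             # Step 4: update both parameters simultaneously
        current_cost = cost(theta, x, y)         # Step 2 again, with the updated parameters
        if abs(prev_cost - current_cost) < epsilon:  # Step 5: change in J below the threshold
            break
        prev_cost = current_cost
    return theta, t                              # Step 6: final parameters and iterations used

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]                         # lies exactly on y = 1 + 2x
theta, iters = gradient_descent(x, y)
print(theta, iters)
```

The toy data lie exactly on y = 1 + 2x, so the returned parameters should be close to (1, 2). If the learning rate were too large for the scale of the data, the cost would grow instead and the loop would run until `max_iters`.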
