### ***Linear regression***

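Linear regression finds the line that best fits the data, that is, the line to which the points are, on the whole, as close as possible. With a single feature x, the line has slope m and intercept c: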


```math
y = m x + c
```

**Goal:** Model the relationship between independent variables (X) and a dependent variable (y) in order to predict continuous outcomes.
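
As a minimal sketch of how such a line can be found, the snippet below assumes ordinary least squares (minimizing the sum of squared vertical distances) as the measure of closeness. The function name `fit_line` and the toy data are illustrative only.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = (co-variation of x and y) / (variation of x)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through the point of means
    c = y_mean - m * x_mean
    return m, c


# Hypothetical toy data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```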

### ***Multiple Linear Regression***

Multiple linear regression uses several independent variables (features) to predict a dependent variable (target).
```math
    y = m1 x1 + m2 x2 + m3 x3 + ⋯ + mn xn + c + ε
```

where:
- y is the dependent variable (target).
- x1, x2, ..., xn are the independent variables (features).
- m1, m2, ..., mn are the coefficients (slopes) for each independent variable.
- c is the y-intercept (constant term).
- ε is the error term (residuals).
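
The equation above can be fitted with a least-squares solver. The sketch below assumes NumPy is available and uses made-up, noise-free data so the recovered coefficients are easy to check; the column of ones is one common way to estimate the intercept c alongside the slopes.

```python
import numpy as np

# Hypothetical data generated exactly from y = 2*x1 + 3*x2 + 1 (no error term)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 2 * X[:, 0] + 3 * X[:, 1] + 1

# Append a column of ones so the last fitted coefficient acts as the intercept c
X_design = np.column_stack([X, np.ones(len(X))])

# Least-squares solution of X_design @ coeffs ≈ y
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
m1, m2, c = coeffs
print(m1, m2, c)
```

With real data the error term ε is not zero, so the fitted values only approximate the underlying coefficients.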

***Mean Squared Error (MSE)***
    
The Mean Squared Error (MSE) is a common metric used to evaluate the performance of regression models. 
It measures the average of the squares of the errors—that is, the average squared difference between the actual and predicted values.
```math
        mse = (1/n) * Σ(actual - predicted)²
```
MSE is highly sensitive to outliers because squaring amplifies large errors.
For example, a single error of 10 contributes 100 to the sum before averaging.
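
A small illustration of the formula, using made-up numbers: the two prediction lists differ only in the last value, where one of them is off by 10 instead of 1.

```python
def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n


actual = [10.0, 12.0, 14.0, 16.0]
all_close = [11.0, 11.0, 15.0, 15.0]     # every error has magnitude 1
one_outlier = [11.0, 11.0, 15.0, 26.0]   # last error has magnitude 10

mse_close = mse(actual, all_close)       # (1 + 1 + 1 + 1) / 4
mse_outlier = mse(actual, one_outlier)   # (1 + 1 + 1 + 100) / 4
print(mse_close, mse_outlier)  # 1.0 25.75
```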

***Mean Absolute Error (MAE)***

The Mean Absolute Error (MAE) is another metric used to evaluate the performance of regression models.
It measures the average of the absolute differences between the actual and predicted values.
```math
        mae = (1/n) * Σ|actual - predicted|
```
MAE is more robust to outliers compared to MSE because it does not square the errors.
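
To make the comparison concrete, the sketch below computes both metrics on illustrative data in which a single prediction is off by 10. The helper names are hypothetical.

```python
def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean of the squared differences, shown here only for comparison."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 12.0, 14.0, 16.0]
all_close = [11.0, 11.0, 15.0, 15.0]     # every error has magnitude 1
one_outlier = [11.0, 11.0, 15.0, 26.0]   # last error has magnitude 10

# The outlier multiplies MAE by 3.25 but MSE by 25.75
mae_ratio = mae(actual, one_outlier) / mae(actual, all_close)
mse_ratio = mse(actual, one_outlier) / mse(actual, all_close)
print(mae_ratio, mse_ratio)  # 3.25 25.75
```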

***Residual sum of squares (RSS)***

This tells us how well our model is able to predict the target variable.
It is the sum of the squares of the differences between the actual and predicted values.
The lower the RSS, the better the model fits the data.
```math
RSS = Σ (yi - ŷi)²
```
where yi is the actual value and ŷi is the predicted value.

***Total sum of squares (TSS)***

This measures the total variability in the target variable.
It is the sum of the squares of the differences between the actual values and the mean of the actual values.
```math
TSS = Σ (yi - ȳ)²
```
where yi is the actual value and ȳ is the mean of the actual values.

***R-squared (R²)***

R² is a statistical measure of the proportion of the variance in the target variable that is explained by the independent variables in the model.
It is calculated as:
```math
R² = 1 - (RSS / TSS)
```
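For a least-squares fit with an intercept, evaluated on the data it was trained on, R² ranges from 0 to 1, where:
- 0 indicates that the model does not explain any of the variability in the target variable.
- 1 indicates that the model explains all the variability in the target variable.

On held-out data, or for a model fitted without an intercept, R² can be negative, which means the model predicts worse than always predicting the mean.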

A higher R² value indicates a better fit of the model to the data.
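
The three quantities fit together as in the sketch below, which uses made-up values. The second prediction list always predicts the mean, which is the baseline that R² compares against.

```python
def rss(actual, predicted):
    """Residual sum of squares: squared distance of predictions from actuals."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))


def tss(actual):
    """Total sum of squares: squared distance of actuals from their mean."""
    y_bar = sum(actual) / len(actual)
    return sum((y - y_bar) ** 2 for y in actual)


def r_squared(actual, predicted):
    return 1 - rss(actual, predicted) / tss(actual)


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]   # each prediction is off by 0.5
mean_only = [2.5, 2.5, 2.5, 2.5]   # baseline: always predict the mean

r2_model = r_squared(actual, predicted)      # 1 - 1.0 / 5.0
r2_baseline = r_squared(actual, mean_only)   # 1 - 5.0 / 5.0
print(r2_model, r2_baseline)  # 0.8 0.0
```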

***Root Mean Squared Error (RMSE)***

RMSE is the square root of the average squared differences between predicted and actual values.
```math
    rmse = √( (1/n) * Σ(actual - predicted)² )
```
RMSE gives higher weight to large errors and is in the same unit as the target variable.
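
A short sketch with illustrative numbers: three predictions are exact and one is off by 4, so the single large error dominates RMSE more than it would dominate the mean absolute error.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared error; same unit as the target."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [10.0, 20.0, 30.0, 44.0]   # only the last prediction is wrong

error_rmse = rmse(actual, predicted)   # sqrt(16 / 4)
error_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(error_rmse, error_mae)  # 2.0 1.0
```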


***Adjusted R²***

Adjusted R² adjusts the R² value based on the number of predictors.
It penalizes additional predictors, so adding a variable raises Adjusted R² only if it improves the fit by more than would be expected by chance.
```math
        Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
```

where:
- n is the number of observations.
- p is the number of independent variables.
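
The effect of the penalty can be seen by holding R² and n fixed while increasing p. The function name and the numbers below are illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p independent variables."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R² needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R² and sample size, different numbers of predictors
few_predictors = adjusted_r2(0.8, n=21, p=4)     # 1 - 0.2 * 20 / 16
many_predictors = adjusted_r2(0.8, n=21, p=10)   # 1 - 0.2 * 20 / 10
print(round(few_predictors, 4), round(many_predictors, 4))  # 0.75 0.6
```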


***Multicollinearity***

When one independent variable (feature) in a regression model can be (nearly) linearly predicted from the others, there is multicollinearity.

***Example:***
    c1 = c2 + c3 - c4

Some dependencies are easy to spot, such as a total column that is the sum of other columns, but others are harder to identify.
To detect them, fit a linear regression for each independent variable, using the remaining independent variables as predictors.

If the R² of that auxiliary regression is very low, the variable has little multicollinearity with the others. An R² close to 1 means the variable can be almost entirely predicted from them.

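***Other ways to detect multicollinearity***

- Pairwise correlation matrix (high absolute correlations are a hint).
- Variance Inflation Factor (VIF), computed for each independent variable from the R² of regressing it on the others:
```math
    VIF = 1 / (1 - R²)
```
Calculate the VIF for every variable. A high VIF means the variable is largely explained by the others, so it is a candidate for removal or for combining with them.
A common rule of thumb is that VIF > 5 or 10 signals concern.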
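
The auxiliary-regression idea and the VIF can be sketched together as follows. This assumes NumPy and uses synthetic data in which the first column is almost exactly the sum of the second and third; the helper name `vif` is hypothetical, and the formula used is VIF = 1 / (1 - R²).

```python
import numpy as np


def vif(X):
    """Return the VIF of each column of X, via one auxiliary regression per column."""
    n, k = X.shape
    values = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept
        others = np.delete(X, j, axis=1)
        design = np.column_stack([others, np.ones(n)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        rss = np.sum(residuals ** 2)
        tss = np.sum((target - target.mean()) ** 2)
        r2 = 1 - rss / tss
        values.append(1 / (1 - r2))
    return np.array(values)


# Synthetic features: c1 is nearly c2 + c3, while c4 is unrelated
rng = np.random.default_rng(0)
c2, c3, c4 = rng.normal(size=(3, 200))
c1 = c2 + c3 + 0.05 * rng.normal(size=200)
X = np.column_stack([c1, c2, c3, c4])

vifs = vif(X)
print(np.round(vifs, 2))
```

In this setup the first three columns get very large VIFs while the unrelated fourth column stays close to 1.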

***Common remedies:***
- Remove or combine highly correlated predictors (e.g., drop one, sum, or average).
- Use dimensionality reduction: Principal Component Regression (PCR) or Partial Least Squares (PLS).
- Apply regularization: Ridge (L2) reduces variance; Lasso (L1) can perform variable selection.

***Notes:***
- Multicollinearity mainly affects interpretability of coefficients, not necessarily predictive performance.
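
As a rough sketch of the regularization remedy, the snippet below fits ordinary least squares and Ridge on two nearly identical synthetic features. It assumes scikit-learn is installed; `alpha=1.0` is just the library default, not a recommended value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: x2 is almost a copy of x1, so the two are highly collinear
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L2 penalty shrinks the coefficient vector relative to plain least squares
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

With collinear features the least-squares coefficients can be large and unstable, while Ridge tends to spread the effect across the correlated columns.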

***Assumptions of Linear Regression:***
1. Linearity: The relationship between the independent and dependent variables is linear.
2. Independence: Observations are independent of each other.
3. Homoscedasticity: Constant variance of errors across all levels of the independent variable.
4. Normality of Errors: The residuals (errors) are normally distributed.
5. No Multicollinearity: Independent variables are not highly correlated with each other.
6. No Autocorrelation: Residuals are not correlated with each other, especially in time series data.
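
One way to check the autocorrelation assumption numerically is the Durbin-Watson statistic, which these notes do not otherwise cover, so treat this as an optional sketch. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. The residual lists are made up.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that flip sign every step: negative autocorrelation
alternating = [1.0, -1.0, 1.0, -1.0]
# Residuals that keep their sign for long runs: positive autocorrelation
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

dw_alternating = durbin_watson(alternating)   # 12 / 4
dw_persistent = durbin_watson(persistent)     # 4 / 6
print(dw_alternating, round(dw_persistent, 3))  # 3.0 0.667
```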

***Implementation of linear regression***
```code
from sklearn import linear_model
model = linear_model.LinearRegression()
```
Parameters available in `LinearRegression` (defaults shown):
```code
# `normalize` is omitted: it was deprecated in scikit-learn 1.0 and removed in 1.2
model = linear_model.LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None, positive=False)
```

***fit_intercept:*** whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).

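***normalize:*** Deprecated in scikit-learn 1.0 and removed in 1.2, so passing it to a current version raises an error. In older versions, if True, the regressors X were normalized before regression by subtracting the mean and dividing by the l2-norm, and the parameter was ignored when fit_intercept was set to False. To scale features in current versions, apply a scaler such as StandardScaler before fitting.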

***copy_X:*** If True, X will be copied; else, it may be overwritten.

***n_jobs:*** The number of jobs to use for the computation. This only provides a speedup
    for sufficiently large problems with more than one target (n_targets > 1).
    None means 1 unless in a joblib.parallel_backend context.
    
***positive:*** When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.
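
Because `normalize` is no longer available in scikit-learn 1.2 and later, scaling is usually done as a separate preprocessing step. The sketch below shows one possible arrangement with `StandardScaler` in a pipeline; note that `StandardScaler` divides by the standard deviation rather than the l2-norm, so it is not an exact replacement for the old behaviour. The data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) * [1.0, 100.0]
y = 2 * X[:, 0] + 0.03 * X[:, 1] + 1 + rng.normal(scale=0.1, size=50)

plain = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Scaling changes the coefficients but not the predictions of plain least squares
same_predictions = np.allclose(plain.predict(X), scaled.predict(X))
print(same_predictions)
print("raw-scale coefficients:     ", plain.coef_)
print("standardized coefficients:  ", scaled[-1].coef_)
```

Scaling matters more once regularization is added, because Ridge and Lasso penalties depend on the scale of each feature.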

***Order of implementation***

1. Import necessary libraries
2. Load the dataset
3. Preprocess the data
4. Check for correlations (heatmap)
5. Detect multicollinearity (VIF)
6. Split data into train and test sets
7. Apply k-Fold Cross-Validation (e.g., K=5 or 10)
8. Fit Linear Regression model on training data
9. Predict on test data
10. Evaluate model — MSE, RMSE, MAE, R², Adjusted R²
11. Plot residuals (check assumptions visually)
12. Check model assumptions (linearity, homoscedasticity, normality)
13. Interpret coefficients (feature importance)
14. Apply Regularization (Ridge / Lasso / ElasticNet) — optional
15. Save or deploy the model — optional
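
The core of this workflow (steps 6 to 10) can be sketched as below. It assumes scikit-learn is installed and substitutes synthetic data for steps 1 to 5; the split ratio, fold count, and random seeds are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Stand-in for loading and preprocessing: y = 3*x1 - 2*x2 + 5 plus small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=200)

# Step 6: hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 7: 5-fold cross-validation on the training data only
model = LinearRegression()
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Steps 8 and 9: fit on the training data, predict on the test data
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Step 10: evaluation metrics on the test set
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
n, p = X_test.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Step 11 input: residuals to plot against predictions
residuals = y_test - y_pred

print("CV R² per fold:", np.round(cv_scores, 4))
print("MSE:", mse, "RMSE:", rmse, "MAE:", mae)
print("R²:", r2, "Adjusted R²:", adj_r2)
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)
```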