**Multiple Linear Regression**
- Multiple Linear Regression (MLR) is a supervised learning algorithm that predicts a continuous dependent variable based on two or more independent variables. It extends the simple linear regression model to accommodate multiple features.
- MLR models the relationship between a dependent variable Y and multiple independent variables X1, X2, ..., Xn. The model assumes a linear relationship and finds the best-fitting hyperplane in the multi-dimensional feature space.

**Key concepts:**
- **Dependent Variable (Y):** The target variable to predict.
- **Independent Variables (X):** Features used to make predictions.
- **Coefficients (b):** Weights for each feature. Each one gives the expected change in Y for a one-unit increase in that feature, holding the other features fixed.
- **Intercept (b0):** The value of Y the model predicts when all X are zero.
- **Error Term (e):** The part of Y that the linear combination of features does not explain (noise and omitted effects).

**How it works and how it is used**
MLR works by finding the hyperplane in the feature space that minimizes the Residual Sum of Squares (RSS), the sum of squared differences between the observed and predicted values of Y. The model fits the equation:
```
Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn + e
```
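
A minimal sketch of the RSS minimization described above, assuming synthetic data generated for illustration (the "true" coefficients and noise level are made up). It uses NumPy's least-squares solver rather than any particular library's regression class:

```python
import numpy as np

# Synthetic data (made up for illustration): 50 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_b0 = 2.0
true_b = np.array([1.5, -0.7, 3.0])
y = true_b0 + X @ true_b + rng.normal(scale=0.1, size=50)

# Prepend a column of ones so b0 is estimated together with b1..b3.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: the coefficient vector that minimizes the RSS.
beta, _, rank, _ = np.linalg.lstsq(X_design, y, rcond=None)

# RSS at the fitted coefficients.
residuals = y - X_design @ beta
rss = float(residuals @ residuals)

print("Estimated [b0, b1, b2, b3]:", beta)
print("RSS:", rss)
```

Any other coefficient vector gives a larger RSS on the same data, which is what "best-fitting hyperplane" means here.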

**Usage**:
- Forecasting and predicting outcomes based on multiple factors.
- Analyzing the relative impact of each independent variable on Y.
- Applications in finance, healthcare, marketing, and engineering.
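
When comparing the relative impact of features, raw coefficients depend on each feature's units. One common approach (not the only one) is to standardize the features first. The sketch below assumes made-up data with two hypothetical features on very different scales:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data: square footage (thousands) vs. bedroom count (single digits).
rng = np.random.default_rng(1)
n = 200
sqft = rng.normal(2000, 400, size=n)
bedrooms = rng.integers(1, 6, size=n).astype(float)
X = np.column_stack([sqft, bedrooms])
y = 150.0 * sqft + 10000.0 * bedrooms + rng.normal(scale=5000.0, size=n)

# Fit on raw units: coefficients are "price per sq ft" and "price per bedroom".
raw = LinearRegression().fit(X, y)

# Fit on standardized features: coefficients are "price per one standard deviation".
X_std = StandardScaler().fit_transform(X)
std = LinearRegression().fit(X_std, y)

print("Raw coefficients:         ", raw.coef_)
print("Standardized coefficients:", std.coef_)
```

In this setup the raw bedroom coefficient is numerically larger, while the standardized coefficients show that square footage accounts for more of the variation in price.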

**List of Important Types**
- **Ordinary Least Squares (OLS):** The standard estimator; it minimizes the sum of squared residuals with no penalty term.
- **Stepwise Regression:** A variable-selection procedure that adds or removes predictors one at a time according to a criterion such as a p-value or AIC.
- **Ridge Regression:** Adds an L2 penalty on the coefficients, shrinking them toward zero to reduce overfitting.
- **Lasso Regression:** Adds an L1 penalty, which can shrink some coefficients exactly to zero (sparsity).
- **ElasticNet Regression:** Combines the L1 and L2 penalties, trading off sparsity against shrinkage.
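
To see how the penalties change the fitted coefficients, here is a small comparison using scikit-learn's `LinearRegression`, `Ridge`, `Lasso`, and `ElasticNet`. The data is synthetic and the `alpha` and `l1_ratio` values are arbitrary choices for illustration, not recommendations:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic data: only the first two of five features influence y.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),                           # L2 penalty
    "Lasso": Lasso(alpha=0.1),                           # L1 penalty
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),   # mix of L1 and L2
}

# Fit each model on the same data and collect its coefficients.
coefs = {}
for name, model in models.items():
    model.fit(X, y)
    coefs[name] = model.coef_
    print(f"{name:>10}: {np.round(model.coef_, 3)}")
```

The regularized fits should show smaller coefficients than OLS, and the L1-based models may set the irrelevant features to exactly zero. Stepwise selection is omitted here because scikit-learn has no direct p-value-based equivalent.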

**Goal**
To model the relationship between a dependent variable and multiple independent variables, predict future outcomes, and quantify the importance of each independent variable.

**Important Formula**
The formula for multiple linear regression is:
```
Y = b0 + b1*X1 + b2*X2 + ... + bn*Xn + e
```
Where:
- Y: Observed dependent variable (the prediction is the same expression without e)
- b0: Intercept
- b1, b2, ..., bn: Coefficients of independent variables
- X1, X2, ..., Xn: Independent variables
- e: Error term
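
The same model is often written in matrix form. As a sketch, assume **X** is the matrix whose rows are the observations with a leading column of ones, and **b** is the vector (b0, b1, ..., bn); this notation is introduced here for illustration. When XᵀX is invertible, the RSS-minimizing coefficients have a closed form:

```latex
\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{e}, \qquad
\mathrm{RSS}(\mathbf{b}) = (\mathbf{Y} - \mathbf{X}\mathbf{b})^{\top}(\mathbf{Y} - \mathbf{X}\mathbf{b}), \qquad
\hat{\mathbf{b}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}
```

If the features are perfectly collinear, or there are fewer observations than coefficients, XᵀX is not invertible and the solution is not unique.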

**Example**
**Dataset**: Predict house prices based on square footage, number of bedrooms, and proximity to schools.

| Square Footage (X1) | Bedrooms (X2) | Proximity to School (X3) | Price (Y) |
|---------------------|---------------|--------------------------|-----------|
| 2000               | 3             | 1                        | 400,000   |
| 1500               | 2             | 3                        | 300,000   |
| 2500               | 4             | 2                        | 500,000   |

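MLR estimates how much each feature contributes to the final price. Note that three rows cannot pin down four parameters (three coefficients plus the intercept), so this table is only illustrative: a perfect fit on it says nothing about the individual coefficients, and a real model needs many more observations than features.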

**Python Code**

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample data
data = {
    'Square_Footage': [2000, 1500, 2500],
    'Bedrooms': [3, 2, 4],
    'Proximity_to_School': [1, 3, 2],
    'Price': [400000, 300000, 500000]
}

df = pd.DataFrame(data)

# Features and target
X = df[['Square_Footage', 'Bedrooms', 'Proximity_to_School']]
y = df['Price']

# Fit the model
model = LinearRegression()
model.fit(X, y)

# Coefficients and intercept
print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {model.coef_}")

# Predictions on the training rows
y_pred = model.predict(X)
print(f"Predicted Prices: {y_pred}")
```

Output:

```
Intercept: 0.39999839989468455
Coefficients: [1.99999200e+02 3.99998400e-01 1.82131091e-11]
Predicted Prices: [400000. 300000. 500000.]
```


**Real World Scenario**
**Scenario**: Predicting employee salaries based on experience, education level, and location.
- **Problem**: A company wants to establish a salary range for incoming candidates.
- **Application**: MLR helps determine how much each factor (experience, education, location) influences salary.
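
Education level and location are categorical, so they need to be encoded as numbers before fitting. A minimal sketch using one-hot encoding with `pandas.get_dummies`; the records and column names below are invented for illustration and are not real salary data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up records purely for illustration.
df = pd.DataFrame({
    "years_experience": [1, 3, 5, 7, 10, 2, 6, 8],
    "education": ["BSc", "BSc", "MSc", "MSc", "PhD", "BSc", "MSc", "PhD"],
    "location": ["Austin", "Boston", "Austin", "Boston",
                 "Austin", "Boston", "Austin", "Boston"],
    "salary": [52000, 61000, 75000, 88000, 110000, 58000, 80000, 105000],
})

# One-hot encode the categorical columns. drop_first=True removes one level
# per category so the dummies are not perfectly collinear with the intercept.
X = pd.get_dummies(
    df[["years_experience", "education", "location"]],
    columns=["education", "location"],
    drop_first=True,
    dtype=float,
)
y = df["salary"]

model = LinearRegression().fit(X, y)

# Each dummy coefficient is the difference from the dropped baseline level.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:.1f}")
```

With `drop_first=True`, the coefficient on `education_MSc` reads as the estimated salary difference between MSc and the baseline (BSc) at the same experience and location.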

**Problem Statement**
Given a dataset containing various features of houses (size, number of rooms, location), predict the house price for a new set of feature values.
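
One way this could look in practice, assuming a synthetic dataset (the feature names, the generating formula, and the new house's values are all hypothetical). The model is evaluated on held-out rows before it is used on a new house:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic houses; the pricing formula below is an assumption for the demo.
rng = np.random.default_rng(7)
n = 300
houses = pd.DataFrame({
    "size_sqft": rng.uniform(800, 3500, size=n),
    "rooms": rng.integers(1, 7, size=n),
    "distance_to_center_km": rng.uniform(0.5, 30, size=n),
})
houses["price"] = (
    50000
    + 180 * houses["size_sqft"]
    + 12000 * houses["rooms"]
    - 2500 * houses["distance_to_center_km"]
    + rng.normal(scale=20000, size=n)
)

features = ["size_sqft", "rooms", "distance_to_center_km"]
X_train, X_test, y_train, y_test = train_test_split(
    houses[features], houses["price"], test_size=0.2, random_state=0
)

# Fit on the training split, then score on rows the model has not seen.
model = LinearRegression().fit(X_train, y_train)
test_r2 = r2_score(y_test, model.predict(X_test))
print(f"Held-out R^2: {test_r2:.3f}")

# Predict the price for a new set of feature values.
new_house = pd.DataFrame(
    {"size_sqft": [2200], "rooms": [4], "distance_to_center_km": [5.0]}
)
predicted_price = model.predict(new_house)[0]
print(f"Predicted price: {predicted_price:,.0f}")
```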

**How it can help**
- Understand the relative importance of different factors in determining an outcome.
- Predict future values with reasonable accuracy.
- Optimize processes and strategies by focusing on significant predictors.

**Alternate Solution**
If the relationship between features and the target variable is non-linear, use Polynomial Regression, Decision Trees, or Random Forest Regression for better performance.
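
As a rough illustration of the polynomial option, the sketch below fits a plain linear model and a degree-2 polynomial model to synthetic data with a quadratic relationship. The data and the choice of degree are assumptions for the demo:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: y depends on x squared, so a straight line fits poorly.
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# Plain linear fit.
linear = LinearRegression().fit(X, y)

# Polynomial regression: expand x into [x, x^2], then fit a linear model.
poly = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X, y)

r2_linear = linear.score(X, y)
r2_poly = poly.score(X, y)
print(f"Linear R^2:     {r2_linear:.3f}")
print(f"Polynomial R^2: {r2_poly:.3f}")
```

Polynomial regression is still linear in its coefficients, so it is fitted with the same least-squares machinery; only the feature set changes.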

**Final Important Note About the Topic**
Multiple linear regression assumes:
- A linear relationship between the dependent and independent variables.
- No strong multicollinearity among the features.
- Independent errors (residuals are not correlated with each other).
- Homoscedasticity (constant variance of residuals).
- Normally distributed residuals.

Violations of these assumptions can lead to biased coefficient estimates, misleading standard errors, or unreliable predictions. Always validate the model and test assumptions before relying on the results.
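
A minimal sketch of one such check, the variance inflation factor (VIF) for multicollinearity. The `vif` helper below is written for this example (it is not a library function), and the data is synthetic with one deliberately collinear pair:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic features: x2 is built to be nearly a copy of x1.
rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.2, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 0.5 * x3 + rng.normal(scale=1.0, size=n)

vifs = vif(X)
print("VIF per feature:", np.round(vifs, 2))

# Residuals from the fitted model, for plotting against fitted values.
model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)
print("Residual mean:", residuals.mean())
```

A VIF well above roughly 5 to 10 is a commonly used rule of thumb for problematic collinearity. Plotting `residuals` against the fitted values is a usual next step for spotting non-constant variance or curvature.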
