**`Multiple Linear Regression`**

* Real datasets often have more than one input column, which is why multiple linear regression is used.

**Multiple Linear Regression (MLR)** is a statistical technique used to model the relationship between one **dependent variable** and **two or more independent variables**. It extends simple linear regression, which uses only one independent variable.

---

### **General Form of Multiple Linear Regression**

$$
Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_nX_n + \epsilon
$$

Where:

* $Y$: Dependent variable (target/output)
* $X_1, X_2, ..., X_n$: Independent variables (features/input)
* $\beta_0$: Intercept
* $\beta_1, \beta_2, ..., \beta_n$: Coefficients of the independent variables
* $\epsilon$: Error term (residual)
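
As a small illustration of the formula, the sketch below evaluates the deterministic part of the model (everything except $\epsilon$) for two rows with NumPy. The coefficient values and inputs are made up purely for illustration and do not come from any dataset in this notebook.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
beta_0 = 2.0                        # intercept
beta = np.array([0.5, -1.0, 3.0])   # beta_1, beta_2, beta_3

# Two observations with three features each (made-up numbers)
X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 0.0, 1.0],
])

# y_hat = beta_0 + beta_1*X_1 + beta_2*X_2 + beta_3*X_3, computed for every row
y_hat = beta_0 + X @ beta
print(y_hat)
```

The matrix product `X @ beta` computes the weighted sum of features for all rows at once, which is the same operation a fitted model performs when predicting.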

---

### **Assumptions of MLR**

1. **Linearity**: The relationship between the predictors and the response is linear.
2. **Independence**: Observations are independent of each other.
3. **Homoscedasticity**: Constant variance of errors.
4. **Normality of errors**: Residuals should be approximately normally distributed.
5. **No multicollinearity**: Independent variables should not be too highly correlated with each other.
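
One common way to check the multicollinearity assumption is the variance inflation factor (VIF): regress each feature on the remaining features and compute $1 / (1 - R_j^2)$. The sketch below is a minimal version on synthetic data; the helper name `variance_inflation_factors` and the data are assumptions made for this example. Values well above roughly 5 to 10 are often treated as a warning sign, although that threshold is only a rule of thumb.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def variance_inflation_factors(X):
    """Return one VIF per column of the 2-D array X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)   # all columns except column j
        target = X[:, j]
        # R^2 of predicting feature j from the other features
        r2 = LinearRegression().fit(others, target).score(others, target)
        vifs.append(1.0 / (1.0 - r2))      # undefined if r2 is exactly 1
    return np.array(vifs)


# Synthetic data (an assumption for this demo): x3 is almost a copy of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this setup the first and third columns should show very large VIFs, while the independent second column should stay close to 1.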

---

### **Steps in Performing MLR**

1. **Data Collection**: Gather data with a target variable and multiple predictors.
2. **Exploratory Data Analysis (EDA)**: Visualize and understand the relationships.
3. **Model Building**: Fit the MLR model using tools like Python’s `scikit-learn` or R’s `lm()` function.
4. **Model Evaluation**:

   * **R² (coefficient of determination)**
   * **Adjusted R²** (accounts for number of predictors)
   * **p-values** (test significance of coefficients)
   * **F-statistic** (overall model significance)
5. **Validation**: Use techniques like cross-validation to assess generalizability.
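
The evaluation and validation steps can be sketched as follows. Adjusted R² is defined as $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors. The helper `adjusted_r2` and the synthetic data are assumptions made for this example. Note that scikit-learn's `LinearRegression` does not report p-values or an F-statistic, so this sketch covers only R², adjusted R², and cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Synthetic data (an assumption for this demo): 3 features, known coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# R^2 on the held-out test set, then the adjusted version
r2 = r2_score(y_test, model.predict(X_test))
adj_r2 = adjusted_r2(r2, n_samples=X_test.shape[0], n_features=X_test.shape[1])

# 5-fold cross-validated R^2 on the full data set
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")

print("R2:", r2)
print("Adjusted R2:", adj_r2)
print("Mean CV R2:", cv_scores.mean())
```

The value passed as `r2` is already a squared quantity, so it is used directly, and `n_samples` counts rows rather than columns.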

---

### **Example in Python (using scikit-learn)**

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import pandas as pd

# Example data
data = pd.read_csv('your_dataset.csv')
X = data[['feature1', 'feature2', 'feature3']]  # independent variables
y = data['target']  # dependent variable

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit model
model = LinearRegression()
model.fit(X_train, y_train)

# Coefficients
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
```

---



In [1]:
import numpy as np
import pandas as pd


In [2]:
house = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Boston.csv')

In [3]:
house.sample(5)

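| | CRIM | ZN | INDUS | CHAS | NX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 249 | 0.19073 | 22.0 | 5.86 | 0 | 0.431 | 6.718 | 17.5 | 7.8265 | 7 | 330.0 | 19.1 | 393.74 | 6.56 | 26.2 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |
| 238 | 0.08244 | 30.0 | 4.93 | 0 | 0.428 | 6.481 | 18.5 | 6.1899 | 6 | 300.0 | 16.6 | 379.41 | 6.36 | 23.7 |
| 228 | 0.29819 | 0.0 | 6.2 | 0 | 0.504 | 7.686 | 17.0 | 3.3751 | 8 | 307.0 | 17.4 | 377.51 | 3.92 | 46.7 |
| 324 | 0.34109 | 0.0 | 7.38 | 0 | 0.493 | 6.415 | 40.1 | 4.7211 | 5 | 287.0 | 19.6 | 396.9 | 6.12 | 25.0 |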


In [5]:
house.corr()['AGE']

CRIM       0.352734
ZN        -0.569537
INDUS      0.644779
CHAS       0.086518
NX         0.731470
RM        -0.240265
AGE        1.000000
DIS       -0.747881
RAD        0.456022
TAX        0.506456
PTRATIO    0.261515
B         -0.273534
LSTAT      0.602339
MEDV      -0.376955
Name: AGE, dtype: float64

In [None]:
# age, lstat, tax, nx, indus,dis,zn

In [6]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [85]:
X = house[['LSTAT','TAX','INDUS','NX','DIS',]]

In [81]:
X.shape[1]

6

In [11]:
y = house['AGE']

In [86]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)

In [24]:
lr = LinearRegression()

In [87]:
lr.fit(X_train,y_train)

LinearRegression()


In [88]:
lr.coef_

array([ 9.62674598e-01, -8.35056910e-03,  1.51072548e-01,  6.61085039e+01,
       -5.32055614e+00])

In [89]:
lr.intercept_

np.float64(42.01762617952154)

In [90]:
y_predict = lr.predict(X_test)

In [44]:
from sklearn.metrics import r2_score

In [91]:
r = r2_score(y_test,y_predict)

In [92]:
r

0.7127962241585518

In [93]:
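# Adjusted R²: r2_score already returns R², so r must not be squared again.
# n is the number of test observations and p the number of predictors.
n, p = X_test.shape
1 - (1 - r) * (n - 1) / (n - p - 1)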

In [18]:
import plotly.express as px

In [38]:
X

| | LSTAT | INDUS |
|---|---|---|
| 0 | 4.98 | 2.31 |
| 1 | 9.14 | 7.07 |
| 2 | 4.03 | 7.07 |
| 3 | 2.94 | 2.18 |
| 4 | 5.33 | 2.18 |
| ... | ... | ... |
| 501 | 9.67 | 11.93 |
| 502 | 9.08 | 11.93 |
| 503 | 5.64 | 11.93 |
| 504 | 6.48 | 11.93 |


In [95]:
px.scatter_3d(house,x='LSTAT',y='INDUS',z='AGE',color='AGE')

**`Mathematical Formulation`**


In [96]:
house.sample()

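| | CRIM | ZN | INDUS | CHAS | NX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 216 | 0.0456 | 0.0 | 13.89 | 1 | 0.55 | 5.888 | 56.0 | 3.1121 | 5 | 276.0 | 16.4 | 392.8 | 13.51 | 23.3 |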


In [97]:
house.shape

(506, 14)

In [100]:
X = house.iloc[:,0:13]

In [105]:
y = house.iloc[:,13:].values

In [107]:
X

| | CRIM | ZN | INDUS | CHAS | NX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296.0 | 15.3 | 396.90 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.90 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.90 | 5.33 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1 | 273.0 | 21.0 | 391.99 | 9.67 |
| 502 | 0.04527 | 0.0 | 11.93 | 0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1 | 273.0 | 21.0 | 396.90 | 9.08 |
| 503 | 0.06076 | 0.0 | 11.93 | 0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1 | 273.0 | 21.0 | 396.90 | 5.64 |
| 504 | 0.10959 | 0.0 | 11.93 | 0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1 | 273.0 | 21.0 | 393.45 | 6.48 |


In [108]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [109]:
X_train,X_test,y_train,y_test= train_test_split(X,y,test_size=0.2,random_state=2)

In [110]:
lr = LinearRegression()

In [111]:
lr.fit(X_train,y_train)

LinearRegression()


In [113]:
from sklearn.metrics import r2_score

In [114]:
y_p = lr.predict(X_test)

In [116]:
r = r2_score(y_test,y_p)

In [117]:
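# Adjusted R²: r2_score already returns R², so r must not be squared again.
# n is the number of test observations and p the number of predictors.
n, p = X_test.shape
1 - (1 - r) * (n - 1) / (n - p - 1)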

![image.png](attachment:image.png)

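`Problem with OLS` <br>

- The closed-form OLS solution requires inverting the matrix $X^TX$, which takes roughly $O(n^3)$ time for $n$ features, so it becomes expensive when the number of input columns is large.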
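
For reference, the closed-form OLS estimate is $\hat{\beta} = (X^TX)^{-1}X^Ty$, where $X$ includes a leading column of ones for the intercept. The sketch below computes it directly with NumPy on synthetic data (an assumption for this demo, not the housing data) and compares the result with scikit-learn. The explicit inverse is used only to mirror the formula.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for this demo): 100 rows, 4 features
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = 3.0 + X @ np.array([1.5, -2.0, 0.0, 0.7]) + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so that beta[0] plays the role of the intercept
X_b = np.column_stack([np.ones(len(X)), X])

# Normal equation: beta = (X^T X)^(-1) X^T y
# Inverting the (n + 1) x (n + 1) matrix X^T X is the cubic-cost step
beta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# scikit-learn should recover the same intercept and coefficients
lr = LinearRegression().fit(X, y)
print("Intercept:", beta[0], lr.intercept_)
print("Coefficients:", beta[1:], lr.coef_)
```

With only a handful of features the inverse is cheap; the cubic cost matters once the number of columns grows into the thousands.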