# Multiple Linear Regression

## Overview

Multiple Linear Regression (MLR) is a statistical technique used to model the relationship between a dependent variable and two or more independent variables. It extends simple linear regression, which involves only one independent variable.

## Model

The general form of the multiple linear regression model is:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \epsilon $$

where:
- $Y$ is the dependent variable (response).
- $\beta_0$ is the intercept term.
- $\beta_1, \beta_2, \ldots, \beta_n$ are the coefficients of the independent variables $X_1, X_2, \ldots, X_n$.
- $\epsilon$ is the error term.
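
To make the equation concrete, here is a minimal sketch that evaluates the deterministic part of the model for a few observations. The coefficient values and the data are made up for illustration; they are not estimated from anything in this notebook.

```python
import numpy as np

# Hypothetical coefficients for a model with two independent variables
beta_0 = 2.0                   # intercept
beta = np.array([3.0, -1.5])   # beta_1, beta_2

# Three made-up observations, one per row; columns are X_1 and X_2
X_example = np.array([
    [1.0, 2.0],
    [0.0, 4.0],
    [2.5, 1.0],
])

# beta_0 + beta_1 * X_1 + beta_2 * X_2 for every row (error term omitted)
y_hat = beta_0 + X_example @ beta
print(y_hat)  # values: 2.0, -4.0, 8.0
```

Writing the sum as a matrix product `X_example @ beta` is the same calculation as the formula, just applied to all rows at once.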

## Assumptions

1. **Linearity**: The expected value of the dependent variable is a linear function of the independent variables.
2. **Independence**: Observations, and therefore their errors, are independent of each other.
3. **Homoscedasticity**: The errors have constant variance across all levels of the independent variables.
4. **Normality**: The errors are normally distributed. This matters mainly for confidence intervals and hypothesis tests on the coefficients.
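
Some of these assumptions can be checked informally from the residuals of a fitted model. The sketch below is illustrative only: it simulates data that satisfies the assumptions by construction (the coefficients, sample size, and noise level are arbitrary choices), fits the model by least squares with NumPy, and looks at the residual mean and spread. It is a rough check, not a substitute for formal diagnostic tests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that satisfies the assumptions by construction
n = 500
X_sim = rng.normal(size=(n, 2))
y_sim = 1.0 + X_sim @ np.array([2.0, -3.0]) + rng.normal(scale=0.5, size=n)

# Least-squares fit with an explicit intercept column
A = np.column_stack([np.ones(n), X_sim])
coef, *_ = np.linalg.lstsq(A, y_sim, rcond=None)
fitted = A @ coef
residuals = y_sim - fitted

# With an intercept in the model, least-squares residuals average to zero
print("mean residual:", residuals.mean())

# Rough homoscedasticity check: residual spread for low vs. high fitted values
order = np.argsort(fitted)
low, high = residuals[order[: n // 2]], residuals[order[n // 2:]]
spread_ratio = high.std() / low.std()
print("spread ratio (high / low):", spread_ratio)
```

A spread ratio far from 1 would suggest that the error variance changes with the fitted value, which is the pattern the homoscedasticity assumption rules out.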


In [1]:
from sklearn.datasets import make_regression
import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go

from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score

In [2]:
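# noise is the standard deviation of the Gaussian noise; no random_state is set, so the data and the outputs below change on every run
X, y = make_regression(n_samples=100, n_features=2, n_informative=2, n_targets=1, noise=50)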

In [3]:
df = pd.DataFrame({'feature1':X[:,0],'feature2':X[:,1],'target':y})

In [4]:
df.head()

|   | feature1 | feature2 | target |
|---|---|---|---|
| 0 | -0.105826 | 1.221511 | 125.220647 |
| 1 | 0.836800 | -0.468675 | -56.051605 |
| 2 | -1.354442 | -0.472839 | -47.231603 |
| 3 | -0.167720 | -0.149082 | 33.618795 |
| 4 | -0.169405 | -1.148799 | -79.183964 |


In [5]:
fig = px.scatter_3d(df, x='feature1', y='feature2', z='target')

fig.show()

In [6]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=3)
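
As a quick illustration of what this split does, the sketch below applies the same call to stand-in arrays (`X_demo` and `y_demo` are hypothetical, not the notebook's data). With 100 rows and `test_size=0.2`, 80 rows go to training and 20 to testing, and repeating the call with the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the same shapes as the notebook's X and y
X_demo = np.arange(200, dtype=float).reshape(100, 2)
y_demo = np.arange(100, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3
)
print(X_tr.shape, X_te.shape)  # (80, 2) (20, 2)

# Repeating the call with the same random_state selects the same rows
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3
)
print(np.array_equal(X_te, X_te_again))  # True
```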

In [7]:
from sklearn.linear_model import LinearRegression

In [8]:
lr = LinearRegression()

In [9]:
lr.fit(X_train,y_train)

In [10]:
y_pred = lr.predict(X_test)
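
A fitted `LinearRegression` exposes the estimated intercept and coefficients as `intercept_` and `coef_`, which correspond to $\beta_0$ and $\beta_1, \ldots, \beta_n$ in the model equation. The sketch below uses stand-in data (`X_demo`, `y_demo`, and the fixed `random_state` are assumptions made so the example is self-contained) and shows that `predict` gives the same values as evaluating the equation by hand.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Stand-in data; random_state is fixed only to make this sketch repeatable
X_demo, y_demo = make_regression(
    n_samples=100, n_features=2, n_informative=2, noise=50, random_state=0
)

model = LinearRegression().fit(X_demo, y_demo)

print("intercept (beta_0):", model.intercept_)
print("coefficients (beta_1, beta_2):", model.coef_)

# predict() evaluates beta_0 + beta_1 * X_1 + beta_2 * X_2 for each row
manual = model.intercept_ + X_demo @ model.coef_
print(np.allclose(manual, model.predict(X_demo)))  # True
```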

In [11]:
print("MAE",mean_absolute_error(y_test,y_pred))
print("MSE",mean_squared_error(y_test,y_pred))
print("R2 score",r2_score(y_test,y_pred))

MAE 35.74671187800118
MSE 2077.1434189447978
R2 score 0.7040096289612122
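
For reference, the three metrics can be written out directly from their definitions. The sketch below uses a small set of made-up true and predicted values (not the notebook's data) so the arithmetic is easy to follow. Note that the MSE of about 2077 reported above corresponds to a root mean squared error of about 45.6, which is in line with the noise standard deviation of 50 used to generate the data.

```python
import numpy as np

# Hypothetical true and predicted values, for illustration only
y_true = np.array([3.0, -1.0, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))                    # mean absolute error
mse = np.mean(errors ** 2)                       # mean squared error
ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot                         # coefficient of determination

print("MAE", mae)  # 0.625
print("MSE", mse)  # 0.5625
print("R2 score", r2)
```

An R2 score of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean of the targets.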


In [13]:
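# Grid over the feature space; named x_axis/y_axis so the target array y is not overwritten
x_axis = np.linspace(-5, 5, 10)
y_axis = np.linspace(-5, 5, 10)
xGrid, yGrid = np.meshgrid(x_axis, y_axis)

# One (feature1, feature2) pair per row, in the column order the model was trained on
final = np.column_stack((xGrid.ravel(), yGrid.ravel()))

# Predicted target at each grid point, reshaped so z[i, j] corresponds to (x_axis[j], y_axis[i])
z = lr.predict(final).reshape(xGrid.shape)

In [14]:
fig = px.scatter_3d(df, x='feature1', y='feature2', z='target')

# Overlay the fitted regression plane on the data points
fig.add_trace(go.Surface(x=x_axis, y=y_axis, z=z))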

fig.show()
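
The grid construction used for the surface is easy to get subtly wrong, because a surface plot expects `z[i][j]` to be the height at the j-th x value and the i-th y value. The sketch below checks that orientation on a small grid with different lengths along each axis. The `plane` function is a hypothetical stand-in for the fitted model, with arbitrary coefficients.

```python
import numpy as np

# Hypothetical plane z = 1 + 2*x - 3*y standing in for the fitted model
def plane(points):
    return 1.0 + 2.0 * points[:, 0] - 3.0 * points[:, 1]

x_axis = np.linspace(-5, 5, 4)   # 4 values along x
y_axis = np.linspace(-5, 5, 3)   # 3 values along y

# Default 'xy' indexing: both outputs have shape (len(y_axis), len(x_axis))
xx, yy = np.meshgrid(x_axis, y_axis)

# One (x, y) pair per row, evaluated and reshaped back onto the grid
points = np.column_stack((xx.ravel(), yy.ravel()))
zz = plane(points).reshape(xx.shape)

print(zz.shape)  # (3, 4)

# Row index follows y, column index follows x
print(np.isclose(zz[2, 1], plane(np.array([[x_axis[1], y_axis[2]]]))[0]))  # True
```

Using different lengths for the two axes makes an accidental swap show up immediately as a shape mismatch, which a square 10 by 10 grid would hide.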