# Multiple Linear Regression Implementation


## 1. Considerations
*__Multiple linear regression__ is a statistical technique that uses two or more independent variables to predict the value of a dependent variable. It lets analysts measure how much of the variation in the dependent variable the model explains, and how much each independent variable contributes to that explained variance. Multiple regression can take two forms: linear regression and non-linear regression.*<br>

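*Polynomial regression is a special case of linear regression in which a polynomial equation is fit to data that have a curvilinear relationship between the dependent and independent variables. It still counts as linear regression because the model is linear in its coefficients; only the features are powers of the original variable.*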
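A minimal sketch of that idea, using made-up data from a known quadratic: expanding a single variable $x$ into the columns $[1, x, x^2]$ turns the curve fit into an ordinary least-squares problem that is linear in the coefficients. The data, seed, and true coefficients are illustrative assumptions, not values from this notebook.

```python
import numpy as np

# Synthetic data from a known quadratic y = 2 - 3x + 0.5x^2 plus small noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 50)
y = 2 - 3 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# Design matrix with columns [1, x, x^2]: polynomial in x, but linear in the coefficients
X_poly = np.vander(x, N=3, increasing=True)

# Ordinary least squares on the expanded features
coefs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(coefs)  # should be close to [2, -3, 0.5]
```

For a single feature, scikit-learn's `PolynomialFeatures(degree=2)` (imported in section 1.3) builds the same $[1, x, x^2]$ columns.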

$$
    y_i = b_0 + b_1x_{i1} + b_2x_{i2} + \dots + b_m x_{im} + error_i
$$

for each observation $i = 1, \dots, n$ with $m$ independent variables.

__Advantages of Multiple Linear Regression__
- The main advantage of multiple regression is that it helps us understand the relationships among the variables in a dataset, in particular how the dependent variable is related to each independent variable. It is also one of the most widely used machine learning algorithms.

__Disadvantages of Multiple Linear Regression__
- Multiple regression is somewhat complex and requires a fair amount of mathematical computation.
- The model's output is sometimes hard to interpret, because the loss and error measures it reports are not identical.
- The model is not well suited to small datasets; it gives better results on larger ones.
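One concrete side of the small-dataset limitation, sketched with made-up numbers: when there are fewer observations than coefficients, $X^TX$ is singular, so the closed-form solution in section 1.2 cannot be computed, and many different coefficient vectors fit the data equally well. This illustrates only the extreme case (too few rows for the number of coefficients), not every way small samples hurt the model.

```python
import numpy as np

# Hypothetical tiny dataset: 3 observations but 4 coefficients (b_0 plus 3 features)
rng = np.random.default_rng(1)
X_small = np.column_stack([np.ones(3), rng.normal(size=(3, 3))])  # shape (3, 4)
Y_small = rng.normal(size=3)

# rank(X^T X) = rank(X) <= 3 < 4, so X^T X is singular and cannot be inverted
print(np.linalg.matrix_rank(X_small))

# A direction v with X v = 0: the last right-singular vector (X has only 3 rows)
_, _, Vt = np.linalg.svd(X_small)
v = Vt[-1]

# Two different coefficient vectors that produce identical predictions
B1, *_ = np.linalg.lstsq(X_small, Y_small, rcond=None)
B2 = B1 + 5.0 * v
print(np.allclose(X_small @ B1, X_small @ B2))
```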


#### 1.2 Mathematics
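Here $b$ denotes the coefficients and $x$ the independent variables (features). Stacking the $n$ observations row by row gives the matrix form: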
$$
\begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1m} \\
1 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{bmatrix}
\begin{bmatrix}
b_0 \\
b_1 \\
\vdots \\
b_m
\end{bmatrix}
+
\begin{bmatrix}
error_1 \\
error_2 \\
\vdots \\
error_n
\end{bmatrix}
$$

The leading column of ones multiplies $b_0$, so the intercept is treated as just another coefficient (the column `A` in the code below plays this role).
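A small sketch of how the matrix form lines up with the per-observation equation, using hypothetical sizes and coefficients: it assumes, as the matrix above does, that the first column of $X$ is all ones so that $b_0$ is the intercept. Row $i$ of $Xb$ + error is exactly the scalar model for observation $i$.

```python
import numpy as np

# Illustrative sizes: n observations, m features, plus a leading column of ones for b_0
n, m = 6, 2
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])  # shape (n, m + 1)
b = np.array([1.5, -2.0, 0.7])                               # [b_0, b_1, b_2], shape (m + 1,)
error = rng.normal(scale=0.01, size=n)                       # shape (n,)

# Matrix form: all n equations at once
Y = X @ b + error

# Row i reproduces y_i = b_0 + b_1*x_i1 + ... + b_m*x_im + error_i
i = 0
y_i = b[0] + sum(b[j] * X[i, j] for j in range(1, m + 1)) + error[i]
print(np.isclose(Y[i], y_i))       # True
print(X.shape, b.shape, Y.shape)   # (6, 3) (3,) (6,)
```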
The coefficients are chosen to minimize the sum of squared errors $ERR^T ERR$:
$$ERR = Y - Xb$$
$$ERR^T ERR = (Y-Xb)^T(Y-Xb) = (Y^T-b^TX^T)(Y-Xb)$$
$$ERR^T ERR = Y^TY - Y^TXb - b^TX^TY + b^TX^TXb$$
$$\text{note that } b^TX^TY \text{ is a scalar, so } b^TX^TY = (b^TX^TY)^T = Y^TXb$$
$$ERR^T ERR = Y^TY - 2Y^TXb + b^TX^TXb$$
$$\frac{\partial (ERR^T ERR)}{\partial b} = \frac{\partial (Y^TY - 2Y^TXb + b^TX^TXb)}{\partial b} = -2X^TY + 2X^TXb$$
Setting the gradient to zero and assuming $X^TX$ is invertible, the minimizer $B$ satisfies:
$$0 = -2X^TY + 2X^TXB \;\Rightarrow\; X^TXB = X^TY$$
$$B = (X^TX)^{-1}X^TY$$
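A quick numerical check of the derivation, on synthetic data (an assumption; the house-price CSV is not used here): at $B = (X^TX)^{-1}X^TY$ the gradient $-2X^TY + 2X^TXB$ should be zero up to rounding, and nudging $B$ in any direction should increase $ERR^T ERR$.

```python
import numpy as np

# Synthetic regression problem (illustrative data only, not the house-price dataset)
rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # leading column of ones for b_0
Y = X @ np.array([4.0, 1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

def sse(b):
    """ERR^T ERR for a coefficient vector b."""
    err = Y - X @ b
    return err @ err

# Closed-form minimizer B = (X^T X)^{-1} X^T Y
B = np.linalg.inv(X.T @ X) @ X.T @ Y

# The gradient -2 X^T Y + 2 X^T X B should be (numerically) zero at B
grad = -2 * X.T @ Y + 2 * X.T @ X @ B
print(np.allclose(grad, 0, atol=1e-8))

# Moving away from B in either direction should increase the squared error
step = rng.normal(size=B.shape) * 0.01
print(sse(B + step) > sse(B), sse(B - step) > sse(B))
```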
#### 1.3 NumPy Implementation 

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
```

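```python
# Load the house sales data (the target is the "price" column)
data = pd.read_csv("./data/kc_house_data.csv")
# Prepend a column of ones ('A') so that the first coefficient acts as the intercept b_0
data.insert(loc=0, column='A', value=1)
Y = data["price"].to_numpy()
X = data[["A", "bedrooms", "bathrooms", "yr_built"]].to_numpy()
```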

$$B = (X^TX)^{-1}X^TY$$

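```python
def multilinear_regression(X, Y):
    # Normal equation: B = (X^T X)^{-1} X^T Y
    B = np.dot(np.linalg.inv(np.dot(X.T, X)), np.dot(X.T, Y))
    return B

def polynomial_regression_predict(X, B):
    # Predictions Y_hat = X B (a plain linear prediction, despite the function name)
    return np.dot(X, B)
```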
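Explicitly inverting $X^TX$ works for this problem, but a common numerical recommendation is to solve the linear system $X^TXB = X^TY$ directly, or to hand $X$ to a least-squares solver. This may matter here, since a raw feature such as `yr_built` (values around 2000) next to a column of ones can make $X^TX$ poorly conditioned. The two function names below are hypothetical variants, not part of the original notebook; the comparison uses synthetic data.

```python
import numpy as np

def multilinear_regression_solve(X, Y):
    """Hypothetical variant: solve (X^T X) B = X^T Y without forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def multilinear_regression_lstsq(X, Y):
    """Hypothetical variant: least squares on X directly (returns the minimum-norm
    solution if X is rank-deficient)."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B

# Compare with the explicit-inverse formula on synthetic data (illustrative only)
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
Y = X @ np.array([3.0, -1.0, 2.0]) + rng.normal(scale=0.1, size=50)

B_inv = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print(np.allclose(B_inv, multilinear_regression_solve(X, Y)))
print(np.allclose(B_inv, multilinear_regression_lstsq(X, Y)))
```

On well-conditioned data all three agree; the solver-based versions are simply less sensitive to rounding when $X^TX$ is close to singular.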

```python
# First 80% of rows for training, last 20% for testing (no shuffling)
x_train = X[:int(len(X)*0.8)]
x_test = X[int(len(X)*0.8):]
y_train = Y[:int(len(Y)*0.8)]
y_test = Y[int(len(Y)*0.8):]

# Fit all coefficients (the intercept comes from the ones column 'A') with the custom solver
B = multilinear_regression(x_train, y_train)

# Create a scikit-learn LinearRegression model; it fits its own intercept (model.intercept_),
# so the coefficient it reports for the constant column 'A' is 0
model = LinearRegression()

# Fit the model to the training data
model.fit(x_train, y_train)

# Make predictions on the test data
y_pred = model.predict(x_test)

print(f"CUSTOM COEFS: {B}")
print(f"SKLEARN COEFS: {model.coef_}\n\n\n")

mse = mean_squared_error(y_test, y_pred)
r_squared = r2_score(y_test, y_pred)
print('Mean_Squared_Error SKLearn:', mse)
print('r_square_value SKLearn:', r_squared)

# Same metrics for the custom coefficients
y_pred_custom = polynomial_regression_predict(x_test, B)
mse = mean_squared_error(y_test, y_pred_custom)
r_squared = r2_score(y_test, y_pred_custom)
print('\n\nMean_Squared_Error Custom:', mse)
print('r_square_value Custom:', r_squared)
```

```
CUSTOM COEFS: [ 7.16597718e+06  2.60871594e+03  3.17050430e+05 -3.70828359e+03]
SKLEARN COEFS: [     0.           2608.71594022 317050.42962814  -3708.28358646]



Mean_Squared_Error SKLearn: 91529101733.40758
r_square_value SKLearn: 0.33355428196639025


Mean_Squared_Error Custom: 91529101733.40001
r_square_value Custom: 0.3335542819664453
```
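The two fits agree except in the first coefficient: the custom solver stores the intercept in `B[0]`, while `LinearRegression` with its default `fit_intercept=True` keeps it in `model.intercept_` and reports 0 for the constant column `A`. The sketch below checks this on synthetic data (an assumption, since the CSV is not reproduced here) and also writes MSE and R² out from their definitions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the features (illustrative only); first column of ones like column 'A'
rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
Y = X @ np.array([10.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=n)

B = np.linalg.inv(X.T @ X) @ (X.T @ Y)  # same normal-equation formula as multilinear_regression

# Default LinearRegression fits its own intercept, so the ones column gets coefficient 0
default_model = LinearRegression().fit(X, Y)
print(np.isclose(default_model.intercept_, B[0]), np.allclose(default_model.coef_[1:], B[1:]))

# With fit_intercept=False, coef_ lines up with B entry by entry
no_intercept_model = LinearRegression(fit_intercept=False).fit(X, Y)
print(np.allclose(no_intercept_model.coef_, B))

# MSE and R^2 written out from their definitions
y_hat = X @ B
mse = np.mean((Y - y_hat) ** 2)
r2 = 1 - np.sum((Y - y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)
print(np.isclose(mse, mean_squared_error(Y, y_hat)), np.isclose(r2, r2_score(Y, y_hat)))
```

The tiny gap between the two MSE values in the output above (91529101733.40758 vs 91529101733.40001) is most likely floating-point rounding from the two different solvers.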
