# Implementing Linear Regression Using the Normal Equation

## Multiple Linear Regression and the Normal Equation

The normal equation is an analytical approach to linear regression with a least-squares cost function. It gives the value of θ directly, in closed form, without an iterative method such as gradient descent.

Suppose there are n independent variables $x_i$ and one dependent variable y. We can express the relationship with the following equation:

$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots + \theta_n x_n$$

If there are m **training examples**, and we set $x_0^{(i)} = 1$ for every example so that $\theta_0$ acts as the intercept, we can stack them into the matrix equation:
$$
\begin{bmatrix}
x_0^{(1)} & x_1^{(1)} & x_2^{(1)} & \dots & x_n^{(1)} \\
x_0^{(2)} & x_1^{(2)} & x_2^{(2)} & \dots & x_n^{(2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_0^{(m)} & x_1^{(m)} & x_2^{(m)} & \dots & x_n^{(m)}
\end{bmatrix}
\begin{bmatrix}
\theta_0 \\
\theta_1 \\
\vdots \\
\theta_n
\end{bmatrix}
= \begin{bmatrix}
y^{(1)} \\
y^{(2)} \\
\vdots \\
y^{(m)}
\end{bmatrix}
$$

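In compact form:
$$X\Theta = y$$
Here $X$ is the $m \times (n+1)$ matrix of inputs and $\Theta$ is the vector of parameters. With more examples than parameters this system generally has no exact solution, so we take the least-squares solution instead:
$$\Theta=(X^TX)^{-1}X^Ty$$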
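
The closed form can be derived by setting the gradient of the squared-error cost to zero. The following is a short sketch, assuming $X^TX$ is invertible; the cost $J(\Theta)$ and its $\tfrac{1}{2}$ scaling are a conventional choice made here, not something defined earlier in this notebook:

$$
\begin{aligned}
J(\Theta) &= \tfrac{1}{2}\,(X\Theta - y)^T (X\Theta - y) \\
\nabla_\Theta J(\Theta) &= X^T (X\Theta - y) = 0 \\
X^T X\,\Theta &= X^T y \\
\Theta &= (X^T X)^{-1} X^T y
\end{aligned}
$$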

In [1]:
# Importing

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

Let m be the number of training examples and n the number of features.

In [2]:
m, n = 500, 2

X = np.random.rand(m, n)                    # uniform random values in [0, 1)
y = 5 * X + np.random.randn(m, n) * 0.1     # targets: 5 * X plus Gaussian noise (std 0.1)

In [3]:
print('X shape', X.shape)
print('X max', round(X.max(), 5))
print('X min', round(X.min(), 5))

print('')
print('y shape', y.shape)
print('y max', round(y.max(), 2))
print('y min', round(y.min(), 2))

X shape (500, 2)
X max 0.99961
X min 0.00064

y shape (500, 2)
y max 5.1
y min -0.15


In [4]:
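def linear_regression_normal_equation(X, y):
    """Return the least-squares parameters (intercept first) via the normal equation."""
    ones = np.ones((X.shape[0], 1))
    X = np.append(ones, X, axis=1)  # prepend the bias column x_0 = 1
    # W = (X^T X)^{-1} X^T y
    W = np.dot(np.linalg.inv(np.dot(X.T, X)), np.dot(X.T, y))

    return W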

In [5]:
W = linear_regression_normal_equation(X, y)
W

array([[ 0.00920747,  0.03158102],
       [ 4.9805266 , -0.01558621],
       [ 0.00537734,  4.97132535]])
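
Explicitly inverting $X^TX$ can be numerically fragile when features are strongly correlated. As a minimal sketch of one alternative (not the method used in this notebook), `np.linalg.lstsq` solves the same least-squares problem without forming the inverse. The helper name `linear_regression_lstsq`, the seeded generator, and the `_demo` variable names are assumptions made for illustration.

```python
import numpy as np


def linear_regression_lstsq(X, y):
    """Least-squares fit with an intercept, without inverting X^T X.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    ones = np.ones((X.shape[0], 1))
    X_b = np.hstack([ones, X])                    # prepend the bias column x_0 = 1
    W, *_ = np.linalg.lstsq(X_b, y, rcond=None)   # minimizes ||X_b W - y||^2
    return W


# Same synthetic setup as above, but with a seeded generator (an assumption)
rng = np.random.default_rng(0)
m, n = 500, 2
X_demo = rng.random((m, n))
y_demo = 5 * X_demo + rng.standard_normal((m, n)) * 0.1

W_demo = linear_regression_lstsq(X_demo, y_demo)

# The data were generated with intercept 0 and weight 5 on the matching feature
W_true = np.vstack([np.zeros((1, n)), 5 * np.eye(n)])
print(np.round(W_demo, 2))
print("max abs error:", np.abs(W_demo - W_true).max())
```

The first row of the result holds the intercepts and the remaining rows hold the feature weights, the same layout that `linear_regression_normal_equation` returns.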

### Boston house-price dataset

In [6]:
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

In [7]:
data_with_target = np.concatenate([data, np.expand_dims(target,1)], axis=1)

In [8]:
boston_data = pd.DataFrame(data=data_with_target, columns=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV'])
boston_data

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0.0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1.0 | 273.0 | 21.0 | 391.99 | 9.67 | 22.4 |
| 502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.90 | 9.08 | 20.6 |
| 503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.90 | 5.64 | 23.9 |
| 504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 | 22.0 |


In [9]:
W = linear_regression_normal_equation(data, target)
W

array([ 3.64594884e+01, -1.08011358e-01,  4.64204584e-02,  2.05586264e-02,
        2.68673382e+00, -1.77666112e+01,  3.80986521e+00,  6.92224640e-04,
       -1.47556685e+00,  3.06049479e-01, -1.23345939e-02, -9.52747232e-01,
        9.31168327e-03, -5.24758378e-01])
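
Once the parameters are known, predictions use the same matrix form, $\hat{y} = X\Theta$, with the bias column prepended. Below is a minimal sketch on a tiny made-up dataset; the `predict` helper, the toy numbers, and the use of `np.linalg.solve` in place of an explicit inverse are assumptions for illustration, not Boston data or the notebook's own code.

```python
import numpy as np


def predict(X, W):
    """Apply fitted parameters W (intercept first) to a feature matrix X.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X]) @ W


# Toy data that follows y = 1 + 2*x1 + 3*x2 exactly (made-up values)
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
y_toy = 1 + 2 * X_toy[:, 0] + 3 * X_toy[:, 1]

# Fit via the normal equation, solving X^T X W = X^T y as a linear system
X_b = np.hstack([np.ones((X_toy.shape[0], 1)), X_toy])
W_toy = np.linalg.solve(X_b.T @ X_b, X_b.T @ y_toy)

y_hat = predict(X_toy, W_toy)
rmse = np.sqrt(np.mean((y_hat - y_toy) ** 2))
print("W:", np.round(W_toy, 3))
print("RMSE:", rmse)
```

With the Boston arrays, the same helper would be called as `predict(data, W)` and the result compared against `target`.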

In [10]:
# Find theta using scikit-learn
reg = LinearRegression().fit(data, np.expand_dims(target,1))
theta_scikit = np.c_[reg.intercept_, reg.coef_]
print('theta_scikit ', theta_scikit.transpose())

theta_scikit  [[ 3.64594884e+01]
 [-1.08011358e-01]
 [ 4.64204584e-02]
 [ 2.05586264e-02]
 [ 2.68673382e+00]
 [-1.77666112e+01]
 [ 3.80986521e+00]
 [ 6.92224640e-04]
 [-1.47556685e+00]
 [ 3.06049479e-01]
 [-1.23345939e-02]
 [-9.52747232e-01]
 [ 9.31168327e-03]
 [-5.24758378e-01]]
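
The two parameter vectors match to the printed precision. Rather than comparing them by eye, the agreement can be checked with `np.allclose`. This sketch uses seeded synthetic data so that it runs without downloading anything; the data, the coefficient values, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (made up): 3 features, known coefficients, small Gaussian noise
rng = np.random.default_rng(42)
X_demo = rng.random((200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.standard_normal(200) * 0.05

# Normal equation with an explicit bias column
X_b = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])
theta_normal = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y_demo

# scikit-learn fit; stack intercept and coefficients in the same order
reg = LinearRegression().fit(X_demo, y_demo)
theta_sklearn = np.concatenate([[reg.intercept_], reg.coef_])

# Element-wise comparison within floating-point tolerance
same = np.allclose(theta_normal, theta_sklearn)
print("parameters agree:", same)
```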
