### Importing necessary libraries and loading our data into a numpy array

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
def rss(X, theta, y):
    r = 0
    for i in range(X.shape[0]):
        # taking the ith sample for measurement
        new_sample = np.array([X[i]])
        # adding a constant feature column as mentioned in lecture slides
        new_sample_b = np.concatenate([np.ones((len(new_sample), 1)), new_sample], axis=1)
        # predicting y-hat using our linear regression model
        new_sample_pred = new_sample_b.dot(theta)
        r += (new_sample_pred - y[i])**2
    return r

def tss(y):
    t = 0
    avg = np.mean(y)
    for i in range(y.shape[0]):
        t += (y[i] - avg)**2
    return t
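
The two loops above can also be written without explicit iteration. The following is a minimal sketch, not a replacement for the functions used in this notebook: `rss_vectorized`, `tss_vectorized`, and the tiny `*_demo` arrays are hypothetical names and made-up data introduced only for illustration. As in `rss`, `X` is assumed to have no intercept column and `theta[0]` is assumed to be the intercept.

```python
import numpy as np

def rss_vectorized(X, theta, y):
    """Residual sum of squares; X has no intercept column, theta[0] is the intercept."""
    # Prepend the constant feature column, as rss() does sample by sample
    X_b = np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)
    residuals = X_b.dot(theta) - y
    return float(residuals.dot(residuals))

def tss_vectorized(y):
    """Total sum of squares around the mean of y."""
    centered = y - np.mean(y)
    return float(centered.dot(centered))

# Made-up example: y = 1 + 2*x exactly, so the fit is perfect
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])
theta_demo = np.array([1.0, 2.0])

r2_demo = 1 - rss_vectorized(X_demo, theta_demo, y_demo) / tss_vectorized(y_demo)
print(r2_demo)
```

Because these helpers return plain floats, the resulting R2 is a scalar rather than a one-element array.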

In [3]:
X = np.genfromtxt('real-estate.csv', delimiter=',')
X = np.delete(X, (0), axis=0) # Removing first row containing feature labels
X = np.delete(X, (0), axis=1) # Removing first column containing serial numbers
y = X[:, 6]

In [4]:
X = np.delete(X, -1, axis=1)
oneX = np.ones((len(X), 1))
data = np.concatenate([oneX, X], axis=1)

### Implementing the Normal Equation Approach to Linear Regression (derived below)

In [5]:
dtd = data.T.dot(data)
theta = np.linalg.inv(dtd).dot(data.T).dot(y)
# intercept, *coef = theta
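
As a side note, the same coefficients can usually be obtained without forming the inverse explicitly. The sketch below compares the explicit inverse with `np.linalg.solve` and `np.linalg.lstsq`, both standard NumPy routines. The synthetic data and every `*_demo` name are assumptions made for illustration; this is not the real-estate dataset.

```python
import numpy as np

# Synthetic, made-up data: 50 samples, 3 features, plus a constant column
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
data_demo = np.concatenate([np.ones((50, 1)), X_demo], axis=1)
true_theta = np.array([4.0, 1.5, -2.0, 0.5])
y_demo = data_demo.dot(true_theta) + 0.01 * rng.normal(size=50)

# 1) Explicit inverse, as in the cell above
theta_inv = np.linalg.inv(data_demo.T.dot(data_demo)).dot(data_demo.T).dot(y_demo)

# 2) Solve the linear system (X^T X) theta = X^T y directly
theta_solve = np.linalg.solve(data_demo.T.dot(data_demo), data_demo.T.dot(y_demo))

# 3) Least-squares solver working on the data matrix itself
theta_lstsq, *_ = np.linalg.lstsq(data_demo, y_demo, rcond=None)

print(np.allclose(theta_inv, theta_solve), np.allclose(theta_solve, theta_lstsq))
```

Solving the system avoids computing the inverse as an intermediate result, which tends to be both faster and numerically safer when the matrix is poorly conditioned.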

### Calculating R2 (evaluation metric discussed in class)

In [6]:
r2 = 1 - (rss(X, theta, y)/tss(y))
print(r2)

# r2 comes out to be 0.5824

[0.58237045]


# Derivation of the Normal Equation approach

Here I will present a broad derivation of the Normal Equation approach in my own words.
Let $Q$ be a column vector of weights and $x$ the feature vector of a single sample (with a leading 1 for the intercept). The hypothesis equation for Linear Regression is:

$$
h_Q(x) = Q^Tx
$$

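To get the cost function in vector form, we first observe the vector of residuals over all training samples, where $m$ is the number of samples, $x^{(i)}$ is the $i$-th sample, $Y$ is the column vector of targets, and $X$ is the matrix whose rows are the $(x^{(i)})^T$:

$$
\begin{bmatrix}
Q^Tx^{(1)} \\
Q^Tx^{(2)} \\
\vdots \\
Q^Tx^{(m)}
\end{bmatrix} - Y = XQ - Y
$$

As the cost function is the sum of the squared residuals, scaled by $\frac{1}{2m}$, we multiply this vector by its transpose:

$$
Cost(Q) = \frac{1}{2m}(XQ-Y)^T(XQ-Y)
$$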

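Expanding the product gives the first line below. The two middle terms are scalars and transposes of each other, so they are equal and can be combined:

$$
\begin{aligned}
Cost(Q) &= \frac{1}{2m}\left((XQ)^TXQ - (XQ)^TY - Y^T(XQ) + Y^TY\right) \\
&= \frac{1}{2m}\left((XQ)^TXQ - 2(XQ)^TY + Y^TY\right)
\end{aligned}
$$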

Now, all we need to do is minimise this cost. As the cost is a differentiable function of a single vector variable ($Q$), we can use the basic calculus and linear algebra covered in our first semester to find its minimum: we differentiate with respect to $Q$.

$$
\frac{\partial (Cost)}{\partial Q} = \frac{1}{2m}\left(2X^TXQ - 2X^TY\right)
$$
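
One way to gain confidence in this gradient is to compare it against a finite-difference approximation of the cost. The sketch below does that on made-up random data. The functions `cost` and `grad` and the `*_demo` arrays are hypothetical names introduced here, and the gradient includes the constant factor $\frac{1}{2m}$ from the cost function.

```python
import numpy as np

def cost(Q, X, Y):
    """Cost(Q) = 1/(2m) * (XQ - Y)^T (XQ - Y)."""
    m = X.shape[0]
    residuals = X.dot(Q) - Y
    return residuals.dot(residuals) / (2 * m)

def grad(Q, X, Y):
    """Analytic gradient: 1/(2m) * (2 X^T X Q - 2 X^T Y)."""
    m = X.shape[0]
    return (2 * X.T.dot(X).dot(Q) - 2 * X.T.dot(Y)) / (2 * m)

# Made-up random problem, used only to exercise the formulas
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 3))
Y_demo = rng.normal(size=20)
Q_demo = rng.normal(size=3)

# Central finite differences, one coordinate of Q at a time
eps = 1e-6
numeric = np.zeros_like(Q_demo)
for j in range(Q_demo.size):
    step = np.zeros_like(Q_demo)
    step[j] = eps
    numeric[j] = (cost(Q_demo + step, X_demo, Y_demo)
                  - cost(Q_demo - step, X_demo, Y_demo)) / (2 * eps)

max_diff = np.max(np.abs(numeric - grad(Q_demo, X_demo, Y_demo)))
print(max_diff)  # expected to be very small if the formula is right
```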

Equating this to zero (constant factors can be dropped) will give us our answer. Hence we proceed:

$$
\begin{aligned}
2X^TXQ - 2X^TY &= 0 \\
X^TXQ &= X^TY
\end{aligned}
$$

For the next step to be valid, the matrix $X^TX$ has to be invertible, which is the case when the columns of $X$ are linearly independent. Our final step of the derivation will be:

$$
\begin{aligned}
(X^TX)^{-1}X^TXQ &= (X^TX)^{-1}X^TY \\
Q &= (X^TX)^{-1}X^TY
\end{aligned}
$$

This completes our derivation.
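
The invertibility requirement in the derivation can be checked numerically before applying the formula. The sketch below uses made-up data in which one feature column is duplicated, so $X^TX$ is singular, and falls back to the Moore-Penrose pseudoinverse (`np.linalg.pinv`). The pseudoinverse is a swapped-in technique that is not part of the derivation above, and all `*_demo`, `X_ok`, and `X_bad` names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
X_ok = rng.normal(size=(30, 3))
# Append a copy of the first column, so the columns are linearly dependent
X_bad = np.concatenate([X_ok, X_ok[:, :1]], axis=1)
Y_demo = rng.normal(size=30)

# X^T X is invertible only if its rank equals the number of columns of X
rank_ok = np.linalg.matrix_rank(X_ok.T.dot(X_ok))
rank_bad = np.linalg.matrix_rank(X_bad.T.dot(X_bad))
print(rank_ok, "of", X_ok.shape[1])
print(rank_bad, "of", X_bad.shape[1])

# Fallback for the singular case: the pseudoinverse picks one of the
# infinitely many minimisers (the one with the smallest norm)
Q_pinv = np.linalg.pinv(X_bad).dot(Y_demo)

# It still satisfies the normal equations X^T X Q = X^T Y
satisfies_normal_eq = np.allclose(X_bad.T.dot(X_bad).dot(Q_pinv), X_bad.T.dot(Y_demo))
print(satisfies_normal_eq)
```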

## Limitations of the Normal Equation Approach

The matrix $X^TX$ has one row and one column per feature (plus the intercept), so inverting it takes roughly $O(n^3)$ time, where $n$ is the number of features. This can severely hurt our algorithm’s performance for large values of $n$. Therefore, for datasets with many features, gradient descent is the preferred way to obtain linear regression coefficients.
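
For comparison, here is a minimal batch gradient descent sketch for the same cost function. It is an illustration under stated assumptions, not a tuned implementation: the function `gradient_descent`, the learning rate `lr=0.1`, the iteration count, and the synthetic `*_demo` data are all choices made for this example, and the features are assumed to be on a similar scale.

```python
import numpy as np

def gradient_descent(X, Y, lr=0.1, n_iters=2000):
    """Minimise 1/(2m) * ||XQ - Y||^2 by batch gradient descent."""
    m, n = X.shape
    Q = np.zeros(n)
    for _ in range(n_iters):
        # Gradient of the cost: (1/m) * X^T (XQ - Y)
        gradient = X.T.dot(X.dot(Q) - Y) / m
        Q -= lr * gradient
    return Q

# Made-up, well-scaled data with a constant column for the intercept
rng = np.random.default_rng(3)
features = rng.normal(size=(200, 3))
X_demo = np.concatenate([np.ones((200, 1)), features], axis=1)
Y_demo = X_demo.dot(np.array([2.0, -1.0, 0.5, 3.0])) + 0.1 * rng.normal(size=200)

Q_gd = gradient_descent(X_demo, Y_demo)
Q_exact, *_ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)
print(np.allclose(Q_gd, Q_exact, atol=1e-6))
```

Each iteration only needs matrix-vector products, which cost on the order of $mn$ operations for $m$ samples and $n$ features, so no $n \times n$ matrix is ever inverted.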

## References

- **E. Bendersky**, “Derivation of the normal equation for linear regression,” *Eli Bendersky's Blog*. [Online]. Available: https://eli.thegreenplace.net/2014/derivation-of-the-normal-equation-for-linear-regression. [Accessed: 01-Apr-2023].