**Simple Linear Regression using Normal Equation**

The Normal Equation is another method for finding the parameters (or coefficients) of a linear regression model. The goal is to minimize the cost function (typically Mean Squared Error, or MSE) in order to find the best-fit line.

In the case of simple linear regression (SLR), the hypothesis is:

\begin{equation}
h(x) = \theta_0 + \theta_1 x
\end{equation}

where:

h(x) is the hypothesis/model

x is the independent variable

$\theta_0$ is the y-intercept or the bias term

$\theta_1$ is the slope of the line (coefficient of the feature)

$\theta_0$ and $\theta_1$ together are also called the parameters, coefficients, or weights

Note that the above hypothesis is an affine function of x (linear plus a constant offset).

Implementing the normal equation:

The normal equation for linear regression is derived from the least squares optimization, which minimizes the error between the predicted values and the actual values. The equation is:

\begin{equation}
\theta = (X^T X)^{-1} X^T y
\end{equation}

where:


X is the matrix of input features (with an extra column of ones added to account for the intercept)

$X^T$ is the transpose of X

y is the vector of observed target values

$\theta$ is the vector of model parameters that attains the global minimum of the cost function J($\theta$),

\begin{equation}
J(\theta) = \frac{1}{2m} \sum_{i = 1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})^2
\end{equation}


Because J($\theta$) is convex in $\theta$, the solution of the equation below gives the global minimum,

\begin{equation}
\nabla_{\theta} J(\theta) = \vec{0}
\end{equation}
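
As a short sketch of where the closed form comes from, the cost can be written in matrix form, with y denoting the vector of observed targets (the same y used in the cost function above):

\begin{equation}
J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)
\end{equation}

Taking the gradient and setting it to zero gives

\begin{equation}
\nabla_{\theta} J(\theta) = \frac{1}{m} X^T (X\theta - y) = \vec{0} \;\Longrightarrow\; X^T X \theta = X^T y \;\Longrightarrow\; \theta = (X^T X)^{-1} X^T y
\end{equation}

where the last step assumes that $X^T X$ is invertible.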

While the gradient descent algorithm iteratively updates the parameters by moving in the direction of the negative gradient of the cost function, the 'normal equation' method provides an analytical solution for the global minimum of the cost function in a single step, provided that $(X^T X)^{-1}$ exists.
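
To make the contrast concrete, here is a minimal sketch that runs batch gradient descent on synthetic data and compares the result with the closed-form solution. The variable names (`x_gd`, `theta_gd`, `learning_rate`), the learning rate of 0.05, and the 5000 iterations are illustrative assumptions, not tuned values.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(42)
m_gd = 100
x_gd = 5 * rng.random((m_gd, 1))
y_gd = 4 + 3 * x_gd + rng.standard_normal((m_gd, 1))
X_gd = np.c_[np.ones((m_gd, 1)), x_gd]  # prepend a column of ones for the intercept

# Batch gradient descent: repeatedly step against the gradient of J(theta)
theta_gd = np.zeros((2, 1))
learning_rate = 0.05  # assumed value, small enough to be stable for this data
for _ in range(5000):
    gradient = X_gd.T @ (X_gd @ theta_gd - y_gd) / m_gd
    theta_gd -= learning_rate * gradient

# Closed-form solution of X^T X theta = X^T y, obtained in one step
theta_closed = np.linalg.solve(X_gd.T @ X_gd, X_gd.T @ y_gd)

print("Gradient descent:", theta_gd.ravel())
print("Closed form:     ", theta_closed.ravel())
```

With enough iterations the two estimates should agree to several decimal places; gradient descent needs a learning rate and many passes over the data, whereas the closed form needs neither.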

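While the normal equation needs no learning rate and no iterations, the computation can become expensive when there are many features, because the matrix $X^T X$ to be inverted grows with the number of features and the cost of inverting it grows roughly cubically with its size. It also doesn't work if $X^T X$ isn't invertible, for example when two features are perfectly correlated.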
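
As a hedged illustration of the non-invertible case, the sketch below duplicates a feature column so that $X^T X$ is singular, then falls back on `np.linalg.lstsq`, which returns a minimum-norm least-squares solution via the SVD. The data and the names (`x_demo`, `X_dup`, `theta_dup`) are assumptions made up for this example.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(0)
m_demo = 50
x_demo = 5 * rng.random((m_demo, 1))
y_demo = 4 + 3 * x_demo + rng.standard_normal((m_demo, 1))

# Duplicating the feature column makes the columns linearly dependent,
# so X^T X has no inverse and the plain normal equation cannot be used
X_dup = np.c_[np.ones((m_demo, 1)), x_demo, x_demo]

# lstsq still returns a least-squares solution (the minimum-norm one)
theta_dup, _, rank_dup, _ = np.linalg.lstsq(X_dup, y_demo, rcond=None)

# Reference fit without the duplicated column
X_ref = np.c_[np.ones((m_demo, 1)), x_demo]
theta_ref, _, rank_ref, _ = np.linalg.lstsq(X_ref, y_demo, rcond=None)

print("Rank of design matrix with duplicate column:", rank_dup)
print("Parameters with duplicate column:", theta_dup.ravel())
print("Parameters without it:           ", theta_ref.ravel())
```

The individual slope coefficients are no longer unique in the rank-deficient case, but the fitted predictions match those of the reference model.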

Now let's get to the code. First, import the necessary libraries and generate the data:

In [25]:
#Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt

In [26]:
#Generating data
m = 100  # Number of data points
X = 5 * np.random.rand(m, 1)  # Random values from 0 to 5 for feature X
y = 4 + 3 * X + np.random.randn(m, 1)  # Linear relation with noise: y = 4 + 3*X + noise

#Add a column of ones to represent the intercept
X_b = np.c_[np.ones((m, 1)), X] #X_b is of shape (m, 2)

Now, implementing the normal equation in one line of code:

In [27]:
#Normal Equation
theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
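
A commonly preferred variant avoids forming the explicit inverse and instead solves the linear system $X^T X \theta = X^T y$ with `np.linalg.solve`. The sketch below regenerates its own data so that it runs on its own; the `_demo` names are assumptions for illustration, and for a small, well-conditioned problem like this one both approaches should give the same parameters.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(0)
m_demo = 100
X_demo = 5 * rng.random((m_demo, 1))
y_demo = 4 + 3 * X_demo + rng.standard_normal((m_demo, 1))
X_b_demo = np.c_[np.ones((m_demo, 1)), X_demo]  # shape (m_demo, 2)

# Explicit inverse, as in the one-line normal equation
theta_inv = np.linalg.inv(X_b_demo.T @ X_b_demo) @ X_b_demo.T @ y_demo

# Solve (X^T X) theta = X^T y directly, without computing the inverse
theta_solve = np.linalg.solve(X_b_demo.T @ X_b_demo, X_b_demo.T @ y_demo)

print(np.allclose(theta_inv, theta_solve))
```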

The parameters are then:

In [None]:
theta_0, theta_1 = theta.ravel()  # Unpack the (2, 1) array into two scalars
print(f"The parameter theta_0: {theta_0}")
print(f"The parameter theta_1: {theta_1}")

#Plotting the result
X_new = np.array([[0], [5]])  # Endpoints of the feature range
y_new = np.c_[np.ones((2, 1)), X_new].dot(theta)  # Predictions at the endpoints
plt.scatter(X, y, color='blue', label='Data points')  # Plot the data points
plt.plot(X_new, y_new, color='red', label='Best-fit line')  # Plot the best-fit line
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression using Normal Equation')
plt.legend()
plt.grid()
plt.show()