Q2) Refer to the instructor's notebook on multicollinearity. Use np.linalg.solve instead of np.linalg.inv for the same problem. Compare and contrast their usage. Which one is better, and why? [1 Marks]

Ans: 
The normal equation for linear regression is:
$$
\theta = (X^TX)^{-1}X^Ty
$$

In the original notebook, we used the 'np.linalg.inv' function to calculate the inverse of $X^TX$ explicitly, which is computationally expensive and less accurate, especially for large or ill-conditioned matrices. Instead, we can rewrite the normal equation as the linear system $X^TX\theta = X^Ty$ and pass it to 'np.linalg.solve', which solves for $\theta$ directly without ever forming the inverse.

The code for this is:

In [5]:
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
x1 = np.array([1, 2, 3])
x2 = 2*x1
y = np.array([4, 6, 8])
all_ones = np.ones(x1.shape[0])
X = np.array([all_ones, x1, x2]).T

def solve_normal_equation(X, y):
    try:
        theta = np.linalg.solve(X.T @ X, X.T @ y)
        return theta
    except np.linalg.LinAlgError:
        print('The matrix is singular')
        print("X.T @ X = \n", X.T @ X)
        return None

theta = solve_normal_equation(X, y)
print("Theta using np.linalg.solve:", theta)

The matrix is singular
X.T @ X = 
 [[ 3.  6. 12.]
 [ 6. 14. 28.]
 [12. 28. 56.]]
Theta using np.linalg.solve: None


Comparing the two functions: 'np.linalg.inv' explicitly computes the inverse of $X^TX$, which then has to be multiplied by $X^Ty$. 'np.linalg.solve' instead factorizes the matrix once (LU decomposition with partial pivoting) and solves the system directly by forward and back substitution.
Both approaches have a time complexity of O(n^3), where n is the size of the matrix, but solving directly needs several times fewer floating-point operations than forming the full inverse, so it is faster in practice for large matrices.

Explicit inversion is also less accurate: forming the inverse and then multiplying by it introduces extra rounding error, which matters most when the matrix is ill-conditioned, i.e., nearly singular. 'np.linalg.solve' avoids that step and is therefore the better choice in general. It does not cure exact singularity, though. As the output above shows, when $X^TX$ is singular (here because x2 = 2*x1), 'np.linalg.solve' raises a LinAlgError just as 'np.linalg.inv' does, because the system has no unique solution.
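
The comparison can be checked with a small sketch. The helper name `try_both` and the full-rank design `X_ok` (the same data with the duplicated column dropped) are illustrative assumptions, not part of the instructor's notebook.

```python
import numpy as np

def try_both(A, b):
    """Solve A @ theta = b via inv and via solve.

    Returns (theta_inv, theta_solve); an entry is None if NumPy
    raised LinAlgError for that method.
    """
    try:
        theta_inv = np.linalg.inv(A) @ b
    except np.linalg.LinAlgError:
        theta_inv = None
    try:
        theta_solve = np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        theta_solve = None
    return theta_inv, theta_solve

x1 = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 8.0])

# Collinear design from the question: third column is 2 * second column.
X_bad = np.column_stack([np.ones(3), x1, 2 * x1])
inv_bad, solve_bad = try_both(X_bad.T @ X_bad, X_bad.T @ y)

# Hypothetical full-rank design: same data without the duplicated feature.
X_ok = np.column_stack([np.ones(3), x1])
inv_ok, solve_ok = try_both(X_ok.T @ X_ok, X_ok.T @ y)

print("collinear :", inv_bad, solve_bad)
print("full rank :", inv_ok, solve_ok)
```

On the collinear design neither function can return an answer, while on the full-rank design both agree on an intercept of 2 and a slope of 2. The practical difference between them is therefore cost and rounding error, not the ability to handle a singular matrix.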

Q3) Referring to the same notebook, explain why Sklearn's linear regression implementation is robust against multicollinearity. Dive deep into Sklearn's code and explain in depth the methodology used in sklearn's implementation. [1 Marks]

Ans: 
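Ordinary Least Squares (OLS) is the problem that Sklearn's LinearRegression solves, not a particular algorithm; its robustness comes from how it solves that problem. Instead of forming and inverting $X^TX$ as in the normal equation, for dense inputs LinearRegression.fit passes the data to scipy.linalg.lstsq, whose default LAPACK driver (gelsd) is based on the Singular Value Decomposition (SVD) of X. This is how it internally handles multicollinearity.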

The goal of linear regression is to find the coefficients θ that minimize the sum of squared differences between the predicted values and the actual values. This is typically formulated as the least squares problem.
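The normal-equation route to the ordinary least squares solution involves solving the system of equations: $$ X^TX\theta = X^Ty $$ for θ, where X is the feature matrix and y is the target variable. This system has a unique solution only when $X^TX$ is invertible, which is exactly what fails when the features are collinear.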



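The SVD method decomposes the feature matrix X into three matrices U, Σ, and V, such that $X = U\Sigma V^T$. The coefficients $\theta$ are then calculated as $\theta = V \cdot \text{diag}\left(\frac{1}{\sigma_i}\right) \cdot U^T \cdot y$, where $\sigma_i$ are the singular values obtained from the diagonal of $\Sigma$, and $\frac{1}{\sigma_i}$ is replaced by 0 for any singular value that is zero or negligibly small. This is the pseudoinverse solution: when the features are collinear and infinitely many $\theta$ fit the data equally well, it returns the one with the smallest norm.
The SVD approach is numerically stable because it works on X directly rather than on $X^TX$, whose condition number is the square of that of X. It therefore works even for ill-conditioned matrices, i.e. matrices whose ratio of largest to smallest singular value (the condition number) is very high.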
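
The formula can be traced by hand on the collinear example. This is a minimal NumPy sketch of the pseudoinverse idea, not Sklearn's actual code path; the centering step assumes an intercept is being fitted, and the relative cutoff `rel_cutoff = 1e-12` is an illustrative choice rather than the value used by any library.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
X = np.column_stack([x1, 2 * x1])   # two perfectly collinear features
y = np.array([4.0, 6.0, 8.0])

# Center the data so the intercept can be recovered afterwards.
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Thin SVD: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Invert only the singular values that are clearly non-zero.
rel_cutoff = 1e-12                  # illustrative assumption
keep = s > rel_cutoff * s.max()
s_inv = np.zeros_like(s)
s_inv[keep] = 1.0 / s[keep]

# theta = V @ diag(1/sigma_i) @ U^T @ y, with 1/sigma_i -> 0 for dropped values
coef = Vt.T @ (s_inv * (U.T @ yc))
intercept = y_mean - X_mean @ coef

print("singular values:", s)
print("coef:", coef, "intercept:", intercept)
```

Only one singular value survives the cutoff, so the fit uses a single effective direction in feature space, and the resulting coefficients are the minimum-norm ones, approximately (0.4, 0.8) with intercept 2.0.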

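The least-squares routine also handles the edge cases of singular or nearly singular matrices: singular values below a cutoff relative to the largest one are treated as zero, so the effective rank is reduced instead of dividing by a near-zero number. In addition, when fit_intercept=True (the default), Sklearn centers X and y before solving and recovers the intercept afterwards. Together these explain why the collinear example below returns the minimum-norm coefficients with a finite intercept rather than raising an error.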
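
The effective-rank behaviour can be observed with a library least-squares routine. The sketch below uses np.linalg.lstsq as a stand-in for the SciPy routine; the explicit `rcond=1e-12` cutoff and the variable names are assumptions made for illustration.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
X = np.column_stack([x1, 2 * x1])   # two perfectly collinear features
y = np.array([4.0, 6.0, 8.0])

# Center first, mimicking a fit with an intercept.
X_offset, y_offset = X.mean(axis=0), y.mean()

# lstsq returns the solution, residuals, effective rank, and singular values.
# Singular values below rcond * largest singular value are treated as zero.
coef, residuals, rank, singular_values = np.linalg.lstsq(
    X - X_offset, y - y_offset, rcond=1e-12
)
intercept = y_offset - X_offset @ coef

print("effective rank:", rank)
print("coef:", coef, "intercept:", intercept)
```

The reported rank is 1 even though X has two columns, which is the routine's way of signalling that the second feature carries no independent information.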

In [6]:
import numpy as np
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
x1 = np.array([1, 2, 3])
x2 = 2*x1
data = np.array([x1, x2]).T
y = np.array([4, 6, 8])
lr.fit(data, y)
lr.coef_, lr.intercept_

(array([0.4, 0.8]), 2.0)