1. **Adding the Bias Term to X_train:**

   The bias term (intercept) is added to the training data `X_train` by inserting a column of 1's at the beginning of the matrix.

   If `X_train` is an **m×n** matrix (where m is the number of samples and n is the number of features), then after adding the bias term it becomes an **m×(n+1)** matrix whose first column is all 1's.
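A minimal sketch of this step, using a small made-up feature matrix (the values are illustrative only and are not taken from the notebook's data):

```python
import numpy as np

# Hypothetical 3x2 feature matrix: m = 3 samples, n = 2 features
X_train = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])

# Insert the value 1 before column index 0 (axis=1 means "along columns")
X_with_bias = np.insert(X_train, 0, 1, axis=1)

print(X_train.shape)      # (3, 2)
print(X_with_bias.shape)  # (3, 3)
print(X_with_bias[:, 0])  # [1. 1. 1.]
```

`np.insert` returns a new array, so the original `X_train` is left unchanged.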

2. **Calculating the Coefficients (betas):**

   The code uses the Normal Equation to compute the coefficients (including the intercept). The Normal Equation is given by:

   $$
   \beta = (X^T X)^{-1} X^T y
   $$

   - $X$ is the training data matrix after adding the bias term.
   - $X^T$ is the transpose of $X$.
   - $X^T X$ is an **(n+1)×(n+1)** matrix.
   - $(X^T X)^{-1}$ is the inverse of that matrix.
   - $X^T y$ is an **(n+1)×1** vector.
   - The result, $\beta$, is an **(n+1)×1** vector containing the intercept $\beta_0$ and the coefficients $\beta_1, \beta_2, \dots, \beta_n$.
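The shapes listed above can be checked numerically. The sketch below reuses the example data from the usage cell of this notebook and, as a sanity check that is not part of the original code, compares the Normal Equation result with `np.linalg.lstsq`. Because `y` is a 1-D array here, NumPy reports the (n+1)×1 vectors with shape `(n+1,)`.

```python
import numpy as np

# Example data from the usage cell (m = 5 samples, n = 2 features)
X = np.array([[1, 2], [2, 3], [3, 5], [4, 6], [5, 8]], dtype=float)
y = np.array([4, 6, 7, 10, 11], dtype=float)

# Add the bias column: X is now m x (n+1) = 5 x 3
X = np.insert(X, 0, 1, axis=1)

XtX = X.T @ X                    # (n+1) x (n+1)
Xty = X.T @ y                    # (n+1) entries
beta = np.linalg.inv(XtX) @ Xty  # Normal Equation

print(XtX.shape)   # (3, 3)
print(Xty.shape)   # (3,)
print(beta.shape)  # (3,)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, beta_lstsq))  # True
```

In practice, `np.linalg.lstsq` or `np.linalg.solve(XtX, Xty)` is usually preferred over forming the inverse explicitly, since both avoid the extra rounding error that an explicit inverse can introduce.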

3. **Extracting the Intercept and Coefficients:**

   - `self.intercept_ = betas[0]`: This assigns the first element of $\beta$ (i.e., $\beta_0$) to the intercept.
   - `self.coef_ = betas[1:]`: This assigns the remaining elements of $\beta$ (i.e., $\beta_1, \beta_2, \dots, \beta_n$) to the coefficients.
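As a small illustration of the slicing, assuming a hypothetical `betas` vector for a model with two features (the numbers are made up):

```python
import numpy as np

# Hypothetical solution vector: [beta_0, beta_1, beta_2]
betas = np.array([2.6, 3.8, -1.5])

intercept = betas[0]  # scalar: beta_0
coef = betas[1:]      # 1-D array: [beta_1, beta_2]

print(intercept)   # 2.6
print(coef.shape)  # (2,)
```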

4. **Making Predictions (predict method):**

   For making predictions on new data (`X_test`), the linear regression model uses the equation:

   $$
   \hat{y} = X_{\text{test}} \cdot \beta_{\text{coefficients}} + \beta_{\text{intercept}}
   $$

   - $X_{\text{test}}$ is the test data matrix (without the bias term).
   - $\beta_{\text{coefficients}}$ is the vector of coefficients (excluding the intercept).
   - $\beta_{\text{intercept}}$ is added to each prediction.

   This equation calculates the predicted values for each row in `X_test`.

   **Expanded Form:**

   For a single data point $x_i$ in `X_test`, the predicted value $\hat{y}_i$ is calculated as:

   $$
   \hat{y}_i = \beta_0 + \beta_1 \cdot x_{i1} + \beta_2 \cdot x_{i2} + \dots + \beta_n \cdot x_{in}
   $$

   Where:
   - $\beta_0$ is the intercept.
   - $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients.
   - $x_{i1}, x_{i2}, \dots, x_{in}$ are the feature values for the i-th data point.
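The matrix form and the expanded form give the same predictions. The sketch below shows this with hypothetical fitted parameters (`intercept` and `coef` are made-up values, not the ones the notebook's model produces):

```python
import numpy as np

# Hypothetical fitted parameters for a model with n = 2 features
intercept = 0.5
coef = np.array([2.0, -1.0])

X_test = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

# Matrix form: X_test . coef + intercept (the intercept is broadcast to every row)
y_pred = X_test @ coef + intercept

# Expanded form: beta_0 + beta_1 * x_i1 + beta_2 * x_i2, one row at a time
y_loop = np.array([
    intercept + sum(b * x for b, x in zip(coef, row))
    for row in X_test
])

print(y_pred)                       # [0.5 2.5]
print(np.allclose(y_pred, y_loop))  # True
```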

In [12]:
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd

In [13]:
class MLR:

    def __init__(self):
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X_train, y_train):
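        # Prepend a column of 1's so that the first beta is the intercept
        X_train = np.insert(X_train, 0, 1, axis=1)

        # Calculate the coefficients with the Normal Equation: (X^T X)^{-1} X^T y
        betas = np.linalg.inv(np.dot(X_train.T, X_train)).dot(X_train.T).dot(y_train)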
        self.intercept_ = betas[0]
        self.coef_ = betas[1:]

    def predict(self, X_test):
        y_pred = np.dot(X_test, self.coef_) + self.intercept_
        return y_pred

    def visualize_regression_plane(self, X_train, y_train):
        if X_train.shape[1] != 2:
            raise ValueError("This visualization only works when there are exactly 2 features.")

        # Create a DataFrame for easier handling in Plotly
        df = pd.DataFrame({
            'feature1': X_train[:, 0],
            'feature2': X_train[:, 1],
            'target': y_train
        })

        # Create the grid for the regression plane
        x = np.linspace(X_train[:, 0].min(), X_train[:, 0].max(), 10)
        y = np.linspace(X_train[:, 1].min(), X_train[:, 1].max(), 10)
        xGrid, yGrid = np.meshgrid(x, y)
        final = np.vstack((xGrid.ravel(), yGrid.ravel())).T

        # Predict the z values on the grid
        z_final = self.predict(final).reshape(10, 10)

        # Plot the scatter and the regression plane
        fig = px.scatter_3d(df, x='feature1', y='feature2', z='target')
        fig.add_trace(go.Surface(x=x, y=y, z=z_final, colorscale='Viridis', opacity=0.5))

        # Show the figure
        fig.show()

In [14]:
# Example usage with non-singular data:
X_train = np.array([[1, 2], [2, 3], [3, 5], [4, 6], [5, 8]])  # Features are not linearly dependent
y_train = np.array([4, 6, 7, 10, 11])

mlr = MLR()
mlr.fit(X_train, y_train)
mlr.visualize_regression_plane(X_train, y_train)
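The example above relies on the features not being linearly dependent, because the Normal Equation needs X^T X to be invertible. The following sketch shows one possible way to detect and handle the dependent case. It is an assumption, not part of the `MLR` class: it uses made-up data, `np.linalg.matrix_rank` to detect the problem, and the pseudo-inverse `np.linalg.pinv` in place of `np.linalg.inv`.

```python
import numpy as np

# Hypothetical data: the second feature is exactly 2x the first,
# so the columns of the bias-augmented matrix are linearly dependent.
X = np.array([[1, 2], [2, 4], [3, 6], [4, 8]], dtype=float)
y = np.array([3, 5, 7, 9], dtype=float)  # y = 1 + 2 * (first feature)

Xb = np.insert(X, 0, 1, axis=1)

# A rank below the number of columns means X^T X is singular
rank = np.linalg.matrix_rank(Xb)
print(rank, Xb.shape[1])  # 2 3

# The pseudo-inverse still yields a least-squares solution (the minimum-norm one)
betas = np.linalg.pinv(Xb) @ y
y_fit = Xb @ betas
print(np.allclose(y_fit, y))  # True
```

With dependent features the individual coefficients are not unique, so only the fitted values, not the betas themselves, should be interpreted.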