# Multi-Variable Linear Regression

If you're embarking on a project that requires predicting outcomes based on several input variables, **multi-variable linear regression** is an important technique. This approach builds upon simple linear regression, which predicts a dependent variable using a single independent variable, by incorporating multiple independent variables.

## Introduction to Multi-Variable Linear Regression

**Multi-variable linear regression**, usually called **multiple linear regression**, is a statistical method that models the relationship between one dependent variable and two or more independent variables. The goal is to predict the dependent variable's value from the independent variables. (The term **multivariate linear regression** is sometimes used loosely as a synonym, but strictly it refers to models with several dependent variables.)

## How It Works

The formula for a multi-variable linear regression with `n` independent variables is:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + \varepsilon$$

- `y` is the observed value of the dependent variable. The model's prediction, written `ŷ`, is the same expression without `ε`.
- `x1, x2, ..., xn` are the independent variables.
- `b0` is the intercept: the predicted value of `y` when all x's are zero.
- `b1, b2, ..., bn` are the coefficients. Each `bi` is the expected change in `y` for a one-unit increase in `xi` while the other variables are held fixed.
- `ε` represents the error term, which is the part of `y` that isn't explained by the x variables.
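
To make the equation concrete, here is a minimal plain-Java sketch that evaluates the prediction $\hat{y} = b_0 + b_1 x_1 + \dots + b_n x_n$ (the formula without the unobservable error term) for a single observation. The class name, method name, and sample numbers are hypothetical, chosen only for illustration.

```java
/**
 * Hypothetical helper: evaluates y-hat = b0 + b1*x1 + ... + bn*xn
 * for one observation (the error term is not part of a prediction).
 */
public class LinearPrediction {

    // coefficients[0] is the intercept b0; coefficients[i] multiplies features[i - 1].
    static double predict(double[] coefficients, double[] features) {
        if (coefficients.length != features.length + 1) {
            throw new IllegalArgumentException(
                "need one coefficient per feature plus an intercept");
        }
        double yHat = coefficients[0];
        for (int i = 0; i < features.length; i++) {
            yHat += coefficients[i + 1] * features[i];
        }
        return yHat;
    }

    public static void main(String[] args) {
        double[] b = {1.0, 2.0, 3.0}; // hypothetical b0, b1, b2
        double[] x = {4.0, 5.0};      // hypothetical x1, x2
        System.out.println(predict(b, x)); // 1 + 2*4 + 3*5 = 24.0
    }
}
```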

## The Mathematics Behind It

To determine the optimal coefficients (`b0, b1, ..., bn`) that minimize prediction errors, we use a cost function, typically the **Mean Squared Error (MSE)**. The MSE calculates the average squared difference between the actual and predicted values.
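
As a small sketch, the MSE over $m$ training examples is $\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the model's prediction for example $i$. Some texts scale by $\frac{1}{2m}$ instead, which simplifies the derivative but does not change which coefficients minimize it. The plain-Java helper below computes the MSE from arrays of actual and predicted values; the class and method names are illustrative, not from any library.

```java
/**
 * Illustrative helper: mean squared error between observed and predicted values,
 * MSE = (1/m) * sum over i of (actual[i] - predicted[i])^2.
 */
public class MeanSquaredError {

    static double mse(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double residual = actual[i] - predicted[i];
            sum += residual * residual; // squared error for example i
        }
        return sum / actual.length;     // average over all m examples
    }

    public static void main(String[] args) {
        double[] actual = {3.0, 5.0, 7.0};
        double[] predicted = {2.5, 5.0, 8.0};
        // Residuals 0.5, 0.0, -1.0 give squares 0.25, 0.0, 1.0, so MSE = 1.25 / 3.
        System.out.println(mse(actual, predicted));
    }
}
```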

Optimization is usually achieved through **Gradient Descent** or by solving the **Normal Equation**. Gradient Descent adjusts the coefficients iteratively to minimize the MSE, while the Normal Equation computes the coefficients directly through matrix operations.

## Optimizing the Coefficients

In **Gradient Descent**, coefficients are updated using the rule:

$$b_i := b_i - \alpha \frac{\partial\,\mathrm{MSE}}{\partial b_i}$$

- `bi` is the coefficient's current value.
- `α` is the learning rate, controlling the step size towards the minimum MSE.
- `∂/∂bi MSE` is the partial derivative of the MSE with respect to `bi`.

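On each iteration, every coefficient `b0, ..., bn` is updated simultaneously, using gradients computed from the current values. The learning rate `α` is a hyperparameter that must be chosen with care: if it is too large, the updates overshoot and the MSE can oscillate or diverge; if it is too small, convergence is very slow.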
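
For the MSE, the partial derivative with respect to each coefficient is $\frac{\partial\,\mathrm{MSE}}{\partial b_j} = \frac{2}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\,x_{ij}$, with $x_{i0} = 1$ for the intercept. Below is a minimal batch gradient descent sketch in plain Java that applies the update rule with that gradient. The toy data, the learning rate (0.1), and the iteration count (5000) are assumptions chosen for illustration; real data often benefits from feature scaling and a tuned learning rate.

```java
import java.util.Arrays;

/**
 * Minimal sketch of batch gradient descent for multiple linear regression,
 * minimizing MSE = (1/m) * sum (yHat_i - y_i)^2.
 */
public class GradientDescentRegression {

    // x: m rows of n features (no intercept column); y: m targets.
    // Returns [b0, b1, ..., bn].
    static double[] fit(double[][] x, double[] y, double learningRate, int iterations) {
        int m = x.length;
        int n = x[0].length;
        double[] b = new double[n + 1]; // start from all-zero coefficients

        for (int iter = 0; iter < iterations; iter++) {
            double[] grad = new double[n + 1];
            for (int i = 0; i < m; i++) {
                // Prediction for example i with the current coefficients.
                double yHat = b[0];
                for (int j = 0; j < n; j++) {
                    yHat += b[j + 1] * x[i][j];
                }
                double error = yHat - y[i];
                // The intercept's gradient uses the implicit feature x_i0 = 1.
                grad[0] += 2.0 * error / m;
                for (int j = 0; j < n; j++) {
                    grad[j + 1] += 2.0 * error * x[i][j] / m;
                }
            }
            // Simultaneous update: all coefficients use the same gradient vector.
            for (int j = 0; j <= n; j++) {
                b[j] -= learningRate * grad[j];
            }
        }
        return b;
    }

    public static void main(String[] args) {
        // Noise-free toy data generated from y = 1 + 2*x1 + 3*x2.
        double[][] x = {{0, 0}, {1, 0}, {0, 1}, {1, 1}, {2, 1}};
        double[] y = {1, 3, 4, 6, 8};
        double[] b = fit(x, y, 0.1, 5000);
        System.out.println(Arrays.toString(b)); // close to [1.0, 2.0, 3.0]
    }
}
```

With a learning rate much larger than the stable range for this data (roughly above 0.43 here), the same loop would diverge instead of converging, which is the failure mode described above.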

With the **Normal Equation**, coefficients are calculated as:

$$\mathbf{b} = (X^{T}X)^{-1}X^{T}\mathbf{y}$$

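- `X` is the design matrix: one row per observation, with a leading column of ones (for the intercept `b0`) followed by one column per independent variable.
- `Xᵀ` is the transpose of `X`.
- `y` is the vector of observed values of the dependent variable.
- `b` is the resulting coefficient vector `(b0, b1, ..., bn)`.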

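It is a closed-form solution: it yields the exact least-squares coefficients in a single calculation, whereas iterative methods such as Gradient Descent approach the solution step by step. The Normal Equation is straightforward, but inverting the (n + 1) × (n + 1) matrix `XᵀX` costs on the order of n³ operations, so it becomes expensive when there are many features. It also breaks down when `XᵀX` is not invertible, for example when two features are perfectly collinear or there are fewer observations than coefficients. Numerical libraries therefore typically solve the least-squares problem with a QR or SVD decomposition, or a pseudo-inverse, rather than forming the inverse explicitly.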
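
The sketch below solves the Normal Equation in plain Java. Rather than forming $(X^TX)^{-1}$, it solves the linear system $(X^TX)\mathbf{b} = X^T\mathbf{y}$ by Gaussian elimination with partial pivoting, which gives the same coefficients with less work than a full inversion, and it reports a singular $X^TX$ instead of dividing by zero. The class name and the `1e-12` singularity tolerance are illustrative assumptions, not part of any library.

```java
import java.util.Arrays;

/**
 * Minimal sketch: solves (X^T X) b = X^T y by Gaussian elimination with
 * partial pivoting. A column of ones is prepended to X for the intercept.
 */
public class NormalEquationSolver {

    static double[] fit(double[][] x, double[] y) {
        int m = x.length;
        int p = x[0].length + 1; // number of coefficients, including the intercept

        // Accumulate the augmented system [X^T X | X^T y] row by row.
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < m; i++) {
            double[] row = new double[p];
            row[0] = 1.0; // intercept column
            System.arraycopy(x[i], 0, row, 1, p - 1);
            for (int r = 0; r < p; r++) {
                for (int c = 0; c < p; c++) {
                    a[r][c] += row[r] * row[c];
                }
                a[r][p] += row[r] * y[i];
            }
        }

        // Forward elimination, swapping in the largest pivot of each column.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("X^T X is singular (e.g. collinear features)");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }

        // Back substitution on the upper-triangular system.
        double[] b = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double sum = a[r][p];
            for (int c = r + 1; c < p; c++) {
                sum -= a[r][c] * b[c];
            }
            b[r] = sum / a[r][r];
        }
        return b;
    }

    public static void main(String[] args) {
        // Noise-free toy data generated from y = 1 + 2*x1 + 3*x2.
        double[][] x = {{0, 0}, {1, 0}, {0, 1}, {1, 1}, {2, 1}};
        double[] y = {1, 3, 4, 6, 8};
        System.out.println(Arrays.toString(fit(x, y))); // close to [1.0, 2.0, 3.0]
    }
}
```

On noise-free data generated from y = 1 + 2x1 + 3x2, it recovers coefficients close to [1.0, 2.0, 3.0] in one pass, with no learning rate or iteration count to tune. If two feature columns are identical, the elimination hits a zero pivot and the method throws rather than returning meaningless numbers.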

## Conclusion

**Multi-variable linear regression** is a versatile analytical tool for modeling how a dependent variable relates to several independent variables at once. Once the coefficients are fitted, it supports predictions and shows how each independent variable influences the dependent variable when the others are held fixed. The technique is widely applied in domains such as finance, healthcare, and education, providing a data-driven foundation for decision-making and analysis.


```
%jars /home/vishnuaa77/vscode/vishnu/lib/commons-math3-3.6.1.jar
%jars /home/vishnuaa77/vscode/vishnu/lib/jfreechart-1.5.4.jar
```

```java
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.ArrayRealVector;
import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

public class BasketballPerformanceRegression {

    public static void main(String[] args) {
        // Mock data for 5 basketball players.
        // Features per row: points, assists, rebounds, minutes
        double[][] xData = {
            {22, 7, 10, 35}, // Player 1
            {15, 12, 8, 30}, // Player 2
            {30, 4, 12, 40}, // Player 3
            {8,  10, 5, 25}, // Player 4
            {27, 3, 7,  38}  // Player 5
        };
        double[] yData = {90, 75, 85, 65, 88}; // Performance score for each player

        double[] coefficients = calculateCoefficients(xData, yData);
        // Printed order: [b0 (intercept), points, assists, rebounds, minutes]
        System.out.println("Coefficients: " + java.util.Arrays.toString(coefficients));
    }

    // Fits y = b0 + b1*x1 + ... + bm*xm using the Normal Equation b = (X^T X)^-1 X^T y.
    public static double[] calculateCoefficients(double[][] xData, double[] yData) {
        int n = xData.length;    // number of observations (players)
        int m = xData[0].length; // number of features
        RealMatrix X = new Array2DRowRealMatrix(n, m + 1); // +1 column for the bias (intercept) term
        RealVector Y = new ArrayRealVector(yData, false);  // wraps yData without copying it

        // Build the design matrix: a leading column of ones, then the raw features.
        for (int i = 0; i < n; i++) {
            X.setEntry(i, 0, 1);  // Bias term
            for (int j = 0; j < m; j++) {
                X.setEntry(i, j + 1, xData[i][j]);
            }
        }

        RealMatrix Xt = X.transpose();
        RealMatrix XtX = Xt.multiply(X);
        // getInverse() throws SingularMatrixException if X^T X is not invertible.
        RealMatrix XtXInverse = new LUDecomposition(XtX).getSolver().getInverse();
        RealVector XtY = Xt.operate(Y);
        RealVector B = XtXInverse.operate(XtY); // coefficient vector [b0, b1, ..., bm]

        return B.toArray();
    }
}

BasketballPerformanceRegression.main(null);
```


```
Coefficients: [-500.9701492539607, -18.805970149260247, -0.5970149253726049, -0.8208955223876728, 29.059701492544264]
```
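
Because the mock data has five players and the model has five parameters (an intercept plus four weights), `X` is square here, and the Normal Equation returns the unique coefficients that pass through every data point exactly. No degrees of freedom are left for error, so coefficients such as the large negative intercept or the negative weight on points should not be read as real effects; a meaningful fit needs more observations than coefficients. The plain-Java check below plugs the printed coefficients back into the regression equation and prints each residual; the class and method names are illustrative.

```java
/**
 * Illustrative check: residuals (actual - predicted) for the five mock players,
 * using the coefficients printed by BasketballPerformanceRegression.
 */
public class ResidualCheck {

    static double[] residuals(double[][] x, double[] y, double[] b) {
        double[] r = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            double yHat = b[0]; // intercept
            for (int j = 0; j < x[i].length; j++) {
                yHat += b[j + 1] * x[i][j];
            }
            r[i] = y[i] - yHat;
        }
        return r;
    }

    public static void main(String[] args) {
        // Same mock data: points, assists, rebounds, minutes.
        double[][] x = {
            {22, 7, 10, 35},
            {15, 12, 8, 30},
            {30, 4, 12, 40},
            {8, 10, 5, 25},
            {27, 3, 7, 38}
        };
        double[] y = {90, 75, 85, 65, 88};
        // Coefficients copied from the output above: [b0, points, assists, rebounds, minutes].
        double[] b = {-500.9701492539607, -18.805970149260247, -0.5970149253726049,
                      -0.8208955223876728, 29.059701492544264};
        double[] r = residuals(x, y, b);
        for (int i = 0; i < r.length; i++) {
            System.out.printf("Player %d residual: %.3e%n", i + 1, r[i]);
        }
    }
}
```

Every residual comes out at floating-point rounding level rather than a genuine prediction error, confirming that the fit simply interpolates the five data points.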
