MULTIPLE LINEAR REGRESSION: y = intercept + coef1*x1 + coef2*x2 + coef3*x3 + error term
- Linear regression function parameters:
    1. X: np.ndarray -> [[1,2,3],[4,5,6]] "two-dimensional array, one row per sample"
    2. y: np.ndarray -> [4,5] "one-dimensional array, one value per row of X"
    * returns -> [coef1, coef2, coef3] and intercept
- Prediction function parameters:
    1. X: np.ndarray -> [[1,2,3],[4,5,6]] "two-dimensional array"
    2. coefficients: np.ndarray "one-dimensional array"
    3. intercept: float
    * returns -> predictions: np.ndarray "one-dimensional array"
- R2 function parameters:
    1. y_true: np.ndarray -> [1,2,3]
    2. y_pred: np.ndarray -> [1,2,3]
    * returns -> R2: float

In [58]:
import numpy as np

def multiple_linear_regression(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:

    # Add a column of ones to X for the intercept term
    X_extended = np.c_[np.ones(X.shape[0]), X]

    # Calculate the parameters (b0, b1, ..., bn) using the normal equation:
    #   b = inverse(X_extended.T @ X_extended) @ X_extended.T @ y
    # X_extended.T @ X_extended is the matrix product of the transpose with X_extended.
    # np.linalg.inv computes the inverse of that square matrix.
    # Multiplying the inverse by X_extended.T @ y gives the parameters.
    coefficients = np.linalg.inv(X_extended.T @ X_extended) @ X_extended.T @ y

    # Extract the intercept and coefficients
    intercept = coefficients[0]
    coefficients = coefficients[1:]

    return coefficients, intercept

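# Example usage:
# Replace X_data and y_data with your own data.
# The feature columns must not be linearly dependent on each other or on the
# column of ones (for example x2 = x1 + 1), otherwise X.T @ X is singular.
#X_data = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
#y_data = np.array([5, 8, 10, 14])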

#coefficients, intercept = multiple_linear_regression(X_data, y_data)

#print("Coefficients:", coefficients)
#print("Intercept:", intercept)


Formula explanation

Adding a column of ones (intercept):

np.c_[np.ones(X.shape[0]), X]: This builds a new matrix, X_extended, by placing a column of ones in front of the feature matrix X. The parameter that multiplies this column is the intercept, so the intercept is estimated together with the slopes.

Normal equation:

np.linalg.inv(X_extended.T @ X_extended): X_extended.T @ X_extended is the matrix product of the transpose of X_extended with X_extended itself (the @ operator performs matrix multiplication). np.linalg.inv computes the inverse of that square matrix, which the normal equation requires. The inverse exists only when the columns of X_extended are linearly independent.

Calculating coefficients:

@ X_extended.T @ y: The inverse is then multiplied by the transpose of X_extended and by the target vector y.

The result is the full parameter vector for the linear regression model: the intercept (b0) followed by the slope for each feature (b1, b2, ..., bn).

In summary, the code solves the normal equation, b = (X^T X)^-1 X^T y with X standing for X_extended, to fit a multiple linear regression model. The normal equation is a closed-form solution: it directly gives the coefficients that minimize the sum of squared differences between the predicted and actual values.
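
The explicit inverse follows the formula literally, but the same parameters can usually be obtained without forming the inverse. The sketch below compares three routes on synthetic, noise-free data invented for this illustration (y = 4 + 2*x1 - 3*x2); the variable names are assumptions and not part of the notebook.

```python
import numpy as np

# Synthetic data, made up for illustration: y = 4 + 2*x1 - 3*x2 with no noise
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 4.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Design matrix with a leading column of ones for the intercept
X_extended = np.c_[np.ones(X.shape[0]), X]

# Route 1: normal equation with an explicit inverse
beta_inv = np.linalg.inv(X_extended.T @ X_extended) @ X_extended.T @ y

# Route 2: solve the same linear system without forming the inverse
beta_solve = np.linalg.solve(X_extended.T @ X_extended, X_extended.T @ y)

# Route 3: least-squares solver applied to X_extended directly
beta_lstsq, *_ = np.linalg.lstsq(X_extended, y, rcond=None)

print(beta_inv)  # approximately [4, 2, -3]: intercept first, then the slopes
print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))
```

On well-conditioned data all three agree. When features are nearly collinear, np.linalg.lstsq tends to be the more robust choice because it never inverts X^T X.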

In [59]:
# Prediction function: returns intercept + X @ coefficients for every row of X

def predict_multiple_linear_regression(X: np.ndarray, coefficients: np.ndarray, intercept: float) -> np.ndarray:
    # Add a column of ones to X for the intercept term
    X_extended = np.c_[np.ones(X.shape[0]), X]

    # Calculate predictions
    y_pred = X_extended @ np.hstack((intercept, coefficients))

    return y_pred

# Example usage:
# Replace X_data, coefficients, and intercept with your own data
#X_data = np.array([[1, 2], [2, 3], [3, 4]])
#coefficients = np.array([2, 3])
#intercept = 1

#predictions = predict_multiple_linear_regression(X_data, coefficients, intercept)

#print("Input features (X):")
#print(X_data)
#print("\nPredicted values:")
#print(predictions)
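
A quick way to check the fit and prediction functions together is a round trip: predict from known parameters, then refit and confirm the parameters come back. The sketch below restates both functions in compact form so it runs on its own; X_fit is made-up data chosen so that its columns are not collinear.

```python
import numpy as np

def multiple_linear_regression(X, y):
    # Same normal-equation fit as in the notebook, in compact form
    X_extended = np.c_[np.ones(X.shape[0]), X]
    beta = np.linalg.inv(X_extended.T @ X_extended) @ X_extended.T @ y
    return beta[1:], beta[0]

def predict_multiple_linear_regression(X, coefficients, intercept):
    X_extended = np.c_[np.ones(X.shape[0]), X]
    return X_extended @ np.hstack((intercept, coefficients))

# Hand check with the values from the commented example
X_data = np.array([[1, 2], [2, 3], [3, 4]])
coefficients = np.array([2, 3])
intercept = 1
predictions = predict_multiple_linear_regression(X_data, coefficients, intercept)
# Row by row: 1 + 2*1 + 3*2 = 9, 1 + 2*2 + 3*3 = 14, 1 + 2*3 + 3*4 = 19
print(predictions)

# Round trip on made-up, non-collinear features
X_fit = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 7.0]])
y_fit = predict_multiple_linear_regression(X_fit, coefficients, intercept)
fitted_coefficients, fitted_intercept = multiple_linear_regression(X_fit, y_fit)
print(np.allclose(fitted_coefficients, coefficients), np.isclose(fitted_intercept, intercept))
```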


In [60]:
# Compute R2 function
# R2 (coefficient of determination) = 1 - (SS_res / SS_tot)
def compute_r2(y_true, y_pred):
    residual = y_true - y_pred
    mean_y_true = np.mean(y_true)
    # ** is the power operator
    # SS_tot: total sum of squares around the mean of y_true
    ss_total = np.sum((y_true - mean_y_true) ** 2)
    # SS_res: sum of squared residuals (the unexplained part, not the explained variance)
    ss_residual = np.sum(residual ** 2)

    r2 = 1 - (ss_residual / ss_total)

    return r2
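
A few hand-checkable cases make the R2 formula concrete. The sketch below restates the function compactly so it runs on its own; the arrays are invented for illustration.

```python
import numpy as np

def compute_r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)           # sum of squared residuals
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])  # mean 2.5, SS_tot = 5.0

# Perfect predictions: SS_res = 0, so R2 = 1
r2_perfect = compute_r2(y_true, y_true)

# Always predicting the mean: SS_res = SS_tot, so R2 = 0
r2_mean = compute_r2(y_true, np.full(4, y_true.mean()))

# Small errors: SS_res = 0.01 + 0.01 + 0.04 + 0.04 = 0.1, so R2 = 1 - 0.1/5 = 0.98
r2_close = compute_r2(y_true, np.array([1.1, 1.9, 3.2, 3.8]))

# Reversed predictions: SS_res = 9 + 1 + 1 + 9 = 20, so R2 = 1 - 20/5 = -3
r2_reversed = compute_r2(y_true, y_true[::-1])

print(r2_perfect, r2_mean, r2_close, r2_reversed)
```

The last case shows that R2 can be negative when the predictions are worse than simply predicting the mean.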

REAL DATA APPLICATIONS

In [61]:

# Load the CSV with np.loadtxt(); dtype=str keeps every cell, including the header row, as text
Data = np.loadtxt("Startups.csv",
                 delimiter=",", dtype=str)

#display(Data)
#print(Data.ndim)
#print(Data.shape)
#print(Data.size)
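
If only numeric columns are needed, np.loadtxt can skip the header and select columns while loading, which yields a float array directly. The sketch below uses an in-memory CSV with a hypothetical layout (three numeric features, one text column, one target); the column names and values are assumptions and do not come from Startups.csv.

```python
import io
import numpy as np

# Hypothetical stand-in for a CSV file: header row, three numeric columns,
# one text column, then the target column
csv_text = """feature_a,feature_b,feature_c,category,target
1.0,2.0,3.0,north,10.0
4.0,5.0,6.0,south,20.0
7.0,8.0,9.0,north,30.0
"""

# skiprows=1 drops the header; usecols keeps only the numeric columns
numeric = np.loadtxt(io.StringIO(csv_text), delimiter=",",
                     skiprows=1, usecols=(0, 1, 2, 4))

X = numeric[:, :3]  # feature columns
y = numeric[:, 3]   # target column (fourth of the selected columns)

print(numeric.shape, numeric.dtype)
print(X)
print(y)
```

Loaded this way, the later astype conversions and the manual header removal would not be needed.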

In [62]:
# Process data: drop the header row, then shuffle the rows (a shuffled copy is returned)
Data_without_first_row = np.delete(Data, 0, axis=0)
shuffled_data = np.random.permutation(Data_without_first_row)
#print("Original 2D array:")
#print(Data)
#print("Shuffled copy:")
#print(shuffled_data)

In [63]:
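# Training and testing set sizes
# Note: int() truncates, so the two sizes can add up to one less than the row count.
# The test slice taken later (X[train_size:]) holds every row after the training rows.
train_size=int(0.75*np.size(shuffled_data,0))
test_size=int(0.25*np.size(shuffled_data,0))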
print("Training set size : "+ str(train_size))
print("Testing set size : "+str(test_size))

Training set size : 37
Testing set size : 12


In [64]:
# SPLIT FEATURES AND TARGET
# Select the feature columns (0, 1, 2) and the target column (4) by index
X=shuffled_data[:,[0,1,2]]
y=shuffled_data[:,4]

In [65]:
# SPLIT TRAIN/TEST (make sure X is a 2D array and y is a 1D array)
# np.float64 is used because the np.float_ alias was removed in NumPy 2.0
# Training set split
X_train=X[0:train_size,:]
X_train = X_train.astype(np.float64)
y_train=y[0:train_size]
y_train = y_train.flatten()
y_train = y_train.astype(np.float64)

# Testing set split
X_test=X[train_size:,:]
X_test = X_test.astype(np.float64)
Y_test=y[train_size:]
Y_test = Y_test.flatten()
Y_test = Y_test.astype(np.float64)
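
The shuffle and split steps can be made repeatable and self-consistent with a seeded generator and a test size computed as the remainder. The sketch below uses a placeholder table of 50 rows and 5 columns chosen only for illustration; the seed and the names rng, data, and shuffled are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the shuffle is repeatable

# Placeholder numeric table: 50 rows, 5 columns (not the real dataset)
data = np.arange(50 * 5, dtype=np.float64).reshape(50, 5)
shuffled = rng.permutation(data)  # shuffles whole rows, returns a copy

n_rows = shuffled.shape[0]
train_size = int(0.75 * n_rows)   # truncates 37.5 to 37
test_size = n_rows - train_size   # remainder, so no row is left uncounted

X = shuffled[:, [0, 1, 2]]  # feature columns
y = shuffled[:, 4]          # target column

X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]

print(train_size, test_size)
print(X_train.shape, X_test.shape)
```

Computing test_size as the remainder keeps the reported size equal to the number of rows that the slice X[train_size:] actually contains.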

In [66]:
# Fit the coefficients and intercept on the training set
coefficients, intercept = multiple_linear_regression(X_train, y_train)
# Predict on the training set
predictions = predict_multiple_linear_regression(X_train, coefficients, intercept)
# Compute R2 on the training set (in-sample fit, not test performance)
r2_train = compute_r2(y_train, predictions)

print("Coefficients:", coefficients)
print("Intercept:", intercept)
print("R-squared :", r2_train)


Coefficients: [ 0.77173952 -0.01549975  0.02528518]
Intercept: 51898.258181425525
R-squared : 0.9569100181557307
