#### Multiple Linear Regression

- Multiple linear regression is a statistical technique used to model the relationship between a dependent variable and two or more independent variables.
- It extends the concept of simple linear regression, which only considers one independent variable, to multiple predictors.
- In multiple linear regression, the goal is to find the best-fitting linear equation that describes the relationship between the dependent variable and the independent variables.
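
Written out for p predictors, the linear equation takes the following form. The symbols β_0 ... β_p (coefficients) and ε (error term) are notation assumed here to match the matrix form used in Step 2.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```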


#### Step 1:
### Data Preparation:
- The dataset contains two or more features (independent variables) and one target (dependent) variable.
- Features are denoted 'x', their weights/coefficients 'a' (written as β in the matrix form in Step 2), and the dependent variable 'y'.


In [24]:
import numpy as np

X_train = np.array([[5, 10],
                    [6, 8],
                    [3, 10],
                    [6, 7],
                    [5, 5],
                    [13, 5]])

Y_train = np.array([[85],
                    [70],
                    [90],
                    [75],
                    [40],
                    [89]])

X_test = np.array([[5, 9],
                   [2, 15]])

Y_test = np.array([[75],
                   [98]])

#### Step 2:

### Representation:
- The model is represented as
- Y = Xβ + ε
- where Y = dependent variable
- X = independent variables
- β = coefficients/weights
- ε = error term
- Add a column of ones to X for the y-intercept.

- There are two independent variables X(x1, x2), and one dependent variable Y in the above example data.
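
The intercept column described above can be sketched as follows. This is a minimal illustration, assuming the same six training rows as in Step 1; the helper name `add_intercept_column` is hypothetical and is not part of the notebook's class.

```python
import numpy as np


def add_intercept_column(X):
    """Return X with a leading column of ones (hypothetical helper)."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack((ones, X))


# Same training features as in Step 1: two independent variables per row.
X_train = np.array([[5, 10], [6, 8], [3, 10], [6, 7], [5, 5], [13, 5]])

X_design = add_intercept_column(X_train)
print(X_design.shape)  # (6, 3): one intercept column plus x1 and x2
print(X_design[0])     # first row, starting with the constant 1
```

With the extra column in place, the first entry of β plays the role of the y-intercept and the remaining entries are the weights of x1 and x2.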

#### Step 3:

### Estimating Co-Efficient (β)
- The goal is to estimate the coefficients β that minimize the sum of squared errors (SSE) between the actual values Y and the predicted values Xβ.
- SSE = (Y - Xβ)'(Y - Xβ)
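- Set the derivative with respect to β to zero: d(SSE)/dβ = -2X'(Y - Xβ) = 0
- This gives the normal equations X'Xβ = X'Y, i.e., β = (X'X)⁻¹ X'Y, provided X'X is invertible.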
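
As a hedged sketch of the estimation step, the normal equations can be solved directly and cross-checked against NumPy's least-squares routine. It assumes the Step 1 training data and an invertible X'X. Note that `np.linalg.solve` is swapped in for an explicit matrix inverse here; that is a choice made for this sketch, not necessarily the approach used elsewhere in the notebook.

```python
import numpy as np

X_train = np.array([[5, 10], [6, 8], [3, 10], [6, 7], [5, 5], [13, 5]], dtype=float)
Y_train = np.array([[85], [70], [90], [75], [40], [89]], dtype=float)

# Design matrix: leading column of ones for the intercept.
X = np.hstack((np.ones((X_train.shape[0], 1)), X_train))

# Normal equations: (X'X) beta = X'Y, solved without forming the inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y_train)

# Cross-check with NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y_train, rcond=None)

print(beta_normal.ravel())                   # intercept, weight of x1, weight of x2
print(np.allclose(beta_normal, beta_lstsq))  # both routes should agree
```

Solving the linear system tends to be more numerically stable than computing the inverse explicitly, which matters when features are strongly correlated.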


#### Step 4:
### Predict Target value
- Calculate the predicted values Ŷ = Xβ.

In [25]:
class MultipleLinearRegression:
    """Ordinary least squares fitted with the normal equations."""

    def __init__(self):
        self.weights = None

    def estimate_weights(self, X, Y):
        # Prepend a column of ones so the first weight is the y-intercept.
        ones = np.ones((X.shape[0], 1))
        X = np.concatenate((ones, X), axis=1)
        # Normal equations: β = (X'X)^-1 X'Y
        self.weights = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(Y)

    def predict(self, X):
        # Apply the same intercept column that was used during fitting.
        ones = np.ones((X.shape[0], 1))
        X = np.concatenate((ones, X), axis=1)
        return X.dot(self.weights)  # Ŷ = Xβ

    def evaluate(self, X, Y):
        # Return the predictions with their MSE and RMSE against Y.
        predictions = self.predict(X)
        mse = np.mean((Y - predictions) ** 2)
        rmse = np.sqrt(mse)
        return predictions, mse, rmse

#### Step 5:

### Evaluate Model

Evaluate the model by calculating the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared. The goal is to minimize the error: lower MSE and RMSE, and an R-squared closer to 1.
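
The three metrics can be sketched in a few lines. This is an illustrative helper, not part of the notebook's class: the name `regression_metrics` is hypothetical, and the `y_true`/`y_pred` values are toy numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for two equally shaped arrays (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    mse = ss_res / y_true.size
    rmse = np.sqrt(mse)
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2


# Toy values for illustration only (not taken from the dataset above).
y_true = [3, 5, 7]
y_pred = [2, 5, 8]

mse, rmse, r2 = regression_metrics(y_true, y_pred)
print(mse, rmse, r2)  # r2 is 1 - 2/8 = 0.75 for these toy values
```

R-squared is 1 for a perfect fit and can drop below 0 on held-out data when the predictions are worse than simply predicting the mean of the targets.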

In [26]:
# Train the model on the full X_train and Y_train arrays.
# Ideally the dataset would be split into train, validation, and test sets.
# Since this is a simple example, the whole dataset is used for training
# and new data points are used to test the model.

model = MultipleLinearRegression()
model.estimate_weights(X_train, Y_train)  # fit the model

prediction = model.predict(X_test)

mse = np.mean((Y_test - prediction) ** 2)
print('prediction : ', prediction)
print('MSE : ', mse)




prediction :  [[ 82.19444444]
 [124.27777778]]
MSE :  371.1408179012156


In [None]:
## we need to increase the dataset to redu
