# Multiple Linear Regression

Multiple linear regression is used when the dependent variable (y) depends on multiple independent variables (x1, x2, ..., xn).

Equation: 

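y = m1x1 + m2x2 + m3x3 + ... + mnxn + c

Here m1, ..., mn are the coefficients of the independent variables x1, ..., xn, and c is the constant (intercept).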
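To make the equation concrete, here is a minimal sketch that evaluates it for one observation. The coefficient values, the constant, and the input values are made up purely for illustration.

```python
import numpy as np

# Hypothetical coefficients m1..m3 and constant c (illustrative values only).
m = np.array([2.0, -1.0, 0.5])
c = 10.0

# One observation with three independent variables x1..x3.
x = np.array([3.0, 4.0, 8.0])

# y = m1*x1 + m2*x2 + m3*x3 + c
y = float(np.dot(m, x) + c)
print(y)  # 16.0
```

The dot product computes the sum of the m_i * x_i terms in one step, which is also how a fitted model produces a prediction for each row.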

When encoding categorical data into dummy variables, do not include all of the dummy variables in the model: the omitted one is absorbed by the constant. In general, when a categorical variable produces n dummy variables, only n-1 of them go into the equation, because the last one can be derived from the others. Including all n alongside the constant makes the columns perfectly collinear, a problem known as the dummy variable trap.
Example: 
For the dummy variables D1 and D2, D2 = 1 - D1, so if D1 is 0 then D2 is 1.
In general, Dn = 1 - max(D1, D2, D3, ..., Dn-1).
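The relationship between the dummy variables can be checked on a small example. The sketch below uses a made-up categorical column (the category names are hypothetical) and pandas' `get_dummies` with `drop_first=True` to keep n-1 columns, then recovers the dropped column from the remaining ones.

```python
import pandas as pd

# Hypothetical categorical column with three categories.
states = pd.Series(["New York", "California", "Florida", "California", "New York"])

# Full one-hot encoding: one dummy column per category (n columns).
full = pd.get_dummies(states, dtype=int)

# Drop the first category to keep only n-1 dummy columns.
reduced = pd.get_dummies(states, drop_first=True, dtype=int)

# The dropped column equals 1 - max of the remaining dummy columns.
dropped_name = full.columns[0]
recovered = 1 - reduced.max(axis=1)

print(reduced.shape[1] == full.shape[1] - 1)
print((recovered == full[dropped_name]).all())
```

Both checks should print `True`: one column fewer is kept, and the dropped column is fully determined by the others.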

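P-Value: The P-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. After choosing a significance level (commonly 0.05), we reject the null hypothesis if the P-value is below that level; otherwise we fail to reject it. In regression, the null hypothesis for each independent variable is that its coefficient is zero.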
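As a rough illustration of where coefficient P-values come from, the sketch below fits ordinary least squares on synthetic data and computes a two-sided t-test P-value for each coefficient. The data, the variable names, and the choice of NumPy plus `scipy.stats` are assumptions made for this example; a statistics package would normally report these values directly.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): x1 truly drives y, x2 is unrelated noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + rng.normal(size=n)

# Design matrix with a leading column of ones for the constant term.
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimate of the coefficients.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard errors from the residual variance and (X^T X)^-1.
residuals = y - X @ beta
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Two-sided P-values for the null hypothesis "coefficient = 0".
t_stats = beta / se
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
print(p_values)
```

The P-value for x1 should be far below 0.05, so its null hypothesis is rejected. The value for x2 depends on the random draw and will usually, but not always, be above the significance level.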

It is advisable to use a limited number of independent variables to predict the dependent variable. In general, the more independent variables a model has, the more complex and harder to interpret it becomes, and the more likely it is to fit noise in the training data and predict poorly on new data.

There are five ways to pick a model: 
1) All in (using all independent variables)
2) Backward Elimination
3) Forward Selection
4) Bidirectional Elimination
5) Score Comparison (building all possible models and choosing the best one; N independent variables give 2^N - 1 possible combinations)
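The 2^N - 1 count comes from taking every non-empty subset of the independent variables. A small sketch with placeholder variable names shows the enumeration:

```python
from itertools import combinations


def all_feature_subsets(features):
    """Return every non-empty subset of the given features."""
    subsets = []
    for size in range(1, len(features) + 1):
        subsets.extend(combinations(features, size))
    return subsets


# Placeholder names for N = 4 independent variables.
features = ["x1", "x2", "x3", "x4"]
subsets = all_feature_subsets(features)
print(len(subsets))  # 15, i.e. 2**4 - 1
```

Because the count doubles with every added variable, fitting all possible models is only practical for a small number of independent variables.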

There are three ways to eliminate unnecessary independent variables:
1) Backward Elimination:
    1) Select a significance level for the model
    2) Fit the model with all possible independent variables
    3) Consider the variable with the highest P-value. If P > Significance Level, go to Step 4, else Finish
    4) Remove that variable
    5) Fit the model without the variable, then go back to Step 3
    
2) Forward Selection:
    1) Select a significance level for the model
    2) Fit all simple regression models and select the one with the lowest P-value
    3) Keep this variable and fit all possible models with one extra predictor variable
    4) Consider the predictor with the lowest P-value. If P < Significance Level, go to Step 3, else Finish
    5) Keep the previous model (the one without the last, non-significant predictor)

3) Bidirectional Elimination:
    1) Select significance levels for entering and for staying in the model: SEnter and SStay
    2) Perform the next step of forward selection (a new variable must have P < SEnter to enter)
    3) Perform all steps of backward elimination (old variables must have P < SStay to stay)
    4) If no new variable can enter and no old variable can leave, Finish
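The backward elimination and forward selection steps above can be sketched in code. This is a minimal illustration on synthetic data, not a definitive implementation: the helper `ols_p_values`, the function names, and the data are all assumptions made for this example, and P-values are computed by hand with NumPy and `scipy.stats`. Bidirectional elimination would alternate one forward step with a full backward pass and is not shown.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Fit OLS with a constant term; return P-values of the non-constant coefficients."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    dof = n - A.shape[1]
    sigma2 = residuals @ residuals / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    p = 2 * stats.t.sf(np.abs(beta / se), dof)
    return p[1:]  # drop the P-value of the constant


def backward_elimination(X, y, significance_level=0.05):
    """Return the column indices that survive backward elimination."""
    selected = list(range(X.shape[1]))
    while selected:
        p = ols_p_values(X[:, selected], y)
        worst = int(np.argmax(p))
        if p[worst] <= significance_level:
            break  # every remaining predictor is significant: finish
        del selected[worst]  # remove the predictor, then refit on the next pass
    return selected


def forward_selection(X, y, significance_level=0.05):
    """Return the column indices chosen by forward selection, in order of entry."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining:
        # P-value of each candidate when added to the current model.
        candidate_p = [ols_p_values(X[:, selected + [j]], y)[-1] for j in remaining]
        best = int(np.argmin(candidate_p))
        if candidate_p[best] >= significance_level:
            break  # no candidate is significant: keep the previous model
        selected.append(remaining.pop(best))
    return selected


# Synthetic data (assumption): only columns 0 and 2 influence y.
rng = np.random.default_rng(42)
n = 300
X_demo = rng.normal(size=(n, 4))
y_demo = 5.0 * X_demo[:, 0] - 3.0 * X_demo[:, 2] + rng.normal(size=n)

backward_result = backward_elimination(X_demo, y_demo)
forward_result = forward_selection(X_demo, y_demo)
print(backward_result, forward_result)
```

Both procedures should retain columns 0 and 2. Whether a noise column also slips in depends on the random draw, since a 0.05 significance level still admits occasional false positives.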

There is no need to apply feature scaling in multiple linear regression: each coefficient adjusts to the scale of its variable, so rescaling a feature changes its coefficient but not the model's predictions.
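This can be checked numerically. The sketch below, using synthetic data and a hand-written least-squares helper (both assumptions for this example), fits the same model on raw and on standardized features and compares the results.

```python
import numpy as np

# Synthetic data (assumption): two features on very different scales.
rng = np.random.default_rng(1)
n = 100
X_raw = np.column_stack([rng.normal(size=n), 1000.0 * rng.normal(size=n)])
y_raw = 2.0 * X_raw[:, 0] + 0.003 * X_raw[:, 1] + rng.normal(scale=0.1, size=n)


def fit_predict(X, y):
    """Ordinary least squares with a constant term; returns coefficients and fitted values."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, A @ beta


# Standardize each feature to zero mean and unit variance.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

beta_raw, pred_raw = fit_predict(X_raw, y_raw)
beta_scaled, pred_scaled = fit_predict(X_scaled, y_raw)

# Predictions agree; only the coefficients change to absorb the scaling.
print(np.allclose(pred_raw, pred_scaled))
print(np.allclose(beta_scaled[1:], beta_raw[1:] * X_raw.std(axis=0)))
```

Both checks should print `True`: the fitted values are identical, and each scaled coefficient is the raw coefficient multiplied by that feature's standard deviation.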

## Importing the libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plot

## Importing the dataset

In [None]:
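dataset = pd.read_csv("50_Startups.csv")
# All columns except the last are the independent variables
X = dataset.iloc[:, :-1].values
# The last column is the dependent variable
Y = dataset.iloc[:, -1].values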

## Encoding Categorical Data

In [None]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

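# One-hot encode the categorical column at index 3 and pass the other columns through unchanged.
# OneHotEncoder() keeps all dummy columns here; pass drop='first' to keep only n-1 of them.
ColumnTransformerObj = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')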

X = np.array(ColumnTransformerObj.fit_transform(X))
print(X)

## Splitting the dataset into Training set and Test set

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=0)

## Training the Multiple Regression model on the Training Data

In [None]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)

## Predicting the Test Set results

In [None]:
Y_predict = regressor.predict(X_test)
np.set_printoptions(precision=2)
# Show predicted values (left column) next to actual values (right column)
comparison = np.concatenate((Y_predict.reshape(len(Y_predict), 1), Y_test.reshape(len(Y_test), 1)), axis=1)
print(comparison)