## Multiple Linear Regression

Type: Supervised Learning  
No. of Input Variables = n  
No. of Output Variables = 1  
  
Equation: y = b0 + b1*x1 + b2*x2 + ... + bn*xn   
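
For several observations at once, the equation is the intercept plus a matrix-vector product. A minimal sketch, assuming made-up coefficients and inputs (none of these numbers come from the dataset used below):

```python
import numpy as np

# Hypothetical coefficients for n = 3 input variables (illustrative values only).
b0 = 2.0                          # intercept
b = np.array([0.5, -1.0, 3.0])    # b1, b2, b3

# Two hypothetical observations, one per row; columns are x1, x2, x3.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.0, 1.0]])

# y = b0 + b1*x1 + b2*x2 + b3*x3, evaluated for every row at once.
y = b0 + X @ b
print(y)
```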

### Assumptions of Linear Regression

1. Linearity  
2. Homoscedasticity  
3. Multivariate Normality  
4. Independence of Errors  
5. Lack of Multicollinearity  
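
One common way to check assumption 5 is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R^2). The sketch below uses synthetic data, and the helper name `vif` is hypothetical. A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a warning sign, though the cutoff is a convention rather than a hard rule.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    factors = np.empty(n_features)
    for j in range(n_features):
        target = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        others = np.column_stack([np.ones(n_samples), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        factors[j] = 1.0 / (1.0 - r_squared)
    return factors

# Synthetic data: x3 is almost a copy of x1, while x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # large values for x1 and x3, close to 1 for x2
```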

### Methods of Building Models

1. All-in
2. Backward Elimination
3. Forward Selection
4. Bidirectional Elimination
5. Score comparison

### Implementation of Backward Elimination
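
The cells that follow fit the model on every column with scikit-learn. As a rough sketch of the elimination loop itself, the code below repeatedly refits ordinary least squares and drops the least significant predictor until all remaining ones pass a threshold. It makes several assumptions: the data is synthetic, the function name `backward_elimination` is hypothetical, and `|t| < 2` is used as an approximate stand-in for the usual "p-value above 0.05" test so that only NumPy is required.

```python
import numpy as np

def backward_elimination(X, y, t_threshold=2.0):
    """Return the indices of the columns of X kept by backward elimination."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples = X.shape[0]
    kept = list(range(X.shape[1]))
    while kept:
        # Fit OLS with an intercept on the predictors still in the model.
        design = np.column_stack([np.ones(n_samples), X[:, kept]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        dof = n_samples - design.shape[1]
        sigma2 = residuals @ residuals / dof
        # Standard errors come from the diagonal of sigma^2 * (X'X)^-1.
        cov = sigma2 * np.linalg.inv(design.T @ design)
        t_stats = np.abs(coef / np.sqrt(np.diag(cov)))[1:]  # skip the intercept
        weakest = int(np.argmin(t_stats))
        if t_stats[weakest] >= t_threshold:
            break          # every remaining predictor passes the threshold
        del kept[weakest]  # drop the least significant predictor, then refit
    return kept

# Synthetic data: y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

kept = backward_elimination(X, y)
print(kept)
```

An exact p-value version would replace the threshold check with a t-distribution tail probability, which needs a statistics library.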

### Importing the libraries

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Importing the Dataset

In [5]:
dataset = pd.read_csv("50_startups.csv")

In [6]:
# Features: every column except the last. Target: the last column (Profit).
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [7]:
dataset.head()

|   | R&D Spend | Administration | Marketing Spend | State | Profit |
|---|-----------|----------------|-----------------|-------|--------|
| 0 | 165349.2 | 136897.8 | 471784.1 | New York | 192261.83 |
| 1 | 162597.7 | 151377.59 | 443898.53 | California | 191792.06 |
| 2 | 153441.51 | 101145.55 | 407934.54 | Florida | 191050.39 |
| 3 | 144372.41 | 118671.85 | 383199.62 | New York | 182901.99 |
| 4 | 142107.34 | 91391.77 | 366168.42 | Florida | 166187.94 |


### Encoding Categorical Data

In [8]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode column 3 (State); pass the numeric columns through unchanged.
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
x = ct.fit_transform(x)
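
To make the effect of this step concrete, the sketch below reproduces the same layout by hand with NumPy on the first three rows shown by `dataset.head()`. It assumes the usual scikit-learn behaviour: categories are encoded in sorted order, and the encoded columns are placed before the passed-through ones.

```python
import numpy as np

# First three rows from dataset.head(): three numeric columns, then State.
rows = np.array([
    [165349.2, 136897.8, 471784.1, "New York"],
    [162597.7, 151377.59, 443898.53, "California"],
    [153441.51, 101145.55, 407934.54, "Florida"],
], dtype=object)

# Sorted unique categories and, for each row, the index of its category.
categories, codes = np.unique(rows[:, 3].astype(str), return_inverse=True)

# One indicator column per category: 1.0 where the row belongs to it.
one_hot = np.eye(len(categories))[codes]

# Encoded columns first, untouched numeric columns after them.
encoded = np.column_stack([one_hot, rows[:, :3].astype(float)])
print(categories)
print(encoded)
```

Each row now starts with three 0/1 indicators (California, Florida, New York), followed by the three spend columns.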

### Splitting the Dataset into the Training Set and Test Set

In [11]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
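
With `test_size=0.2`, 10 of the 50 rows are held out for testing. Conceptually the split shuffles the row indices and slices them. A minimal sketch of that idea follows, using a hypothetical helper `split_train_test` and dummy arrays. It will not reproduce the exact rows chosen by `train_test_split`, because the shuffling is implemented differently.

```python
import numpy as np

def split_train_test(x, y, test_size=0.2, seed=0):
    """Shuffle the rows, then hold out a fraction of them as the test set."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))
    n_test = int(round(test_size * len(x)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # The same indices are applied to x and y so rows stay aligned.
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

# Dummy data with 50 rows, matching the size of the dataset.
x_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50)

x_tr, x_te, y_tr, y_te = split_train_test(x_demo, y_demo)
print(x_tr.shape, x_te.shape)  # (40, 2) (10, 2)
```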

### Training the Multiple Linear Regression Model on the Training Set

In [12]:
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(x_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
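
Fitting chooses the coefficients that minimize the sum of squared residuals. The sketch below solves the same least-squares problem directly with NumPy on synthetic data with known coefficients. The data and variable names are assumptions for illustration. In the notebook itself, the fitted values are available as `regressor.intercept_` and `regressor.coef_`.

```python
import numpy as np

# Synthetic data generated from known coefficients plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_b0 = 1.0
true_b = np.array([2.0, -3.0, 0.5])
y = true_b0 + X @ true_b + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the intercept b0 is estimated with the slopes.
design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: minimizes ||design @ coef - y||^2.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slopes = coef[0], coef[1:]
print(intercept, slopes)  # close to 1.0 and [2.0, -3.0, 0.5]
```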

### Predicting the Test Results

In [13]:
y_pred = regressor.predict(x_test)

In [18]:
np.set_printoptions(precision=2)
# Predicted profit (left column) next to actual profit (right column).
print(np.concatenate((y_pred.reshape(-1, 1), y_test.reshape(-1, 1)), axis=1))

[[103015.2  103282.38]
 [132582.28 144259.4 ]
 [132447.74 146121.95]
 [ 71976.1   77798.83]
 [178537.48 191050.39]
 [116161.24 105008.31]
 [ 67851.69  81229.06]
 [ 98791.73  97483.56]
 [113969.44 110352.25]
 [167921.07 166187.94]]
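
The two columns can be summarized with a single score. The sketch below computes the root mean squared error and the coefficient of determination (R^2) from the ten printed pairs, so it works on values already rounded to two decimals. In the notebook, `sklearn.metrics.r2_score(y_test, y_pred)` would be the library equivalent for R^2.

```python
import numpy as np

# Values copied from the printed output above: predictions, then actual profits.
y_pred = np.array([103015.2, 132582.28, 132447.74, 71976.1, 178537.48,
                   116161.24, 67851.69, 98791.73, 113969.44, 167921.07])
y_test = np.array([103282.38, 144259.4, 146121.95, 77798.83, 191050.39,
                   105008.31, 81229.06, 97483.56, 110352.25, 166187.94])

residuals = y_test - y_pred

# Root mean squared error, in the same units as Profit.
rmse = np.sqrt(np.mean(residuals ** 2))

# R^2: share of the variance in y_test explained by the predictions.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.4f}")
```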
