### Multiple Linear Regression

![image.png](attachment:image.png)

#### Assumptions of Linear Regression
The following image shows linear regression doing a good job with one feature:

![image-2.png](attachment:image-2.png)

But if you apply the same approach to other features, this is what you get:

![image-3.png](attachment:image-3.png)

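This can be misleading: being able to draw a fitted line does not mean a linear model suits the data, so check the assumptions of linear regression before trusting the fit.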

![image-4.png](attachment:image-4.png)
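
One well-known illustration of how a fitted line can mislead is Anscombe's quartet (it also appears in the notes at the end). The sketch below is not taken from this notebook's images. It uses the first two published Anscombe datasets and fits a straight line to each with `np.polyfit`. The variable names are chosen here for illustration.

```python
import numpy as np

# Anscombe's quartet, datasets I and II (they share the same x values).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

# Degree-1 polyfit returns [slope, intercept] of the least-squares line.
slope1, intercept1 = np.polyfit(x, y1, 1)
slope2, intercept2 = np.polyfit(x, y2, 1)

print(f"dataset I : y = {intercept1:.2f} + {slope1:.2f} * x")
print(f"dataset II: y = {intercept2:.2f} + {slope2:.2f} * x")
```

Both fits come out almost identical, even though dataset I is roughly linear and dataset II follows a curve. The fitted line alone cannot tell you whether the linearity assumption holds, so plot the data as well.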

### Building a Model

![image.png](attachment:image.png)

#### All-in Cases 
![image-2.png](attachment:image-2.png)

#### Backward Elimination 
![image-4.png](attachment:image-4.png)
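
A rough sketch of one common formulation of backward elimination: fit the model with all predictors, drop the predictor with the highest p-value if it exceeds a chosen significance level, refit, and repeat. Several things here are assumptions made for illustration: the function names, the significance level of 0.05, the synthetic data, and the use of a normal approximation for the p-values in place of the exact t-distribution. A statistics library such as statsmodels reports exact p-values.

```python
import math
import numpy as np

def approx_p_values(X, y):
    """Two-sided p-values for OLS coefficients (normal approximation)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - k)      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    return np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])

def backward_elimination(X, y, names, sl=0.05):
    """Repeatedly drop the column with the highest p-value above sl."""
    keep = list(range(X.shape[1]))
    while keep:
        p = approx_p_values(X[:, keep], y)
        worst = int(np.argmax(p))
        if p[worst] <= sl:
            break                                 # every remaining predictor is significant
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic data: y depends on x1 and x2 only; "noise" is unrelated to y.
rng = np.random.default_rng(42)
n = 200
x1, x2, noise = rng.normal(size=(3, n))
y_demo = 3 + 2 * x1 - 4 * x2 + rng.normal(scale=0.5, size=n)
X_demo = np.column_stack([np.ones(n), x1, x2, noise])
names = ["const", "x1", "x2", "noise"]

selected = backward_elimination(X_demo, y_demo, names)
print(selected)
```

The constant column stands in for the intercept, which is why it is included explicitly in `X_demo`.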

#### Forward Selection 
![image-3.png](attachment:image-3.png)

#### Bidirectional Elimination 
![image-5.png](attachment:image-5.png)

#### All Possible Models 
![image-6.png](attachment:image-6.png)
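
To get a feel for why trying all possible models becomes expensive, here is a small sketch that enumerates every non-empty subset of predictors. With N predictors there are 2^N - 1 such subsets. The feature names are hypothetical placeholders.

```python
from itertools import combinations

features = ["x1", "x2", "x3", "x4"]  # hypothetical predictor names

# Every non-empty combination of predictors is one candidate model.
subsets = [
    combo
    for size in range(1, len(features) + 1)
    for combo in combinations(features, size)
]

print(len(subsets), "candidate models for", len(features), "features")
for combo in subsets[:5]:
    print(combo)
```

The count doubles with each extra predictor, so this approach is only practical for a small number of features.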


#### Importing the libraries

In [53]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 

#### Importing the dataset

In [54]:
dataset = pd.read_csv('50_Startups.csv')

# Features: every column except the last; target: the last column
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

#### Encoding Categorical Data

In [55]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# remainder='passthrough' keeps the columns that are not encoded;
# the one-hot columns come first in the output, followed by the untouched ones
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
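
The encoder above creates one dummy column per category. The dummy variable trap is the observation that, together with an intercept, these columns are linearly dependent, because they always sum to 1. The sketch below shows this with plain NumPy on a small made-up category column. The category values and variable names are assumptions for illustration, not the contents of `50_Startups.csv`.

```python
import numpy as np

# Hypothetical categorical column with three categories.
states = np.array(["New York", "California", "Florida",
                   "New York", "Florida", "California"])
categories = np.unique(states)

# One dummy column per category.
one_hot = (states[:, None] == categories[None, :]).astype(float)
intercept = np.ones((len(states), 1))

full = np.hstack([intercept, one_hot])            # intercept + 3 dummies
reduced = np.hstack([intercept, one_hot[:, 1:]])  # intercept + 2 dummies

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)

print("full:   ", full.shape[1], "columns, rank", rank_full)
print("reduced:", reduced.shape[1], "columns, rank", rank_reduced)
```

The full matrix has one more column than its rank, so its coefficients are not uniquely determined. Dropping one dummy removes the redundancy. In scikit-learn, `OneHotEncoder(drop='first')` is one way to do that.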

#### Splitting the dataset into training set and test set

In [56]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

#### Training the Multiple Linear Regression Model on the Training set

In [59]:
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)
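
Conceptually, fitting a multiple linear regression means finding the intercept $b_0$ and coefficients $b_1, \dots, b_n$ in $y = b_0 + b_1 x_1 + \dots + b_n x_n$ that minimize the squared error. The sketch below does this with `np.linalg.lstsq` on synthetic, noise-free data. It illustrates ordinary least squares and is not a description of scikit-learn's internals. All names and values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 samples, 3 features, known coefficients, no noise.
X_demo = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y_demo = true_intercept + X_demo @ true_coef

# Prepend a column of ones so the intercept is estimated with the coefficients.
A = np.column_stack([np.ones(len(X_demo)), X_demo])
params, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
intercept, coef = params[0], params[1:]

print("intercept:", intercept)
print("coefficients:", coef)
```

After fitting, the corresponding values on the scikit-learn model are available as `regressor.intercept_` and `regressor.coef_`.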

#### Predicting the Test set results

In [66]:
y_pred = regressor.predict(X_test)
np.set_printoptions(precision=2)
# Print predictions (left column) next to the actual values (right column)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))

[[103015.2  103282.38]
 [132582.28 144259.4 ]
 [132447.74 146121.95]
 [ 71976.1   77798.83]
 [178537.48 191050.39]
 [116161.24 105008.31]
 [ 67851.69  81229.06]
 [ 98791.73  97483.56]
 [113969.44 110352.25]
 [167921.07 166187.94]]
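
Comparing the two columns by eye only goes so far. As a sketch, the ten rows printed above can be summarized with the mean absolute error and the R² score, computed here with plain NumPy. The values are copied from the printed output, and the variable names are chosen for this example. `sklearn.metrics` offers `mean_absolute_error` and `r2_score` for the same purpose.

```python
import numpy as np

# Predicted (left) and actual (right) values, copied from the output above.
results = np.array([
    [103015.20, 103282.38],
    [132582.28, 144259.40],
    [132447.74, 146121.95],
    [ 71976.10,  77798.83],
    [178537.48, 191050.39],
    [116161.24, 105008.31],
    [ 67851.69,  81229.06],
    [ 98791.73,  97483.56],
    [113969.44, 110352.25],
    [167921.07, 166187.94],
])
pred, actual = results[:, 0], results[:, 1]

# Mean absolute error: average size of the prediction error.
mae = np.mean(np.abs(actual - pred))

# R^2: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((actual - pred) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE: {mae:.2f}")
print(f"R^2: {r2:.4f}")
```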


### Learning in public
1. Assumptions of Linear Regression
2. Anscombe's quartet
3. Dummy Variables & the Dummy Variable Trap (always omit one dummy variable: use n - 1 dummies for n categories)
4. Understanding the p-value (compare it with the Bombay Taco House thing)
5. What a Multiple Linear Regression model is