## Predicting Profit of Ventures

**Import Libraries**

In [14]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

**Import Dataset**

In [15]:
# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

**Encode Categorical Data**

In [16]:
# One-hot encode the State column (index 3); the other columns pass through unchanged
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3] )], remainder='passthrough')
X = np.array(ct.fit_transform(X))
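
The encoding step above decides which dummy column stands for which state, and that order matters later when a row is typed in by hand. A minimal sketch on made-up toy rows (the values and the names `X_toy` and `ct_toy` are illustrative assumptions, not part of the dataset) showing that the dummy columns come first and follow the sorted category order:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy rows (made-up values): three numeric columns, then the state in column 3.
X_toy = np.array([
    [1000.0, 200.0, 300.0, "New York"],
    [1500.0, 250.0, 100.0, "California"],
    [1200.0, 220.0, 400.0, "Florida"],
], dtype=object)

# Same transformer configuration as in the notebook cell.
ct_toy = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [3])],
    remainder="passthrough",
)
X_encoded = np.array(ct_toy.fit_transform(X_toy))

# The fitted encoder stores its categories in sorted order.
categories = ct_toy.named_transformers_["encoder"].categories_[0]
print(categories)
print(X_encoded)
```

Checking `categories_` on the fitted encoder is a safer way to learn the column order than assuming it.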

**Split Dataset into Train and Test Sets**

In [17]:
# Splitting the dataset into the Training set and Test set (train_test_split is already imported above)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

***Feature scaling is not needed for MLR: each coefficient absorbs the scale of its feature, so rescaling a feature changes its coefficient but not the predictions.***
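
The point about scaling can be checked numerically. A minimal sketch on synthetic data (all values and the helper `fit_predict` are assumptions made for illustration), using an ordinary least-squares fit with an explicit intercept column:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two spend-like features and a noisy linear target.
X = rng.uniform(0, 100_000, size=(30, 2))
y = 5_000 + 0.8 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 1_000, size=30)


def fit_predict(features, target):
    """Least-squares fit with an intercept; returns coefficients and fitted values."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef, design @ coef


coef_raw, pred_raw = fit_predict(X, y)

# Express the first feature in thousands instead of units.
X_scaled = X.copy()
X_scaled[:, 0] /= 1_000
coef_scaled, pred_scaled = fit_predict(X_scaled, y)

# The coefficient grows by the same factor, and the predictions agree.
print(coef_raw[1], coef_scaled[1])
print(np.allclose(pred_raw, pred_scaled))
```

This holds for plain least squares; regularised models such as ridge or lasso do depend on feature scale.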

**Train MLR Model on Training Set**

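*NOTE: The `LinearRegression` class below is not harmed by the "dummy variable trap": it keeps all three dummy columns, but its least-squares solver copes with the resulting collinearity, so the predictions are unaffected. The individual dummy coefficients are not uniquely determined, however.*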
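
To illustrate the dummy variable trap, here is a minimal sketch on synthetic data (all names and values are assumptions). It fits one model with all three dummy columns plus an intercept, and one with a dummy column dropped, using NumPy's least-squares solver as a stand-in for what a least-squares regressor does internally:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Synthetic data: a three-level category and one numeric feature.
states = rng.integers(0, 3, size=n)
dummies = np.eye(3)[states]            # one-hot columns, one per level
spend = rng.uniform(0, 100, size=n)
state_effect = np.array([1.0, -3.0, 2.0])[states]
y = 10 + 2 * spend + state_effect + rng.normal(0, 0.5, size=n)

# All three dummies plus an intercept: the dummy columns sum to the intercept column.
full = np.column_stack([np.ones(n), dummies, spend])
# Dropping one dummy removes the exact collinearity.
reduced = np.column_stack([np.ones(n), dummies[:, 1:], spend])

coef_full, _, rank_full, _ = np.linalg.lstsq(full, y, rcond=None)
coef_reduced, _, rank_reduced, _ = np.linalg.lstsq(reduced, y, rcond=None)

pred_full = full @ coef_full
pred_reduced = reduced @ coef_reduced

print(rank_full, rank_reduced)               # numerical rank of each design matrix
print(np.allclose(pred_full, pred_reduced))  # fitted values agree
```

The coefficients of the two fits differ, which is why the dummy coefficients in a model that keeps every level should be read relative to each other rather than in isolation.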

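*NOTE: `LinearRegression` does not select features or compute p-values; it fits every column it is given. Selecting features by p-value (e.g. backward elimination, which drops the features with the highest p-values) has to be done as a separate step.*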

In [18]:
regressor = LinearRegression()   # This class handles both simple and multiple linear regression
regressor.fit(X_train, y_train)  # Fit the model on the training set

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
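
For readers who want to look at p-values, they can be computed by hand from a least-squares fit. A minimal sketch on synthetic data (the feature names and values are assumptions, and this is the textbook t-test for ordinary least squares, not something the fitted `regressor` provides):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50

# Synthetic data: one feature that drives the target and one that does not.
useful = rng.normal(size=n)
unrelated = rng.normal(size=n)
y = 3.0 + 2.0 * useful + rng.normal(scale=0.5, size=n)

design = np.column_stack([np.ones(n), useful, unrelated])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Residual variance with n - p degrees of freedom.
residuals = y - design @ coef
dof = n - design.shape[1]
sigma2 = residuals @ residuals / dof

# Standard errors from the diagonal of sigma^2 * (X^T X)^-1.
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))

# Two-sided p-values from the t distribution.
t_stats = coef / std_err
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

for name, p in zip(["intercept", "useful", "unrelated"], p_values):
    print(f"{name}: p = {p:.4f}")
```

A small p-value suggests the feature contributes to the fit; backward elimination repeatedly drops the feature with the largest p-value and refits.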

**Predicting the Test set Results**

In [24]:
y_pred = regressor.predict(X_test) # Predict profit for the test set
np.set_printoptions(precision=2)   # Print floats with 2 decimal places
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)), 1))  #Print prediction array and actual test array values side by side

[[103015.2  103282.38]
 [132582.28 144259.4 ]
 [132447.74 146121.95]
 [ 71976.1   77798.83]
 [178537.48 191050.39]
 [116161.24 105008.31]
 [ 67851.69  81229.06]
 [ 98791.73  97483.56]
 [113969.44 110352.25]
 [167921.07 166187.94]]
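
The side-by-side printout can be condensed into summary scores. A minimal sketch that reuses the ten printed pairs above (copied by hand, so the variable names are assumptions) and computes R² and the mean absolute error directly from their definitions:

```python
import numpy as np

# Predicted vs. actual profit, copied from the printed output above.
pairs = np.array([
    [103015.20, 103282.38],
    [132582.28, 144259.40],
    [132447.74, 146121.95],
    [71976.10, 77798.83],
    [178537.48, 191050.39],
    [116161.24, 105008.31],
    [67851.69, 81229.06],
    [98791.73, 97483.56],
    [113969.44, 110352.25],
    [167921.07, 166187.94],
])
y_pred_printed, y_test_printed = pairs[:, 0], pairs[:, 1]

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test_printed - y_pred_printed) ** 2)
ss_tot = np.sum((y_test_printed - y_test_printed.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Mean absolute error, in the same units as profit.
mae = np.mean(np.abs(y_test_printed - y_pred_printed))

print(f"R^2 = {r2:.4f}, MAE = {mae:.2f}")
```

In the notebook itself, `sklearn.metrics.r2_score(y_test, y_pred)` would give the same R² without retyping the values.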


**+++Making a Single Prediction, e.g. R&D Spend=40000, Administration=150000, Marketing Spend=300000, State=California+++**

In [25]:
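# Row layout after encoding: three state dummies first ([1, 0, 0] = California), then the three spend columns
print(regressor.predict([[1, 0, 0, 40000, 150000, 300000]]))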

[89408.55]


**+++Outputting final MLR equation+++**

In [30]:
print(f'PROFIT = {regressor.intercept_} + ({regressor.coef_[0]}*DummyState1) + ({regressor.coef_[1]}*DummyState2) + ({regressor.coef_[2]}*DummyState3) + ({regressor.coef_[3]}*RDSpend) + ({regressor.coef_[4]}*AdminSpend) + ({regressor.coef_[5]}*MarketSpend)')

PROFIT = 42467.52924855791 + (86.63836917759322*DummyState1) + (-872.6457908716875*DummyState2) + (786.0074217041533*DummyState3) + (0.7734671927324567*RDSpend) + (0.03288459753630255*AdminSpend) + (0.0366100258639116*MarketSpend)
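
As a sanity check, the printed equation can be evaluated by hand for the single prediction made earlier. A minimal sketch with the intercept and coefficients copied from the output above (the variable names are assumptions):

```python
# Intercept and coefficients copied from the printed equation above.
intercept = 42467.52924855791
coefficients = [
    86.63836917759322,     # DummyState1
    -872.6457908716875,    # DummyState2
    786.0074217041533,     # DummyState3
    0.7734671927324567,    # RDSpend
    0.03288459753630255,   # AdminSpend
    0.0366100258639116,    # MarketSpend
]

# Same input row as the single prediction: first dummy set, then the three spend values.
row = [1, 0, 0, 40000, 150000, 300000]

# PROFIT = intercept + sum(coefficient * feature value)
profit = intercept + sum(c * x for c, x in zip(coefficients, row))
print(round(profit, 2))
```

The result agrees with the value returned by `regressor.predict` for the same row, which confirms that the equation and the fitted model describe the same thing.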
