## Model Performance Testing Notebook

## Importing the required libraries


In [20]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

## Preparing the dataset

In [21]:
df = pd.read_csv("dataset/gad.csv")
df = df.iloc[:,1:]
df.head()

Unnamed: 0,GRE Score,TOEFL Score,University Rating,SOP,LOR,CGPA,Research,Chance of Admit
0,337,118,4,4.5,4.5,9.65,1,0.92
1,324,107,4,4.0,4.5,8.87,1,0.76
2,316,104,3,3.0,3.5,8.0,1,0.72
3,322,110,3,3.5,2.5,8.67,1,0.8
4,314,103,2,2.0,3.0,8.21,0,0.65


## Importing the required libraries for regression analysis

In [30]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

## Splitting the dataset into training and testing data

In [31]:
x = df[["GRE Score","TOEFL Score","University Rating","SOP","LOR ","CGPA", "Research"]]
y = df["Chance of Admit "].values.reshape(-1,1)

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2,random_state=42)

## MULTIPLE LINEAR REGRESSION

In [32]:
# Fit a multiple linear regression model and compute its R^2 score on the test set

multiple_lin_reg = LinearRegression()
multiple_lin_reg.fit(x_train,y_train)

y_pred_mlr = multiple_lin_reg.predict(x_test)

r2_score_mlr = r2_score(y_test,y_pred_mlr)
print("Multiple Linear Regression's Score = {:.3f}".format(r2_score_mlr))


Multiple Linear Regression's Score = 0.819


## DECISION TREE REGRESSION

In [33]:
# Fit a decision tree regressor and compute its R^2 score on the test set
# No random_state is set here, so the score may vary slightly between runs

tree_reg = DecisionTreeRegressor()
tree_reg.fit(x_train,y_train)

y_pred_tree = tree_reg.predict(x_test)

r2_score_tree = r2_score(y_test,y_pred_tree)
print("Decision Tree Regression's Score = {:.3f}".format(r2_score_tree))

Decision Tree Regression's Score = 0.572


## RANDOM FOREST REGRESSION

In [34]:
# Fit a random forest regressor and compute its R^2 score on the test set

ran_for_reg = RandomForestRegressor(n_estimators=100,random_state=42)
ran_for_reg.fit(x_train,y_train)

y_pred_rfr = ran_for_reg.predict(x_test)

r2_score_rfr = r2_score(y_test,y_pred_rfr)
print("Random Forest Regression's Score = {:.3f}".format(r2_score_rfr))

Random Forest Regression's Score = 0.787


## CONCLUSION

The R^2 score measures how much of the variance in the target a regression model explains; the closer it is to 1, the better the fit. On this dataset and this train/test split, Multiple Linear Regression (R^2 = 0.819) outperforms both Random Forest Regression (0.787) and Decision Tree Regression (0.572), so it is the model carried forward.
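To make the R^2 comparison concrete, here is a minimal sketch that computes the score by hand as 1 - SS_res / SS_tot. The helper name `r2_manual` and the toy values are assumptions for illustration only; they are not taken from the dataset, and the notebook itself relies on `sklearn.metrics.r2_score`.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values (hypothetical, not from the admissions dataset)
y_true_demo = [0.9, 0.7, 0.8, 0.6]
y_pred_demo = [0.85, 0.75, 0.8, 0.65]

r2_demo = r2_manual(y_true_demo, y_pred_demo)
print("R^2 = {:.3f}".format(r2_demo))  # R^2 = 0.850

# A model that always predicts the mean of y_true scores approximately 0
mean_only = [sum(y_true_demo) / len(y_true_demo)] * len(y_true_demo)
r2_baseline = r2_manual(y_true_demo, mean_only)
```

A score of 1 means the predictions match the targets exactly, a score near 0 means the model does no better than predicting the mean, and the score can go negative when the model does worse than that baseline.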

## Persisting the Multiple Linear Regression Model in a Pickle File

In [35]:
import pickle
# Use a context manager so the file handle is closed after writing
with open("Multiple_Linear_Regression.pkl", "wb") as f:
    pickle.dump(multiple_lin_reg, f)
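The persisted model can later be restored with `pickle.load`. The sketch below shows the save/load round trip on a stand-in object so that it runs without the notebook state; the dictionary `stand_in_model`, its values, and the temporary file name are hypothetical. In the notebook, the pickled object would be the fitted `multiple_lin_reg` estimator.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted estimator (hypothetical values); in the notebook
# this would be the `multiple_lin_reg` object itself.
stand_in_model = {"intercept": -1.25, "coef": [0.002, 0.003, 0.12]}

# Temporary path used only for this demo; the notebook writes to
# "Multiple_Linear_Regression.pkl" instead.
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")

# Save: the context manager closes the file even if dump() raises
with open(path, "wb") as f:
    pickle.dump(stand_in_model, f)

# Load: only unpickle files that come from a trusted source
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == stand_in_model)  # True
```

When the pickled object is a scikit-learn estimator, it is safest to load it with the same scikit-learn version that was used to save it.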

## PERFORMANCE TESTING

In [37]:
# Compute performance metrics for the final model (multiple linear regression)
from sklearn import metrics
x = df[["GRE Score","TOEFL Score","University Rating","SOP","LOR ","CGPA", "Research"]]
y = df["Chance of Admit "].values.reshape(-1,1)

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2,random_state=42)

multiple_lin_reg = LinearRegression()
multiple_lin_reg.fit(x_train,y_train)

y_pred = multiple_lin_reg.predict(x_test)
print("Multiple Linear Regression")
print("Mean Absolute Error     :", metrics.mean_absolute_error(y_test,y_pred))

print("Mean Squared Error      :", metrics.mean_squared_error(y_test,y_pred))

# RMSE via np.sqrt: the `squared=False` argument is deprecated since scikit-learn 1.4
print("Root Mean Squared Error :", np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

print("R2 Score                :", metrics.r2_score(y_test,y_pred))


Multiple Linear Regression
Mean Absolute Error     : 0.04272265427705367
Mean Squared Error      : 0.0037046553987884114
Root Mean Squared Error : 0.06086588041578312
R2 Score                : 0.8188432567829628
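As a rough illustration of how these error metrics relate to one another, the sketch below computes MAE, MSE, and RMSE by hand. The toy arrays are hypothetical and not taken from the dataset; the final check uses only the MSE and RMSE values printed above.

```python
import math

# Toy values (hypothetical, not from the admissions dataset)
y_true_demo = [0.9, 0.7, 0.8, 0.6]
y_pred_demo = [0.85, 0.75, 0.8, 0.65]

errors = [t - p for t, p in zip(y_true_demo, y_pred_demo)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e ** 2 for e in errors) / n   # mean squared error
rmse = math.sqrt(mse)                   # root mean squared error

print("MAE  = {:.4f}".format(mae))
print("MSE  = {:.6f}".format(mse))
print("RMSE = {:.5f}".format(rmse))

# Consistency check on the notebook's reported values: RMSE = sqrt(MSE)
reported_mse = 0.0037046553987884114
reported_rmse = 0.06086588041578312
rmse_from_mse = math.sqrt(reported_mse)
```

MAE and RMSE are in the same units as the target (here, a probability between 0 and 1), so an RMSE of about 0.061 corresponds to a typical prediction error of roughly six percentage points on the chance of admission.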
