<a href="https://colab.research.google.com/github/aakashkumarme/TFLEARN/blob/main/4_2_Evaluate_Regression_Model_and_Scoring_paramater_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Regression Model Evaluation Metrics/Techniques**

R^2 (pronounced r-squared), or the coefficient of determination - Compares your model's predictions to the mean of the targets. Values can range from negative infinity (a very poor model) to 1. For example, if all your model does is predict the mean of the targets, its R^2 value would be 0. And if your model predicts every target perfectly, its R^2 value would be 1.

Mean absolute error (MAE) - The average of the absolute differences between predictions and actual values. It gives you an idea of how wrong your predictions were.

Mean squared error (MSE) - The average of the squared differences between predictions and actual values. Squaring makes every error positive, so positive and negative errors can't cancel out. It also amplifies outliers (samples which have larger errors).
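
To make the three definitions concrete, here is a minimal sketch that computes each metric by hand with NumPy. The four-sample arrays are made up for illustration and are not taken from the housing data.

```python
import numpy as np

# Made-up targets and predictions, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.5])

errors = y_pred - y_true          # [-0.5, 0.0, 1.0, 0.5]

mae = np.mean(np.abs(errors))     # average absolute error: 0.5
mse = np.mean(errors ** 2)        # average squared error: 0.375

# R^2 = 1 - (squared error of the model) / (squared error of always predicting the mean)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot          # about 0.925

print(mae, mse, r2)
```

Note that the single error of 1.0 contributes half of the MAE total but two thirds of the MSE total, which is the "amplifies outliers" effect described above.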

Which regression metric should you use?

R^2 plays a similar role to accuracy in classification. It gives you a quick indication of how well your model might be doing. Generally, the closer your R^2 value is to 1.0, the better the model. But it doesn't tell you how far off each prediction is.

MAE gives a better indication of how far off your model's predictions are on average, in the same units as the target.

As for choosing between MAE and MSE: because MSE squares the differences between predicted and actual values, it amplifies larger differences. Let's say we're predicting the value of houses (which we are).

Pay more attention to MAE when being $10,000 off is twice as bad as being $5,000 off.

Pay more attention to MSE when being $10,000 off is more than twice as bad as being $5,000 off.
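
A quick arithmetic check of that rule of thumb, using the same two dollar amounts. This is only a sketch of how each metric penalises a single error; the variable names are illustrative.

```python
small_error = 5_000
large_error = 10_000

# MAE-style penalty grows linearly with the error
abs_ratio = abs(large_error) / abs(small_error)    # 2.0

# MSE-style penalty grows with the square of the error
sq_ratio = large_error ** 2 / small_error ** 2     # 4.0

print(f"absolute penalty ratio: {abs_ratio}")
print(f"squared penalty ratio:  {sq_ratio}")
```

Doubling the error doubles its absolute penalty but quadruples its squared penalty.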

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

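# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2 to run as written.
from sklearn.datasets import load_boston
boston = load_boston()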

boston_df = pd.DataFrame(boston["data"], columns = boston["feature_names"])
boston_df["target"] = pd.Series(boston["target"])
boston_df.head()

X = boston_df.drop("target" , axis=1)
y = boston_df["target"]
 
np.random.seed(42)

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2)

from sklearn.ensemble import RandomForestRegressor
clf = RandomForestRegressor()
clf.fit(X_train,y_train)
clf.score(X_test,y_test)

0.873969014117403
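
For scikit-learn regressors, `score()` returns R^2, so the number above is an R^2 value. The sketch below checks this on synthetic data from `make_regression` (a stand-in, not the housing data) with a plain `LinearRegression`; both choices are assumptions made to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not the Boston housing set
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_tr, y_tr)

score_value = model.score(X_te, y_te)            # default regressor score
r2_value = r2_score(y_te, model.predict(X_te))   # explicit R^2

print(score_value, r2_value)
```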

## **R^2 (r2_score)**

R^2, the coefficient of determination, compares your model's predictions to a baseline that always predicts the mean of the targets.



In [2]:
from sklearn.metrics import r2_score 

# Build an array the same length as y_test, filled with the mean of y_test
y_test_mean = np.full(len(y_test),y_test.mean())
y_test.mean()

21.488235294117654

In [5]:
r2_score(y_test, y_test_mean)

2.220446049250313e-16

In [4]:
r2_score(y_test,y_test)

1.0
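
The two cells above cover the 0 and 1 cases (the first result is zero up to floating-point error). R^2 can also drop below 0 when predictions are worse than simply predicting the mean. A small sketch with made-up numbers, where the predictions are the targets in reverse order:

```python
from sklearn.metrics import r2_score

y_true_demo = [1, 2, 3, 4, 5]
y_pred_reversed = [5, 4, 3, 2, 1]   # deliberately bad predictions

# Squared error of the predictions:        16 + 4 + 0 + 4 + 16 = 40
# Squared error of predicting the mean (3): 4 + 1 + 0 + 1 + 4  = 10
# R^2 = 1 - 40 / 10 = -3
r2_bad = r2_score(y_true_demo, y_pred_reversed)
print(r2_bad)  # -3.0
```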

## **MAE**

In [7]:
from sklearn.metrics import mean_absolute_error

y_preds = clf.predict(X_test)

mae = mean_absolute_error(y_test,y_preds)
mae

2.1226372549019623

In [12]:
df = pd.DataFrame(data={"Actual values" : y_test,
                       "predicted_values" : y_preds})
df["differences"] = df["predicted_values"] - df["Actual values"]
df

index,Actual values,predicted_values,differences
173,23.6,23.002,-0.598
274,32.4,30.826,-1.574
491,13.6,16.734,3.134
72,22.8,23.467,0.667
452,16.1,16.853,0.753
...,...,...,...
412,17.9,13.030,-4.870
436,9.6,12.490,2.890
411,17.2,13.406,-3.794
86,22.5,20.219,-2.281
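
MAE is the mean of the absolute values in the `differences` column. The sketch below recomputes it for only the first five rows shown in the table, so its result is not the full test-set MAE of about 2.12. It also shows why the absolute value matters: signed errors partly cancel each other.

```python
import numpy as np

# First five rows copied from the table above
actual = np.array([23.6, 32.4, 13.6, 22.8, 16.1])
predicted = np.array([23.002, 30.826, 16.734, 23.467, 16.853])

differences = predicted - actual

mean_signed = differences.mean()         # positive and negative errors cancel
mae_five = np.abs(differences).mean()    # MAE over these five rows only

print(round(mean_signed, 4), round(mae_five, 4))
```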


## **MSE**

In [15]:
from sklearn.metrics import  mean_squared_error

y_preds = clf.predict(X_test)

mse = mean_squared_error(y_test,y_preds) 
mse

9.242328990196082
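
Because MSE is in squared units of the target, it can be hard to read next to MAE. One common follow-up, not used in the original notebook, is to take its square root (RMSE), which is back in the target's units. A sketch using the MSE value printed above:

```python
import math

mse_value = 9.242328990196082   # copied from the output above
rmse_value = math.sqrt(mse_value)

print(f"RMSE: {rmse_value:.2f}")  # RMSE: 3.04
```

The RMSE (about 3.04) is larger than the MAE (about 2.12), which suggests that a few larger errors are being weighted more heavily.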

## **Using the scoring parameter for classification**

In [26]:
import pandas as pd
import numpy as np


from sklearn.ensemble import RandomForestClassifier

np.random.seed(42)

heart_diseases = pd.read_csv("/content/drive/MyDrive/dataset/heart-disease.csv")
heart_diseases.head()

X = heart_diseases.drop("target",axis = 1)
y = heart_diseases["target"]

clf = RandomForestClassifier(n_estimators=100)



#scoring=None

In [29]:
from sklearn.model_selection import cross_val_score
#scoring=None
np.random.seed(42)
cv_acc = cross_val_score(clf,X,y,cv=5,scoring=None)
cv_acc
print (f'{np.mean(cv_acc)*100:.2f}%')

82.48%


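If `scoring` is left as `None`, `cross_val_score` falls back to the estimator's own `score()` method. For a classifier that is mean accuracy, so the result matches `scoring="accuracy"`: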

In [33]:
np.random.seed(42)
cv_acc = cross_val_score(clf,X,y,cv=5,scoring="accuracy")
cv_acc
print (f'{np.mean(cv_acc)*100:.2f}%')

82.48%


In [34]:
np.random.seed(42)
cv_precision = cross_val_score(clf,X,y,cv=5,scoring="precision")
cv_precision
print(f'{np.mean(cv_precision)*100:.2f}%')

83.30%


In [35]:
np.random.seed(42)
cv_recall = cross_val_score(clf,X,y,cv=5,scoring="recall")
cv_recall
print(f'{np.mean(cv_recall)*100:.2f}%')

85.45%


In [36]:
np.random.seed(42)
cv_f1 = cross_val_score(clf,X,y,cv=5,scoring="f1")
cv_f1
print(f'{np.mean(cv_f1)*100:.2f}%')

84.27%
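
The cells above call `cross_val_score` once per metric. As a sketch of an alternative, scikit-learn's `cross_validate` accepts a list of scoring names and evaluates them all on the same folds. The data here comes from `make_classification` as a stand-in for `heart-disease.csv`, and `n_estimators=20` is chosen only to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic binary-classification data (assumption: stand-in for the heart-disease CSV)
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=42)

model = RandomForestClassifier(n_estimators=20, random_state=42)

metrics = ["accuracy", "precision", "recall", "f1"]
results = cross_validate(model, X_demo, y_demo, cv=5, scoring=metrics)

# Results are stored under "test_<metric name>", one value per fold
for name in metrics:
    print(f"{name}: {np.mean(results['test_' + name]) * 100:.2f}%")
```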


## **Using the scoring parameter for regression**

In [37]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Requires scikit-learn < 1.2 (load_boston was removed in 1.2)
from sklearn.datasets import load_boston
boston = load_boston()

boston_df = pd.DataFrame(boston["data"], columns = boston["feature_names"])
boston_df["target"] = pd.Series(boston["target"])
boston_df.head()

X = boston_df.drop("target" , axis=1)
y = boston_df["target"]
 
np.random.seed(42)

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2)

from sklearn.ensemble import RandomForestRegressor
clf = RandomForestRegressor()
clf.fit(X_train,y_train)
clf.score(X_test,y_test)

0.873969014117403

In [41]:
from sklearn.model_selection import cross_val_score
cv_r2 = cross_val_score(clf,X,y,cv=5,scoring=None)  # scoring=None uses the regressor's default score, R^2
cv_r2

array([0.77128102, 0.85496899, 0.75244145, 0.47998463, 0.27525122])

In [42]:
cv_mae =  cross_val_score(clf,X,y,cv=5,scoring="neg_mean_absolute_error")
cv_mae

array([-2.09036275, -2.6690198 , -3.38085149, -3.75727723, -3.04867327])

In [44]:
cv_mse = cross_val_score(clf,X,y,cv=5,scoring="neg_mean_squared_error")
cv_mse
np.mean(cv_mse)

-22.024957134439905
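
The MAE and MSE scores above are negative because scikit-learn's `scoring` strings follow a "higher is better" convention, so error metrics are returned negated. Flipping the sign recovers the usual values. The sketch below uses synthetic `make_regression` data and a small forest, both assumptions to keep it self-contained.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data, not the Boston housing set
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
model = RandomForestRegressor(n_estimators=20, random_state=42)

neg_mae = cross_val_score(model, X_demo, y_demo, cv=5, scoring="neg_mean_absolute_error")

# Negate to get the ordinary (non-negative) MAE per fold
mae_per_fold = -neg_mae

print(neg_mae)
print(mae_per_fold.mean())
```

Read the same way, the mean of about -22.02 above corresponds to a cross-validated MSE of about 22.02.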

## **Using Scikit-Learn evaluation functions on classification** 🛺


In [54]:
import pandas as pd
import numpy as np


from sklearn.ensemble import RandomForestClassifier

np.random.seed(42)

heart_diseases = pd.read_csv("/content/drive/MyDrive/dataset/heart-disease.csv")
heart_diseases.head()

X = heart_diseases.drop("target",axis = 1)
y = heart_diseases["target"]

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train,y_train)
y_preds = clf.predict(X_test)

y_preds

array([0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0])

In [63]:
from sklearn.metrics import accuracy_score,precision_score,recall_score,f1_score  # import the metric functions


In [56]:
accuracy_score(y_test,y_preds)

0.8524590163934426

In [59]:
precision_score(y_test,y_preds)

0.8484848484848485

In [61]:
recall_score(y_test,y_preds)

0.875

In [62]:
f1_score(y_test,y_preds)

0.8615384615384615
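
The four scores above can all be reproduced from confusion-matrix counts. The counts below are inferred from the printed scores and the 61 test predictions; they are not printed in the notebook, so treat this as a worked check rather than notebook output.

```python
# Counts inferred from the scores above (assumption, not notebook output)
tp, fp, fn, tn = 28, 5, 4, 24

accuracy = (tp + tn) / (tp + fp + fn + tn)            # correct predictions / all predictions
precision = tp / (tp + fp)                            # of predicted positives, how many were right
recall = tp / (tp + fn)                               # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```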

## **Using Scikit-Learn evaluation functions on regression** 🛹🛹

In [65]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Requires scikit-learn < 1.2 (load_boston was removed in 1.2)
from sklearn.datasets import load_boston
boston = load_boston()

boston_df = pd.DataFrame(boston["data"], columns = boston["feature_names"])
boston_df["target"] = pd.Series(boston["target"])
boston_df.head()

X = boston_df.drop("target" , axis=1)
y = boston_df["target"]
 
np.random.seed(42)

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2)

from sklearn.ensemble import RandomForestRegressor
clf = RandomForestRegressor()
clf.fit(X_train,y_train)
clf.score(X_test,y_test)

0.873969014117403

In [70]:
y_preds = clf.predict(X_test)
y_preds

array([23.002, 30.826, 16.734, 23.467, 16.853, 21.725, 19.232, 15.239,
       21.067, 20.738, 19.516, 19.83 ,  8.885, 21.918, 19.477, 26.465,
       19.347,  8.039, 45.414, 14.542, 24.564, 23.941, 14.481, 23.077,
       15.031, 14.625, 21.171, 14.164, 19.251, 20.717, 19.433, 23.242,
       31.091, 20.39 , 14.294, 15.796, 34.3  , 19.155, 20.639, 24.464,
       18.779, 29.688, 45.257, 19.449, 22.334, 13.727, 15.408, 24.621,
       18.783, 28.247, 21.411, 33.961, 17.011, 26.312, 44.904, 21.988,
       15.65 , 32.316, 22.281, 20.394, 25.405, 34.266, 28.938, 18.857,
       26.909, 17.154, 13.731, 23.079, 28.508, 15.818, 20.41 , 28.38 ,
       10.153, 21.336, 22.393,  7.093, 20.059, 45.424, 10.964, 12.914,
       21.387, 12.29 , 20.234,  9.065, 20.218, 26.736, 15.531, 23.228,
       23.568, 17.719, 21.64 ,  7.991, 19.6  , 18.7  , 22.292, 19.665,
       38.756, 13.03 , 12.49 , 13.406, 20.219, 23.898])

In [71]:
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error  # import the regression metric functions

In [72]:
r2_score(y_test,y_preds)

0.8739690141174031

In [73]:
mean_absolute_error(y_test,y_preds)

2.1226372549019623

In [74]:
mean_squared_error(y_test,y_preds)

9.242328990196082