# Evaluating a Machine Learning Model (score)

### Three ways to evaluate Scikit-Learn models/estimators:
1. The estimator's `score` method.
2. The `scoring` parameter.
3. Problem-specific metric functions.
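
A minimal sketch of all three approaches side by side. It assumes a synthetic dataset from `make_classification` in place of the heart-disease CSV, so the numbers will not match the outputs below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a real tabular classification dataset (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# 1. The estimator's own score method (mean accuracy for a classifier)
score_method = clf.score(X_test, y_test)

# 2. The scoring parameter of a helper such as cross_val_score
cv_scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# 3. A problem-specific metric function applied to predictions
metric_fn = accuracy_score(y_test, clf.predict(X_test))

print(score_method, cv_scores.mean(), metric_fn)
```

For a classifier, approaches 1 and 3 agree on the same test set because `score` computes accuracy on the model's predictions.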

## 4.1 Evaluating a model with the `score` method

In [1]:
# Standard imports

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
heart_disease = pd.read_csv('~/sample_project/Data/heart-disease.csv')

In [8]:
heart_disease

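| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 298 | 57 | 0 | 0 | 140 | 241 | 0 | 1 | 123 | 1 | 0.2 | 1 | 0 | 3 | 0 |
| 299 | 45 | 1 | 3 | 110 | 264 | 0 | 1 | 132 | 0 | 1.2 | 1 | 0 | 3 | 0 |
| 300 | 68 | 1 | 0 | 144 | 193 | 1 | 1 | 141 | 0 | 3.4 | 1 | 2 | 3 | 0 |
| 301 | 57 | 1 | 0 | 130 | 131 | 0 | 1 | 115 | 1 | 1.2 | 1 | 1 | 3 | 0 |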


In [4]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

np.random.seed(42)

X = heart_disease.drop('target', axis=1)
y = heart_disease['target']

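# train_size=0.2 trains on only 20% of the rows and tests on the other 80%;
# test_size=0.2 would give the conventional 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.2)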

clf = RandomForestClassifier()

clf.fit(X_train, y_train)



RandomForestClassifier()
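
The split above passes `train_size=0.2`, while the regression example below uses `test_size=0.2`. The two are not interchangeable. A small sketch on a hypothetical 303-row stand-in (303 is an assumed size, matching the common version of this dataset) shows the resulting set sizes:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with 303 rows; only the row count matters here
X = pd.DataFrame({"feature": np.arange(303)})
y = pd.Series(np.arange(303) % 2)

# train_size=0.2: 20% of the rows go to training, the remaining 80% to testing
X_tr_a, X_te_a, _, _ = train_test_split(X, y, train_size=0.2, random_state=42)

# test_size=0.2: the conventional 80/20 train/test split
X_tr_b, X_te_b, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)

print(len(X_tr_a), len(X_te_a))  # 60 243
print(len(X_tr_b), len(X_te_b))  # 242 61
```

Training on a fifth of the data leaves the forest far fewer examples to learn from, which is worth keeping in mind when reading the test scores that follow.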

In [6]:
clf.score(X_train, y_train)

1.0

In [7]:
clf.score(X_test, y_test)

0.831275720164609
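
A training score of 1.0 next to a lower test score means the forest reproduces its training labels exactly but does less well on rows it has not seen, which is why the held-out score is the one to report. The sketch below, assuming synthetic data with some label noise (`flip_y=0.1`), shows the same pattern and checks that a classifier's `score` is simply the fraction of correct predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for heart_disease (assumption)
X, y = make_classification(n_samples=300, n_features=13, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# score() for a classifier is the fraction of correctly predicted labels
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
manual_test_acc = np.mean(clf.predict(X_test) == y_test)

print(f"train={train_acc:.3f} test={test_acc:.3f} manual={manual_test_acc:.3f}")
```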

### The same, but for regression

In [9]:
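# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release to run
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor

# Load the Boston housing data into a DataFrame
boston = load_boston()

boston_df = pd.DataFrame(boston['data'], columns=boston['feature_names'])
boston_df['target'] = pd.Series(boston['target'])
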
np.random.seed(42)

# Create the data

X = boston_df.drop('target', axis=1)
y = boston_df['target']

# Split into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)


# Instantiate and fit the model

model = RandomForestRegressor().fit(X_train, y_train)

# Make predictions

y_preds = model.predict(X_test)
y_preds[:10]

array([23.081, 30.574, 16.759, 23.46 , 16.893, 21.644, 19.113, 15.334,
       21.14 , 20.639])

In [10]:
model.score(X_test, y_test)

0.8654448653350507
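
For a regressor, `score` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, where 1.0 is a perfect fit and 0.0 matches always predicting the mean of the targets. A sketch on synthetic data from `make_regression` (an assumption, since `load_boston` is unavailable in recent scikit-learn) confirms that `score`, `r2_score` and the formula agree:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption, stands in for the Boston data)
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
y_preds = model.predict(X_test)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - y_preds) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(model.score(X_test, y_test), r2_score(y_test, y_preds), manual_r2)
```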

## 4.2 Evaluating the model using the `scoring` parameter

In [11]:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

np.random.seed(42)

X = heart_disease.drop('target', axis=1)
y = heart_disease['target']

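# train_size=0.2 trains on only 20% of the rows and tests on the other 80%;
# test_size=0.2 would give the conventional 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.2)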

clf = RandomForestClassifier()

clf.fit(X_train, y_train);

In [12]:
clf.score(X_test, y_test)

0.831275720164609

In [20]:
np.random.seed(42)
cross_val_score(clf, X, y)

array([0.81967213, 0.90163934, 0.83606557, 0.78333333, 0.78333333])
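
Called without `cv`, `cross_val_score` produced five scores: since scikit-learn 0.22 the default is 5 folds. For a classifier and an integer `cv`, the folds come from `StratifiedKFold` without shuffling, and a fresh clone of the estimator is fitted on each training fold. A sketch reproducing that loop by hand, assuming synthetic data and a fixed `random_state` so both runs fit identical forests:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=303, n_features=13, random_state=42)
clf = RandomForestClassifier(n_estimators=50, random_state=42)

# Library version: 5 stratified folds, scored with clf.score (accuracy)
auto_scores = cross_val_score(clf, X, y, cv=5)

# Hand-rolled version of the same procedure
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    fold_model = clone(clf).fit(X[train_idx], y[train_idx])  # fresh, unfitted copy per fold
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))

print(auto_scores)
print(np.array(manual_scores))
```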

In [15]:
cross_val_score(clf, X, y, cv=10)

array([0.90322581, 0.80645161, 0.87096774, 0.9       , 0.86666667,
       0.8       , 0.73333333, 0.86666667, 0.73333333, 0.8       ])
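
With `cv=10` each test fold holds only about a tenth of the rows, so individual fold scores swing widely (from about 0.73 to 0.90 above). Summarising them as a mean and standard deviation is a common way to report the result; a sketch on assumed synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption) with a fixed seed for reproducibility
X, y = make_classification(n_samples=303, n_features=13, random_state=0)
clf = RandomForestClassifier(random_state=0)

# Ten folds give ten scores; summarise them as mean +/- standard deviation
scores = cross_val_score(clf, X, y, cv=10)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} folds")
```

The cell above was not seeded, so rerunning it will give slightly different fold scores; fixing `random_state` on the estimator, as here, makes the numbers repeatable.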

In [17]:
np.random.seed(42)

# Single training and test split score.
clf_single_score = clf.score(X_test, y_test)

# Take the mean of the 5-fold cross-validation scores
clf_cross_val_score = np.mean(cross_val_score(clf, X, y))

# Compare the two.
clf_single_score, clf_cross_val_score

(0.831275720164609, 0.8248087431693989)
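
The single-split score depends on which rows happen to land in the test set, whereas the cross-validated mean averages over several splits. A sketch on assumed synthetic data that scores the same model on five different random 80/20 splits makes that dependence visible:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with some label noise (assumption)
X, y = make_classification(n_samples=303, n_features=13, flip_y=0.1, random_state=0)

# Only the split changes between iterations; the model settings stay fixed
split_scores = []
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    split_scores.append(clf.score(X_test, y_test))

print(np.round(split_scores, 3))
print("spread:", max(split_scores) - min(split_scores))
```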

In [None]:
# A classifier's score method returns mean accuracy on the data it is given
clf.score(X_test, y_test)

In [19]:
# scoring=None falls back to the estimator's own score method (mean accuracy)
np.random.seed(42)
cross_val_score(clf, X, y, scoring=None)

array([0.81967213, 0.90163934, 0.83606557, 0.78333333, 0.78333333])
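
The output matches the earlier default call because `scoring=None` uses the estimator's own `score` method. Passing a scorer name instead swaps in a different metric. A sketch on assumed synthetic binary data, using the built-in scorer names `"accuracy"`, `"precision"`, `"recall"`, `"f1"` and `"roc_auc"`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (assumption)
X, y = make_classification(n_samples=303, n_features=13, random_state=42)
clf = RandomForestClassifier(random_state=42)

# scoring=None falls back to clf.score, i.e. accuracy for a classifier
default_scores = cross_val_score(clf, X, y, scoring=None)
accuracy_scores = cross_val_score(clf, X, y, scoring="accuracy")

# Other built-in scorer names evaluate the same folds with a different metric
for name in ["precision", "recall", "f1", "roc_auc"]:
    print(name, cross_val_score(clf, X, y, scoring=name).mean().round(3))
```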