# Machine Learning Model Evaluation Metrics Part I. Classification

## Reference Notebook
<a href='https://www.kaggle.com/ishivinal/machine-learning-model-evaluation-metrics'>⚖️ Machine Learning Model Evaluation Metrics</a>

<img src="https://image.freepik.com/free-vector/site-stats-concept-illustration_114360-1434.jpg" width=300>

> Is it all over when machine learning modeling is done?

* I think <code>evaluation</code> is as important as modeling in machine learning.
* How can I explain if my model is really good or not?
* How can I present my model at an important presentation?
* In this notebook, I am going to dig into how to *evaluate machine learning models*.

<h3 style="color:green">If you think this notebook is helpful, upvotes would be greatly appreciated :-) </h3>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Import Libraries

In [None]:
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
    
from sklearn.model_selection import train_test_split

# Pipeline library for Training
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_absolute_error


OK, let's load the data and preprocess it.<br>
<br>
*(for classification, we are going to use the Titanic data)*
<br>

One more thing: we use Pipelines. If you want to know more about pipelines, please check the <a href='https://www.kaggle.com/leeyj0511/for-starter-top-30-machine-learning-pipelines'>reference</a>.

# Data Preprocessing (using Pipelines)

In [None]:
# Load the data
X = pd.read_csv("../input/titanic/train.csv")
X_test = pd.read_csv("../input/titanic/test.csv")
print(X.shape, X_test.shape)


# Remove rows with missing target, separate target from predictors
X.dropna(axis=0, subset=['Survived'], inplace=True)
y = X.Survived
X.drop(['Survived'], axis=1, inplace=True)

# "Cardinality" means the number of unique values in a column
# Select categorical columns with relatively low cardinality (convenient but arbitrary)
categorical_cols = [cname for cname in X.columns if X[cname].nunique() < 10 and X[cname].dtype == 'object']

# Select numerical columns
numerical_cols = [cname for cname in X.columns if X[cname].dtype in ['float64', 'int64']]

# Keep selected columns only (apply the same column selection to train and test)
my_cols = categorical_cols + numerical_cols
X = X[my_cols].copy()
X_test = X_test[my_cols].copy()

In [None]:
X_ = pd.read_csv("../input/titanic/train.csv")

In [None]:
f, ax = plt.subplots(1, 2, figsize=(12, 4))

X_['Survived'].value_counts().plot.pie(autopct='%1.1f%%', ax=ax[0])

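# Pass the column as a keyword argument; recent seaborn versions no longer accept it positionally
sns.countplot(x='Survived', data=X_, ax=ax[1])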

## Pipelines for training

* numerical_transformer/categorical_transformer
* preprocessor(ColumnTransformer)
* define model
* Bundle preprocessing and modeling
* Preprocessing of training data, fit model
* Preprocessing of validation data, get predictions

In [None]:
# Preprocessing for numerical data
numerical_transformer = SimpleImputer(strategy='constant')

# Preprocessing for categorical data
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])
# Preprocessing for numerical and categorical data
preprocessor = ColumnTransformer(
        transformers =[
            ('num', numerical_transformer, numerical_cols),
            ('cat', categorical_transformer, categorical_cols)
        ])

# Modeling & Evaluate Score



In [None]:
rf_clf = RandomForestClassifier()

# Bundle preprocessing and modeling code in a pipeline
clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', rf_clf)
])

print(cross_val_score(clf, X, y, cv=10).mean())
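
By default, `cross_val_score` on a classifier reports accuracy only. As a rough sketch, several metrics can be collected in one cross-validation run with `cross_validate`. The example below uses synthetic data from `make_classification` so that it runs on its own; it is not the Titanic pipeline, and the names `X_demo`, `y_demo`, `model`, and `metric_names` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic binary classification data (assumption: stands in for a real dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Evaluate several metrics in a single 5-fold cross-validation run
metric_names = ["accuracy", "f1", "roc_auc"]
scores = cross_validate(model, X_demo, y_demo, cv=5, scoring=metric_names)

for name in metric_names:
    print(name, scores[f"test_{name}"].mean())
```

The same `scoring` list should work with a `Pipeline` object in place of `model`, since a pipeline behaves like any other estimator.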

# Model Evaluation
## 1. Confusion Matrix
A <code>confusion matrix</code> is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.

<img src="https://i.ytimg.com/vi/AOIkPnKu0YA/maxresdefault.jpg" width=600 />

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

In [None]:
# train_test_split: 80%, 20%
X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.8, test_size=0.2)

# modeling
rf_clf = RandomForestClassifier()

# Bundle preprocessing and modeling code in a pipeline
clf = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', rf_clf)
])

# train
clf.fit(X_train, y_train)
preds = clf.predict(X_val)


# confusion_matrix
cm = confusion_matrix(y_val, preds)
sns.heatmap(cm, annot=True, fmt="d")
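
To read the heatmap, it helps to know which cell is which. The sketch below uses a small set of made-up labels (not the Titanic data) and unpacks the four cells. In scikit-learn, rows are the true classes and columns are the predicted classes, so for binary labels 0 and 1 the flattened order is TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical): 1 = survived, 0 = did not survive
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

# Rows = true class, columns = predicted class
cm_demo = confusion_matrix(y_true, y_pred)

# For binary labels {0, 1}, ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = cm_demo.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```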

### Accuracy
<code>Accuracy</code> in classification problems is the number of correct predictions made by the model divided by the total number of predictions made.

<img src="https://cdn-images-1.medium.com/max/1600/1*5XuZ_86Rfce3qyLt7XMlhw.png" width=600 >

In [None]:
clf.score(X_val, y_val)  # Return the mean accuracy on the given test data and labels

In [None]:
accuracy_score(y_val, preds)

✔️ When to use Accuracy? (Important!)<br>

<code>Accuracy</code> is a good measure when the target classes in the data are nearly balanced, for example Survived (60% yes, 40% no).
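
A minimal sketch of why balance matters, using made-up labels with a 95/5 class split (an assumption for illustration, not the Titanic data). A "model" that always predicts the majority class looks strong on accuracy while never finding a single positive case.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives and 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class (0)
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)  # share of correct predictions
rec = recall_score(y_true, y_pred)    # share of actual positives that were found

print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```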

### Precision

<code>Precision</code> is defined as the number of true positives divided by the number of true positives plus the number of false positives. Precision is about being precise: of all the cases predicted as positive, how many are actually positive?

<img src="https://cdn-images-1.medium.com/max/640/1*KhlD7Js9leo0B0zfsIfAIA.png" width=600>

### Recall

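<code>Recall</code> answers the question: when the actual result is positive, how often does the model predict it correctly? It is the number of true positives divided by the number of true positives plus the number of false negatives.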

<img src="https://cdn-images-1.medium.com/max/640/1*a8hkMGVHg3fl4kDmSIDY_A.png" width=600>
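
The two definitions can be checked by hand. The sketch below counts TP, FP, and FN on a small set of made-up labels (hypothetical, not the Titanic data) and compares the manual results with `precision_score` and `recall_score`.

```python
from sklearn.metrics import precision_score, recall_score

# Toy labels (hypothetical): 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Count the confusion-matrix cells needed for the two metrics
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision_manual = tp / (tp + fp)  # of predicted positives, how many are right
recall_manual = tp / (tp + fn)     # of actual positives, how many were found

print(precision_manual, precision_score(y_true, y_pred))
print(recall_manual, recall_score(y_true, y_pred))
```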

### F1-Score

<code>F1 score</code> is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account. <br>
Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, <code>especially if you have an uneven class distribution</code>.

<img src="https://cdn-images-1.medium.com/max/1600/1*UJxVqLnbSj42eRhasKeLOA.png">
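
As a small worked example, F1 can be computed as the harmonic mean, 2 · P · R / (P + R), and compared with `f1_score`. The labels below are made up for illustration. The harmonic mean is pulled toward the smaller of the two values, so it is never larger than the plain arithmetic mean.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels (hypothetical, not the Titanic data)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

f1_manual = 2 * p * r / (p + r)  # harmonic mean of precision and recall
arithmetic_mean = (p + r) / 2    # shown only for comparison

print(f1_manual, f1_score(y_true, y_pred), arithmetic_mean)
```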

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_val, preds))

### AUC-ROC curve

The <code>AUC-ROC</code> curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve, and AUC represents the degree or measure of separability. <br>
It tells how capable the model is of distinguishing between classes.<br>
The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.<br>
By analogy, the higher the AUC, the better the model is at distinguishing between passengers who survived and those who did not.

The ROC curve plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis.

<img src="https://cdn-images-1.medium.com/max/1600/1*pk05QGzoWhCgRiiFbz-oKQ.png">

ROC curves are frequently used to show in a graphical way the connection/trade-off between clinical sensitivity and specificity for every possible cut-off for a test or a combination of tests.

In [None]:
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
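# Use the predicted probability of the positive class; hard 0/1 labels give only a single threshold
probs = clf.predict_proba(X_val)[:, 1]
fpr, tpr, thresholds = roc_curve(y_val, probs)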

plt.plot(fpr, tpr, label='ROC curve')
plt.plot([0, 1], [0, 1], 'k--', label='Random guess')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.xlim([-0.02, 1])
plt.ylim([0, 1.02])
plt.legend(loc="lower right")
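
The area under the curve can be summarised as a single number with `roc_auc_score`. The sketch below uses four made-up scores (hypothetical, not model output from this notebook) and also checks one common interpretation of AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half.

```python
from itertools import product
from sklearn.metrics import roc_auc_score

# Toy labels and predicted scores (hypothetical)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_score)

# Pairwise view: how often does a positive outrank a negative?
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
pairs = list(product(pos, neg))
pairwise = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(auc, pairwise)
```

An AUC of 0.5 corresponds to the dashed "random guess" diagonal, and 1.0 means every positive is ranked above every negative.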

### Logistic loss

<code>Log loss</code>, aka logistic loss or cross-entropy loss.

This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier’s predictions.

<img src="https://cdn-images-1.medium.com/max/1600/0*2ekvLNkZ0_cKcPtv">

In [None]:
from sklearn.metrics import log_loss
# Log loss is defined on predicted probabilities, not on hard 0/1 labels
log_loss(y_val, clf.predict_proba(X_val))
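
For the binary case, the negative log-likelihood can be written out directly. The sketch below uses made-up probabilities (hypothetical values, not predictions from this notebook) and compares the manual formula with `log_loss`, which for binary labels accepts a 1-D array of positive-class probabilities.

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy labels and predicted probabilities of the positive class (hypothetical)
y_true = np.array([1, 0, 1, 1])
p_pos = np.array([0.9, 0.2, 0.6, 0.3])

# Binary cross-entropy: average negative log-probability assigned to the true label
manual = -np.mean(y_true * np.log(p_pos) + (1 - y_true) * np.log(1 - p_pos))

sk = log_loss(y_true, p_pos)
print(manual, sk)
```

Confident wrong predictions are penalised heavily: the last sample (true label 1, predicted 0.3) contributes the most to the loss.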

# Summary

## TL;DR
* <code>Accuracy</code>: a good measure when the target classes in the data are nearly balanced, for example Survived (60% yes, 40% no).
* <code>F1-score</code>: the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account.<br>
    * *If you have an <code>uneven class distribution</code>, then use the F1-score.*
* <code>AUC-ROC curve</code>: a performance measurement for classification problems at various threshold settings. ROC is a probability curve, and AUC represents the degree or measure of separability.
* <code>Log loss</code>: aka logistic loss or cross-entropy loss. Use it when evaluating predicted probabilities, for example from a neural network.

## Thank you! :)
I hope this will help you :)