#### About

> Model Evaluation metrics

Model evaluation metrics quantify how well a trained supervised learning model's predictions match the known labels. The metrics below are the ones most commonly used for classification models:

1. Accuracy: Accuracy measures the proportion of correctly predicted instances out of the total instances. It is often used as a basic metric for classification problems when classes are balanced.


In [1]:
from sklearn.metrics import accuracy_score


In [2]:
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.625
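
The caveat about balanced classes matters in practice. The sketch below uses made-up labels (95 negatives and 5 positives, not the data from the cell above) and a hypothetical "model" that always predicts the majority class. Accuracy looks strong even though no positive instance is ever found.

```python
from sklearn.metrics import accuracy_score, recall_score

# Illustrative imbalanced labels (assumed data): 95 negatives, 5 positives.
y_true_imb = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class (0).
y_pred_imb = [0] * 100

acc_imb = accuracy_score(y_true_imb, y_pred_imb)
rec_imb = recall_score(y_true_imb, y_pred_imb)

print("Accuracy:", acc_imb)  # 95 of 100 predictions are correct
print("Recall:", rec_imb)    # none of the 5 positives are found
```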


2. Precision: Precision measures the proportion of true positive predictions out of the total positive predictions. It is often used in binary classification problems when the focus is on minimizing false positives.


In [4]:
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
precision = precision_score(y_true, y_pred)
print("Precision:", precision)

Precision: 0.75


3. Recall (Sensitivity or True Positive Rate): Recall measures the proportion of true positive predictions out of the total actual positive instances. It is often used in binary classification problems when the focus is on minimizing false negatives.


In [5]:
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
recall = recall_score(y_true, y_pred)
print("Recall:", recall)

Recall: 0.6


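4. F1 Score: The F1 score is the harmonic mean of precision and recall, 2 * (precision * recall) / (precision + recall). It is high only when both precision and recall are high, which makes it a useful single number when false positives and false negatives both matter.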


In [6]:
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)

F1 Score: 0.6666666666666665


5. Specificity (True Negative Rate): Specificity measures the proportion of true negative predictions out of the total actual negative instances. It is often used in binary classification problems when the focus is on minimizing false positives.
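
scikit-learn does not provide a dedicated specificity function, so the value has to be derived. A minimal sketch, reusing the same `y_true` and `y_pred` as the other cells: specificity is the recall of the negative class, which can be cross-checked against the confusion matrix counts. The variable names are illustrative.

```python
from sklearn.metrics import confusion_matrix, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

# Specificity equals recall computed with the negative class (0) as the target.
specificity = recall_score(y_true, y_pred, pos_label=0)

# Cross-check: for binary labels 0/1, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity_from_cm = tn / (tn + fp)

print("Specificity:", specificity)
```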



6. Area Under the Receiver Operating Characteristic (ROC) Curve: The ROC curve is a graphical representation of the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds. 

The Area Under the ROC Curve (AUC-ROC) summarizes the curve as a single number for binary classification models. A value of 1.0 means every positive instance is scored above every negative instance, while 0.5 corresponds to random ranking, so higher values indicate better performance.



In [9]:
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred_probs = [0.9, 0.3, 0.8, 0.7, 0.1, 0.6, 0.2, 0.8]
# Pass predicted scores/probabilities for the positive class rather than hard 0/1 labels
roc_auc = roc_auc_score(y_true, y_pred_probs)
print("AUC-ROC:", roc_auc)




AUC-ROC: 1.0
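
To see the curve behind that number, the points can be listed with `roc_curve` and integrated with `auc`, both from `sklearn.metrics`. This is a small sketch on the same scores as above. The first threshold returned is a sentinel above every score (its exact value depends on the scikit-learn version), so the curve starts at the origin.

```python
from sklearn.metrics import roc_curve, auc

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred_probs = [0.9, 0.3, 0.8, 0.7, 0.1, 0.6, 0.2, 0.8]

# One (FPR, TPR) point per candidate threshold, from strictest to loosest.
fpr, tpr, thresholds = roc_curve(y_true, y_pred_probs)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

# Trapezoidal area under those points; matches roc_auc_score on this data.
roc_auc_from_curve = auc(fpr, tpr)
print("AUC from curve:", roc_auc_from_curve)
```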


7. Confusion Matrix: A confusion matrix tabulates a classifier's predictions against the known true labels, giving the counts of true positives, true negatives, false positives, and false negatives. In scikit-learn, rows correspond to true classes and columns to predicted classes, so for binary labels 0 and 1 the layout is [[TN, FP], [FN, TP]].


In [10]:
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(cm)

Confusion Matrix:
[[2 1]
 [2 3]]
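
The four counts in the matrix are enough to reproduce the earlier metrics by hand. The sketch below unpacks them and applies the standard formulas, which is a handy sanity check on the library results shown above.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# Row 0 holds the true negatives' row (TN, FP); row 1 holds (FN, TP).
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # (3 + 2) / 8
precision = tp / (tp + fp)                  # 3 / 4
recall = tp / (tp + fn)                     # 3 / 5
f1 = 2 * precision * recall / (precision + recall)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
```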


> Use cases for model evaluation metrics in supervised learning:

1. Classification problems: Accuracy, precision, recall, F1 score, and AUC-ROC are commonly used to evaluate binary and multi-class classifiers. For multi-class problems, precision, recall, and F1 are computed per class and then averaged (for example, macro or weighted averaging).

2. Imbalanced datasets: When one class heavily outnumbers the other(s), accuracy can be misleading, because a model that always predicts the majority class still scores highly. Metrics such as precision, recall, and specificity are usually more informative in that case.

3. Threshold tuning: Classifiers that output a score or probability, such as logistic regression and support vector machines, turn it into a class label by comparing it against a threshold. Raising the threshold typically trades recall for precision, and lowering it does the reverse, so the evaluation metrics can guide the choice of threshold for the problem's requirements.

4. Model comparison: Model evaluation metrics can be used to compare the performance of different supervised learning models and select the best-performing model for a specific problem.

5. Model monitoring: Model evaluation metrics can be used to monitor the performance of a trained model in production, and trigger alerts or actions based on the performance thresholds defined for the problem at hand.
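
The threshold tuning use case (item 3) can be sketched by sweeping a few cutoffs over predicted scores and recomputing precision and recall at each one. The scores are the ones from the AUC-ROC cell. The three thresholds and the `results` dictionary are arbitrary choices for illustration, not recommended values.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_scores = np.array([0.9, 0.3, 0.8, 0.7, 0.1, 0.6, 0.2, 0.8])

results = {}
for threshold in (0.25, 0.5, 0.85):  # illustrative cutoffs
    # Predict the positive class when the score reaches the threshold.
    y_pred_t = (y_scores >= threshold).astype(int)
    p = precision_score(y_true, y_pred_t)
    r = recall_score(y_true, y_pred_t)
    results[threshold] = (p, r)
    print(f"threshold={threshold:.2f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy data the lowest cutoff admits one false positive, and the highest cutoff misses four of the five positives. That is the precision/recall trade-off the threshold controls.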






