# Classification Metrics

This notebook walks through four common tools for evaluating a binary classifier: the confusion matrix, the ROC curve and its AUC, the precision-recall (PR) curve, and the F1 score.

## Confusion Matrix

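A confusion matrix is a table that compares a classifier's predictions with the true labels. For binary classification it counts the true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP). In scikit-learn's `confusion_matrix`, rows correspond to true labels and columns to predicted labels, so the binary layout is `[[TN, FP], [FN, TP]]`.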
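
To make the four counts concrete, here is a minimal sketch on a hand-made example. The labels `y_true_toy` and `y_pred_toy` are hypothetical values chosen only for illustration; they are not part of the dataset used later in this notebook.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels, chosen only for illustration
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_toy = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

# Rows are true labels, columns are predicted labels
cm_toy = confusion_matrix(y_true_toy, y_pred_toy)

# For a binary problem, ravel() flattens the matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = cm_toy.ravel()

print(cm_toy)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

In this toy example there is one false positive (the third sample) and one false negative (the seventh sample); the remaining eight predictions are correct.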

### Step 1: Import Required Libraries for Classification Metrics

We begin by importing NumPy, pandas, and Matplotlib, along with the metric functions `confusion_matrix`, `roc_curve`, `auc`, `precision_recall_curve`, and `f1_score` from `sklearn.metrics`.

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_curve, auc, precision_recall_curve, f1_score
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

### Step 2: Create a Sample Dataset

We will generate a synthetic dataset for binary classification.

In [ ]:
# Generating synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_classes=2, random_state=42)
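
Several of the metrics below behave differently on imbalanced data, so it can help to check the class balance before training. The following is a minimal sketch that repeats the same `make_classification` call so it runs on its own; the names `X_check`, `y_check`, and `class_counts` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same call as in the cell above, repeated so this snippet is self-contained
X_check, y_check = make_classification(
    n_samples=1000, n_features=20, n_informative=2, n_classes=2, random_state=42
)

# Count how many samples fall into each class
class_counts = np.bincount(y_check)
print("Samples per class:", class_counts)
print("Positive fraction:", y_check.mean())
```

Because `weights` is left at its default, the two classes should come out roughly balanced.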

### Step 3: Train a Classifier

Now, we will train a logistic regression model on the generated dataset.

In [ ]:
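# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit a logistic regression classifier with default settings
model = LogisticRegression()
model.fit(X_train, y_train)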

### Step 4: Confusion Matrix

We will compute the confusion matrix to evaluate the model's performance.

In [ ]:
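# Hard class predictions on the held-out test set
y_pred = model.predict(X_test)
# Rows are true labels, columns are predicted labels
conf_matrix = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(conf_matrix)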

## ROC-AUC

The ROC (Receiver Operating Characteristic) curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the decision threshold varies. The ROC-AUC score is the area under this curve: 1.0 means the model ranks every positive above every negative, while 0.5 corresponds to ranking at random.
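
Each point on the ROC curve comes from one threshold applied to the model's scores. The sketch below computes a single point by hand and then the area under the full curve, using a hypothetical four-sample example (`y_true_toy`, `scores_toy`) and a helper `rates_at` that is introduced here for illustration only.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Hypothetical toy labels and scores, chosen only for illustration
y_true_toy = np.array([0, 0, 1, 1])
scores_toy = np.array([0.1, 0.4, 0.35, 0.8])

def rates_at(threshold):
    """Return (FPR, TPR) when scores >= threshold are predicted positive."""
    pred = scores_toy >= threshold
    tp = np.sum(pred & (y_true_toy == 1))
    fp = np.sum(pred & (y_true_toy == 0))
    tpr = tp / np.sum(y_true_toy == 1)
    fpr = fp / np.sum(y_true_toy == 0)
    return fpr, tpr

# One point on the curve per threshold
print("threshold 0.5:", rates_at(0.5))
print("threshold 0.3:", rates_at(0.3))

# Area under the full curve, computed two equivalent ways
fpr_toy, tpr_toy, _ = roc_curve(y_true_toy, scores_toy)
auc_from_curve = auc(fpr_toy, tpr_toy)
auc_direct = roc_auc_score(y_true_toy, scores_toy)
print("AUC:", auc_from_curve, auc_direct)
```

Lowering the threshold from 0.5 to 0.3 raises the true positive rate from 0.5 to 1.0, but the false positive rate also rises from 0.0 to 0.5. For this example both AUC computations give 0.75.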

### Step 5: ROC-AUC

We will now calculate and plot the ROC curve, as well as compute the AUC score.

In [ ]:
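# Use the predicted probability of the positive class (column 1) as the score
fpr, tpr, thresholds = roc_curve(y_test, model.predict_proba(X_test)[:, 1])
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, color='blue', label=f'ROC curve (area = {roc_auc:.2f})')
# Diagonal reference line: the expected curve for random ranking
plt.plot([0, 1], [0, 1], color='gray', linestyle='--')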
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()

## Precision-Recall Curve (PR Curve)

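The precision-recall (PR) curve plots precision against recall as the decision threshold varies. It is particularly informative when the positive class is rare, because neither precision nor recall depends on the number of true negatives, which can make the false positive rate look small on imbalanced data.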
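
The sketch below illustrates this on a small imbalanced example with 2 positives among 10 samples. The arrays `y_true_imb` and `scores_imb` are hypothetical values chosen only for illustration, and `average_precision_score` is used here as one common single-number summary of the PR curve.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical imbalanced toy data: 8 negatives, 2 positives
y_true_imb = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores_imb = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.65, 0.9])

# Predict positive when the score is at least 0.5
pred_imb = scores_imb >= 0.5
tp = np.sum(pred_imb & (y_true_imb == 1))
fp = np.sum(pred_imb & (y_true_imb == 0))
fn = np.sum(~pred_imb & (y_true_imb == 1))
tn = np.sum(~pred_imb & (y_true_imb == 0))

precision_at = tp / (tp + fp)   # share of predicted positives that are correct
recall_at = tp / (tp + fn)      # share of actual positives that were found
fpr_at = fp / (fp + tn)         # share of actual negatives flagged as positive

# Average precision summarizes the whole PR curve in one number
ap = average_precision_score(y_true_imb, scores_imb)
# Precision of a classifier that labels everything positive
baseline = y_true_imb.mean()

print(f"precision={precision_at:.2f}, recall={recall_at:.2f}, FPR={fpr_at:.2f}")
print(f"average precision={ap:.3f}, baseline={baseline:.2f}")
```

At this threshold the false positive rate is only 0.25, yet half of the predicted positives are wrong (precision 0.5). That gap is what the PR curve makes visible on imbalanced data.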

### Step 6: PR Curve

We will now calculate and plot the Precision-Recall curve.

In [ ]:
precision, recall, thresholds_pr = precision_recall_curve(y_test, model.predict_proba(X_test)[:, 1])
plt.plot(recall, precision, color='green')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.show()

## F1 Score

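The F1 score is the harmonic mean of precision and recall: F1 = 2 * precision * recall / (precision + recall). Because the harmonic mean is pulled toward the smaller of the two values, F1 is high only when both precision and recall are high. This makes it more informative than accuracy on imbalanced datasets, where a model can score well simply by predicting the majority class.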
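
As a quick check of the definition, the sketch below computes F1 by hand from precision and recall and compares it with `f1_score`. The labels `y_true_f1` and `y_pred_f1` are hypothetical values chosen only for illustration.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical toy labels: 4 positives, 6 negatives
y_true_f1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_f1 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

p = precision_score(y_true_f1, y_pred_f1)  # TP / (TP + FP) = 2 / 3
r = recall_score(y_true_f1, y_pred_f1)     # TP / (TP + FN) = 2 / 4

# Harmonic mean of precision and recall
f1_manual = 2 * p * r / (p + r)
f1_sklearn = f1_score(y_true_f1, y_pred_f1)

# Arithmetic mean, shown only for comparison
arithmetic_mean = (p + r) / 2

print(f"precision={p:.3f}, recall={r:.3f}")
print(f"F1 (manual)={f1_manual:.3f}, F1 (sklearn)={f1_sklearn:.3f}")
print(f"arithmetic mean={arithmetic_mean:.3f}")
```

Here F1 is 4/7 (about 0.571), slightly below the arithmetic mean of about 0.583, which reflects how the harmonic mean weights the lower of the two values more heavily.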

### Step 7: F1 Score

We will now calculate the F1 score for the model.

In [ ]:
f1 = f1_score(y_test, y_pred)
print(f'F1 Score: {f1:.2f}')

## Conclusion

In this notebook, we computed four classification metrics for a logistic regression model: the confusion matrix, ROC-AUC, the PR curve, and the F1 score. The confusion matrix and F1 score describe performance at a single decision threshold, while the ROC and PR curves show how performance changes across thresholds.