
**Precision-Recall Curves**


**Import libraries**

In [None]:
# example of a precision-recall curve for a predictive model
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
from matplotlib import pyplot
from numpy import where

**Create a dataset**

In [None]:
# generate a 2-class dataset (with no weights argument, make_classification produces roughly equal class sizes)
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# split into train/test sets (50/50)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
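Because the split is not stratified, the test set's class proportions can drift slightly from those of the full dataset. The sketch below counts labels with NumPy's `bincount` to confirm that this dataset is close to balanced. It regenerates the same data so it can run on its own.

```python
# Sketch: check the class balance of the generated data and of the test split
from numpy import bincount
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)

# bincount returns the number of samples with label 0 and with label 1
full_counts = bincount(y)
test_counts = bincount(testy)
print('full dataset:', full_counts, 'positive fraction:', full_counts[1] / len(y))
print('test split:  ', test_counts, 'positive fraction:', test_counts[1] / len(testy))
```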

In [None]:
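# scatter plot of the first two features (of 20), coloured by class
for class_value in range(2):
    # get row indexes for samples with this class
    row_ix = where(y == class_value)[0]
    # create scatter of these samples
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(class_value))
# show the legend and the plot
pyplot.legend()
pyplot.show()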

**Create and train a logistic regression model**

In [None]:
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
# predict probabilities
yhat = model.predict_proba(testX)

In [None]:
# retrieve just the probabilities for the positive class
pos_probs = yhat[:, 1]
# calculate the no skill line as the proportion of the positive class in the test set,
# since the precision-recall curve is computed on testy
no_skill = len(testy[testy == 1]) / len(testy)
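`predict_proba` returns one column per class, ordered as in the fitted model's `classes_` attribute. That ordering is why `yhat[:, 1]` selects the probability of label 1. The sketch below checks the ordering and confirms that each row of probabilities sums to 1. It rebuilds the same data and model so it runs on its own.

```python
# Sketch: confirm which predict_proba column belongs to the positive class
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
model = LogisticRegression(solver='lbfgs').fit(trainX, trainy)
yhat = model.predict_proba(testX)

# columns follow model.classes_, so look up the column index of label 1
print(model.classes_)
pos_col = list(model.classes_).index(1)
pos_probs = yhat[:, pos_col]
# each row is a probability distribution over the two classes
row_sums = yhat.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```

Looking the column up from `classes_` rather than hard-coding `1` guards against label sets where the positive class is not the second sorted label.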

In [None]:
# calculate model precision-recall curve
precision, recall, _ = precision_recall_curve(testy, pos_probs)
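`precision_recall_curve` returns three arrays. `precision` and `recall` each have one more entry than `thresholds`, and the final point is fixed at precision 1 and recall 0, with no matching threshold. The toy labels and scores below are hypothetical and exist only to show the shapes. The exact values returned can differ slightly between scikit-learn versions, so the sketch prints them rather than relying on specific numbers.

```python
# Sketch: inspect the arrays returned by precision_recall_curve on toy data
import numpy as np
from sklearn.metrics import precision_recall_curve

# hypothetical labels and positive-class scores, chosen only for illustration
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f'threshold >= {t:.2f}: precision={p:.2f}, recall={r:.2f}')
# the last (precision, recall) pair has no threshold: it is the (recall=0, precision=1) end point
print('end point:', precision[-1], recall[-1])
```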

In the final plot, the precision-recall curve for the logistic regression model is shown in orange with dot markers.
The no-skill (baseline) classifier is shown as a dashed horizontal blue line.

A model with perfect skill is depicted as a single point at (recall, precision) = (1, 1), the top-right corner. <br>

A skillful model is represented by a curve that bows towards the point (1, 1). <br>

A no-skill classifier appears as a horizontal line on the plot, at a precision equal to the proportion of positive examples in the dataset.<br>

For a balanced dataset this will be 0.5.<br>
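One way to see why the no-skill line sits at the positive-class proportion is to take a classifier that labels every example as positive. It reaches recall 1, and its precision is simply the number of positives divided by the number of examples. The sketch below checks this with `precision_score` on a hypothetical label vector containing 10% positives, an imbalanced case.

```python
# Sketch: the precision of an "always positive" classifier equals the positive-class proportion
import numpy as np
from sklearn.metrics import precision_score, recall_score

# hypothetical labels: 10 positives out of 100 examples
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.ones_like(y_true)  # predict the positive class for every example

prevalence = y_true.mean()
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
print(prevalence, precision, recall)
```

Here the no-skill line would sit at 0.1 rather than 0.5.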

**The focus of the PR curve on the minority class makes it an effective diagnostic for imbalanced binary classification models**

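Precision-recall curves (PR curves) are recommended for highly skewed domains, where ROC curves may give an excessively optimistic view of performance.
Neither precision nor recall uses the count of true negatives, so a large majority class cannot inflate them. The ROC curve's false positive rate, by contrast, divides by the number of negatives and barely moves when negatives vastly outnumber positives.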
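A worked example with hypothetical counts makes this concrete. Suppose a test set has 100 positives and 10,000 negatives, and a model flags 80 of the positives along with 500 of the negatives. The same predictions then give one point on each curve.

```python
# Sketch: the same predictions viewed as an ROC point (TPR/FPR) and a PR point (recall/precision)
# hypothetical confusion-matrix counts for a heavily imbalanced test set
positives = 100
negatives = 10_000
true_positives = 80    # positives the model flags
false_positives = 500  # negatives the model wrongly flags

tpr = true_positives / positives                                 # recall / true positive rate
fpr = false_positives / negatives                                # false positive rate (ROC x-axis)
precision = true_positives / (true_positives + false_positives)  # PR y-axis

print(f'ROC point: FPR={fpr:.3f}, TPR={tpr:.2f}')
print(f'PR point:  recall={tpr:.2f}, precision={precision:.3f}')
```

On an ROC plot, the point (FPR 0.05, TPR 0.8) sits near the top-left corner and looks strong. Yet fewer than 14% of the flagged examples (80 of 580) are actually positive, and the PR point makes that visible.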

In [None]:
# plot the no skill precision-recall curve
pyplot.plot([0, 1], [no_skill, no_skill], linestyle='--', label='No Skill')
# plot the model precision-recall curve
pyplot.plot(recall, precision, marker='.', label='Logistic')
# axis labels
pyplot.xlabel('Recall')
pyplot.ylabel('Precision')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()
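The plotted curve can also be summarised by a single number, although the notebook above does not do so. A common choice is scikit-learn's `average_precision_score`, a weighted mean of the precision at each threshold, weighted by the increase in recall. An alternative is the trapezoidal area from `auc(recall, precision)`, which can come out slightly more optimistic. Comparing either value with the no-skill proportion gives a rough sense of how far the model improves on the baseline. The sketch rebuilds the same data and model so it runs on its own.

```python
# Sketch: summarise the precision-recall curve with single-number scores
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve, average_precision_score, auc

X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
model = LogisticRegression(solver='lbfgs').fit(trainX, trainy)
pos_probs = model.predict_proba(testX)[:, 1]

# average precision: sum over thresholds of (R_n - R_{n-1}) * P_n
ap = average_precision_score(testy, pos_probs)
# area under the curve via the trapezoidal rule (recall is decreasing, which auc accepts)
precision, recall, _ = precision_recall_curve(testy, pos_probs)
pr_auc = auc(recall, precision)
# the no-skill baseline: proportion of positives in the test set
no_skill = testy.mean()
print(f'Average precision: {ap:.3f}')
print(f'PR AUC (trapezoidal): {pr_auc:.3f}')
print(f'No-skill baseline: {no_skill:.3f}')
```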