<a href="https://colab.research.google.com/github/cagBRT/Confusion-matrix/blob/master/confusionMatrix7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Precision and Recall Curves**

Reviewing both precision and recall is useful when the observations are imbalanced between the two classes, that is, when there are many examples of no event (class 0) and only a few examples of an event (class 1).

The area under the precision-recall curve can be approximated by:<br>
>calling scikit-learn's `auc()` function and passing it the recall (x) and precision (y) values calculated for each threshold. `auc()` applies the trapezoidal rule to those points.
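
To make the approximation concrete, here is a minimal sketch of the trapezoidal rule in plain Python. The helper name `trapezoid_area` and the recall/precision values are hypothetical, chosen only for illustration; sorting the points by recall is a simplification in this sketch, not a description of how scikit-learn handles its inputs.

```python
def trapezoid_area(xs, ys):
    """Approximate the area under the points (xs, ys) with the trapezoidal rule."""
    # Sort by x so every segment width is non-negative, whatever the input order.
    points = sorted(zip(xs, ys))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each neighbouring pair of points forms one trapezoid.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical recall (x) and precision (y) pairs, not taken from a real model.
recall = [1.0, 0.75, 0.5, 0.0]
precision = [0.5, 0.6, 1.0, 1.0]

pr_area = trapezoid_area(recall, precision)
print('area=%.4f' % pr_area)
```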

**Precision**<br>
Precision is the number of true positives divided by the sum of the true positives and false positives. <br>

**It describes how many of the model's positive predictions are actually positive.** <br>
Precision is also referred to as the positive predictive value.

**Recall**<br>
Recall is the number of true positives divided by the sum of the true positives and the false negatives. <br>

**Recall is the same as sensitivity.**<br>

**Recall is how many of the actual positives were recalled (found).**
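
The two definitions can be checked by hand on a small example. The sketch below counts true positives, false positives and false negatives directly; the function name `precision_recall` and the label lists are hypothetical and exist only for this illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical true labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print('precision=%.2f recall=%.2f' % (precision, recall))
```

Here there are 3 true positives, 2 false positives and 1 false negative, so precision is 3/5 and recall is 3/4.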

In [None]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import f1_score
from sklearn.metrics import auc
from matplotlib import pyplot

In [None]:
# generate a 2-class dataset; weights sets the proportion of samples in each class
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1, weights=[0.5, 0.5])
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)

In [None]:
# fit a model
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)

In [None]:
# predict probabilities
lr_probs = model.predict_proba(testX)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# predict class values
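yhat = model.predict(testX)
# precision and recall at each probability threshold
lr_precision, lr_recall, _ = precision_recall_curve(testy, lr_probs)
# F1 comes from the class predictions; the area comes from the curve points
lr_f1, lr_auc = f1_score(testy, yhat), auc(lr_recall, lr_precision)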
# summarize scores
print('Logistic: f1=%.3f auc=%.3f' % (lr_f1, lr_auc))
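
For intuition about what "precision and recall at each threshold" means, the following sketch sweeps the thresholds by hand. It is a simplified illustration, not scikit-learn's implementation; the function name `pr_points` and the scores and labels are hypothetical.

```python
def pr_points(labels, scores):
    """Return (threshold, precision, recall) for each distinct score used as a threshold."""
    total_pos = sum(labels)
    points = []
    for threshold in sorted(set(scores)):
        # Predict the positive class when the score reaches the threshold.
        preds = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
        precision = tp / (tp + fp)  # at least one prediction is positive here
        recall = tp / total_pos
        points.append((threshold, precision, recall))
    return points


# Hypothetical predicted probabilities and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

points = pr_points(labels, scores)
for threshold, precision, recall in points:
    print('threshold=%.1f precision=%.3f recall=%.3f' % (threshold, precision, recall))
```

Raising the threshold makes the model more selective: recall can only fall or stay the same, while precision tends to rise.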

A model with perfect skill is depicted as a point at (1,1). <br>

A skilful model is represented by a curve that bows towards (1,1), above the flat no-skill line, whose height is the proportion of positive examples in the data.
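
One way to see why the no-skill line sits at the proportion of positive examples is to score a dataset at random. The simulation below is a rough sketch with hypothetical settings (sample size, 10% positive rate, 0.5 threshold); with scores unrelated to the labels, precision should land close to the share of positives.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n = 100_000
positive_rate = 0.1  # hypothetical share of class 1

# Labels are drawn at random; scores are drawn independently of the labels.
labels = [1 if random.random() < positive_rate else 0 for _ in range(n)]
scores = [random.random() for _ in range(n)]

# Share of positives actually present in the sample.
prevalence = sum(labels) / n

# Precision of the "no skill" scores at a threshold of 0.5.
predicted_pos = [label for label, score in zip(labels, scores) if score >= 0.5]
precision = sum(predicted_pos) / len(predicted_pos)

print('prevalence=%.3f precision=%.3f' % (prevalence, precision))
```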

In [None]:
# no-skill baseline: the proportion of positive examples in the test set
no_skill = len(testy[testy==1]) / len(testy)
# plot the precision-recall curves
pyplot.plot([0, 1], [no_skill, no_skill], linestyle='--', label='No Skill')
pyplot.plot(lr_recall, lr_precision, marker='.', label='Logistic')
# axis labels
pyplot.xlabel('Recall')
pyplot.ylabel('Precision')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

# **When to Use ROC vs. Precision-Recall Curves**
ROC curves and precision-recall curves are generally used as follows:

**ROC curves should be used when there are roughly equal numbers of observations for each class.** On datasets with a class imbalance, ROC curves can present an overly optimistic picture of the model, because the false positive rate stays small when true negatives are plentiful.

**Precision-Recall curves should be used when there is a moderate to large class imbalance.**
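
A small worked example may help show the difference. The confusion-matrix counts below are hypothetical (10 positives among 1000 observations); they are chosen so that the ROC quantities look strong while precision is weak.

```python
# Hypothetical counts for an imbalanced test set: 10 positives, 990 negatives.
tp, fn = 8, 2      # positives found and missed
fp, tn = 40, 950   # negatives flagged and correctly rejected

# ROC axes: true positive rate (recall) against false positive rate.
tpr = tp / (tp + fn)
fpr = fp / (fp + tn)

# Precision-recall axes: recall against precision.
precision = tp / (tp + fp)

print('TPR=%.3f FPR=%.3f precision=%.3f' % (tpr, fpr, precision))
```

The false positive rate is about 4%, which looks good on a ROC curve, yet only about one in six positive predictions is correct. The large pool of true negatives hides the false positives in the ROC view, while precision exposes them.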

### **Assignment:**

Modify the balance of the two classes in our dataset: set the `weights` argument of `make_classification` to different values and see what happens to the curve.