<br>
<br>

# <font color=maroon>**Performance Measures**</font>: sklearn.**metrics** & sklearn.**model_selection**

<br>
<br>

## <font color=maroon>Accuracy</font>: using Cross-Validation

### cross_val_score()

```python
from sklearn.model_selection import cross_val_score

cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")

```
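The call above relies on `sgd_clf`, `X_train`, and `y_train_5` being defined elsewhere in these notes. As a rough, self-contained sketch, the same call can be tried on synthetic data. The names `X_demo`, `y_demo`, and `demo_clf` are hypothetical stand-ins, not the original setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for (X_train, y_train_5): roughly 10% positives (assumption)
X_demo, y_demo = make_classification(n_samples=600, n_features=20,
                                     weights=[0.9, 0.1], random_state=42)
demo_clf = SGDClassifier(random_state=42)

# One accuracy value per fold (cv=3 gives three values)
scores = cross_val_score(demo_clf, X_demo, y_demo, cv=3, scoring="accuracy")
print(scores)
print(scores.mean())
```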

<br>



### StratifiedKFold() & clone()

```python
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

skfolds = StratifiedKFold(n_splits=3) 
# add shuffle=True if the dataset is not already shuffled

for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)
    X_train_folds = X_train[train_index]
    y_train_folds = y_train_5[train_index]
    X_test_fold = X_train[test_index]
    y_test_fold = y_train_5[test_index]
    
    clone_clf.fit(X_train_folds, y_train_folds)
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))         # prints 0.95035, 0.96035, and 0.9604

```

<br>
<br>

## <font color=maroon>confusion_matrix</font>

### cross_val_predict() & confusion_matrix()

```python
from sklearn.model_selection import cross_val_predict

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_train_5, y_train_pred)

```
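To read the resulting matrix, it may help to unpack it on a tiny hand-made example. The labels below are made up for illustration; for a binary problem, rows are actual classes and columns are predicted classes:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, for illustration only
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 1, 0, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
# Layout for binary labels {0, 1}:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm_demo.ravel()
print(cm_demo)
print(tn, fp, fn, tp)
```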

<br>
<br>

## <font color=maroon>Precision</font> and <font color=maroon>Recall</font>

### precision_score() & recall_score()

```python
from sklearn.metrics import precision_score, recall_score

precision_score(y_train_5, y_train_pred) # == 3530 / (687 + 3530)

recall_score(y_train_5, y_train_pred)    # == 3530 / (1891 + 3530)
```
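The arithmetic in the comments above can be reproduced by hand. This sketch only reuses the three counts quoted in those comments and applies the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN):

```python
# Counts taken from the comments in the snippet above
tp = 3530   # true positives
fp = 687    # false positives
fn = 1891   # false negatives

precision = tp / (tp + fp)   # share of positive predictions that are correct
recall = tp / (tp + fn)      # share of actual positives that are detected
print(f"precision = {precision:.4f}")
print(f"recall    = {recall:.4f}")
```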

<br>
<br>

## <font color=maroon>F<sub>1</sub> score</font>

### f1_score()

The **F1 score** is the `harmonic mean` of precision and recall. Whereas the `regular mean` treats all values equally, the harmonic mean gives much more weight to low values. As a result, the classifier will only get a high F1 score if both recall and precision are high.

```python
from sklearn.metrics import f1_score

f1_score(y_train_5, y_train_pred)

```
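To see how strongly the harmonic mean is pulled toward the lower value, here is a small sketch with made-up precision and recall values. The helper `harmonic_mean` is hypothetical and is not part of Scikit-Learn:

```python
def harmonic_mean(p, r):
    """Harmonic mean of two positive numbers: 2 / (1/p + 1/r)."""
    return 2 / (1 / p + 1 / r)

# Illustrative, deliberately unbalanced values
precision_demo, recall_demo = 1.0, 0.1

regular = (precision_demo + recall_demo) / 2
f1_demo = harmonic_mean(precision_demo, recall_demo)
print(regular)   # regular mean stays fairly high
print(f1_demo)   # harmonic mean is dragged toward the low recall
```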

<br>


## <font color=magenta>The Precision/Recall Trade-off</font>

The F1 score favors classifiers that have similar precision and recall. This is not always what you want: in some contexts you mostly care about precision, and in other contexts you really care about recall.

Unfortunately, you can’t have it both ways: increasing precision reduces recall, and vice versa. This is called the <font color=maroon>**precision/recall trade-off**</font>.


<br>

### decision_function()

Scikit-Learn does not let you set the threshold directly, but it does give you access to the decision scores that it uses to make predictions. Instead of calling the classifier’s `predict()` method, you can call its **`decision_function()`** method, which returns a score for each instance,
and then use any **threshold** you want to make predictions based on those scores:

<font color=maroon size=5>Lowering the threshold increases recall and reduces precision.</font>

```python
y_scores = sgd_clf.decision_function([some_digit])  # array([2164.22030239])    
threshold = 0
y_some_digit_pred = (y_scores > threshold)          # array([ True])


threshold = 3000
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred                                   # array([False])
# This confirms that raising the threshold decreases recall.
```
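The same effect can be observed over several instances at once. The sketch below uses a handful of made-up scores and labels (not the MNIST data) and recomputes precision and recall at three thresholds. Recall can only stay flat or fall as the threshold rises, while precision usually rises but is not guaranteed to do so at every step:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels and decision scores, for illustration only
y_true_demo = np.array([0, 1, 0, 1, 1])
scores_demo = np.array([-2.0, -0.5, 0.3, 1.2, 2.5])

results = []
for thr in (-1.0, 0.0, 2.0):
    y_pred_demo = (scores_demo > thr).astype(int)
    p = precision_score(y_true_demo, y_pred_demo)
    r = recall_score(y_true_demo, y_pred_demo)
    results.append((thr, p, r))

for thr, p, r in results:
    print(f"threshold={thr:+.1f}  precision={p:.2f}  recall={r:.2f}")
```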

<br>
<br>

### <font color=magenta>How do you decide which **threshold** to use?</font>


#### precision_recall_curve()

First, use the **cross_val_predict()** function to get the scores of all instances in the
training set, but this time specify that you want to return `decision scores` instead of `predictions`:

```python
y_scores = cross_val_predict(sgd_clf, 
                             X_train, y_train_5, 
                             cv=3, 
                             method="decision_function")
```

> The **RandomForestClassifier** class does not have a `decision_function()` method, due to the way it works (we will cover this in Chapter 7). Luckily, it has a `predict_proba()` method that returns class probabilities for each instance, and we can just use the probability of the positive class as a score, so it will work fine.
> ```python
> y_probas_forest = cross_val_predict(forest_clf,
>                                     X_train, y_train_5,
>                                     cv=3,
>                                     method="predict_proba")
> ```
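As a minimal sketch of "use the probability of the positive class as a score", assuming a binary problem so that `predict_proba()` returns two columns: the synthetic data and the names `forest_demo`, `y_probas_demo`, and `y_scores_demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Synthetic binary data standing in for (X_train, y_train_5)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)
forest_demo = RandomForestClassifier(n_estimators=20, random_state=42)

y_probas_demo = cross_val_predict(forest_demo, X_demo, y_demo,
                                  cv=3, method="predict_proba")
# One column per class; column 1 holds the positive-class probability
y_scores_demo = y_probas_demo[:, 1]
print(y_probas_demo.shape)
print(y_scores_demo[:5])
```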


<br>

With these scores, use the **precision_recall_curve()** function to compute precision and recall for all possible thresholds (the function appends a last precision of 1 and a last recall of 0, which have no corresponding threshold):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

# precisions and recalls have one more element than thresholds, hence [:-1]
plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
# `threshold` is the value set earlier (3000)
plt.vlines(threshold, 0, 1.0, "k", "dotted", label="threshold")
# [...] beautify the figure: add grid, legend, axis, labels, and circles
plt.show()
```
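The `[:-1]` slices are needed because `precision_recall_curve()` returns one more precision/recall value than thresholds. A tiny sketch with made-up labels and scores shows the array lengths and prints the final point so it can be inspected:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical labels and scores, for illustration only
y_true_demo = np.array([0, 0, 1, 1])
scores_demo = np.array([0.1, 0.4, 0.35, 0.8])

precisions_demo, recalls_demo, thresholds_demo = precision_recall_curve(
    y_true_demo, scores_demo)

# precisions and recalls are one element longer than thresholds
print(len(precisions_demo), len(recalls_demo), len(thresholds_demo))
# The final (precision, recall) pair is the one without a threshold
print(precisions_demo[-1], recalls_demo[-1])
```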

<br>
<br>


#### Plot precision directly against recall

Another way to select a good precision/recall trade-off is to plot precision
directly against recall.

```python
plt.plot(recalls, precisions, linewidth=2, label="Precision/Recall curve")
# [...] beautify the figure: add labels, grid, legend, arrow, and text
plt.show()

# Index of the first (lowest) threshold that gives at least 90% precision
idx_for_90_precision = (precisions >= 0.90).argmax()
threshold_for_90_precision = thresholds[idx_for_90_precision]
threshold_for_90_precision

# Predict with this threshold instead of calling predict()
y_train_pred_90 = (y_scores >= threshold_for_90_precision)
```
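It is worth checking the precision and recall actually obtained at the chosen threshold. The sketch below repeats the selection on a small made-up dataset (all names and values are hypothetical) and then verifies both metrics. Hitting the precision target says nothing about recall, which here ends up noticeably lower:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

# Hypothetical labels and decision scores, for illustration only
y_true_demo = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1, 1])
scores_demo = np.array([-3.0, -2.0, -1.5, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0, 4.0])

precisions_demo, recalls_demo, thresholds_demo = precision_recall_curve(
    y_true_demo, scores_demo)

# First index where precision reaches 90%, and the matching threshold
idx = (precisions_demo >= 0.90).argmax()
threshold_demo = thresholds_demo[idx]
y_pred_demo = (scores_demo >= threshold_demo).astype(int)

precision_at_thr = precision_score(y_true_demo, y_pred_demo)
recall_at_thr = recall_score(y_true_demo, y_pred_demo)
print(threshold_demo, precision_at_thr, recall_at_thr)
```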

<br>
<br>

## <font color=maroon>ROC Curve</font>

The **receiver operating characteristic (ROC) curve** is another common tool
used with `binary classifiers`. It is very similar to the `precision/recall curve`,
but instead of <font color=maroon>plotting **precision** versus **recall**</font>, the `ROC curve` <font color=maroon>plots the **true positive rate** (another name for recall) against the **false positive rate** (FPR)</font>.

......

Hence, the ROC curve plots `sensitivity (recall)` versus `1 – specificity`.
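The relationship between these rates can be checked numerically from confusion-matrix counts. The counts below are invented for illustration, and the formulas are the standard definitions: TPR = TP / (TP + FN), FPR = FP / (FP + TN), specificity = TN / (TN + FP).

```python
# Hypothetical confusion-matrix counts, for illustration only
tn, fp, fn, tp = 50, 10, 5, 35

tpr = tp / (tp + fn)            # true positive rate = recall = sensitivity
fpr = fp / (fp + tn)            # false positive rate
specificity = tn / (tn + fp)    # true negative rate

# The FPR and 1 - specificity are the same quantity
print(tpr, fpr, 1 - specificity)
```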

<br>

### roc_curve()

To plot the `ROC curve`, you first use the **roc_curve()** function to <font color=maroon>compute the `TPR` and `FPR` for various threshold values</font>:

```python
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

# Since thresholds are listed in decreasing order in this case, 
# we use <= instead of >= on the first line:
idx_for_threshold_at_90 = (thresholds <= threshold_for_90_precision).argmax()
tpr_90, fpr_90 = tpr[idx_for_threshold_at_90], fpr[idx_for_threshold_at_90]

plt.plot(fpr, tpr, linewidth=2, label="ROC curve")
plt.plot([0, 1], [0, 1], 'k:', label="Random classifier's ROC curve")
plt.plot([fpr_90], [tpr_90], "ko", label="Threshold for 90% precision")
# [...] beautify the figure: add labels, grid, legend, arrow, and text
plt.show()
```


Once again there is a trade-off: the higher the **recall (TPR)**, the more **false positives (FPR)** the classifier produces. The dotted line represents the ROC curve of a purely random classifier; a good classifier stays as far away from that line as possible (toward the **top-left corner**).

### ROC <font color=maroon>AUC</font>: roc_auc_score()

One way to compare classifiers is to measure the **area under the curve
(AUC)**. A perfect classifier will have a `ROC AUC equal to 1`, whereas a
purely random classifier will have a ROC AUC equal to 0.5. Scikit-Learn
provides a function to estimate the ROC AUC:

```python
from sklearn.metrics import roc_auc_score

roc_auc_score(y_train_5, y_scores)
```
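The two reference values (1 for a perfect classifier, 0.5 for one with no ranking ability) can be reproduced on a toy example. The labels and scores below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true_demo = np.array([0, 0, 1, 1])

# Every positive outranks every negative
perfect_scores = np.array([0.1, 0.2, 0.8, 0.9])
# Identical scores carry no ranking information
constant_scores = np.array([0.5, 0.5, 0.5, 0.5])
# One positive (0.35) is ranked below one negative (0.4)
mixed_scores = np.array([0.1, 0.4, 0.35, 0.8])

auc_perfect = roc_auc_score(y_true_demo, perfect_scores)
auc_constant = roc_auc_score(y_true_demo, constant_scores)
auc_mixed = roc_auc_score(y_true_demo, mixed_scores)
print(auc_perfect, auc_constant, auc_mixed)
```

The mixed case lands between the two extremes because three of the four positive/negative pairs are ranked correctly.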

<br>
<br>
<br>


# sklearn.**dummy**

```python
from sklearn.dummy import DummyClassifier

dummy_clf = DummyClassifier()
dummy_clf.fit(X_train, y_train_5)
print(any(dummy_clf.predict(X_train)))  # prints False: no 5s detected
```
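A classifier that never predicts the positive class can still score well on accuracy when the classes are skewed. The following sketch assumes a synthetic dataset with 10% positives; `strategy="most_frequent"` is passed explicitly so the behavior does not depend on the library's default strategy:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical skewed labels: 900 negatives, 100 positives
y_demo = np.array([0] * 900 + [1] * 100)
X_demo = np.zeros((len(y_demo), 1))   # features are ignored by DummyClassifier

dummy_demo = DummyClassifier(strategy="most_frequent")
dummy_scores = cross_val_score(dummy_demo, X_demo, y_demo,
                               cv=3, scoring="accuracy")
# Accuracy per fold is close to the share of the majority class
print(dummy_scores)
```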

<br>

```python
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, log_loss

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

classifiers = [LogisticRegression(),
               KNeighborsClassifier(3),
               SVC(probability=True),
               DecisionTreeClassifier(),
               RandomForestClassifier(),
               AdaBoostClassifier(),
               GradientBoostingClassifier(),
               GaussianNB(),
               LinearDiscriminantAnalysis(),
               QuadraticDiscriminantAnalysis(),
               ]

for clf in classifiers:
    name = clf.__class__.__name__
    print(name)
```

Output:

```
LogisticRegression
KNeighborsClassifier
SVC
DecisionTreeClassifier
RandomForestClassifier
AdaBoostClassifier
GradientBoostingClassifier
GaussianNB
LinearDiscriminantAnalysis
QuadraticDiscriminantAnalysis
```
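The imports above include `StratifiedShuffleSplit`, `accuracy_score`, and `log_loss`, which the snippet itself never uses. One plausible use, offered purely as an assumption about the intent, is to score each classifier on a single stratified hold-out split. The sketch below does this on synthetic data with a subset of the classifiers to keep it quick:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data (assumption; replace with the real dataset)
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=42)

# A single stratified 80/20 train/test split
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(sss.split(X_demo, y_demo))

results = {}
for clf in [LogisticRegression(), KNeighborsClassifier(3), GaussianNB()]:
    name = clf.__class__.__name__
    clf.fit(X_demo[train_idx], y_demo[train_idx])
    acc = accuracy_score(y_demo[test_idx], clf.predict(X_demo[test_idx]))
    ll = log_loss(y_demo[test_idx], clf.predict_proba(X_demo[test_idx]))
    results[name] = (acc, ll)
    print(f"{name}: accuracy={acc:.3f}, log loss={ll:.3f}")
```

Log loss needs `predict_proba()`, which is presumably why the original list builds `SVC` with `probability=True`.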
