# COMP534 Lab session 4
In this session, we first introduce some metrics for model evaluation. Then we show how to build a neural network (NN) model with [PyTorch](https://pytorch.org/). You can also use other frameworks, such as [TensorFlow](https://www.tensorflow.org/) or [Keras](https://keras.io/). Finally, we work through a classification example using PyTorch.


# Visualizations with Display Objects

In this example, we will construct display objects,
`ConfusionMatrixDisplay`, `RocCurveDisplay`, and
`PrecisionRecallDisplay` directly from their respective metrics. This
is an alternative to using their corresponding plot functions when
a model's predictions are already computed or expensive to compute. Note that
this is advanced usage, and in general we recommend using their respective
plot functions.


## Load Data and train model
For this example, we load a blood transfusion service center dataset from
OpenML (<https://www.openml.org/d/1464>). This is a binary classification
problem where the target is whether an individual donated blood. The
data is then split into train and test sets, and a logistic regression model is
fitted on the training set.



In [None]:
%matplotlib inline
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = fetch_openml(data_id=1464, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

# StandardScaler standardizes each feature to zero mean and unit variance: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
clf = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
clf.fit(X_train, y_train)

### `ConfusionMatrixDisplay`
 With the fitted model, we compute the predictions of the model on the test
 dataset. These predictions are used to compute the confusion matrix, which
 is plotted with the `ConfusionMatrixDisplay` class.



In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay

y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

cm_display = ConfusionMatrixDisplay(cm).plot()
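
To make the layout of the matrix concrete, here is a minimal sketch on a small hand-made label set. The arrays `y_true_toy` and `y_pred_toy` are illustrative assumptions, not the blood transfusion data. In scikit-learn, rows of the confusion matrix correspond to true labels and columns to predicted labels, so for a binary problem `ravel()` yields the counts in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical, for illustration only): 1 is the positive class.
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred_toy = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are true labels, columns are predicted labels.
cm_toy = confusion_matrix(y_true_toy, y_pred_toy)

# For a binary problem, flattening gives TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm_toy.ravel()
print(cm_toy)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

The same unpacking should work on the `cm` computed above, since the task is binary.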

### `RocCurveDisplay`
 The ROC curve requires either the probabilities or the non-thresholded
 decision values from the estimator. Since the logistic regression provides
 a decision function, we will use it to plot the ROC curve:



In [None]:
from sklearn.metrics import roc_curve
from sklearn.metrics import RocCurveDisplay

y_score = clf.decision_function(X_test)  # non-thresholded decision values (not probabilities)

fpr, tpr, _ = roc_curve(y_test, y_score, pos_label=clf.classes_[1])
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr, estimator_name="Log Regression").plot()
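
A ROC curve is often summarized by the area under it (AUC). The sketch below, on four hypothetical samples (`y_true_toy` and `y_score_toy` are assumptions made for illustration), computes the AUC in two ways: by integrating the curve with `auc`, and directly with `roc_auc_score`.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy data (hypothetical): two negatives, two positives, with scores.
y_true_toy = [0, 0, 1, 1]
y_score_toy = [0.1, 0.4, 0.35, 0.8]

# Option 1: build the curve, then integrate it with the trapezoidal rule.
fpr_toy, tpr_toy, _ = roc_curve(y_true_toy, y_score_toy)
auc_from_curve = auc(fpr_toy, tpr_toy)

# Option 2: compute the AUC directly from labels and scores.
auc_direct = roc_auc_score(y_true_toy, y_score_toy)

print(auc_from_curve, auc_direct)
```

Both values are 0.75 here: of the four (positive, negative) pairs, three are ranked correctly. The result can be passed to `RocCurveDisplay` through its `roc_auc` argument so that it appears in the plot legend.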

### `PrecisionRecallDisplay`
 Similarly, the precision-recall curve can be plotted using `y_score` from
 the previous section.



In [None]:
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import PrecisionRecallDisplay

prec, recall, _ = precision_recall_curve(y_test, y_score, pos_label=clf.classes_[1])
pr_display = PrecisionRecallDisplay(precision=prec, recall=recall).plot()
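
The precision-recall curve also has a common single-number summary, the average precision (AP). The following sketch uses four hypothetical samples (`y_true_toy` and `y_score_toy` are illustrative assumptions) and scikit-learn's `average_precision_score`, which weights the precision at each threshold by the increase in recall.

```python
from sklearn.metrics import average_precision_score

# Toy data (hypothetical): two negatives, two positives, with scores.
y_true_toy = [0, 0, 1, 1]
y_score_toy = [0.1, 0.4, 0.35, 0.8]

# AP = sum over thresholds of (recall increase) * (precision at that threshold).
# Here: 0.5 * 1.0 (threshold 0.8) + 0.5 * (2/3) (threshold 0.35).
ap_toy = average_precision_score(y_true_toy, y_score_toy)
print(ap_toy)
```

The value can be passed to `PrecisionRecallDisplay` through its `average_precision` argument to show it in the legend.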

### Combining the display objects into a single plot
 The display objects store the computed values that were passed as arguments.
 This allows for the visualizations to be easily combined using Matplotlib's
 API. In the following example, we place the displays next to each other in a
 row.



In [None]:
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))

roc_display.plot(ax=ax1)  # left: roc curve
pr_display.plot(ax=ax2)  # right: precision-recall curve
plt.show()

### F1 Score

Another commonly used metric is the F1 score, the harmonic mean of precision and recall. Below, we compute it with scikit-learn's `f1_score` and then directly from precision and recall, using the equation:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

In [None]:
from sklearn.metrics import f1_score
f1_score(y_test, y_pred, pos_label=clf.classes_[1])

In [None]:
from sklearn.metrics import precision_score, recall_score

p = precision_score(y_test, y_pred, pos_label=clf.classes_[1])
r = recall_score(y_test, y_pred, pos_label=clf.classes_[1])

print(2 * p * r / (p + r))  # f1-score
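
As a worked example of the equation, the sketch below starts from raw confusion-matrix counts. The counts are hypothetical and chosen only to keep the arithmetic easy to follow. It also checks the equivalent form $F_1 = \frac{2\,TP}{2\,TP + FP + FN}$, which follows from substituting the definitions of precision and recall.

```python
# Hypothetical confusion-matrix counts, chosen only for illustration.
tp_toy, fp_toy, fn_toy = 30, 10, 30

precision_toy = tp_toy / (tp_toy + fp_toy)  # 30 / 40 = 0.75
recall_toy = tp_toy / (tp_toy + fn_toy)     # 30 / 60 = 0.50

# F1 as the harmonic mean of precision and recall.
f1_toy = 2 * precision_toy * recall_toy / (precision_toy + recall_toy)

# Equivalent form written directly in terms of the counts.
f1_from_counts = 2 * tp_toy / (2 * tp_toy + fp_toy + fn_toy)

print(f1_toy, f1_from_counts)
```

Both expressions give 0.6, which is below the arithmetic mean of precision and recall (0.625): the harmonic mean is pulled toward the smaller of the two values.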