# Visualization & Metrics

## Prerequisites

`Plotly` is used for all charts to provide an interactive experience: hover, toggle, and zoom. See the [Installation](installation.html#Plotting) notebook for information about configuring Plotly. However, this notebook shows static images because of recent difficulties with third-party JavaScript on the documentation portal.

## Overview

As described in the [Low-Level Docs](api_low_level.html#10.-Assess-the-Results.), the `Predictor` and `Prediction` of each training `Job` are automatically populated with metrics for each split/fold of samples evaluated against the model. The `Algorithm.analysis_type` determines which metrics and plots are prepared:

* Although `'classification_multi'` and `'classification_binary'` share the same metrics and plots, they produce these artifacts differently. For example, the ROC calculations use `roc_multi_class='ovr'` (one-vs-rest) for multiclass and `roc_multi_class=None` for binary.

* `'regression'`, unlike the classification analyses, does not have an 'accuracy' metric, so 'r2' (R², the coefficient of determination) is substituted for it. There are no regression-specific plots.
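To make the binary vs. one-vs-rest distinction concrete, here is a plain-Python sketch of the two standard ROC AUC strategies. It is not AIQC's code: `binary_roc_auc` and `ovr_roc_auc` are hypothetical helpers, and we assume AIQC's `roc_multi_class='ovr'` corresponds to the `multi_class='ovr'` option of scikit-learn's `roc_auc_score` with its default macro averaging.

```python
from itertools import product


def binary_roc_auc(y_true, scores):
    """ROC AUC via its rank (Mann-Whitney) form: the probability that a random
    positive sample is scored higher than a random negative one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def ovr_roc_auc(y_true, proba):
    """One-vs-rest, macro-averaged: each class in turn is treated as the positive
    class and scored by its own probability column; the per-class AUCs are averaged."""
    n_classes = len(proba[0])
    aucs = []
    for c in range(n_classes):
        y_c = [1 if y == c else 0 for y in y_true]
        s_c = [row[c] for row in proba]
        aucs.append(binary_roc_auc(y_c, s_c))
    return sum(aucs) / n_classes


# Binary: a single score per sample is enough.
print(binary_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Multiclass: one probability column per class.
proba = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.5, 0.3, 0.2]]
print(ovr_roc_auc([0, 1, 2, 2], proba))
```

A binary model needs only one score per sample, whereas a multiclass model must be broken into one binary problem per class before an AUC can be computed.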

---

We'll use the `datum` and `tests` modules to rapidly create some examples.

In [4]:
import aiqc
from aiqc import datum
from aiqc import tests

---

## Classification

In [5]:
%%capture
queue_multiclass = tests.make_test_queue('keras_multiclass')

In [6]:
queue_multiclass.run_jobs()

🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 8/8 [00:28<00:00,  3.54s/it]


### Individual Job Metrics

Each training `Prediction` contains the following metrics.

In [10]:
from pprint import pprint as p
p(
    queue_multiclass.jobs[0].predictors[0].predictions[0].metrics
)

{'test': {'accuracy': 0.9629629629629629,
          'f1': 0.9628482972136223,
          'loss': 0.17958050966262817,
          'precision': 0.9666666666666667,
          'recall': 0.9629629629629629,
          'roc_auc': 0.9876543209876543},
 'train': {'accuracy': 0.9705882352941176,
           'f1': 0.9705308775731311,
           'loss': 0.13190817832946777,
           'precision': 0.9729729729729729,
           'recall': 0.9705882352941176,
           'roc_auc': 0.9985582468281432},
 'validation': {'accuracy': 1.0,
                'f1': 1.0,
                'loss': 0.15124531090259552,
                'precision': 1.0,
                'recall': 1.0,
                'roc_auc': 1.0}}
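In every split above, `recall` equals `accuracy`. That is what support-weighted averaging produces: weighting each class's recall by its share of the true labels sums to the overall fraction of correct predictions. The sketch below therefore assumes (an inference from the output, not from AIQC's source) that the multiclass precision, recall, and F1 are support-weighted averages; `weighted_scores` is a hypothetical helper.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted_c = sum(p == c for p in y_pred)
        prec_c = tp / predicted_c if predicted_c else 0.0
        rec_c = tp / n_c
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if (prec_c + rec_c) else 0.0
        weight = n_c / n  # each class counts in proportion to its true samples
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = weighted_scores([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
print(scores)  # recall and accuracy are both 5/6
```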


It also contains per-epoch `History` metrics calculated during model training.

In [11]:
queue_multiclass.jobs[0].predictors[0].history.keys()

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
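A common use of these per-epoch values is finding the epoch where the validation loss bottomed out. The sketch assumes `history` maps each key to a list with one value per epoch, as the `History.history` dict returned by Keras' `model.fit` does; the numbers below are made-up illustrative values, not AIQC output, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(history, monitor="val_loss", mode="min"):
    """Return the (0-based epoch index, value) of the best epoch for `monitor`."""
    values = history[monitor]
    pick = min if mode == "min" else max
    best_value = pick(values)
    return values.index(best_value), best_value


# Illustrative values only: validation loss starts rising after epoch 2
# while training loss keeps falling, a typical sign of overfitting.
history = {
    "loss": [1.10, 0.62, 0.41, 0.30, 0.25],
    "accuracy": [0.40, 0.71, 0.85, 0.90, 0.93],
    "val_loss": [1.05, 0.58, 0.39, 0.42, 0.47],
    "val_accuracy": [0.45, 0.74, 0.88, 0.86, 0.85],
}
print(best_epoch(history))                         # (2, 0.39)
print(best_epoch(history, "val_accuracy", "max"))  # (2, 0.88)
```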

### Aggregate Queue Metrics

* `selected_metrics:list` - Include only the named metrics, in case the full set of returned metrics is overwhelming.
* `sort_by:str` - Sort the dataframe by any column name.
* `ascending:bool=False` - Sort order; descending when `False`.
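As a rough mental model of these three parameters, the plain-Python sketch below applies them to a few test-split rows copied from the output of the cell that follows. `select_and_sort` is a hypothetical helper, not part of AIQC, and keeping the id and split columns when metrics are selected is our assumption.

```python
ID_COLUMNS = ["hyperparamcombo_id", "job_id", "predictor_id", "split"]


def select_and_sort(rows, selected_metrics=None, sort_by=None, ascending=False):
    """Keep only the chosen metric columns (plus ids), then optionally sort."""
    if selected_metrics is not None:
        rows = [{k: r[k] for k in ID_COLUMNS + list(selected_metrics)} for r in rows]
    if sort_by is not None:
        # `ascending=False` means the largest values come first.
        rows = sorted(rows, key=lambda r: r[sort_by], reverse=not ascending)
    return rows


rows = [
    {"hyperparamcombo_id": 8, "job_id": 8, "predictor_id": 8, "split": "test",
     "accuracy": 0.888889, "f1": 0.885714, "loss": 0.158564},
    {"hyperparamcombo_id": 7, "job_id": 7, "predictor_id": 7, "split": "test",
     "accuracy": 0.888889, "f1": 0.885714, "loss": 0.301731},
    {"hyperparamcombo_id": 6, "job_id": 6, "predictor_id": 6, "split": "test",
     "accuracy": 0.888889, "f1": 0.885714, "loss": 0.205332},
]
for row in select_and_sort(rows, selected_metrics=["loss"], sort_by="loss", ascending=True):
    print(row)  # predictors 8, 6, 7: lowest loss first
```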

In [9]:
queue_multiclass.metrics_to_pandas(
    selected_metrics=None
    , sort_by=None
    , ascending=False
)

| | hyperparamcombo_id | job_id | predictor_id | split | accuracy | f1 | loss | precision | recall | roc_auc |
|---:|---:|---:|---:|:---|---:|---:|---:|---:|---:|---:|
| 23 | 8 | 8 | 8 | train | 0.970588 | 0.970531 | 0.094776 | 0.972973 | 0.970588 | 0.999135 |
| 22 | 8 | 8 | 8 | validation | 1.0 | 1.0 | 0.05943 | 1.0 | 1.0 | 1.0 |
| 21 | 8 | 8 | 8 | test | 0.888889 | 0.885714 | 0.158564 | 0.916667 | 0.888889 | 1.0 |
| 20 | 7 | 7 | 7 | train | 0.941176 | 0.940715 | 0.140597 | 0.95 | 0.941176 | 0.998414 |
| 19 | 7 | 7 | 7 | validation | 1.0 | 1.0 | 0.090078 | 1.0 | 1.0 | 1.0 |
| 18 | 7 | 7 | 7 | test | 0.888889 | 0.885714 | 0.301731 | 0.916667 | 0.888889 | 1.0 |
| 17 | 6 | 6 | 6 | train | 0.960784 | 0.960648 | 0.100677 | 0.964912 | 0.960784 | 0.999135 |
| 16 | 6 | 6 | 6 | validation | 1.0 | 1.0 | 0.101051 | 1.0 | 1.0 | 1.0 |
| 15 | 6 | 6 | 6 | test | 0.888889 | 0.885714 | 0.205332 | 0.916667 | 0.888889 | 0.995885 |
| 13 | 5 | 5 | 5 | validation | 0.904762 | 0.902778 | 0.190586 | 0.925926 | 0.904762 | 1.0 |


### Aggregate Queue Visualization

`plot_performance`, aka the "boomerang chart", is unique to AIQC, and it highlights the benefits of the library: each model from the `Queue` is evaluated against all splits/folds.

When performing classification, the secondary training metric (non-loss) is 'accuracy'.

In [None]:
queue_multiclass.plot_performance(
    max_loss = 1.5, min_accuracy = 0.70
)

![Classify Boomerang](../images/plot_classify_boomerang.png)
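Conceptually, each model contributes one (loss, accuracy) point per split, and joining a model's points traces its "boomerang". The sketch below shows one plausible way `max_loss` and `min_accuracy` could act as filters; the rule that a model is dropped if *any* of its splits breaches a threshold, and the inclusive comparisons, are our assumptions rather than documented AIQC behavior. `filter_models` and `boomerang_lines` are hypothetical helpers, and the rows are copied from the metrics table above.

```python
from collections import defaultdict


def filter_models(rows, max_loss, min_accuracy):
    """Drop every row of any model that breaches a threshold on some split (assumed rule)."""
    failed = {r["predictor_id"] for r in rows
              if r["loss"] > max_loss or r["accuracy"] < min_accuracy}
    return [r for r in rows if r["predictor_id"] not in failed]


def boomerang_lines(rows):
    """Group points by model: one (split, loss, accuracy) point per split."""
    lines = defaultdict(list)
    for r in rows:
        lines[r["predictor_id"]].append((r["split"], r["loss"], r["accuracy"]))
    return dict(lines)


rows = [
    {"predictor_id": 8, "split": "train", "loss": 0.094776, "accuracy": 0.970588},
    {"predictor_id": 8, "split": "validation", "loss": 0.05943, "accuracy": 1.0},
    {"predictor_id": 8, "split": "test", "loss": 0.158564, "accuracy": 0.888889},
    {"predictor_id": 7, "split": "train", "loss": 0.140597, "accuracy": 0.941176},
    {"predictor_id": 7, "split": "validation", "loss": 0.090078, "accuracy": 1.0},
    {"predictor_id": 7, "split": "test", "loss": 0.301731, "accuracy": 0.888889},
]
print(boomerang_lines(filter_models(rows, max_loss=1.5, min_accuracy=0.70)))  # both models
print(boomerang_lines(filter_models(rows, max_loss=0.2, min_accuracy=0.70)))  # only model 8
```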

### Individual Job Visualization

Loss values in the first few epochs are often extremely high before they plummet and level off. This stretches out the graph and makes it hard to see whether the evaluation set is diverging. The `loss_skip_15pct:bool` parameter skips the first 15% of epochs so that the figure is more useful.
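A minimal sketch of the trimming idea, assuming the number of skipped epochs is rounded down (AIQC may round differently). `skip_first_pct` is a hypothetical helper; it keeps the original epoch numbers so the x-axis still lines up with training.

```python
import math


def skip_first_pct(values, pct=0.15):
    """Drop the first `pct` of per-epoch values, returning (epoch indices, values)."""
    n_skip = math.floor(len(values) * pct)
    epochs = list(range(n_skip, len(values)))  # keep original epoch numbers for the x-axis
    return epochs, values[n_skip:]


# With 20 epochs, the first 3 are hidden.
epochs, loss = skip_first_pct([10.0, 4.0, 1.5] + [0.5] * 17)
print(epochs[0], len(loss))  # 3 17
```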

In [None]:
queue_multiclass.jobs[0].predictors[0].plot_learning_curve(loss_skip_15pct=True)

![Classify Learn](../images/plot_classify_learn.png)

The data behind the classification plots is also preformatted and stored per split in `plot_data`.

In [13]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_data['test'].keys()

dict_keys(['confusion_matrix', 'roc_curve', 'precision_recall_curve'])
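For reference, a confusion matrix simply counts (actual, predicted) pairs. The sketch below uses the scikit-learn orientation (rows are actual classes, columns are predicted classes); whether AIQC's stored `confusion_matrix` uses the same orientation is an assumption, and `confusion_matrix` here is a hypothetical pure-Python helper.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count samples per (actual, predicted) class pair."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1  # row = actual class, column = predicted class
    return matrix


for row in confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], labels=[0, 1, 2]):
    print(row)
# [2, 0, 0]
# [0, 1, 1]  <- one class-1 sample was predicted as class 2
# [0, 0, 2]
```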

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_roc_curve()

![Classify ROC](../images/plot_roc.png)

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_confusion_matrix()

![Plot Confusion](../images/plot_confusion_matrix.png)

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_precision_recall()

![Precision Recall](../images/plot_precision_recall.png)
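Each point on a precision-recall curve comes from treating every score at or above a threshold as a positive prediction. The binary sketch below sweeps each distinct score as a threshold; for a multiclass model, we assume a curve like this is drawn per class with that class treated as positive. `precision_recall_points` is a hypothetical helper.

```python
def precision_recall_points(y_true, scores):
    """(threshold, precision, recall) for each distinct score, highest threshold first."""
    n_pos = sum(y_true)  # assumes at least one positive sample
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for y, p in zip(y_true, predicted) if p and y == 1)
        fp = sum(1 for y, p in zip(y_true, predicted) if p and y == 0)
        points.append((threshold, tp / (tp + fp), tp / n_pos))
    return points


for point in precision_recall_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]):
    print(point)
# Lowering the threshold raises recall but lets in false positives.
```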

---

## Regression

In [7]:
%%capture
queue_regression = tests.make_test_queue('keras_regression')

In [8]:
queue_regression.run_jobs()

🔮 Training Models 🔮: 100%|██████████████████████████████████████████| 4/4 [00:32<00:00,  8.03s/it]


### Individual Job Metrics

Each training `Prediction` contains the following metrics.

In [12]:
from pprint import pprint as p
p(
    queue_regression.jobs[0].predictors[0].predictions[0].metrics
)

{'test': {'explained_variance': 0.7894243409529155,
          'loss': 0.3433103561401367,
          'mse': 0.21202397973530798,
          'r2': 0.7865474072073684},
 'train': {'explained_variance': 0.7772334144881416,
           'loss': 0.3257257640361786,
           'mse': 0.22540750832130774,
           'r2': 0.7745924916786923},
 'validation': {'explained_variance': 0.70325979950094,
                'loss': 0.40005412697792053,
                'mse': 0.38639962177247694,
                'r2': 0.6802937337531519}}
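The three regression metrics follow standard definitions (the ones scikit-learn uses); we assume AIQC's values match them. The sketch also shows why `explained_variance` is never below `r2`, as in every split above: it ignores the mean of the residuals, so the two are equal only when predictions are unbiased on average. `regression_metrics` is a hypothetical helper.

```python
def regression_metrics(y_true, y_pred):
    """MSE, R^2, and explained variance from paired true/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # sum of squared residuals
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mean_res = sum(residuals) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    return {
        "mse": ss_res / n,
        "r2": 1 - ss_res / ss_tot,
        # Uses the variance of residuals, so a constant bias is not penalized.
        "explained_variance": 1 - var_res / (ss_tot / n),
    }


m = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(m)  # mse 0.375, r2 about 0.9486, explained_variance about 0.9572
```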


It also contains per-epoch metrics calculated during model training.

In [13]:
queue_regression.jobs[0].predictors[0].history.keys()

dict_keys(['loss', 'mean_squared_error', 'val_loss', 'val_mean_squared_error'])

### Aggregate Queue Metrics

In [14]:
queue_regression.metrics_to_pandas()

| | hyperparamcombo_id | job_id | predictor_id | split | explained_variance | loss | mse | r2 |
|---:|---:|---:|---:|:---|---:|---:|---:|---:|
| 9 | 12 | 12 | 12 | test | 0.809494 | 0.423333 | 0.289514 | 0.708535 |
| 10 | 12 | 12 | 12 | validation | 0.722865 | 0.480268 | 0.521585 | 0.568441 |
| 11 | 12 | 12 | 12 | train | 0.795168 | 0.396511 | 0.321744 | 0.678256 |
| 6 | 11 | 11 | 11 | test | 0.779358 | 0.410134 | 0.274249 | 0.723904 |
| 7 | 11 | 11 | 11 | validation | 0.717331 | 0.432595 | 0.442124 | 0.634188 |
| 8 | 11 | 11 | 11 | train | 0.753201 | 0.379794 | 0.297251 | 0.702749 |
| 3 | 10 | 10 | 10 | test | 0.805824 | 0.353231 | 0.213114 | 0.78545 |
| 4 | 10 | 10 | 10 | validation | 0.733818 | 0.387297 | 0.368265 | 0.695298 |
| 5 | 10 | 10 | 10 | train | 0.784331 | 0.33236 | 0.234408 | 0.765592 |
| 0 | 9 | 9 | 9 | test | 0.789424 | 0.34331 | 0.212024 | 0.786547 |


### Aggregate Queue Visualization

When performing regression, the secondary training metric (non-loss) is 'r2'.

In [None]:
queue_regression.plot_performance(
    max_loss=1.5, min_r2=0.65
)

![Regression Boomerang](../images/plot_regression_boomerang.png)

### Individual Job Visualization

In [None]:
queue_regression.jobs[0].predictors[0].plot_learning_curve(loss_skip_15pct=True)

![Regression Learn](../images/plot_regression_learn.png)