# Visualization & Metrics

---

## Prerequisites

`Plotly` is used for interactive charts (hover, toggle, zoom). Refer to the [Installation](installation.html#Plotting) section for information about configuring Plotly. This notebook shows static images instead, because the documentation portal does not support third-party JavaScript.

## Overview

As described in the [Low-Level Docs](api_low_level.html#10.-Assess-the-Results.), every training `Job` automatically generates metrics when it is evaluated against each split/fold.

The `Algorithm.analysis_type` determines which metrics and plots are prepared:

* Although `'classification_multi'` and `'classification_binary'` share the same metrics and plots, they produce these artifacts differently. For example, binary ROC curves are computed with `roc_multi_class=None`, whereas multiclass ROC curves use `roc_multi_class='ovr'` (one-vs-rest).

* `'regression'`, unlike the classification analyses, does not have an 'accuracy' metric, so 'r2' (R^2, the coefficient of determination) is substituted for it. There are no regression-specific plots. Note that unsupervised/self-supervised models are also treated as regression.
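To make the 'r2' substitution concrete, here is a minimal sketch of R^2 written from its textbook definition, 1 - SS_res / SS_tot. It is not taken from AIQC's source, and the `r2_score` helper and sample values are illustrative only.

```python
# Minimal sketch of R^2 (coefficient of determination) from its definition.
# Illustrative helper; not AIQC's implementation.
from statistics import fmean


def r2_score(y_true, y_pred):
    """Return 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r2_score(y_true, y_pred), 4))  # prints 0.9486
```

A model that always predicts the mean of the targets scores exactly 0, so an r2 close to 0 means the model barely improves on that baseline.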

---

We'll use the `datum` and `tests` modules to rapidly generate a couple of examples.

In [2]:
from aiqc import datum
from aiqc import tests

---

## Classification

Let's quickly generate a trained classification model to inspect.

In [3]:
%%capture
queue_multiclass = tests.make_test_queue('keras_multiclass')
queue_multiclass.run_jobs()

### Queue Visualization

`plot_performance`, a.k.a. the "boomerang chart", is unique to AIQC and highlights the library's core benefit: each model from the Queue is evaluated against all splits/folds.

When performing classification, the secondary training metric (non-loss) is 'accuracy'.

In [None]:
queue_multiclass.plot_performance(
    max_loss=1.5, min_accuracy=0.70
)

![Classify Boomerang](../images/plot_classify_boomerang.png)
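`max_loss` and `min_accuracy` act as thresholds on which models are drawn. The sketch below shows one plausible rule, under the assumption that a model is dropped if *any* of its splits has a loss above `max_loss` or an accuracy below `min_accuracy`; AIQC's exact comparison may differ. The rows and the `filter_predictors` helper are hypothetical.

```python
# Hypothetical per-split rows, shaped like the output of metrics_to_pandas().
rows = [
    {"predictor_id": 1, "split": "train", "loss": 0.22, "accuracy": 0.92},
    {"predictor_id": 1, "split": "validation", "loss": 0.33, "accuracy": 0.86},
    {"predictor_id": 1, "split": "test", "loss": 0.20, "accuracy": 0.96},
    {"predictor_id": 2, "split": "train", "loss": 0.40, "accuracy": 0.90},
    {"predictor_id": 2, "split": "validation", "loss": 1.90, "accuracy": 0.55},
    {"predictor_id": 2, "split": "test", "loss": 1.70, "accuracy": 0.60},
]


def filter_predictors(rows, max_loss, min_accuracy):
    """Keep predictors whose every split meets both thresholds (assumed rule)."""
    failed = {
        r["predictor_id"]
        for r in rows
        if r["loss"] > max_loss or r["accuracy"] < min_accuracy
    }
    return [r for r in rows if r["predictor_id"] not in failed]


kept = filter_predictors(rows, max_loss=1.5, min_accuracy=0.70)
print(sorted({r["predictor_id"] for r in kept}))  # prints [1]
```

Filtering whole predictors, rather than individual split points, keeps each surviving model's set of splits intact on the chart.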

### Queue Metrics

* `selected_metrics:list` - If the full set of metrics is overwhelming, return only the ones named in this list. `None` returns all of them.
* `sort_by:str` - Sort the dataframe by any column name.
* `ascending:bool=False` - Sort order; descending if `False`.

In [5]:
queue_multiclass.metrics_to_pandas(
    selected_metrics=None
    , sort_by='predictor_id'
    , ascending=True
).head(6)

| | hyperparamcombo_id | job_id | predictor_id | split | accuracy | f1 | loss | precision | recall | roc_auc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 10 | 18 | train | 0.921569 | 0.920468 | 0.219878 | 0.936508 | 0.921569 | 0.996828 |
| 1 | 10 | 10 | 18 | validation | 0.857143 | 0.855769 | 0.32901 | 0.87037 | 0.857143 | 0.972789 |
| 2 | 10 | 10 | 18 | test | 0.962963 | 0.962848 | 0.203578 | 0.966667 | 0.962963 | 1.0 |


These are also aggregated by metric across all splits/folds.

In [6]:
queue_multiclass.metrics_aggregate_to_pandas(
    selected_metrics=None
    , selected_stats=None
    , sort_by='predictor_id'
    , ascending=True
).head(12)

| | hyperparamcombo_id | job_id | predictor_id | metric | maximum | minimum | pstdev | median | mean |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 10 | 19 | accuracy | 0.962963 | 0.857143 | 0.043541 | 0.921569 | 0.913891 |
| 1 | 10 | 10 | 19 | f1 | 0.962848 | 0.855769 | 0.04403 | 0.920468 | 0.913028 |
| 2 | 10 | 10 | 19 | loss | 0.32901 | 0.203578 | 0.055686 | 0.219878 | 0.250822 |
| 3 | 10 | 10 | 19 | precision | 0.966667 | 0.87037 | 0.040217 | 0.936508 | 0.924515 |
| 4 | 10 | 10 | 19 | recall | 0.962963 | 0.857143 | 0.043541 | 0.921569 | 0.913891 |
| 5 | 10 | 10 | 19 | roc_auc | 1.0 | 0.972789 | 0.012149 | 0.996828 | 0.989872 |
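The aggregate columns line up with Python's `statistics` module; note that `pstdev` is the *population* standard deviation, not the sample one. As a sketch (an assumption about how the columns could be derived, not AIQC's code), aggregating the per-split accuracy values reported for this notebook's first model reproduces the `accuracy` row of the aggregate table to six decimals.

```python
# Sketch: derive the aggregate columns from per-split values with the stdlib.
# The values are the per-split accuracies reported for this notebook's first model.
import statistics

accuracy_by_split = {
    "train": 0.9215686274509803,
    "validation": 0.8571428571428571,
    "test": 0.9629629629629629,
}

values = list(accuracy_by_split.values())
aggregate = {
    "maximum": max(values),
    "minimum": min(values),
    "pstdev": statistics.pstdev(values),  # population standard deviation
    "median": statistics.median(values),
    "mean": statistics.mean(values),
}
for stat, value in aggregate.items():
    print(f"{stat}: {value:.6f}")
```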


### Job Visualization

Loss values in the first few epochs are often extremely high before they plummet and level off. This stretches the y-axis and makes it hard to see whether the evaluation set is diverging. The `loss_skip_15pct:bool` parameter omits the first 15% of epochs from the plot so the rest of the curve is easier to read.

In [None]:
queue_multiclass.jobs[0].predictors[0].plot_learning_curve(loss_skip_15pct=True)

![Classify Learn](../images/plot_classify_learn.png)
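The idea behind `loss_skip_15pct` fits in a few lines. The `skip_first_pct` helper below is hypothetical: it assumes the history is a dict of equal-length per-epoch lists, and AIQC's exact rounding of the 15% cutoff may differ.

```python
# Hypothetical helper illustrating the idea behind loss_skip_15pct.
def skip_first_pct(history, pct=0.15):
    """Drop the first `pct` fraction of epochs from every per-epoch series."""
    n_epochs = len(next(iter(history.values())))
    start = int(n_epochs * pct)  # e.g. 20 epochs -> skip the first 3
    return {name: series[start:] for name, series in history.items()}


# Made-up history: huge early losses compress the rest of the plot.
history = {
    "loss": [9.0, 4.0, 1.2] + [0.5 - 0.01 * i for i in range(17)],
    "val_loss": [8.5, 3.8, 1.3] + [0.55 - 0.01 * i for i in range(17)],
}
trimmed = skip_first_pct(history)
print(len(history["loss"]), len(trimmed["loss"]))  # prints 20 17
```

After trimming, the largest remaining loss is 0.5 instead of 9.0, so the y-axis can zoom in on the region where train and validation curves might separate.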

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_feature_importance(top_n=4)

![Classify Features](../images/plot_classify_features.png)

The data behind the classification plots is preformatted and stored in `plot_data`, keyed by split:

In [9]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_data['test'].keys()

dict_keys(['confusion_matrix', 'roc_curve', 'precision_recall_curve'])
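As a reminder of what the `confusion_matrix` entry represents, here is a minimal stdlib sketch with hypothetical labels. It uses rows for true labels and columns for predicted labels, the orientation scikit-learn uses; whether AIQC's stored matrix uses the same orientation is an assumption.

```python
# Minimal confusion-matrix sketch: rows = true labels, columns = predicted labels.
# Illustrative only; not AIQC's implementation.
def confusion_matrix(y_true, y_pred, labels):
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1  # count each (true, predicted) pair
    return matrix


labels = ["a", "b", "c"]
y_true = ["a", "b", "c", "c", "b"]
y_pred = ["a", "c", "c", "c", "b"]
for label, row in zip(labels, confusion_matrix(y_true, y_pred, labels)):
    print(label, row)
# prints:
# a [1, 0, 0]
# b [0, 1, 1]
# c [0, 0, 2]
```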

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_roc_curve()

![Classify ROC](../images/plot_roc.png)
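For multiclass problems, `'ovr'` means one-vs-rest: each class in turn is treated as "positive" against all the others, and the per-class AUCs are averaged. The sketch below follows the definition scikit-learn documents for `roc_auc_score(..., multi_class="ovr")` with its default macro averaging; it is not AIQC's code, and the labels and probabilities are made up. AUC is computed as the probability that a random positive sample outranks a random negative one, which equals the area under the ROC curve.

```python
# Sketch of one-vs-rest (OvR) ROC AUC, macro-averaged over classes.
from itertools import product
from statistics import fmean


def binary_auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def ovr_auc(labels, probabilities, classes):
    """Treat each class as positive vs. all others, then average the AUCs."""
    aucs = []
    for k, cls in enumerate(classes):
        y_binary = [1 if y == cls else 0 for y in labels]
        class_scores = [row[k] for row in probabilities]
        aucs.append(binary_auc(y_binary, class_scores))
    return fmean(aucs)


# Hypothetical labels and predicted class probabilities (one row per sample).
classes = [0, 1, 2]
labels = [0, 1, 2, 1, 0]
probabilities = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.4, 0.35, 0.25],
    [0.3, 0.4, 0.3],
]
print(round(ovr_auc(labels, probabilities, classes), 4))  # prints 0.8889
```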

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_confusion_matrix()

![Plot Confusion](../images/plot_confusion_matrix.png)

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_precision_recall()

![Precision Recall](../images/plot_precision_recall.png)

### Job Metrics

Each training `Prediction` contains the following metrics by split/fold:

In [13]:
from pprint import pprint as p

In [14]:
p(queue_multiclass.jobs[0].predictors[0].predictions[0].metrics)

{'test': {'accuracy': 0.9629629629629629,
          'f1': 0.9628482972136223,
          'loss': 0.2035779356956482,
          'precision': 0.9666666666666667,
          'recall': 0.9629629629629629,
          'roc_auc': 1.0},
 'train': {'accuracy': 0.9215686274509803,
           'f1': 0.92046783625731,
           'loss': 0.21987806260585785,
           'precision': 0.9365079365079364,
           'recall': 0.9215686274509803,
           'roc_auc': 0.9968281430219147},
 'validation': {'accuracy': 0.8571428571428571,
                'f1': 0.8557692307692308,
                'loss': 0.3290098011493683,
                'precision': 0.8703703703703705,
                'recall': 0.8571428571428571,
                'roc_auc': 0.9727891156462586}}
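Notice that `recall` equals `accuracy` on every split above. That is exactly what happens when per-class recall is averaged with weights proportional to each class's support (scikit-learn's `average='weighted'`): the weights cancel the per-class denominators, leaving total true positives divided by total samples. Assuming AIQC uses support-weighted averaging, the sketch below reproduces the effect with hypothetical labels; the `weighted_scores` helper is illustrative.

```python
# Sketch: support-weighted precision/recall/F1, showing weighted recall == accuracy.
from collections import Counter


def weighted_scores(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        actual = support[cls]
        p_k = tp / predicted if predicted else 0.0
        r_k = tp / actual if actual else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = actual / n  # weight each class by its share of true labels
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k
    return precision, recall, f1


y_true = ["a", "a", "a", "b", "b", "c", "c", "c"]
y_pred = ["a", "a", "b", "b", "c", "c", "c", "a"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")  # prints accuracy=0.625 recall=0.625
```

Precision and F1 carry no such guarantee, which is why they differ from accuracy in the metrics above.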


The `Predictor` also stores the per-epoch `History` metrics recorded during model training.

In [15]:
queue_multiclass.jobs[0].predictors[0].history.keys()

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
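Assuming `history` maps each key to a list with one value per epoch (the shape of Keras's `History.history` dict, which these key names match), plain Python can locate the epoch with the lowest validation loss and flag where training and validation start to diverge. The values below are made up for illustration.

```python
# Hypothetical history values; the keys mirror the dict_keys printed above.
history = {
    "loss": [1.10, 0.62, 0.41, 0.33, 0.30],
    "accuracy": [0.45, 0.71, 0.84, 0.88, 0.90],
    "val_loss": [1.05, 0.58, 0.44, 0.46, 0.51],
    "val_accuracy": [0.48, 0.74, 0.83, 0.82, 0.81],
}

# Index of the epoch with the lowest validation loss.
best_epoch = min(range(len(history["val_loss"])), key=history["val_loss"].__getitem__)
print(best_epoch + 1, history["val_loss"][best_epoch])  # prints 3 0.44

# Later epochs where training loss keeps falling while validation loss rises.
diverging = [
    i + 1
    for i in range(best_epoch + 1, len(history["loss"]))
    if history["loss"][i] < history["loss"][i - 1]
    and history["val_loss"][i] > history["val_loss"][i - 1]
]
print(diverging)  # prints [4, 5]
```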

---

## Regression

Let's quickly generate a trained regression model to inspect.

In [17]:
%%capture
queue_regression = tests.make_test_queue('keras_regression')
queue_regression.run_jobs()

### Queue Visualization

When performing regression, the secondary training metric (non-loss) is 'r2'.

In [None]:
queue_regression.plot_performance(
    max_loss=1.5, min_r2=0.65
)

![Regression Boomerang](../images/plot_regression_boomerang.png)

### Queue Metrics

In [19]:
queue_regression.metrics_to_pandas().head(9)

| | hyperparamcombo_id | job_id | predictor_id | split | explained_variance | loss | mse | r2 |
|---|---|---|---|---|---|---|---|---|
| 0 | 11 | 11 | 19 | train | 0.033659 | 0.733253 | 0.971462 | 0.028538 |
| 1 | 11 | 11 | 19 | validation | 0.041182 | 0.723081 | 0.945438 | 0.015246 |
| 2 | 11 | 11 | 19 | test | 0.006876 | 0.599482 | 0.636772 | 0.002019 |


These are also aggregated by metric across all splits/folds.

In [20]:
queue_regression.metrics_aggregate_to_pandas().tail(12)

| | hyperparamcombo_id | job_id | predictor_id | metric | maximum | minimum | pstdev | median | mean |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 11 | 11 | 20 | explained_variance | 0.041182 | 0.006876 | 0.014723 | 0.033659 | 0.027239 |
| 1 | 11 | 11 | 20 | loss | 0.733253 | 0.599482 | 0.060805 | 0.723081 | 0.685272 |
| 2 | 11 | 11 | 20 | mse | 0.971462 | 0.636772 | 0.152012 | 0.945438 | 0.851224 |
| 3 | 11 | 11 | 20 | r2 | 0.028538 | 0.002019 | 0.010827 | 0.015246 | 0.015268 |


### Job Visualization

In [None]:
queue_regression.jobs[0].predictors[0].plot_learning_curve(loss_skip_15pct=True)

![Regression Learn](../images/plot_regression_learn.png)

In [None]:
queue_regression.jobs[0].predictors[0].predictions[0].plot_feature_importance(top_n=12)

![Regression Features](../images/plot_regression_features.png)

### Job Metrics

Each training `Prediction` contains the following metrics by split/fold:

In [23]:
p(queue_regression.jobs[0].predictors[0].predictions[0].metrics)

{'test': {'explained_variance': 0.006876392861274172,
          'loss': 0.5994815230369568,
          'mse': 0.6367720159085998,
          'r2': 0.0020185652597658477},
 'train': {'explained_variance': 0.0336593253630777,
           'loss': 0.7332529425621033,
           'mse': 0.9714619941285391,
           'r2': 0.028538005871460936},
 'validation': {'explained_variance': 0.041182110870757516,
                'loss': 0.7230809926986694,
                'mse': 0.9454382769758778,
                'r2': 0.015246362433524951}}
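On every split above, `explained_variance` is slightly higher than `r2`. That is expected: explained variance is 1 - Var(residuals) / Var(y), while R^2 is 1 - mean(residuals^2) / Var(y), and mean(residuals^2) = Var(residuals) + mean(residuals)^2. So the two agree only when the residuals average to zero, and R^2 additionally penalizes systematic bias. The sketch below follows the standard definitions (as documented for scikit-learn's `explained_variance_score` and `r2_score`); it is not AIQC's code, and the inputs are made up.

```python
# Sketch: MSE, R^2, and explained variance from their definitions.
from statistics import fmean, pvariance


def regression_metrics(y_true, y_pred):
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = fmean(r ** 2 for r in residuals)
    var_true = pvariance(y_true)  # population variance of the targets
    r2 = 1.0 - mse / var_true  # penalizes bias (nonzero mean residual)
    explained_variance = 1.0 - pvariance(residuals) / var_true  # ignores bias
    return {"mse": mse, "r2": r2, "explained_variance": explained_variance}


# Every prediction is off by the same +0.5: a pure bias.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print({name: round(value, 4) for name, value in metrics.items()})
# prints {'mse': 0.25, 'r2': 0.8, 'explained_variance': 1.0}
```

A constant offset leaves explained variance at a perfect 1.0 while R^2 drops to 0.8, which is why the two metrics are worth reading side by side.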


The `Predictor` also stores the per-epoch metrics recorded during model training.

In [24]:
queue_regression.jobs[0].predictors[0].history.keys()

dict_keys(['loss', 'mean_squared_error', 'val_loss', 'val_mean_squared_error'])