# Visualization & Metrics

## Prerequisites

`Plotly` is used for interactive charts (hover, toggle, zoom). See the [Installation](installation.html#Plotting) section for information about configuring Plotly. Static images are used in this notebook because the documentation portal does not support third-party JavaScript.

## Overview

As described in the [Low-Level Docs](api_low_level.html#10.-Assess-the-Results.), every training `Job` automatically generates metrics when evaluated against each split/fold.

The `Algorithm.analysis_type` determines which metrics and plots are prepared:

* Although `'classification_multi'` and `'classification_binary'` share the same metrics and plots, they produce these artifacts differently. For example, ROC curves are computed with `roc_multi_class='ovr'` for multi-class analyses vs `roc_multi_class=None` for binary analyses.

* `'regression'`, unlike the classification analyses, does not have an 'accuracy' metric, so 'r2', R^2 (the coefficient of determination), is substituted for it. There are no regression-specific plots. Note that unsupervised/self-supervised models are also treated as regression.
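To make the 'r2' substitution concrete, here is a minimal sketch of the coefficient of determination in plain Python. The function name `r2_score` and the sample values are illustrative assumptions; this is not AIQC's own code, which is not shown in this notebook.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]  # made-up targets

perfect = r2_score(y_true, y_true)                # perfect predictions
baseline = r2_score(y_true, [6.0] * 4)            # always predict the mean
worse = r2_score(y_true, [9.0, 7.0, 5.0, 3.0])    # worse than the mean
```

Unlike accuracy, R^2 is not bounded below by zero: a model that does worse than predicting the mean gets a negative score, which is worth keeping in mind when choosing a `min_r2` threshold.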

---

We'll use the `datum` and `tests` modules to rapidly generate a couple of examples.

In [5]:
import aiqc
from aiqc import datum
from aiqc import tests

---

## Classification

Let's quickly generate a trained classification model to inspect.

In [7]:
%%capture
queue_multiclass = tests.make_test_queue('keras_multiclass')
queue_multiclass.run_jobs()

### Queue Visualization

`plot_performance`, aka the "boomerang chart", is unique to AIQC, and it really brings the benefits of the library to light. Each model from the Queue is evaluated against all splits/folds.

When performing classification, the secondary training metric (non-loss) is 'accuracy'.

In [None]:
queue_multiclass.plot_performance(
    max_loss = 1.5, min_accuracy = 0.70
)

![Classify Boomerang](../images/plot_classify_boomerang.png)

### Queue Metrics

* `selected_metrics:list` - If the variety of metrics returned is overwhelming, include only the ones you want by name.
* `sort_by:str` - You can sort the dataframe by any column name.
* `ascending:bool=False` - Descending if False.

In [32]:
queue_multiclass.metrics_to_pandas(
    selected_metrics=None
    , sort_by='predictor_id'
    , ascending=True
).head(6)

|  | hyperparamcombo_id | job_id | predictor_id | split | accuracy | f1 | loss | precision | recall | roc_auc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | train | 0.970588 | 0.970582 | 0.116122 | 0.970851 | 0.970588 | 0.999135 |
| 1 | 1 | 1 | 1 | validation | 0.952381 | 0.952137 | 0.157847 | 0.958333 | 0.952381 | 1.0 |
| 2 | 1 | 1 | 1 | test | 0.962963 | 0.962848 | 0.154691 | 0.966667 | 0.962963 | 0.995885 |
| 3 | 2 | 2 | 2 | train | 0.980392 | 0.980375 | 0.105167 | 0.981481 | 0.980392 | 0.999423 |
| 4 | 2 | 2 | 2 | validation | 0.952381 | 0.952137 | 0.194987 | 0.958333 | 0.952381 | 0.993197 |
| 5 | 2 | 2 | 2 | test | 0.888889 | 0.888235 | 0.259848 | 0.897727 | 0.888889 | 0.987654 |
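As a rough illustration of what `selected_metrics`, `sort_by`, and `ascending` do to the table above, the following sketch applies the same ideas to plain dictionaries. The helper `select_and_sort` is hypothetical and is not part of AIQC; the rows are a subset of the columns shown above.

```python
# Rows copied from the queue metrics table (subset of columns).
rows = [
    {"predictor_id": 1, "split": "train", "accuracy": 0.970588, "f1": 0.970582, "loss": 0.116122},
    {"predictor_id": 1, "split": "validation", "accuracy": 0.952381, "f1": 0.952137, "loss": 0.157847},
    {"predictor_id": 1, "split": "test", "accuracy": 0.962963, "f1": 0.962848, "loss": 0.154691},
    {"predictor_id": 2, "split": "train", "accuracy": 0.980392, "f1": 0.980375, "loss": 0.105167},
    {"predictor_id": 2, "split": "validation", "accuracy": 0.952381, "f1": 0.952137, "loss": 0.194987},
    {"predictor_id": 2, "split": "test", "accuracy": 0.888889, "f1": 0.888235, "loss": 0.259848},
]


def select_and_sort(rows, selected_metrics=None, sort_by=None, ascending=False):
    """Hypothetical helper: keep chosen metric columns, then sort the rows."""
    id_columns = ["predictor_id", "split"]  # identifying columns are always kept
    result = list(rows)
    if selected_metrics is not None:
        keep = id_columns + list(selected_metrics)
        result = [{key: row[key] for key in keep} for row in result]
    if sort_by is not None:
        # ascending=False (the default) means largest values first.
        result = sorted(result, key=lambda row: row[sort_by], reverse=not ascending)
    return result


lowest_loss_first = select_and_sort(
    rows, selected_metrics=["accuracy", "loss"], sort_by="loss", ascending=True
)
```

Here the `f1` column is dropped and the split with the smallest loss (predictor 2, train) comes first.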


These are also aggregated by metric across all splits/folds.

In [31]:
queue_multiclass.metrics_aggregate_to_pandas(
    selected_metrics=None
    , selected_stats=None
    , sort_by='predictor_id'
    , ascending=True
).head(12)

|  | hyperparamcombo_id | job_id | predictor_id | metric | maximum | minimum | pstdev | median | mean |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | accuracy | 0.970588 | 0.952381 | 0.007466 | 0.962963 | 0.961977 |
| 1 | 1 | 1 | 1 | f1 | 0.970582 | 0.952137 | 0.007563 | 0.962848 | 0.961856 |
| 2 | 1 | 1 | 1 | loss | 0.157847 | 0.116122 | 0.018969 | 0.154691 | 0.142887 |
| 3 | 1 | 1 | 1 | precision | 0.970851 | 0.958333 | 0.005203 | 0.966667 | 0.965284 |
| 4 | 1 | 1 | 1 | recall | 0.970588 | 0.952381 | 0.007466 | 0.962963 | 0.961977 |
| 5 | 1 | 1 | 1 | roc_auc | 1.0 | 0.995885 | 0.001772 | 0.999135 | 0.99834 |
| 11 | 2 | 2 | 2 | roc_auc | 0.999423 | 0.987654 | 0.004807 | 0.993197 | 0.993425 |
| 10 | 2 | 2 | 2 | recall | 0.980392 | 0.888889 | 0.038281 | 0.952381 | 0.940554 |
| 9 | 2 | 2 | 2 | precision | 0.981481 | 0.897727 | 0.035314 | 0.958333 | 0.945847 |
| 7 | 2 | 2 | 2 | f1 | 0.980375 | 0.888235 | 0.038544 | 0.952137 | 0.940249 |
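The aggregate statistics are ordinary summary statistics taken over the splits of a single model. As a sanity check, this sketch recomputes the 'accuracy' row for predictor 1 from its three per-split values using only the standard library. That `pstdev` denotes the population standard deviation is an assumption based on the column name.

```python
import statistics

# Per-split accuracy for predictor 1, copied from the queue metrics table.
accuracy_by_split = {"train": 0.970588, "validation": 0.952381, "test": 0.962963}
values = list(accuracy_by_split.values())

aggregate = {
    "maximum": max(values),
    "minimum": min(values),
    "pstdev": statistics.pstdev(values),   # population standard deviation
    "median": statistics.median(values),
    "mean": statistics.mean(values),
}

# Round to the 6 decimals displayed in the dataframe for comparison.
rounded = {name: round(value, 6) for name, value in aggregate.items()}
```

A small `pstdev` relative to the `mean` suggests the model performs consistently across splits, whereas a large spread (as with predictor 2) hints at overfitting or an unlucky split.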


### Job Visualization

Loss values in the first few epochs are often extremely high before they plummet and then level off. This stretches out the graph and makes it hard to see whether the evaluation set is diverging. The `loss_skip_15pct:bool` parameter skips displaying the first 15% of epochs so that the figure is more useful.

In [None]:
queue_multiclass.jobs[0].predictors[0].plot_learning_curve(loss_skip_15pct=True)

![Classify Learn](../images/plot_classify_learn.png)
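The effect of `loss_skip_15pct` can be sketched in a few lines. The helper below is hypothetical, the loss values are made up, and rounding the cut-off down is an assumption; AIQC's exact rule is not shown in this notebook.

```python
def skip_first_15pct(values):
    """Drop the first 15% of per-epoch values (cut-off rounded down)."""
    n_skip = len(values) * 15 // 100  # integer math avoids float rounding surprises
    return values[n_skip:]


# Made-up per-epoch losses: huge at first, then a gradual decline.
losses = [9.0, 4.0, 1.2, 0.9, 0.8, 0.7, 0.65, 0.6, 0.58, 0.55,
          0.53, 0.52, 0.51, 0.5, 0.49, 0.48, 0.47, 0.46, 0.45, 0.44]

trimmed = skip_first_15pct(losses)

# The y-axis range shrinks from [0.44, 9.0] to [0.44, 0.9],
# so small differences between train and validation become visible.
full_range = (min(losses), max(losses))
trimmed_range = (min(trimmed), max(trimmed))
```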

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_feature_importance()

![Classify Features](../images/plot_classify_features.png)

These classification metrics are preformatted for plotting.

In [13]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_data['test'].keys()

dict_keys(['confusion_matrix', 'roc_curve', 'precision_recall_curve'])

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_roc_curve()

![Classify ROC](../images/plot_roc.png)
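For multi-class models, the overview mentions `roc_multi_class='ovr'`, i.e. one-vs-rest. A minimal sketch of that idea follows: each class is scored as "this class vs everything else", and the per-class AUCs are then averaged. The function names, labels, and probabilities are illustrative assumptions, and the unweighted (macro) average is chosen for simplicity; AIQC may use a different averaging scheme.

```python
def binary_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def ovr_auc(y_true, probabilities, classes):
    """One-vs-rest AUC per class, plus their unweighted (macro) average."""
    per_class = {}
    for idx, cls in enumerate(classes):
        binarized = [1 if y == cls else 0 for y in y_true]   # this class vs the rest
        class_scores = [row[idx] for row in probabilities]   # predicted P(class)
        per_class[cls] = binary_auc(binarized, class_scores)
    macro = sum(per_class.values()) / len(per_class)
    return macro, per_class


classes = [0, 1, 2]
y_true = [0, 0, 1, 1, 2, 2]
probabilities = [  # made-up softmax outputs, one row per sample
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.1, 0.4, 0.5],
    [0.0, 0.2, 0.8],
]

macro_auc, per_class_auc = ovr_auc(y_true, probabilities, classes)
```

In the binary case there is only one positive class, so no such decomposition is needed, which is consistent with `roc_multi_class=None`.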

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_confusion_matrix()

![Plot Confusion](../images/plot_confusion_matrix.png)

In [None]:
queue_multiclass.jobs[0].predictors[0].predictions[0].plot_precision_recall()

![Precision Recall](../images/plot_precision_recall.png)

### Job Metrics

Each training `Prediction` contains the following metrics by split/fold:

In [8]:
from pprint import pprint as p

In [9]:
p(queue_multiclass.jobs[0].predictors[0].predictions[0].metrics)

{'test': {'accuracy': 0.9629629629629629,
          'f1': 0.9628482972136223,
          'loss': 0.15469132363796234,
          'precision': 0.9666666666666667,
          'recall': 0.9629629629629629,
          'roc_auc': 0.9958847736625513},
 'train': {'accuracy': 0.9705882352941176,
           'f1': 0.9705818732424834,
           'loss': 0.11612199246883392,
           'precision': 0.9708513708513707,
           'recall': 0.9705882352941176,
           'roc_auc': 0.9991349480968857},
 'validation': {'accuracy': 0.9523809523809523,
                'f1': 0.952136752136752,
                'loss': 0.15784651041030884,
                'precision': 0.9583333333333334,
                'recall': 0.9523809523809523,
                'roc_auc': 1.0}}
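Because `metrics` is a plain nested dictionary keyed by split, it is easy to reshape or query. The sketch below uses a subset of the values printed above to build one row per split and to find the split with the highest loss; the variable names are illustrative.

```python
# Subset of the metrics printed above: {split: {metric: value}}.
metrics = {
    "test": {"accuracy": 0.9629629629629629, "loss": 0.15469132363796234},
    "train": {"accuracy": 0.9705882352941176, "loss": 0.11612199246883392},
    "validation": {"accuracy": 0.9523809523809523, "loss": 0.15784651041030884},
}

# One flat row per split, similar in shape to the queue-level dataframe.
rows = [{"split": split, **values} for split, values in metrics.items()]

# Which split does the model struggle with most?
worst_split = max(metrics, key=lambda split: metrics[split]["loss"])

# Gap between train and test accuracy as a quick overfitting check.
accuracy_gap = metrics["train"]["accuracy"] - metrics["test"]["accuracy"]
```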


It also contains per-epoch `History` metrics calculated during model training.

In [34]:
queue_multiclass.jobs[0].predictors[0].history.keys()

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
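One common use of per-epoch history is locating the epoch where validation loss bottomed out. The sketch below assumes each key maps to a list with one value per epoch (the usual Keras convention); the numbers are made up for illustration.

```python
# Made-up history with the same keys as above, one value per epoch.
history = {
    "loss": [1.10, 0.70, 0.45, 0.30, 0.22],
    "accuracy": [0.40, 0.65, 0.82, 0.90, 0.94],
    "val_loss": [1.05, 0.72, 0.50, 0.48, 0.55],
    "val_accuracy": [0.45, 0.62, 0.80, 0.83, 0.81],
}

val_loss = history["val_loss"]

# Zero-based index of the epoch with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# A widening gap between validation and training loss after the best
# epoch is a typical sign of overfitting.
final_gap = history["val_loss"][-1] - history["loss"][-1]
```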

Feature importance is calculated for each column of each `Feature.id`, except for `Feature.dataset.dataset_type=='image'`.

In [10]:
p(queue_multiclass.jobs[0].predictors[0].predictions[0].feature_importance)

{'1': {'petal_length': 0.2506590038537979,
       'petal_width': 1.5452145487070084,
       'sepal_length': 0.003435656428337097,
       'sepal_width': 0.3922399431467056}}
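The dictionary is keyed by `Feature.id` and then by column name, so ranking the columns is a simple sort. This sketch uses the values printed above and assumes that a larger value indicates a more important column.

```python
# Values copied from the output above: {feature_id: {column: importance}}.
feature_importance = {
    "1": {
        "petal_length": 0.2506590038537979,
        "petal_width": 1.5452145487070084,
        "sepal_length": 0.003435656428337097,
        "sepal_width": 0.3922399431467056,
    }
}

# Column names per feature, ordered from most to least important.
ranking = {
    feature_id: [
        name
        for name, _ in sorted(columns.items(), key=lambda item: item[1], reverse=True)
    ]
    for feature_id, columns in feature_importance.items()
}

top_column = ranking["1"][0]
```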


---

## Regression

Let's quickly generate a trained regression model to inspect.

In [24]:
%%capture
queue_regression = tests.make_test_queue('keras_regression')
queue_regression.run_jobs()

### Queue Visualization

When performing regression, the secondary training metric (non-loss) is 'r2'.

In [None]:
queue_regression.plot_performance(
    max_loss=1.5, min_r2=0.65
)

![Regression Boomerang](../images/plot_regression_boomerang.png)

### Job Metrics

Each training `Prediction` contains the following metrics.

In [25]:
p(queue_regression.jobs[0].predictors[0].predictions[0].metrics)

{'test': {'explained_variance': 0.1023665892527974,
          'loss': 0.6583271026611328,
          'mse': 0.7793670124150162,
          'r2': 0.09934655089913158},
 'train': {'explained_variance': 0.15700615272739094,
           'loss': 0.6809597611427307,
           'mse': 0.8548576908474455,
           'r2': 0.14514230915255444},
 'validation': {'explained_variance': 0.20880938235215885,
                'loss': 0.6208290457725525,
                'mse': 0.7425323712073475,
                'r2': 0.18091056078803924}}
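Note that 'explained_variance' is slightly higher than 'r2' in every split above. The two metrics differ only in how they treat a constant bias in the predictions, which the following sketch demonstrates with the standard definitions. The function names and data are illustrative assumptions, not AIQC's own code.

```python
import statistics


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); a constant bias is ignored."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true)


def r2(y_true, y_pred):
    """1 - mean(residuals^2) / Var(y_true); a constant bias is penalized."""
    squared_residuals = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return 1.0 - statistics.fmean(squared_residuals) / statistics.pvariance(y_true)


y_true = [1.0, 2.0, 3.0, 4.0]
y_biased = [2.0, 3.0, 4.0, 5.0]  # right shape, but every prediction is 1.0 too high

ev_score = explained_variance(y_true, y_biased)  # bias does not hurt this score
r2_score = r2(y_true, y_biased)                  # bias lowers this score
```

Since the mean squared residual is never smaller than the residual variance, 'r2' can never exceed 'explained_variance'; a noticeable gap between them points to systematically biased predictions.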


It also contains per-epoch metrics calculated during model training.

In [28]:
queue_regression.jobs[0].predictors[0].history.keys()

dict_keys(['loss', 'mean_squared_error', 'val_loss', 'val_mean_squared_error'])

Feature importance is calculated for each column of each `Feature.id`, except for `Feature.dataset.dataset_type=='image'`.

In [29]:
p(queue_regression.jobs[0].predictors[0].predictions[0].feature_importance)

{'3': {'age': 0.0025349855422973633,
       'chas': 0.0029946565628051758,
       'crim': 2.4020671844482422e-05,
       'dis': 0.0034887194633483887,
       'indus': 0.000500798225402832,
       'lstat': 0.003727257251739502,
       'nox': 0.0028020739555358887,
       'ptratio': 0.004640340805053711,
       'rad': 0.028512001037597656,
       'rm': 0.0004666447639465332,
       'tax': 0.11411339044570923,
       'zn': 0.00947493314743042}}


### Queue Metrics

In [27]:
queue_regression.metrics_to_pandas().head(9)

|  | hyperparamcombo_id | job_id | predictor_id | split | explained_variance | loss | mse | r2 |
|---|---|---|---|---|---|---|---|---|
| 9 | 16 | 16 | 12 | train | 0.200475 | 0.660347 | 0.823595 | 0.176405 |
| 10 | 16 | 16 | 12 | validation | 0.233815 | 0.596068 | 0.710509 | 0.216236 |
| 11 | 16 | 16 | 12 | test | 0.104733 | 0.636857 | 0.774979 | 0.104418 |
| 6 | 15 | 15 | 11 | train | 0.171239 | 0.675939 | 0.839556 | 0.160444 |
| 7 | 15 | 15 | 11 | validation | 0.253439 | 0.606551 | 0.707181 | 0.219907 |
| 8 | 15 | 15 | 11 | test | 0.084613 | 0.657065 | 0.794649 | 0.081686 |
| 3 | 14 | 14 | 10 | train | 0.130221 | 0.696391 | 0.879654 | 0.120346 |
| 4 | 14 | 14 | 10 | validation | 0.220656 | 0.600674 | 0.721751 | 0.203834 |
| 5 | 14 | 14 | 10 | test | 0.061954 | 0.656068 | 0.816135 | 0.056857 |


These are also aggregated by metric across all splits/folds.

In [30]:
queue_regression.metrics_aggregate_to_pandas().tail(12)

|  | hyperparamcombo_id | job_id | predictor_id | metric | maximum | minimum | pstdev | median | mean |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 14 | 14 | 10 | explained_variance | 0.220656 | 0.061954 | 0.065 | 0.130221 | 0.13761 |
| 5 | 14 | 14 | 10 | loss | 0.696391 | 0.600674 | 0.039237 | 0.656068 | 0.651045 |
| 6 | 14 | 14 | 10 | mse | 0.879654 | 0.721751 | 0.064873 | 0.816135 | 0.805847 |
| 7 | 14 | 14 | 10 | r2 | 0.203834 | 0.056857 | 0.060188 | 0.120346 | 0.127012 |
| 8 | 15 | 15 | 11 | explained_variance | 0.253439 | 0.084613 | 0.068931 | 0.171239 | 0.169764 |
| 9 | 15 | 15 | 11 | loss | 0.675939 | 0.606551 | 0.029293 | 0.657065 | 0.646518 |
| 10 | 15 | 15 | 11 | mse | 0.839556 | 0.707181 | 0.054965 | 0.794649 | 0.780462 |
| 11 | 15 | 15 | 11 | r2 | 0.219907 | 0.081686 | 0.056611 | 0.160444 | 0.154012 |
| 12 | 16 | 16 | 12 | explained_variance | 0.233815 | 0.104733 | 0.054712 | 0.200475 | 0.179674 |
| 13 | 16 | 16 | 12 | loss | 0.660347 | 0.596068 | 0.026557 | 0.636857 | 0.63109 |


### Job Visualization

In [None]:
queue_regression.jobs[0].predictors[0].plot_learning_curve(loss_skip_15pct=True)

![Regression Learn](../images/plot_regression_learn.png)

In [None]:
queue_regression.jobs[0].predictors[0].predictions[0].plot_feature_importance()

![Regression Features](../images/plot_regression_features.png)