
# Visualizing the Evaluation of ML Models with Yellowbrick and alternate implementations

Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier.


The code producing the visuals below is in the module `model_evaluation_reports.py`:

import model_evaluation_reports as rpts

Yellowbrick binary classification example:

http://www.scikit-yb.org/en/latest/tutorial.html


## Model evaluation report using Yellowbrick's `ClassificationReport()`

Yellowbrick's visual report returns a matrix of precision, recall, and F1 scores for each model.
It is indeed very neat but, in my opinion, not very practical, since the goal of the visualization is to enable picking "the best" model...

With lots of models → lots of scrolling!

Data and models:

# Mushroom dataset & models list:
X, y, labels = rpts.get_mushroom_data()
models = rpts.get_models()
rpts.yellowbrick_model_evaluation_report(X, y, models)

*(figures: Yellowbrick ClassificationReport heatmaps, one per model)*
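
For reference, the wrapper presumably amounts to looping Yellowbrick's `ClassificationReport` over the models list. Here is a minimal sketch of that pattern; the helper name, the 80/20 split and the `support=True` flag are my assumptions, not necessarily what `yellowbrick_model_evaluation_report()` does:

```python
from sklearn.model_selection import train_test_split
from yellowbrick.classifier import ClassificationReport

def yellowbrick_report_sketch(X, y, models):
    """Draw one ClassificationReport heatmap (precision, recall, F1) per model."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    for model in models:
        viz = ClassificationReport(model, support=True)  # include the support column
        viz.fit(X_train, y_train)                        # fit the wrapped estimator
        viz.score(X_test, y_test)                        # compute the scores and draw the heatmap
        viz.show()                                       # one figure per model -> lots of scrolling
```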


## Alternate visualizations

- HTML table (with Pandas Styler)
- Bar plot


Table output with class & support in column headers:

→ using `model_evaluation_report_tbl(models, X, y, labels, caption)`:

rpts.model_evaluation_report_tbl(models, X, y, labels, 'Model selection report')  # green: max; pink: min

*(figure: styled HTML table, max score per column in green, min in pink)*
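
The green/pink highlighting can be reproduced with pandas' built-in Styler. The sketch below uses a hand-made scores DataFrame; the numbers and column layout are purely illustrative, not the actual `get_scores_df()` output:

```python
import pandas as pd

# Illustrative per-model scores (made-up numbers; the real table also carries class & support in the headers).
scores = pd.DataFrame(
    {"precision": [0.98, 0.91, 0.85], "recall": [0.97, 0.93, 0.80], "f1": [0.975, 0.92, 0.82]},
    index=["LogisticRegression", "RandomForestClassifier", "SGDClassifier"],
)

styled = (
    scores.style
    .highlight_max(color="lightgreen")   # best score per column
    .highlight_min(color="pink")         # worst score per column
    .format("{:.3f}")
    .set_caption("Model selection report")
)
styled  # a Styler renders as an HTML table in a notebook
```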

Bar plot output

→ using `model_evaluation_report_bar(models, X, y, labels)`:

rpts.model_evaluation_report_bar(models, X, y, labels, xlim_to_1=False, encode=True)
# Note: xlim_to_1=False and encode=True are the default values

*(figure: bar plot of the scores per model)*

Same, with the x-limit set to 1:

rpts.model_evaluation_report_bar(models, X, y, labels, xlim_to_1=True)

*(figure: same bar plot with the x-axis limited to 1)*
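
A comparable bar-plot report can be put together with plain pandas/matplotlib plotting; again a minimal sketch over an illustrative scores DataFrame, not the actual `model_evaluation_report_bar()` implementation:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative per-model scores (made-up numbers, single-class case for brevity).
scores = pd.DataFrame(
    {"precision": [0.98, 0.91, 0.85], "recall": [0.97, 0.93, 0.80], "f1": [0.975, 0.92, 0.82]},
    index=["LogisticRegression", "RandomForestClassifier", "SGDClassifier"],
)

ax = scores.plot.barh(figsize=(8, 4))   # one group of bars per model
ax.set_xlim(0, 1)                       # the equivalent of xlim_to_1=True
ax.set_xlabel("score")
ax.legend(loc="lower right")
plt.tight_layout()
plt.show()
```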

## Example with a multi-class classification

# Iris dataset from sklearn & same models:
X, y, labels = rpts.get_iris_data()
models = rpts.get_models()
# The encoding is already done on this dataset, so encode=False.
rpts.model_evaluation_report_bar(models, X, y, labels, encode=False)

*(figure: bar plot of the scores per model on the Iris dataset)*

Clearly, the scores depend on both the model and the dataset.


## Using radar plots: one plot for each class

I've attempted to reproduce the radar plots in a single row (whenever possible), but that implementation needs more work: the plots end up squished too close together.

I'm glad I went through adapting the DeepMind/bsuite radar charts, but I am not quite satisfied with the outcome, at least on the Iris dataset: they only make it easy to identify the least performant model, here SGDClassifier.
Additionally, until (and unless) I find a way to line up the plots more compactly, they also suffer from the same 'scrolling objection' I raised initially... with only 3 classes!

dfm_iris = rpts.get_scores_df(models, X, y, labels, encode=False)

for lbl in labels:
    rpts.scores_radar_plot_example(dfm_iris, cat=lbl)

*(figures: one radar plot per Iris class)*
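
For comparison, a single-class radar chart can also be drawn directly with matplotlib's polar projection. This is a generic sketch with made-up scores, not the bsuite-based implementation used above:

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up precision/recall/F1 scores for two models on a single class.
metrics = ["precision", "recall", "f1"]
models_scores = {
    "LogisticRegression": [0.98, 0.97, 0.975],
    "SGDClassifier": [0.80, 0.72, 0.76],
}

# One angle per metric, repeating the first point to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for name, vals in models_scores.items():
    vals = vals + vals[:1]
    ax.plot(angles, vals, label=name)
    ax.fill(angles, vals, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(metrics)
ax.set_ylim(0, 1)
ax.set_title("one class, e.g. setosa")
ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
plt.show()
```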



## Implementation notes: `get_scores_df()`

This function has a parameter, `class_col`, that acts as a switch to output either the scores or the classes as columns.
Only the reporting function using the Pandas Styler, `model_evaluation_report_tbl()`, requires `class_col=True`; the others use the default (`class_col=False`).

dfm_iris_tbl = rpts.get_scores_df(models, X, y, labels, encode=False, class_col=True)
dfm_iris_tbl.head()

*(figure: output of `dfm_iris_tbl.head()`)*

dfm_iris = rpts.get_scores_df(models, X, y, labels, encode=False)
dfm_iris.head()

*(figure: output of `dfm_iris.head()`)*
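
Whatever the internals of `get_scores_df()`, a per-class scores DataFrame like the ones above can be assembled from scikit-learn's `classification_report(output_dict=True)`; a small standalone sketch on the Iris dataset (not the repo's implementation):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Quick fit on Iris just to obtain predictions for the report.
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
y_pred = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)

report = classification_report(y_test, y_pred, target_names=data.target_names, output_dict=True)
report.pop("accuracy", None)                 # keep only the per-class and averaged rows

scores_as_columns = pd.DataFrame(report).T   # rows = classes, columns = precision/recall/f1-score/support
classes_as_columns = pd.DataFrame(report)    # rows = metrics, columns = classes (class_col=True-like layout)
```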

Either DataFrame can be passed to the styler function, so this 'DataFrame approach' could be the most straightforward option for cases with a large number of classes:

rpts.model_evaluation_report_from_df(dfm_iris_tbl, 'Model selection report (from df)')

*(figure: styled report built from `dfm_iris_tbl`)*

rpts.model_evaluation_report_from_df(dfm_iris, 'Model selection report (from df)')

*(figure: styled report built from `dfm_iris`)*