
# Visualizing the Evaluation of ML Models with Yellowbrick and alternate implementations

Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier.


The code producing the visuals below is in the module `model_evaluation_reports.py`:

import model_evaluation_reports as rpts

Yellowbrick binary classification example:

http://www.scikit-yb.org/en/latest/tutorial.html


## Model evaluation report using Yellowbrick's `ClassificationReport()`

Yellowbrick's visual report returns a matrix of precision, recall, and F1 scores for each model.
It is indeed very neat but, in my opinion, not very practical, since the goal of the visualization is to enable picking "the best" model...

With lots of models → lots of scrolling!

Data and models:

# Mushroom dataset & models list:
X, y, labels = rpts.get_mushroom_data()
models = rpts.get_models()
rpts.yellowbrick_model_evaluation_report(X, y, models)

*(figures: Yellowbrick ClassificationReport heatmaps, one per model)*
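
For reference, the wrapper presumably amounts to looping Yellowbrick's `ClassificationReport` over the models list. Here is a minimal sketch of that pattern; the helper name, the 80/20 split and the `support=True` flag are my assumptions, not necessarily what `yellowbrick_model_evaluation_report()` does:

```python
from sklearn.model_selection import train_test_split
from yellowbrick.classifier import ClassificationReport

def yellowbrick_report_sketch(X, y, models):
    """Draw one ClassificationReport heatmap (precision, recall, F1) per model."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    for model in models:
        viz = ClassificationReport(model, support=True)  # include the support column
        viz.fit(X_train, y_train)                        # fit the wrapped estimator
        viz.score(X_test, y_test)                        # compute the scores and draw the heatmap
        viz.show()                                       # one figure per model -> lots of scrolling
```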


## Alternate visualizations

- HTML table (with Pandas Styler)
- Bar plot


Table output with class & support in column headers:

→ using `model_evaluation_report_tbl(models, X, y, labels, caption)`:

rpts.model_evaluation_report_tbl(models, X, y, labels, 'Model selection report')  # green: max; pink: min

*(figure: styled HTML table, max score per column in green, min in pink)*
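
The green/pink highlighting can be reproduced with pandas' built-in Styler. The sketch below uses a hand-made scores DataFrame; the numbers and column layout are purely illustrative, not the actual `get_scores_df()` output:

```python
import pandas as pd

# Illustrative per-model scores (made-up numbers; the real table also carries class & support in the headers).
scores = pd.DataFrame(
    {"precision": [0.98, 0.91, 0.85], "recall": [0.97, 0.93, 0.80], "f1": [0.975, 0.92, 0.82]},
    index=["LogisticRegression", "RandomForestClassifier", "SGDClassifier"],
)

styled = (
    scores.style
    .highlight_max(color="lightgreen")   # best score per column
    .highlight_min(color="pink")         # worst score per column
    .format("{:.3f}")
    .set_caption("Model selection report")
)
styled  # a Styler renders as an HTML table in a notebook
```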

Bar plot output

→ using `model_evaluation_report_bar(models, X, y, labels)`:

rpts.model_evaluation_report_bar(models, X, y, labels, xlim_to_1=False, encode=True)
# Note: xlim_to_1=False and encode=True are the default values

*(figure: bar plot of the scores per model)*

Same, with the x-limit set to 1:

rpts.model_evaluation_report_bar(models, X, y, labels, xlim_to_1=True)

*(figure: same bar plot with the x-axis limited to 1)*
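
A comparable bar-plot report can be put together with plain pandas/matplotlib plotting; again a minimal sketch over an illustrative scores DataFrame, not the actual `model_evaluation_report_bar()` implementation:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative per-model scores (made-up numbers, single-class case for brevity).
scores = pd.DataFrame(
    {"precision": [0.98, 0.91, 0.85], "recall": [0.97, 0.93, 0.80], "f1": [0.975, 0.92, 0.82]},
    index=["LogisticRegression", "RandomForestClassifier", "SGDClassifier"],
)

ax = scores.plot.barh(figsize=(8, 4))   # one group of bars per model
ax.set_xlim(0, 1)                       # the equivalent of xlim_to_1=True
ax.set_xlabel("score")
ax.legend(loc="lower right")
plt.tight_layout()
plt.show()
```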

## Example with a multi-class classification

# Iris dataset from sklearn & same models:
X, y, labels = rpts.get_iris_data()
models = rpts.get_models()
# The encoding is already done on this dataset, so encode=False.
rpts.model_evaluation_report_bar(models, X, y, labels, encode=False)

*(figure: bar plot of the scores per model on the Iris dataset)*

Clearly, the scores depend on both the model and the dataset.


## Using radar plots: one plot for each class

I've attempted to reproduce the radar plots in a single row (whenever possible), but that implementation needs more work: the plots end up squished too close together.

I'm glad I went through adapting the DeepMind/bsuite radar charts, but I am not quite satisfied with the outcome, at least on the Iris dataset: they only make it easy to identify the least performant model, here SGDClassifier.
Additionally, until (and unless) I find a way to line up the plots more compactly, they also suffer from the same 'scrolling objection' I raised initially... with only 3 classes!

dfm_iris = rpts.get_scores_df(models, X, y, labels, encode=False)

for lbl in labels:
    rpts.scores_radar_plot_example(dfm_iris, cat=lbl)

*(figures: one radar plot per Iris class)*
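
For comparison, a single-class radar chart can also be drawn directly with matplotlib's polar projection. This is a generic sketch with made-up scores, not the bsuite-based implementation used above:

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up precision/recall/F1 scores for two models on a single class.
metrics = ["precision", "recall", "f1"]
models_scores = {
    "LogisticRegression": [0.98, 0.97, 0.975],
    "SGDClassifier": [0.80, 0.72, 0.76],
}

# One angle per metric, repeating the first point to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for name, vals in models_scores.items():
    vals = vals + vals[:1]
    ax.plot(angles, vals, label=name)
    ax.fill(angles, vals, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(metrics)
ax.set_ylim(0, 1)
ax.set_title("one class, e.g. setosa")
ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
plt.show()
```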



## Implementation notes: `get_scores_df()`

This function has a parameter, `class_col`, that acts as a switch to output either the scores or the classes as columns.
Only the reporting function using the Pandas Styler, `model_evaluation_report_tbl()`, requires `class_col=True`; the others use the default (`class_col=False`).

dfm_iris_tbl = rpts.get_scores_df(models, X, y, labels, encode=False, class_col=True)
dfm_iris_tbl.head()

*(figure: output of `dfm_iris_tbl.head()`)*

dfm_iris = rpts.get_scores_df(models, X, y, labels, encode=False)
dfm_iris.head()

*(figure: output of `dfm_iris.head()`)*
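
Whatever the internals of `get_scores_df()`, a per-class scores DataFrame like the ones above can be assembled from scikit-learn's `classification_report(output_dict=True)`; a small standalone sketch on the Iris dataset (not the repo's implementation):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Quick fit on Iris just to obtain predictions for the report.
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
y_pred = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)

report = classification_report(y_test, y_pred, target_names=data.target_names, output_dict=True)
report.pop("accuracy", None)                 # keep only the per-class and averaged rows

scores_as_columns = pd.DataFrame(report).T   # rows = classes, columns = precision/recall/f1-score/support
classes_as_columns = pd.DataFrame(report)    # rows = metrics, columns = classes (class_col=True-like layout)
```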

Either DataFrame can be passed to the styler function, so this 'DataFrame approach' could be the most straightforward option for cases with a large number of classes:

rpts.model_evaluation_report_from_df(dfm_iris_tbl, 'Model selection report (from df)')

*(figure: styled report built from `dfm_iris_tbl`)*

rpts.model_evaluation_report_from_df(dfm_iris, 'Model selection report (from df)')

*(figure: styled report built from `dfm_iris`)*