[eda] added `explain_rows` method to `autogluon.eda.auto` - Kernel SHAP visualization by gradientsky · Pull Request #3014 · autogluon/autogluon

gradientsky · 2023-03-07T21:57:28Z

Description of changes:

added explain_rows method to autogluon.eda.auto; the method performs Kernel SHAP value analysis and visualization
quick_fit: fixes for highest_error and undecided rows calculations

Examples

import pandas as pd
import numpy as np
import autogluon.eda.auto as auto

# Load data
df_train = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/titanic/train.csv')
df_test = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/titanic/test.csv')
label='Survived'

# Fit model
state = auto.quick_fit(
    train_data=df_train,
    label=label,
    save_model_to_state=True,
    return_state=True,
    render_analysis=False,  # Don't render analysis
)

# Explain the row with highest error
auto.explain_rows(
    train_data=df_train,
    model=state.model,
    backend='shap',  # default | shap/fastshap
    plot='force',  # default | force/waterfall
    rows=state.model_evaluation.highest_error[:1],
)

# Explain the row predicted incorrectly, but closest to the decision boundary as waterfall plot
auto.explain_rows(
    train_data=df_train,
    model=state.model,
    display_rows=True,
    plot='waterfall',
    rows=state.model_evaluation.undecided[:1],
)
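The two row selections used above can be illustrated with a small standalone sketch. It assumes a binary classifier and one plausible reading of the comments in the examples: "highest error" ranks rows by the gap between the true label and the predicted positive-class probability, and "undecided" ranks misclassified rows by their distance to the decision threshold. The function `rank_rows` and its parameters are hypothetical; this is not necessarily how `quick_fit` computes `highest_error` and `undecided`.

```python
from typing import List, Sequence, Tuple


def rank_rows(
    y_true: Sequence[int],
    p_pos: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: return (highest_error, undecided) row indices.

    Assumes binary labels in {0, 1} and p_pos = predicted P(label == 1).
    """
    # Error of a row: distance between the true label and the predicted probability.
    errors = [abs(y - p) for y, p in zip(y_true, p_pos)]
    # Rows sorted from the largest error to the smallest.
    highest_error = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)

    # Misclassified rows: the thresholded prediction disagrees with the label.
    misclassified = [
        i for i, (y, p) in enumerate(zip(y_true, p_pos)) if (p >= threshold) != bool(y)
    ]
    # Among those, rows closest to the decision threshold come first.
    undecided = sorted(misclassified, key=lambda i: abs(p_pos[i] - threshold))
    return highest_error, undecided


y_true = [1, 0, 1, 0, 1]
p_pos = [0.9, 0.8, 0.45, 0.2, 0.1]
highest_error, undecided = rank_rows(y_true, p_pos)
print(highest_error[:1], undecided[:1])
```

Taking the first element of each list mirrors the `[:1]` slices passed as `rows` in the examples.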

Using primitives

# Namespaces for the analysis and visualization primitives used below
import autogluon.eda.analysis as eda
import autogluon.eda.visualization as viz

s = auto.analyze(
    train_data=df_train, model=state.model,
    return_state=True,
    anlz_facets=[
        # Backend using `shap` package
        eda.explain.ShapAnalysis(state.model_evaluation.highest_error[:2]),
        # Backend using `fastshap` package
        # eda.explain.FastShapAnalysis(state.model_evaluation.highest_error[:2]),
    ],
    viz_facets=[
        viz.explain.ExplainForcePlot(),  # Force layout
        viz.explain.ExplainWaterfallPlot(),  # Waterfall layout
    ]
)
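For readers unfamiliar with what the force and waterfall plots display: SHAP values are additive per-feature contributions that sum to the difference between the model output for the explained row and a base value. Kernel SHAP approximates them by sampling feature coalitions against background data. The sketch below computes exact Shapley values by brute force for a toy three-feature function and a single baseline row. It is an illustration of the quantity being approximated, not the code in this PR; `exact_shapley` and `toy_model` are hypothetical names.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    row: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Brute-force Shapley values of `row` relative to a single `baseline` row."""
    n = len(row)

    def value(subset: frozenset) -> float:
        # Features in the coalition take the row's value; the rest take the baseline's.
        x = [row[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i when added to the coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x: Sequence[float]) -> float:
    # Two linear terms plus an interaction between features 0 and 2.
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[2]


row = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, row, baseline)
print(phi, sum(phi), toy_model(row) - toy_model(baseline))
```

The contributions sum to the gap between the prediction for the row and the prediction for the baseline, which is the property a waterfall plot walks through feature by feature. The brute-force loop is exponential in the number of features, which is why sampling-based approximations are used in practice.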

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Innixma · 2023-03-07T22:59:40Z

eda/setup.py

    'phik>=0.12.2,<0.13',
    'seaborn>=0.12.0,<0.13',
    'ipywidgets>=7.7.1,<9.0',  # min versions guidance: 7.7.1 collab/kaggle
+    'shap>=0.41,<0.42',


@gradientsky FYI, fastshap may be of interest to test out and compare performance as suggested here: #2222 (comment)

fastshap claims to be much faster than shap:

https://raw.githubusercontent.com/AnotherSamWilson/fastshap/master/benchmarks/iris_benchmark_time.png

I reworked the code. It is now split into backend analysis and rendering parts. The analysis supports both the shap and fastshap libraries (two different backends). shap is the default for the auto functionality because it is faster.
Visualizations are also split into separate primitives that are compatible with both backends.

github-actions · 2023-03-08T01:40:35Z

Job PR-3014-c5e116b is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3014/c5e116b/index.html

gradientsky · 2023-03-10T16:26:04Z

Blocked by AnotherSamWilson/fastshap#8

Innixma · 2023-03-10T18:57:13Z

eda/setup.py

    'seaborn>=0.12.0,<0.13',
    'ipywidgets>=7.7.1,<9.0',  # min versions guidance: 7.7.1 collab/kaggle
+    'shap>=0.41,<0.42',
+    'fastshap>=0.3,<0.4',


Is fastshap good enough to have as a required dependency? What are the relative advantages of fastshap over shap?

For the purpose of explaining a few rows here, fastshap is slower than shap. I sent an update to remove the fastshap backend completely (performance and support concerns).

…SHAP values analysis and visualization

github-actions · 2023-03-10T22:19:23Z

Job PR-3014-4a6d17d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3014/4a6d17d/index.html

github-actions · 2023-03-10T22:23:58Z

Job PR-3014-2bc659e is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3014/2bc659e/index.html

github-actions · 2023-03-11T01:37:40Z

Job PR-3014-c4fc440 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3014/c4fc440/index.html

Innixma

LGTM! Had some minor comments

Innixma · 2023-03-13T20:19:59Z