feat: new LIME and KernelSHAP explainers #1077

memoryz · 2021-06-09T17:21:09Z

In this PR, we rewrote the LIME explainers and added KernelSHAP explainers in the com.microsoft.ml.spark.explainers package.

New features:

KernelSHAP explainer for tabular, vector, image and text models.
LIME explainer now supports kernel width and sample weights.
Both explainers support categorical variables (in the tabular explainer).
Both explainers report the r-squared metric from the underlying regression model.
Both explainers support explaining multiple output classes in one run.
For tabular and vector models, both explainers support passing in a background dataframe. ~~If one is not given, the dataframe used for local interpretation will be used as background data.~~
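Both explainers fit a weighted local linear surrogate. The features above therefore mostly concern how perturbed samples are built and weighted, and how the fit is scored. The sketch below is illustrative only and is not taken from this PR. It assumes a common exponential LIME kernel, exp(-d^2 / w^2), where w is the kernel width. It uses the KernelSHAP coalition weight from Lundberg and Lee (the method behind https://github.com/slundberg/shap). It also assumes a background row supplies the values of features that are "switched off" in a coalition, and scores the surrogate with a weighted r-squared. The names `SurrogateWeights`, `limeWeight`, `shapWeight`, `maskWithBackground` and `weightedR2` are hypothetical.

```scala
object SurrogateWeights {
  /** Assumed LIME-style exponential kernel: samples close to the explained
    * instance (small distance) get weights near 1; kernelWidth controls decay. */
  def limeWeight(distance: Double, kernelWidth: Double): Double =
    math.exp(-(distance * distance) / (kernelWidth * kernelWidth))

  /** n choose k as a Double, built incrementally to avoid Int overflow. */
  private def choose(n: Int, k: Int): Double =
    (1 to k).foldLeft(1.0)((acc, i) => acc * (n - k + i) / i)

  /** KernelSHAP weight for a coalition of size s out of m features:
    * (m - 1) / (C(m, s) * s * (m - s)). Empty and full coalitions get
    * infinite weight, so implementations usually treat them as constraints. */
  def shapWeight(m: Int, s: Int): Double =
    if (s == 0 || s == m) Double.PositiveInfinity
    else (m - 1).toDouble / (choose(m, s) * s * (m - s))

  /** Build one perturbed sample: features in the coalition keep the value of
    * the instance being explained, all others are taken from a background row. */
  def maskWithBackground(instance: Array[Double],
                         background: Array[Double],
                         coalition: Set[Int]): Array[Double] =
    instance.indices
      .map(i => if (coalition.contains(i)) instance(i) else background(i))
      .toArray

  /** Weighted coefficient of determination of the fitted surrogate.
    * Returns NaN when the weighted target variance is zero. */
  def weightedR2(y: Seq[Double], yHat: Seq[Double], w: Seq[Double]): Double = {
    require(y.length == yHat.length && y.length == w.length, "length mismatch")
    val yMean = y.zip(w).map { case (yi, wi) => yi * wi }.sum / w.sum
    val ssRes = y.indices.map(i => w(i) * math.pow(y(i) - yHat(i), 2)).sum
    val ssTot = y.indices.map(i => w(i) * math.pow(y(i) - yMean, 2)).sum
    1.0 - ssRes / ssTot
  }
}
```

For example, with 4 features a coalition of size 2 gets weight 3 / (6 * 2 * 2) = 0.125, while sizes 1 and 3 both get 0.25. KernelSHAP thus emphasises very small and very large coalitions. The infinite weight for the empty and full coalitions reflects that reference KernelSHAP implementations typically enforce those cases as constraints rather than as ordinary regression samples.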

Sample notebooks will be included in the next PR.

memoryz · 2021-06-09T17:21:21Z

/azp run

azure-pipelines · 2021-06-09T17:21:31Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-09T17:25:43Z

/azp run

azure-pipelines · 2021-06-09T17:25:59Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-09T17:29:04Z

/azp run

azure-pipelines · 2021-06-09T17:29:15Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-09T17:32:33Z

/azp run

azure-pipelines · 2021-06-09T17:32:43Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov · 2021-06-09T17:34:33Z

Codecov Report

Merging #1077 (1364d30) into master (00bac62) will increase coverage by 1.15%.
The diff coverage is 93.86%.

@@            Coverage Diff             @@
##           master    #1077      +/-   ##
==========================================
+ Coverage   84.34%   85.50%   +1.15%     
==========================================
  Files         208      232      +24     
  Lines        9789    10484     +695     
  Branches      565      601      +36     
==========================================
+ Hits         8257     8964     +707     
+ Misses       1532     1520      -12

Impacted Files	Coverage Δ
...a/com/microsoft/ml/spark/core/utils/RowUtils.scala	`0.00% <0.00%> (ø)`
...a/com/microsoft/ml/spark/explainers/RowUtils.scala	`11.11% <11.11%> (ø)`
...om/microsoft/ml/spark/explainers/BreezeUtils.scala	`50.00% <50.00%> (ø)`
...ala/org/apache/spark/ml/param/DataFrameParam.scala	`70.83% <57.14%> (+14.31%)`	⬆️
...microsoft/ml/spark/explainers/LocalExplainer.scala	`76.92% <76.92%> (ø)`
...m/microsoft/ml/spark/explainers/FeatureStats.scala	`87.50% <87.50%> (ø)`
...om/microsoft/ml/spark/explainers/TabularSHAP.scala	`91.66% <91.66%> (ø)`
.../com/microsoft/ml/spark/explainers/ImageSHAP.scala	`93.10% <93.10%> (ø)`
.../com/microsoft/ml/spark/explainers/ImageLIME.scala	`93.33% <93.33%> (ø)`
...com/microsoft/ml/spark/explainers/VectorLIME.scala	`93.33% <93.33%> (ø)`
... and 67 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00bac62...1364d30.

src/main/python/mmlspark/explainers/TabularLIME.py

mhamilton723 · 2021-06-09T18:39:41Z

src/main/scala/com/microsoft/ml/spark/codegen/Wrappable.scala

  protected lazy val pyParamsDefinitions: String = {
    this.params.map { p =>
      val typeConverterString = getParamInfo(p).pyTypeConverter.map(", typeConverter=" + _).getOrElse("")
-      s"""|${p.name} = Param(Params._dummy(), "${p.name}", "${p.doc}"$typeConverterString)
+      s"""|${p.name} = Param(Params._dummy(), "${p.name}", "${escape(p.doc)}"$typeConverterString)
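The suggested change matters because each parameter's doc string is spliced into a double-quoted Python string literal. A doc containing a double quote, a backslash or a newline would otherwise yield invalid generated Python. The diff does not show what `escape` actually does. The following is a minimal sketch that assumes it only needs to make arbitrary text safe inside a double-quoted Python literal. `PyCodegen`, `escapePyString` and `paramDefinition` are hypothetical names.

```scala
object PyCodegen {
  /** Hypothetical helper: escape characters that would break a
    * double-quoted Python string literal. */
  def escapePyString(s: String): String =
    s.flatMap {
      case '\\' => "\\\\"  // backslash must be escaped first-class
      case '"'  => "\\\""  // a bare quote would terminate the literal
      case '\n' => "\\n"   // newlines are illegal in a single-line literal
      case '\r' => "\\r"
      case '\t' => "\\t"
      case c    => c.toString
    }

  /** Emit one generated Python Param definition, mirroring the template above. */
  def paramDefinition(name: String, doc: String, typeConverter: Option[String]): String = {
    val tc = typeConverter.map(", typeConverter=" + _).getOrElse("")
    s"""$name = Param(Params._dummy(), "$name", "${escapePyString(doc)}"$tc)"""
  }
}
```

For example, a doc such as `width "w"` becomes `"width \"w\""` in the generated code, which Python parses back to the original text.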


mhamilton723 · 2021-06-09T18:41:27Z

src/main/scala/com/microsoft/ml/spark/core/schema/DatasetExtensions.scala

-      counter += 1
-    }
-    unusedColumnName
+    val stream = Iterator(prefix) ++ Iterator.from(1, 1).map(prefix + "_" + _)
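The suggested rewrite replaces a mutable counter with a lazily evaluated stream of candidate names: `prefix`, then `prefix_1`, `prefix_2`, and so on. Below is a self-contained sketch of that idea. It assumes the existing column names are available as a `Set[String]`; in the real helper they would come from the DataFrame's schema. `ColumnNames` and this signature are hypothetical.

```scala
object ColumnNames {
  /** Returns `prefix` if no column has that name, otherwise the first free
    * `prefix_1`, `prefix_2`, ... (hypothetical stand-in for the Dataset helper). */
  def findUnusedColumnName(prefix: String, existing: Set[String]): String = {
    // Candidates are generated lazily, only as many as `find` needs.
    val candidates = Iterator(prefix) ++ Iterator.from(1, 1).map(prefix + "_" + _)
    candidates.find(name => !existing.contains(name)).get
  }
}
```

Because `Iterator.from` is infinite and the set of existing names is finite, `find` always returns a value, so the `.get` cannot fail.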


memoryz · 2021-06-09T23:44:26Z

/azp run

azure-pipelines · 2021-06-09T23:44:40Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-10T00:23:17Z

/azp run

azure-pipelines · 2021-06-10T00:23:37Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-10T01:33:42Z

/azp run

azure-pipelines · 2021-06-10T01:33:53Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-10T03:08:45Z

/azp run

azure-pipelines · 2021-06-10T03:08:57Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-10T03:55:55Z

/azp run

azure-pipelines · 2021-06-10T03:56:06Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-10T04:32:49Z

/azp run

azure-pipelines · 2021-06-10T04:32:59Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-10T04:53:19Z

/azp run

azure-pipelines · 2021-06-10T04:53:30Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-10T08:42:38Z

/azp run

azure-pipelines · 2021-06-10T08:42:49Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-17T20:06:58Z

/azp run

azure-pipelines · 2021-06-17T20:07:08Z

Azure Pipelines successfully started running 1 pipeline(s).

memoryz · 2021-06-17T20:45:56Z

/azp run

azure-pipelines · 2021-06-17T20:46:07Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2021-06-18T18:48:18Z

src/main/scala/com/microsoft/ml/spark/core/utils/RowUtils.scala

@@ -6,10 +6,13 @@ package com.microsoft.ml.spark.core.utils
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.expressions.GenericRow

+// This class currently has no usage. Should we just remove it?
+@deprecated("This is a copy of Row.merge function from Spark, which was marked deprecated.", "1.0.0-rc3")


Yes, we can remove it.

memoryz requested a review from mhamilton723 June 9, 2021 17:21

memoryz force-pushed the jasowang/lime branch from d0b24d9 to ddfaa28 on June 9, 2021 17:28

mhamilton723 reviewed Jun 9, 2021

src/main/python/mmlspark/explainers/TabularLIME.py (outdated, resolved)

mhamilton723 reviewed Jun 9, 2021

memoryz added 20 commits June 17, 2021 13:06

Excluding SerializationFuzzing for SHAP suites due to error caused by randomness in explanation after deserialization.

f151952

Addressing code review comments

c245975

Code review feedback

a7504ae

Code review feedback

6b86149

code review comments

54e8b1e

code review feedbacks

ba73b50

more...

a6ae582

more...

00aa703

sort

75ee15f

Rename Spark vector imports

9dc3ca8

use string constants

5f85475

Change regression base to support sparse vector as well

c4a9c43

Clean up printlns

a85b879

background dataframe should be mandatory.

77f0e3e

Extracting slicer function

392ac1e

WIP: Rewrite sampler for kernel SHAP

3106557

Rewrite tabular LIME sampler to support non-numerical types

352defd

Add file header, fixing unit tests.

bf10732

Add header

8bc3ce3

Add unit test to compare shap explainer with kernel explainer from https://github.com/slundberg/shap

df7f736

memoryz force-pushed the jasowang/lime branch from 47c4e5b to df7f736 on June 17, 2021 20:06

Fixing unit test

1364d30

mhamilton723 approved these changes Jun 18, 2021

memoryz merged commit 7dd6bb1 into microsoft:master Jun 18, 2021

memoryz deleted the jasowang/lime branch June 18, 2021 19:07

memoryz mentioned this pull request Jun 20, 2021

Inconsistency in Probability Values in TabularLIME #1049

Closed
