feat: ICE/PDP explainer #1284

Merged: 42 commits merged on Dec 20, 2021

ezherdeva (Contributor)

In this PR, I'm introducing ICETransformer and adding it to the com.microsoft.ml.spark.explainers package.

ICETransformer computes how the model's predictions depend on the specified features, evaluated over the given DataFrame.

  • ICETransformer supports categorical and numeric features.
  • It supports two kinds of plots: "average" (PDP, partial dependence plot) and "individual" (ICE, individual conditional expectation).
  • It only supports one-way dependence plots, i.e. one feature at a time.
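To make the "individual" vs. "average" distinction concrete, here is a minimal plain-Scala sketch. It is not the PR's Spark implementation: a model is assumed to be any function from a feature map to a prediction, the grid of values for the chosen feature is passed in explicitly, and `IceSketch`, `ice`, and `pdp` are hypothetical names. Each ICE curve holds one row's predictions as the feature sweeps the grid; the PDP is the pointwise mean of those curves.

```scala
// Hypothetical sketch: ICE curves and the PDP for one numeric feature.
// The real ICETransformer works on Spark DataFrames and fitted models instead.
object IceSketch {
  type Row = Map[String, Double]

  /** ICE ("individual"): for every row, overwrite `feature` with each grid
    * value and record the model's prediction. One curve per input row. */
  def ice(model: Row => Double, rows: Seq[Row], feature: String,
          grid: Seq[Double]): Seq[Seq[Double]] =
    rows.map(row => grid.map(v => model(row.updated(feature, v))))

  /** PDP ("average"): the pointwise mean of the ICE curves over all rows.
    * Returns NaN values if `rows` is empty. */
  def pdp(model: Row => Double, rows: Seq[Row], feature: String,
          grid: Seq[Double]): Seq[Double] = {
    val curves = ice(model, rows, feature, grid)
    grid.indices.map(i => curves.map(_(i)).sum / curves.size)
  }
}
```

For a toy model `2 * x + y` and two rows with `y = 1` and `y = 3`, sweeping `x` over `0, 1, 2` gives the parallel ICE curves `(1, 3, 5)` and `(3, 5, 7)`, and the PDP is their average `(2, 4, 6)`. Averaging hides the per-row spread, which is why both plot kinds are useful.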

I also added the ICECategoricalFeature and ICENumericFeature classes, which ICETransformer uses to describe the features.

All of these classes can also be used from Python.
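The two feature classes presumably let each feature type decide which values to sweep. The sketch below is an assumption about that idea, not the PR's code: `NumericSpec` splits a range into evenly spaced points, and `CategoricalSpec` keeps the most frequent values, in the spirit of the count column used by `collectCategoricalValues` later in this thread. All class and field names (`numSplits`, `rangeMin`, `rangeMax`, `numTopValues`) are hypothetical.

```scala
// Hypothetical sketch: how numeric and categorical feature specs could each
// produce the values that the ICE/PDP computation sweeps over.
object FeatureGridSketch {
  /** Numeric feature: numSplits + 1 evenly spaced points in [rangeMin, rangeMax]. */
  final case class NumericSpec(name: String, numSplits: Int,
                               rangeMin: Double, rangeMax: Double) {
    require(numSplits > 0, "numSplits must be positive")

    def grid: Seq[Double] = {
      val step = (rangeMax - rangeMin) / numSplits
      (0 to numSplits).map(i => rangeMin + i * step)
    }
  }

  /** Categorical feature: the numTopValues most frequent values in the data. */
  final case class CategoricalSpec(name: String, numTopValues: Int) {
    def grid(values: Seq[String]): Seq[String] =
      values.groupBy(identity)
        .map { case (v, occurrences) => (v, occurrences.size) } // value -> count
        .toSeq
        .sortBy { case (v, count) => (-count, v) } // most frequent first, ties by name
        .take(numTopValues)
        .map(_._1)
  }
}
```

For example, `NumericSpec("x", 4, 0.0, 2.0).grid` yields `0.0, 0.5, 1.0, 1.5, 2.0`. Capping categorical values at the top N keeps the number of perturbed rows bounded for high-cardinality features.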

ezherdeva and others added 28 commits September 17, 2021 11:09
…ICEExplainer.scala

Co-authored-by: Jason Wang <jasonwang_83@hotmail.com>
…ICEExplainer.scala

Co-authored-by: Jason Wang <jasonwang_83@hotmail.com>
…ICEExplainer.scala

Co-authored-by: Jason Wang <jasonwang_83@hotmail.com>
…ICEExplainer.scala

Co-authored-by: Jason Wang <jasonwang_83@hotmail.com>
…ICEExplainer.scala

Co-authored-by: Jason Wang <jasonwang_83@hotmail.com>
…g/FuzzingTest.scala

Co-authored-by: Kashyap Patel <64443771+ms-kashyap@users.noreply.github.com>
ms-kashyap (Contributor)

/azp run

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 (Collaborator) left a comment

Thank you so much for this fantastic work! I put in a few nits, but to give it a good review, perhaps we should chat so I can get a better idea of what this does.

val result = predicted.withColumn(targetCol, explainTarget)

getKind.toLowerCase match {
  case this.averageKind =>

Collaborator

nit: I don't think you need the "this" here

ezherdeva (Contributor, Author)

Added backticks instead, because the pattern needs a stable identifier; a bare lowercase name would be treated as a new variable binding. Ref: https://stackoverflow.com/questions/7078022/why-does-pattern-matching-in-scala-not-work-with-variables
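The stable-identifier rule is easy to trip over. In a Scala pattern, a bare lowercase name is a variable pattern that matches anything and shadows the field of the same name; backticks (or a qualified path such as `this.averageKind`) turn it into a stable identifier pattern, which is compared with `==`. A small sketch, using a hypothetical `KindMatcher` class:

```scala
// Hypothetical illustration of stable identifier vs. variable patterns.
class KindMatcher {
  val averageKind = "average"
  val individualKind = "individual"

  // Backticks: each case compares the scrutinee with the field's value.
  def withBackticks(kind: String): String = kind.toLowerCase match {
    case `averageKind`    => "PDP"
    case `individualKind` => "ICE"
    case other            => s"unsupported: $other"
  }

  // No backticks: `averageKind` here is a fresh variable that binds
  // whatever the input is, so this case matches every string.
  def withoutBackticks(kind: String): String = kind.toLowerCase match {
    case averageKind => "PDP"
  }
}
```

With this class, `withoutBackticks("individual")` returns `"PDP"`, which is exactly the silent bug that backticks (or the original `this.` prefix) prevent.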

val targetClasses = DatasetExtensions.findUnusedColumnName("targetClasses", df)
val dfWithId = df
  .withColumn(idCol, monotonically_increasing_id())
  .withColumn(targetClasses, this.get(targetClassesCol).map(col).getOrElse(lit(getTargetClasses)))

Collaborator

nit: use getters directly here

ezherdeva (Contributor, Author)

targetClassesCol is optional by design; that's why we're using `get` (which returns an `Option`) like this.
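In Spark ML, `Params.get(param)` returns an `Option`, so an optional parameter with no default can be read without throwing, whereas a getter that reads an unset parameter with no default typically fails at runtime. The plain-Scala sketch below mirrors the `this.get(targetClassesCol).map(col).getOrElse(lit(getTargetClasses))` expression without Spark; `TargetSource`, `FromColumn`, `Constant`, and `resolveTargets` are hypothetical stand-ins for a column reference and a literal.

```scala
// Hypothetical sketch of the "optional column, else constant" fallback.
object OptionalParamSketch {
  sealed trait TargetSource
  // Stand-in for col(name): read the target classes per row from a column.
  final case class FromColumn(name: String) extends TargetSource
  // Stand-in for lit(classes): use the same target classes for every row.
  final case class Constant(classes: Seq[Int]) extends TargetSource

  def resolveTargets(targetClassesCol: Option[String],
                     targetClasses: Seq[Int]): TargetSource =
    targetClassesCol.map(FromColumn(_)).getOrElse(Constant(targetClasses))
}
```

If the user set the column parameter, each row supplies its own target classes; otherwise the constant `targetClasses` value applies to all rows, and no exception is thrown for the unset parameter.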

ezherdeva marked this pull request as draft on December 3, 2021 20:54
ezherdeva marked this pull request as ready for review on December 11, 2021 00:56
mhamilton723 (Collaborator)

/azp run

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 (Collaborator) left a comment

Left a bit more detailed feedback. Looks awesome, though, and I appreciate all the hard work and iterations!

mhamilton723 (Collaborator)

/azp run

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 (Collaborator) left a comment

A lot of the comments that are marked as resolved don't seem to be resolved; perhaps I'm missing something or commenting too early.

}

private def collectCategoricalValues[_](df: DataFrame, feature: ICECategoricalFeature): Array[_] = {
  val featureCount = DatasetExtensions.findUnusedColumnName("__feature__count__", df)

Collaborator

might want to rename the column from "__feature__count__" to "featureCount" to be consistent with the other added columns
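Judging by its name and its use here, `DatasetExtensions.findUnusedColumnName` returns a column name that does not clash with the DataFrame's existing columns, so a readable base name like `featureCount` should be safe. The sketch below illustrates that idea with a hypothetical `ColumnNames.findUnusedColumnName` over a plain set of names; the `_2`, `_3`, ... suffix scheme is an assumption, not SynapseML's actual behavior.

```scala
// Hypothetical sketch: pick `prefix` if it is free, otherwise append a
// numeric suffix until the name does not collide with an existing column.
object ColumnNames {
  def findUnusedColumnName(prefix: String, existing: Set[String]): String =
    Iterator.from(1)
      .map(i => if (i == 1) prefix else s"${prefix}_$i")
      .find(name => !existing.contains(name))
      .get // always succeeds: `existing` is finite
}
```

Because collisions are resolved by the helper, the base name can follow the same readable convention as `targetClasses` in the earlier snippet.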

mhamilton723 merged commit 46cd375 into microsoft:master on Dec 20, 2021
Labels: None yet · Projects: None yet · Linked issues: None yet

5 participants