Explainability doc #2901

jwood803 · 2019-03-10T02:03:26Z

Initial draft to add explainability documentation.

Fix for #2438.

codecov · 2019-03-10T02:51:32Z

Codecov Report

Merging #2901 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2901      +/-   ##
==========================================
+ Coverage    72.7%   72.71%   +0.01%     
==========================================
  Files         807      807              
  Lines      145172   145301     +129     
  Branches    16225    16227       +2     
==========================================
+ Hits       105541   105662     +121     
- Misses      35217    35223       +6     
- Partials     4414     4416       +2

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | `72.71% <100%> (+0.01%)` | ⬆️ |
| #production | `68.22% <ø> (-0.01%)` | ⬇️ |
| #test | `89.02% <100%> (+0.04%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs | `95.48% <100%> (+1.99%)` | ⬆️ |
| ...soft.ML.Transforms/Text/WordEmbeddingsExtractor.cs | `87.52% <0%> (-0.91%)` | ⬇️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | `89.26% <0%> (-0.63%)` | ⬇️ |

artidoro

Thanks for the good work @jwood803!

artidoro · 2019-03-20T18:26:35Z

docs/code/ExplainabilityCookBook.md

+
+All of these samples will use the [housing data](https://github.com/dotnet/machinelearning/blob/master/test/data/housing.txt) and will reference the below data schema class and pipeline.
+
+```csharp


We should make sure that the cookbook code compiles and runs properly even as the codebase keeps changing.
We do this by adding all the code and utilities to a test in https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/Api/CookbookSamples/CookbookSamplesDynamicApi.cs

We then copy the most important parts of CookbookSamplesDynamicApi.cs into the .md file.

If you could do that, it would be great!

artidoro · 2019-03-20T18:30:21Z

docs/code/ExplainabilityCookBook.md

+
+MLContext context = new MLContext();
+
+IDataView data = context.Data.LoadFromTextFile("./housing.txt", new[]


This code for loading the data and specifying the data class is needed in the .cs file to run your tests. However, I don't think we should add it to the .md file, since other sections of the cookbook and samples already illustrate that.

artidoro · 2019-03-20T18:46:09Z

docs/code/ExplainabilityCookBook.md

+var model = pipeline.Fit(data);
+```
+
+## How do I look at the global feature importance?


I would suggest not putting these in a separate cookbook. Instead, add them to the main MlNetCookBook.md as a model explainability section placed after the model inspection section, "I want to look at my model's coefficients".

rogancarr · 2019-03-26T18:54:25Z

docs/code/MlNetCookBook.md

@@ -578,6 +578,48 @@ var biases = modelParameters.GetBiases();

 ```

+## How do I look at the global feature importance?
+The below snippet shows how to get a glimpse of the the feature importance, or how much each column of data impacts the performance of the model.


On "column of data": say "feature" rather than "column of data". The final features in the model might not be exactly the input columns. #Resolved

rogancarr · 2019-03-26T18:57:33Z

docs/code/MlNetCookBook.md

+
+foreach (var metricsStatistics in featureImportance)
+{
+    Console.WriteLine($"Root Mean Squared - {metricsStatistics.Rms.Mean}");


On `Console.WriteLine($"Root Mean Squared - {metricsStatistics.Rms.Mean}");`: explain a bit above what this is calculating. It is not the RMS itself, but the difference in RMS for each feature when that feature is replaced with a random value.

Also, I would print "Feature I: Difference in RMS" rather than just the RMS.

rogancarr · 2019-03-26T18:58:42Z

docs/code/MlNetCookBook.md

+}
+```
+
+## How do I get a model's weights to look at the global feature importance?


Move this above PFI, as it's the most naïve way we have to ask this question. #Resolved

rogancarr · 2019-03-26T19:00:01Z

docs/code/MlNetCookBook.md

+```
+
+## How do I get a model's weights to look at the global feature importance?
+The below snippet shows how to get a model's weights to help determine the feature importance of the model.


On "The below": note that for a linear model, the weights are only an approximation of feature importance. It helps to standardize the variables before the fit so that they are all on the same scale. Even then, the linear regression solution does not account for correlations between the variables, so this isn't a great measure of explainability.
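To make the scale point concrete, here is a minimal sketch in plain C# rather than ML.NET. It rests on one identity: for a linear model, a raw weight multiplied by the standard deviation of its feature is the weight the same model would carry on the z-scored feature. The class name, method names, data, and weights are all hypothetical and are not taken from the PR or from ML.NET.

```csharp
using System;
using System.Linq;

public static class StandardizedWeightsSketch
{
    // Population standard deviation of one feature column.
    public static double StdDev(double[][] X, int feature)
    {
        double mean = X.Average(row => row[feature]);
        double variance = X.Average(row => (row[feature] - mean) * (row[feature] - mean));
        return Math.Sqrt(variance);
    }

    // Rescale raw linear weights: w_j * sigma_j is the weight the same
    // linear model would have if feature j were z-scored first.
    public static double[] Standardize(double[] weights, double[][] X)
    {
        return weights.Select((w, j) => w * StdDev(X, j)).ToArray();
    }

    public static void Main()
    {
        // Hypothetical data: feature 0 is in single digits, feature 1 in thousands.
        double[][] X =
        {
            new[] { 1.0, 1000.0 },
            new[] { 2.0, 3000.0 },
            new[] { 3.0, 2000.0 },
            new[] { 4.0, 4000.0 },
        };
        double[] rawWeights = { 2.0, 0.01 }; // hypothetical fitted weights

        double[] scaled = Standardize(rawWeights, X);
        for (int j = 0; j < scaled.Length; j++)
        {
            Console.WriteLine(
                $"Feature {j}: raw weight = {rawWeights[j]}, standardized weight = {scaled[j]}");
        }
    }
}
```

In this made-up data the raw weights rank feature 0 first, while the rescaled weights rank feature 1 first, which is the reason to standardize before comparing. The rescaling does nothing about correlated features, so the caveat above still applies.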

rogancarr · 2019-03-26T19:00:18Z

docs/code/MlNetCookBook.md

+var linearModel = model.LastTransformer.Model;
+
+var weights = new VBuffer<float>();
+linearModel.GetFeatureWeights(ref weights);


Add an example for trees as well; see the functional tests.

rogancarr · 2019-03-26T19:00:54Z

docs/code/MlNetCookBook.md

+linearModel.GetFeatureWeights(ref weights);
+```
+
+## How do I look at the feature importance per row?


"Local feature importance"

"row" => "example" (the language we've shifted to using) #Resolved

rogancarr · 2019-03-26T19:01:33Z

docs/code/MlNetCookBook.md

+
+var shuffledSubset = context.Data.TakeRows(context.Data.ShuffleRows(featureContributionData), 10);
+
+var preview = shuffledSubset.Preview();


Manually print the rows of data after casting to an enumerable. You can copy and paste from the FCC sample.

rogancarr · 2019-03-26T19:02:30Z

This looks great! Just a few comments.

rogancarr · 2019-04-03T23:40:51Z

docs/code/MlNetCookBook.md

+```
+
+## How do I look at the global feature importance?
+The below snippet shows how to get a glimpse of the the feature importance, or how much each feature impacts the performance of the model. It also outputs the difference in root mean squared for each feature as though the feature were replaced with a random value.


On "It also outputs the difference in root mean squared for each feature as though the feature were replaced with a random value.", suggested wording:

"Permutation Feature Importance works by computing the change in the evaluation metrics when each feature is replaced by a random value. In this case, we are investigating the change in the root mean squared error."

rogancarr · 2019-04-03T23:46:36Z

docs/code/MlNetCookBook.md

+
+foreach (var metricsStatistics in featureImportance)
+{
+    Console.WriteLine($"Feature I: Difference in RMS - {metricsStatistics.Rms.Mean}");


On the "I" in "Feature I": sorry, I meant the number of the feature, like 0, 1, 2, and so on: "Feature 0: ", "Feature 1: ", etc.
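For readers following this thread, the idea under discussion can be sketched in plain C# without ML.NET: measure the RMSE, shuffle one feature column across examples, measure again, and report the difference per feature index. Shuffling is this sketch's way of "replacing with a random value". The class, the methods, the toy model, and the data are all hypothetical and are not the ML.NET implementation.

```csharp
using System;
using System.Linq;

public static class PfiSketch
{
    // Root mean squared error of a prediction function over a dataset.
    public static double Rmse(Func<double[], double> predict, double[][] X, double[] y)
    {
        double sumSq = 0;
        for (int i = 0; i < X.Length; i++)
        {
            double err = predict(X[i]) - y[i];
            sumSq += err * err;
        }
        return Math.Sqrt(sumSq / X.Length);
    }

    // Change in RMSE when column `feature` is shuffled across examples.
    public static double RmseDelta(
        Func<double[], double> predict, double[][] X, double[] y, int feature, Random rng)
    {
        double baseline = Rmse(predict, X, y);

        // Fisher-Yates shuffle of the chosen column only.
        double[] column = X.Select(row => row[feature]).ToArray();
        for (int i = column.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (column[i], column[j]) = (column[j], column[i]);
        }

        // Copy each row and substitute the shuffled value, leaving X untouched.
        double[][] permuted = X.Select((row, i) =>
        {
            double[] copy = (double[])row.Clone();
            copy[feature] = column[i];
            return copy;
        }).ToArray();

        return Rmse(predict, permuted, y) - baseline;
    }

    public static void Main()
    {
        // Hypothetical model that uses feature 0 and ignores feature 1.
        Func<double[], double> predict = x => 3.0 * x[0];
        double[][] X = Enumerable.Range(0, 20)
            .Select(i => new double[] { i, (i * 7) % 5 })
            .ToArray();
        double[] y = X.Select(predict).ToArray();

        var rng = new Random(42);
        for (int f = 0; f < 2; f++)
        {
            Console.WriteLine($"Feature {f}: Difference in RMS = {RmseDelta(predict, X, y, f, rng)}");
        }
    }
}
```

Because the toy model ignores feature 1, shuffling that column leaves the RMSE unchanged, while shuffling feature 0 can only make it worse. Labeling each line with the feature index, as requested above, is what makes that contrast readable.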

rogancarr · 2019-04-03T23:47:17Z

docs/code/MlNetCookBook.md

+```
+
+## How do I look at the local feature importance per example?
+The below snippet shows how to get feature importance for each example of data.


On "feature importance for each example of data": can you link to the appropriate place in the docs for more information on all of these? Maybe we don't actually need to go into major details here.

jwood803 · 2019-04-05

The best doc I could find is this one. Is this OK to link to in each of these sections, or would there be a doc for each of these?

rogancarr · 2019-04-08

Oh, I was thinking we could link to the code samples in the repo. But this is a moving target, so let's revisit later.

jwood803 · 2019-04-12

Sounds good! Apologies for the misunderstanding. Was there anything else I missed for the PR? Just making sure no one is waiting for me to make more updates 😄

rogancarr

LGTM!

artidoro

jwood803 added 2 commits March 9, 2019 20:03

Initial add of explainability doc

1685e9c

Add more to the explainability doc

dd49800

sfilipi requested review from rogancarr and artidoro March 10, 2019 03:37

sfilipi assigned jwood803 Mar 10, 2019

artidoro reviewed Mar 20, 2019

jwood803 added 4 commits March 26, 2019 06:29

Update cookbook

4088804

Merge branch 'master' into explainability-doc

3591f8d

Add initial test for explainability doc

a31227b

Add remaining tests and update doc

78496e9

rogancarr reviewed Mar 26, 2019

Update for PR feedback

187497c

rogancarr reviewed Apr 3, 2019

Update for more feedback

835d36b

rogancarr approved these changes Apr 12, 2019

Merge branch 'master' into explainability-doc

e59f480

artidoro approved these changes Apr 17, 2019

shauheen requested review from shmoradims and sfilipi April 20, 2019 04:47

shauheen merged commit d8e0462 into dotnet:master Apr 20, 2019

dotnet locked as resolved and limited conversation to collaborators Mar 23, 2022
