Explainability doc #2901

Merged
merged 9 commits into from Apr 20, 2019

jwood803
Contributor

Initial draft to add explainability documentation.

Fix for #2438.

codecov bot commented Mar 10, 2019

Codecov Report

Merging #2901 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2901      +/-   ##
==========================================
+ Coverage    72.7%   72.71%   +0.01%     
==========================================
  Files         807      807              
  Lines      145172   145301     +129     
  Branches    16225    16227       +2     
==========================================
+ Hits       105541   105662     +121     
- Misses      35217    35223       +6     
- Partials     4414     4416       +2
| Flag | Coverage Δ |
| --- | --- |
| #Debug | 72.71% <100%> (+0.01%) ⬆️ |
| #production | 68.22% <ø> (-0.01%) ⬇️ |
| #test | 89.02% <100%> (+0.04%) ⬆️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| ...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs | 95.48% <100%> (+1.99%) ⬆️ |
| ...soft.ML.Transforms/Text/WordEmbeddingsExtractor.cs | 87.52% <0%> (-0.91%) ⬇️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.26% <0%> (-0.63%) ⬇️ |

Contributor

@artidoro artidoro left a comment

Thanks for the good work @jwood803!


All of these samples will use the [housing data](https://github.com/dotnet/machinelearning/blob/master/test/data/housing.txt) and will reference the below data schema class and pipeline.

```csharp
Contributor

We should make sure that the cookbook code compiles and runs properly even with the ongoing changes.
We do this by adding all the code and utilities to a test in https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/Api/CookbookSamples/CookbookSamplesDynamicApi.cs

We then copy the most important parts of the CookbookSamplesDynamicApi.cs to the .md file.

If you could do that it would be great!


MLContext context = new MLContext();

IDataView data = context.Data.LoadFromTextFile("./housing.txt", new[]
Contributor

So this code for loading and specifying the data class would be necessary in the .cs file to run your tests. However, I think we should not add it to the .md file. We have other sections of the cookbook and samples in which we illustrate that.

var model = pipeline.Fit(data);
```

## How do I look at the global feature importance?
Contributor

I would suggest not to write these under a separate cookbook, but rather to add them to the main MlNetCookBook.md under a separate model explainability section after the model inspection section: "I want to look at my model's coefficients".

@@ -578,6 +578,48 @@ var biases = modelParameters.GetBiases();

```

## How do I look at the global feature importance?
The below snippet shows how to get a glimpse of the feature importance, or how much each column of data impacts the performance of the model.
Contributor

@rogancarr rogancarr Mar 26, 2019

column of data

"feature" rather than "column of data". The end features in the model might not be exactly the input columns. #Resolved


foreach (var metricsStatistics in featureImportance)
{
Console.WriteLine($"Root Mean Squared - {metricsStatistics.Rms.Mean}");
Contributor

Console.WriteLine($"Root Mean Squared - {metricsStatistics.Rms.Mean}");

Explain a bit above about what this is calculating. It's not giving the RMS, but the difference in RMS for each feature if the feature were to be replaced with a random value.

Also, I would print "Feature I: Difference in RMS" rather than just the RMS.

}
```

## How do I get a model's weights to look at the global feature importance?
Contributor

@rogancarr rogancarr Mar 26, 2019

Move this above PFI, as it's the most naïve way we have to ask this question. #Resolved

```

## How do I get a model's weights to look at the global feature importance?
The below snippet shows how to get a model's weights to help determine the feature importance of the model.
Contributor

The below

Note that for a linear model, the weights are only an approximation of feature importance. It helps to standardize the variables before the fit so that they are all on the same scale. Even then, the linear regression solution does not account for correlations between the variables, so this isn't a great measure of explainability.

var linearModel = model.LastTransformer.Model;

var weights = new VBuffer<float>();
linearModel.GetFeatureWeights(ref weights);
Contributor

Add for trees as well -- see the functional tests.

linearModel.GetFeatureWeights(ref weights);
```
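
The reviewer's point about scale can be illustrated with a small, library-independent sketch. It does not call ML.NET; the data, the column meanings, and the `rawWeights` values are hypothetical and are not taken from the housing model. It relies on the fact that, for an ordinary least-squares fit with an intercept, standardizing a feature multiplies its coefficient by that feature's standard deviation, so `weight * stdDev` gives weights on a comparable scale.

```csharp
using System;
using System.Linq;

// Hypothetical raw-scale data: column 0 is square feet, column 1 is room count.
double[][] rows =
{
    new[] { 1000.0, 2.0 },
    new[] { 1500.0, 3.0 },
    new[] { 2000.0, 3.0 },
    new[] { 2500.0, 4.0 },
};

// Hypothetical weights from a linear fit on the raw (unstandardized) features.
double[] rawWeights = { 0.1, 5.0 };

// Population standard deviation of one feature column.
double StdDev(int column)
{
    double mean = rows.Average(r => r[column]);
    double variance = rows.Average(r => (r[column] - mean) * (r[column] - mean));
    return Math.Sqrt(variance);
}

// Weight each feature would have if it had been standardized before the fit.
double[] scaledWeights = rawWeights
    .Select((w, j) => w * StdDev(j))
    .ToArray();

for (int j = 0; j < rawWeights.Length; j++)
{
    Console.WriteLine($"Feature {j}: raw weight = {rawWeights[j]}, scaled weight = {scaledWeights[j]:F2}");
}
```

In this toy data the raw weights rank feature 1 above feature 0, while the scaled weights rank them the other way around. The scaled weights still ignore correlations between features, as noted in the review comment.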

## How do I look at the feature importance per row?
Contributor

@rogancarr rogancarr Mar 26, 2019

"Local feature importance"

"row" => "example" (the language we've shifted to using) #Resolved


var shuffledSubset = context.Data.TakeRows(context.Data.ShuffleRows(featureContributionData), 10);

var preview = shuffledSubset.Preview();
Contributor

Manually print the rows of data after casting to an enumerable. You can copy and paste from the FCC Sample.

@rogancarr
Contributor

This looks great! Just a few comments.

```

## How do I look at the global feature importance?
The below snippet shows how to get a glimpse of the feature importance, or how much each feature impacts the performance of the model. It also outputs the difference in root mean squared for each feature as though the feature were replaced with a random value.
Contributor

It also outputs the difference in root mean squared for each feature as though the feature were replaced with a random value.

"Permutation Feature Importance works by computing the change in the evaluation metrics when each feature is replaced by a random value. In this case, we are investigating the change in the root mean squared error".


foreach (var metricsStatistics in featureImportance)
{
Console.WriteLine($"Feature I: Difference in RMS - {metricsStatistics.Rms.Mean}");
Contributor

I

Sorry, I meant the number of the feature, like 0, 1, 2, ... "Feature 0: " "Feature 1: " etc.

```
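
As a rough illustration of the idea discussed in this thread, the following library-independent sketch computes a permutation-style importance by hand. It is not ML.NET's implementation: the toy data, the `predict` stand-in for a trained model, and the helper names `Rmse` and `PermutationImportance` are all assumptions made for the example. It shuffles one feature column at a time across examples (a common way of "replacing the feature with a random value") and reports the change in RMS error, labeled by feature index as suggested above.

```csharp
using System;
using System.Linq;

// Hypothetical toy data: the label depends only on feature 0.
double[][] features =
{
    new[] { 1.0, 5.0 },
    new[] { 2.0, 3.0 },
    new[] { 3.0, 8.0 },
    new[] { 4.0, 1.0 },
    new[] { 5.0, 7.0 },
    new[] { 6.0, 2.0 },
};
double[] labels = features.Select(row => 3.0 * row[0]).ToArray();

// Stand-in for a trained model (an assumption; any scoring function would do).
Func<double[], double> predict = row => 3.0 * row[0] + 0.0 * row[1];

// Root mean squared error of the stand-in model on the given rows.
double Rmse(double[][] rows)
{
    double sumSq = 0.0;
    for (int i = 0; i < rows.Length; i++)
    {
        double err = predict(rows[i]) - labels[i];
        sumSq += err * err;
    }
    return Math.Sqrt(sumSq / rows.Length);
}

// For each feature, shuffle that column only and measure the change in RMSE.
double[] PermutationImportance(int featureCount, int seed)
{
    var rng = new Random(seed);
    double baseline = Rmse(features);
    var deltas = new double[featureCount];
    for (int f = 0; f < featureCount; f++)
    {
        // Deep-copy the rows so the original data stays untouched.
        double[][] permuted = features.Select(r => (double[])r.Clone()).ToArray();
        // Fisher-Yates shuffle applied to column f.
        for (int i = permuted.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (permuted[i][f], permuted[j][f]) = (permuted[j][f], permuted[i][f]);
        }
        deltas[f] = Rmse(permuted) - baseline;
    }
    return deltas;
}

double[] importance = PermutationImportance(featureCount: 2, seed: 42);
for (int f = 0; f < importance.Length; f++)
{
    Console.WriteLine($"Feature {f}: Difference in RMS = {importance[f]:F4}");
}
```

Because the stand-in model ignores feature 1, shuffling that column leaves the error unchanged, so its difference is zero. A single shuffle is noisy; averaging over several permutations, as the `.Mean` in the snippet under review suggests, gives a steadier estimate.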

## How do I look at the local feature importance per example?
The below snippet shows how to get feature importance for each example of data.
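
For intuition about what a per-example contribution means, here is a minimal sketch for a linear model, where one common definition of a feature's contribution is its weight multiplied by its value for that example. The code does not use ML.NET; the `bias`, `weights`, and `examples` values and the `Contributions` helper are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical linear model: score = bias + sum over j of weights[j] * x[j].
double bias = 10.0;
double[] weights = { 2.0, -1.0, 0.5 };

// Hypothetical examples, one row per example.
double[][] examples =
{
    new[] { 1.0, 4.0, 2.0 },
    new[] { 3.0, 2.0, 6.0 },
};

// Contribution of each feature to the score of a single example.
double[] Contributions(double[] x) =>
    x.Select((value, j) => weights[j] * value).ToArray();

for (int i = 0; i < examples.Length; i++)
{
    double[] c = Contributions(examples[i]);
    double score = bias + c.Sum();
    Console.WriteLine($"Example {i}: score = {score}");
    for (int j = 0; j < c.Length; j++)
    {
        Console.WriteLine($"  Feature {j}: contribution = {c[j]}");
    }
}
```

Unlike the global measures above, the same feature can push the score up for one example and down for another, which is why the values are reported per example.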
Contributor

feature importance for each example of data

Can you link to the appropriate place in docs for more information for all of these? Maybe we don't actually need to go into major details here.

Contributor Author

The best doc I could find is this one. Is this ok to link to in each of these sections or would there be a doc for each of these?

Contributor

Oh, I was thinking we could link to the code samples in the repo. But this is a moving target, so let's revisit later.

Contributor Author

Sounds good! Apologies for the misunderstanding. Was there anything else I missed for the PR? Just making sure no one is waiting for me to make more updates 😄

Contributor

@rogancarr rogancarr left a comment

LGTM!

Contributor

@artidoro artidoro left a comment

:shipit:

@shauheen shauheen merged commit d8e0462 into dotnet:master Apr 20, 2019
@dotnet dotnet locked as resolved and limited conversation to collaborators Mar 23, 2022

4 participants