Explainability doc #2901
Changes from 6 commits
1685e9c
dd49800
4088804
3591f8d
a31227b
78496e9
187497c
835d36b
e59f480
@@ -578,6 +578,48 @@ var biases = modelParameters.GetBiases();

```

## How do I look at the global feature importance?
The below snippet shows how to get a glimpse of the feature importance, or how much each column of data impacts the performance of the model.

```csharp
var transformedData = model.Transform(data);

var featureImportance = context.Regression.PermutationFeatureImportance(model.LastTransformer, transformedData);

foreach (var metricsStatistics in featureImportance)
{
    Console.WriteLine($"Root Mean Squared - {metricsStatistics.Rms.Mean}");
Review comment: Explain a bit above about what this is calculating. It's not giving the RMS, but the difference in RMS for each feature if the feature were to be replaced with a random value. Also, I would print "Feature I: Difference in RMS" rather than just the RMS.
}
```
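
To make the reviewer's point concrete, here is a minimal sketch of permutation feature importance computed by hand in plain C#, without ML.NET. It shuffles one feature column at a time and reports the change in RMS relative to the unshuffled baseline. The toy data, the `predict` stand-in model, and every variable name are assumptions made up for illustration; this is not how ML.NET implements `PermutationFeatureImportance`.

```csharp
using System;
using System.Linq;

// Hypothetical toy data: the label depends only on feature 0.
double[][] features =
{
    new[] { 1.0, 5.0 },
    new[] { 2.0, 1.0 },
    new[] { 3.0, 4.0 },
    new[] { 4.0, 2.0 },
    new[] { 5.0, 3.0 },
};
double[] labels = features.Select(x => 3.0 * x[0]).ToArray();

// Stand-in for a trained model (an assumption, not an ML.NET object).
Func<double[], double> predict = x => 3.0 * x[0] + 0.0 * x[1];

// Root mean squared error of the stand-in model on the given rows.
double Rms(double[][] rows) =>
    Math.Sqrt(rows.Select((x, i) => Math.Pow(predict(x) - labels[i], 2)).Average());

double baseline = Rms(features);
var rng = new Random(42);
double[] rmsDelta = new double[2];

for (int f = 0; f < 2; f++)
{
    // Shuffle only column f; the other column and the labels stay in place.
    double[] shuffledColumn = features.Select(x => x[f]).OrderBy(_ => rng.Next()).ToArray();
    double[][] permuted = features
        .Select((x, i) =>
        {
            double[] copy = (double[])x.Clone();
            copy[f] = shuffledColumn[i];
            return copy;
        })
        .ToArray();

    // Importance is the change in RMS, not the RMS itself.
    rmsDelta[f] = Rms(permuted) - baseline;
    Console.WriteLine($"Feature {f}: Difference in RMS = {rmsDelta[f]:F3}");
}
```

Because the stand-in model ignores feature 1, shuffling that column leaves the RMS unchanged, while shuffling feature 0 can only make the error worse.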

## How do I get a model's weights to look at the global feature importance?
Review comment: Move this above PFI, as it's the most naïve way we have to ask this question. #Resolved
The below snippet shows how to get a model's weights to help determine the feature importance of the model.
Review comment: Note that for a linear model, the weights are only an approximation. It helps to standardize the variables before the fit, so that they are all on the same scale, and even then, the linear regression solution does not account for correlations between the variables, and therefore this isn't a great measure of explainability.

```csharp
var linearModel = model.LastTransformer.Model;

var weights = new VBuffer<float>();
linearModel.GetFeatureWeights(ref weights);
Review comment: Add for trees as well -- see the functional tests.
```
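
The reviewer's remark about scale can be illustrated with a small sketch in plain C#. For a linear model, a weight on a raw feature depends on that feature's units: if a feature were standardized as `(x - mean) / stdDev`, the equivalent weight would be the raw weight times `stdDev`. The weights, feature columns, and names below are hypothetical values chosen for illustration, not output from ML.NET.

```csharp
using System;
using System.Linq;

// Hypothetical weights from a linear model fit on raw (unscaled) features.
double[] rawWeights = { 0.002, 1.5 };

// Hypothetical feature columns: income in dollars, age in decades.
double[][] columns =
{
    new[] { 30000.0, 45000.0, 60000.0, 90000.0 },
    new[] { 2.0, 3.0, 4.0, 5.0 },
};

// Population standard deviation of one feature column.
double StdDev(double[] values)
{
    double mean = values.Average();
    return Math.Sqrt(values.Select(v => (v - mean) * (v - mean)).Average());
}

// Weight the same model would carry on the standardized feature.
double[] standardizedWeights = rawWeights
    .Select((w, j) => w * StdDev(columns[j]))
    .ToArray();

for (int j = 0; j < rawWeights.Length; j++)
{
    Console.WriteLine($"Feature {j}: raw weight = {rawWeights[j]}, standardized weight = {standardizedWeights[j]:F3}");
}
```

In this made-up example the feature with the smaller raw weight ends up with the larger standardized weight, which is why raw weights alone can mislead. As the reviewer notes, even standardized weights do not account for correlations between features.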

## How do I look at the feature importance per row?
Review comment: "Local feature importance" "row" => "example" (the language we've shifted to using) #Resolved
The below snippet shows how to get feature importance for each row.

```csharp
var model = pipeline.Fit(data);
var transformedData = model.Transform(data);

var linearModel = model.LastTransformer;

var featureContributionCalculation = context.Transforms.CalculateFeatureContribution(linearModel, normalize: false);

var featureContributionData = featureContributionCalculation.Fit(transformedData).Transform(transformedData);

var shuffledSubset = context.Data.TakeRows(context.Data.ShuffleRows(featureContributionData), 10);

var preview = shuffledSubset.Preview();
Review comment: manually print the rows of data after casting to an enumerable. You can copy / paste from the FCC Sample.
```
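
As a rough picture of what a per-example contribution means for a linear model, the sketch below computes `weight * featureValue` for each feature of each example and prints the rows by hand. It uses plain C# with hypothetical weights, bias, and examples; the assumption that a contribution is the weight times the feature value is made for illustration and is not taken from the ML.NET implementation.

```csharp
using System;
using System.Linq;

// Hypothetical linear model: score = bias + sum(weights[j] * x[j]).
double[] weights = { 2.0, -1.0, 0.5 };
double bias = 1.0;

// Hypothetical examples (rows), one value per feature.
double[][] examples =
{
    new[] { 1.0, 4.0, 2.0 },
    new[] { 3.0, 1.0, 6.0 },
};

// Per-example contribution of feature j: weights[j] * x[j].
double[][] contributions = examples
    .Select(x => x.Select((value, j) => weights[j] * value).ToArray())
    .ToArray();

for (int i = 0; i < examples.Length; i++)
{
    // The contributions plus the bias add up to the model's score.
    double score = bias + contributions[i].Sum();
    Console.WriteLine($"Example {i}: score = {score}, contributions = [{string.Join(", ", contributions[i])}]");
}
```

Unlike the global measures above, these values differ from example to example: the same feature can push one example's score up and another's down.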

## What is normalization and why do I need to care?

In ML.NET we expose a number of [parametric and non-parametric algorithms](https://machinelearningmastery.com/parametric-and-nonparametric-machine-learning-algorithms/).

@@ -791,6 +833,7 @@ var transformedData = pipeline.Fit(data).Transform(data);
var embeddings = transformedData.GetColumn<float[]>(mlContext, "Embeddings").Take(10).ToArray();
var unigrams = transformedData.GetColumn<float[]>(mlContext, "BagOfWords").Take(10).ToArray();
```

## How do I train using cross-validation?

[Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) is a useful technique for ML applications. It helps estimate the variance of the model quality from one run to another and also eliminates the need to extract a separate test set for evaluation.

@@ -841,6 +884,7 @@ var microAccuracies = cvResults.Select(r => r.Metrics.AccuracyMicro);
Console.WriteLine(microAccuracies.Average());

```

## Can I mix and match static and dynamic pipelines?

Yes, we can have both of them in our codebase. The static pipelines are just a statically-typed way to build dynamic pipelines.

Review comment: "feature" rather than "column of data". The end features in the model might not be exactly the input columns. #Resolved