Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for most learning tasks to PFI #1832

Merged
merged 3 commits into from
Dec 12, 2018

Conversation

rogancarr
Copy link
Contributor

This PR adds support to Permutation Feature Importance for Multiclass Classification, Ranking, and Clustering.

Fixes #1771

@Ivanidzo4ka Ivanidzo4ka self-requested a review December 7, 2018 18:11
@rogancarr rogancarr requested review from glebuk and removed request for TomFinley and Ivanidzo4ka December 7, 2018 18:34
Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

…d put array delta calculations into a helper function.
IPredictionTransformer<IPredictor> model,
IDataView data,
string label = DefaultColumnNames.Label,
string features = DefaultColumnNames.Features,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be a list per its description. Also, FFM is a case one can have multiple feature columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good point. Field-Aware Factorization Machines and at least one other internal learner have multiple feature columns, which is not supported by PFI. I'm going to add this as a separate fix, because it's a somewhat-orthogonal issue and might be a bit extensive.

Good find!

public void TestPfiMulticlassClassificationOnDenseFeatures()
{
var data = GetDenseDataset(TaskType.MulticlassClassification);
var model = ML.MulticlassClassification.Trainers.LogisticRegression().Fit(data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test using FFM with multiple feature columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See previous comment)

@artidoro
Copy link
Contributor

    /// <param name="features">Feature column names.</param>

As pointed out by Wei-Sheng, should be singular, or a list of strings.


Refers to: src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs:53 in 9a5d750. [](commit_id = 9a5d750, deletion_comment = False)

@artidoro
Copy link
Contributor

    /// <param name="useFeatureWeightFilter">Use features weight to pre-filter features.</param>

Same here.


Refers to: src/Microsoft.ML.Transforms/PermutationFeatureImportanceExtensions.cs:128 in 9a5d750. [](commit_id = 9a5d750, deletion_comment = False)

@artidoro artidoro self-requested a review December 12, 2018 19:53
/// y = 10x1 + 10x2vBuff + 30x3 + e.
/// Within xBuff feature 2nd slot will be sparse most of the time.
/// 2nd slot of xBuff has the least importance: Evaluation metrics do not change a lot when this slot is permuted.
/// x2 has the biggest importance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean x3?

@rogancarr rogancarr merged commit 4ed009b into dotnet:master Dec 12, 2018
@rogancarr rogancarr deleted the 1771_PFIForMostTasks branch December 12, 2018 20:33
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants