Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ShapRFECV min number of features to keep per feature group #75

Open
Matgrb opened this issue Feb 19, 2021 · 4 comments
Open

ShapRFECV min number of features to keep per feature group #75

Matgrb opened this issue Feb 19, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@Matgrb
Copy link
Collaborator

Matgrb commented Feb 19, 2021

Problem Description
There could be a grouping of features e.g. demographic, siocial etc. In that case you might want to end up with a combination of features from each group, at the end of the feature elimination process.

Desired Outcome
We can add two parameters:

  • feature_grouping - groups to which each feature belongs to
  • min_num_features_per_group - minimum number of features in each group at any point in the feature elimination process.

Thanks to this, even if one type of features is not predictive, at least a couple of them will be kept.
This way the model can be explained better to the business in terms of different characteristics of the sample.

@Matgrb Matgrb added the enhancement New feature or request label Feb 19, 2021
@timvink
Copy link
Collaborator

timvink commented Feb 19, 2021

The more parameters we add to a class, the more intimidating it becomes for a new user to learn it. I prefer a simple knife as a tool. My point:

image

If a user really wants this, they could write some code that runs ShapRFECV on the different groups, then combine the best n features from each group, then run it one more time on the combined featureset, and select your final featureset based on the results.

What I'm getting at is also called the Unix philosophy:

Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features".

So, even though I see the added value of the feature, I would advocate against it.

@Matgrb
Copy link
Collaborator Author

Matgrb commented Feb 19, 2021

I agree with the fact that this would complicate the API. That is why a way to implement it could be make a class that inherits from ShapRFECV e.g. GroupedShapRFECV. This way there would be a very specific use case for using the other class. The main difference between the two would be extra parameters in the init and different way of selecting which features to remove at a given iteration.

What do you think about this approach? Does the use case argument implementation of an extra class just for this puprose?

I had a similar idea of tackling the issue, of just running ShapRFECV separately for each group. The main drawback of that we don't take into consideration the predictive power of relations between features in different groups.

@timvink
Copy link
Collaborator

timvink commented Feb 19, 2021

A separate class for this makes sense. However, does the problem of having to keep at least one feature for every group occur often enough to outweigh the costs of having to maintain more code? (I'm trying to push for keeping things as simple as possible ;) )

@timvink
Copy link
Collaborator

timvink commented Feb 19, 2021

After offline discussion, I understand there are teams that could benefit from this use-case. Separating it from ShapRFECV, I think we can manage to keep the most common use-case as simple as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants