Add Recursive Feature Elimination Selector component #3934

Merged
thehomebrewnerd merged 19 commits into main from rfecv-selector
Jan 20, 2023

Conversation

thehomebrewnerd
Contributor

Add Recursive Feature Elimination Selector component

Adds a new feature selector component that uses scikit-learn's RFECV to perform the selection.

@thehomebrewnerd thehomebrewnerd self-assigned this Jan 17, 2023
@thehomebrewnerd thehomebrewnerd marked this pull request as draft January 17, 2023 20:23
@codecov

codecov bot commented Jan 17, 2023

Codecov Report

Merging #3934 (a571e12) into main (7bf795d) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3934     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        346     347      +1     
  Lines      36719   36755     +36     
=======================================
+ Hits       36589   36625     +36     
  Misses       130     130             
Impacted Files Coverage Δ
evalml/pipelines/components/__init__.py 100.0% <ø> (ø)
...alml/pipelines/components/transformers/__init__.py 100.0% <ø> (ø)
...eature_selection/rf_classifier_feature_selector.py 100.0% <ø> (ø)
...feature_selection/rf_regressor_feature_selector.py 100.0% <ø> (ø)
evalml/tests/component_tests/test_utils.py 99.1% <ø> (ø)
...ponents/transformers/feature_selection/__init__.py 100.0% <100.0%> (ø)
...election/recursive_feature_elimination_selector.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_components.py 99.0% <100.0%> (+0.1%) ⬆️
...ml/tests/component_tests/test_feature_selectors.py 100.0% <100.0%> (ø)

Comment on lines 17 to 23
hyperparameter_ranges = {
    "perc": Integer(0, 100),
}
"""{
    "percent_features": Real(0.01, 1),
    "threshold": ["mean", "median"],
}"""
Contributor

I think we'd want min_features_to_select and step here, but I'm not positive how this section is meant to be set up.

Contributor Author

Yeah, good catch. I overlooked that, but will take a closer look and update.

Collaborator

I think we should set step between 0 and 1 so that it eliminates a percentage of the features at each iteration (this generalizes more easily than an integer), but I don't think we need to tune min_features_to_select since RFECV is supposed to find the optimal number of features. I think a default of min_features_to_select=1 is fine unless there's a reason we want RFE to keep more.
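
For reference, a minimal sketch of how a fractional step plays out, assuming scikit-learn's documented rule that a step in (0, 1) is read as a fraction of the original feature count (rounded down), with at least one feature removed per iteration. The helper name `elimination_schedule` is hypothetical and is not part of this PR or of scikit-learn.

```python
def elimination_schedule(n_features, step, min_features_to_select=1):
    """Return the feature counts visited by recursive elimination.

    Hypothetical helper for illustration only. Assumes a fractional step
    is converted once, from the original feature count, into a fixed
    number of features to drop per iteration.
    """
    if 0.0 < step < 1.0:
        n_drop = int(max(1, step * n_features))
    else:
        n_drop = int(step)
    if n_drop <= 0:
        raise ValueError("step must be > 0")

    counts = [n_features]
    remaining = n_features
    while remaining > min_features_to_select:
        # Never drop below the configured floor.
        remaining = max(min_features_to_select, remaining - n_drop)
        counts.append(remaining)
    return counts


print(elimination_schedule(20, 0.2))   # [20, 16, 12, 8, 4, 1]
print(elimination_schedule(20, 0.05))  # drops one feature at a time
```

Under these assumptions, a larger fractional step means fewer candidate feature counts get cross-validated, which trades search resolution for runtime.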

Contributor Author

I updated the range for step to be between 0.05 and 0.25 and changed the default for min_features_to_select to 1.
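
A rough sketch of what the tuning configuration described in this comment might look like. The snippet under review declares ranges with search-space objects such as `Real` and `Integer`; here `Real` is a small stand-in class so the example runs without dependencies, and `DEFAULT_PARAMETERS` and `sample_parameters` are hypothetical names, not code from this PR.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Real:
    """Stand-in for a continuous search-space dimension (illustrative only)."""

    low: float
    high: float

    def sample(self, rng):
        return rng.uniform(self.low, self.high)


# Only `step` is tuned; RFECV is left to choose the final feature count.
hyperparameter_ranges = {"step": Real(0.05, 0.25)}

# Assumed default taken from the discussion; it is not part of the search space.
DEFAULT_PARAMETERS = {"min_features_to_select": 1}


def sample_parameters(seed=0):
    """Draw one candidate parameter set from the ranges (hypothetical helper)."""
    rng = random.Random(seed)
    params = dict(DEFAULT_PARAMETERS)
    for name, space in hyperparameter_ranges.items():
        params[name] = space.sample(rng)
    return params


params = sample_parameters()
print(params)
```

Keeping min_features_to_select out of the ranges reflects the point made earlier in the thread: the cross-validated search already decides how many features to keep.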

Collaborator

@jeremyliweishih jeremyliweishih left a comment

Just need to fix up the hyperparameter ranges, but LGTM otherwise!

"""Selects relevant features using recursive feature elimination."""

hyperparameter_ranges = {
"perc": Integer(0, 100),
Collaborator

perc isn't used anywhere, right? This should be updated with step and any other parameters we want to tune.

Defaults to None.
n_estimators (float): The number of trees in the forest. Defaults to 100.
max_depth (int): Maximum tree depth for base learners. Defaults to 6.
If both percent_features and number_features are specified, take the greater number of features. Defaults to None.
Collaborator

thanks for the fix 😄

Collaborator

@jeremyliweishih jeremyliweishih left a comment

LGTM

@thehomebrewnerd thehomebrewnerd merged commit 078712a into main Jan 20, 2023
@thehomebrewnerd thehomebrewnerd deleted the rfecv-selector branch January 20, 2023 15:11
@chukarsten chukarsten mentioned this pull request Jan 25, 2023