
step: removing 5 features in each iteration #664

Closed
mubashirtuf opened this issue Dec 3, 2018 · 4 comments · Fixed by #679
Assignees
Labels: priority: high (should be done before next release), type: bug (something isn't working), type: question (more information is required)

Comments

@mubashirtuf

Using RFECV, the graph shows the best features with an increment of one feature per iteration.
I want each iteration to work with an increment of 5 features.
Example: the score should be plotted against an x-axis of [5, 10, 15, 20, 25].
I used step=5, but it's not working.

@bbengfort added the labels type: question (more information is required), type: bug (something isn't working), and priority: high (should be done before next release) on Dec 29, 2018
@bbengfort
Member

@mubashirtuf I'm sorry it's taken so long for us to respond; things got busy at the end of the semester and during the holiday. I can take a look at this, but it would be helpful if you could provide the code you used to generate the error - on my quick first attempt I couldn't reproduce the issue. I see a place where the problem might be occurring, but I'm not totally sure. Would you also upload the figure that was produced?

@bbengfort bbengfort self-assigned this Dec 29, 2018
@bbengfort bbengfort added the review PR is open label Dec 29, 2018
@bbengfort
Member

@mubashirtuf ok, I figured out the issue!

Here is the example code I used to reproduce:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from yellowbrick.features import RFECV

X, y = make_classification(
    n_samples=200, n_features=30, n_informative=18, n_redundant=6,
    n_repeated=0, n_classes=8, n_clusters_per_class=1, random_state=0
)

oz = RFECV(LogisticRegression(), step=5).fit(X, y)
oz.poof()
```

This resulted in the following image:

[Image: example_step_bug_rfecv — RFECV curve plotted at every feature count, as if step=1 had been used]

Although step is passed to the internal RFE estimator, which eliminates that many features each iteration, the cross-validation loop was still using a step of 1. I fixed this, and the output now looks as follows:

[Image: example_stepped_rfecv — RFECV curve plotted at feature subsets incrementing by 5]

I've created a fix for this in #679 and it should be available in version 1.0 -- thank you for sending us the bug report!

bbengfort added a commit that referenced this issue Jan 11, 2019
The RFECV visualizer had a bug when the hyperparameter step > 1. The
step was correctly passed to the internal RFE estimator, which removed
that number of features per iteration; however, the feature subsets
tried during cross-validation did not match the step, resulting in a
figure that looked as if no step had been applied. This patch fixes
the bug and adds a test to ensure stepping works correctly.

To manage the feature selection subspace, a new learned attribute,
`n_feature_subsets_` was added.

Fixes #664
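For illustration, the corrected subset schedule could be computed along these lines (a minimal sketch; `feature_subset_sizes` is a hypothetical helper, not the actual Yellowbrick implementation of `n_feature_subsets_`):

```python
import numpy as np

def feature_subset_sizes(n_features, step):
    """Sizes of the feature subsets evaluated during cross-validation,
    mirroring an RFE schedule that removes `step` features per iteration."""
    # Subsets grow as 1, 1 + step, 1 + 2*step, ... up to n_features.
    sizes = np.arange(1, n_features + step, step)
    sizes[-1] = min(sizes[-1], n_features)  # clamp the last subset size
    return sizes

# With 30 features and step=5 this yields [1, 6, 11, 16, 21, 26, 30],
# so the curve is plotted at seven points instead of thirty.
print(feature_subset_sizes(30, 5))
```

Matching the CV schedule to the RFE schedule is what makes the plotted x-axis reflect the requested step.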
@bbengfort bbengfort removed the review PR is open label Jan 11, 2019
@bbengfort
Member

@mubashirtuf this is fixed in the develop version of Yellowbrick and will be available in Yellowbrick v1.0 -- in the meantime you can pip install git+https://github.com/DistrictDataLabs/yellowbrick to get the latest feature. Thank you again for contributing to yellowbrick!

@mubashirtuf
Author

Thanks for your help.
I worked around this problem with another technique; in the future, I will use this fix.
