
step: removing 5 features in each iteration #664

Closed
mubashirtuf opened this issue Dec 3, 2018 · 4 comments · Fixed by #679
Assignees
Labels: priority: high (should be done before next release), type: bug (something isn't working), type: question (more information is required)

Comments

@mubashirtuf

Using RFECV, the graph shows the best features with an increment of one feature per iteration.
I want each iteration to work with an increment of 5 features.
Example: the score should be plotted against an x-axis of [5, 10, 15, 20, 25].
I used step=5, but it's not working.

@bbengfort added the labels type: question (more information is required), type: bug (something isn't working), and priority: high (should be done before next release) on Dec 29, 2018
@bbengfort
Member

@mubashirtuf I'm sorry it's taken so long for us to respond; things got busy at the end of the semester and during the holiday. I can take a look at this, but it would be helpful if you could provide the code you used to generate the error - on my quick first attempt I couldn't reproduce the issue. I see a place where the problem might be occurring, but I'm not totally sure. Would you also upload the figure that was produced?

@bbengfort bbengfort self-assigned this Dec 29, 2018
@bbengfort bbengfort added the review PR is open label Dec 29, 2018
@bbengfort
Member

@mubashirtuf ok, I figured out the issue!

Here is the example code I used to reproduce:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from yellowbrick.features import RFECV

X, y = make_classification(
    n_samples=200, n_features=30, n_informative=18, n_redundant=6,
    n_repeated=0, n_classes=8, n_clusters_per_class=1, random_state=0
)

oz = RFECV(LogisticRegression(), step=5).fit(X, y)
oz.poof()
```

This resulted in the following image:

[Image: example_step_bug_rfecv — RFECV curve plotted at every feature count, as if step=1 had been used]

Although step is passed to the internal RFE estimator, which eliminates that many features each iteration, the cross-validation loop was still using a step of 1. I fixed this, and the output now looks as follows:

[Image: example_stepped_rfecv — RFECV curve plotted at feature subsets incrementing by 5]

I've created a fix for this in #679 and it should be available in version 1.0 -- thank you for sending us the bug report!

bbengfort added a commit that referenced this issue Jan 11, 2019
The RFECV visualizer had a bug when the hyperparameter step > 1. The
step was correctly passed to the internal RFE estimator, which removed
that number of features per iteration; however, the feature subsets
tried during cross-validation did not match the step, resulting in a
figure that looked as if no step had been applied. This patch fixes
the bug and adds a test to ensure stepping works correctly.

To manage the feature selection subspace, a new learned attribute,
`n_feature_subsets_` was added.

Fixes #664
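For illustration, the corrected subset schedule could be computed along these lines (a minimal sketch; `feature_subset_sizes` is a hypothetical helper, not the actual Yellowbrick implementation of `n_feature_subsets_`):

```python
import numpy as np

def feature_subset_sizes(n_features, step):
    """Sizes of the feature subsets evaluated during cross-validation,
    mirroring an RFE schedule that removes `step` features per iteration."""
    # Subsets grow as 1, 1 + step, 1 + 2*step, ... up to n_features.
    sizes = np.arange(1, n_features + step, step)
    sizes[-1] = min(sizes[-1], n_features)  # clamp the last subset size
    return sizes

# With 30 features and step=5 this yields [1, 6, 11, 16, 21, 26, 30],
# so the curve is plotted at seven points instead of thirty.
print(feature_subset_sizes(30, 5))
```

Matching the CV schedule to the RFE schedule is what makes the plotted x-axis reflect the requested step.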
@bbengfort bbengfort removed the review PR is open label Jan 11, 2019
@bbengfort
Member

@mubashirtuf this is fixed in the develop version of Yellowbrick and will be available in Yellowbrick v1.0 -- in the meantime you can pip install git+https://github.com/DistrictDataLabs/yellowbrick to get the latest feature. Thank you again for contributing to yellowbrick!

@mubashirtuf
Author

Thanks for your help.
I worked around this problem with another technique; in the future, I will use this fix.
