
Remove Feature Selection #819

Merged
merged 8 commits into master from 808_remove_feature_selection on May 28, 2020

Conversation

@eccabay (Contributor) commented May 28, 2020

Closes #808

@eccabay requested a review from @dsherry May 28, 2020
@@ -150,7 +150,7 @@ def test_plot_iterations_max_time(X_y):
     go = pytest.importorskip('plotly.graph_objects', reason='Skipping plotting test because plotly not installed')
     X, y = X_y

-    automl = AutoRegressionSearch(max_time=10)
+    automl = AutoRegressionSearch(max_time=10, random_state=1)
@dsherry (Collaborator) commented May 28, 2020

@eccabay was this the test you mentioned, which was taking ~10min to run before setting random_state=1?

@dsherry (Collaborator) commented May 28, 2020

I would be shocked if the reason this test was taking so long were related to the removal of feature selection from the pipelines. Seems OK to me. Plus, we'll be addressing these sorts of issues in the perf tests.

@codecov (bot) commented May 28, 2020

Codecov Report

Merging #819 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #819   +/-   ##
=======================================
  Coverage   99.52%   99.52%           
=======================================
  Files         159      160    +1     
  Lines        6306     6333   +27     
=======================================
+ Hits         6276     6303   +27     
  Misses         30       30           
Impacted Files Coverage Δ
...l/pipelines/classification/random_forest_binary.py 100.00% <100.00%> (ø)
...pelines/classification/random_forest_multiclass.py 100.00% <100.00%> (ø)
evalml/pipelines/classification/xgboost_binary.py 100.00% <100.00%> (ø)
...lml/pipelines/classification/xgboost_multiclass.py 100.00% <100.00%> (ø)
evalml/pipelines/regression/random_forest.py 100.00% <100.00%> (ø)
evalml/pipelines/regression/xgboost_regression.py 100.00% <100.00%> (ø)
.../tests/automl_tests/test_auto_regression_search.py 100.00% <100.00%> (ø)
evalml/tests/automl_tests/test_autobase.py 100.00% <100.00%> (ø)
...ml/tests/component_tests/test_feature_selectors.py 100.00% <100.00%> (ø)
...ine_tests/classification_pipeline_tests/test_rf.py 100.00% <100.00%> (ø)
... and 4 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fdae537...97e955d.

@dsherry (Collaborator) left a comment

@eccabay LGTM! I don't have anything to add. Thanks for getting this up so quickly.

proposed_parameters = start_iteration_callback.call_args[0][1]
assert proposed_parameters.keys() == {'RF Classifier Select From Model', 'Logistic Regression Classifier'}
assert proposed_parameters['RF Classifier Select From Model']['number_features'] == X.shape[1]
assert proposed_parameters['RF Classifier Select From Model']['n_jobs'] == -1
@dsherry (Collaborator) commented May 28, 2020

@eccabay here's the new test I just pushed. Now that we no longer define pipelines which use feature selectors, but feature selectors still need special treatment in AutoSearchBase for number_features and n_jobs, this test covers the use of a feature selector in a pipeline run through automl (a rough sketch of the approach follows the steps below).

Steps

  • Mock pipelines' fit to a no-op and score to a constant value, to save runtime
  • Define a custom pipeline which uses a feature selector. Mock get_pipelines util method to return only that, meaning it'll be used in automl
  • Set the start_iteration_callback to a MagicMock so we can see the call args
  • Run automl and check that number_features and n_jobs have been called with the expected values
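
To make the steps concrete, here is a rough sketch of what such a test could look like. This is not the exact code from this PR: the class names, patch targets, and module paths (AutoClassificationSearch, evalml.pipelines.PipelineBase, evalml.automl.auto_search_base.get_pipelines), the score return format, and the max_pipelines argument are assumptions about the evalml API at the time; X_y stands in for the repo's existing test fixture.

from unittest.mock import MagicMock, patch

from evalml import AutoClassificationSearch
from evalml.pipelines import PipelineBase


class FeatureSelectionPipeline(PipelineBase):
    # Custom pipeline that includes a feature selector component.
    # (Other class attributes the real PipelineBase may require are omitted here.)
    component_graph = ['RF Classifier Select From Model', 'Logistic Regression Classifier']


# Decorators apply bottom-up, so the mock arguments arrive as (get_pipelines, score, fit).
@patch('evalml.pipelines.PipelineBase.fit')  # fit becomes a no-op to save runtime
@patch('evalml.pipelines.PipelineBase.score', return_value={'Log Loss Binary': 0.5})  # constant score
@patch('evalml.automl.auto_search_base.get_pipelines', return_value=[FeatureSelectionPipeline])
def test_automl_uses_feature_selector(mock_get_pipelines, mock_score, mock_fit, X_y):
    X, y = X_y

    # MagicMock records every call, so we can inspect the parameters automl proposes.
    start_iteration_callback = MagicMock()
    automl = AutoClassificationSearch(max_pipelines=2,
                                      start_iteration_callback=start_iteration_callback)
    automl.search(X, y)

    # AutoSearchBase should fill in number_features and n_jobs for the feature selector.
    proposed_parameters = start_iteration_callback.call_args[0][1]
    assert proposed_parameters['RF Classifier Select From Model']['number_features'] == X.shape[1]
    assert proposed_parameters['RF Classifier Select From Model']['n_jobs'] == -1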

@eccabay merged commit 5ff1399 into master May 28, 2020
2 checks passed
@dsherry deleted the 808_remove_feature_selection branch Oct 29, 2020