Update add_results in IterativeAlgorithm to take in entire pipeline results dict #1891

Merged
merged 8 commits into main from 1873_dict on Feb 25, 2021

Conversation

angela97lin (Contributor):

Closes #1873
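
For context, a minimal sketch of the change described in the title: add_results on IterativeAlgorithm now receives the full pipeline results dict instead of individual fields. The parameter names, the results-dict shape, and the per-model-family bookkeeping below are illustrative assumptions, not evalml's exact API.

from collections import namedtuple

# Stand-in for a real evalml pipeline; only model_family is needed here.
MockPipeline = namedtuple("MockPipeline", ["model_family"])

class IterativeAlgorithm:
    def __init__(self):
        # Maps model family -> info about the best pipeline seen so far.
        self._best_pipeline_info = {}

    def add_results(self, score_to_minimize, pipeline, trained_pipeline_results):
        # The full results dict exposes fields beyond the score, e.g. the
        # AutoML-assigned pipeline id that ensembling will need later.
        pipeline_id = trained_pipeline_results["id"]
        best = self._best_pipeline_info.get(pipeline.model_family)
        if best is None or score_to_minimize < best["score"]:
            self._best_pipeline_info[pipeline.model_family] = {
                "id": pipeline_id,
                "score": score_to_minimize,
                "pipeline": pipeline,
            }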

@angela97lin angela97lin self-assigned this Feb 24, 2021

codecov bot commented Feb 24, 2021

Codecov Report

Merging #1891 (2640d5e) into main (f23f16b) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1891     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         263      263             
  Lines       21285    21300     +15     
=========================================
+ Hits        21279    21294     +15     
  Misses          6        6             
Impacted Files Coverage Δ
evalml/automl/automl_algorithm/automl_algorithm.py 100.0% <100.0%> (ø)
...lml/automl/automl_algorithm/iterative_algorithm.py 100.0% <100.0%> (ø)
evalml/automl/automl_search.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl_algorithm.py 100.0% <100.0%> (ø)
...lml/tests/automl_tests/test_iterative_algorithm.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -346,3 +350,25 @@ def test_iterative_algorithm_pipeline_params_kwargs(dummy_binary_pipeline_classe

next_batch = algo.next_batch()
assert all([p.parameters['Mock Classifier'] == {"dummy_parameter": "dummy", "n_jobs": -1, "fake_param": "fake"} for p in next_batch])


def test_iterative_algorithm_results_best_pipeline_info_id(dummy_binary_pipeline_classes, logistic_regression_binary_pipeline_class):
angela97lin (Contributor, Author):

Blargh, I wonder if there's a better way to test this. Or whether we need to at all, since this is just a private field for now anyways.
But currently: create an IterativeAlgorithm with pipelines from two model families, then add results. Since only one round of results has been added, we know what's in _best_pipeline_info has to be those two pipelines (with IDs 0 and 1). :)
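
(For illustration, a sketch of the test shape described above, built on the hypothetical sketch class from the PR description; evalml's real fixtures and assertions differ.)

def test_best_pipeline_info_ids_first_batch():
    algo = IterativeAlgorithm()
    pipelines = [MockPipeline("linear_model"), MockPipeline("random_forest")]
    # One round of results: each pipeline is the best seen for its family.
    for _id, pipeline in enumerate(pipelines):
        algo.add_results(0.5, pipeline, {"id": _id})
    assert {info["id"] for info in algo._best_pipeline_info.values()} == {0, 1}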

jeremyliweishih (Contributor):

If these IDs are to be used for ensembling, how about we add coverage for this in the ensembling unit tests (make sure our ensembler keeps track of these IDs)? I'm not in the loop on whether that has been implemented or not, so if we already have that feature we can add the test there; if not, we can flag it so we remember to do so when we add the feature.

chukarsten (Contributor):

I think it's fine to leave this in for now, if you want, as a quick sanity check. The only thing I would question is whether you need all the other assertions. It seems like they just clutter what you're trying to test, which is that the IDs get assigned properly to _best_pipeline_info. I'd think there are other tests to make sure, for example, that next_batch increments the batch_number à la L367. Just my two cents though; don't bother if you like it better as is.

angela97lin (Contributor, Author):

@jeremyliweishih @chukarsten Thank you for the suggestions and thoughts!

The IDs are for ensembling, but our ensembler won't be keeping track of them--we decided that since IDs are a concept belonging to AutoML rather than to an individual pipeline, it's the IterativeAlgorithm class whose updates we need to test. This test was a little meh since it only covered the first batch. I updated the test so that the score is always decreasing, so we always try to update _best_pipeline_info, and asserted that the ID is indeed always updating too.

Thank you for the suggestion to remove the other asserts--they were definitely copy-pasta asserts that we've just kept in each test, but for the sake of this test I think it looks much clearer without them!
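
(Again for illustration, a sketch of the updated test on the hypothetical class above: the score strictly decreases, so every round should replace _best_pipeline_info and carry the newest ID.)

def test_best_pipeline_info_id_always_updates():
    algo = IterativeAlgorithm()
    pipeline = MockPipeline("linear_model")
    for _id, score in enumerate([0.9, 0.5, 0.1]):
        algo.add_results(score, pipeline, {"id": _id})
        # Strictly decreasing scores: the newest result is always the best.
        assert algo._best_pipeline_info["linear_model"]["id"] == _id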

jeremyliweishih (Contributor) left a comment:

Looks good from my end. IMO we should add coverage for the pipeline IDs in their intended use case (ensembling), especially since it's a private field.

chukarsten (Contributor) left a comment:

Nice, quick PR. Love it.

ParthivNaresh (Contributor) left a comment:

Looks great!

@@ -3,7 +3,9 @@ Release Notes
**Future Releases**
* Enhancements
* Fixes
* Added metaclass for time series pipelines and fix binary classification pipeline ``predict`` not using objective if it is passed as a named argument :pr:`1874`
angela97lin (Contributor, Author):

Moving, oops hehe. Thanks @freddyaboulton!

@angela97lin angela97lin merged commit 7b400c8 into main Feb 25, 2021
@angela97lin angela97lin deleted the 1873_dict branch February 25, 2021 19:31
@angela97lin angela97lin added this to the Sprint 2021 Feb B milestone Mar 4, 2021
@dsherry dsherry mentioned this pull request Mar 11, 2021
Successfully merging this pull request may close these issues.

Update add_results in IterativeAlgorithm to take in entire pipeline results dict rather than just pipeline