Update add_results in IterativeAlgorithm to take in entire pipeline results dict
#1891
Conversation
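For context on the change under review, here is a minimal sketch of the shape of the update: add_result receiving the whole per-pipeline results dict instead of just a score and a pipeline, so the algorithm can record the automl-assigned id in _best_pipeline_info. The method and parameter names (add_result, trained_pipeline_results) and the "id" key are assumptions pieced together from the discussion below, not the exact EvalML API.

# Hedged sketch, not the actual EvalML implementation: add_result now
# receives the entire results dict for an evaluated pipeline, so the
# algorithm can record the automl-assigned id per model family.
class IterativeAlgorithm:
    def __init__(self, allowed_pipelines=None):
        self.allowed_pipelines = allowed_pipelines or []
        # model family -> info about the best pipeline seen for that family
        self._best_pipeline_info = {}

    def add_result(self, score_to_minimize, pipeline, trained_pipeline_results):
        # Previously only a score and a pipeline were available; the full
        # results dict exposes extra fields such as the pipeline id.
        best = self._best_pipeline_info.get(pipeline.model_family)
        if best is None or score_to_minimize < best["mean_cv_score"]:
            self._best_pipeline_info[pipeline.model_family] = {
                "mean_cv_score": score_to_minimize,
                "pipeline_class": type(pipeline),
                "parameters": pipeline.parameters,
                "id": trained_pipeline_results["id"],  # assumed key
            }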
Codecov Report
@@            Coverage Diff            @@
##              main    #1891    +/-   ##
=========================================
+ Coverage    100.0%   100.0%    +0.1%
=========================================
  Files          263      263
  Lines        21285    21300      +15
=========================================
+ Hits         21279    21294      +15
  Misses           6        6
Continue to review full report at Codecov.
@@ -346,3 +350,25 @@ def test_iterative_algorithm_pipeline_params_kwargs(dummy_binary_pipeline_classes

    next_batch = algo.next_batch()
    assert all([p.parameters['Mock Classifier'] == {"dummy_parameter": "dummy", "n_jobs": -1, "fake_param": "fake"} for p in next_batch])


def test_iterative_algorithm_results_best_pipeline_info_id(dummy_binary_pipeline_classes, logistic_regression_binary_pipeline_class):
Blargh, I wonder if there's a better way to test this, or whether we need to at all, since this is just a private field for now anyways.
But currently: create an IterativeAlgorithm with pipelines from two model families, then add results. Since only one round of results has been added, we know _best_pipeline_info has to contain exactly those two pipelines (with ids 0 and 1). :)
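Concretely, the test shape described here is roughly the following sketch (the fixture name comes from the diff above; the add_result signature and the results-dict shape are assumptions):

def test_iterative_algorithm_results_best_pipeline_info_id(dummy_binary_pipeline_classes):
    # Two allowed pipelines, one per model family.
    algo = IterativeAlgorithm(allowed_pipelines=dummy_binary_pipeline_classes)
    next_batch = algo.next_batch()
    for pipeline_id, pipeline in enumerate(next_batch):
        # Only the "id" key matters here; the rest of the dict shape is assumed.
        algo.add_result(0.5, pipeline, {"id": pipeline_id})
    # After a single round of results, _best_pipeline_info has to hold exactly
    # the two evaluated pipelines, with ids 0 and 1.
    assert {info["id"] for info in algo._best_pipeline_info.values()} == {0, 1}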
If these IDs are to be used for ensembling, how about we add coverage for this in the ensembling unit tests (make sure our ensembler keeps track of these IDs)? I'm not in the loop on whether that has been implemented yet: if we already have that feature, we can add the test there; if not, we can flag it so we remember to do so when we add that feature.
I think it's fine to leave this in for now, if you want, as a quick sanity check. The only thing I would question is: do you need all the other assertions? It seems like they just clutter what you're trying to test, which is that the ids get assigned properly to _best_pipeline_info. I'd think there are other tests to make sure, for example, that next_batch increments the batch_number, a la L367. Just my two cents though; don't bother if you like it better as is.
@jeremyliweishih @chukarsten Thank you for the suggestions and thoughts!
The IDs are for ensembling, but our ensembler won't be keeping track of them--we decided that ids are more of a concept for automl than for an individual pipeline--so we need to test that the IterativeAlgorithm class is updating them accordingly. This test was a little meh since it only checked the first batch. I updated the test so that the score is always decreasing, so we always try to update _best_pipeline_info, and asserted that the id is indeed updating every time too.
Thank you for the suggestion to remove the other asserts--they were definitely copy-pasta asserts that we've just kept in each test, but for the sake of this test I think it looks much clearer without them!
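The updated test, per the description above, might look something like this (again a sketch; the signature and dict shape are assumed). The score strictly decreases, so every result has to become the new best for its model family, and the stored id should follow:

def test_iterative_algorithm_results_best_pipeline_info_id_updates(dummy_binary_pipeline_classes):
    algo = IterativeAlgorithm(allowed_pipelines=dummy_binary_pipeline_classes)
    pipeline_id = 0
    score = 100.0
    for _ in range(3):  # a few batches, not just the first
        for pipeline in algo.next_batch():
            algo.add_result(score, pipeline, {"id": pipeline_id})
            # A strictly decreasing score forces _best_pipeline_info to be
            # rewritten on every result, so the id must track the latest one.
            assert algo._best_pipeline_info[pipeline.model_family]["id"] == pipeline_id
            pipeline_id += 1
            score -= 1.0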
Looks good from my end. IMO we should add coverage for the pipeline IDs in their intended use case (ensembling), especially since it's a private field.
Nice, quick PR. Love it.
Looks great!
@@ -3,7 +3,9 @@ Release Notes
**Future Releases**
    * Enhancements
    * Fixes
        * Added metaclass for time series pipelines and fix binary classification pipeline ``predict`` not using objective if it is passed as a named argument :pr:`1874`
Moving, oops hehe. Thanks @freddyaboulton!
Closes #1873