Replacing tqdm in AutoMLSearch with a logger #921
Conversation
Codecov Report
@@            Coverage Diff            @@
##             main     #921     +/-  ##
=========================================
  Coverage   99.83%   99.83%
=========================================
  Files         168      168
  Lines        8348     8366    +18
=========================================
+ Hits         8334     8352    +18
  Misses         14       14
Continue to review full report at Codecov.
@@ -265,6 +265,8 @@ def _dummy_callback(param1, param2):
     }
+    automl = AutoMLSearch(**search_params)
+    mock_score.return_value = {automl.objective.name: 1.0}
If we do not specify return values, the logger will try to use the MagicMock implementations of `__format__` and `__round__`, which are not always supported depending on which version of Python 3 you are running.
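A minimal, self-contained sketch of that failure mode (illustrative only, not code from this PR):

```python
from unittest.mock import MagicMock

mock_score = MagicMock()  # no return_value configured
score = mock_score()      # calling the mock yields another MagicMock

# An f-string with a format spec calls score.__format__(".3f"). Depending
# on the Python 3 version, MagicMock may not support format specs, so this
# can raise a TypeError instead of printing a number.
try:
    print(f"score: {score:.3f}")
except TypeError as err:
    print(f"formatting a MagicMock failed: {err}")

# Configuring an explicit numeric return value avoids the problem:
mock_score.return_value = 1.0
print(f"score: {mock_score():.3f}")  # score: 1.000
```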
Yikes. How was this working before?
I think the score was previously not being used in the tests I modified, so the return type of MagicMock went unnoticed. But now we always log the scores.
@@ -18,7 +18,7 @@ def get_logger(name):
     logger.addHandler(stdout_handler)
     logger.addHandler(log_handler)
-    logger.setLevel(logging.INFO)
+    logger.setLevel(logging.DEBUG)
This is so that debug-level messages actually get written to the log file.
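A minimal sketch of the pattern (the file path and handler construction here are assumptions, not evalml's exact implementation): the logger's own level gates records before any handler sees them, so it must sit at DEBUG even though the console handler stays at INFO.

```python
import logging

def get_logger(name):
    logger = logging.getLogger(name)

    # Console output stays at INFO so users aren't flooded with debug noise.
    stdout_handler = logging.StreamHandler()
    stdout_handler.setLevel(logging.INFO)

    # The file handler captures everything, including per-fold debug scores.
    log_handler = logging.FileHandler("evalml_debug.log")  # assumed path
    log_handler.setLevel(logging.DEBUG)

    logger.addHandler(stdout_handler)
    logger.addHandler(log_handler)

    # If this stayed at INFO, debug records would be dropped before they
    # ever reached the file handler, regardless of the handler's own level.
    logger.setLevel(logging.DEBUG)
    return logger
```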
Ah! Thanks.
evalml/automl/progress_monitor.py
Outdated
@property
def time_elapsed(self):
    """How much time has elapsed since the search started."""
    return tqdm.std.tqdm.format_interval(time.time() - self.start_time)
I think this is the only place we use tqdm now. If we want to remove it from `core-requirements.txt`, we can recreate the implementation of `format_interval` in the `ProgressMonitor`.
Ok. As long as you're sure leaving this in won't cause more console logging bugs, I'm on board with this. It shouldn't be too hard to write our own elapsed time str formatting logic should we choose to do so, right?
I don't think this will cause any more logging bugs. In the event we don't want to install tqdm anymore, implementing our own formatter should be easy.
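If tqdm were ever dropped from `core-requirements.txt`, a drop-in replacement could look like this sketch, which mirrors tqdm's `[H:]MM:SS` output:

```python
def format_interval(seconds):
    """Format elapsed seconds as [H:]MM:SS, like tqdm's format_interval."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours:d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"

print(format_interval(75))    # 01:15
print(format_interval(3725))  # 1:02:05
```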
I like the new look!! I left one comment about whether or not we need a class for this. Will approve once we resolve that.
docs/source/changelog.rst
Outdated
@@ -12,6 +12,7 @@ Changelog
     * Moved `list_model_families` to `evalml.model_family.utils` :pr:`903`
     * Updated `all_pipelines`, `all_estimators`, `all_components` to use the same mechanism for dynamically generating their elements :pr:`898`
     * Rename `master` branch to `main` :pr:`918`
+    * Replace tqdm progress bar with logger in AutoMLSearch :pr:`921`
Awesome. Can you also mention that we updated the search output? Something like "Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar"
evalml/automl/progress_monitor.py
Outdated
    self.current_iteration += 1
else:
    self.logger.info(self.output_format.format(pipeline_name=pipeline_name,
                                               time_elapsed=self.time_elapsed))
Suggestion
format_params = {'pipeline_name': pipeline_name, 'time_elapsed': self.time_elapsed}
if self.max_pipelines:
    format_params.update({'max_pipelines': self.max_pipelines, 'current_iteration': self.current_iteration})
self.logger.info(self.output_format.format(**format_params))
self.current_iteration += 1
evalml/automl/progress_monitor.py
Outdated
"""How much time has elapsed since the search started.""" | ||
return tqdm.std.tqdm.format_interval(time.time() - self.start_time) | ||
|
||
def update(self, pipeline_name): |
Maybe `update_pipeline`?
evalml/automl/automl_search.py
Outdated
  if len(desc) > self._MAX_NAME_LEN:
      desc = desc[:self._MAX_NAME_LEN - 3] + "..."
  desc = desc.ljust(self._MAX_NAME_LEN)
- pbar.set_description_str(desc=desc, refresh=True)
+ progress_monitor.update(desc)
@freddyaboulton I'm not convinced we need the complexity of adding a separate progress monitor class in order to do this. We're only calling it in two places and there's not much state we need to hold. Do you think we could convert `time_elapsed` and `update` to util methods for now, either in `evalml/utils/logger.py` or `evalml/utils/gen_utils.py`?
@dsherry This is good for a second review now! I deleted ProgressMonitor and added its methods to logger/utils.py
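Based on the call sites visible in this diff, the resulting util functions might look roughly like this sketch (the exact message format is an assumption):

```python
import time
import tqdm

def time_elapsed(start_time):
    """Human-readable time since start_time (a time.time() timestamp)."""
    return tqdm.std.tqdm.format_interval(time.time() - start_time)

def update_pipeline(logger, pipeline_name, current_iteration, max_pipelines, start_time):
    """Log one line of search progress at info level."""
    elapsed = time_elapsed(start_time)
    if max_pipelines:
        # Show "(n/max)" progress when the total number of pipelines is known.
        logger.info(f"({current_iteration}/{max_pipelines}) {pipeline_name} Elapsed: {elapsed}")
    else:
        logger.info(f"{pipeline_name} Elapsed: {elapsed}")
```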
evalml/automl/automl_search.py
Outdated
  evaluation_results = self._evaluate(pipeline, X, y, raise_errors=raise_errors, pbar=pbar)
+ update_pipeline(logger, desc, self._current_iteration, self.max_pipelines, self._start)
+ self._current_iteration += 1
This is the pipeline number? Can you use `len(self._results['pipeline_results'])` to get this info instead of adding another variable?
Thanks for the tip!
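A toy illustration of the tip: the iteration number falls out of the size of the results dict, so no separate counter is needed (the result payload here is a stand-in):

```python
results = {'pipeline_results': {}}

for pipeline_id in range(3):
    results['pipeline_results'][pipeline_id] = {'score': 0.99}  # stand-in result
    # Derive the pipeline number from the dict instead of incrementing a counter.
    print(f"iteration {len(results['pipeline_results'])}")  # 1, 2, 3
```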
best_pipeline = self.rankings.iloc[0]
best_pipeline_name = best_pipeline["pipeline_name"]
logger.info(f"Best pipeline: {best_pipeline_name}")
logger.info(f"Best pipeline {self.objective.name}: {best_pipeline['score']:.3f}")
Would be great to log out params too. Can add that later
Agreed!
scores = cv_pipeline.score(X_test, y_test, objectives=objectives_to_score)
logger.debug(f"\t\t\tFold {i}: {self.objective.name} score: {scores[self.objective.name]:.3f}")
These logs are great!
    scores = cv_pipeline.score(X_test, y_test, objectives=objectives_to_score)
    logger.debug(f"\t\t\tFold {i}: {self.objective.name} score: {scores[self.objective.name]:.3f}")
    score = scores[self.objective.name]
except Exception as e:
    logger.error("Exception during automl search: {}".format(str(e)))
    if raise_errors:
        raise e
We're handling this separately in #813, yes?
Yes!
evalml/utils/logger.py
Outdated
    start_time (int): Start time.

Returns:
    None - logs progress to logger at info level.
I see we're not surfacing these in the API docs. I think that's fine.
Nit-pick: not sure our docstring format works with "`-`" instead of "`:`".
Thanks @freddyaboulton, this rocks! I left a few comments, including one about removing the `self._current_pipeline` var, which would be nice to get to before merging.
Pull Request Description
Fixes #761 (which also fixes #629, fixes #245, fixes #609, and fixes #693). This PR replaces tqdm with the evalml logger, displays the mean primary objective score for each pipeline, and adds the score on each cv fold to the debug log.
Demo
This is what the log would look like:

After creating the pull request: in order to pass the changelog_updated check, you will need to update the "Future Release" section of `docs/source/changelog.rst` to include this pull request by adding `:pr:123`.