Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing tqdm in AutoMLSearch with a logger #921

Merged
merged 7 commits into from Jul 10, 2020

Conversation

freddyaboulton
Copy link
Contributor

@freddyaboulton freddyaboulton commented Jul 9, 2020

Pull Request Description

Fixes #761 (which also fixes #629 fixes #245 fixes #609 fixes #693). This PR replaces tqdm with the evalml logger, displays the mean primary objective score for each pipeline, and adds the score on each cv fold to the debug log.

Demo

logging_demo_with_progress

This is what the log would look like:
image


After creating the pull request: in order to pass the changelog_updated check you will need to update the "Future Release" section of docs/source/changelog.rst to include this pull request by adding :pr:123.

@freddyaboulton freddyaboulton self-assigned this Jul 9, 2020
@codecov
Copy link

codecov bot commented Jul 9, 2020

Codecov Report

Merging #921 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #921   +/-   ##
=======================================
  Coverage   99.83%   99.83%           
=======================================
  Files         168      168           
  Lines        8348     8366   +18     
=======================================
+ Hits         8334     8352   +18     
  Misses         14       14           
Impacted Files Coverage Δ
evalml/automl/automl_search.py 98.78% <100.00%> (-0.01%) ⬇️
evalml/tests/automl_tests/test_automl.py 100.00% <100.00%> (ø)
.../automl_tests/test_automl_search_classification.py 100.00% <100.00%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.00% <100.00%> (ø)
evalml/utils/logger.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f49eb6...b5ccaea. Read the comment docs.

@freddyaboulton freddyaboulton marked this pull request as ready for review Jul 9, 2020
@@ -265,6 +265,8 @@ def _dummy_callback(param1, param2):
}

automl = AutoMLSearch(**search_params)
mock_score.return_value = {automl.objective.name: 1.0}
Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not specify return values, the logger will try to use the MagicMock implementations of __format__ and __round__ which are not always supported depending on which version of python 3 you are running.

Copy link
Collaborator

@dsherry dsherry Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes. How was this working before?

Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the score was previously not being used in the tests I modified - so the return type of MagicMock went unnoticed. But now we always log the scores.

@@ -18,7 +18,7 @@ def get_logger(name):

logger.addHandler(stdout_handler)
logger.addHandler(log_handler)
logger.setLevel(logging.INFO)
logger.setLevel(logging.DEBUG)
Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that debug-level messages actually get written to the log file.

Copy link
Collaborator

@dsherry dsherry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Thanks.

@property
def time_elapsed(self):
"""How much time has elapsed since the search started."""
return tqdm.std.tqdm.format_interval(time.time() - self.start_time)
Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only place we use tqdm now. If we want to remove it from core-requirements.txt, we can recreate the implementation of format_interval in the ProgressMonitor.

Copy link
Collaborator

@dsherry dsherry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. As long as you're sure leaving this in won't cause more console logging bugs, I'm on board with this. It shouldn't be too hard to write our own elapsed time str formatting logic should we choose to do so, right?

Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will cause any more logging bugs. In the event we don't want to install tqdm anymore, implementing our own formatter should be easy.

Copy link
Collaborator

@dsherry dsherry left a comment

I like the new look!! I left one comment about whether or not we need a class for this. Will approve once we resolve that.

@@ -18,7 +18,7 @@ def get_logger(name):

logger.addHandler(stdout_handler)
logger.addHandler(log_handler)
logger.setLevel(logging.INFO)
logger.setLevel(logging.DEBUG)
Copy link
Collaborator

@dsherry dsherry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Thanks.

@@ -12,6 +12,7 @@ Changelog
* Moved `list_model_families` to `evalml.model_family.utils` :pr:`903`
* Updated `all_pipelines`, `all_estimators`, `all_components` to use the same mechanism for dynamically generating their elements :pr:`898`
* Rename `master` branch to `main` :pr:`918`
* Replace tqdm progress bar with logger in AutoMLSearch :pr:`921`
Copy link
Collaborator

@dsherry dsherry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Can you also mention that we updated the search output? Something like "Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar"

@property
def time_elapsed(self):
"""How much time has elapsed since the search started."""
return tqdm.std.tqdm.format_interval(time.time() - self.start_time)
Copy link
Collaborator

@dsherry dsherry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. As long as you're sure leaving this in won't cause more console logging bugs, I'm on board with this. It shouldn't be too hard to write our own elapsed time str formatting logic should we choose to do so, right?

self.current_iteration += 1
else:
self.logger.info(self.output_format.format(pipeline_name=pipeline_name,
time_elapsed=self.time_elapsed))
Copy link
Collaborator

@dsherry dsherry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion

format_params = {'pipeline_name': pipeline_name, 'time_elapsed': self.time_elapsed}
if self.max_pipelines:
    format_params.update({'max_pipelines': self.max_pipelines, 'current_iteration': self.current_iteration})
self.logger.info(self.output_format.format(**format_params))
self.current_iteration += 1

"""How much time has elapsed since the search started."""
return tqdm.std.tqdm.format_interval(time.time() - self.start_time)

def update(self, pipeline_name):
Copy link
Collaborator

@dsherry dsherry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe update_pipeline?

if len(desc) > self._MAX_NAME_LEN:
desc = desc[:self._MAX_NAME_LEN - 3] + "..."
desc = desc.ljust(self._MAX_NAME_LEN)
pbar.set_description_str(desc=desc, refresh=True)
progress_monitor.update(desc)
Copy link
Collaborator

@dsherry dsherry Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton I'm not convinced we need the complexity of adding a separate progress monitor class in order to do this. We're only calling it in two places and there's not much state we need to hold. Do you think we could convert the time_elapsed and update to util methods for now, either in evalml/utils/logger.py or evalml/utils/gen_utils.py?

@freddyaboulton
Copy link
Contributor Author

freddyaboulton commented Jul 10, 2020

@dsherry This is good for a second review now! I deleted ProgressMonitor and added its methods to logger/utils.py

@freddyaboulton freddyaboulton requested a review from dsherry Jul 10, 2020
@freddyaboulton freddyaboulton force-pushed the 761-replace-tqdm-with-logger branch from f7e7192 to 55bd78c Compare Jul 10, 2020

evaluation_results = self._evaluate(pipeline, X, y, raise_errors=raise_errors, pbar=pbar)
update_pipeline(logger, desc, self._current_iteration, self.max_pipelines, self._start)
self._current_iteration += 1
Copy link
Collaborator

@dsherry dsherry Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the pipeline number? Can you use len(self._results['pipeline_results']) to get this info instead of adding another variable?

Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip!

best_pipeline = self.rankings.iloc[0]
best_pipeline_name = best_pipeline["pipeline_name"]
logger.info(f"Best pipeline: {best_pipeline_name}")
logger.info(f"Best pipeline {self.objective.name}: {best_pipeline['score']:3f}")
Copy link
Collaborator

@dsherry dsherry Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to log out params too. Can add that later

Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

scores = cv_pipeline.score(X_test, y_test, objectives=objectives_to_score)
logger.debug(f"\t\t\tFold {i}: {self.objective.name} score: {scores[self.objective.name]:.3f}")
Copy link
Collaborator

@dsherry dsherry Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These logs are great!

scores = cv_pipeline.score(X_test, y_test, objectives=objectives_to_score)
logger.debug(f"\t\t\tFold {i}: {self.objective.name} score: {scores[self.objective.name]:.3f}")
score = scores[self.objective.name]
except Exception as e:
logger.error("Exception during automl search: {}".format(str(e)))
if raise_errors:
raise e
Copy link
Collaborator

@dsherry dsherry Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're handling this separately in #813 yes?

Copy link
Contributor Author

@freddyaboulton freddyaboulton Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

start_time (int): Start time.

Returns:
None - logs progress to logger at info level.
Copy link
Collaborator

@dsherry dsherry Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we're not surfacing these in the API docs. I think that's fine.

Nit-pick: idk if our docstring format works with "-" instead of ": "

Copy link
Collaborator

@dsherry dsherry left a comment

Thanks @freddyaboulton this rocks! I left a few comments including one about removing the self._current_pipeline var which would be nice to get to before merging

@freddyaboulton freddyaboulton force-pushed the 761-replace-tqdm-with-logger branch from 55bd78c to b5ccaea Compare Jul 10, 2020
@freddyaboulton freddyaboulton merged commit 3077671 into main Jul 10, 2020
2 checks passed
@freddyaboulton freddyaboulton deleted the 761-replace-tqdm-with-logger branch Jul 10, 2020
@dsherry dsherry mentioned this pull request Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment