Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove filepath for logged message in log_error_callback #2429

Merged
merged 11 commits into from Jun 28, 2021
Merged

Conversation

angela97lin
Copy link
Contributor

@angela97lin angela97lin commented Jun 23, 2021

Closes #2149

This PR adds a logger argument to the callbacks in callbacks.py and updates AutoMLSearch (via EngineBase) to accept this new logger. EngineBase will pass along the JobLogger, which stores messages but does not automatically log them.

The one exception is our raise_error_callback; since we don't print the log messages until the post_evaluation_callback and raising an exception would mean that we don't ever get to post_evaluation_callback, I updated raise_error_callback to write to the callback.py logger before raising. This may potentially still cause attempts to write to the same log file but I think is similar to the behavior for post_evaluation_callback? Open to feedback and suggestions on how else to handle this case, though :)

test_automl_error_callback ensures that our messages are still being logged, added new test to ensure we're passing in JobLogger instead of the original file logger.

@angela97lin angela97lin self-assigned this Jun 23, 2021
@codecov
Copy link

codecov bot commented Jun 23, 2021

Codecov Report

Merging #2429 (4348223) into main (5b43a17) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #2429   +/-   ##
=====================================
  Coverage   99.7%   99.7%           
=====================================
  Files        283     283           
  Lines      25500   25500           
=====================================
  Hits       25400   25400           
  Misses       100     100           
Impacted Files Coverage Δ
evalml/automl/callbacks.py 100.0% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b43a17...4348223. Read the comment docs.

@angela97lin angela97lin changed the title Replace global logger with JobLogger for AutoML Add logger parameter to AutoML error callbacks and replace global logger with JobLogger for AutoML Jun 25, 2021
"""Raises the exception thrown by the AutoMLSearch object. Also logs the exception as an error."""
logger.error(f"AutoML search raised a fatal exception: {str(exception)}")
logger.error("\n".join(traceback))
logger.write_to_logger(callback_file_logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@freddyaboulton @angela97lin if this happens on a remote worker, how does the log make it back to the main node?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I don't think it would? I have two thoughts:

  1. Can we just raise an exception here rather than writing to log and raising? This would be simple to do but would mean the stacktrace is just reported to stdout and not in the log-file (if evalml was configured to a file) so this feels like a big breaking change.
  2. Can we change who is raising the exception? Like we can change raise_error_callback to just write to the JobLogger like it is here, then rather than raising an exception from train_and_score pipeline, we have it return the exception and JobLogger:
if automl_config.error_callback is not None:
    automl_config.error_callback(
        exception=e,
        traceback=traceback.format_tb(sys.exc_info()[2]),
        automl=automl_config,
        logger=logger,
        fold_num=i,
        pipeline=pipeline,
    )
if automl_config.error_callback == raise_error_callback:
    return  {
        "scores": {
        "cv_data": None,
        "training_time": None,
        "cv_scores": None,
        "cv_score_mean": None,
        },
        "pipeline": cv_pipeline,
        "logger": logger,
        "exception": exception

Then we modify the _post_evaluation_callback to check the new exception key of the dict, and raise if present. The benefit of doing the writing to log and raising during _post_evaluation_callback is that _post_evaluation_callback happens sequentially on the main node. So there's no race condition, and the file we're logging to will be present in the main node.

Possible downsides:

  • Execution does terminate immediately but after we return from train_and_score_pipeline. Since train_and_score_pipeline is essentially already over, this delay should be like milliseconds so I don't think it's a big deal
  • Do we restrict users' ability to define their own callback? I don't think so. They can still pass what they want but there's no guarantee everything will be captured in the log which I think we're ok with.

What do you all think?

Also, I think removing logger.handlers[1].baseFilename is an important enough change so if we can't reach an immediate consensus I'm happy punting on it and just getting that smaller change in!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of getting this PR in and having the additional discussion separately. If this resolves the issue it sets out to, I think we can accept it, as is and discuss an improvement afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your thoughts @freddyaboulton! I agree that 1) seems less ideal--if an error is raised, it would be nice to have a log of the full stack trace to look at and in the case that stdout is not available, the trace is fully lost.

  1. is an interesting idea, though a little weird in nature too that our raise_error_callback / any callback a user defines that should raise an error doesn't unless AutoMLSearch allows it to. I'd love to discuss this further and if the change is larger than expected, push it out of this PR :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about this further and I'm not so sure I like this change: even if remote logs are accessible on the main node, all of the progress the JobLogger is holding onto will be written using the callback.py's logger which is a little odd. I'll back out this change and just fix the naming :d

):
X, y = X_y_binary
mock_score.return_value = {"Log Loss Binary": 0.8}
msg = "all your model are belong to us"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol do you guys even know where this comes from??

"""Raises the exception thrown by the AutoMLSearch object. Also logs the exception as an error."""
logger.error(f"AutoML search raised a fatal exception: {str(exception)}")
logger.error("\n".join(traceback))
logger.write_to_logger(callback_file_logger)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of getting this PR in and having the additional discussion separately. If this resolves the issue it sets out to, I think we can accept it, as is and discuss an improvement afterwards.

@angela97lin angela97lin changed the title Add logger parameter to AutoML error callbacks and replace global logger with JobLogger for AutoML Remove filepath for logged message in log_error_callback Jun 28, 2021
@angela97lin angela97lin merged commit 301c09f into main Jun 28, 2021
@angela97lin angela97lin deleted the 2149_logger branch June 28, 2021 21:02
@dsherry dsherry mentioned this pull request Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't use filepath in callback's logged message
4 participants