Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CT-1685: Restore certain aspects of legacy logging behavior important… #6443

Merged
merged 2 commits into from
Dec 14, 2022

Conversation

peterallenwebb
Copy link
Contributor

@peterallenwebb peterallenwebb commented Dec 14, 2022

resolves #6437

Description

These changes restore the behavior described in the associated issue. The legacy (logbook) logger is now active by default, and the appropriate log level is applied to ensure the events are seen by dbt-rpc.

Checklist

@peterallenwebb peterallenwebb requested a review from a team as a code owner December 14, 2022 02:38
@cla-bot cla-bot bot added the cla:yes label Dec 14, 2022
@github-actions
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Comment on lines +113 to +116
# The default event manager will not log anything, but some tests run code that
# generates events, without configuring the event manager, so we create an empty
# manager here until there is a better testing strategy in place.
EVENT_MANAGER: EventManager = EventManager()
Copy link
Contributor

@jtcohen6 jtcohen6 Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed a functional regression related to the fact that EVENT_MANAGER does not have any logger configured, prior to setup_event_logger being called.

If dbt runs into configuration errors during initialization, they are not surfaced to the user.

See #6434

Copy link
Contributor Author

@peterallenwebb peterallenwebb Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I was worried about that. I'll address that today with a separate PR. There is a strange issue where file logging is not set up until further into the initialization process. Do you think it would be acceptable to set up only stdout logging until the log file path can be determined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that work in dbt-rpc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO only stdout logging is acceptable for now. My sense is, we really only expect these logs to fire if something goes wrong during initialization. It's okay to have some hardcoded, generic, unconfigurable behavior that just ensures we're showing the error to the user.

We'll need to figure out the story for dbt-server, since I think the initial plan is to tail the logs/dbt.log file. If we think the longer-term answer is registering a custom callback, that might solve our problem.

As far as dbt-rpc, all we need is ongoing compatibility with the legacy logger for v1.4. The way I understand the code in this PR, if the ENABLE_LEGACY_LOGGER env var is enabled, we'll immediately flip on the legacy logger, without waiting for the call to setup_event_logger.

Copy link
Contributor Author

@peterallenwebb peterallenwebb Dec 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 Your understanding is correct w.r.t. dbt-rpc

From speaking to @racheldaniel, I have a fairly good idea how dbt-server is using logging now. They do their own configuration of the EVENT_MANAGER before running tasks, so I feel confident that we have a good story for them.

I'm going to prepare a PR to address #6434, now. I believe that will restore all of the important logging behaviors we had in 1.3, and I have to apologize for not considering how important it was to get something set up before the call to setup_event_logger() is made.

With all that said, there are some interesting things to note about the file logging behavior of the legacy logger. I found earlier this week that it starts recording messages after it is set up, which is immediately dbt starts. It then writes the recorded messages to the log file (if any) when the correct file path has been determined, and stops recording. My hope is that in 1.5 we can find a way to start all logging, both console and file, immediately when dbt starts. Does that seem feasible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to apologize for not considering how important it was to get something set up before the call to setup_event_logger() is made.

I'm glad we caught it early! I think the failure here was lack of sufficient inline code comment around the event system initialization, to scare you away from what was a perfectly sensible refactor. "We need to configure these defaults here, even though it looks like we shouldn't, because some log events might arrive before we have the real inputs for proper configuration — namely, if we encounter an early exception."

I found earlier this week that it starts recording messages after it is set up, which is immediately dbt starts. It then writes the recorded messages to the log file (if any) when the correct file path has been determined, and stops recording.

IIRC, it does this by storing logs in a "buffer," and then "replaying" the buffer once the proper log path has been determined. A little too magical for my taste buds, but this might be an acceptable pattern to emulate.

dbt-core/core/dbt/logger.py

Lines 417 to 421 in 5e4e917

def _replay_buffered(self):
assert self._msg_buffer is not None, "_msg_buffer should never be None in _replay_buffered"
for record in self._msg_buffer:
super().emit(record)
self._msg_buffer = None

My hope is that in 1.5 we can find a way to start all logging, both console and file, immediately when dbt starts. Does that seem feasible?

Maybe! We could roll our own similar version of a stateful "buffer"? Or split log-path out of dbt_project.yml, so that we can know where we're logging without having to parse any config/files? (Feels related: #6207)

FWIW in the happy path, when config initialization succeeds (99% of the time), I don't think we're firing any log events before setup_event_logger is called.

@peterallenwebb peterallenwebb merged commit 5e4e917 into main Dec 14, 2022
@peterallenwebb peterallenwebb deleted the paw/ct-1658-legacy-logging-fixes branch December 14, 2022 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-1658] Logging changes break dbt-rpc
3 participants