Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-662] [Feature] Allow the configuration of the timezone for timestamps in event logging #5267

Open
1 task done
Stefan-Rann opened this issue May 18, 2022 · 17 comments
Open
1 task done
Labels
enhancement New feature or request logging

Comments

@Stefan-Rann
Copy link

Is this your first time opening an issue?

Describe the Feature

Currently the timestamp in event logging is always output in utc.
It would be helpful to be able to set the time zone via a configuration or parameter (either local or any other time zone).

Describe alternatives you've considered

I have not found any alternatives. One has to live with utc.

Who will this benefit?

This would make it easier to evaluate the logs and help to avoid confusion.
Additionally the evaluation of dbt logs in combination with other logs using lokal or any other timezone would be easier.

Are you interested in contributing this feature?

No response

Anything else?

No response

@Stefan-Rann Stefan-Rann added enhancement New feature or request triage labels May 18, 2022
@github-actions github-actions bot changed the title [Feature] Allow the configuration of the timezone for timestamps in event logging [CT-662] [Feature] Allow the configuration of the timezone for timestamps in event logging May 18, 2022
@gshank
Copy link
Contributor

gshank commented May 20, 2022

This sounds like a reasonable request. We're going to be looking at logging updates later in this quarter. I'm not sure if it would get onto the schedule or not, but we can take a look.

@gshank gshank removed the triage label May 20, 2022
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 17, 2022
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@jtcohen6
Copy link
Contributor

Reopening because we just got a duplicate opened in #6319

Things I'm curious about: How would you want to configure this? Should there only be two options (local system time versus UTC)?

@jtcohen6 jtcohen6 reopened this Nov 28, 2022
@jtcohen6 jtcohen6 removed the stale Issues that have gone stale label Nov 28, 2022
@ahuiiiiiii
Copy link

Reopening because we just got a duplicate opened in #6319

Things I'm curious about: How would you want to configure this? Should there only be two options (local system time versus UTC)?

can i add an option that set zonetime in dbt-core project configuration file?

@ahuiiiiiii
Copy link

sorry spelling error above should be timezone

@Stefan-Rann
Copy link
Author

can i add an option that set zonetime in dbt-core project configuration file?

Yes, a project level setting would be cool to set the timezone by name.
Additionally the option "local" would be great to use the system timezone without the need to explicitly define the timezone.

@Gunnnn
Copy link

Gunnnn commented Dec 31, 2022

Please! I see timestamps in log in UTC and its annoying =(

@Frunkh
Copy link

Frunkh commented Apr 6, 2023

Atleast change
record.extra[self.name] = datetime.utcnow().isoformat()
to
record.extra[self.name] = datetime.now().isoformat()

in logger.py
So it respects the timezone of the system.. this is really annoying.

@sgaraycoa-de
Copy link

Adding my vote in favor of this issue

@jtcohen6
Copy link
Contributor

Putting aside, for the moment, the question of creating a standalone configuration for this — I'd be supportive of switching from datetime.utcnow() to datetime.now(), so that the user's local system timezone is what's displayed in the logs. My sense is that most remotely-running orchestration systems would be in UTC, so it wouldn't be a change there.

Timestamp Doug (@dbeatty10) - does that proposed switch give you any trepidation?

@dbeatty10
Copy link
Contributor

Main trepidation

For both datetime.utcnow() and datetime.now(), the Python datetimes are naive rather than aware [1][2] so the UTC offset is not explicit in either case.

Consequences

With the current dbt log output standardized to UTC, it's still comparable to other dbt logs (as long as you know it's UTC). But most human beings think about time in their local time zone.

When logs from dbt (or other systems) use naive timestamps that are localized, then comparing them really becomes quite the forensic exercise 😰

Possible solution

A possible solution is to use aware datetimes and print out the UTC offset. The trade-off is that this costs 6 extra characters in the log output.

$ dbt run
07:42:18-06:00  Running with dbt=1.4.4
============================== 2023-04-18 07:42:18.050150-06:00 | 07e7091e-7086-4fb5-9cab-643f3579e090 ==============================
07:42:18.050150-06:00 [info ] [MainThread]: Running with dbt=1.4.4

Python examples

from datetime import timezone, datetime as dt
Python Output UTC Offset
dt.utcnow().isoformat() 2023-04-18T13:42:18.050150 Naive N/A
dt.now().isoformat() 2023-04-18T07:42:18.050150 Naive N/A
dt.now().astimezone().isoformat() 2023-04-18T07:42:18.050150-06:00 Aware System time zone
dt.now(timezone.utc).isoformat() 2023-04-18T13:42:18.050150+00:00 Aware UTC

@nevdelap
Copy link
Contributor

nevdelap commented Jan 31, 2024

Just having it log in the system's timezone would be great. I'm lucky enough to be at UTC+10 so it is only mildly annoying, but still strange that it would be that way.

I've created this workaround, which also adds an elapsed time to the output. Someone might find it useful. 😉

@dbeatty10
Copy link
Contributor

I'd propose two new configurations to handle this use case:

  1. log_timestamp_tz_source: utc (default) | system
  2. log_timestamp_print_offset: false (default) | true

*I didn't put a ton of thought into the names, so they could be shorted during design refinement as-needed. For example, maybe log_timestamp_tz_offset_print would be better, etc. Or maybe these two for brevity:

  • log_tz_source
  • log_tz_offset_print

Example logic

ts.py

from datetime import timezone, datetime as dt


OPTIONS_LOG_TIMESTAMP_TZ_SOURCE = ["utc", "system"]
OPTIONS_LOG_TIMESTAMP_PRINT_OFFSET = [False, True]


def get_now_aware_utc_tz():
    return dt.now(timezone.utc)


def get_now_aware_system_tz():
    return dt.now().astimezone()


def get_now_formatted(log_timestamp_tz_source="utc", log_timestamp_print_offset=False):
    # default to UTC
    aware_timestamp = get_now_aware_utc_tz()

    # default to omitting the UTC offset
    format = "%H:%M:%S"

    # use the system timezone if specfied
    if log_timestamp_tz_source == "system":
        aware_timestamp = get_now_aware_system_tz()

    # include the UTC offset if specfied
    if log_timestamp_print_offset:
        format = "%H:%M:%S %z"

    return aware_timestamp.strftime(format)


for log_timestamp_tz_source in OPTIONS_LOG_TIMESTAMP_TZ_SOURCE:
    for log_timestamp_print_offset in OPTIONS_LOG_TIMESTAMP_PRINT_OFFSET:
        now_formatted = get_now_formatted(log_timestamp_tz_source, log_timestamp_print_offset)
        print(f"\nlog_timestamp_tz_source: {log_timestamp_tz_source}; log_timestamp_print_offset: {log_timestamp_print_offset}")
        print(now_formatted)

Run it:

python ts.py

Example output:

$ python ts.py

log_timestamp_tz_source: utc; log_timestamp_print_offset: False
19:58:19

log_timestamp_tz_source: utc; log_timestamp_print_offset: True
19:58:19 +0000

log_timestamp_tz_source: system; log_timestamp_print_offset: False
12:58:19

log_timestamp_tz_source: system; log_timestamp_print_offset: True
12:58:19 -0700

Flags / global configs

CLI flags:

  • --log-timestamp-tz-source
  • --no-log-timestamp-print-offset / --log-timestamp-print-offset

environment variables:

  • LOG_TIMESTAMP_TZ_SOURCE
  • LOG_TIMESTAMP_PRINT_OFFSET

dbt_project.yml flags:

  • log_timestamp_tz_source
  • log_timestamp_print_offset

Usage

CLI flags:

dbt compile --log-timestamp-tz-source system
dbt compile --log-timestamp-tz-source utc
dbt compile --no-log-timestamp-print-offset
dbt compile --log-timestamp-print-offset

environment variables:

export LOG_TIMESTAMP_TZ_SOURCE=system
export LOG_TIMESTAMP_TZ_SOURCE=utc
export LOG_TIMESTAMP_PRINT_OFFSET=1
export LOG_TIMESTAMP_PRINT_OFFSET=true
export LOG_TIMESTAMP_PRINT_OFFSET=T
export LOG_TIMESTAMP_PRINT_OFFSET=0
export LOG_TIMESTAMP_PRINT_OFFSET=false
export LOG_TIMESTAMP_PRINT_OFFSET=F

dbt_project.yml flags:

flags:
  log_timestamp_tz_source: system
  log_timestamp_tz_source: utc
  log_timestamp_print_offset: true
  log_timestamp_print_offset: false

Rationale

The defaults would preserve backwards-compatibility.

The first config would allow using the system timezone wherever dbt is running from (whether it is a user's machine or a container orchestrated somewhere remotely).

The second config would allow the timestamps to be interpreted in an aware fashion rather than forcing either a naive interpretation or assuming UTC.

Rejected idea

One thing I considered and decided against:

  • a log_timestamp_format config that would accept a format that can be passed to the datetime.strftime method.

The reason is that logs display a different level of granularity when emitted to the CLI versus the file logs. The first does not include fractional seconds, whereas the second does.

Also there is a different format when --log-format json vs. not. The former always emits a ISO8601-formatted string that ends in Z whereas the latter does not.

Risk reduction

If you are certain that your orchestration environment system time zone is set to UTC (and won't ever change!) or that your log ingestion can interpret localized timestamps properly (somehow), then you can set these flags within dbt_project.yml.

But to reduce risk of a non-UTC orchestration environment messing up log ingestion and parsing by a tool like Datadog, etc., you could configure these via environment variable in your local environment rather than using flags within dbt_project.yml:

export DBT_LOG_TIMESTAMP_TZ_SOURCE=system
export LOG_TIMESTAMP_PRINT_OFFSET=0

@jim256
Copy link

jim256 commented Mar 26, 2024

I'd also really like to see two options (local system time versus UTC) rather than always defaulting to UTC only.

@PORXavierM
Copy link

As somebody find solution for this? is a problem from python or dbt himself?

@dbeatty10
Copy link
Contributor

@PORXavierM the timestamps are coming from the logging of dbt.

There is some of amount of configurability of the timestamps via the LOG_FORMAT and LOG_FORMAT_FILE configs. Here's a brief summary:

Log format Time zone Aware vs. naive Precision
text UTC naive timestamp second
debug system local time zone naive timestamp microsecond
json UTC aware timestamp microsecond

There's some more nitty-gritty code details discussed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request logging
Projects
None yet
Development

No branches or pull requests