Refactor Opensearch log formatter to use timezone.from_timestamp and export it in Task SDK #66856

Open
23tae wants to merge 5 commits into apache:main from 23tae:refactor-opensearch-log-formatter
23tae commented May 13, 2026

Description

This PR refactors the OpensearchJSONFormatter to use the standard Airflow timezone.from_timestamp utility, addressing a TODO in the codebase.

Key changes

  • Exported from_timestamp in airflow.sdk.timezone to make it part of the public Task SDK API, ensuring consistency across the project.
  • Implemented version-based compatibility logic in the Opensearch provider. The formatter now uses the new timezone.from_timestamp on Airflow 3.3.0+ and maintains backward compatibility by falling back to the original pendulum/datetime logic on older Airflow 3.x releases (e.g., 3.0.6, 3.2.1).

Verification Results

I have verified the changes using both prek (ruff) and breeze (standardized Docker environment).

  • Unit Tests (Breeze): Passed (3 tests)
  • Static Checks (Prek): Passed (ruff)
Was generative AI tooling used to co-author this PR?
  • Yes

Generated-by: Antigravity following the guidelines

Please review this refactor.
cc. @choo121600 @noeunkim


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

Contributor

SameerMesiah97 left a comment

There's a contradiction between the #TODO and what you are doing in this PR. I have left a comment.

def formatTime(self, record, datefmt=None):
    """Return the creation time of the LogRecord in ISO 8601 date/time format in the local time zone."""
    # TODO: Use airflow.utils.timezone.from_timestamp(record.created, tz="local")
    # as soon as min Airflow 2.9.0
Contributor

I am not sure why you are importing timezone from task sdk when the intention was to use timezone from airflow.utils.

Author

Thanks for the review.
I initially tried to follow the TODO and use airflow.utils.timezone, but the linter (Ruff) gave an error (TID251) and recommended airflow.sdk.timezone instead. I also saw that other files in this provider were already using the Task SDK, so I followed that pattern.
I’ve also moved the import inside the block to fix the ImportError in CI. Let me know if you’d still like me to revert it to utils.

Contributor

Okay. That makes sense. Thanks for clarifying that.

coerce_datetime,
convert_to_utc,
datetime,
from_timestamp,
Contributor

Is there no way to avoid expanding the API surface of the Task SDK? It seems like overkill for a simple provider PR. Maybe you can try using datetime?

Author

Thanks for the suggestion.
I tried to use existing SDK utilities like coerce_datetime to avoid the expansion, but I noticed there's already a linter rule in pyproject.toml (line 886) that explicitly recommends airflow.sdk.timezone.from_timestamp over pendulum.from_timestamp.
It seems the intention was for providers to use this function, but it was just missing from the SDK's public __all__. So I think adding it there is the right fix rather than working around it.

Contributor

eladkal commented May 18, 2026

cc @Owen-CH-Leung can you help review this PR?

@23tae can you confirm whether the issue affects only Opensearch, or Elasticsearch as well?

Author

23tae commented May 18, 2026

@eladkal Yes, I confirmed that the exact same legacy pendulum usage and TODO exist in the Elasticsearch provider (es_json_formatter.py).
Should I open a separate PR to keep the changelogs independent, or just add a commit here?
Let me know how you'd like me to proceed.

Contributor

eladkal commented May 20, 2026

We can have a separate PR for Elasticsearch that we will review in parallel with this one.

23tae force-pushed the refactor-opensearch-log-formatter branch from 374f24c to 4efb3fe on May 21, 2026 05:19