ElasticsearchRemoteLogIO: Harden _parse_raw_log to handle malformed and non-JSON log lines #66383
Merged
potiuk merged 1 commit on May 10, 2026
…ntroducing best-effort parsing with a fallback structure. Add unit tests.
potiuk approved these changes on May 10, 2026
jason810496 pushed a commit to jason810496/airflow that referenced this pull request on May 11, 2026
…ntroducing best-effort parsing with a fallback structure. Add unit tests. (apache#66383) Co-authored-by: Sameer Mesiah <smesiah971@gmail.com>
Description
This PR improves the robustness of log parsing in ElasticsearchRemoteLogIO._parse_raw_log.
Previously, each log line was assumed to be valid JSON and parsed with json.loads, so a single malformed or partially written line could raise an exception and interrupt ingestion. This change introduces best-effort parsing so invalid lines are preserved instead of causing failures.
When parsing fails, the raw line is wrapped in a minimal structured format and normalized to ensure compatibility with Airflow’s expectations by guaranteeing the presence of a message or event field.
Rationale
Task logs are not always well-formed JSON due to partial writes, mixed stdout/stderr output, or third-party libraries. Treating parsing failures as fatal reduces observability and makes debugging harder.
Handling these cases gracefully ensures logs remain accessible while still conforming to the structure expected by Airflow’s logging system.
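The best-effort parsing outlined in the Description could look roughly like the sketch below. It is an illustration only, not the merged implementation: the function name parse_raw_log_best_effort, the choice of "event" as the fallback key, and the handling of blank lines and non-object JSON values are all assumptions made for the example.

```python
import json
from typing import Any


def parse_raw_log_best_effort(log: str) -> list[dict[str, Any]]:
    """Hypothetical stand-in for _parse_raw_log; names and keys are assumptions."""
    parsed: list[dict[str, Any]] = []
    for line in log.split("\n"):
        if not line.strip():
            # Assumption: blank lines carry no information and are skipped.
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            # json.JSONDecodeError subclasses ValueError. Keep the raw text
            # instead of letting one bad line abort the whole read.
            entry = {"event": line}
        if not isinstance(entry, dict):
            # Valid JSON that is not an object (e.g. a number or a list)
            # is wrapped the same way as unparsable text.
            entry = {"event": line}
        if "message" not in entry and "event" not in entry:
            # Normalize: guarantee that one of the two expected fields exists.
            entry["event"] = line
        parsed.append(entry)
    return parsed
```

With this sketch, a well-formed line such as {"event": "ok"} passes through unchanged, while a line like "not json" comes back as a dictionary whose event field holds the raw text.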
Tests
Added unit tests verifying that:
Documentation
Updated inline comments in _parse_raw_log to describe parsing and fallback behavior.
Backwards Compatibility
This change is fully backwards compatible. Valid logs are unchanged, and malformed lines are now handled gracefully instead of raising exceptions.