
ElasticsearchRemoteLogIO: Harden _parse_raw_log to handle malformed and non-JSON log lines #66383

Merged: potiuk merged 1 commit into apache:main from SameerMesiah97:ElasticSearchRemoteLogIO-Log-Parsing on May 10, 2026

@SameerMesiah97
Contributor

Description

This PR improves the robustness of log parsing in ElasticsearchRemoteLogIO._parse_raw_log.

Previously, each log line was assumed to be valid JSON and parsed with json.loads, so a single malformed or partially written line could raise an exception and interrupt ingestion. This change introduces best-effort parsing so invalid lines are preserved instead of causing failures.

When parsing fails, the raw line is wrapped in a minimal structured entry. Entries are then normalized so that each one carries a message or event field, which Airflow expects.
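The fallback behavior described above can be sketched roughly as follows. This is an illustrative stand-in, not the merged implementation: the function name `parse_raw_log_best_effort`, the choice of `message` as the wrapper key, and the handling of valid JSON that is not an object are all assumptions made for the example.

```python
import json
from typing import Any


def parse_raw_log_best_effort(log: str) -> list[dict[str, Any]]:
    """Hypothetical stand-in for _parse_raw_log; not the code from this PR."""
    parsed: list[dict[str, Any]] = []
    for line in log.split("\n"):
        # Empty and whitespace-only lines carry no content, so skip them.
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            # json.JSONDecodeError subclasses ValueError. Keep the raw line
            # instead of letting one bad line abort the whole batch.
            entry = {"message": line}
        if not isinstance(entry, dict):
            # Assumption: valid JSON that is not an object (e.g. a bare
            # number or string) is also treated as an unstructured line.
            entry = {"message": line}
        elif "message" not in entry and "event" not in entry:
            # Assumption: guarantee one of the two expected fields by
            # falling back to the raw line as the message.
            entry["message"] = line
        parsed.append(entry)
    return parsed


raw = (
    '{"event": "start", "level": "info"}\n'
    "Traceback (most recent call last):\n"
    "\n"
    "   \n"
    '{"truncated": '
)
entries = parse_raw_log_best_effort(raw)
```

With this input, the structured first line keeps its fields, the plain-text line and the partially written JSON line are each wrapped with their raw text, and the two blank lines are dropped, leaving three entries.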

Rationale

Task logs are not always well-formed JSON due to partial writes, mixed stdout/stderr output, or third-party libraries. Treating parsing failures as fatal reduces observability and makes debugging harder.

Handling these cases gracefully ensures logs remain accessible while still conforming to the structure expected by Airflow’s logging system.

Tests

Added unit tests verifying that:

  • Mixed valid and invalid JSON lines are parsed correctly, with invalid lines preserved as raw entries and offsets maintained.
  • Plain text logs are fully preserved as unparsed entries with correct message content.
  • Mixed structured and unstructured content is handled correctly, preserving JSON fields and wrapping plain text lines.
  • Empty and whitespace-only lines are ignored during parsing.

Documentation

Updated inline comments in _parse_raw_log to describe parsing and fallback behavior.

Backwards Compatibility

This change is fully backwards compatible. Valid logs are unchanged, and malformed lines are now handled gracefully instead of raising exceptions.

…ntroducing best-effort parsing with a fallback structure. Add unit tests.
@potiuk potiuk merged commit cb23988 into apache:main May 10, 2026
103 checks passed
jason810496 pushed a commit to jason810496/airflow that referenced this pull request May 11, 2026
…ntroducing best-effort parsing with a fallback structure. Add unit tests. (apache#66383)

Co-authored-by: Sameer Mesiah <smesiah971@gmail.com>