Skip to content

fix(logging): route all console logs to stderr to protect stdout JSON#497

Merged
juanmichelini merged 2 commits intomainfrom
fix-logging-stderr
Mar 10, 2026
Merged

fix(logging): route all console logs to stderr to protect stdout JSON#497
juanmichelini merged 2 commits intomainfrom
fix-logging-stderr

Conversation

@juanmichelini
Copy link
Collaborator

This PR routes all console logs to stderr instead of stdout in benchmarks/utils/console_logging.py, even when rich formatting is enabled.

Why

Scripts that consume the output of benchmark runners (like swebenchmultimodal) often expect pure JSON on stdout. If logs are printed to stdout, they can corrupt the JSON stream, causing tools like jq to fail.

The swebenchmultimodal benchmark, for example, crashes consistently with jq: parse error: Invalid numeric literal when OpenTelemetry logs an error to stdout.

By ensuring all logs go to stderr, we keep stdout clean for machine-readable output.

Fixes an issue where OpenTelemetry context detachment errors printed to stdout break JSON parsing in downstream scripts.

Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean, simple fix that solves a real problem by following Unix conventions (stdout for data, stderr for diagnostics).

Verdict: ✅ Worth merging

Key Insight: The original code had a subtle bug where rich-mode logs went to stdout but plain logs went to stderr—this fix correctly routes everything to stderr, protecting structured output on stdout.

Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - This is the right fix. The original code had a subtle but real bug: rich mode dumped diagnostics to stdout while plain mode correctly used stderr. Your change fixes it properly by following the Unix rule: stdout is for data, stderr is for diagnostics. No special cases, no complexity, just correct I/O redirection. Ship it.

@juanmichelini juanmichelini merged commit 51bbbc7 into main Mar 10, 2026
3 checks passed
simonrosenberg pushed a commit that referenced this pull request Mar 18, 2026
Root cause of swebenchmultimodal jq parse error (exit code 5):
_ThreadRoutedConsoleHandler wrote formatted log messages to
sys.__stdout__, which got captured by shell $() substitution.
OpenTelemetry context-detach errors appearing after JSON output
corrupted the stream that run_swebenchmultimodal.sh pipes to jq.

Changes:
- Route ALL console handler output to sys.__stderr__ (matches main's
  approach from PR #497) to protect stdout for JSON output parsing
- Suppress opentelemetry.context logger (CRITICAL level) to prevent
  harmless context-detach errors from leaking to any output stream
- Extract setup_routed_logging() to set up main-thread logging
  defaults explicitly, eliminating fallback paths in handlers
- Change PendingInstance.start_time to Optional[float] (None while
  queued) instead of float("inf") sentinel value
- Replace _ThreadLocalWriter.__getattr__ with explicit properties
  (encoding, closed, isatty, fileno)
- Add integration tests: test_evaluation_run_end_to_end (4 instances
  with 1 failure through full Evaluation.run()) and
  test_evaluation_timeout_cancels_instance
- Fix test_workspace_cleanup.py to use _process_one_sync (renamed
  from _process_one_mp in asyncio refactor)
- Add pytest-asyncio to dev dependencies with asyncio_mode = "auto"
- Update SDK submodule to latest main
- Rebase onto latest main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants