fix(logging): route all console logs to stderr to protect stdout JSON #497
juanmichelini merged 2 commits into main
all-hands-bot
left a comment
🟢 Good taste - Clean, simple fix that solves a real problem by following Unix conventions (stdout for data, stderr for diagnostics).
Verdict: ✅ Worth merging
Key Insight: The original code had a subtle bug where rich-mode logs went to stdout but plain logs went to stderr—this fix correctly routes everything to stderr, protecting structured output on stdout.
all-hands-bot
left a comment
🟢 Good taste - This is the right fix. The original code had a subtle but real bug: rich mode dumped diagnostics to stdout while plain mode correctly used stderr. Your change fixes it properly by following the Unix rule: stdout is for data, stderr is for diagnostics. No special cases, no complexity, just correct I/O redirection. Ship it.
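The rule the review cites (stdout for data, stderr for diagnostics) can be sketched in a few lines of Python. This is a minimal illustration, not the code in benchmarks/utils/console_logging.py: the names build_logger, run, and "demo_runner", as well as the JSON payload, are hypothetical and chosen only for the example.

```python
import json
import logging
import sys


def build_logger(name, stream=None):
    """Return a logger whose single handler writes to stderr by default."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False  # keep records away from any root handlers
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def run(data_out, log_out):
    """Write diagnostics to log_out and the machine-readable result to data_out."""
    logger = build_logger("demo_runner", log_out)
    logger.info("starting evaluation")
    json.dump({"resolved": 3, "total": 4}, data_out)  # hypothetical payload
    data_out.write("\n")
    logger.info("done")
```

Calling run(sys.stdout, sys.stderr) from a script means a shell capture such as $(...) sees only the JSON line, while the log lines still reach the terminal.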
Root cause of swebenchmultimodal jq parse error (exit code 5): _ThreadRoutedConsoleHandler wrote formatted log messages to sys.__stdout__, which got captured by shell $() substitution. OpenTelemetry context-detach errors appearing after JSON output corrupted the stream that run_swebenchmultimodal.sh pipes to jq.

Changes:
- Route ALL console handler output to sys.__stderr__ (matches main's approach from PR #497) to protect stdout for JSON output parsing
- Suppress opentelemetry.context logger (CRITICAL level) to prevent harmless context-detach errors from leaking to any output stream
- Extract setup_routed_logging() to set up main-thread logging defaults explicitly, eliminating fallback paths in handlers
- Change PendingInstance.start_time to Optional[float] (None while queued) instead of float("inf") sentinel value
- Replace _ThreadLocalWriter.__getattr__ with explicit properties (encoding, closed, isatty, fileno)
- Add integration tests: test_evaluation_run_end_to_end (4 instances with 1 failure through full Evaluation.run()) and test_evaluation_timeout_cancels_instance
- Fix test_workspace_cleanup.py to use _process_one_sync (renamed from _process_one_mp in asyncio refactor)
- Add pytest-asyncio to dev dependencies with asyncio_mode = "auto"
- Update SDK submodule to latest main
- Rebase onto latest main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
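The logger suppression listed in the changes above can be approximated with the standard logging module alone. The sketch below is an assumption about how such suppression might look, not the PR's actual code: the function names and the message strings are illustrative, and only the logger name opentelemetry.context and the CRITICAL level come from the commit message.

```python
import io
import logging


def suppress_otel_context_errors():
    """Raise the logger's threshold so ERROR-level records are dropped."""
    # Children of this logger that set no level of their own inherit it.
    logging.getLogger("opentelemetry.context").setLevel(logging.CRITICAL)


def demo():
    """Return what a handler attached to the logger still receives."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    otel_logger = logging.getLogger("opentelemetry.context")
    otel_logger.addHandler(handler)
    otel_logger.propagate = False
    try:
        suppress_otel_context_errors()
        otel_logger.error("illustrative context-detach error")  # filtered out
        otel_logger.critical("illustrative critical record")  # still emitted
    finally:
        otel_logger.removeHandler(handler)
    return stream.getvalue()
```

Setting the level on the named logger filters records at the source, so they reach neither stdout nor stderr regardless of which handlers are installed.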
This PR routes all console logs to stderr instead of stdout in benchmarks/utils/console_logging.py, even when rich formatting is enabled.
Why
Scripts that consume the output of benchmark runners (like swebenchmultimodal) often expect pure JSON on stdout. If logs are printed to stdout, they can corrupt the JSON stream, causing tools like jq to fail.
The swebenchmultimodal benchmark, for example, crashes consistently with jq: parse error: Invalid numeric literal when OpenTelemetry logs an error to stdout.
By ensuring all logs go to stderr, we keep stdout clean for machine-readable output.
Fixes an issue where OpenTelemetry context detachment errors printed to stdout break JSON parsing in downstream scripts.
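The failure mode described above can be reproduced without jq. The sketch below uses Python's json.loads as a stand-in for the downstream parser, so the exact error text differs from jq's; the payload, the trailing log line, and the name parse_report are hypothetical.

```python
import json

# What a consumer captures when stdout carries only the result.
clean_stdout = '{"resolved": 3, "total": 4}\n'

# The same capture with a stray log line appended after the JSON
# (placeholder text standing in for an OpenTelemetry error).
polluted_stdout = clean_stdout + "ERROR illustrative context-detach message\n"


def parse_report(captured):
    """Parse captured stdout as a single JSON document."""
    return json.loads(captured)
```

parse_report(clean_stdout) returns the dictionary, while parse_report(polluted_stdout) raises json.JSONDecodeError because of the trailing non-JSON text. Sending the log line to stderr keeps the captured string in the first form.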