Skip to content

playwright_log.txt improvements. #4855

@aaltat

Description

@aaltat

Browser Library Playwright Log Improvement Plan

This issue is backwards incompatible, because it changes how we log library behavior in playwright_log.txt file. There is risk that someone uses that debug file for some reason and parses the output with a script. But likely hood of that happening is very low, because the format is not that useful. It has always needed manual mapping (from test data and/or from log.html) to the playwright_log.txt file, to see what part of the .txt actually shows which part of the test execution. Therefore not marking this as backwards incompatible. The new format actually allows one to map events from test data or logs to the .txt file with ease, so it is a great improvement.

Purpose

This document defines the producer-side changes to be made in the Browser library
repository to make playwright-log.txt more useful for downstream automated failure
analysis (agentic root cause analysis in CI).

It does not describe consumer-side MCP servers, LLM agents, or analysis logic.
Those remain separate.

Scope

Included:

  • Structured metadata fields on every Browser-library-emitted log line
  • Robot Framework execution context (keyword, test, suite) surfaced in Node.js logs
  • Async/background error structured fields
  • pino configuration improvements
  • Test strategy

Excluded:

  • MCP implementation
  • LLM prompts or agent orchestration
  • output.xml parsing
  • Consumer-side report generation
  • Changes to Playwright-native pw:api lines (outside library control)

Current State

The log file is a mix of two formats:

  1. Browser-library lines — pino JSON, e.g.:

    {"level":30,"time":"2026-04-03T22:35:56.364Z","pid":13488,"hostname":"Mac","msg":"Start of node method newBrowser"}
    
  2. Playwright-native lines — plain text, e.g.:

    2026-04-03T22:35:56.368Z pw:api => browserType.launch started
    

Known structural weaknesses in current Browser-library lines:

  • level is emitted as a numeric code (30) rather than a string ("info")
  • hostname and pid are emitted by pino defaults — noise for CI log consumers
  • Event type and lifecycle stage are only in the free-form msg string
    ("Start of node method newBrowser", "End of node method newBrowser")
  • No Robot Framework execution context (which keyword, test, or suite triggered the event)
  • No stable correlation identifiers for browser/context/page objects
  • No deterministic ordering support when two events share the same millisecond
  • Multi-line error blocks can be untimestamped or untethered from a structured header

Design Decisions

Format: pino JSON is the format. It is already in use. No alternative format will be
introduced. Playwright-native pw:api lines will continue to coexist as-is.

Mechanism for RF context: The existing Robot Framework Listener API hooks
(_start_keyword / _end_keyword, _start_test / _end_test,
_start_suite / _end_suite) in browser.py already call open_trace_group and
close_trace_group via gRPC to drive Playwright tracing groups. The same mechanism
will be extended to carry RF execution context into Node.js, where it is bound into a
module-level context object and merged into every pino log line via pino's mixin()
function. No new architecture, no invasive call-site changes.

Dropped from original plan:

  • Phase 1 (format decision) — already settled, pino/JSON in use
  • data generic payload field — use explicit named fields only
  • error_code synthesized stable code — error_type (class name) is sufficient and
    more stable
  • hostname and pid pino defaults — noise, dropped
  • Option B (Playwright-like text format) — dropped, JSON is strictly better for
    machine consumption

Log Schema

Every Browser-library-emitted pino line will carry these fields:

Field Type Description
time ISO 8601 UTC string Timestamp, same format Playwright emits
level string Log level as a string: "trace", "debug", "info", "warn", "error"
seq number Monotonically increasing counter within the Node.js process for deterministic ordering
component string Fixed value "browser-library"
event_kind string High-level category — see values below
action string Concrete method or operation name
status string "started", "succeeded", "failed", or "info"
msg string Short human-readable description
kw_name string Current RF keyword name, when known
kw_file string Source file of the current keyword, when known
kw_line number Line number of the current keyword, when known
test_id string RF test/task id, when known
test_name string RF test/task name, when known
suite_id string RF suite id, when known
suite_name string RF suite name, when known
browser_id string Browser correlation id, when known
context_id string Browser context correlation id, when known
page_id string Page correlation id, when known
operation_id string Correlation id for a single logical multi-step operation, when applicable
error_type string Exception class name on failure, when applicable

Fields that are not known at the time of emission are omitted (not emitted as null).

event_kind values

Value When used
browser_lifecycle Browser open/close events
context_lifecycle Context open/close events
page_lifecycle Page open/close, navigation events
locator Element finding operations
screenshot Screenshot operations
timeout Timeout and retry events
assertion Assertion keyword events
grpc gRPC method start/end/error wrap lines
internal_error Uncaught errors, background crashes, teardown failures
rf_context RF execution context change events (keyword/test/suite boundaries)

Phases

Phase 1 — pino configuration cleanup

Single-file changes in browser_logger.ts. Independent of all other phases and
done first because it is low-risk and gives immediate log quality improvements.

  1. seq: Add a module-level counter; emit via mixin(). Every log line gets a
    monotonically increasing sequence number for deterministic ordering under concurrency.

  2. formatters.level: Emit string level names ("info", "error", etc.) instead
    of pino's default numeric codes (30, 50).

  3. Remove hostname and pid: Set pino base to null, removing these fields
    from every line.

  4. component: Add fixed value "browser-library" via mixin().

The mixin() function introduced here is the same hook that Phase 2 will extend to
include currentRFContext.


Phase 2 — Structured metadata via the existing Listener + gRPC hook

Primary implementation targets: browser_logger.ts, playwright-state.ts
(openTraceGroup / closeTraceGroup), grpc-service.ts (wrapping* wrappers),
keyword-decorators.ts (async_logger).

What changes:

  1. browser_logger.ts: add pino mixin() that reads a module-level currentRFContext
    object and merges its fields into every emitted log line. currentRFContext is a
    plain module-level variable, updated by the functions below.

  2. playwright-state.tsopenTraceGroup: in addition to its current Playwright
    tracing work, set currentRFContext with kw_name, kw_file, kw_line.

  3. playwright-state.tscloseTraceGroup: clear the keyword fields from
    currentRFContext (test/suite fields remain until the test/suite ends).

  4. grpc-service.ts — all wrapping* wrappers: replace the free-form
    "Start of node method X" / "End of node method X" / "Error of node method X"
    messages with structured log lines carrying event_kind="grpc", action=func.name,
    and status="started" / "succeeded" / "failed".

  5. keyword-decorators.tsasync_logger: same structured fields as above for the
    decorator path.

What this achieves:

Every pino line emitted between openTraceGroup and closeTraceGroup automatically
carries the current keyword name, source file, and line number. No individual
call-site changes needed outside the four files listed.

The existing _start_keywordopen_trace_groupopenTraceGroup gRPC call chain
already fires on every RF keyword. Extending it to set currentRFContext costs nothing
architecturally.


Phase 3 — Async/background error structured fields

With Phase 2 in place, synchronous keyword errors are already correlated via
currentRFContext. What remains are async/background errors that fire outside any
active keyword — browser crashes, stale page events, page errors, teardown races.

What changes:

Targeted updates to existing catch blocks and Playwright event listeners
(e.g., page.on('pageerror', ...), browser.on('disconnected', ...)) in the Node.js
layer:

  • Emit a structured pino line with event_kind="internal_error", status="failed",
    error_type=<error class name>, and whatever currentRFContext is bound at that
    moment (may be empty, which is honest).
  • Where a multi-line stack trace follows, the structured line precedes it, anchoring
    the block to a timestamp and a known event.

Scope: targeted changes only. Not a global rewrite of error handling.


Phase 4 — Test and suite context propagation

What changes:

  1. Proto / gRPC: Add a SetRFContext RPC (or extend the TraceGroup proto message)
    to carry test_id, test_name, suite_id, suite_name. This is the only proto
    change in the plan.

  2. Python side (browser.py):

    • _start_test / _end_test: call SetRFContext with test id and name (clear on
      end).
    • _start_suite / _end_suite: call SetRFContext with suite id and name (clear
      on end).
      Both hooks already receive this data from the RF Listener API attrs dict —
      no new plumbing on the Python side beyond the gRPC call.
  3. Node.js side:

    • SetRFContext handler updates currentRFContext.test_id, currentRFContext.test_name,
      currentRFContext.suite_id, currentRFContext.suite_name.
    • These survive keyword boundaries — they are only cleared when the test/suite ends.
  4. browser_id / context_id / page_id: Add these as explicit fields to
    browser lifecycle log lines. The ids are already present in the currentBrowser
    JSON dump in existing log messages — promote them to top-level fields.

  5. operation_id: Add for multi-step logical operations such as assertion polling
    retry loops. A short UUID generated at the start of the operation, attached to all
    log lines within it.


Phase 5 — pino configuration cleanup

Single-file changes in browser_logger.ts:

  1. seq: Add a module-level counter; emit via mixin() alongside
    currentRFContext.

  2. formatters.level: Emit string level names ("info", "error", etc.) instead
    of pino's default numeric codes (30, 50).

  3. Remove hostname and pid: Set pino base to null, removing these fields
    from every line.

  4. component: Add fixed value "browser-library" via mixin().

Phase 5 is independent of Phases 2–4 and can be done in any order.

Rollout Order

Phase 1 is done first — independent of all others.
Phase 3 depends on Phase 2 (currentRFContext must be in place).
Phase 4 depends on Phase 2 (context binding established).

Phase 1 (pino config)
Phase 2 (RF context + grpc fields) ──── Phase 3 ──── Phase 4

What Is Not in This Plan

Item Reason excluded
Changing pw:api lines Outside library control
data generic payload field Replaced by explicit named fields
error_code synthesized stable code error_type (class name) is sufficient
Option B text format pino JSON is the format
hostname and pid Noise for CI log consumers, dropped in Phase 1

Test Strategy

Approval Tests

Approval tests make log contract changes visible and prevent accidental formatting
regressions. Recommended scenarios:

  1. Browser launch and close lifecycle — confirms browser_lifecycle lines and
    browser_id propagation
  2. Context and page lifecycle under normal flow — confirms context_lifecycle and
    page_lifecycle lines
  3. Keyword boundary wrapping — confirms kw_name, kw_file, kw_line appear on
    lines emitted inside a keyword and are absent after closeTraceGroup
  4. Test and suite context — confirms test_id, test_name, suite_id, suite_name
    on lines inside a test
  5. Structured error line for an async failure — confirms internal_error event with
    error_type on a background crash or page error event
  6. Concurrent activity — confirms seq is monotonically increasing and
    browser_id/context_id/page_id distinguish concurrent objects

Unit Tests

  1. seq is monotonically increasing across calls
  2. event_kind and status mapping in wrapping* wrappers
  3. currentRFContext is set on openTraceGroup and cleared on closeTraceGroup
  4. currentRFContext test/suite fields persist across keyword boundaries
  5. formatters.level emits strings not numbers
  6. hostname and pid are absent from emitted lines

Quality Gates

  • inv utest (pytest) passes
  • inv utest-node (Jest) passes
  • Approval tests pass
  • inv lint-python and inv lint-node pass

Acceptance Criteria

The work is complete when all of the following are true:

  1. Every Browser-library-emitted pino line has a string level field.
  2. hostname and pid are absent from all emitted lines.
  3. Every line has a seq field; values are monotonically increasing within a process.
  4. Every grpc method start/end/error line carries event_kind, action, and status
    as structured fields instead of free-form message text.
  5. All pino lines emitted during a keyword execution carry kw_name, kw_file, and
    kw_line.
  6. All pino lines emitted during a test carry test_id and test_name.
  7. All pino lines emitted during a suite carry suite_id and suite_name.
  8. Background/async error events emit a structured line with event_kind="internal_error",
    status="failed", and error_type.
  9. Browser lifecycle lines carry browser_id; context lines carry context_id; page
    lines carry page_id.
  10. Human readability of the log file remains acceptable for manual debugging.

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions