
Enable OpenClaw OTEL diagnostics with proxy-aware SDK and log transport patching#163

Merged
bbrowning merged 5 commits into main from openclaw-otel-diagnostics on Mar 30, 2026

@bbrowning
Owner

  • OTEL diagnostics support for OpenClaw: Automatically enable the diagnostics-otel plugin when --otel-endpoint is set, configuring traces, metrics, and logs export via OTLP/Protobuf
  • Proxy-aware OTEL SDK: Patch OpenClaw's bundled @opentelemetry/otlp-exporter-base at build time to route exports through the squid proxy (required for OpenShift)
  • Log transport fix: Patch logger-*.js to promote externalTransports and activeLogger to a globalThis singleton, fixing the jiti dual-module isolation that silently breaks OTEL log export. This mirrors
    the upstream fix in openclaw/openclaw#50085 ("fix(logging): use globalThis for log transport registry to survive jiti plugin loading") and is idempotent if that PR lands.
  • Port-forward reliability: Add logging and automatic death detection for OpenShift port-forward processes
  • Upgrade fixes: Rebuild the proxy container image during paude upgrade, and update the proxy's allowed domains when --otel-endpoint is added, changed, or cleared

bbrowning and others added 5 commits March 30, 2026 00:30
Port-forward processes previously ran with stdout/stderr sent to
DEVNULL, making failures invisible. Now output is captured to a
per-session log file under the PID directory.
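A minimal sketch of the redirection, assuming a hypothetical `start_port_forward()` wrapper and a `<session>.log` / `<session>.pid` naming scheme; the real `PortForwardManager.start()` and `log_file()` in paude may be structured differently.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def log_file(pid_dir: Path, session_name: str) -> Path:
    """Hypothetical helper: per-session log path next to the PID file."""
    return pid_dir / f"{session_name}.log"


def start_port_forward(cmd: list[str], pid_dir: Path, session_name: str) -> subprocess.Popen:
    """Spawn a port-forward process with its output captured instead of discarded."""
    pid_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_file(pid_dir, session_name)
    # Append mode so a restart does not erase earlier failure output.
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,  # interleave stderr into the same log
            stdin=subprocess.DEVNULL,
        )
    # The child holds its own copy of the file descriptor, so closing the
    # parent's handle when the with-block exits does not affect logging.
    (pid_dir / f"{session_name}.pid").write_text(str(proc.pid))
    # Returning the Popen lets the caller monitor the process afterwards.
    return proc
```

In practice `cmd` would be something like an `oc port-forward ...` invocation; returning the `Popen` object is what makes the live monitoring described next possible.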

A background monitor thread watches the port-forward process during
active sessions and writes a terminal warning if it dies unexpectedly.
After disconnect, diagnostics are shown with the last 20 lines of the
log file and a hint to reconnect.
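The monitor and the post-disconnect summary can be sketched as follows. The function names, the polling interval, and the `warn` callback are assumptions for illustration, not paude's actual `_monitor_port_forward` / `_show_port_forward_diagnostics` signatures.

```python
from __future__ import annotations

import collections
import subprocess
import threading
from pathlib import Path
from typing import Callable


def monitor_port_forward(
    proc: subprocess.Popen,
    stop: threading.Event,
    warn: Callable[[str], None],
    interval: float = 1.0,
) -> threading.Thread:
    """Watch proc in a daemon thread; call warn() once if it exits while the session is active."""

    def _run() -> None:
        # Event.wait doubles as an interruptible sleep: it returns True as
        # soon as the session ends, which stops monitoring without a warning.
        while not stop.wait(interval):
            code = proc.poll()
            if code is not None:
                warn(f"port-forward exited unexpectedly (code {code})")
                return

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread


def tail_lines(path: Path, n: int = 20) -> list[str]:
    """Return the last n lines of a log file (empty list if it is missing)."""
    if not path.exists():
        return []
    with open(path, errors="replace") as f:
        # A bounded deque keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in collections.deque(f, maxlen=n)]
```

A daemon thread is used so a stuck monitor can never keep the CLI alive after the user exits.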

Changes:
- Redirect port-forward stdout/stderr to session log file
- Return the spawned Popen from PortForwardManager.start()
- Add _monitor_port_forward thread for real-time death detection
- Add _show_port_forward_diagnostics for post-disconnect summary
- Clean up log files alongside PID files on stop
- Add log_file() helper to port_forward_utils

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The upgrade command was not rebuilding or updating the proxy container
image, so proxy pods kept running stale versions. Additionally, changing
--otel-endpoint during upgrade did not update the proxy's allowed
domains list, causing OTEL exports to be blocked by squid.
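A rough sketch of how the allowed-domains list could be reconciled when the endpoint changes. The helper names are hypothetical, only hostnames are handled (the commit also passes `otel_ports` separately), and it simplifies by assuming the old OTEL host was not also listed by the user for another reason.

```python
from __future__ import annotations

from urllib.parse import urlparse


def otel_host(endpoint: str | None) -> str | None:
    """Extract the hostname squid must allow for an OTLP endpoint URL."""
    if not endpoint:
        return None
    return urlparse(endpoint).hostname or None


def reconcile_allowed_domains(
    current: list[str],
    old_endpoint: str | None,
    new_endpoint: str | None,
) -> list[str]:
    """Drop the old OTEL host (if any) and add the new one (if any), preserving order."""
    old_host, new_host = otel_host(old_endpoint), otel_host(new_endpoint)
    domains = [d for d in current if d != old_host or d == new_host]
    if new_host and new_host not in domains:
        domains.append(new_host)
    return domains
```

The same function covers the added (`old_endpoint=None`), changed, and cleared (`new_endpoint=None`) cases.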

Changes:
- Rebuild proxy image during upgrade and patch the Deployment
- Add update_deployment_image() for image-only proxy patches
- Refactor update_deployment_domains() to accept optional otel_ports
  and image parameters, avoiding double pod restarts
- Extract _patch_deployment_container() shared helper
- Update proxy allowed domains when --otel-endpoint is added, changed,
  or cleared during upgrade
- Add comprehensive test coverage for OTEL upgrade scenarios
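The "avoid double pod restarts" point comes down to sending one patch that changes the pod template once. Below is a sketch of a combined strategic-merge patch body; the env var names `ALLOWED_DOMAINS` and `OTEL_PORTS` are hypothetical placeholders, not necessarily what the paude proxy reads.

```python
from __future__ import annotations

from typing import Any


def build_container_patch(
    container: str,
    allowed_domains: list[str] | None = None,
    otel_ports: list[int] | None = None,
    image: str | None = None,
) -> dict[str, Any]:
    """Build one strategic-merge patch for a Deployment's container.

    Folding domains, ports and image into a single patch changes the pod
    template once, so the Deployment rolls its pods only once.
    """
    # Strategic merge matches containers (and env entries) by "name".
    entry: dict[str, Any] = {"name": container}
    env = []
    if allowed_domains is not None:
        env.append({"name": "ALLOWED_DOMAINS", "value": ",".join(allowed_domains)})
    if otel_ports is not None:
        env.append({"name": "OTEL_PORTS", "value": ",".join(map(str, otel_ports))})
    if env:
        entry["env"] = env
    if image is not None:
        entry["image"] = image
    return {"spec": {"template": {"spec": {"containers": [entry]}}}}
```

An image-only update, as in `update_deployment_image()`, is then just the same builder called with `image=` alone.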

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenClaw's bundled @opentelemetry/otlp-exporter-base creates HTTP
agents that ignore HTTP_PROXY/HTTPS_PROXY environment variables. This
means OTEL exports fail silently when running behind the squid proxy
in OpenShift.

Add patch-openclaw-otel-proxy.sh which injects a proxy-aware agent
factory into both the bundled dist file and the node_modules fallback.
The factory uses http-proxy-agent/https-proxy-agent (already present
in the OpenClaw image) and handles ESM bundles via createRequire().
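The injected factory is JavaScript; the Python sketch below only illustrates the routing decision a proxy-aware agent factory makes (HTTPS_PROXY for https URLs, HTTP_PROXY for http, direct for NO_PROXY matches). Whether the actual patch honours NO_PROXY, and how it treats wildcards or ports, is an assumption here.

```python
from __future__ import annotations

import os
from typing import Mapping
from urllib.parse import urlparse


def proxy_for(url: str, env: Mapping[str, str] | None = None) -> str | None:
    """Pick the proxy URL an agent factory would use for url, or None for a direct connection."""
    env = os.environ if env is None else env
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    def get(name: str) -> str | None:
        # Both upper- and lower-case spellings are conventionally honoured.
        return env.get(name) or env.get(name.lower())

    # Simplified NO_PROXY matching: exact host, domain suffix, or "*".
    for entry in (get("NO_PROXY") or "").split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return None

    if parsed.scheme == "https":
        return get("HTTPS_PROXY")
    return get("HTTP_PROXY")
```

The point of the patch is that the stock exporter never performs this lookup, so requests go direct and squid drops them.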

The OpenClaw agent's sandbox config now conditionally enables the
diagnostics-otel plugin when OTEL_EXPORTER_OTLP_ENDPOINT is set,
using a node script to merge the plugin config into .openclaw's
JSON config file.
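The commit uses a node script for the merge; here is an equivalent sketch in Python. The config path and the `plugins.entries["diagnostics-otel"]` key layout are hypothetical stand-ins for OpenClaw's actual schema; the point is the non-destructive deep merge gated on the endpoint variable.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge overlay into a copy of base; overlay wins on conflicts."""
    out = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out


def enable_otel_plugin(config_path: Path, env: dict[str, str]) -> bool:
    """Merge a (hypothetical) diagnostics-otel plugin entry into the JSON config.

    Returns False and leaves the file untouched when no endpoint is set.
    """
    if not env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        return False
    existing = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Hypothetical key layout; only the diagnostics-otel entry is touched.
    overlay = {"plugins": {"entries": {"diagnostics-otel": {"enabled": True}}}}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(deep_merge(existing, overlay), indent=2) + "\n")
    return True
```

Merging rather than overwriting keeps any other plugin or model settings already present in the file.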

Also enables OTEL debug logging (OTEL_LOG_LEVEL=debug) when an
OTEL endpoint is configured, to aid in diagnosing export issues.
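As a small illustration, the environment additions could be computed like this; `otel_env()` is a hypothetical helper, and only the two variables named in this commit are set.

```python
from __future__ import annotations


def otel_env(endpoint: str | None) -> dict[str, str]:
    """Environment additions for the agent container; empty when OTEL is off."""
    if not endpoint:
        return {}
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        # Verbose OpenTelemetry SDK diagnostics so failed exports show up in logs.
        "OTEL_LOG_LEVEL": "debug",
    }
```

Tying the debug level to the endpoint means sessions without OTEL never pay for the extra log noise.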

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When diagnostics-otel loads as a plugin via jiti, it gets a separate
module instance of logger-*.js with its own externalTransports Set and
loggingState. Transports registered by the plugin are invisible to the
gateway's logger, so OTEL logs never flow despite traces and metrics
working.
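The failure mode is not specific to jiti. This Python analogy executes the same "logger" source twice as two independent module objects, the way a second loader would: each copy gets its own transport set. The sketch then applies the same fix in Python terms, using the `builtins` module as a stand-in for `globalThis`. The names are illustrative, not OpenClaw's.

```python
import types

# Stand-in for logger-*.js: module-level state created at load time.
LOGGER_SRC = '''
external_transports = set()

def register_transport(fn):
    external_transports.add(fn)

def log(msg):
    return [t(msg) for t in external_transports]
'''

# Patched variant: reuse a process-wide registry if one already exists.
PATCHED_SRC = '''
import builtins
if not hasattr(builtins, "_shared_log_transports"):
    builtins._shared_log_transports = set()
external_transports = builtins._shared_log_transports

def register_transport(fn):
    external_transports.add(fn)

def log(msg):
    return [t(msg) for t in external_transports]
'''


def load_copy(name: str, src: str) -> types.ModuleType:
    """Execute src as a fresh module object, like a second module loader would."""
    mod = types.ModuleType(name)
    exec(src, mod.__dict__)
    return mod


# Before: the plugin registers a transport the gateway never sees.
gw_before = load_copy("logger_gateway", LOGGER_SRC)
plugin_before = load_copy("logger_plugin", LOGGER_SRC)
plugin_before.register_transport(lambda m: f"otel:{m}")
print(gw_before.log("hello"))  # []

# After: both copies bind to the same shared set.
gw_after = load_copy("logger_gateway", PATCHED_SRC)
plugin_after = load_copy("logger_plugin", PATCHED_SRC)
plugin_after.register_transport(lambda m: f"otel:{m}")
print(gw_after.log("hello"))  # ['otel:hello']
```

This explains why traces and metrics still worked: only state held at module level was duplicated.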

Fix: patch logger-*.js at build time to promote externalTransports and
activeLogger to a globalThis singleton, so both module instances share
the same state. Mirrors the upstream fix in openclaw/openclaw#50085.
The patch is idempotent — if #50085 lands upstream, the marker check
detects it and skips patching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The upgrade test creates a session with allowed_domains, so a proxy
deployment exists. Our new code path calls ensure_proxy_image_via_build
which creates an ImageStream — an OpenShift-only resource that doesn't
exist on plain Kubernetes. Mock it alongside ensure_image_via_build.
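A self-contained illustration of the mocking pattern with `unittest.mock.patch.object`. The `backend` namespace and `upgrade_session()` are hypothetical stand-ins; in the real test suite both functions would be patched by their import path.

```python
from unittest import mock
import types

# Hypothetical stand-in for the module under test.
backend = types.SimpleNamespace(
    ensure_image_via_build=lambda *a, **k: "real-image",
    # In the real code this would create an ImageStream (OpenShift-only).
    ensure_proxy_image_via_build=lambda *a, **k: "real-proxy-image",
)


def upgrade_session(b) -> tuple:
    """Toy upgrade path: rebuild both the agent and proxy images."""
    return b.ensure_image_via_build("agent"), b.ensure_proxy_image_via_build("proxy")


# Patch both builders so the test never touches cluster-specific resources.
with mock.patch.object(backend, "ensure_image_via_build", return_value="img:test"), \
     mock.patch.object(backend, "ensure_proxy_image_via_build", return_value="proxy:test") as proxy_build:
    result = upgrade_session(backend)

proxy_build.assert_called_once_with("proxy")
print(result)  # ('img:test', 'proxy:test')
```

The originals are restored when the `with` block exits, so other tests still see the unpatched functions.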

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bbrowning bbrowning merged commit 0a2dce7 into main Mar 30, 2026
6 checks passed
@bbrowning bbrowning deleted the openclaw-otel-diagnostics branch March 30, 2026 01:46