Enable OpenClaw OTEL diagnostics with proxy-aware SDK and log transport patching #163
Merged
Port-forward processes previously ran with stdout/stderr sent to DEVNULL, making failures invisible. Now output is captured to a per-session log file under the PID directory. A background monitor thread watches the port-forward process during active sessions and writes a terminal warning if it dies unexpectedly. After disconnect, diagnostics are shown with the last 20 lines of the log file and a hint to reconnect.

Changes:
- Redirect port-forward stdout/stderr to session log file
- Return the spawned Popen from PortForwardManager.start()
- Add _monitor_port_forward thread for real-time death detection
- Add _show_port_forward_diagnostics for post-disconnect summary
- Clean up log files alongside PID files on stop
- Add log_file() helper to port_forward_utils

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
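
The capture-and-monitor flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `start_with_log`, `monitor`, and `tail_lines` are hypothetical names, and the real `_monitor_port_forward` and `_show_port_forward_diagnostics` helpers may be structured differently.

```python
import subprocess
import sys
import threading
from collections import deque
from pathlib import Path


def tail_lines(log_path: Path, count: int = 20) -> list[str]:
    """Return up to the last `count` lines of the log (empty list if missing)."""
    if not log_path.exists():
        return []
    with log_path.open("r", errors="replace") as handle:
        return list(deque((line.rstrip("\n") for line in handle), maxlen=count))


def start_with_log(command: list[str], log_path: Path) -> subprocess.Popen:
    """Spawn `command` with stdout/stderr appended to `log_path` instead of DEVNULL."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("ab") as log_handle:
        # The child gets its own copy of the descriptor, so closing ours is safe.
        return subprocess.Popen(command, stdout=log_handle, stderr=subprocess.STDOUT)


def monitor(proc: subprocess.Popen, stop: threading.Event, warnings: list[str]) -> None:
    """Poll until the process exits or `stop` is set; record a warning on exit."""
    while not stop.is_set():
        code = proc.poll()
        if code is not None:
            warnings.append(f"port-forward exited unexpectedly (code {code})")
            return
        stop.wait(0.1)  # sleep, but wake immediately if the session ends
```

In use, the monitor would run in a daemon thread for the lifetime of the session, with `stop.set()` called on a normal disconnect so that an intentional shutdown is not reported as a failure.
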
The upgrade command was not rebuilding or updating the proxy container image, so proxy pods kept running stale versions. Additionally, changing --otel-endpoint during upgrade did not update the proxy's allowed domains list, causing OTEL exports to be blocked by squid.

Changes:
- Rebuild proxy image during upgrade and patch the Deployment
- Add update_deployment_image() for image-only proxy patches
- Refactor update_deployment_domains() to accept optional otel_ports and image parameters, avoiding double pod restarts
- Extract _patch_deployment_container() shared helper
- Update proxy allowed domains when --otel-endpoint is added, changed, or cleared during upgrade
- Add comprehensive test coverage for OTEL upgrade scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
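
The point of folding the image and domain changes into one patch is that a single change to the pod template triggers a single rollout. A rough sketch of building such a combined patch body is shown below. The function name `build_container_patch` and the environment variable names `ALLOWED_DOMAINS` and `ALLOWED_OTEL_PORTS` are assumptions for illustration only; the PR's `_patch_deployment_container()` helper may pass these values differently.

```python
from typing import Optional


def build_container_patch(
    container_name: str,
    image: Optional[str] = None,
    allowed_domains: Optional[list[str]] = None,
    otel_ports: Optional[list[int]] = None,
) -> dict:
    """Build one strategic-merge patch body covering every requested change."""
    # Strategic merge patches match containers by name, so only the
    # fields that change need to be listed.
    container: dict = {"name": container_name}
    if image is not None:
        container["image"] = image

    env = []
    if allowed_domains is not None:
        # Hypothetical variable name, for illustration only.
        env.append({"name": "ALLOWED_DOMAINS", "value": ",".join(allowed_domains)})
    if otel_ports is not None:
        # Hypothetical variable name, for illustration only.
        env.append({"name": "ALLOWED_OTEL_PORTS", "value": ",".join(str(p) for p in otel_ports)})
    if env:
        container["env"] = env

    return {"spec": {"template": {"spec": {"containers": [container]}}}}
```

Applying the returned body once, rather than issuing separate image and domain patches, is what avoids the second pod restart.
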
OpenClaw's bundled @opentelemetry/otlp-exporter-base creates HTTP agents that ignore HTTP_PROXY/HTTPS_PROXY environment variables. This means OTEL exports fail silently when running behind the squid proxy in OpenShift.

Add patch-openclaw-otel-proxy.sh, which injects a proxy-aware agent factory into both the bundled dist file and node_modules fallback. The factory uses http-proxy-agent/https-proxy-agent (already present in the OpenClaw image) and handles ESM bundles via createRequire().

The OpenClaw agent's sandbox config now conditionally enables the diagnostics-otel plugin when OTEL_EXPORTER_OTLP_ENDPOINT is set, using a node script to merge the plugin config into .openclaw's JSON config file.

Also enables OTEL debug logging (OTEL_LOG_LEVEL=debug) when an OTEL endpoint is configured, to aid in diagnosing export issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
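
The conditional config merge is done by a node script in the PR; the same logic is sketched here in Python purely for illustration. The function name and the config layout (a top-level `plugins` object keyed by plugin name, with an `enabled` flag) are assumptions, not OpenClaw's documented schema.

```python
import json
from pathlib import Path
from typing import Mapping


def enable_otel_plugin(config_path: Path, env: Mapping[str, str]) -> bool:
    """Merge a diagnostics-otel entry into the JSON config if an endpoint is set.

    Returns True when the config was updated, False when it was left untouched.
    """
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    if not endpoint:
        return False  # no endpoint configured: leave the plugin disabled

    config = json.loads(config_path.read_text()) if config_path.exists() else {}

    # Assumed layout: {"plugins": {"<plugin-name>": {"enabled": bool}}}.
    # setdefault keeps any existing plugins and settings in place.
    plugins = config.setdefault("plugins", {})
    plugins.setdefault("diagnostics-otel", {})["enabled"] = True

    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return True
```

Merging rather than overwriting matters here: unrelated keys in the existing config survive the update.
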
When diagnostics-otel loads as a plugin via jiti, it gets a separate module instance of logger-*.js with its own externalTransports Set and loggingState. Transports registered by the plugin are invisible to the gateway's logger, so OTEL logs never flow despite traces and metrics working.

Fix: patch logger-*.js at build time to promote externalTransports and activeLogger to a globalThis singleton, so both module instances share the same state. Mirrors the upstream fix in openclaw/openclaw#50085.

The patch is idempotent: if #50085 lands upstream, the marker check detects it and skips patching.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
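
The marker-based idempotency can be sketched as follows. This is an illustration in Python, not the PR's build-time script: the marker string, the function name, and the JavaScript fragments in the usage note are all hypothetical.

```python
# Hypothetical marker; the real patch may instead detect text introduced by the upstream fix.
MARKER = "/* patched: shared-log-transports */"


def apply_patch_once(
    source: str, needle: str, replacement: str, marker: str = MARKER
) -> tuple[str, bool]:
    """Replace the first `needle` with `replacement`, tagged with `marker`.

    Returns (new_source, changed). If the marker is already present, the
    source is returned unchanged so repeated builds do not stack patches.
    """
    if marker in source:
        return source, False
    if needle not in source:
        # Fail loudly rather than silently shipping an unpatched file.
        raise ValueError("expected pattern not found; refusing to patch")
    return source.replace(needle, f"{marker} {replacement}", 1), True
```

For example, calling `apply_patch_once(js_source, "new Set()", "(globalThis.__sharedTransports ??= new Set())")` would rewrite the first matching initializer on the first run and leave the file untouched on every later run.
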
The upgrade test creates a session with allowed_domains, so a proxy deployment exists. Our new code path calls ensure_proxy_image_via_build, which creates an ImageStream, an OpenShift-only resource that doesn't exist on plain Kubernetes. Mock it alongside ensure_image_via_build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Enable the diagnostics-otel plugin when --otel-endpoint is set, configuring traces, metrics, and logs export via OTLP/Protobuf
- Patch @opentelemetry/otlp-exporter-base at build time to route exports through the squid proxy (required for OpenShift)
- Patch logger-*.js to promote externalTransports and activeLogger to a globalThis singleton, fixing jiti dual-module isolation that silently breaks OTEL log export (mirrors the upstream fix openclaw/openclaw#50085, "fix(logging): use globalThis for log transport registry to survive jiti plugin loading"; idempotent if that PR lands)
- Update the proxy's allowed domains during paude upgrade when --otel-endpoint is set