Add optional OpenTelemetry trace export for job lifecycle #4465
Draft
stefanpenner wants to merge 1 commit into actions:master from
Adds an OTel trace recorder that implements the existing MetricsRecorder interface. When configured with an OTLP endpoint, the listener emits three child spans per completed job:
- runner.queue: QueueTime → ScaleSetAssignTime
- runner.startup: ScaleSetAssignTime → RunnerAssignTime
- runner.execution: RunnerAssignTime → FinishTime

Spans use deterministic trace/span IDs (MD5 of runID-attempt, big-endian jobID) compatible with tools that reconstruct GitHub Actions workflows as OpenTelemetry traces.

Configuration: set otel_endpoint (and optionally otel_insecure) in the listener config JSON, or pass via Helm values. When no endpoint is configured, behavior is unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
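- Adds an OTel trace recorder implementing the listener.MetricsRecorder interface alongside the Prometheus exporter
- Emits three spans per completed job: runner.queue, runner.startup, runner.execution
- Adds CompositeRecorder to fan out to both Prometheus and OTel when both are enabled
- No behavior change when otel_endpoint is not set
Motivation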
GitHub Actions workflows can be reconstructed as OpenTelemetry traces (workflow → job → step), but there's a visibility gap between "job was queued" and "step started executing." ARC has the timestamps that explain this gap (QueueTime, ScaleSetAssignTime, RunnerAssignTime, FinishTime) but currently only exposes them as Prometheus histogram aggregates.
This PR emits those timestamps as individual trace spans, giving per-job visibility into the queue, startup, and execution phases.
Trace correlation
Spans use deterministic IDs (TraceID = MD5(runID-attempt), SpanID = BigEndian(jobID)) that are compatible with tools like otel-explorer, which reconstruct workflow traces from the GitHub API. ARC's runner spans automatically merge into the same trace as the workflow/job/step spans; no correlation configuration is needed.
Configuration
{ "otel_endpoint": "otel-collector.monitoring:4318", "otel_insecure": true }
Or via Helm values:
Files changed
- cmd/ghalistener/metrics/otel.go: OTelRecorder implementing listener.MetricsRecorder
- cmd/ghalistener/metrics/composite.go: CompositeRecorder fan-out wrapper
- cmd/ghalistener/metrics/otel_test.go
- cmd/ghalistener/main.go
- cmd/ghalistener/config/config.go: otel_endpoint and otel_insecure fields
- go.mod / go.sum
Test plan
- go build ./cmd/ghalistener/ compiles clean
🤖 Generated with Claude Code