Feature/stream only tracing #24

Merged
loning merged 11 commits into dev from feature/stream-only-tracing on Mar 5, 2026

@louis4li commented Mar 4, 2026

Summary

  • implement stream-first tracing end-to-end with Jaeger integration
  • standardize tracing correlation across runtime logs and workflow API using:
    • trace_id
    • correlation_id
    • causation_id
  • align HTTP/WebSocket/API tracing visibility with the same 3-key contract
  • refactor observability tracing structure to reduce duplication and keep behavior consistent
  • consolidate tracing documentation and validation guidance around the current baseline
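
As a rough illustration of the 3-key contract listed above, the sketch below carries trace_id, correlation_id, and causation_id together and derives the keys for a follow-up event. It is written in Python for brevity and is not the project's code: the `TracingContext` class, its `child` method, and the inheritance rule (trace_id and correlation_id carried over, causation_id set to the triggering event's id) are assumptions made for this example.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TracingContext:
    """Hypothetical container for the three correlation keys."""

    trace_id: str
    correlation_id: str
    causation_id: str

    def as_log_scope(self) -> dict:
        # The same three keys would be attached to every log entry.
        return asdict(self)

    def child(self, event_id: str) -> "TracingContext":
        # Assumption: a follow-up event inherits trace_id and correlation_id
        # and records the event that triggered it as its causation_id.
        return TracingContext(self.trace_id, self.correlation_id, event_id)


root = TracingContext(trace_id="trace-abc", correlation_id="order-42", causation_id="")
follow_up = root.child(event_id="evt-1")
print(follow_up.as_log_scope())
```

Keeping the three keys in one immutable value makes it harder for a code path to emit only a subset of them, which is the consistency the summary asks for.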

Validation

  • tracing behavior is verified through host API and runtime tracing tests
  • Jaeger setup and verification flow is documented and aligned with current implementation
  • log/API tracing fields are consistent with the 3-key contract

louis4li and others added 11 commits March 4, 2026 10:43
Align wording to emphasize event-class enrichment (not suppression sampling) and refine Jaeger validation guidance so correlation_id stays business-operation scoped instead of command-id constrained.

Made-with: Cursor
… API.

Unify trace, correlation, and causation handling via shared publish and scope helpers so Orleans, Local runtime, and workflow HTTP/WS endpoints expose consistent observability signals and coverage.

Made-with: Cursor
… handle-envelope instrumentation.

This streamlines runtime observability by deleting dead configuration and centralizing activity+log-scope setup in one helper, reducing duplicated Orleans/Local tracing code while preserving behavior.

Made-with: Cursor
The coordinator now focuses on command ack and run-event streaming only, dropping the extra snapshot query.result step and associated query service plumbing in endpoints and tests.

Made-with: Cursor
- Add UseAevatarApiTracingScope middleware for unified trace_id/correlation_id scope
- Remove per-handler BeginApiScope from ChatEndpoints
- Merge jaeger-stream-tracing-validation into workflow-jaeger-observability-guide
- Add log samples and clarify scope vs manual field usage
- Strengthen tracing contract tests for HTTP headers and WS error envelopes

Made-with: Cursor
Reduce duplicated tracing guidance by turning observability docs into a concise runbook, removing obsolete validation placeholder docs, and trimming speculative future evolution from the core stream-first design.

Made-with: Cursor
Capture a scoped quality assessment against dev for tracing documentation changes, including dimension scores, findings, and recommendations for follow-up maintenance.

Made-with: Cursor
Keep trace_id for internal logging and Jaeger correlation only, while preserving correlationId in HTTP/WebSocket contracts and updating tests accordingly.

Made-with: Cursor
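
The split described in this commit (trace_id stays internal, correlationId stays in the HTTP/WebSocket contract) might look roughly like the sketch below. The function name `to_api_envelope` and the payload shape are hypothetical and chosen only for illustration; the actual contract lives in the workflow API code, which is not shown on this page.

```python
def to_api_envelope(internal_scope: dict, payload: dict) -> dict:
    """Build a hypothetical client-facing envelope from internal tracing fields."""
    envelope = dict(payload)  # copy so the caller's payload is not mutated
    # Only the correlation id crosses the API boundary; trace_id and
    # causation_id remain in logs and Jaeger (assumption for this sketch).
    envelope["correlationId"] = internal_scope["correlation_id"]
    return envelope


scope = {"trace_id": "trace-abc", "correlation_id": "order-42", "causation_id": "evt-1"}
envelope = to_api_envelope(scope, {"type": "run.started"})
print(envelope)
```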
… scopes

The middleware pushed trace_id/correlation_id/causation_id into every
request log scope, but the values were nearly always empty strings.
The runtime-level TracingContextHelpers already provides real values
during event processing, and ASP.NET Core Activity scope covers TraceId
at the request level. Removing this layer deduplicates the JSON log
Scopes array from 5 entries to 3 with no information loss.

Made-with: Cursor
- Introduced a new scorecard document detailing the audit findings for the PR review workflow observability sampling ratio.
- Highlighted issues with the current sampling ratio configuration that could lead to host startup failures due to NaN/Infinity values.
- Recommended fixes and additional tests to ensure robust handling of sampling ratio inputs.

This commit aims to enhance the observability and reliability of the workflow by addressing potential configuration pitfalls.
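
One possible guard against the NaN/Infinity pitfall mentioned above is sketched below. The scorecard's actual recommended fix is not shown on this page, so the function name `parse_sampling_ratio`, the fallback default of 1.0, and the clamping to [0, 1] are all assumptions for this example.

```python
import math


def parse_sampling_ratio(raw, default: float = 1.0) -> float:
    """Parse a sampling ratio, falling back to `default` for unusable input."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default  # missing or non-numeric configuration value
    if not math.isfinite(value):
        return default  # rejects NaN, Infinity, and -Infinity
    # Clamp to the valid probability range instead of failing at startup.
    return min(1.0, max(0.0, value))


for raw in ["0.25", "NaN", "Infinity", "7", "-1", "abc", None]:
    print(repr(raw), "->", parse_sampling_ratio(raw))
```

Falling back to a default keeps the host starting with a bad configuration value; a stricter variant could instead raise a descriptive error so the bad value is noticed early.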
…acing

# Conflicts:
#	src/workflow/Aevatar.Workflow.Infrastructure/CapabilityApi/ChatEndpoints.cs
#	src/workflow/Aevatar.Workflow.Infrastructure/CapabilityApi/ChatWebSocketRunCoordinator.cs
#	test/Aevatar.Workflow.Host.Api.Tests/ChatWebSocketCoordinatorAndProtocolTests.cs
@loning merged commit 6451abe into dev Mar 5, 2026
8 checks passed
eanzhao added a commit that referenced this pull request May 8, 2026
- #13 (major, arch): /api/oauth/aevatar-client/rebuild now dispatches
  ProvisionAevatarOAuthClientCommand via IActorDispatchPort.DispatchAsync,
  matching /unbind. The inline actor.HandleEventAsync was a known
  CLAUDE.md "delivery semantics must be runtime-neutral" violation; aligning the two
  endpoints removes the inconsistency under which /rebuild would silently
  bypass any future inbox middleware.
- #24 (minor, design): callback endpoint accepts ?format=json on the URL
  to opt back into the {status:"bound", already_bound, display_name}
  envelope that programmatic CLI/SDK consumers used pre-HTML-render.
  Default stays HTML for browser callbacks.
- #26 (minor, arch): /rebuild now sits behind a RebuildAuthEndpointFilter
  that enforces the admin-token check before model binding and per-request
  DI activation kick in. The filter and the inline check in the handler are
  redundant by design (defense in depth): the filter rejects unauthenticated
  POSTs before deserialization runs, and the handler still validates so that
  hand-rolled tests and integration scenarios cannot bypass the check.
- #28 (minor, design): document the readmodel-deletion contract in the
  ExternalIdentityBindingProjector header — empty BindingId deletes the
  document instead of upserting an inactive record; downstream audit
  consumers must read the committed-event log directly.
- #1 + #2 (blocker, arch): no change needed. Earlier commits in this PR
  already moved /model self-heal to IActorDispatchPort.DispatchAsync and
  removed the EnsureProjectionForActorAsync call from the slash-command
  request path. Verified by reading the current handler.
- #25 (minor, test): documented in the rebuild handler comment — concurrent
  /rebuild calls would race on the same actor, but this is operator-grade
  break-glass and de-duping concurrent rebuilds is out of scope.

Build clean (Identity), 34 OAuth-path tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>