Add a canonical TELEMETRY.md at repo root and remove legacy docs/TELEMETRY.md; update README and CLI disclosure to point to the canonical doc. Implement server-side telemetry ingestion (route CLI/MCP telemetry through /v1/telemetry/events) and introduce a shared telemetry schema/validation in packages/types. Propagate origin attribution changes: the default SDK origin metadata is internal-only, an internal SDK factory is added and MCP now imports it, and the server honors explicit origin params (header-over-query precedence). Expand server telemetry coverage across routes and add/update tests across the CLI, MCP, SDK, Python SDK, server, and types packages. Also add numerous trajectory records tracking these changes.
Preview deployed!
This preview shares the staging database and will be cleaned up when the PR is closed.
Run E2E tests: npm run e2e -- https://pr40-api.relaycast.dev --ci
Pull request overview
This PR implements comprehensive telemetry improvements for the Relaycast system by establishing server-side event tracking and standardizing origin attribution across all clients.
Changes:
- Added shared telemetry schema with validation in @relaycast/types for consistent event structures across CLI, MCP, SDK, and server
- Implemented server-side telemetry capture for 40+ lifecycle events (workspace, agent, channel, message, DM, file, reaction, search, command, webhook, presence, WebSocket sessions)
- Introduced internal-only origin metadata in SDK/MCP to track request sources without exposing configuration to end users
Reviewed changes
Copilot reviewed 87 out of 87 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/types/src/telemetry.ts | New shared telemetry schema with Zod validation, event enums, sanitization, and required property enforcement |
| packages/server/src/lib/telemetry.ts | Server telemetry client with PostHog integration, batching support, and opt-out handling |
| packages/server/src/lib/serverTelemetry.ts | Helper to emit server events with origin attribution from route handlers |
| packages/server/src/lib/origin.ts | Origin metadata extraction from headers/query params with header-over-query precedence |
| packages/server/src/routes/* | Added telemetry events across 15 route files for workspace/agent/channel/message/DM/file/reaction/search/presence/command/webhook operations |
| packages/server/src/durable-objects/* | WebSocket session telemetry in AgentDO and WorkspaceStreamDO with duration tracking |
| packages/sdk/src/origin.ts | SDK origin constants (surface/client/version) |
| packages/sdk/src/internal.ts | Internal factory for SDK/WsClient with custom origin (used by MCP) |
| packages/sdk/src/client.ts | Origin metadata injection via internal Symbol pattern |
| packages/mcp/src/telemetry.ts | Updated to use shared schema validation and send origin in properties |
| packages/cli/src/telemetry.ts | Updated to use shared schema validation and include canonical docs URL |
| packages/python-sdk/src/relay_sdk/* | Added origin metadata parameters to HTTP clients and WsClient |
| TELEMETRY.md | Canonical telemetry documentation with event catalog and opt-out instructions |
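
The header-over-query precedence listed for packages/server/src/lib/origin.ts could look roughly like the sketch below. It is not the PR's implementation: the `x-origin-*` header names, the `resolveOrigin` helper, and the plain-object inputs are assumptions made for illustration. Only the three origin field names come from the diff.

```typescript
// Hypothetical sketch: header names and helper names are assumptions,
// not the actual contents of packages/server/src/lib/origin.ts.
type OriginField = 'origin_surface' | 'origin_client' | 'origin_version';
type OriginMetadata = Partial<Record<OriginField, string>>;
type StringMap = Record<string, string | undefined>;

const ORIGIN_FIELDS: OriginField[] = ['origin_surface', 'origin_client', 'origin_version'];

// Assumed mapping, e.g. origin_surface -> x-origin-surface.
function headerNameFor(field: OriginField): string {
  return `x-${field.replace(/_/g, '-')}`;
}

// Treat missing, empty, and whitespace-only values as absent.
function clean(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Resolve each field independently: a header value wins, and the query
// parameter is consulted only when the header is absent or blank.
// `headers` is assumed to use lowercase keys.
function resolveOrigin(headers: StringMap, query: StringMap): OriginMetadata {
  const origin: OriginMetadata = {};
  for (const field of ORIGIN_FIELDS) {
    const value = clean(headers[headerNameFor(field)]) ?? clean(query[field]);
    if (value !== undefined) origin[field] = value;
  }
  return origin;
}
```

Resolving per field rather than per source means a request can supply `origin_surface` in a header and `origin_client` in the query string, and both are honored.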

        _wasClean: boolean,
      ): Promise<void> {
        // Hibernation API removes the socket from getWebSockets() automatically.
        // If no sockets remain, notify PresenceDO so the agent goes offline immediately.
        const remaining = this.state.getWebSockets();

The `_code` and `_wasClean` parameters are prefixed with underscores, which marks them as intentionally unused, but both are read in the telemetry event properties on lines 471-472. Rename them to `code` and `wasClean` so the names match their actual usage.

      async webSocketClose(
        _ws: WebSocket,
        code: number,
        _reason: string,
        wasClean: boolean,
      ): Promise<void> {

The `_ws` and `_reason` parameters are prefixed with underscores, indicating they are unused, while `wasClean` is read in the telemetry properties. Parameter names should match their usage, so any parameter that is read, such as `wasClean`, should not carry an underscore prefix.

    const meta = await this.state.storage.get<AgentConnectionMeta>('meta');
    if (meta) {
      const doId = this.env.PRESENCE_DO.idFromName(meta.workspaceId);
      const stub = this.env.PRESENCE_DO.get(doId);
      stub.fetch(new Request('http://do/disconnect', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ agentId: meta.agentId, workspaceId: meta.workspaceId }),
      })).catch(() => {});
      const workspaceId = meta.workspaceId;
      const agentId = meta.agentId;
      if (workspaceId && agentId) {
        const doId = this.env.PRESENCE_DO.idFromName(workspaceId);
        const stub = this.env.PRESENCE_DO.get(doId);
        stub.fetch(new Request('http://do/disconnect', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ agentId, workspaceId }),
        })).catch(() => {});
      }

      const origin: TelemetryOrigin = normalizeTelemetryOrigin({
        origin_surface: meta.origin_surface,
        origin_client: meta.origin_client,
        origin_version: meta.origin_version,
      });
      const resolvedWorkspaceId = workspaceId ?? 'unknown_workspace';
      const connectedAt = meta.connectedAtMs ?? Date.now();
      const durationMs = Math.max(Date.now() - connectedAt, 0);

      await captureInternalTelemetry(this.env, {
        event: 'relaycast_server_ws_session_ended',
        distinct_id: workspaceDistinctId(resolvedWorkspaceId),
        origin,
        properties: {
          workspace_id: resolvedWorkspaceId,
          session_scope: meta.sessionScope ?? 'agent',
          duration_ms: durationMs,
          close_code: _code,
          was_clean: _wasClean,
        },
      });
    }

The telemetry event is emitted even when `workspaceId` or `agentId` is missing, in which case the workspace ID falls back to `'unknown_workspace'`. Telemetry is therefore sent with a potentially invalid workspace identifier. Consider emitting telemetry only when valid workspace and agent IDs are available, or ensure the meta always contains valid values before reaching this point.
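
One way to act on this suggestion is to build the event payload in a small pure helper that returns `null` when either ID is missing, so the caller skips emission instead of sending a placeholder. The sketch below is an assumption-laden illustration: `buildSessionEndedEvent`, `ConnectionMeta`, and the injectable `nowMs` argument are hypothetical names, and only the event name and property keys are taken from the diff.

```typescript
// Hypothetical sketch: the helper and its types are illustrative and do
// not exist in the PR. Only the event name and property keys are from the diff.
interface ConnectionMeta {
  workspaceId?: string;
  agentId?: string;
  connectedAtMs?: number;
  sessionScope?: string;
}

interface SessionEndedEvent {
  event: 'relaycast_server_ws_session_ended';
  properties: {
    workspace_id: string;
    session_scope: string;
    duration_ms: number;
    close_code: number;
    was_clean: boolean;
  };
}

// Returns null instead of falling back to 'unknown_workspace', so the
// caller can skip telemetry when the stored meta is incomplete.
function buildSessionEndedEvent(
  meta: ConnectionMeta | undefined,
  code: number,
  wasClean: boolean,
  nowMs: number = Date.now(),
): SessionEndedEvent | null {
  const workspaceId = meta?.workspaceId;
  const agentId = meta?.agentId;
  if (!meta || !workspaceId || !agentId) return null;

  // Clamp at zero in case the stored timestamp is ahead of the current clock.
  const connectedAt = meta.connectedAtMs ?? nowMs;
  return {
    event: 'relaycast_server_ws_session_ended',
    properties: {
      workspace_id: workspaceId,
      session_scope: meta.sessionScope ?? 'agent',
      duration_ms: Math.max(nowMs - connectedAt, 0),
      close_code: code,
      was_clean: wasClean,
    },
  };
}
```

A handler would then call the telemetry client only when the helper returns a non-null event. Whether to drop such events or send them under a sentinel ID is a product decision this sketch does not settle.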

      const remaining = this.state.getWebSockets();
      if (remaining.length > 0) return;

      const meta = await this.state.storage.get<WorkspaceConnectionMeta>('meta');
      const workspaceId = meta?.workspaceId ?? 'unknown_workspace';
      const connectedAt = meta?.connectedAtMs ?? Date.now();
      const durationMs = Math.max(Date.now() - connectedAt, 0);

      await captureInternalTelemetry(this._env, {
        event: 'relaycast_server_ws_session_ended',
        distinct_id: workspaceDistinctId(workspaceId),
        origin: normalizeTelemetryOrigin({
          origin_surface: meta?.origin_surface,
          origin_client: meta?.origin_client,
          origin_version: meta?.origin_version,
        }),
        properties: {
          workspace_id: workspaceId,
          session_scope: meta?.sessionScope ?? 'workspace',
          duration_ms: durationMs,
          close_code: code,
          was_clean: wasClean,
        },
      });
    }

The telemetry event is emitted even when `workspaceId` has fallen back to `'unknown_workspace'`, so telemetry can be sent with a potentially invalid workspace identifier. Consider emitting telemetry only when a valid workspace ID is available from the meta.