
Improve telemetry for server events #40

Merged
willwashburn merged 1 commit into main from more-telemetry on Feb 20, 2026

@willwashburn
Member

Add a canonical TELEMETRY.md at the repo root and remove the legacy docs/TELEMETRY.md; update the README and CLI disclosure to point to the canonical doc. Implement server-side telemetry ingestion (CLI/MCP telemetry is routed through /v1/telemetry/events) and introduce a shared telemetry schema with validation in packages/types. Propagate origin attribution changes: default SDK origin metadata is now internal-only, a new internal SDK factory is added and MCP imports it, and the server honors explicit origin params, with headers taking precedence over query params. Expand server telemetry coverage across routes and add or update tests across the CLI, MCP, SDK, Python SDK, server, and types packages. Also add numerous trajectory records tracking these changes.
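To make the header-over-query rule concrete, here is a minimal TypeScript sketch of how a server could resolve origin fields. It assumes hypothetical header names (x-relaycast-origin-*) and that query parameters reuse the origin_surface / origin_client / origin_version names that appear elsewhere in this PR; it is an illustration, not the code in packages/server/src/lib/origin.ts.

```typescript
// Minimal sketch: resolve origin fields with header-over-query precedence.
// ASSUMPTION: the header names are hypothetical; only the precedence rule and
// the origin_* field names come from the PR.

interface OriginFields {
  origin_surface?: string;
  origin_client?: string;
  origin_version?: string;
}

type OriginField = keyof OriginFields;

// Hypothetical header name for each origin field (lowercase, since header
// lookups are case-insensitive and are commonly normalized to lowercase).
const ORIGIN_HEADERS: Record<OriginField, string> = {
  origin_surface: "x-relaycast-origin-surface",
  origin_client: "x-relaycast-origin-client",
  origin_version: "x-relaycast-origin-version",
};

type StringMap = Record<string, string | undefined>;

function resolveOrigin(headers: StringMap, query: StringMap): OriginFields {
  const origin: OriginFields = {};
  for (const field of Object.keys(ORIGIN_HEADERS) as OriginField[]) {
    const fromHeader = headers[ORIGIN_HEADERS[field]]?.trim();
    const fromQuery = query[field]?.trim();
    // A non-empty header value wins; otherwise fall back to the query param.
    const value = fromHeader || fromQuery;
    if (value) origin[field] = value;
  }
  return origin;
}

// Usage: header and query disagree on the surface, so the header wins;
// origin_client comes only from the query; origin_version stays unset.
const resolved = resolveOrigin(
  { "x-relaycast-origin-surface": "mcp" },
  { origin_surface: "cli", origin_client: "relaycast-cli" },
);
```

In this sketch a header that is empty after trimming falls through to the query parameter; the PR does not say whether the real implementation behaves the same way.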

@github-actions

Preview deployed!

Environment URL
API https://pr40-api.relaycast.dev
Health https://pr40-api.relaycast.dev/health
Observer https://pr40-observer.relaycast.dev

This preview shares the staging database and will be cleaned up when the PR is closed.

Run E2E tests

npm run e2e -- https://pr40-api.relaycast.dev --ci

Open observer dashboard

https://pr40-observer.relaycast.dev

Copilot AI left a comment

Pull request overview

This PR implements comprehensive telemetry improvements for the Relaycast system by establishing server-side event tracking and standardizing origin attribution across all clients.

Changes:

  • Added shared telemetry schema with validation in @relaycast/types for consistent event structures across CLI, MCP, SDK, and server
  • Implemented server-side telemetry capture for 40+ lifecycle events (workspace, agent, channel, message, DM, file, reaction, search, command, webhook, presence, WebSocket sessions)
  • Introduced internal-only origin metadata in SDK/MCP to track request sources without exposing configuration to end users
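The shared schema is described as Zod-based, with event enums, sanitization, and required-property enforcement. The TypeScript sketch below illustrates those three ideas without Zod so that it stays dependency-free. The event name comes from the PR; the required-property list (a subset of what the ws_session_ended call site sends) and the sanitization rules (primitive values only, strings capped at a hypothetical MAX_STRING_LENGTH) are assumptions.

```typescript
// Dependency-free sketch of "event enum + required properties + sanitization".
// The PR uses Zod in @relaycast/types; this is a hand-rolled stand-in.
// ASSUMPTIONS: the required-property list and the sanitization rules are illustrative.

type Primitive = string | number | boolean;

// Event "enum" keyed by event name; the values list required properties.
const REQUIRED_PROPERTIES = {
  relaycast_server_ws_session_ended: ["workspace_id", "session_scope", "duration_ms"],
} as const;

type TelemetryEventName = keyof typeof REQUIRED_PROPERTIES;

const MAX_STRING_LENGTH = 200; // hypothetical cap

// Keep only primitive values; drop objects, arrays, null, undefined, NaN/Infinity.
function sanitizeProperties(props: Record<string, unknown>): Record<string, Primitive> {
  const out: Record<string, Primitive> = {};
  for (const [key, value] of Object.entries(props)) {
    if (typeof value === "string") out[key] = value.slice(0, MAX_STRING_LENGTH);
    else if (typeof value === "number" && Number.isFinite(value)) out[key] = value;
    else if (typeof value === "boolean") out[key] = value;
  }
  return out;
}

type ValidationResult =
  | { ok: true; event: TelemetryEventName; properties: Record<string, Primitive> }
  | { ok: false; error: string };

function validateEvent(event: string, props: Record<string, unknown>): ValidationResult {
  // Reject names outside the known event set.
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_PROPERTIES, event)) {
    return { ok: false, error: `unknown event: ${event}` };
  }
  const name = event as TelemetryEventName;
  const properties = sanitizeProperties(props);
  // Required properties are checked after sanitization, so a required key
  // holding an object or null counts as missing.
  const missing = REQUIRED_PROPERTIES[name].filter((key) => !(key in properties));
  if (missing.length > 0) {
    return { ok: false, error: `missing required properties: ${missing.join(", ")}` };
  }
  return { ok: true, event: name, properties };
}
```

Checking required keys after sanitization is a deliberate choice in this sketch: it prevents a malformed value from satisfying the requirement merely by being present.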

Reviewed changes

Copilot reviewed 87 out of 87 changed files in this pull request and generated 4 comments.

File Description
packages/types/src/telemetry.ts New shared telemetry schema with Zod validation, event enums, sanitization, and required property enforcement
packages/server/src/lib/telemetry.ts Server telemetry client with PostHog integration, batching support, and opt-out handling
packages/server/src/lib/serverTelemetry.ts Helper to emit server events with origin attribution from route handlers
packages/server/src/lib/origin.ts Origin metadata extraction from headers/query params with header-over-query precedence
packages/server/src/routes/* Added telemetry events across 15 route files for workspace/agent/channel/message/DM/file/reaction/search/presence/command/webhook operations
packages/server/src/durable-objects/* WebSocket session telemetry in AgentDO and WorkspaceStreamDO with duration tracking
packages/sdk/src/origin.ts SDK origin constants (surface/client/version)
packages/sdk/src/internal.ts Internal factory for SDK/WsClient with custom origin (used by MCP)
packages/sdk/src/client.ts Origin metadata injection via internal Symbol pattern
packages/mcp/src/telemetry.ts Updated to use shared schema validation and send origin in properties
packages/cli/src/telemetry.ts Updated to use shared schema validation and include canonical docs URL
packages/python-sdk/src/relay_sdk/* Added origin metadata parameters to HTTP clients and WsClient
TELEMETRY.md Canonical telemetry documentation with event catalog and opt-out instructions
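The "internal Symbol pattern" listed for packages/sdk/src/client.ts, together with the internal factory MCP uses, can be illustrated roughly as follows. This is a sketch under assumptions: the symbol, class, and factory names and the default origin values are all hypothetical; the PR only states that origin metadata is injected via an internal Symbol and that MCP goes through an internal factory.

```typescript
// Sketch of origin injection through a module-private Symbol key.
// ASSUMPTION: every identifier and default value here is hypothetical.

interface OriginMeta {
  surface: string;
  client: string;
  version: string;
}

// Not exported from the public entry point, so ordinary SDK users have no
// handle on it; an internal module (like the one MCP imports) would share it.
const INTERNAL_ORIGIN: unique symbol = Symbol("relaycast.internalOrigin");

interface ClientOptions {
  apiKey: string;
}

// Internal-only widening of the public options type.
interface InternalClientOptions extends ClientOptions {
  [INTERNAL_ORIGIN]?: OriginMeta;
}

const DEFAULT_ORIGIN: OriginMeta = { surface: "sdk", client: "relaycast-sdk", version: "0.0.0" };

class RelayClientSketch {
  readonly apiKey: string;
  readonly origin: OriginMeta;

  // The public signature mentions only ClientOptions; the symbol key is read
  // at runtime if an internal caller supplied it.
  constructor(options: ClientOptions) {
    this.apiKey = options.apiKey;
    this.origin = (options as InternalClientOptions)[INTERNAL_ORIGIN] ?? DEFAULT_ORIGIN;
  }
}

// Public factory: always reports the default SDK origin.
function createClient(apiKey: string): RelayClientSketch {
  return new RelayClientSketch({ apiKey });
}

// Internal factory: lets a first-party caller such as MCP set its own origin.
function createInternalClient(apiKey: string, origin: OriginMeta): RelayClientSketch {
  const options: InternalClientOptions = { apiKey, [INTERNAL_ORIGIN]: origin };
  return new RelayClientSketch(options);
}
```

A side benefit of a Symbol key is that JSON.stringify and Object.keys skip it, so the internal option does not leak into serialized or enumerated configuration.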

Comment on lines 434 to 438
  _wasClean: boolean,
): Promise<void> {
  // Hibernation API removes the socket from getWebSockets() automatically.
  // If no sockets remain, notify PresenceDO so the agent goes offline immediately.
  const remaining = this.state.getWebSockets();

Copilot AI Feb 20, 2026

The _code and _wasClean parameters are prefixed with underscores indicating they're intentionally unused, but they're actually used in the telemetry event properties on lines 471-472. Either use the parameters without underscores or ensure the naming is consistent with their actual usage.

Copilot uses AI. Check for mistakes.
Comment on lines +89 to +94
async webSocketClose(
  _ws: WebSocket,
  code: number,
  _reason: string,
  wasClean: boolean,
): Promise<void> {

Copilot AI Feb 20, 2026

The _ws, _reason, and _wasClean parameters are prefixed with underscores indicating they're unused, but wasClean is actually used in the properties. The parameter names should match their usage - either rename _wasClean to wasClean or ensure naming consistency.

Comment on lines +440 to 475
  const meta = await this.state.storage.get<AgentConnectionMeta>('meta');
  if (meta) {
    const doId = this.env.PRESENCE_DO.idFromName(meta.workspaceId);
    const stub = this.env.PRESENCE_DO.get(doId);
    stub.fetch(new Request('http://do/disconnect', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ agentId: meta.agentId, workspaceId: meta.workspaceId }),
    })).catch(() => {});
    const workspaceId = meta.workspaceId;
    const agentId = meta.agentId;
    if (workspaceId && agentId) {
      const doId = this.env.PRESENCE_DO.idFromName(workspaceId);
      const stub = this.env.PRESENCE_DO.get(doId);
      stub.fetch(new Request('http://do/disconnect', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ agentId, workspaceId }),
      })).catch(() => {});
    }

    const origin: TelemetryOrigin = normalizeTelemetryOrigin({
      origin_surface: meta.origin_surface,
      origin_client: meta.origin_client,
      origin_version: meta.origin_version,
    });
    const resolvedWorkspaceId = workspaceId ?? 'unknown_workspace';
    const connectedAt = meta.connectedAtMs ?? Date.now();
    const durationMs = Math.max(Date.now() - connectedAt, 0);

    await captureInternalTelemetry(this.env, {
      event: 'relaycast_server_ws_session_ended',
      distinct_id: workspaceDistinctId(resolvedWorkspaceId),
      origin,
      properties: {
        workspace_id: resolvedWorkspaceId,
        session_scope: meta.sessionScope ?? 'agent',
        duration_ms: durationMs,
        close_code: _code,
        was_clean: _wasClean,
      },
    });
  }

Copilot AI Feb 20, 2026

The telemetry event is emitted even if workspaceId or agentId are missing (when they are set to 'unknown_workspace'). This means telemetry will be sent with potentially invalid workspace identifiers. Consider only emitting telemetry when valid workspace and agent IDs are available, or ensure the meta always contains valid values before reaching this point.
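One way to act on this suggestion is to build the event only when the stored metadata carries real identifiers, and to skip emission otherwise. The TypeScript sketch below is a hypothetical pure helper (ConnectionMetaSketch, SessionEndedEvent, and buildSessionEndedEvent are illustrative names; the meta fields and event properties mirror the AgentDO excerpt above) that returns null instead of substituting 'unknown_workspace'. Because it reads `code` and `wasClean`, those parameters carry no underscore prefix, which is also the point of the earlier naming comment.

```typescript
// Minimal sketch: emit ws_session_ended only when identifiers are present,
// instead of falling back to 'unknown_workspace'.
// ASSUMPTION: all type and function names here are hypothetical.

interface ConnectionMetaSketch {
  workspaceId?: string;
  agentId?: string;
  sessionScope?: string;
  connectedAtMs?: number;
}

interface SessionEndedEvent {
  event: "relaycast_server_ws_session_ended";
  properties: {
    workspace_id: string;
    session_scope: string;
    duration_ms: number;
    close_code: number;
    was_clean: boolean;
  };
}

function buildSessionEndedEvent(
  meta: ConnectionMetaSketch | undefined,
  code: number,      // read below, so no underscore prefix
  wasClean: boolean, // read below, so no underscore prefix
  nowMs: number,
  options: { defaultScope: string; requireAgentId: boolean },
): SessionEndedEvent | null {
  const workspaceId = meta?.workspaceId;
  // Without a workspace ID there is no trustworthy distinct_id, so skip.
  if (!meta || !workspaceId) return null;
  // Agent-scoped sessions additionally require an agent ID.
  if (options.requireAgentId && !meta.agentId) return null;

  const connectedAt = meta.connectedAtMs ?? nowMs;
  return {
    event: "relaycast_server_ws_session_ended",
    properties: {
      workspace_id: workspaceId,
      session_scope: meta.sessionScope ?? options.defaultScope,
      // Clamped at 0, as in the PR excerpt.
      duration_ms: Math.max(nowMs - connectedAt, 0),
      close_code: code,
      was_clean: wasClean,
    },
  };
}

// Usage: an agent session that lasted 3.5 seconds.
const agentEvent = buildSessionEndedEvent(
  { workspaceId: "ws_123", agentId: "agent_1", connectedAtMs: 1_000 },
  1000,
  true,
  4_500,
  { defaultScope: "agent", requireAgentId: true },
);
```

The caller would invoke captureInternalTelemetry only for a non-null result. The trade-off is that sessions with incomplete metadata disappear from telemetry entirely, so counting skipped events separately may be worth considering.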

Comment on lines +95 to +119
  const remaining = this.state.getWebSockets();
  if (remaining.length > 0) return;

  const meta = await this.state.storage.get<WorkspaceConnectionMeta>('meta');
  const workspaceId = meta?.workspaceId ?? 'unknown_workspace';
  const connectedAt = meta?.connectedAtMs ?? Date.now();
  const durationMs = Math.max(Date.now() - connectedAt, 0);

  await captureInternalTelemetry(this._env, {
    event: 'relaycast_server_ws_session_ended',
    distinct_id: workspaceDistinctId(workspaceId),
    origin: normalizeTelemetryOrigin({
      origin_surface: meta?.origin_surface,
      origin_client: meta?.origin_client,
      origin_version: meta?.origin_version,
    }),
    properties: {
      workspace_id: workspaceId,
      session_scope: meta?.sessionScope ?? 'workspace',
      duration_ms: durationMs,
      close_code: code,
      was_clean: wasClean,
    },
  });
}

Copilot AI Feb 20, 2026

The telemetry event is emitted even if workspaceId is 'unknown_workspace'. This means telemetry will be sent with potentially invalid workspace identifiers. Consider only emitting telemetry when a valid workspace ID is available from the meta.

willwashburn merged commit 5bd6534 into main on Feb 20, 2026
7 checks passed
willwashburn deleted the more-telemetry branch on February 20, 2026 at 02:26

Labels: None yet
Projects: None yet

2 participants