feat: comprehensive telemetry instrumentation#39
Conversation
…d data moat Instrument 25 event types across the CLI lifecycle — auth, MCP servers, Python engine, provider errors, permissions, upgrades, context utilization, agent outcomes, workflow sequencing, and environment census. Adds telemetry docs page, firewall endpoint, 25 unit tests, and privacy-safe helpers (categorizeToolName, bucketCount). No behavioral changes; all telemetry is opt-out via config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jontsai
left a comment
There was a problem hiding this comment.
LGTM
Comprehensive telemetry instrumentation with good privacy practices. Verified:
✓ 25 well-typed event types with consistent snake_case naming
✓ Privacy-safe: no SQL/code/credentials/paths, error messages truncated to 500 chars
✓ categorizeToolName() and bucketCount() are smart privacy helpers
✓ sequence_index + previous_tool on tool_call enables workflow pattern analysis
✓ Engine error tracking across phases (uv_download, venv_create, pip_install) — good for debugging install issues
✓ Docs page with event table, opt-out instructions, and contributor guide
✓ 25 tests covering helpers, events, and privacy validation
Minor observations:
agent_outcomehasdoom_loops: 0, permission_denials: 0hardcoded — worth wiring up actual counters in a follow-upcategorizeToolNameuses substring matching which could miscategorize (e.g., a tool namedsql_warehouse_connectionhitssqlfirst) — ordering matters, current priority seems reasonable
Ship it! 🚢
|
|
||
| export async function process(input: { | ||
| parentID: string | ||
| messages: MessageV2.WithParts[] |
There was a problem hiding this comment.
issue: compactionAttempt is a module-level let — it persists across sessions in the same process. If a user runs multiple sessions (or the CLI is long-lived), the counter leaks between them. Should this be scoped inside process() or reset per-session?
| permission_denials: 0, | ||
| outcome, | ||
| }) | ||
| Telemetry.track({ |
There was a problem hiding this comment.
suggestion: doom_loops: 0 and permission_denials: 0 are hardcoded — they'll never reflect actual counts. Consider tracking these via counters in the prompt loop (increment on doom_loop_detected / permission_denied events) and wiring them in here, or remove the fields until they're real to avoid misleading dashboards.
| const remoteResourcesList = await client.listResources().catch(() => ({ resources: [] })) | ||
| Telemetry.track({ | ||
| type: "mcp_server_census", | ||
| timestamp: Date.now(), |
There was a problem hiding this comment.
thought: listTools() and listResources() are awaited inline during the connect path. If an MCP server is slow to respond to these, it adds latency to the connection sequence. Consider firing these asynchronously (fire-and-forget with .catch) since census data is purely telemetry and shouldn't block the critical path.
| cache_hit_ratio: totalInput > 0 ? Math.round((cacheRead / totalInput) * 1000) / 1000 : 0, | ||
| }) | ||
| } | ||
| if (snapshot) { |
There was a problem hiding this comment.
issue: generation_number is set to toolCallCounter, but these count different things — toolCallCounter tracks tool calls, not generations/steps. This will overcount in multi-tool turns and undercount in zero-tool turns. Should this use a separate generation counter (or step from the prompt loop)?
| if (type === "mcp") return "mcp" | ||
| const n = name.toLowerCase() | ||
| if (n.includes("sql") || n.includes("query")) return "sql" | ||
| if (n.includes("schema") || n.includes("column") || n.includes("table")) return "schema" |
There was a problem hiding this comment.
suggestion: The substring match ordering can cause edge-case miscategorization. For example, warehouse_usage_stats matches warehouse before reaching the warehouse_usage → finops check. Currently the finops check with warehouse_usage comes before the warehouse check so it works — but this ordering dependency is fragile. Consider using a more explicit matching approach (exact prefixes or a lookup map) to make this order-independent.
| Telemetry.track({ | ||
| type: "upgrade_attempted", | ||
| timestamp: Date.now(), | ||
| session_id: Telemetry.getContext().sessionId || "cli", |
There was a problem hiding this comment.
nit: The as "npm" | "bun" | "brew" cast silently maps unknown methods (like choco, yarn, etc.) to "npm". Consider adding an "other" bucket to the type union and using it as the fallback, so dashboards can distinguish actual npm installs from unknown methods.
jontsai
left a comment
There was a problem hiding this comment.
PR #39 Review Tour Guide 🗺️
Overview
+979/-9 across 15 files — comprehensive telemetry instrumentation adding 25 event types across the CLI lifecycle. Single commit by @anandgupta42 + Claude.
File Order
| # | File | +/- | Purpose |
|---|---|---|---|
| 1 | docs/docs/configure/telemetry.md |
+122 | New telemetry docs page |
| 2 | docs/docs/network.md |
+1 | AppInsights firewall endpoint |
| 3 | docs/mkdocs.yml |
+1 | Nav entry |
| 4 | src/bridge/engine.ts |
+49/-3 | engine_started, engine_error events |
| 5 | src/cli/cmd/auth.ts |
+83 | auth_login (7 paths), auth_logout |
| 6 | src/installation/index.ts |
+22 | upgrade_attempted |
| 7 | src/mcp/index.ts |
+75/-1 | mcp_server_status + mcp_server_census |
| 8 | src/permission/next.ts |
+19/-1 | permission_denied |
| 9 | src/session/compaction.ts |
+10 | compaction_triggered |
| 10 | src/session/index.ts |
+10 | session_forked |
| 11 | src/session/processor.ts |
+70/-4 | Tool sequencing, context_utilization, error_recovered, provider_error, doom_loop_detected |
| 12 | src/session/prompt.ts |
+24 | agent_outcome |
| 13 | src/telemetry/index.ts |
+168 | 15 new event types + categorizeToolName() + bucketCount() |
| 14 | src/tool/project-scan.ts |
+41 | environment_census |
| 15 | test/telemetry/telemetry.test.ts |
+284 | 25 tests, 81 assertions |
Tour Highlights
🟢 What's Good
- Privacy design is solid — no SQL/code/creds, errors truncated to 500 chars, counts bucketed via
bucketCount(), file paths excluded - Type system — 25 event types as a discriminated union, compile-time safe
- Comprehensive coverage — auth (7 paths!), MCP lifecycle, engine, permissions, compaction, context utilization
- Tests — 284 lines covering helpers, privacy validation, naming conventions, and all event types
- Docs — clear telemetry page with event table, privacy policy, opt-out, and contributor guide
🟡 Issues Found (6 inline comments)
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | issue | compaction.ts |
Module-level compactionAttempt counter leaks across sessions |
| 2 | suggestion | prompt.ts |
doom_loops: 0 and permission_denials: 0 hardcoded — misleading data |
| 3 | thought | mcp/index.ts |
listTools()/listResources() census calls block the MCP connect path |
| 4 | issue | processor.ts |
generation_number uses toolCallCounter — wrong semantic |
| 5 | suggestion | telemetry/index.ts |
categorizeToolName substring ordering is fragile |
| 6 | nit | installation/index.ts |
Unknown upgrade methods silently map to "npm" |
🟢 No Red Flags
- No security concerns — telemetry is opt-out, privacy-safe
- No breaking changes to existing behavior
- Error paths re-throw properly after tracking
Summary
Solid PR. The 6 inline comments are mostly minor follow-ups — #1 (module-level state) and #4 (wrong counter) are the most impactful. None are blockers for shipping.
- compaction.ts: scope compactionAttempt per-session via Map instead of module-level counter that leaked across sessions - prompt.ts: remove hardcoded doom_loops/permission_denials fields from agent_outcome (were always 0); doom loops and permission denials are already tracked as separate events - mcp/index.ts: make mcp_server_census fire-and-forget so listTools/ listResources never block the MCP connect critical path - processor.ts: use dedicated generationCounter for context_utilization instead of toolCallCounter which counts a different thing - telemetry/index.ts: refactor categorizeToolName to use a pattern array with explicit ordering, making the match logic order-independent and easier to maintain - installation/index.ts: add "other" to upgrade method union instead of silently casting unknown methods to "npm" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
suryaiyer95
left a comment
There was a problem hiding this comment.
Multi-Model Code Review — 6 AI Models
Verdict: REQUEST CHANGES
Critical: 1 | Major: 7 | Minor: 4 | Nits: 2
Reviewed by Claude Opus 4.6, Kimi K2.5, Grok 4, MiniMax M2.5, GLM-5, Qwen 3.5 Plus — converged in 1 round.
Critical Issues
1. Docs claim path scrubbing + 500 char truncation, but code doesn't match
Security / Documentation — src/session/processor.ts:459, docs/docs/configure/telemetry.md:80
The telemetry docs state: "Error messages are truncated to 500 characters and scrubbed of file paths before sending."
Two problems:
(a) In processor.ts:459, the error event truncates to 1000 characters, not 500:
error_message: (e?.message ?? String(e)).slice(0, 1000), // line 459While tool_call error (line 263) and provider_error (line 488) correctly truncate to 500.
(b) There is no path scrubbing implementation anywhere. Error messages containing full filesystem paths (e.g., /Users/john/projects/secret-client/...) are sent as-is.
Fix: Standardize to 500 chars with a const MAX_ERROR_LENGTH = 500, implement a scrubPaths() helper, or update docs to reflect reality.
Flagged by: Claude, Kimi (Consensus)
Major Issues
2. agent_outcome hardcodes doom_loops: 0 and permission_denials: 0
Logic Error — src/session/prompt.ts:808-809
These fields always send zero despite doom_loop_detected and permission_denied events being emitted elsewhere. The outcome summary loses correlation with these events.
Fix: Add session-scoped counters and pass actual values, or remove these fields.
Flagged by: Kimi, MiniMax, Qwen, GLM-5 (Consensus)
3. ALTIMATE_TELEMETRY_DISABLED env var documented but not implemented
Bug / Documentation — docs/docs/configure/telemetry.md:63-64, src/telemetry/index.ts:360-383
The docs say export ALTIMATE_TELEMETRY_DISABLED=true disables telemetry, but init() only checks userConfig.telemetry?.disabled. The env var is never read.
Fix: Add || process.env.ALTIMATE_TELEMETRY_DISABLED === 'true' to the init check.
Flagged by: Claude (Unique)
4. Module-level compactionAttempt counter never resets
Logic Error — src/session/compaction.ts:156
compactionAttempt persists across sessions. If Session A triggers 3 compactions, Session B's first compaction reports attempt: 4.
Fix: Make the counter session-scoped.
Flagged by: MiniMax, Kimi, Qwen, GLM-5 (Consensus)
5. generation_number in context_utilization uses tool call count
Logic Error — src/session/processor.ts:359
generation_number: toolCallCounter, // counts tool invocations, not generationsA single generation can produce multiple tool calls. The field name implies generation count.
Fix: Add a separate generationCounter incremented in finish-step.
Flagged by: Claude (Unique)
6. MCP disconnect hardcodes transport: "stdio"
Logic Error — src/mcp/index.ts:634
transport: "stdio", // Wrong for SSE/streamable-http serversThe disconnect function always reports "stdio" regardless of actual transport type.
Fix: Track the transport type when connecting and look it up on disconnect.
Flagged by: GLM-5 (Convergence fix)
7. upgrade_attempted silently loses method data
Logic Error — src/installation/index.ts:180, 201
const telemetryMethod = (["npm", "bun", "brew"].includes(method) ? method : "npm") as "npm" | "bun" | "brew"Methods like pnpm, yarn, curl, scoop, choco are silently mapped to "npm".
Fix: Expand the type to include all methods.
Flagged by: GLM-5 (Convergence fix)
8. agent_outcome never produces "error" outcome
Logic Error — src/session/prompt.ts:793-797
const outcome = abort.aborted ? "abandoned"
: sessionTotalCost === 0 && toolCallCount === 0 ? "abandoned"
: "completed" // "error" is never producedSessions ending with errors are incorrectly reported as "completed".
Fix: Track a sessionHadError flag and use it in outcome determination.
Flagged by: GLM-5 (Convergence fix)
Minor Issues
9. Session context not updated after fork
src/session/index.ts:275 — After session_forked is tracked, Telemetry.setContext() isn't called with the new session ID. Subsequent events may be attributed to the parent session. (MiniMax)
10. categorizeToolName has order-dependent keyword overlap
src/telemetry/index.ts:259-270 — A tool named warehouse_query_stats matches "query" → "sql" before "warehouse". (Claude)
11. Hardcoded session_id: "cli" in auth events
src/cli/cmd/auth.ts (10+ locations) — Should use Telemetry.getContext().sessionId || "cli" for consistency. (Kimi, Qwen, GLM-5)
12. Privacy tests verify type shapes, not runtime behavior
test/telemetry/telemetry.test.ts:174-246 — Tests construct events manually and check field absence, but don't verify actual track() call sites truncate error messages. (Claude, Kimi, Qwen)
Nits
- Redundant
.toLowerCase()atsrc/telemetry/index.ts:268—nis already lowercased on line 261. - Unnecessary
as constatsrc/session/processor.ts:214, 250— TypeScript infers literal types from string literal ternaries.
Positive Observations
- Privacy-by-design:
categorizeToolName()andbucketCount()are thoughtful privacy primitives. Tool arguments/outputs never included. - Fail-safe telemetry: Silent catch in
flush(),unref()on timer,!enabledguard — telemetry never breaks the CLI. - Strong type safety: Discriminated union
Telemetry.Eventensures compile-time correctness for all 25 event variants. - Comprehensive documentation: Event table, privacy policy, opt-out instructions, contributor guide.
- Clean App Insights integration: Proper envelope serialization, connection string parsing, batched flush model.
Missing Tests
toAppInsightsEnvelopes()serializationparseConnectionString()edge casesflush()/shutdown()behavior- Buffer overflow (> 200 events)
init()respecting disabled config / env var- Network failure silent handling
Finding Attribution
| Issue | Origin | Type |
|---|---|---|
| 1. Error truncation + path scrubbing mismatch | Claude, Kimi | Consensus |
2. Hardcoded zeros in agent_outcome |
Kimi, MiniMax, Qwen, GLM-5 | Consensus |
3. ALTIMATE_TELEMETRY_DISABLED env var not implemented |
Claude | Unique |
4. compactionAttempt counter never resets |
MiniMax, Kimi, Qwen, GLM-5 | Consensus |
5. generation_number uses wrong counter |
Claude | Unique |
6. MCP disconnect hardcodes transport: "stdio" |
GLM-5 | Convergence fix |
7. upgrade_attempted loses method data |
GLM-5 | Convergence fix |
8. agent_outcome never produces "error" |
GLM-5 | Convergence fix |
| 9. Session context not updated after fork | MiniMax | Unique |
10. categorizeToolName order-dependent |
Claude | Unique |
11. Hardcoded session_id: "cli" |
Kimi, Qwen, GLM-5 | Consensus |
| 12. Privacy tests don't verify runtime | Claude, Kimi, Qwen | Consensus |
13. Redundant .toLowerCase() |
Claude, Kimi, GLM-5 | Consensus |
14. Unnecessary as const |
Qwen | Unique |
- Standardize error message truncation to 500 chars (was 1000 in processor.ts) - Implement ALTIMATE_TELEMETRY_DISABLED env var check in telemetry init() - Track MCP transport type on connect, use correct transport in disconnect - Add sessionHadError flag so agent_outcome produces "error" outcome - Use Telemetry.getContext().sessionId in auth events instead of hardcoded "cli" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address telemetry review issues from PR #39 - Standardize error message truncation to 500 chars (was 1000 in processor.ts) - Implement ALTIMATE_TELEMETRY_DISABLED env var check in telemetry init() - Track MCP transport type on connect, use correct transport in disconnect - Add sessionHadError flag so agent_outcome produces "error" outcome - Use Telemetry.getContext().sessionId in auth events instead of hardcoded "cli" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: initialize telemetry early and buffer pre-init events Telemetry.init() was only called in SessionPrompt.next(), but track() calls from MCP connect, engine startup, and auth commands fire earlier. Events tracked before init() were silently dropped. - Buffer events in track() regardless of enabled state (pre-init events are kept and flushed once init() completes) - Clear buffer when init() determines telemetry is disabled - Call init() in main CLI middleware (covers auth, serve, etc.) - Call init() in worker thread startup (covers MCP/engine events) - Keep existing init() in session prompt as idempotent safety net Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: robust telemetry init with promise deduplication and pre-init buffering The previous commit moved init() earlier but had subtle issues: - No init-once guard when telemetry disabled (init ran repeatedly) - Events during async init() were dropped - Concurrent non-awaited init() calls could race Fix: - Use promise deduplication: init() returns same promise on repeated calls - Use initDone flag: track() buffers during init, drops after disable - Clear buffer on every disabled/error exit path in doInit() - Reset initPromise/initDone in shutdown() for test isolation - Add tests for pre-init buffering, post-disable dropping, idempotency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle Config.get() and Control.account() failures in telemetry init Config.get() throws "No context found for instance" when called before Instance.provide() (e.g. CLI middleware, worker thread startup). This caused the outer catch to clear the buffer and disable telemetry entirely. Fix: wrap Config.get() and Control.account() in individual try/catch blocks so failures don't prevent telemetry initialization. The env var ALTIMATE_TELEMETRY_DISABLED is the early-init escape hatch. Verified end-to-end: pre-init events are buffered and flushed to App Insights after init completes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: hash user email before sending to telemetry Raw email is PII — replace with SHA-256 hash to preserve anonymous user correlation without leaking personally identifiable information. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Code Review: PR #39 — Comprehensive Telemetry InstrumentationVerdict: REQUEST CHANGES (2 critical, 5 major issues) Critical Issues1. PII Leakage — Raw Email in
|
| Issue | Origin | Type |
|---|---|---|
| PII: raw email in ai.user.id | MiniMax, Kimi | Consensus |
| Missing timeout for listPrompts/listResources | Gemini | Unique |
| Hardcoded App Insights key / opt-out gap | Claude, Codex, Grok, Kimi, MiniMax, GLM-5, Qwen | Consensus |
| MCP disconnect hardcodes "stdio" | Codex, Kimi, MiniMax | Consensus |
| MCP tool type detection mismatch (mcp__ prefix) | Gemini | Unique |
| MCP prompt template getter anti-pattern | Claude | Unique |
| Truncate.cleanup fault isolation | Codex | Unique |
| Empty session_id in early events | Gemini, Kimi, MiniMax, GLM-5, Qwen | Consensus |
| No session_end on process crash | GLM-5 | Unique |
| compactionAttempts unbounded | Codex, GLM-5, Qwen | Consensus |
| ai.cloud.role "altimate-code" after rename | Gemini, Kimi, MiniMax | Consensus |
| Error truncation inconsistent | GLM-5 | Unique |
| skill_count: 0 hardcoded | Codex | Unique |
| Context utilization includes cache_write | Claude | Unique |
| Buffer overflow drops silently | Claude, Kimi, Qwen | Consensus |
Reviewed by 8 models: Claude, Codex (OpenAI), Gemini (Google), Kimi (Moonshot), Grok (xAI), MiniMax, GLM-5 (Zhipu), Qwen (Alibaba). 1 convergence round with corrections incorporated.
Fixes 12 issues identified in the comprehensive code review of PR #39: Critical: - Wrap MCP listPrompts/listResources in withTimeout() to prevent CLI hangs Major: - Fix MCP tool type detection: add MCP.isMcpTool() instead of broken startsWith("mcp__") prefix check - Fix MCP prompt template getter: replace Promise constructor anti-pattern with .then() chain, throw on undefined template - Fix Truncate.cleanup: wrap Identifier.timestamp() per-entry in try/catch to prevent single malformed filename from aborting cleanup Minor: - Fix ai.cloud.role tag: "altimate-code" → "altimate" - Fix error truncation: use .message instead of .toString() to avoid leaking stack traces with file paths - Fix skill_count: populate from Skill.all() instead of hardcoded 0 - Fix context utilization: remove cache_write from token count - Fix buffer overflow: add dropped events counter for telemetry - Fix compactionAttempts: delete session entry on completion/error - Fix empty session_id: use "startup" fallback for early events - Fix session_end: add beforeExit handler for early process termination Docs: - Fix config path in telemetry docs (was ~/.config/altimate/config.json) - Add telemetry field to config schema table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address remaining code review issues from PR #39 Fixes 12 issues identified in the comprehensive code review of PR #39: Critical: - Wrap MCP listPrompts/listResources in withTimeout() to prevent CLI hangs Major: - Fix MCP tool type detection: add MCP.isMcpTool() instead of broken startsWith("mcp__") prefix check - Fix MCP prompt template getter: replace Promise constructor anti-pattern with .then() chain, throw on undefined template - Fix Truncate.cleanup: wrap Identifier.timestamp() per-entry in try/catch to prevent single malformed filename from aborting cleanup Minor: - Fix ai.cloud.role tag: "altimate-code" → "altimate" - Fix error truncation: use .message instead of .toString() to avoid leaking stack traces with file paths - Fix skill_count: populate from Skill.all() instead of hardcoded 0 - Fix context utilization: remove cache_write from token count - Fix buffer overflow: add dropped events counter for telemetry - Fix compactionAttempts: delete session entry on completion/error - Fix empty session_id: use "startup" fallback for early events - Fix session_end: add beforeExit handler for early process termination Docs: - Fix config path in telemetry docs (was ~/.config/altimate/config.json) - Add telemetry field to config schema table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: improve session_end coverage for process.exit() and add dedup guard Address review comment about beforeExit limitations: - Add process.once("exit") handler to cover process.exit() calls - Add dedup flag to prevent double-firing when finally block also runs - Document why SIGINT/SIGTERM are NOT handled (abort controller already triggers loop exit → finally block → normal session_end) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: comprehensive telemetry instrumentation for error visibility and data moat Instrument 25 event types across the CLI lifecycle — auth, MCP servers, Python engine, provider errors, permissions, upgrades, context utilization, agent outcomes, workflow sequencing, and environment census. Adds telemetry docs page, firewall endpoint, 25 unit tests, and privacy-safe helpers (categorizeToolName, bucketCount). No behavioral changes; all telemetry is opt-out via config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review feedback on telemetry instrumentation - compaction.ts: scope compactionAttempt per-session via Map instead of module-level counter that leaked across sessions - prompt.ts: remove hardcoded doom_loops/permission_denials fields from agent_outcome (were always 0); doom loops and permission denials are already tracked as separate events - mcp/index.ts: make mcp_server_census fire-and-forget so listTools/ listResources never block the MCP connect critical path - processor.ts: use dedicated generationCounter for context_utilization instead of toolCallCounter which counts a different thing - telemetry/index.ts: refactor categorizeToolName to use a pattern array with explicit ordering, making the match logic order-independent and easier to maintain - installation/index.ts: add "other" to upgrade method union instead of silently casting unknown methods to "npm" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address telemetry review issues from PR #39 - Standardize error message truncation to 500 chars (was 1000 in processor.ts) - Implement ALTIMATE_TELEMETRY_DISABLED env var check in telemetry init() - Track MCP transport type on connect, use correct transport in disconnect - Add sessionHadError flag so agent_outcome produces "error" outcome - Use Telemetry.getContext().sessionId in auth events instead of hardcoded "cli" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: initialize telemetry early and buffer pre-init events Telemetry.init() was only called in SessionPrompt.next(), but track() calls from MCP connect, engine startup, and auth commands fire earlier. Events tracked before init() were silently dropped. - Buffer events in track() regardless of enabled state (pre-init events are kept and flushed once init() completes) - Clear buffer when init() determines telemetry is disabled - Call init() in main CLI middleware (covers auth, serve, etc.) - Call init() in worker thread startup (covers MCP/engine events) - Keep existing init() in session prompt as idempotent safety net Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: robust telemetry init with promise deduplication and pre-init buffering The previous commit moved init() earlier but had subtle issues: - No init-once guard when telemetry disabled (init ran repeatedly) - Events during async init() were dropped - Concurrent non-awaited init() calls could race Fix: - Use promise deduplication: init() returns same promise on repeated calls - Use initDone flag: track() buffers during init, drops after disable - Clear buffer on every disabled/error exit path in doInit() - Reset initPromise/initDone in shutdown() for test isolation - Add tests for pre-init buffering, post-disable dropping, idempotency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle Config.get() and Control.account() failures in telemetry init Config.get() throws "No context found for instance" when called before Instance.provide() (e.g. CLI middleware, worker thread startup). This caused the outer catch to clear the buffer and disable telemetry entirely. Fix: wrap Config.get() and Control.account() in individual try/catch blocks so failures don't prevent telemetry initialization. The env var ALTIMATE_TELEMETRY_DISABLED is the early-init escape hatch. Verified end-to-end: pre-init events are buffered and flushed to App Insights after init completes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: hash user email before sending to telemetry Raw email is PII — replace with SHA-256 hash to preserve anonymous user correlation without leaking personally identifiable information. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address remaining code review issues from PR #39 Fixes 12 issues identified in the comprehensive code review of PR #39: Critical: - Wrap MCP listPrompts/listResources in withTimeout() to prevent CLI hangs Major: - Fix MCP tool type detection: add MCP.isMcpTool() instead of broken startsWith("mcp__") prefix check - Fix MCP prompt template getter: replace Promise constructor anti-pattern with .then() chain, throw on undefined template - Fix Truncate.cleanup: wrap Identifier.timestamp() per-entry in try/catch to prevent single malformed filename from aborting cleanup Minor: - Fix ai.cloud.role tag: "altimate-code" → "altimate" - Fix error truncation: use .message instead of .toString() to avoid leaking stack traces with file paths - Fix skill_count: populate from Skill.all() instead of hardcoded 0 - Fix context utilization: remove cache_write from token count - Fix buffer overflow: add dropped events counter for telemetry - Fix compactionAttempts: delete session entry on completion/error - Fix empty session_id: use "startup" fallback for early events - Fix session_end: add beforeExit handler for early process termination Docs: - Fix config path in telemetry docs (was ~/.config/altimate/config.json) - Add telemetry field to config schema table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: improve session_end coverage for process.exit() and add dedup guard Address review comment about beforeExit limitations: - Add process.once("exit") handler to cover process.exit() calls - Add dedup flag to prevent double-firing when finally block also runs - Document why SIGINT/SIGTERM are NOT handled (abort controller already triggers loop exit → finally block → normal session_end) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
categorizeToolName()for tool classification,bucketCount()for anonymized countsFiles changed (15)
src/telemetry/index.tscategorizeToolName(),bucketCount(),sequence_index/previous_toolon tool_callsrc/session/processor.tssrc/session/prompt.tssrc/mcp/index.tssrc/bridge/engine.tssrc/cli/cmd/auth.tssrc/installation/index.tssrc/session/index.tssrc/session/compaction.tssrc/permission/next.tssrc/tool/project-scan.tstest/telemetry/telemetry.test.tsdocs/docs/configure/telemetry.mddocs/docs/network.mddocs/mkdocs.ymlTest plan
npx tsc --noEmit— no new type errors (2 pre-existing)bun test— 1413 pass, 0 fail across 85 files (including 25 new telemetry tests)domain_actionnaming conventionaltimate, trigger tool calls, verifytool_categoryandsequence_indexappear in events🤖 Generated with Claude Code