Merged
97 changes: 97 additions & 0 deletions docs/ai/design/feature-claude-sessions-pid-matching.md
@@ -0,0 +1,97 @@
---
phase: design
title: System Design & Architecture
description: Define the technical architecture, components, and data models
---

# System Design & Architecture

## Architecture Overview

The change is localised to `ClaudeCodeAdapter`. The detection flow first attempts a PID-file lookup for every process; a process falls through to the existing legacy matching step only when its PID file is missing, unreadable, stale, or points to a JSONL file that does not exist.

```mermaid
flowchart TD
A[detectAgents] --> B[listAgentProcesses - ps aux]
B --> C[enrichProcesses - lsof + ps]
C --> D[For each PID: try read ~/.claude/sessions/PID.json]
D --> E{PID file found?}
E -->|No| G[Add to legacy-fallback set]
E -->|Yes| F{startedAt within 60s\nof proc.startTime?}
F -->|No - stale| G
F -->|Yes| H[Resolve JSONL path from sessionId + cwd]
H --> I{JSONL exists?}
I -->|No| G
I -->|Yes| J[Direct match: process → session]
G --> K[discoverSessions for fallback processes]
K --> L[matchProcessesToSessions - existing algo]
J --> M[Merge direct matches + legacy matches]
L --> M
M --> N[Read sessions and build AgentInfo]
```

## Data Models

### PID file schema (`~/.claude/sessions/<pid>.json`)
```typescript
interface PidFileEntry {
  pid: number;
  sessionId: string; // filename without .jsonl
  cwd: string; // working directory when Claude started
  startedAt: number; // epoch milliseconds
  kind: string; // e.g. "interactive" (not used)
  entrypoint: string; // e.g. "cli" (not used)
}
```
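
Because `JSON.parse` returns untyped data, a runtime check can confirm the fields this feature relies on before an entry is trusted. The following is a minimal sketch and not code from this PR: `isPidFileEntry` and `PidFileEntryCore` are hypothetical names, and the guard checks only the three fields the matching logic reads.

```typescript
// Hypothetical helper (not part of the PR): narrows parsed JSON to the
// fields that PID-file matching actually reads.
interface PidFileEntryCore {
  sessionId: string;
  cwd: string;
  startedAt: number; // epoch milliseconds
}

function isPidFileEntry(value: unknown): value is PidFileEntryCore {
  // Reject primitives and null before destructuring.
  if (typeof value !== "object" || value === null) return false;
  const { sessionId, cwd, startedAt } = value as Record<string, unknown>;
  return (
    typeof sessionId === "string" &&
    sessionId.length > 0 &&
    typeof cwd === "string" &&
    cwd.length > 0 &&
    typeof startedAt === "number" &&
    Number.isFinite(startedAt)
  );
}
```

A guard like this would reject an entry whose `startedAt` is an ISO string, rather than letting it reach the timestamp comparison as a non-numeric value.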

### New internal type: `DirectMatch`
```typescript
interface DirectMatch {
  process: ProcessInfo;
  sessionFile: SessionFile; // reuse existing SessionFile shape
}
```

## Component Breakdown

### Modified: `ClaudeCodeAdapter`

**New private method**: `tryPidFileMatching(processes: ProcessInfo[]): { direct: DirectMatch[]; fallback: ProcessInfo[] }`
- For each process, attempts to read `~/.claude/sessions/<pid>.json`.
- If the file is absent or unreadable: process goes to `fallback`.
- If the file is present:
  - Cross-checks `entry.startedAt` (epoch ms) against `proc.startTime.getTime()`; if delta > 60 s, file is stale → process goes to `fallback`.
  - Resolves the JSONL path: `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl` using the `cwd` from the PID file.
  - Verifies the JSONL exists; if missing: process goes to `fallback`.
  - If JSONL exists: process goes to `direct`.
- There is **no upfront directory-existence check** — each PID is always tried individually. Missing files are handled per-process via try/catch.
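
The stale-file cross-check described above can be isolated as a pure function, which makes the 60 s boundary easy to reason about. This is a sketch only: `isPidFileStale` and `STALE_TOLERANCE_MS` are hypothetical names, and the PR performs the comparison inline inside `tryPidFileMatching()`.

```typescript
// Sketch of the stale-file guard as a pure function. The 60 s tolerance
// comes from the design; the names and signature are hypothetical.
const STALE_TOLERANCE_MS = 60_000;

function isPidFileStale(
  startedAtMs: number,
  procStartTime: Date | undefined,
  toleranceMs: number = STALE_TOLERANCE_MS,
): boolean {
  // Without an enriched start time there is nothing to compare against,
  // so the check is skipped and the file is treated as fresh.
  if (!procStartTime) return false;
  return Math.abs(procStartTime.getTime() - startedAtMs) > toleranceMs;
}
```

Because the comparison is strictly greater-than, a delta of exactly 60 s is still accepted; only larger deltas send the process to `fallback`.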

**Modified**: `detectAgents()`
- Calls `tryPidFileMatching()` after enrichment.
- Passes only `fallback` processes to the existing `discoverSessions()` + `matchProcessesToSessions()` pipeline.
- Merges `direct` matches with legacy match results before building `AgentInfo` objects.

### Unchanged
- `utils/process.ts` — process listing and enrichment unchanged.
- `utils/session.ts` — session file discovery unchanged.
- `utils/matching.ts` — matching algorithm unchanged.
- All other adapters — untouched.

## Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Where to do PID file lookup | Inside `ClaudeCodeAdapter` as a private method | Keeps the change isolated; other adapters don't need it |
| CWD source for JSONL path encoding | PID file's `cwd` field | PID file is authoritative; lsof cwd may differ (symlinks, etc.) |
| `startedAt` type | Epoch milliseconds (`number`) | Verified from real files — not an ISO string |
| Stale file guard | Cross-check `entry.startedAt` vs `proc.startTime` (60 s tolerance) | Catches PID reuse without false positives from normal startup delays |
| `enrichProcesses()` scope | Run on all processes before the split | `proc.startTime` is needed for the stale-file guard; batched call is cheap |
| Error handling for malformed PID files | Catch + fall back to legacy | Avoids crashing; older or corrupt files handled gracefully |
| Batching PID file reads | No batching (sequential per PID) | Files are tiny JSON; overhead is negligible |
| Reuse `SessionFile` shape for direct matches | Yes | Avoids new types; existing `readSession` and `buildAgentInfo` code works unchanged |

## Non-Functional Requirements

- **No performance regression**: PID file reads add at most one `fs.readFileSync` + `fs.existsSync` per process, which is negligible.
- **Backward compatibility**: All existing behaviour is preserved when no PID files exist (older Claude Code installs). Each missing file falls through to the legacy algorithm per-process.
- **No new external dependencies**.
96 changes: 96 additions & 0 deletions docs/ai/implementation/feature-claude-sessions-pid-matching.md
@@ -0,0 +1,96 @@
---
phase: implementation
title: Implementation Guide
description: Technical implementation notes, patterns, and code guidelines
---

# Implementation Guide

## Code Structure

All changes are in `packages/agent-manager/src/adapters/ClaudeCodeAdapter.ts`.

## Implementation Notes

### `tryPidFileMatching()`

No upfront directory check — each PID is always tried individually via try/catch.

```typescript
private tryPidFileMatching(processes: ProcessInfo[]): {
  direct: Array<{ process: ProcessInfo; sessionFile: SessionFile }>;
  fallback: ProcessInfo[];
} {
  const sessionsDir = path.join(os.homedir(), '.claude', 'sessions');
  const direct: Array<{ process: ProcessInfo; sessionFile: SessionFile }> = [];
  const fallback: ProcessInfo[] = [];

  for (const proc of processes) {
    const pidFilePath = path.join(sessionsDir, `${proc.pid}.json`);
    try {
      const raw = fs.readFileSync(pidFilePath, 'utf-8');
      const entry = JSON.parse(raw) as PidFileEntry;

      // Stale-file guard: reject if startedAt diverges from enriched proc.startTime by > 60 s
      if (proc.startTime) {
        const deltaMs = Math.abs(proc.startTime.getTime() - entry.startedAt);
        if (deltaMs > 60_000) {
          fallback.push(proc);
          continue;
        }
      }

      const projectDir = this.getProjectDir(entry.cwd);
      const jsonlPath = path.join(projectDir, `${entry.sessionId}.jsonl`);

      if (!fs.existsSync(jsonlPath)) {
        fallback.push(proc);
        continue;
      }

      const sessionFile: SessionFile = {
        sessionId: entry.sessionId,
        filePath: jsonlPath,
        projectDir,
        birthtimeMs: 0, // not used for direct matches
        resolvedCwd: entry.cwd,
      };
      direct.push({ process: proc, sessionFile });
    } catch {
      // PID file absent, unreadable, or malformed → fall back per-process
      fallback.push(proc);
    }
  }

  return { direct, fallback };
}
```

### `detectAgents()` changes

After `enrichProcesses(processes)`:

1. Call `tryPidFileMatching(processes)` → `{ direct, fallback }`.
2. Run existing `discoverSessions(fallback)` + `matchProcessesToSessions(fallback, sessions)` only on `fallback`.
3. Merge `direct` matches and `legacyMatches` into a single list before iterating to build `AgentInfo`.
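
The three steps above can be sketched as a small orchestration function. This is an illustration under assumptions, not the adapter's actual code: `ProcLike`, `SessionRef`, `MatchPair`, and `combineMatches` are hypothetical stand-ins, and the two pipelines are passed in as callbacks so the sketch stays self-contained.

```typescript
// Simplified stand-ins for the adapter's real types (hypothetical shapes).
interface ProcLike { pid: number }
interface SessionRef { sessionId: string }
interface MatchPair { process: ProcLike; sessionFile: SessionRef }

function combineMatches(
  processes: ProcLike[],
  tryPidFileMatching: (p: ProcLike[]) => { direct: MatchPair[]; fallback: ProcLike[] },
  legacyMatch: (p: ProcLike[]) => MatchPair[],
): MatchPair[] {
  // Step 1: split processes into direct matches and legacy candidates.
  const { direct, fallback } = tryPidFileMatching(processes);
  // Step 2: run the legacy pipeline only when something fell through.
  const legacyMatches = fallback.length > 0 ? legacyMatch(fallback) : [];
  // Step 3: one merged list feeds the AgentInfo construction.
  return [...direct, ...legacyMatches];
}
```

Skipping the legacy call when `fallback` is empty mirrors the integration test expectation that `discoverSessions` and `matchProcessesToSessions` are not called when every process matches directly.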

### `PidFileEntry` interface

Add near the top of `ClaudeCodeAdapter.ts`:

```typescript
interface PidFileEntry {
  pid: number;
  sessionId: string;
  cwd: string;
  startedAt: number; // epoch milliseconds
  kind: string;
  entrypoint: string;
}
```

## Error Handling

- Any `fs.readFileSync` failure (file not found, permission denied) → catch → push to fallback.
- JSON parse failure → catch → push to fallback.
- `fs.existsSync` on JSONL → false → push to fallback.
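
The first two failure modes above share one `catch`, which can be illustrated with a reader that is injected rather than hard-wired to the file system. This is a sketch under assumptions: `parsePidFile` and its `read` callback are hypothetical, whereas the real method calls `fs.readFileSync` inline.

```typescript
// Sketch: collapse every failure mode into a single "no entry" result.
// `parsePidFile` and the injected `read` callback are hypothetical.
function parsePidFile(
  read: () => string,
): { sessionId: string; cwd: string; startedAt: number } | null {
  try {
    const entry = JSON.parse(read());
    // JSON.parse also accepts values such as `null` or `42`; reject
    // anything that is not an object before its fields are used.
    if (typeof entry !== "object" || entry === null) return null;
    return entry;
  } catch {
    // Covers read errors (file not found, permission denied) and
    // JSON syntax errors alike.
    return null;
  }
}
```

A caller would treat a `null` result exactly like a missing PID file and push the process to `fallback`.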
47 changes: 47 additions & 0 deletions docs/ai/planning/feature-claude-sessions-pid-matching.md
@@ -0,0 +1,47 @@
---
phase: planning
title: Project Planning & Task Breakdown
description: Break down work into actionable tasks and estimate timeline
---

# Project Planning & Task Breakdown

## Milestones

- [x] Milestone 1: Implementation — `ClaudeCodeAdapter` updated with PID-file matching
- [x] Milestone 2: Tests — unit tests for new code paths pass, existing tests remain green
- [ ] Milestone 3: Review — code review complete, ready to merge

## Task Breakdown

### Phase 1: Implementation

- [x] Task 1.1: Add `tryPidFileMatching()` private method to `ClaudeCodeAdapter`
- [x] Task 1.2: Integrate `tryPidFileMatching()` into `detectAgents()`
- [x] Task 1.3: Define `PidFileEntry` and `DirectMatch` interfaces (internal to `ClaudeCodeAdapter.ts`)

### Phase 2: Tests

- [x] Task 2.1: Unit tests for `tryPidFileMatching()` — 8 cases covering all branches
- [x] Task 2.2: Integration tests for `detectAgents()` — direct-only and mixed scenarios
- [x] Task 2.3: All 156 tests pass (145 existing + 11 new)

### Phase 3: Cleanup & Review

- [x] Task 3.1: Run `npx ai-devkit@latest lint --feature claude-sessions-pid-matching`
- [ ] Task 3.2: Code review

## Dependencies

- Tasks 1.2 and 1.3 depend on Task 1.1.
- Task 2.1 depends on Task 1.1.
- Task 2.2 depends on Tasks 1.2 + 1.3.
- Task 2.3 can run in parallel with Task 2.1/2.2 as a sanity check.

## Risks & Mitigation

| Risk | Likelihood | Mitigation |
|------|-----------|------------|
| PID file `cwd` encoding differs from lsof cwd (e.g. symlinks) | Low | Use PID file cwd for encoding; document this as the authoritative source |
| `~/.claude/sessions/` path differs across Claude Code versions | Low | Derive path from `os.homedir()` same as existing `~/.claude/projects/` |
| Race condition: process exits between ps and PID file read | Very low | `fs.existsSync` + try-catch; treat as fallback |
64 changes: 64 additions & 0 deletions docs/ai/requirements/feature-claude-sessions-pid-matching.md
@@ -0,0 +1,64 @@
---
phase: requirements
title: Requirements & Problem Understanding
description: Clarify the problem space, gather requirements, and define success criteria
---

# Requirements & Problem Understanding

## Problem Statement
**What problem are we solving?**

- Newer versions of Claude Code write a file at `~/.claude/sessions/<pid>.json` for each running process. This file contains `{ pid, sessionId, cwd, startedAt }`.
- The current Claude adapter in agent-manager matches processes to sessions by encoding the process CWD into a `~/.claude/projects/<encoded>/` directory path and then finding the closest JSONL session file by birthtime (within a 3-minute tolerance).
- This birthtime-based heuristic can produce incorrect matches when multiple Claude processes share the same CWD, or when the session file birthtime diverges significantly from the process start time.
- Users of the agent-manager CLI (`agent list`) may see stale, mismatched, or missing session data as a result.

## Goals & Objectives
**What do we want to achieve?**

- **Primary**: Use `~/.claude/sessions/<pid>.json` as the authoritative source for process-to-session mapping when the file exists for a given PID.
- **Secondary**: Fall back to the existing CWD-encoding + birthtime heuristic for processes where no `~/.claude/sessions/<pid>.json` file is present (older Claude Code versions or sessions not yet written).
- **Non-goals**:
  - Changing how session JSONL content is parsed or how status is determined.
  - Modifying any adapter other than `ClaudeCodeAdapter`.
  - Supporting Windows-specific paths (existing macOS/Linux conventions apply).

## User Stories & Use Cases
**How will users interact with the solution?**

- As an agent-manager user, I want `agent list` to correctly associate each running Claude process with its active session, so that I see accurate status and message summaries.
- As a developer running multiple Claude instances in the same directory, I want each instance to be matched to its own session (not mixed up), so the list output is unambiguous.

**Edge cases to consider:**
- PID file exists but references a `sessionId` whose JSONL does not exist → fall back to legacy matching for that process.
- PID file exists but `cwd` in the file differs from the process's actual CWD reported by `lsof` → trust the PID file's `sessionId` and `cwd` (it is authoritative).
- Stale PID file (process exited, PID reused by a new Claude process) → cross-check `startedAt` (epoch ms) against `proc.startTime` from enrichment; if the delta exceeds 60 seconds, treat as stale and fall back to legacy matching for that process.
- PID file absent for a given process (e.g. older Claude Code) → fall back to legacy matching for that process only. No directory-level check is needed; each PID is tried individually.
- Multiple processes; only some have PID files → use PID files for those that have them, legacy matching for the rest.

## Success Criteria
**How will we know when we're done?**

- `ClaudeCodeAdapter.detectAgents()` reads `~/.claude/sessions/<pid>.json` for each discovered PID and uses the `sessionId` from the file to locate the correct JSONL in `~/.claude/projects/`.
- Processes without a matching PID file are matched via the existing legacy algorithm without regression.
- All existing tests continue to pass.
- New unit tests cover: PID-file happy path, PID-file missing JSONL fallback, directory absent, mixed (some PIDs have files, some don't).

## Constraints & Assumptions
**What limitations do we need to work within?**

- `~/.claude/sessions/<pid>.json` schema (verified from real files):
```json
{ "pid": 81665, "sessionId": "87ada2e7-...", "cwd": "/Users/...", "startedAt": 1774598167519, "kind": "interactive", "entrypoint": "cli" }
```
- `startedAt` is **epoch milliseconds** (not an ISO string).
- `kind` and `entrypoint` fields are present but not used by this feature.
- The JSONL for a session lives at `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl` — the same location the legacy algorithm already discovers.
- Reading individual small JSON files per PID is acceptable; no batching of the PID file reads is required (files are tiny).
- `enrichProcesses()` continues to run on all processes (direct + fallback) before the PID-file split — the batched `lsof`/`ps` call is cheap and `proc.startTime` is needed for the stale-file guard.
- The feature must remain backward-compatible with older Claude Code installs that do not write PID files.

## Questions & Open Items

- None — requirements are clear from the user's description and existing code analysis.
52 changes: 52 additions & 0 deletions docs/ai/testing/feature-claude-sessions-pid-matching.md
@@ -0,0 +1,52 @@
---
phase: testing
title: Testing Strategy
description: Define testing approach, test cases, and quality assurance
---

# Testing Strategy

## Test Coverage Goals

- 100% branch coverage of `tryPidFileMatching()`
- `detectAgents()` integration paths for direct-match and fallback-only scenarios
- No regression in existing tests

## Unit Tests

### `tryPidFileMatching()`

- [x] PID file present + JSONL exists + `startedAt` within 60 s of `proc.startTime` → process in `direct` with correct `sessionId` and `resolvedCwd`
- [x] PID file present + JSONL missing → process in `fallback`
- [x] PID file present but `startedAt` > 60 s from `proc.startTime` (stale/reused PID) → process in `fallback`
- [x] `startedAt` 30 s from `proc.startTime` (inside the 60 s tolerance) → accepted as direct match
- [x] PID file absent for a PID (file not found) → process in `fallback`, no crash
- [x] PID file contains malformed JSON → process in `fallback` (no throw)
- [x] Sessions dir entirely absent (no PID file for any process) → all processes in `fallback`, no crash
- [x] Mixed: 2 PIDs with files, 1 without → correct split across `direct` and `fallback`
- [x] `proc.startTime` is undefined (enrichment failed) → stale-file check skipped, proceed normally

### `detectAgents()` integration

- [x] All direct matches: `discoverSessions` and `matchProcessesToSessions` not called
- [x] Mixed: direct matches merged correctly with legacy matches in final `AgentInfo` list
- [x] Direct match produces `AgentInfo` with correct `sessionId`
- [x] Direct-matched JSONL becomes unreadable after existence check → process falls back to IDLE
- [x] Legacy-matched JSONL becomes unreadable after match → process falls back to IDLE

## Test Data

Real `tmp` directories with JSON/JSONL fixtures. `jest.spyOn` used only for race-condition branches (lines 128, 141).

## Test Reporting & Coverage

Run: `cd packages/agent-manager && npm test -- --coverage --collectCoverageFrom='src/adapters/ClaudeCodeAdapter.ts'`

| Metric | Result |
|--------|--------|
| Statements | 98.73% |
| Branches | 89.79% |
| Functions | 100% |
| Lines | 99.35% |

**Remaining gap — line 314** (`return null` after `allLines.length === 0` in `readSession`): dead code. `''.trim().split('\n')` always returns `['']` (length ≥ 1), so this condition is structurally unreachable. No test can cover it without modifying the source.
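
The claim about the unreachable branch can be checked directly. The snippet below is a standalone demonstration and assumes, as the note states, that `readSession` derives `allLines` from `trim()` followed by `split('\n')`; the variable names here are illustrative.

```typescript
// Splitting an empty or whitespace-only string still yields one element,
// so a `length === 0` check on the result can never be true.
const emptyLines = "".trim().split("\n");
const blankLines = "  \n ".trim().split("\n");
console.log(emptyLines.length, blankLines.length); // prints "1 1"
```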