
My Claude is Better than Your Claude

At least for a week, until Anthropic adds all this to Claude.

What I Built and Why

If you use Claude Code for varied tasks, as a lot of people do, you can quickly find yourself juggling multiple projects, with no easy way to keep track of them all. I work across data engineering, UX design, Linux sysadmin, browser automation, debugger tooling, deployment pipelines, and video processing.

At any given time I've got around 20 sessions at various phases of planning and development. I like to pick one up, work for a few hours or more until a milestone is reached, then park it and come back at some point, perhaps the same or next day, or maybe a lot longer than that.

Claude Code has two problems with this workflow. The first one hits you immediately. When you exit a session, you get this:

Resume this session with:
claude --resume a17b8d20-71c1-4eb6-8e7d-438222b649fc

Maybe Claude finds that intuitive. I don't, and I doubt you do either.

Besides making sessions hard to keep track of, there's a second problem. When your context window fills up, Claude Code compacts it: the conversation gets summarized to free space. That summary loses things. File paths, debugging state, and, most frustratingly, the reasoning behind decisions you made three hours ago.

You come back the next day, resume the session, and Claude has forgotten half of what you were doing. Anthropic is addressing this second problem, but gaps remain. Fortunately, there are public repos out there to fill them.

Lastly, I added a simple prompt fix. Hey, why not? Let's make Claude as good as we can.

Here's what I built and what it actually does:

The Session Problem

Claude Code identifies sessions with GUIDs. As I mentioned above, when you exit, it says:

Resume this session with:
claude --resume a17b8d20-71c1-4eb6-8e7d-438222b649fc

Try managing 20 of those. You can't. But the GUID problem is worse than it looks, because it doesn't just affect the command line. That GUID is also what shows up in your terminal tab title, in the Claude Code app on your phone, and in every window on your screen. I have six monitors. On a busy day I might have four or five Claude sessions running at once. Before I fixed this, they all looked the same. I'd glance at a terminal, have no idea which project it was, and start typing into the wrong session. On the phone it was worse: a wall of hex strings with no way to tell which machine was even running which session.

So I wrote a wrapper called Claude Context Manager (ClaudeCM). It replaces the GUID with a name you choose, and that name follows the session everywhere.

When you launch a session, ClaudeCM sets a display name in the format machine - project. My desktop sessions show up as desktop - YouTube processor, desktop - Claude Context Manager, desktop - Octane Website. If I'm checking in from my phone, I see those names instead of GUIDs. If I had sessions running on a laptop too, they'd show as laptop - whatever. At a glance, from any device, I know what's running where.

On the terminal side, the effect is immediate. Right now I've got two windows open. One says "Claude Context Manager" in the tab. The other says "YouTube processor." I never mix them up. That sounds trivial until you've pasted a database migration command into the wrong project because both tabs said claude --resume a17b8d20.

By default, ClaudeCM launches Claude with --dangerously-skip-permissions. I make sure I have rock solid daily backups, and by trusting Claude, I avoid having to hit "yes" 400 times a day. If you prefer a more conservative approach, you can change the script to omit that flag; Claude will prompt for approval on each tool use, and you can always press Shift+Tab to toggle permissions on the fly.

ClaudeCM gives you:

  • Named sessions everywhere. You pick the name. It shows up in your terminal tab, in the Claude app on your phone, in the session list. claudecm l shows all your named sessions in most-recently-used order. No more hex strings, anywhere.

  • Resume by number. claudecm 3 gets you back into session 3. One command.

  • Session detection. Run claudecm in a project directory and it finds your existing session automatically. No session found? It offers to create one with a name you choose.

  • Orphan detection. On every resume, ClaudeCM scans for stray conversation files that don't belong. You see exactly what's in the project directory and can quarantine anything that shouldn't be there, before it causes the wrong session to load.

  • Session index sync. Claude Code's built-in /resume picker relies on an internal index that is undocumented and frequently broken. ClaudeCM validates and repairs it on every operation, so the picker stays functional even after renames, quarantines, and refreshes.

  • Inline editing. Rename, reorder, change paths, archive, or permanently delete sessions, all from one menu.

  • Archive and delete. Archive moves a session off the active list but keeps all files on disk. View and unarchive any time from the session list. Delete is permanent: it removes the session entry, the JSONL conversation file, and associated data. Requires typing "delete" to confirm. Two levels of cleanup for two different needs.

  • Transcript protection. Claude Code defaults to deleting your session transcripts after 30 days. No warning, no prompt, no way to recover them. ClaudeCM sets cleanupPeriodDays to 100,000 on first launch (about 274 years), so your conversations stay on disk until you decide otherwise. If a transcript has already been lost, picking that session gives you three options: start fresh, generate a recovery-prompt.md file for that project (ClaudeCM writes it from the surviving memory files and subagent state; you edit it if you want, then paste it as the first message of a new session), or cancel. Existing recovery-prompt.md files get rotated to .old, .old2, etc., so nothing is overwritten.

  • Concurrency-safe. Built for users who run multiple Claude sessions in parallel windows. ClaudeCM never uses CMV's --latest selector (which picks globally across all projects and silently grabs the wrong session), never scans across project directories for "the newest file," and never silently copies JSONLs between projects. Every session is identified by a specific GUID resolved through three independent layers: the ~/.claude/sessions/<pid>.json manifest, a project-scoped JSONL snapshot diff, and a project-scoped fallback. If any two layers disagree, ClaudeCM warns loudly. The sessions.txt registry uses file locking and atomic rename writes so two parallel ClaudeCM operations cannot corrupt each other's changes.
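The locking-plus-atomic-rename pattern the registry uses is worth showing on its own. Here is a minimal Python sketch of the general technique (ClaudeCM itself is PowerShell/bash, and the registry line format below is invented for illustration; `fcntl` makes this POSIX-only):

```python
import fcntl
import os
import tempfile

def update_registry(path, transform):
    """Apply `transform` to the registry's lines under an exclusive lock,
    then publish the result with an atomic rename, so a parallel operation
    never sees (or produces) a half-written file. Illustrative sketch only."""
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)        # block parallel writers
        try:
            with open(path) as f:
                lines = f.read().splitlines()
        except FileNotFoundError:
            lines = []
        new_lines = transform(lines)
        # Write a temp file in the same directory, then rename over the
        # original: readers see the old file or the new one, never a mix.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(new_lines) + "\n")
        os.replace(tmp, path)
        fcntl.flock(lock, fcntl.LOCK_UN)
```

The rename is the key move: `os.replace` is atomic on the same filesystem, so a crash between the temp write and the rename leaves the old registry untouched.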

It's a PowerShell function on Windows, or a bash script on Linux. Once it is set up, you are done.

That solved the "which session was I in" problem. But it didn't solve the "Claude forgot everything" problem, so I addressed that next.

The Context Stack

After doing some research, including a review of recent Anthropic updates, I still saw value in three tools, layered on top of each other. They don't conflict because they operate at different layers: Context-Manager hooks into compaction events, CMV manages the context window size, and Claude-Mem runs a separate background worker. Different hook events, different storage, no interference. Each one handles a different failure mode.

Layer 1: Context-Manager, Compaction Insurance

This first tool was written by GitHub user DxTa (Minh Duc). When compaction is about to fire, a hook reads through the conversation transcript and extracts structured state: which files were modified, what task was in progress, what decisions were made, which commands kept failing. It saves all of this as a JSON checkpoint.

After compaction completes, another hook fires and injects a recovery summary into the fresh context. Claude comes back online knowing what it was doing, what files it touched, and what already failed.

There's also a dedup system running during normal work. It tracks how many times each file has been read in a session. Second read gets a warning. Third read gets truncated to 50 lines. Fourth read is blocked entirely. This sounds aggressive, but re-reading the same 800-line file four times is the number one way sessions burn through their context window for no reason. When you write to a file, the counter resets.
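The escalation policy is simple enough to sketch. This is a Python illustration of the counter logic described above, not DxTa's actual hook code:

```python
from collections import Counter

TRUNCATE_LINES = 50

class ReadDedup:
    """Per-session read counter: 2nd read of a file warns, 3rd read is
    truncated, 4th and later reads are blocked, and a write resets the
    counter. A sketch of the policy, not the real hook implementation."""

    def __init__(self):
        self.reads = Counter()

    def on_read(self, path, content):
        self.reads[path] += 1
        n = self.reads[path]
        if n == 1:
            return ("ok", content)
        if n == 2:
            return ("warn", content)   # advisory: this file was already read
        if n == 3:
            truncated = "\n".join(content.splitlines()[:TRUNCATE_LINES])
            return ("truncated", truncated)
        return ("blocked", "")         # 4th read and beyond

    def on_write(self, path):
        self.reads[path] = 0           # a write invalidates the cached view
```

The reset-on-write rule matters: after an edit, the old reads no longer reflect the file, so re-reading is legitimate again.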

Failed commands get tracked too. If the same command fails twice, Claude gets a "try something different" advisory. Three times, it gets blocked from retrying. This stops the loop-and-retry pattern that eats context and produces nothing.

Layer 2: CMV, Virtual Memory for Context

The next tool is by GitHub user CosmoNaught. CMV treats your conversation like virtual memory. The analogy is direct: just as an OS pages memory in and out of physical RAM, CMV pages understanding in and out of the context window.

The key operation is trim. A session that's 152K tokens and 76% full can be trimmed down to 23K tokens --- an 85% reduction --- without losing a single user message or Claude response. What gets stripped is the mechanical bloat: raw file dumps from Read operations, tool call metadata, base64 image blocks, thinking signatures. Claude's actual synthesis stays. If it needs a file again, it re-reads it. That's cheaper than carrying the original read around for the rest of the session.
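To make the trim operation concrete, here is a rough Python sketch of that kind of filtering. The JSONL schema here (`role`, `type`, `content` fields) is simplified for illustration and does not match Claude Code's or CMV's real transcript format:

```python
import json

def trim_session(in_path, out_path):
    """Copy a session JSONL, dropping mechanical bloat (tool results,
    embedded images) while keeping every user message and assistant
    response. A sketch of the idea; real transcript schemas differ."""
    kept = dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            entry = json.loads(line)
            if entry.get("role") in ("user", "assistant") and entry.get("type") != "tool_result":
                content = entry.get("content")
                if isinstance(content, list):
                    # Strip base64 image blocks and inlined tool results,
                    # keep the text blocks.
                    entry["content"] = [
                        block for block in content
                        if block.get("type") not in ("image", "tool_result")
                    ]
                dst.write(json.dumps(entry) + "\n")
                kept += 1
            else:
                dropped += 1   # raw file dumps, tool metadata, thinking blocks
    return kept, dropped
```

The ratio of `dropped` to `kept` entries is exactly why trims like 152K to 23K are possible: most of a long session's bulk is tool output, not conversation.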

You can also snapshot context state (like a git commit) and branch from snapshots (like git checkout -b). This means you can build up deep understanding of an architecture, snapshot it, and then branch into multiple independent work streams that all start from that shared understanding. Build context once, reuse it everywhere.

Layer 3: Claude-Mem, Cross-Session Memory

Layer 3 is by Alex Newman (thedotmack). The first two layers handle within-session survival. Claude-Mem handles the across-session problem.

It runs alongside every session, watching what Claude does. Every file edit, every decision, every tool call gets captured. A background worker compresses these observations using Claude's own agent SDK --- not raw transcript dumps, but AI-generated semantic summaries that capture what matters and discard what doesn't.

When you start a new session or resume an old one, Claude-Mem injects relevant compressed memories from previous sessions. You come back to a project after a week and Claude already knows the architecture, the conventions, the decisions you made last time, the bugs you hit.

Claude Code already has built-in auto-memory via MEMORY.md for basic cross-session persistence. Claude-Mem adds a deeper layer on top: AI-compressed observations of actual tool usage, file modifications, and error patterns, not just the notes the model chose to save on its own.

What This Actually Looks Like

Without the stack: You resume a session. Claude has a vague summary of what happened. You spend 10-15 minutes re-explaining context. Files need to be re-read. Previous debugging state is gone. You repeat mistakes that were already solved.

With the stack: You resume a session. The context-manager checkpoint restores your working state. CMV has trimmed the bloat so you have room to work. Claude-Mem injects compressed memories from last time. Claude picks up roughly where you left off. Not perfectly --- nothing is perfect --- but the difference between "where were we?" and "I was debugging the auth middleware and the token refresh had a race condition on line 247" is the difference between a productive morning and a frustrating one.

Why Not Just Use Volt?

The short answer: Volt requires API access, which is far more expensive. But in the interest of a more complete technical discussion: Volt is a research project from Voltropy that takes this idea further. It replaces Claude Code entirely with a dual-state memory architecture: an immutable store that never deletes anything, and an active context window that gets curated per-turn based on relevance.

On benchmarks, Volt running Opus 4.6 beats Claude Code at every context length from 32K to 1M tokens. The numbers are real. The paper is peer-discussed. The approach is sound.

But Volt replaces Claude Code. All of it. Your MCP servers, your hooks, your plugins, your session management, your shell integration --- gone. You're in a different ecosystem.

The three-tool stack gets you most of Volt's practical benefits while keeping everything Claude Code gives you. The one thing you can't replicate is per-turn relevance filtering (Volt actively decides what to send to the API each turn; Claude Code always sends the full compacted history). That would require access to Claude Code's message assembly pipeline, which isn't exposed.

For my money, the tradeoff is worth it. I'd rather have 90% of Volt's context management plus the full Claude Code ecosystem than 100% of Volt and nothing else.

The Numbers

I don't have controlled benchmarks. What I have is daily use across 20 projects over several weeks.

Before the stack, I'd hit compaction every long session and lose 20-30 minutes recovering context. Sessions that should have been continuous felt like they reset every couple of hours.

After the stack, compaction still happens, but the recovery is measured in seconds. The dedup hooks mean I burn through context slower in the first place. The cross-session memory means coming back to a project after days doesn't start cold.

Is it quantifiable? Loosely. I'd estimate I'm saving 30-60 minutes per day that used to go to re-establishing context. On a busy day with multiple sessions, more than that.

Too Much of a Good Thing

At first the three-layer stack worked amazingly well. But then my Claude usage started creeping up. I had recently upgraded my subscription, and this was the first time I ran out at the new level. Obviously there was a downside, and I needed a relief valve to cut my token usage. I ended up implementing two, actually: one that's easy to use with minimal impact, and one that can cut usage way down, sometimes by orders of magnitude, but is best saved for milestones.

The three-layer stack protects you from losing context during compaction. It does not protect you from accumulating context you no longer need. Debugging output, file dumps from three days ago, sample code you read once and never referenced again. All of it rides along on every interaction.

Trim was the first fix. It uses CMV's trim feature (Layer 2 in the stack above) to strip the mechanical bloat out of a session: tool outputs, file dumps, base64 image blocks, and thinking signatures. Your actual conversation stays intact. Every user message and every Claude response is preserved. You just lose the verbose evidence of how it got there.

A great feature here is that trim tells you the session size before and after. I had sessions go from 152K tokens down to 23K. Others barely moved. Either way, you see exactly how large the context is in the first place, and whether a deeper clean is needed.

Refresh is the second step. It's far more aggressive, but reversible. Instead of trimming the existing conversation, it starts a completely fresh session and carries forward only what matters.

The original approach was simple: start a new session, tell Claude to read the old transcript and figure out what's important. It worked, but it was expensive (Claude reads millions of tokens), slow, and unreliable. Items buried deep in a long transcript got missed. File tracking was particularly weak because pure AI summarization consistently underperforms on artifact tracking.

This isn't just my observation. Three independent research efforts confirm it:

Factory.ai analyzed 36,000+ production coding agent messages and compared structured extraction against native compaction from both Anthropic and OpenAI. Their structured approach scored 3.70/5.0 for accuracy; Anthropic's native compaction scored 3.44, OpenAI 3.35. The most revealing finding: artifact tracking (which files were touched, what state they're in) scored lowest across all methods (2.19 to 2.45 out of 5.0). AI summarization alone is not reliable enough for tracking what files were modified. You need rule-based extraction for that.

Chroma's "Context Rot" research tested 18 LLMs and found that performance degrades as input length grows, even when the context window can technically hold everything. Focused inputs dramatically outperformed full context dumps. A lean, structured summary actually produces better results than feeding back the raw transcript, even if it fits.

Zylos found that context drift, not just token exhaustion, is the primary failure mode in long sessions. They recommend triggering compaction at 70% of available context budget, before reasoning quality degrades.

So refresh now uses a hybrid approach. Before starting the new session, a script mechanically extracts a structured skeleton from the old JSONL: every file modified, every file read, errors encountered, and the most recent exchanges. This skeleton is guaranteed correct because it's pulled directly from tool call records, not summarized by AI.

Alongside the skeleton, the script produces a filtered transcript: the conversation text plus one-line summaries of every tool call (Read: path/to/file.py, Bash: run the test suite, Grep: "auth" in src/), with all tool output stripped. This preserves the investigation methodology (what was done and in what order) without the megabytes of raw output that made the original transcript so large. The new session starts with the skeleton inline and a reference to the transcript file. Claude reads the transcript and identifies decisions, corrections, and reasoning that mechanical extraction can't capture. No external API call; Claude does this as part of its normal startup.
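As a rough illustration of what rule-based extraction looks like, the following Python sketch pulls a skeleton from tool-call records. The field names (`type`, `tool`, `args`, `error`, `text`) are assumptions for the example, not Claude Code's real transcript schema:

```python
import json

def extract_skeleton(jsonl_path):
    """Mechanically pull a structured skeleton from tool-call records, so
    the file lists are exact rather than AI-summarized. Field names are
    illustrative only."""
    modified, read, errors, important = {}, {}, [], []
    with open(jsonl_path) as f:
        for line in f:
            entry = json.loads(line)
            kind = entry.get("type")
            if kind == "tool_call":
                tool, args = entry.get("tool"), entry.get("args", {})
                if tool in ("Edit", "Write"):
                    modified[args.get("path", "?")] = True   # dict = dedupe, keep order
                elif tool == "Read":
                    read[args.get("path", "?")] = True
                if entry.get("error"):
                    errors.append(f"{tool}: {entry['error']}")
            elif kind == "user_message":
                # User-tagged [important] lines survive verbatim.
                for text_line in entry.get("text", "").splitlines():
                    if text_line.startswith("[important]"):
                        important.append(text_line)
    return {
        "files_modified": list(modified),
        "files_read": list(read),
        "errors": errors,
        "important": important,
    }
```

Because the file lists come straight from `Edit`/`Write`/`Read` records, they can't suffer the artifact-tracking weakness the Factory.ai numbers above describe; only the judgment calls are left to the model.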

You can also tag anything as important during a session by typing [important] at the start of a line. For example: [important] We chose React over Vue for portfolio value. These markers are captured verbatim in the skeleton. Use them before a planned refresh to make sure critical context survives.

Before the refresh runs, you get the option to edit the prompt in Notepad. The skeleton is right there: you can delete files that don't matter anymore, add notes, or remove resolved errors. It's your data, not a black box.

The old session doesn't disappear. It gets marked "(old)" and moves to the bottom of the session list. You can go back to it any time, ask it detailed questions, or pull specific context out of it. It's still on disk; it just doesn't load and burn your token budget on every interaction.

When you should use each: trim after any session that's been running for a while. It costs nothing and gives you visibility into session size. Refresh when you've hit a milestone and the conversation has accumulated enough history that trim alone isn't enough. Don't refresh in the middle of a sprint. Wait for a natural break point.

In my worst case, a session went from 16 megabytes to 250 kilobytes after a refresh. That's not a typo.

One More Thing: Prompt Improvement

This has nothing to do with context management, but it made a measurable difference and it's too simple not to mention. I tip my hat to Medium user ichigo and his recent article "I Accidentally Made Claude 45% Smarter. Here's How."

There's a body of peer-reviewed research showing that how you frame prompts changes output quality. Incentive language, challenge framing, detailed personas, step-by-step methodology cues. The effects are real and in some cases substantial (up to +45% on quality evaluations, accuracy jumps from 34% to 80% on math problems with just "take a deep breath and work step by step").

I put a short block in my global CLAUDE.md file. It loads once per session, costs a handful of tokens, and sets the tone for everything that follows:

You are a senior engineer with deep expertise in this project’s domain. Your reputation depends on the quality of every response.

This work is critical. Errors cost real money and real time. Treat every task as if the outcome directly affects production systems.

Approach: Take a deep breath. Work through problems step by step. Consider edge cases before writing code. If you’re uncertain about something, say so and explain what you’d need to verify.

After completing any non-trivial task, rate your confidence 0-1. If below 0.9, explain what’s weak and improve it before presenting.

I will tip you $200 for work that is correct, complete, and production-ready on the first attempt.

I got the practical playbook from ichigo's Medium article, which compiled the underlying research: Bsharat et al. (2023) on incentive prompting, Yang et al. (2023, Google DeepMind) on "take a deep breath," Li et al. (2023, ICLR 2024) on emotional stimulus prompting, Xu et al. (2023) on expert personas. Claude doesn't understand money or feel challenged. But it was trained on text where high-stakes language correlates with high-effort responses. The statistical association is enough.

Combined with the context stack, this means Claude starts every session with better framing AND better context. The framing makes each response better. The context management means those better responses stick around longer and survive the session boundaries.

Closing the Gap with Volt

I said earlier that the three-tool stack gets you most of Volt's benefits. Here's what I did to close the remaining gaps.

Immutable Store

Volt never deletes anything. Every message goes into a permanent store. The checkpoint system in Context-Manager was close --- it saves structured state before compaction --- but checkpoints are mutable. Old ones can be overwritten.

Fix: the pre-compact hook now appends every checkpoint to a separate .jsonl file that only grows. One line per checkpoint, never edited, never truncated. If you need to recover state from three compactions ago, it's there. The active state file still works as before for fast recovery. The immutable log is the safety net behind it.
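An append-only checkpoint log takes very little code. A hedged Python sketch of the idea (file path and state shape are invented for the example):

```python
import json
import os
import time

def append_checkpoint(log_path, state):
    """Append one checkpoint per line to a log that only grows. Never
    editing or truncating the file gives an immutable history: the newest
    line is the active state, older lines are the safety net."""
    record = {"ts": time.time(), "state": state}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())   # make the checkpoint survive a crash

def load_checkpoint(log_path, generations_back=0):
    """Recover state from N compactions ago by counting lines from the end."""
    with open(log_path) as f:
        lines = f.read().splitlines()
    return json.loads(lines[-1 - generations_back])["state"]
```

Recovering "state from three compactions ago" is then just `load_checkpoint(path, generations_back=3)`.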

Deterministic Compaction Escalation

Volt uses soft and hard token thresholds with a three-level escalation protocol. When the context is getting full, it doesn't wait for compaction to fire --- it starts managing proactively.

Fix: the PostToolUse hook now estimates cumulative token cost from all tool outputs in the session. At 120K estimated tokens (soft threshold), it advises saving a snapshot and trimming. At 160K (hard threshold), it escalates to urgent. This turns "surprise compaction" into "managed compaction" --- you get warning before the cliff, not after.

These thresholds are estimates based on tool output size, not exact token counts. They won't be perfectly calibrated. But they don't need to be --- the point is getting a warning 5-10 minutes before things go sideways instead of finding out when Claude starts forgetting what file it was editing.
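The threshold logic itself is trivial; the value is in wiring it to the hook. A Python sketch using the common ~4-characters-per-token rule of thumb (the heuristic is my assumption for the example, not necessarily what the hook actually uses):

```python
SOFT_LIMIT = 120_000   # advise: save a snapshot and trim
HARD_LIMIT = 160_000   # escalate: urgent, compaction is imminent

def estimate_tokens(text):
    # Rough heuristic: about 4 characters per token for English and code.
    return len(text) // 4

class BudgetTracker:
    """Accumulate the estimated token cost of tool outputs across a
    session and report which escalation level applies. A sketch of the
    two-threshold protocol described above."""

    def __init__(self):
        self.total = 0

    def on_tool_output(self, output):
        self.total += estimate_tokens(output)
        if self.total >= HARD_LIMIT:
            return "urgent"    # manage compaction now, before the cliff
        if self.total >= SOFT_LIMIT:
            return "advise"    # soft warning: snapshot and trim soon
        return "ok"
```

Even a crude estimator like this is enough, because the goal is a warning with margin to spare, not an exact token count.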

Session-Aware Snapshots

Claude Context Manager manages sessions. CMV manages context snapshots. They didn't talk to each other.

Fix: when you exit any Claude Context Manager session, it automatically creates a CMV snapshot labeled with the timestamp. This means every session exit is a save point. If the next session goes badly, or if compaction destroys important state, there's always a snapshot from the moment you left off.

The combination means your session list in Claude Context Manager isn't just a list of names and GUIDs anymore --- each entry has corresponding context snapshots that CMV can restore.

The Setup

Everything described here is open source and runs locally:

  • ClaudeCM --- Session manager. PowerShell or bash. Installed to ~/.claudecm/ and dot-sourced from your shell profile.

  • Context-Manager --- Compaction checkpoint hooks + dedup tracking. Node.js MCP server + bash hooks.

  • CMV --- Context trimming, snapshots, branching. Node.js CLI tool with auto-trim hooks.

  • Claude-Mem --- Cross-session AI-compressed memory. Node.js + Bun worker service.

  • Prompt booster --- One text block in ~/.claude/CLAUDE.md.

Total install time is about 20 minutes if you have Node.js already. The prompt booster is 30 seconds.

None of these tools know about each other. They work at different layers and don't conflict. Context-Manager handles the compaction boundary. CMV manages context size. Claude-Mem bridges sessions. The prompt booster shapes output quality. They stack cleanly because they solve different problems.

TLDR: OK, How Do I Set This Up Easily?

I've put everything in one place: both ClaudeCM scripts, the install guide, the prompt booster, and a detailed comparison of all the tools I evaluated. It's all at github.com/VolanticSystems/claudecm-stack. Fork it, use it, improve it.

