JavaScript-native AI runtime — SSM execution + Transformer orchestration + online distillation + persistent agent memory.
SSM.js is a complete, self-contained AI runtime built directly on top of MambaCode.js. It includes the full session layer (previously @seanhogg/mambakit) as an internal layer, so you only need one package.
SSM.js is a JavaScript-native AI runtime that combines local SSM (State Space Model) inference with optional transformer bridge escalation, persistent semantic memory, and online distillation — all without leaving the browser or Node.js process.
The layered stack:
MambaCode.js → WebGPU kernels (WGSL, Mamba-1/2/3 SSM math)
SSM.js → Session layer + Runtime orchestration (this package)
├── src/session/ MambaSession, tokenizer, persistence
├── src/runtime/ SSMRuntime, routing
├── src/memory/ MemoryStore
├── src/agent/ SSMAgent
└── src/distillation/ DistillationEngine
| Capability | SSM.js |
|---|---|
| Simple session API | ✅ |
| WebGPU execution | ✅ |
| SSM variants (1/2/3/hybrid) | ✅ |
| Transformer bridge | ✅ |
| Intelligent routing | ✅ |
| Online distillation | ✅ |
| Persistent semantic memory | ✅ |
| Agent workflows | ✅ |
npm install @seanhogg/ssmjs
# or
pnpm add @seanhogg/ssmjs

`@seanhogg/ssmjs` includes the full session layer (previously `@seanhogg/mambakit`).

`@seanhogg/mambacode.js` is a peer dependency — install it alongside:

npm install @seanhogg/ssmjs @seanhogg/mambacode.js

Node.js 18+ is required. Two additional shims are needed for Node.js:

npm install @webgpu/node fake-indexeddb

- `@webgpu/node` — Dawn-based WebGPU for Node.js; drives all WGSL compute kernels
- `fake-indexeddb` — in-memory IndexedDB compatible with the IDB spec; used by `MemoryStore`
import { SSM, AnthropicBridge, SSMAgent, MemoryStore } from '@seanhogg/ssmjs';
const runtime = await SSM.create({
session: { modelSize: 'small', mambaVersion: 'mamba2' },
bridge : new AnthropicBridge({ apiKey: 'sk-ant-...' }),
});
// Generate — routes to SSM or transformer automatically
const answer = await runtime.generate('What is a state space model?');
// Streaming — always SSM for low-latency output
for await (const token of runtime.stream('function fibonacci(')) {
process.stdout.write(token);
}
// Fine-tune on your content
await runtime.adapt(myCodebase);
runtime.destroy();

import { create as createGPU } from '@webgpu/node';
import { IDBFactory } from 'fake-indexeddb';
import { SSM, MemoryStore, SSMAgent } from '@seanhogg/ssmjs';
const gpuAdapter = await createGPU().requestAdapter({ powerPreference: 'high-performance' });
const idbFactory = new IDBFactory();
const runtime = await SSM.create({
session: {
gpuAdapter,
idbFactory,
modelSize: 'small',
},
});
const memory = new MemoryStore({ idbFactory });
const agent = new SSMAgent({ runtime, memory });
await agent.init(); // loads persisted history if present
const reply = await agent.think('Explain this codebase');
console.log(reply);
await agent.destroy(); // persists history, releases GPU

By default, MambaSession uses the built-in Qwen2.5-Coder BPE tokenizer. You can override this by passing any object that satisfies the `Tokenizer` interface:
import type { Tokenizer } from '@seanhogg/ssmjs';
const myTokenizer: Tokenizer = {
encode(text: string): number[] { /* your encode implementation */ return []; },
decode(tokens: number[]): string { /* your decode implementation */ return ''; },
get vocabSize(): number { return 32000; },
};
const runtime = await SSM.create({
session: {
tokenizer: myTokenizer, // replaces BPETokenizer entirely
modelSize: 'small',
},
});

Use cases:

- HuggingFace Transformers.js tokenizer — wrap its `encode`/`decode` in the interface
- Unit testing — a stub tokenizer that maps words to sequential IDs, no network needed
- Domain-specific vocabularies — medical, legal, multilingual tokenizers
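The unit-testing case can be sketched in a few lines. This is a hypothetical stub, not the built-in BPE tokenizer: it maps each unique whitespace-delimited word to a sequential ID, which is enough to exercise session code paths offline.

```typescript
// Hypothetical stub tokenizer for unit tests (NOT the built-in BPE).
// Maps each unique word to a sequential ID; decode reverses the mapping.
class StubTokenizer {
  private ids = new Map<string, number>();
  private words: string[] = [];

  encode(text: string): number[] {
    return text.split(/\s+/).filter(Boolean).map((w) => {
      if (!this.ids.has(w)) {
        this.ids.set(w, this.words.length);
        this.words.push(w);
      }
      return this.ids.get(w)!;
    });
  }

  decode(tokens: number[]): string {
    return tokens.map((t) => this.words[t] ?? '<unk>').join(' ');
  }

  // Arbitrary illustrative value; a real tokenizer reports its true vocab.
  get vocabSize(): number { return 32000; }
}

const tok = new StubTokenizer();
const ids = tok.encode('hello world hello');
console.log(ids);             // [ 0, 1, 0 ]
console.log(tok.decode(ids)); // hello world hello
```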
MemoryStore is a persistent, TTL-aware, tagged key-value fact store backed by IndexedDB.
import { MemoryStore } from '@seanhogg/ssmjs';
const memory = new MemoryStore({
dbName : 'my-app',
defaultTtlMs: 7 * 24 * 60 * 60 * 1000, // 7-day default TTL
idbFactory, // Node.js only
});
// Store facts
await memory.remember('author', 'Sean Hogg');
await memory.remember('stack', 'React + TypeScript', {
tags : ['tech', 'project'],
importance: 0.8,
ttlMs : 30 * 24 * 60 * 60 * 1000, // 30 days
});
// Retrieve
const entry = await memory.recall('author');
// All non-expired facts, newest first
const all = await memory.recallAll();
// N most recent non-expired facts
const recent = await memory.recallRecent(10);
// Filter by tag
const techFacts = await memory.recallByTag('tech');
// Semantic similarity search (Jaccard word-overlap; SSM embeddings in future)
const similar = await memory.recallSimilar('who built this?', 5, runtime);
// Purge expired entries from storage
const deletedCount = await memory.purgeExpired();

// Export all non-expired facts from sessionA
const exported = await memoryA.exportAll();
// Import into sessionB
await memoryB.importAll(exported, 'merge');
// 'merge' — only overwrites if incoming entry is newer
// 'overwrite' — writes all entries unconditionally

await memory.saveWeights(runtime); // saves model weights to IndexedDB
const loaded = await memory.loadWeights(runtime); // false if no checkpoint found

interface MemoryEntry {
key : string;
content : string;
timestamp : number;
ttlMs? : number; // optional TTL; entry filtered after timestamp + ttlMs
type? : FactType; // 'text' | 'json' | 'number' | 'boolean'
tags? : string[]; // for grouping/filtering
importance?: number; // 0–1, default 0.5; higher facts appear first in prompts
}

InferenceRouter decides whether each request goes to the local SSM or the transformer bridge. It is built into SSMRuntime — you don't need to instantiate it directly.
const runtime = await SSM.create({
session: { modelSize: 'nano' },
bridge : claude,
routingStrategy : 'auto', // 'auto' | 'ssm' | 'transformer'
longInputThreshold: 1200, // chars before preferring transformer (default: 1200)
perplexityThreshold: 80, // SSM perplexity cutoff (default: 80)
});

Auto-routing heuristics (cheapest first):
- Complexity patterns — "step by step", "analyze", "compare and contrast" → transformer
- Input length — over threshold → transformer
- SSM perplexity — async probe; high perplexity = novel topic → transformer
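A hedged sketch of that cascade (illustrative only: the pattern list, function name, and return shape are assumptions, and the async perplexity probe is elided):

```typescript
// Illustrative heuristic cascade, cheapest check first.
// Pattern list and names are assumptions, not the library's real code.
const COMPLEXITY_PATTERNS = [/step by step/i, /analyz/i, /compare and contrast/i];

function routeByHeuristics(
  input: string,
  longInputThreshold = 1200,
): { target: 'ssm' | 'transformer'; reason: string } {
  if (COMPLEXITY_PATTERNS.some((p) => p.test(input))) {
    return { target: 'transformer', reason: 'complexity' };
  }
  if (input.length > longInputThreshold) {
    return { target: 'transformer', reason: 'length' };
  }
  // The real router would now run the async SSM perplexity probe;
  // here we default straight to the local SSM.
  return { target: 'ssm', reason: 'strategy' };
}

console.log(routeByHeuristics('Explain this step by step').target); // transformer
console.log(routeByHeuristics('hi').target);                        // ssm
```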
route() now returns a structured RoutingDecision object:
interface RoutingDecision {
target : 'ssm' | 'transformer';
reason : 'strategy' | 'complexity' | 'length' | 'perplexity' | 'no_bridge';
confidence: number; // 0–1
details? : string; // human-readable explanation
}

Every routing decision is appended to an in-memory audit log (last 500 entries):
const log = runtime.getRoutingAuditLog();
// log: RoutingAuditEntry[]
// { timestamp, inputLength, decision: RoutingDecision, durationMs }

Teach the local SSM using a transformer teacher — runs entirely in the browser or Node.js.
import { DistillationEngine } from '@seanhogg/ssmjs';
const distiller = new DistillationEngine(runtime, claude);
// Single pass: claude generates → SSM adapts on output
const result = await distiller.distill('Explain WebGPU compute shaders', {
adapt : { wsla: true, epochs: 3 },
qualityGate: {
minLength : 50, // skip if teacher output < 50 chars
maxPerplexity: 15, // skip if SSM perplexity already < 15 (already learned)
},
});
console.log('skipped:', result.skipped, result.skipReason);
console.log('loss:', result.adaptResult.losses.at(-1));
// Batch distillation
const batch = await distiller.distillBatch([
'What is a Mamba block?',
'Explain WSLA adaptation.',
], { adapt: { wsla: true, epochs: 5 } });
console.log(`${batch.totalEpochs} epochs in ${batch.totalMs}ms`);

| Gate option | Description |
|---|---|
| `minLength` | Skip if teacher output is shorter than N characters (low-quality response) |
| `maxPerplexity` | Skip if SSM perplexity on teacher output is already below threshold (already learned) |
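Combined, the gates behave roughly like this (a sketch mirroring the documented options; the shouldSkip helper and its return strings are illustrative, not the engine's real API):

```typescript
// Illustrative quality-gate check: returns a skip reason, or null to proceed.
// Mirrors the documented minLength / maxPerplexity semantics.
interface QualityGate {
  minLength?: number;
  maxPerplexity?: number;
}

function shouldSkip(
  teacherOutput: string,
  ssmPerplexity: number,
  gate: QualityGate,
): string | null {
  if (gate.minLength !== undefined && teacherOutput.length < gate.minLength) {
    return 'teacher_output_too_short'; // low-quality teacher response
  }
  if (gate.maxPerplexity !== undefined && ssmPerplexity < gate.maxPerplexity) {
    return 'already_learned'; // SSM already models this content well
  }
  return null; // both gates passed: run adaptation
}

console.log(shouldSkip('short', 40, { minLength: 50 }));             // teacher_output_too_short
console.log(shouldSkip('x'.repeat(100), 10, { maxPerplexity: 15 })); // already_learned
console.log(shouldSkip('x'.repeat(100), 40, { minLength: 50, maxPerplexity: 15 })); // null
```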
const log = distiller.getLog();
// log: DistillationLog[]
// { timestamp, input, teacherOutputLength, skipped, skipReason?, finalLoss?, epochs }

The log is bounded to the last 200 entries.
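Both bounded logs (500 routing entries, 200 distillation entries) imply the same append-and-trim pattern, which can be sketched as follows (assumed behaviour, not the library's internal class):

```typescript
// Generic bounded in-memory log: append, then keep only the newest N entries.
class BoundedLog<T> {
  private entries: T[] = [];
  constructor(private readonly limit: number) {}

  push(entry: T): void {
    this.entries.push(entry);
    if (this.entries.length > this.limit) {
      this.entries = this.entries.slice(-this.limit); // drop oldest overflow
    }
  }

  getAll(): T[] {
    return [...this.entries]; // defensive copy
  }
}

const auditLog = new BoundedLog<number>(200);
for (let i = 0; i < 250; i++) auditLog.push(i);
console.log(auditLog.getAll().length); // 200
console.log(auditLog.getAll()[0]);     // 50 (entries 0–49 were evicted)
```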
High-level orchestration: conversation history, routing, memory injection, and lifecycle.
import { SSMAgent, MemoryStore } from '@seanhogg/ssmjs';
const memory = new MemoryStore();
const agent = new SSMAgent({
runtime : runtime,
memory,
systemPrompt : 'You are a senior TypeScript engineer.',
maxHistoryTurns: 20,
persistHistory : true, // saves/loads history via memory on destroy/init
});
// Load persisted history from a prior session
await agent.init();
// Store project context
await agent.remember('stack', 'React 18, TypeScript 5, Vite');
// Multi-turn conversation — facts with highest importance appear first in context
const reply1 = await agent.think('What stack should I use?');
const reply2 = await agent.think('How do I handle concurrent edits?');
// Streaming
for await (const token of agent.thinkStream('Show me a WebSocket hook')) {
process.stdout.write(token);
}
// Teach the agent from content
await agent.learn(myCodebase, { wsla: true, epochs: 3 });
console.log(agent.turnCount); // 2
// Persists history to memory, then destroys runtime
await agent.destroy();

When `persistHistory: true` (default):

- On `agent.init()`: loads `__history__` from the `MemoryStore` and restores conversation turns.
- On `agent.destroy()`: serialises the history to JSON and writes it under `__history__`.
This enables multi-session continuity without external state management.
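A minimal sketch of that round trip, assuming only the documented `__history__` key and substituting a plain Map for the MemoryStore:

```typescript
// Sketch of the persistHistory round trip: serialise turns to JSON under
// the reserved '__history__' key, restore them on the next init.
// The Map stands in for MemoryStore; helper names are illustrative.
type Turn = { role: 'user' | 'assistant'; content: string };

function saveHistory(store: Map<string, string>, turns: Turn[]): void {
  store.set('__history__', JSON.stringify(turns));
}

function loadHistory(store: Map<string, string>): Turn[] {
  const raw = store.get('__history__');
  return raw ? (JSON.parse(raw) as Turn[]) : []; // empty on first session
}

const store = new Map<string, string>();
saveHistory(store, [
  { role: 'user', content: 'What stack should I use?' },
  { role: 'assistant', content: 'React 18 + TypeScript 5.' },
]);
console.log(loadHistory(store).length); // 2
```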
Facts retrieved from MemoryStore are sorted by importance descending before being injected into the prompt. Higher-importance facts appear first, giving the model the most relevant context regardless of insertion order.
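That ordering can be sketched as follows (the default importance of 0.5 matches the MemoryEntry docs; the flat `key: content` prompt format is an assumption):

```typescript
// Sketch of importance-descending fact injection. Default 0.5 matches
// the documented MemoryEntry default; the output format is illustrative.
type Fact = { key: string; content: string; importance?: number };

function factsToContext(facts: Fact[]): string {
  return [...facts] // copy so the caller's order is untouched
    .sort((a, b) => (b.importance ?? 0.5) - (a.importance ?? 0.5))
    .map((f) => `${f.key}: ${f.content}`)
    .join('\n');
}

const ctx = factsToContext([
  { key: 'note', content: 'minor detail', importance: 0.2 },
  { key: 'stack', content: 'React + TypeScript', importance: 0.8 },
]);
console.log(ctx.split('\n')[0]); // stack: React + TypeScript
```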
@seanhogg/mambakit has been consolidated into this package. MambaSession and all related types are now exported directly from @seanhogg/ssmjs.
Before:
npm install @seanhogg/mambakit @seanhogg/ssmjs

import { MambaSession } from '@seanhogg/mambakit';
import type { MambaSessionOptions, Tokenizer } from '@seanhogg/mambakit';

After:

npm install @seanhogg/ssmjs

import { MambaSession, SessionError } from '@seanhogg/ssmjs';
import type { MambaSessionOptions, Tokenizer } from '@seanhogg/ssmjs';

All types are re-exported unchanged — MambaSessionOptions, CompleteOptions, AdaptOptions,
AdaptResult, SaveOptions, LoadOptions, StorageTarget, CreateCallbacks,
LayerSchedulePreset, MODEL_PRESETS, GpuMode, and Tokenizer.
No logic changes are required, only the import path.
SSM.js serves as the hippocampus layer of CoderClaw's gateway — a persistent semantic memory and local inference engine running alongside the frontier LLM (Claude/GPT) cortex.
The SsmMemoryService class in CoderClaw's src/infra/ssm-memory-service.ts wraps an SSMRuntime + SSMAgent + MemoryStore triplet:
CoderClaw gateway
├── server-startup.ts ← initSsmMemoryService() on boot
├── infra/knowledge-loop.ts ← remember() + learn() on every agent run
├── infra/ssm-memory-service.ts ← SsmMemoryService wrapper
└── coderclaw/orchestrator.ts ← recallSimilar() injected into task prompts
Data flow:
- Agent run completes → `KnowledgeLoopService` derives activity summary
- Summary is stored in `.coderClaw/memory/YYYY-MM-DD.md` (markdown log)
- Summary is also passed to `ssmSvc.remember()` (tagged + importance-weighted)
- Summary is passed to `ssmSvc.learn()` → WSLA fine-tuning adapts the SSM
- On next workflow task, `recallSimilar(taskDescription, 5)` injects relevant memories into the prompt as a `[Memory Context]` block
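The final injection step might look roughly like this (the `[Memory Context]` label comes from the docs; the exact block formatting and helper name are assumptions):

```typescript
// Illustrative prompt assembly for the [Memory Context] injection step.
type MemoryHit = { key: string; content: string };

function buildTaskPrompt(taskDescription: string, hits: MemoryHit[]): string {
  if (hits.length === 0) return taskDescription; // nothing relevant recalled
  const block = hits.map((h) => `- ${h.key}: ${h.content}`).join('\n');
  return `[Memory Context]\n${block}\n\n${taskDescription}`;
}

const prompt = buildTaskPrompt('Refactor the router module', [
  { key: 'stack', content: 'React 18, TypeScript 5, Vite' },
]);
console.log(prompt.startsWith('[Memory Context]')); // true
```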
GPU init is optional: if @webgpu/node is unavailable, the service starts in memory-only mode (gpuAvailable: false) and SSM inference is skipped. The gateway never crashes due to a missing GPU.
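That fallback can be sketched as follows (the initGpu helper and loader injection are illustrative; only the memory-only behaviour on a failed GPU load comes from the docs):

```typescript
// Sketch of a non-fatal GPU init: if the adapter loader throws (e.g.
// @webgpu/node is not installed), fall back to memory-only mode instead
// of crashing. The loader is injected so the pattern is testable.
type GpuLoader = () => Promise<unknown>;

async function initGpu(
  loadAdapter: GpuLoader,
): Promise<{ gpuAvailable: boolean; gpuAdapter: unknown }> {
  try {
    const gpuAdapter = await loadAdapter();
    return { gpuAvailable: gpuAdapter != null, gpuAdapter };
  } catch {
    // GPU shim unavailable: memory-only mode, SSM inference skipped
    return { gpuAvailable: false, gpuAdapter: null };
  }
}

// In production the loader would wrap @webgpu/node's create().requestAdapter().
initGpu(async () => { throw new Error('no @webgpu/node'); }).then((r) => {
  console.log(r.gpuAvailable); // false
});
```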
- Session layer: `Tokenizer` interface + pluggable injection via `MambaSessionOptions.tokenizer`
- `MemoryStore`: TTL (`ttlMs`), `defaultTtlMs`, `purgeExpired()`, `recallRecent(n)`
- `MemoryStore`: `FactType`, `tags`, `importance` fields on `MemoryEntry`
- `MemoryStore`: updated `remember()` accepting `RememberOptions`
- `InferenceRouter`: `route()` now returns `RoutingDecision` object with `target`, `reason`, `confidence`, `details`
- `SSMAgent`: `persistHistory` option; `init()` loads `__history__`; `destroy()` saves it
- `SSMAgent`: fact injection sorted by `importance` descending
- `MemoryStore.recallSimilar(query, topK, runtime)` — Jaccard similarity; SSM embedding-based search in future
- `MemoryStore.recallByTag(tag)` — tag-based filtering
- `MemoryStore.exportAll()` / `importAll(entries, strategy)` — cross-session merge
- `SsmMemoryService` in `src/infra/ssm-memory-service.ts` — singleton gateway service
- `server-startup.ts`: `initSsmMemoryService()` on boot; non-fatal GPU fallback
- `KnowledgeLoopService`: `remember()` + `learn()` on every agent run completion
- `AgentOrchestrator`: `recallSimilar()` injected as `[Memory Context]` before task dispatch
- `DistillationEngine`: `qualityGate` option (`minLength`, `maxPerplexity`)
- `DistillResult`: `skipped` + `skipReason` fields
- `DistillationEngine.getLog()` — bounded in-memory `DistillationLog[]`
- `InferenceRouter`: `RoutingAuditEntry` + `getAuditLog()` — bounded in-memory log
- `SSMRuntime.getRoutingAuditLog()` — delegates to router
- `SSMRuntime.getDistillationLog()` — stub; returns empty array (inline engine future work)
| Option | Type | Default | Description |
|---|---|---|---|
| `session` | `MambaSessionOptions` | required | Forwarded to `MambaSession.create()` |
| `bridge` | `TransformerBridge` | — | Transformer backend for routing/distillation |
| `routingStrategy` | `'auto' \| 'ssm' \| 'transformer'` | `'auto'` | Routing strategy |
| `longInputThreshold` | `number` | `1200` | Chars before auto-routing prefers transformer |
| `perplexityThreshold` | `number` | `80` | SSM perplexity cutoff |
| `callbacks` | `CreateCallbacks` | — | Progress callbacks |
Generates a full response. Routes to SSM or transformer per strategy. Returns Promise<string>.
AsyncIterable<string> — always uses SSM path for consistent latency.
AsyncIterable<string> — routes like generate(), streams via bridge if available.
Pass-through to session.adapt(). Returns AdaptResult.
Returns SSM perplexity. Used internally by auto-routing.
Returns RoutingAuditEntry[] — last 500 routing decisions with timing.
Returns DistillationLog[] — last 200 distillation runs (stub; use distiller.getLog() directly).
Weight persistence pass-throughs to MambaSession.
Releases GPU device and all buffers.
import { SSMError, SessionError } from '@seanhogg/ssmjs';
try {
const runtime = await SSM.create({ session: { modelSize: 'nano' } });
await runtime.generate('hello');
} catch (err) {
if (err instanceof SSMError) {
// Runtime-level error (bridge, distillation, memory)
console.error(err.code); // 'BRIDGE_REQUEST_FAILED' | 'RUNTIME_DESTROYED' | ...
}
if (err instanceof SessionError) {
// Session-level error (GPU init, tokenizer, checkpoint)
console.error(err.code); // 'GPU_UNAVAILABLE' | 'TOKENIZER_LOAD_FAILED' | ...
}
}

src/
├── index.ts ← package entry + SSM namespace
├── session/ ← session layer (absorbed from @seanhogg/mambakit)
│ ├── session.ts ← MambaSession.create() — GPU, tokenizer, model, persistence
│ ├── tokenizer.ts ← Tokenizer interface (pluggable)
│ ├── presets.ts ← MODEL_PRESETS + layer schedule resolution
│ ├── persistence.ts ← IndexedDB / download / File System API helpers
│ ├── streaming.ts ← AsyncIterable token streaming
│ ├── errors.ts ← SessionError typed error class
│ └── index.ts ← barrel export
├── runtime/
│ └── SSMRuntime.ts ← core runtime, owns MambaSession
├── bridges/
│ ├── TransformerBridge.ts ← interface
│ ├── OpenAIBridge.ts ← OpenAI chat completions
│ ├── AnthropicBridge.ts ← Anthropic Messages API
│ └── FetchBridge.ts ← generic OpenAI-compatible endpoint
├── router/
│ └── InferenceRouter.ts ← SSM ↔ transformer routing + audit log
├── memory/
│ └── MemoryStore.ts ← IndexedDB fact store: TTL, tags, importance, export/import
├── distillation/
│ └── DistillationEngine.ts ← online teacher→student distillation + quality gates
├── agent/
│ └── SSMAgent.ts ← orchestration: history persistence + fact injection
└── errors/
└── SSMError.ts ← typed error class
SSM.js patterns are the architectural foundation of Builderforce.ai's Agent Runtime.
| SSM.js concept | Builderforce.ai equivalent |
|---|---|
| `SSMRuntime` | `AgentRuntime` (browser-native, ties to IDE project) |
| `DistillationEngine` | LLM-assisted dataset generation → in-browser LoRA training |
| `MemoryStore` | IndexedDB `MambaAgentState` + `AgentPackage` embedding |
| `SSMAgent` | Published workforce agent (Workforce Registry) |
| `TransformerBridge` | Cloudflare Workers AI / OpenRouter fallback |
MIT