

Lisa edited this page Jan 20, 2026 · 10 revisions

# CKB Performance

Performance characteristics and benchmarks for CKB tools.

## Latency Targets

CKB tools are classified by performance budget:

### v5.2 Navigation Tools

| Budget | P95 Target | Tools |
|--------|------------|-------|
| Cheap | < 300ms | searchSymbols, explainFile, listEntrypoints, explainPath, getSymbol, explainSymbol |
| Heavy | < 2000ms | traceUsage, getArchitecture, getHotspots, summarizeDiff, recentlyRelevant, listKeyConcepts, analyzeImpact, getCallGraph, findReferences, justifySymbol |

### v6.0 Architectural Memory Tools

| Budget | P95 Target | Tools |
|--------|------------|-------|
| Cheap | < 300ms | getModuleResponsibilities, getOwnership, recordDecision, getDecisions, annotateModule |
| Heavy | < 2000ms | getArchitecture, getHotspots |
| Heavy | < 30000ms | refreshArchitecture |

## Benchmark Results

Environment: Apple M4 Pro, Go 1.23, macOS

### v7.4 Tool Discovery Token Optimization

Presets reduce the token cost of MCP `tools/list` by up to 83%:

| Preset | Tools | Bytes | Tokens | vs Full |
|--------|-------|-------|--------|---------|
| core (default) | 14 | 6,127 | ~1,531 | -83% |
| review | 19 | 9,177 | ~2,294 | -75% |
| refactor | 19 | 8,864 | ~2,216 | -75% |
| docs | 20 | 8,375 | ~2,093 | -77% |
| ops | 25 | 9,464 | ~2,366 | -74% |
| federation | 28 | 12,488 | ~3,122 | -65% |
| full | 76 | 36,172 | ~9,043 | baseline |

**Before v7.4:** Every session loaded all 80+ tools (~9,000 tokens) before any work could begin.

**After v7.4:** The default `core` preset loads 14 tools (~1,500 tokens), and the AI can expand the toolset dynamically with `expandToolset` when needed.

Token estimate: bytes / 4 (conservative for structured JSON)

### v7.4 SCIP Backend Optimizations

The SCIP backend uses pre-computed indexes for dramatically faster lookups:

| Operation | Before | After | Improvement |
|-----------|--------|-------|-------------|
| FindReferences | 340μs | 2.5μs | 136x |
| SearchSymbols | 930μs | 136μs | 7x |
| FindSymbolLocation | 70μs | 28ns | 2,500x |
| GetCachedSymbol | 210ns | 7.5ns | 28x |

**Implementation Details:**

| Index | Purpose | Complexity |
|-------|---------|------------|
| RefIndex | Inverted index: symbolId → occurrences | O(1) lookup vs O(n×m) scan |
| ConvertedSymbols | Pre-converted SCIPSymbol cache | Avoids repeated SCIP parsing |
| ContainerIndex | Maps occurrence positions to containing symbols | O(1) lookup vs O(n²) scan |
| findSymbolLocationFast | Definition lookup via RefIndex | O(k) where k = occurrences |

These indexes are built during SCIP index load with minimal memory overhead (~20-30% increase).

### v7.4 Git Backend Optimizations

The getHotspots tool was dramatically optimized by consolidating git commands:

| Operation | Before | After | Improvement |
|-----------|--------|-------|-------------|
| getHotspots (20 files) | 26.7s | 498ms | 53x |

**Problem:** For each changed file, the old code ran four separate git commands:

- `git rev-list --count` (commit count)
- `git shortlog -sn` (authors)
- `git log` (last modified)
- `git log --numstat` (line changes)

With 100+ files changed over 30 days, that meant 400+ process spawns.

**Solution:** A single `git log --format=%H|%an|%aI --numstat` command collects all of this data in one pass.

### Helper Function Performance

In-memory processing functions complete in nanoseconds to microseconds:

| Function | Time | Description |
|----------|------|-------------|
| classifyFileRiskLevel | 1.0 ns | Risk classification for diff files |
| classifyHotspotRisk | 0.77 ns | Churn-based risk assessment |
| computeDiffConfidence | 2.2 ns | Confidence calculation |
| computePathConfidence | 1.0 ns | Path confidence from basis |
| detectLanguage | 7.3 ns | Language from file extension |
| suggestTestPath | 19 ns | Test file path generation |
| titleCase | 29 ns | Simple title casing |
| classifyRecency | 43 ns | Timestamp recency classification |
| computeRecencyScore | 44 ns | Recency scoring |
| classifyFileRole | 78 ns | File role from path patterns |
| splitCamelCase | 116 ns | CamelCase word splitting |
| classifyPathRole | 297 ns | Full path role classification |
| categorizeConceptV52 | 561 ns | Concept categorization |
| buildDiffSummary | 674 ns | Diff summary text generation |
| extractConcept | 903 ns | Concept extraction from names |

### Pipeline Performance

Simulated tool processing with multiple items:

| Pipeline | Items | Time | Budget | Headroom |
|----------|-------|------|--------|----------|
| PathClassification | 10 paths | 3.0 µs | 300ms | 99.999% |
| DiffProcessing | 50 files | 8.9 µs | 2000ms | 99.999% |
| HotspotProcessing | 50 items | 10.1 µs | 2000ms | 99.999% |
| ConceptExtraction | 10 names | 14.9 µs | 2000ms | 99.999% |

### v6.0 Hotspot Benchmarks

| Function | Time | Description |
|----------|------|-------------|
| CalculateInstability | 0.25 ns | Martin's instability metric |
| ComputeCompositeScore | 0.26 ns | Weighted hotspot score |
| NormalizeChurnScore | 0.25 ns | Churn normalization |
| NormalizeCouplingScore | 0.25 ns | Coupling normalization |
| NormalizeComplexityScore | 0.26 ns | Complexity normalization |
| CalculateTrend | 295 ns | Trend analysis (30 snapshots) |

| Pipeline | Items | Time | Budget | Headroom |
|----------|-------|------|--------|----------|
| HotspotScoring | 100 files | 69 ns | 2000ms | 99.999% |
| TrendAnalysis | 50 files × 10 snapshots | 5.7 µs | 2000ms | 99.999% |

### v6.0 Ownership Benchmarks

| Function | Time | Description |
|----------|------|-------------|
| normalizeAuthorKey | 10 ns | Author key normalization |
| BlameOwnershipToOwners | 47 ns | Convert blame to owners |
| CodeownersToOwners | 56 ns | Convert CODEOWNERS to owners |
| isBot | 743 ns | Bot detection (regex) |
| matchPattern | 1.9 µs | Glob pattern matching |
| GetOwnersForPath | 51 µs | Resolve owners for path |

| Pipeline | Items | Time | Budget | Headroom |
|----------|-------|------|--------|----------|
| OwnershipResolution | 100 files × 50 rules | 9.2 ms | 300ms | 96.9% |

### v7.3 Incremental Indexing

Incremental indexing (Go only) makes `ckb index` O(changed files) instead of O(entire repo).

#### Index Time Comparison

| Project Size | Full Index | Incremental (1 file) | Speedup |
|--------------|------------|----------------------|---------|
| Small (100 files) | ~2s | ~0.5s | 4x |
| Medium (1000 files) | ~15s | ~1-2s | 10x |
| Large (10000 files) | ~60s | ~2-5s | 20x |

#### Where Incremental Time Goes

| Phase | Time | Notes |
|-------|------|-------|
| Change detection | ~50ms | `git diff` with the `-z` flag |
| scip-go execution | ~1-2s | Still runs the full indexer |
| Delta extraction | ~100ms | Only changed docs are processed |
| Database updates | ~50ms | Delete + insert pattern |

The `scip-go` step is unavoidable (the protobuf format doesn't support partial updates), but CKB only does the expensive database work for changed files.

#### Accuracy vs Speed Trade-off

| Index Type | Speed | Forward Refs | Reverse Refs |
|------------|-------|--------------|--------------|
| Full (--force) | Slower | 100% accurate | 100% accurate |
| Incremental | Faster | 100% accurate | May be stale |

Use `ckb index --force` when reverse reference accuracy is critical.

See Incremental Indexing for detailed accuracy guarantees.

## Where Time Actually Goes

In-memory processing is negligible. Real-world latency is dominated by I/O:

1. **SCIP index lookups** - symbol search, references, call graph traversal
2. **Git history queries** - commit history, diff stats, churn metrics
3. **File system operations** - directory traversal, file reads

This is by design: CKB's value lies in orchestrating these I/O operations efficiently and compressing the results for LLM consumption.

## Running Benchmarks

```sh
# Run all query benchmarks
go test ./internal/query/... -bench=. -benchmem -run=^$

# Run all v6.0 benchmarks
go test ./internal/hotspots/... ./internal/ownership/... -bench=. -benchmem -run=^$

# Run specific benchmark
go test ./internal/query/... -bench=BenchmarkClassifyPathRole -benchmem -run=^$

# Run with CPU profiling
go test ./internal/query/... -bench=BenchmarkDiffProcessingPipeline -cpuprofile=cpu.prof -run=^$
```

## Optimization Tips

### For Users

1. **Keep the SCIP index fresh** - stale indexes cause fallback to slower backends
2. **Use scoped queries** - adding a scope parameter reduces the search space
3. **Set reasonable limits** - don't request 1,000 results if you need 20

### For Contributors

1. **Profile before optimizing** - most time is in I/O, not CPU
2. **Cache aggressively** - CKB's three-tier cache handles this
3. **Batch I/O operations** - fewer round trips beat faster processing
