Merged
wizzomafizzo
added a commit
that referenced
this pull request
Mar 24, 2026
…rkflow

Rewrite optimization-scan.md from generic 20-line prompt to full autoresearch-style experiment loop with keep/discard logic, MiSTer prediction via multiplier bands, and scope constraints.

Add 4 target-specific prompts with exact file/function references, benchmark commands, x86 CI thresholds, and algorithmic ideas:

- optimize-slug-search.md (#1: Search() linear scan, target 2.9ms x86)
- optimize-fuzzy-matching.md (#2: worst-case full scan, target 2.9ms x86)
- optimize-indexing.md (#3: pipeline 32s/10k, target 21s/10k on MiSTer)
- optimize-memory.md (#4: 101MB RSS investigation, target 50MB)
wizzomafizzo
added a commit
that referenced
this pull request
Mar 24, 2026
)

* feat(bench): add benchmark infrastructure and optimization targets

Add comprehensive benchmarks across the critical path (slug search cache, fuzzy matching, slug generation, filename parsing, NDEF parsing, config, mappings, advargs). Add Taskfile tasks for bench workflow (bench, bench-db, bench-baseline, bench-compare). Add optimization target documentation and background agent prompts for security review, dependency audit, and optimization scanning. Add AGENTS.md sections for benchmarks and background agent mode.

* feat(bench): add pipeline, title resolution, and scan-to-launch benchmarks

- Batch inserter benchmarks (insert, flush cost, commit cost) with real SQLite
- Slug search cache build from real DB (10k, 50k titles)
- Title resolution benchmarks (cache hit, exact match, fuzzy fallback)
- Scan-to-launch pipeline benchmarks (exact match, direct path, with mapping)
- Shared BuildBenchFilenames helper in pkg/testing/fixtures
- Refactor NewInMemoryMediaDB to accept testing.TB for benchmark use
- Remove stale fuzz commands for deleted zapscript/parser package
- Fix bench tasks: add -run='^$', filter baseline output for benchstat

* feat(bench): add MiSTer ARM benchmark infrastructure and baselines

Add cross-compile + SSH pipeline for running benchmarks on real MiSTer hardware (Cortex-A9). Reduce benchmark sizes to fit MiSTer's 492MB RAM. Suppress zerolog output during benchmarks to prevent output corruption. Generate x86 and MiSTer baseline files.

* fix(bench): fix broken memory and FlushScanStateMaps benchmarks

Add Size() method to SlugSearchCache for deterministic memory measurement; replaces broken HeapAlloc delta that reported ~0 MB. Add runtime.KeepAlive to FlushScanStateMaps benchmark to prevent compiler dead-code elimination (was 78x too fast on x86). Update optimization targets with MiSTer multiplier bands, resolution cache optimization opportunity, and revised targets based on production measurements. Regenerate both baselines.

* docs(prompts): add agent optimization prompts with experiment loop workflow

Rewrite optimization-scan.md from generic 20-line prompt to full autoresearch-style experiment loop with keep/discard logic, MiSTer prediction via multiplier bands, and scope constraints.

Add 4 target-specific prompts with exact file/function references, benchmark commands, x86 CI thresholds, and algorithmic ideas:

- optimize-slug-search.md (#1: Search() linear scan, target 2.9ms x86)
- optimize-fuzzy-matching.md (#2: worst-case full scan, target 2.9ms x86)
- optimize-indexing.md (#3: pipeline 32s/10k, target 21s/10k on MiSTer)
- optimize-memory.md (#4: 101MB RSS investigation, target 50MB)
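The fix(bench) entry above mentions adding runtime.KeepAlive so the compiler cannot discard the benchmarked work. A minimal sketch of that pattern follows, assuming a hypothetical buildScanState function in place of the real FlushScanStateMaps code path, which is not shown in this conversation. testing.Benchmark is used only so the sketch runs as a plain program; in the repository this would presumably be an ordinary Benchmark function in a _test.go file.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"testing"
)

// buildScanState is a hypothetical stand-in for the work under test:
// it fills a map roughly the way an indexing pass might.
func buildScanState(n int) map[string]int {
	m := make(map[string]int, n)
	for i := 0; i < n; i++ {
		m["title-"+strconv.Itoa(i)] = i
	}
	return m
}

// benchBuildScanState marks each result as live with runtime.KeepAlive,
// so the compiler has to treat the value as used and cannot drop the
// work that produced it.
func benchBuildScanState(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		m := buildScanState(1000)
		runtime.KeepAlive(m)
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	res := testing.Benchmark(benchBuildScanState)
	fmt.Println(res.String(), res.MemString())
}
```

A package-level sink variable is a common alternative to runtime.KeepAlive; either way, a result that looks implausibly fast (such as the 78x figure above) is a hint that the measured work may have been optimized away.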
wizzomafizzo
added a commit
that referenced
this pull request
Mar 25, 2026
* feat(bench): add benchmark infrastructure and optimization targets

Add comprehensive benchmarks across the critical path (slug search cache, fuzzy matching, slug generation, filename parsing, NDEF parsing, config, mappings, advargs). Add Taskfile tasks for bench workflow (bench, bench-db, bench-baseline, bench-compare). Add optimization target documentation and background agent prompts for security review, dependency audit, and optimization scanning. Add AGENTS.md sections for benchmarks and background agent mode.

* feat(bench): add pipeline, title resolution, and scan-to-launch benchmarks

- Batch inserter benchmarks (insert, flush cost, commit cost) with real SQLite
- Slug search cache build from real DB (10k, 50k titles)
- Title resolution benchmarks (cache hit, exact match, fuzzy fallback)
- Scan-to-launch pipeline benchmarks (exact match, direct path, with mapping)
- Shared BuildBenchFilenames helper in pkg/testing/fixtures
- Refactor NewInMemoryMediaDB to accept testing.TB for benchmark use
- Remove stale fuzz commands for deleted zapscript/parser package
- Fix bench tasks: add -run='^$', filter baseline output for benchstat

* feat(bench): add MiSTer ARM benchmark infrastructure and baselines

Add cross-compile + SSH pipeline for running benchmarks on real MiSTer hardware (Cortex-A9). Reduce benchmark sizes to fit MiSTer's 492MB RAM. Suppress zerolog output during benchmarks to prevent output corruption. Generate x86 and MiSTer baseline files.

* fix(bench): fix broken memory and FlushScanStateMaps benchmarks

Add Size() method to SlugSearchCache for deterministic memory measurement; replaces broken HeapAlloc delta that reported ~0 MB. Add runtime.KeepAlive to FlushScanStateMaps benchmark to prevent compiler dead-code elimination (was 78x too fast on x86). Update optimization targets with MiSTer multiplier bands, resolution cache optimization opportunity, and revised targets based on production measurements. Regenerate both baselines.

* docs(prompts): add agent optimization prompts with experiment loop workflow

Rewrite optimization-scan.md from generic 20-line prompt to full autoresearch-style experiment loop with keep/discard logic, MiSTer prediction via multiplier bands, and scope constraints.

Add 4 target-specific prompts with exact file/function references, benchmark commands, x86 CI thresholds, and algorithmic ideas:

- optimize-slug-search.md (#1: Search() linear scan, target 2.9ms x86)
- optimize-fuzzy-matching.md (#2: worst-case full scan, target 2.9ms x86)
- optimize-indexing.md (#3: pipeline 32s/10k, target 21s/10k on MiSTer)
- optimize-memory.md (#4: 101MB RSS investigation, target 50MB)

* perf: eliminate redundant slugification and reduce idle memory

Indexing pipeline (-22% ns/op, -46% B/op at 10k files):

- Hoist reYearScene regex from per-call to package-level compilation
- Call SlugifyWithTokens once in GetPathFragments, pass pre-computed tokens to new GenerateSlugMetadataFromTokens, avoiding a second full 14-stage slugification pass per unique title

Memory reduction for idle RSS (targeting 50MB from 101MB on MiSTer):

- Nil all ScanState maps after indexing so GC can collect backing arrays (Go maps retain bucket memory even after clear/delete)
- Force runtime.GC + debug.FreeOSMemory after indexing to return pages to OS immediately instead of waiting for scavenger
- Reduce SQLite page cache from 8MB to 2MB (the SQLite default), saving 6MB idle RSS

Update optimization-scan prompt with before/after measurement workflow and agent commit requirements.
benchstat (mediascanner, count=6):

IndexingPipeline_EndToEnd/10k    427.6m → 332.2m    -22.29% (p=0.002)
AddMediaPath_RealDB/10k          424.8m → 332.9m    -21.62% (p=0.002)
GetPathFragments_Batch/10k       266.0m → 217.1m    -18.36% (p=0.002)
AddMediaPath_RealDB/10k B/op     110.0Mi → 59.5Mi   -45.88% (p=0.002)
IndexingPipeline/10k allocs      2.02M → 1.22M      -39.93% (p=0.002)

* perf: optimize indexing pipeline and fix VACUUM blocking scans

- Pre-resolve mediaType per-system instead of per-file in AddMediaPath
- Add ASCII fast paths for NormalizeSymbolsAndSeparators and slug filtering
- Pre-compile regexes that were compiled per-call (episodeDot, dateDot)
- Add scene marker byte scan to skip 7 regex evaluations for ROM filenames
- Combine ExpandAbbreviations + ExpandNumberWords into single pass
- Add GOMEMLIMIT management (suspend during indexing, restore after)
- Increase MediaDB cache_size from 2MB to 8MB, add UserDB SQLite params
- Trim slug search cache backing arrays with slices.Clip
- Remove VACUUM from background optimization: it takes an exclusive lock on the single SQLite connection, blocking all reads (including card scans) until it completes. ANALYZE alone is sufficient.
- Fix SuspendMemoryLimit not being deferred (error path skipped restore)

* perf: reduce indexing CPU and allocation overhead

Pre-normalize launcher root paths once before the file walk instead of re-normalizing on every file check. Reduces path normalization from 25% CPU / 862MB allocations to <2% CPU / 0MB. Total allocations cut from 961MB to 420MB.

Split secondary indexes into search-critical (5 indexes on MediaTitles and Media join) created synchronously at end of indexing, and deferred (10 indexes for tags, caches) rebuilt during background optimization. Launches work immediately when indexing completes.

Move post-indexing work (cache population, ANALYZE, WAL checkpoint) to background optimization steps with retry logic so game launches are not blocked.
Additional changes:

- Replace 11 regex patterns in filename parser with manual string ops
- Cache prepared statements in BatchInserter for full-batch flushes
- Add _txlock=immediate to SQLite connection params
- Increase cache_size to 32MB during indexing (restored to 8MB after)
- Bump MaxOpenConns to 2 for WAL concurrent read/write
- Zero-allocation path prefix check in pathHasPrefixNormalized

MiSTer ARM results: 22min -> ~10min indexing time for 239K files.

* fix: build indexes and caches synchronously before marking indexing complete

Background CREATE INDEX and cache population caused search failures and wrong results during optimization. SQLite write operations on one connection interfered with read queries on the other, despite WAL mode.

Move all index creation, tags cache population, and slug search cache building into the synchronous indexing path. Background optimization now only runs ANALYZE and WAL checkpoint, which are lightweight and don't affect query correctness.

Also adds IF NOT EXISTS to all CREATE INDEX statements for idempotency, and reverts the page_size=4096 experiment (no measurable difference).

MiSTer ARM: 15m02s for 239K files, launches work correctly immediately after indexing completes.
No description provided.