feat(opencode-plugin,aft): add optional external semantic embedding backends #11
Merged

ualtinok merged 2 commits into cortexkit:main, Apr 16, 2026

Conversation

ualtinok added a commit that referenced this pull request on Apr 16, 2026:
- P0: Fix config merge security bypass — project config can no longer leak sensitive semantic fields (`backend`/`base_url`/`api_key_env`) through the `...override` spread when `mergeSemanticConfig` returns `undefined`.
- P1: Fix deep-merge — only defined safe fields from project config overwrite user values (prevents silent erasure of user model/backend).
- P1: Fix OpenAI index-field bug — use `enumerate()` fallback for providers that omit the `index` field in embedding responses.
- P1: Add cross-batch dimension tracking — error on inconsistent embedding dimensions across batches.
- P2: Clamp semantic timeout below bridge timeout (25s).
- P2: Add HTTP retry with backoff (2 retries, 5xx/429 only).
- P2: SSRF validation — http/https only, no-redirect policy.
- P2: Stale semantic cache detection via file mtime verification.
- P2: Normalize base_url before fingerprinting.
- P2: Reject unknown fastembed model names explicitly.
- P2: Change fallback timeout constant from 60s to 25s.

13 council audit findings addressed with 7 new tests.
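The P0/P1 merge fixes can be illustrated with a small sketch. The function name `mergeSemanticConfig` comes from the commit message, but the config shape and the list of "safe" fields here are assumptions, not the shipped code:

```typescript
// Hypothetical config shape; the real plugin config is richer.
interface SemanticConfig {
  backend?: string;
  base_url?: string;
  api_key_env?: string;
  model?: string;
}

// Fields a project-local config is allowed to override (assumed list).
const SAFE_FIELDS = ["model"] as const;

// Buggy pattern: spreading the project config verbatim lets it inject
// sensitive fields (backend/base_url/api_key_env), and explicitly
// present `undefined` values can clobber user settings.
function mergeSemanticConfigBuggy(
  user: SemanticConfig,
  project: SemanticConfig
): SemanticConfig {
  return { ...user, ...project };
}

// Fixed pattern: copy only defined, safe fields from the project config.
function mergeSemanticConfig(
  user: SemanticConfig,
  project: SemanticConfig
): SemanticConfig {
  const merged: SemanticConfig = { ...user };
  for (const field of SAFE_FIELDS) {
    const value = project[field];
    if (value !== undefined) merged[field] = value;
  }
  return merged;
}
```

The point of the allowlist is that the secure default is "project config changes nothing": any field not explicitly marked safe stays under the user's control.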
Merged and shipped in v0.13.0 with additional hardening from a council security audit. Thanks @freelanceagent1 for the excellent feature work — the external backends architecture is clean and well-designed. Follow-up improvements we added on top of your PR:

Security (P0): config-merge bypass fix (project config can no longer inject sensitive semantic fields).

Correctness (P1): safe-field-only deep merge, OpenAI index-field fallback, cross-batch dimension tracking.

Robustness (P2): timeout clamping, HTTP retry with backoff, SSRF validation, stale-cache detection, base_url normalization, explicit fastembed model validation.

All 689 Rust tests + 211 TypeScript tests pass.
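For illustration, the P2 retry behavior (2 retries, 5xx/429 only) could look roughly like the sketch below. The function names, delay values, and `FetchLike` shape are illustrative assumptions, not the shipped implementation:

```typescript
type FetchLike = (url: string) => Promise<{ status: number }>;

const MAX_RETRIES = 2;

// Retry only on transient statuses: 429 (rate limit) and any 5xx.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}

async function fetchWithRetry(
  fetchFn: FetchLike,
  url: string,
  baseDelayMs = 250
): Promise<{ status: number }> {
  let last: { status: number } = { status: 0 };
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    last = await fetchFn(url);
    // Success and non-transient errors (e.g. 404) return immediately.
    if (!isRetryable(last.status)) return last;
    if (attempt < MAX_RETRIES) {
      // Exponential backoff: baseDelayMs, then 2x, etc. (illustrative).
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  return last;
}
```

Capping at two retries keeps worst-case latency bounded, which matters once the semantic timeout is clamped below the 25s bridge timeout.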
Summary
This adds optional external semantic embedding backends to AFT while keeping the current default behavior unchanged.
Default behavior is still:

- `fastembed` + `all-MiniLM-L6-v2`

New opt-in backends:

- `openai_compatible` for LM Studio / OpenAI-compatible embeddings endpoints
- `ollama` for the Ollama native embeddings API

What changed
Plugin/config

- `semantic` config block
- `configure`
- `fastembed`

Rust semantic backend

- `fastembed`
- `openai_compatible`
- `ollama`

Persistence/status
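To make the opt-in concrete, a `semantic` config block might look like the fragment below. The keys `backend`, `base_url`, `api_key_env`, and `model` are named in this PR; the exact nesting and the example values (LM Studio's local port, the env-var name) are assumptions:

```jsonc
{
  "semantic": {
    "backend": "openai_compatible",
    "base_url": "http://localhost:1234/v1", // e.g. LM Studio (assumed value)
    "api_key_env": "LMSTUDIO_API_KEY",      // env var holding the key (assumed name)
    "model": "all-MiniLM-L6-v2"
  }
}
```

Omitting the block entirely keeps the default `fastembed` + `all-MiniLM-L6-v2` behavior.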
Why there is no split query/index backend config
This PR intentionally uses one semantic backend for both index building and query embedding.
Using one backend to build the index and another to embed queries is unsafe unless both produce vectors in the exact same embedding space. That is not guaranteed across runtimes, even for the same model family.
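One way to enforce this single-backend invariant is to fingerprint the backend, model, and normalized base_url at index-build time and refuse queries under a different fingerprint. This is a sketch only; the actual persistence format and field names are not shown in this PR, and the normalization mirrors the P2 base_url fix only loosely:

```typescript
// Hypothetical index metadata; field names are illustrative.
interface IndexMeta {
  fingerprint: string;
  dimensions: number;
}

// Normalize base_url so "http://host:1234" and "http://host:1234/"
// fingerprint identically (loosely mirrors the P2 normalization fix).
function normalizeBaseUrl(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "").toLowerCase();
}

function semanticFingerprint(
  backend: string,
  model: string,
  baseUrl = ""
): string {
  return `${backend}:${model}:${normalizeBaseUrl(baseUrl)}`;
}

// Queries must embed with the same backend/model that built the index,
// and the query vector must match the indexed dimensionality.
function checkQueryCompatible(
  meta: IndexMeta,
  backend: string,
  model: string,
  baseUrl: string,
  queryVec: number[]
): void {
  if (semanticFingerprint(backend, model, baseUrl) !== meta.fingerprint) {
    throw new Error("semantic index was built with a different backend/model");
  }
  if (queryVec.length !== meta.dimensions) {
    throw new Error(
      `dimension mismatch: query ${queryVec.length} vs index ${meta.dimensions}`
    );
  }
}
```

Note that a matching dimension count alone is not enough: two runtimes can both emit 384-dimensional vectors that live in different embedding spaces, which is why the fingerprint checks backend identity, not just shape.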
Validation
Automated checks
- `cargo test -p agent-file-tools semantic_ -- --nocapture`
- `bun test packages/opencode-plugin/src/__tests__/config.test.ts packages/opencode-plugin/src/__tests__/shared-status.test.ts packages/opencode-plugin/src/__tests__/bridge.test.ts`
- `bun run typecheck` in `packages/opencode-plugin`

Provider validation
Compatibility note
Benchmark appendix
These numbers are for the new semantic backend feature, not a replacement for the README trigram-search benchmark.
Benchmark environments
Local benchmark machine:
Remote LM Studio host:
Backends tested:
- `fastembed` local CPU
- `ollama` local CPU
- `ollama` local with Vulkan forced
- LM Studio over LAN

Large-repo semantic build comparison (`codex-fresh`):

- `fastembed` local CPU: 11m 33s
- `ollama` local CPU: 8m 15s
- `ollama` local Vulkan GPU: 3m 08s
- LM Studio LAN: 2m 13s

All successful runs ended with:

- 23,638 semantic entries
- 384 dimensions

README-style query sets
I also reran the README query sets on current local snapshots of:

- `opencode-aft`
- `reth`
- `Chromium/base`

Important caveat:
Indexed grep vs ripgrep
Observed indexed-grep speedups remained strong:
- `opencode-aft`: ~18x to 40x
- `reth`: ~12x to 20x
- `Chromium/base`: ~26x to 71x

Semantic query latency
Cached semantic query latency favored local `fastembed`:

- `fastembed` was fastest

This suggests:

- `fastembed` still gives the best query-time latency

Same-model note
LM Studio and Ollama were benchmarked with MiniLM-family embedding models, but not guaranteed byte-identical artifacts.
A stricter follow-up benchmark would import the same `second-state` MiniLM GGUF into both runtimes and rerun the comparison.