
feat(opencode-plugin,aft): add optional external semantic embedding backends#11

Merged
ualtinok merged 2 commits into cortexkit:main from freelanceagent1:fix/windows-aft-semantic-runtime
Apr 16, 2026

Conversation

@freelanceagent1
Contributor

Summary

This adds optional external semantic embedding backends to AFT while keeping the current default behavior unchanged.

Default behavior is still:

  • local fastembed + all-MiniLM-L6-v2

New opt-in backends:

  • openai_compatible for LM Studio / OpenAI-compatible embeddings endpoints
  • ollama for Ollama native embeddings API

What changed

Plugin/config

  • added nested semantic config block
  • forwards semantic backend/model/base_url/api_key_env/timeout_ms/max_batch_size through configure
  • keeps ONNX runtime setup only for fastembed
  • external backends do not trigger ONNX setup
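
The config shape and the ONNX gating described above can be sketched as follows. Field names match the ones forwarded through configure in this PR, but the types, defaults, and the helper function are illustrative, not the actual plugin source:

```typescript
// Hypothetical shape of the nested `semantic` config block.
type SemanticBackend = "fastembed" | "openai_compatible" | "ollama";

interface SemanticConfig {
  backend?: SemanticBackend;   // defaults to "fastembed"
  model?: string;              // e.g. "all-MiniLM-L6-v2"
  base_url?: string;           // external backends only
  api_key_env?: string;        // name of an env var, never the key itself
  timeout_ms?: number;
  max_batch_size?: number;
}

// ONNX runtime setup is only needed for the local fastembed path;
// the external HTTP backends skip it entirely.
function needsOnnxRuntime(cfg: SemanticConfig): boolean {
  return (cfg.backend ?? "fastembed") === "fastembed";
}
```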

Rust semantic backend

  • introduced backend selection for semantic embedding generation
  • implemented:
    • fastembed
    • openai_compatible
    • ollama
  • query embedding uses the same configured backend as index build
  • external backend failures are explicit; there is no silent fallback
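
The selection logic and the no-silent-fallback property can be illustrated with a sketch. The real implementation is Rust; this TypeScript version only shows the dispatch shape, with hypothetical function names:

```typescript
type Backend = "fastembed" | "openai_compatible" | "ollama";

// Dispatch one embedding batch to the configured backend. The same
// function serves both index build and query embedding, so both paths
// always use the same backend.
function embedBatch(
  backend: Backend,
  texts: string[],
  embedRemote: (texts: string[]) => number[][],
  embedLocal: (texts: string[]) => number[][],
): number[][] {
  switch (backend) {
    case "fastembed":
      return embedLocal(texts);
    case "openai_compatible":
    case "ollama":
      // Any throw from the remote path reaches the caller unchanged;
      // there is deliberately no catch-and-fallback to fastembed here.
      return embedRemote(texts);
  }
}
```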

Persistence/status

  • semantic cache now rebuilds when backend/model/base_url/dimension changes
  • status now exposes semantic backend + model
  • preserves the existing semantic progress/status work already on this branch
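
The rebuild trigger amounts to a fingerprint comparison over the fields listed above. A minimal sketch, with illustrative names (the real cache key lives on the Rust side):

```typescript
interface SemanticFingerprint {
  backend: string;
  model: string;
  baseUrl: string;
  dimension: number;
}

// The cache is rebuilt when there is no cached fingerprint or when any
// of backend/model/base_url/dimension differs from the current config.
function needsRebuild(
  cached: SemanticFingerprint | null,
  current: SemanticFingerprint,
): boolean {
  if (cached === null) return true;
  return (
    cached.backend !== current.backend ||
    cached.model !== current.model ||
    cached.baseUrl !== current.baseUrl ||
    cached.dimension !== current.dimension
  );
}
```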

Why there is no split query/index backend config

This PR intentionally uses one semantic backend for both:

  • index build
  • query embedding

Using one backend to build the index and another to embed queries is unsafe unless both produce vectors in the exact same embedding space. That is not guaranteed across runtimes, even for the same model family.
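
To make the hazard concrete: similarity search scores a query vector against index vectors, typically with cosine similarity. The arithmetic still produces numbers when the two vectors come from different embedding spaces; they are just not meaningful, and no runtime check can detect that. This standalone helper illustrates the scoring step (it is not project code):

```typescript
// Cosine similarity between two vectors. Only meaningful when both
// vectors were produced by the same model in the same runtime; a
// dimension check catches the obvious mismatch but cannot detect two
// different 384-dim embedding spaces.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```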

Validation

Automated checks

  • cargo test -p agent-file-tools semantic_ -- --nocapture
  • bun test packages/opencode-plugin/src/__tests__/config.test.ts packages/opencode-plugin/src/__tests__/shared-status.test.ts packages/opencode-plugin/src/__tests__/bridge.test.ts
  • bun run typecheck in packages/opencode-plugin

Provider validation

  • OpenAI-compatible provider path tested against a live LM Studio-compatible endpoint
  • Ollama provider path tested against:
    • mock-server protocol tests
    • live local Ollama endpoint

Compatibility note

  • Runtime-validated on Windows
  • External-backend implementation uses cross-platform Rust HTTP/JSON codepaths and does not introduce Windows-specific logic
  • Linux/macOS were not runtime-tested in this branch from this machine

Benchmark appendix

These numbers are for the new semantic backend feature, not a replacement for the README trigram-search benchmark.

Benchmark environments

Local benchmark machine:

  • Windows 11
  • AMD Radeon RX 6800M
  • AMD Radeon(TM) Graphics

Remote LM Studio host:

  • user-reported Windows 11 + RTX 5070 Ti

Backends tested:

  • fastembed local CPU
  • ollama local CPU
  • ollama local with Vulkan forced
  • LM Studio over LAN

Large-repo semantic build comparison (codex-fresh)

  • fastembed local CPU: 11m 33s
  • ollama local CPU: 8m 15s
  • ollama local Vulkan GPU: 3m 08s
  • LM Studio LAN: 2m 13s

All successful runs ended with:

  • 23,638 semantic entries
  • 384 dimensions

README-style query sets

I also reran the README query sets on current local snapshots of:

  • opencode-aft
  • reth
  • Chromium/base

Important caveat:

  • these use the same query sets, but they are not guaranteed to run against the exact same repo snapshots as the published README tables

Indexed grep vs ripgrep

Observed indexed-grep speedups remained strong:

  • opencode-aft: ~18x to 40x
  • reth: ~12x to 20x
  • Chromium/base: ~26x to 71x

Semantic query latency

Cached semantic query latency favored local fastembed:

  • local fastembed was fastest
  • LM Studio was usually second
  • Ollama was slower on query latency even with Vulkan enabled

This suggests:

  • external backends improve semantic build throughput
  • local fastembed still gives the best query-time latency

Same-model note

LM Studio and Ollama were benchmarked with MiniLM-family embedding models, but the model artifacts are not guaranteed to be byte-identical.

A stricter follow-up benchmark would import the same second-state MiniLM GGUF into both runtimes and rerun the comparison.

@ualtinok ualtinok merged commit 1414835 into cortexkit:main Apr 16, 2026
2 checks passed
ualtinok added a commit that referenced this pull request Apr 16, 2026
P0: Fix config merge security bypass — project config can no longer
leak sensitive semantic fields (backend/base_url/api_key_env) through
the ...override spread when mergeSemanticConfig returns undefined.

P1: Fix deep-merge — only defined safe fields from project config
overwrite user values (prevents silent erasure of user model/backend).

P1: Fix OpenAI index-field bug — use enumerate() fallback for
providers that omit the index field in embedding responses.

P1: Add cross-batch dimension tracking — error on inconsistent
embedding dimensions across batches.

P2: Clamp semantic timeout below bridge timeout (25s).
P2: Add HTTP retry with backoff (2 retries, 5xx/429 only).
P2: SSRF validation — http/https only, no-redirect policy.
P2: Stale semantic cache detection via file mtime verification.
P2: Normalize base_url before fingerprinting.
P2: Reject unknown fastembed model names explicitly.
P2: Change fallback timeout constant from 60s to 25s.

13 council audit findings addressed with 7 new tests.
@ualtinok
Collaborator

Merged and shipped in v0.13.0 with additional hardening from a council security audit. Thanks @freelanceagent1 for the excellent feature work — the external backends architecture is clean and well-designed.

Follow-up improvements we added on top of your PR:

Security (P0):

  • Project config can no longer set semantic.backend, semantic.base_url, or semantic.api_key_env — those are user-only config now. Prevents a supply-chain attack where a malicious .opencode/aft.jsonc could exfiltrate env secrets to an attacker's server.
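
The user-only vs project-overridable split above can be sketched as an allow-list merge. Field names follow the discussion in this thread; the exact implementation in the plugin may differ:

```typescript
// Fields a project-level .opencode/aft.jsonc is allowed to override.
// backend, base_url, and api_key_env are deliberately absent: a
// malicious project config must not be able to redirect embeddings
// (and the env secret named by api_key_env) to an attacker's server.
const PROJECT_SAFE_FIELDS = ["model", "timeout_ms", "max_batch_size"] as const;

function mergeSemanticConfig(
  user: Record<string, unknown>,
  project: Record<string, unknown>,
): Record<string, unknown> {
  const merged = { ...user };
  for (const key of PROJECT_SAFE_FIELDS) {
    // Only defined safe fields override; everything else in the
    // project config is ignored entirely.
    if (project[key] !== undefined) merged[key] = project[key];
  }
  return merged;
}
```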

Correctness (P1):

  • Deep-merge now only overwrites defined safe fields — a project config with just { semantic: { timeout_ms: 5000 } } no longer erases the user's model/backend
  • OpenAI-compatible response parser uses enumerate() fallback for providers that omit the index field (was causing all items to map to slot 0)
  • Cross-batch embedding dimension tracking — errors on inconsistent dimensions instead of silently corrupting the index
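
The index-field fix is easy to picture: some OpenAI-compatible servers omit `index` on embedding items, which previously collapsed every item into slot 0. Falling back to the enumeration position preserves response order. A sketch with illustrative types (the real parser is Rust):

```typescript
interface EmbeddingItem {
  index?: number;        // some providers omit this field
  embedding: number[];
}

// Place each embedding at its declared index, or at its position in
// the response array when the provider omits the index field.
function orderEmbeddings(items: EmbeddingItem[]): number[][] {
  const out: number[][] = new Array(items.length);
  items.forEach((item, pos) => {
    out[item.index ?? pos] = item.embedding;
  });
  return out;
}
```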

Robustness (P2):

  • HTTP retry with exponential backoff (2 retries for 5xx/429 errors)
  • SSRF validation: only http:// and https:// schemes, no-redirect policy
  • Semantic timeout clamped to bridge timeout (25s max)
  • Stale semantic cache detection via file mtime verification on restart
  • Normalized base_url in fingerprints to avoid spurious rebuilds
  • Unknown fastembed model names now return explicit errors
  • 7 new tests covering all the above edge cases
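
The retry policy above can be sketched as follows. The PR specifies the retry count (2) and the retryable status classes (5xx/429); the backoff schedule here is an assumption, and both function names are illustrative:

```typescript
// Only transient server errors and rate limits are worth retrying.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}

// Retry a request up to maxRetries times with exponential backoff,
// but only for retryable statuses; 4xx client errors fail immediately.
async function withRetry<T>(
  attempt: () => Promise<{ status: number; value?: T }>,
  maxRetries = 2,
  baseDelayMs = 250, // assumed schedule: 250ms, 500ms
): Promise<T> {
  for (let tries = 0; ; tries++) {
    const res = await attempt();
    if (res.status < 400 && res.value !== undefined) return res.value;
    if (tries >= maxRetries || !isRetryable(res.status)) {
      throw new Error(`request failed with status ${res.status}`);
    }
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** tries));
  }
}
```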

All 689 Rust tests + 211 TypeScript tests pass.
