propose: uniform YAML path expansion for embedding.model + README YAML reference #87
Review (docs + propose)
Direction: Strong approve on scope and framing: YAML/CLI/env should agree for
Before merge / lock:
Overall: good draft; address (1)–(2) before treating README as contract on
…L reference
- propose/YAML-PATH-EXPANSION-PROPOSE.md: surgical fix to apply
os.path.expanduser + os.path.expandvars to .java-codebase-rag.yml
embedding.model values that look like filesystem paths. Hub ids
(org/name shape) are passed through unchanged.
- README §2 ('Environment variables') gains a new 'Project YAML
reference' subsection: a single annotated .java-codebase-rag.yml
example covering every supported key (index_dir, embedding.{model,
device}, microservice_roots, cross_service_resolution, role_overrides,
route_overrides, http_client_overrides, async_producer_overrides),
a path-expansion compatibility table, and tips for diagnosing what
the resolver picked up via 'java-codebase-rag meta'.
Scope is intentionally minimal: only embedding.model gets the expansion
fix. Other YAML string fields (URL paths, topic names, FQNs, role
names, framework names) are not filesystem paths and are explicitly
out of scope. index_dir is documented as expanding only '~' (not
$VAR) -- a smaller existing gap noted as a separate follow-up.
Amendments force-pushed (15a0e74 → 77cd606)
All five points addressed. Status bumped
Per-point response
1. README vs implementation. Two changes:
2. "Silently fails on
This also forced a scope shift: the fix has to be at the resolution layer (post-
3. Pseudocode vs table. Dropped the
4.
5. CLI parity. This was the most useful point. Locking the helper at the resolution layer (decision #3) means CLI parity comes for free: the helper runs on the value returned by
Consistency pass
Doc grew 204 → 252 lines.
What
Two coupled changes:
- propose/YAML-PATH-EXPANSION-PROPOSE.md: a surgical propose to apply os.path.expanduser + os.path.expandvars to embedding.model values from .java-codebase-rag.yml, but only when the value is path-shaped (starts with /, ./, ../, ~, or contains $). Hub ids (org/name) are passed through unchanged. 12 locked decisions, 15-case use-case re-walk, 1 PR of implementation work.
- README §2 gains a new "Project YAML reference" subsection: a single annotated .java-codebase-rag.yml example covering every key the project consumes (index_dir, embedding.{model,device}, microservice_roots, cross_service_resolution, role_overrides, route_overrides, http_client_overrides, async_producer_overrides), a path-expansion compatibility table, and a pointer to java-codebase-rag meta for diagnosing which source supplied each value.
Why
While answering "how do I specify a path to a local embedding model via YAML?", the only honest answer turned out to be: you can't use ~/... because YAML resolution doesn't expand it. CLI and env (SBERT_MODEL) both apply full expansion; index_dir from YAML applies expanduser (but not expandvars); embedding.model from YAML applies neither. That's an inconsistency between resolution paths, not a design call.
Separately, the README had YAML examples scattered across §2 (env-var precedence), §7.1 (role/route overrides), §7.2 (cross_service_resolution), and §7.4 (http_client/async_producer overrides), with no single block showing all keys together. The new §2 subsection is that single block.
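The path-shaped rule described under "What" can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the propose: the names looks_like_path and expand_model_value are hypothetical, and applying expandvars before expanduser is an assumption (the propose may lock a different order).

```python
import os

# Prefixes that mark a value as path-shaped, per the rule in the PR description.
_PATH_PREFIXES = ("/", "./", "../", "~")


def looks_like_path(value: str) -> bool:
    """Return True for values that start with /, ./, ../, ~ or contain $."""
    return value.startswith(_PATH_PREFIXES) or "$" in value


def expand_model_value(value: str) -> str:
    """Expand ~ and $VAR in path-shaped values; return hub ids unchanged.

    Hypothetical helper: the real name and call site may differ.
    """
    if not looks_like_path(value):
        # e.g. "org/name" or "models/x": treated as a hub id, left alone.
        return value
    # Assumed order: variables first, so a $VAR that holds "~/..." still
    # gets its home directory expanded afterwards.
    return os.path.expanduser(os.path.expandvars(value))
```

With this sketch, org/name and models/x come back unchanged, while ~/models/x and $MODELS_DIR/x are expanded. Note that os.path.expandvars leaves a reference to an unset variable in place rather than raising, so an unset $VAR passes through as literal text here; whether that should raise instead is one of the points listed as open for review.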
Scope
Out of scope (deliberately):
- role_overrides, route_overrides, http_client_overrides, async_producer_overrides contain URL paths, Kafka topic names, FQNs, role names, framework names: all namespace strings, not paths.
- microservice_roots: entries are directory names relative to source_root, not arbitrary paths. Expansion would let one project's YAML reach outside its tree.
- embedding.device: devices (cpu, cuda, mps) aren't paths.
- index_dir adding expandvars: already does expanduser; the $VAR gap is smaller and tracked as a separate follow-up (noted in §5 of the propose).
Status
Status: draft, not locked yet. Open for review on:
- models/x without leading ./ is treated as a hub id, not a path)
- $VAR, vs. raising)
After review, status moves to under review, then locked, then the doc is referenced from a follow-up implementation PR.