Skip to content

feat: add retrieval latency benchmark and embedding input truncation#3

Merged
yyiilluu merged 1 commit into
mainfrom
feat/retrieval-latency-bench
Apr 13, 2026
Merged

feat: add retrieval latency benchmark and embedding input truncation#3
yyiilluu merged 1 commit into
mainfrom
feat/retrieval-latency-bench

Conversation

@yyiilluu
Copy link
Copy Markdown
Contributor

Summary

  • Add a reusable retrieval latency benchmark suite at reflexio/benchmarks/retrieval_latency/ that measures profile, user/agent playbook, and unified cross-entity search across storage backends, corpus sizes, and layers (service vs http via FastAPI TestClient).
  • Fix a production-affecting bug in LiteLLMClient where embedding inputs longer than the provider's token cap would surface as provider 400s. Inputs are now token-truncated per model with a registry-aware limit and a once-per-model log warning.
  • Add a small skip_in_precommit smoke test + committed baseline so /test-all catches retrieval regressions past 3× p95.
  • Drive-by: correct a stale config show docstring reference in ReflexioClient.get_my_config (the command is now config storage).

Changes

Retrieval latency benchmark (reflexio/benchmarks/retrieval_latency/)

  • bench.py — CLI driver. Sweeps (retrieval_type × backend × layer × N) cells with configurable trials/warmup; supports --baseline diffing and flags cells where p95 regressed ≥20%.
  • backends.py — runs each trial against either the in-process service layer or the FastAPI TestClient (ASGI transport, no TCP) to isolate framework overhead from storage work.
  • scenarios.py — 20 fixed queries and corpus generators for each retrieval type.
  • seed.py — populates storages with deterministic hash-derived fake document vectors (query-time latency depends on row count, not vector content, so this is a sound cost optimization).
  • embed_cache.py — disk-cached real query embeddings at ~/.cache/reflexio-benchmarks/embeddings-<model>.json, so only the first run hits the embedding API.
  • report.py — renders per-retrieval-type markdown tables grouped by (backend, layer) with p50 / p95 (mean) cell format.
  • README.md — full usage, layer semantics, interpretation sanity checks, and baseline regeneration instructions.
  • One reference report committed under results/ as a baseline; results/ is gitignored so future runs don't pollute the working tree.
  • pyproject.toml — relaxes S311 (non-crypto random) for reflexio/benchmarks/** so seed generators can use random without noqa spam.

Embedding input truncation (reflexio/server/llm/litellm_client.py)

  • _get_embedding_limit(model) — LRU-cached. Consults litellm.get_model_info first (so Cohere 512, Voyage 32k, etc. are respected); falls back to the documented OpenAI 8191 cap only when the model name looks OpenAI-family (text-embedding-, openai/, azure/); returns None for unknown non-OpenAI providers so we don't over-truncate their input.
  • _get_embedding_encoding(model) — LRU-cached tiktoken encoder; non-OpenAI providers get cl100k_base as an approximate proxy (tends to over-truncate by a small fraction rather than under-truncate and cause 400s).
  • _truncate_for_embedding(text, model, max_tokens=None) — called from get_embedding and get_embeddings (per-element for batches). Emits a WARNING the first time a given model's input is truncated, then DEBUG on every subsequent hit — keeps large backfills from flooding logs.

Tests

  • tests/server/llm/test_litellm_client_unit.py — new test cases covering: known OpenAI family, unknown OpenAI-prefix fallback, Cohere small-cap, non-embedding mode rejection, unknown-provider pass-through, short/empty/long text paths, prefix preservation (mirrors how sqlite_storage._get_embedding builds search_document: …), warn-once behavior, and end-to-end assertions that the string reaching litellm.embedding is always under the cap.
  • tests/benchmarks/test_retrieval_latency_smoke.pyskip_in_precommit smoke test at N=50, trials=10, sqlite + service asserting p95 within 3× of the committed baseline.json.

Test Plan

  • ruff check / ruff format — clean
  • Unit tests for truncation helpers exercise registry hit, miss + OpenAI-family fallback, miss + unknown provider, non-embedding mode, and Cohere-style small caps
  • Run the benchmark end-to-end against sqlite: uv run python -m reflexio.benchmarks.retrieval_latency.bench --sizes 100 --trials 10 --warmup 2 --backend sqlite --layer service --retrieval profile
  • Run the smoke test: uv run pytest tests/benchmarks/test_retrieval_latency_smoke.py -v
  • Confirm warn-once log behavior during a real backfill
  • /test-all on the full suite

Introduce a reusable retrieval-latency benchmark suite and fix a long-
standing issue where oversized embedding inputs could 400 against the
provider. Also ship a small smoke test + baseline so CI catches
regressions, and correct a stale docstring reference.
@yyiilluu yyiilluu merged commit 7bfd449 into main Apr 13, 2026
@yyiilluu yyiilluu deleted the feat/retrieval-latency-bench branch April 14, 2026 07:20
yilu331 added a commit that referenced this pull request May 1, 2026
Adds an opt-in LLM relevance-judge rerank stage to search_user_profiles
(and the playbook variants), parallel to the existing cross-encoder
rerank. The new stage bridges synonym/brand→category gaps that pure
lexical/semantic models can't bridge — e.g. "Thrive Market" = grocery
service, "Suica card" = Tokyo transit, "TripIt app" = travel-organizer.
Cross-encoder upgrades (bge-reranker-v2-m3) were tested and rejected:
they don't have the retail-brand world knowledge needed.

Architecture:
- New helper score_pairs_llm() in reflexio/server/llm/rerank/llm_reranker.py
- New prompt rerank_relevance/v1.0.0 (relevance-judge with explicit
  brand→category and tool→use-case guidance, scoring rubric, and a rule
  that user-owned tools/cards/apps score 7-9 on help/tips questions)
- New tool arg llm_rerank: bool = False on SearchUserProfilesArgs and
  the playbook variants
- _maybe_rerank_hits dispatches LLM rerank → cross-encoder → hybrid
  order in fallback chain; any failure path returns None and the
  caller falls back gracefully
- Bundle wiring: search-tool handlers now receive llm_client +
  prompt_manager via _bundle_handler_with_llm

Search prompt v1.10.0 documents llm_rerank in the tool palette and adds
targeted exceptions to Patterns A, C, D, F where brand/proper-noun
profiles are likely the answer but don't share the question's literal
keywords. Pattern B explicitly OPTS OUT (recency dominates; rerank
scrambles date order). All exceptions are tightly scoped to the
question shape.

Tested:
- 16 unit tests for score_pairs_llm fallback chain
- 10 unit tests for _maybe_rerank_hits dispatch + fallback semantics
- Trip-wire test updated; semver-sort bug in _get_latest_prompt_version
  fixed (would have locked v1.10.0 → v1.9.0 lexically)
- Smoke test on gpt4_2ba83207 (grocery superlative): Thrive Market
  ranked #14 baseline → #4 with llm_rerank=True
- Smoke test on 0a34ad58 (Tokyo Suica/TripIt): TripIt missing baseline
  → #3 with llm_rerank=True
- LongMemEval tune-100 r93 vs r91: 76/100 vs 74/100 (+2 acc); macro
  81.6% vs 80.5% (+1.1pt); M-S +14pt (the target gain), SS-P +10pt;
  K-U regression observed but traced to extraction-time non-determinism
  (knowledge updates not captured during re-ingest), not the rerank
  changes

Bundled prompt-bank state catch-up:
- answer_synthesis v1.3.0/v1.4.0 (rules 13/14 from earlier rounds)
- extraction_user_profile v1.1.0/v1.1.1/v1.1.2 (relative-time
  resolution, started/finished pair preservation)
- compress_session_for_query v1.0.0–v1.3.0 (the in-tool denoiser
  introduced earlier; currently hard-disabled at the code level)
- Older prompt versions flipped to active: false

Misc:
- LiteLLMClient seeds default to "42" for benchmark reproducibility
- /api/search response now exposes rehydrated_text (set by the search
  agent when it called read_session_text)
yilu331 added a commit that referenced this pull request May 9, 2026
…nit/QMD hardening (#51)

## Summary

Five threads, surfaced across two `/test-e2e-cli --full` runs on a Linux
VPS with only `MINIMAX_API_KEY` configured. The first three were the
original PR scope; threads 4–5 were surfaced by the e2e run validating
the first three:

1. **Cache invalidation** (R3 fix) — file-mtime auto-invalidation for
self-hosted file-backed config + explicit admin endpoint/CLI/client
method. Direct edits to `~/.reflexio/configs/config_<org-id>.json` are
now picked up within 1 request instead of waiting up to an hour for the
TTL.
2. **PATCH config endpoints** (R2 fix) — `POST /api/update_config`
accepts a partial dict and shallow-merges into existing config. Removes
the "must send the entire `Config` schema to flip one field" pain.
3. **MiniMax-only e2e gaps** (R5 fix + adjacent) — auto-fall-back to the
bundled local embedder when no cloud embedder is configured, MiniMax in
`extraction_agent`/`search_agent` defaults, `supports_tool_calling`
override allowlist, and 5 regression tests locking in the auto-detect
contract.
4. **`reflexio setup init` writes `.env` to `~/.reflexio/.env`** (e2e
Issue #2) — was writing to CWD, polluting whatever directory the user
was in.
5. **DiskStorage QMD client: stale `/tmp/e2e-disk` path + 60s probe
timeouts** (e2e Issue #3) — hard-coded test path leaked into production
logs; probe queries had no usable timeout.

---

## 1. Cache invalidation

The per-org cache (`_reflexio_cache: TTLCache(maxsize=100, ttl=3600)`)
keyed by `(org_id, storage_base_dir)` was only invalidated when writes
went through `/api/set_config` or (newly) `/api/update_config`. Anything
that mutated config out-of-band — hand-edited config file,
sibling-replica write, direct DB UPDATE — was invisible until the
entry's hour-long TTL expired.

### Phase 1 — file-mtime auto-invalidation

`Reflexio.current_config_version()` returns a tuple — `('file', mtime)`
for `LocalFileConfigStorage`, `None` when probing isn't supported. The
cache stamps the version at load time; on every cache hit it re-probes
(cheap `os.stat`, ~10 µs) and evicts on mismatch. The check is
implemented outside the cache lock to keep contention low.

```python
# reflexio/server/services/configurator/local_file_config_storage.py
def get_version(self) -> tuple[str, float] | None:
    try:
        return ("file", os.stat(self._config_path).st_mtime)
    except OSError:
        return None
```

### Phase 2 — explicit admin invalidate endpoint + CLI + client method

For backends without cheap version probing, or any "I just SQL'd the DB
and want my replica to re-read":

```bash
reflexio admin cache invalidate              # caller's own org
reflexio admin cache invalidate --org-id ID  # explicit (must match auth-resolved org)
```

```python
client.invalidate_cache()              # caller's own org
client.invalidate_cache(org_id="...")  # explicit
```

```http
POST /api/admin/cache/invalidate
{"org_id": "..."}  # optional, verification-only
```

Auth uses the same `Depends(default_get_org_id)` as `/api/set_config`.
Cross-org invalidation is **intentionally NOT exposed** — request body
`org_id` must match the dep-resolved value.

### Companion: cloud DB-level versioning

Phase 3 (DB-level `config_version` column for multi-replica cloud
invalidation) ships in the parent enterprise PR
ReflexioAI/reflexio-enterprise#47 — it depends on the abstract
`ConfigStorage.get_version()` shipped here, but the concrete
`SupabaseConfigStorage.get_version()` and the migration live in the
enterprise repo.

---

## 2. PATCH config endpoints

`POST /api/set_config` requires the full `Config` schema. Sending
`{"extraction_backend": "classic"}` alone errors with `1 validation
error: storage_config Field required`. PATCH semantics close the gap
with shallow top-level merge:

```python
@core_router.post("/api/update_config")
def update_config(partial: dict[str, Any], org_id: str = Depends(default_get_org_id)) -> SetConfigResponse:
    existing = reflexio.request_context.configurator.get_config().model_dump(mode="python")
    response = reflexio.set_config(Config(**{**existing, **partial}))
    if response.success:
        invalidate_reflexio_cache(org_id=org_id)
    return response
```

```python
client.update_config({"extraction_backend": "classic"})  # no client-side validation; server is authority
```

```bash
reflexio config update --field extraction_backend=classic
reflexio config update --field extraction_backend=classic --field search_backend=agentic
reflexio config update --data '{"search_backend": "agentic"}'
reflexio config update --file partial.json
```

Live-verified: `extraction_backend` and `search_backend` can now be set
independently and atomically (e.g., one classic + one agentic).

---

## 3. MiniMax-only e2e gaps

Three independent fixes that together let a user with **only**
`MINIMAX_API_KEY` configured run `/test-e2e-cli` end-to-end and produce
non-zero profiles + playbooks.

### 3a. Embedding auto-fallback (Layer A) + setup-init choice (Layer B)

When no cloud embedder is configured, today blocks on stdin:

> Your LLM provider doesn't support text embeddings. Which provider for
embeddings? `[1] OpenAI [2] Gemini`

`chromadb>=1.5.8` is already a hard dependency, so the bundled
all-MiniLM-L6-v2 ONNX model is always physically present — it was just
gated behind `CLAUDE_SMART_USE_LOCAL_EMBEDDING=1` (the
[`claude-smart`](https://github.com/ReflexioAI/claude-smart)
public-contract env var).

Three explicit paths in `_auto_detect_model` for the embedding role:

| Path | Trigger | Picks |
|---|---|---|
| **1. claude-smart explicit** (existing, unchanged) |
`CLAUDE_SMART_USE_LOCAL_EMBEDDING=1` AND chromadb importable |
`local/minilm-l6-v2` (priority 2) |
| **2. Cloud embedder** (existing, unchanged) | `OPENAI_API_KEY` or
`GEMINI_API_KEY` set | `text-embedding-3-small` / `text-embedding-004` |
| **3. Local fallback** (NEW) | No cloud embedder configured AND
chromadb importable | `local/minilm-l6-v2` |

Path 1 wins over 2; 2 wins over 3. claude-smart contract preserved
(`is_local_embedder_available()`, `register_if_enabled()` kept as
alias). LiteLLM dispatch in `litellm_client.py` is also relaxed:
`local/` model names are routed to `LocalEmbedder.get()` whenever
chromadb is importable, no env-var requirement.

`setup init` gets a new embedding-provider step (default = local when
chromadb available) and a `--embedding {local|openai|gemini|auto}` flag
for non-interactive use. The choice is written to
`~/.reflexio/configs/config_<org-id>.json` under
`llm_config.embedding_model_name`; the existing 3-tier
`resolve_model_name` (`config_override > site_var > auto_detect`)
ensures it wins over Layer A's auto-detection.

`services.py` start gate: not-fully-configured + chromadb importable →
log `INFO` "Using local embedder as fallback" and proceed (instead of
prompting or erroring).

### 3b. MiniMax in `extraction_agent` + `search_agent` defaults

Added MiniMax M2/M2.7 to `_PROVIDER_DEFAULTS[minimax].extraction_agent`
and `.search_agent`. Without this, the agentic-v2 path silently failed
with `Model … lacks tool-calling and no fallback_schema provided` and
produced 0 profiles.

### 3c. `supports_tool_calling` override allowlist

`litellm.supports_function_calling("minimax/MiniMax-M2.7")` returns
`False` because litellm's registry only catalogues `minimax/MiniMax-M2`.
M2.7 actually [supports function
calling](https://platform.minimax.io/docs/guides/text-m2-function-call).
Added an allowlist override in `tools.py` for known-good models the
registry hasn't caught up with yet.

### 3d. `TestMinimaxOnlyEnvRegression` — auto-detect contract lock-in

Added 5 regression tests in `tests/server/llm/test_model_defaults.py`
that exercise the MiniMax-only env scenario end-to-end through
`_auto_detect_model` and `resolve_model_name`. Specifically guards
against the failure mode that surfaced as task #77 (extractor resolved
to `gpt-5-mini` despite providers list being `['minimax']` — turned out
to be benchmark contamination of the org config DB row, but the
auto-detect contract was the canary).

---

## 4. `setup init` writes `.env` to the canonical location (e2e Issue
#2)

`reflexio setup init` was writing `.env` to `Path.cwd() / ".env"`. When
the user runs setup from any directory other than `~/.reflexio/` (e.g.
inside a worktree, or via tmux with cwd=worktree-root), the resulting
`.env` lands somewhere unexpected. Subsequent `reflexio` invocations
partially mask the issue via python-dotenv's parent-walk-up, but the
worktree gets polluted.

Now: setup commands always write to `Path.home() / ".reflexio" / ".env"`
regardless of CWD. Plus a 2-test regression in
`tests/cli/test_setup_cmd.py` that monkeypatches `Path.cwd` to a
non-home dir and asserts the resulting file ends up at
`~/.reflexio/.env`.

Also touches `cli/env_loader.py` — small change to make the canonical
path centrally defined so `setup_cmd.py` and `env_loader.py` agree.

---

## 5. DiskStorage QMD client: stale path + probe timeout (e2e Issue #3)

Two bugs in
`reflexio/server/services/storage/disk_storage/_qmd_client.py`:

1. **Hard-coded `/tmp/e2e-disk` collection path**. On any non-test
deployment, the QMD client logged `Collection:
/tmp/e2e-disk/disk_<org-id> (**/*.md)` even though the actual storage
`dir_path` was `~/.reflexio/data/disk_<org-id>/`. Test scaffolding
leaked into production code paths.
2. **No usable timeout on probe queries**. Slow QMD subprocesses caused
60s waits per probe, bloating logs and slowing the publish path.

Fix: use the storage backend's `dir_path` directly (no env-var fallback
to a test path), add a configurable probe timeout (default 5s,
callsite-overridable). 8 new tests in
`tests/server/services/storage/test_disk_storage_file_io.py` covering:
stale-collection re-registration, configured-timeout-honored,
default-timeout-applied, dir_path-mismatch-detected.

---

## Behavioral diff

### Embedding paths

| User profile | Today | After |
|---|---|---|
| `CLAUDE_SMART_USE_LOCAL_EMBEDDING=1` + chromadb | local | unchanged |
| `OPENAI_API_KEY` set | OpenAI | unchanged |
| `GEMINI_API_KEY` set | Gemini | unchanged |
| MiniMax / Anthropic / no key | **interactive prompt blocks `services
start`** | **silent local fallback** |
| `setup init`, picked local | (not an option) | local (org config
override) |
| `setup init --embedding local` (CI) | (no flag) | local; no prompt |
| No chromadb importable | `RuntimeError` | `RuntimeError` with clearer
install hint |

### Config update

| Action | Today | After |
|---|---|---|
| Update one field via API | Send full `Config` schema; otherwise
validation error | `POST /api/update_config {"extraction_backend":
"classic"}` works |
| Update one field via CLI | Hand-edit JSON, then `set` | `reflexio
config update --field extraction_backend=classic` |
| Update one field via Python client | Construct full `Config` object |
`client.update_config({...})` |

### Cache freshness (file-backed self-host)

| Trigger | Today | After |
|---|---|---|
| `POST /api/set_config` / `/api/update_config` | invalidates own cache
(existing) | unchanged |
| Hand-edit config file | stale until TTL expires (~1 h) | **picked up
within 1 request** (mtime probe) |
| `reflexio admin cache invalidate` | (didn't exist) | explicit eviction
|

---

## Test plan

- [x] Full submodule suite green
- [x] **+40 new tests** across all five threads:
- **Cache** (Phase 1+2): `test_cache_evicts_when_config_file_modified`,
`test_cache_keeps_when_config_file_unchanged`,
`test_cache_handles_missing_config_path`, `ConfigStorage.get_version()`
contract test
- **Admin invalidate**: `test_admin_invalidate_endpoint_clears_cache`,
`test_admin_invalidate_returns_false_when_not_cached`,
`test_cli_admin_cache_invalidate`, `test_client_invalidate_cache_method`
- **PATCH config**: `test_update_config_shallow_merges`,
`test_update_config_invalidates_cache`, all three CLI input modes
(`--data`, `--file`, `--field`), client method behavior
- **Embedding fallback** (Layer A + B):
`test_is_chromadb_importable_when_present/absent`,
`test_register_if_chromadb_available_no_env_var`,
`test_embedding_fallback_to_local_when_no_cloud`,
`test_embedding_fallback_skipped_when_cloud_available`,
`test_embedding_explicit_opt_in_beats_cloud`,
`test_embedding_no_chromadb_raises`,
`test_prompt_embedding_provider_non_tty_picks_local`,
`test_setup_init_includes_embedding_step`,
`test_setup_init_local_is_default`,
`test_setup_init_local_choice_writes_org_config`,
`test_setup_init_embedding_flag_local`,
`test_setup_init_embedding_flag_auto_no_override`,
`test_services_start_proceeds_without_cloud_embedder_when_chromadb_present`
- **Tool-calling override**: allowlist application + non-listed models
still defer to litellm
- **MiniMax-only env auto-detect contract**
(`TestMinimaxOnlyEnvRegression`, 5 tests):
generation/evaluation/should_run/pre_retrieval all resolve to
`minimax/MiniMax-M2.7`; empty `OPENAI_API_KEY=` placeholder doesn't
promote OpenAI; embedding falls through to `local/minilm-l6-v2`
- **`setup init` `.env` location** (2 tests): writes to
`~/.reflexio/.env` regardless of CWD
- **DiskStorage QMD client** (8 tests): stale-collection
re-registration, configurable probe timeout, default 5s timeout,
dir_path mismatch detection
- [x] `ruff check` + `ruff format --check` + `pyright` clean on all
touched files
- [x] **Live-verified end-to-end** on Linux VPS with `MINIMAX_API_KEY`
only via `/test-e2e-cli --full`: all 4 storage modes (SQLite, Managed,
Disk, Self-hosted Supabase) hit Hard PASS criterion (non-zero profile
extraction for priya/marcus/elena/raj). Schema Gate green for all modes
(Mode D required `sg docker -c` workaround until Issue #1 in companion
PR ReflexioAI/reflexio-enterprise#47 fixed it at the source).

---

## Backwards compatibility

- **claude-smart users (env var set):** zero change.
`is_local_embedder_available()`, priority order, dispatch all preserved.
`register_if_enabled()` is kept as an alias to
`register_if_chromadb_available()`.
- **Cloud-embedder users:** zero change unless they re-run `setup init`.
Layer A's auto-detection still picks cloud when keys are configured.
- **`POST /api/set_config`:** semantics unchanged (full-schema replace).
`/api/update_config` is purely additive.
- **Existing cache API** — `get_reflexio`, `invalidate_reflexio_cache`,
`clear_reflexio_cache`, `get_cache_stats` — signatures unchanged. Phase
1 layered the version probe inside without breaking callers.
- **Public docs at
`public_docs/content/docs/claude-smart/configuration.mdx`** — still
accurate.

---

## Companion follow-up

After this lands and a release is cut, `reflexio-enterprise` bumps its
submodule pointer + ships Phase 3 (DB-level `config_version` column →
multi-replica cache invalidation in cloud) in
ReflexioAI/reflexio-enterprise#47.

## Out of scope

- **Reducing total install size of `reflexio-ai`** by moving heavy ML
deps to extras. Bigger refactor; revisit after a couple of weeks of
usage data.
- **Removing `CLAUDE_SMART_USE_LOCAL_EMBEDDING`.** Public contract — any
deprecation cycle would be coordinated with the claude-smart repo, not a
side effect of this fix.
- **Quality comparison: local MiniLM (384-dim, zero-padded to 512) vs
OpenAI text-embedding-3-small (1536).** Local is measurably worse;
acceptable for the fallback case, irrelevant for users with cloud keys.

---

## Notable implementation deviations from the original plan

1. The legacy `_prompt_embedding_provider` is no longer called from
`setup init`. Plan said to add a step "after" the LLM step, but calling
both back-to-back would have asked the user to pick the embedder twice.
Replaced the legacy call with the new step in `init`. Legacy helper
preserved for the `services start` first-run wizard.
2. The new `_choose_embedding_provider` step folds in the API-key prompt
for openai/gemini when the env var isn't already set. Cleaner UX.
3. Drive-by:
`tests/cli/test_service_builders.py::test_invalid_raises_bad_parameter`
was hardcoded to `"postgres"` (which became valid in PR #47); switched
to `"not-a-real-backend"`.
4. Cache invalidation: cross-storage `current_config_version()`
abstraction unified file-mtime and DB-version checks behind one
tuple-returning method, so the cache module doesn't have to know about
storage backends.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added `--embedding` CLI option to setup commands for explicit
embedding provider configuration
* Added `admin cache invalidate` command to manually clear server-side
cache
  * Added `config update` command for partial configuration updates
* Enhanced automatic cache management with out-of-band change detection
  * Extended MiniMax model support for tool-calling capabilities

* **Bug Fixes**
* Fixed local embedding provider initialization without requiring
environment variable
  * Improved tool-calling support detection for specific model variants

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant