Skip to content

feat: cache invalidation + PATCH config + MiniMax e2e fixes + setup-init/QMD hardening#51

Merged
yilu331 merged 25 commits into
mainfrom
fix/embedding-fallback-local
May 9, 2026
Merged

feat: cache invalidation + PATCH config + MiniMax e2e fixes + setup-init/QMD hardening#51
yilu331 merged 25 commits into
mainfrom
fix/embedding-fallback-local

Conversation

@yilu331
Copy link
Copy Markdown
Collaborator

@yilu331 yilu331 commented May 8, 2026

Summary

Five threads, surfaced across two /test-e2e-cli --full runs on a Linux VPS with only MINIMAX_API_KEY configured. The first three were the original PR scope; threads 4–5 were surfaced by the e2e run validating the first three:

  1. Cache invalidation (R3 fix) — file-mtime auto-invalidation for self-hosted file-backed config + explicit admin endpoint/CLI/client method. Direct edits to ~/.reflexio/configs/config_<org-id>.json are now picked up within 1 request instead of waiting up to an hour for the TTL.
  2. PATCH config endpoints (R2 fix) — POST /api/update_config accepts a partial dict and shallow-merges into existing config. Removes the "must send the entire Config schema to flip one field" pain.
  3. MiniMax-only e2e gaps (R5 fix + adjacent) — auto-fall-back to the bundled local embedder when no cloud embedder is configured, MiniMax in extraction_agent/search_agent defaults, supports_tool_calling override allowlist, and 5 regression tests locking in the auto-detect contract.
  4. reflexio setup init writes .env to ~/.reflexio/.env (e2e Issue docs: rename feedbacks to playbooks in docs site method registry #2) — was writing to CWD, polluting whatever directory the user was in.
  5. DiskStorage QMD client: stale /tmp/e2e-disk path + 60s probe timeouts (e2e Issue feat: add retrieval latency benchmark and embedding input truncation #3) — hard-coded test path leaked into production logs; probe queries had no usable timeout.

1. Cache invalidation

The per-org cache (_reflexio_cache: TTLCache(maxsize=100, ttl=3600)) keyed by (org_id, storage_base_dir) was only invalidated when writes went through /api/set_config or (newly) /api/update_config. Anything that mutated config out-of-band — hand-edited config file, sibling-replica write, direct DB UPDATE — was invisible until the entry's hour-long TTL expired.

Phase 1 — file-mtime auto-invalidation

Reflexio.current_config_version() returns a tuple — ('file', mtime) for LocalFileConfigStorage, None when probing isn't supported. The cache stamps the version at load time; on every cache hit it re-probes (cheap os.stat, ~10 µs) and evicts on mismatch. The check is implemented outside the cache lock to keep contention low.

# reflexio/server/services/configurator/local_file_config_storage.py
def get_version(self) -> tuple[str, float] | None:
    try:
        return ("file", os.stat(self._config_path).st_mtime)
    except OSError:
        return None

Phase 2 — explicit admin invalidate endpoint + CLI + client method

For backends without cheap version probing, or any "I just SQL'd the DB and want my replica to re-read":

reflexio admin cache invalidate              # caller's own org
reflexio admin cache invalidate --org-id ID  # explicit (must match auth-resolved org)
client.invalidate_cache()              # caller's own org
client.invalidate_cache(org_id="...")  # explicit
POST /api/admin/cache/invalidate
{"org_id": "..."}  # optional, verification-only

Auth uses the same Depends(default_get_org_id) as /api/set_config. Cross-org invalidation is intentionally NOT exposed — request body org_id must match the dep-resolved value.

Companion: cloud DB-level versioning

Phase 3 (DB-level config_version column for multi-replica cloud invalidation) ships in the parent enterprise PR ReflexioAI/reflexio-enterprise#47 — it depends on the abstract ConfigStorage.get_version() shipped here, but the concrete SupabaseConfigStorage.get_version() and the migration live in the enterprise repo.


2. PATCH config endpoints

POST /api/set_config requires the full Config schema. Sending {"extraction_backend": "classic"} alone errors with 1 validation error: storage_config Field required. PATCH semantics close the gap with shallow top-level merge:

@core_router.post("/api/update_config")
def update_config(partial: dict[str, Any], org_id: str = Depends(default_get_org_id)) -> SetConfigResponse:
    existing = reflexio.request_context.configurator.get_config().model_dump(mode="python")
    response = reflexio.set_config(Config(**{**existing, **partial}))
    if response.success:
        invalidate_reflexio_cache(org_id=org_id)
    return response
client.update_config({"extraction_backend": "classic"})  # no client-side validation; server is authority
reflexio config update --field extraction_backend=classic
reflexio config update --field extraction_backend=classic --field search_backend=agentic
reflexio config update --data '{"search_backend": "agentic"}'
reflexio config update --file partial.json

Live-verified: extraction_backend and search_backend can now be set independently and atomically (e.g., one classic + one agentic).


3. MiniMax-only e2e gaps

Three independent fixes that together let a user with only MINIMAX_API_KEY configured run /test-e2e-cli end-to-end and produce non-zero profiles + playbooks.

3a. Embedding auto-fallback (Layer A) + setup-init choice (Layer B)

When no cloud embedder is configured, today blocks on stdin:

Your LLM provider doesn't support text embeddings. Which provider for embeddings? [1] OpenAI [2] Gemini

chromadb>=1.5.8 is already a hard dependency, so the bundled all-MiniLM-L6-v2 ONNX model is always physically present — it was just gated behind CLAUDE_SMART_USE_LOCAL_EMBEDDING=1 (the claude-smart public-contract env var).

Three explicit paths in _auto_detect_model for the embedding role:

Path Trigger Picks
1. claude-smart explicit (existing, unchanged) CLAUDE_SMART_USE_LOCAL_EMBEDDING=1 AND chromadb importable local/minilm-l6-v2 (priority 2)
2. Cloud embedder (existing, unchanged) OPENAI_API_KEY or GEMINI_API_KEY set text-embedding-3-small / text-embedding-004
3. Local fallback (NEW) No cloud embedder configured AND chromadb importable local/minilm-l6-v2

Path 1 wins over 2; 2 wins over 3. claude-smart contract preserved (is_local_embedder_available(), register_if_enabled() kept as alias). LiteLLM dispatch in litellm_client.py is also relaxed: local/ model names are routed to LocalEmbedder.get() whenever chromadb is importable, no env-var requirement.

setup init gets a new embedding-provider step (default = local when chromadb available) and a --embedding {local|openai|gemini|auto} flag for non-interactive use. The choice is written to ~/.reflexio/configs/config_<org-id>.json under llm_config.embedding_model_name; the existing 3-tier resolve_model_name (config_override > site_var > auto_detect) ensures it wins over Layer A's auto-detection.

services.py start gate: not-fully-configured + chromadb importable → log INFO "Using local embedder as fallback" and proceed (instead of prompting or erroring).

3b. MiniMax in extraction_agent + search_agent defaults

Added MiniMax M2/M2.7 to _PROVIDER_DEFAULTS[minimax].extraction_agent and .search_agent. Without this, the agentic-v2 path silently failed with Model … lacks tool-calling and no fallback_schema provided and produced 0 profiles.

3c. supports_tool_calling override allowlist

litellm.supports_function_calling("minimax/MiniMax-M2.7") returns False because litellm's registry only catalogues minimax/MiniMax-M2. M2.7 actually supports function calling. Added an allowlist override in tools.py for known-good models the registry hasn't caught up with yet.

3d. TestMinimaxOnlyEnvRegression — auto-detect contract lock-in

Added 5 regression tests in tests/server/llm/test_model_defaults.py that exercise the MiniMax-only env scenario end-to-end through _auto_detect_model and resolve_model_name. Specifically guards against the failure mode that surfaced as task #77 (extractor resolved to gpt-5-mini despite providers list being ['minimax'] — turned out to be benchmark contamination of the org config DB row, but the auto-detect contract was the canary).


4. setup init writes .env to the canonical location (e2e Issue #2)

reflexio setup init was writing .env to Path.cwd() / ".env". When the user runs setup from any directory other than ~/.reflexio/ (e.g. inside a worktree, or via tmux with cwd=worktree-root), the resulting .env lands somewhere unexpected. Subsequent reflexio invocations partially mask the issue via python-dotenv's parent-walk-up, but the worktree gets polluted.

Now: setup commands always write to Path.home() / ".reflexio" / ".env" regardless of CWD. Plus a 2-test regression in tests/cli/test_setup_cmd.py that monkeypatches Path.cwd to a non-home dir and asserts the resulting file ends up at ~/.reflexio/.env.

Also touches cli/env_loader.py — small change to make the canonical path centrally defined so setup_cmd.py and env_loader.py agree.


5. DiskStorage QMD client: stale path + probe timeout (e2e Issue #3)

Two bugs in reflexio/server/services/storage/disk_storage/_qmd_client.py:

  1. Hard-coded /tmp/e2e-disk collection path. On any non-test deployment, the QMD client logged Collection: /tmp/e2e-disk/disk_<org-id> (**/*.md) even though the actual storage dir_path was ~/.reflexio/data/disk_<org-id>/. Test scaffolding leaked into production code paths.
  2. No usable timeout on probe queries. Slow QMD subprocesses caused 60s waits per probe, bloating logs and slowing the publish path.

Fix: use the storage backend's dir_path directly (no env-var fallback to a test path), add a configurable probe timeout (default 5s, callsite-overridable). 8 new tests in tests/server/services/storage/test_disk_storage_file_io.py covering: stale-collection re-registration, configured-timeout-honored, default-timeout-applied, dir_path-mismatch-detected.


Behavioral diff

Embedding paths

User profile Today After
CLAUDE_SMART_USE_LOCAL_EMBEDDING=1 + chromadb local unchanged
OPENAI_API_KEY set OpenAI unchanged
GEMINI_API_KEY set Gemini unchanged
MiniMax / Anthropic / no key interactive prompt blocks services start silent local fallback
setup init, picked local (not an option) local (org config override)
setup init --embedding local (CI) (no flag) local; no prompt
No chromadb importable RuntimeError RuntimeError with clearer install hint

Config update

Action Today After
Update one field via API Send full Config schema; otherwise validation error POST /api/update_config {"extraction_backend": "classic"} works
Update one field via CLI Hand-edit JSON, then set reflexio config update --field extraction_backend=classic
Update one field via Python client Construct full Config object client.update_config({...})

Cache freshness (file-backed self-host)

Trigger Today After
POST /api/set_config / /api/update_config invalidates own cache (existing) unchanged
Hand-edit config file stale until TTL expires (~1 h) picked up within 1 request (mtime probe)
reflexio admin cache invalidate (didn't exist) explicit eviction

Test plan

  • Full submodule suite green
  • +40 new tests across all five threads:
    • Cache (Phase 1+2): test_cache_evicts_when_config_file_modified, test_cache_keeps_when_config_file_unchanged, test_cache_handles_missing_config_path, ConfigStorage.get_version() contract test
    • Admin invalidate: test_admin_invalidate_endpoint_clears_cache, test_admin_invalidate_returns_false_when_not_cached, test_cli_admin_cache_invalidate, test_client_invalidate_cache_method
    • PATCH config: test_update_config_shallow_merges, test_update_config_invalidates_cache, all three CLI input modes (--data, --file, --field), client method behavior
    • Embedding fallback (Layer A + B): test_is_chromadb_importable_when_present/absent, test_register_if_chromadb_available_no_env_var, test_embedding_fallback_to_local_when_no_cloud, test_embedding_fallback_skipped_when_cloud_available, test_embedding_explicit_opt_in_beats_cloud, test_embedding_no_chromadb_raises, test_prompt_embedding_provider_non_tty_picks_local, test_setup_init_includes_embedding_step, test_setup_init_local_is_default, test_setup_init_local_choice_writes_org_config, test_setup_init_embedding_flag_local, test_setup_init_embedding_flag_auto_no_override, test_services_start_proceeds_without_cloud_embedder_when_chromadb_present
    • Tool-calling override: allowlist application + non-listed models still defer to litellm
    • MiniMax-only env auto-detect contract (TestMinimaxOnlyEnvRegression, 5 tests): generation/evaluation/should_run/pre_retrieval all resolve to minimax/MiniMax-M2.7; empty OPENAI_API_KEY= placeholder doesn't promote OpenAI; embedding falls through to local/minilm-l6-v2
    • setup init .env location (2 tests): writes to ~/.reflexio/.env regardless of CWD
    • DiskStorage QMD client (8 tests): stale-collection re-registration, configurable probe timeout, default 5s timeout, dir_path mismatch detection
  • ruff check + ruff format --check + pyright clean on all touched files
  • Live-verified end-to-end on Linux VPS with MINIMAX_API_KEY only via /test-e2e-cli --full: all 4 storage modes (SQLite, Managed, Disk, Self-hosted Supabase) hit Hard PASS criterion (non-zero profile extraction for priya/marcus/elena/raj). Schema Gate green for all modes (Mode D required sg docker -c workaround until Issue feat: add timestamps to profile/playbook dedup prompts for conflict resolution #1 in companion PR ReflexioAI/reflexio-enterprise#47 fixed it at the source).

Backwards compatibility

  • claude-smart users (env var set): zero change. is_local_embedder_available(), priority order, dispatch all preserved. register_if_enabled() is kept as an alias to register_if_chromadb_available().
  • Cloud-embedder users: zero change unless they re-run setup init. Layer A's auto-detection still picks cloud when keys are configured.
  • POST /api/set_config: semantics unchanged (full-schema replace). /api/update_config is purely additive.
  • Existing cache APIget_reflexio, invalidate_reflexio_cache, clear_reflexio_cache, get_cache_stats — signatures unchanged. Phase 1 layered the version probe inside without breaking callers.
  • Public docs at public_docs/content/docs/claude-smart/configuration.mdx — still accurate.

Companion follow-up

After this lands and a release is cut, reflexio-enterprise bumps its submodule pointer + ships Phase 3 (DB-level config_version column → multi-replica cache invalidation in cloud) in ReflexioAI/reflexio-enterprise#47.

Out of scope

  • Reducing total install size of reflexio-ai by moving heavy ML deps to extras. Bigger refactor; revisit after a couple of weeks of usage data.
  • Removing CLAUDE_SMART_USE_LOCAL_EMBEDDING. Public contract — any deprecation cycle would be coordinated with the claude-smart repo, not a side effect of this fix.
  • Quality comparison: local MiniLM (384-dim, zero-padded to 512) vs OpenAI text-embedding-3-small (1536). Local is measurably worse; acceptable for the fallback case, irrelevant for users with cloud keys.

Notable implementation deviations from the original plan

  1. The legacy _prompt_embedding_provider is no longer called from setup init. Plan said to add a step "after" the LLM step, but calling both back-to-back would have asked the user to pick the embedder twice. Replaced the legacy call with the new step in init. Legacy helper preserved for the services start first-run wizard.
  2. The new _choose_embedding_provider step folds in the API-key prompt for openai/gemini when the env var isn't already set. Cleaner UX.
  3. Drive-by: tests/cli/test_service_builders.py::test_invalid_raises_bad_parameter was hardcoded to "postgres" (which became valid in PR feat(storage): add StorageConfigPostgres model #47); switched to "not-a-real-backend".
  4. Cache invalidation: cross-storage current_config_version() abstraction unified file-mtime and DB-version checks behind one tuple-returning method, so the cache module doesn't have to know about storage backends.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added --embedding CLI option to setup commands for explicit embedding provider configuration
    • Added admin cache invalidate command to manually clear server-side cache
    • Added config update command for partial configuration updates
    • Enhanced automatic cache management with out-of-band change detection
    • Extended MiniMax model support for tool-calling capabilities
  • Bug Fixes

    • Fixed local embedding provider initialization without requiring environment variable
    • Improved tool-calling support detection for specific model variants

yilu331 added 5 commits May 8, 2026 21:46
…ilable helpers

Adds two new env-var-independent helpers in local_embedding_provider:

- is_chromadb_importable() — pure check, no env-var dependency
- register_if_chromadb_available() — registers the LiteLLM dispatch
  hook whenever chromadb imports

register_if_enabled() is preserved as a thin alias delegating to the
new function so existing callers keep working. is_local_embedder_available()
is unchanged — CLAUDE_SMART_USE_LOCAL_EMBEDDING continues to drive
provider-priority ordering (the public claude-smart contract).

This is Layer A foundation for the auto-fallback to local embedder
when no cloud embedder is configured.
Adds Path 3 to embedding-role auto-detection in `_auto_detect_model`:
when no provider in `providers` advertises an embedding model AND
`chromadb` is importable, return `local/minilm-l6-v2` instead of
raising. `validate_llm_availability` mirrors the same fallback so
startup no longer blocks users with only a non-embedding LLM key
(Anthropic, MiniMax, etc.).

Path priority is unchanged for existing users:
- claude-smart explicit (CLAUDE_SMART_USE_LOCAL_EMBEDDING=1) → local
- cloud embedder (OPENAI_API_KEY / GEMINI_API_KEY)            → cloud
- new fallback (no cloud + chromadb importable)               → local
- no chromadb                                                 → RuntimeError

The new error message tells the user to either set a cloud key or run
`pip install chromadb` (chromadb is already a hard dep of reflexio,
so this branch is defensive).
…e gate)

`LiteLLMClient.get_embedding(s)` previously short-circuited to
`LocalEmbedder` only when `is_enabled()` was True — which required
`CLAUDE_SMART_USE_LOCAL_EMBEDDING=1`. With Layer A's auto-detection
fallback in place (model_defaults can return `local/minilm-l6-v2`
without the env var), the dispatch needs to honour those resolutions.

Now: any `local/*` model name routes to `LocalEmbedder` whenever
`chromadb` is importable, and raises a clear `pip install chromadb`
error otherwise. The module-init registration call is updated to use
`register_if_chromadb_available` so the local provider is wired up
regardless of the env var.

The nomic-embedder dispatch is unchanged (separate provider with its
own claude-smart contract). The `CLAUDE_SMART_USE_LOCAL_EMBEDDING`
env var still drives provider-priority ordering through
`is_local_embedder_available` — only the dispatch gate moves.
… + --embedding flag)

Add an upfront 'Choose embedding provider' step to 'reflexio setup init'
that defaults to the in-process local MiniLM-L6-v2 embedder when chromadb
is importable. The choice is persisted to the org config file at
~/.reflexio/configs/config_self-host-org.json under
llm_config.embedding_model_name, where it takes priority over runtime
auto-detection so it survives later cloud-key changes.

A new --embedding {local|openai|gemini|auto} flag lets CI / scripted
setups skip the prompt entirely. Default 'auto' means 'ask interactively,
or fall back to runtime auto-detection (Layer A) when there's no TTY.'
Honors REFLEXIO_NONINTERACTIVE=1 the same way.

The legacy _prompt_embedding_provider helper is now non-interactive-aware
(returns the first available choice without prompting when stdin is not a
TTY) and includes 'local' as choice [1] when chromadb is importable. It
still backs the services-start first-run wizard.
…vailable

The first-run gate in 'reflexio services start' used to demand a cloud
embedder (OpenAI / Gemini) and fall through to an interactive prompt when
none was configured — blocking scripted / CI starts on hosts that only
had a non-embedding LLM key (Anthropic, MiniMax, etc.).

Now that Layer A's auto-detection can silently fall back to the in-process
local MiniLM-L6-v2 embedder when chromadb is importable, the gate just
logs 'Using local embedder as fallback' and returns. The existing prompt
and exit-1 paths are preserved for the case where chromadb isn't
importable, so installations missing the dependency still get a clear
error rather than a uvicorn lifespan traceback.

Drive-by: fix a pre-existing test_service_builders test that asserted
'postgres' is an invalid storage backend — it became valid in #47.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Refactors local embedder enablement to depend on chromadb importability, routes local/* embeddings to a chromadb-backed local embedder when available, updates model-auto-detection and startup validation, adds interactive/non-interactive embedding setup with org-config persistence, and introduces config update and admin cache-invalidate APIs plus versioned Reflexio caching.

Changes

Chromadb-Driven Local Embedding Provider

Layer / File(s) Summary
Provider API Contracts
reflexio/server/llm/providers/local_embedding_provider.py
Introduces is_chromadb_importable() and register_if_chromadb_available() and exports them.
Core Provider Implementation
reflexio/server/llm/providers/local_embedding_provider.py
Refactors register_if_enabled() into an alias delegating to chromadb-driven registration; is_local_embedder_available() now reuses the chromadb importability predicate.
LLM Routing
reflexio/server/llm/litellm_client.py
get_embedding/get_embeddings dispatch local/* requests based on chromadb importability and raise LiteLLMClientError when chromadb is unavailable; registration wiring uses register_if_chromadb_available.
Model Defaults & Auto-Detection
reflexio/server/llm/model_defaults.py
Embedding auto-detection prefers cloud embedding-capable providers and falls back to the local ONNX embedder only when chromadb is importable; startup validation requires a generation-capable provider and embedding fallback rules updated.
CLI Setup & Services Integration
reflexio/cli/commands/setup_cmd.py, reflexio/cli/commands/services.py
Adds runtime-built embedding menu (include local only when chromadb importable), non-interactive detection, upfront --embedding handling (auto/local/cloud flags) with org-config persistence, and services startup short-circuit logging when chromadb fallback is available.
CLI Admin & Config Commands
reflexio/cli/app.py, reflexio/cli/commands/admin_cmd.py, reflexio/cli/commands/config_cmd.py
Register new admin sub-app and implement admin cache invalidate; add config update that parses `--data
Client / Server API / Cache / Storage
reflexio/client/client.py, reflexio/server/api.py, reflexio/server/cache/reflexio_cache.py, reflexio/lib/_config.py, reflexio/server/services/configurator/*, reflexio/models/api_schema/domain/entities.py
Add ReflexioClient.update_config and invalidate_cache; add POST /api/update_config (shallow merge + validate + persist + invalidate) and /api/admin/cache/invalidate (org-id verification + eviction); implement versioned Reflexio cache with safe probing; add ConfigStorage.get_version() and LocalFileConfigStorage.get_version(); add ConfigMixin.current_config_version(); add admin request/response models.
Tools Override
reflexio/server/llm/tools.py
Adds _TOOL_CALLING_OVERRIDES and returns True for matching prefixes when litellm reports False; logs debug on override.
Tests / Coverage
tests/cli/*, tests/server/llm/*, tests/server/api_endpoints/*, tests/server/cache/*, tests/client/*
New and updated tests cover chromadb importability checks, provider registration/idempotency, local/* routing and error paths, setup init embedding flows (interactive and non-interactive), services-start fallback logging, model-default resolution, supports_tool_calling overrides, client update/invalidate behaviors, API endpoints, and version-based cache invalidation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • ReflexioAI/reflexio#28: Both PRs modify the same LLM/local-embedding code paths (litellm_client, model_defaults, and the local embedding provider).
  • ReflexioAI/reflexio#30: Related CLI setup embedding-provider flows and flag handling.
  • ReflexioAI/reflexio#27: Related client/server cache invalidation and caching behavior changes.

"🐰 I hopped through code with careful paws,
Chromadb checked, local embedder applause,
Setup writes config, services skip the prompt,
Clients call update, cache invalidates on stomp,
Tests nibble bugs and everything now thaws."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.51% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the three main features: cache invalidation, PATCH config endpoints, and MiniMax e2e fixes, plus setup CLI hardening. It is concise, specific, and reflects the primary changes throughout the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/embedding-fallback-local

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
tests/server/llm/test_local_embedding_provider.py (1)

341-344: ⚡ Quick win

Assert the client-facing exception type here.

This test is supposed to lock down the new local/* failure contract, but allowing either LiteLLMClientError or RuntimeError means a regression to an unwrapped runtime error would still pass. Since LiteLLMClientError is already imported, it's better to pin that explicitly and assert the "chromadb" message on it.

Suggested fix
         with (
             patch.object(litellm_client.litellm, "embedding") as mock_embedding,
-            pytest.raises((LiteLLMClientError, RuntimeError), match="chromadb"),
+            pytest.raises(LiteLLMClientError, match="chromadb"),
         ):
             client.get_embedding("hello", model="local/minilm-l6-v2")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/server/llm/test_local_embedding_provider.py` around lines 341 - 344,
Replace the broad exception tuple in the context manager with the specific
client-facing exception: change pytest.raises((LiteLLMClientError,
RuntimeError), match="chromadb") to pytest.raises(LiteLLMClientError,
match="chromadb") so the test asserts that the code path raises the wrapped
LiteLLMClientError (the patch is around patch.object(litellm_client.litellm,
"embedding") as mock_embedding and the pytest.raises context).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@reflexio/cli/commands/services.py`:
- Around line 69-82: The early-return logic in services.start uses
detect_available_providers() and only checks EMBEDDING_CAPABLE_PROVIDERS, so a
result like ["local"] (embedder-only) causes startup to skip setup even though
no GENERATION provider is present; update the guard to require a
generation-capable provider as well: compute has_generation = any(p in
GENERATION_CAPABLE_PROVIDERS for p in providers) and only return when providers
and has_embedding and has_generation; apply the same additional generation check
to the chromadb fallback branch that currently logs "Using local embedder as
fallback" (use is_chromadb_importable() only to avoid prompting when a
generation provider exists), referencing detect_available_providers,
EMBEDDING_CAPABLE_PROVIDERS, GENERATION_CAPABLE_PROVIDERS,
is_chromadb_importable, and the logging branch to locate the code.

In `@reflexio/cli/commands/setup_cmd.py`:
- Around line 347-350: The code unconditionally persists the "local" embedding
override; before calling _write_embedding_model_to_org_config and returning
_provider_display, validate that embedding_flag == "local" is actually supported
(i.e., the local/Chroma provider is available) using the same
provider-availability check used by the interactive flow (call the existing
provider availability helper such as the function that lists available embedding
providers or an is_chromadb_available helper). If the local provider is not
available, do not write the config; instead surface an error/exit or return the
same failure path the interactive flow uses so users cannot persist an unusable
"local" override. Ensure you reference embedding_flag, _EMBEDDING_MODEL_NAMES,
_write_embedding_model_to_org_config and _provider_display when locating and
changing the code.
- Around line 799-808: The new upfront embedding step is only skipped for
managed servers but should also be skipped for self-hosted servers; update the
conditional around the _choose_embedding_provider call so it uses the same
server-mode detection as claude_code_setup() (i.e., treat self-hosted like
remote/managed) and only prompt for/embed provider when the server is truly
local, not when is_managed is False but the server is self-hosted. Locate the if
guarded by is_managed and extend or replace it to check the self-hosted
detection used in claude_code_setup() so self-hosted runs do not persist a local
embedding override.
- Around line 374-384: The org-config write is happening before validating the
cloud provider API key; move the call to _write_embedding_model_to_org_config so
it occurs only after successful key input/validation. Specifically, after
resolving provider_key/env_var from choices[choice - 1], if env_var is not None
check os.environ.get(env_var) and prompt via typer.prompt and validate
non-empty, calling _set_env_var(env_path, env_var, api_key) first; only after
successful _set_env_var (or when env_var is None or already present) call
_write_embedding_model_to_org_config(_EMBEDDING_MODEL_NAMES[provider_key]) to
avoid mutating llm_config.embedding_model_name on failed key entry.

In `@reflexio/server/llm/model_defaults.py`:
- Around line 432-457: The current validation lets an embedding-only provider
(e.g., providers == ["local"]) pass because the embedding fallback path returns
even when no generation-capable provider exists; change
validate_llm_availability so it first checks for a generation-capable provider
(e.g., look for p where _PROVIDER_DEFAULTS[p].generation) and if none is found
raise a RuntimeError immediately; only after confirming at least one
generation-capable provider should you evaluate embedding availability (using
providers, _PROVIDER_DEFAULTS, _LOCAL_EMBEDDING_PROVIDER and
is_chromadb_importable) to decide whether to fall back to the local ONNX
embedder.

---

Nitpick comments:
In `@tests/server/llm/test_local_embedding_provider.py`:
- Around line 341-344: Replace the broad exception tuple in the context manager
with the specific client-facing exception: change
pytest.raises((LiteLLMClientError, RuntimeError), match="chromadb") to
pytest.raises(LiteLLMClientError, match="chromadb") so the test asserts that the
code path raises the wrapped LiteLLMClientError (the patch is around
patch.object(litellm_client.litellm, "embedding") as mock_embedding and the
pytest.raises context).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e5419f64-cda8-48f7-9627-3b5d4e8d0482

📥 Commits

Reviewing files that changed from the base of the PR and between ecf8edb and 0907e29.

📒 Files selected for processing (11)
  • reflexio/cli/commands/services.py
  • reflexio/cli/commands/setup_cmd.py
  • reflexio/server/llm/litellm_client.py
  • reflexio/server/llm/model_defaults.py
  • reflexio/server/llm/providers/local_embedding_provider.py
  • tests/cli/test_service_builders.py
  • tests/cli/test_services_first_run.py
  • tests/cli/test_setup_cmd.py
  • tests/server/llm/test_litellm_client_unit.py
  • tests/server/llm/test_local_embedding_provider.py
  • tests/server/llm/test_model_defaults.py

Comment thread reflexio/cli/commands/services.py
Comment thread reflexio/cli/commands/setup_cmd.py
Comment thread reflexio/cli/commands/setup_cmd.py
Comment thread reflexio/cli/commands/setup_cmd.py Outdated
Comment thread reflexio/server/llm/model_defaults.py
yilu331 added 2 commits May 8, 2026 22:42
Reviewer noticed that PR #51's deviation #1 only updated the
@app.command("init") flow, leaving @app.command("openclaw") and
@app.command("claude-code") on the legacy _prompt_embedding_provider
path. Inconsistent UX: depending on which setup subcommand the user
runs, they'd see either the new local-default upfront step (with org
config persistence + --embedding flag) or the older 'only OpenAI/Gemini'
prompt.

Apply the _choose_embedding_provider upgrade to both. Both subcommands
now accept the same --embedding {local|openai|gemini|auto} flag for
non-interactive use. claude-code retains its is_remote skip — managed
storage handles embeddings server-side, so no embedding step is
needed in that branch.

The legacy _prompt_embedding_provider helper itself stays in place
for services.py's first-run wizard.

No new tests required: existing test_setup_cmd.py coverage is for the
shared _choose_embedding_provider helper; subcommand wiring is verified
via 'reflexio setup {init,openclaw,claude-code} --help' showing the
--embedding flag.
Surfaced by an e2e run on a Linux VPS with only MINIMAX_API_KEY set.
After PR #51's embedding fallback unblocked publish, the server still
emitted:

  Warning: extraction_agent[default_playbook_extractor] failed:
  No provider in ['minimax'] supports role=extraction_agent.
  Warning: extraction_agent[locomo_facts] failed:
  No provider in ['minimax'] supports role=extraction_agent.

and 'Extraction: 0 profile(s) added'. The publish call returned
success but profiles were silently never created.

_PROVIDER_DEFAULTS['minimax'] previously had no entries for
extraction_agent or search_agent (defaulting to None). Setting both
to 'minimax/MiniMax-M2.7' — the same model already in use for
generation, evaluation, should_run, and pre_retrieval — closes the
gap. MiniMax-M2.7 is MiniMax's flagship general-purpose model and
handles the multi-step tool-calling orchestration the agentic-v2
extraction + search loops require.

Tests: 4 new tests under TestMinimaxAgenticRoles asserting both
roles are set and that _auto_detect_model with providers=['minimax']
resolves to the MiniMax model for both EXTRACTION_AGENT and
SEARCH_AGENT roles. Full test_model_defaults.py: 45 passed (was 41).
Ruff + pyright clean.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/server/llm/test_model_defaults.py (1)

239-243: ⚡ Quick win

Deduplicate chromadb importability mocking with a shared fixture.

The repeated lep + find_spec monkeypatch setup appears in multiple tests and is starting to drift-prone. A small fixture/helper will keep these cases consistent and easier to extend.

♻️ Proposed refactor
+@pytest.fixture
+def set_chromadb_importable(monkeypatch: pytest.MonkeyPatch):
+    from reflexio.server.llm.providers import local_embedding_provider as lep
+
+    def _set(is_importable: bool) -> None:
+        monkeypatch.setattr(
+            lep.importlib.util,
+            "find_spec",
+            lambda name: object() if (is_importable and name == "chromadb") else None,
+        )
+
+    return _set
-        from reflexio.server.llm.providers import local_embedding_provider as lep
-        monkeypatch.setattr(lep.importlib.util, "find_spec", lambda _name: object())
+        set_chromadb_importable(True)
-        from reflexio.server.llm.providers import local_embedding_provider as lep
-        monkeypatch.setattr(lep.importlib.util, "find_spec", lambda _name: None)
+        set_chromadb_importable(False)

Also applies to: 250-255, 262-267, 274-278, 296-300, 306-310

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/server/llm/test_model_defaults.py` around lines 239 - 243, Extract the
repeated chromadb importability monkeypatch
(monkeypatch.setattr(lep.importlib.util, "find_spec", lambda _name: object()))
into a reusable pytest fixture (e.g., mock_chromadb_available) and use it in
tests that import local_embedding_provider as lep and call
resolve_model_name(ModelRole.EMBEDDING); update tests at the locations that
currently call monkeypatch.setattr to call the fixture instead so the chromadb
importability mocking is centralized and consistent across tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@reflexio/cli/commands/setup_cmd.py`:
- Around line 688-693: In openclaw(), the call to
_choose_embedding_provider(env_path, embedding_flag=embedding) runs before the
user selects storage mode and can cause a local embedding choice to be persisted
incorrectly; move that call to after the storage selection logic (the branch
that handles Local vs Managed/Self-hosted) so embeddings are only
prompted/persisted when storage is Local, and ensure any other pre-existing
calls around lines 695-696 are similarly moved/removed; update references to
env_path and embedding_flag accordingly so the embedding prompt happens only in
the Local storage branch.
- Around line 691-693: The integration commands call
_choose_embedding_provider() with the raw embedding flag and currently treat
unknown values as "auto"; add an explicit validation step before calling
_choose_embedding_provider(): check the embedding variable against the
known/allowed embedding provider names (the same set used by init() or by
_choose_embedding_provider's internal mapping) and if the value is not in that
set raise/exit with a clear error message (fail fast) so typos like
"--embedding=opneai" are rejected; update the places where embedding_label =
_choose_embedding_provider(env_path, embedding_flag=embedding) (and the
analogous occurrences around lines 1281-1283) to perform this validation first
and only call _choose_embedding_provider() for valid values.

---

Nitpick comments:
In `@tests/server/llm/test_model_defaults.py`:
- Around line 239-243: Extract the repeated chromadb importability monkeypatch
(monkeypatch.setattr(lep.importlib.util, "find_spec", lambda _name: object()))
into a reusable pytest fixture (e.g., mock_chromadb_available) and use it in
tests that import local_embedding_provider as lep and call
resolve_model_name(ModelRole.EMBEDDING); update tests at the locations that
currently call monkeypatch.setattr to call the fixture instead so the chromadb
importability mocking is centralized and consistent across tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b192d78e-b249-41f8-928d-557ce26b454c

📥 Commits

Reviewing files that changed from the base of the PR and between 0907e29 and ddf29ec.

📒 Files selected for processing (3)
  • reflexio/cli/commands/setup_cmd.py
  • reflexio/server/llm/model_defaults.py
  • tests/server/llm/test_model_defaults.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • reflexio/server/llm/model_defaults.py

Comment thread reflexio/cli/commands/setup_cmd.py Outdated
Comment thread reflexio/cli/commands/setup_cmd.py Outdated
…llm hasn't catalogued

Surfaced by an e2e run with MINIMAX_API_KEY only: agentic-v2 extraction
emitted 'Model minimax/MiniMax-M2.7 lacks tool-calling and no
fallback_schema provided' and silently skipped profile creation.

Root cause: litellm 1.80.x's model_cost registry has 'minimax/MiniMax-M2'
(supports_function_calling=True) but not 'minimax/MiniMax-M2.7' (the
version the agentic loop actually requests). When litellm's registry
doesn't know a model it returns False, which our reflexio code treated
as authoritative.

MiniMax's vendor docs explicitly say M2.7 supports OpenAI-compatible
function calling — https://platform.minimax.io/docs/guides/text-m2-function-call.
Verified by a live tool-call round-trip:

    litellm.completion(
        model='minimax/MiniMax-M2.7',
        messages=[{'role': 'user', 'content': "What's the weather in Tokyo?"}],
        tools=[{'type': 'function', 'function': {...}}],
        tool_choice='auto',
    )
    # → assistant message with tool_calls=[{name=get_weather, args={location: Tokyo}}]

So the registry is the gap, not the model. Add a small allowlist of model
prefixes (currently just 'minimax/MiniMax-M2') we know support tools, and
override litellm's False to True for those. The list is documented with
per-entry justification (vendor doc URL + verified round-trip) so
maintainers can drop entries when litellm catches up.

Tests: 5 new under TestSupportsToolCallingOverrides covering happy path,
unknown-model False, override True, prefix scoping, and exception fallback.
Total tests/server/llm: 304 passed, 60 skipped. Ruff + pyright clean.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
reflexio/server/llm/tools.py (1)

150-150: ⚡ Quick win

Tighten override prefix matching to avoid accidental family overmatch.

Line 150 uses raw startswith, which can match unintended adjacent model families. Consider boundary-aware matching (==, . or - after prefix) so only the intended family is overridden.

Proposed patch
-        if any(model.startswith(prefix) for prefix in _TOOL_CALLING_OVERRIDES):
+        def _matches_override(model_name: str, prefix: str) -> bool:
+            return (
+                model_name == prefix
+                or model_name.startswith(f"{prefix}.")
+                or model_name.startswith(f"{prefix}-")
+            )
+
+        if any(_matches_override(model, prefix) for prefix in _TOOL_CALLING_OVERRIDES):
             logger.debug(
                 "litellm.supports_function_calling returned False for %s; "
                 "applying override (see _TOOL_CALLING_OVERRIDES)",
                 model,
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@reflexio/server/llm/tools.py` at line 150, The current override check using
any(model.startswith(prefix) for prefix in _TOOL_CALLING_OVERRIDES) can
overmatch (e.g., "gpt-4o" matching "gpt-4"). Replace it with a boundary-aware
test: for each prefix in _TOOL_CALLING_OVERRIDES ensure model equals the prefix
or the prefix is followed by a boundary character (e.g., '.' or '-') — e.g.,
test model == prefix or model.startswith(prefix + ".") or
model.startswith(prefix + "-"), or use a regex like
r'^{re.escape(prefix)}($|[.-])' — update the conditional around that check
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@reflexio/server/llm/tools.py`:
- Line 150: The current override check using any(model.startswith(prefix) for
prefix in _TOOL_CALLING_OVERRIDES) can overmatch (e.g., "gpt-4o" matching
"gpt-4"). Replace it with a boundary-aware test: for each prefix in
_TOOL_CALLING_OVERRIDES ensure model equals the prefix or the prefix is followed
by a boundary character (e.g., '.' or '-') — e.g., test model == prefix or
model.startswith(prefix + ".") or model.startswith(prefix + "-"), or use a regex
like r'^{re.escape(prefix)}($|[.-])' — update the conditional around that check
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 52e33410-1e4e-4275-b2ca-a0076257b636

📥 Commits

Reviewing files that changed from the base of the PR and between ddf29ec and ca33158.

📒 Files selected for processing (2)
  • reflexio/server/llm/tools.py
  • tests/server/llm/test_tools.py

yilu331 added 3 commits May 9, 2026 03:51
Adds a partial-update endpoint that shallow-merges a partial dict over
the org's existing config and round-trips through Config(**merged) to
reuse Pydantic validation for unknown / malformed fields. Nested objects
(e.g. storage_config) are replaced wholesale; deep merge is intentionally
out of scope.

Unblocks single-field flips like {"extraction_backend": "classic"} that
currently fail strict Config(**...) construction in client.set_config()
because callers don't have storage_config in scope.

The existing POST /api/set_config remains the strict full-PUT path and is
untouched.
ReflexioClient.update_config(partial: dict) sends a partial dict
straight to POST /api/update_config without round-tripping through
Config(**...) like set_config does. This is the whole point: callers
can flip a single field without re-supplying required fields like
storage_config.

Rejects non-dict input with TypeError so misuse fails locally with a
clear message instead of a cryptic 422 from the server.
Adds a PATCH-style sibling to `reflexio config set`:

  reflexio config update --data '{"extraction_backend":"classic"}'
  reflexio config update --file partial.json
  reflexio config update --field extraction_backend=classic
  reflexio config update --field window_size=:json:10
  reflexio config update --field llm_config.embedding_model_name=local/x

`--field` is repeatable; the `:json:` value prefix opts into JSON
parsing for non-string scalars; one level of dotted nesting is
supported (multiple dotted entries with the same top key merge).
The command rejects no-args, mixed-source, multi-dot, and missing-`=`
inputs with EXIT_VALIDATION before any HTTP call.

`reflexio config set` is unchanged.
@yilu331 yilu331 changed the title fix(embedder): auto-fallback to local + setup-init embedding step fix: MiniMax-only e2e gaps + PATCH config updates May 9, 2026
yilu331 added 2 commits May 9, 2026 04:45
Detects out-of-band edits to ~/.reflexio/configs/config_<org>.json
(operator edits the file directly while the server is running) and
evicts the stale Reflexio cache entry on the next request.

The check is delegated through ConfigStorage.get_version() so the
cache module stays storage-agnostic. LocalFileConfigStorage stamps
('file', mtime); the abstract base returns None which the cache
treats as 'still fresh' (relying on TTL + explicit invalidation).
DB-backed storages will plug in ('db', version) in a follow-up.
POST /api/admin/cache/invalidate evicts the caller's per-org Reflexio
cache entry. Cross-org invalidation is intentionally not exposed —
the optional org_id body field is a verification token that must
match the caller's authenticated org or the server returns 403.

Use cases:
- Sibling-replica writes the shared DB; this replica's cache is stale
- Operator hand-edits a self-host config file on a backend that does
  not support cheap version probing
- Manual debugging escape hatch for cache-related issues

Wired up through ReflexioClient.invalidate_cache() and the
new 'reflexio admin cache invalidate' CLI command.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
tests/server/cache/test_reflexio_cache.py (1)

299-315: 💤 Low value

Test asserts an internal short-circuit.

assert first.current_config_version.call_count == 1 pins the cache to the "skip re-probe when stamped value is None" optimization. That's a fine optimization, but if the cache is later made strictly conservative (always re-probe on hit), this test will fail despite no observable behavior change — the public contract is just "no auto-eviction when probe returns None". Consider relaxing to >= 1 (or asserting that a is b, which the test already does) so the test tracks contract rather than implementation strategy.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/server/cache/test_reflexio_cache.py` around lines 299 - 315, The test
test_cache_handles_none_version pins implementation by asserting
first.current_config_version.call_count == 1; relax this to assert >= 1 (or
remove that specific call_count check) so the test verifies the public contract
(a is b and no auto-eviction when probe returns None) instead of the internal
short-circuit; update the assertion referencing
first.current_config_version.call_count and/or mock_reflexio_cls.call_count
accordingly while keeping the existing equality check a is b and the intent that
the backend returning None does not cause eviction.
reflexio/server/services/configurator/local_file_config_storage.py (1)

110-128: 💤 Low value

Consider st_mtime_ns for tighter mtime resolution.

stat().st_mtime returns a float (seconds), which on filesystems with 1-second mtime granularity (ext3, FAT, HFS+) and on modern filesystems with nanosecond granularity but float-storage precision loss, can leave back-to-back writes within the same quantum indistinguishable — the cache then keeps serving the old Reflexio for that interval. st_mtime_ns returns an int with full kernel precision, costs the same stat() syscall, and would also let tests/server/services/configurator/test_config_storage_contract.py::test_get_version_changes_after_save drop the 1.1s sleep.

📋 Proposed change
         try:
-            return ("file", Path(self.config_file).stat().st_mtime)
+            return ("file", Path(self.config_file).stat().st_mtime_ns)
         except OSError:
             return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@reflexio/server/services/configurator/local_file_config_storage.py` around
lines 110 - 128, get_version currently returns ("file", st_mtime) as a float
which loses sub-second precision; change the implementation in
local_file_config_storage.py (function get_version) to use
Path(self.config_file).stat().st_mtime_ns and return the integer nanosecond
mtime instead (i.e. ("file", st_mtime_ns)), update the documented return type in
the docstring to tuple[str, int] | None, and keep the same OSError exception
handling so missing/stat-failure still returns None.
reflexio/client/client.py (1)

1293-1322: 💤 Low value

Optional: return the typed AdminInvalidateCacheResponse.

Most client methods return the matching Pydantic response model (e.g. WhoamiResponse, MyConfigResponse, BulkDeleteResponse); invalidate_cache returns a raw dict instead. Since AdminInvalidateCacheResponse already exists in reflexio/models/api_schema/domain/entities.py, returning it would give callers attribute access (resp.invalidated, resp.org_id) and surface schema drift early. set_config is the existing precedent for returning a raw dict, so this is a consistency call rather than a defect — flagging only as an optional refactor.

📋 Proposed change
-    def invalidate_cache(self, org_id: str | None = None) -> dict:
+    def invalidate_cache(
+        self, org_id: str | None = None
+    ) -> AdminInvalidateCacheResponse:
         """..."""
         body: dict[str, str] = {}
         if org_id is not None:
             body["org_id"] = org_id
-        return self._make_request("POST", "/api/admin/cache/invalidate", json=body)
+        response = self._make_request(
+            "POST", "/api/admin/cache/invalidate", json=body
+        )
+        return AdminInvalidateCacheResponse(**response)

(Add AdminInvalidateCacheResponse to the existing from reflexio.models.api_schema.domain.entities import (...) block.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@reflexio/client/client.py` around lines 1293 - 1322, The invalidate_cache
method currently returns a raw dict; import AdminInvalidateCacheResponse into
the existing from reflexio.models.api_schema.domain.entities import ... block
and change invalidate_cache to return an AdminInvalidateCacheResponse by
parsing/constructing the Pydantic model from the dict returned by
self._make_request("POST", "/api/admin/cache/invalidate", json=body) (mirror how
other methods like set_config/WhoamiResponse convert responses to their Pydantic
models) so callers can use resp.invalidated and resp.org_id.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/server/api_endpoints/test_api_routes.py`:
- Line 186: Replace the hardcoded "/tmp/existing.db" with a platform-aware
temporary path: update the test to construct db_path using a temp directory
(e.g., the pytest tmp_path/tmp_path_factory fixture or tempfile.gettempdir())
and pass that value into
Config(storage_config=StorageConfigSQLite(db_path=...)); keep references to
Config and StorageConfigSQLite so the change only swaps how db_path is obtained.
- Around line 239-247: The test currently allows any >=400 and can hide 5xx
regressions; update the else branch to assert the response is a client error
only: replace "assert response.status_code >= 400" with a stricter check such as
"assert 400 <= response.status_code < 500" (or explicitly list expected codes
like {400, 422}) and keep "mock_reflexio.set_config.assert_not_called()"; locate
this change around the response.status_code check and the
mock_reflexio.set_config usage where merged and Config are referenced.

---

Nitpick comments:
In `@reflexio/client/client.py`:
- Around line 1293-1322: The invalidate_cache method currently returns a raw
dict; import AdminInvalidateCacheResponse into the existing from
reflexio.models.api_schema.domain.entities import ... block and change
invalidate_cache to return an AdminInvalidateCacheResponse by
parsing/constructing the Pydantic model from the dict returned by
self._make_request("POST", "/api/admin/cache/invalidate", json=body) (mirror how
other methods like set_config/WhoamiResponse convert responses to their Pydantic
models) so callers can use resp.invalidated and resp.org_id.

In `@reflexio/server/services/configurator/local_file_config_storage.py`:
- Around line 110-128: get_version currently returns ("file", st_mtime) as a
float which loses sub-second precision; change the implementation in
local_file_config_storage.py (function get_version) to use
Path(self.config_file).stat().st_mtime_ns and return the integer nanosecond
mtime instead (i.e. ("file", st_mtime_ns)), update the documented return type in
the docstring to tuple[str, int] | None, and keep the same OSError exception
handling so missing/stat-failure still returns None.

In `@tests/server/cache/test_reflexio_cache.py`:
- Around line 299-315: The test test_cache_handles_none_version pins
implementation by asserting first.current_config_version.call_count == 1; relax
this to assert >= 1 (or remove that specific call_count check) so the test
verifies the public contract (a is b and no auto-eviction when probe returns
None) instead of the internal short-circuit; update the assertion referencing
first.current_config_version.call_count and/or mock_reflexio_cls.call_count
accordingly while keeping the existing equality check a is b and the intent that
the backend returning None does not cause eviction.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9153d290-136c-4c2a-a50d-1e217d1d1893

📥 Commits

Reviewing files that changed from the base of the PR and between ca33158 and ad0559a.

📒 Files selected for processing (18)
  • reflexio/cli/app.py
  • reflexio/cli/commands/admin_cmd.py
  • reflexio/cli/commands/config_cmd.py
  • reflexio/client/client.py
  • reflexio/lib/_config.py
  • reflexio/models/api_schema/domain/entities.py
  • reflexio/server/api.py
  • reflexio/server/cache/reflexio_cache.py
  • reflexio/server/services/configurator/config_storage.py
  • reflexio/server/services/configurator/local_file_config_storage.py
  • tests/cli/test_admin_cmd.py
  • tests/cli/test_config_cmd.py
  • tests/client/test_invalidate_cache.py
  • tests/client/test_update_config.py
  • tests/server/api_endpoints/test_admin_cache.py
  • tests/server/api_endpoints/test_api_routes.py
  • tests/server/cache/test_reflexio_cache.py
  • tests/server/services/configurator/test_config_storage_contract.py
✅ Files skipped from review due to trivial changes (1)
  • tests/client/test_invalidate_cache.py

Comment thread reflexio/server/cache/reflexio_cache.py Outdated
Comment thread tests/server/api_endpoints/test_api_routes.py Outdated
Comment thread tests/server/api_endpoints/test_api_routes.py
@yilu331 yilu331 changed the title fix: MiniMax-only e2e gaps + PATCH config updates feat: cache invalidation + PATCH config endpoints + MiniMax e2e fixes May 9, 2026
yyiilluu and others added 7 commits May 8, 2026 22:58
…n split

Snapshot the source interaction window for each agent playbook at aggregation
time so optimizer evaluation stays replayable when source user playbooks are
later archived, deduped, or replaced by reflection. GEPA now holds out a capped
validation set (max_validation_windows, default 2) and uses the remaining
windows as the training pool, with reflection_minibatch_size controlling
mutation sampling.
… ready

`detect_available_providers()` can legally return `["local"]` (embedder
only) when chromadb is importable but no LLM key is set. The first-run
guard in `services start` and `validate_llm_availability()` both treated
that as a configured state, so the FastAPI lifespan would later crash
deep inside the role resolver with "no provider supports
role=generation".

Add `GENERATION_CAPABLE_PROVIDERS` to model_defaults, require it from
both call sites, and message the user clearly when only an embedding
role is satisfied.

Addresses CodeRabbit comments 3211641364 and 3211641388.
Five related fixes to the upfront embedding-provider step:

- ``--embedding=local`` now refuses to persist when chromadb isn't
  importable, matching the interactive flow which hides that option.
- The org-config write moves AFTER cloud-key validation, so a blank
  API key no longer leaves ``llm_config.embedding_model_name``
  mutated to a provider that still isn't configured.
- ``setup init`` skips the embedding step for Self-hosted Reflexio
  (in addition to Managed) — the remote server owns its own embedding
  model config, and a local override would just shadow the operator's
  choice.
- ``setup openclaw`` runs storage selection BEFORE the embedding step
  so the same Self-hosted skip applies there too.
- ``setup openclaw`` and ``setup claude-code`` now validate
  ``--embedding`` against the same set ``setup init`` does, so a typo
  (``--embedding=opneai``) fails fast instead of falling through to
  ``auto``.

Addresses CodeRabbit comments 3211641370, 3211641376, 3211641386,
3211861475, and 3211861477.
``_probe_version_safe`` returned ``None`` for both real failures and
backends without a version probe, which collapsed two very different
states into the same "skip future probes" branch. A single transient
error at construction time would permanently disable version-based
auto-eviction for that cache entry, leaving it stale until TTL or
explicit invalidation kicked in.

The fix:

- Introduce a ``_PROBE_FAILED`` sentinel for transient errors.
- On a hit-time probe failure, keep the cached instance and DON'T
  mutate the stamp — the next request retries the probe.
- On a construction-time probe failure, serve the instance once but
  skip caching it. The next request rebuilds and re-probes; once the
  backend recovers, the entry is cached with a real stamp and
  eviction works as designed.

Existing tests for "no probe available" (``cached_version is None``)
are unchanged — that branch still means "permanently unprobeable".

Addresses CodeRabbit comment 3212521575.
Two small hardening fixes to ``tests/server/api_endpoints/test_api_routes.py``:

- Replace hardcoded ``/tmp/existing.db`` with
  ``Path(tempfile.gettempdir()) / "existing.db"`` so the test is
  platform-aware and clears Ruff S108.
- Tighten ``test_unknown_field_does_not_leak_to_set_config`` from
  ``response.status_code >= 400`` to ``response.status_code in {400,
  422}`` so a 5xx regression on the unknown-field path no longer
  silently passes the assertion.

Addresses CodeRabbit comments 3212521581 and 3212521586.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/cli/test_services_first_run.py (1)

217-217: ⚡ Quick win

Move import logging to module level.

The logging import on line 217 should be moved to the top of the file with other imports to follow PEP 8 conventions. While this works, module-level imports improve readability and are the standard practice in Python.

📦 Proposed refactor

At the top of the file, add the import with other standard library imports:

 from pathlib import Path
 from unittest.mock import patch
+import logging

 import pytest

Then remove line 217 from inside the test function.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cli/test_services_first_run.py` at line 217, Move the "import logging"
statement out of the test body and into the module-level imports at the top of
the file alongside other standard-library imports, then remove the in-test
"import logging" (the one shown in the diff) so the test uses the module-level
logging import per PEP 8.
tests/cli/test_setup_cmd.py (2)

769-801: 💤 Low value

Consider verifying the exit code for consistency.

Line 795 uses pytest.raises(typer.Exit) without checking the exit code, whereas test_local_flag_without_chromadb_exits (line 763) explicitly asserts exc.value.exit_code == 1. For consistency and stronger validation, consider capturing the exception and verifying exit_code == 1 here as well.

This pattern also applies to lines 852 and 859.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cli/test_setup_cmd.py` around lines 769 - 801, Wrap the
pytest.raises(typer.Exit) context in a variable (e.g., as exc) when calling
_choose_embedding_provider in test_blank_cloud_key_does_not_persist_org_config,
then assert exc.value.exit_code == 1 after the block to ensure the Exit raised
has the expected code; apply the same pattern to the other tests mentioned (the
ones at the other occurrences) for consistent validation.

612-669: ⚡ Quick win

Consider consolidating duplicate test logic.

Lines 612-641 (test_setup_init_local_is_default) and 642-669 (test_setup_init_local_choice_writes_org_config) appear to test the same behavior with identical mock responses [1, 2, "sk-ant-test", 1]. Both verify that selecting local (choice 1) persists "local/minilm-l6-v2" to the org config. The only difference is that one uses the _read_org_config helper while the other reads JSON directly.

If both tests are intended to verify the same outcome, consider consolidating them into a single test. If they're meant to cover distinct scenarios (e.g., default vs. explicit selection), clarify the distinction in test names and docstrings, or use different prompt sequences.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cli/test_setup_cmd.py` around lines 612 - 669, Two tests
(test_setup_init_local_is_default and
test_setup_init_local_choice_writes_org_config) duplicate the same mocked prompt
sequence and assertion for init/llm_config.embedding_model_name; either
consolidate them into one test or make their intent explicit. Fix by either (A)
removing one test and keeping a single test that calls init(...) and asserts via
_read_org_config or direct JSON read, or (B) if you want to cover "press Enter
for default" vs "explicit choice", change test_setup_init_local_is_default to
simulate Enter (e.g. have the typer.prompt side_effect return the empty/default
input that your prompt handling maps to 1) while keeping
test_setup_init_local_choice_writes_org_config using the explicit [1, 2,
"sk-ant-test", 1] sequence; reference init, test_setup_init_local_is_default,
test_setup_init_local_choice_writes_org_config, and _read_org_config to locate
and update the tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/cli/test_setup_cmd.py`:
- Around line 802-847: The test test_setup_init_self_hosted_skips_embedding_step
supplies four prompt responses in the typer.prompt side_effect ([3,
"http://localhost:8081", "rflx-key", 1]) even though init (from
reflexio.cli.commands.setup_cmd) will only prompt three times for Self-hosted
storage; remove the unused fourth value (1) from the side_effect array or
alternatively update the inline comment to state that the embedding prompt is
skipped because is_remote=True for Self-hosted storage (so only three prompts
occur) to make intent clear.

---

Nitpick comments:
In `@tests/cli/test_services_first_run.py`:
- Line 217: Move the "import logging" statement out of the test body and into
the module-level imports at the top of the file alongside other standard-library
imports, then remove the in-test "import logging" (the one shown in the diff) so
the test uses the module-level logging import per PEP 8.

In `@tests/cli/test_setup_cmd.py`:
- Around line 769-801: Wrap the pytest.raises(typer.Exit) context in a variable
(e.g., as exc) when calling _choose_embedding_provider in
test_blank_cloud_key_does_not_persist_org_config, then assert
exc.value.exit_code == 1 after the block to ensure the Exit raised has the
expected code; apply the same pattern to the other tests mentioned (the ones at
the other occurrences) for consistent validation.
- Around line 612-669: Two tests (test_setup_init_local_is_default and
test_setup_init_local_choice_writes_org_config) duplicate the same mocked prompt
sequence and assertion for init/llm_config.embedding_model_name; either
consolidate them into one test or make their intent explicit. Fix by either (A)
removing one test and keeping a single test that calls init(...) and asserts via
_read_org_config or direct JSON read, or (B) if you want to cover "press Enter
for default" vs "explicit choice", change test_setup_init_local_is_default to
simulate Enter (e.g. have the typer.prompt side_effect return the empty/default
input that your prompt handling maps to 1) while keeping
test_setup_init_local_choice_writes_org_config using the explicit [1, 2,
"sk-ant-test", 1] sequence; reference init, test_setup_init_local_is_default,
test_setup_init_local_choice_writes_org_config, and _read_org_config to locate
and update the tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 76089a7c-4578-42cc-a9c1-cd38e5cb38c1

📥 Commits

Reviewing files that changed from the base of the PR and between ad0559a and 13dcbee.

📒 Files selected for processing (10)
  • reflexio/cli/commands/services.py
  • reflexio/cli/commands/setup_cmd.py
  • reflexio/models/api_schema/domain/entities.py
  • reflexio/server/cache/reflexio_cache.py
  • reflexio/server/llm/model_defaults.py
  • tests/cli/test_services_first_run.py
  • tests/cli/test_setup_cmd.py
  • tests/server/api_endpoints/test_api_routes.py
  • tests/server/cache/test_reflexio_cache.py
  • tests/server/llm/test_model_defaults.py
🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/server/api_endpoints/test_api_routes.py
  • reflexio/server/llm/model_defaults.py
  • reflexio/models/api_schema/domain/entities.py
  • tests/server/cache/test_reflexio_cache.py
  • tests/server/llm/test_model_defaults.py
  • reflexio/server/cache/reflexio_cache.py
  • reflexio/cli/commands/services.py
  • reflexio/cli/commands/setup_cmd.py

Comment thread tests/cli/test_setup_cmd.py
yyiilluu and others added 2 commits May 9, 2026 00:11
…by adoption path

Removes the unreleased allow_single_window_commit flag — single-window
eligibility is now governed by min_commit_windows vs. the validation
holdout size. Skips optimization runs upfront when the target kind has
no auto-update path (no point producing a winner that cannot adopt).
Always passes cache_evaluation=True to GEPA so the same accepted
candidate/window pair is never re-scored across iterations. Reuses the
trainset as valset when train and validation windows are identical
(single-window mode). Lowers default GEPA budget (max_metric_calls
40 -> 20, max_turns 5 -> 4) for cheaper nightly runs.
Reproduces the e2e bug where a fresh setup-init with only
MINIMAX_API_KEY in env saw the extractor pick gpt-5-mini at runtime.
The defaults code itself (PR #51) is correct; this locks that down
so a future provider-priority edit can't silently regress
MiniMax-only deployments without a test failure.

Covers:
- generation/should_run/evaluation/pre_retrieval/extraction_agent/
  search_agent all resolve to minimax/MiniMax-M2.7 with only
  MINIMAX_API_KEY in env.
- empty OPENAI_API_KEY= placeholder line (.env.example default)
  must NOT promote OpenAI in detect_available_providers.
- chromadb-importable path falls through embedding role to local
  ONNX embedder — the e2e Mode A scenario.

The user-visible bug surfaced through these tests passing while a
contaminated org configuration_json (persisted from a prior LoCoMo
benchmark run) overrode the auto-detect at runtime; tracked
separately.
yilu331 added 3 commits May 9, 2026 09:32
`load_reflexio_env` deliberately honors a CWD-local `./.env` first so
`services start` can pick up project-level overrides. That's the wrong
default for `setup init` / `setup openclaw` / `setup claude-code`: when
run from a worktree or any directory with an unrelated `.env`, the
wizard wrote `REFLEXIO_*` keys into that file, polluting it and
breaking the documented `~/.reflexio/.env` invariant.

Add `ensure_user_env_for_setup`, which always targets `~/.reflexio/.env`
(creating from the bundled template if missing). Use it in all three
setup entry points. Existing tests' monkeypatch targets updated; new
regression test asserts an unrelated CWD `.env` is left untouched.
…meout

Two related papercuts in the QMD-backed DiskStorage:

1. QMD's collection registry is global. A previous test run that
   registered the same collection name against ``/tmp/e2e-disk/...``
   left a stale entry behind, and ``_ensure_collection`` accepted it
   purely by name match — every search ran against the wrong directory
   and logs showed the misleading hardcoded path. Detect path mismatch
   in ``_ensure_collection`` and ``collection remove`` + re-add when
   the registered path differs from ``self.collection_path``.

2. ``status()`` and ``collection list`` are fast probes but were using
   the subcommand-default 60s timeout. When the qmd subprocess wedges,
   each probe blocked the request thread for a full minute and bloated
   logs. Add a configurable ``probe_timeout_seconds`` (default 5s) and
   plumb it through the probe call sites.

Both call-site signatures stay backward compatible.
@yilu331 yilu331 changed the title feat: cache invalidation + PATCH config endpoints + MiniMax e2e fixes feat: cache invalidation + PATCH config + MiniMax e2e fixes + setup-init/QMD hardening May 9, 2026
@yilu331 yilu331 merged commit cea6bdd into main May 9, 2026
1 check passed
yilu331 added a commit that referenced this pull request May 10, 2026
…update_config validation + concurrent-publish fix (#57)

## Summary

Four issues surfaced by `/test-backend-pipeline` (12-phase real-LLM run
against the recently-merged main). All four fixed.

| # | Severity | Issue | Action |
|---|---|---|---|
| R1 | P2 | Profile rerun dedup deletes CURRENT-status profiles | Fixed
(`9fbb059`) |
| R3 | P3 | `clustering_similarity` default of 0.5 too tight for local
minilm-l6-v2 | Fixed (`6060e5e`) |
| R4 | P3 | `update_config` returned 500 on bad partial input | Fixed
(`5c6efba`) |
| R2 | P2 | Concurrent publishes lose playbook extraction (=
reflexio-enterprise#59) | Repro test (`89ab5b5`) + fix (`20f951c`) |

---

## R1 — Profile rerun dedup must search PENDING, not CURRENT profiles

**Bug**: `test_rerun_profile_generation_end_to_end` was failing 3/3
attempts. Server log showed `Profile updates after deduplication: 2
profiles, 2 existing to delete` — the 2 status=None CURRENT profiles got
deleted during a rerun, which is supposed to produce PENDING-status
profiles for preview.

**Root cause**: when `output_pending_status=True` (rerun mode),
`ProfileDeduplicator._retrieve_existing_profiles()` was called with
`status_filter=[None]`, pulling CURRENT profiles. Dedup decided newly
extracted profiles were duplicates of the current ones →
`_process_results` then called `delete_user_profile()` on the
existing-current set, collapsing CURRENT count from N to 0.

**Fix**: when in rerun mode, retrieve only existing PENDING profiles for
dedup (`status_filter=["pending"]`). Rerun is now idempotent against
existing pending state and leaves CURRENT alone.

**Verification**: `test_rerun_profile_generation_end_to_end` now PASSES
(76s, 1/1).

**Tests added** (4 in
`tests/server/services/profile/test_profile_deduplicator.py::TestRetrieveExistingProfilesStatusFilter`):
- `test_normal_mode_filters_to_current` (status_filter=[None])
- `test_rerun_mode_filters_to_pending` (status_filter=["pending"])
- `test_default_value_when_unspecified`
- `test_rerun_dedup_against_pending_only`

---

## R3 — Aggregator `clustering_similarity` default 0.5 → 0.3

**Symptom**: with 4 thematically-related but topically-diverse raw
feedbacks + local minilm-l6-v2 embeddings (384-dim, zero-padded to 512),
default 0.5 cosine threshold produced 0 clusters during
`run_playbook_aggregation`. Lowering to 0.15 produced 1 high-quality
cluster.

**Investigation**: the threshold is genuinely embedding-dependent.
OpenAI/Gemini embeddings have a wider dynamic range than minilm-l6-v2 —
same threshold isn't right for both.

**Fix (option a from the triage)**: lower the global default to 0.3 — a
compromise between cloud and local embeddings. Users can still override
per-pattern.

Considered but rejected: option (b) embedding-provider-aware default.
The sentinel-default + late binding added serialization complexity for
marginal benefit when users can already override.

**Tests**: existing aggregator tests still pass. `test_validators.py`
updated to assert the new default.

---

## R4 — `update_config` returns 422 (not 500) for invalid partial; doc
nested-list semantics

**Symptom from the test report**: PATCHing
`playbook_configs[*].playbook_aggregator_config.{min_cluster_size,
clustering_similarity}` returned `success: true` but follow-up
`get_config` showed only originally-set fields.

**Investigation findings**: the original test report was misleading —
shallow merge IS what `/api/update_config` is intended to do (per the
docstring shipped in #51). The "success:true but get_config showed only
originally-set fields" is consistent with the partial not actually
including `playbook_configs` (sibling preservation = correct behavior).
The uvicorn auto-reload between the failing PATCH and the verifying GET
may have masked a different issue but was not reproducible in isolation.

**Found a real adjacent bug**: invalid partial input (e.g. wrong type
for a field) caused a `ValidationError` → uncaught → 500 Internal Server
Error. Fixed: catch `ValidationError`, return 422 with a structured
detail.

**Hardened CLI docs**: `reflexio config update --field` help block now
warns that nested-list deep-key syntax is not supported by the
shallow-merge endpoint — use `set_config` (full replace) for nested
updates.

**Tests added** (2 in `tests/server/api_endpoints/test_api_routes.py`):
- `test_update_config_invalid_partial_returns_422`
- `test_update_config_preserves_unspecified_top_level_fields`

---

## R2 — Concurrent playbook extraction loss (= reflexio-enterprise#59),
now FIXED

**Symptom**: 3 publishes ~2s apart for distinct users → only 2 of 3
produced raw playbooks. Per-org `playbook_generation::{org_id}::lock`
serializes them; subsequent batches set `pending_request_id`. When the
lock releases, the rerun fires but hits the stride pre-filter (`new=0,
stride_size=5`) because the bookmark has advanced past the held
interactions.

**Root cause** — 3 confounding factors:
1. Per-org playbook lock — not per-user
2. Single-slot `pending_request_id` (last-wins) — earlier held requests
silently dropped
3. Rerun fires with the *current bookmark*, not the original held
request payload — by the time the rerun runs, `new=0`

**Fix shipped (option b: pending-request queue + payload-aware drain)**
in commit `20f951c`. Replaces the single-slot `pending_request_id` with
an ordered queue at
`_operation_state.operation_state.pending_request_queue` (JSON column).
On lock release, drain the queue in FIFO order, each entry carrying its
original `request_id` so the rerun extracts the right interactions
regardless of bookmark position. Lock stays per-org (cross-user dedup
invariants preserved); the queue serializes drains within the org.

**Repro test** flipped from `@xfail` to passing:
`test_concurrent_publishes_distinct_users_all_produce_playbooks PASSED
[100%]` — 192s real-LLM run, 3 concurrent publishes, all 3 produce ≥1
raw playbook.

**Storage layer changes**:
- Abstract `try_acquire_in_progress_lock` signature gains `payload: dict
| None` (forwarded as JSON)
- SQLite + Disk: queue mutation in the operation-state row
- Supabase + Postgres: new migration
`20260509190000_pending_request_queue.sql` (re-creates the RPC with
`p_payload` jsonb param + queue mutation under `FOR UPDATE`)
- Companion enterprise PR ReflexioAI/reflexio-enterprise#NEW carries the
migration + adapter

**Durability**: queue is **persistent** across
SQLite/Disk/Supabase/Postgres. On server crash mid-drain, the existing
5-min stale-lock detector reclaims the lock and the queue resets on
re-acquire. Acceptable because the underlying interactions are
persisted; the next publish runs a fresh extraction over the union of
pending interactions.

**Tests added** (15):
- 8 unit tests in `tests/server/services/test_operation_state_utils.py`
(acquire-with-payload, queue ordering, drain, holder-retry idempotency,
payload preservation, drain-empty, etc.)
- 5 contract tests in
`tests/server/services/storage/test_storage_contract_operations.py`
(parametrized across SQLite + Disk; Supabase covered in enterprise
tests)
- 2 in
`tests/server/services/test_base_generation_service.py::TestPayloadAwareDrain`

**Legacy fallback**: `pending_request_id` legacy field is mirrored to
the most-recent enqueued request_id for one release window so a partial
deploy (queue-aware writers + non-queue-aware readers) still drains via
the legacy fallback path.

---

## Test plan

- [x] All 4 new R1 unit tests pass
- [x] R3 default change: `tests/models/test_validators.py` updated, all
model-validator tests pass
- [x] R4: 2 new API endpoint tests pass; 422 returned for invalid
partial
- [x] R2: 15 new tests (8 unit + 5 contract + 2 base-service); xfail
flipped to passing
- [x] **`test_rerun_profile_generation_end_to_end` now PASSES** (76s,
1/1; was 0/3 pre-fix)
- [x] **`test_concurrent_publishes_distinct_users_all_produce_playbooks`
now PASSES** (192s real-LLM, 3 users × 3 concurrent publishes, all
produce ≥1 raw playbook)
- [x] `ruff check` + `ruff format --check` + `pyright` clean on touched
files

## Pre-existing issues surfaced (NOT addressed here)

The investigation also surfaced two pre-existing issues on `cea6bdd`
that are not regressions caused by this work:
1. `reflexio/server/services/profile/profile_generation_service.py:512`
pyright error — `processed_user_ids` may contain `None` but
`get_user_profile` requires `str`
2. `tests/server/services/playbook/test_playbook_aggregator.py::TestRun`
— 12 tests fail when run as a suite (test ordering / shared MagicMock
state)

Both predate this branch. Worth filing separately.

## Companion PR

ReflexioAI/reflexio-enterprise#NEW will bump the submodule pointer and
ship the R5 doc fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants