feat: cache invalidation + PATCH config + MiniMax e2e fixes + setup-init/QMD hardening#51
Conversation
…ilable helpers Adds two new env-var-independent helpers in local_embedding_provider: - is_chromadb_importable() — pure check, no env-var dependency - register_if_chromadb_available() — registers the LiteLLM dispatch hook whenever chromadb imports register_if_enabled() is preserved as a thin alias delegating to the new function so existing callers keep working. is_local_embedder_available() is unchanged — CLAUDE_SMART_USE_LOCAL_EMBEDDING continues to drive provider-priority ordering (the public claude-smart contract). This is Layer A foundation for the auto-fallback to local embedder when no cloud embedder is configured.
Adds Path 3 to embedding-role auto-detection in `_auto_detect_model`: when no provider in `providers` advertises an embedding model AND `chromadb` is importable, return `local/minilm-l6-v2` instead of raising. `validate_llm_availability` mirrors the same fallback so startup no longer blocks users with only a non-embedding LLM key (Anthropic, MiniMax, etc.). Path priority is unchanged for existing users: - claude-smart explicit (CLAUDE_SMART_USE_LOCAL_EMBEDDING=1) → local - cloud embedder (OPENAI_API_KEY / GEMINI_API_KEY) → cloud - new fallback (no cloud + chromadb importable) → local - no chromadb → RuntimeError The new error message tells the user to either set a cloud key or run `pip install chromadb` (chromadb is already a hard dep of reflexio, so this branch is defensive).
…e gate) `LiteLLMClient.get_embedding(s)` previously short-circuited to `LocalEmbedder` only when `is_enabled()` was True — which required `CLAUDE_SMART_USE_LOCAL_EMBEDDING=1`. With Layer A's auto-detection fallback in place (model_defaults can return `local/minilm-l6-v2` without the env var), the dispatch needs to honour those resolutions. Now: any `local/*` model name routes to `LocalEmbedder` whenever `chromadb` is importable, and raises a clear `pip install chromadb` error otherwise. The module-init registration call is updated to use `register_if_chromadb_available` so the local provider is wired up regardless of the env var. The nomic-embedder dispatch is unchanged (separate provider with its own claude-smart contract). The `CLAUDE_SMART_USE_LOCAL_EMBEDDING` env var still drives provider-priority ordering through `is_local_embedder_available` — only the dispatch gate moves.
… + --embedding flag)
Add an upfront 'Choose embedding provider' step to 'reflexio setup init'
that defaults to the in-process local MiniLM-L6-v2 embedder when chromadb
is importable. The choice is persisted to the org config file at
~/.reflexio/configs/config_self-host-org.json under
llm_config.embedding_model_name, where it takes priority over runtime
auto-detection so it survives later cloud-key changes.
A new --embedding {local|openai|gemini|auto} flag lets CI / scripted
setups skip the prompt entirely. Default 'auto' means 'ask interactively,
or fall back to runtime auto-detection (Layer A) when there's no TTY.'
Honors REFLEXIO_NONINTERACTIVE=1 the same way.
The legacy _prompt_embedding_provider helper is now non-interactive-aware
(returns the first available choice without prompting when stdin is not a
TTY) and includes 'local' as choice [1] when chromadb is importable. It
still backs the services-start first-run wizard.
…vailable The first-run gate in 'reflexio services start' used to demand a cloud embedder (OpenAI / Gemini) and fall through to an interactive prompt when none was configured — blocking scripted / CI starts on hosts that only had a non-embedding LLM key (Anthropic, MiniMax, etc.). Now that Layer A's auto-detection can silently fall back to the in-process local MiniLM-L6-v2 embedder when chromadb is importable, the gate just logs 'Using local embedder as fallback' and returns. The existing prompt and exit-1 paths are preserved for the case where chromadb isn't importable, so installations missing the dependency still get a clear error rather than a uvicorn lifespan traceback. Drive-by: fix a pre-existing test_service_builders test that asserted 'postgres' is an invalid storage backend — it became valid in #47.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRefactors local embedder enablement to depend on chromadb importability, routes ChangesChromadb-Driven Local Embedding Provider
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
tests/server/llm/test_local_embedding_provider.py (1)
341-344: ⚡ Quick winAssert the client-facing exception type here.
This test is supposed to lock down the new
local/*failure contract, but allowing eitherLiteLLMClientErrororRuntimeErrormeans a regression to an unwrapped runtime error would still pass. SinceLiteLLMClientErroris already imported, it's better to pin that explicitly and assert the"chromadb"message on it.Suggested fix
with ( patch.object(litellm_client.litellm, "embedding") as mock_embedding, - pytest.raises((LiteLLMClientError, RuntimeError), match="chromadb"), + pytest.raises(LiteLLMClientError, match="chromadb"), ): client.get_embedding("hello", model="local/minilm-l6-v2")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/server/llm/test_local_embedding_provider.py` around lines 341 - 344, Replace the broad exception tuple in the context manager with the specific client-facing exception: change pytest.raises((LiteLLMClientError, RuntimeError), match="chromadb") to pytest.raises(LiteLLMClientError, match="chromadb") so the test asserts that the code path raises the wrapped LiteLLMClientError (the patch is around patch.object(litellm_client.litellm, "embedding") as mock_embedding and the pytest.raises context).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@reflexio/cli/commands/services.py`:
- Around line 69-82: The early-return logic in services.start uses
detect_available_providers() and only checks EMBEDDING_CAPABLE_PROVIDERS, so a
result like ["local"] (embedder-only) causes startup to skip setup even though
no GENERATION provider is present; update the guard to require a
generation-capable provider as well: compute has_generation = any(p in
GENERATION_CAPABLE_PROVIDERS for p in providers) and only return when providers
and has_embedding and has_generation; apply the same additional generation check
to the chromadb fallback branch that currently logs "Using local embedder as
fallback" (use is_chromadb_importable() only to avoid prompting when a
generation provider exists), referencing detect_available_providers,
EMBEDDING_CAPABLE_PROVIDERS, GENERATION_CAPABLE_PROVIDERS,
is_chromadb_importable, and the logging branch to locate the code.
In `@reflexio/cli/commands/setup_cmd.py`:
- Around line 347-350: The code unconditionally persists the "local" embedding
override; before calling _write_embedding_model_to_org_config and returning
_provider_display, validate that embedding_flag == "local" is actually supported
(i.e., the local/Chroma provider is available) using the same
provider-availability check used by the interactive flow (call the existing
provider availability helper such as the function that lists available embedding
providers or an is_chromadb_available helper). If the local provider is not
available, do not write the config; instead surface an error/exit or return the
same failure path the interactive flow uses so users cannot persist an unusable
"local" override. Ensure you reference embedding_flag, _EMBEDDING_MODEL_NAMES,
_write_embedding_model_to_org_config and _provider_display when locating and
changing the code.
- Around line 799-808: The new upfront embedding step is only skipped for
managed servers but should also be skipped for self-hosted servers; update the
conditional around the _choose_embedding_provider call so it uses the same
server-mode detection as claude_code_setup() (i.e., treat self-hosted like
remote/managed) and only prompt for/embed provider when the server is truly
local, not when is_managed is False but the server is self-hosted. Locate the if
guarded by is_managed and extend or replace it to check the self-hosted
detection used in claude_code_setup() so self-hosted runs do not persist a local
embedding override.
- Around line 374-384: The org-config write is happening before validating the
cloud provider API key; move the call to _write_embedding_model_to_org_config so
it occurs only after successful key input/validation. Specifically, after
resolving provider_key/env_var from choices[choice - 1], if env_var is not None
check os.environ.get(env_var) and prompt via typer.prompt and validate
non-empty, calling _set_env_var(env_path, env_var, api_key) first; only after
successful _set_env_var (or when env_var is None or already present) call
_write_embedding_model_to_org_config(_EMBEDDING_MODEL_NAMES[provider_key]) to
avoid mutating llm_config.embedding_model_name on failed key entry.
In `@reflexio/server/llm/model_defaults.py`:
- Around line 432-457: The current validation lets an embedding-only provider
(e.g., providers == ["local"]) pass because the embedding fallback path returns
even when no generation-capable provider exists; change
validate_llm_availability so it first checks for a generation-capable provider
(e.g., look for p where _PROVIDER_DEFAULTS[p].generation) and if none is found
raise a RuntimeError immediately; only after confirming at least one
generation-capable provider should you evaluate embedding availability (using
providers, _PROVIDER_DEFAULTS, _LOCAL_EMBEDDING_PROVIDER and
is_chromadb_importable) to decide whether to fall back to the local ONNX
embedder.
---
Nitpick comments:
In `@tests/server/llm/test_local_embedding_provider.py`:
- Around line 341-344: Replace the broad exception tuple in the context manager
with the specific client-facing exception: change
pytest.raises((LiteLLMClientError, RuntimeError), match="chromadb") to
pytest.raises(LiteLLMClientError, match="chromadb") so the test asserts that the
code path raises the wrapped LiteLLMClientError (the patch is around
patch.object(litellm_client.litellm, "embedding") as mock_embedding and the
pytest.raises context).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e5419f64-cda8-48f7-9627-3b5d4e8d0482
📒 Files selected for processing (11)
reflexio/cli/commands/services.pyreflexio/cli/commands/setup_cmd.pyreflexio/server/llm/litellm_client.pyreflexio/server/llm/model_defaults.pyreflexio/server/llm/providers/local_embedding_provider.pytests/cli/test_service_builders.pytests/cli/test_services_first_run.pytests/cli/test_setup_cmd.pytests/server/llm/test_litellm_client_unit.pytests/server/llm/test_local_embedding_provider.pytests/server/llm/test_model_defaults.py
Reviewer noticed that PR #51's deviation #1 only updated the @app.command("init") flow, leaving @app.command("openclaw") and @app.command("claude-code") on the legacy _prompt_embedding_provider path. Inconsistent UX: depending on which setup subcommand the user runs, they'd see either the new local-default upfront step (with org config persistence + --embedding flag) or the older 'only OpenAI/Gemini' prompt. Apply the _choose_embedding_provider upgrade to both. Both subcommands now accept the same --embedding {local|openai|gemini|auto} flag for non-interactive use. claude-code retains its is_remote skip — managed storage handles embeddings server-side, so no embedding step is needed in that branch. The legacy _prompt_embedding_provider helper itself stays in place for services.py's first-run wizard. No new tests required: existing test_setup_cmd.py coverage is for the shared _choose_embedding_provider helper; subcommand wiring is verified via 'reflexio setup {init,openclaw,claude-code} --help' showing the --embedding flag.
Surfaced by an e2e run on a Linux VPS with only MINIMAX_API_KEY set. After PR #51's embedding fallback unblocked publish, the server still emitted: Warning: extraction_agent[default_playbook_extractor] failed: No provider in ['minimax'] supports role=extraction_agent. Warning: extraction_agent[locomo_facts] failed: No provider in ['minimax'] supports role=extraction_agent. and 'Extraction: 0 profile(s) added'. The publish call returned success but profiles were silently never created. _PROVIDER_DEFAULTS['minimax'] previously had no entries for extraction_agent or search_agent (defaulting to None). Setting both to 'minimax/MiniMax-M2.7' — the same model already in use for generation, evaluation, should_run, and pre_retrieval — closes the gap. MiniMax-M2.7 is MiniMax's flagship general-purpose model and handles the multi-step tool-calling orchestration the agentic-v2 extraction + search loops require. Tests: 4 new tests under TestMinimaxAgenticRoles asserting both roles are set and that _auto_detect_model with providers=['minimax'] resolves to the MiniMax model for both EXTRACTION_AGENT and SEARCH_AGENT roles. Full test_model_defaults.py: 45 passed (was 41). Ruff + pyright clean.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/server/llm/test_model_defaults.py (1)
239-243: ⚡ Quick winDeduplicate chromadb importability mocking with a shared fixture.
The repeated
lep + find_specmonkeypatch setup appears in multiple tests and is starting to drift-prone. A small fixture/helper will keep these cases consistent and easier to extend.♻️ Proposed refactor
+@pytest.fixture +def set_chromadb_importable(monkeypatch: pytest.MonkeyPatch): + from reflexio.server.llm.providers import local_embedding_provider as lep + + def _set(is_importable: bool) -> None: + monkeypatch.setattr( + lep.importlib.util, + "find_spec", + lambda name: object() if (is_importable and name == "chromadb") else None, + ) + + return _set- from reflexio.server.llm.providers import local_embedding_provider as lep - monkeypatch.setattr(lep.importlib.util, "find_spec", lambda _name: object()) + set_chromadb_importable(True)- from reflexio.server.llm.providers import local_embedding_provider as lep - monkeypatch.setattr(lep.importlib.util, "find_spec", lambda _name: None) + set_chromadb_importable(False)Also applies to: 250-255, 262-267, 274-278, 296-300, 306-310
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/server/llm/test_model_defaults.py` around lines 239 - 243, Extract the repeated chromadb importability monkeypatch (monkeypatch.setattr(lep.importlib.util, "find_spec", lambda _name: object())) into a reusable pytest fixture (e.g., mock_chromadb_available) and use it in tests that import local_embedding_provider as lep and call resolve_model_name(ModelRole.EMBEDDING); update tests at the locations that currently call monkeypatch.setattr to call the fixture instead so the chromadb importability mocking is centralized and consistent across tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@reflexio/cli/commands/setup_cmd.py`:
- Around line 688-693: In openclaw(), the call to
_choose_embedding_provider(env_path, embedding_flag=embedding) runs before the
user selects storage mode and can cause a local embedding choice to be persisted
incorrectly; move that call to after the storage selection logic (the branch
that handles Local vs Managed/Self-hosted) so embeddings are only
prompted/persisted when storage is Local, and ensure any other pre-existing
calls around lines 695-696 are similarly moved/removed; update references to
env_path and embedding_flag accordingly so the embedding prompt happens only in
the Local storage branch.
- Around line 691-693: The integration commands call
_choose_embedding_provider() with the raw embedding flag and currently treat
unknown values as "auto"; add an explicit validation step before calling
_choose_embedding_provider(): check the embedding variable against the
known/allowed embedding provider names (the same set used by init() or by
_choose_embedding_provider's internal mapping) and if the value is not in that
set raise/exit with a clear error message (fail fast) so typos like
"--embedding=opneai" are rejected; update the places where embedding_label =
_choose_embedding_provider(env_path, embedding_flag=embedding) (and the
analogous occurrences around lines 1281-1283) to perform this validation first
and only call _choose_embedding_provider() for valid values.
---
Nitpick comments:
In `@tests/server/llm/test_model_defaults.py`:
- Around line 239-243: Extract the repeated chromadb importability monkeypatch
(monkeypatch.setattr(lep.importlib.util, "find_spec", lambda _name: object()))
into a reusable pytest fixture (e.g., mock_chromadb_available) and use it in
tests that import local_embedding_provider as lep and call
resolve_model_name(ModelRole.EMBEDDING); update tests at the locations that
currently call monkeypatch.setattr to call the fixture instead so the chromadb
importability mocking is centralized and consistent across tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b192d78e-b249-41f8-928d-557ce26b454c
📒 Files selected for processing (3)
reflexio/cli/commands/setup_cmd.pyreflexio/server/llm/model_defaults.pytests/server/llm/test_model_defaults.py
🚧 Files skipped from review as they are similar to previous changes (1)
- reflexio/server/llm/model_defaults.py
…llm hasn't catalogued Surfaced by an e2e run with MINIMAX_API_KEY only: agentic-v2 extraction emitted 'Model minimax/MiniMax-M2.7 lacks tool-calling and no fallback_schema provided' and silently skipped profile creation. Root cause: litellm 1.80.x's model_cost registry has 'minimax/MiniMax-M2' (supports_function_calling=True) but not 'minimax/MiniMax-M2.7' (the version the agentic loop actually requests). When litellm's registry doesn't know a model it returns False, which our reflexio code treated as authoritative. MiniMax's vendor docs explicitly say M2.7 supports OpenAI-compatible function calling — https://platform.minimax.io/docs/guides/text-m2-function-call. Verified by a live tool-call round-trip: litellm.completion( model='minimax/MiniMax-M2.7', messages=[{'role': 'user', 'content': "What's the weather in Tokyo?"}], tools=[{'type': 'function', 'function': {...}}], tool_choice='auto', ) # → assistant message with tool_calls=[{name=get_weather, args={location: Tokyo}}] So the registry is the gap, not the model. Add a small allowlist of model prefixes (currently just 'minimax/MiniMax-M2') we know support tools, and override litellm's False to True for those. The list is documented with per-entry justification (vendor doc URL + verified round-trip) so maintainers can drop entries when litellm catches up. Tests: 5 new under TestSupportsToolCallingOverrides covering happy path, unknown-model False, override True, prefix scoping, and exception fallback. Total tests/server/llm: 304 passed, 60 skipped. Ruff + pyright clean.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
reflexio/server/llm/tools.py (1)
150-150: ⚡ Quick winTighten override prefix matching to avoid accidental family overmatch.
Line 150 uses raw
startswith, which can match unintended adjacent model families. Consider boundary-aware matching (==,.or-after prefix) so only the intended family is overridden.Proposed patch
- if any(model.startswith(prefix) for prefix in _TOOL_CALLING_OVERRIDES): + def _matches_override(model_name: str, prefix: str) -> bool: + return ( + model_name == prefix + or model_name.startswith(f"{prefix}.") + or model_name.startswith(f"{prefix}-") + ) + + if any(_matches_override(model, prefix) for prefix in _TOOL_CALLING_OVERRIDES): logger.debug( "litellm.supports_function_calling returned False for %s; " "applying override (see _TOOL_CALLING_OVERRIDES)", model, )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@reflexio/server/llm/tools.py` at line 150, The current override check using any(model.startswith(prefix) for prefix in _TOOL_CALLING_OVERRIDES) can overmatch (e.g., "gpt-4o" matching "gpt-4"). Replace it with a boundary-aware test: for each prefix in _TOOL_CALLING_OVERRIDES ensure model equals the prefix or the prefix is followed by a boundary character (e.g., '.' or '-') — e.g., test model == prefix or model.startswith(prefix + ".") or model.startswith(prefix + "-"), or use a regex like r'^{re.escape(prefix)}($|[.-])' — update the conditional around that check accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@reflexio/server/llm/tools.py`:
- Line 150: The current override check using any(model.startswith(prefix) for
prefix in _TOOL_CALLING_OVERRIDES) can overmatch (e.g., "gpt-4o" matching
"gpt-4"). Replace it with a boundary-aware test: for each prefix in
_TOOL_CALLING_OVERRIDES ensure model equals the prefix or the prefix is followed
by a boundary character (e.g., '.' or '-') — e.g., test model == prefix or
model.startswith(prefix + ".") or model.startswith(prefix + "-"), or use a regex
like r'^{re.escape(prefix)}($|[.-])' — update the conditional around that check
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 52e33410-1e4e-4275-b2ca-a0076257b636
📒 Files selected for processing (2)
reflexio/server/llm/tools.pytests/server/llm/test_tools.py
Adds a partial-update endpoint that shallow-merges a partial dict over
the org's existing config and round-trips through Config(**merged) to
reuse Pydantic validation for unknown / malformed fields. Nested objects
(e.g. storage_config) are replaced wholesale; deep merge is intentionally
out of scope.
Unblocks single-field flips like {"extraction_backend": "classic"} that
currently fail strict Config(**...) construction in client.set_config()
because callers don't have storage_config in scope.
The existing POST /api/set_config remains the strict full-PUT path and is
untouched.
ReflexioClient.update_config(partial: dict) sends a partial dict straight to POST /api/update_config without round-tripping through Config(**...) like set_config does. This is the whole point: callers can flip a single field without re-supplying required fields like storage_config. Rejects non-dict input with TypeError so misuse fails locally with a clear message instead of a cryptic 422 from the server.
Adds a PATCH-style sibling to `reflexio config set`:
reflexio config update --data '{"extraction_backend":"classic"}'
reflexio config update --file partial.json
reflexio config update --field extraction_backend=classic
reflexio config update --field window_size=:json:10
reflexio config update --field llm_config.embedding_model_name=local/x
`--field` is repeatable; the `:json:` value prefix opts into JSON
parsing for non-string scalars; one level of dotted nesting is
supported (multiple dotted entries with the same top key merge).
The command rejects no-args, mixed-source, multi-dot, and missing-`=`
inputs with EXIT_VALIDATION before any HTTP call.
`reflexio config set` is unchanged.
Detects out-of-band edits to ~/.reflexio/configs/config_<org>.json
(operator edits the file directly while the server is running) and
evicts the stale Reflexio cache entry on the next request.
The check is delegated through ConfigStorage.get_version() so the
cache module stays storage-agnostic. LocalFileConfigStorage stamps
('file', mtime); the abstract base returns None which the cache
treats as 'still fresh' (relying on TTL + explicit invalidation).
DB-backed storages will plug in ('db', version) in a follow-up.
POST /api/admin/cache/invalidate evicts the caller's per-org Reflexio cache entry. Cross-org invalidation is intentionally not exposed — the optional org_id body field is a verification token that must match the caller's authenticated org or the server returns 403. Use cases: - Sibling-replica writes the shared DB; this replica's cache is stale - Operator hand-edits a self-host config file on a backend that does not support cheap version probing - Manual debugging escape hatch for cache-related issues Wired up through ReflexioClient.invalidate_cache() and the new 'reflexio admin cache invalidate' CLI command.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
tests/server/cache/test_reflexio_cache.py (1)
299-315: 💤 Low valueTest asserts an internal short-circuit.
assert first.current_config_version.call_count == 1pins the cache to the "skip re-probe when stamped value isNone" optimization. That's a fine optimization, but if the cache is later made strictly conservative (always re-probe on hit), this test will fail despite no observable behavior change — the public contract is just "no auto-eviction when probe returnsNone". Consider relaxing to>= 1(or asserting thata is b, which the test already does) so the test tracks contract rather than implementation strategy.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/server/cache/test_reflexio_cache.py` around lines 299 - 315, The test test_cache_handles_none_version pins implementation by asserting first.current_config_version.call_count == 1; relax this to assert >= 1 (or remove that specific call_count check) so the test verifies the public contract (a is b and no auto-eviction when probe returns None) instead of the internal short-circuit; update the assertion referencing first.current_config_version.call_count and/or mock_reflexio_cls.call_count accordingly while keeping the existing equality check a is b and the intent that the backend returning None does not cause eviction.reflexio/server/services/configurator/local_file_config_storage.py (1)
110-128: 💤 Low valueConsider
st_mtime_nsfor tighter mtime resolution.
stat().st_mtimereturns a float (seconds), which on filesystems with 1-second mtime granularity (ext3, FAT, HFS+) and on modern filesystems with nanosecond granularity but float-storage precision loss, can leave back-to-back writes within the same quantum indistinguishable — the cache then keeps serving the oldReflexiofor that interval.st_mtime_nsreturns an int with full kernel precision, costs the samestat()syscall, and would also lettests/server/services/configurator/test_config_storage_contract.py::test_get_version_changes_after_savedrop the 1.1s sleep.📋 Proposed change
try: - return ("file", Path(self.config_file).stat().st_mtime) + return ("file", Path(self.config_file).stat().st_mtime_ns) except OSError: return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@reflexio/server/services/configurator/local_file_config_storage.py` around lines 110 - 128, get_version currently returns ("file", st_mtime) as a float which loses sub-second precision; change the implementation in local_file_config_storage.py (function get_version) to use Path(self.config_file).stat().st_mtime_ns and return the integer nanosecond mtime instead (i.e. ("file", st_mtime_ns)), update the documented return type in the docstring to tuple[str, int] | None, and keep the same OSError exception handling so missing/stat-failure still returns None.reflexio/client/client.py (1)
1293-1322: 💤 Low valueOptional: return the typed
AdminInvalidateCacheResponse.Most client methods return the matching Pydantic response model (e.g.
WhoamiResponse,MyConfigResponse,BulkDeleteResponse);invalidate_cachereturns a rawdictinstead. SinceAdminInvalidateCacheResponsealready exists inreflexio/models/api_schema/domain/entities.py, returning it would give callers attribute access (resp.invalidated,resp.org_id) and surface schema drift early.set_configis the existing precedent for returning a raw dict, so this is a consistency call rather than a defect — flagging only as an optional refactor.📋 Proposed change
- def invalidate_cache(self, org_id: str | None = None) -> dict: + def invalidate_cache( + self, org_id: str | None = None + ) -> AdminInvalidateCacheResponse: """...""" body: dict[str, str] = {} if org_id is not None: body["org_id"] = org_id - return self._make_request("POST", "/api/admin/cache/invalidate", json=body) + response = self._make_request( + "POST", "/api/admin/cache/invalidate", json=body + ) + return AdminInvalidateCacheResponse(**response)(Add
AdminInvalidateCacheResponseto the existingfrom reflexio.models.api_schema.domain.entities import (...)block.)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@reflexio/client/client.py` around lines 1293 - 1322, The invalidate_cache method currently returns a raw dict; import AdminInvalidateCacheResponse into the existing from reflexio.models.api_schema.domain.entities import ... block and change invalidate_cache to return an AdminInvalidateCacheResponse by parsing/constructing the Pydantic model from the dict returned by self._make_request("POST", "/api/admin/cache/invalidate", json=body) (mirror how other methods like set_config/WhoamiResponse convert responses to their Pydantic models) so callers can use resp.invalidated and resp.org_id.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/server/api_endpoints/test_api_routes.py`:
- Line 186: Replace the hardcoded "/tmp/existing.db" with a platform-aware
temporary path: update the test to construct db_path using a temp directory
(e.g., the pytest tmp_path/tmp_path_factory fixture or tempfile.gettempdir())
and pass that value into
Config(storage_config=StorageConfigSQLite(db_path=...)); keep references to
Config and StorageConfigSQLite so the change only swaps how db_path is obtained.
- Around line 239-247: The test currently allows any >=400 and can hide 5xx
regressions; update the else branch to assert the response is a client error
only: replace "assert response.status_code >= 400" with a stricter check such as
"assert 400 <= response.status_code < 500" (or explicitly list expected codes
like {400, 422}) and keep "mock_reflexio.set_config.assert_not_called()"; locate
this change around the response.status_code check and the
mock_reflexio.set_config usage where merged and Config are referenced.
---
Nitpick comments:
In `@reflexio/client/client.py`:
- Around line 1293-1322: The invalidate_cache method currently returns a raw
dict; import AdminInvalidateCacheResponse into the existing from
reflexio.models.api_schema.domain.entities import ... block and change
invalidate_cache to return an AdminInvalidateCacheResponse by
parsing/constructing the Pydantic model from the dict returned by
self._make_request("POST", "/api/admin/cache/invalidate", json=body) (mirror how
other methods like set_config/WhoamiResponse convert responses to their Pydantic
models) so callers can use resp.invalidated and resp.org_id.
In `@reflexio/server/services/configurator/local_file_config_storage.py`:
- Around line 110-128: get_version currently returns ("file", st_mtime) as a
float which loses sub-second precision; change the implementation in
local_file_config_storage.py (function get_version) to use
Path(self.config_file).stat().st_mtime_ns and return the integer nanosecond
mtime instead (i.e. ("file", st_mtime_ns)), update the documented return type in
the docstring to tuple[str, int] | None, and keep the same OSError exception
handling so missing/stat-failure still returns None.
In `@tests/server/cache/test_reflexio_cache.py`:
- Around line 299-315: The test test_cache_handles_none_version pins
implementation by asserting first.current_config_version.call_count == 1; relax
this to assert >= 1 (or remove that specific call_count check) so the test
verifies the public contract (a is b and no auto-eviction when probe returns
None) instead of the internal short-circuit; update the assertion referencing
first.current_config_version.call_count and/or mock_reflexio_cls.call_count
accordingly while keeping the existing equality check a is b and the intent that
the backend returning None does not cause eviction.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9153d290-136c-4c2a-a50d-1e217d1d1893
📒 Files selected for processing (18)
reflexio/cli/app.pyreflexio/cli/commands/admin_cmd.pyreflexio/cli/commands/config_cmd.pyreflexio/client/client.pyreflexio/lib/_config.pyreflexio/models/api_schema/domain/entities.pyreflexio/server/api.pyreflexio/server/cache/reflexio_cache.pyreflexio/server/services/configurator/config_storage.pyreflexio/server/services/configurator/local_file_config_storage.pytests/cli/test_admin_cmd.pytests/cli/test_config_cmd.pytests/client/test_invalidate_cache.pytests/client/test_update_config.pytests/server/api_endpoints/test_admin_cache.pytests/server/api_endpoints/test_api_routes.pytests/server/cache/test_reflexio_cache.pytests/server/services/configurator/test_config_storage_contract.py
✅ Files skipped from review due to trivial changes (1)
- tests/client/test_invalidate_cache.py
…n split Snapshot the source interaction window for each agent playbook at aggregation time so optimizer evaluation stays replayable when source user playbooks are later archived, deduped, or replaced by reflection. GEPA now holds out a capped validation set (max_validation_windows, default 2) and uses the remaining windows as the training pool, with reflection_minibatch_size controlling mutation sampling.
… ready `detect_available_providers()` can legally return `["local"]` (embedder only) when chromadb is importable but no LLM key is set. The first-run guard in `services start` and `validate_llm_availability()` both treated that as a configured state, so the FastAPI lifespan would later crash deep inside the role resolver with "no provider supports role=generation". Add `GENERATION_CAPABLE_PROVIDERS` to model_defaults, require it from both call sites, and message the user clearly when only an embedding role is satisfied. Addresses CodeRabbit comments 3211641364 and 3211641388.
Five related fixes to the upfront embedding-provider step: - ``--embedding=local`` now refuses to persist when chromadb isn't importable, matching the interactive flow which hides that option. - The org-config write moves AFTER cloud-key validation, so a blank API key no longer leaves ``llm_config.embedding_model_name`` mutated to a provider that still isn't configured. - ``setup init`` skips the embedding step for Self-hosted Reflexio (in addition to Managed) — the remote server owns its own embedding model config, and a local override would just shadow the operator's choice. - ``setup openclaw`` runs storage selection BEFORE the embedding step so the same Self-hosted skip applies there too. - ``setup openclaw`` and ``setup claude-code`` now validate ``--embedding`` against the same set ``setup init`` does, so a typo (``--embedding=opneai``) fails fast instead of falling through to ``auto``. Addresses CodeRabbit comments 3211641370, 3211641376, 3211641386, 3211861475, and 3211861477.
``_probe_version_safe`` returned ``None`` for both real failures and backends without a version probe, which collapsed two very different states into the same "skip future probes" branch. A single transient error at construction time would permanently disable version-based auto-eviction for that cache entry, leaving it stale until TTL or explicit invalidation kicked in. The fix: - Introduce a ``_PROBE_FAILED`` sentinel for transient errors. - On a hit-time probe failure, keep the cached instance and DON'T mutate the stamp — the next request retries the probe. - On a construction-time probe failure, serve the instance once but skip caching it. The next request rebuilds and re-probes; once the backend recovers, the entry is cached with a real stamp and eviction works as designed. Existing tests for "no probe available" (``cached_version is None``) are unchanged — that branch still means "permanently unprobeable". Addresses CodeRabbit comment 3212521575.
Two small hardening fixes to ``tests/server/api_endpoints/test_api_routes.py``:
- Replace hardcoded ``/tmp/existing.db`` with
``Path(tempfile.gettempdir()) / "existing.db"`` so the test is
platform-aware and clears Ruff S108.
- Tighten ``test_unknown_field_does_not_leak_to_set_config`` from
``response.status_code >= 400`` to ``response.status_code in {400,
422}`` so a 5xx regression on the unknown-field path no longer
silently passes the assertion.
Addresses CodeRabbit comments 3212521581 and 3212521586.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/cli/test_services_first_run.py (1)
217-217: ⚡ Quick winMove
import loggingto module level.The
loggingimport on line 217 should be moved to the top of the file with other imports to follow PEP 8 conventions. While this works, module-level imports improve readability and are the standard practice in Python.📦 Proposed refactor
At the top of the file, add the import with other standard library imports:
from pathlib import Path from unittest.mock import patch +import logging import pytestThen remove line 217 from inside the test function.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cli/test_services_first_run.py` at line 217, Move the "import logging" statement out of the test body and into the module-level imports at the top of the file alongside other standard-library imports, then remove the in-test "import logging" (the one shown in the diff) so the test uses the module-level logging import per PEP 8.tests/cli/test_setup_cmd.py (2)
769-801: 💤 Low valueConsider verifying the exit code for consistency.
Line 795 uses
pytest.raises(typer.Exit)without checking the exit code, whereastest_local_flag_without_chromadb_exits(line 763) explicitly assertsexc.value.exit_code == 1. For consistency and stronger validation, consider capturing the exception and verifyingexit_code == 1here as well.This pattern also applies to lines 852 and 859.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cli/test_setup_cmd.py` around lines 769 - 801, Wrap the pytest.raises(typer.Exit) context in a variable (e.g., as exc) when calling _choose_embedding_provider in test_blank_cloud_key_does_not_persist_org_config, then assert exc.value.exit_code == 1 after the block to ensure the Exit raised has the expected code; apply the same pattern to the other tests mentioned (the ones at the other occurrences) for consistent validation.
612-669: ⚡ Quick winConsider consolidating duplicate test logic.
Lines 612-641 (
test_setup_init_local_is_default) and 642-669 (test_setup_init_local_choice_writes_org_config) appear to test the same behavior with identical mock responses[1, 2, "sk-ant-test", 1]. Both verify that selecting local (choice 1) persists"local/minilm-l6-v2"to the org config. The only difference is that one uses the_read_org_confighelper while the other reads JSON directly.If both tests are intended to verify the same outcome, consider consolidating them into a single test. If they're meant to cover distinct scenarios (e.g., default vs. explicit selection), clarify the distinction in test names and docstrings, or use different prompt sequences.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cli/test_setup_cmd.py` around lines 612 - 669, Two tests (test_setup_init_local_is_default and test_setup_init_local_choice_writes_org_config) duplicate the same mocked prompt sequence and assertion for init/llm_config.embedding_model_name; either consolidate them into one test or make their intent explicit. Fix by either (A) removing one test and keeping a single test that calls init(...) and asserts via _read_org_config or direct JSON read, or (B) if you want to cover "press Enter for default" vs "explicit choice", change test_setup_init_local_is_default to simulate Enter (e.g. have the typer.prompt side_effect return the empty/default input that your prompt handling maps to 1) while keeping test_setup_init_local_choice_writes_org_config using the explicit [1, 2, "sk-ant-test", 1] sequence; reference init, test_setup_init_local_is_default, test_setup_init_local_choice_writes_org_config, and _read_org_config to locate and update the tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/cli/test_setup_cmd.py`:
- Around line 802-847: The test test_setup_init_self_hosted_skips_embedding_step
supplies four prompt responses in the typer.prompt side_effect ([3,
"http://localhost:8081", "rflx-key", 1]) even though init (from
reflexio.cli.commands.setup_cmd) will only prompt three times for Self-hosted
storage; remove the unused fourth value (1) from the side_effect array or
alternatively update the inline comment to state that the embedding prompt is
skipped because is_remote=True for Self-hosted storage (so only three prompts
occur) to make intent clear.
---
Nitpick comments:
In `@tests/cli/test_services_first_run.py`:
- Line 217: Move the "import logging" statement out of the test body and into
the module-level imports at the top of the file alongside other standard-library
imports, then remove the in-test "import logging" (the one shown in the diff) so
the test uses the module-level logging import per PEP 8.
In `@tests/cli/test_setup_cmd.py`:
- Around line 769-801: Wrap the pytest.raises(typer.Exit) context in a variable
(e.g., as exc) when calling _choose_embedding_provider in
test_blank_cloud_key_does_not_persist_org_config, then assert
exc.value.exit_code == 1 after the block to ensure the Exit raised has the
expected code; apply the same pattern to the other tests mentioned (the ones at
the other occurrences) for consistent validation.
- Around line 612-669: Two tests (test_setup_init_local_is_default and
test_setup_init_local_choice_writes_org_config) duplicate the same mocked prompt
sequence and assertion for init/llm_config.embedding_model_name; either
consolidate them into one test or make their intent explicit. Fix by either (A)
removing one test and keeping a single test that calls init(...) and asserts via
_read_org_config or direct JSON read, or (B) if you want to cover "press Enter
for default" vs "explicit choice", change test_setup_init_local_is_default to
simulate Enter (e.g. have the typer.prompt side_effect return the empty/default
input that your prompt handling maps to 1) while keeping
test_setup_init_local_choice_writes_org_config using the explicit [1, 2,
"sk-ant-test", 1] sequence; reference init, test_setup_init_local_is_default,
test_setup_init_local_choice_writes_org_config, and _read_org_config to locate
and update the tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 76089a7c-4578-42cc-a9c1-cd38e5cb38c1
📒 Files selected for processing (10)
reflexio/cli/commands/services.pyreflexio/cli/commands/setup_cmd.pyreflexio/models/api_schema/domain/entities.pyreflexio/server/cache/reflexio_cache.pyreflexio/server/llm/model_defaults.pytests/cli/test_services_first_run.pytests/cli/test_setup_cmd.pytests/server/api_endpoints/test_api_routes.pytests/server/cache/test_reflexio_cache.pytests/server/llm/test_model_defaults.py
🚧 Files skipped from review as they are similar to previous changes (8)
- tests/server/api_endpoints/test_api_routes.py
- reflexio/server/llm/model_defaults.py
- reflexio/models/api_schema/domain/entities.py
- tests/server/cache/test_reflexio_cache.py
- tests/server/llm/test_model_defaults.py
- reflexio/server/cache/reflexio_cache.py
- reflexio/cli/commands/services.py
- reflexio/cli/commands/setup_cmd.py
…by adoption path Removes the unreleased allow_single_window_commit flag — single-window eligibility is now governed by min_commit_windows vs. the validation holdout size. Skips optimization runs upfront when the target kind has no auto-update path (no point producing a winner that cannot adopt). Always passes cache_evaluation=True to GEPA so the same accepted candidate/window pair is never re-scored across iterations. Reuses the trainset as valset when train and validation windows are identical (single-window mode). Lowers default GEPA budget (max_metric_calls 40 -> 20, max_turns 5 -> 4) for cheaper nightly runs.
Reproduces the e2e bug where a fresh setup-init with only MINIMAX_API_KEY in env saw the extractor pick gpt-5-mini at runtime. The defaults code itself (PR #51) is correct; this locks that down so a future provider-priority edit can't silently regress MiniMax-only deployments without a test failure. Covers: - generation/should_run/evaluation/pre_retrieval/extraction_agent/ search_agent all resolve to minimax/MiniMax-M2.7 with only MINIMAX_API_KEY in env. - empty OPENAI_API_KEY= placeholder line (.env.example default) must NOT promote OpenAI in detect_available_providers. - chromadb-importable path falls through embedding role to local ONNX embedder — the e2e Mode A scenario. The user-visible bug surfaced through these tests passing while a contaminated org configuration_json (persisted from a prior LoCoMo benchmark run) overrode the auto-detect at runtime; tracked separately.
`load_reflexio_env` deliberately honors a CWD-local `./.env` first so `services start` can pick up project-level overrides. That's the wrong default for `setup init` / `setup openclaw` / `setup claude-code`: when run from a worktree or any directory with an unrelated `.env`, the wizard wrote `REFLEXIO_*` keys into that file, polluting it and breaking the documented `~/.reflexio/.env` invariant. Add `ensure_user_env_for_setup`, which always targets `~/.reflexio/.env` (creating from the bundled template if missing). Use it in all three setup entry points. Existing tests' monkeypatch targets updated; new regression test asserts an unrelated CWD `.env` is left untouched.
…meout Two related papercuts in the QMD-backed DiskStorage: 1. QMD's collection registry is global. A previous test run that registered the same collection name against ``/tmp/e2e-disk/...`` left a stale entry behind, and ``_ensure_collection`` accepted it purely by name match — every search ran against the wrong directory and logs showed the misleading hardcoded path. Detect path mismatch in ``_ensure_collection`` and ``collection remove`` + re-add when the registered path differs from ``self.collection_path``. 2. ``status()`` and ``collection list`` are fast probes but were using the subcommand-default 60s timeout. When the qmd subprocess wedges, each probe blocked the request thread for a full minute and bloated logs. Add a configurable ``probe_timeout_seconds`` (default 5s) and plumb it through the probe call sites. Both call-site signatures stay backward compatible.
…update_config validation + concurrent-publish fix (#57) ## Summary Four issues surfaced by `/test-backend-pipeline` (12-phase real-LLM run against the recently-merged main). All four fixed. | # | Severity | Issue | Action | |---|---|---|---| | R1 | P2 | Profile rerun dedup deletes CURRENT-status profiles | Fixed (`9fbb059`) | | R3 | P3 | `clustering_similarity` default of 0.5 too tight for local minilm-l6-v2 | Fixed (`6060e5e`) | | R4 | P3 | `update_config` returned 500 on bad partial input | Fixed (`5c6efba`) | | R2 | P2 | Concurrent publishes lose playbook extraction (= reflexio-enterprise#59) | Repro test (`89ab5b5`) + fix (`20f951c`) | --- ## R1 — Profile rerun dedup must search PENDING, not CURRENT profiles **Bug**: `test_rerun_profile_generation_end_to_end` was failing 3/3 attempts. Server log showed `Profile updates after deduplication: 2 profiles, 2 existing to delete` — the 2 status=None CURRENT profiles got deleted during a rerun, which is supposed to produce PENDING-status profiles for preview. **Root cause**: when `output_pending_status=True` (rerun mode), `ProfileDeduplicator._retrieve_existing_profiles()` was called with `status_filter=[None]`, pulling CURRENT profiles. Dedup decided newly extracted profiles were duplicates of the current ones → `_process_results` then called `delete_user_profile()` on the existing-current set, collapsing CURRENT count from N to 0. **Fix**: when in rerun mode, retrieve only existing PENDING profiles for dedup (`status_filter=["pending"]`). Rerun is now idempotent against existing pending state and leaves CURRENT alone. **Verification**: `test_rerun_profile_generation_end_to_end` now PASSES (76s, 1/1). **Tests added** (4 in `tests/server/services/profile/test_profile_deduplicator.py::TestRetrieveExistingProfilesStatusFilter`): - `test_normal_mode_filters_to_current` (status_filter=[None]) - `test_rerun_mode_filters_to_pending` (status_filter=["pending"]) - `test_default_value_when_unspecified` - `test_rerun_dedup_against_pending_only` --- ## R3 — Aggregator `clustering_similarity` default 0.5 → 0.3 **Symptom**: with 4 thematically-related but topically-diverse raw feedbacks + local minilm-l6-v2 embeddings (384-dim, zero-padded to 512), default 0.5 cosine threshold produced 0 clusters during `run_playbook_aggregation`. Lowering to 0.15 produced 1 high-quality cluster. **Investigation**: the threshold is genuinely embedding-dependent. OpenAI/Gemini embeddings have a wider dynamic range than minilm-l6-v2 — same threshold isn't right for both. **Fix (option a from the triage)**: lower the global default to 0.3 — a compromise between cloud and local embeddings. Users can still override per-pattern. Considered but rejected: option (b) embedding-provider-aware default. The sentinel-default + late binding added serialization complexity for marginal benefit when users can already override. **Tests**: existing aggregator tests still pass. `test_validators.py` updated to assert the new default. --- ## R4 — `update_config` returns 422 (not 500) for invalid partial; doc nested-list semantics **Symptom from the test report**: PATCHing `playbook_configs[*].playbook_aggregator_config.{min_cluster_size, clustering_similarity}` returned `success: true` but follow-up `get_config` showed only originally-set fields. **Investigation findings**: the original test report was misleading — shallow merge IS what `/api/update_config` is intended to do (per the docstring shipped in #51). The "success:true but get_config showed only originally-set fields" is consistent with the partial not actually including `playbook_configs` (sibling preservation = correct behavior). The uvicorn auto-reload between the failing PATCH and the verifying GET may have masked a different issue but was not reproducible in isolation. **Found a real adjacent bug**: invalid partial input (e.g. wrong type for a field) caused a `ValidationError` → uncaught → 500 Internal Server Error. Fixed: catch `ValidationError`, return 422 with a structured detail. **Hardened CLI docs**: `reflexio config update --field` help block now warns that nested-list deep-key syntax is not supported by the shallow-merge endpoint — use `set_config` (full replace) for nested updates. **Tests added** (2 in `tests/server/api_endpoints/test_api_routes.py`): - `test_update_config_invalid_partial_returns_422` - `test_update_config_preserves_unspecified_top_level_fields` --- ## R2 — Concurrent playbook extraction loss (= reflexio-enterprise#59), now FIXED **Symptom**: 3 publishes ~2s apart for distinct users → only 2 of 3 produced raw playbooks. Per-org `playbook_generation::{org_id}::lock` serializes them; subsequent batches set `pending_request_id`. When the lock releases, the rerun fires but hits the stride pre-filter (`new=0, stride_size=5`) because the bookmark has advanced past the held interactions. **Root cause** — 3 confounding factors: 1. Per-org playbook lock — not per-user 2. Single-slot `pending_request_id` (last-wins) — earlier held requests silently dropped 3. Rerun fires with the *current bookmark*, not the original held request payload — by the time the rerun runs, `new=0` **Fix shipped (option b: pending-request queue + payload-aware drain)** in commit `20f951c`. Replaces the single-slot `pending_request_id` with an ordered queue at `_operation_state.operation_state.pending_request_queue` (JSON column). On lock release, drain the queue in FIFO order, each entry carrying its original `request_id` so the rerun extracts the right interactions regardless of bookmark position. Lock stays per-org (cross-user dedup invariants preserved); the queue serializes drains within the org. **Repro test** flipped from `@xfail` to passing: `test_concurrent_publishes_distinct_users_all_produce_playbooks PASSED [100%]` — 192s real-LLM run, 3 concurrent publishes, all 3 produce ≥1 raw playbook. **Storage layer changes**: - Abstract `try_acquire_in_progress_lock` signature gains `payload: dict | None` (forwarded as JSON) - SQLite + Disk: queue mutation in the operation-state row - Supabase + Postgres: new migration `20260509190000_pending_request_queue.sql` (re-creates the RPC with `p_payload` jsonb param + queue mutation under `FOR UPDATE`) - Companion enterprise PR ReflexioAI/reflexio-enterprise#NEW carries the migration + adapter **Durability**: queue is **persistent** across SQLite/Disk/Supabase/Postgres. On server crash mid-drain, the existing 5-min stale-lock detector reclaims the lock and the queue resets on re-acquire. Acceptable because the underlying interactions are persisted; the next publish runs a fresh extraction over the union of pending interactions. **Tests added** (15): - 8 unit tests in `tests/server/services/test_operation_state_utils.py` (acquire-with-payload, queue ordering, drain, holder-retry idempotency, payload preservation, drain-empty, etc.) - 5 contract tests in `tests/server/services/storage/test_storage_contract_operations.py` (parametrized across SQLite + Disk; Supabase covered in enterprise tests) - 2 in `tests/server/services/test_base_generation_service.py::TestPayloadAwareDrain` **Legacy fallback**: `pending_request_id` legacy field is mirrored to the most-recent enqueued request_id for one release window so a partial deploy (queue-aware writers + non-queue-aware readers) still drains via the legacy fallback path. --- ## Test plan - [x] All 4 new R1 unit tests pass - [x] R3 default change: `tests/models/test_validators.py` updated, all model-validator tests pass - [x] R4: 2 new API endpoint tests pass; 422 returned for invalid partial - [x] R2: 15 new tests (8 unit + 5 contract + 2 base-service); xfail flipped to passing - [x] **`test_rerun_profile_generation_end_to_end` now PASSES** (76s, 1/1; was 0/3 pre-fix) - [x] **`test_concurrent_publishes_distinct_users_all_produce_playbooks` now PASSES** (192s real-LLM, 3 users × 3 concurrent publishes, all produce ≥1 raw playbook) - [x] `ruff check` + `ruff format --check` + `pyright` clean on touched files ## Pre-existing issues surfaced (NOT addressed here) The investigation also surfaced two pre-existing issues on `cea6bdd` that are not regressions caused by this work: 1. `reflexio/server/services/profile/profile_generation_service.py:512` pyright error — `processed_user_ids` may contain `None` but `get_user_profile` requires `str` 2. `tests/server/services/playbook/test_playbook_aggregator.py::TestRun` — 12 tests fail when run as a suite (test ordering / shared MagicMock state) Both predate this branch. Worth filing separately. ## Companion PR ReflexioAI/reflexio-enterprise#NEW will bump the submodule pointer and ship the R5 doc fix.
Summary
Five threads, surfaced across two
/test-e2e-cli --fullruns on a Linux VPS with onlyMINIMAX_API_KEYconfigured. The first three were the original PR scope; threads 4–5 were surfaced by the e2e run validating the first three:~/.reflexio/configs/config_<org-id>.jsonare now picked up within 1 request instead of waiting up to an hour for the TTL.POST /api/update_configaccepts a partial dict and shallow-merges into existing config. Removes the "must send the entireConfigschema to flip one field" pain.extraction_agent/search_agentdefaults,supports_tool_callingoverride allowlist, and 5 regression tests locking in the auto-detect contract.reflexio setup initwrites.envto~/.reflexio/.env(e2e Issue docs: rename feedbacks to playbooks in docs site method registry #2) — was writing to CWD, polluting whatever directory the user was in./tmp/e2e-diskpath + 60s probe timeouts (e2e Issue feat: add retrieval latency benchmark and embedding input truncation #3) — hard-coded test path leaked into production logs; probe queries had no usable timeout.1. Cache invalidation
The per-org cache (
_reflexio_cache: TTLCache(maxsize=100, ttl=3600)) keyed by(org_id, storage_base_dir)was only invalidated when writes went through/api/set_configor (newly)/api/update_config. Anything that mutated config out-of-band — hand-edited config file, sibling-replica write, direct DB UPDATE — was invisible until the entry's hour-long TTL expired.Phase 1 — file-mtime auto-invalidation
Reflexio.current_config_version()returns a tuple —('file', mtime)forLocalFileConfigStorage,Nonewhen probing isn't supported. The cache stamps the version at load time; on every cache hit it re-probes (cheapos.stat, ~10 µs) and evicts on mismatch. The check is implemented outside the cache lock to keep contention low.Phase 2 — explicit admin invalidate endpoint + CLI + client method
For backends without cheap version probing, or any "I just SQL'd the DB and want my replica to re-read":
Auth uses the same
Depends(default_get_org_id)as/api/set_config. Cross-org invalidation is intentionally NOT exposed — request bodyorg_idmust match the dep-resolved value.Companion: cloud DB-level versioning
Phase 3 (DB-level
config_versioncolumn for multi-replica cloud invalidation) ships in the parent enterprise PR ReflexioAI/reflexio-enterprise#47 — it depends on the abstractConfigStorage.get_version()shipped here, but the concreteSupabaseConfigStorage.get_version()and the migration live in the enterprise repo.2. PATCH config endpoints
POST /api/set_configrequires the fullConfigschema. Sending{"extraction_backend": "classic"}alone errors with1 validation error: storage_config Field required. PATCH semantics close the gap with shallow top-level merge:reflexio config update --field extraction_backend=classic reflexio config update --field extraction_backend=classic --field search_backend=agentic reflexio config update --data '{"search_backend": "agentic"}' reflexio config update --file partial.jsonLive-verified:
extraction_backendandsearch_backendcan now be set independently and atomically (e.g., one classic + one agentic).3. MiniMax-only e2e gaps
Three independent fixes that together let a user with only
MINIMAX_API_KEYconfigured run/test-e2e-cliend-to-end and produce non-zero profiles + playbooks.3a. Embedding auto-fallback (Layer A) + setup-init choice (Layer B)
When no cloud embedder is configured, today blocks on stdin:
chromadb>=1.5.8is already a hard dependency, so the bundled all-MiniLM-L6-v2 ONNX model is always physically present — it was just gated behindCLAUDE_SMART_USE_LOCAL_EMBEDDING=1(theclaude-smartpublic-contract env var).Three explicit paths in
_auto_detect_modelfor the embedding role:CLAUDE_SMART_USE_LOCAL_EMBEDDING=1AND chromadb importablelocal/minilm-l6-v2(priority 2)OPENAI_API_KEYorGEMINI_API_KEYsettext-embedding-3-small/text-embedding-004local/minilm-l6-v2Path 1 wins over 2; 2 wins over 3. claude-smart contract preserved (
is_local_embedder_available(),register_if_enabled()kept as alias). LiteLLM dispatch inlitellm_client.pyis also relaxed:local/model names are routed toLocalEmbedder.get()whenever chromadb is importable, no env-var requirement.setup initgets a new embedding-provider step (default = local when chromadb available) and a--embedding {local|openai|gemini|auto}flag for non-interactive use. The choice is written to~/.reflexio/configs/config_<org-id>.jsonunderllm_config.embedding_model_name; the existing 3-tierresolve_model_name(config_override > site_var > auto_detect) ensures it wins over Layer A's auto-detection.services.pystart gate: not-fully-configured + chromadb importable → logINFO"Using local embedder as fallback" and proceed (instead of prompting or erroring).3b. MiniMax in
extraction_agent+search_agentdefaultsAdded MiniMax M2/M2.7 to
_PROVIDER_DEFAULTS[minimax].extraction_agentand.search_agent. Without this, the agentic-v2 path silently failed withModel … lacks tool-calling and no fallback_schema providedand produced 0 profiles.3c.
supports_tool_callingoverride allowlistlitellm.supports_function_calling("minimax/MiniMax-M2.7")returnsFalsebecause litellm's registry only cataloguesminimax/MiniMax-M2. M2.7 actually supports function calling. Added an allowlist override intools.pyfor known-good models the registry hasn't caught up with yet.3d.
TestMinimaxOnlyEnvRegression— auto-detect contract lock-inAdded 5 regression tests in
tests/server/llm/test_model_defaults.pythat exercise the MiniMax-only env scenario end-to-end through_auto_detect_modelandresolve_model_name. Specifically guards against the failure mode that surfaced as task #77 (extractor resolved togpt-5-minidespite providers list being['minimax']— turned out to be benchmark contamination of the org config DB row, but the auto-detect contract was the canary).4.
setup initwrites.envto the canonical location (e2e Issue #2)reflexio setup initwas writing.envtoPath.cwd() / ".env". When the user runs setup from any directory other than~/.reflexio/(e.g. inside a worktree, or via tmux with cwd=worktree-root), the resulting.envlands somewhere unexpected. Subsequentreflexioinvocations partially mask the issue via python-dotenv's parent-walk-up, but the worktree gets polluted.Now: setup commands always write to
Path.home() / ".reflexio" / ".env"regardless of CWD. Plus a 2-test regression intests/cli/test_setup_cmd.pythat monkeypatchesPath.cwdto a non-home dir and asserts the resulting file ends up at~/.reflexio/.env.Also touches
cli/env_loader.py— small change to make the canonical path centrally defined sosetup_cmd.pyandenv_loader.pyagree.5. DiskStorage QMD client: stale path + probe timeout (e2e Issue #3)
Two bugs in
reflexio/server/services/storage/disk_storage/_qmd_client.py:/tmp/e2e-diskcollection path. On any non-test deployment, the QMD client loggedCollection: /tmp/e2e-disk/disk_<org-id> (**/*.md)even though the actual storagedir_pathwas~/.reflexio/data/disk_<org-id>/. Test scaffolding leaked into production code paths.Fix: use the storage backend's
dir_pathdirectly (no env-var fallback to a test path), add a configurable probe timeout (default 5s, callsite-overridable). 8 new tests intests/server/services/storage/test_disk_storage_file_io.pycovering: stale-collection re-registration, configured-timeout-honored, default-timeout-applied, dir_path-mismatch-detected.Behavioral diff
Embedding paths
CLAUDE_SMART_USE_LOCAL_EMBEDDING=1+ chromadbOPENAI_API_KEYsetGEMINI_API_KEYsetservices startsetup init, picked localsetup init --embedding local(CI)RuntimeErrorRuntimeErrorwith clearer install hintConfig update
Configschema; otherwise validation errorPOST /api/update_config {"extraction_backend": "classic"}workssetreflexio config update --field extraction_backend=classicConfigobjectclient.update_config({...})Cache freshness (file-backed self-host)
POST /api/set_config//api/update_configreflexio admin cache invalidateTest plan
test_cache_evicts_when_config_file_modified,test_cache_keeps_when_config_file_unchanged,test_cache_handles_missing_config_path,ConfigStorage.get_version()contract testtest_admin_invalidate_endpoint_clears_cache,test_admin_invalidate_returns_false_when_not_cached,test_cli_admin_cache_invalidate,test_client_invalidate_cache_methodtest_update_config_shallow_merges,test_update_config_invalidates_cache, all three CLI input modes (--data,--file,--field), client method behaviortest_is_chromadb_importable_when_present/absent,test_register_if_chromadb_available_no_env_var,test_embedding_fallback_to_local_when_no_cloud,test_embedding_fallback_skipped_when_cloud_available,test_embedding_explicit_opt_in_beats_cloud,test_embedding_no_chromadb_raises,test_prompt_embedding_provider_non_tty_picks_local,test_setup_init_includes_embedding_step,test_setup_init_local_is_default,test_setup_init_local_choice_writes_org_config,test_setup_init_embedding_flag_local,test_setup_init_embedding_flag_auto_no_override,test_services_start_proceeds_without_cloud_embedder_when_chromadb_presentTestMinimaxOnlyEnvRegression, 5 tests): generation/evaluation/should_run/pre_retrieval all resolve tominimax/MiniMax-M2.7; emptyOPENAI_API_KEY=placeholder doesn't promote OpenAI; embedding falls through tolocal/minilm-l6-v2setup init.envlocation (2 tests): writes to~/.reflexio/.envregardless of CWDruff check+ruff format --check+pyrightclean on all touched filesMINIMAX_API_KEYonly via/test-e2e-cli --full: all 4 storage modes (SQLite, Managed, Disk, Self-hosted Supabase) hit Hard PASS criterion (non-zero profile extraction for priya/marcus/elena/raj). Schema Gate green for all modes (Mode D requiredsg docker -cworkaround until Issue feat: add timestamps to profile/playbook dedup prompts for conflict resolution #1 in companion PR ReflexioAI/reflexio-enterprise#47 fixed it at the source).Backwards compatibility
is_local_embedder_available(), priority order, dispatch all preserved.register_if_enabled()is kept as an alias toregister_if_chromadb_available().setup init. Layer A's auto-detection still picks cloud when keys are configured.POST /api/set_config: semantics unchanged (full-schema replace)./api/update_configis purely additive.get_reflexio,invalidate_reflexio_cache,clear_reflexio_cache,get_cache_stats— signatures unchanged. Phase 1 layered the version probe inside without breaking callers.public_docs/content/docs/claude-smart/configuration.mdx— still accurate.Companion follow-up
After this lands and a release is cut,
reflexio-enterprisebumps its submodule pointer + ships Phase 3 (DB-levelconfig_versioncolumn → multi-replica cache invalidation in cloud) in ReflexioAI/reflexio-enterprise#47.Out of scope
reflexio-aiby moving heavy ML deps to extras. Bigger refactor; revisit after a couple of weeks of usage data.CLAUDE_SMART_USE_LOCAL_EMBEDDING. Public contract — any deprecation cycle would be coordinated with the claude-smart repo, not a side effect of this fix.Notable implementation deviations from the original plan
_prompt_embedding_provideris no longer called fromsetup init. Plan said to add a step "after" the LLM step, but calling both back-to-back would have asked the user to pick the embedder twice. Replaced the legacy call with the new step ininit. Legacy helper preserved for theservices startfirst-run wizard._choose_embedding_providerstep folds in the API-key prompt for openai/gemini when the env var isn't already set. Cleaner UX.tests/cli/test_service_builders.py::test_invalid_raises_bad_parameterwas hardcoded to"postgres"(which became valid in PR feat(storage): add StorageConfigPostgres model #47); switched to"not-a-real-backend".current_config_version()abstraction unified file-mtime and DB-version checks behind one tuple-returning method, so the cache module doesn't have to know about storage backends.Summary by CodeRabbit
Release Notes
New Features
--embeddingCLI option to setup commands for explicit embedding provider configurationadmin cache invalidatecommand to manually clear server-side cacheconfig updatecommand for partial configuration updatesBug Fixes