feat: add skip_should_run_check config flag by yilu331 · Pull Request #14 · ReflexioAI/reflexio

yilu331 · 2026-04-14T22:48:19Z

Summary

- Add skip_should_run_check: bool = False to the Config model
- Add early-return guard in _should_run_before_extraction() that bypasses the LLM eligibility check when the flag is enabled
- Add unit tests for the new flag in test_base_generation_service.py
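
A minimal sketch of the flag and the early-return guard described above. Only the field name, its default, and the method name come from this PR; the dataclass-based Config, the GenerationService class, and the llm_check callable are hypothetical stand-ins, and the real model and service may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Config:
    # Field name and default are from the PR summary; the dataclass itself
    # is an assumption standing in for the project's real Config model.
    skip_should_run_check: bool = False


class GenerationService:
    """Hypothetical stand-in for the service that owns the pre-extraction check."""

    def __init__(self, config: Config, llm_check: Callable[[str], bool]):
        self.config = config
        # Assumed: a callable that wraps the LLM eligibility call.
        self.llm_check = llm_check

    def _should_run_before_extraction(self, batch_text: str) -> bool:
        # Early-return guard: when the flag is set, never reach the LLM call.
        if self.config.skip_should_run_check:
            return True
        return self.llm_check(batch_text)
```

With the default Config() the LLM check still decides; with Config(skip_should_run_check=True) every batch proceeds to extraction and the check is never invoked.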

Context

Users have no way to bypass the LLM pre-extraction eligibility check. This flag lets orgs skip it to save cost/latency when they want every batch extracted.

Test plan

- TestShouldRunBeforeExtraction::test_skip_should_run_check_bypasses_llm_call: verifies flag returns True without LLM call
- TestShouldRunBeforeExtraction::test_default_skip_should_run_check_does_not_bypass: verifies default is False
- Full test suite (64 tests) passes
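
The two named tests could look roughly like the sketch below. The test class and method names are taken from the test plan; the free function should_run_before_extraction, the SimpleNamespace config, and the mocked llm_check are assumptions made so the example is self-contained, not the contents of test_base_generation_service.py.

```python
import unittest
from types import SimpleNamespace
from unittest.mock import MagicMock


def should_run_before_extraction(config, llm_check, batch_text):
    """Hypothetical free-function version of the guarded pre-check."""
    if config.skip_should_run_check:
        return True
    return llm_check(batch_text)


class TestShouldRunBeforeExtraction(unittest.TestCase):
    def test_skip_should_run_check_bypasses_llm_call(self):
        llm_check = MagicMock(return_value=False)
        config = SimpleNamespace(skip_should_run_check=True)
        # The flag forces True even though the mocked LLM would say False.
        self.assertTrue(should_run_before_extraction(config, llm_check, "batch"))
        llm_check.assert_not_called()

    def test_default_skip_should_run_check_does_not_bypass(self):
        llm_check = MagicMock(return_value=False)
        config = SimpleNamespace(skip_should_run_check=False)
        # With the flag off, the LLM verdict is returned unchanged.
        self.assertFalse(should_run_before_extraction(config, llm_check, "batch"))
        llm_check.assert_called_once_with("batch")
```

Asserting on the mock's call count, rather than only on the return value, is what distinguishes "returned True" from "returned True without spending an LLM call".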

Add a boolean config toggle that lets users bypass the LLM eligibility check before extraction. When enabled, extraction always proceeds without the pre-check LLM call.

Adds an opt-in LLM relevance-judge rerank stage to search_user_profiles (and the playbook variants), parallel to the existing cross-encoder rerank. The new stage bridges synonym/brand→category gaps that pure lexical/semantic models can't bridge, e.g. "Thrive Market" = grocery service, "Suica card" = Tokyo transit, "TripIt app" = travel-organizer. Cross-encoder upgrades (bge-reranker-v2-m3) were tested and rejected: they don't have the retail-brand world knowledge needed.

Architecture:
- New helper score_pairs_llm() in reflexio/server/llm/rerank/llm_reranker.py
- New prompt rerank_relevance/v1.0.0 (relevance-judge with explicit brand→category and tool→use-case guidance, scoring rubric, and a rule that user-owned tools/cards/apps score 7-9 on help/tips questions)
- New tool arg llm_rerank: bool = False on SearchUserProfilesArgs and the playbook variants
- _maybe_rerank_hits dispatches LLM rerank → cross-encoder → hybrid order in fallback chain; any failure path returns None and the caller falls back gracefully
- Bundle wiring: search-tool handlers now receive llm_client + prompt_manager via _bundle_handler_with_llm

Search prompt v1.10.0 documents llm_rerank in the tool palette and adds targeted exceptions to Patterns A, C, D, F where brand/proper-noun profiles are likely the answer but don't share the question's literal keywords. Pattern B explicitly OPTS OUT (recency dominates; rerank scrambles date order). All exceptions are tightly scoped to the question shape.
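
The LLM rerank → cross-encoder → hybrid order fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the actual _maybe_rerank_hits: the function name maybe_rerank_hits, the scorer callables, the dict-shaped hits, and the choice to fall through from a failed LLM stage to the cross-encoder are all hypothetical. The only behavior taken from the description is that a failure yields None so the caller keeps its existing order.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical scorer signature: one relevance score per candidate text,
# or None when the stage cannot produce scores.
Scorer = Callable[[str, Sequence[str]], Optional[List[float]]]


def maybe_rerank_hits(
    query: str,
    hits: List[dict],
    llm_rerank: bool = False,
    llm_scorer: Optional[Scorer] = None,
    cross_encoder_scorer: Optional[Scorer] = None,
) -> Optional[List[dict]]:
    """Return reranked hits, or None so the caller keeps the hybrid order."""
    stages: List[Scorer] = []
    if llm_rerank and llm_scorer is not None:
        stages.append(llm_scorer)  # opt-in stage is tried first
    if cross_encoder_scorer is not None:
        stages.append(cross_encoder_scorer)

    texts = [hit["text"] for hit in hits]
    for scorer in stages:
        try:
            scores = scorer(query, texts)
        except Exception:
            continue  # assumed: any stage failure falls through to the next one
        if scores is None or len(scores) != len(hits):
            continue  # malformed output is treated like a failure
        order = sorted(range(len(hits)), key=lambda i: scores[i], reverse=True)
        return [hits[i] for i in order]
    return None  # nothing usable: caller falls back to hybrid order
```

Returning None rather than the unmodified list keeps "no rerank happened" distinguishable from "rerank happened and agreed with the original order".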
Tested:
- 16 unit tests for score_pairs_llm fallback chain
- 10 unit tests for _maybe_rerank_hits dispatch + fallback semantics
- Trip-wire test updated; semver-sort bug in _get_latest_prompt_version fixed (would have locked v1.10.0 → v1.9.0 lexically)
- Smoke test on gpt4_2ba83207 (grocery superlative): Thrive Market ranked #14 baseline → #4 with llm_rerank=True
- Smoke test on 0a34ad58 (Tokyo Suica/TripIt): TripIt missing baseline → #3 with llm_rerank=True
- LongMemEval tune-100 r93 vs r91: 76/100 vs 74/100 (+2 acc); macro 81.6% vs 80.5% (+1.1pt); M-S +14pt (the target gain), SS-P +10pt; K-U regression observed but traced to extraction-time non-determinism (knowledge updates not captured during re-ingest), not the rerank changes

Bundled prompt-bank state catch-up:
- answer_synthesis v1.3.0/v1.4.0 (rules 13/14 from earlier rounds)
- extraction_user_profile v1.1.0/v1.1.1/v1.1.2 (relative-time resolution, started/finished pair preservation)
- compress_session_for_query v1.0.0–v1.3.0 (the in-tool denoiser introduced earlier; currently hard-disabled at the code level)
- Older prompt versions flipped to active: false

Misc:
- LiteLLMClient seeds default to "42" for benchmark reproducibility
- /api/search response now exposes rehydrated_text (set by the search agent when it called read_session_text)
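
The semver-sort bug listed under Tested is easy to reproduce: comparing version tags as strings ranks v1.9.0 above v1.10.0. The sketch below shows the general technique of comparing parsed integer tuples instead. The helper names parse_version and latest_version are hypothetical and this is not the code of _get_latest_prompt_version; it also assumes plain vMAJOR.MINOR.PATCH tags with no pre-release suffixes.

```python
from typing import Iterable, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v1.10.0' into (1, 10, 0) so components compare numerically."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def latest_version(tags: Iterable[str]) -> str:
    """Pick the highest version by numeric components, not by string order."""
    return max(tags, key=parse_version)


tags = ["v1.9.0", "v1.10.0", "v1.2.3"]
lexical_latest = max(tags)            # string comparison: '9' sorts after '1'
semver_latest = latest_version(tags)  # numeric comparison: 10 > 9
```

Here lexical_latest is "v1.9.0" while semver_latest is "v1.10.0", which is the mismatch that would have pinned the prompt to the older version.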

yilu331 added 2 commits April 14, 2026 15:47

chore(release): reflexio-ai v0.2.11

22553d3

feat: add skip_should_run_check config flag for extraction pre-check

1597506

yilu331 merged commit 4834903 into main Apr 14, 2026

yilu331 merged 2 commits into main from fix/type-checking-annotations