feat: strengthen retrieval eval and runtime integrations #1

Merged
cafitac merged 4 commits into main from feat/retrieval-eval-fixtures
Apr 29, 2026

cafitac commented Apr 29, 2026

Summary

  • expand retrieval evaluation into a stronger regression/comparator harness with symbolic fixture references, comparator matrix baselines, selective baseline gating, soft advisories, richer task metadata, and summary/delta rollups
  • add Codex and Claude prompt attach surfaces plus reusable wrapper scripts, and document real runtime readiness status
  • fix Hermes hook bootstrap merge behavior so existing hook indentation and continuation lines are preserved for both fresh-install and existing-hooks users
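To make the "selective baseline gating" and "soft advisories" ideas in the first bullet concrete, here is a minimal sketch. It is not the harness added in this PR: `TaskResult`, `gate`, and every field name are hypothetical, and the rule is assumed to be "a baseline regression is a task the baseline passes but the system under test fails".

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One evaluated retrieval task (hypothetical shape, not the PR's schema)."""
    task_id: str
    memory_type: str        # e.g. "fact", "procedure", "episode"
    passed: bool            # outcome for the system under test
    baseline_passed: bool   # outcome for the comparison baseline, e.g. lexical


def gate(results, fail_on_memory_types, soft_threshold=0):
    """Split baseline regressions into hard failures and soft advisories.

    Only regressions whose memory type is in `fail_on_memory_types` fail the
    run. The remaining regressions are advisory and are reported only when
    their count exceeds `soft_threshold`.
    """
    regressions = [r for r in results if r.baseline_passed and not r.passed]
    hard = [r.task_id for r in regressions if r.memory_type in fail_on_memory_types]
    soft = [r.task_id for r in regressions if r.memory_type not in fail_on_memory_types]
    advisories = soft if len(soft) > soft_threshold else []

    # Per-memory-type pass/fail rollup for the system under test.
    by_memory_type = {}
    for r in results:
        counts = by_memory_type.setdefault(r.memory_type, {"pass": 0, "fail": 0})
        counts["pass" if r.passed else "fail"] += 1

    return {
        "ok": not hard,
        "hard_failures": hard,
        "advisories": advisories,
        "by_memory_type": by_memory_type,
    }
```

With this shape, gating on `{"fact"}` alone lets an episode regression surface as an advisory without failing the run, which is the behavior the selective flag is meant to allow.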

Key changes

  • retrieval-eval
    • support symbolic top-level fixture references plus richer selectors
    • add baseline modes: lexical, lexical-global, source-lexical, source-global
    • add selective --fail-on-baseline-regression-memory-type gating
    • add advisory-only soft thresholds for regression and baseline regression
    • expose rationale, notes, by_memory_type, by_primary_task_type, top-level pass/fail counts, and richer delta rollups
    • add adversarial checked-in fixtures for stale/drift/guardrail scenarios across facts, procedures, and episodes
  • runtime attach surfaces
    • add codex-prompt and claude-prompt CLI commands
    • add scripts/run_codex_with_memory.py and scripts/run_claude_with_memory.py
    • verify Codex wrapper E2E; keep Claude Code marked as pending auth-gated E2E
  • Hermes integration
    • preserve event-local indentation when merging hook snippets into existing Hermes configs
    • document fresh-install, existing-hook merge, doctor/test flow, and first-run --accept-hooks verification
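The indentation-preserving merge described under Hermes integration can be sketched as follows. This is an illustration under stated assumptions, not the bootstrap code in this PR: `merge_hook_snippet` is a hypothetical helper, the config is treated as plain text, the event's entries are assumed to be indented deeper than the event key, and the `command`/`timeout` keys in the usage note are made up.

```python
def merge_hook_snippet(config_text, event, snippet_lines):
    """Append `snippet_lines` under `event:`, reusing that event's indentation.

    `snippet_lines` carry only relative indentation (first line at column 0),
    so continuation lines keep their offset after re-indenting.
    """
    lines = config_text.splitlines()

    # Locate the event header, e.g. "    pre_llm_call:".
    for i, line in enumerate(lines):
        if line.strip() == f"{event}:":
            break
    else:
        raise KeyError(f"event not found: {event}")
    header_indent = len(line) - len(line.lstrip())

    # Walk the event body: every following line indented deeper than the header.
    end = i + 1
    child_indent = None
    while end < len(lines):
        cur = lines[end]
        if cur.strip():
            indent = len(cur) - len(cur.lstrip())
            if indent <= header_indent:
                break
            if child_indent is None:
                child_indent = indent  # indentation used by existing entries
        end += 1

    # Do not insert after trailing blank lines that belong to the next section.
    while end > i + 1 and not lines[end - 1].strip():
        end -= 1

    if child_indent is None:
        child_indent = header_indent + 2  # empty event: fall back to two spaces

    new_lines = [" " * child_indent + s for s in snippet_lines]
    merged = lines[:end] + new_lines + lines[end:]
    return "\n".join(merged) + ("\n" if config_text.endswith("\n") else "")
```

For example, calling `merge_hook_snippet(text, "pre_llm_call", ["- command: agent-memory", "  timeout: 5"])` on a config whose existing entries sit at eight spaces would place the new entry at eight spaces and its continuation line at ten, rather than forcing a fixed indent.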

Verification

  • uv run pytest tests/test_prompt_wrapper_scripts.py -q
  • uv run pytest tests/test_cli.py -q
  • uv run pytest tests/test_cli.py tests/test_retrieval_evaluation.py -q
  • uv run pytest tests/ -q
  • real Hermes verification:
    • uv run agent-memory hermes-bootstrap ~/.agent-memory/memory.db --config-path ~/.hermes/config.yaml
    • hermes hooks doctor
    • hermes hooks test pre_llm_call
    • hermes --accept-hooks chat -q 'Reply with OK only.' --quiet
  • real Codex verification:
    • codex exec --skip-git-repo-check --sandbox workspace-write --model gpt-5.4-mini 'Reply with OK only.'
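For readers wondering how a wrapper could sit in front of the Codex command above, here is a rough sketch. It is not the contents of scripts/run_codex_with_memory.py: `build_codex_command` and the "Relevant memory:" prompt layout are assumptions, and the only `codex` flags used are the ones shown in the verification command.

```python
import shlex


def build_codex_command(prompt, memory_snippets, model="gpt-5.4-mini"):
    """Build an argv list for `codex exec` with memory prepended to the prompt.

    `memory_snippets` is a list of already-retrieved strings; how they are
    retrieved is out of scope for this sketch.
    """
    if memory_snippets:
        context = "\n".join(f"- {s}" for s in memory_snippets)
        full_prompt = f"Relevant memory:\n{context}\n\n{prompt}"
    else:
        full_prompt = prompt  # nothing retrieved: pass the prompt through unchanged
    return [
        "codex", "exec",
        "--skip-git-repo-check",
        "--sandbox", "workspace-write",
        "--model", model,
        full_prompt,
    ]


if __name__ == "__main__":
    cmd = build_codex_command("Reply with OK only.", ["user prefers terse answers"])
    # Print a shell-safe rendering instead of executing, so the sketch has no side effects.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it. Building an argv list rather than a shell string avoids quoting problems when the prompt contains quotes or newlines.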

Readiness status

  • Hermes: ready for real use
  • Codex: ready for real use
  • Claude Code: prompt/wrapper surface present, but final sign-off still blocked on auth-enabled E2E

cafitac merged commit e305cc4 into main on Apr 29, 2026
2 checks passed
cafitac deleted the feat/retrieval-eval-fixtures branch on April 29, 2026 19:45