feat: strengthen retrieval eval and runtime integrations by cafitac · Pull Request #1 · cafitac/agent-memory

cafitac · 2026-04-29T12:42:29Z

Summary

- expand retrieval evaluation into a stronger regression/comparator harness with symbolic fixture references, comparator matrix baselines, selective baseline gating, soft advisories, richer task metadata, and summary/delta rollups
- add Codex and Claude prompt attach surfaces plus reusable wrapper scripts, and document real-runtime readiness status
- fix Hermes hook bootstrap merge behavior so existing hook indentation and continuation lines are preserved for both fresh-install and existing-hooks users

retrieval-eval
- support symbolic top-level fixture references plus richer selectors
- add baseline modes: lexical, lexical-global, source-lexical, source-global
- add --fail-on-baseline-regression-memory-type to gate baseline regressions selectively by memory type
- add advisory-only soft thresholds for regression and baseline regression
- expose rationale, notes, by_memory_type, by_primary_task_type, top-level pass/fail counts, and richer delta rollups
- add adversarial checked-in fixtures for stale/drift/guardrail scenarios across facts, procedures, and episodes
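
A minimal sketch of how per-memory-type rollups, selective baseline gating, and advisory-only soft thresholds can fit together. It assumes each evaluated task records a memory type plus a pass/fail outcome for both the candidate retriever and a comparator baseline. TaskResult, rollup_by_memory_type, evaluate_gates, and the "delta" definition are hypothetical names for illustration, not the agent-memory API; the fail_on_types set plays the role of the types selected with --fail-on-baseline-regression-memory-type.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are illustrative only.
    task_id: str
    memory_type: str       # e.g. "fact", "procedure", "episode"
    passed: bool           # candidate retriever passed this task
    baseline_passed: bool  # comparator baseline passed this task


def rollup_by_memory_type(results):
    """Per memory type: pass/fail counts, baseline passes, and candidate-minus-baseline delta."""
    rollup = defaultdict(lambda: {"pass": 0, "fail": 0, "baseline_pass": 0})
    for r in results:
        bucket = rollup[r.memory_type]
        bucket["pass" if r.passed else "fail"] += 1
        bucket["baseline_pass"] += int(r.baseline_passed)
    for bucket in rollup.values():
        bucket["delta"] = bucket["pass"] - bucket["baseline_pass"]
    return dict(rollup)


def evaluate_gates(rollup, fail_on_types, advisory_only=False):
    """Split baseline regressions (negative deltas) into hard failures and advisories.

    Only memory types listed in `fail_on_types` become hard failures; all other
    regressions, and every regression when `advisory_only` is set, are returned
    as advisories only.
    """
    hard, advisories = [], []
    for memory_type, bucket in sorted(rollup.items()):
        if bucket["delta"] >= 0:
            continue  # no regression against the baseline for this type
        message = f"{memory_type}: {bucket['delta']:+d} vs baseline"
        if memory_type in fail_on_types and not advisory_only:
            hard.append(message)
        else:
            advisories.append(message)
    return hard, advisories


if __name__ == "__main__":
    results = [
        TaskResult("t1", "fact", True, True),
        TaskResult("t2", "fact", False, True),
        TaskResult("t3", "procedure", True, False),
        TaskResult("t4", "episode", False, True),
    ]
    hard, advisories = evaluate_gates(rollup_by_memory_type(results), fail_on_types={"fact"})
    print("hard:", hard)              # hard: ['fact: -1 vs baseline']
    print("advisories:", advisories)  # advisories: ['episode: -1 vs baseline']
```

Here the episode regression is surfaced without failing the run, because only "fact" was selected for hard gating; a caller would typically exit non-zero only when `hard` is non-empty.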

runtime attach surfaces
- add codex-prompt and claude-prompt CLI commands
- add scripts/run_codex_with_memory.py and scripts/run_claude_with_memory.py
- verify the Codex wrapper end to end; keep Claude Code marked as pending its auth-gated E2E run
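
The wrapper pattern can be sketched as: fetch memory context for the prompt, prepend it, and forward the result to codex exec. This is an illustrative sketch, not the contents of scripts/run_codex_with_memory.py. build_prompt, build_codex_argv, and the "Relevant memory / Task" prompt layout are hypothetical, and fetch_memory is a placeholder for whatever the codex-prompt command produces; only the codex exec flags are taken from the verification command in this PR.

```python
import shlex
from typing import Callable, List, Optional


def build_prompt(user_prompt: str, fetch_memory: Callable[[str], str]) -> str:
    """Prepend retrieved memory context to the user's prompt (hypothetical layout)."""
    context = fetch_memory(user_prompt).strip()
    if not context:
        return user_prompt  # nothing retrieved: pass the prompt through unchanged
    return f"Relevant memory:\n{context}\n\nTask:\n{user_prompt}"


def build_codex_argv(prompt: str, model: Optional[str] = None) -> List[str]:
    """Build a `codex exec` argv using the flags from this PR's verification run."""
    argv = ["codex", "exec", "--skip-git-repo-check", "--sandbox", "workspace-write"]
    if model is not None:
        argv += ["--model", model]
    argv.append(prompt)  # the prompt is the final positional argument
    return argv


if __name__ == "__main__":
    # Stand-in for a real memory lookup; not an agent-memory API.
    def fake_memory(query: str) -> str:
        return "- this project manages dependencies with uv"

    argv = build_codex_argv(build_prompt("Reply with OK only.", fake_memory), model="gpt-5.4-mini")
    # A real wrapper would hand argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Keeping the command as an argument list rather than a single shell string avoids quoting problems when the memory context contains quotes or newlines; shlex.join is used here only for display.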

Hermes integration
- preserve event-local indentation when merging hook snippets into existing Hermes configs
- document fresh-install, existing-hook merge, doctor/test flow, and first-run --accept-hooks verification
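
The indentation-preserving merge can be illustrated with a text-level sketch. It assumes a YAML-like layout in which each hook event key (pre_llm_call here) is followed by an indented block of entries; merge_event_snippet, the command/timeout fields, and the layout itself are hypothetical and not taken from Hermes or agent-memory. Merging at the text level, rather than loading and re-dumping the YAML, is one way to keep a user's existing formatting, since a load/dump cycle with a typical loader such as PyYAML normalizes indentation and drops comments.

```python
def _indent(line: str) -> int:
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def merge_event_snippet(config: str, event: str, snippet: str) -> str:
    """Append `snippet` under `<event>:`, matching that event's existing entry indentation."""
    lines = config.splitlines()
    starts = [i for i, line in enumerate(lines) if line.strip() == f"{event}:"]
    if not starts:
        raise ValueError(f"event {event!r} not found")
    start = starts[0]
    key_indent = _indent(lines[start])

    # The event block ends at the first non-blank line indented no deeper than the key.
    end = start + 1
    while end < len(lines) and (not lines[end].strip() or _indent(lines[end]) > key_indent):
        end += 1
    # Step back over trailing blank lines so the snippet sits right after the last entry.
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1

    # Reuse the indentation of the event's first entry instead of a hard-coded width.
    body = [line for line in lines[start + 1:end] if line.strip()]
    item_indent = _indent(body[0]) if body else key_indent + 2

    # Shift every snippet line by the same amount, so continuation lines keep
    # their indentation relative to the snippet's first line.
    snippet_lines = snippet.strip("\n").splitlines()
    shift = item_indent - _indent(snippet_lines[0])
    shifted = []
    for line in snippet_lines:
        if line.strip():
            shifted.append(" " * max(0, _indent(line) + shift) + line.lstrip(" "))
        else:
            shifted.append("")

    merged = lines[:end] + shifted + lines[end:]
    return "\n".join(merged) + ("\n" if config.endswith("\n") else "")


if __name__ == "__main__":
    existing = (
        "hooks:\n"
        "  pre_llm_call:\n"
        "      - command: existing-hook\n"
        "        timeout: 5\n"
        "  post_llm_call:\n"
        "      - command: other\n"
    )
    new_hook = "- command: agent-memory-hook\n  timeout: 10\n"
    print(merge_event_snippet(existing, "pre_llm_call", new_hook))
```

The new entry lands after the existing pre_llm_call entry at the same six-space indentation, and its continuation line keeps its two-space offset, instead of being written at a fixed width that could break the user's block.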

automated tests:
- uv run pytest tests/test_prompt_wrapper_scripts.py -q
- uv run pytest tests/test_cli.py -q
- uv run pytest tests/test_cli.py tests/test_retrieval_evaluation.py -q
- uv run pytest tests/ -q

real Hermes verification:
- uv run agent-memory hermes-bootstrap ~/.agent-memory/memory.db --config-path ~/.hermes/config.yaml
- hermes hooks doctor
- hermes hooks test pre_llm_call
- hermes --accept-hooks chat -q 'Reply with OK only.' --quiet

real Codex verification:
- codex exec --skip-git-repo-check --sandbox workspace-write --model gpt-5.4-mini 'Reply with OK only.'

runtime readiness:
- Hermes: ready for real use
- Codex: ready for real use
- Claude Code: prompt/wrapper surface present, but final sign-off is still blocked on an auth-enabled E2E run

cafitac added 4 commits April 29, 2026 15:25

- feat(retrieval-eval): add comparator matrix and soft gates (92b008f)
- feat(cli): add Codex and Claude memory wrapper scripts (d72b746)
- fix(hermes): preserve existing hook indentation on bootstrap merge (95c4c22)
- docs(readme): clarify Hermes onboarding and verification (7e25a90)

cafitac merged commit e305cc4 into main Apr 29, 2026
2 checks passed

cafitac deleted the feat/retrieval-eval-fixtures branch April 29, 2026 19:45