
feat: additive Q-learning + outcome-based rewards #4

Merged
anthroos merged 1 commit into main from feat/additive-q-learning-and-outcomes on Mar 22, 2026

Conversation

@anthroos
Owner

Summary

  • Q-update formula changed: EMA → additive (Q = clamp(Q + α*r, floor, ceiling)); see the sketch after this list
  • q_init: 0.5 → 0.0 — memories now earn value from zero based on actual utility
  • q_ceiling: 1.0 added as upper bound
  • Outcome resolver: CRM CSV stage transitions → memory rewards (closed-loop learning)
  • client_id tagging on memories for per-client context
  • resolve CLI command for manual outcome resolution
  • session-end hook with retrieval reward loop (rewards memories that were actually used)
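
A minimal sketch of the additive update rule described above, not the repo's actual code. Only q_init = 0.0 and q_ceiling = 1.0 come from this PR; the floor value, the learning rate `alpha`, and the function name are illustrative assumptions:

```python
Q_FLOOR = 0.0    # assumed lower bound (not stated in this PR)
Q_CEILING = 1.0  # q_ceiling introduced in this PR
ALPHA = 0.1      # assumed learning rate

def update_q(q: float, reward: float, alpha: float = ALPHA) -> float:
    """Additive update: Q = clamp(Q + alpha * r, floor, ceiling).

    Unlike the previous EMA, rewards accumulate until the ceiling,
    and a new memory starts from q_init = 0.0.
    """
    return max(Q_FLOOR, min(Q_CEILING, q + alpha * reward))

# A fresh memory earns value only from actual utility:
q = 0.0                       # q_init
q = update_q(q, reward=1.0)   # 0.1
q = update_q(q, reward=1.0)   # 0.2
q = update_q(q, reward=-1.0)  # back down to 0.1
```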

Test plan

  • pytest tests/ -v → 73/73 pass
  • No private data in tracked files
  • .env not in git history
  • Manual test: run session-end hook, verify Q-values update

🤖 Generated with Claude Code

- Q-update: EMA → additive (Q = clamp(Q + α*r, floor, ceiling))
- q_init: 0.5 → 0.0 (memories earn value from zero)
- q_ceiling: 1.0 added
- Outcome resolver: CRM CSV transitions → memory rewards (sketch below)
- client_id tagging on memories
- resolve CLI command
- session-end hook with retrieval reward loop
- 73/73 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
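
A sketch of the idea behind the outcome resolver referenced above. The CSV columns (client_id, from_stage, to_stage), the stage names, and the reward values are assumptions for illustration; the resolver's real schema is not shown in this PR:

```python
import csv

# Illustrative mapping from CRM stage transitions to reward values.
STAGE_REWARDS = {
    ("proposal", "closed_won"): 1.0,
    ("proposal", "closed_lost"): -1.0,
}

def resolve_outcomes(csv_path: str) -> dict[str, float]:
    """Read stage transitions from a CRM CSV export and return a reward per client_id.

    Memories tagged with that client_id would then receive the reward via the
    additive Q-update, closing the loop between retrieval and real outcomes.
    """
    rewards: dict[str, float] = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # assumed columns: client_id, from_stage, to_stage
            key = (row["from_stage"], row["to_stage"])
            if key in STAGE_REWARDS:
                rewards[row["client_id"]] = STAGE_REWARDS[key]
    return rewards
```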
anthroos merged commit 7d25c28 into main on Mar 22, 2026
anthroos deleted the feat/additive-q-learning-and-outcomes branch on March 22, 2026 at 16:23
anthroos added a commit that referenced this pull request on Mar 23, 2026
For prompts >30 chars, inject a reminder to call search_memory
before starting the task. Hooks do auto-recall, but targeted
manual search catches context the auto-recall misses.

Co-authored-by: Ivan Pasichnyk <ivanpasichnyk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
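
For illustration only, a sketch of the reminder logic this commit describes; the hook's actual entry point, return contract, and reminder wording are assumptions:

```python
from typing import Optional

MIN_PROMPT_LEN = 30  # threshold from the commit message

REMINDER = (
    "Before starting, call search_memory to pull any relevant "
    "context the auto-recall hook may have missed."
)

def maybe_inject_reminder(prompt: str) -> Optional[str]:
    """Return a search_memory reminder for prompts longer than 30 characters."""
    if len(prompt) > MIN_PROMPT_LEN:
        return REMINDER
    return None
```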