
feat: additive Q-learning + outcome-based rewards #4

Merged
anthroos merged 1 commit into main from feat/additive-q-learning-and-outcomes on Mar 22, 2026

Conversation

@anthroos
Owner

Summary

  • Q-update formula changed: EMA → additive (Q = clamp(Q + α*r, floor, ceiling)); see the sketch after this list
  • q_init: 0.5 → 0.0 — memories now earn value from zero based on actual utility
  • q_ceiling: 1.0 added as upper bound
  • Outcome resolver: CRM CSV stage transitions → memory rewards (closed-loop learning)
  • client_id tagging on memories for per-client context
  • resolve CLI command for manual outcome resolution
  • session-end hook with retrieval reward loop (rewards memories that were actually used)
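
A minimal sketch of the additive update rule described above, not the repo's actual code. Only q_init = 0.0 and q_ceiling = 1.0 come from this PR; the floor value, the learning rate `alpha`, and the function name are illustrative assumptions:

```python
Q_FLOOR = 0.0    # assumed lower bound (not stated in this PR)
Q_CEILING = 1.0  # q_ceiling introduced in this PR
ALPHA = 0.1      # assumed learning rate

def update_q(q: float, reward: float, alpha: float = ALPHA) -> float:
    """Additive update: Q = clamp(Q + alpha * r, floor, ceiling).

    Unlike the previous EMA, rewards accumulate until the ceiling,
    and a new memory starts from q_init = 0.0.
    """
    return max(Q_FLOOR, min(Q_CEILING, q + alpha * reward))

# A fresh memory earns value only from actual utility:
q = 0.0                       # q_init
q = update_q(q, reward=1.0)   # 0.1
q = update_q(q, reward=1.0)   # 0.2
q = update_q(q, reward=-1.0)  # back down to 0.1
```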

Test plan

  • pytest tests/ -v → 73/73 pass
  • No private data in tracked files
  • .env not in git history
  • Manual test: run session-end hook, verify Q-values update

🤖 Generated with Claude Code

- Q-update: EMA → additive (Q = clamp(Q + α*r, floor, ceiling))
- q_init: 0.5 → 0.0 (memories earn value from zero)
- q_ceiling: 1.0 added
- Outcome resolver: CRM CSV transitions → memory rewards (sketch below)
- client_id tagging on memories
- resolve CLI command
- session-end hook with retrieval reward loop
- 73/73 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
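
A sketch of the idea behind the outcome resolver referenced above. The CSV columns (client_id, from_stage, to_stage), the stage names, and the reward values are assumptions for illustration; the resolver's real schema is not shown in this PR:

```python
import csv

# Illustrative mapping from CRM stage transitions to reward values.
STAGE_REWARDS = {
    ("proposal", "closed_won"): 1.0,
    ("proposal", "closed_lost"): -1.0,
}

def resolve_outcomes(csv_path: str) -> dict[str, float]:
    """Read stage transitions from a CRM CSV export and return a reward per client_id.

    Memories tagged with that client_id would then receive the reward via the
    additive Q-update, closing the loop between retrieval and real outcomes.
    """
    rewards: dict[str, float] = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # assumed columns: client_id, from_stage, to_stage
            key = (row["from_stage"], row["to_stage"])
            if key in STAGE_REWARDS:
                rewards[row["client_id"]] = STAGE_REWARDS[key]
    return rewards
```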
anthroos merged commit 7d25c28 into main on Mar 22, 2026
anthroos deleted the feat/additive-q-learning-and-outcomes branch on March 22, 2026 at 16:23
anthroos added a commit that referenced this pull request on Mar 23, 2026
For prompts >30 chars, inject a reminder to call search_memory
before starting the task. Hooks do auto-recall, but targeted
manual search catches context the auto-recall misses.

Co-authored-by: Ivan Pasichnyk <ivanpasichnyk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
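
For illustration only, a sketch of the reminder logic this commit describes; the hook's actual entry point, return contract, and reminder wording are assumptions:

```python
from typing import Optional

MIN_PROMPT_LEN = 30  # threshold from the commit message

REMINDER = (
    "Before starting, call search_memory to pull any relevant "
    "context the auto-recall hook may have missed."
)

def maybe_inject_reminder(prompt: str) -> Optional[str]:
    """Return a search_memory reminder for prompts longer than 30 characters."""
    if len(prompt) > MIN_PROMPT_LEN:
        return REMINDER
    return None
```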