feat(base): per-message / per-role analytics on RenderedTokens #38

Merged
hallerite merged 4 commits into main from worktree-per-message-token-counts
May 16, 2026

@hallerite
Member

Summary

  • New RenderedTokens methods — tokens_per_message, tokens_by_role, message_token_spans, role_token_spans — derived from message_indices / sampled_mask and a new message_roles field every renderer now populates. Use cases: per-role length penalties for RL trainers (e.g. prime-rl), per-turn statistics over logprobs / attention / gradients via (start, end) spans.
  • Fixes attribution loss in every bridge_to_next_turn: bridges previously discarded msg_idx / is_sampled even though local emit helpers received them. Now bridges populate message_indices (relative to new_messages), sampled_mask (uniformly False — bridge output is a prompt), and message_roles. Consumers can run the new analytics on bridge output for incremental per-message accounting without re-rendering.
  • 17 renderers updated. 35 new parametrized tests; 1310 total passing.
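A minimal sketch of how the four analytics could be derived from the three fields named above. The class `RenderedTokensSketch` is a hypothetical stand-in, not the real `RenderedTokens`; it assumes `message_indices` holds one message index per token with `-1` marking unattributed scaffolding tokens, and that spans are half-open `(start, end)` pairs. The actual conventions in renderers/base.py may differ.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class RenderedTokensSketch:
    """Hypothetical stand-in for RenderedTokens; field names follow the PR text."""

    token_ids: list[int]
    message_indices: list[int]  # per token; -1 = unattributed scaffolding (assumption)
    sampled_mask: list[bool]    # per token; True where the token was sampled
    message_roles: list[str]    # per message, e.g. ["user", "assistant"]

    def tokens_per_message(self, *, sampled_only: bool = False) -> list[int]:
        # Count tokens attributed to each message, optionally only sampled ones.
        out = [0] * len(self.message_roles)
        for idx, sampled in zip(self.message_indices, self.sampled_mask):
            if 0 <= idx < len(out) and (sampled or not sampled_only):
                out[idx] += 1
        return out

    def tokens_by_role(self, *, sampled_only: bool = False) -> dict[str, int]:
        # Sum the per-message counts over messages that share a role.
        totals: dict[str, int] = {}
        counts = self.tokens_per_message(sampled_only=sampled_only)
        for role, n in zip(self.message_roles, counts):
            totals[role] = totals.get(role, 0) + n
        return totals

    def message_token_spans(self) -> list[list[tuple[int, int]]]:
        # One list of half-open (start, end) spans per message.
        spans: list[list[tuple[int, int]]] = [[] for _ in self.message_roles]
        pos = 0
        for idx, run in groupby(self.message_indices):
            length = len(list(run))
            if 0 <= idx < len(spans):
                spans[idx].append((pos, pos + length))
            pos += length
        return spans

    def role_token_spans(self) -> dict[str, list[tuple[int, int]]]:
        # Merge per-message spans into one list per role, in message order.
        by_role: dict[str, list[tuple[int, int]]] = {}
        for role, spans in zip(self.message_roles, self.message_token_spans()):
            by_role.setdefault(role, []).extend(spans)
        return by_role
```

Deriving everything from per-token arrays means no extra bookkeeping is needed at render time beyond filling in `message_roles`.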

Example

rendered = renderer.render(messages)
rendered.tokens_by_role(sampled_only=True)["assistant"]   # length-penalty signal
rendered.role_token_spans()["tool"]                       # [(start, end), ...]
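As a rough illustration of the two use cases, the helpers below consume the values returned by the calls above. Both function names, the linear penalty form, and the `budget` / `coeff` parameters are hypothetical and not part of this PR; spans are assumed to be half-open `(start, end)` index pairs into a per-token array.

```python
def mean_logprob_per_span(logprobs, spans):
    """Average per-token logprobs over each half-open (start, end) span."""
    return [sum(logprobs[s:e]) / (e - s) for s, e in spans if e > s]


def length_penalty(sampled_assistant_tokens, budget, coeff=0.01):
    """Hypothetical linear penalty on sampled assistant tokens beyond a budget."""
    return -coeff * max(0, sampled_assistant_tokens - budget)
```

A trainer might feed `rendered.tokens_by_role(sampled_only=True)["assistant"]` into `length_penalty` and `rendered.role_token_spans()["tool"]` into `mean_logprob_per_span`.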

Test plan

  • pytest tests/ — 1310 passed, 52 skipped (gpt-oss HF parity), 1 xfailed (pre-existing)
  • New tests parametrized across all 17 renderer fixtures in tests/test_tokens_per_message.py
  • End-to-end smoke with multi-turn tool-use trajectory (single render vs bridge-incremental) — counts match
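The single-render versus bridge-incremental comparison can be pictured with the sketch below. It assumes bridge output carries `message_indices` relative to `new_messages`, as the summary states, so the caller shifts them by the number of messages already counted. The helper `count_by_message`, its `offset` parameter, and the `-1` marker for unattributed tokens are illustrative assumptions, not the library's API.

```python
from collections import Counter


def count_by_message(message_indices, offset=0):
    """Count tokens per message, shifting bridge-relative indices by `offset`."""
    counts = Counter()
    for idx in message_indices:
        if idx >= 0:  # skip unattributed tokens (assumed to be marked -1)
            counts[idx + offset] += 1
    return counts


# One render of three messages versus a two-message render plus a bridge
# that adds one new message (index 0 relative to new_messages).
full = count_by_message([0, 0, 1, 1, 1, -1, 2, 2])
incremental = count_by_message([0, 0, 1, 1, 1]) + count_by_message([-1, 0, 0], offset=2)
```

Under these assumptions `full` and `incremental` hold the same per-message totals, which is the property the smoke test checks.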

🤖 Generated with Claude Code

hallerite and others added 2 commits May 15, 2026 23:19
Adds four methods, derived from existing message_indices /
sampled_mask plus a new message_roles field every renderer populates:

  tokens_per_message(*, sampled_only=False)
  tokens_by_role(*, sampled_only=False)
  message_token_spans()
  role_token_spans()

Use cases — per-role length penalties for RL trainers, per-turn
statistics on logprobs / attention / gradients via (start, end) spans.

Also fixes attribution loss in every bridge_to_next_turn: bridges
previously discarded msg_idx / is_sampled even though the local emit
helpers received them. Now bridges populate message_indices (relative
to new_messages), sampled_mask (uniformly False — bridge output is a
prompt), and message_roles. Consumers can run the new analytics on
bridge output for incremental per-message accounting without
re-rendering the conversation.

17 renderers updated. 35 new parametrized tests; 1310 total passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hallerite hallerite requested review from mikasenghaas and snimu May 16, 2026 00:43
Comment thread renderers/base.py Outdated
Comment thread renderers/base.py Outdated
Comment thread renderers/base.py
Frame RL length-penalty as an example rather than the use, and rewrite
the scaffolding sentence with explicit subject + concrete token examples
so it's no longer a run-on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread renderers/base.py
"""
if n_messages is None:
    n_messages = len(self.message_roles)
out = [0] * n_messages
Contributor

n_messages isn't clamped to len(self.message_roles) here, and out is returned directly below, so len(out) > len(self.message_roles) is possible when the caller passes a larger n_messages. Is that intended?

Member Author

addressed in 954ae83

Previously, passing n_messages > len(message_roles) padded the result
with trailing zeros for phantom messages the renderer never attributed.
The docstring framed the parameter as a truncation knob only, so this
was an undocumented footgun. Clamp to len(message_roles) and document
it; add a regression test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
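
The clamp described in the commit above might look roughly like the following standalone sketch. It is a free function with hypothetical parameters rather than the actual method in renderers/base.py, and it assumes unattributed tokens carry a negative index.

```python
def tokens_per_message(message_indices, message_roles, n_messages=None):
    """Per-message token counts; n_messages can truncate but never pad."""
    limit = len(message_roles)
    # Clamp so a too-large n_messages cannot create phantom trailing zeros.
    n = limit if n_messages is None else min(n_messages, limit)
    out = [0] * n
    for idx in message_indices:
        if 0 <= idx < n:
            out[idx] += 1
    return out
```

A regression check in the same spirit would pass an `n_messages` larger than `len(message_roles)` and assert that the result length equals `len(message_roles)`.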
@hallerite hallerite merged commit 17d0584 into main May 16, 2026
9 checks passed
@hallerite hallerite deleted the worktree-per-message-token-counts branch May 16, 2026 12:08