
feat(tokens): unify counting via ports.Tokenizer port#340

Merged
pocky merged 1 commit into main from feature/F094-token-counting--unify-via-tokenizer-port on May 13, 2026

pocky commented May 13, 2026

Summary

  • Unified token counting for all CLI-based agent providers (Claude, Gemini, Codex, Copilot, OpenCode) behind an injected ports.Tokenizer interface, replacing scattered estimateTokens/estimateInputTokens inline helpers
  • Each provider now exposes an extractTokenUsage hook that pulls real token counts from its JSON output; a fallbackTokenizer (len/4) is used only when the provider does not emit token data, with TokensEstimated=true to signal approximation
  • Adds TokensInput, TokensOutput, and TokensEstimated fields to step state, making them accessible as interpolation variables in workflow YAML
  • Removes the tiktoken-go dependency entirely and deletes the tiktoken_tokenizer implementation; the ports.Tokenizer port is the single injection point for future real tokenizer swaps
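
As a rough illustration of the port and the len/4 fallback described above, here is a minimal sketch. The method names `Count` and `IsEstimate` are assumptions chosen for illustration; the actual `ports.Tokenizer` signature in the repository may differ.

```go
package main

import "fmt"

// Tokenizer is a hypothetical shape for the ports.Tokenizer port.
// The real interface may expose different methods.
type Tokenizer interface {
	// Count returns the number of tokens attributed to text.
	Count(text string) int
	// IsEstimate reports whether Count is an approximation.
	IsEstimate() bool
}

// fallbackTokenizer approximates tokens as len(text)/4, using the byte
// length and integer division, and always flags its result as an estimate.
type fallbackTokenizer struct{}

func (fallbackTokenizer) Count(text string) int { return len(text) / 4 }
func (fallbackTokenizer) IsEstimate() bool      { return true }

func main() {
	var tok Tokenizer = fallbackTokenizer{}
	// "hello, tokenizer" is 16 bytes long, so the estimate is 16/4 = 4.
	fmt.Println(tok.Count("hello, tokenizer"), tok.IsEstimate())
}
```

Because callers depend only on the interface, a real tokenizer could later be injected without touching the providers.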

Changes

Domain

  • internal/domain/workflow/context.go: Add TokensInput, TokensOutput, TokensEstimated fields to StepState
  • internal/domain/workflow/reference.go: Register new token fields in ValidStateProperties and lowercaseToUppercase alias map

Application

  • internal/application/execution_service.go: Propagate TokensInput, TokensOutput, TokensEstimated from conversation and single-turn results into step state
  • internal/application/interpolation_helpers.go: Include new token fields when building interpolation context from step state

Infrastructure — base provider

  • internal/infrastructure/agents/base_cli_provider.go: Add tokenizer ports.Tokenizer field to baseCLIProvider; default to fallbackTokenizer; add extractTokenUsage hook to cliProviderHooks; replace estimateTokens/estimateInputTokens calls with tokenizer calls in both execute and executeConversation
  • internal/infrastructure/agents/base_cli_provider_tokenizer_test.go: New — 390-line test suite covering tokenizer injection, IsEstimate propagation, CountTurnsTokens slicing, no-mutation guarantee on prior turns, and error-path guard
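
The "real counts first, estimate otherwise" behaviour of the base provider could look roughly like the sketch below. All names here (`TokenUsage`, `extractFunc`, `resolveUsage`, `lenDiv4Tokenizer`) are hypothetical and only mirror the description in this PR, not the actual code in `base_cli_provider.go`.

```go
package main

import "fmt"

// Tokenizer is an assumed shape for the injected port.
type Tokenizer interface {
	Count(text string) int
	IsEstimate() bool
}

// lenDiv4Tokenizer stands in for the fallback tokenizer (len/4).
type lenDiv4Tokenizer struct{}

func (lenDiv4Tokenizer) Count(s string) int { return len(s) / 4 }
func (lenDiv4Tokenizer) IsEstimate() bool   { return true }

// TokenUsage mirrors the three fields added to step state.
type TokenUsage struct {
	Input     int
	Output    int
	Estimated bool
}

// extractFunc is a hypothetical hook signature: ok is false when the
// provider output carries no token data.
type extractFunc func(rawOutput string) (input, output int, ok bool)

// resolveUsage prefers counts reported by the provider and falls back to
// the tokenizer, marking the result as estimated when the tokenizer says so.
func resolveUsage(prompt, rawOutput string, extract extractFunc, tok Tokenizer) TokenUsage {
	if extract != nil {
		if in, out, ok := extract(rawOutput); ok {
			return TokenUsage{Input: in, Output: out, Estimated: false}
		}
	}
	return TokenUsage{
		Input:     tok.Count(prompt),
		Output:    tok.Count(rawOutput),
		Estimated: tok.IsEstimate(),
	}
}

func main() {
	hook := func(string) (int, int, bool) { return 120, 45, true }
	fmt.Println(resolveUsage("prompt", "output", hook, lenDiv4Tokenizer{}))
	fmt.Println(resolveUsage("12345678", "abcd", nil, lenDiv4Tokenizer{}))
}
```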

Infrastructure — per-provider hooks

  • internal/infrastructure/agents/claude_provider.go: Add tokenizer field; wire extractClaudeTokenUsage hook (parses result event usage, including cache tokens and total_cost_usd)
  • internal/infrastructure/agents/gemini_provider.go: Add tokenizer field; wire extractGeminiTokenUsage hook (parses result event stats)
  • internal/infrastructure/agents/codex_provider.go: Add tokenizer field; wire extractCodexTokenUsage hook (parses turn.completed event usage)
  • internal/infrastructure/agents/copilot_provider.go: Add tokenizer field; wire extractCopilotTokenUsage hook (parses assistant.message event outputTokens)
  • internal/infrastructure/agents/opencode_provider.go: Add tokenizer field; wire extractOpenCodeTokenUsage hook (parses step_finish event part.tokens)
  • internal/infrastructure/agents/helpers.go: Remove dead estimateTokens and estimateInputTokens helpers; add intFromMap utility used by extraction hooks
  • internal/infrastructure/agents/helpers_test.go: Remove tests for deleted helpers
  • internal/infrastructure/agents/options.go: Add tokenizer-related provider option constants
  • internal/infrastructure/agents/provider_options_test.go: Expand option tests to cover tokenizer injection
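
To make the per-provider hooks more concrete, the sketch below scans newline-delimited JSON output for a `turn.completed` event and reads counts from its `usage` object through an `intFromMap`-style helper. The `input_tokens`/`output_tokens` field names and the `extractUsage` signature are assumptions for illustration; each provider's real event schema should be checked against its CLI output.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// intFromMap reads an integer from a decoded JSON object. encoding/json
// decodes numbers into float64 when the target is map[string]any.
func intFromMap(m map[string]any, key string) (int, bool) {
	v, ok := m[key].(float64)
	if !ok {
		return 0, false
	}
	return int(v), true
}

// extractUsage looks for events shaped like (field names assumed):
//   {"type":"turn.completed","usage":{"input_tokens":N,"output_tokens":M}}
// and returns the counts from the last matching event.
func extractUsage(raw string) (input, output int, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		var ev map[string]any
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // skip lines that are not JSON objects
		}
		if ev["type"] != "turn.completed" {
			continue
		}
		usage, isObj := ev["usage"].(map[string]any)
		if !isObj {
			continue
		}
		in, okIn := intFromMap(usage, "input_tokens")
		out, okOut := intFromMap(usage, "output_tokens")
		if okIn && okOut {
			input, output, ok = in, out, true
		}
	}
	return input, output, ok
}

func main() {
	raw := "starting agent\n" +
		`{"type":"turn.completed","usage":{"input_tokens":120,"output_tokens":45}}` + "\n"
	fmt.Println(extractUsage(raw))
}
```

Returning `ok=false` when no usable event is found is what would let the base provider fall back to the tokenizer estimate.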

Removed

  • internal/infrastructure/tokenizer/tiktoken_tokenizer.go: Deleted — tiktoken implementation removed; ports.Tokenizer is now the extension point
  • internal/infrastructure/tokenizer/tiktoken_tokenizer_test.go: Deleted — accompanying tests

Interpolation

  • pkg/interpolation/reference.go: Register TokensInput, TokensOutput, TokensEstimated in ValidStateProperties
  • pkg/interpolation/resolver.go: Map new token fields into StepStateData during resolution
  • pkg/interpolation/reference_json_field_test.go: Update fixture expectations for new fields
  • pkg/interpolation/reference_test.go: Update reference validation tests

Dependencies

  • go.mod: Remove tiktoken-go, glamour, and several transitive dependencies pulled in by tiktoken
  • go.sum: Remove corresponding checksums

Docs

  • docs/development/creating-agent-provider.md: New — comprehensive guide for implementing a new agent provider (hooks, token extraction, testing)
  • docs/reference/interpolation.md: Document TokensInput, TokensOutput, TokensEstimated variables; add per-provider source table
  • docs/user-guide/agent-steps.md: Update token tracking section with new fields, TokensEstimated semantics, and per-provider source table
  • docs/development/architecture.md: Update tokenizer package description
  • docs/development/project-structure.md: Update tokenizer package annotation

Project config

  • CHANGELOG.md: Add F094 entry under Unreleased
  • CLAUDE.md: Add nolint:errcheck replication rule; remove stale pitfall entry

Test plan

  • make build — binary compiles with tiktoken dependency removed
  • make test — all unit and integration tests pass, including new tokenizer tests in base_cli_provider_tokenizer_test.go
  • make lint — zero violations; nolint:errcheck directives present with matching comments across all providers
  • Run a workflow with a Claude agent step and verify {{.states.step.TokensInput}}, {{.states.step.TokensOutput}}, and {{.states.step.TokensEstimated}} interpolate correctly in a downstream step
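
The last check can be approximated outside a full workflow run. The sketch below assumes the interpolation syntax behaves like Go's `text/template` over nested maps, which is an assumption based on the `{{.states.step.…}}` form; the project's own resolver in `pkg/interpolation` may apply additional rules.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes tmpl against data using text/template. Missing keys are
// turned into errors so that a misspelled field name is not silently ignored.
func render(tmpl string, data map[string]any) (string, error) {
	t, err := template.New("step").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Hypothetical state for an upstream step named "step".
	data := map[string]any{
		"states": map[string]any{
			"step": map[string]any{
				"TokensInput":     120,
				"TokensOutput":    45,
				"TokensEstimated": false,
			},
		},
	}
	out, err := render(
		"in={{.states.step.TokensInput}} out={{.states.step.TokensOutput}} estimated={{.states.step.TokensEstimated}}",
		data,
	)
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Println(out)
}
```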

Closes #339


Generated with awf commit workflow

- `CHANGELOG.md`: Document F094 unified token counting changes
- `CLAUDE.md`: Add nolint:errcheck replication rule; remove stale pitfall
- `docs/development/architecture.md`: Update tokenizer/ description with ports.Tokenizer detail
- `docs/development/creating-agent-provider.md`: Add new provider creation guide (1004 lines)
- `docs/development/project-structure.md`: Update tokenizer/ directory description
- `docs/reference/interpolation.md`: Document TokensInput, TokensOutput, TokensEstimated fields
- `docs/user-guide/agent-steps.md`: Add token tracking table with new fields and provider matrix
- `go.mod`: Remove tiktoken-go and glamour dependencies
- `go.sum`: Remove checksums for removed dependencies
- `internal/application/execution_service.go`: Propagate TokensInput, TokensOutput, TokensEstimated into step state
- `internal/application/interpolation_helpers.go`: Map new token fields into interpolation context
- `internal/domain/workflow/context.go`: Add TokensInput, TokensOutput, TokensEstimated to StepState
- `internal/domain/workflow/reference.go`: Register new token properties in ValidStateProperties and alias map
- `internal/infrastructure/agents/base_cli_provider.go`: Inject ports.Tokenizer; add extractTokenUsage hook; use real tokens when available, fallback to tokenizer estimate
- `internal/infrastructure/agents/base_cli_provider_tokenizer_test.go`: Add 390-line tokenizer integration tests for execute and conversation paths
- `internal/infrastructure/agents/claude_provider.go`: Wire extractTokenUsage hook from claude result event usage field
- `internal/infrastructure/agents/codex_provider.go`: Wire extractTokenUsage hook from turn.completed event usage field
- `internal/infrastructure/agents/copilot_provider.go`: Wire extractTokenUsage hook from assistant.message outputTokens field
- `internal/infrastructure/agents/gemini_provider.go`: Wire extractTokenUsage hook from result event stats field
- `internal/infrastructure/agents/helpers.go`: Remove dead estimateTokens and estimateInputTokens helpers
- `internal/infrastructure/agents/helpers_test.go`: Remove tests for deleted estimation helpers
- `internal/infrastructure/agents/opencode_provider.go`: Wire extractTokenUsage hook from step_finish part.tokens field
- `internal/infrastructure/agents/options.go`: Add SetTokenizer option for baseCLIProvider injection
- `internal/infrastructure/agents/provider_options_test.go`: Add tokenizer injection tests
- `internal/infrastructure/tokenizer/tiktoken_tokenizer.go`: Delete TiktokenTokenizer (tiktoken dep removed)
- `internal/infrastructure/tokenizer/tiktoken_tokenizer_test.go`: Delete tiktoken tokenizer tests
- `pkg/interpolation/reference.go`: Register TokensInput, TokensOutput, TokensEstimated in ValidStateProperties
- `pkg/interpolation/reference_json_field_test.go`: Update tests for new token property names
- `pkg/interpolation/reference_test.go`: Update reference validation tests
- `pkg/interpolation/resolver.go`: Handle TokensEstimated bool type in template resolver

pocky marked this pull request as ready for review May 13, 2026 09:56
pocky merged commit 8d0e753 into main May 13, 2026
5 checks passed
pocky deleted the feature/F094-token-counting--unify-via-tokenizer-port branch May 13, 2026 09:56

Development

Successfully merging this pull request may close these issues.

F094: Token Counting — Unify via Tokenizer Port
