15 add token counter #16
Merged
Pull request overview
This PR adds comprehensive token counting functionality to track exact token counts from the Google CountTokens API. The implementation includes database migrations, service interfaces, progressive batch processing with graceful degradation, and extensive test coverage.
Key changes:
- Adds token counting fields (`token_count`, `token_counted_at`) to the `code_chunks` table via database migration
- Implements three token counting methods: `CountTokens`, `CountTokensBatch`, and `CountTokensWithCallback` in the embedding service interface
- Integrates progressive token counting into the job processor workflow with configurable modes ("all", "sample", "on_demand")
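
To illustrate how the three configurable modes might differ, here is a minimal sketch of a selection step that decides which chunks are counted up front. The `Chunk` type, the `selectChunksForCounting` function, and the exact semantics of each mode (in particular the evenly spaced sampling) are assumptions made for illustration; they are not taken from the PR's code.

```go
package main

import "fmt"

// Chunk is a hypothetical stand-in for the PR's CodeChunk type.
type Chunk struct {
	ID      string
	Content string
}

// selectChunksForCounting returns the chunks to token-count up front.
// Assumed semantics (not confirmed by the PR):
//   "all"       -> every chunk
//   "sample"    -> about samplePercent of the chunks, evenly spaced
//   "on_demand" -> none; counting is deferred until requested
func selectChunksForCounting(chunks []Chunk, mode string, samplePercent int) ([]Chunk, error) {
	switch mode {
	case "all":
		return chunks, nil
	case "on_demand":
		return nil, nil
	case "sample":
		if samplePercent <= 0 || samplePercent > 100 {
			return nil, fmt.Errorf("sample percentage must be in 1..100, got %d", samplePercent)
		}
		// Round up so that a non-empty input always yields at least one chunk.
		n := (len(chunks)*samplePercent + 99) / 100
		selected := make([]Chunk, 0, n)
		for i := 0; i < n; i++ {
			// Pick evenly spaced indices so the sample is deterministic.
			selected = append(selected, chunks[i*len(chunks)/n])
		}
		return selected, nil
	default:
		return nil, fmt.Errorf("unknown token counting mode %q", mode)
	}
}

func main() {
	chunks := make([]Chunk, 10)
	for i := range chunks {
		chunks[i] = Chunk{ID: fmt.Sprintf("c%d", i)}
	}
	for _, mode := range []string{"all", "sample", "on_demand"} {
		selected, err := selectChunksForCounting(chunks, mode, 30)
		fmt.Println(mode, len(selected), err)
	}
}
```

Deterministic sampling keeps repeated runs over the same repository comparable; a random sample would be an equally valid choice under different requirements.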
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `migrations/000010_add_token_count_to_chunks.up.sql` | Adds token count columns and indexes to code_chunks table |
| `migrations/000010_add_token_count_to_chunks.down.sql` | Rollback migration for token count features |
| `internal/port/outbound/embedding_service.go` | Defines token counting interface methods and TokenCountResult type |
| `internal/port/outbound/chunk_repository.go` | Adds token count fields to CodeChunk and UpdateTokenCounts method |
| `internal/config/config.go` | Adds TokenCountingConfig for mode, sample percentage, and limits |
| `internal/application/worker/job_processor.go` | Implements progressive token counting with batch saving and deduplication |
| `internal/adapter/outbound/gemini/client.go` | Implements CountTokens methods calling Gemini API |
| `internal/adapter/outbound/gemini/token_cache.go` | LRU cache for token counts with TTL support |
| `internal/adapter/outbound/repository/chunk_repository.go` | Multi-row INSERT optimization and token count persistence |
| `configs/config.yaml`, `configs/config.dev.yaml` | Token counting configuration with defaults |
| Test files | Comprehensive test coverage for all new functionality |
- Add token_count column to code_chunks table
- Implement CountTokens and CountTokensBatch in Gemini client
- Add TokenCache for efficient token count caching
- Define port layer interfaces for token counting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
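
As a rough illustration of what an LRU token-count cache with TTL support could look like, the sketch below pairs a doubly linked list (recency order) with a map (lookup). The field names, the constructor signature, and the choice to key by an arbitrary string (for example a content hash) are assumptions; the PR's actual `TokenCache` may be structured differently.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
	"time"
)

// cacheEntry is one cached token count with its expiry time.
type cacheEntry struct {
	key       string
	count     int
	expiresAt time.Time
}

// TokenCache is a hypothetical LRU cache with per-entry TTL.
type TokenCache struct {
	mu       sync.Mutex
	capacity int
	ttl      time.Duration
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func NewTokenCache(capacity int, ttl time.Duration) *TokenCache {
	return &TokenCache{
		capacity: capacity,
		ttl:      ttl,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached count, or false if the key is missing or expired.
func (c *TokenCache) Get(key string) (int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return 0, false
	}
	entry := el.Value.(*cacheEntry)
	if !time.Now().Before(entry.expiresAt) {
		// Expired entries are dropped lazily on lookup.
		c.order.Remove(el)
		delete(c.items, key)
		return 0, false
	}
	c.order.MoveToFront(el)
	return entry.count, true
}

// Set stores a count and evicts the least recently used entry when full.
func (c *TokenCache) Set(key string, count int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	expires := time.Now().Add(c.ttl)
	if el, ok := c.items[key]; ok {
		entry := el.Value.(*cacheEntry)
		entry.count, entry.expiresAt = count, expires
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&cacheEntry{key: key, count: count, expiresAt: expires})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheEntry).key)
	}
}

func main() {
	cache := NewTokenCache(2, time.Hour)
	cache.Set("hash-a", 120)
	cache.Set("hash-b", 45)
	cache.Get("hash-a")     // touch a, so b becomes least recently used
	cache.Set("hash-c", 10) // evicts b
	_, found := cache.Get("hash-b")
	fmt.Println("hash-b still cached:", found)
}
```

Expiry is checked lazily on `Get` rather than by a background sweeper, which keeps the sketch small at the cost of holding expired entries until they are touched or evicted.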
- Add token_count support to chunk repository operations
- Integrate token counting step in job processor workflow
- Add token counting metrics and configuration options
- Update test mocks with CountTokens methods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Save chunks with token counts using SaveChunks instead of UpdateTokenCounts
- Add callback pattern for progressive token counting
- Enable chunk persistence during token counting phase

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
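
The callback pattern mentioned in this commit can be sketched roughly as follows: chunks are counted in batches, and each finished batch is handed to a callback so it can be persisted before the next batch starts. Everything here is illustrative; the function and type names, the batch size parameter, and the decision to skip chunks whose count fails (one possible reading of "graceful degradation") are assumptions rather than the PR's actual signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// Chunk and TokenCountResult are hypothetical stand-ins for the PR's types.
type Chunk struct {
	ID      string
	Content string
}

type TokenCountResult struct {
	ChunkID    string
	TokenCount int
}

var errEmptyChunk = errors.New("empty chunk")

// countTokensWithCallback counts tokens batch by batch and invokes onBatch
// after each batch, so results can be saved progressively. A chunk whose
// count fails is skipped (assumed degradation behavior); a callback error
// aborts the whole run.
func countTokensWithCallback(
	chunks []Chunk,
	batchSize int,
	count func(text string) (int, error),
	onBatch func(results []TokenCountResult) error,
) error {
	if batchSize <= 0 {
		return fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	for start := 0; start < len(chunks); start += batchSize {
		end := start + batchSize
		if end > len(chunks) {
			end = len(chunks)
		}
		results := make([]TokenCountResult, 0, end-start)
		for _, ch := range chunks[start:end] {
			n, err := count(ch.Content)
			if err != nil {
				continue // skip this chunk, keep the rest of the batch
			}
			results = append(results, TokenCountResult{ChunkID: ch.ID, TokenCount: n})
		}
		if len(results) == 0 {
			continue
		}
		if err := onBatch(results); err != nil {
			return fmt.Errorf("saving batch starting at index %d: %w", start, err)
		}
	}
	return nil
}

func main() {
	chunks := []Chunk{{ID: "1", Content: "func a() {}"}, {ID: "2", Content: ""}, {ID: "3", Content: "x := 1"}}
	// Fake counter for the demo: byte length stands in for a real API call.
	fakeCount := func(text string) (int, error) {
		if text == "" {
			return 0, errEmptyChunk
		}
		return len(text), nil
	}
	err := countTokensWithCallback(chunks, 2, fakeCount, func(results []TokenCountResult) error {
		fmt.Println("saving batch:", results)
		return nil
	})
	fmt.Println("done, err =", err)
}
```

Saving inside the callback means a failure late in a large repository loses at most one batch of counts instead of all of them.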
- Implement buildMultiRowChunkInsert helper for batch SQL generation
- Convert SaveChunks to use multi-row INSERT
- Convert FindOrCreateChunks to use multi-row INSERT
- Significantly reduce database round-trips for large repositories

All tests continue to pass. No behavior changes, only code quality improvements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
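
A helper like `buildMultiRowChunkInsert` typically generates one `INSERT` with a `VALUES` tuple per row and a flat argument slice. The sketch below shows that general technique only. The column list (other than `token_count`), the `Chunk` fields, and the PostgreSQL-style `$n` placeholders are assumptions, and the real helper's signature is not shown in this PR page.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical, reduced stand-in for the PR's CodeChunk type.
type Chunk struct {
	ID         string
	FilePath   string
	Content    string
	TokenCount int
}

// buildMultiRowChunkInsert returns a single INSERT statement covering all
// chunks plus the flattened arguments, using $1, $2, ... placeholders.
// It returns an empty query for an empty input.
func buildMultiRowChunkInsert(chunks []Chunk) (string, []any) {
	if len(chunks) == 0 {
		return "", nil
	}
	const cols = 4 // assumed column count for this sketch
	var sb strings.Builder
	sb.WriteString("INSERT INTO code_chunks (id, file_path, content, token_count) VALUES ")
	args := make([]any, 0, len(chunks)*cols)
	for i, ch := range chunks {
		if i > 0 {
			sb.WriteString(", ")
		}
		base := i * cols // placeholder numbering continues across rows
		fmt.Fprintf(&sb, "($%d, $%d, $%d, $%d)", base+1, base+2, base+3, base+4)
		args = append(args, ch.ID, ch.FilePath, ch.Content, ch.TokenCount)
	}
	return sb.String(), args
}

func main() {
	query, args := buildMultiRowChunkInsert([]Chunk{
		{ID: "c1", FilePath: "main.go", Content: "package main", TokenCount: 3},
		{ID: "c2", FilePath: "util.go", Content: "package util", TokenCount: 3},
	})
	fmt.Println(query)
	fmt.Println(len(args), "arguments")
}
```

Values are always passed as bound arguments rather than interpolated into the SQL string, so the batching does not open an injection path.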
- Implement deduplicateChunksByKey to remove duplicate chunks
- Integrate deduplication into countTokensForChunks
- Add deduplication to submitBatchJobAsync before FindOrCreateChunks
- Prevent duplicate key errors during batch processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
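
Deduplication before a multi-row insert is usually a single pass with a set of seen keys. The following sketch shows that approach; which fields make up the unique key is not stated on this page, so the `chunkKey` composition (repository, file path, content hash) and the `Chunk` fields are assumptions.

```go
package main

import "fmt"

// Chunk is a hypothetical, reduced stand-in for the PR's CodeChunk type.
type Chunk struct {
	RepositoryID string
	FilePath     string
	ContentHash  string
}

// chunkKey is the assumed uniqueness key; the real constraint may differ.
type chunkKey struct {
	repositoryID string
	filePath     string
	contentHash  string
}

// deduplicateChunksByKey keeps the first occurrence of each key and
// preserves the original order of the remaining chunks.
func deduplicateChunksByKey(chunks []Chunk) []Chunk {
	seen := make(map[chunkKey]struct{}, len(chunks))
	unique := make([]Chunk, 0, len(chunks))
	for _, ch := range chunks {
		key := chunkKey{ch.RepositoryID, ch.FilePath, ch.ContentHash}
		if _, dup := seen[key]; dup {
			continue
		}
		seen[key] = struct{}{}
		unique = append(unique, ch)
	}
	return unique
}

func main() {
	chunks := []Chunk{
		{RepositoryID: "r1", FilePath: "a.go", ContentHash: "h1"},
		{RepositoryID: "r1", FilePath: "a.go", ContentHash: "h1"}, // duplicate
		{RepositoryID: "r1", FilePath: "b.go", ContentHash: "h1"},
	}
	fmt.Println(len(deduplicateChunksByKey(chunks)), "unique chunks")
}
```

Running this step before building a batched insert matters because a single duplicate row can make the whole multi-row statement fail on a unique constraint.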
b9181fd to e4d2adb
No description provided.