
15 add token counter #16

Merged: Anthony-Bible merged 5 commits into master from 15-add-token-counter on Nov 29, 2025

Conversation

@Anthony-Bible
Owner

No description provided.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds token counting that records exact per-chunk token counts obtained from the Google Gemini CountTokens API. The implementation includes database migrations, service interfaces, progressive batch processing with graceful degradation, and tests for the new functionality.

Key changes:

  • Adds token counting fields (token_count, token_counted_at) to the code_chunks table via database migration
  • Implements three token counting methods: CountTokens, CountTokensBatch, and CountTokensWithCallback in the embedding service interface
  • Integrates progressive token counting into the job processor workflow with configurable modes ("all", "sample", "on_demand")
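As a rough illustration of the changes listed above, the sketch below shows one way the three interface methods and the three modes could fit together. Only the method names, the `TokenCountResult` type name, and the mode strings come from this PR summary; the parameter lists, the `TotalTokens` field, the `selectForCounting` helper, and the meaning assigned to each mode are assumptions, since the actual code is not shown on this page.

```go
package main

import (
	"context"
	"fmt"
)

// TokenCountResult is named in the PR; its fields are an assumption.
type TokenCountResult struct {
	TotalTokens int
}

// TokenCounter uses the three method names from the PR summary.
// The signatures are hypothetical.
type TokenCounter interface {
	CountTokens(ctx context.Context, text string) (*TokenCountResult, error)
	CountTokensBatch(ctx context.Context, texts []string) ([]*TokenCountResult, error)
	CountTokensWithCallback(ctx context.Context, texts []string,
		onResult func(index int, result *TokenCountResult) error) error
}

// selectForCounting (hypothetical helper) returns the indexes of the chunks
// to count up front. Assumed semantics: "all" counts every chunk, "sample"
// counts about samplePercent of them (evenly spaced here), and "on_demand"
// counts nothing until a count is requested.
func selectForCounting(mode string, samplePercent, total int) ([]int, error) {
	switch mode {
	case "all":
		idx := make([]int, total)
		for i := range idx {
			idx[i] = i
		}
		return idx, nil
	case "sample":
		if samplePercent <= 0 || samplePercent > 100 {
			return nil, fmt.Errorf("sample percentage %d out of range 1-100", samplePercent)
		}
		want := (total*samplePercent + 99) / 100 // round up
		idx := make([]int, 0, want)
		for i := 0; i < want; i++ {
			idx = append(idx, i*total/want)
		}
		return idx, nil
	case "on_demand":
		return nil, nil
	default:
		return nil, fmt.Errorf("unknown token counting mode %q", mode)
	}
}

func main() {
	idx, err := selectForCounting("sample", 25, 8)
	fmt.Println(idx, err) // [0 4] <nil>
}
```

Returning indexes rather than copies keeps the selection step independent of how chunks are stored, which is a design choice of this sketch rather than something the PR states.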

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| migrations/000010_add_token_count_to_chunks.up.sql | Adds token count columns and indexes to code_chunks table |
| migrations/000010_add_token_count_to_chunks.down.sql | Rollback migration for token count features |
| internal/port/outbound/embedding_service.go | Defines token counting interface methods and TokenCountResult type |
| internal/port/outbound/chunk_repository.go | Adds token count fields to CodeChunk and UpdateTokenCounts method |
| internal/config/config.go | Adds TokenCountingConfig for mode, sample percentage, and limits |
| internal/application/worker/job_processor.go | Implements progressive token counting with batch saving and deduplication |
| internal/adapter/outbound/gemini/client.go | Implements CountTokens methods calling Gemini API |
| internal/adapter/outbound/gemini/token_cache.go | LRU cache for token counts with TTL support |
| internal/adapter/outbound/repository/chunk_repository.go | Multi-row INSERT optimization and token count persistence |
| configs/config.yaml, configs/config.dev.yaml | Token counting configuration with defaults |
| Test files | Comprehensive test coverage for all new functionality |
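The file summary mentions an LRU cache for token counts with TTL support. The sketch below is one minimal way to build such a cache with the standard library's `container/list`; it is not the code in token_cache.go. The type layout, the method names `Get` and `Put`, and the use of the chunk text as the key are all assumptions, and the sketch is not safe for concurrent use without an added mutex.

```go
package main

import (
	"container/list"
	"fmt"
	"time"
)

// entry is one cached token count together with its expiry time.
type entry struct {
	key     string
	count   int
	expires time.Time
}

// TokenCache is a hypothetical LRU cache with a per-entry TTL.
type TokenCache struct {
	capacity int
	ttl      time.Duration
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func NewTokenCache(capacity int, ttl time.Duration) *TokenCache {
	return &TokenCache{
		capacity: capacity,
		ttl:      ttl,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached count and marks the entry as recently used.
// Expired entries are removed and reported as a miss.
func (c *TokenCache) Get(key string) (int, bool) {
	el, ok := c.items[key]
	if !ok {
		return 0, false
	}
	e := el.Value.(*entry)
	if !time.Now().Before(e.expires) {
		c.order.Remove(el)
		delete(c.items, key)
		return 0, false
	}
	c.order.MoveToFront(el)
	return e.count, true
}

// Put stores a count, evicting the least recently used entry when full.
func (c *TokenCache) Put(key string, count int) {
	if c.capacity <= 0 {
		return
	}
	if el, ok := c.items[key]; ok {
		e := el.Value.(*entry)
		e.count, e.expires = count, time.Now().Add(c.ttl)
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, count: count, expires: time.Now().Add(c.ttl)})
}

func main() {
	cache := NewTokenCache(2, time.Minute)
	cache.Put("func a() {}", 5)
	cache.Put("func b() {}", 7)
	cache.Get("func a() {}")    // a becomes most recently used
	cache.Put("func c() {}", 9) // cache is full, so b is evicted
	_, ok := cache.Get("func b() {}")
	fmt.Println(ok) // false
}
```

For large chunks, keying on a hash of the content instead of the content itself would keep the map small; that is a possible refinement, not something this page confirms.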


Anthony-Bible and others added 5 commits November 28, 2025 21:21
- Add token_count column to code_chunks table
- Implement CountTokens and CountTokensBatch in Gemini client
- Add TokenCache for efficient token count caching
- Define port layer interfaces for token counting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add token_count support to chunk repository operations
- Integrate token counting step in job processor workflow
- Add token counting metrics and configuration options
- Update test mocks with CountTokens methods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Save chunks with token counts using SaveChunks instead of UpdateTokenCounts
- Add callback pattern for progressive token counting
- Enable chunk persistence during token counting phase

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
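The callback pattern described in the commit above might look roughly like the following sketch, in which chunks are counted batch by batch and each finished batch is handed to a save callback right away instead of waiting for the whole repository. The names `Chunk`, `countFunc`, `countProgressively`, and `wordCount` are hypothetical, and skipping a failed batch is only one assumed reading of the "graceful degradation" mentioned in the review overview.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Chunk is a simplified stand-in for the repository's chunk type.
type Chunk struct {
	Content    string
	TokenCount int // 0 means "not counted yet"
}

// countFunc stands in for a batch call to a remote token counting API.
type countFunc func(texts []string) ([]int, error)

// countProgressively counts tokens one batch at a time and passes every
// successfully counted batch to onBatch (for example, to persist it).
// A batch whose count call fails is skipped and stays uncounted, so one
// bad batch does not abort the rest. It returns how many chunks were counted.
func countProgressively(chunks []Chunk, batchSize int, count countFunc,
	onBatch func(batch []Chunk) error) (int, error) {
	if batchSize <= 0 {
		return 0, errors.New("batch size must be positive")
	}
	counted := 0
	for start := 0; start < len(chunks); start += batchSize {
		end := start + batchSize
		if end > len(chunks) {
			end = len(chunks)
		}
		batch := chunks[start:end]
		texts := make([]string, len(batch))
		for i, c := range batch {
			texts[i] = c.Content
		}
		counts, err := count(texts)
		if err != nil || len(counts) != len(batch) {
			continue // degrade gracefully: leave this batch uncounted
		}
		for i := range batch {
			batch[i].TokenCount = counts[i]
		}
		if err := onBatch(batch); err != nil {
			return counted, err
		}
		counted += len(batch)
	}
	return counted, nil
}

// wordCount is a fake counter that treats each whitespace-separated
// word as one token; a real tokenizer would give different numbers.
func wordCount(texts []string) ([]int, error) {
	counts := make([]int, len(texts))
	for i, t := range texts {
		counts[i] = len(strings.Fields(t))
	}
	return counts, nil
}

func main() {
	chunks := []Chunk{{Content: "func a() {}"}, {Content: "return x + y"}, {Content: "package main"}}
	n, err := countProgressively(chunks, 2, wordCount, func(batch []Chunk) error {
		fmt.Printf("saving %d chunks\n", len(batch))
		return nil
	})
	fmt.Println(n, err)
}
```

Saving inside the callback means that progress made before a crash or timeout is already persisted, which is the likely motivation for saving during the counting phase.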
- Implement buildMultiRowChunkInsert helper for batch SQL generation
- Convert SaveChunks to use multi-row INSERT
- Convert FindOrCreateChunks to use multi-row INSERT
- Significantly reduce database round-trips for large repositories

All tests continue to pass. No behavior changes, only code quality improvements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
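To make the multi-row INSERT idea from the commit above concrete, here is a minimal sketch of a helper that builds one statement for many chunks. It is not the PR's `buildMultiRowChunkInsert`: the function name `buildMultiRowInsert`, the `id` and `content` columns, and the PostgreSQL-style `$n` placeholders are assumptions; only the `code_chunks` table and the `token_count` column are named on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a simplified stand-in with a hypothetical column set.
type Chunk struct {
	ID         string
	Content    string
	TokenCount int
}

// buildMultiRowInsert returns a single INSERT statement with one
// placeholder group per chunk, plus the flattened argument list.
// One statement replaces len(chunks) separate round-trips.
func buildMultiRowInsert(chunks []Chunk) (string, []any) {
	if len(chunks) == 0 {
		return "", nil
	}
	const cols = 3
	var sb strings.Builder
	sb.WriteString("INSERT INTO code_chunks (id, content, token_count) VALUES ")
	args := make([]any, 0, len(chunks)*cols)
	for i, c := range chunks {
		if i > 0 {
			sb.WriteString(", ")
		}
		base := i * cols // placeholders are numbered from 1 across all rows
		fmt.Fprintf(&sb, "($%d, $%d, $%d)", base+1, base+2, base+3)
		args = append(args, c.ID, c.Content, c.TokenCount)
	}
	return sb.String(), args
}

func main() {
	query, args := buildMultiRowInsert([]Chunk{
		{ID: "1", Content: "func a() {}", TokenCount: 5},
		{ID: "2", Content: "func b() {}", TokenCount: 7},
	})
	fmt.Println(query)
	fmt.Println(len(args))
}
```

Databases cap the number of bind parameters per statement, so a real helper would probably split very large inputs into several statements.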
- Implement deduplicateChunksByKey to remove duplicate chunks
- Integrate deduplication into countTokensForChunks
- Add deduplication to submitBatchJobAsync before FindOrCreateChunks
- Prevent duplicate key errors during batch processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
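The deduplication step from the commit above can be sketched as follows. This is not the PR's `deduplicateChunksByKey`: the function name `dedupeChunks` and the key of file path plus content hash are assumptions, because the page does not say which fields make up the key. The sketch keeps the first occurrence of each key and preserves input order.

```go
package main

import "fmt"

// Chunk is a simplified stand-in; the key fields are hypothetical.
type Chunk struct {
	FilePath    string
	ContentHash string
	TokenCount  int
}

// chunkKey is the assumed uniqueness key for a chunk.
type chunkKey struct {
	filePath    string
	contentHash string
}

// dedupeChunks returns the chunks with duplicate keys removed,
// keeping the first occurrence and the original order.
func dedupeChunks(chunks []Chunk) []Chunk {
	seen := make(map[chunkKey]struct{}, len(chunks))
	out := make([]Chunk, 0, len(chunks))
	for _, c := range chunks {
		k := chunkKey{filePath: c.FilePath, contentHash: c.ContentHash}
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	chunks := []Chunk{
		{FilePath: "a.go", ContentHash: "h1"},
		{FilePath: "a.go", ContentHash: "h1"}, // duplicate key, dropped
		{FilePath: "b.go", ContentHash: "h1"},
	}
	fmt.Println(len(dedupeChunks(chunks))) // 2
}
```

Deduplicating before a batch write matters because a single multi-row statement that carries the same unique key twice can fail as a whole; in PostgreSQL, for example, an `ON CONFLICT DO UPDATE` statement cannot affect the same row twice.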
@Anthony-Bible Anthony-Bible merged commit 96d7c09 into master Nov 29, 2025
@Anthony-Bible Anthony-Bible deleted the 15-add-token-counter branch November 29, 2025 04:26