Skip to content

Comments

refactor: decompose stats-db into focused modules#252

Merged
reachraza merged 1 commit intomainfrom
code-refactor
Jan 30, 2026
Merged

refactor: decompose stats-db into focused modules#252
reachraza merged 1 commit intomainfrom
code-refactor

Conversation

@reachraza
Copy link
Collaborator

Decompose stats-db into focused modules

Summary

  • Breaks the 1,870-line monolithic src/main/stats-db.ts into 13 focused modules under src/main/stats/, following the same barrel-export pattern established in the agents module refactor
  • Splits the 6,400-line monolithic test file into 8 focused test files under src/__tests__/main/stats/ matching the source module structure
  • All 16,303 tests pass, TypeScript lint clean (0 errors), ESLint clean (0 new warnings), build successful

Module Breakdown

Module Responsibility Lines
types.ts Local interfaces (IntegrityCheckResult, BackupResult, Migration, etc.) 65
utils.ts LOG_CONTEXT, generateId(), getTimeRangeStart(), normalizePath(), StatementCache 97
schema.ts All CREATE_*_SQL constants, runStatements() helper 141
row-mappers.ts Typed raw row interfaces and snake_case→camelCase mapper functions 142
migrations.ts Migration registry, execution, history queries, migrateV1migrateV3 234
query-events.ts insertQueryEvent(), getQueryEvents() 87
auto-run.ts Auto Run session/task CRUD (insert, update, get) 172
session-lifecycle.ts recordSessionCreated(), recordSessionClosed(), getSessionLifecycleEvents() 105
aggregations.ts getAggregatedStats() decomposed into 10 sub-query functions 359
data-management.ts clearOldData(), exportToCsv(), csvEscape() 170
stats-db.ts Slimmed StatsDB class — lifecycle, vacuum, integrity, thin CRUD wrappers 523
singleton.ts getStatsDB(), initializeStatsDB(), closeStatsDB(), performance metrics API 87
index.ts Barrel re-exports preserving the full public API 44

Improvements Applied

  1. StatementCache — Reuses prepared statements across repeated CRUD calls instead of re-preparing each time
  2. get database() accessor — Single guard-and-throw replaces 18 repeated if (!this.db) throw checks
  3. Migration.up(db) signature — Migration functions receive db as a parameter, enabling standalone testing without class context
  4. runStatements() helper — Eliminates repeated split(';').filter(s => s.trim()) loops for multi-statement SQL
  5. CSV export completeness — Adds missing isRemote column and RFC 4180 compliant escaping via csvEscape()
  6. Transactional clearOldData() — All four DELETE operations wrapped in db.transaction() for atomic cleanup
  7. Statement cache cleanupclose() now calls clear*Cache() for each CRUD module to prevent stale references
  8. Row mapper functions — Centralized snake_case→camelCase conversion eliminates inline mapping duplication
  9. _meta table for vacuum timestamp — Replaces external stats-vacuum-timestamp file with self-contained DB storage
  10. CREATE_META_TABLE_SQL — Schema constant for the _meta key-value table used by vacuum scheduling

Test File Split

Test File Covers Tests
types.test.ts Shared type definitions 11
stats-db.test.ts Core class, init, singleton, migrations 36
query-events.test.ts Query event CRUD, filtering, CSV export 23
auto-run.test.ts Auto Run sessions/tasks, foreign keys 38
aggregations.test.ts Time ranges, aggregation calculations 62
paths.test.ts Cross-platform paths, normalization 65
integration.test.ts Concurrent access, native module verification 34
data-management.test.ts VACUUM, clearOldData 28

Verification

  • TypeScript: 0 errors across all 3 tsconfig files
  • ESLint: 0 errors (37 pre-existing warnings unchanged)
  • Tests: 16,303 passed across 392 files (297 stats tests across 8 split files)
  • Build: Successful
  • 1:1 Audit: Every function, constant, type, and SQL statement from the old monolith confirmed present in the new modules with no logic deviations

Test plan

  • npm run lint — TypeScript type checking passes
  • npm run lint:eslint — No new ESLint errors
  • npm run test — All 16,303 tests pass
  • npm run build — Production build succeeds

Break the 1,870-line monolithic stats-db.ts into 13 focused modules under
src/main/stats/, following the same pattern used in the agents module refactor.

Key improvements:
- StatementCache for prepared statement reuse across CRUD operations
- DB guard accessor (get database()) replaces 18 repeated null checks
- Transaction-wrapped clearOldData for atomic multi-table cleanup
- _meta table replaces external file for vacuum timestamp storage
- Row mapper functions centralize snake_case-to-camelCase conversion
- runStatements() helper eliminates repeated split-by-semicolon patterns
- Migration functions accept db param instead of relying on class context
- CSV export adds isRemote column and RFC 4180 compliant escaping
- Statement caches properly cleared on database close

Split the 6,400-line monolithic test file into 8 focused test files under
src/__tests__/main/stats/ matching the source module structure.
@claude
Copy link

claude bot commented Jan 30, 2026

PR Review: Refactor stats-db into focused modules

This is an excellent refactoring effort that significantly improves the codebase maintainability and follows established patterns. The decomposition is well-executed with comprehensive test coverage.

Strengths

  1. Well-structured modularization: The 13-module breakdown follows clear separation of concerns with logical groupings (schema, migrations, CRUD ops, aggregations, etc.)

  2. Excellent documentation: Each module has clear JSDoc comments explaining purpose and usage. The PR description provides a detailed breakdown with line counts and responsibilities.

  3. Barrel export pattern: Consistent with the agent module refactor, maintaining a clean public API via index.ts

  4. Performance improvements:

    • StatementCache for reusing prepared statements
    • get database() accessor eliminates 18 repeated null checks
    • Proper use of PerformanceMetrics class with configurable logging
  5. Data integrity enhancements:

    • Transactional clearOldData() ensures atomic cleanup
    • Migration functions receive db parameter for testability
    • _meta table replaces external file storage for vacuum timestamps
  6. CSV export completeness: Adds missing isRemote column with RFC 4180 compliant escaping

  7. Comprehensive test coverage: 297 tests across 8 focused test files matching the module structure

  8. Type safety: Centralized row mappers eliminate inline mapping duplication and improve type safety with explicit *Row interfaces

🔍 Areas for Consideration

1. Statement Cache Lifecycle (Minor)

In src/main/stats/query-events.ts:14, the stmtCache is a module-level singleton:

const stmtCache = new StatementCache();

While clearQueryEventCache() is called in StatsDB.close(), there's a potential issue if multiple StatsDB instances are created (though the singleton pattern prevents this in practice). Consider documenting that these caches are tied to the singleton instance lifecycle.

Suggestion: Add a JSDoc comment to clearQueryEventCache() noting it's called by StatsDB.close() and should not be called directly.

2. Migration Function Signature (Minor)

The migration functions in src/main/stats/migrations.ts use the new up: (db) => migrateV1(db) pattern. This is good for testability, but the actual migration functions are still private (function migrateV1()). Consider exporting them for unit testing in isolation:

export function migrateV1(db: Database.Database): void {
  // ...
}

This would allow testing individual migrations without running the full migration system.

3. Error Handling in Row Mappers (Low Priority)

The row mappers in src/main/stats/row-mappers.ts assume well-formed data from SQLite. While SQLite schema constraints protect against most issues, consider adding runtime validation for critical fields to handle potential corruption gracefully:

export function mapQueryEventRow(row: QueryEventRow): QueryEvent {
  if (!row.id || !row.session_id || !row.agent_type) {
    throw new Error('Invalid query event row: missing required fields');
  }
  // ...
}

4. SQL Injection Protection (Security - Low Risk)

In src/main/stats/query-events.ts:47-72, dynamic SQL construction uses parameterized queries correctly. However, the ORDER BY clause is hardcoded. If you ever need dynamic sorting, ensure column names are whitelisted:

// Future-proofing example (not needed now)
const allowedSortColumns = ['start_time', 'duration', 'agent_type'];
if (sortColumn && !allowedSortColumns.includes(sortColumn)) {
  throw new Error('Invalid sort column');
}

Current code is safe - this is just a preventive note for future enhancements.

5. Performance Metrics Threshold (Minor)

In src/main/stats/aggregations.ts:332-337, the warning threshold uses PERFORMANCE_THRESHOLDS.DASHBOARD_LOAD. Verify this threshold is appropriate for aggregations across large datasets. Consider adding metrics collection in production to tune this value.

6. CSV Export Memory Usage (Performance)

src/main/stats/data-management.ts:142 loads all events into memory before CSV conversion:

const events = getQueryEvents(db, range);

For very large datasets (years of data), this could cause memory issues. Consider implementing streaming CSV export if this becomes a concern:

// Future enhancement
export function* exportToCsvStream(db: Database.Database, range: StatsTimeRange) {
  yield headers.join(',') + '\n';
  // Stream rows in batches
}

Not critical for current usage, but worth noting for future scalability.

🧪 Testing

  • 16,303 tests passing - Excellent coverage
  • Test file split mirrors source structure - Good organization
  • Proper use of mocks for native modules (better-sqlite3)
  • Integration tests verify concurrent access

Recommendation: Consider adding a smoke test that verifies the old monolithic API surface is fully preserved by the new modular structure (e.g., importing both and comparing exported symbols).

📝 Documentation & Conventions

  • Follows CLAUDE.md conventions for module structure
  • Clear migration guide in migrations.ts:7-12
  • Proper use of barrel exports matching agent module pattern

🎯 Overall Assessment

This is a high-quality refactoring that:

  • ✅ Maintains backward compatibility
  • ✅ Improves code organization and testability
  • ✅ Adds performance optimizations
  • ✅ Enhances data integrity
  • ✅ Includes comprehensive tests
  • ✅ Follows established patterns

The concerns raised are minor and mostly preventive. The PR is ready to merge as-is, with the suggestions being optional enhancements for future consideration.

Verdict: ✅ Approved - Excellent work!


📊 Metrics

  • Lines removed: 8,278 (monolithic)
  • Lines added: 9,393 (modular + tests)
  • Net change: +1,115 (better structure with more tests)
  • Modules created: 13
  • Test files: 8
  • Test coverage: 297 tests for stats module

Great job maintaining comprehensive test coverage while improving code organization! 🎉

@reachraza reachraza merged commit 7db8e11 into main Jan 30, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant