-
Notifications
You must be signed in to change notification settings - Fork 399
Description
💾 Performance - Database Indexing Optimization
Goal
Implement comprehensive database indexing strategy to optimize query performance:
- Add indexes for foreign keys that are frequently used in JOINs
- Add composite indexes for common multi-column filter patterns
- Add indexes for frequently filtered columns (is_active, status, enabled)
- Add indexes for ORDER BY columns (created_at, updated_at, timestamps)
- Add indexes for unique lookups (email, slug, token)
- Optimize many-to-many association tables
- Add covering indexes where beneficial
This dramatically improves query performance by 10-100x for filtered and joined queries, reduces database CPU usage by 30-60%, and enables efficient pagination.
Why Now?
Database indexing is critical for scalability and performance:
- Query Performance: Reduce query time from seconds to milliseconds for filtered queries
- Scalability: Support 10x more concurrent users without database bottleneck
- Resource Efficiency: Reduce database CPU usage by 30-60%
- Better UX: Fast page loads even with thousands of records
- Federation Performance: Faster lookups for federated gateway queries
- Cost Savings: Reduce database costs by improving query efficiency
📖 User Stories
US-1: API Client - Fast Filtered Queries
As an API Client
I want filtered list queries to execute quickly
So that my application responds fast even with large datasets
Acceptance Criteria:
Given I request GET /tools with team_id filter
When the database has 10,000+ tools
Then the query should use an index on team_id
And the response time should be <50ms (vs 500-2000ms without index)
And the database should perform an index scan, not a full table scan
Given I request GET /tools?enabled=true&team_id=123
When filtering by multiple columns
Then the query should use a composite index (team_id, enabled)
And the query execution plan should show "Index Scan" not "Seq Scan"
And the response time should be <30ms
Given I request GET /teams with is_active=true filter
When the database has thousands of teams
Then the query should use an index on is_active
And pagination should be efficient with ORDER BY created_at
And the database CPU usage should be minimalTechnical Requirements:
- Indexes on all foreign key columns
- Composite indexes for common filter combinations
- Indexes on boolean filter columns (is_active, enabled, status)
- Indexes supporting ORDER BY clauses
US-2: Admin User - Fast Team Member Lookups
As an Admin User
I want team membership queries to execute instantly
So that I can quickly view team members and permissions
Acceptance Criteria:
Given I view a team with 100+ members
When the page loads member list
Then the query should use composite index (team_id, is_active)
And the query should execute in <20ms
And sorting by role should be efficient
Given I check if a user is a team member
When querying by team_id and user_email
Then the query should use composite index (team_id, user_email)
And the lookup should execute in <5ms
And the database should perform only an index-only scan
Given I list all teams for a user
When the user is a member of 50+ teams
Then the query should use index on user_email
And JOIN with teams should use index on team_id
And the query should execute in <50msTechnical Requirements:
- Composite index on (team_id, user_email, is_active) for email_team_members
- Index on user_email for reverse lookups
- Index on (team_id, role, is_active) for permission checks
- Efficient JOIN performance with indexed foreign keys
US-3: System - Efficient Cascade Deletes and Foreign Key Lookups
As a System
I want foreign key constraints and cascade operations to be efficient
So that deletes and updates don't cause performance bottlenecks
Acceptance Criteria:
Given I delete a team with many associated resources
When cascade delete operations occur
Then all foreign key lookups should use indexes
And the operation should complete in <100ms for 100 related records
And no full table scans should occur during cascade
Given I delete a gateway with federated tools
When the database checks foreign key references
Then the query should use index on tools.gateway_id
And the cascade delete should be efficient
And the operation should not block other queries
Given I update a user's email (primary key)
When foreign key references need to be updated
Then all referencing tables should have indexes on FK columns
And the update should propagate efficiently
And the operation should complete in <200msTechnical Requirements:
- Indexes on all foreign key columns for efficient lookups
- Proper cascade delete configuration
- Efficient JOIN performance for related records
- No full table scans during FK constraint checks
🏗 Architecture
Current State (Without Indexes)
-- Example: Query team members without composite index
EXPLAIN SELECT * FROM email_team_members
WHERE team_id = '123' AND is_active = true;
-- Result: Seq Scan on email_team_members (cost=0.00..1234.56)
-- Filter: (team_id = '123' AND is_active = true)
-- Planning time: 0.5ms
-- Execution time: 450ms ❌ SLOW - Full table scan!After Indexing
-- With composite index: CREATE INDEX idx_team_members_team_active ON email_team_members(team_id, is_active);
EXPLAIN SELECT * FROM email_team_members
WHERE team_id = '123' AND is_active = true;
-- Result: Index Scan using idx_team_members_team_active (cost=0.42..8.44)
-- Index Cond: (team_id = '123' AND is_active = true)
-- Planning time: 0.2ms
-- Execution time: 3ms ✅ FAST - Index scan!📋 Summary of Indexes to Add
150+ indexes across 35 tables organized into these categories:
Phase 1: Core Foreign Key Indexes (40+ indexes)
Tools, Resources, Prompts, Servers, Gateways, A2A Agents tables
Phase 2: RBAC and User Management (25+ indexes)
Roles, UserRoles, PermissionAuditLog, EmailUsers tables
Phase 3: Team Management (50+ indexes)
EmailTeams, EmailTeamMembers, EmailTeamInvitations, EmailTeamJoinRequests, History tables
Phase 4: Metrics Tables (15+ indexes)
Tool, Resource, Server, Prompt, A2A metrics tables
Phase 5: OAuth and Token Management (15+ indexes)
OAuth tokens, states, API tokens, usage logs, revocations
Phase 6: Session and SSO (12+ indexes)
Session records, SSO providers, auth sessions, auth events
Phase 7: Association Tables (8+ indexes)
Server-Tool, Server-Resource, Server-Prompt, Server-A2A associations
Phase 8: Miscellaneous (5+ indexes)
Resource subscriptions, GRPC services
Full implementation tasks are detailed in the complete Epic description.
✅ Success Criteria
- All foreign key columns have indexes (50+ indexes)
- Common filter patterns have composite indexes (30+ indexes)
- Boolean filter columns have indexes (20+ indexes)
- Timestamp columns for ORDER BY have indexes
- Query performance improved by 10-100x
- No sequential scans on large tables
- Database CPU usage reduced by 30-60%
- Migration tested on SQLite and PostgreSQL
- Documentation complete
🏁 Definition of Done
- Alembic migration created with 150+ index additions
- Migration tested on development and test databases
- EXPLAIN ANALYZE shows index usage
- Performance benchmarks show 10-100x improvement
- No regressions in existing tests
- Documentation updated
- Ready for production deployment
📝 Key Implementation Details
Index Types:
- Single column indexes for foreign keys
- Composite indexes for multi-column filters
- Partial indexes for filtered subsets
- Covering indexes to avoid table lookups
PostgreSQL vs SQLite:
- PostgreSQL: Use `CREATE INDEX CONCURRENTLY` (no locks)
- SQLite: Requires table locking during index creation
- Both support B-tree composite indexes
Migration Safety:
- Use concurrent index creation for PostgreSQL
- Test on development database first
- Include downgrade() for rollback
- Document estimated migration time
🔗 Related Issues
- Epic: REST API and UI Pagination for Large-Scale Multi-Tenant Deployments #1224 API Pagination (indexes enable efficient pagination)
- [Epic] 💾 Performance - Redis Endpoint Response Caching #1296 Redis Caching (reduces cache miss penalty)
- [Epic] ⚙️ Performance - Production Server Tuning #1297 Server Tuning (complements connection pooling)
- Feature Request: Implement Session Persistence & Pooling for Improved Performance and State Continuity #975 Session Pooling (reduces per-query overhead)
📚 References
Benchmarks:
docker-compose.ymlspins up postgresql (or mariadb) and nginx, or the associatedmake minikube*andcharts/ordocker-compose-ymltests/load/- can load up millions of records in the database, of all typesmcp-servers/go/benchmark-server- MCP server with arbitrary configurable number of tools, resources or prompts