Skip to content

[Epic] 💾 Performance - Database Indexing Optimization #1353

@crivetimihai

Description

@crivetimihai

💾 Performance - Database Indexing Optimization

Goal

Implement comprehensive database indexing strategy to optimize query performance:

  1. Add indexes for foreign keys that are frequently used in JOINs
  2. Add composite indexes for common multi-column filter patterns
  3. Add indexes for frequently filtered columns (is_active, status, enabled)
  4. Add indexes for ORDER BY columns (created_at, updated_at, timestamps)
  5. Add indexes for unique lookups (email, slug, token)
  6. Optimize many-to-many association tables
  7. Add covering indexes where beneficial

This dramatically improves query performance by 10-100x for filtered and joined queries, reduces database CPU usage by 30-60%, and enables efficient pagination.

Why Now?

Database indexing is critical for scalability and performance:

  1. Query Performance: Reduce query time from seconds to milliseconds for filtered queries
  2. Scalability: Support 10x more concurrent users without database bottleneck
  3. Resource Efficiency: Reduce database CPU usage by 30-60%
  4. Better UX: Fast page loads even with thousands of records
  5. Federation Performance: Faster lookups for federated gateway queries
  6. Cost Savings: Reduce database costs by improving query efficiency

📖 User Stories

US-1: API Client - Fast Filtered Queries

As an API Client
I want filtered list queries to execute quickly
So that my application responds fast even with large datasets

Acceptance Criteria:

Given I request GET /tools with team_id filter
When the database has 10,000+ tools
Then the query should use an index on team_id
And the response time should be <50ms (vs 500-2000ms without index)
And the database should perform an index scan, not a full table scan

Given I request GET /tools?enabled=true&team_id=123
When filtering by multiple columns
Then the query should use a composite index (team_id, enabled)
And the query execution plan should show "Index Scan" not "Seq Scan"
And the response time should be <30ms

Given I request GET /teams with is_active=true filter
When the database has thousands of teams
Then the query should use an index on is_active
And pagination should be efficient with ORDER BY created_at
And the database CPU usage should be minimal

Technical Requirements:

  • Indexes on all foreign key columns
  • Composite indexes for common filter combinations
  • Indexes on boolean filter columns (is_active, enabled, status)
  • Indexes supporting ORDER BY clauses
US-2: Admin User - Fast Team Member Lookups

As an Admin User
I want team membership queries to execute instantly
So that I can quickly view team members and permissions

Acceptance Criteria:

Given I view a team with 100+ members
When the page loads member list
Then the query should use composite index (team_id, is_active)
And the query should execute in <20ms
And sorting by role should be efficient

Given I check if a user is a team member
When querying by team_id and user_email
Then the query should use composite index (team_id, user_email)
And the lookup should execute in <5ms
And the database should perform only an index-only scan

Given I list all teams for a user
When the user is a member of 50+ teams
Then the query should use index on user_email
And JOIN with teams should use index on team_id
And the query should execute in <50ms

Technical Requirements:

  • Composite index on (team_id, user_email, is_active) for email_team_members
  • Index on user_email for reverse lookups
  • Index on (team_id, role, is_active) for permission checks
  • Efficient JOIN performance with indexed foreign keys
US-3: System - Efficient Cascade Deletes and Foreign Key Lookups

As a System
I want foreign key constraints and cascade operations to be efficient
So that deletes and updates don't cause performance bottlenecks

Acceptance Criteria:

Given I delete a team with many associated resources
When cascade delete operations occur
Then all foreign key lookups should use indexes
And the operation should complete in <100ms for 100 related records
And no full table scans should occur during cascade

Given I delete a gateway with federated tools
When the database checks foreign key references
Then the query should use index on tools.gateway_id
And the cascade delete should be efficient
And the operation should not block other queries

Given I update a user's email (primary key)
When foreign key references need to be updated
Then all referencing tables should have indexes on FK columns
And the update should propagate efficiently
And the operation should complete in <200ms

Technical Requirements:

  • Indexes on all foreign key columns for efficient lookups
  • Proper cascade delete configuration
  • Efficient JOIN performance for related records
  • No full table scans during FK constraint checks

🏗 Architecture

Current State (Without Indexes)

-- Example: Query team members without composite index
EXPLAIN SELECT * FROM email_team_members 
WHERE team_id = '123' AND is_active = true;

-- Result: Seq Scan on email_team_members (cost=0.00..1234.56)
--         Filter: (team_id = '123' AND is_active = true)
--         Planning time: 0.5ms
--         Execution time: 450ms ❌ SLOW - Full table scan!

After Indexing

-- With composite index: CREATE INDEX idx_team_members_team_active ON email_team_members(team_id, is_active);
EXPLAIN SELECT * FROM email_team_members 
WHERE team_id = '123' AND is_active = true;

-- Result: Index Scan using idx_team_members_team_active (cost=0.42..8.44)
--         Index Cond: (team_id = '123' AND is_active = true)
--         Planning time: 0.2ms
--         Execution time: 3ms ✅ FAST - Index scan!

📋 Summary of Indexes to Add

150+ indexes across 35 tables organized into these categories:

Phase 1: Core Foreign Key Indexes (40+ indexes)

Tools, Resources, Prompts, Servers, Gateways, A2A Agents tables

Phase 2: RBAC and User Management (25+ indexes)

Roles, UserRoles, PermissionAuditLog, EmailUsers tables

Phase 3: Team Management (50+ indexes)

EmailTeams, EmailTeamMembers, EmailTeamInvitations, EmailTeamJoinRequests, History tables

Phase 4: Metrics Tables (15+ indexes)

Tool, Resource, Server, Prompt, A2A metrics tables

Phase 5: OAuth and Token Management (15+ indexes)

OAuth tokens, states, API tokens, usage logs, revocations

Phase 6: Session and SSO (12+ indexes)

Session records, SSO providers, auth sessions, auth events

Phase 7: Association Tables (8+ indexes)

Server-Tool, Server-Resource, Server-Prompt, Server-A2A associations

Phase 8: Miscellaneous (5+ indexes)

Resource subscriptions, GRPC services

Full implementation tasks are detailed in the complete Epic description.


✅ Success Criteria

  • All foreign key columns have indexes (50+ indexes)
  • Common filter patterns have composite indexes (30+ indexes)
  • Boolean filter columns have indexes (20+ indexes)
  • Timestamp columns for ORDER BY have indexes
  • Query performance improved by 10-100x
  • No sequential scans on large tables
  • Database CPU usage reduced by 30-60%
  • Migration tested on SQLite and PostgreSQL
  • Documentation complete

🏁 Definition of Done

  • Alembic migration created with 150+ index additions
  • Migration tested on development and test databases
  • EXPLAIN ANALYZE shows index usage
  • Performance benchmarks show 10-100x improvement
  • No regressions in existing tests
  • Documentation updated
  • Ready for production deployment

📝 Key Implementation Details

Index Types:

  • Single column indexes for foreign keys
  • Composite indexes for multi-column filters
  • Partial indexes for filtered subsets
  • Covering indexes to avoid table lookups

PostgreSQL vs SQLite:

  • PostgreSQL: Use `CREATE INDEX CONCURRENTLY` (no locks)
  • SQLite: Requires table locking during index creation
  • Both support B-tree composite indexes

Migration Safety:

  • Use concurrent index creation for PostgreSQL
  • Test on development database first
  • Include downgrade() for rollback
  • Document estimated migration time

🔗 Related Issues


📚 References

Benchmarks:

  • docker-compose.yml spins up postgresql (or mariadb) and nginx, or the associated make minikube* and charts/ or docker-compose-yml
  • tests/load/ - can load up millions of records in the database, of all types
  • mcp-servers/go/benchmark-server - MCP server with arbitrary configurable number of tools, resources or prompts

Metadata

Metadata

Assignees

Labels

enhancementNew feature or requestperformancePerformance related itemspythonPython / backend development (FastAPI)

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions