fix(schema): Add critical performance indexes to resolve create_namespace latency >30s#3939

Open
machov wants to merge 1 commit into apache:main from
machov:fix/polaris-3685-critical-performance-indexes
Conversation

@machov machov commented Mar 5, 2026

Problem

Issue #3685 reports a critical performance degradation in which create_namespace API operations consistently time out (>30s), causing cascading failures with 504 Gateway Timeouts.

Root Cause Analysis

Database analysis revealed:

  • 65,000+ permission check queries performing full table scans on grant_records
  • Bulk entity lookups with ~7,800 tuples causing inefficient query plans
  • 13.7 billion rows read during a single create_namespace operation
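
On PostgreSQL, one way to observe this kind of scan pattern is the cumulative statistics view `pg_stat_user_tables`. The query below is a hedged sketch for operators who want to check their own deployment; this thread does not say which tooling produced the numbers above, so it should not be read as the reporter's method.

```sql
-- PostgreSQL only: per-table scan counters since the last statistics reset.
-- A large seq_tup_read relative to idx_tup_fetch suggests full table scans.
SELECT relname,
       seq_scan,       -- number of sequential scans started on the table
       seq_tup_read,   -- live rows read by those sequential scans
       idx_scan,       -- number of index scans started on the table
       idx_tup_fetch   -- live rows fetched by index scans
FROM pg_stat_user_tables
WHERE relname IN ('grant_records', 'entities');
```

Comparing the counters before and after a single create_namespace call gives a rough picture of how many rows that one operation reads.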

Solution

Added schema v5 with three performance-critical indexes:

  1. idx_grants_realm_grantee on grant_records(realm_id, grantee_id)

    • Eliminates sequential scans for permission checks
  2. idx_grants_realm_securable on grant_records(realm_id, securable_id)

    • Optimizes grant record cleanup and GC operations
  3. idx_entities_catalog_id_id on entities(catalog_id, id)

    • Optimizes bulk entity lookups with large IN clauses
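
To check whether the planner actually picks up the new indexes, `EXPLAIN` can be run on queries with the same filter columns. This is a minimal sketch under stated assumptions: the exact statements Polaris issues are not shown in this PR, so the query shapes, column types, and literal values below are hypothetical.

```sql
-- Assumed query shapes and literal values; adjust to your deployment.
-- PostgreSQL: ANALYZE executes the statement and reports actual row counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM grant_records
WHERE realm_id = 'example_realm'   -- hypothetical realm
  AND grantee_id = 42;             -- hypothetical grantee

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM entities
WHERE catalog_id = 1               -- hypothetical catalog
  AND id IN (101, 102, 103);       -- stands in for a much larger IN list
```

On very small tables the planner may still choose a sequential scan because it is cheaper, so the check is only meaningful against realistically sized data.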

Performance Impact

Based on issue reporter's testing:

  • Before: create_namespace consistently 33-36 seconds
  • After: Latency dropped to under 2 seconds

Database Compatibility

  • ✅ PostgreSQL schema v5 created
  • ✅ H2 schema v5 created
  • ✅ Uses IF NOT EXISTS for safe deployment
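
For operators who prefer to apply the indexes by hand to a large, live PostgreSQL database before upgrading, a concurrent build is one option, since a plain `CREATE INDEX` blocks writes to the table while it runs. This is a hedged sketch and not part of the PR; H2 does not support `CONCURRENTLY`.

```sql
-- PostgreSQL only. CONCURRENTLY cannot run inside a transaction block,
-- so execute each statement on its own (for example, with autocommit on).
-- If a concurrent build fails it leaves an INVALID index behind; drop it
-- before retrying, because IF NOT EXISTS would otherwise skip the rebuild.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_grants_realm_grantee
    ON grant_records (realm_id, grantee_id);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_grants_realm_securable
    ON grant_records (realm_id, securable_id);

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_entities_catalog_id_id
    ON entities (catalog_id, id);
```

Because the index names match those in the schema file, the `IF NOT EXISTS` statements in the shipped DDL then become no-ops.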

Fixes #3685

…pace latency

Add database schema v5 with performance-critical indexes that resolve
create_namespace operation timeouts from 30+ seconds to under 2 seconds.

Changes:
- idx_grants_realm_grantee on grant_records(realm_id, grantee_id)
- idx_grants_realm_securable on grant_records(realm_id, securable_id)
- idx_entities_catalog_id_id on entities(catalog_id, id)

These indexes eliminate sequential scans on grant_records table during
permission checks and optimize bulk entity lookups with large IN clauses.

Fixes apache#3685
@dimas-b
Contributor

dimas-b commented Mar 6, 2026

@jbonofre : FYI ... I believe we spoke about this in the Community Sync call today and @machov kindly opened a PR with a fix already 🎉

Contributor

@dimas-b dimas-b left a comment

Thanks for your contribution, @machov ! The change LGTM 👍

schema diff against v4 for reference:

19,21c19,21
< -- Changes from v2:
< --  * Added `events` table
< --  * Added `idempotency_records` table for REST idempotency
---
> -- Changes from v4:
> --  * Added performance-critical indexes for grant_records table to fix create_namespace latency (Issue #3685)
> --  * Added optimized index for entities bulk lookups
31c31
< VALUES ('version', 4)
---
> VALUES ('version', 5)
59a60,61
> -- Additional index for bulk entity lookups (Issue #3685)
> CREATE INDEX IF NOT EXISTS idx_entities_catalog_id_id ON entities (catalog_id, id);
98a101,107
> 
> -- Performance-critical indexes for grant_records (Issue #3685)
> -- These indexes resolve create_namespace latency from 30+ seconds to under 2 seconds
> CREATE INDEX IF NOT EXISTS idx_grants_realm_grantee 
>     ON grant_records (realm_id, grantee_id);
> CREATE INDEX IF NOT EXISTS idx_grants_realm_securable 
>     ON grant_records (realm_id, securable_id);

Given that quite a few people are involved in JDBC persistence, let's give this PR a few extra days in review.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 6, 2026
@dimas-b
Contributor

dimas-b commented Mar 6, 2026

@machov : Do you rely on having this fix in a released version soon? Note: 1.4.0 is in the works ATM.

@machov
Author

machov commented Mar 6, 2026

Can you elaborate on what else needs to be done? I see all tests passed.

@dimas-b
Contributor

dimas-b commented Mar 6, 2026

@machov : The PR is good to merge from my POV, I merely wanted it to have some more time in review in case other interested people have opinions on the new indexes.

The question about 1.4.0 was basically to check whether you need this fix in the 1.4.0 release or you're ok with merging it after 1.4.0.

@singhpk234
Contributor

A couple of points of feedback:

  • do we need a v5, this is technically an additional command ?
  • do we need to ship these indexes, or can we add them as a recommendation in our docs or code?
  • obviously, do we need this in 1.4?

@dimas-b
Contributor

dimas-b commented Mar 6, 2026

do we need a v5, this is technically an additional command ?

I'm personally fine with adding the new indexes to v4 DDL files.

However, from a more rigorous perspective, it makes sense to version the schema every time there is a material change. This way, it is easier to track how the Polaris database is expected to behave... For example, we could (hypothetically) deny the expensive operations with a v4 schema.

I do not mean to do that in the current PR, just exposing options to consider 🙂
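
As an illustration of what such a version gate could key on, the stored schema version can be read back with a simple lookup. This is only a sketch: the diff above shows the `('version', N)` row but not the table or column names, so `version`, `version_key`, and `version_value` below are assumptions.

```sql
-- Assumption: table and column names are hypothetical; only the
-- ('version', N) row itself appears in the diff above.
SELECT version_value
FROM version
WHERE version_key = 'version';
```

A caller could compare the returned number against the minimum schema version it requires before running an expensive operation.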

Contributor

@flyrain flyrain left a comment

Thanks for the change. Echoing @singhpk234: we can probably reuse v4, as 1.4.0 isn't released yet.

@dimas-b
Contributor

dimas-b commented Mar 6, 2026

Good point - I missed that v4 schema was added after 1.3.0. Let's update v4 in this PR and merge it before 1.4.0 then. Adding to milestone.

@dimas-b dimas-b added this to the 1.4.0 milestone Mar 6, 2026
Development

Successfully merging this pull request may close these issues.

Performance Critical: create_namespace Latency (>30s) - Schema & Query Optimization Required

4 participants