Improve schema sync performance for large schemas #1681

@kmcginnes

Description

Schema sync fires thousands of sequential network requests when discovering a schema with 70k+ node types. Gremlin batches 100 types per request (700 batches), while openCypher and SPARQL fire one request per type at concurrency 10 (70,000 requests in 7,000 sequential batches of 10). As a result, initial schema discovery takes a very long time, and the user gets no feedback and no way to cancel.
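For reference, the batch counts above follow from simple division. A minimal sketch of the arithmetic, assuming exactly 70,000 types; the helper names are hypothetical and not part of the codebase:

```typescript
// Hypothetical helpers that reproduce the request counts quoted above.
function requestCount(typeCount: number, typesPerRequest: number): number {
  return Math.ceil(typeCount / typesPerRequest);
}

function sequentialRounds(requests: number, concurrency: number): number {
  return Math.ceil(requests / concurrency);
}

const typeCount = 70000; // assumed round figure for "70k+"

// Gremlin: 100 types per request.
const gremlinBatches = requestCount(typeCount, 100);

// openCypher and SPARQL: one request per type, 10 in flight at a time.
const perTypeRequests = requestCount(typeCount, 1);
const perTypeRounds = sequentialRounds(perTypeRequests, 10);

console.log({ gremlinBatches, perTypeRequests, perTypeRounds });
```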

Preferred Solution

1. Add progress reporting and cancellation to schema sync

Show a progress indicator during schema discovery so users can see how far along the sync is, and give them a way to cancel it. The current UI gives no feedback during the potentially long sync.
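One possible shape for this, as a rough sketch only: a batch runner that reports progress through a callback and checks a standard AbortSignal between batches. The function name, the SyncProgress type, and the sequential loop are assumptions for illustration; the real sync code may be structured differently and would likely keep its concurrency.

```typescript
// Hypothetical progress payload; not an existing type in the codebase.
type SyncProgress = { completed: number; total: number };

async function runBatchesWithProgress<T>(
  batches: ReadonlyArray<() => Promise<T>>,
  onProgress: (progress: SyncProgress) => void,
  signal: AbortSignal,
): Promise<T[]> {
  const results: T[] = [];
  for (const runBatch of batches) {
    // Stop before starting the next batch once the user has cancelled.
    if (signal.aborted) {
      throw new Error("Schema sync cancelled");
    }
    results.push(await runBatch());
    onProgress({ completed: results.length, total: batches.length });
  }
  return results;
}
```

A caller would create an AbortController, pass controller.signal to the runner, and call controller.abort() from the cancel button. Batches already in flight are not interrupted in this sketch; only the remaining ones are skipped.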

2. Increase batch concurrency

DEFAULT_CONCURRENT_REQUESTS_LIMIT is 10 and DEFAULT_BATCH_REQUEST_SIZE is 100. Increase these values, especially for openCypher and SPARQL, which currently fire one request per type. Consider batching multiple types into a single query where the query language supports it.
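To illustrate batching several types into one query, here is a hedged sketch for SPARQL using a VALUES clause. The query shape is an assumption made for this example and is not the query the project currently sends; both function names are hypothetical, and the class IRIs are assumed to be valid and already trusted.

```typescript
// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical query shape: lists the distinct predicates used by
// instances of each class in the batch, in a single request.
function buildSparqlPredicatesQuery(classIris: readonly string[]): string {
  const values = classIris.map((iri) => `<${iri}>`).join(" ");
  return [
    "SELECT DISTINCT ?class ?predicate WHERE {",
    `  VALUES ?class { ${values} }`,
    "  ?subject a ?class ;",
    "           ?predicate ?object .",
    "}",
  ].join("\n");
}

const exampleIris = ["http://example.org/A", "http://example.org/B", "http://example.org/C"];
const exampleQueries = chunk(exampleIris, 2).map(buildSparqlPredicatesQuery);
```

Whether one larger query is actually faster than many small ones depends on the database, so the batch size would need to be measured rather than assumed.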

3. Consider lazy attribute fetching

Instead of fetching attribute details for all 70k types upfront, only fetch attributes for types the user actually interacts with (expands in the explorer, selects in data explorer, etc.). Store a lightweight type list initially and fetch full type configs on demand.
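A minimal sketch of the on-demand idea, assuming a fetcher function that loads the full config for a single type. The TypeConfig shape, the class name, and the fetcher signature are all hypothetical. Each type is fetched at most once, and concurrent callers share the same in-flight promise.

```typescript
// Hypothetical shape of the full per-type configuration.
type TypeConfig = { type: string; attributes: string[] };

class LazyTypeConfigCache {
  private readonly entries = new Map<string, Promise<TypeConfig>>();
  private readonly fetchConfig: (type: string) => Promise<TypeConfig>;

  constructor(fetchConfig: (type: string) => Promise<TypeConfig>) {
    this.fetchConfig = fetchConfig;
  }

  // Returns the cached promise if the type was already requested,
  // otherwise starts a fetch and remembers it.
  get(type: string): Promise<TypeConfig> {
    const existing = this.entries.get(type);
    if (existing) return existing;
    const pending = this.fetchConfig(type).catch((error: unknown) => {
      // Forget failed fetches so a later interaction can retry.
      this.entries.delete(type);
      throw error;
    });
    this.entries.set(type, pending);
    return pending;
  }
}
```

The UI would call get(type) when the user expands or selects a type, while the initial sync only stores the list of type names.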

4. Optimize IndexedDB persistence for large schemas

With 70k types, the serialized schema can be ~43-86 MB. Deserialization on app startup blocks rendering. Consider chunked persistence or lazy loading of the schema from IndexedDB.
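As a sketch of what chunked persistence could look like, the function below splits the type list into fixed-size chunks and builds a small index from type name to chunk key. Under this assumption, startup would only need to read the index, and a chunk would be loaded when one of its types is first needed. The record shapes and the key naming are hypothetical, and the actual IndexedDB reads and writes are left out.

```typescript
// Hypothetical shapes; the real schema objects carry more fields.
type TypeConfig = { type: string; attributes: string[] };
type SchemaChunk = { key: string; types: TypeConfig[] };
type ChunkedSchema = {
  index: Map<string, string>; // type name -> key of the chunk holding it
  chunks: SchemaChunk[];
};

function toChunkedSchema(types: readonly TypeConfig[], chunkSize: number): ChunkedSchema {
  if (chunkSize < 1) throw new Error("chunkSize must be at least 1");
  const index = new Map<string, string>();
  const chunks: SchemaChunk[] = [];
  for (let start = 0; start < types.length; start += chunkSize) {
    const key = `schema-chunk-${chunks.length}`;
    const slice = types.slice(start, start + chunkSize);
    for (const config of slice) {
      index.set(config.type, key);
    }
    chunks.push({ key, types: slice });
  }
  return { index, chunks };
}
```

Each chunk would be stored as its own record, so reading one type's config deserializes a single chunk instead of the whole schema.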

Related Issues

Metadata

Assignees

No one assigned

Labels

connection: Issues related to database connection management or options
enhancement: New feature or request
performance: Issues relating to performance
schema: Issues related to the schema definition or synchronization
