Improve schema sync performance for large schemas #1681
Open
Labels
connection: Issues related to database connection management or options
enhancement: New feature or request
performance: Issues relating to performance
schema: Issues related to the schema definition or synchronization
Description
Schema sync fires thousands of sequential network requests when discovering a schema with 70k+ node types. Gremlin batches 100 types per request (700 batches), while openCypher and SPARQL fire one request per type at concurrency 10 (about 70,000 requests, issued as 7,000 sequential batches of 10). This makes initial schema discovery take an extremely long time, with no user feedback and no way to cancel.
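The request counts above follow from simple arithmetic. A minimal sketch, assuming exactly 70,000 types (the issue says 70k+, so these are lower bounds); the helper names are hypothetical and not taken from the codebase:

```typescript
// Hypothetical helpers for illustration only; not part of the project.

/** Number of requests needed when each request covers `typesPerRequest` types. */
function requestCount(typeCount: number, typesPerRequest: number): number {
  return Math.ceil(typeCount / typesPerRequest);
}

/** Number of sequential rounds when `concurrency` requests run at a time. */
function sequentialRounds(requests: number, concurrency: number): number {
  return Math.ceil(requests / concurrency);
}

const typeCount = 70000; // assumed lower bound from the description

const gremlinRequests = requestCount(typeCount, 100); // 700
const perTypeRequests = requestCount(typeCount, 1); // 70000
const perTypeRounds = sequentialRounds(perTypeRequests, 10); // 7000
```

Under these assumptions, raising either the batch size or the concurrency limit reduces the number of sequential rounds proportionally.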
Preferred Solution
1. Add progress reporting and cancellation to schema sync
Show a progress indicator during schema discovery so users know how far along the process is and can cancel if needed. The current UI gives no feedback during the potentially long sync.
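One possible shape for this, sketched under assumptions: the function name, the `Progress` type, and the callback signatures are hypothetical, and the loop is sequential for clarity rather than matching the real sync code. Cancellation uses the standard `AbortSignal`.

```typescript
// Hypothetical sketch; names and signatures are illustrative assumptions.
type Progress = { completed: number; total: number };

async function syncWithProgress<T>(
  items: readonly T[],
  fetchOne: (item: T, signal: AbortSignal) => Promise<void>,
  onProgress: (progress: Progress) => void,
  signal: AbortSignal,
): Promise<Progress> {
  let completed = 0;
  for (const item of items) {
    // Stop before starting new work once the user has cancelled.
    if (signal.aborted) break;
    // Pass the signal down so an in-flight request can also be aborted.
    await fetchOne(item, signal);
    completed += 1;
    onProgress({ completed, total: items.length });
  }
  return { completed, total: items.length };
}
```

A UI could wire `onProgress` to a progress bar and call `abort()` on the owning `AbortController` from a Cancel button.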
2. Increase batch concurrency
DEFAULT_CONCURRENT_REQUESTS_LIMIT is 10 and DEFAULT_BATCH_REQUEST_SIZE is 100. Increase these values, especially for openCypher and SPARQL, which currently fire one request per type. Consider batching multiple types into a single query where the query language supports it.
3. Consider lazy attribute fetching
Instead of fetching attribute details for all 70k types upfront, only fetch attributes for types the user actually interacts with (expands in the explorer, selects in data explorer, etc.). Store a lightweight type list initially and fetch full type configs on demand.
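A minimal sketch of on-demand fetching, assuming a per-type fetch function is available. The `TypeConfig` shape and the `LazyTypeConfigCache` class are hypothetical, not the project's actual types. Each type is fetched at most once, and concurrent callers share the same in-flight request.

```typescript
// Hypothetical sketch; TypeConfig and the class name are assumptions.
type TypeConfig = { type: string; attributes: string[] };

class LazyTypeConfigCache {
  private readonly pending = new Map<string, Promise<TypeConfig>>();
  private readonly fetchConfig: (type: string) => Promise<TypeConfig>;

  constructor(fetchConfig: (type: string) => Promise<TypeConfig>) {
    this.fetchConfig = fetchConfig;
  }

  /** Returns the cached config, fetching it on first access only. */
  get(type: string): Promise<TypeConfig> {
    let entry = this.pending.get(type);
    if (!entry) {
      entry = this.fetchConfig(type).catch((err) => {
        // Drop failed fetches so a later access can retry.
        this.pending.delete(type);
        throw err;
      });
      this.pending.set(type, entry);
    }
    return entry;
  }
}
```

With this approach the initial sync only needs the list of type names; `get` would be called when a user expands or selects a type.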
4. Optimize IndexedDB persistence for large schemas
With 70k types, the serialized schema can be ~43-86 MB. Deserialization on app startup blocks rendering. Consider chunked persistence or lazy loading of the schema from IndexedDB.
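Chunked persistence could look roughly like the following. This is a sketch under assumptions: `ChunkStore` is a hypothetical async key-value interface that an IndexedDB wrapper might implement, and the key layout is invented for illustration. Splitting the schema into chunks lets the app parse a small piece at a time instead of one large blob.

```typescript
// Hypothetical sketch; ChunkStore stands in for an IndexedDB-backed store.
interface ChunkStore {
  put(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
}

/** Splits items into consecutive chunks of at most `chunkSize` entries. */
function toChunks<T>(items: readonly T[], chunkSize: number): T[][] {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

/** Writes each chunk under its own key and returns the number of chunks. */
async function saveSchemaChunks<T>(
  store: ChunkStore,
  prefix: string,
  items: readonly T[],
  chunkSize: number,
): Promise<number> {
  const chunks = toChunks(items, chunkSize);
  for (let i = 0; i < chunks.length; i++) {
    await store.put(`${prefix}:${i}`, JSON.stringify(chunks[i]));
  }
  await store.put(`${prefix}:count`, String(chunks.length));
  return chunks.length;
}

/** Loads and parses a single chunk; returns an empty array if it is missing. */
async function loadSchemaChunk<T>(
  store: ChunkStore,
  prefix: string,
  index: number,
): Promise<T[]> {
  const raw = await store.get(`${prefix}:${index}`);
  return raw === undefined ? [] : (JSON.parse(raw) as T[]);
}
```

On startup the app could load only the first chunk (or none) before rendering and read the remaining chunks in the background.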
Related Issues