Description
🔍 Performance Optimization for Multi-Hop Traversal in Apache AGE
Context:
We are currently using Apache AGE and have the following graph structure:
(:A)-[:HAS_Y]->(:Y)
(:A)-[:HAS_Z]->(:Z)
(:A)-[:HAS_D]->(:D)
(and similarly for the remaining types: 7 relationship types and 8 node types in total)
Our typical traversal pattern in Neo4j was:
MATCH (n:Y {property_example: 123})-[r*..4]-(d:A)
RETURN d.property_found AS property_found
LIMIT 50
UNION ALL
MATCH (n:Z {property_example: 123})-[r*..4]-(d:A)
RETURN d.property_found AS property_found
LIMIT 50
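For reference, here is a minimal sketch of how this Neo4j pattern could be ported to AGE. It assumes the graph is named user_unification (as in the query example further down) and performs the UNION ALL at the SQL level rather than inside the Cypher string, since plain SQL set operations over two cypher() calls are certain to work. The output column name property_found is our own choice.

```sql
-- Sketch only: Neo4j pattern ported to AGE, with UNION ALL done in SQL.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

SELECT property_found
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Y {property_example: 123})-[r*..4]-(d:A)
    RETURN d.property_found
    LIMIT 50
$$) AS (property_found ag_catalog.agtype)
UNION ALL
SELECT property_found
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Z {property_example: 123})-[r*..4]-(d:A)
    RETURN d.property_found
    LIMIT 50
$$) AS (property_found ag_catalog.agtype);
```

Each branch keeps its own LIMIT 50, so the combined result has at most 100 rows, matching the Neo4j version.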
We expect:
~500 million nodes
3–4x as many relationships as nodes
Question:
What kind of indexing strategy or query optimization would you recommend in Apache AGE for improving the performance of multi-hop traversal queries like [*..4]?
Any guidance or best practices for the following would be highly appreciated:
Node property indexing
Relationship indexing (e.g., start_id, end_id)
Traversal optimizations
Current Setup:
We currently have:
Indexes on all relevant node properties
start_id and end_id indexes on all relationships
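To make the setup concrete, the indexes are roughly of the following shape. This is a sketch, not a dump of our schema: it assumes AGE's default layout (one table per label inside a schema named after the graph, with a properties column on vertices and start_id / end_id columns on edges), and the index names are illustrative. Only the Y label and the HAS_Y relationship are shown; the others follow the same pattern.

```sql
-- Sketch only: index names are hypothetical, table layout is AGE's default.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Expression index intended to match filters of the form WHERE n.value = '...'.
CREATE INDEX IF NOT EXISTS y_value_idx
    ON user_unification."Y" USING btree (
        ag_catalog.agtype_access_operator(
            VARIADIC ARRAY[properties, '"value"'::ag_catalog.agtype]
        )
    );

-- GIN index on the whole property map; this is the one that may be used
-- for inline property filters such as MATCH (n:Y {value: '...'}).
CREATE INDEX IF NOT EXISTS y_props_gin_idx
    ON user_unification."Y" USING gin (properties);

-- Edge endpoint indexes used when expanding a relationship in either direction.
CREATE INDEX IF NOT EXISTS has_y_start_idx
    ON user_unification."HAS_Y" USING btree (start_id);
CREATE INDEX IF NOT EXISTS has_y_end_idx
    ON user_unification."HAS_Y" USING btree (end_id);

-- Refresh planner statistics after bulk loading.
ANALYZE user_unification."Y";
ANALYZE user_unification."HAS_Y";
```

Whether the planner actually picks the expression index or the GIN index appears to depend on how the property filter is written in the Cypher query, so both forms are listed here.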
Sample test data:
~27 million vertices
~23 million edges
Query example:
SELECT d
FROM ag_catalog.cypher('user_unification', $$
MATCH (n:Y) WHERE n.value = 'a0de44c7fc8cb783'
MATCH (n)-[*..2]-(d:A)
RETURN d
$$) as (d ag_catalog.agtype);
Execution time:
For [*..2]: ~30 seconds
For [*..4]: >150 seconds (often fails to complete)
Expected execution time: ≤10 ms for [*..2]
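In case it helps with diagnosis, the plan for the two-hop query can be captured with a standard PostgreSQL EXPLAIN wrapped around the cypher() call, as sketched below. Note that ANALYZE actually runs the query, so this takes as long as the query itself.

```sql
-- Sketch only: capture the executed plan for the [*..2] query above.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

EXPLAIN (ANALYZE, BUFFERS)
SELECT d
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Y) WHERE n.value = 'a0de44c7fc8cb783'
    MATCH (n)-[*..2]-(d:A)
    RETURN d
$$) AS (d ag_catalog.agtype);
```

The parts of the output that seem most relevant are whether the lookup on "Y" is an Index Scan or a Seq Scan, and how much of the total time is spent in the variable-length expansion step.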
Any suggestions or feedback from the AGE team would be incredibly helpful. Thanks in advance!