Performance issue #2187

Open
@xephtar

Description

🔍 Performance Optimization for Multi-Hop Traversal in Apache AGE

Context:

We are currently using Apache AGE and have the following graph structure:

(:A)-[:HAS_Y]->(:Y)

(:A)-[:HAS_Z]->(:Z)

(:A)-[:HAS_D]->(:D)

(and similarly for the remaining types: 7 relationship types and 8 node labels in total)

Our typical traversal pattern in Neo4j was:

MATCH (n:Y {property_example: 123})-[r*..4]-(d:A)
RETURN d.property_found AS property_found
LIMIT 50
UNION ALL
MATCH (n:Z {property_example: 123})-[r*..4]-(d:A)
RETURN d.property_found AS property_found
LIMIT 50
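For comparison, here is a rough sketch of how the same pattern might be ported to AGE. It assumes the graph name `user_unification` from the test query further down, and it combines the two branches with a SQL-level UNION ALL around two separate `cypher()` calls. `property_example` and `property_found` are the placeholder property names from the Neo4j query, not real schema names.

```sql
-- Sketch only: a possible AGE port of the Neo4j pattern above.
-- Assumes the AGE extension is installed and the graph is named user_unification.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

(
  SELECT property_found
  FROM cypher('user_unification', $$
      MATCH (n:Y {property_example: 123})-[*..4]-(d:A)
      RETURN d.property_found
      LIMIT 50
  $$) AS (property_found agtype)
)
UNION ALL
(
  SELECT property_found
  FROM cypher('user_unification', $$
      MATCH (n:Z {property_example: 123})-[*..4]-(d:A)
      RETURN d.property_found
      LIMIT 50
  $$) AS (property_found agtype)
);
```

Keeping each branch in its own `cypher()` call means each LIMIT 50 applies per branch, as in the original query.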

Expected production scale:

~500 million nodes

3–4x as many relationships (roughly 1.5–2 billion)

Question:

What kind of indexing strategy or query optimization would you recommend in Apache AGE for improving the performance of multi-hop traversal queries like [*..4]?

Any guidance or best practices on the following would be highly appreciated:

Node property indexing

Relationship indexing (e.g., start_id, end_id)

Traversal optimizations

Current Setup:

We currently have:

Indexes on all relevant node properties

start_id and end_id indexes on all relationships
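For anyone reproducing this, indexes of this kind can be sketched in AGE roughly as follows. This is illustrative rather than our exact DDL: the index names are hypothetical, and it assumes AGE's usual storage layout, where each label is a table in the graph's schema with `id` and `properties` columns (plus `start_id` and `end_id` for edge labels).

```sql
-- Sketch only: index names are made up; label and graph names follow this issue.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Expression index for equality filters such as WHERE n.value = '...'
CREATE INDEX IF NOT EXISTS y_value_btree_idx
    ON user_unification."Y"
    USING btree (agtype_access_operator(VARIADIC ARRAY[properties, '"value"'::agtype]));

-- GIN index for property-map matches such as MATCH (n:Y {value: '...'})
CREATE INDEX IF NOT EXISTS y_properties_gin_idx
    ON user_unification."Y"
    USING gin (properties);

-- Edge endpoint indexes, repeated for each of the 7 relationship types
CREATE INDEX IF NOT EXISTS has_y_start_id_idx
    ON user_unification."HAS_Y" USING btree (start_id);
CREATE INDEX IF NOT EXISTS has_y_end_id_idx
    ON user_unification."HAS_Y" USING btree (end_id);

-- Vertex id index on the target label, in case one is not already present
CREATE INDEX IF NOT EXISTS a_id_idx
    ON user_unification."A" USING btree (id);
```

Which of the two property indexes the planner can use depends on how the filter is written in the Cypher query, so both forms are shown.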

Sample test data:

~27 million vertices

~23 million edges

Query example:

SELECT d
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Y) WHERE n.value = 'a0de44c7fc8cb783'
    MATCH (n)-[*..2]-(d:A)
    RETURN d
$$) as (d ag_catalog.agtype);
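To check whether the indexes are actually used, the same statement can be wrapped in a standard PostgreSQL EXPLAIN. The sketch below also shows a variant with the filter written as an inline property map. As far as I understand, AGE translates that form into a containment check on `properties`, which a GIN index can serve, while the `WHERE n.value = ...` form relies on an expression index. Treat that as an assumption to verify against the plan output.

```sql
-- Sketch only: inspect the plan and timing of the query above.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

EXPLAIN (ANALYZE, BUFFERS)
SELECT d
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Y) WHERE n.value = 'a0de44c7fc8cb783'
    MATCH (n)-[*..2]-(d:A)
    RETURN d
$$) AS (d ag_catalog.agtype);

-- Variant: filter as an inline property map, single MATCH clause.
EXPLAIN (ANALYZE, BUFFERS)
SELECT d
FROM ag_catalog.cypher('user_unification', $$
    MATCH (n:Y {value: 'a0de44c7fc8cb783'})-[*..2]-(d:A)
    RETURN d
$$) AS (d ag_catalog.agtype);
```

A sequential scan on the "Y" table or on the edge tables in either plan would suggest that the corresponding index is not being picked up.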

Execution time:

For [*..2]: ~30 seconds

For [*..4]: >150 seconds (often fails to complete)

Target execution time: ≤10 ms for [*..2]

Any suggestions or feedback from the AGE team would be incredibly helpful. Thanks in advance!
