-
Notifications
You must be signed in to change notification settings - Fork 479
Description
Describe the bug
A severe performance degradation is observed when executing queries that use the @> operator.
Importantly: the issue manifests across all workloads (not just SELECT / reads, but also during inserts and updates). Any operation that uses MATCH with property filtering (e.g., MATCH (n {id: ...}) CREATE ...) invokes this operator under the hood and suffers from this overhead.
Flame graph analysis revealed that the performance drop occurs during the query planning phase, which now consumes about 30% of the execution time within agtype_contains. The root cause lies in the altered definition of the @> operator (as well as the containment and existence operators) in AGE version 18. Specifically, the selectivity estimation functions (RESTRICT and JOIN) were replaced. The new matchingsel function provides more accurate estimates based on statistics, but due to its complex internal logic, it takes significantly longer to execute than the old, lightweight contsel.
For simple point queries (e.g., finding a node by ID) executed thousands of times per second, this "accurate" estimation yields no execution benefit but creates a critical parasitic load during query planning.
How are you accessing AGE (Command line, driver, etc.)?
psql (Command line)
What data setup do we need to do?
To reproduce the issue, I have prepared a set of SQL scripts (attached to the report).
- Graph creation and population:
Thesfparameter here defines the scale factor of the generated graph.
psql -d your_database -f generate_graph.sql -v sf=1
generate_graph.sql
- Creating wrapper functions for the workload:
This script sets up the environment and creates functions with the correct parameter mapping for Apache AGE.
psql -d your_database -f setup_func_for_workload.sql
setup_func_for_workload.sql
- Running the load test:
Running 8 clients for 120 seconds.
pgbench -d your_database -f workload_select.sql -D sf=1 -c 8 -T 120 -P 5
workload_select.sql
What is the necessary configuration info needed?
Standard PostgreSQL configuration.
What is the command that caused the error?
In this case, it is not a parsing error, but a planner issue. The slowdown occurs on any simple query with property filtering (Cypher translates this to the @> operator under the hood):
EXPLAIN ANALYZE
SELECT * FROM cypher('snb_graph', $$
MATCH (m:Post {id: 123})
RETURN m.creationDate, coalesce(m.content, m.imageFile, 'No content')
UNION ALL
MATCH (m:Comment {id: 123})
RETURN m.creationDate, coalesce(m.content, m.imageFile, 'No content')
$$) AS (
creationDate agtype,
content agtype
);Expected behavior
Planning for simple point queries is expected to be nearly instantaneous, without significant overhead. Reverting to the lightweight selectivity estimation functions (or optimizing matchingsel for cases where gathering deep statistics is impractical) should restore performance to the AGE 17 version level.
Environment (please complete the following information):
- PostgreSQL Version: 18
- AGE Version: 18
Additional context
In the operator definition, RESTRICT specifies the function that estimates operator selectivity in a standard WHERE clause, while JOIN does so for join conditions. Replacing contsel with matchingsel forces complex statistics-gathering logic to be invoked during the planning of every minor query, which destroys TPS.
Comparison of the @> operator DDL across versions:
AGE Version 17 (Fast planning):
CREATE OPERATOR @> (
LEFTARG = agtype,
RIGHTARG = agtype,
FUNCTION = ag_catalog.agtype_contains,
COMMUTATOR = '<@',
RESTRICT = contsel, -- Standard, lightweight function with fixed selectivity
JOIN = contjoinsel
);AGE Version 18 (Planning degradation):
CREATE OPERATOR @> (
LEFTARG = agtype,
RIGHTARG = agtype,
FUNCTION = ag_catalog.agtype_contains,
COMMUTATOR = '<@',
RESTRICT = matchingsel, -- More accurate, but extremely "heavy" function
JOIN = matchingjoinsel
);Evidence and Testing:
1. FlameGraph Analysis
As seen in the FlameGraphs below, the planning phase in agtype_contains consumes a massive portion of CPU time:
- Version using
matchingsel:
- Version using
contsel:
2. TPS Comparison Charts:
The TPS charts below demonstrate the difference between using matchingsel and contsel. Throughput drops critically with the new selectivity estimation:
- Version using
matchingsel:
- Version using
contsel:
3. Reverting the logic (Proof of Concept)
We verified this hypothesis in practice. If matchingsel is manually replaced back with contsel in the 18th version's code, the performance degradation completely disappears.