Skip to content

Performance issue when upserting 10k+ edges using MERGE #2177

Open
@Yashmitha1

Description

@Yashmitha1

Hi, I am trying to insert/update around 70K edges into the graph using age.
I created the following function:

CREATE OR REPLACE FUNCTION create_update_edge(edge_list TEXT)
RETURNS void AS $$
DECLARE
  query TEXT;
BEGIN
  query := format($q$
    SELECT * FROM ag_catalog.cypher(
      'test_attribute_instance_graph',
      $cypher$
        UNWIND %s AS edge
        MERGE (a:attribute_instance {name: edge.source_name, value: edge.source_value})-[r:cooccurs_with]-(b:attribute_instance {name: edge.target_name, value: edge.target_value})
        SET r.count = edge.count
        SET r.first_seen = edge.first_seen
        SET r.last_seen = edge.last_seen
          
      $cypher$
    ) AS (ignored agtype);
  $q$, edge_list);

  EXECUTE query;

  RAISE NOTICE 'Edges merged and properties set.';
END;
$$ LANGUAGE plpgsql;

These are some of the points I had to take care of :

  1. Function signature of ag_catalog.cyphe is (name, cstring, agtype)) So, I was not directly able to pass JSONB format as a parameter, and had to change the list of edges(each data is mapped to a dict) to a Cypher-compatible map(keys are unquoted) before passing it to the above function.
  2. Trying to use MERGE..ON CREATE/ON MATCH SET gave syntax error, so I simply used MERGE SET to do the upsertion of edges.
  3. Trying to B-tree indexing was a failed attempt, as again I was coming across a syntax error using the standard CREATE INDEX for B-tree command, so I used GIN Index.

Under this setup and batch unwind (batch size 1000 edges): I got the following results

Edges Inserted Time (s) Time (min)
1,000 0.62 0.01
2,000 16.35 0.27
3,000 22.42 0.37
4,000 69.63 1.16
5,000 104.75 1.75
6,000 177.12 2.95
7,000 447.66 7.46
8,000 598.38 9.97
9,000 913.82 15.23
10,000 1320.22 22.00

Clearly, the performance is pretty bad, so I was wondering if this is a limitation of age or a wrong implementation on my end. If someone could give me insights on my method and any other ways I should try inserting the data/or optimise the current method, it'd be of great help!
Thankyou!!

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions