
Memory leaks on queries kill performance tests #456

Closed
elvis197686 opened this issue Apr 16, 2019 · 23 comments


elvis197686 commented Apr 16, 2019

Hello Redis,

Recently we have been running performance tests on RedisGraph, but these tests have kept failing because the graph has run out of memory
(it either crashes, or we get redis.clients.jedis.exceptions.JedisDataException: OOM command not allowed when used memory > 'maxmemory').
Our monitoring tools show that this seems to be happening because memory is being allocated and never released.
We ran some experiments to see whether we could easily reproduce a memory leak, and we have.

We have created a medium-sized DB (around 1,000 objects) using a series of commands like the ones given below.
There are a few levels in the hierarchy, which (for illustration purposes) are called TOP, MIDDLE and BOTTOM.

Example insert commands:

MATCH (n:TOP { id: 'TOP_1' }) SET n.prop1=30000.0,n.prop2='A parent prop'

MERGE (:MIDDLE { id: 'MIDDLE_1' })
MATCH (n:MIDDLE { id: 'MIDDLE_1' }) SET n.prop1=20000.0,n.prop2='A child prop'
MATCH (parent:TOP), (child:MIDDLE) WHERE parent.id='TOP_1' AND child.id='MIDDLE_1' CREATE (parent)-[:HAS]->(child)

MERGE (:BOTTOM { id: 'BOTTOM_1' })
MATCH (n:BOTTOM { id: 'BOTTOM_1' }) SET n.sort='SORT_1',n.startDateTime=1234567890,n.activeCount=3
MATCH (parent:MIDDLE), (child:BOTTOM) WHERE parent.id='MIDDLE_1' AND child.id='BOTTOM_1' CREATE (parent)-[:HAS]->(child)
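
For reference, a minimal sketch of how this load could be scripted (assuming Python with redis-py; the graph key GRAPH_NAME follows the query example further down, and the node count and the MERGE for TOP nodes are illustrative):

import redis

r = redis.Redis(host="localhost", port=6379)

def gq(query):
    # GRAPH.QUERY <key> <cypher> is the standard RedisGraph entry point.
    return r.execute_command("GRAPH.QUERY", "GRAPH_NAME", query)

for i in range(1, 334):  # roughly 1,000 nodes across the three labels
    gq(f"MERGE (:TOP {{ id: 'TOP_{i}' }})")
    gq(f"MATCH (n:TOP {{ id: 'TOP_{i}' }}) SET n.prop1=30000.0,n.prop2='A parent prop'")

    gq(f"MERGE (:MIDDLE {{ id: 'MIDDLE_{i}' }})")
    gq(f"MATCH (n:MIDDLE {{ id: 'MIDDLE_{i}' }}) SET n.prop1=20000.0,n.prop2='A child prop'")
    gq(f"MATCH (parent:TOP), (child:MIDDLE) WHERE parent.id='TOP_{i}' AND child.id='MIDDLE_{i}' CREATE (parent)-[:HAS]->(child)")

    gq(f"MERGE (:BOTTOM {{ id: 'BOTTOM_{i}' }})")
    gq(f"MATCH (n:BOTTOM {{ id: 'BOTTOM_{i}' }}) SET n.sort='SORT_{i}',n.startDateTime=1234567890,n.activeCount=3")
    gq(f"MATCH (parent:MIDDLE), (child:BOTTOM) WHERE parent.id='MIDDLE_{i}' AND child.id='BOTTOM_{i}' CREATE (parent)-[:HAS]->(child)")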

We then executed a large number of queries (>50,000) of the same format but with different criteria.
The queries were executed in parallel (about 50 per second).
An example query is shown below; the "N" is replaced by the current UTC time in seconds.

Example Query:
graph.query GRAPH_NAME "MATCH (:TOP {id: 'TOP_1'})-[:HAS]->(:MIDDLE)-[:HAS]->(e:BOTTOM) WHERE e.sort = 'BOTTOM_SORT_1' AND e.startDateTime > N AND e.activeCount > 0 RETURN e ORDER BY e.startDateTime, e.name SKIP 0 LIMIT 100"
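
A rough sketch of the query driver (again assuming Python with redis-py; the thread count and batch pacing are illustrative, sized to give roughly 50 queries per second):

import time
import redis
from concurrent.futures import ThreadPoolExecutor

pool = redis.ConnectionPool(host="localhost", port=6379)

QUERY = ("MATCH (:TOP {{id: 'TOP_1'}})-[:HAS]->(:MIDDLE)-[:HAS]->(e:BOTTOM) "
         "WHERE e.sort = 'BOTTOM_SORT_1' AND e.startDateTime > {now} AND e.activeCount > 0 "
         "RETURN e ORDER BY e.startDateTime, e.name SKIP 0 LIMIT 100")

def run_one(_):
    r = redis.Redis(connection_pool=pool)
    # "N" is replaced by the current UTC time in seconds.
    r.execute_command("GRAPH.QUERY", "GRAPH_NAME", QUERY.format(now=int(time.time())))

with ThreadPoolExecutor(max_workers=50) as ex:
    for _ in range(1000):              # 1,000 batches x 50 queries = 50,000 queries
        list(ex.map(run_one, range(50)))
        time.sleep(1)                  # roughly 50 queries per second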

By doing this we were able to make RedisGraph consistently run out of memory after about 20 minutes of these queries.
The graph below shows that the memory is allocated and never released.
Note that we have also run this with sequential queries; there still appeared to be a memory leak, but it was not as severe.

Memory Graph:

[image: memory usage graph]

Version:
The latest RedisGraph docker image (as of yesterday)

@swilly22
Collaborator

Thanks @elvis197686 for reporting
We'll be looking into this ASAP.

@swilly22 swilly22 added the bug label Apr 16, 2019
@swilly22 swilly22 self-assigned this Apr 16, 2019
@swilly22 swilly22 added this to ToDo in RedisGraph 2.0 via automation Apr 16, 2019
@swilly22 swilly22 moved this from ToDo to In progress in RedisGraph 2.0 Apr 16, 2019
@swilly22
Collaborator

@elvis197686 a fix has just been pushed to the master branch, can you please verify?
Please note, for the OS to reclaim memory used by Redis, use the MEMORY PURGE command.
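
For example, assuming redis-py (the same command can be issued from redis-cli as MEMORY PURGE):

import redis

redis.Redis(host="localhost", port=6379).execute_command("MEMORY", "PURGE")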

@elvis197686
Author

Thanks @swilly22 - we can test it, but it will be a lot easier if we can use a Docker image. Is the edge Docker image built from the master branch?


styk-tv commented Apr 23, 2019

Master is always auto-built into the edge tag (note the commit ID):
https://hub.docker.com/r/redislabs/redisgraph/builds

[screenshots: Docker Hub build list for redislabs/redisgraph, showing the edge tag built from the corresponding master commit]

@elvis197686
Author

Great, we'll give it a go

@swilly22
Collaborator

Hi @elvis197686, did you have a chance to test the latest master branch?

@elvis197686
Author

Yes, we have re-run our performance tests with a build from the master branch, and they did complete.

However, the amount of memory used still went up significantly. We are going to do more tests tomorrow, but I suspect it won't take much more for the memory limit to be exceeded.

BTW, we have tried the MEMORY PURGE command from redis-cli, but we got ERR command 'MEMORY PURGE' is not allowed. Not sure it would help much anyway.

@swilly22
Collaborator

@elvis197686, would it be possible to get your testing script(s) or a list of queries you're using?
Appreciate it.

@elvis197686
Author

Yes, we will provide more information to help reproduce the problem when we have completed the tests, but it might be a few days.

Author

elvis197686 commented Apr 26, 2019

@swilly22 The test is the same as described in the initial report: we did an initial ingestion of a linear DAG of roughly 1,000 nodes, then executed the same command repeatedly for 30 minutes. The test hasn't reached the end in any of our runs; in the last run we started getting the following errors about halfway through:

{  
   "errors":[  
      ....
      "message":"Error executing command MATCH (s:TOP)-[:HAS]->(:MIDDL:E)-[:HAS]->(:BOTTOM) WHERE s.active=true RETURN DISTINCT s ORDER BY s.displayOrder; nested exception is redis.clients.jedis.exceptions.JedisDataException: OOM command not allowed when used memory > 'maxmemory'.",
      ....
   ]
}

And here is the graph showing the memory usage; the test ran from around 13:45 to 14:00.
[image: memory usage graph]

@elvis197686
Author

Here is the info about the server:
[image: server info]

@WitchsCat

We had the same issue on our prod and dev clusters.

Read queries cause the amount of memory consumed to grow on the master node and on the slave nodes.

@jeffreylovitz
Contributor

Some parser tokens (strings provided in user queries) currently cause memory leaks, so this is an outstanding problem that becomes noticeable with high query volumes.

We're replacing our current parser entirely (#454), which I expect should be completed in about 3 weeks. Those leaks should not be an issue after the replacement, so I think that trying to patch the current one wouldn't be an effective use of time!


@jnsnjspr

Taking over from elvis197686: we can confirm v2 M6 still has the memory leak (ever-climbing memory usage), but to a lesser extent than v1.

This is using exactly the same setup as elvis197686 posted previously.

Collaborator

K-Jo commented Dec 13, 2019

@elvis197686 major patches have been made in #785, which is included in the latest M07 and M08 releases. Could you please validate this again and, if it's still leaking, send us the queries that are leaking?


2ZZ commented Dec 16, 2019

Hi @K-Jo, replying on behalf of @jnsnjspr

We are still seeing the memory leak in M08:
[image: memory usage graph]

This is a sample of the queries: https://gist.githubusercontent.com/2ZZ/9c95d3cb423fb91b562a751c53a798b9/raw/3dd939df6ca9b51b51cf96264054695778941f78/redisgraph-memoryleak.log

@jeffreylovitz
Contributor

Hey @2ZZ,

Thanks for the gist! That was very helpful.

I converted it into a script to reproduce your findings, and also observed the memory leak: https://gist.github.com/jeffreylovitz/59c2ee0b5bcd1423e8c85571fb9522d4#file-a_results-md

I found that much of the memory could be reclaimed with the Redis command MEMORY PURGE, which indicates that jemalloc is holding some allocations for its own use rather than returning them to the system. This is fine provided that it does not become too overzealous - are you still able to trigger OOM errors on your server?

Some of the memory loss can't be accounted for by jemalloc, so we'll continue looking into this. If you have the opportunity, let us know how MEMORY PURGE affects your memory consumption!
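
If it helps, here is a rough sketch of that before/after check (assuming redis-py; used_memory is Redis' own accounting, while used_memory_rss is what the OS sees, so allocator retention shows up as a gap between the two):

import redis

r = redis.Redis(host="localhost", port=6379)

def snapshot(label):
    info = r.info("memory")
    print(label,
          "used_memory:", info["used_memory"],
          "used_memory_rss:", info["used_memory_rss"],
          "fragmentation:", info["mem_fragmentation_ratio"])

snapshot("before purge")
r.execute_command("MEMORY", "PURGE")   # ask jemalloc to release retained pages back to the OS
snapshot("after purge")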


2ZZ commented Jan 8, 2020

Hi @jeffreylovitz
We tried running the same test again with MEMORY PURGE being called every 10 minutes; unfortunately, the result was the same and Redis stopped accepting commands once it hit its memory limit.
[image: memory usage graph]

@swilly22 swilly22 removed this from In progress in RedisGraph 2.0 Feb 4, 2020
@jeffreylovitz
Contributor

We will resume testing on this using the sample queries provided in #456 (comment).

What are the details of the server configuration, especially with regard to clustering and replication?


2ZZ commented Feb 5, 2020

Hi @jeffreylovitz

We're doing this in Kubernetes, there is a single pod with 1 Redis container and 1 Sentinel container. Both running redislabs/redisgraph:1.99.7

Redis config:

dir "/data"
maxmemory 8gb
maxmemory-policy noeviction
min-slaves-max-lag 5
min-slaves-to-write 0
rdbchecksum yes
rdbcompression yes
repl-diskless-sync yes
save 900 1

slave-announce-ip 10.100.134.31
slave-announce-port 6379

Sentinel config:

sentinel myid b8054d39d0fa534b01d074bc2f36dd442cfe3afd
sentinel monitor redis-master 10.100.134.31 6379 1
dir "/data"
sentinel down-after-milliseconds redis-master 10000
sentinel failover-timeout redis-master 180000
sentinel parallel-syncs redis-master 5
sentinel announce-ip 10.100.134.31
sentinel announce-port 26379

@jesper-wh

I can confirm that the memory leak for our workflow, as outlined by elvis197686 in the first post (#456 (comment)), is fixed in v2.0.5.

@jeffreylovitz
Contributor

Thanks, @jesper-wh and all!
