
Memory leaks on queries kill performance tests #456

Closed
elvis197686 opened this issue Apr 16, 2019 · 23 comments


elvis197686 commented Apr 16, 2019

Hello Redis,

Recently we have been running performance tests on RedisGraph, but these tests have kept failing because the graph has run out of memory
(it either crashes, or we get redis.clients.jedis.exceptions.JedisDataException: OOM command not allowed when used memory > 'maxmemory').
Our monitoring tools show that this seems to be happening because memory is being allocated and never released.
We ran some experiments to see whether we could easily reproduce a memory leak, and we have.

We have created a medium-sized DB (around 1,000 objects) using a series of commands like the ones given below.
There are a few levels in the hierarchy, which (for illustration purposes) are called TOP, MIDDLE and BOTTOM.

Example insert commands:

MATCH (n:TOP { id: 'TOP_1' }) SET n.prop1=30000.0,n.prop2='A parent prop'

MERGE (:MIDDLE { id: 'MIDDLE_1' })
MATCH (n:MIDDLE { id: 'MIDDLE_1' }) SET n.prop1=20000.0,n.prop2='A child prop'
MATCH (parent:TOP), (child:MIDDLE) WHERE parent.id='TOP_1' AND child.id='MIDDLE_1' CREATE (parent)-[:HAS]->(child)

MERGE (:BOTTOM { id: 'BOTTOM_1' })
MATCH (n:BOTTOM { id: 'BOTTOM_1' }) SET n.sort='SORT_1',n.startDateTime=1234567890,n.activeCount=3
MATCH (parent:MIDDLE), (child:BOTTOM) WHERE parent.id='MIDDLE_1' AND child.id='BOTTOM_1' CREATE (parent)-[:HAS]->(child)
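
For reference, a minimal sketch of how this load could be scripted (assuming Python with redis-py; the graph key GRAPH_NAME follows the query example further down, and the node count and the MERGE for TOP nodes are illustrative):

import redis

r = redis.Redis(host="localhost", port=6379)

def gq(query):
    # GRAPH.QUERY <key> <cypher> is the standard RedisGraph entry point.
    return r.execute_command("GRAPH.QUERY", "GRAPH_NAME", query)

for i in range(1, 334):  # roughly 1,000 nodes across the three labels
    gq(f"MERGE (:TOP {{ id: 'TOP_{i}' }})")
    gq(f"MATCH (n:TOP {{ id: 'TOP_{i}' }}) SET n.prop1=30000.0,n.prop2='A parent prop'")

    gq(f"MERGE (:MIDDLE {{ id: 'MIDDLE_{i}' }})")
    gq(f"MATCH (n:MIDDLE {{ id: 'MIDDLE_{i}' }}) SET n.prop1=20000.0,n.prop2='A child prop'")
    gq(f"MATCH (parent:TOP), (child:MIDDLE) WHERE parent.id='TOP_{i}' AND child.id='MIDDLE_{i}' CREATE (parent)-[:HAS]->(child)")

    gq(f"MERGE (:BOTTOM {{ id: 'BOTTOM_{i}' }})")
    gq(f"MATCH (n:BOTTOM {{ id: 'BOTTOM_{i}' }}) SET n.sort='SORT_{i}',n.startDateTime=1234567890,n.activeCount=3")
    gq(f"MATCH (parent:MIDDLE), (child:BOTTOM) WHERE parent.id='MIDDLE_{i}' AND child.id='BOTTOM_{i}' CREATE (parent)-[:HAS]->(child)")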

We then executed a large number of queries (>50,000) of the same format but with different criteria.
The queries were executed in parallel (about 50 per second).
An example query is shown below; the "N" is replaced by the current UTC time in seconds.

Example Query:
graph.query GRAPH_NAME "MATCH (:TOP {id: 'TOP_1'})-[:HAS]->(:MIDDLE)-[:HAS]->(e:BOTTOM) WHERE e.sort = 'BOTTOM_SORT_1' AND e.startDateTime > N AND e.activeCount > 0 RETURN e ORDER BY e.startDateTime, e.name SKIP 0 LIMIT 100"
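
A rough sketch of the query driver (again assuming Python with redis-py; the thread count and batch pacing are illustrative, sized to give roughly 50 queries per second):

import time
import redis
from concurrent.futures import ThreadPoolExecutor

pool = redis.ConnectionPool(host="localhost", port=6379)

QUERY = ("MATCH (:TOP {{id: 'TOP_1'}})-[:HAS]->(:MIDDLE)-[:HAS]->(e:BOTTOM) "
         "WHERE e.sort = 'BOTTOM_SORT_1' AND e.startDateTime > {now} AND e.activeCount > 0 "
         "RETURN e ORDER BY e.startDateTime, e.name SKIP 0 LIMIT 100")

def run_one(_):
    r = redis.Redis(connection_pool=pool)
    # "N" is replaced by the current UTC time in seconds.
    r.execute_command("GRAPH.QUERY", "GRAPH_NAME", QUERY.format(now=int(time.time())))

with ThreadPoolExecutor(max_workers=50) as ex:
    for _ in range(1000):              # 1,000 batches x 50 queries = 50,000 queries
        list(ex.map(run_one, range(50)))
        time.sleep(1)                  # roughly 50 queries per second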

By doing this we were able to make RedisGraph consistently run out of memory after about 20 minutes of these queries.
The graph below shows that the memory is allocated and never released.
Note that we have also run this with sequential queries; there still appeared to be a memory leak, but it was not as severe.

Memory Graph:

[image: memory usage graph]

Version:
The latest RedisGraph docker image (as of yesterday)

@swilly22
Collaborator

Thanks @elvis197686 for reporting
We'll be looking into this ASAP.

@swilly22 swilly22 added the bug label Apr 16, 2019
@swilly22 swilly22 self-assigned this Apr 16, 2019
@swilly22 swilly22 added this to ToDo in RedisGraph 2.0 via automation Apr 16, 2019
@swilly22 swilly22 moved this from ToDo to In progress in RedisGraph 2.0 Apr 16, 2019
@swilly22
Collaborator

@elvis197686 a fix has just been pushed to the master branch, can you please verify?
Please note, for the OS to reclaim memory used by Redis, use the MEMORY PURGE command.
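
For example, assuming redis-py (the same command can be issued from redis-cli as MEMORY PURGE):

import redis

redis.Redis(host="localhost", port=6379).execute_command("MEMORY", "PURGE")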

@elvis197686
Author

Thanks @swilly22 - we can test it, but it will be a lot easier if we can use a Docker image. Is the edge Docker image built from the master branch?


styk-tv commented Apr 23, 2019

Master is always auto-built into the edge tag (note the commit ID):
https://hub.docker.com/r/redislabs/redisgraph/builds

[screenshots: Docker Hub build list for redislabs/redisgraph, showing the edge tag built from the corresponding master commit]

@elvis197686
Author

Great, we'll give it a go

@swilly22
Collaborator

Hi @elvis197686, did you have a chance to test the latest master branch?

@elvis197686
Author

Yes, we have re-run our performance tests with a build from the master branch, and they did complete.

However, the amount of memory used still went up significantly. We are going to do more tests tomorrow, but I suspect it won't take much more for the memory limit to be exceeded.

BTW, we have tried the MEMORY PURGE command from redis-cli, but we got ERR command 'MEMORY PURGE' is not allowed. Not sure it would help much anyway.

@swilly22
Collaborator

@elvis197686, would it be possible to get your testing script(s) or a list of queries you're using?
Appreciate it.

@elvis197686
Author

Yes, we will provide more information to help reproduce the problem when we have completed the tests, but it might be a few days.

Author

elvis197686 commented Apr 26, 2019

@swilly22 The test is the same as described in the initial report: we did an initial ingestion of a linear DAG of roughly 1,000 nodes, then executed the same command repeatedly for 30 minutes. The test hasn't reached the end in any of our runs; in the last run we started getting the following errors about halfway through:

{  
   "errors":[  
      ....
      "message":"Error executing command MATCH (s:TOP)-[:HAS]->(:MIDDL:E)-[:HAS]->(:BOTTOM) WHERE s.active=true RETURN DISTINCT s ORDER BY s.displayOrder; nested exception is redis.clients.jedis.exceptions.JedisDataException: OOM command not allowed when used memory > 'maxmemory'.",
      ....
   ]
}

And here is the graph showing the memory usage; the test ran from around 13:45 to 14:00.
[image: memory usage graph]

@elvis197686
Author

Here is the info about the server:
[image: server info]

@WitchsCat

We had the same issue on our prod and dev clusters.

Read queries cause the amount of memory consumed to grow on the master node and on the slave nodes.

@jeffreylovitz
Contributor

Some parser tokens (strings provided in user queries) currently cause memory leaks, so this is an outstanding problem that becomes noticeable with high query volumes.

We're replacing our current parser entirely (#454), which I expect should be completed in about 3 weeks. Those leaks should not be an issue after the replacement, so I think that trying to patch the current one wouldn't be an effective use of time!


@jnsnjspr

Taking over from elvis197686: we can confirm v2 M6 still has the memory leak (ever-climbing memory usage), but to a lesser extent than v1.

This is using exactly the same setup as elvis197686 posted previously.

Collaborator

K-Jo commented Dec 13, 2019

@elvis197686 major patches have been made in #785, which is included in the latest M07 and M08 releases. Could you please validate this again and, if it's still leaking, send us the queries that are leaking?


2ZZ commented Dec 16, 2019

Hi @K-Jo, replying on behalf of @jnsnjspr

We are still seeing the memory leak in M08:
[image: memory usage graph]

This is a sample of the queries: https://gist.githubusercontent.com/2ZZ/9c95d3cb423fb91b562a751c53a798b9/raw/3dd939df6ca9b51b51cf96264054695778941f78/redisgraph-memoryleak.log

@jeffreylovitz
Contributor

Hey @2ZZ,

Thanks for the gist! That was very helpful.

I converted it into a script to reproduce your findings, and also observed the memory leak: https://gist.github.com/jeffreylovitz/59c2ee0b5bcd1423e8c85571fb9522d4#file-a_results-md

I found that much of the memory could be reclaimed with the Redis command MEMORY PURGE, which indicates that jemalloc is holding some allocations for its own use rather than returning them to the system. This is fine provided that it does not become too overzealous - are you still able to trigger OOM errors on your server?

Some of the memory loss can't be accounted for by jemalloc, so we'll continue looking into this. If you have the opportunity, let us know how MEMORY PURGE affects your memory consumption!
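
If it helps, here is a rough sketch of that before/after check (assuming redis-py; used_memory is Redis' own accounting, while used_memory_rss is what the OS sees, so allocator retention shows up as a gap between the two):

import redis

r = redis.Redis(host="localhost", port=6379)

def snapshot(label):
    info = r.info("memory")
    print(label,
          "used_memory:", info["used_memory"],
          "used_memory_rss:", info["used_memory_rss"],
          "fragmentation:", info["mem_fragmentation_ratio"])

snapshot("before purge")
r.execute_command("MEMORY", "PURGE")   # ask jemalloc to release retained pages back to the OS
snapshot("after purge")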


2ZZ commented Jan 8, 2020

Hi @jeffreylovitz
We tried running the same test again with MEMORY PURGE being called every 10 minutes; unfortunately, the result was the same and Redis stopped accepting commands once it hit its memory limit.
[image: memory usage graph]

@swilly22 swilly22 removed this from In progress in RedisGraph 2.0 Feb 4, 2020
@jeffreylovitz
Contributor

We will resume testing on this using the sample queries provided in #456 (comment).

What are the details of the server configuration, especially with regard to clustering and replication?


2ZZ commented Feb 5, 2020

Hi @jeffreylovitz

We're doing this in Kubernetes, there is a single pod with 1 Redis container and 1 Sentinel container. Both running redislabs/redisgraph:1.99.7

Redis config:

dir "/data"
maxmemory 8gb
maxmemory-policy noeviction
min-slaves-max-lag 5
min-slaves-to-write 0
rdbchecksum yes
rdbcompression yes
repl-diskless-sync yes
save 900 1

slave-announce-ip 10.100.134.31
slave-announce-port 6379

Sentinel config:

sentinel myid b8054d39d0fa534b01d074bc2f36dd442cfe3afd
sentinel monitor redis-master 10.100.134.31 6379 1
dir "/data"
sentinel down-after-milliseconds redis-master 10000
sentinel failover-timeout redis-master 180000
sentinel parallel-syncs redis-master 5
sentinel announce-ip 10.100.134.31
sentinel announce-port 26379

@jesper-wh

I can confirm that the memory leak for our workflow, as outlined by elvis197686 in the first post (#456 (comment)), is fixed in v2.0.5.

@jeffreylovitz
Contributor

Thanks, @jesper-wh and all!
