CacheKV has no size constraint which would cause memory leak #13834
Comments
Same issue. |
Thanks for opening the issue. We will look into this. |
Normally, shouldn't the CacheKV be reset at the commit event? |
Sharing the heap profiles we captured before and after the memory jump, in case it helps. Subtracting the two profiles shows that most of the growth is in iavl.MakeNode. I think those nodes ought to be garbage collected, but I'm not sure why they aren't. My rough triage suggests that the cachekv is what holds the references to the nodes.
|
cachekv caches the key values directly, |
@yihuang we set it to 15625000. I found that the default capacity of 781250 occupies around 50 MB in memory, so by calculation 15625000 corresponds to about 1 GB, I think. |
One case that comes to mind where the cachekv cache won't be cleared is grpc-only, where there's no ABCI event to reset the cache. |
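The estimate above can be checked with a short calculation. This is only a sketch: it assumes memory scales linearly with the configured capacity and reads "50mb" as 50 MiB, neither of which has been measured here. The function name estimateBytes is made up for illustration.

```go
package main

import "fmt"

// Figures quoted in the comment above. Treating "50mb" as 50 MiB is an
// assumption; the per-entry cost is derived, not measured.
const (
	defaultCapacity  int64 = 781250   // default cache capacity, in entries
	defaultFootprint int64 = 50 << 20 // footprint reported for the default capacity, in bytes
)

// estimateBytes linearly scales the reported footprint to another capacity.
// Linear scaling is an assumption, not a property verified against the code.
func estimateBytes(capacity int64) int64 {
	return capacity * defaultFootprint / defaultCapacity
}

func main() {
	const configured int64 = 15625000 // the value set on the archive node
	fmt.Printf("capacity ratio: %dx\n", configured/defaultCapacity)
	fmt.Printf("estimated footprint: %d MiB\n", estimateBytes(configured)>>20)
}
```

The configured capacity is 20 times the default, so the estimate comes out at 1000 MiB, which matches the "1GB" figure in the comment.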
Are you running |
We're not running with |
How did you subtract the profiles? And how do you open the heap file? I tried to open it in a browser, but it can't display the complete picture. |
After extracting the zip file, you can run this command
to get the heap diff |
|
Oh, it seems I uploaded the wrong file; please refer to this one: |
there's only 100M created by |
The 100M+ is a sampled result from Go's heap profiler, so we should look at the proportions.
Yes it is. The point, I think, is that there's no size limit on the loaded cold data. There are two layers of cache, the iavl layer and the cachekv layer, |
Agreed. The only limit is that the cache is reset at each block cycle, but a surge of query requests between resets could cause a surge in memory usage. |
Is the cachekv reset guaranteed to run? If so, it should be frequent, since a new block arrives every few seconds. In our case it doesn't seem to reset at all, since the memory just rises and never falls. |
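To illustrate the point about growth between resets, here is a toy read-through cache. It is a hypothetical stand-in, not the SDK's cachekv type: the names readThroughCache, Get, Reset and Len are invented for this sketch, and the backing store is just a map.

```go
package main

import "fmt"

// readThroughCache is a hypothetical cache layer that is only emptied at a
// block boundary. Nothing limits its size between two resets.
type readThroughCache struct {
	backing map[string][]byte // stands in for the underlying store
	cached  map[string][]byte // gains one entry per distinct key read
}

func newReadThroughCache(backing map[string][]byte) *readThroughCache {
	return &readThroughCache{backing: backing, cached: make(map[string][]byte)}
}

// Get returns the value for key, caching it on first read.
func (c *readThroughCache) Get(key string) []byte {
	if v, ok := c.cached[key]; ok {
		return v
	}
	v := c.backing[key]
	c.cached[key] = v
	return v
}

// Reset drops all cached entries, as would happen once per block cycle.
func (c *readThroughCache) Reset() { c.cached = make(map[string][]byte) }

// Len reports how many entries are currently cached.
func (c *readThroughCache) Len() int { return len(c.cached) }

func main() {
	backing := make(map[string][]byte)
	for i := 0; i < 1000; i++ {
		backing[fmt.Sprintf("key-%d", i)] = []byte("value")
	}
	c := newReadThroughCache(backing)
	for i := 0; i < 1000; i++ { // a burst of queries within one block interval
		c.Get(fmt.Sprintf("key-%d", i))
	}
	fmt.Println("cached before reset:", c.Len())
	c.Reset()
	fmt.Println("cached after reset:", c.Len())
}
```

In this model the cache size between two resets equals the number of distinct keys read, so if the reset never runs, the map only ever grows.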
Yes, the cache is cleared on |
@cifer76 are you still running into this issue? The last comment suggests it could be something else |
Summary of Bug
We have an archive node whose memory usage rises constantly and never falls. The memory rises sharply especially when we make RPC calls to it for historical state.
We took a heap profile, looked into the code, and found there's a cachekv layer which holds pointers to the underlying iavl nodes. The cachekv uses a map to hold the pointers, and the issue is that nothing else constrains the map to limit how many iavl nodes it can hold.
For a pruned node, iavl nodes keep being pruned as the block number grows. For an archive node, the iavl nodes are never pruned, so as the chain grows, the iavl nodes loaded into memory (by block sync, RPC calls, etc.) stay there permanently. They won't be GCed because the cachekv also holds pointers to them permanently.
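For comparison, one way to put a size constraint on such a map is least-recently-used eviction. The following is a minimal sketch of that idea, not a proposed patch: boundedCache and its methods are hypothetical names, and it ignores the fact that a real cachekv would have to keep unwritten (dirty) entries until they are flushed, so eviction as shown would only be safe for clean, read-only entries.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores.
type entry struct {
	key   string
	value []byte
}

// boundedCache is a hypothetical size-limited cache with LRU eviction.
type boundedCache struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element in order
}

func newBoundedCache(limit int) *boundedCache {
	return &boundedCache{limit: limit, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks the key as recently used.
func (c *boundedCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Set stores a value and evicts the least recently used entry once the
// limit is exceeded, which lets the evicted value be garbage collected.
func (c *boundedCache) Set(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Len reports how many entries are currently held.
func (c *boundedCache) Len() int { return len(c.items) }

func main() {
	c := newBoundedCache(2)
	c.Set("a", []byte("1"))
	c.Set("b", []byte("2"))
	c.Set("c", []byte("3")) // exceeds the limit, so "a" is evicted
	_, ok := c.Get("a")
	fmt.Println("a still cached:", ok, "entries:", c.Len())
}
```

With a bound like this, the number of nodes pinned in memory is capped by the limit rather than by how much historical state has been queried.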
Version
v0.8.2