Skip to content
This repository has been archived by the owner on Jun 9, 2024. It is now read-only.

Octo-RPKI process RAM usage #37

Closed
spinoshi opened this issue Mar 22, 2020 · 15 comments
Closed

Octo-RPKI process RAM usage #37

spinoshi opened this issue Mar 22, 2020 · 15 comments
Assignees

Comments

@spinoshi
Copy link

spinoshi commented Mar 22, 2020

Hello folks,
more than an issue, a question.
We are running octo-rpki for a couple of months, and our monitoring system is warning us about RAM usage. We have a VM with 8GB of RAM + 1GB swap, and most of the time, the situation is this:

              total        used        free      shared  buff/cache   available
Mem:           8017        2952        4101           0         963        4966
Swap:          1023          30         993

But sometimes octo-rpki process starts using all the available memory: so the free memory is just few hundred MB and we get the alert. Is anyone having the same situation? Any hint?
We use:

 dpkg -l | grep RPKI
ii  gortr                                 0.11.4                                          amd64        GoRTR: a RPKI-to-Router server
ii  octorpki                              1.1.4                                           amd64        OctoRPKI: a RPKI validator

on Ubuntu 18.04.
Thanks!

@lspgn lspgn self-assigned this Apr 6, 2020
@lspgn
Copy link
Contributor

lspgn commented Apr 6, 2020

@spinoshi thank you for your report. This indeed seems strange.
Could you sent me a pprof? (at the moment I don't have an easier way for debugging)
On line:
https://github.com/cloudflare/cfrpki/blob/master/cmd/ctrpki/ctrpki.go#L19
add

_ "net/http/pprof"

Then sharing a few profiles http://localhost:8080/debug/pprof/heap at startup and if the RAM usage becomes too high.

@tias77
Copy link

tias77 commented Jun 23, 2020

I'm also seeing heavy memory usage in octorpki compared to other solutions. Freshly started octorpki consumes ~2GB and the usage just increases from there. After a week or two it touches 4GB. Other servers (routinator and fort) uses less than 100MB steadily.

@nward
Copy link

nward commented Jun 29, 2020

We're trialing octorpki as well - will get some pprof data and share.

@nward
Copy link

nward commented Jun 29, 2020

Also worth noting - we're tracking gortr keeping open a lots of connections to octorpki - not sure if this is the source of this memory issue. @spinoshi do you see lots of stale gortr > octorpki connections kept open?

@lspgn
Copy link
Contributor

lspgn commented Jun 29, 2020

@nward are you using the latest version of GoRTR? I'll check if it's correctly closing.

@nward
Copy link

nward commented Jun 29, 2020

yep - 0.14.4

@lspgn
Copy link
Contributor

lspgn commented Jun 30, 2020

I double checked, was not properly closing the connections.
Could you give a try to cloudflare/gortr#72

@MorningLightMountain713
Copy link
Contributor

MorningLightMountain713 commented Jul 8, 2020

@lspgn We have applied the fix for gortr and this has fixed the issue with multiple open connections between gortr and octorpki.

However we are still seeing increasing memory usage with octorpki. I have attached a few pprof profiles. The top command shows memory usage is increasing steadily, res started out around 1.7g and after approx 18 hours has increased to 1.9g.

Please note the attached heaps.tar.zip is just a tarfile - added .zip so github would accept it.

heaps.tar.zip

@lspgn
Copy link
Contributor

lspgn commented Jul 8, 2020

Thank you, I will have a look.

@MorningLightMountain713
Copy link
Contributor

Continuing on from the above, now sitting at 3.2g usage for octorpki process. Here is another snap.

heap_140720.txt

@jtkristoff
Copy link

I've seen what may be a related problem. I'll try to do some additional monitoring as well. I'll include some output I do have from an earlier out of memory crash if it is of any help:

[...]
INFO[1909730] Rsync sync rsync://rpki.ripe.net/ta/ripe-ncc-ta.cer
INFO[1909732] Rsync sync rsync://rpkica.mckay.com/rpki/MCnet/
INFO[1909733] Rsync sync rsync://cc.rg.net/rpki/RGnet-cc/
ERRO[1909735] Error adding Resource 74E1D3EE651311EA83ECEE54F8AEA228.roa: Could 
not validate certificate due to expiration date 157b90727d015d070cc1f977bdbce687
f5344d9c: Certificate end of validity: 2020-03-13 10:14:36 +0000 UTC is before: 
2020-05-31 19:56:21.65687108 +0000 UTC
ERRO[1909743] Error exploring file: open cache/rpki.cnnic.cn/rpki/A9162E3D0000/7
3/CfUv1bVUg5EXd8PcpEpl08lfhYA.mft: no such file or directory
ERRO[1909743] Error exploring file: open cache/rpki.cnnic.cn/rpki/A9162E3D0000/6
43/PgsdlIaQ7Nxy4-Rg5eY-3pU8JOg.mft: no such file or directory
ERRO[1909743] Error exploring file: open cache/rpki.cnnic.cn/rpki/A9162E3D0000/4
23/Xb2Ek-hXQR0lkZ2GO7t9n_zfQCY.mft: no such file or directory
ERRO[1909743] Error adding Resource HCA8w1r7tOjORLbFUIT2dgPwbNE.roa: Certificate
 was revoked by issuer 1c203cc35afbb4e8ce44b6c55084f67603f06cd1
ERRO[1909743] Error adding Resource I4H8sMngqZ2Zy-SwjFrX1rxaW1s.roa: Certificate
 was revoked by issuer 2381fcb0c9e0a99d99cbe4b08c5ad7d6bc5a5b5b
ERRO[1909743] Error adding Resource O9PqoAj_WzgETD_1WF_9bOBHh0E.roa: Certificate
 was revoked by issuer 3bd3eaa008ff5b38044c3ff5585ffd6ce0478741
ERRO[1909743] Error adding Resource OvLhPkEnHO_zJk5Wu0H9pSKWm5k.roa: Certificate
 was revoked by issuer 3af2e13e41271ceff3264e56bb41fda522969b99
ERRO[1909743] Error adding Resource aET8SccSusg_odabkY5k2oHFoNs.roa: Certificate
 was revoked by issuer 6844fc49c712bac83fa1d69b918e64da81c5a0db
ERRO[1909743] Error adding Resource beRcMl46Qsx8jDL5WyPT5Mh9yuU.roa: Certificate
 was revoked by issuer 6de45c325e3a42cc7c8c32f95b23d3e4c87dcae5
ERRO[1909743] Error adding Resource c590Zx7vUgKibBkMpQqjGX-wLwY.roa: Certificate
 was revoked by issuer 739f74671eef5202a26c190ca50aa3197fb02f06
ERRO[1909743] Error adding Resource pky2i1jkAIp2zsZSfhGKOBYIGW4.roa: Certificate
 was revoked by issuer a64cb68b58e4008a76cec6527e118a381608196e
ERRO[1909743] Error adding Resource yiDl9SIs5EoV--4SfiNjC8jgo3A.roa: Certificate
 was revoked by issuer ca20e5f5222ce44a15fbee127e23630bc8e0a370
ERRO[1909768] Error exploring file: open cache/rpkica.mckay.com/rpki/MCnet/UEh2S
AvdIgPsUFdv92RSSaNqBnY.mft: no such file or directory
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x99778c, 0x16)
        /usr/lib/go-1.11/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xc1c8000000, 0x4000000, 0xddbab8)
        /usr/lib/go-1.11/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0xdc2660, 0x4000000, 0xdc2678, 0x7f8bd4c89e68)
        /usr/lib/go-1.11/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0xdc2660, 0x1, 0x0)  
        /usr/lib/go-1.11/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0xdc2660, 0x1, 0xddbac8, 0x400)
        /usr/lib/go-1.11/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0xdc2660, 0x1, 0x10, 0x7f8bd2fd0fff)
        /usr/lib/go-1.11/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
        /usr/lib/go-1.11/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0xdc2660, 0x1, 0x7f8bd2010010, 0x7f8bd4c89e68)
        /usr/lib/go-1.11/src/runtime/mheap.go:758 +0x8a
runtime.(*mcentral).grow(0xdc3d98, 0x0)
        /usr/lib/go-1.11/src/runtime/mcentral.go:232 +0x94
runtime.(*mcentral).cacheSpan(0xdc3d98, 0x1fe)
        /usr/lib/go-1.11/src/runtime/mcentral.go:106 +0x2f8
runtime.(*mcache).refill(0x7f8c2ed2ad80, 0xc000036a10)
        /usr/lib/go-1.11/src/runtime/mcache.go:122 +0x95
runtime.(*mcache).nextFree.func1()
        /usr/lib/go-1.11/src/runtime/malloc.go:749 +0x32
runtime.systemstack(0xc00007d200)
        /usr/lib/go-1.11/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
        /usr/lib/go-1.11/src/runtime/proc.go:1229

goroutine 1 [running]:
runtime.systemstack_switch()
        /usr/lib/go-1.11/src/runtime/asm_amd64.s:311 fp=0xc1c5fc5698 sp=0xc1c5fc5690 pc=0x457880
runtime.(*mcache).nextFree(0x7f8c2ed2ad80, 0x10, 0x68, 0x8ce920, 0x428f00)
        /usr/lib/go-1.11/src/runtime/malloc.go:748 +0xb6 fp=0xc1c5fc56f0 sp=0xc1c5fc5698 pc=0x40cea6
runtime.mallocgc(0x70, 0x8ce920, 0xa12a01, 0x11)
        /usr/lib/go-1.11/src/runtime/malloc.go:903 +0x793 fp=0xc1c5fc5790 sp=0xc1c5fc56f0 pc=0x40d7f3
runtime.makeslice(0x8ce920, 0x7, 0x7, 0x1, 0x6, 0x7)
        /usr/lib/go-1.11/src/runtime/slice.go:70 +0x77 fp=0xc1c5fc57c0 sp=0xc1c5fc5790 pc=0x442947
strings.genSplit(0xc07dc8cb48, 0x95, 0x98f846, 0x1, 0x0, 0x7, 0xc1c3c39448, 0xc1c3c025a0, 0xc1c3c39430)
        /usr/lib/go-1.11/src/strings/strings.go:251 +0x6b fp=0xc1c5fc5820 sp=0xc1c5fc57c0 pc=0x4e515b
strings.Split(0xc07dc8cb48, 0x95, 0x98f846, 0x1, 0xc1c3c02628, 0x1, 0x7)
        /usr/lib/go-1.11/src/strings/strings.go:303 +0x5b fp=0xc1c5fc5878 sp=0xc1c5fc5820 pc=0x4e54eb
github.com/cloudflare/cfrpki/sync/lib.AddInMap(0xc07dc8cb40, 0x9d, 0xc0580de660)
        /home/jtk/go/src/github.com/cloudflare/cfrpki/sync/lib/utils.go:25 +0x7c fp=0xc1c5fc5900 sp=0xc1c5fc5878 pc=0x727c1c
main.(*state).MainReduce(0xc000106c60, 0x6c8f06430a2f1)
        /home/jtk/go/src/github.com/cloudflare/cfrpki/cmd/octorpki/octorpki.go:271 +0x275 fp=0xc1c5fc5b70 sp=0xc1c5fc5900 pc=0x86d555
main.main()
        /home/jtk/go/src/github.com/cloudflare/cfrpki/cmd/octorpki/octorpki.go:990 +0x111d fp=0xc1c5fc5f98 sp=0xc1c5fc5b70 pc=0x8763fd
runtime.main()
        /usr/lib/go-1.11/src/runtime/proc.go:201 +0x207 fp=0xc1c5fc5fe0 sp=0xc1c5fc5f98 pc=0x42e7f7
runtime.goexit()
        /usr/lib/go-1.11/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc1c5fc5fe8 sp=0xc1c5fc5fe0 pc=0x459961

goroutine 24 [IO wait, 31829 minutes]:
internal/poll.runtime_pollWait(0x7f8c2a2b9f00, 0x72, 0x0)
        /usr/lib/go-1.11/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000238018, 0x72, 0xc00006c000, 0x0, 0x0)
        /usr/lib/go-1.11/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000238018, 0xffffffffffffff00, 0x0, 0x0)
        /usr/lib/go-1.11/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc000238000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/lib/go-1.11/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc000238000, 0xc00007a000, 0x30, 0x30)
        /usr/lib/go-1.11/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc00000e070, 0x50, 0x7f8c2ed2a000, 0x0)
        /usr/lib/go-1.11/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).AcceptTCP(0xc00000e070, 0x40dbd8, 0x30, 0x94c100)
        /usr/lib/go-1.11/src/net/tcpsock.go:247 +0x47
net/http.tcpKeepAliveListener.Accept(0xc00000e070, 0x94c100, 0xc00001e780, 0x8f50e0, 0xdabca0)
        /usr/lib/go-1.11/src/net/http/server.go:3232 +0x2f
net/http.(*Server).Serve(0xc0001d60d0, 0xa1e3c0, 0xc00000e070, 0x0, 0x0)
        /usr/lib/go-1.11/src/net/http/server.go:2826 +0x22f
net/http.(*Server).ListenAndServe(0xc0001d60d0, 0xc0001d60d0, 0xc00000c4e0)
        /usr/lib/go-1.11/src/net/http/server.go:2764 +0xb6
net/http.ListenAndServe(0x7fffa110f2e2, 0xe, 0x0, 0x0, 0xc00000c4e0, 0xc00001e600)
        /usr/lib/go-1.11/src/net/http/server.go:3004 +0x74
main.(*state).Serve(0xc000106c60, 0x7fffa110f2e2, 0xe, 0x992aba, 0xb, 0x99143b, 0x8, 0x99089d, 0x6, 0x98f841, ...)
        /home/jtk/go/src/github.com/cloudflare/cfrpki/cmd/octorpki/octorpki.go:822 +0x427
created by main.main
        /home/jtk/go/src/github.com/cloudflare/cfrpki/cmd/octorpki/octorpki.go:932 +0x1c58

@lspgn
Copy link
Contributor

lspgn commented Jul 31, 2020

Would you be able to give a try to 1.2.0-prerelease?
Internally, I am observing 1.2GB and less fluctuations in RAM.

@jtkristoff
Copy link

I just installed and started running that yesterday. Will report back if I see something.

@MorningLightMountain713
Copy link
Contributor

Screenshot 2020-08-04 09 01 31

This shows version v1.1.4 vs v.1.2.0_pre - change was made 7/31. Memory usage is looking much better.

@lspgn
Copy link
Contributor

lspgn commented Oct 29, 2020

Thank you for testing. The new version has been running in our staging environment for a while and I released it today.
Feel free to reopen the ticket if the issue persists.

@lspgn lspgn closed this as completed Oct 29, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants