
Memory leak on raw vrrp socket ? #839

Closed
Turgon37 opened this issue Apr 17, 2018 · 28 comments

Comments

@Turgon37

Hello, I'm encountering an issue which seems similar to #658
In our infrastructure we have two nodes with keepalived installed.

The configuration is simple, with just one virtual IP address shared between these two nodes:

vrrp_instance VRID_22 {
  interface eth0
  state SLAVE
  virtual_router_id 22
  priority 100
  track_script {
    check_fluentd
  }
  virtual_ipaddress {
    10.0.0.1 dev eth0
  }
}

We are noticing that the slave/backup node (the one which does not hold the VRRP IP address) appears to suffer from a memory leak. For an unknown reason, the server's used memory keeps growing over a few hours until memory is completely exhausted.
As you can see in our monitoring system, the memory usage is simply growing:

(attached graphs: memory overview (zoomed) and memory overview)

By analysing the processes that are currently running and their resident memory size, we cannot find "the" guilty process (see our ps_trace.txt here).

But, as I was suspicious of keepalived, I chose to restart the service:

root@node02 ~ # ss -a | awk '{ if ($3 > 0 or $4 > 0) {print $0} }'
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port   
raw    UNCONN     37749184 0                    *:vrrp                  *:*       

root@node02 ~ # free -h
             total       used       free     shared    buffers     cached
Mem:          2.0G       1.9G        79M        40M       1.8M        81M
-/+ buffers/cache:       1.8G       163M

root@node02 ~ # systemctl restart keepalived

root@node02 ~ # free -h
             total       used       free     shared    buffers     cached
Mem:          2.0G       481M       1.5G        40M       3.3M        87M
-/+ buffers/cache:       390M       1.6G

root@node02 ~ # ss -a | awk '{ if ($3 > 0 or $4 > 0) {print $0} }'
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port   
raw    UNCONN     3520   0                    *:vrrp                  *:*

This reveals that keepalived is responsible for the leak. Please note that we have tuned the input socket buffer size using sysctl (for another reason), and in fact, according to my tests, the memory leak only happens with these tweaks. With the Linux default values the problem does not happen:

net.core.rmem_default = 37748736
net.core.rmem_max = 37748736

I'm quite sure that the backup keepalived node is not reading its input VRRP socket; this is visible in the receive queue size of the socket (see the ss command output above).
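For reference, the effect can be reproduced outside keepalived with any raw socket bound to IP protocol 112 that is never read; below is a minimal stand-alone sketch (a hypothetical test program, not keepalived code), assuming the master keeps sending adverts to the 224.0.0.18 group:

/* Hypothetical reproducer, not keepalived code: open a raw socket for IP
 * protocol 112 (VRRP), join the VRRP multicast group and never read from it.
 * The Recv-Q shown by `ss -a` then grows with every advert the master sends,
 * up to the limit derived from net.core.rmem_default. Run as root. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_RAW, 112);             /* 112 = VRRP */
    if (fd < 0) { perror("socket"); return 1; }

    struct ip_mreqn mreq;
    memset(&mreq, 0, sizeof(mreq));
    mreq.imr_multiaddr.s_addr = inet_addr("224.0.0.18"); /* VRRP group */
    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0)
        perror("IP_ADD_MEMBERSHIP");

    for (;;)        /* never call recv(): the kernel queues the adverts */
        pause();
}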

Do you have any idea that could help us fix this problem?

Regards

@pqarmitage
Collaborator

This isn't a memory leak, since the memory is recovered when keepalived is restarted. It does indeed appear that keepalived is not reading the received adverts.

It is fascinating that the problem seems to only occur for you when the input buffer socket size is set to 37748736. I have tried this setting on my system and I don't experience the problem that you are experiencing.

I have attached a patch that sets the vrrp receive and send socket buffer sizes to 212992 (the default on my system). Can you try applying the patch and see if that resolves the problem you are experiencing? If it does, then we will need to have a think about how to best resolve the problem, although it doesn't look to me as though it is a keepalived problem, but more to do with the kernel.

839.patch.txt
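For illustration, the general technique the patch uses is to set an explicit buffer size on the socket itself, which overrides net.core.rmem_default. A minimal sketch, assuming a buffer size of 212992 (the function name is illustrative; the exact code and error handling in the patch may differ):

#include <stdio.h>
#include <sys/socket.h>

/* Sketch only: force fixed receive/send buffer sizes on an already-open
 * socket instead of inheriting net.core.rmem_default / wmem_default. */
static void set_vrrp_buf_sizes(int fd)
{
    int size = 212992;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) < 0)
        perror("setsockopt(SO_RCVBUF)");
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) < 0)
        perror("setsockopt(SO_SNDBUF)");
    /* The kernel doubles the requested value for its own bookkeeping and
     * caps it at net.core.rmem_max / net.core.wmem_max. */
}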

@pqarmitage
Collaborator

This won't make a difference to your problem but state SLAVE should be state BACKUP.

@Turgon37
Author

@pqarmitage thanks for your note about the state configuration value. I had seen this mistake, but according to the keepalived source code it is not a "real mistake". In fact I've found why this mistake was not noticed before: it is due to the missing else statement here

log_message(LOG_INFO,"(%s): unknown state '%s', defaulting to BACKUP", vrrp->iname, str);

@Turgon37
Author

But about your message above, I disagree with you: the Linux kernel frees any allocated memory on program shutdown, so this is typically what I call a memory leak, i.e. an increasing, unhandled memory consumption generated while the program is running. In this case it does appear to come from kernel space, because this amount of memory is not reported in the information given by the ps utilities, and to be caused by the input socket buffer.

So thanks for the patch, I will compile it and test it ASAP.

@pqarmitage
Collaborator

@Turgon37 Re #839 (comment) I think the code is correct, and there isn't a missing else. The parameters that are supported with the state keyword are MASTER and BACKUP; if something other than one of those is specified, e.g. SLAVE, it correctly logs that the state (e.g. SLAVE) is unknown, and that it is defaulting to BACKUP.
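Roughly, the logic being described has this shape (a sketch only, not the actual keepalived source; the function name is illustrative and printf stands in for log_message()):

#include <stdio.h>
#include <string.h>

enum vrrp_init_state { STATE_BACKUP, STATE_MASTER };

/* Only MASTER and BACKUP are recognised; anything else (e.g. SLAVE) is
 * logged as unknown and treated as BACKUP. */
static enum vrrp_init_state parse_initial_state(const char *iname, const char *str)
{
    if (!strcmp(str, "MASTER"))
        return STATE_MASTER;
    if (strcmp(str, "BACKUP"))
        printf("(%s): unknown state '%s', defaulting to BACKUP\n", iname, str);
    return STATE_BACKUP;
}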

@pqarmitage
Collaborator

@Turgon37 Re #839 (comment) what we need to understand is why keepalived is not reading from the socket when rmem_default is set to the large value you are using. What you have shown above is that keepalived has allowed the receive socket buffers to completely fill (the receive queue size is slightly larger than rmem_max). You have also shown that restarting keepalived frees about 1.4Gb of memory, whereas the receive socket buffer is 36Mb, so it looks as though there must be something else going on here too.

@Turgon37
Author

OK, my bad: the keepalived version we are using is older than the one in which commit 40e4a17 was applied, so this "logging" feature is consequently not available in our keepalived runtime :)

About the memory "leak", I'm very frustrated because I really cannot find precisely where the memory is consumed. Like you, I understand the output provided by the netstat/ss commands, but the memory usage reported by these tools does not match the amount of memory lost. So we will have to wait until I've applied your patch before continuing to investigate.

Thanks a lot for your help :)

@Turgon37
Author

Turgon37 commented Apr 26, 2018

I'm back with news about this issue: I applied your patch at 15:54:53 yesterday and for now the memory usage seems to be stable.

This is the result with the unpatched 1.4.3 version:
(graph: v1.4.3 unpatched)

Then, with the patched 1.4.3 version:
(graph: v1.4.3 patched)

It could still evolve over the next few hours, but we can already see that the memory usage is reasonable.

@pqarmitage
Collaborator

@Turgon37 Could you please provide the output of ss -a | awk '{ if ($3 > 0 or $4 > 0) {print $0} }' and free -h while running the patched version since it looks like there is still a steady consumption of memory.

It would also be interesting to know how much CPU time keepalived is consuming.

@Turgon37
Author

I'm sorry, the two images above were inverted; I've edited the message to fix their order. The patched version of keepalived is the one that corresponds to the more stable graph.

It appears that the socket is still accumulating data, but much more slowly than before the patch:

root@server # ss -a | awk '{ if ($3 > 0 or $4 > 0) {print $0} }'
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port   
nl     UNCONN     4352   0              tcpdiag:ss/3842                *       
nl     UNCONN     768    0              tcpdiag:kernel                 *       
raw    UNCONN     213312 0                    *:vrrp                  *:*
root@server # free -h
             total       used       free     shared    buffers     cached
Mem:          2.0G       1.7G       305M        40M       279M       744M
-/+ buffers/cache:       680M       1.3G
Swap:         487M       131M       356M

@pqarmitage
Collaborator

The reason that the receive queue is smaller with the patch applied is that the patch causes keepalived to explicitly set the receive queue size for the vrrp sockets to 212992, i.e. it is overriding the net.core.rmem_default = 37748736 setting you have, and using a more 'standard' value for the receive queue size for the sockets.

There is still the issue of why the receive queue for the vrrp socket is growing up to its limit. It is almost as though the keepalived vrrp process has stopped processing anything. Can you try sending a SIGUSR1 signal to the parent keepalived process, and posting the output of the /tmp/keepalived.data file that should be created? It would be helpful to know the time the file was created too.

The output of ps -eo pmem,pcpu,vsize,pid,cmd would be interesting to see, especially if executed twice, a couple of hours apart, so we can see how process memory usage is growing.

@Turgon37
Author

I've put the files you asked for in the following zip; I hope it can help you work on this issue.

exports.zip

Regards

@pqarmitage
Collaborator

pqarmitage commented May 7, 2018

From the two ps files it doesn't appear that keepalived itself is consuming additional memory over time. It also seems that the backup keepalived is still receiving adverts from the master, since otherwise it would transition to master itself.

You have stated "Under Linux defaults values [rmem_default] the problem do not happen". Can you check whether when running with the default values, the Recv-Q as reported by ss grows, or whether it remains at or close to 0.

It looks as though, despite the backup process receiving the vrrp adverts from the master, the kernel isn't freeing up the resources used to deliver the packets to the vrrp process, and this is gradually over time consuming more and more memory. When later the keepalived process is terminated, all the resources that the kernel is holding for the keepalived processes are released, and you have plenty of free memory again. At the moment this looks to me to be more of a kernel issue than a keepalived issue. Does the same problem occur if the system that is normally the master is the backup, and vice versa (you can make this happen by adding the nopreempt keyword in the vrrp_instance block and restarting the master instance of keepalived)?

Can you provide a description of the systems that you are running keepalived on, i.e. what distro, what version, what kernel version etc. Are you running in a virtual machine or container, and if so a description of that environment as well as of the host environment. If I can run that in a virtual machine I might be able to reproduce the problem.

@pqarmitage
Collaborator

Closing since no response for over 2 weeks. If you have more information re this issue, please report it here and the issue can be reopened.

@Turgon37
Author

Turgon37 commented May 23, 2018

@pqarmitage Hello, I'm sorry to have been away so long, I was working on higher priority production problems.

I] About your first query

If I reset rmem to its default values:

Before, the VRRP socket had filled to a strange value => 6885120

root@server ~ # grep -Ev '^#|^$' /etc/sysctl.conf 
net.core.rmem_default = 37748736
net.core.rmem_max = 37748736
root@server ~ # ss -a | awk '{ if ($3 > 0 or $4 > 0) {print $0} }'
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port   
nl     UNCONN     4352   0              tcpdiag:ss/26668                *       
nl     UNCONN     768    0              tcpdiag:kernel                 *       
raw    UNCONN     6885120 0                    *:vrrp                  *:*       

After resetting to the default values:

root@server ~ # sysctl net.core.rmem_default=212992; sysctl net.core.rmem_max=212992
net.core.rmem_default = 212992
net.core.rmem_max = 212992
root@server ~ # systemctl restart keepalived
root@server ~ # ss -a | awk '{ if ($3 > 0 or $4 > 0) {print $0} }'
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port   
nl     UNCONN     4352   0              tcpdiag:ss/28161                *       
nl     UNCONN     768    0              tcpdiag:kernel                 *       
raw    UNCONN     9152   0                    *:vrrp                  *:*

And some minutes later (around 10), the socket is full at the sysctl value:

root@server ~ # ss -a | awk '{ if ($3 > 0 or $4 > 0) {print $0} }'
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer Address:Port   
nl     UNCONN     4352   0              tcpdiag:ss/3212                *       
nl     UNCONN     768    0              tcpdiag:kernel                 *       
raw    UNCONN     213312 0                    *:vrrp                  *:* 

II] About your second question

This problem applies to any backup node. If I trigger a master switch, the old backup becomes the master and stops "leaking" memory. Reciprocally, the old master that became the new backup starts consuming memory at the same rate as the old backup (the two servers have the same amount of physical memory).

III] Our platform runs on Debian
The full lsb_release output is:

root@server ~ # lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 8.10 (jessie)
Release:	8.10
Codename:	jessie

Keepalived version is

v1.2.13 (05/28,2014)

The only sysctl values we have tuned are:

net.core.rmem_default = 37748736
net.core.rmem_max = 37748736
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.conf.lo.arp_ignore = 1
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2

==>> So, if you plan to reproduce the problem, note that below 24M (25165824) the memory consumption stays flat, but at 36M (37748736) and above the memory consumption starts to grow.

Thanks again for your help

@pqarmitage pqarmitage reopened this May 23, 2018
@pqarmitage
Collaborator

I see the keepalived version you are using is VERY old (1448 non-merge commits behind the current release). Could you try running keepalived v1.4.4 and see if you experience the same problems?

@pqarmitage
Collaborator

The issue of the receive buffer queue filling up was resolved in commit b71116f. The first release version of keepalived that it was included in was v1.2.20 on 2 Apr 2016.

If you upgrade to a recent version of keepalived, such as v1.4.4 as suggested above, the problem will be resolved.

@Turgon37
Author

@pqarmitage The v1.2.13 version is the one that is available in the Debian repository, but during my tests above (see #839 (comment)) I temporarily switched to the latest release, with and without the patch.
From what I can see, only your patch seems to solve the problem.

@pqarmitage
Collaborator

@Turgon37 Could you please try running the unpatched version of v1.4.3 or later (v2.0.0 is the current release), and post the output of the following:

  1. keepalived -v
  2. netstat -anp | grep keepalived

@Turgon37
Author

Hello, I've just re-downloaded the home-compiled version of keepalived onto one of our servers.

The keepalived version output

root@server ~ # keepalived -v
Keepalived v1.4.3 (04/11,2018), git commit v1.4.3-2-g27f7e51+

Copyright(C) 2001-2018 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 3.16.51
Running on Linux 3.16.0-6-amd64 #1 SMP Debian 3.16.56-1+deb8u1 (2018-05-08)

Build options:  PIPE2 FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP RTAX_QUICKACK FRA_OIFNAME NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION LVS VRRP VRRP_AUTH VRRP_VMAC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE OLD_CHKSUM_COMPAT FIB_ROUTING SO_MARK

The related sysctl

root@server ~ # sysctl -A | grep net.core.rmem
net.core.rmem_default = 37748736
net.core.rmem_max = 37748736

The netstat output

root@server ~ # netstat -anp | grep keepalived
raw   3355968      0 0.0.0.0:112             0.0.0.0:*               7           16418/keepalived
raw        0      0 0.0.0.0:112             0.0.0.0:*               7           16418/keepalived
unix  2      [ ]         DGRAM                    141458583 16416/keepalived    

Please note that the current value of 3 355 968 bytes is still growing. I suppose it will keep growing until it reaches the sysctl value set above.

I've tried to identify a discrepancy in the memory consumption reported by the available Linux tools: the difference between the ps output (based on /proc/*) and the free output (based on /proc/meminfo) is also growing, but at a much slower rate.

root@server ~ # echo -n 'sum of used RSS memory using ps '; ps faux | sort -k 6 -n | awk 'BEGIN {count=0} { count=count + $6 } END{print count} '; echo used - shared - buffers -cached; echo -n 'calculated RSS used memory using free '; free -k | sed -n '2p' | awk '{ print $3-$5-$6-$7 }'

sum of used RSS memory using ps 456 932
used - shared - buffers -cached
calculated RSS used memory using free 604 228

@pqarmitage pqarmitage reopened this May 31, 2018
@Turgon37
Author

Turgon37 commented Jun 1, 2018

Today I noticed that the memory has grown until it filled all of the available memory:

root@server ~ # free -h
             total       used       free     shared    buffers     cached
Mem:          2.0G       1.9G        73M       8.4M       824K        31M
root@server ~ # echo -n 'sum of used RSS memory using ps '; ps faux | sort -k 6 -n | awk 'BEGIN {count=0} { count=count + $6 } END{print count} '; echo used - shared - buffers -cached; echo -n 'calculated RSS used memory using free '; free -k | sed -n '2p' | awk '{ print $3-$5-$6-$7 }'
sum of used RSS memory using ps 259 064
used - shared - buffers -cached
calculated RSS used memory using free 1 916 444
root@server ~ # netstat -anp | grep keepalived
raw   37749184      0 0.0.0.0:112             0.0.0.0:*               7           16418/keepalived
raw        0      0 0.0.0.0:112             0.0.0.0:*               7           16418/keepalived
unix  2      [ ]         DGRAM                    141458583 16416/keepalived

(graph: memory usage over time)

@pqarmitage
Collaborator

@Turgon37 I have been investigating this further, and at the moment I suspect that the problem is that data is being queued to sockets that are opened for sending and which currently aren't read.

I have produced a patch (rx_bufs.patch-1e57698.txt) which will read data on the send sockets, and log some information if it receives any such data (this patch is for the current git HEAD - commit 1e57698). This version (rx_bufs-27f7e514.patch.txt) is for the version you indicate above that you have built - v1.4.3-2-g27f7e51+.

Could you apply the patch and monitor the output of ss -a | awk '{ if ($3 > 0 or $4 > 0) {print $0} }' to see if it stops the receive buffer queue growing. Also have a look at the keepalived log entries to see if there are any of the following:
Write socket nn, thread_type nn
Read on send socket nn failed ...
Received message on send socket ...
and if there are, post the log messages so that we can see what is happening.

Unfortunately I can't test the patches since I cannot reproduce the problem, but I have run keepalived with the patches applied and it at least doesn't seem to cause problems to keepalived.
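In outline, the diagnostic part of the patches does something like the following on the send-only sockets (a sketch of the technique, not the actual patch code; the function name and messages here are illustrative):

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/* Sketch: drain anything the kernel has queued to a socket that keepalived
 * only uses for sending, so its receive buffer cannot fill, and report that
 * data arrived there at all. */
static void drain_send_socket(int fd)
{
    char buf[2048];
    ssize_t len;

    while ((len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT)) > 0)
        printf("Received message on send socket %d (%zd bytes)\n", fd, len);

    if (len < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
        printf("Read on send socket %d failed (errno %d)\n", fd, errno);
}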

@pqarmitage
Collaborator

@Turgon37 Good news - I can reproduce this problem in a Debian Jessie VM. The patch in the previous comment indeed identifies the issue. Adverts from the master are being queued to the send socket of the backup, however the send socket is configured not to receive any multicast packets.

I'm not seeing this problem on Debian Stretch, but I can see the same problem on CentOS 7 with a 3.10.0 kernel and also on Fedora 16 with a 3.6.11 kernel; my Debian Jessie has a 3.16.0 kernel. I've had a quick look at the kernel sources and can't see anything obvious about any change to the socket option IP_MULTICAST_ALL that would explain the change in behaviour. I suspect it is probably a kernel issue, although it may be something to do with default system settings.
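For reference, the option in question is set per socket; a sketch of how it is cleared (illustrative only, the function name is not from keepalived):

#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Sketch: with IP_MULTICAST_ALL cleared (it defaults to 1), a socket should
 * only receive multicast traffic for groups it has explicitly joined itself.
 * On the affected kernels the adverts are queued to the send socket anyway. */
static void disable_multicast_all(int fd)
{
    int off = 0;

    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_ALL, &off, sizeof(off)) < 0)
        perror("setsockopt(IP_MULTICAST_ALL)");
}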

I will produce a patch to work around this problem. I think it would also be beneficial for keepalived to set the maximum size of the socket receive buffers to something sufficient for keepalived but relatively small, so that the settings of net.core.rmem_default/net.core.rmem_max don't cause system problems.

There also seems to be an issue that when net.core.rmem_default/net.core.rmem_max are set to 37748736, 1.5GB of memory can be freed by restarting keepalived.

@pqarmitage
Collaborator

Commit 6fb5980 is the first patch, and should stop the receive queues growing and causing the system to have memory problems.

@pqarmitage
Collaborator

@Turgon37 Are you still experiencing the problem?

@pqarmitage
Collaborator

@Turgon37 Commit 81a92ab, which was merged last Wednesday (18 July) completely alters the way keepalived stops packets being queued to the send sockets (now rather than receiving and discarding the packets it applies a BPF filter to filter out all such packets). If you are still experiencing problems it would be worth testing with the latest code.
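The general mechanism is a classic BPF program attached with SO_ATTACH_FILTER that accepts zero bytes of every packet. A minimal sketch of that technique (the actual filter and setup in the commit may differ; the function name is illustrative):

#include <linux/filter.h>
#include <stdio.h>
#include <sys/socket.h>

/* Sketch: attach a one-instruction cBPF program ("return 0") to a socket so
 * that every incoming packet is dropped before it reaches the receive queue. */
static void block_all_input(int fd)
{
    static struct sock_filter drop_all[] = {
        { BPF_RET | BPF_K, 0, 0, 0 },
    };
    struct sock_fprog prog = {
        .len = sizeof(drop_all) / sizeof(drop_all[0]),
        .filter = drop_all,
    };

    if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0)
        perror("setsockopt(SO_ATTACH_FILTER)");
}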

@ING-XIAOJIAN

ING-XIAOJIAN commented Dec 22, 2022

Hello, I'm encountering this issue with keepalived 2.0.10. My infrastructure is similar to the original poster's, but the issue is only occasional. When I encountered it, I upgraded keepalived to the latest version (v2.2.7) and it didn't happen again. Still, I worry about this issue happening again, so I want to reproduce it on v2.0.10, understand it in depth, and confirm that the latest version (v2.2.7) has resolved it. Any ideas?

@pqarmitage
Collaborator

@ING-XIAOJIAN It is unlikely that it is this problem that you are experiencing on v2.0.10 since it was resolved a long time before that, and the updated fix of using a BPF filter was added in v2.0.6. It is more likely that you are experiencing the problem of issue #1080, as you have already identified, which wasn't resolved until v2.0.12.
