Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak on raw vrrp socket ? #1973

Closed
xqjcool1 opened this issue Aug 23, 2021 · 14 comments
Closed

Memory leak on raw vrrp socket ? #1973

xqjcool1 opened this issue Aug 23, 2021 · 14 comments

Comments

@xqjcool1
Copy link
Contributor

Describe the bug
Hello, I'm encountering an issue which seems similar to #839
In our infrastructure we have two node with keepalived installed

The configuration is simple with just one virutal ip address share between these two nodes

there is memory leak on vrrp socket.

[root@cpe ~]# ss -pf link
Netid  Recv-Q Send-Q                                                                                     Local Address:Port                                                                                                      Peer Address:Port                
p_raw  204800512 0                                                                                                   rarp:*                                                                                                                     *                      users:(("keepalived",pid=2547,fd=12))
[root@cpe ~]# ps aux | grep keepalived
root       2546  0.0  0.0  71744  1048 ?        Ss   19:00   0:00 /usr/sbin/keepalived -D -P
root       2547  0.0  0.0  71744  1796 ?        S    19:00   0:00 /usr/sbin/keepalived -D -P
root       3560  0.0  0.0 112648   968 pts/1    S+   19:22   0:00 grep --color=auto keepalived

To Reproduce
send broadcast rarp packet to the vrrp master equipment.

Expected behavior
no memory leak.

Keepalived version

[root@cpe ~]# keepalived --version
Keepalived v2.2.2 (04/20,2021), git commit v2.1.5-429-gb8eda11+

Copyright(C) 2001-2021 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 3.10.0
Running on Linux 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017
Distro: CentOS Linux 7 (Core)

configure options: --prefix=/home/chenxu/code/third/keepalived/binary/centos PKG_CONFIG_PATH=:/usr/local/lib/pkgconfig

Config options:  LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT INIT=systemd

System options:  VSYSLOG LIBNL3 RTA_ENCAP RTA_EXPIRES FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK IFA_FLAGS IPTABLES NET_LINUX_IF_H_COLLISION NET_LINUX_IF_ETHER_H_COLLISION LIBIPVS_NETLINK IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE SO_MARK

Distro (please complete the following information):

[root@cpe ~]# uname -a
Linux cpe 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@cpe ~]# cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core)

Details of any containerisation or hosted service (e.g. AWS)
If keepalived is being run in a container or on a hosted service, provide full details

Configuration file:

[root@cpe ~]# cat /etc/keepalived/keepalived.conf 
vrrp_instance LW_CPE_HB1 {
state BACKUP

virtual_router_id 51
priority 200
advert_int 1

interface eth5
dont_track_primary
preempt_delay 0

track_interface {
eth3 weight -55
eth0 weight -55
eth1 weight -55
}
virtual_ipaddress {
10.10.2.5/16 brd 10.10.255.255 dev eth0 no_track
172.21.1.5/16 brd 172.21.255.255 dev eth1 no_track
172.16.100.1/24 brd 172.16.100.255 dev eth3 no_track
}
notify /bin/appex/ha/notify
}
@xqjcool1
Copy link
Contributor Author

I checked the code and found the cause of the issue. I fixed it and the pull request is as below
#1975

@pqarmitage
Copy link
Collaborator

@xqjcool1 Many thanks for providing the patch. The socket(7) man page states of SO_RCVBUF The minimum (doubled) value for this option is 256, so setting SO_RCVBUF to 0 does quite stop all packets being queued.

I have done some more investigating, and I have discovered that shutdown(fd, SHUT_RD) might be the right way to do it. I need to find a way to generate some RARP traffic so that I can see the problem and then check if using shutdown() works.

Before looking into this issue, I hadn't been aware of the shutdown() function. We used to have a similar problem with packets being queued to vrrp send sockets; I resolved this by adding a BPF filter on the socket to drop all received packets (see if_setsockopt_no_receive() in vrrp_if.c. My recollection is that at the time I tried setting SO_RCVBUF to 0, but it didn't quite work.

Are you in a position to try modifying your patch to use shutdown(garp_fd, SHUT_RD) to see if that works for you?

@xqjcool1
Copy link
Contributor Author

@pqarmitage OK. I‘ll try it and see if it works.

@xqjcool1
Copy link
Contributor Author

@pqarmitage Unfortunately, garp sockets do not support shutdown operations.

error log:

Aug 27 12:48:08 cpe Keepalived_vrrp[94304]: Assigned address 169.254.201.3 for interface eth5
Aug 27 12:48:08 cpe Keepalived_vrrp[94304]: Registering gratuitous ARP shared channel
Aug 27 12:48:08 cpe Keepalived_vrrp[94304]: shutdown SHUT_RD for socket 12 failed (95) - Operation not supported

linux kernel source code:

static const struct proto_ops packet_ops = {
	.family =	PF_PACKET,
	.owner =	THIS_MODULE,
	.release =	packet_release,
	.bind =		packet_bind,
	.connect =	sock_no_connect,
	.socketpair =	sock_no_socketpair,
	.accept =	sock_no_accept,
	.getname =	packet_getname,
	.poll =		packet_poll,
	.ioctl =	packet_ioctl,
	.listen =	sock_no_listen,
	.shutdown =	sock_no_shutdown,
	.setsockopt =	packet_setsockopt,
	.getsockopt =	packet_getsockopt,
	.sendmsg =	packet_sendmsg,
	.recvmsg =	packet_recvmsg,
	.mmap =		packet_mmap,
	.sendpage =	sock_no_sendpage,
};

int sock_no_shutdown(struct socket *sock, int how)
{
	return -EOPNOTSUPP;
}

@pqarmitage
Copy link
Collaborator

@xqjcool1 Many thanks for trying the shutdown() option; it would have been good if it had worked.

I would prefer to be consistent in the way we stop received packets being queued on a socket, so could you please try the following patch to see if it stops your receive queue building up. I have tried it on my development system and it doesn't report an error and the sending of gratuitous ARP messages still works.

diff --git a/keepalived/vrrp/vrrp_arp.c b/keepalived/vrrp/vrrp_arp.c
index 3730e8c4..2d9b0c92 100644
--- a/keepalived/vrrp/vrrp_arp.c
+++ b/keepalived/vrrp/vrrp_arp.c
@@ -228,6 +228,9 @@ void gratuitous_arp_init(void)
                return;
        }
 
+       /* We don't want to receive any data on this socket */
+       if_setsockopt_no_receive(&garp_fd);
+
        /* Initalize shared buffer */
        garp_buffer = PTR_CAST(char, MALLOC(GARP_BUFFER_SIZE));
 }

Do you think the same should be done for the IPv6 NDISC socket in vrrp_ndisc.c?

@xqjcool1
Copy link
Contributor Author

@pqarmitage Your patch works well.
In my opinion, it should be done for IPv6 as well.

@pqarmitage
Copy link
Collaborator

@xqjcool1 I have just noticed that in gratuitous_arp_init(), the protocol specified when the socket is opened is ETH_P_RARP, which would explain why your RARP packets are being queued to the socket. We definitely don't want to just change this to ETH_P_ARP, since that will exacerbate the problem for everyone. keepalived has been using ETH_P_RARP since 2002, without, so far as I am aware, anyone else raising it as an issue.

@pqarmitage
Copy link
Collaborator

@xqjcool1 I am quite happy to produce a patch, or would you prefer to do so since you identified the issue?

@xqjcool1
Copy link
Contributor Author

@pqarmitage It's an honor to let me do it. :-) Thanks very much. I'll do it ASAP.

@xqjcool1
Copy link
Contributor Author

@pqarmitage Please see the pull request: #1982

@pqarmitage
Copy link
Collaborator

@xqjcool1 Many thanks for the patch.

I have written a small program to send RARP requests, so I was able to see for myself the socket receive queue building up. I have therefore been able to confirm for myself that your patch resolves the issue.

Now that there are no longer any packets queued on the socket, I have also merged commit 1c0d791 which now opens the garp socket using ETH_P_ARP rather than ETH_P_RARP. It will be less confusing for anyone looking at the code in the future.

Many thanks for your help with this.

@ING-XIAOJIAN
Copy link

I can't reproduce this problem by sending rarp request. below is my command of sending rarp request.
nping --arp --arp-type RARP --arp-sender-mac xx:xx:xx:xx --arp-sender-ip xx.xx.xx.xx --arp-target-mac xx.xx.xx.xx -c 1000 xx.xx.xx.xx
I wanna know how to reproduce this problem

@pqarmitage
Copy link
Collaborator

Attached is the source of the test program to generate RARP messages I referred to at #1973 (comment).

rarp.c.txt

@ING-XIAOJIAN
Copy link

Attached is the source of the test program to generate RARP messages I referred to at #1973 (comment).

rarp.c.txt

cool, I used this program to reproduce this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants