New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the ipAddrAdd performance #965
Conversation
Table: unix.RT_TABLE_LOCAL, | ||
Protocol: unix.RTPROT_KERNEL, | ||
Scope: unix.RT_SCOPE_HOST, | ||
Type: unix.RTN_LOCAL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These parameters come from running the original ip route replace ...
command through strace:
strace ip route replace local 10.97.216.74 dev kube-dummy-if table local proto kernel scope host src 172.17.0.2
which prints the system call with needed constants:
sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=52, type=RTM_NEWROUTE, flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_REPLACE|NLM_F_CREATE, seq=1596701111, pid=0}, {rtm_family=AF_INET, rtm_dst_len=32, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_LOCAL, rtm_protocol=RTPROT_KERNEL, rtm_scope=RT_SCOPE_HOST, rtm_type=RTN_LOCAL, rtm_flags=0}, [{{nla_len=8, nla_type=RTA_DST}, inet_addr("10.97.216.74")}, {{nla_len=8, nla_type=RTA_PREFSRC}, inet_addr("172.17.0.2")}, {{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("kube-dummy-if")}]}, iov_len=52}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 52
@iamakulov thanks for your PR We ran into some issues with Action done by netlink routereplace was done to address specific issue with using service proxy in case of nodes with multiple network interface. This is the problem kubernetes/kubeadm#102 (comment) and recommended solution https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#check-network-adapters. That piece of code is trying to automate above requirement. On hindsight adding a route by exec command or with netlink libray for each VIP just does not scale. Now that kube-router has Not sure you have motivation to work on the fix, but here is what i think would be better approach
Unfortunatley CIDR that is used for cluster Ip's can not inferred from the kubernetes API's to we will have to rely on user input through this parameter. |
This commit removes the `Exec("ip", "route", "replace")` call from `ipAddrAdd`. When `setupClusterIPServices` calls `ipAddrAdd` in a loop, `Exec()` gets executed multiple times and becomes a hot path. Instead of these looped `Exec()` calls, this method adds a single `Exec("ip", "route", "replace")` call that sets up routing based on the cluster CIDR. This was suggested in cloudnativelabs#965 (comment)
d2e223b
to
93fedf1
Compare
@murali-reddy Just took a round on your suggestion! Did I get your idea right? |
thanks @iamakulov PR looks good. Will give a quick test and will update the PR. |
This commit removes the `Exec("ip", "route", "replace")` call from `ipAddrAdd`. When `setupClusterIPServices` calls `ipAddrAdd` in a loop, `Exec()` gets executed multiple times and becomes a hot path. Instead of these looped `Exec()` calls, this method adds a single `Exec("ip", "route", "replace")` call that sets up routing based on the cluster CIDR. This was suggested in cloudnativelabs#965 (comment)
93fedf1
to
c1ff9f9
Compare
(Rebased on master.) |
@iamakulov Apolgies for the delay in review. We are trying to get 1.1 release out in next couple of days. Marking for 1.2 release which is focussed on performance enhancements. |
Tested this patch in production, and it doesn’t work as expected. 1) For new nodes, 2) The change resurrects #376.
$ ip addr add 10.10.10.10 dev kube-dummy-if
$ ip addr list kube-dummy-if
8: kube-dummy-if: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
link/ether 56:51:fe:11:e3:02 brd ff:ff:ff:ff:ff:ff
inet 10.10.10.10/32 brd 10.10.10.10 scope link kube-dummy-if
valid_lft forever preferred_lft forever
$ ip route show table all | grep 10.10.10.10
local 10.10.10.10 dev kube-dummy-if table local proto kernel scope host src 10.10.10.10 # ← here, `src 10.10.10.10` is the problem
broadcast 10.10.10.10 dev kube-dummy-if table local proto kernel scope link src 10.10.10.10 Before, we were fixing this by calling To fix both issues, I’m going to change the approach:
I’ll test this change in our system and then cherry-pick it here. |
Also, re: UX/DX. This PR makes the Kubernetes already knows the |
@iamakulov While it's true that Kubernetes does know the service-cluster-ip-range value, as far as I know, it doesn't expose that information anywhere that is obtainable with through the API. Although, it may just be that it's not in a common place and I haven't run across it yet? |
Right, it appears that it’s not implemented: kubernetes/kubernetes#25533 kubernetes/kubernetes#46508 Would it be a viable idea to make (If yes, it might be a good idea to also document how to retrieve this value – from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally coming back around this one. The only thing that I can see is that we should do as you suggest and make --service-cluster-ip-range
a required argument as well as update the documentation in this area.
Otherwise this looks good to me.
This commit removes the `Exec("ip", "route", "replace")` call from `ipAddrAdd`. When `setupClusterIPServices` calls `ipAddrAdd` in a loop, `Exec()` gets executed multiple times and becomes a hot path. Instead of these looped `Exec()` calls, this method adds a single `Exec("ip", "route", "replace")` call that sets up routing based on the cluster CIDR. This was suggested in cloudnativelabs#965 (comment)
@@ -1006,7 +995,7 @@ func (ln *linuxNetworking) prepareEndpointForDsr(containerID string, endpointIP | |||
} | |||
|
|||
// assign VIP to the KUBE_TUNNEL_IF interface | |||
err = ln.ipAddrAdd(tunIf, vip, false) | |||
err = ln.ipAddrAdd(tunIf, vip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iamakulov You missed another one of these changes on line 1229: https://github.com/cloudnativelabs/kube-router/blob/master/pkg/controllers/proxy/network_services_controller.go#L1229
naddr := &netlink.Addr{IPNet: &net.IPNet{IP: net.ParseIP(ip), Mask: net.IPv4Mask(255, 255, 255, 255)}, Scope: syscall.RT_SCOPE_LINK} | ||
err := netlink.AddrAdd(iface, naddr) | ||
if err != nil && err.Error() != IfaceHasAddr { | ||
glog.Errorf("Failed to assign cluster ip %s to dummy interface: %s", | ||
naddr.IPNet.IP.String(), err.Error()) | ||
return err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iamakulov I found an additional problem with this logic. Down below, you only account for the ClusterIP range in the enforceRightServiceInterface()
function. However, we also needs routes inserted for ExternalIP and LoadBalancer VIPs if we remove this section here.
ExternalIPs and LoadBalancer VIPs are less likely to be in a cohesive subnet, and I don't think that it would be good to use the same process for those (even though there is a parameter for them) as it would likely exclude certain valid use-cases for users.
I think that we'll likely have to go back to the drawing board here.
Hey! I’m afraid I left the project I was working for when I implemented this patch, so I won’t be able to contribute fixes for this PR. Sorry! However, as I mentioned in #965 (comment), this patch actually doesn’t work as expected, so this PR can’t be merged as-is. In the project I was working for, we ended up taking the alternative approach:
– and this one worked well. It’s been running in production for a few months with no observable issues. Let me know if there’s anything else I can help with! |
Closing as this PR is incomplete and the author unfortunately had to move on. In NPC we've started taking the approach of using ipset save and ipset restore, and likely we'll work that same functionality into the NSC in the future as well if we wanted to approach this problem again. |
This PR replaces the
Exec()
call insideipAddrAdd()
withnetlink.RouteReplace()
.When calling
ipAddrAdd
in a loop (e.g., insidesetupClusterIPServices
), this gives a significant performance boost.Ref: #962