Request time out from container to other container using hostIP:hostPort on same host with portmap CNI chained #9784
Comments
Same issue here. With cilium
We are having the same problem, even with
A portmap commit attempted to fix the problem for hairpinning, but it looks to me like it only fixed it for the host itself and pods that use host networking. Pods that use cluster networking are not correctly masqueraded when a pod local to the host tries to access the hostIP:hostPort (e.g., via DNAT). In the case of consul-agent using port 8500, we are able to work around the issue by adding the below:
If portmap is configured with defaults, we can also use:
It seems to me that this is an issue with portmap, but maybe there is something Cilium can do...?? :)
I've reproduced this on Digital Ocean using the Consul helm chart. Pods on other nodes can access the hostIP:hostPort, but pods on that node itself cannot.
The workaround in the description above doesn't seem to be compatible with a CiliumNetworkPolicy that enforces ingress using EDIT .. meaning that I need to remove such policy in order for the workaround to be effective
One additional snippet of information to add to @jdef's observations above is that the source identity in that case was 2 (
The commit which likely broke this is 4ae2ead. The commit is needed to make
Traffic pattern looks like this:
The forward-path looks correct. The identity is correctly preserved.
The reply path loses the identity because the traffic seems to get masqueraded.
When Cilium is used in chaining mode with portmap, the hostPort is translated using iptables DNAT as inserted by the portmap plugin. When this happens all within a node, we can preserve the source identity for the reply traffic for correct visibility. The traffic will be allowed anyway based on the connection tracking state. Updates: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io>
The traffic is sent to the stack and hairpin'ed back into a local pod after a component on the stack has applied a DNAT rule, the traffic must be SNATed to ensure the reverse NAT can take place. This can happen if portmap or kiam is being used and redirection happens to a local destination. The masquerade filter must be limited as not all DNAT traffic may be affected. NodePort traffic from a non-local source must remain unmasqueraded in order for trafficPolicy=local to continue working. Also, when EnableEndpointRoutes is enabled, traffic always traverses the stack and must not be masqueraded either. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io>
Prior to commit 4ae2ead, Cilium implemented a relatively aggressive masquerading behavior of traffic from the local host. This simplified integration with components such as portmap and others which implement their own DNAT rules but often lack SNAT to force reply traffic back through the same path. Starting with commit 4ae2ead, this was relaxed in order to allow for proper behavior of
With the removal of this masquerade rule, reply traffic is no longer guaranteed to be routed symmetrically on the reply path. It was expected that portmap would ensure this, given that it depends on seeing the reply traffic. It seems that portmap assumes that the routing path is always symmetric. The following PRs will fix this issue:
Packets to a host IP are currently redirected via cilium_host/cilium_net. The reason for this is mostly historic. For other packets where routing by the kernel routing tables is desired, packets are already passed on via TC_ACT_OK to the stack directly. The two cases where this redirection is needed are: * For proxy redirection due to a kernel limitation on passing the routing tables multiple times. This case is left untouched. * For the HOST_REDIRECT_TO_INGRESS case, e.g. flannel integration. This case is left untouched. The IPv4 and IPv6 case is brought in line to not accidentally lose this logic later on. A side effect of this is that the skb gets scrubbed including the skb->mark. The presence of the identity in the skb->mark is being relied on in a follow-up fix however. Therefore, pass packets via the stack using TC_ACT_OK. This is faster, simpler, and allows for the identity to be carried in the mark. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io>
[ upstream commit 5f50d82 ] Packets to a host IP are currently redirected via cilium_host/cilium_net. The reason for this is mostly historic. For other packets where routing by the kernel routing tables is desired, packets are already passed on via TC_ACT_OK to the stack directly. The two cases where this redirection is needed are: * For proxy redirection due to a kernel limitation on passing the routing tables multiple times. This case is left untouched. * For the HOST_REDIRECT_TO_INGRESS case, e.g. flannel integration. This case is left untouched. The IPv4 and IPv6 case is brought in line to not accidentally lose this logic later on. A side effect of this is that the skb gets scrubbed including the skb->mark. The presence of the identity in the skb->mark is being relied on in a follow-up fix however. Therefore, pass packets via the stack using TC_ACT_OK. This is faster, simpler, and allows for the identity to be carried in the mark. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
[ upstream commit 1fb15cf ] The traffic is sent to the stack and hairpin'ed back into a local pod after a component on the stack has applied a DNAT rule, the traffic must be SNATed to ensure the reverse NAT can take place. This can happen if portmap or kiam is being used and redirection happens to a local destination. The masquerade filter must be limited as not all DNAT traffic may be affected. NodePort traffic from a non-local source must remain unmasqueraded in order for trafficPolicy=local to continue working. Also, when EnableEndpointRoutes is enabled, traffic always traverses the stack and must not be masqueraded either. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
[ upstream commit f25d8b9 ] When Cilium is used in chaining mode with portmap, the hostPort is translated using iptables DNAT as inserted by the portmap plugin. When this happens all within a node, we can preserve the source identity for the reply traffic for correct visibility. The traffic will be allowed anyway based on the connection tracking state. Updates: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
[ upstream commit f25d8b9 ] When Cilium is used in chaining mode with portmap, the hostPort is translated using iptables DNAT as inserted by the portmap plugin. When this happens all within a node, we can preserve the source identity for the reply traffic for correct visibility. The traffic will be allowed anyway based on the connection tracking state. To work with clang-7 and avoid the pattern where the ctx is read into a register and then incremented then finally a value assigned to it, r1 = %[ctx] r1 += 8 ... *(u32)(r1 +=8) = %[mark] We wrote the code block in asm which is not the same as master branch which was able to use C code due to use of clang-11. We attempted to update the branch to clang-10 but that created a separate set of issues that was causing more code churn than we wanted. Updates: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
[ upstream commit f25d8b9 ] [ based on v1.7 backport 60b4210 ] When Cilium is used in chaining mode with portmap, the hostPort is translated using iptables DNAT as inserted by the portmap plugin. When this happens all within a node, we can preserve the source identity for the reply traffic for correct visibility. The traffic will be allowed anyway based on the connection tracking state. To work with clang-7 and avoid the pattern where the ctx is read into a register and then incremented then finally a value assigned to it, r1 = %[ctx] r1 += 8 ... *(u32)(r1 +=8) = %[mark] We wrote the code block in asm which is not the same as master branch which was able to use C code due to use of clang-11. We attempted to update the branch to clang-10 but that created a separate set of issues that was causing more code churn than we wanted. Updates: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 5f50d82 ] Packets to a host IP are currently redirected via cilium_host/cilium_net. The reason for this is mostly historic. For other packets where routing by the kernel routing tables is desired, packets are already passed on via TC_ACT_OK to the stack directly. The two cases where this redirection is needed are: * For proxy redirection due to a kernel limitation on passing the routing tables multiple times. This case is left untouched. * For the HOST_REDIRECT_TO_INGRESS case, e.g. flannel integration. This case is left untouched. The IPv4 and IPv6 case is brought in line to not accidentally lose this logic later on. A side effect of this is that the skb gets scrubbed including the skb->mark. The presence of the identity in the skb->mark is being relied on in a follow-up fix however. Therefore, pass packets via the stack using TC_ACT_OK. This is faster, simpler, and allows for the identity to be carried in the mark. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 1fb15cf ] The traffic is sent to the stack and hairpin'ed back into a local pod after a component on the stack has applied a DNAT rule, the traffic must be SNATed to ensure the reverse NAT can take place. This can happen if portmap or kiam is being used and redirection happens to a local destination. The masquerade filter must be limited as not all DNAT traffic may be affected. NodePort traffic from a non-local source must remain unmasqueraded in order for trafficPolicy=local to continue working. Also, when EnableEndpointRoutes is enabled, traffic always traverses the stack and must not be masqueraded either. Fixes: #9784 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
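The hairpin case described in the commit messages above can be illustrated with a toy model. This is only a sketch, not Cilium or portmap code: all addresses, ports, and names below are hypothetical, and the model assumes that a reply addressed directly to the client pod is delivered locally without passing back through the host stack, so the reverse DNAT is never applied.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str  # "ip:port"
    dst: str  # "ip:port"


# Hypothetical addresses, chosen only for illustration.
CLIENT = "10.0.0.5:40000"         # client pod on the node
HOST = "192.168.1.10:8500"        # hostIP:hostPort the client connects to
BACKEND = "10.0.0.6:8500"         # pod behind the hostPort, same node
HOST_SNAT = "192.168.1.10:50000"  # source used when the host masquerades


def hairpin_reply_accepted(snat: bool) -> bool:
    """Return True if the client sees a reply from the address it contacted."""
    request = Packet(src=CLIENT, dst=HOST)

    # The host stack applies DNAT: hostIP:hostPort -> backend pod.
    delivered = replace(request, dst=BACKEND)
    if snat:
        # With masquerading the backend sees the host as the source.
        delivered = replace(delivered, src=HOST_SNAT)

    # The backend answers whatever source address it saw.
    reply = Packet(src=delivered.dst, dst=delivered.src)

    if reply.dst == HOST_SNAT:
        # The reply returns to the host stack, which reverses SNAT and DNAT.
        reply = Packet(src=request.dst, dst=request.src)
    # Otherwise the reply goes straight to the client (model assumption),
    # still carrying the backend address as its source.

    return reply.src == request.dst and reply.dst == request.src


if __name__ == "__main__":
    print("without SNAT:", hairpin_reply_accepted(snat=False))
    print("with SNAT:   ", hairpin_reply_accepted(snat=True))
```

In this model the reply without SNAT arrives from the backend address rather than from hostIP:hostPort, so the client cannot match it to its connection, which is consistent with the timeout reported in this issue.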
Bug report
General Information
Cilium version (run cilium version)
Kernel version (run uname -a)
Orchestration system version in use (e.g. kubectl version, Mesos, ...)
System dump (run curl -sLO releases.cilium.io/tools/cluster-diagnosis.zip && python cluster-diagnosis.zip sysdump and then attach the generated zip file)
( The sysdump was too big to attach. Please let me know if it's needed and I'll send or make it available somehow. )
How to reproduce the issue
hostPort
hostPort service
hostPort pod and debug pod on the same host (e.g. nginx-2cb8j and ubuntu-vf44z in the example below)
hostPort pod via its hostIP
Using the above example:
The cilium monitor output is (where id 568 is the debug pod ubuntu-vf44z):
Contacting the hostPort pod using its ClusterIP works:
root@ubuntu-vf44z:/# curl 172.16.3.16:80 -I
HTTP/1.1 200 OK
...
and contacting a pod running on another host using the hostIP and hostPort works:
It seems the issue only arises when the source and destination pods are on the same host and the hostIP:hostPort is used.
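To compare the three paths described above (hostIP:hostPort from the same host, ClusterIP, and hostIP:hostPort from another host) without relying on curl, a small TCP probe along these lines could be run from the debug pod. This is a minimal sketch using only the Python standard library; the function name probe and the returned labels are made up for illustration.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a TCP connect and classify the outcome as a short label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No SYN-ACK arrived within the timeout, as in the failing case.
        return "timeout"
    except ConnectionRefusedError:
        # The target answered, but nothing listens on that port.
        return "refused"
    except OSError as exc:
        # Anything else, such as an unreachable network.
        return f"error: {exc}"
```

For example, probe("172.16.3.16", 80) mirrors the curl call shown above; calling it with the node's hostIP and the hostPort from the same node would be expected to return "timeout" while the bug is present.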