Skip to content

Commit

Permalink
datapath: Fix ICMP ID placement in CT entries
Browse files Browse the repository at this point in the history
The [1] changed the ICMP ECHO/ECHO_REPLY ID placement in CT entries in
order to fix the problem when an egress NAT entry for ECHO_REPLY cannot
be found by a corresponding CT entry which lead to leaking NAT entries,
as the CT GC could not find the NAT entries by the given CT entry.

The changed placement introduced an interesting problem described below.

What happens when a pod (10.154.0.89) sends ICMP EchoRequest to 8.8.8.8?

A CT entry with the following key is created:

dst         src          dport sport    TUPLE_F_OUT
|           |            |     |        |
0a 9a 00 59 08 08 08 08  00 00 08 00 01 00 <-- dst=pod because of
					       the reverse before the
                                               second __ct_lookup.
("ICMP OUT 10.154.0.89:2048 -> 8.8.8.8:0 [...]" in the
"cilium bpf ct list global" output).

What happens when 8.8.8.8 sends ICMP EchoRequest to the pod? The lookup
is performed for the reverse flow first with the following key:

dst         src          dport sport    TUPLE_F_OUT <-- dir is TUPLE_F_OUT
|           |            |     |        |               because we do the
0a 9a 00 59 08 08 08 08  00 00 08 00 01 00              lookup in reverse
							order first.

The key matches the first __ct_lookup(), hence the return is CT_REPLY.
Previously, before the changed ID placement, the CT key for 8.8.8.8 ->
the pod lookup was:

0a 9a 00 59 08 08 08 08  08 00 00 00 01 00

This resulted in CT_NEW instead of CT_REPLY.

[1]: #12729

Signed-off-by: Martynas Pumputis <m@lambda.lt>
  • Loading branch information
brb authored and joestringer committed Mar 9, 2021
1 parent 572006a commit ff6ebae
Show file tree
Hide file tree
Showing 2 changed files with 20 additions and 19 deletions.
17 changes: 8 additions & 9 deletions bpf/lib/conntrack.h
Original file line number Diff line number Diff line change
Expand Up @@ -346,12 +346,12 @@ static __always_inline int ct_lookup6(const void *map,
tuple->flags |= TUPLE_F_RELATED;
break;

case ICMPV6_ECHO_REQUEST:
case ICMPV6_ECHO_REPLY:
if (dir == CT_INGRESS)
tuple->sport = identifier;
else
tuple->dport = identifier;
tuple->sport = identifier;
break;

case ICMPV6_ECHO_REQUEST:
tuple->dport = identifier;
/* fall through */
default:
action = ACTION_CREATE;
Expand Down Expand Up @@ -530,11 +530,10 @@ static __always_inline int ct_lookup4(const void *map,
break;

case ICMP_ECHOREPLY:
tuple->sport = identifier;
break;
case ICMP_ECHO:
if (dir == CT_INGRESS)
tuple->sport = identifier;
else
tuple->dport = identifier;
tuple->dport = identifier;
/* fall through */
default:
action = ACTION_CREATE;
Expand Down
22 changes: 12 additions & 10 deletions test/runtime/connectivity.go
Original file line number Diff line number Diff line change
Expand Up @@ -142,14 +142,12 @@ var runtimeConntrackTest = func(datapathMode string) func() {
assert: BeFalse,
},
{
// see comment below about ICMP ids
from: helpers.Client,
to: helpers.Ping6WithID(serverDockerNetworking[helpers.IPv6], 1111),
destination: helpers.Server,
assert: BeTrue,
},
{
// see comment below about ICMP ids
from: helpers.Client,
to: helpers.PingWithID(serverDockerNetworking[helpers.IPv4], 1111),
destination: helpers.Server,
Expand Down Expand Up @@ -203,21 +201,25 @@ var runtimeConntrackTest = func(datapathMode string) func() {

By("Testing bidirectional connectivity from client to server")

// NB: Previous versions of this test did not specify the ICMP id, which
// presumably caused transient errors (see #12891) when the ICMP ids for the
// valid direction (client->server) matched the ICMP ids for the invalid
// direction (server->client). We now ensure that the ICMP ids do not match.
// Furthermore, the original issue can be now easily reproduced by changing
// 2222 to 1111 below.
By("container %s pinging %s IPv6 (should NOT work)", helpers.Server, helpers.Client)
res = vm.ContainerExec(helpers.Server, helpers.Ping6WithID(clientDockerNetworking[helpers.IPv6], 1111))
ExpectWithOffset(1, res).ShouldNot(helpers.CMDSuccess(),
"container %q unexpectedly was able to ping to %q IP:%q with ID:%d", helpers.Server, helpers.Client,
clientDockerNetworking[helpers.IPv6], 1111)
res = vm.ContainerExec(helpers.Server, helpers.Ping6WithID(clientDockerNetworking[helpers.IPv6], 2222))
ExpectWithOffset(1, res).ShouldNot(helpers.CMDSuccess(),
"container %q unexpectedly was able to ping to %q IP:%q", helpers.Server, helpers.Client, clientDockerNetworking[helpers.IPv6])
"container %q unexpectedly was able to ping to %q IP:%q with ID:%d", helpers.Server, helpers.Client,
clientDockerNetworking[helpers.IPv6], 2222)

By("container %s pinging %s IPv4 (should NOT work)", helpers.Server, helpers.Client)
res = vm.ContainerExec(helpers.Server, helpers.PingWithID(clientDockerNetworking[helpers.IPv4], 1111))
ExpectWithOffset(1, res).ShouldNot(helpers.CMDSuccess(),
"%q was unexpectedly able to ping to %q IP:%q with ID:%d", helpers.Server, helpers.Client,
clientDockerNetworking[helpers.IPv4], 1111)
res = vm.ContainerExec(helpers.Server, helpers.PingWithID(clientDockerNetworking[helpers.IPv4], 2222))
ExpectWithOffset(1, res).ShouldNot(helpers.CMDSuccess(),
"%q was unexpectedly able to ping to %q IP:%q", helpers.Server, helpers.Client, clientDockerNetworking[helpers.IPv4])
"%q was unexpectedly able to ping to %q IP:%q with ID:%d", helpers.Server, helpers.Client,
clientDockerNetworking[helpers.IPv4], 2222)

By("============= Finished Connectivity Test ============= ")
}
Expand Down

0 comments on commit ff6ebae

Please sign in to comment.