fix(network_policy): mask mark reset on FW marks #992

Merged · 1 commit · Sep 25, 2020

Conversation

@aauren (Collaborator) commented Sep 25, 2020

TL;DR

This PR changes the way we manage marks in the NetworkPolicyController by applying a mask so that, as part of the FW chain, we reset only our own FW mark and leave any other marks that may exist on the packet alone.

Background

Ever since kube-router v1.0.0-rc5, we've been resetting all marks on the packet whenever the pod has a network policy applied to it. This is part of the new MARK and RETURN flow, which allows traffic to traverse all relevant network policies instead of being ACCEPTed after hitting the first matching rule.

However, there are other systems in kube-router and in the CNI subsystems which also apply marks. When a packet traverses a firewall chain applied by the NPC, the flow looks like this (using FORWARD as an example):

  1. FORWARD -> KUBE-ROUTER-FORWARD
  2. KUBE-ROUTER-FORWARD -> if a pod has a network policy applied to it -> KUBE-POD-FW-<hash>
  3. KUBE-POD-FW-<hash> -> KUBE-NWPLCY-<hash> (there will be one of these for each individual policy applied)
  4. If the packet matches a rule, it will be MARKed with 0x10000/0x10000 and then RETURNed to the parent KUBE-POD-FW-<hash> chain
    4.5) If it doesn't match any rule, it will finish traversing the chain without receiving a MARK and eventually end up back in the parent KUBE-POD-FW-<hash> chain
  5. If the packet doesn't have the mark 0x10000/0x10000, it will be logged and rejected
  6. If the packet does have the mark, the mark will be cleared with 0x0 and the packet will be RETURNed to the parent KUBE-ROUTER-FORWARD chain, where it can finish processing and eventually be accepted if it makes it all the way through
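The clearing step in 6. hinges on how the MARK target works: `--set-xmark VALUE/MASK` zeroes the bits in MASK and then ORs in VALUE, and a bare `--set-mark 0x0` implies a mask of 0xffffffff. A minimal sketch of that arithmetic (the `set_xmark` helper is ours for illustration, not part of kube-router):

```shell
# --set-xmark VALUE/MASK computes: new_mark = (old_mark & ~MASK) | VALUE
set_xmark() {
  old=$1; value=$2; mask=$3
  printf '0x%x\n' $(( (old & ~mask) | value ))
}

# Step 4: a packet matching a policy rule gets the FW mark applied
set_xmark 0x0 0x10000 0x10000      # prints 0x10000
# Step 6: clearing with a bare 0x0 implies mask 0xffffffff, wiping every bit
set_xmark 0x12000 0x0 0xffffffff   # prints 0x0 (a coexisting 0x2000 bit is lost too)
```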

Problem

The problem with this flow is how the mark is cleared. When the mark is cleared with 0x0 at the end of the KUBE-POD-FW-<hash> chain, no mask is applied, so it clears not only the mark we set but all marks on the packet. This has the potential to disrupt other elements of iptables that have applied a mark (like kube-router's DSR functionality, hostPort CNI NAT markings, etc.).

This change adds a mask to the mark so that we only unset the specific mark that we set in the FW chain and leave the other marks that may or may not exist on the packet intact.
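Sketching the before/after clearing rules against a packet that carries both the FW mark (0x10000) and a hostPort mark (0x2000). The `clear_mark` helper below is illustrative only; the real change edits the iptables rule that the NetworkPolicyController generates:

```shell
# Model of the kernel's --set-xmark arithmetic: (old & ~MASK) | VALUE
clear_mark() {
  printf '0x%x\n' $(( ($1 & ~$3) | $2 ))
}

old=$(( 0x10000 | 0x2000 ))       # packet carries FW mark + hostPort mark

# Before: -j MARK --set-xmark 0x0 (implicit mask 0xffffffff)
clear_mark $old 0x0 0xffffffff    # prints 0x0    (hostPort mark lost)
# After:  -j MARK --set-xmark 0x0/0x10000 (this PR)
clear_mark $old 0x0 0x10000       # prints 0x2000 (hostPort mark intact)
```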

Testing Procedure

I've tested this by mocking up some iptables chains and watching the packet counters. This was done against a test VM that had no other rules on it:

# reproduce current kube-router state
$ iptables -N TEST_MARK_POLICY
iptables -A TEST_MARK_POLICY -m comment --comment "add hostport mark for traffic going to host on port 9090" -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j MARK --set-xmark 0x2000/0x2000
iptables -A TEST_MARK_POLICY -m comment --comment "add kube-router FW mark for traffic going to host on port 9090" -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j MARK --set-xmark 0x10000/0x10000
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without kube-router FW mark" -m mark ! --mark 0x10000/0x10000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without hostport mark" -m mark ! --mark 0x2000/0x2000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "clear mark the way kube-router currently clears the mark" -j MARK --set-xmark 0x0/0xffffffff
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets with kube-router FW mark (to make sure mark no longer exists)" -m mark --mark 0x10000/0x10000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without hostport mark (to make sure mark still exists)" -m mark ! --mark 0x2000/0x2000 -j RETURN
iptables -A TEST_MARK_POLICY -j ACCEPT
iptables -I OUTPUT -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j TEST_MARK_POLICY

$ curl "10.10.2.102:9090"
{}

$ iptables -nvL TEST_MARK_POLICY
Chain TEST_MARK_POLICY (1 references)
 pkts bytes target     prot opt in     out     source               destination
    6   416 MARK       tcp  --  *      *       0.0.0.0/0            10.10.2.102          /* add hostport mark for traffic going to host on port 9090 */ tcp dpt:9090 MARK or 0x2000
    6   416 MARK       tcp  --  *      *       0.0.0.0/0            10.10.2.102          /* add kube-router FW mark for traffic going to host on port 9090 */ tcp dpt:9090 MARK or 0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without kube-router FW mark */ mark match ! 0x10000/0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without hostport mark */ mark match ! 0x2000/0x2000
    6   416 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* clear mark the way kube-router currently clears the mark */ MARK and 0x0
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets with kube-router FW mark (to make sure mark no longer exists) */ mark match 0x10000/0x10000
    6   416 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without hostport mark (to make sure mark still exists) */ mark match ! 0x2000/0x2000
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Above we can see that we:

  1. Applied both a 0x2000 mark (representing the current CNI hostPort marking) and the 0x10000 mark (representing the kube-router FW mark)
  2. Ensured the packet matched both marks, otherwise returned it
  3. Cleared the mark using the current kube-router logic (a bare 0x0)
  4. Returned the traffic if it still matched the kube-router FW mark 0x10000
  5. Returned the traffic if it no longer matched the hostPort mark 0x2000
  6. Added a final catch-all ACCEPT at the end

From the packet counters we can see that:

  • the traffic goes through and receives both marks
  • both 0x10000 and 0x2000 were successfully applied and matched (the packet doesn't get returned)
  • the mark was successfully cleared
  • the mark no longer matches 0x10000
  • the mark also no longer matches 0x2000, so the packet is returned instead of passing through the end of the chain to ACCEPT

If we then change this to use the logic from this PR, we do the same thing, only this time we add a mask to the mark:

# Attempt to mask mark reset with new rules
$ iptables -F TEST_MARK_POLICY
iptables -A TEST_MARK_POLICY -m comment --comment "add hostport mark for traffic going to host on port 9090" -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j MARK --set-xmark 0x2000/0x2000
iptables -A TEST_MARK_POLICY -m comment --comment "add kube-router FW mark for traffic going to host on port 9090" -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j MARK --set-xmark 0x10000/0x10000
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without kube-router FW mark" -m mark ! --mark 0x10000/0x10000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without hostport mark" -m mark ! --mark 0x2000/0x2000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "clear mark the way kube-router should clear the mark" -j MARK --set-xmark 0/0x10000
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets with kube-router FW mark (to make sure mark no longer exists)" -m mark --mark 0x10000/0x10000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without hostport mark (to make sure mark still exists)" -m mark ! --mark 0x2000/0x2000 -j RETURN
iptables -A TEST_MARK_POLICY -j ACCEPT

$ curl "10.10.2.102:9090"
{}

$ iptables -nvL TEST_MARK_POLICY
Chain TEST_MARK_POLICY (1 references)
 pkts bytes target     prot opt in     out     source               destination
    6   416 MARK       tcp  --  *      *       0.0.0.0/0            10.10.2.102          /* add hostport mark for traffic going to host on port 9090 */ tcp dpt:9090 MARK or 0x2000
    6   416 MARK       tcp  --  *      *       0.0.0.0/0            10.10.2.102          /* add kube-router FW mark for traffic going to host on port 9090 */ tcp dpt:9090 MARK or 0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without kube-router FW mark */ mark match ! 0x10000/0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without hostport mark */ mark match ! 0x2000/0x2000
    6   416 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* clear mark the way kube-router should clear the mark */ MARK and 0xfffeffff
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets with kube-router FW mark (to make sure mark no longer exists) */ mark match 0x10000/0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without hostport mark (to make sure mark still exists) */ mark match ! 0x2000/0x2000
    6   416 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0

From the packet counters we can see that:

  • the traffic goes through and receives both marks
  • both 0x10000 and 0x2000 were successfully applied and matched (the packet doesn't get returned)
  • the mark was successfully cleared
  • the mark no longer matches 0x10000
  • the mark still matches 0x2000, so the packet is not returned
  • the packet passes through the end of the chain to ACCEPT

kube-router testing

I've also tested this against our kube-router cluster and ensured that hostPort and DSR work correctly after applying the change.

Don't reset all marks, only the mark that we originally set as part of
the firewall rules, so that we don't affect other systems like hostPort
and other elements of the nat chain that may apply their own marks.
@aauren aauren added the bug label Sep 25, 2020
@aauren aauren added this to In Progress in 1.1 via automation Sep 25, 2020
@murali-reddy (Member):

@aauren thanks for the PR and explanation. lgtm.

@murali-reddy murali-reddy merged commit 5a5e835 into cloudnativelabs:master Sep 25, 2020
1.1 automation moved this from In Progress to Done Sep 25, 2020