Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.8 backports 2021-07-01 #16745

Merged
merged 4 commits into from
Jul 15, 2021
Merged

v1.8 backports 2021-07-01 #16745

merged 4 commits into from
Jul 15, 2021

Conversation

jrajahalme
Copy link
Member

Once this PR is merged, you can update the PR labels via:

$ for pr in 16391; do contrib/backporting/set-labels.py $pr done 1.8; done
DNS proxy is now more available during Cilium restarts, including upgrades. 

@jrajahalme jrajahalme added release-note/bug This PR fixes an issue in a previous release of Cilium. kind/backports This PR provides functionality previously merged into master. backport/1.8 labels Jul 1, 2021
@jrajahalme jrajahalme requested a review from a team as a code owner July 1, 2021 18:50
@jrajahalme
Copy link
Member Author

test-backport-1.8

@jrajahalme jrajahalme changed the title v1.9 backports 2021-07-01 v1.8 backports 2021-07-01 Jul 1, 2021
@joestringer
Copy link
Member

@jrajahalme you've signed these off with vagrant user as well :-)

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: vagrant <vagrant@runtime>

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from Joe's comment.

@joestringer
Copy link
Member

What's interesting about the failure is that in v1.8 we try to fetch the DNS proxy port from iptables rules, but don't make it unique... so apparently there are many iptables rules with the DNS proxy port:

/home/jenkins/workspace/Cilium-PR-K8s-1.14-kernel-4.9/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:461
Invalid DNS proxy port on cilium-xl854: 37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379
37379

Whereas on v1.9 or later, we use ss instead (so actually list sockets listening). Could this be a bug in the implementation?

Sure enough from the sysdump, there are many duplicate iptables rules for proxy ports:

# Generated by iptables-save v1.8.4 on Thu Jul  1 21:31:45 2021
*mangle
:PREROUTING ACCEPT [73825:41144242]
:INPUT ACCEPT [72671:41036871]
:FORWARD ACCEPT [1150:107179]
:OUTPUT ACCEPT [60208:49995726]
:POSTROUTING ACCEPT [61358:50102905]
:CILIUM_POST_mangle - [0:0]
:CILIUM_PRE_mangle - [0:0]
[73982:41207719] -A PREROUTING -m comment --comment "cilium-feeder: CILIUM_PRE_mangle" -j CILIUM_PRE_mangle
[61503:50178413] -A POSTROUTING -m comment --comment "cilium-feeder: CILIUM_POST_mangle" -j CILIUM_POST_mangle
[0:0] -A CILIUM_PRE_mangle -m socket --transparent -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p tcp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff
[0:0] -A CILIUM_PRE_mangle -p udp -m mark --mark 0x3920200 -m comment --comment "cilium: TPROXY to host cilium-dns-egress proxy" -j TPROXY --on-port 37379 --on-ip 0.0.0.0 --tproxy-mark 0x200/0xffffffff

@jrajahalme
Copy link
Member Author

Could this be a bug in the implementation?

This change added code scanning the existing rules to figure out if the rule already exists; the problem was that we formatted marks with leading zeroes, while iptables formats them without. While proxy port numbers are large enough to not actually have leading zeroes, the port in the mark is in the host byte order, causing the mark to have leading zeroes whenever the low byte of the mark is < 0x10.

The 1.8 CI code catched this due to the code only expecting one port number existing for any proxy. The CI fail would have been similar regardless if the port numbers were duplicates or not.

Fixed this by changing the formatting, and using a port number with leading zeroes in the mark in unit tests.

@jrajahalme jrajahalme force-pushed the pr/v1.8-backport-2021-07-01 branch from 764f9e9 to 367b174 Compare July 7, 2021 11:38
@jrajahalme
Copy link
Member Author

test-backport-1.8

@pchaigno
Copy link
Member

pchaigno commented Jul 7, 2021

Fixed this by changing the formatting, and using a port number with leading zeroes in the mark in unit tests.

Do we need the same fix in master?

@jrajahalme
Copy link
Member Author

Do we need the same fix in master?

#16817

@jrajahalme
Copy link
Member Author

CI infra fails, retesting

@jrajahalme
Copy link
Member Author

test-backport-1.8

@jrajahalme
Copy link
Member Author

Build runtime vm provisioning fail v1.8 backports 2021-07-01

@jrajahalme
Copy link
Member Author

test-runtime

@jrajahalme
Copy link
Member Author

test-upstream-k8s
vagrant up fail

[ upstream commit 537715a ]

Wrap "iptables" and "ip6tables" programs with iptablesInterface so
that unit testing can mock up the executables.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 5839d23 ]

Keep old iptables rules by renaming Cilium chains so that new rules
can be added while old are still in use. Copy old TPROXY rules from
the renamed old rules. Remove the backups only after new rules have
been successfully added.

This change makes it possible to keep old rules in effect while adding
new ones without special consideration for transient rules.

On first initialization only copy over the DNS proxy TPROXY rules, as
other proxies can't reuse old proxy ports across restarts.

Pick the last applicable proxy port from iptables, if multiple are
present.

Remove stale TPROXY rules once the current port is known.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 28e7e39 ]

Panicing in Finalize functions may leave endpoint locked and brick the
whole agent. Better avoid itt and log errors instead, and unlock the
Endpoint in defer if it still happens.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit d5ff687 ]

Remove leading zeroes from marks, as 'iptables' is not formatting
them. This allows proper matching of existing rules and avoids
appending duplicate rules.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme
Copy link
Member Author

Rebased, added upstream commit reference to the last commit

@jrajahalme
Copy link
Member Author

test-backport-1.8

@jrajahalme
Copy link
Member Author

jrajahalme commented Jul 15, 2021

known flake #13400 (reopened) in test-1.17-4.19.
This is indirectly iptables related (the prior port number is read from iptables rules, so retesting.

@jrajahalme
Copy link
Member Author

jrajahalme commented Jul 15, 2021

Known flake #16551 in test-1.16-4.9 AND test-1.11-netnext, marked mitigation #16554 for v1.8 backporting.
The failed verified test is not related to this PR, no need for retesting these.

@jrajahalme
Copy link
Member Author

test-1.17-4.19

@jrajahalme jrajahalme added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 15, 2021
@joestringer joestringer merged commit 3dcf3cc into v1.8 Jul 15, 2021
@joestringer joestringer deleted the pr/v1.8-backport-2021-07-01 branch July 15, 2021 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants