Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.10 backports 2021-05-12 #16103

Merged
merged 21 commits into from
May 13, 2021
Merged

Conversation

glibsm
Copy link
Member

@glibsm glibsm commented May 12, 2021

Once this PR is merged, you can update the PR labels via:

$ for pr in 15975 15942 16006 16020 16066 16017 16027 16070 16068 16056 16045 15979 15978 16057; do contrib/backporting/set-labels.py $pr done 1.10; done

@glibsm glibsm requested a review from a team as a code owner May 12, 2021 02:45
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.10 kind/backports This PR provides functionality previously merged into master. labels May 12, 2021
@glibsm
Copy link
Member Author

glibsm commented May 12, 2021

Two PRs didn't get their commits correctly picked up by the scripts. I manually found them in the log and grabbed the shas. I'm not sure what confused the scripts. Is this common?

 * PR: 16006 -- docs: update OpenShift getting started guide (@twpayne) -- https://github.com/cilium/cilium/pull/16006
   Merge with 1 commit(s) merged at: Mon, 10 May 2021 13:56:34 +0000!
     Branch:     master (!)                          refs/pull/16006/head
                 ----------                          -------------------
     v (start)
     |  Warning: No commit correlation found!    via a4b7645a9b90303e54eee9e9edfc609402ac7cca ("docs: update OpenShift getting started guide")
     v (end)

     GS: Manually pulled up `421314d9b` from the git log

 * PR: 15979 -- docs: improve the aws-cni chaining page (@bmcustodio) -- https://github.com/cilium/cilium/pull/15979
   Merge with 1 commit(s) merged at: Tue, 11 May 2021 21:56:30 +0000!
     Branch:     master (!)                          refs/pull/15979/head
                 ----------                          -------------------
     v (start)
     |  Warning: No commit correlation found!    via 216ddbb299fc4bcfa42b0cc6539a2b6bc8fdeec3 ("docs: improve the aws-cni chaining page")
     v (end)

     GS: Manually found 3b350ce81

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commits look good, but it seems like all the commits in the PR are not signed off by you, but rather the author references vagrant.

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#15975 -- docs: improve and fix minor issues (@qmonnet)

Looks good (apart from the SOB issue reported by Chris already).

As far as I'm concerned, I don't remember seeing the commit correlation issue before :/.

@bmcustodio
Copy link
Contributor

#15979 -- docs: improve the aws-cni chaining page (@bmcustodio)

@glibsm my commit (216ddbb) seems to have been picked up correctly as 4c2874b.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commits look good 👍

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for my commit (apart from Signed-off-by already pointed out by Chris)

Copy link
Member

@nbusseneau nbusseneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for me.

Copy link
Contributor

@twpayne twpayne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenShift GSG changes LGTM.

Copy link
Contributor

@joamaki joamaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commit is OK.

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the sign-off as pointed out by others

@glibsm
Copy link
Member Author

glibsm commented May 12, 2021

My commits look good, but it seems like all the commits in the PR are not signed off by you, but rather the author references vagrant.

Thanks for pointing it out, that's really bizarre since I didn't even shut down this vagrant since last backport. Strange. I'll force push a fix

qmonnet and others added 16 commits May 12, 2021 07:31
[ upstream commit b99403f ]

Provide minor improvements of the text or its formatting at various
places of the documentation.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit d35c94a ]

The host firewall is now supported when running with endpoint routes
enabled. We can remove the note from the getting started guide.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit cb4c6ad ]

Detail why we won't loop when jumping from bpf_lxc to bpf_host and back
for ingress to the pods with host firewall enabled.

Suggested-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 9ab5c8e ]

When endpoint routes are enabled, the host firewall requires the
remote-node identity to be enabled as well to function properly.
The host identity is used to detect traffic that should be sent through
bpf_host on egress/ingress of bpf_lxc. If the remote-node identity is
disabled, the host identity then also matches remote-node IPs.

Reported-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit bf48876 ]

HOST_EP_ID was defined as a static data variable in a695f53 ("Endpoint
for host") because the ID of the host endpoint wasn't known at the time
we generated the datapath header files. Commit 81f0626 ("datapath: Include
the host endpoint ID in all endpoint headers") changed this by making
the host endpoint ID available globally to the agent via the node
package and by setting the ID sooner on startup.

We can therefore define the HOST_EP_ID as a normal constant now, without
needing static data. That in turns allows us to revert commit 04cbfa1
("bpf: Lift constraint for inline asm of static tail calls").

We however still use a template-specific value in bpf_host for the BPF
program injected in the policy map (see TEMPLATE_HOST_EP_ID). We could
instead directly use HOST_EP_ID for the program's name, but we would
then have to ignore that symbol when replacing symbols in the ELF [1].
To ignore that symbol we would also have to ignore similar symbols
corresponding to pod endpoints (i.e., all "1/xxxx" symbols), opening the
door to silent errors. It's probably not worth it since there's no
benefit to using HOST_EP_ID for the program's name directly.

1 - https://github.com/cilium/cilium/blob/v1.10.0-rc0/pkg/datapath/loader/cache.go#L39
Suggested-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 421314d ]

* Correct some minor errors in scripts.
* Update openshift-install output for latest stable version.
* Add instructions on how to delete the cluster.
* Minor grammatical tweaks.
* Minor formatting tweaks.

Signed-off-by: Tom Payne <tom@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 6ae08fd ]

For more optimal packet sizes in large MTU setups, set the
cilium_wg0 MTU based on the detected MTU minus overhead.

Additionally set the route MTU on container side to detected
MTU minus wireguard overhead when wireguard is enabled.

Tested manually with "--mtu 1400", causing cilium_wg0 and
route MTU to be set to 1320.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit e147250 ]

go.mod is currently using a revoked wireguard version. To fix this we
will update to the latest wireguard commit.
More info: WireGuard/wgctrl-go#102

Reported-by: Vlad Ungureanu <vladu@palantir.com>
Suggested-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 2e0603b ]

To make it easier debug or share configurations across users and
developers, Cilium will store its viper.Config as well as agent's
configuration in files under `/var/run/cilium`. It will store up to
the last 3 previous configurations.

Another use case for it will be to check the previous configuration run
by Cilium in case certain steps need to be executed, for example, to
clean up state left from a previous run that will not be cleaned up by a
new set of flags.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 4eafc09 ]

Following the vote and topic raised on 2021-05-05 SIG-Datapath, since
the primary community meeting is moved to Wednesday, the SIG-Datapath
meeting is moved to Thursday.

Q: When would you prefer for SIG-Datapath to meet?

Votes:
    Tuesday 8am PT, 5pm CET     3
    Tuesday  9am PT, 6pm CET    3
    Thursday 8am PT, 5pm CET    6
    Thursday 9am PT, 6pm CET    2

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f7bd191 ]

This pulls in vishvananda/netlink#637 and
vishvananda/netlink#648 with data race fixes.

Fixes cilium#15438

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 9fa9672 ]

ValidateEndpointsAreCorrect ensures that all Docker containers have a
corresponding Cilium endpoints. The information necessary to match
containers with Cilium endpoints is however only available once the
endpoints are ready (have received labels, identity, and other
information). We therefore need to wait for all endpoints to be ready
before running ValidateEndpointsAreCorrect.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f578ba5 ]

PR cilium#15125 broke race detector pipelines due to the `RACE` environment
variable not being present, when it is required for
`SkipRaceDetectorEnabled` to work properly.

A quick inspection leads me to believe `LOCKDEBUG` also is required for
package `lock`. I am less convinced about `BASE_IMAGE` being required,
but I'm adding it back anyway in a sort of spiritual revert until we
figure it out.

Fixes cilium#15214: the Kafka tests were actually already disabled for race
detector pipelines, as per cilium#13757. They were only running the pipelines
due to cilium#15125 re-enabling them incidentally.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 437e2bb ]

This is meant as a temporary workaround for cilium#16007.

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 3b350ce ]

Improve the AWS VPC CNI plugin chaining page. Also, make use of the
Cilium CLI to check for the status of the installation and for
performing the connectivity test.

Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 1208496 ]

This is also mirror the Wireguard documentation.

Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Glib Smaga <code@gsmaga.com>
kaworu and others added 5 commits May 12, 2021 07:32
[ upstream commit 6f695aa ]

Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 12dcef5 ]

Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit bbec696 ]

Previously, the tests would leave state on the host machine.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit ebd3833 ]

This will be used in the subsequent commit, when creating a XFRM policy
specifically to allow to-proxy traffic.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit a9f18f3 ]

This is needed to fix the L7 ingress policy case.

In tunneling mode when a packet is received on the destination node, it
makes two passes through the stack. The first pass decrypts the packet
because it matches the XFRM IN policy with mark 0xd00, indicating it
needs decryption. The second pass through, since L7 ingress policy is
enabled, the packet mark is set to 0x200 meaning the packet is destined
to the proxy. The problem occurs because there is only one XFRM IN
policy matching on mark 0xd00. Since the packet mark is 0x200, the match
fails and the packet is dropped by the stack.

Therefore, we add a new XFRM policy that matches packets destined for
the proxy so that they're allowed.

Why doesn't this happen in direct routing mode? The reason is because
the skb extension bits[1] are cleared in DR, whereas they are not in
tunneling. When the bits are toggled on, then this causes extra logic to
be executed in the kernel inside `__xfrm_policy_check()`. This logic
upon a policy lookup failure drops the packet [2]. When the bits are
cleared, there is no logic to cause a drop upon policy lookup failure.

Why are the skb extension bits cleared in DR and not in tunneling?
Because the packet path traversal in DR is `cilium_host` -> `cilium_net`
-> stack, where the veth pair of `cilium_host` and `cilium_net` calls
the kernel `veth_forward_skb()`, which eventually calls
`skb_scrub_packet()` where the extension bits are cleared. The path for
tunneling is `cilium_{vxlan,geneve}` -> stack, where there is no veth
pair traversal, and thus no call to `skb_scrub_packet()`.

Hence why we only create a new XFRM policy in tunneling mode.

(This was debugged with the help of the following bpftrace script:
https://gist.github.com/christarazi/4bb48eb623a03f25026be21856ea10fb)

[1]:
https://elixir.bootlin.com/linux/v5.12.2/source/net/xfrm/xfrm_policy.c#L3558

[2]:
https://elixir.bootlin.com/linux/v5.12.2/source/net/xfrm/xfrm_policy.c#L3590

Co-authored-by: John Fastabend <chris@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
@glibsm glibsm force-pushed the pr/v1.10-backport-2021-05-12 branch from d66e0c8 to 38ba079 Compare May 12, 2021 14:33
@christarazi
Copy link
Member

test-backport-1.10

@pchaigno
Copy link
Member

pchaigno commented May 12, 2021

Triggering manually because test-backport trigger wasn't set for that job. Now fixed.
test-1.19-5.4

@aanm aanm merged commit 9f5dd00 into cilium:v1.10 May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet