Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.11 backports 2022-01-26 #18630

Merged
merged 19 commits into from
Jan 30, 2022
Merged

Conversation

kkourt
Copy link
Contributor

@kkourt kkourt commented Jan 26, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18483 18538 18546 18499 18484 18479 18592 18564 18606 18582 18554 18553 18469; do contrib/backporting/set-labels.py $pr done 1.11; done

Conflicts

pchaigno and others added 19 commits January 26, 2022 10:47
[ upstream commit 9708e5d ]

Commit 49cb220 ("iptables: Don't masquerade traffic to cluster
nodes") introduced ipsets in our iptables rules, to skip masquerading
traffic to cluster nodes when iptables-based masquerading is used.
BPF-based masquerading already implemented this logic.

In practice, traffic to both Internal and External node IPs would skip
masquerading. However, Kubernetes doesn't state that External Node IPs
are routable within the cluster. Masquerading may thus be required for
External Node IPs, as reported by several users.

Hence, this commit changes the logic to only skip masquerading in
iptables for Internal node IPs. This commit doesn't affect the BPF-based
masquerading logic, as unfortunately we can't easily distinguish between
Internal and External node IPs in our datapath today (both are
identified as host or remote-node).

[ Backport notes: This commit introduces a typo that is going to be
addresses by a backport of 9b5b964 in
the same series ]

Fixes: 49cb220 ("iptables: Don't masquerade traffic to cluster nodes")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit aaf0a75 ]

This commit documents the difference in handling traffic to External IPs
of cluster nodes between the BPF-based and iptables-based
implementations of masquerading in Cilium.

Suggested-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 87bdb30 ]

Commit 2bede03 ("release: Generate helm values docs") broke the
release scripts for branches v1.10 and earlier, as it introduced a
dependency on a new Makefile target that has not been backported.
Grep for the dependency and skip it silently if it's not available.

Fixes: 2bede03 ("release: Generate helm values docs")
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 8f1979d ]

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 966cc4c ]

This commit adds the kernel requirements for advanced features such as
IPsec, iptables-based masquerading, and the bandwidth manager. Those
kernel configuration options were determined by making a
somewhat-minimal kernel configuration for our CI.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 2fdfd91 ]

The db5300d commit normalized the KPR initialisation routines by
making them to return an error. Unfortunately, in some error returns
it forgot to add actual errors. This makes debugging of KPR init
difficult. For instance:

    "Detected devices" devices="[enp0s9 eth0]"
    <..>
    "failed to finalise LB initialization: Cannot retrieve enp0s9 link"

The actual error for the link retrieval is missing.

Fixes: db5300d ("choir: normalize error handling in kube_proxy_replacement.go")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit ffe59b3 ]

When host firewall feature is enable, to-stack trace can be generated
even if the packets are rejected by host firewall policies. Generate trace
only when the packets are actually delivered to the stack.

Fixes: cilium#12562
Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 97c473f ]

When we use JSON output of cilium monitor, it would be useful if we
could filter the output by the tools like JQ. However, since some of the
non-JSON messages are generated through stdout, it ends up to the JSON
parse error. Fix `cilium monitor` command to generate non-JSON messages
to stderr.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 92bf6c6 ]

Previously, the PCAP recorder [1] was enabled only when running the
lb-only mode. However, there was an ask from users, who are running KPR
in the XDP mode, to have means to observe the LB traffic (tcpdump
cannot be used for XDP progs).

Why we didn't allow it before? Our main concern was potential verifier
complexity issues. But considering that the opt is disabled by default,
it's up to a user to take the potential risk.

Tested manually on the 5.16 kernel with the following cmds:

    ./daemon/cilium-agent  --debug=true --enable-ipv4=true
    --enable-ipv6=false --disable-envoy-version-check=true
    --tunnel=disabled --k8s-kubeconfig-path=/home/vagrant/.kube/config
    --kube-proxy-replacement=strict --native-routing-cidr=10.154.0.0/16
    --enable-ipv4-masquerade=true --auto-direct-node-routes=true
    --enable-hubble=true --enable-recorder=true --bpf-lb-acceleration=native
    --bpf-lb-acceleration=testing-only

    # create nginx pod and expose it via NodePort service

    hubble record --server=unix:///var/run/cilium/hubble.sock "0.0.0.0/0
    0 192.168.34.11/32 31894 TCP"

    # access the svc from outside

    2022-01-24T10:25:03Z Status: 14 packets (1106 bytes) written

[1]: https://cilium.io/blog/2021/05/20/cilium-110#pcap

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit f9a6011 ]

In lookupDefaultRoute(), to determine the default route for a given IP
family we list all routes filtering just by the 0.0.0.0 destination.

On some EKS setups where multilple NICs are attached to an instance,
this is not enough to identify a single route, since each interface has
its own default route.

The following example illustrates the problem:

    $ ks exec ds/cilium -- ip route list
    default via 10.100.1.1 dev eth0
    default via 10.100.255.65 dev eth1 metric 10001
    169.254.169.254 dev eth0

This will cause the agent to report a fatal error, as it can't find a
single gateway:

    cilium-agent level=fatal msg="Error while creating daemon" error="unable to install ip rule for ENI multi-node NodePort: failed to find interface with default route: Found default routes with different netdev ifindices: 2 vs 3" subsys=daemon

To address this issue, this commit changes the logic of
lookupDefaultRoute() to pick the one with the lowest priority.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 4145b6a ]

In the current implementation of HostToNetwork{16,32} and
NetworkToHost{16,32} for big endian architectures the Go compiler isn't
able to convert these operations into native assembly instructions, e.g.
BSWAP on x86_64.

Use ReverseBytes{16,32} from the math/bits stdlib package to implement
these functions which allows the Go compiler to emit the respective
assembly instructions which generally yield better performance.

The Go toolchain test suite checks that the correct native assembly
instructions are generated for the respective math/bits functions:
https://github.com/golang/go/blob/16d6a5233a183be7264295c66167d35c689f9372/test/codegen/mathbits.go#L194-L208

Example output of `go tool compile -S byteorder_littleendian.go` on
x86_64 for the HostToNetwork{16,32} functions below.
NetworkToHost{16,32} look the same. Note the ROLW and BSWAPL
instructions emitted after this change.

Before:

```
"".HostToNetwork16 STEXT nosplit size=20 args=0x8 locals=0x0 funcid=0x0
        0x0000 00000 (byteorder_littleendian.go:13)     TEXT    "".HostToNetwork16(SB), NOSPLIT|ABIInternal, $0-8
        0x0000 00000 (byteorder_littleendian.go:13)     FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (byteorder_littleendian.go:13)     FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (byteorder_littleendian.go:13)     FUNCDATA        $5, "".HostToNetwork16.arginfo1(SB)
        0x0000 00000 (<unknown line number>)    NOP
        0x0000 00000 (byteorder_littleendian.go:13)     MOVL    AX, CX
        0x0002 00002 (byteorder_littleendian.go:19)     ANDL    $-256, AX
        0x0007 00007 (byteorder_littleendian.go:19)     SHRW    $8, AX
        0x000b 00011 (byteorder_littleendian.go:19)     MOVBLZX CL, CX
        0x000e 00014 (byteorder_littleendian.go:19)     SHLL    $8, CX
        0x0011 00017 (byteorder_littleendian.go:19)     ORL     CX, AX
        0x0013 00019 (byteorder_littleendian.go:13)     RET
        0x0000 89 c1 25 00 ff ff ff 66 c1 e8 08 0f b6 c9 c1 e1  ..%....f........
        0x0010 08 09 c8 c3                                      ....
"".HostToNetwork32 STEXT nosplit size=45 args=0x8 locals=0x0 funcid=0x0
        0x0000 00000 (byteorder_littleendian.go:14)     TEXT    "".HostToNetwork32(SB), NOSPLIT|ABIInternal, $0-8
        0x0000 00000 (byteorder_littleendian.go:14)     FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (byteorder_littleendian.go:14)     FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (byteorder_littleendian.go:14)     FUNCDATA        $5, "".HostToNetwork32.arginfo1(SB)
        0x0000 00000 (<unknown line number>)    NOP
        0x0000 00000 (byteorder_littleendian.go:14)     MOVL    AX, CX
        0x0002 00002 (byteorder_littleendian.go:23)     ANDL    $-16777216, AX
        0x0007 00007 (byteorder_littleendian.go:23)     SHRL    $24, AX
        0x000a 00010 (byteorder_littleendian.go:23)     MOVL    CX, DX
        0x000c 00012 (byteorder_littleendian.go:23)     ANDL    $16711680, CX
        0x0012 00018 (byteorder_littleendian.go:23)     SHRL    $8, CX
        0x0015 00021 (byteorder_littleendian.go:23)     ORL     CX, AX
        0x0017 00023 (byteorder_littleendian.go:23)     MOVL    DX, CX
        0x0019 00025 (byteorder_littleendian.go:23)     ANDL    $65280, DX
        0x001f 00031 (byteorder_littleendian.go:23)     SHLL    $8, DX
        0x0022 00034 (byteorder_littleendian.go:23)     ORL     DX, AX
        0x0024 00036 (byteorder_littleendian.go:23)     MOVBLZX CL, CX
        0x0027 00039 (byteorder_littleendian.go:23)     SHLL    $24, CX
        0x002a 00042 (byteorder_littleendian.go:23)     ORL     CX, AX
        0x002c 00044 (byteorder_littleendian.go:14)     RET
        0x0000 89 c1 25 00 00 00 ff c1 e8 18 89 ca 81 e1 00 00  ..%.............
        0x0010 ff 00 c1 e9 08 09 c8 89 d1 81 e2 00 ff 00 00 c1  ................
        0x0020 e2 08 09 d0 0f b6 c9 c1 e1 18 09 c8 c3           .............
```

After:

```
"".HostToNetwork16 STEXT nosplit size=5 args=0x8 locals=0x0 funcid=0x0
        0x0000 00000 (byteorder_littleendian.go:16)     TEXT    "".HostToNetwork16(SB), NOSPLIT|ABIInternal, $0-8
        0x0000 00000 (byteorder_littleendian.go:16)     FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (byteorder_littleendian.go:16)     FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (byteorder_littleendian.go:16)     FUNCDATA        $5, "".HostToNetwork16.arginfo1(SB)
        0x0000 00000 (<unknown line number>)    NOP
        0x0000 00000 (byteorder_littleendian.go:16)     ROLW    $8, AX
        0x0004 00004 (byteorder_littleendian.go:16)     RET
        0x0000 66 c1 c0 08 c3                                   f....
"".HostToNetwork32 STEXT nosplit size=3 args=0x8 locals=0x0 funcid=0x0
        0x0000 00000 (byteorder_littleendian.go:17)     TEXT    "".HostToNetwork32(SB), NOSPLIT|ABIInternal, $0-8
        0x0000 00000 (byteorder_littleendian.go:17)     FUNCDATA        $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (byteorder_littleendian.go:17)     FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (byteorder_littleendian.go:17)     FUNCDATA        $5, "".HostToNetwork32.arginfo1(SB)
        0x0000 00000 (byteorder_littleendian.go:17)     BSWAPL  AX
        0x0002 00002 (byteorder_littleendian.go:17)     RET
        0x0000 0f c8 c3                                         ...
```

Fixes: 332b16f ("byteorder: Simplify byte order functions")
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 9b5b964 ]

Due to a typo in the condition for the removal of IP addresses from the
node ipset in commit 9708e5d ("node: Don't skip masquerading for
External node IPs"), the ipsets would have never been cleaned up. This
commit fixes it.

Fixes: 9708e5d ("node: Don't skip masquerading for External node IPs")
Reported-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit c7d9836 ]

This commit enables CI for all feature branches based on the master one.

The naming convention for the base feature branch is:

  ft/master/<feature>

[ Backkport note: for v1.11, the feature branch is ft/v1.11/<feature> ]

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit cc85af8 ]

Currently CI will run only on the base cilium/cilium repo, due to a
check in most workflows in the form of:

    if: ${{ github.repository == 'cilium/cilium' }}

The original intent of this check was to avoid running CI on forks,
since workflow secrets are usually not configured, resulting in multiple
failures.

It turned out that the ability to run CI on forks is something useful to
have, and as a bonus point it would ease the development and maintenance
of new and existing workflows.

Because of this, this commit removes the aforementioned check.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 7818a5f ]

This commit is to expose xfrm stats via prometheus metrics if IPSec is
enabled.

Fixes: cilium#14725

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit b1940f2 ]

Calls to OpenMap register the resulting map in a global array. This behaviour
is only intended in some cases, so implement our own version of LoadPinnedMap
that takes care of populating the pin path field in our custom Map struct.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 3b00e46 ]

Wrap errors from InitMaglevMaps to make the code easier to navigate.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit ca134d4 ]

While working on a solution to cilium#18037, I encountered a few minor issues with
the Maglev code. This commit aims to make the code a bit more idiomatic.

- Fix a missing ipv6 (or ipv4) Maglev map from breaking`bpf lb maglev list`
  by always performing both dumps and ignoring missing maps.
- Reduced the length of the 'get' handler in particular by carving the logic
  out into separate functions.
- Implemented more logic as methods instead of free functions wherever it made
  sense.
- Added sanity checks to prevent e.g. double-initialization of the Maglev maps,
  since their handles are stored in global variables. (this should be addressed
  at some point)
- Store the Maglev table size intended by the user in a global var to avoid
  having multiple copies littered throughout outer/inner map structs.
- Use errors to convey error states to the caller instead of opaque catch-all
  'UnknownMaglevTableSize' return.
- Provide helpers like MaglevOuterMap.UpdateService() that are friendlier to
  work with and somewhat easier to reason about.
- Provide dump methods on both outer and inner map types.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 8789c95 ]

To better reflect what it does (loading a map from a bpffs map pin and
registering it in the 'mapRegister' global), rename this to something that
doesn't look as innocuous as 'open a map'. This can leak (potentially closed)
map handles if not used with care, and should probably be changed later on.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
@kkourt kkourt requested a review from a team as a code owner January 26, 2022 14:24
@kkourt kkourt requested a review from a team January 26, 2022 14:24
@kkourt kkourt requested review from a team as code owners January 26, 2022 14:24
@kkourt kkourt added backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Jan 26, 2022
@kkourt kkourt requested review from joestringer, brb, jibi and tklauser and removed request for joestringer January 26, 2022 14:25
@kkourt
Copy link
Contributor Author

kkourt commented Jan 26, 2022

/test-backport-1.11

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes look good 👍 Thanks Kornilios!

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good for my change.

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes LGTM, thanks.

Copy link
Member

@jibi jibi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ for my changes, thanks!

Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good for my commits 💯

@@ -14,6 +14,7 @@
/pkg/byteorder/ @cilium/bpf @cilium/api
/pkg/client @cilium/api
/pkg/k8s/apis/cilium.io/v2/ @cilium/api
/pkg/datapath/linux/ipsec/xfrm_collector* @cilium/metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems like the original list is following alphabetical order.

@ti-mo
Copy link
Contributor

ti-mo commented Jan 27, 2022

Thanks Kornilios, looks good!

@YutaroHayakawa
Copy link
Member

YutaroHayakawa commented Jan 28, 2022

Thanks Kornilios my changes looks good to me!

@glibsm
Copy link
Member

glibsm commented Jan 28, 2022

/test-backport-1.11

@glibsm glibsm added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 30, 2022
@glibsm glibsm merged commit b248711 into cilium:v1.11 Jan 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet