
v1.11 backports 2022-05-09 #19752

Merged 11 commits on May 9, 2022

Conversation

@kaworu (Member) commented May 9, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18957 19681 18620 19710 19711 19576 19158; do contrib/backporting/set-labels.py $pr done 1.11; done

or with

$ make add-label BRANCH=v1.11 ISSUES=18957,19681,18620,19710,19711,19576,19158

@kaworu requested a review from a team as a code owner on May 9, 2022 08:17
@maintainer-s-little-helper bot added the backport/1.11 label (This PR represents a backport for Cilium 1.11.x of a PR that was merged to main) and the kind/backports label (This PR provides functionality previously merged into master) on May 9, 2022
@kaworu (Member, Author) commented May 9, 2022

@YutaroHayakawa the BPF check failure is due to TRACE_REASON_UNKNOWN being undefined; it was introduced by this PR. Let me know what we should do.

@kaworu force-pushed the pr/v1.11-backport-2022-05-09 branch from a8c5231 to 9fdc4e2 on May 9, 2022 08:27
@YutaroHayakawa (Member) commented

@kaworu (cc: @qmonnet) Hi, thanks for the heads up! I think just replacing TRACE_REASON_UNKNOWN with 0 should be enough in this case. We also have the option to backport #19226, but we will release v1.12 soon anyway and probably won't request backporting the tracepoint fix to the v1.11 branch anymore, because it is not a "major" bug (c.f. https://docs.cilium.io/en/stable/contributing/release/backports/#backport-criteria).

@kaworu (Member, Author) commented May 9, 2022

@YutaroHayakawa replacing it with 0 will effectively be the same as TRACE_REASON_POLICY (see CT_NEW), right? If so, let me know whether you prefer an explicit 0 or TRACE_REASON_POLICY (I can add a comment linking to #19226 either way).
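For context, a minimal sketch of the enum relationship being discussed. This is not the verbatim v1.11 header, and the exact set of enum ct_status members is an assumption here, but TRACE_REASON_POLICY = CT_NEW is quoted verbatim from bpf/lib/trace.h in the compiler output below:

```c
/* Hedged sketch, not the actual v1.11 headers. */
enum ct_status {
	CT_NEW = 0,	/* first member, so CT_NEW evaluates to 0 */
	CT_ESTABLISHED,
	/* ... */
};

enum trace_reason {
	TRACE_REASON_POLICY = CT_NEW,	/* i.e. also 0, hence the question above */
	/* ...; TRACE_REASON_UNKNOWN only exists on main, after #19226 */
};
```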

@aanm (Member) left a comment

My commits look good, thanks!

@sayboras (Member) commented May 9, 2022

My change looks good, but I got the error below while trying to verify it again locally:

$ make -j $(nproc) build
... 
In file included from bpf_xdp.c:38:
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:1020:19: error: variable has incomplete type 'enum trace_point'
        enum trace_point obs_point;
                         ^
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:1020:7: note: forward declaration of 'enum trace_point'
        enum trace_point obs_point;
             ^
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:1033:48: error: use of undeclared identifier 'TRACE_REASON_UNKNOWN'; did you mean 'TRACE_REASON_POLICY'?
        send_trace_notify(ctx, obs_point, 0, 0, 0, 0, TRACE_REASON_UNKNOWN, 0);
                                                      ^~~~~~~~~~~~~~~~~~~~
                                                      TRACE_REASON_POLICY
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/trace.h:49:2: note: 'TRACE_REASON_POLICY' declared here
        TRACE_REASON_POLICY = CT_NEW,
        ^
In file included from bpf_xdp.c:38:
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:2092:19: error: variable has incomplete type 'enum trace_point'
        enum trace_point obs_point;
                         ^
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:2092:7: note: forward declaration of 'enum trace_point'
        enum trace_point obs_point;
             ^
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:2104:48: error: use of undeclared identifier 'TRACE_REASON_UNKNOWN'; did you mean 'TRACE_REASON_POLICY'?
        send_trace_notify(ctx, obs_point, 0, 0, 0, 0, TRACE_REASON_UNKNOWN, 0);
                                                      ^~~~~~~~~~~~~~~~~~~~
                                                      TRACE_REASON_POLICY
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/trace.h:49:2: note: 'TRACE_REASON_POLICY' declared here
        TRACE_REASON_POLICY = CT_NEW,

@qmonnet (Member) commented May 9, 2022

> @YutaroHayakawa replacing it with 0 will effectively be the same as TRACE_REASON_POLICY (see CT_NEW), right? If so, let me know whether you prefer an explicit 0 or TRACE_REASON_POLICY (I can add a comment linking to #19226 either way).

TL;DR: Yes, go with 0 instead.

The supposition above is correct: 0 was used both when the reason was unknown and for new conntrack connections. This was not convenient, which is why I changed it in #19226.

So 0 is the correct value to use, but there's an additional thing we need to check. Quoting a DM from Sebastian:

> In Hubble, we maintain a list of trace points for which we know that they have the reason field in send_trace_notify set. This is required because 0 is actually a valid value for the reason field. So whenever Hubble reads a trace notify reason field and it is 0, we don't know if it is zero because it was a new connection or because the trace point statically sets it to zero. Thus this check here.
> If you added all trace points to include the ct_reason, you can just add them to the list here.

In the case of Yutaro's PR, the context of the new traces will be either TRACE_TO_OVERLAY or TRACE_TO_NETWORK. Both contexts cause TraceObservationPointHasConnState() (second link above) to return false, so the reason will in fact be considered unknown: Hubble won't pass it to the UI, and it won't cause flow parsing errors. So I'd say we're good with just using 0 here (cc @gandro just in case). Other trace points in v1.11 use 0 rather than TRACE_REASON_POLICY when the reason is not known, so let's stick with that.
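Concretely, the agreed change replaces the TRACE_REASON_UNKNOWN argument with a literal 0 at the call sites flagged by the compiler. Below is a hedged sketch of what the v1.11 call site in bpf/lib/nodeport.h looks like after the change; the comment wording is illustrative, not the exact one added in the backport:

```c
	/* v1.11 has no TRACE_REASON_UNKNOWN (it was only introduced on main
	 * by #19226), so pass 0 explicitly. Other v1.11 trace points also
	 * use 0 when the conntrack reason is not known. */
	send_trace_notify(ctx, obs_point, 0, 0, 0, 0, 0, 0);
```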

abocim and others added 10 commits May 9, 2022 13:21
[ upstream commit 90c0d62 ]

... while clustermesh-apiserver is starting and a new CEW with identity 0 already exists in etcd

In the original implementation, clustermesh-apiserver and etcd run in separate containers within one pod, so a new, empty etcd is created while clustermesh-apiserver is starting.

In our use case, we use an existing etcd that is shared with cilium-agent. The error below occurred when a new CEW resource had already been applied to Kubernetes and cilium-agent had already started on the related external workload machine, while clustermesh-apiserver was not yet deployed.

$ ./clustermesh-apiserver --cluster-id=12 --cluster-name=uacl-test --k8s-kubeconfig-path ../../uacl/uacl-test.kubeconfig --kvstore-opt etcd.config=../../uacl/etcd.config
level=info msg="Started gops server" address="127.0.0.1:9892" subsys=clustermesh-apiserver
level=info msg="Starting clustermesh-apiserver..." cluster-id=12 cluster-name=uacl-test subsys=clustermesh-apiserver
level=info msg="Establishing connection to apiserver" host="https://uacl-test-api.test:31243" subsys=k8s
level=info msg="Connected to apiserver" subsys=k8s
level=info msg="Waiting until all Cilium CRDs are available" subsys=k8s
level=info msg="All Cilium CRDs have been found and are available" subsys=k8s
level=info msg="Initializing identity allocator" subsys=identity-cache
level=info msg="Creating etcd client" ConfigPath=../../uacl/etcd.config KeepAliveHeartbeat=15s KeepAliveTimeout=25s RateLimit=20 subsys=kvstore
level=info msg="Started health API" subsys=clustermesh-apiserver
level=info msg="Connecting to etcd server..." config=../../uacl/etcd.config endpoints="[https://uacl-test-api.test:30108]" subsys=kvstore
level=info msg="Got lease ID 320f7d7b1f23bc32" subsys=kvstore
level=info msg="Got lock lease ID 320f7d7b1f23bc34" subsys=kvstore
level=info msg="Initial etcd session established" config=../../uacl/etcd.config endpoints="[https://uacl-test-api.test:30108]" subsys=kvstore
level=info msg="Successfully verified version of etcd endpoint" config=../../uacl/etcd.config endpoints="[https://uacl-test-api.test:30108]" etcdEndpoint="https://uacl-test-api.test:30108" subsys=kvstore version=3.4.16
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1919aa3]

goroutine 213 [running]:
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).keyPath(0x0, {0x7f349c122c38, 0xc000a7e240})
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:276 +0x43
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).syncLocalKey(0x0, {0x2204f70, 0xc000050138}, {0x22051d8, 0xc000a7e240})
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:288 +0x87
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).UpdateKeySync(...)
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:375
main.(*VMManager).OnUpdate(0xc0005f6080, {0x21e9e10, 0xc000a7e0c0})
/home/abocim/go/src/github.com/cilium/cilium/clustermesh-apiserver/vmmanager.go:183 +0x3f4
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).onUpdate(...)
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:233
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).updateKey(0xc000128780, {0xc00025e85d, 0x15}, {0xc0000f6a00, 0xf3, 0x100})
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:414 +0x102
github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).watcher(0xc000128780, 0xc000270a20)
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:482 +0x73c
created by github.com/cilium/cilium/pkg/kvstore/store.(*SharedStore).listAndStartWatcher
/home/abocim/go/src/github.com/cilium/cilium/pkg/kvstore/store/store.go:447 +0x89

Signed-off-by: Adam Bocim <adam.bocim@seznam.cz>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit bb43425 ]

Since the robots.txt file is now used in 'html_extra_path', we move sitemap-index.xml to the static directory. That way, robots can access the sitemap-index.xml file directly.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 21e6e6a ]

Currently Hubble-Relay builds its client list by querying
the Peer Service over the local Hubble Unix domain socket.
This goes against best security practices (sharing files
across pods) and is not allowed on platforms that strictly
enforce SELinux policies (e.g. OpenShift).

This PR enables, by default, the creation of a Kubernetes Service
that proxies the Hubble Peer Service so that Hubble-Relay
can use it to build its client list, eliminating the need
for a shared Unix domain socket completely.

Helm values and configurations have been added to enable
the service in a cilium deployment.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 88d31cd ]

The upgrade tests use the official helm charts with unreleased Cilium images. This might cause the upgrade tests to fail if changes made in the unreleased Cilium version require a new helm chart release. To fix this problem, the upgrade tests now use the unreleased helm charts as well.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 67f74ff ]

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit e80f60d ]

This commit lifts and shifts the current initialization of the k8s node watcher into another function with a single scope of work, so that it can be re-used later. There is no change in logic.

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 64b37e1 ]

This makes sure that the k8s node watcher is set up at most once. Also, a synced channel is added so that consumers of this store know when the syncing process is done.

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit edc1a0a ]

In the normal scenario, CiliumNode is created by the agent with owner references attached, as done in the PR below [0]. However, a CiliumNode can also be created by the IPAM module [1], in which case it has no ownerReferences at all. If the corresponding node is then terminated and never comes back with the same name, the CiliumNode resource is left dangling and needs to be garbage collected.

This commit adds a garbage collector for CiliumNode with the following logic:

- The garbage collector runs at a predefined interval (e.g. specified by the
  flag --nodes-gc-interval).
- Each run checks whether each CiliumNode has a counterpart k8s Node resource,
  and removes the node from the GC candidate list if required.
- If yes, the CiliumNode is considered valid; nothing to do.
- If no, check whether ownerReferences are set.
  - If yes, let k8s perform the garbage collection.
  - If no, mark the node as a GC candidate. If the node is still a GC candidate
    in the next run, remove it.

References:

[0]: cilium#17329
[1]: https://github.com/cilium/cilium/blob/master/pkg/ipam/allocator/podcidr/podcidr.go#L258

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 167fa2b ]

This commit is to map operator CLI node GC interval flag in helm value
and config map.

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit d378a91 ]

The src_id variable in to_netdev in bpf_host.c is always zero, so it is safe to delete.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
@kaworu force-pushed the pr/v1.11-backport-2022-05-09 branch from 9fdc4e2 to 8139b39 on May 9, 2022 11:24
@kaworu (Member, Author) commented May 9, 2022

@qmonnet thanks a lot for the context, I've replaced TRACE_REASON_UNKNOWN with 0 as suggested and added a comment.

@sayboras the error you got was expected, as we were discussing the fix; it should work now! 🙏

@sayboras (Member) commented May 9, 2022

Thanks, I still get the error below :(

$ git rev-parse HEAD      

8139b3940a680183c27570862c600b2618b099aa


make[1]: *** [Makefile:216: bpf_xdp.ll] Error 1
llc -march=bpf -mattr=dwarfris -mcpu=v2 -filetype=obj -o bytecount.o bytecount.ll
In file included from bpf_host.c:55:
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:1020:19: error: variable has incomplete type 'enum trace_point'
        enum trace_point obs_point;
                         ^
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:1020:7: note: forward declaration of 'enum trace_point'
        enum trace_point obs_point;
             ^
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:2094:19: error: variable has incomplete type 'enum trace_point'
        enum trace_point obs_point;
                         ^
/home/tammach/go/src/github.com/cilium/cilium/bpf/lib/nodeport.h:2094:7: note: forward declaration of 'enum trace_point'
        enum trace_point obs_point;

@kaworu force-pushed the pr/v1.11-backport-2022-05-09 branch from 8139b39 to c1b00cd on May 9, 2022 13:32
[ upstream commit 322510d ]

In to-netdev in bpf_host.c, packet tracing is missing when handle_nat_fwd uses a tail call. This PR adds new packet tracing inside handle_nat_fwd, but there were two complications here.

1. handle_nat_fwd is called from to-netdev as well as to-overlay

There are two possible callers of this function, so we need to use the IS_BPF_OVERLAY macro to determine where it is called from.

2. handle_nat_fwd does not always tail call

Internally, it uses a conditional tail call. When it returns without tail calling, we hit the packet tracing hook at the bottom of to_netdev and get a duplicated packet trace. We need to track whether we have already traced the packet and skip the hook if so; see the sketch after this commit message.

Below is an example of a simple ICMP Echo to outside of the Pod network.

Before the fix (with tailcall)
```
<- endpoint ... 10.0.1.185 -> 1.1.1.1 EchoRequest
-> stack ... 10.0.1.185 -> 1.1.1.1 EchoRequest
["-> network" is missing here]
<- network ... 1.1.1.1 -> 172.18.0.3 EchoReply
<- host ... 1.1.1.1 -> 10.0.1.185 EchoReply
-> endpoint ... 1.1.1.1 -> 10.0.1.185 EchoReply
```

Before the fix (without tailcall)
```
<- endpoint ... 10.0.1.185 -> 1.1.1.1 EchoRequest
-> stack ... 10.0.1.185 -> 1.1.1.1 EchoRequest
-> network ... 172.18.0.3 -> 1.1.1.1 EchoRequest
<- network ... 1.1.1.1 -> 172.18.0.3 EchoReply
<- host ... 1.1.1.1 -> 10.0.1.185 EchoReply
-> endpoint ... 1.1.1.1 -> 10.0.1.185 EchoReply
[no problem]
```

After the fix (with tailcall)
```
<- endpoint ... 10.0.1.185 -> 1.1.1.1 EchoRequest
-> stack ... 10.0.1.185 -> 1.1.1.1 EchoRequest
-> network ... 172.18.0.3 -> 1.1.1.1 EchoRequest
<- network ... 1.1.1.1 -> 172.18.0.3 EchoReply
<- host ... 1.1.1.1 -> 10.0.1.185 EchoReply
-> endpoint ... 1.1.1.1 -> 10.0.1.185 EchoReply
[no problem]
```

After the fix (without tailcall)
```
<- endpoint ... 10.0.1.185 -> 1.1.1.1 EchoRequest
-> stack ... 10.0.1.185 -> 1.1.1.1 EchoRequest
-> network ... 172.18.0.3 -> 1.1.1.1 EchoRequest
<- network ... 1.1.1.1 -> 172.18.0.3 EchoReply
<- host ... 1.1.1.1 -> 10.0.1.185 EchoReply
-> endpoint ... 1.1.1.1 -> 10.0.1.185 EchoReply
[no duplicated trace, no problem]
```

Fixes: cilium#18909

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
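As referenced in the commit message above, here is a minimal sketch of the "trace at most once" pattern it describes. This is not the actual implementation: the extra `traced` out-parameter on handle_nat_fwd, the helper name to_netdev_sketch, and the send_trace_notify arguments are illustrative assumptions; only the general shape (trace inside handle_nat_fwd when it tail calls or traces itself, and skip the hook at the bottom of to_netdev in that case) follows the description.

```c
/* Hedged sketch of the "trace at most once" logic described above; the
 * traced out-parameter and the exact send_trace_notify arguments are
 * assumptions, not the real v1.11 code. */
static __always_inline int to_netdev_sketch(struct __ctx_buff *ctx)
{
	bool traced = false;
	int ret;

	/* handle_nat_fwd() may tail call (and then never returns here).
	 * When it does not tail call but has already emitted the trace
	 * itself, it reports that back through `traced`. */
	ret = handle_nat_fwd(ctx, &traced);

	/* Packet tracing hook at the bottom of to_netdev: only fire it if
	 * handle_nat_fwd() did not already trace this packet, to avoid the
	 * duplicated trace described in the commit message. */
	if (!traced)
		send_trace_notify(ctx, TRACE_TO_NETWORK, 0, 0, 0, 0, 0, 0);

	return ret;
}
```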
@kaworu (Member, Author) commented May 9, 2022

@YutaroHayakawa I've replaced the usage of enum trace_point with int, as the bpf code used to do before 6343f4c.
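In other words, roughly the following at the two nodeport.h call sites flagged by the compiler above (a sketch of the idea, not the exact diff; the assignment shown is illustrative):

```c
	/* enum trace_point is only declared on main (after 6343f4c), so it is
	 * an incomplete type on v1.11; fall back to a plain int, as the bpf
	 * code did before that commit. */
	int obs_point;

	obs_point = TRACE_TO_NETWORK;	/* or TRACE_TO_OVERLAY, depending on the path */
	send_trace_notify(ctx, obs_point, 0, 0, 0, 0, 0, 0);
```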

@sayboras (Member) left a comment

Thanks, looks good for my commit :). I verified locally to make sure GC was done correctly:

$ ksyslo deployment/cilium-operator --timestamps | grep "subsys=watchers"
2022-05-09T13:52:42.607874474Z level=info msg="Starting to garbage collect stale CiliumNode custom resources" subsys=watchers
...
2022-05-09T13:57:42.608310224Z level=info msg="Add CiliumNode to garbage collector candidates" nodeName=dummy-one subsys=watchers
2022-05-09T13:57:42.608347044Z level=debug msg="CiliumNode is valid, no gargage collection required" nodeName=minikube subsys=watchers
...
2022-05-09T14:02:42.609286306Z level=debug msg="CiliumNode is valid, no gargage collection required" nodeName=minikube subsys=watchers
2022-05-09T14:02:42.609323065Z level=info msg="Perform GC for invalid CiliumNode" nodeName=dummy-one subsys=watchers
2022-05-09T14:02:42.615194899Z level=info msg="CiliumNode is garbage collected successfully" nodeName=dummy-one subsys=watchers

@nathanjsweet (Member) commented

To not break CI, we need to kind of backport the following PRs:

@aanm (Member) commented May 9, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.23-kernel-4.9' failed:

Test Name

K8sServicesTest Checks graceful termination of service endpoints Checks client terminates gracefully on service endpoint deletion

Failure Output

FAIL: Timed out after 60.001s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.23-kernel-4.9 so I can create one.

@aanm (Member) commented May 9, 2022

/test-1.23-4.9
