
v1.10 backports 2021-12-16 #18276

Merged: tklauser merged 11 commits into cilium:v1.10 from pr/v1.10-backport-2021-12-16 on Dec 21, 2021

Conversation

tklauser (Member)

Once this PR is merged, you can update the PR labels via:

$ for pr in 18196 18041 18198 18259 18247; do contrib/backporting/set-labels.py $pr done 1.10; done

gandro and others added 9 commits December 16, 2021 11:37
[ upstream commit 1cf3ef3 ]

The `reason` argument of `send_trace_notify` is intended to carry
connection tracking state (see TRACE_REASON_*). User space uses it to
filter out reply packets where possible, so it should be zero if no CT
state is available, to avoid misclassification.

The code in bpf_host erroneously populated this value with the verdict
instead. This commit removes those values and documents which values may
be passed to `send_trace_notify`.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 3d78837 ]

This commit introduces an enum for the `TRACE_REASON_*` values, to
ensure callers of `send_trace_notify` do not accidentally pass in
wrong values.
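
The change itself is in the C datapath, but the technique carries over directly. A minimal Go sketch of the same idea, with hypothetical names and illustrative values (this is not Cilium's actual monitor API):

```go
package main

import "fmt"

// TraceReason mirrors the datapath's TRACE_REASON_* values as a distinct
// type, so that a typed integer such as a verdict cannot be passed to
// sendTraceNotify by accident.
type TraceReason uint8

const (
	TraceReasonPolicy        TraceReason = iota // TRACE_REASON_POLICY
	TraceReasonCtEstablished                    // TRACE_REASON_CT_ESTABLISHED
	TraceReasonCtReply                          // TRACE_REASON_CT_REPLY
	TraceReasonCtRelated                        // TRACE_REASON_CT_RELATED
)

// sendTraceNotify accepts only TraceReason values.
func sendTraceNotify(reason TraceReason) {
	fmt.Printf("trace reason: %d\n", reason)
}

func main() {
	verdict := 42 // an int, not a TraceReason
	_ = verdict
	sendTraceNotify(TraceReasonCtReply)
	// sendTraceNotify(verdict) // compile error: cannot use verdict (int)
}
```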

Suggested-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 5c0eb01 ]

This fixes a bug where Hubble wrongly populated the `is_reply` field for
`to-network` trace events: it assumed these events populate the
`TraceNotify.Reason` field with connection tracking state. That
assumption turned out to be wrong, so TO_NETWORK needs to be removed
from the list of observation points with CT state.
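
A sketch of the user-space side of that rule, with hypothetical names and values (the real logic lives in Hubble's trace event parser): `is_reply` may only be derived for observation points that actually carry CT state in the reason field, so TO_NETWORK must not be in that set.

```go
package main

import "fmt"

// Hypothetical observation point and reason values for illustration.
const (
	obsToStack   uint8 = 1
	obsToNetwork uint8 = 2 // reason field does not carry CT state here
)

const reasonCtReply uint8 = 2 // stand-in for TRACE_REASON_CT_REPLY

// obsPointsWithCTState holds the observation points whose trace events
// populate the reason field from the connection tracking lookup.
// TO_NETWORK is deliberately absent after this fix.
var obsPointsWithCTState = map[uint8]bool{
	obsToStack: true,
}

// isReply returns the reply flag and whether it is known at all. For
// observation points without CT state, is_reply must be left unset
// instead of being guessed from an unrelated reason value.
func isReply(obsPoint, reason uint8) (value, known bool) {
	if !obsPointsWithCTState[obsPoint] {
		return false, false
	}
	return reason == reasonCtReply, true
}

func main() {
	if _, known := isReply(obsToNetwork, reasonCtReply); !known {
		fmt.Println("to-network event: is_reply unknown, left unset")
	}
}
```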

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit f6c7825 ]

Currently, we check for the DSR IP option and then create the CT entry
with the DSR flag only for new connections (CT_NEW). If a stale CT entry
exists (without the DSR flag), DSR is not handled for the new flow,
which means the rev-DNAT translation is not applied. This commit fixes
the problem by also updating the DSR flag for connections in the
CT_REOPENED state.
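
The fix itself is in the C datapath; a Go sketch of the changed control flow, with illustrative names, could look like this:

```go
package main

import "fmt"

type ctState int

const (
	ctNew ctState = iota
	ctReopened
	ctEstablished
)

type ctEntry struct {
	dsr bool // whether replies need DSR rev-DNAT handling
}

// updateDSRFlag mirrors the change: before, the DSR flag was only written
// for CT_NEW; now a CT_REOPENED entry is updated as well, so a stale
// entry without the flag can no longer suppress the rev-DNAT translation.
func updateDSRFlag(state ctState, entry *ctEntry, dsrOptionSeen bool) {
	switch state {
	case ctNew, ctReopened:
		entry.dsr = dsrOptionSeen
	}
}

func main() {
	stale := &ctEntry{dsr: false} // left over from a previous flow
	updateDSRFlag(ctReopened, stale, true)
	fmt.Println("DSR flag after reopen:", stale.dsr) // true: rev-DNAT applies
}
```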

Signed-off-by: ivan <i.makarychev@tinkoff.ru>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 13a7125 ]

Signed-off-by: chaosbox <ram29@bskyb.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 9f50a91 ]

Using it to upgrade to a new minor Cilium version has never been
supported and may break the Helm template in subtle ways due to the lack
of default values.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 979cae5 ]

Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 2ab15e9 ]

Create the tunnel map as non-persistent instead of marking it as such in
the package-level init func.
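
A self-contained sketch of the difference, where Map, NewMap and WithNonPersistent are simplified stand-ins for Cilium's pkg/bpf API: the property becomes part of the map's construction instead of being set as a side effect of init().

```go
package main

import "fmt"

// Map is a simplified stand-in for Cilium's pkg/bpf map wrapper.
type Map struct {
	name          string
	nonPersistent bool // non-persistent maps are removed and recreated on startup
}

func NewMap(name string) *Map { return &Map{name: name} }

// WithNonPersistent marks the map as non-persistent at construction time,
// replacing the previous pattern of mutating the map from a package-level
// init func after the fact.
func (m *Map) WithNonPersistent() *Map {
	m.nonPersistent = true
	return m
}

// The tunnel map is now created non-persistent directly.
var tunnelMap = NewMap("cilium_tunnel_map").WithNonPersistent()

func main() {
	fmt.Printf("%s non-persistent: %v\n", tunnelMap.name, tunnelMap.nonPersistent)
}
```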

Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit c083925 ]

Currently, when deleteTunnelMapping(oldCIDR *cidr.CIDR, quietMode bool)
is called with quietMode = true for a nonexistent entry, as can be the
case on initial node addition, we get ENOENT from the underlying map
operation in pkg/bpf/(*map).deleteMapEntry. This leads to the
bpf-map-sync-cilium_tunnel_map controller being started to reconcile the
error. However, in some reported cases (see cilium#16488), the controller
seems to stay in an error state, failing to delete the nonexistent
entry.

Avoid this situation entirely by ignoring any map delete error when
deleteTunnelMapping is called with quietMode = true.
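
A sketch of the fix with simplified types (the real signatures live in Cilium's node datapath and pkg/bpf): in quiet mode the delete result is discarded, so a missing entry can no longer push the controller into a reconciliation loop.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// deleteMapEntry stands in for pkg/bpf/(*map).deleteMapEntry; deleting a
// key that is not present fails with ENOENT.
func deleteMapEntry(key string) error {
	return fmt.Errorf("delete %q: %w", key, syscall.ENOENT)
}

// deleteTunnelMapping removes the tunnel map entry for oldCIDR. In quiet
// mode (e.g. on initial node addition, where the entry may not exist yet)
// any delete error is ignored instead of being handed to the
// bpf-map-sync-cilium_tunnel_map controller for reconciliation.
func deleteTunnelMapping(oldCIDR string, quietMode bool) error {
	err := deleteMapEntry(oldCIDR)
	if quietMode {
		return nil // the entry is absent either way; nothing to reconcile
	}
	return err
}

func main() {
	if err := deleteTunnelMapping("10.0.0.0/24", true); err == nil {
		fmt.Println("quiet mode: ENOENT ignored")
	}
	err := deleteTunnelMapping("10.0.0.0/24", false)
	fmt.Println("loud mode, ENOENT propagated:", errors.Is(err, syscall.ENOENT))
}
```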

Signed-off-by: Tobias Klauser <tobias@cilium.io>
@tklauser tklauser requested review from a team as code owners December 16, 2021 10:39
@tklauser tklauser requested a review from rolinh December 16, 2021 10:39
@tklauser tklauser added backport/1.10 and kind/backports labels Dec 16, 2021
@tklauser tklauser requested a review from gandro December 16, 2021 10:40
[ upstream commit 8b9e890 ]

The tunnel map, and thus its user space cache, are only needed if either
tunneling or the IPv4 egress gateway is enabled. Currently, the user
space cache of the map is created regardless of whether it is actually
used, leading to e.g. the bpf-map-sync-cilium_tunnel_map controller
being spawned unnecessarily. This controller will e.g. show up in
`cilium status --all-controllers` and might lead to confusion in setups
where tunneling and the egress gateway feature are disabled. In some
reported cases with tunneling disabled (see cilium#16488), that
controller also ended up in an error state, unable to recover. Thus,
only create the user space cache map, and with it the sync controller,
when it is actually needed, to avoid this class of errors.
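
A sketch of the resulting gate, with option names that only approximate Cilium's agent configuration: the cache, and with it the sync controller, is only created when a consumer of the tunnel map is enabled.

```go
package main

import "fmt"

// Config is a simplified stand-in for the agent's option.Config.
type Config struct {
	TunnelingEnabled        bool
	EnableIPv4EgressGateway bool
}

// needsTunnelMapCache reports whether the user space cache of the tunnel
// map, and hence the bpf-map-sync-cilium_tunnel_map controller, is needed.
func needsTunnelMapCache(c Config) bool {
	return c.TunnelingEnabled || c.EnableIPv4EgressGateway
}

func main() {
	c := Config{} // tunneling and egress gateway both disabled
	if !needsTunnelMapCache(c) {
		// No cache means no sync controller showing up in
		// `cilium status --all-controllers`, and no error state to recover from.
		fmt.Println("skipping tunnel map user space cache")
	}
}
```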

Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 386029b ]

The datapath.LocalNodeConfig.EnableEncapsulation field is immutable at
runtime [1], i.e. it is defined once at agent startup. The tunnel map is
created as non-persistent, meaning that any potentially pinned map is
deleted on startup [2], [3]. In combination, this means there cannot be
leftover old tunnel map entries in the tunnel map if encapsulation is
disabled.

[1] https://github.com/cilium/cilium/blob/6c169f63ec254de7777483b6f01c261215f9ec9c/pkg/datapath/node.go#L59-L64
[2] https://github.com/cilium/cilium/blob/6c169f63ec254de7777483b6f01c261215f9ec9c/pkg/maps/tunnel/tunnel.go#L48
[3] https://github.com/cilium/cilium/blob/6c169f63ec254de7777483b6f01c261215f9ec9c/pkg/bpf/map_linux.go#L104-L106

Signed-off-by: Tobias Klauser <tobias@cilium.io>
tklauser (Member Author) commented Dec 16, 2021

/test-backport-1.10

Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed and has not been observed before, so may be related to your PR:

Test Name

K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: connectivity-check pods are not ready after timeout

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-4.9 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.19-kernel-5.4' failed and has not been observed before, so may be related to your PR:

Test Name

[empty]

Failure Output

[empty]
If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-5.4 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed and has not been observed before, so may be related to your PR:

Test Name

K8sServicesTest Checks service across nodes with L7 policy Tests NodePort with L7 Policy

Failure Output

FAIL: Request from k8s1 to service http://[fd03::ffae]:10080 failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-4.9 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Test Name

K8sChaosTest Connectivity demo application Endpoint can still connect while Cilium is not running

Failure Output

FAIL: Found 1 k8s-app=cilium logs matching list of errors that must be investigated:

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:

Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests L2-less with Wireguard provisioned via kube-wireguarder Tests NodePort BPF

Failure Output

FAIL: Can not connect to service "http://172.16.42.1:31191" from outside cluster (1/10)

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.16-net-next so I can create a new GitHub issue to track it.

gandro (Member) left a comment

Looks good for my commits! Thanks

tklauser (Member Author) commented Dec 16, 2021

/test-1.16-netnext

VM provisioning failure: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/2170/

tklauser (Member Author) commented Dec 16, 2021
Cilium L4LB XDP test hit #18211

tklauser (Member Author) commented Dec 16, 2021

/test-1.19-5.4

previous failure: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-5.4/1386/ (triaging)

tklauser (Member Author) commented Dec 16, 2021

/test-1.19-4.9

previous failure: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-4.9/1944/ (triaging)

tklauser (Member Author) commented Dec 16, 2021

/test-gke

previous failure: https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/7197/

tklauser (Member Author)

/test-1.16-netnext

tklauser (Member Author)

/test-gke

tklauser (Member Author)

All remaining failures are known flakes, merging.

@tklauser tklauser merged commit 5cd7077 into cilium:v1.10 Dec 21, 2021
@tklauser tklauser deleted the pr/v1.10-backport-2021-12-16 branch December 21, 2021 18:16