-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.9 backports 2020-10-23 #13720
v1.9 backports 2020-10-23 #13720
Conversation
[ upstream commit 6747a84 ] The metadata resolver controller can be put in a scenario where it can never succeed if the Cilium K8s pod cache is out-of-sync with the current view of pods in the cluster. The cache can be out-of-sync if there's heavy load on the cluster and especially on the apiserver. This leaves the Cilium pod cache starved. When the pod cache is starved, Cilium may never have an up-to-date view of the pods running in the cluster, and therefore may never be able to resolve labels for pods because the fetch will return "pod not found". Eventually, kubelet (or maybe even the user) will give up on the pod and remove it, thereby Cilium begins removing the endpoint. The controller is stopped, but the goroutine waiting on the `done` channel will never receive, hence the leak. This commit fixes the goroutine leak when the controller never succeeds and therefore, never closes the `done` channel. The fix is to add a `select` statement to also watch the endpoint's `aliveCtx` which is cancelled (and the context's `Done` channel is closed) when the endpoint is deleted. This commit was validated by forcing the metadata resolver to never find a pod (manually hardcode the wrong pod name), and ensuring that the `gops stack` output does not contain an entry like: ``` goroutine 1244259 [chan receive, 1434 minutes]: github.com/cilium/cilium/pkg/endpoint.(*Endpoint).RunMetadataResolver.func1(0xc0055d2a20, 0xc0049ff600, 0xc0055d2ae0, 0x5d) /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1531 +0x34 created by github.com/cilium/cilium/pkg/endpoint.(*Endpoint).RunMetadataResolver /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1530 +0x11e ``` Fixes: #13680 Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit cb9aa3e ] This controller's run interval is actually unnecessary as this controller is intended to only run once as long as it succeeds. This is evidenced by the fact that we have a `done` channel that's closed when the controller succeeds and a goroutine waiting `done` to be closed to remove the controller, thereby stopping it. In addition, in the very unlikely case that the controller is scheduled again to run _and_ the goroutine waiting for the `done` channel to be closed has not run yet, the controller could panic because it would close the channel twice. This commit prevents that as the interval defaults to 10m if not provided. If the goroutine waiting on the channel doesn't run within 10m, then we very likely have much worse problems at hand. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit e8bce62 ] As of Azure Linux kernel 5.4.0-1022 the necessary patches to the hv_netvsc have been backported, see [1]. This allows to run NodePort XDP on Azure AKS when using the Ubuntu 18.04 node image which provides said kernel version. [1] https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1877654 Update the instructions accordingly. In addition the nodePort device helm option needs to be set explicitly to eth0, as otherwise bpf_host.o would erroneously be bound to the azure0 interface. For #13627 Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 251fd53 ] Fixes: - Fix Helm option: "--set devices=ethX,ethY" does not work (missing curly braces and quotes). - Rename one occurrence of "access=ssh" into "node-access=ssh", for consistency. - Add a missing closing parenthesis at the end of the command that sets the HOST_EP_ID variable. - Use jsonpath and kubectl's options in one command to avoid piping the output to jq. Additions: - Explain what the interfaces passed to the "--devices" option are for. This is mostly copied from the description in the Host Policies description (Layer 3 Examples). - Explain that per-endpoint routing is not compatible with the host firewall and must be disabled, in particular on managed environments like GKE. - Add node label removal to the clean-up section. - Format shell session code as shell session code. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 3c97c22 ] The upstream types have a lot of information that Cilium does not need so we can slim them down. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 71b4fb5 ] In upstream Kubernetes the core, discovery and networking packages need to be in the api package. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 1314600 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 620b243 ] When Cilium is waiting for the CRDs to be available in the cluster, Cilium does not need process entire CRD nor Table objects as it is only checking for the existence of the CRD, by checking its name. Thus, we can use a slimmer version of the CRD as well as a slimmer version of the Table structures to avoid resource consumption during initialization. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit ac34b34 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 3a803bf ] During initialization of CRDs, if Cilium can't watch for POMs it can fall back to watch for v1beta1.Table, instead of processing the entire CRDs. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 228a485 ] During the endpoint restoration process, when we parse the endpoints, we assign them a reserved init identity if they don't already have an identity [0]. If we later remove the endpoint (because the corresponding K8s pod or interface are missing), we attempt to remove the identity from the identity manager. That last operation results in the following error message because the init identity was never added to the manager. level=error msg="removing identity not added to the identity manager!" identity=5 subsys=identitymanager This commit fixes it by skipping the removal attempt from the manager in the case of identity init. 0 - https://github.com/cilium/cilium/blob/80a71791320df34df5b6252b9680553e38d88d20/pkg/endpoint/endpoint.go#L819 Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 8cd003c ] An example CLI output: cuiwl@cuiwl:~$ kubectl exec -itn kube-system anetd-jgsvb -- cilium lrp list cuiwl@cuiwl:/google/src/cloud/cuiwl/LRP/google3$ kubectl exec -itn kube-system anetd-v2j52 -- cilium lrp list LRP namespace LRP name FrontendType Matching Service default lrp-addr IP + port | 169.254.169.254:8080/TCP -> kube-system lrp-svc clusterIP + named ports kube-system/kube-dns | 10.84.48.10:53/UDP -> 10.100.1.2:53(kube-system/node-local-dns-wgbjm), | 10.84.48.10:53/TCP -> 10.100.1.2:53(kube-system/node-local-dns-wgbjm), Signed-off-by: Weilong Cui <cuiwl@google.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 64e681d ] This commit is to add required argument for `vagrant port` command to avoid below error ```shell script $ vagrant port --guest 6443 The provider reported there are no forwarded ports for this virtual machine. This can be caused if there are no ports specified in the Vagrantfile or if the virtual machine is not currently running. Please check that the virtual machine is running and try again. ``` Relates to b68af48 Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit bfc0078 ] This release fixes a bug which prevents certain environment variables from being used for configuration. See the release notes[0] for details. [0]: https://github.com/cilium/hubble/releases/tag/v0.7.1 Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit f648be6 ] This commit unifies the `cilium_version` and `cilium_tag` variables in the Vagrantfile for the getting started tutorial into a single variable, `cilium_version`. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit a8e67f1 ] When submitting the backport PR using submit-backport, the script proposes to update the labels (i.e., remove needs-backport/X and add backport-pending/X): Sending pull request... Everything up-to-date #13700 Updating labels for PRs 13383 13608 12975 Set labels for all PRs above? [y/N] y Setting labels for PR 13383... ✓ Setting labels for PR 13608... ✓ Setting labels for PR 12975... ✓ The choice defaults to not updating the labels. That may give the wrong impression that it is an optional step---and if you're like me, when you're unsure what an optional step does, you skip it. We should default to setting the labels because we later rely on the labels being set (e.g., when we update them after the PR is merged). This commit changes it to the following: Sending pull request... Everything up-to-date #13700 Updating labels for PRs 13383 13608 12975 Set labels for all PRs above? [Y/n] Setting labels for PR 13383... ✓ Setting labels for PR 13608... ✓ Setting labels for PR 12975... ✓ Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit ec16cab ] If the controller that is used for label resolution fails, the prometheus metrics will increase its cardinality since the uniquely controller name was being used as a prometheus label. To avoid this, we will reference these warnings with a common subsystem name, 'resolve-labels'. Fixes: a31ab29 ("endpoint: Run labels controller under ep manager") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 770bad5 ] ".. code:: shell-session" does not work to provide syntax highlighting for shell sessions, this needs to be ".. code-block:: shell-session". Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit e168e09 ] Helm 2 support was deprecated as of commit 31f1306 ("feat(helm): Move requirements.yaml to Chart.yaml"), and will reach end of support from upstream on November 13, 2020 [0]. Remove references to Helm 2. [0] https://helm.sh/blog/helm-v2-deprecation-timeline/ Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 532bbd5 ] While proofreading the upgrade guide, I found some phrasings awkward and some duplicate text. Improve it by fixing the grammar / tense and rewording sentences that flow poorly. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 76505f7 ] This paragraph seemed to be vaguely referencing some section that might exist in the future at some point, which doesn't make any sense. We can only refer users to the documentation that exists below. Do so. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 6b4e3c0 ] Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit e62fd90 ] Since Cilium 1.6, we have provided Helm charts and have instructed users to upgrade with full YAML updates (either directly through helm or by templating first). There's no point in having a column for handling partial upgrades any more, remove it. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 7058593 ] Try to make it more clear to users how they should specify the options in the primary command for upgrading the deployment, to prevent them from using 1.8.x helm options which are no longer recognized by Cilium 1.9 Helm charts. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 88cf4df ] The Helm chart one isn't interpreted by sphinx so goes to a dead link, the api ratelimiting one was using double graves rather than single which meant it would highlight the text rather than linking it, and the preflight section vaguely referred to earlier in the document even though it's super easy to just directly link it. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit d12aacd ] Commit c8360a6 ("test/k8s: keep configmap across upgrade test") introduced the option KeepDeprecatedProbes which allowed smooth upgrade from 1.7 or earlier to 1.8 or later by generating cilium-agent DaemonSet health probes in the old style if the user specified the keepDeprecatedProbes option during helm install. However, it didn't take into account users upgrading from fresh 1.8.x installs to 1.9.x or later, where deprecated probes should never be used. Teach this setting about "upgradeCompatibility" so that when the user either directly specifies the option, or if they provide the appropriate upgrade compatibility version, the deprecated probes will only be used if the initial install was with Cilium 1.7 or earlier. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit ba7c822 ] The upgradeCompatibility option was not specified in the default instructions, but there was a more specific option that provided only one piece of upgrade compatibility. Convert the main instructions to use the common variable for providing smooth upgrade which allows control over multiple options. While we're at it, fix the indentation for the values file instructions. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 7d356ff ] Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 9de5fd6 ] The way this was written, you could never disable the l7 proxy because the helm chart ignored the setting if it was false. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit ac29aa7 ] These settings cannot be specified by default in the values.yaml file, because otherwise it would override the upgradeCompatibility value for the flag and hence break compatibility during upgrade from v1.7 (no such option) -> v1.8 (should be disabled during upgrade) -> v1.9. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 26878ec ] [Forward-cherry-pick] With this option users will be able to deploy Cilium without envoy support which is helpful for arm64 clusters. Related: #13650 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
…se cases [ upstream commit 4d07a34 ] Update the getting started guide with the kiam use case. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 92f0a9b ] Signed-off-by: Weilong Cui <cuiwl@google.com> Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 767a1b8 ] This function can suffer from connectivity loss between cluster and external IP, which is not dependant on Cilium. These retries will help reduce flakiness. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit c1af153 ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 2eea58f ] - Use title case where appropriate. - Fix a couple syntax and punctuation errors. - helm -> Helm Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 5733f6a ] This change is only adding more unit tests to better understand the behavior of the labelsfilter as well as improving the documentation for the expectation of filtering labels. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for my change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes look good 👍
test-backport-1.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good for my one line change 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for my changes, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- docs: improve Host Firewall GSG #13673 -- docs: improve Host Firewall GSG (@qmonnet)
Looks good, thank you.
GKE failed in Suite-k8s-1.16.K8sIstioTest Istio Bookinfo Demo Tests bookinfo inter-service connectivity: https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/2922/ Seems like the pods did not pull the images in time? Retesting. |
retest-gke |
Once this PR is merged, you can update the PR labels via: