v1.11 backports 2022-04-26 #19585

Merged: 6 commits merged into cilium:v1.11 from pr/v1.11-backport-2022-04-26 on Apr 29, 2022

Conversation

joestringer
Member

@joestringer joestringer commented Apr 27, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 19153 19528 19452 19563 19578; do contrib/backporting/set-labels.py $pr done 1.11; done

chancez and others added 5 commits April 26, 2022 17:03
[ upstream commit 6c5e2d6 ]

Prometheus provides metrics collectors that expose Go runtime and Go build
information, which can be useful to server administrators. Let's expose
them.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
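
To give a feel for the kind of data these collectors surface, here is a rough standard-library sketch that prints a few samples in the Prometheus text format. It deliberately does not use the Prometheus client library, so it is not what the commit itself does: in client_golang the equivalent collectors come from `collectors.NewGoCollector()` and `collectors.NewBuildInfoCollector()`. The helper names `goInfoLine` and `buildInfoLine` are made up for illustration, and the label set is simplified.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// goInfoLine formats a Go version as a Prometheus text-format sample.
// Hypothetical helper, simplified from what a real collector emits.
func goInfoLine(version string) string {
	return fmt.Sprintf("go_info{version=%q} 1", version)
}

// buildInfoLine formats the main module's path and version the same way.
// Hypothetical helper; the label set is reduced for brevity.
func buildInfoLine(path, version string) string {
	return fmt.Sprintf("go_build_info{path=%q,version=%q} 1", path, version)
}

func main() {
	// Runtime information: Go version and current goroutine count.
	fmt.Println(goInfoLine(runtime.Version()))
	fmt.Printf("go_goroutines %d\n", runtime.NumGoroutine())

	// Build information is only available for binaries built with module support.
	if bi, ok := debug.ReadBuildInfo(); ok {
		fmt.Println(buildInfoLine(bi.Main.Path, bi.Main.Version))
	}
}
```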
[ upstream commit ec187e8 ]

When building Cilium with a Go major/minor version other than
the one specified in `GO_VERSION`, the build currently fails with:

    package github.com/cilium/cilium/cilium: build constraints exclude all Go files

It's not obvious from the compiler error message that this is due to
a mismatching Go version.

This change adds a `check-go-version` target which checks the Go
compiler version used for building Cilium (as specified in `$(GO)`)
against the version pinned in the `GO_VERSION` file, i.e. the version
used to build Cilium in CI. This check is required to pass in the
`precheck` target which should surface mismatching Go versions in a more
user-friendly way.

Example with matching version:

    $ go version
    go version go1.18.1 linux/amd64
    $ make check-go-version

Example with mismatching version:

    $ go1.17.9 version
    go version go1.17.9 linux/amd64
    $ make GO=go1.17.9 check-go-version
    Installed Go version 1.17 does not match requested Go version 1.18
    make: *** [Makefile:602: check-go-version] Error 1

Suggested-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
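
The comparison behind this check can be sketched in Go as follows. This is only an illustration: the actual change is a Makefile target, and the function names `majorMinor` and `checkGoVersion`, as well as the assumption that the pinned version looks like `1.18.1`, are hypothetical. Only the major/minor components are compared, matching the error message shown in the example.

```go
package main

import (
	"fmt"
	"strings"
)

// majorMinor reduces a version such as "go1.18.1" or "1.18.1" to "1.18".
func majorMinor(version string) string {
	v := strings.TrimPrefix(strings.TrimSpace(version), "go")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return v
	}
	return parts[0] + "." + parts[1]
}

// checkGoVersion returns an error when the installed and pinned versions
// differ in their major/minor components; patch releases are ignored.
func checkGoVersion(installed, pinned string) error {
	got, want := majorMinor(installed), majorMinor(pinned)
	if got != want {
		return fmt.Errorf("Installed Go version %s does not match requested Go version %s", got, want)
	}
	return nil
}

func main() {
	// Hypothetical inputs mirroring the two examples in the commit message.
	for _, installed := range []string{"go1.18.1", "go1.17.9"} {
		if err := checkGoVersion(installed, "1.18.1"); err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(installed, "matches the pinned version")
	}
}
```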
[ upstream commit aafc70b ]

It'll be useful to reuse this in-place zombie sort in an upcoming patch,
so split it out in preparation.

Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit ac93cb4 ]

[ Backporter's notes: Minor conflicts in endpoint.go, fqdn/cache.go ]

Commit f6ce522 ("FQDN: Added garbage collector functions.")
introduced a per-host limit on the number of IPs to be associated in the
DNS cache, but at the time we did not support keeping FQDN entries alive
beyond DNS TTL ("zombie entries"). These were later added in commit
f629372 ("fqdn: Add and use DNSZombieMappings in Endpoint"), but at
that time no such per-host limit was imposed on these zombie entries.

Commit 5923daf ("fqdn: keep IPs alive if their name is alive")
later adjusted the zombie garbage collection to allow zombies to stay
alive as long as any IP that shares the same FQDN is marked as alive.
Unfortunately, this led to situations where a very high number of DNS
cache entries remain in the cache beyond the DNS TTL, simply because one
IP for the given name continues to be used.

In the case of something like Amazon S3, where DNS TTLs are known to be
low, and IP recycling high, if an app constantly made requests via
ToFQDNs policy towards names hosted by this service, this could lead to
thousands of stale FQDN mappings accumulating in the cache. For each of
these mappings, Cilium would allocate corresponding identities, and when
this is combined with a permissive pod policy, this could lead to
policymaps becoming full, and error messages in the logs like:

    msg="Failed to add PolicyMap key" ...
    error="Unable to update element for map with file descriptor 67: argument list too long"

This could also prevent new pods from being scheduled on nodes, as
Cilium would be unable to implement the full requested policy for the
new endpoints.

In order to mitigate this situation, extend the per-host limit
configuration to also apply, separately, to zombie entries. This allows up
to 'ToFQDNsMaxIPsPerHost' FQDN entries that are alive (i.e. below DNS TTL)
in addition to a further 'ToFQDNsMaxIPsPerHost' zombie entries
corresponding to connections which remain alive beyond the DNS TTL.

Signed-off-by: Joe Stringer <joe@cilium.io>
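
A minimal sketch of a per-host cap on zombie entries might look like the following. The `zombie` type, the `limitZombies` function and the choice to evict the least recently alive entries first are assumptions made for illustration; they are not taken from the Cilium code, which has its own types and ordering rules.

```go
package main

import (
	"fmt"
	"sort"
)

// zombie is a hypothetical stand-in for a DNS cache entry kept past its TTL.
type zombie struct {
	IP      string
	AliveAt int64 // Unix seconds when a connection to this IP was last seen
}

// limitZombies keeps at most maxPerHost zombies for each name, preferring
// the most recently alive ones, and returns the evicted IPs in sorted order.
func limitZombies(byName map[string][]zombie, maxPerHost int) []string {
	if maxPerHost < 0 {
		maxPerHost = 0
	}
	var evicted []string
	for name, zs := range byName {
		if len(zs) <= maxPerHost {
			continue
		}
		// Sort in place, newest first, so the tail holds eviction candidates.
		sort.SliceStable(zs, func(i, j int) bool { return zs[i].AliveAt > zs[j].AliveAt })
		for _, z := range zs[maxPerHost:] {
			evicted = append(evicted, z.IP)
		}
		byName[name] = zs[:maxPerHost]
	}
	sort.Strings(evicted)
	return evicted
}

func main() {
	// Hypothetical data: three zombies for one name, with a limit of two.
	cache := map[string][]zombie{
		"s3.example.com": {
			{IP: "10.0.0.1", AliveAt: 100},
			{IP: "10.0.0.2", AliveAt: 300},
			{IP: "10.0.0.3", AliveAt: 200},
		},
	}
	fmt.Println("evicted:", limitZombies(cache, 2))
}
```

Applying the cap per name rather than globally means a single busy name, such as the S3 case described above, cannot crowd out zombie entries that belong to other names.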
[ upstream commit b3b0502 ]

Due to a CORS policy, requests from docs.cilium.io to readthedocs.org
were being denied, so the warning banner never showed up in the
documentation. To avoid this problem, a page redirect was configured in
the readthedocs settings to redirect docs.cilium.io/version to
readthedocs.org/api/v2/version, and the API endpoint was set to
docs.cilium.io. This will hopefully fix the issue.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer requested a review from a team as a code owner April 27, 2022 00:14
@joestringer joestringer added backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Apr 27, 2022
@joestringer
Member Author

joestringer commented Apr 27, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed:


Test Name

K8sServicesTest Checks graceful termination of service endpoints Checks client terminates gracefully on service endpoint deletion

Failure Output

FAIL: Timed out after 60.001s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create one.

@joestringer
Member Author

/ci-external-workloads-v1.11

@joestringer
Member Author

/test-1.17-4.9

@aanm
Member

aanm commented Apr 28, 2022

The fix for the conformance aws-cni test is here

@aanm
Member

aanm commented Apr 28, 2022

/ci-awscni-1.11

@joestringer
Member Author

Latest aws-cni breakage during Update AWS VPC CNI plugin step still seems related to GH workflow breakages, not this PR:

Run kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.7.10/config/v1.7/aws-k8s-cni.yaml
  kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.7.10/config/v1.7/aws-k8s-cni.yaml
  shell: /usr/bin/bash -e {0}
  env:
    clusterName: cilium-cilium-2238999029
    region: us-east-2
    cilium_cli_version: v0.10.4
    check_url: https://github.com/cilium/cilium/actions/runs/2238999029
    AWS_DEFAULT_REGION: us-east-2
    AWS_REGION: us-east-2
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
clusterrolebinding.rbac.authorization.k8s.io/aws-node unchanged
clusterrole.rbac.authorization.k8s.io/aws-node unchanged
Warning: spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].key: beta.kubernetes.io/os is deprecated since v1.14; use "kubernetes.io/os" instead
Warning: spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[1].key: beta.kubernetes.io/arch is deprecated since v1.14; use "kubernetes.io/arch" instead
daemonset.apps/aws-node configured
error: unable to recognize "https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.7.10/config/v1.7/aws-k8s-cni.yaml": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
serviceaccount/aws-node unchanged
Error: Process completed with exit code 1.

@joestringer
Member Author

External workloads hit #19603 again.

@joestringer
Member Author

Only AWS-CNI (described above), External workloads (known flake) and k8s-1.22-kernel-4.19 (known flake) are currently broken. This PR should be good from a CI perspective.

[ upstream commit 7d30daf ]

It looks like we can't do a page redirect from 'robots.txt' in the
readthedocs settings. This commit is another attempt at having a
manually defined robots.txt file.

Signed-off-by: André Martins <andre@cilium.io>
@aanm aanm force-pushed the pr/v1.11-backport-2022-04-26 branch from c1a95aa to 6244aca Compare April 28, 2022 23:50
@aanm aanm merged commit 23ceb7a into cilium:v1.11 Apr 29, 2022
@joestringer joestringer deleted the pr/v1.11-backport-2022-04-26 branch May 2, 2022 18:43
Labels
backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.

4 participants