Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.11 backports 2022-04-26 #19573

Merged
merged 3 commits into from
May 4, 2022

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Apr 26, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 19501 19556 19626; do contrib/backporting/set-labels.py $pr done 1.11; done

@jrajahalme jrajahalme added kind/backports This PR provides functionality previously merged into master. backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. labels Apr 26, 2022
@jrajahalme jrajahalme requested a review from a team as a code owner April 26, 2022 10:00
@jrajahalme jrajahalme marked this pull request as draft April 26, 2022 10:00
@jrajahalme
Copy link
Member Author

/test-backport-1.11

@jrajahalme jrajahalme marked this pull request as ready for review April 27, 2022 19:06
@jrajahalme
Copy link
Member Author

jrajahalme commented Apr 27, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed:

Click to show.

Test Name

K8sServicesTest Checks graceful termination of service endpoints Checks client terminates gracefully on service endpoint deletion

Failure Output

FAIL: Timed out after 60.001s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create one.

@jrajahalme
Copy link
Member Author

runtime VM provisioning fail, restarting

@jrajahalme
Copy link
Member Author

/test-runtime

@jrajahalme
Copy link
Member Author

/test-1.21-4.9

@jrajahalme jrajahalme added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 28, 2022
@jrajahalme
Copy link
Member Author

@cilium/tophat CI passed, marking this as ready-to-merge.

@jrajahalme jrajahalme marked this pull request as draft April 28, 2022 17:20
@jrajahalme
Copy link
Member Author

This will need backport of #19626 as well to prevent a data race.

@aditighag
Copy link
Member

aditighag commented Apr 28, 2022

Removing the ready-to-merge label - #19573 (comment).

This showed up in the list of PRs to be merged by tophat because I wasn't filtering draft PRs.

@aditighag aditighag removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 28, 2022
@joestringer
Copy link
Member

joestringer commented Apr 28, 2022

I should probably qualify my approval above in that there is also a question on the v1.10 backport that I raised here which makes me wonder whether policy updates are being triggered too early during startup, which could lead to unexpected policy behaviour.

EDIT: Resolved, see #19574 (review) .

@jrajahalme jrajahalme marked this pull request as ready for review April 29, 2022 19:04
@jrajahalme
Copy link
Member Author

Added backport of #19626

@aditighag
Copy link
Member

@jrajahalme Can you rebase the PR so that we can trigger CI tests?

@aditighag aditighag added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Apr 29, 2022
@jrajahalme jrajahalme removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label May 2, 2022
@jrajahalme
Copy link
Member Author

/test-backport-1.11

@jrajahalme
Copy link
Member Author

@aditighag rebased :-)

@aditighag
Copy link
Member

@aditighag rebased :-)

Still seeing this 😕

Screen Shot 2022-05-02 at 7 43 38 AM

[ upstream commit b61a347 ]

ipcache SupportDump() and SupportsDelete() open the map to probe for the
support if the map is not already open and also schedule the
bpf-map-sync-cilium_ipcache controller. If the controller is run before
initMaps(), initMaps will fail as the controller will leave the map open
and initMaps() assumes this not be the case.

Solve this by not trying to detect dump support, but try dump and see if
it succeeds.

This fixes Cilium Agent crash on kernels that do not support ipcache dump
operations and when certain Cilium features are enabled on slow machines
that caused the scheduled controller to run too soon.

Fixes: 19360
Fixes: 19495
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Cilium Maintainers <maintainer@cilium.io>
[ upstream commit 2e5f35b ]

Move local identity allocator initialization to
NewCachingIdentityAllocator() so that it is initialized when the
allocator is returned to the caller. Also make the events channel and
start the watcher in NewCachingIdentityAllocator(). Close() will no
longer GC the local identity allocator or stop the watcher. Now that the
locally allocated identities are persisted via the bpf ipcache map across
restarts, recycling them at runtime via Close() would be inappropriate.

This is then used in daemon bootstrap to restore locally allocated
identities before new policies can be received via Cilium API or k8s API.

This fixes the issue where CIDR policies were received from k8s before
locally allocated (CIDR) identities were restored, causing the identities
derived from the received policy to be newly allocated with different
numeric identity values, ultimately causing policy drops during Cilium
restart.

Fixes: cilium#19360
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
[ upstream commit 916765b ]

InitK8sSubsystem() starts all k8s watchers concurrently, some of which do
call into K8sCacheIsSynced() via ipcache/metadata.InjectLabels(), and
possibly also from elsewhere. Initialize k8sCachesSynced before any
watchers are started to make this access safe. This fixes data race
detected by race detection builds.

Fixes: cilium#19614
Fixes: cilium#19556
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the pr/v1.11-backport-2022-04-26 branch from 961baad to 911fec0 Compare May 2, 2022 15:22
@jrajahalme
Copy link
Member Author

jrajahalme commented May 2, 2022

@aditighag rebased to a local v1.11 branch by accident, now this should be all-OK.

% git log --pretty=oneline pr/v1.11-backport-2022-04-26
911fec0c7a3137b11d97ec7578793c77861c1ee9 (HEAD -> pr/v1.11-backport-2022-04-26, origin/pr/v1.11-backport-2022-04-26, refs/patches/pr/v1.11-backport-2022-04-26/daemon-initialize) daemon: Initialize k8sCachesSynced channel before Initk8sSubsystem()
3c6ae82b93367cce13dc19fe08a6cf5a62051643 (refs/patches/pr/v1.11-backport-2022-04-26/identity-initialize-local) identity: Initialize local identity allocator early
7cb0d86517ba94a58b053b2ed47c76d83cf05291 (refs/patches/pr/v1.11-backport-2022-04-26/daemon-do-not-try-to-detect) daemon: Do not try to detect Dump support
23ceb7ad726173f3e513a87958312f8a7e4362ea (upstream/v1.11) add robots.txt to Cilium documentation

@joestringer
Copy link
Member

joestringer commented May 2, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig Transparent encryption DirectRouting Check connectivity with transparent encryption and direct routing

Failure Output

FAIL: Connectivity test between nodes failed

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create one.

Job 'Cilium-PR-Runtime-net-next' failed:

Click to show.

Test Name

RuntimePrivilegedUnitTests Run Tests

Failure Output

FAIL: Failed to run privileged unit tests

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-Runtime-net-next so I can create one.

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed:

Click to show.

Test Name

K8sServicesTest Checks graceful termination of service endpoints Checks client terminates gracefully on service endpoint deletion

Failure Output

FAIL: Timed out after 60.000s.

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create one.

@jrajahalme
Copy link
Member Author

retesting runtime tests due to unit test timeout

@jrajahalme
Copy link
Member Author

/test-runtime

@jrajahalme
Copy link
Member Author

/test-1.22-4.19

@jrajahalme
Copy link
Member Author

/test-1.21-4.9

@jrajahalme
Copy link
Member Author

/test-1.19-4.9

@jrajahalme
Copy link
Member Author

Restarted tests due to provisioning fails and unrelated flakes (checked agent logs).

@jrajahalme jrajahalme added the release-blocker/1.11 This issue will prevent the release of the next version of Cilium. label May 4, 2022
@joestringer joestringer merged commit ac57f6f into cilium:v1.11 May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. release-blocker/1.11 This issue will prevent the release of the next version of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants