-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.11 Backports 2023-05-03 #25250
v1.11 Backports 2023-05-03 #25250
Conversation
[ upstream commit 32473b9 ] Log number of backends that are identified as leaked, and hence, skipped and deleted during restore. This is useful for debugging when debug logs are not enabled. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 662c96b ] Log orphan backends that are not associated with any services, and cleaned up post agent restart. This can be useful for debugging when debug logs are not enabled. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit c32b001 ] During backends restore, the agent previously released backend ID in case of conflict. More importantly, the ID that was released was already restored for another backend. This will release conflicted IDs from the ID allocator, while still leaving backend entries in the backends map. The ID allocator can then assign the released ID to another backend once the restore is complete. This commit removes releasing conflicted backend IDs. The proper clean-up would happen in certain cases when backends are deleted as duplicate or orphan where they are not associated with any of the services. Fixes: 51b62f5 ("service: Add methods to acquire backend ID for svc v2") Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit ec8d3ce ] This commit revises the fix made in commit 5311f81 that handles leaked backends. Consider the following leaked backend scenarios where (1) and (2a) were previously handled: (1) Backend entries leaked, no duplicates: previously, such backends would be deleted as orphan backends after sync with kubernetes api server. (2) Backend entries leaked with duplicates: (a) backend with overlapping L3nL4Addr hash is associated with service(s) Sequence of events: Backends were leaked prior to agent restart, but there was at least one service that the backend by hash is associated with. s.backendByHash will have a non-zero reference count for the overlapping L3nL4Addr hash. (b) none of the backends are associated with services Sequence of events: All the services these backends were associated with were deleted prior to agent restart. s.backendByHash will not have an entry for the backends hash. This commit updates the logic to handle all the scenarios in one place. Fixes: b79a4a5 (pkg/service: Gracefully terminate service backends) Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My commit looks good. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My commits LGTM, thanks @joamaki!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for my commit, thanks 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for my commits. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks1
[ upstream commit e1a48bb ] The commit adds a unit test case to validate various leaked backends scenarios handled in restore services. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit b0ada08 ] The commit documents the known service backends leak issue along with the cilium versions that have the fixes. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 9e83a6f ] Cilium is currently affected by a known bug (cilium#24692) when NodePorts are handled by the KPR implementation, which occurs when the same NodePort is used both in the local and the remote cluster. This causes all traffic targeting that NodePort to be redirected to a local backend, regardless of whether the destination node belongs to the local or the remote cluster. This affects also the clustermesh-apiserver NodePort service, which is configured by default with a fixed port. Hence, let's add a warning message to the corresponding values file setting. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 11e1bcc ] This is to add a small docs for version matrix between Cilium and Cilium envoy versions, which is useful with the upcoming work to move envoy proxy out of Cilium agent container. Co-authored-by: ZSC <zacharysarah@users.noreply.github.com> Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit db3e015 ] Before this patch, we would hit a controller-gen[1] bug when the temporary file would be of the form tmp.0oXXXXXX. This patch uses a custom mktemp template that will not trigger the bug. [1]: kubernetes-sigs/controller-tools#734 Signed-off-by: Alexandre Perrin <alex@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 9a38aec ] We've been distributing ARM architecture images for Cilium for almost two years, but neglected to mention this up front in the system requirements or the main docs page. Add this to the docs. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit e695e48 ] Running the test in a cpu constrained environment, such as: ``` docker run -v $(pwd):$(pwd) -w $(pwd) --cpus=0.1 -it golang:bullseye ./inctimer.test -test.v ``` I can fairly consistency reproduce a flake where the inctimer.After does not fire in time. If I allow it to wait for an additional couple of ms, this seems to be sufficient to prevent failure. It appears that goroutine scheduling latency can be significantly delayed in cpu restricted environments. This seems unavoidable, so to fix the flake I'll allow the test to wait another 2ms to see if the inctimer eventually fires. This will also log an error for delayed test fires, so if there is any other issues we can more easily debug them in the future. Fixed: cilium#25202 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
67e69a2
to
6e40b96
Compare
Hey, looks like I forgot to fix up pkg/service/service_test.go. After fixing it the same way as in the v1.12 backports I get a test failure:
Enabling debug logging I get:
@aditighag Any idea what's going on? Are we missing something important in v1.11? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed the differences around service name +namespace changes.
SessionAffinity: true, | ||
SessionAffinityTimeoutSec: 100, | ||
HealthCheckNodePort: 0, | ||
Name: "", | ||
Namespace: "", | ||
LoadBalancerSourceRanges: []*cidr.CIDR{}, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert these changes? Looks like there were accidentally added in the backport commit?
Name: "svc1", | ||
Namespace: "ns1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the only difference between 1.11 and newer branches.
Since this branch comes from Jussi's fork, I cannot edit commits to apply Aditi's change. I'll move this to another PR the branch owned by me. |
STATUS
commands from upstream tests' Jenkinsfile #25046 (@nbusseneau)Once this PR is merged, you can update the PR labels via:
or with