Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.11 backport: In service recovery, don't skip if one of the service recovery fails #23922

Merged

Conversation

jaredledvina
Copy link

@jaredledvina jaredledvina commented Feb 21, 2023

This is a backport of #18422 as without it, issues like #23551 can be greatly exacerbated when agents are restarted (i.e. any daemonset update). I directly cherry-picked 018856602b7637b8ae1c796f4ed02fe1bbeb5905 to the v1.11 branch and only had a small merge conflict in pkg/logging/logfields/logfields.go which was easily fixable.

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Thanks for contributing!

Fixes: N/A

In service recovery, don't skip if one of the service recovery fails

@jaredledvina jaredledvina requested a review from a team as a code owner February 21, 2023 23:03
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Feb 21, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Feb 21, 2023
@aditighag
Copy link
Member

Hi @jaredledvina Have you tried testing with this fix? Furthermore, we should also have #23749 to fix the leaked entries. /cc @hemanthmalla
AFAICS, this PR just sidesteps #23551 issue, right?

@maintainer-s-little-helper
Copy link

Commit 5834ae50bff7d911932e1d38f912d2ee9014d4c8 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 24, 2023
@jaredledvina
Copy link
Author

Hi @aditighag - Thanks, yeah we've deployed #23858 & this change which fixes the issue from two different angles.

I don't know if I'd describe it as side-steps but, it significantly helps prevent any duplicate entries in the map from ballooning up the backends map with orphaned entries whenever the agent is restarted. This change along with your fix resolves similar problems from two directions. I think it's worth having this backported even with 23858 merged as on restart any map with duplicate entries is likely to hit this issue.

Hope that makes sense!

@jaredledvina jaredledvina force-pushed the jared.ledvina/cherry-pick-18422-1.11 branch from 5834ae5 to d478b18 Compare February 24, 2023 19:01
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 24, 2023
@aditighag
Copy link
Member

#23749

Sure, I'm fine with this backporting PR if required. It's missing the usual format though. Typically developers start with nominating PRs for backport. Here is the complete workflow - https://docs.cilium.io/en/v1.13/contributing/release/backports/#creating-the-backports-branch. It's missing the upstream commit tag in the commit description.

@hemanthmalla IIUC, we still need your fix so that backends can be reused - #23749, right?

@aditighag
Copy link
Member

aditighag commented Mar 21, 2023

#23749

Sure, I'm fine with this backporting PR if required. It's missing the usual format though. Typically developers start with nominating PRs for backport. Here is the complete workflow - https://docs.cilium.io/en/v1.13/contributing/release/backports/#creating-the-backports-branch. It's missing the upstream commit tag in the commit description.

@jaredledvina I nominated the original PR for backports, and fixed up the commit description locally by checking out this PR. However, I'm not able to push to your fork. Can you fix up the commit description with - [upstream commit https://github.com/cilium/cilium/commit/018856602b7637b8ae1c796f4ed02fe1bbeb5905], and push the revision to this PR?

@aditighag
Copy link
Member

/test-backport-1.11

@jaredledvina jaredledvina force-pushed the jared.ledvina/cherry-pick-18422-1.11 branch from d478b18 to 0a55833 Compare March 21, 2023 22:15
@jaredledvina
Copy link
Author

@aditighag - Sorry for the delay, we've been having an exciting 2 weeks over here. Just pushed the updated commit message, hope that's correct lemme know if I can help with anything else here. Thanks!

@aditighag
Copy link
Member

@jaredledvina There are failures. I compared the backport line by line, it looks fine to me. Can you rebase your PR? I was able to rebase it locally without any conflicts. Also, please sign-off the backport commit. You can take a look at some of the other backport PRs for reference.

@jaredledvina jaredledvina force-pushed the jared.ledvina/cherry-pick-18422-1.11 branch from 0a55833 to af2d6ff Compare March 22, 2023 00:25
@aditighag
Copy link
Member

aditighag commented Mar 22, 2023

/test-backport-1.11

Job 'Cilium-PR-K8s-1.17-kernel-4.9' failed:

Click to show.

Test Name

K8sNode Node labels updates are reflected in CiliumNode objects

Failure Output

FAIL: Failed waiting for log-gatherer pods: timed out waiting for pods with filter -l k8s-app=cilium-test-logs to be ready: 4m0s timeout expired

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.17-kernel-4.9 so I can create one.

Job 'Cilium-PR-K8s-1.21-kernel-4.9' failed:

Click to show.

Test Name

K8sKafkaPolicyTest Kafka Policy Tests KafkaPolicies

Failure Output

FAIL: Found 2 k8s-app=cilium logs matching list of errors that must be investigated:

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.21-kernel-4.9 so I can create one.

@jaredledvina
Copy link
Author

Hm, I think the Cilium-PR-K8s-1.17-kernel-4.9 failure might be #23365

The Cilium-PR-K8s-1.21-kernel-4.9 looks like it's #24394

@aditighag
Copy link
Member

aditighag commented Mar 22, 2023

Also, please sign-off the backport commit.

@jaredledvina The backport PR is good to go once you address this. Just add a sign-off line below the original author's sign-off. We need not run the entire suite again.

Screen Shot 2023-03-22 at 9 24 48 AM

[upstream commit cilium@0188566]

Reduce number of connections interrupted due to svc/backend
ID changes if restoration fails

Fixed by not return error instead logging it and continue.

Logging number of failures and success in backend restoration
with new variables

Signed-off-by: Gaurav Yadav <gaurav.dev.iiitm@gmail.com>
Signed-off-by: Jared Ledvina <jared.ledvina@datadoghq.com>
@jaredledvina jaredledvina force-pushed the jared.ledvina/cherry-pick-18422-1.11 branch from af2d6ff to 56417a8 Compare March 22, 2023 20:12
@jaredledvina
Copy link
Author

Sorry about that @aditighag, I'm used to my git config automatically adding that and missed that it didn't work here. Should be signed-off correctly now.

@aditighag aditighag merged commit ba156b4 into cilium:v1.11 Mar 22, 2023
@aditighag aditighag added the affects/v1.11 This issue affects v1.11 branch label Mar 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.11 This issue affects v1.11 branch backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. kind/community-contribution This was a contribution made by a community member.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants