Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve logging statements in CES usage and reduce code reuse #22428

Merged
merged 2 commits into from Feb 28, 2023
Merged

Improve logging statements in CES usage and reduce code reuse #22428

merged 2 commits into from Feb 28, 2023

Conversation

yanggangtony
Copy link
Contributor

Signed-off-by: yanggang gang.yang@daocloud.io

/kind bug
fix panic when map return nil.
fix: #22417

@yanggangtony yanggangtony requested a review from a team as a code owner November 29, 2022 16:13
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 29, 2022
@joestringer
Copy link
Member

@Weil0ng , could you take a look at this or pull in someone with CES background?

@joestringer
Copy link
Member

@yanggangtony do you have an example stack trace for the failure? That would be helpful also to include in the commit message and PR description.

@yanggangtony
Copy link
Contributor Author

yanggangtony commented Nov 30, 2022

@joestringer
thanks for review.
the stack is displayd in #22417.
Is that can explain it?

@joestringer
Copy link
Member

Ah yes, I missed it since it wasn't in the commit message or PR description. That one should do 👍

@joestringer joestringer added release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.12 labels Nov 30, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Nov 30, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.12.5 Nov 30, 2022
Copy link
Member

@nebril nebril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I think it's a good catch. Please consider expanding it a bit as explained in the comment.

operator/pkg/ciliumendpointslice/manager.go Outdated Show resolved Hide resolved
@yanggangtony
Copy link
Contributor Author

@nebril
thanks for review.
Just updated the code .

@aanm aanm added the kind/community-contribution This was a contribution made by a community member. label Dec 1, 2022
@hongkunyoo
Copy link

hongkunyoo commented Dec 1, 2022

Hello, thanks for handling my issue.
I'm just curious, what is the meaning of CES not found in map desiredCESs?
Is this a bug or an acceptable situation? (I understand not handling a missing CES is a bug which leads to nil pointer dereference. On the other hand, I'm wondering whether missing CES in the map is a bug)
I'm just wondering why CES is missing in the desiredCES map in the first place.
Please understand that I am very new to cilium and have limited knowledge.

@yanggangtony
Copy link
Contributor Author

@hongkunyoo
Hi, your question is good..But i have no idea about that..
I am also freshman to cilium..
That fix code just avoiding the nil pointer dereference..
Maybe we can continue deep into the code to found the answer

@joestringer
Copy link
Member

I agree, it would be very useful to investigate how it got into this state in the first place. If there's a way to reproduce the original bug, that would be very helpful.

Either way, I think that if we catch the error to avoid a nil pointer exception and log the error, that is better than the current state. Then over time hopefully we can find out more about the underlying bug that causes this to happen.

@joestringer
Copy link
Member

@yanggangtony the code doesn't compile, please investigate.

@yanggangtony
Copy link
Contributor Author

@joestringer
thanks for reminding.
Sorry for push code in a hurry.
Had updated and compiled local.

@aanm aanm requested a review from nebril December 5, 2022 09:09
@nebril
Copy link
Member

nebril commented Dec 6, 2022

/test

@yanggangtony
Copy link
Contributor Author

rebase master code .

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several reviewers have asked for the commit description to be expanded. Adding the stack trace there would probably be good enough.

@yanggangtony
Copy link
Contributor Author

Several reviewers have asked for the commit description to be expanded. Adding the stack trace there would probably be good enough.

Ah, thanks for reminder.
Had updated the commit message.

@pchaigno pchaigno self-requested a review December 8, 2022 00:21
@aanm aanm added the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Dec 9, 2022
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't take too deep of a look but I noticed some aspects of the logging that deviates from the usual approach to logging in Cilium. I'm also not sure I understand whether the new logging situations are expected or unexpected conditions. Typically log.Info() should be limited to mostly once-off messages that do not recur frequently. If we're hiding an error condition, then maybe we should be reflecting those errors somewhere but I'm not exactly sure where. Probably someone with better CES and/or k8s knowledge could provide more concrete feedback in this regard. cc @Weil0ng @cilium/sig-k8s

operator/pkg/ciliumendpointslice/manager.go Outdated Show resolved Hide resolved
operator/pkg/ciliumendpointslice/manager.go Outdated Show resolved Hide resolved
operator/pkg/ciliumendpointslice/manager.go Outdated Show resolved Hide resolved
@yanggangtony
Copy link
Contributor Author

@varunmar
thanks for the suggestion.
rebased .

Copy link

@varunmar varunmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. Looks good to me now.

@varunmar
Copy link

varunmar commented Feb 7, 2023

Weilong is out of office for several weeks. I'm fine with this change though - can we merge it?

@yanggangtony
Copy link
Contributor Author

thanks for review..

@pchaigno
Copy link
Member

pchaigno commented Feb 8, 2023

/test

@joestringer joestringer added this to Needs backport from master in 1.12.8 Feb 13, 2023
@joestringer joestringer removed this from Needs backport from master in 1.12.7 Feb 13, 2023
Copy link
Contributor

@Weil0ng Weil0ng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was OOO, thank you for spotting the issue and sending in the mitigation! Change LGTM. Just curious, is the cluster under heavy pod churn when you encountered the panic?

@Weil0ng
Copy link
Contributor

Weil0ng commented Feb 17, 2023

/cc @dlapcevic

Copy link
Contributor

@dlapcevic dlapcevic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I already fixed the nil access panic bug in getCESQueueDelayInSeconds() - #22884
The explanation of how this happens is in the description.

I was unaware of this PR and the issue, we should’ve closed #22417 when my PR was merged.

I think there is no way it will try to access nil pointer right now, because all parts of the code that use getCESTrackerOnly() have checks for it, either by checking if it’s nil or using getCESName() to see if it exists.

getCESName() is also reliable because createCES() and insertCEP() are used together, so if cepNametoCESName exists, a cesTracker for that CEP will also exist.
Code:
getCESTrackerOnly - https://github.com/cilium/cilium/blob/master/operator/pkg/ciliumendpointslice/ceptocesmap.go#L89
getCESName - https://github.com/cilium/cilium/blob/master/operator/pkg/ciliumendpointslice/ceptocesmap.go#L47
createCES - https://github.com/cilium/cilium/blob/master/operator/pkg/ciliumendpointslice/manager.go#L200
insertCEP - https://github.com/cilium/cilium/blob/master/operator/pkg/ciliumendpointslice/ceptocesmap.go#L34

Anyway, it’s good to remove getCESTrackerOnly() to avoid future issues with unsafe access.

LGTM

@pchaigno
Copy link
Member

@sayboras I believe all team review requests are already covered, that is cilium/cli and cilium/operator.

@pchaigno pchaigno removed their request for review February 28, 2023 19:53
@joestringer joestringer changed the title fix panic when map return nil. Improve logging statements in CES usage and reduce code reuse Feb 28, 2023
@joestringer joestringer added release-note/misc This PR makes changes that have no direct user impact. and removed needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch release-note/bug This PR fixes an issue in a previous release of Cilium. labels Feb 28, 2023
getCESTrackerOnly() is just a slightly different version of
getCESTracker(), we only need one such function. Remove the other one.

Signed-off-by: yanggang <gang.yang@daocloud.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer
Copy link
Member

joestringer commented Feb 28, 2023

Thanks for your review @dlapcevic!

I took a closer look and I couldn't see any bug that this PR was fixing. I've updated the PR title accordingly.

Additionally, I did the following:

  • Split the PR into refactors vs. logging changes,
  • Removed extraneous whitespace changes,
  • Improved the duplicate logging statements to make the logs unique,
  • Rebased against the tip of the main branch,
  • Removed references to the panic that was already fixed by another change in-tree.

No functional changes should have been made since the prior test run, so if this passes the basic checks then I will be confident to merge it into the tree.

Reuse logfields for common names, add some missing fields from logging
statements, and add additional debugging statements in error conditions.

Signed-off-by: yanggang <gang.yang@daocloud.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer merged commit 6e7125e into cilium:master Feb 28, 2023
@nebril nebril added this to Needs backport from master in 1.12.9 Mar 10, 2023
@nebril nebril removed this from Needs backport from master in 1.12.8 Mar 10, 2023
@michi-covalent michi-covalent added this to Needs backport from master in 1.12.10 Apr 14, 2023
@michi-covalent michi-covalent removed this from Needs backport from master in 1.12.9 Apr 14, 2023
@thorn3r thorn3r added this to Needs backport from main in 1.12.11 May 16, 2023
@thorn3r thorn3r removed this from Needs backport from main in 1.12.10 May 16, 2023
@thorn3r thorn3r removed this from Needs backport from main in 1.12.11 May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/community-contribution This was a contribution made by a community member. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cilium operator invalid memory address or nil pointer dereference in getCESQueueDelayInSeconds