
aws-eks-nodeagent container logs errors on startup and shutdown #162

Closed
rtomadpg opened this issue Dec 6, 2023 · 8 comments
Labels
duplicate This issue or pull request already exists

Comments

@rtomadpg

rtomadpg commented Dec 6, 2023

What happened:

After upgrading VPC-CNI from v1.14.1-eksbuild.1 to v1.15.4-eksbuild.1, all the aws-eks-nodeagent containers logged:

aws-node-np4cq aws-eks-nodeagent 2023-12-06 16:14:59.823264484 +0000 UTC Logger.check error: failed to get caller

And, when I delete a random aws-node pod, I see this:

aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131300614 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131410269 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131480895 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131594396 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131647113 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131669285 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131694685 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.13179858 +0000 UTC Logger.check error: failed to get caller

I believe these errors come from the uber-go/zap dependency, see https://github.com/uber-go/zap/blob/5acd569b6a5264d4c7433cbb278a8336d491715c/logger.go#L398

As I am unsure whether this error signals that something is (really) wrong, and since it has not been reported in this project before, I created this bug.
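
For context, a minimal Go sketch (my own illustration, not the agent's actual code) of how zap produces this exact message: it is only printed when caller annotation is enabled but the caller frame cannot be resolved, for example when the configured caller skip is larger than the real call stack. The entry itself is still logged, just without a caller field.

```go
package main

import "go.uber.org/zap"

func main() {
	// zap.NewProduction enables caller annotation (zap.AddCaller) by default.
	logger, err := zap.NewProduction()
	if err != nil {
		panic(err)
	}
	defer logger.Sync()

	// Deliberately skip far more stack frames than exist. When zap cannot
	// resolve the caller, Logger.check writes
	// "<timestamp> Logger.check error: failed to get caller"
	// to the logger's error output (stderr by default), while the log entry
	// itself is still emitted without a caller field.
	broken := logger.WithOptions(zap.AddCallerSkip(100))
	broken.Info("this message is still logged, but the caller lookup fails")
}
```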

Attach logs

Let me know if needed.

What you expected to happen:

No errors getting logged.

How to reproduce it (as minimally and precisely as possible):

  • Upgrade to the mentioned version
  • Check the aws-node pod logs
  • Or, delete an aws-node pod; the new pod will log the errors on startup (see the command sketch below).
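
For reference, a rough kubectl sketch of those steps (my own sketch, not part of the original report; the k8s-app=aws-node label and the aws-eks-nodeagent container name match the default VPC CNI manifests, but verify them in your cluster):

```sh
# List the aws-node pods (one per node)
kubectl get pods -n kube-system -l k8s-app=aws-node

# Check the network policy agent container logs on one of them
kubectl logs -n kube-system aws-node-np4cq -c aws-eks-nodeagent

# Delete a pod and watch the replacement log the same lines on startup
kubectl delete pod -n kube-system aws-node-np4cq
kubectl logs -n kube-system <replacement-aws-node-pod> -c aws-eks-nodeagent
```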

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.27.7-eks-4f4795d
  • CNI Version: v1.15.4-eksbuild.1
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a):
Linux <hostname redacted> 5.10.192-183.736.amzn2.x86_64 #1 SMP Wed Sep 6 21:15:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
@rtomadpg rtomadpg added the bug Something isn't working label Dec 6, 2023
@jdn5126
Contributor

jdn5126 commented Dec 6, 2023

@rtomadpg just curious, did you notice the comment with:

> For Network Policy issues, please file at https://github.com/aws/aws-network-policy-agent/issues

when you opened this issue? We are trying to improve the experience here with triaging Network Policy agent issues, so I am wondering if you think there is a better way this could have been noticed.

@jdn5126
Contributor

jdn5126 commented Dec 6, 2023

As for this issue, this is the same as #103. This error log is harmless, and a fix is in progress

@jdn5126 jdn5126 transferred this issue from aws/amazon-vpc-cni-k8s Dec 6, 2023
@jdn5126 jdn5126 added duplicate This issue or pull request already exists and removed bug Something isn't working labels Dec 6, 2023
@rtomadpg
Author

rtomadpg commented Dec 6, 2023

Ouch, so sorry! I checked the new bug flow and indeed that comment is there. Very clearly.
I guess I was too eager to file the bug (end of work day here) and I overlooked that part.

@rtomadpg rtomadpg closed this as completed Dec 6, 2023
@rtomadpg
Author

rtomadpg commented Dec 6, 2023

@jdn5126 maybe a suggestion: when errors are logged by a container named "aws-eks-nodeagent", it's not immediately clear that they are related to "Network Policy issues" or "aws-network-policy-agent". Maybe a mention of "aws-eks-nodeagent" in that comment would reduce wrongly filed issues?

@jdn5126
Contributor

jdn5126 commented Dec 6, 2023

> Ouch, so sorry! I checked the new bug flow and indeed that comment is there. Very clearly. I guess I was too eager to file the bug (end of work day here) and I overlooked that part.

Oh no worries, I was just curious if there was a better setup through GitHub. Good call, I can expand the comment

@lsabreu96

lsabreu96 commented Mar 5, 2024

Hi everyone, sorry for jumping in on a closed thread.

I'm facing the same issue, but without the network policy error mentioned here.
I'm trying to upgrade a managed worker group to 1.25, but the aws-node daemonset keeps failing in the aws-eks-nodeagent container, causing the pod to restart.

Any ideas? The VPC CNI plugin version is v1.15.1-eksbuild.1.

@jdn5126
Contributor

jdn5126 commented Mar 5, 2024

@lsabreu96 the error log from this issue is harmless. If you are seeing the aws-eks-nodeagent container crashing, please file a new issue with the logs from the crash, which you can find in /var/log/aws-routed-eni/network-policy-agent.log on the affected node.
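
For example (a sketch, assuming SSH access to the affected node; the path is the one mentioned above):

```sh
# On the affected node
sudo tail -n 200 /var/log/aws-routed-eni/network-policy-agent.log
```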

@koenkarsten

For anyone reaching this thread because the aws-eks-nodeagent container is crashing with "UTC Logger.check error: failed to get caller": for me the issue was mixing EKS Kubernetes version 1.24 with aws-network-policy-agent:v1.0.4-eksbuild.1 and amazon-k8s-cni:v1.15.1-eksbuild.1 (these versions were automatically provisioned by EKS). Upgrading to Kubernetes 1.25 fixed the crash loop, as mentioned in the README of this repo ("You'll need a Kubernetes cluster version 1.25+ to run against.").

I'm not commenting to reopen this issue, just providing information in case anyone still running 1.24 lands here!
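
If you want to confirm which image versions your cluster is actually running before upgrading, something like this works (a sketch; it assumes the default aws-node DaemonSet in kube-system):

```sh
# Image per container in the aws-node DaemonSet (aws-node and aws-eks-nodeagent)
kubectl get daemonset aws-node -n kube-system \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'

# Control plane and node versions
kubectl version
kubectl get nodes -o wide
```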
