Intermittent report of dropped logs from hubble. #15445
What is the second sysdump (…)? Was 10.8.2.59 associated to a pod at any point? In the FQDN cache, I see that:

- matchLabels:
    k8s:app: cockroachdb
    k8s:io.kubernetes.pod.namespace: crdb-prod2

Were the policy drops happening when the sysdump was taken?
Hey @pchaigno, […] Regards.
When was the sysdump collected? Is the following series of events correct? […]
It would be nice to get a sysdump before the pod is deleted, ideally while the drops are still happening.
Hey @pchaigno, […] Regards.
@gtsjamesbond Just to confirm: there are no real packet drops, only Hubble-reported drops, correct? I think this context was missing in the Slack conversation. I'm not sure if we need the […]
@aditighag So this is one thing we were unsure of: we are getting the Hubble reports of drops, BUT the application is behaving as expected (as if there were no drops). This leads us to believe that either something is not correct or we are misinterpreting the meaning of the drop reports. One of the things we wanted to verify is whether the drop reports we are seeing are in fact accurate, or false reports. If they are false reports, we would like to determine why, to see if there is any way to "tune these out" so that we aren't alerted when we don't need to be. Regards.
@pchaigno @aditighag {destination { […]
Regards.
So if pod1 on node1 is talking to pod2 on node2, you see Hubble-reported drops on node1, but the packets reach node2/pod2?
If the drops are intermittent, they might not cause connectivity issues. That seems more likely than Hubble getting packet drop notifications out of nowhere 🙂
@pchaigno I see your point. What is of more interest to our team is why we would get the drop to begin with if the target IP should be covered by one of the rules. Is what we are seeing an artifact of a network issue manifesting, or is the CNP actually causing a drop? We want to make sure that we have the proper interpretation of the drop logs that we are seeing. We also see this intermittently with a target we have identified using FQDN. Any ideas there? Regards.
My guess would be that the drops are caused by some transient issue in the resolution of the security identity. The destination identity reported by Hubble is […]
I think that would help understand why the identity resolution failed.
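If the transient identity-resolution theory is right, the affected flows should show an unresolved destination identity in Hubble's JSON output. Below is a minimal sketch of filtering for such flows; the sample data is invented, and the field names (`verdict`, `destination.identity`) are assumptions modeled on Hubble's flow schema, not taken from this issue's logs:

```python
import json

# Flows shaped like `hubble observe -o json` output (invented sample data;
# field names are assumptions based on Hubble's flow schema).
flows = [
    {"verdict": "DROPPED",
     "destination": {"identity": 0, "labels": ["reserved:unknown"]},
     "IP": {"destination": "10.8.2.59"}},
    {"verdict": "FORWARDED",
     "destination": {"identity": 53219, "labels": ["k8s:app=cockroachdb"]},
     "IP": {"destination": "10.8.2.59"}},
]

def unresolved_drops(flows):
    """Return dropped flows whose destination identity was never resolved.

    In Cilium, identity 0 is the reserved "unknown" identity, so a drop
    carrying it suggests the resolution had not completed at drop time.
    """
    return [f for f in flows
            if f.get("verdict") == "DROPPED"
            and f.get("destination", {}).get("identity", 0) == 0]

for f in unresolved_drops(flows):
    print(json.dumps(f["IP"]))
```

If most of the reported drops carry an unresolved identity while the same destinations are otherwise forwarded, that would support the transient-resolution explanation.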
@pchaigno here are two drops logged in the last hour to different destinations, and also the sysdump
@gtsjamesbond I think you forgot to attach the sysdump. The Hubble log is also ill-formatted.
@pchaigno Attaching dump again.
@pchaigno @aditighag, were you able to gain any insight from the latest logs uploaded on March 25? Regards.
Busy with the 1.10 release; I'll try to take a look (most likely) after the release unless someone else beats me to it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
@aditighag as you showed some interest, I've assigned this to you :)
Have the same behavior on some pods.
@othmane399 What Cilium version is this? We've had a couple of fixes around identity propagation delays (#16302) that could potentially be related. Can you check if you have the fix?
@aditighag I'm on 1.10.3, so basically yes, I have the changes.
I am experiencing something very similar to this. In my case, the drops reported by Hubble are real (see tests below). For context, we use Traefik (as a DaemonSet) to forward traffic to […]
Making an HTTP request to […]
However, a request to another […] The relevant Cilium Endpoint: […]
CiliumNetworkPolicy applied to Traefik: […]
Any assistance on how to further troubleshoot is very welcome. Thank you.
@joaoubaldo Could you please open a new issue with the same information as above plus a full dump of those Hubble drops (…)?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue has not seen any activity since it was marked stale.
I have the same issue; it is random, not repeating, and the TCP flag is always not SYN. My version: […]
Does anybody know how I can get the same drop-traffic log as the author? What do I need to enable, please?
@smeeklai you can get that info with something like […]
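The exact command above was lost in formatting. One way to dump drop events in recent Hubble versions is `hubble observe --verdict DROPPED -o json` (stated here as an assumption about the version in use, not as the original poster's exact invocation). A small sketch of summarizing one such JSON line; the sample data is invented and the field names (`verdict`, `drop_reason_desc`, `IP`, `l4`) are assumptions modeled on Hubble's flow schema:

```python
import json

# One flow line as `hubble observe -o json` might emit it (invented sample
# data; field names are assumptions based on Hubble's flow schema).
line = '''{"time": "2021-03-24T01:20:08Z", "verdict": "DROPPED",
           "drop_reason_desc": "POLICY_DENIED",
           "IP": {"source": "10.0.1.12", "destination": "10.8.2.59"},
           "l4": {"TCP": {"destination_port": 26257}}}'''

flow = json.loads(line)

# Build a compact one-line summary of the drop event.
summary = "{} -> {}:{} {} ({})".format(
    flow["IP"]["source"],
    flow["IP"]["destination"],
    flow["l4"]["TCP"]["destination_port"],
    flow["verdict"],
    flow.get("drop_reason_desc", "n/a"),
)
print(summary)  # 10.0.1.12 -> 10.8.2.59:26257 DROPPED (POLICY_DENIED)
```

Piping the JSON stream through a filter like this makes it easier to spot patterns (same destination, same drop reason) across intermittent drops.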
Hello,
we've been noticing some unusual behavior with Cilium/Hubble and we're curious if anyone else has seen anything similar. Essentially, we've written network policies that allow traffic to a particular CIDR block/port, but we're seeing intermittent drop logs from Hubble with the log code 133 (policy interruption) where we know the traffic should be allowed (and usually is allowed). However, we haven't actually noticed any service interruption due to those drops, so we're unsure whether these are real intermittent Cilium drops or whether Hubble might be falsely reporting drops. Has anyone seen anything similar, or have any idea whether the cause might be false reporting by Hubble or real intermittent drops by Cilium? We're running Cilium v1.8.6 on Kubernetes 1.18.
here's an example of a drop: […]
policy: […]
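One way to sanity-check whether a drop like the one above actually contradicts the policy is to verify that the dropped destination falls inside the allowed CIDR and port. A minimal sketch using Python's stdlib `ipaddress`; the CIDR, port, and drop values below are invented for illustration and are not taken from this issue's policy:

```python
import ipaddress

# Invented example values: what the policy allows, and one reported drop.
allowed_cidr = ipaddress.ip_network("10.8.0.0/16")
allowed_port = 26257

dropped = {"ip": "10.8.2.59", "port": 26257}

# Check whether the dropped destination is covered by the allow rule.
covered = (ipaddress.ip_address(dropped["ip"]) in allowed_cidr
           and dropped["port"] == allowed_port)

# If covered is True, the drop contradicts the policy as written, which
# points at a transient issue (e.g. identity/CIDR mapping) rather than
# the rule itself.
print("covered by policy:", covered)
```

Running every reported drop through a check like this separates "the rule genuinely doesn't match" from "the rule matches but the datapath still dropped", which is the distinction being debated in this thread.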
How to reproduce the issue
cilium-sysdump-prod2-20210324-012008.zip
cilium-sysdump-prod3-20210324-012151.zip