1.14.0 - Policy Denies occurring when they should not be #27169
Comments
Thanks for this report. Would you be able to grab a sysdump file and upload it, please, so we can better debug the issue? (Using https://docs.cilium.io/en/stable/operations/troubleshooting/#automatic-log-state-collection)
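For reference, a sysdump can be collected with the cilium CLI (https://github.com/cilium/cilium-cli); a minimal sketch, where the output filename is an arbitrary choice:

```sh
# Collect cluster-wide Cilium state (agent logs, policy state, BPF maps, etc.)
# into a single archive for attaching to the issue.
cilium sysdump --output-filename cilium-sysdump-27169
```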
@jwitko In addition to getting that sysdump, can you ensure you don't have a broad deny policy. Deny policies take precedence over allow policies, no matter how narrowly tailored the CIDR selection is. So, a broad CIDR deny-policy will deny the traffic of a narrow CIDR allow-policy.
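To illustrate the precedence rule, a sketch of the interaction; the policy name, labels, and CIDRs below are hypothetical:

```sh
# Hypothetical example: the /8 egressDeny overrides the narrower /24 egress
# allow for the same endpoints, because deny rules always win.
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: example-deny-precedence
  namespace: default
spec:
  endpointSelector:
    matchLabels:
      app: example
  egressDeny:
    - toCIDR:
        - 10.0.0.0/8        # broad deny
  egress:
    - toCIDR:
        - 10.1.2.0/24       # narrower allow, still denied
EOF
```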
I do not have any deny policies at all. I have enforcement mode set to always.
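For anyone reproducing this, the enforcement mode can be checked from inside an agent pod; a sketch, assuming the usual kube-system DaemonSet name:

```sh
# Show the agent's policy enforcement mode (default, always, or never)
kubectl -n kube-system exec ds/cilium -- cilium config | grep PolicyEnforcement

# The corresponding Helm value is policyEnforcementMode: "always"
```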
Hi! What do you mean by "switched from IPTables to BPF"? What did you change, and how did you change it? Did you notice anything that could be timing related? In particular there's a setting called … And could you share how you are configuring and deploying Cilium? If we can reproduce the issue ourselves, then we could get our own sysdump.
Thank you all for the attempted assistance. I have taken an alternative approach and will circle back, since I cannot properly assist in troubleshooting this at the moment.
@dylandreimerink @margamanterola @nathanjsweet I've encountered the same situation, and I've opened #27210. As mentioned in this issue, this problem is easy to reproduce, and I don't think it has anything to do with host routing mode; it is related to logic in …
I upgraded to 1.14 from 1.13 yesterday and also had a simple egress-to-CIDR policy being denied. It was working fine for about 10 minutes. No other deny rules in place. I had the same policy on another deployment, which worked fine. A rollback to 1.13 fixed it. Happy to upgrade again and get a sysdump if required.
Similarly to what @AlHood77 mentioned, things work fine for about 10 minutes, then …
Seems closely related to #24502. Previously we tried upgrading from 1.12.9 to 1.13.1 and ran into the issue of kube-apiserver IP addresses disappearing from the bpf ipcache after 10 minutes. We decided to stick with Cilium 1.12.9, as it was working great for us. Now, upgrading from 1.12.9 to 1.14.0 causes a very similar issue, yet I suppose it's slightly different, because if I look in the cilium bpf ipcache list, I can still see the kube-apiserver IPs.
Once the error is thrown, approximately 10 minutes after …, downgrading from 1.14.0 to 1.13.5 seems to work. The bpf ipcache still holds the kube-apiserver IPs. However, one thing I noticed is that in 1.13.5, running …
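For comparing the two versions, the ipcache and identity state can be inspected from inside the agent pod; a sketch, where the IP address is a placeholder:

```sh
# Dump the datapath's IP-to-identity cache
kubectl -n kube-system exec ds/cilium -- cilium bpf ipcache list

# Check which identity a specific IP (e.g. a kube-apiserver address) maps to
kubectl -n kube-system exec ds/cilium -- cilium bpf ipcache get 192.0.2.10

# Compare against the agent's userspace view of IP-to-identity mappings
kubectl -n kube-system exec ds/cilium -- cilium ip list
```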
Is there an existing issue for this?
What happened?
I upgraded my cluster from 1.13-latest to 1.14.0.
I then switched from IPTables to BPF.
In audit mode everything works fine and all the audited flows look expected.
When audit mode is disabled I see lots of policy denies that I can't seem to explain.
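As a reproduction aid, audit mode can be toggled per endpoint and verdicts observed live; a sketch, where 1234 is a placeholder endpoint ID:

```sh
# Watch policy verdicts (allowed/denied/audit) as they happen
kubectl -n kube-system exec ds/cilium -- cilium monitor --type policy-verdict

# Toggle audit mode on a single endpoint; list endpoint IDs first with
# `cilium endpoint list`
kubectl -n kube-system exec ds/cilium -- cilium endpoint config 1234 PolicyAuditMode=Enabled
```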
Cilium Version
1.14.0
Kernel Version
Linux 5.14.0-284.18.1.el9_2.x86_64 #1 Thu Jun 29 17:06:27 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
1.25.8
Sysdump
No response
Relevant log output
cilium status --verbose:
Here is an example.
Anything else?
It is very difficult to test and/or get logs because when I leave audit mode a massive amount of DROPs occur. Cilium does not seem to be respecting the network policies that worked perfectly fine before the move to BPF.
In this image (IPs blacked out) we can see an example of some drops.
These drops should not be happening, because a CiliumClusterwideNetworkPolicy allows this traffic.
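For context, an egress-to-CIDR allow policy of the kind described in this thread has roughly the following shape; a sketch only, with a hypothetical name, labels, and CIDR rather than the reporter's actual policy:

```sh
# Hypothetical shape of a cluster-wide egress allow to a CIDR;
# all values below are placeholders.
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: example-allow-egress
spec:
  endpointSelector:
    matchLabels:
      app: example
  egress:
    - toCIDR:
        - 203.0.113.0/24
EOF
```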
When attempting to look up the ID:
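For reference, a numeric security identity from a drop message can be resolved from inside the agent pod; a sketch, where the identity value is a placeholder:

```sh
# Resolve a numeric security identity to its label set
kubectl -n kube-system exec ds/cilium -- cilium identity get 16777217
```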