bpf: Use nproc --all for __NR_CPUS__
#12121
This uses `-D__NR_CPUS__=$(nproc --all)` (or `GetNumPossibleCPUs` when invoked from Go) to compile the datapath.

This fixes an issue where `cilium monitor` fails to report any events on AKS. Duplicate `perf_event_array` maps are created with different `max_entries` sizes, presumably causing the datapath to write to the first one while the agent reads from the second one.

This bug occurs on AKS, for example, because the present and possible CPU sets differ on the VMs: the default Standard_D2s_v3 node size has 2 present CPUs, but 128 possible CPUs in `/sys/devices/system/cpu`.

Fixes: #12070
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
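For readers unfamiliar with the present/possible distinction, here is a minimal Go sketch of how a possible-CPU count could be derived and turned into the compile flag. It is not the code in this PR: `parseCPUList` and `nrCPUsFlag` are hypothetical names, and the sketch assumes the kernel exposes the possible set as a CPU list (such as `0-127` or `0,2-3`) in `/sys/devices/system/cpu/possible`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUList counts the CPUs in a kernel CPU list such as "0-127" or "0,2-3".
// Hypothetical helper for illustration only; not the function used in this PR.
func parseCPUList(list string) (int, error) {
	list = strings.TrimSpace(list)
	if list == "" {
		return 0, fmt.Errorf("empty CPU list")
	}
	count := 0
	for _, part := range strings.Split(list, ",") {
		bounds := strings.SplitN(part, "-", 2)
		first, err := strconv.Atoi(bounds[0])
		if err != nil {
			return 0, fmt.Errorf("invalid CPU id %q: %w", bounds[0], err)
		}
		last := first // a single id such as "5" is a range of one
		if len(bounds) == 2 {
			last, err = strconv.Atoi(bounds[1])
			if err != nil {
				return 0, fmt.Errorf("invalid CPU id %q: %w", bounds[1], err)
			}
		}
		if last < first {
			return 0, fmt.Errorf("invalid CPU range %q", part)
		}
		count += last - first + 1
	}
	return count, nil
}

// nrCPUsFlag formats the preprocessor define passed when compiling the datapath.
func nrCPUsFlag(n int) string {
	return fmt.Sprintf("-D__NR_CPUS__=%d", n)
}

func main() {
	// Assumed sysfs location of the possible CPU set.
	data, err := os.ReadFile("/sys/devices/system/cpu/possible")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read possible CPUs:", err)
		os.Exit(1)
	}
	n, err := parseCPUList(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse possible CPUs:", err)
		os.Exit(1)
	}
	fmt.Println(nrCPUsFlag(n))
}
```

On a node like the one described above, the possible set would presumably read `0-127`, so the flag would carry 128 even though only 2 CPUs are present.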
test-me-please
This is an alternative to #12119. I have just validated that this fixes the issue on AKS, so I am marking it ready for review.
Please check out #12070 (comment) for @pchaigno's great explanation of why this was not causing more trouble beforehand.
lgtm
There's two bugs fixed here in a way:
Do we want to backport this to v1.7 or even v1.6?
(Given this only affects the events and signals maps, this shouldn't have upgrade implications.)
I believe that the core issue here, where cilium doesn't report any flows, is unique to v1.8, because v1.8 began opening (creating) this map prior to datapath provisioning. However, if someone were to hotplug CPUs on v1.7 or earlier, they could plausibly hit this as well. The fix itself looks pretty harmless, so a v1.7 backport seems reasonable to me.