
EPERM for sys_bpf with Ubuntu 18.04 with kernel 5.5 #9988

Closed
brb opened this issue Jan 29, 2020 · 7 comments
Assignees
Labels
priority/high This is considered vital to an upcoming release.

Comments


brb commented Jan 29, 2020

While trying to update the ubuntu-next VM image in the CI to run with the latest kernel (#9657), we discovered that after cilium-agent has been installed, bpf(2) fails with EPERM which results in endpoint regeneration failures.

Creating a dummy map works neither from the cilium-agent nor from the host:

vagrant@k8s1:~$ sudo bpftool map create /sys/fs/bpf/tc/globals/cilium_policy_01821 type hash key 16 value 16 entries 10000 name foobar
Error: map create failed: Operation not permitted

SELinux is not installed, and the kernel does not have lockdown enabled.
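For reference, a quick sketch of how I'd double-check the lockdown state on a 5.4+ kernel (the securityfs file only exists when the lockdown LSM is built in):

```shell
# Check the lockdown LSM state; "[none]" means lockdown is inactive.
if [ -r /sys/kernel/security/lockdown ]; then
    cat /sys/kernel/security/lockdown    # e.g. "[none] integrity confidentiality"
else
    echo "no lockdown interface (lockdown LSM not built in or securityfs not mounted)"
fi
# Lockdown denials are also logged to the kernel ring buffer.
dmesg 2>/dev/null | grep -i lockdown || echo "no lockdown messages in dmesg"
```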

Also, the EPERM problem starts to occur only after cilium-agent has been started on the node. I've set ulimit -l unlimited (memlock), but it didn't help. After some ftracing, it looks like the EPERM is returned by security_bpf(): a probe on bpf_get_file_flag() never fired, and there are no other relevant calls in between (unfortunately, I was not able to capture $retval with a kretprobe of security_bpf()).
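For anyone who wants to retry the tracing, here is a sketch of the ftrace kretprobe approach (needs root and tracefs mounted under debugfs; the event name secbpf_ret is arbitrary). On x86-64, -EPERM would show up in $retval as 0xffffffffffffffff:

```shell
TRACEFS=/sys/kernel/debug/tracing
if [ -w "$TRACEFS/kprobe_events" ]; then
    # Register a kretprobe on security_bpf() that records its return value.
    echo 'r:secbpf_ret security_bpf $retval' >> "$TRACEFS/kprobe_events"
    echo 1 > "$TRACEFS/events/kprobes/secbpf_ret/enable"
    # Trigger the failing syscall, then inspect the trace buffer.
    bpftool map create /sys/fs/bpf/test type hash key 16 value 16 entries 10000 name foobar
    cat "$TRACEFS/trace"
    # Clean up the probe.
    echo 0 > "$TRACEFS/events/kprobes/secbpf_ret/enable"
    echo '-:secbpf_ret' >> "$TRACEFS/kprobe_events"
else
    echo "tracefs not writable; run as root with debugfs mounted"
fi
```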

Might be related to #9402.

@brb brb added priority/high This is considered vital to an upcoming release. needs/triage This issue requires triaging to establish severity and next steps. labels Jan 29, 2020
@brb brb assigned brb and borkmann Jan 29, 2020

Antiarchitect commented Feb 5, 2020

I'm having a similar issue on CentOS 7, kernel 5.5.1-1:

level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID=e15e82f776 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=598 file-path=598_next_fail identity=24220 ipv4=10.32.0.30 ipv6= k8sPodName=core-cilium/hubble-mr2x2 subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=0s bpfWaitForELF=0s bpfWriteELF=0s buildDuration="336.045µs" containerID=e15e82f776 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=598 error="Unable to create map /sys/fs/bpf/tc/globals/cilium_ep_config_00598: operation not permitted" identity=24220 ipv4=10.32.0.30 ipv6= k8sPodName=core-cilium/hubble-mr2x2 mapSync=0s policyCalculation=0s prepareBuild="63.781µs" proxyConfiguration=0s proxyPolicyCalculation=0s proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint waitingForCTClean=0s waitingForLock="1.125µs"
level=error msg="endpoint regeneration failed" containerID=e15e82f776 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=598 error="Unable to create map /sys/fs/bpf/tc/globals/cilium_ep_config_00598: operation not permitted" identity=24220 ipv4=10.32.0.30 ipv6= k8sPodName=core-cilium/hubble-mr2x2 subsys=endpoint

@Antiarchitect

Unable to create map /sys/fs/bpf/tc/globals/cilium_ep_config_00861: operation not permitted


brb commented Feb 5, 2020

@Antiarchitect Thanks for the info. Can you check in your dmesg whether you have lockdown enabled?

@Antiarchitect

@brb - Please check your Slack

@brb brb assigned pchaigno and qmonnet and unassigned brb and borkmann Feb 5, 2020

joestringer commented Feb 7, 2020

Weirdly enough, after I upgraded my local Cilium dev VM kernel to net-next I'm observing this too, but not for the cilium-agent or cilium processes, only for bpftrace and bpftool.

I started digging around in AppArmor, but I don't think that's it: bpftrace only has a "complain"-mode profile, and aa-audit tells me there's no profile active:

# aa-audit bpftrace
Profile for /usr/local/bin/bpftrace not found, skipping

Running dmesg | grep -i lock in this setup doesn't turn up anything lockdown-related.

Feel free to reach out to me on slack if you have thoughts on what kinds of queries you'd like from that environment. I don't want to flood this thread too much given my issue is more bpftrace-related, but it seems similar enough that there may be some commonality.

Weirdly, I also see inconsistencies in behaviour:

  • If I run something like cilium bpf ct list global or even cilium bpf ct flush global in this env, those commands work.
  • If I run the bpftool command from the issue description, it fails with the same behaviour described in the first post on this issue:

    # bpftool map create /sys/fs/bpf/tc/globals/cilium_policy_01821 type hash key 16 value 16 entries 10000 name foobar
    Error: map create failed: Operation not permitted

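One way to narrow that split down (a sketch, assuming strace is installed; paths and map parameters are illustrative) is to compare which bpf() commands the working and failing tools actually issue: listing an existing pinned map goes through BPF_OBJ_GET and lookup/get-next-key commands, while only the create path issues BPF_MAP_CREATE:

```shell
# Trace only bpf(2) calls for a working and a failing command.
for cmd in \
    "bpftool map show" \
    "bpftool map create /sys/fs/bpf/test type hash key 4 value 4 entries 4 name t"
do
    echo "== $cmd"
    if command -v strace >/dev/null 2>&1 && command -v bpftool >/dev/null 2>&1; then
        strace -f -e trace=bpf $cmd 2>&1 | grep '^bpf(' || true
    else
        echo "   (skipped: strace and/or bpftool not installed)"
    fi
done
```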

More info with the bpftool command:

# strace bpftool map create /sys/fs/bpf/tc/globals/cilium_policy_01821 type hash key 16 value 16 entries 10000 name foobar 2>&1 | tail -n 8 
prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = 0
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=16, value_size=16, max_entries=10000, map_flags=0, inner_map_fd=0, ...}, 112) = -1 EPERM (Operation not permitted)
write(2, "Error: ", 7Error: )                  = 7
write(2, "map create failed: Operation not"..., 42map create failed: Operation not permitted) = 42
write(2, "\n", 1
)                       = 1
exit_group(-1)                          = ?
+++ exited with 255 +++

This last command is successful on my Ubuntu 19.10 kernel 5.3.

@brb brb removed the needs/triage This issue requires triaging to establish severity and next steps. label Feb 24, 2020
@brb brb closed this as completed Feb 24, 2020

brb commented Feb 24, 2020

@borkmann Thanks for tracking this down!
