New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle errno better and enoent better for batch operations #112
Handle errno better and enoent better for batch operations #112
Conversation
It's possible to get an errno value from a successful libbpf call as errno isn't reset before each CGO call. Instead of checking errno unconditionally after each libbpf call, only check it once we've determined we have actually hit an error condition. This fixes a bunch of breakages and can be verified by running the selftests and ensuring they all pass.
Certain bpf calls can return ENOENT like when a map is empty. This can also happen during batch operations when we've read all the data. In this situation, we should still propagate the error but we should also return the data to the user.
@@ -339,7 +339,7 @@ func NewModuleFromFileArgs(args NewModuleArgs) (*Module, error) { | |||
} | |||
|
|||
obj, errno := C.bpf_object__open_file(bpfFile, &opts) | |||
if errno != nil { | |||
if obj == nil && errno != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is checking errno not already enough to make sure there's an error? It seems that Go does clear errno before each C function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to the comment of @grantseltzer
libbpf might return a non-null value in some cases.
For example: https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L7080.
In that case, we will mistakenly return that no error occured
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if Go does clear it before the call it's still possible that something within the call sets errno
but we still don't actually have an error condition, the call that we originally made did succeed.
Without this change a few of the selftests error out. Also in this patch is a fix for map batch operations where errno
could equal ENOENT
. This happens when we've read an entire map, however we may still have gotten some data back so this patch fixes things to ensure whatever data we got is still returned, even if it's less than the batch amount.
@@ -339,7 +339,7 @@ func NewModuleFromFileArgs(args NewModuleArgs) (*Module, error) { | |||
} | |||
|
|||
obj, errno := C.bpf_object__open_file(bpfFile, &opts) | |||
if errno != nil { | |||
if obj == nil && errno != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to the comment of @grantseltzer
libbpf might return a non-null value in some cases.
For example: https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L7080.
In that case, we will mistakenly return that no error occured
@@ -381,7 +381,7 @@ func NewModuleFromBufferArgs(args NewModuleArgs) (*Module, error) { | |||
} | |||
|
|||
obj, errno := C.bpf_object__open_mem(bpfBuff, bpfBuffSize, &opts) | |||
if errno != nil { | |||
if obj == nil && errno != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
_, errno := C.bpf_object__load(m.obj) | ||
if errno != nil { | ||
ret, errno := C.bpf_object__load(m.obj) | ||
if ret < 0 && errno != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good test, as bpf_object__load returns an int, and uses libbpf_err to set errno in case of an error:
static inline int libbpf_err(int ret)
{
if (ret < 0)
errno = -ret;
return ret;
}
Same for all places that don't return a pointer
@@ -430,7 +430,7 @@ func (m *Module) GetMap(mapName string) (*BPFMap, error) { | |||
cs := C.CString(mapName) | |||
bpfMap, errno := C.bpf_object__find_map_by_name(m.obj, cs) | |||
C.free(unsafe.Pointer(cs)) | |||
if errno != nil { | |||
if bpfMap == nil && errno != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is ok to check for null bpfMap:
https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L9394
This inconsistency in libbpf is really confusing. I think it should be fixed before 1.0
@@ -1057,7 +1067,7 @@ func doAttachKprobeLegacy(prog *BPFProg, kp string, isKretprobe bool) (*BPFLink, | |||
cbool := C.bool(isKretprobe) | |||
link, errno := C.attach_kprobe_legacy(prog.prog, cs, cbool) | |||
C.free(unsafe.Pointer(cs)) | |||
if errno != nil { | |||
if link == nil && errno != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is implemented in libbpfgo - it is ok to check for nil
@@ -1094,7 +1104,7 @@ func (m *Module) InitRingBuf(mapName string, eventsChan chan []byte) (*RingBuffe | |||
} | |||
|
|||
rb, errno := C.init_ring_buf(bpfMap.fd, C.uintptr_t(slot)) | |||
if errno != nil { | |||
if rb == nil && errno != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is implemented in libbpfgo - it is ok to check for nil
@derekparker Do you have thoughts on Yaniv's latest comments? Overall I like your approach! |
Yeah, I'll update the code to take the feedback into account! Sorry for the delay. |
No worries! |
Hi @derekparker, we're running into stability issues in tracee with the current state of libbpfgo, caused by my original error handling changes. Even with your changes we still have failing tests. In #119 we decided to revert back to the original state of error handling so that we can return to stability while we figure error handling out properly, while simultaneously improving tests to avoid this issue in the future. The way forward would be to cherry pick the original commits that were reverted in #112 and base your changes on top of that, in addition to whatever fixes are needed on top of that. I understand that's a good amount of work and i'm not sure what your team's priorities are, so please feel free to let me handle this work if you'd like me to. Happy to hear your thoughts, as always! |
@grantseltzer you can feel free to pick up this work if you'd like! |
I'm closing this for now as we're currently working in some other PR dealing with error handling. |
Details in the commits but essentially we should only check errno if we detect an error condition (commit 1) and we should also still return information from batch operations even if we get
ENOENT
because that just means we've read everything. Not returning the information to the user will cause that data to be lost.