Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle errno better and enoent better for batch operations #112

Conversation

derekparker
Copy link
Contributor

Details in the commits but essentially we should only check errno if we detect an error condition (commit 1) and we should also still return information from batch operations even if we get ENOENT because that just means we've read everything. Not returning the information to the user will cause that data to be lost.

It's possible to get an errno value from a successful libbpf call as
errno isn't reset before each CGO call. Instead of checking errno
unconditionally after each libbpf call, only check it once we've
determined we have actually hit an error condition.

This fixes a bunch of breakages and can be verified by running the
selftests and ensuring they all pass.
Certain bpf calls can return ENOENT like when a map is empty. This can
also happen during batch operations when we've read all the data. In
this situation, we should still propagate the error but we should also
return the data to the user.
@@ -339,7 +339,7 @@ func NewModuleFromFileArgs(args NewModuleArgs) (*Module, error) {
}

obj, errno := C.bpf_object__open_file(bpfFile, &opts)
if errno != nil {
if obj == nil && errno != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is checking errno not already enough to make sure there's an error? It seems that Go does clear errno before each C function call.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to the comment of @grantseltzer
libbpf might return a non-null value in some cases.
For example: https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L7080.
In that case, we will mistakenly return that no error occured

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if Go does clear it before the call it's still possible that something within the call sets errno but we still don't actually have an error condition, the call that we originally made did succeed.

Without this change a few of the selftests error out. Also in this patch is a fix for map batch operations where errno could equal ENOENT. This happens when we've read an entire map, however we may still have gotten some data back so this patch fixes things to ensure whatever data we got is still returned, even if it's less than the batch amount.

@@ -339,7 +339,7 @@ func NewModuleFromFileArgs(args NewModuleArgs) (*Module, error) {
}

obj, errno := C.bpf_object__open_file(bpfFile, &opts)
if errno != nil {
if obj == nil && errno != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to the comment of @grantseltzer
libbpf might return a non-null value in some cases.
For example: https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L7080.
In that case, we will mistakenly return that no error occured

@@ -381,7 +381,7 @@ func NewModuleFromBufferArgs(args NewModuleArgs) (*Module, error) {
}

obj, errno := C.bpf_object__open_mem(bpfBuff, bpfBuffSize, &opts)
if errno != nil {
if obj == nil && errno != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

_, errno := C.bpf_object__load(m.obj)
if errno != nil {
ret, errno := C.bpf_object__load(m.obj)
if ret < 0 && errno != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good test, as bpf_object__load returns an int, and uses libbpf_err to set errno in case of an error:

static inline int libbpf_err(int ret)
{
	if (ret < 0)
		errno = -ret;
	return ret;
}

Same for all places that don't return a pointer

@@ -430,7 +430,7 @@ func (m *Module) GetMap(mapName string) (*BPFMap, error) {
cs := C.CString(mapName)
bpfMap, errno := C.bpf_object__find_map_by_name(m.obj, cs)
C.free(unsafe.Pointer(cs))
if errno != nil {
if bpfMap == nil && errno != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is ok to check for null bpfMap:
https://github.com/libbpf/libbpf/blob/master/src/libbpf.c#L9394

This inconsistency in libbpf is really confusing. I think it should be fixed before 1.0

@@ -1057,7 +1067,7 @@ func doAttachKprobeLegacy(prog *BPFProg, kp string, isKretprobe bool) (*BPFLink,
cbool := C.bool(isKretprobe)
link, errno := C.attach_kprobe_legacy(prog.prog, cs, cbool)
C.free(unsafe.Pointer(cs))
if errno != nil {
if link == nil && errno != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is implemented in libbpfgo - it is ok to check for nil

@@ -1094,7 +1104,7 @@ func (m *Module) InitRingBuf(mapName string, eventsChan chan []byte) (*RingBuffe
}

rb, errno := C.init_ring_buf(bpfMap.fd, C.uintptr_t(slot))
if errno != nil {
if rb == nil && errno != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is implemented in libbpfgo - it is ok to check for nil

@grantseltzer
Copy link
Contributor

@derekparker Do you have thoughts on Yaniv's latest comments? Overall I like your approach!

@derekparker
Copy link
Contributor Author

@derekparker Do you have thoughts on Yaniv's latest comments? Overall I like your approach!

Yeah, I'll update the code to take the feedback into account! Sorry for the delay.

@grantseltzer
Copy link
Contributor

No worries!

@grantseltzer
Copy link
Contributor

Hi @derekparker, we're running into stability issues in tracee with the current state of libbpfgo, caused by my original error handling changes. Even with your changes we still have failing tests. In #119 we decided to revert back to the original state of error handling so that we can return to stability while we figure error handling out properly, while simultaneously improving tests to avoid this issue in the future.

The way forward would be to cherry pick the original commits that were reverted in #112 and base your changes on top of that, in addition to whatever fixes are needed on top of that. I understand that's a good amount of work and i'm not sure what your team's priorities are, so please feel free to let me handle this work if you'd like me to. Happy to hear your thoughts, as always!

@derekparker
Copy link
Contributor Author

@grantseltzer you can feel free to pick up this work if you'd like!

@grantseltzer grantseltzer self-assigned this Feb 7, 2022
@rafaeldtinoco
Copy link
Contributor

I'm closing this for now as we're currently working in some other PR dealing with error handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants