-
Notifications
You must be signed in to change notification settings - Fork 374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tetragon: extract linux_binprm member using CO:RE #1986
tetragon: extract linux_binprm member using CO:RE #1986
Conversation
✅ Deploy Preview for tetragon ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
/label release-note/minor |
Thank you for adding it, so this is useful feature as we need the binprm. One point using the path like that for binprm context will limit us in the future, so this binprm in your patches handles the binary being executed, not the interpreter nor executable IIRC how it is called if the execve could have happened from an fd passed file... I will check later to see again the details, but just to say we need one layer maybe of abstraction since strictly speaking path here could be one of those depending on how which stage you are executing the binary and its format. Or we could rename the path to something more explicit, maybe... Anyway this good feature as right now we have this https://github.com/cilium/tetragon/blob/main/examples/tracingpolicy/process-exec/process-exec-elf-begin.yaml but it just catches the late binary elf or flat execution not other misc or shebang execution, so it was on our todo list for a while ;-) We will review, just wait more for feedback before doing extra changes in cases ;-) |
Hey @tixxdz can we maybe have the conversation on the issue as I started here and had similar argument as you have. See #1983 (comment). |
The only change I will maybe make is to fix the CI failure that for some reason did not appear in my personal branch =) |
overall lgtm one nitpick. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I have added some comments. There are also some errors in the CI, it would be great if you can fix them. Let me know if anything does not make sense.
I would also like to have a test for that. You can use https://github.com/cilium/tetragon/blob/main/pkg/sensors/tracing/kprobe_test.go as an inspiration. Something like defining a tracing policy to use that, execute a file, and then check that the event that you are getting have the correct fields.
bpf/process/types/basic.h
Outdated
struct linux_binprm *bprm = (struct linux_binprm *)arg; | ||
arg = (unsigned long)(&bprm->file); | ||
|
||
// fallthrough to file_ty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that this is correct here. You expect to go through linux_binprm_type
, file_ty
, and path_ty
but as it is now, my understanding is that it goes through linux_binprm_type
, kiocb_type
, file_ty
, and path_ty
. kiocb_type
affects the value of arg which can cause issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've switched this around a bit to be (hopefully) correct now!
@@ -55,7 +55,7 @@ type KProbeArg struct { | |||
// +kubebuilder:validation:Minimum=0 | |||
// Position of the argument. | |||
Index uint32 `json:"index"` | |||
// +kubebuilder:validation:Enum=auto;int;uint32;int32;uint64;int64;char_buf;char_iovec;size_t;skb;sock;string;fd;file;filename;path;nop;bpf_attr;perf_event;bpf_map;user_namespace;capability;kiocb;iov_iter;cred;load_info;module;syscall64; | |||
// +kubebuilder:validation:Enum=auto;int;uint32;int32;uint64;int64;char_buf;char_iovec;size_t;skb;sock;string;fd;file;filename;path;nop;bpf_attr;perf_event;bpf_map;user_namespace;capability;kiocb;iov_iter;cred;load_info;module;syscall64;linux_binprm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you are changing the CRD you have also to update the version of that in
const CustomResourceDefinitionSchemaVersion = "1.1.3" |
bpf/process/types/basic.h
Outdated
@@ -2378,6 +2383,12 @@ read_call_arg(void *ctx, struct msg_generic_kprobe *e, int index, int type, | |||
case iov_iter_type: | |||
size = copy_iov_iter(ctx, orig_off, arg, argm, e, data_heap); | |||
break; | |||
case linux_binprm_type: { | |||
struct linux_binprm *bprm = (struct linux_binprm *)arg; | |||
arg = (unsigned long)(&bprm->file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that you should use CORE here as the linux_binprm can have different layout in different kernels. Something like arg = (unsigned long)_(&bprm->file);
in a similar way to the line 2396 would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I've never seen _
used this way! Is this something particular to Tetragon? I expected to see BPF_CORE_READ*
et al for CO:RE ops =)
cc: @vparla
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a macro you can find in bpf/lib/bpf_helpers.h
:
#define _(P) (__builtin_preserve_access_index(P))
I also think this is important and you can do perfring tests, it will be more efficient than most of the tests out there, as example, you could use: tetragon/pkg/sensors/tracing/kprobe_test.go Lines 3764 to 3841 in 56d978b
I can help you if you don't get something in the test. |
@mtardy do you want to convert this to a suggestion so we can use it directly? |
Not sure I understand, do you want me to write the tests? Here is a more helpful hint maybe: func typeLinuxBinprmPerfringTest(t *testing.T, operator string, values []string) {
testutils.CaptureLog(t, logger.GetLogger().(*logrus.Logger))
ctx, cancel := context.WithTimeout(context.Background(), tus.Conf().CmdWaitTime)
defer cancel()
if err := observer.InitDataCache(1024); err != nil {
t.Fatalf("observertesthelper.InitDataCache: %s", err)
}
option.Config.HubbleLib = tus.Conf().TetragonLib
tus.LoadSensor(t, base.GetInitialSensor())
tus.LoadSensor(t, testsensor.GetTestSensor())
sm := tus.GetTestSensorManager(ctx, t)
policy := tracingpolicy.GenericTracingPolicy{
Metadata: v1.ObjectMeta{
Name: "match-binaries",
},
Spec: v1alpha1.TracingPolicySpec{
[...] // <-- spec using the type
},
}
err := sm.Manager.AddTracingPolicy(ctx, &policy)
assert.NoError(t, err)
var tailPID, headPID int
ops := func() {
// <-- trigger the hook
}
events := perfring.RunTestEvents(t, ctx, ops)
for _, ev := range events {
if kprobe, ok := ev.(*tracing.MsgGenericKprobeUnix); ok {
// <-- verify you get the type value/matching whatever correct
}
}
} |
My fault, I thought you had written the tests, but only as a comment. |
@mtardy tests added, but they're failing for an unknown reason |
264116a
to
9cd7ba6
Compare
I see you are pushing a lot of commits, I'll prefer taking a look when it's stable, just tell me :) |
I think it's okay to go now, sorry for the many commits! I now know that CI needs to be manually triggered, so I'll reach out in future before spamming the project with PRs. I think dave@dev:~/tetragon$ sudo go test -count=1 ./pkg/sensors/tracing -run TestLinuxBinprmExtractPath
[sudo] password for dave:
ok github.com/cilium/tetragon/pkg/sensors/tracing 1.000s |
Hey, checkpatch gives you details on what could be done for each commit to make the PR ready to be merged (we just rebase and merge)? For that could you maybe rebase on main and also rebase your branch to squash some commits. I guess you could keep a few, don't squash everything into one but we don't need commits like "whitespace" or "go fmt" when we merge. See the "logical commit" explanation here https://tetragon.io/docs/contribution-guide/making-changes/, for example here you can still separate bpf/api/tests changes maybe. |
Sure! Rather than rebasing/squashing individual commits in this 2 week-old branch, is it okay if I just recreate this PR on top of today's |
bace983
to
6d4c213
Compare
6d4c213
to
481ae13
Compare
bpf/process/types/basic.h
Outdated
@@ -2398,6 +2392,12 @@ read_call_arg(void *ctx, struct msg_generic_kprobe *e, int index, int type, | |||
arg = (unsigned long)file; | |||
} | |||
// fallthrough to file_ty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but now this will rewrite the arg from kiocb_type that is expecting to also fall through? I think my abuse of fallthroughs has reached its limit of usability? Its a bit tricky to get this good on older kernels. We don't want to inline the code in duplicate chunks repeating here for insns count reasons.
Maybe an inline would get optimized to a goto by compiler so it would be ok? Other option would be a goto to file handler. Anyways as is it needs a fix right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if there are no cases where struct linux_binrpm
wraps a struct kiocb
, we should be okay, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump @jrfastab
edit: Rebased =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think its safe to fall through here. In the above we probe_read kiocb->ki_filp
but in below we read bprm->file. I doubt those struct layouts are the same and don't have any way to guarentee that across kernel versions anyways. It really needs to be a break and keep separate cases separate.
Otherwise as I read it now. If I tried to get the kiocb_type I would get garbage from whatever happened to be at the offset of file in a linx_binprm from a kiocb struct.
481ae13
to
fb64858
Compare
Thanks for the tips! I'm trying to run the tests as you mentioned, by running
Logging in to the VM as root, we can see that qemu has started the proper image:
However, when I'm in the VM, I can't find a
Is this expected? 🤔 |
e676a71
to
e4e0640
Compare
Hey @tixxdz, I was actually able to run tetragon as you suggested, on 4.19 using vmtests/qemu. The verifier error we're getting here is: I tried changing buffer sizes to 512, 256, 64 and 8, but the same error occurs in each case. I've gone ahead and changed the length of the buffer back to Tracing policy: apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: "sample-no-exec-id"
spec:
kprobes:
- call: "security_bprm_check"
syscall: false
args:
- index: 0
type: "linux_binprm"
returnArg:
index: 0
type: "int"
selectors:
- matchArgs:
- index: 0
operator: "Equal"
values:
- "/usr/bin/sample-exec" Testing log: vmtests-4.19-binprm.txt |
cbccf48
to
2609ee6
Compare
The verifier for 4.19 returns E2BIG in only a handful of places... Interestingly, the 1million insn limit was introduced in 5.2, which makes sense given our results (tests are passing on kernels >= 5.14). Bummer we have to support < 5.2 =(. |
could you just enable it for large programs? like
also it'd be great to split the changes and have at least bpf and pkg changes in separate commits, thanks |
17eb9ff
to
f1e6a1d
Compare
Thanks for the tips! The
Would be happy to do this, but IIUC this seems to contradict with an earlier request by @jrfastab .
Thanks! |
Yes we have some users on old kernels |
@olsajiri we can surely merge this as it is now, then we split the read_call_arg() stuff later? Also please let me know if you plan todo it, cause there is also another PR by a contributor that could be affected after this change, this way you save us the trouble ;-) |
@dwindsor Generally as a matter of style we try to break commits up between BPF and userspace when it makes sense. My comment was intended to be specific to the single patch that just added CO-RE annotations to the reads in BPF. I didn't want to have BPF patch without the CORE style lookups and then add it later because if we bisect for some reason at that point it could introduce a bug. The bug being without CO-RE you will do some fixed offset lookup that probably only works correctly on some subset of kernels. |
@dwindsor I'm OK to take this as a single patch. Just take it as a hint for next time. |
Does anyone know what this CI error means: https://github.com/cilium/tetragon/actions/runs/7920381905/job/21630651516?pr=1986 |
This is complaining about a broken link in README.md that's not even touched by this PR. The link is broken in README.md |
struct linux_binprm contains valuable context regarding execution of new programs. Extract the path member from struct linux_binprm using CO:RE and make it available for use in TracingPolicy's. Signed-off-by: David Windsor <dawindso@cisco.com>
f1e6a1d
to
a240afc
Compare
One of the CI test failed, I tracked it here: #2110 and scheduled another run, thanks |
Thanks a lot! Looks good. |
See #1983 .