
tetragon: extract linux_binprm member using CO:RE #1986

Merged
1 commit merged on Feb 16, 2024


dwindsor
Collaborator

See #1983.

@dwindsor dwindsor requested a review from a team as a code owner January 16, 2024 16:51
@dwindsor dwindsor requested a review from tpapagian January 16, 2024 16:51

netlify bot commented Jan 16, 2024

Deploy Preview for tetragon ready!

🔨 Latest commit: 763b419
🔍 Latest deploy log: https://app.netlify.com/sites/tetragon/deploys/65ce4d0372b3340008634352
😎 Deploy Preview: https://deploy-preview-1986--tetragon.netlify.app

@kkourt kkourt self-requested a review January 16, 2024 17:13
@mtardy mtardy self-requested a review January 16, 2024 17:16
@dwindsor
Collaborator Author

/label release-note/minor

@tpapagian tpapagian added the release-note/minor This PR introduces a minor user-visible change label Jan 16, 2024
@tixxdz
Member

tixxdz commented Jan 16, 2024

Thank you for adding this; it is a useful feature, as we need the binprm.

One point: using the path like that for the binprm context will limit us in the future. The binprm in your patches handles the binary being executed, not the interpreter nor the executable (IIRC that is what it is called) if the execve happened from an fd-passed file... I will check the details again later, but we may need a layer of abstraction, since strictly speaking the path here could be any of those, depending on which stage of executing the binary you are at and on its format. Or we could rename the path to something more explicit, maybe...

Anyway, this is a good feature: right now we have https://github.com/cilium/tetragon/blob/main/examples/tracingpolicy/process-exec/process-exec-elf-begin.yaml, but it only catches the late ELF or flat binary execution, not misc or shebang execution, so it has been on our todo list for a while ;-)

We will review; just wait for more feedback before making extra changes, in case ;-)

@mtardy
Member

mtardy commented Jan 16, 2024

Hey @tixxdz, can we maybe have the conversation on the issue? I started it there and made a similar argument to yours. See #1983 (comment).

@dwindsor
Collaborator Author

The only change I might make is to fix the CI failure, which for some reason did not appear in my personal branch =)

@jrfastab
Contributor

Overall LGTM, one nitpick.

Member

@tpapagian tpapagian left a comment

Thanks! I have added some comments. There are also some errors in the CI, it would be great if you can fix them. Let me know if anything does not make sense.

I would also like to have a test for that. You can use https://github.com/cilium/tetragon/blob/main/pkg/sensors/tracing/kprobe_test.go as inspiration: for example, define a tracing policy that uses the new type, execute a file, and then check that the event you get has the correct fields.

struct linux_binprm *bprm = (struct linux_binprm *)arg;
arg = (unsigned long)(&bprm->file);

// fallthrough to file_ty
Member

I don't believe that this is correct here. You expect to go through linux_binprm_type, file_ty, and path_ty but as it is now, my understanding is that it goes through linux_binprm_type, kiocb_type, file_ty, and path_ty. kiocb_type affects the value of arg which can cause issues.
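To make the ordering problem concrete, here is a minimal user-space sketch in C. It assumes, as the comment above describes, that the linux_binprm_type case sits directly above kiocb_type and that every case falls through; the enum values and trace_cases() are hypothetical stand-ins for read_call_arg(), and each case body only records its name instead of rewriting arg.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type IDs, in the case order described in the review:
 * linux_binprm_type placed directly above kiocb_type. */
enum arg_type { linux_binprm_type, kiocb_type, file_ty, path_ty };

/* Appends `step` to the NUL-terminated string in `out` (capacity `len`),
 * truncating safely if the buffer is full. */
static void append(char *out, size_t len, const char *step)
{
	size_t used = strlen(out);
	snprintf(out + used, len - used, "%s", step);
}

/* Records every case body the switch executes for `type`. Each body
 * stands in for one rewrite of `arg` in read_call_arg(). */
static void trace_cases(int type, char *out, size_t len)
{
	out[0] = '\0';
	switch (type) {
	case linux_binprm_type:
		append(out, len, "binprm ");
		/* fallthrough */
	case kiocb_type:
		/* Also reached from linux_binprm_type: a kiocb-specific
		 * rewrite of arg here would misread the binprm value. */
		append(out, len, "kiocb ");
		/* fallthrough */
	case file_ty:
		append(out, len, "file ");
		/* fallthrough */
	case path_ty:
		append(out, len, "path");
		break;
	}
}
```

With linux_binprm_type, trace_cases() records "binprm kiocb file path": the kiocb step runs even though the argument is a linux_binprm, which is exactly the extra hop through kiocb_type pointed out above.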

Collaborator Author

Thanks! I've switched this around a bit to be (hopefully) correct now!

@@ -55,7 +55,7 @@ type KProbeArg struct {
// +kubebuilder:validation:Minimum=0
// Position of the argument.
Index uint32 `json:"index"`
-// +kubebuilder:validation:Enum=auto;int;uint32;int32;uint64;int64;char_buf;char_iovec;size_t;skb;sock;string;fd;file;filename;path;nop;bpf_attr;perf_event;bpf_map;user_namespace;capability;kiocb;iov_iter;cred;load_info;module;syscall64;
+// +kubebuilder:validation:Enum=auto;int;uint32;int32;uint64;int64;char_buf;char_iovec;size_t;skb;sock;string;fd;file;filename;path;nop;bpf_attr;perf_event;bpf_map;user_namespace;capability;kiocb;iov_iter;cred;load_info;module;syscall64;linux_binprm
Member

Since you are changing the CRD, you also have to update its version in

const CustomResourceDefinitionSchemaVersion = "1.1.3"
(i.e. from 1.1.3 to 1.1.4).

@@ -2378,6 +2383,12 @@ read_call_arg(void *ctx, struct msg_generic_kprobe *e, int index, int type,
case iov_iter_type:
size = copy_iov_iter(ctx, orig_off, arg, argm, e, data_heap);
break;
case linux_binprm_type: {
struct linux_binprm *bprm = (struct linux_binprm *)arg;
arg = (unsigned long)(&bprm->file);
Member

I believe that you should use CO-RE here, as linux_binprm can have a different layout in different kernels. Something like arg = (unsigned long)_(&bprm->file);, similar to line 2396, would be fine.

Collaborator Author

@dwindsor dwindsor Jan 18, 2024

Thanks!

I've never seen _ used this way! Is this something particular to Tetragon? I expected to see BPF_CORE_READ* et al. for CO-RE ops =)

cc: @vparla

Member

This is a macro you can find in bpf/lib/bpf_helpers.h:

#define _(P) (__builtin_preserve_access_index(P))
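__builtin_preserve_access_index is a clang builtin: field accesses wrapped in it are emitted as CO-RE relocations, which the loader (libbpf or cilium/ebpf, for example) rewrites against the running kernel's BTF. The user-space C sketch below cannot use the builtin, so it only emulates why that matters. The two struct layouts are hypothetical and are not the real linux_binprm definitions: a reader whose offset is frozen from the "old" headers reads the wrong field from a "new" object, while a reader given the running layout's offset finds the right one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two HYPOTHETICAL layouts of one struct, as two kernel versions might
 * define it. Field names only echo linux_binprm. */
struct binprm_old { void *vma; void *file; };
struct binprm_new { void *vma; uintptr_t p; void *file; };

/* Reads a pointer-sized field at byte offset `off` inside `obj`. */
static void *read_ptr_at(const void *obj, size_t off)
{
	void *val;
	memcpy(&val, (const char *)obj + off, sizeof(val));
	return val;
}

/* Without CO-RE: the offset is fixed at compile time from the headers
 * the program was built against (here, the "old" layout). */
static void *read_file_fixed(const void *bprm)
{
	return read_ptr_at(bprm, offsetof(struct binprm_old, file));
}

/* With CO-RE the loader patches the offset to match the running kernel;
 * this sketch emulates that by taking the running layout's offset. */
static void *read_file_relocated(const void *bprm, size_t running_off)
{
	return read_ptr_at(bprm, running_off);
}
```

Called on a struct binprm_new, read_file_fixed() returns whatever sits in p, while read_file_relocated(&b, offsetof(struct binprm_new, file)) returns the actual file pointer.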

@mtardy
Member

mtardy commented Jan 18, 2024

I would also like to have a test for that. You can use https://github.com/cilium/tetragon/blob/main/pkg/sensors/tracing/kprobe_test.go as an inspiration. Something like defining a tracing policy to use that, execute a file, and then check that the event that you are getting have the correct fields.

I also think this is important. You can write perfring tests, which are more efficient than most of the tests out there; as an example, you could use:

// matchBinariesPerfringTest checks that the matchBinaries do correctly
// filter the events i.e. it checks that no other events appear.
func matchBinariesPerfringTest(t *testing.T, operator string, values []string) {
	testutils.CaptureLog(t, logger.GetLogger().(*logrus.Logger))
	ctx, cancel := context.WithTimeout(context.Background(), tus.Conf().CmdWaitTime)
	defer cancel()

	if err := observer.InitDataCache(1024); err != nil {
		t.Fatalf("observertesthelper.InitDataCache: %s", err)
	}

	option.Config.HubbleLib = tus.Conf().TetragonLib
	tus.LoadSensor(t, base.GetInitialSensor())
	tus.LoadSensor(t, testsensor.GetTestSensor())
	sm := tus.GetTestSensorManager(ctx, t)

	matchBinariesTracingPolicy := tracingpolicy.GenericTracingPolicy{
		Metadata: v1.ObjectMeta{
			Name: "match-binaries",
		},
		Spec: v1alpha1.TracingPolicySpec{
			KProbes: []v1alpha1.KProbeSpec{
				{
					Call: "fd_install",
					Selectors: []v1alpha1.KProbeSelector{
						{
							MatchBinaries: []v1alpha1.BinarySelector{
								{
									Operator: operator,
									Values:   values,
								},
							},
						},
					},
				},
			},
		},
	}

	err := sm.Manager.AddTracingPolicy(ctx, &matchBinariesTracingPolicy)
	assert.NoError(t, err)

	var tailPID, headPID int
	ops := func() {
		tailCmd := exec.Command("/usr/bin/tail", "/etc/passwd")
		headCmd := exec.Command("/usr/bin/head", "/etc/passwd")

		err := tailCmd.Start()
		assert.NoError(t, err)
		tailPID = tailCmd.Process.Pid
		err = headCmd.Start()
		assert.NoError(t, err)
		headPID = headCmd.Process.Pid

		err = tailCmd.Wait()
		assert.NoError(t, err)
		err = headCmd.Wait()
		assert.NoError(t, err)
	}
	events := perfring.RunTestEvents(t, ctx, ops)

	tailEventExist := false
	for _, ev := range events {
		if kprobe, ok := ev.(*tracing.MsgGenericKprobeUnix); ok {
			if int(kprobe.ProcessKey.Pid) == tailPID {
				tailEventExist = true
				continue
			}
			if int(kprobe.ProcessKey.Pid) == headPID {
				t.Error("kprobe event triggered by /usr/bin/head should be filtered by the matchBinaries selector")
				break
			}
		}
	}
	if !tailEventExist {
		t.Error("kprobe event triggered by /usr/bin/tail should be present, unfiltered by the matchBinaries selector")
	}
}

I can help you if you don't get something in the test.

@dwindsor
Collaborator Author

@mtardy do you want to convert this to a suggestion so we can use it directly?

@mtardy
Member

mtardy commented Jan 19, 2024

Not sure I understand, do you want me to write the tests?

Here is a more helpful hint maybe:

func typeLinuxBinprmPerfringTest(t *testing.T, operator string, values []string) {
	testutils.CaptureLog(t, logger.GetLogger().(*logrus.Logger))
	ctx, cancel := context.WithTimeout(context.Background(), tus.Conf().CmdWaitTime)
	defer cancel()

	if err := observer.InitDataCache(1024); err != nil {
		t.Fatalf("observertesthelper.InitDataCache: %s", err)
	}

	option.Config.HubbleLib = tus.Conf().TetragonLib
	tus.LoadSensor(t, base.GetInitialSensor())
	tus.LoadSensor(t, testsensor.GetTestSensor())
	sm := tus.GetTestSensorManager(ctx, t)

	policy := tracingpolicy.GenericTracingPolicy{
		Metadata: v1.ObjectMeta{
			Name: "match-binaries",
		},
		Spec: v1alpha1.TracingPolicySpec{
			[...] // <-- spec using the type
		},
	}

	err := sm.Manager.AddTracingPolicy(ctx, &policy)
	assert.NoError(t, err)

	var tailPID, headPID int
	ops := func() {
		// <-- trigger the hook
	}
	events := perfring.RunTestEvents(t, ctx, ops)

	for _, ev := range events {
		if kprobe, ok := ev.(*tracing.MsgGenericKprobeUnix); ok {
			// <-- verify you get the type value/matching whatever correct
		}
	}
}

@dwindsor
Collaborator Author

Not sure I understand, do you want me to write the tests?

My fault, I thought you had written the tests, but only as a comment.

@dwindsor
Collaborator Author

@mtardy tests added, but they're failing for an unknown reason

@dwindsor dwindsor force-pushed the pr/dwindsor/path-from-linux-binprm branch from 264116a to 9cd7ba6 Compare January 23, 2024 16:02
@mtardy
Member

mtardy commented Jan 23, 2024

I see you are pushing a lot of commits; I'd prefer to take a look when it's stable, just tell me :)

@dwindsor
Collaborator Author

I think it's okay to go now, sorry for the many commits! I now know that CI needs to be manually triggered, so I'll reach out in future before spamming the project with PRs.

I think d02c9f20 fixed the failing CI test! At least it does so on my system:

dave@dev:~/tetragon$ sudo go test -count=1 ./pkg/sensors/tracing -run TestLinuxBinprmExtractPath
[sudo] password for dave:
ok  	github.com/cilium/tetragon/pkg/sensors/tracing	1.000s

@mtardy
Member

mtardy commented Jan 25, 2024

Hey, checkpatch gives you details on what could be done for each commit to make the PR ready to be merged (we just rebase and merge).

For that, could you maybe rebase on main and also rebase your branch to squash some commits? You could keep a few; don't squash everything into one, but we don't need commits like "whitespace" or "go fmt" when we merge. See the "logical commit" explanation here https://tetragon.io/docs/contribution-guide/making-changes/; for example, here you could still separate the bpf/api/tests changes.

@dwindsor
Collaborator Author

dwindsor commented Jan 26, 2024

Sure! Rather than rebasing/squashing individual commits in this two-week-old branch, is it okay if I just recreate this PR on top of today's main? I've done so in #2028; it might prove easier to follow in the end?

@mtardy mtardy force-pushed the pr/dwindsor/path-from-linux-binprm branch from bace983 to 6d4c213 Compare January 26, 2024 19:21
@dwindsor dwindsor force-pushed the pr/dwindsor/path-from-linux-binprm branch from 6d4c213 to 481ae13 Compare January 26, 2024 19:29
@@ -2398,6 +2392,12 @@ read_call_arg(void *ctx, struct msg_generic_kprobe *e, int index, int type,
arg = (unsigned long)file;
}
// fallthrough to file_ty
Contributor

But now this will rewrite the arg from kiocb_type, which is also expecting to fall through? I think my abuse of fallthroughs has reached its limit of usability. It's a bit tricky to get this right on older kernels. We don't want to inline the code in duplicated chunks here, for instruction-count reasons.

Maybe an inline would get optimized into a goto by the compiler, so it would be OK? Another option would be a goto to the file handler. Anyway, as is, it needs a fix, right?

Collaborator Author

Hmm, if there are no cases where struct linux_binprm wraps a struct kiocb, we should be okay, no?

Collaborator Author

@dwindsor dwindsor Jan 30, 2024

Bump @jrfastab

edit: Rebased =)

Contributor

I don't think it's safe to fall through here. Above, we probe_read kiocb->ki_filp, but below we read bprm->file. I doubt those struct layouts are the same, and we don't have any way to guarantee that across kernel versions anyway. It really needs to be a break, keeping separate cases separate.

Otherwise, as I read it now, if I tried to get the kiocb_type I would get garbage from whatever happened to be at the offset of file in a linux_binprm, read out of a kiocb struct.
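A minimal sketch of the "keep separate cases separate" fix, using simplified hypothetical structs rather than the kernel's: each case extracts its own struct file * and breaks, and only a shared helper is reused, so a kiocb is never read as a linux_binprm or vice versa. In real BPF code the shared helper would be inlined, which duplicates instructions per case (the instruction-count concern raised above); a goto to a single file-handling label is the alternative discussed in this review.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, HYPOTHETICAL stand-ins for the kernel structs. */
struct path { const char *name; };
struct file { struct path f_path; };
struct kiocb { long ki_pos; struct file *ki_filp; };
struct linux_binprm { void *vma; struct file *file; };

enum arg_type { linux_binprm_type, kiocb_type, file_ty };

/* Shared tail: turns a struct file * into its path name. */
static const char *file_path(const struct file *f)
{
	return f ? f->f_path.name : NULL;
}

/* Each case reads only the struct it was given, then breaks, so no
 * case reinterprets another case's argument. */
static const char *read_arg_path(int type, const void *arg)
{
	const char *name = NULL;

	switch (type) {
	case kiocb_type:
		name = file_path(((const struct kiocb *)arg)->ki_filp);
		break;
	case linux_binprm_type:
		name = file_path(((const struct linux_binprm *)arg)->file);
		break;
	case file_ty:
		name = file_path((const struct file *)arg);
		break;
	}
	return name;
}
```

Given a kiocb, a linux_binprm and a bare file that all refer to the same struct file, each type resolves to the same path name, independently of the others.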

@kkourt kkourt added the needs-rebase This PR needs to be rebased because it has merge conflicts. label Jan 29, 2024
@dwindsor dwindsor force-pushed the pr/dwindsor/path-from-linux-binprm branch from 481ae13 to fb64858 Compare January 30, 2024 02:48
@dwindsor
Collaborator Author

dwindsor commented Feb 12, 2024

Much appreciated! LGTM assuming all tests pass.
There are still things we can improve in the test, for example, but as long as we get the event I'm personally fine, so I would say let's merge it, and we may improve or change some minor things before releasing, maybe... ;-)

Kudos to you for the help along the way! Still one CI failure, but only for the kernel 4.19 case:

154/507 tests failed 😞 (took: 12m17.91s, skipped:74)
Error: Process completed with exit code 1.

This one is odd because the other tests in this suite succeeded...

OK, this one is probably a kernel BPF limitation triggered by the BPF verifier, and the errors can be a bit generic... :-/

The way we usually reproduce and track those is documented here: https://github.com/cilium/tetragon/tree/main/tests/vmtests; you may need to run with sudo when starting the VM.

We have the --verbose 3 flag option to dump verifier logs in Tetragon, but I can't recall how to use it in tests. What I usually do is compile tetragon on the host and run the VM with the old kernel. Check tetragon-vmtests-run --help: it has a --just-boot flag, after which you log in as root and run tetragon with --tracing-policy yournewpolicy.yaml --verbose 3 to get a dump of the BPF errors. You will also need to pass --btf tests/vmtests/test-data/kernels/4.19/boot/btf-4.19.262 to tetragon.

It could be two things: you have a 1024-byte MAX_STRING buffer (what happens if you make it 128 or 256?), or those fallthroughs we have are BPF optimizations and I guess we just reached the limit, so we may be forced to rework this part of the code, unfortunately... :-/

Also, can you run the test on 4.9 with a small buffer? Just following https://github.com/cilium/tetragon/tree/main/tests/vmtests for 4.19 to run your single test is a good start.

Thanks for the tips!

I'm trying to run the tests as you mentioned, by running tetragon-vmtests-run with --just-boot:

dave@dev:~/src/github.com/dwindsor/tetragon$ sudo ./tests/vmtests/tetragon-vmtests-run --kernel tests/vmtests/test-data/kernels/4.19/boot/vmlinuz-4.19.262 --base tests/vmtests/test-data/images/base.qcow2 --qemu-disable-kvm --just-boot

Logging in to the VM as root, we can see that qemu has started the proper image:

root@tetragon:~# uname -a
Linux tetragon 4.19.262 #1 SMP Thu Oct 27 16:52:10 UTC 2022 x86_64 GNU/Linux

However, when I'm in the VM, I can't find a tetragon binary, only tetragon-tester which doesn't appear to take any command-line options:

root@tetragon:~# sudo /usr/sbin/tetragon-tester --help
Linux tetragon 4.19.262 #1 SMP Thu Oct 27 16:52:10 UTC 2022
.....
Running test pkg.filters.TestPodRegexFilterInvalidEvent > succeeded after 518.03142ms
Running test pkg.filters.TestPolicyNamesFilterInvalidEvent > succeeded after 489.0174ms
Running test pkg.filters.TestPolicyNamesFilterCorrectValue > succeeded after 494.60026ms
Running test pkg.filters.TestPolicyNamesFilterEmptyValue > succeeded after 483.02676ms
Running test pkg.filters.TestPolicyNamesFilterNilValue > succeeded after 495.10991ms

Is this expected? 🤔

@dwindsor dwindsor force-pushed the pr/dwindsor/path-from-linux-binprm branch 7 times, most recently from e676a71 to e4e0640 Compare February 13, 2024 19:51
@dwindsor
Collaborator Author

Hey @tixxdz, I was actually able to run tetragon as you suggested, on 4.19 using vmtests/qemu.

The verifier error we're getting here is: load program: argument list too long

I tried changing buffer sizes to 512, 256, 64 and 8, but the same error occurs in each case. I've gone ahead and changed the length of the buffer back to MAX_STRING, because this doesn't seem to be the cause.

Tracing policy:

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "sample-no-exec-id"
spec:
  kprobes:
  - call: "security_bprm_check"
    syscall: false
    args:
    - index: 0
      type: "linux_binprm"
    returnArg:
      index: 0
      type: "int"
    selectors:
      - matchArgs:
          - index: 0
            operator: "Equal"
            values:
              - "/usr/bin/sample-exec"

Testing log: vmtests-4.19-binprm.txt

@dwindsor dwindsor force-pushed the pr/dwindsor/path-from-linux-binprm branch 2 times, most recently from cbccf48 to 2609ee6 Compare February 14, 2024 19:18
@dwindsor
Collaborator Author

dwindsor commented Feb 14, 2024

The verifier for 4.19 returns E2BIG in only a handful of places...

Interestingly, the 1 million instruction limit was introduced in 5.2, which makes sense given our results (tests are passing on kernels >= 5.14). Bummer we have to support < 5.2 =(.
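For reference, "argument list too long" in the load error is simply the standard description of the E2BIG errno, which BPF program loading returns when a program exceeds the verifier's size or complexity limits. Go's syscall package prints it in lower case, as in the error above; C's strerror() gives the capitalized form. A small C check, assuming a glibc- or musl-style message table:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Returns the libc description of E2BIG, the errno reported when a
 * BPF program is rejected for being too large or too complex. */
static const char *e2big_message(void)
{
	return strerror(E2BIG);
}
```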

@olsajiri
Contributor

Could you just enable it for large programs? Something like:

index 26398aad7363..9934a4e67d70 100644
--- a/bpf/process/types/basic.h
+++ b/bpf/process/types/basic.h
@@ -2568,6 +2568,7 @@ read_call_arg(void *ctx, struct msg_generic_kprobe *e, int index, int type,
                        return -1;
                }
        } break;
+#ifdef __LARGE_BPF_PROG
        case linux_binprm_type: {
                struct linux_binprm *bprm = (struct linux_binprm *)arg;
                struct file *file;
@@ -2577,6 +2578,7 @@ read_call_arg(void *ctx, struct msg_generic_kprobe *e, int index, int type,
                path_arg = _(&file->f_path);
                size = copy_path(args, path_arg);
        } break;
+#endif
        case filename_ty: {
                struct filename *file;
                probe_read(&file, sizeof(file), &arg);
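To illustrate what the guard buys, here is a reduced C model. It assumes, as the diff implies, that __LARGE_BPF_PROG is defined only when building the program variant loaded on newer kernels; the enum and read_call_arg_supported() are hypothetical. Without the macro, the linux_binprm case is compiled out entirely, so none of its instructions reach the 4.19 verifier, and in this model the type simply falls to the default branch.

```c
#include <assert.h>

/* HYPOTHETICAL type IDs for this reduced model. */
enum arg_type { kiocb_type = 1, linux_binprm_type = 2 };

/* Returns 0 if this build can decode `type`, -1 otherwise. The
 * linux_binprm case only exists when __LARGE_BPF_PROG is defined. */
static int read_call_arg_supported(int type)
{
	switch (type) {
	case kiocb_type:
		return 0;
#ifdef __LARGE_BPF_PROG
	case linux_binprm_type:
		return 0;
#endif
	default:
		return -1;
	}
}
```

Building once as-is and once with -D__LARGE_BPF_PROG gives the two behaviours from the same source, which is why the small and large program variants can diverge without duplicating the file.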

Also, it'd be great to split the changes and have at least the bpf and pkg changes in separate commits.
Plus, could you please add an example under examples/tracingpolicy?

thanks

@dwindsor dwindsor force-pushed the pr/dwindsor/path-from-linux-binprm branch 2 times, most recently from 17eb9ff to f1e6a1d Compare February 15, 2024 18:09
@dwindsor
Collaborator Author

dwindsor commented Feb 15, 2024

Thanks for the tips! The #ifdef fixes the failing test for 4.19. Example tracingpolicy added. 👍

also it'd be great to split the changes and have at least bpf and pkg changes in separate commits, plus could you please add some example under examples/tracingpolicy ?

Would be happy to do this, but IIUC this seems to contradict an earlier request by @jrfastab.

Thanks!

@tixxdz
Member

tixxdz commented Feb 15, 2024

Interestingly, the 1million insn limit was introduced in 5.2, which makes sense given our results (tests are passing on kernels >= 5.14). Bummer we have to support < 5.2 =(.

Yes, we have some users on old kernels.

@tixxdz
Member

tixxdz commented Feb 15, 2024

@olsajiri, we can surely merge this as it is now and split the read_call_arg() stuff later? Also, please let me know if you plan to do it, because there is another PR by a contributor that could be affected by this change; this way you save us the trouble ;-)

@jrfastab
Contributor

@dwindsor Generally, as a matter of style, we try to break commits up between BPF and userspace when it makes sense. My comment was intended to be specific to the single patch that just added CO-RE annotations to the reads in BPF. I didn't want a BPF patch without the CO-RE-style lookups followed by a later patch adding them, because if we bisect for some reason, that intermediate point could introduce a bug. The bug being that without CO-RE you do a fixed-offset lookup that probably only works correctly on some subset of kernels.

@jrfastab
Contributor

@dwindsor I'm OK to take this as a single patch. Just take it as a hint for next time.

@dwindsor
Collaborator Author

dwindsor commented Feb 15, 2024

Does anyone know what this CI error means:

https://github.com/cilium/tetragon/actions/runs/7920381905/job/21630651516?pr=1986

@dwindsor
Collaborator Author

This is complaining about a broken link in README.md that's not even touched by this PR. The link is broken in README.md HEAD.

struct linux_binprm contains valuable context regarding execution of new programs.
Extract the path member from struct linux_binprm using CO:RE and make it available
for use in TracingPolicies.

Signed-off-by: David Windsor <dawindso@cisco.com>
@dwindsor dwindsor force-pushed the pr/dwindsor/path-from-linux-binprm branch from f1e6a1d to a240afc Compare February 15, 2024 22:58
@tixxdz
Member

tixxdz commented Feb 15, 2024

One of the CI tests failed; I tracked it here: #2110 and scheduled another run, thanks

@jrfastab
Contributor

Thanks a lot! Looks good.

@jrfastab jrfastab merged commit 9889f27 into cilium:main Feb 16, 2024
38 of 39 checks passed
@mtardy mtardy linked an issue Feb 16, 2024 that may be closed by this pull request
Labels
release-note/minor This PR introduces a minor user-visible change
Development

Successfully merging this pull request may close these issues.

Extract path from linux_binprm in security_bprm_check
7 participants