[DRAFT] playing around process tree ideas #3363

Closed
wants to merge 2 commits into from


rafaeldtinoco

commit ba731a8 (HEAD -> newproctree, rafaeldtinoco/newproctree)
Author: Rafael David Tinoco rafaeldtinoco@gmail.com
Date: Fri Aug 4 00:48:11 2023

feature(events): turn SchedProcessXXX control plane enabled

- ADD parent_start_time argument to event context (*)

- move functions containing inline asm to the end of their files
- create buffer_memcpy() for map-to-map buffer copy in ebpf

- sched_process_fork:
  - create a signal handler for a SchedProcessFork signal event
  - only SUBMIT the SchedProcessFork event if picked by a policy
  - ALWAYS submit the SchedProcessFork signal event (args only)

- sched_process_exec:
  - create a signal handler for a SchedProcessExec signal event
  - only SUBMIT the SchedProcessExec event if picked by a policy
  - ALWAYS submit the SchedProcessExec signal event (args only)

- create a list of essential events coming from the control plane
  package, instead of a function within Tracee type.

- add a 'Control' boolean to EventState to indicate whether the
  event, and its dependent events, are configured only
  because of the control plane. This is needed because some
  events are too complex to have duplicated probes (they involve
  having tailCalls and other dependencies), so the same event
  probes are used to submit the regular AND the signal events.

(*) parent_start_time argument to event context:

  The parent start time argument is added so that each submitted event
  carries a unique identifier, built from 'host_tid' + 'process start
  time' (using the murmur3 hashing function). This way, the process tree
  can identify a process's parent by its hash (every process node that
  ever existed in the process tree is hashed and unique).

  => The reasoning will become clearer in the next commits.

NOTE: There are NO logical changes to the eBPF code other than copying the
scratch buffer and submitting it to the signal events perf buffer.
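
The submission rule described above for sched_process_fork and sched_process_exec can be modelled in a few lines. This is only a userland Go sketch of the decision, not the eBPF code from this PR; the `submission` type, the `decide` function and the `matchedPolicies` bitmap are hypothetical names used for illustration.

```go
package main

import "fmt"

// submission records where one probe hit ends up.
type submission struct {
	Regular bool // written to the regular events buffer
	Signal  bool // written to the signal events buffer (args only)
}

// decide models the rule in the commit message: the regular event is
// submitted only when at least one policy picked it, while the signal
// event is always submitted. matchedPolicies is a hypothetical bitmap
// with one bit per matching policy.
func decide(matchedPolicies uint64) submission {
	return submission{
		Regular: matchedPolicies != 0,
		Signal:  true,
	}
}

func main() {
	fmt.Printf("no policy matched: %+v\n", decide(0))
	fmt.Printf("policies matched:  %+v\n", decide(0b101))
}
```

The point of the split is that the control plane keeps receiving the signal event even when no policy asked for the regular one.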

commit de1ce50
Author: Rafael David Tinoco rafaeldtinoco@gmail.com
Date: Wed Aug 2 18:38:21 2023

feature(events): process unique identifier murmur hashing

By having a hash to represent every process that ever existed, we can keep
track of the relationship between processes and their children using these
identifiers instead of the fragile PID/PPID relationship.

The eBPF murmur hashing function is added, despite not being currently
used, because it has been tested together with the userland portion to
make sure the hashes are exactly the same (so both sides speak the
same language).

NOTE: This code will be used by the process tree implementation in next
      commits.
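
As an illustration of the idea, here is a minimal Go sketch of a 32-bit MurmurHash3 applied to a task's thread ID and start time. Several things here are assumptions and not taken from this PR: the 32-bit variant, the seed of 0, and the 12-byte little-endian layout (4-byte tid followed by 8-byte start time). The names `murmur32` and `hashTask` are hypothetical. Whatever layout is used, the eBPF side and userland must agree on it byte for byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// murmur32 is the 32-bit MurmurHash3 (x86_32 variant).
func murmur32(data []byte, seed uint32) uint32 {
	const (
		c1 uint32 = 0xcc9e2d51
		c2 uint32 = 0x1b873593
	)
	h := seed
	n := len(data)

	// Body: consume the input in 4-byte little-endian blocks.
	for i := 0; i+4 <= n; i += 4 {
		k := binary.LittleEndian.Uint32(data[i:])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
		h = bits.RotateLeft32(h, 13)
		h = h*5 + 0xe6546b64
	}

	// Tail: mix in the remaining 1 to 3 bytes, if any.
	tail := data[n&^3:]
	var k uint32
	switch len(tail) {
	case 3:
		k ^= uint32(tail[2]) << 16
		fallthrough
	case 2:
		k ^= uint32(tail[1]) << 8
		fallthrough
	case 1:
		k ^= uint32(tail[0])
		k *= c1
		k = bits.RotateLeft32(k, 15)
		k *= c2
		h ^= k
	}

	// Finalization: force all bits of the state to avalanche.
	h ^= uint32(n)
	h ^= h >> 16
	h *= 0x85ebca6b
	h ^= h >> 13
	h *= 0xc2b2ae35
	h ^= h >> 16
	return h
}

// hashTask hashes a thread ID together with the task start time.
// The byte layout and the seed are assumptions made for this sketch.
func hashTask(hostTid uint32, startTime uint64) uint32 {
	var buf [12]byte
	binary.LittleEndian.PutUint32(buf[0:4], hostTid)
	binary.LittleEndian.PutUint64(buf[4:12], startTime)
	return murmur32(buf[:], 0)
}

func main() {
	// Same TID, different start times: two distinct inputs to the hash.
	fmt.Printf("%08x\n", hashTask(100, 1_000_000))
	fmt.Printf("%08x\n", hashTask(100, 2_000_000))
}
```

Because the start time is part of the input, a recycled PID/TID produces a different input than the task that previously held it, which is what makes the identifier more robust than PID/PPID alone.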


rafaeldtinoco commented Aug 4, 2023

@NDStrahilevitz I already have more code on top of these changes, but I want you to take a look at it now (I know you are currently time constrained, thus the rush), mainly because you created the control plane.

I think most of my changes are either described in the git log or clear enough from the code. Let me know if not, please.

The overall idea (which these changes support, and which I'm already using in code not yet pushed) is that the process tree is built on top of a hash that uniquely describes a task.

But, for this review, I'm mostly interested in the control plane logic I created. Let me know if you can spot a corner case of some sort, please.

A good way to test is to follow my review comments about "DEBUG" and run:

./dist/tracee -o none -e uname
./dist/tracee -o json -e uname
./dist/tracee -o none -e uname,sched_process_{fork,exec,exit}
./dist/tracee -o json -e uname,sched_process_{fork,exec,exit}

So you can see the events being submitted independently of the control plane, for example.

Cheers

- ADD parent_start_time argument to event context (*)

- move functions containing inline asm to the end of their files
- create buffer_memcpy() for map-to-map buffer copy in ebpf

- sched_process_fork:
  - create a signal handler for a SchedProcessFork signal event
  - only SUBMIT the SchedProcessFork event if picked by a policy
  - ALWAYS submit the SchedProcessFork signal event (args only)

- sched_process_exec:
  - create a signal handler for a SchedProcessExec signal event
  - only SUBMIT the SchedProcessExec event if picked by a policy
  - ALWAYS submit the SchedProcessExec signal event (args only)

- sched_process_exit:
  - create a signal handler for a SchedProcessExit signal event
  - only SUBMIT the SchedProcessExit event if picked by a policy
  - ALWAYS submit the SchedProcessExit signal event (args only)
  - The SchedProcessExit signal event has two extra arguments added;
    both will be needed for the process tree implementation.

- create a list of essential events coming from the control plane
  package, instead of a function within Tracee type.

- add a 'Control' boolean to EventState to indicate whether the
  event, and its dependent events, are configured only
  because of the control plane. This is needed because some
  events are too complex to have duplicated probes (they involve
  having tailCalls and other dependencies), so the same event
  probes are used to submit the regular AND the signal events.

(*) parent_start_time argument to event context:

  The parent start time argument is added so that each submitted event
  carries a unique identifier, built from 'host_tid' + 'process start
  time' (using the murmur3 hashing function). This way, the process tree
  can identify a process's parent by its hash (every process node that
  ever existed in the process tree is hashed and unique).

  => The reasoning will become clearer in the next commits.

NOTE: There are NO logical changes to the eBPF code other than copying the
scratch buffer and submitting it to the signal events perf buffer.
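
One way to picture the 'Control' flag described above is the sketch below. It is an assumption-heavy illustration, not the EventState from this PR: the `eventState` fields, the `buildStates` helper, and the dependency name `hypothetical_dependency` are all invented here. Events picked by the user are marked as selected; essential control plane events (and their dependencies) that nobody selected are marked with Control only.

```go
package main

import (
	"fmt"
	"sort"
)

// eventState is a simplified, hypothetical stand-in for EventState.
type eventState struct {
	Selected bool // chosen by a user policy
	Control  bool // configured only because the control plane needs it
}

// buildStates marks user-selected events first, then adds essential
// events and their dependencies as control-only when they are missing.
func buildStates(selected, essential []string, deps map[string][]string) map[string]eventState {
	states := make(map[string]eventState)

	var mark func(name string, fromUser bool)
	mark = func(name string, fromUser bool) {
		st, seen := states[name]
		switch {
		case fromUser:
			if seen && st.Selected {
				return // already handled, also guards against cycles
			}
			states[name] = eventState{Selected: true}
		case seen:
			return // user selection (or an earlier pass) wins
		default:
			states[name] = eventState{Control: true}
		}
		for _, d := range deps[name] {
			mark(d, fromUser)
		}
	}

	for _, name := range selected {
		mark(name, true)
	}
	for _, name := range essential {
		mark(name, false)
	}
	return states
}

func main() {
	deps := map[string][]string{
		"sched_process_fork": {"hypothetical_dependency"},
	}
	states := buildStates(
		[]string{"uname", "sched_process_exec"},
		[]string{"sched_process_fork", "sched_process_exec", "sched_process_exit"},
		deps,
	)

	names := make([]string, 0, len(states))
	for name := range states {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%-24s %+v\n", name, states[name])
	}
}
```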

// DEBUG (remove only when process tree is implemented)
file, _ = os.Open("/dev/null")
// file = os.Stdout

You can test this by swapping the comment between these 2 lines.

// TODO: deal with argv and envp if ever needed

// DEBUG (remove only when process tree is implemented)
file, _ = os.Open("/dev/null")

You can test this by swapping the comment between these 2 lines.

}

// DEBUG (remove only when process tree is implemented)
file, _ = os.Open("/dev/null")

You can test this by swapping the comment between these 2 lines.

@rafaeldtinoco

Check #3364 (I'll continue adding commits on top of that PR, the base is the same).

@rafaeldtinoco rafaeldtinoco deleted the newproctree branch September 27, 2023 13:47