
New Event Structure #2870

Open
itaysk opened this issue Mar 14, 2023 · 35 comments

Comments


itaysk commented Mar 14, 2023

#2355 changed the primary user experience of Tracee to be event oriented (previously events were considered internal and hidden from the user). Therefore:

  1. The event schema needs to be formalized and stabilized, since it is no longer internal.
  2. The event structure needs to be generalized, since events will now be used for detections, captures, and more.

Following is the updated event schema based on the comments below:

  1. timestamp
  2. name
  3. id - machine-readable id (integer). Note: the current event id is unsuitable since it is architecture-specific
  4. // version - use semver, where major is a breaking change to the event (e.g. one of the event's fields under data was changed or removed), minor is a non-breaking change (e.g. a new field was added to the event under data), and patch is a bug fix. Since this data is static, we may remove this or make it optional
  5. // tags - since this data is static, we may remove this or make optional
  6. labels - doesn't exist. For future use.
  7. policies
    1. matched
    2. actions - doesn't exist, for future use - list of actions taken (currently the only action we have is print).
  8. context
    1. process
      1. executable
        1. path
        2. name - the binary name (basename of the path) - doesn't exist, consider adding (in another issue)
      2. uniqueId - unique id of the process
      3. pid
      4. hostPid
      5. executionTime - time of last exec. Doesn't exist, consider adding (in another issue)
      6. realUser
        1. id
        2. name - doesn't exist, consider adding (in another issue)
      7. user - effective user. Doesn't exist, consider adding (in another issue)
        1. id
        2. name
      8. ancestors - process ancestors array. Only the direct parent will be populated by default, with the following fields:
        1. uniqueId
        2. pid
        3. hostPid
        4. Other ancestor fields may be populated by threat detection events
      9. thread
        1. startTime
        2. name (aka "comm")
        3. tid
        4. hostTid
        5. capabilities - doesn't exist, consider adding (in another issue)
        6. syscall - the syscall that triggered this event
        7. compat - boolean. moved from flags.compat
        8. userStackTrace - if enabled, will be here
    2. container
      1. id
      2. name
      3. image
        1. id
        2. repoDigest
        3. name
      4. isRunning - boolean. moved from flags
      5. startTime - timestamp of container start time. Doesn't exist; will replace started
      6. pid - entrypoint's pid. Doesn't exist, consider adding
    3. k8s
      1. pod
        1. name
        2. uid
        3. labels
      2. namespace
        1. name
  9. data
    1. Any relevant field (per-event schema)
    2. returnValue (if relevant will appear here)
    3. triggeredBy (will appear on threat detection events)
      1. name
      2. id
      3. data
  10. threat (if relevant will appear here) - static data about threats (can be omitted)
    1. description
    2. mitre
      1. tactic
        1. name
      2. technique
        1. name
        2. id
    3. severity

We also discussed versioning the event schema, but not including the version with each event, for efficiency.


itaysk commented Mar 14, 2023

While this represents the complete event schema, I also think we should try to emit minimal events. For example, I suppose most people won't need the full description with every event, or the full stack trace. We already allow conditionally including some elements (stack trace, syscall, exec-env), and I think we need to expand this control to all event fields. I'll create another issue to discuss this.


NDStrahilevitz commented Mar 16, 2023

@itaysk Do you think we should maybe use the protocol.Event feature for this (see types/protocol)? We could add a ContentType header to differentiate events. Or do we want to stick with trace.Event with split up context fields?

I've had this thought as well in relation to the recent Metadata field we added for the finding events: it could have fit into the protocol.EventHeader, or Findings could have had a different payload body while still being considered events through the protocol.


itaysk commented Mar 20, 2023

Actually, I'm not very familiar with how protocol is used now. I thought it was supposed to become obsolete in the unified binary approach. Can you elaborate on your proposal, or perhaps add an example?

@NDStrahilevitz

The protocol is more a way to differentiate different payloads going into the rules engine.
For example, in CNDR it is used to define the ability to receive events sent directly from CNDR into the signature (for example, to set data storage in the signature).
The same way the protocol can define a different payload for signature inputs, it can be used to define different event payloads with different shapes (for example, before we changed detections to be trace.Event, we could have defined them to be a different payload in the protocol.Event instead).
I am not sure if we want to currently support different payloads for events going into the engine (since the engine is now supposed to be internal).

@NDStrahilevitz

May I suggest we move the event definition to a proto file while we're at it?
Should make it easier to integrate tracee into gRPC systems later.


itaysk commented Mar 21, 2023

The protocol is more a way to differentiate different payloads going into the rules engine.
For example, in CNDR it is used to define the ability to receive events sent directly from CNDR into the signature (for example, to set data storage in the signature).
The same way the protocol can define a different payload for signature inputs, it can be used to define different event payloads with different shapes (for example, before we changed detections to be trace.Event, we could have defined them to be a different payload in the protocol.Event instead).
I am not sure if we want to currently support different payloads for events going into the engine (since the engine is now supposed to be internal).

I'm still not sure there's a use case for this, or perhaps I didn't fully understand it, but regardless it seems we reached the same conclusion: the "rule engine" is gone now.

+1 for proto


rafaeldtinoco commented Mar 22, 2023

Has the event type serialization, using protobufs, ever been tested? I'm particularly concerned about:

Protocol buffers tend to assume that entire messages can be loaded into memory at once and are not larger than an object graph. For data that exceeds a few megabytes, consider a different solution; when working with larger data, you may effectively end up with several copies of the data due to serialized copies, which can cause surprising spikes in memory usage.

I mean, we will not serialize eBPF events, for sure, but if we're considering converting the Tracee type to protobuf at a high rate, coming from an external source, then we should do some measurements before making the decision, IMO.


NDStrahilevitz commented Mar 22, 2023

Has the event type serialization, using protobufs, ever been tested? I'm particularly concerned about:

Protocol buffers tend to assume that entire messages can be loaded into memory at once and are not larger than an object graph. For data that exceeds a few megabytes, consider a different solution; when working with larger data, you may effectively end up with several copies of the data due to serialized copies, which can cause surprising spikes in memory usage.

I've actually written a PR (#2070) once where I added ebpf -> rules serialization with protobuf. From my measurements in that PR, protobuf serialization was quicker than both json and gob ONLY if the struct was already in protobuf form - the conversion from trace.Event to proto.Event was the overhead in that printer format.
When taking conversion time into consideration, it was slower than both.

@rafaeldtinoco

Good to know that.

@yanivagman

I updated the struct to have process-related context grouped together (pids, tids, comm, start time, namespaces, cgroup, uid).
Also moved processorId to be part of the context.

@rafaeldtinoco rafaeldtinoco changed the title New Event Strucutre New Event Structure Apr 10, 2023
@josedonizetti josedonizetti modified the milestones: v0.14.0, v0.15.0 Apr 24, 2023

itaysk commented Apr 27, 2023

Adding this here for future reference and consideration: https://www.elastic.co/blog/ecs-elastic-common-schema-otel-opentelemetry-faq

@yanivagman yanivagman modified the milestones: v0.15.0, v0.16.0 May 18, 2023
NDStrahilevitz added a commit to NDStrahilevitz/tracee that referenced this issue May 24, 2023
The field changes were causing compatibility issues for integrated
products.
We will re-introduce these fields as part of the new Context field in
the new event structure
(see aquasecurity#2870)
NDStrahilevitz added a commit to NDStrahilevitz/tracee that referenced this issue May 24, 2023
The field changes were causing compatibility issues for integrated
products.
We will re-introduce these fields as part of the new Context field in
the new event structure
(see aquasecurity#2870)

itaysk commented May 29, 2023

After talking with @yanivagman we thought about some slight changes:

  1. rename args to fields
  2. rename kubernetes to pod (and remove the pod prefix from internal fields)
  3. remove parent information from process, replaced by the ancestors field below
  4. new ancestors field, which is an array of process entries (not all fields will be populated, so they need to be omitempty)
  5. move flags.containerStarted to container.started
  6. move flags.isCompat to process.compat

TBD:

  1. severity - location and name
  2. version - maybe move to root

I've updated the description to reflect this

@yanivagman

Let's also take the opportunity to expose only the fields mentioned here to the user. Any other (internal) fields should not be part of trace.Event.
We can do this by embedding the trace.Event struct into an internal struct used only by the pipeline, where we can also add some extra fields (e.g. matchedActions, matchedPolicies bitmap, cgroup id, etc.)


yanivagman commented Jun 8, 2023

More detailed struct with some comments added:

  1. timestamp
  2. id - better if this were an integer, but we also need to keep backwards compatibility with e.g. TRC-XXX ids. Maybe call this differently so we can add an integer ID in the future?
  3. name
  4. metadata
    1. description
    2. severity (or priority for non signature events?)
    3. tags
    4. version (this version describes the event fields version, and not the schema of the event structure)
    5. misc - map[string]interface{}
  5. context
    1. process - all process related context
      1. executionTime - doesn't exist, consider adding
      2. name
      3. id
      4. namespaceId
      5. userId
      6. thread
        1. startTime
        2. id
        3. namespaceId
        4. mountNamespaceId - consider removing
        5. pidNamespaceId - consider removing
        6. utsName - consider removing
        7. syscall - moved here
        8. stackTrace - if enabled, will be here
        9. compat - moved from flags.compat
    2. ancestors - array of all process ancestors, in a structure similar to process. Element 0 is parent.
    3. container - all container related context
      1. id
      2. name
      3. image
      4. imageDigest
      5. started - moved from flags - consider setting this as a timestamp of container start time and not boolean
    4. pod - all pod related context (from kubernetes)
      1. name
      2. namespace
      3. uid
      4. sandbox
    5. processorId - moved here
  6. fields - renamed from args (or should we call it data?)
    1. every event attribute
    2. returnValue (if relevant will appear here)
  7. matchedPolicies


itaysk commented Jun 10, 2023

id - better if this was an integer

why?

executionTime - doesn't exist, consider adding

agree, but let's discuss in a separate issue as it's about adding an entirely new feature? (and won't break the event structure)

thread

what's the motivation for adding this layer?

mountNamespaceId - consider removing
pidNamespaceId - consider removing
utsName - consider removing

+1

ancestors

since we're discussing the "event structure", having a field at the root implies it's an "event" field. In this case I'm reading this as "event ancestors", but the meaning is "process ancestors". Should we relocate it under process, or prefix it with process, to clarify?

started - moved from flags - consider setting this as a timestamp of container start time and not boolean

agree, but let's discuss in a separate issue as it's about adding an entirely new feature?

pod - all pod related context (from kubernetes)

if this is a kubernetes thing, should we add kubernetes to the name? If it is not, are we sure the structure will work for other pod implementations?

sandbox

What does this mean? From reading the code, I gather it looks at the container label io.kubernetes.docker.type==sandbox, but I don't understand what/why/how it's related to the kubernetes pod. @NDStrahilevitz can you please explain?

matchedPolicies

This is the only field in the event that is Tracee-specific and not meaningful by itself. For example, if I'm streaming all events to Splunk and someone else at another time sees this event there, all the other fields would make sense since they describe what happened, but to understand this field they would need to understand Tracee and how it was configured and started.
I won't turn this into a debate about the usefulness of this field, but at the least I'd suggest prefixing it with "tracee". Or even better, if we intend to add more tracee-specific info in the future (#3153), then it's better to put it under a "tracee" level.

@josedonizetti josedonizetti modified the milestones: v0.18.0, v0.19.0 Sep 26, 2023
@josedonizetti

Moved it to the next release; the event structure is finished and merged, and what remains is integrating it into tracee's internals.


mcherny commented Nov 22, 2023

A few comments:

  1. Threat is static information, and hence should be available outside of the event (including in the grpc interface)
  2. The event header should also include the "text ID" of the event (ART-X, TRC-X, etc.)
  3. To be able to use numeric IDs, consumers need to be able to use the id definitions without a dependency on the tracee ecosystem
    3.1 The id definitions must be managed in a backward-compatible manner (i.e. only additions, no edits, no deletions)


mcherny commented Nov 22, 2023

Data we need that is not currently in the context:

  • under the process section, the full process command line is missing
  • under the k8s pod section, deployment and type are missing


NDStrahilevitz commented Nov 22, 2023

  • under process section missing full process commandline

This is available in the sched_process_exec event and (I think) in the process tree and its data source.
This isn't to claim we shouldn't put it in the event context, rather that it is already available through alternative means.

  • under k8s pod section missing deployment and type

@itaysk we should be able to add these through container labels like other kubernetes data we already get in container enrichment. I can open an issue for this if agreed.

@rafaeldtinoco

This is available in the sched_process_exec event and (I think) in the process tree and its data source.

When process tree was being created there was a specific discussion about COMM versus BINARY PATH and where the info should go (https://github.com/aquasecurity/tracee/pull/3364/files#diff-773e2917cb050cc42ce31d36b08db6c9e3da89ab6dff75f8a9b3eba5171316d3R12).

Alon and I agreed that COMM would be part of the TaskInfo structure (for the Process or Thread), and the Binary Path would be part of the File Info structure (for the Binary and the Interpreter of the Binary).

The COMM (process name for the process tree) does not pick up the args (it's basically the procfs comm field); those would come together with the argv array (which is an argument of the exec event).

We can introduce COMM + ARGS in the process tree if needed (or something else), no problem.

@josedonizetti

Hey @mcherny, thank you for the questions:

Few comments:

  1. Threat is static information, and hence should be available outside of event (including grpc interface)

Do you have an example of how this would be used? Because the Threat information is part of specific events (behaviour events), I'm considering that it should be part of the event definition as optional, and not have its own API. Thoughts?

  1. The event header should also include the "text ID" of event (ART-X, TRC-X etc.)

We didn't add those because they are internal to Aqua; our plan was actually to remove them altogether for the open source signatures, and let the internal project handle the translation between those ids and tracee ids.

  1. To be able to use numeric IDs, consumers need to be able to use the id definitions without a dependency on the tracee ecosystem
    3.1 The id definitions must be managed in a backward-compatible manner (i.e. only additions, no edits, no deletions)

Agree! Right now the ids are stable for base events but dynamic for signatures; I need to look into how we can make them always stable.
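One possible shape for stable ids, sketched below (all event names and numbers are hypothetical): an append-only table in which base events and signatures share a single integer space, which a consumer could vendor without depending on the tracee ecosystem:

```go
package main

import "fmt"

// EventID is a stable, machine-readable event identifier.
// The table below is append-only: new events get new ids at the end,
// and existing entries are never edited or deleted, so consumers can
// pin a copy of this table without tracking every tracee release.
type EventID int32

const (
	// hypothetical base events
	SchedProcessExec EventID = 1
	SecurityFileOpen EventID = 2
	// hypothetical signature events continue the same id space
	AntiDebugging EventID = 6001
)

var names = map[EventID]string{
	SchedProcessExec: "sched_process_exec",
	SecurityFileOpen: "security_file_open",
	AntiDebugging:    "anti_debugging",
}

// Name resolves an id; unknown ids (e.g. from a newer tracee) are
// reported rather than dropped, keeping old consumers forward-safe.
func Name(id EventID) string {
	if n, ok := names[id]; ok {
		return n
	}
	return fmt.Sprintf("unknown_event_%d", id)
}

func main() {
	fmt.Println(Name(SchedProcessExec)) // sched_process_exec
	fmt.Println(Name(9999))             // unknown_event_9999
}
```

The same append-only rule would apply to a generated proto enum if the definition moves to a proto file, where field and enum numbers are likewise never reused.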


mcherny commented Nov 23, 2023

  • under process section missing full process commandline

This is available in the sched_process_exec event and (I think) in the process tree and its data source. This isn't to claim we shouldn't put it in the event context, rather that it is already available through alternative means.

Assuming we need this on each event where it may be relevant (e.g. file open), how would I consume it by alternative means if I need it outside of the tracee process, that is, at a gRPC client?


NDStrahilevitz commented Nov 23, 2023

Assuming we need this on each event where it may be relevant (e.g. file open), how would I consume it by alternative means if I need it outside of the tracee process, that is, at a gRPC client?

In general, if someone needs info on each event of its kind for a particular use case, a signature/derived event is probably called for.


yanivagman commented Nov 24, 2023

Few comments:

  1. Threat is static information, and hence should be available outside of event (including grpc interface)

That's why we wrote:

threat (if relevant will appear here) - static data about threats (can be omitted)

There is an advantage to including this information in the OSS project when a threat is detected (not so frequent), so the user can get information about the threat without consulting the documentation

  1. The event header should also include the "text ID" of event (ART-X, TRC-X etc.)

Following on from the first point, this is also static data, and can be added to the threat section by an internal mapping if required by some project

  1. To be able to use numeric IDs, consumers need to be able to use the id definitions without a dependency on the tracee ecosystem
    3.1 The id definitions must be managed in a backward-compatible manner (i.e. only additions, no edits, no deletions)

Agree. We have an old issue opened for that #1098

@yanivagman

  • under process section missing full process commandline

This is available in the sched_process_exec event and (I think) in the process tree and its data source. This isn't to claim we shouldn't put it in the event context, rather that it is already available through alternative means.

Assuming we need this on each event where it may be relevant (e.g. file open), how would I consume it by alternative means if I need it outside of the tracee process, that is, at a gRPC client?

Sounds like a reasonable addition to tracee, and it should be optional. We should discuss such additions in a separate issue and keep this one for an event structure that supports tracee's already-existing features


NDStrahilevitz commented May 2, 2024

I've been doing some work in the capture area. Considering we want to merge it into the "everything is an event" scheme at some point, shouldn't we also include an artifact field in the structure (I recall @yanivagman making this sort of suggestion some time ago)? I assume that otherwise, each capture will be its own event.

@yanivagman

I've been doing some work in the capture area. Considering we want to merge it into the "everything is an event" scheme at some point, shouldn't we also include an artifact field in the structure (I recall @yanivagman making this sort of suggestion some time ago)? I assume that otherwise, each capture will be its own event.

Eventually, capture will be an action taken by some event. In that case, the data about it will be part of policies->actions.


7 participants