feature: writeable data source #3725

1 change: 1 addition & 0 deletions .github/workflows/pr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ env:
CONTAINERS_DATA_SOURCE
PROCTREE_DATA_SOURCE
DNS_DATA_SOURCE
WRITABLE_DATA_SOURCE
jobs:
#
# DOC VERIFICATION
Expand Down
2 changes: 2 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -682,6 +682,8 @@ E2E_INST_SRC := $(shell find $(E2E_INST_DIR) \
-type f \
-name '*.go' \
! -name '*_test.go' \
! -path '$(E2E_INST_DIR)/scripts/*' \
! -path '$(E2E_INST_DIR)/datasourcetest/*' \
)

.PHONY: e2e-inst-signatures
Expand Down
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
# Containers Data Source

The [container enrichment](../../install/container-engines.md) feature gives Tracee the ability to extract details about active containers and link this information to the events it captures.
The [container enrichment](../../../install/container-engines.md) feature gives Tracee the ability to extract details about active containers and link this information to the events it captures.

The [data source](./overview.md) feature makes the information gathered from active containers accessible to signatures. When an event is captured and triggers a signature, that signature can retrieve information about the container using its container ID, which is bundled with the event being analyzed by the signature.
The [data source](../overview.md) feature makes the information gathered from active containers accessible to signatures. When an event is captured and triggers a signature, that signature can retrieve information about the container using its container ID, which is bundled with the event being analyzed by the signature.

## Internal Data Organization

From the [data-sources documentation](./overview.md), you'll see that searches use keys. It's a bit like looking up information with a specific tag (or a key=value storage).
From the [data-sources documentation](../overview.md), you'll see that searches use keys. It's a bit like looking up information with a specific tag (or a key=value storage).

The `containers data source` operates straightforwardly. Using `string` keys, which represent the container IDs, you can fetch `map[string]string` values as shown below:

Expand All @@ -26,7 +26,7 @@ From the structure above, using the container ID lets you access details like th

## Using the Containers Data Source

> Make sure to read [Golang Signatures](../../events/custom/golang.md) first.
> Make sure to read [Golang Signatures](../../../events/custom/golang.md) first.

### Signature Initialization

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ To switch on the `DNS Cache` feature, run the command:
sudo tracee --output option:sort-events --output json --output option:parse-arguments --dnscache enable --events <event_type>
```

The underlying structure is populated using the core [net_packet_dns](../../events/builtin/network/net_packet_dns.md) event and its payload.
The underlying structure is populated using the core [net_packet_dns](../../../events/builtin/network/net_packet_dns.md) event and its payload.

## Command Line Option

Expand All @@ -32,7 +32,7 @@ Consider for your usecase, how many query trees would you like to store? If you

## Internal Data Organization

From the [data-sources documentation](./overview.md), you'll see that searches use keys. It's a bit like looking up information with a specific tag (or a key=value storage).
From the [data-sources documentation](../overview.md), you'll see that searches use keys. It's a bit like looking up information with a specific tag (or a key=value storage).

The `dns data source` operates straightforwardly. Using `string` keys, which represent some network address (a domain or IP), you can fetch `map[string]string` values as shown below:

Expand All @@ -48,7 +48,7 @@ Any address found in the cache, and other related addresses, will be returned in

## Using the Containers Data Source

> Make sure to read [Golang Signatures](../../events/custom/golang.md) first.
> Make sure to read [Golang Signatures](../../../events/custom/golang.md) first.

### Signature Initialization

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,11 +79,11 @@ This enhancement aims to offer developers and sysadmins a more detailed and gran

## Using the Process Tree

The process tree is only available internally, to tracee's components, but, through the [datasource](./overview.md) mechanism, signatures are able to query the tree data using the data source process tree API.
The process tree is only available internally, to tracee's components, but, through the [datasource](../overview.md) mechanism, signatures are able to query the tree data using the data source process tree API.

### Accessing the Process Tree Data Source

> Make sure to read [Golang Signatures](../../events/custom/golang.md) first.
> Make sure to read [Golang Signatures](../../../events/custom/golang.md) first.

During the signature initialization, get the process tree data source instance:

Expand Down Expand Up @@ -197,7 +197,7 @@ func (sig *e2eProcessTreeDataSource) checkProcess(eventObj *trace.Event) error {
}
```

From the [data-sources documentation](./overview.md), you'll see that searches use keys. It's a bit like looking up information with a specific tag (or a key=value storage).
From the [data-sources documentation](../overview.md), you'll see that searches use keys. It's a bit like looking up information with a specific tag (or a key=value storage).

In the provided example, the `eventObj.ProcessEntityId` key (which is the process hash accompanying the event being handled) is utilized alongside the `datasource.ProcKey{}` argument to search for a process in the process tree. The resulting process is the one associated with the event under consideration.

Expand Down
51 changes: 51 additions & 0 deletions docs/docs/advanced/data-sources/custom.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# Custom data sources

Custom data sources are currently supported through the plugin mechanism.

!!! Attention
Eventually you will find that Golang plugins aren't very useful once
you consider all the problems that emerge from using them:

1. **Can't use different go versions** (need to compile the go plugin
with the exact same version that was used to build Tracee).

2. Both Tracee and your golang plugin data source must be built with the
**exact same GOPATH** or you will get a "plugin was built with a
different version of package XXX" error.

3. Any **dependency** in your plugin must be the **same version** as the
corresponding dependency of Tracee.

4. Compiling tracee statically is sometimes useful to have a **complete
portable eBPF tracing/detection solution**. A good example is having a
single statically compiled binary capable of running on both GLIBC
(most distros) and MUSL (Alpine) powered Linux distros.

In the end, a golang data source plugin won't deliver the practical
benefits a plugin mechanism should have, so **FOR NOW** it is preferred to
have built-in data sources (re)distributed with newer binaries (when you
need to add/remove data sources from your environment).

There are two main reasons to write your own data source:

1. To provide a stable "tracee-native" querying API for some externally owned data you need in a signature (for example, database access).
1. To provide an externally writable and internally readable data source for use in a signature (for example, configuration).

An example for an implementation of the latter is given [here](./write.md).

# Integrating into a plugin

Since data sources are usually supplied alongside a relevant signature, providing them is as easy
as exporting another symbol from the plugin.

Simply add the following symbol in your plugin entrypoint:
```golang
var ExportedDataSources = []detect.DataSource{
...
mydatasource.New(someDependency),
}
```

The data source will then be available to signatures through the namespace and ID specified
in your code.
11 changes: 7 additions & 4 deletions docs/docs/advanced/data-sources/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,14 +18,17 @@ container lifecycle events.

## What data sources can I use

For now, only the built-in data sources from Tracee are at your disposal.
Looking ahead, there are plans to enable integration of data sources into Tracee
either as plugins or extensions.
Tracee offers three built-in data sources out of the box.
There is also support for plugging in external data sources through the golang
plugin mechanism, similar to how signatures are currently supplied (see [here](../../events/custom/golang.md)).
However, there are known technical limitations to this approach, and the aim is to replace it
in the future.

Currently, two primary data source exist:
Currently, the following data sources are provided out of the box:

1. Containers: Provides metadata about containers given a container id.
1. Process Tree: Provides access to a tree of ever existing processes and threads.
1. DNS Cache: Provides access to related DNS queries of a given address (IP or domain).

This list will be expanded as other features are developed.

Expand Down
172 changes: 172 additions & 0 deletions docs/docs/advanced/data-sources/write.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
# Writable Data Sources

Since v0.20.0 tracee includes a new `DataSourceService` in its gRPC server. This service includes the ability
to write generic data into a specified data source, both through streaming and unary methods.
However, in order to utilize this feature, a specialized `WritableDataSource` must be specified in the RPC arguments.
These data sources are currently only available through custom data sources, meaning that no built-in data source supports this feature.

## How to use

### Implementing a writable data source
Let us implement an example data source which will give us a configurable threshold for reporting some finding.

Start by adding a file `threshold_datasource.go`:
```golang
package datasourcetest

import (
	"encoding/json"

	"github.com/aquasecurity/tracee/types/detect"
)

// thresholdDataSource holds a single configurable threshold value.
type thresholdDataSource struct {
	threshold int
}

// Get returns the current threshold; "threshold" is the only supported key.
func (ctx *thresholdDataSource) Get(key interface{}) (map[string]interface{}, error) {
	keyVal, ok := key.(string)
	if !ok {
		return nil, detect.ErrKeyNotSupported
	}

	if keyVal != "threshold" {
		return nil, detect.ErrKeyNotSupported
	}

	return map[string]interface{}{
		"threshold": ctx.threshold,
	}, nil
}

func (ctx *thresholdDataSource) Version() uint {
	return 1
}

func (ctx *thresholdDataSource) Keys() []string {
	return []string{"string:\"threshold\""}
}

// Schema describes the returned map as a JSON string.
func (ctx *thresholdDataSource) Schema() string {
	schema := map[string]interface{}{
		"threshold": "int",
	}

	s, _ := json.Marshal(schema)
	return string(s)
}

func (ctx *thresholdDataSource) Namespace() string {
	return "my_namespace"
}

func (ctx *thresholdDataSource) ID() string {
	return "threshold_datasource"
}

// Write stores the value found under the "threshold" key.
func (ctx *thresholdDataSource) Write(data map[interface{}]interface{}) error {
	threshold, ok := data["threshold"]
	if !ok {
		return detect.ErrFailedToUnmarshal
	}

	// Currently we pass the gRPC values directly, so numbers are sent as float64
	thresholdFloat, ok := threshold.(float64)
	if !ok {
		return detect.ErrFailedToUnmarshal
	}

	ctx.threshold = int(thresholdFloat)
	return nil
}

func (ctx *thresholdDataSource) Values() []string {
	return []string{"string"}
}
```
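
The example above keeps `threshold` in a plain field. If `Write` can be called from a gRPC handler goroutine while a signature calls `Get` (an assumption about the runtime, not something this page states), the field should probably be guarded. The following is a minimal, standard-library-only sketch of that idea. `safeThreshold` and the two local error values are hypothetical stand-ins for `thresholdDataSource`, `detect.ErrKeyNotSupported` and `detect.ErrFailedToUnmarshal`:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Local stand-ins for detect.ErrKeyNotSupported and
// detect.ErrFailedToUnmarshal, so the sketch builds without tracee.
var (
	errKeyNotSupported   = errors.New("key not supported")
	errFailedToUnmarshal = errors.New("failed to unmarshal")
)

// safeThreshold is a hypothetical variant of thresholdDataSource whose
// value is protected by a read/write mutex.
type safeThreshold struct {
	mu        sync.RWMutex
	threshold int
}

// Get takes the read lock, so concurrent readers do not block each other.
func (s *safeThreshold) Get(key interface{}) (map[string]interface{}, error) {
	if k, ok := key.(string); !ok || k != "threshold" {
		return nil, errKeyNotSupported
	}
	s.mu.RLock()
	defer s.mu.RUnlock()
	return map[string]interface{}{"threshold": s.threshold}, nil
}

// Write takes the write lock while replacing the value.
func (s *safeThreshold) Write(data map[interface{}]interface{}) error {
	v, ok := data["threshold"].(float64)
	if !ok {
		return errFailedToUnmarshal
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.threshold = int(v)
	return nil
}

func main() {
	ds := &safeThreshold{}
	var wg sync.WaitGroup
	// Several concurrent writers, as might happen with parallel RPCs.
	for i := 1; i <= 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			_ = ds.Write(map[interface{}]interface{}{"threshold": float64(n)})
		}(i)
	}
	wg.Wait()
	got, err := ds.Get("threshold")
	fmt.Println(got["threshold"], err)
}
```

A `sync.RWMutex` fits here because reads are expected to be far more frequent than writes; for a single integer, `sync/atomic` would be an equally reasonable choice.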

!!! Note
Unpacking values from the given data dictionary has one quirk.
Currently only the gRPC API is available for writing to data sources, and it uses the struct.proto package for passing generic values.
There is currently no abstraction layer over it, which is why we unpacked the threshold value as float64 in the example, despite wanting
it as an int in the end.

### Using in a signature
Now we can use this data source just like we would any other in a signature through the following code:
```golang
func (sig *mySig) Init(ctx detect.SignatureContext) error {
	...
	thresholdDataSource, ok := ctx.GetDataSource("my_namespace", "threshold_datasource")
	if !ok {
		return fmt.Errorf("threshold data source not registered")
	}
	// Refuse to run against a newer, possibly incompatible, data source version
	if thresholdDataSource.Version() > 1 {
		return fmt.Errorf("threshold data source version not supported, please update this signature")
	}
	sig.thresholdData = thresholdDataSource
	return nil
}
```
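
Once the data source is stored on the signature, the threshold still has to be read back when an event is handled. The sketch below shows one way this could look. To stay runnable without tracee it uses a local `getter` interface that mirrors only the `Get` method used on this page, and `fixedThreshold` and `exceedsThreshold` are hypothetical names introduced for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// getter is a local stand-in exposing only the Get method that the
// data source in this page implements.
type getter interface {
	Get(key interface{}) (map[string]interface{}, error)
}

// fixedThreshold is a hypothetical in-memory data source for demonstration.
type fixedThreshold struct{ value int }

func (f fixedThreshold) Get(key interface{}) (map[string]interface{}, error) {
	if key != "threshold" {
		return nil, errors.New("key not supported")
	}
	return map[string]interface{}{"threshold": f.value}, nil
}

// exceedsThreshold queries the data source with the "threshold" key and
// reports whether count is above the configured value. The result is a
// generic map, so the value needs a checked type assertion.
func exceedsThreshold(ds getter, count int) (bool, error) {
	res, err := ds.Get("threshold")
	if err != nil {
		return false, err
	}
	threshold, ok := res["threshold"].(int)
	if !ok {
		return false, fmt.Errorf("unexpected threshold type %T", res["threshold"])
	}
	return count > threshold, nil
}

func main() {
	ds := fixedThreshold{value: 3}
	fmt.Println(exceedsThreshold(ds, 5))
	fmt.Println(exceedsThreshold(ds, 2))
}
```

In a real signature the same query would go through `sig.thresholdData`, typically from the event handler, so that each event is compared against the most recently written value.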

### Writing to the data source
The following is a short example of a Go program that implements a client for our threshold data source. Note that this is a minimal outline, and you should modify it based on your specific use case:
```golang
package main

import (
	"context"
	"flag"
	"fmt"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/protobuf/types/known/structpb"

	"github.com/aquasecurity/tracee/api/v1beta1"
)

// printAndExit prints a formatted message and exits with a non-zero status.
func printAndExit(msg string, args ...any) {
	fmt.Printf(msg, args...)
	os.Exit(1)
}

func main() {
	traceeAddressPtr := flag.String("address", "", "address of the tracee gRPC server")
	thresholdPtr := flag.Int("threshold", 0, "threshold to set in the data source")
	flag.Parse()

	traceeAddress := *traceeAddressPtr
	threshold := *thresholdPtr

	if traceeAddress == "" {
		printAndExit("empty address given\n")
	}
	if threshold == 0 {
		printAndExit("empty threshold given\n")
	}
	if threshold < 0 {
		printAndExit("negative threshold given\n")
	}

	conn, err := grpc.Dial(
		traceeAddress,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		printAndExit("failed to dial tracee grpc server: %v\n", err)
	}
	defer conn.Close()

	client := v1beta1.NewDataSourceServiceClient(conn)
	// Id and Namespace must match the values returned by the data source's
	// ID() and Namespace() methods.
	_, err = client.Write(context.Background(), &v1beta1.WriteDataSourceRequest{
		Id:        "threshold_datasource",
		Namespace: "my_namespace",
		Key:       structpb.NewStringValue("threshold"),
		Value:     structpb.NewNumberValue(float64(threshold)),
	})

	if err != nil {
		printAndExit("failed to write to data source: %v\n", err)
	}
}
```

With all these steps completed, you are ready to implement and use your own writable data source!
4 changes: 2 additions & 2 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@ require (
github.com/Masterminds/sprig/v3 v3.2.3
github.com/aquasecurity/libbpfgo v0.6.0-libbpf-1.3
github.com/aquasecurity/libbpfgo/helpers v0.4.6-0.20231123142329-37c4b843a539
github.com/aquasecurity/tracee/api v0.0.0-20231013014739-b32a168ee6a8
github.com/aquasecurity/tracee/types v0.0.0-20231128135314-cfe4d6426ccc
github.com/aquasecurity/tracee/api v0.0.0-20231213190735-f6f40e03b772
github.com/aquasecurity/tracee/types v0.0.0-20231219022131-aa8b62c87118
github.com/containerd/containerd v1.7.0
github.com/docker/docker v24.0.7+incompatible
github.com/golang/protobuf v1.5.3
Expand Down
8 changes: 4 additions & 4 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -67,10 +67,10 @@ github.com/aquasecurity/libbpfgo v0.6.0-libbpf-1.3 h1:mhDe1mAZR80LjnsCnteS+R2/Ee
github.com/aquasecurity/libbpfgo v0.6.0-libbpf-1.3/go.mod h1:0rEApF1YBHGuZ4C8OYI9q5oDBVpgqtRqYATePl9mCDk=
github.com/aquasecurity/libbpfgo/helpers v0.4.6-0.20231123142329-37c4b843a539 h1:axIHZ3la2/wcqMYO9TUyKO/lMGYizEKyNIodbwQBOkE=
github.com/aquasecurity/libbpfgo/helpers v0.4.6-0.20231123142329-37c4b843a539/go.mod h1:1fGKke5pgH4xYvZ7HqDbLSi/R5zfRFH2K+c9kLp9L34=
github.com/aquasecurity/tracee/api v0.0.0-20231013014739-b32a168ee6a8 h1:NGzPDvQofEG04CoPZjSSRoFMxnSd3Brh39BY1dmdyZM=
github.com/aquasecurity/tracee/api v0.0.0-20231013014739-b32a168ee6a8/go.mod h1:l1W65+m4KGg2i61fiPaQ/o4OQCrNtNnkPTEdysF5Zpw=
github.com/aquasecurity/tracee/types v0.0.0-20231128135314-cfe4d6426ccc h1:T3yH0mYENclyBdxwbof0+5hVk7bFFB+aaPKESqS1Zg4=
github.com/aquasecurity/tracee/types v0.0.0-20231128135314-cfe4d6426ccc/go.mod h1:kHvgUMXGq5QEqSLPgu4RwGSJEoCuMQJnEkGk8OAcSUc=
github.com/aquasecurity/tracee/api v0.0.0-20231213190735-f6f40e03b772 h1:xYLphnE5GLb6mYwFv6jfpxcibtaTgjWwOP0r7sTDWBk=
github.com/aquasecurity/tracee/api v0.0.0-20231213190735-f6f40e03b772/go.mod h1:QJG2PABXucOsFVO85tQsKxV4c1GUhcjww/Kw+Wv7Y/c=
github.com/aquasecurity/tracee/types v0.0.0-20231219022131-aa8b62c87118 h1:l3dliAP3LCLAc0LO4s0AZCObPCLyTpwotIHXk7Sxzkw=
github.com/aquasecurity/tracee/types v0.0.0-20231219022131-aa8b62c87118/go.mod h1:kHvgUMXGq5QEqSLPgu4RwGSJEoCuMQJnEkGk8OAcSUc=
github.com/arbovm/levenshtein v0.0.0-20160628152529-48b4e1c0c4d0 h1:jfIu9sQUG6Ig+0+Ap1h4unLjW6YQJpKZVmUzxsD4E/Q=
github.com/arbovm/levenshtein v0.0.0-20160628152529-48b4e1c0c4d0/go.mod h1:t2tdKJDJF9BV14lnkjHmOQgcvEKgtqs5a1N3LNdJhGE=
github.com/benbjohnson/clock v1.1.0/go.mod h1:J11/hYXuz8f4ySSvYwY0FKfm+ezbsZBKZxNJlLklBHA=
Expand Down
8 changes: 6 additions & 2 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -589,8 +589,12 @@ nav:
- Forensics: docs/advanced/forensics.md
- Data Sources:
- Overview: docs/advanced/data-sources/overview.md
- Containers: docs/advanced/data-sources/containers.md
- Process Tree: docs/advanced/data-sources/process-tree.md
- Custom: docs/advanced/data-sources/custom.md
- Write to a Data Source: docs/advanced/data-sources/write.md
- Builtin:
- Containers: docs/advanced/data-sources/builtin/containers.md
- Process Tree: docs/advanced/data-sources/builtin/process-tree.md
- DNS Cache: docs/advanced/data-sources/builtin/dns.md
- CLI Flags:
- scope: docs/flags/scope.1.md
- events: docs/flags/events.1.md
Expand Down
12 changes: 7 additions & 5 deletions pkg/cmd/flags/grpc.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,13 @@ func PrepareGRPCServer(listenAddr string) (*grpc.Server, error) {
return nil, errfmt.Errorf("grpc address cannot be empty")
}

// cleanup listen address if needed, for example if a panic happened
if _, err := os.Stat(addr[1]); err == nil {
err := os.Remove(addr[1])
if err != nil {
return nil, errfmt.Errorf("failed to cleanup gRPC listening address (%s): %v", addr[1], err)
// cleanup listen address if needed (unix socket), for example if a panic happened
if addr[0] == "unix" {
if _, err := os.Stat(addr[1]); err == nil {
err := os.Remove(addr[1])
if err != nil {
return nil, errfmt.Errorf("failed to cleanup gRPC listening address (%s): %v", addr[1], err)
}
}
}

Expand Down