Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent map iteration and write in finding #3748

Closed
NDStrahilevitz opened this issue Dec 7, 2023 · 6 comments · Fixed by #3779
Closed

Concurrent map iteration and write in finding #3748

NDStrahilevitz opened this issue Dec 7, 2023 · 6 comments · Fixed by #3779
Assignees
Labels
Milestone

Comments

@NDStrahilevitz
Copy link
Collaborator

NDStrahilevitz commented Dec 7, 2023

Description

No reproducer yet.

Concurrent iteration happens at this code section:
image
On the range.
Not clear from the logs where the concurrent write is, and there doesn't seem to be a write in this function, nor any other goroutine accessing a finding at the same time.

Output of tracee version:

tracee commit a57c534

Output of uname -a:

(paste your output here)

Additional details

relevant section of the map iteration from the log (no write section was found for some reason):

fatal error: concurrent map iteration and map write

goroutine 79 [running]:
github.com/aquasecurity/tracee/pkg/ebpf.getArguments({0xc00397edb0, {{{{...}, {...}, {...}}, 0x0}, {0x2ccafc0, 0xc0038645a0}}, {{0x7f7bb10d4380, 0x7}, {0x7f7bb10d200c, ...}, ...}}, ...)
	/tracee/pkg/ebpf/finding.go:65 +0xc5
github.com/aquasecurity/tracee/pkg/ebpf.newEvent(_, {0xc00397edb0, {{{{...}, {...}, {...}}, 0x0}, {0x2ccafc0, 0xc0038645a0}}, {{0x7f7bb10d4380, 0x7}, ...}}, ...)
	/tracee/pkg/ebpf/finding.go:28 +0x7b
github.com/aquasecurity/tracee/pkg/ebpf.FindingToEvent({0xc00397edb0, {{{{...}, {...}, {...}}, 0x0}, {0x2ccafc0, 0xc0038645a0}}, {{0x7f7bb10d4380, 0x7}, {0x7f7bb10d200c, ...}, ...}})
	/tracee/pkg/ebpf/finding.go:24 +0x13e
github.com/aquasecurity/tracee/pkg/ebpf.(*Tracee).engineEvents.func2()
	/tracee/pkg/ebpf/signature_engine.go:100 +0x17a
created by github.com/aquasecurity/tracee/pkg/ebpf.(*Tracee).engineEvents
	/tracee/pkg/ebpf/signature_engine.go:92 +0x7d8
@NDStrahilevitz NDStrahilevitz added this to the v0.20.0 milestone Dec 7, 2023
@NDStrahilevitz NDStrahilevitz self-assigned this Dec 7, 2023
@AlonZivony
Copy link
Collaborator

AlonZivony commented Dec 10, 2023

I am not sure if this is is connected, as the place of the crush is different.
However, this is suspicious that the same error happened in two different places at the same timeframe.
I add the crush log I got from using Tracee-rules binary (I only put here the first signatures goroutines stack trace, but there are many more in the original log file):

fatal error: concurrent map iteration and map write

goroutine 9 [running]:
reflect.mapiterinit(0x84cef2?, 0x1c0?, 0x1284320?)
	/usr/local/go/src/runtime/map.go:1375 +0x19
reflect.(*MapIter).Next(0x1214880?)
	/usr/local/go/src/reflect/value.go:1841 +0x65
encoding/json.mapEncoder.encode({0x200?}, 0xc000209580, {0x1214880?, 0xc000119c80?, 0x1214880?}, {0x10?, 0x2d?})
	/usr/local/go/src/encoding/json/encode.go:798 +0x33e
encoding/json.(*encodeState).reflectValue(0xc0016a2d48?, {0x1214880?, 0xc000119c80?, 0x80643f?}, {0xa0?, 0x9b?})
	/usr/local/go/src/encoding/json/encode.go:359 +0x78
encoding/json.(*encodeState).marshal(0x0?, {0x1214880?, 0xc000119c80?}, {0x67?, 0x2c?})
	/usr/local/go/src/encoding/json/encode.go:331 +0xfa
encoding/json.Marshal({0x1214880, 0xc000119c80})
	/usr/local/go/src/encoding/json/encode.go:160 +0x45
github.com/Masterminds/sprig/v3.toJson({0x1214880?, 0xc000119c80?})
	/root/go/pkg/mod/github.com/!masterminds/sprig/v3@v3.2.3/defaults.go:109 +0x25
reflect.Value.call({0x11cffe0?, 0x1393360?, 0x11f3ac0?}, {0x1327c70, 0x4}, {0xc002482ab0, 0x1, 0x1284860?})
	/usr/local/go/src/reflect/value.go:584 +0x8c5
reflect.Value.Call({0x11cffe0?, 0x1393360?, 0xc002911380?}, {0xc002482ab0?, 0x14c2d80?, 0x7fa4403ea390?})
	/usr/local/go/src/reflect/value.go:368 +0xbc
text/template.safeCall({0x11cffe0?, 0x1393360?, 0xc002911380?}, {0xc002482ab0?, 0x14ccb60?, 0x11f3ac0?})
	/usr/local/go/src/text/template/funcs.go:368 +0x9d
text/template.(*state).evalCall(0xc0016a3bc0, {0x1284860?, 0xc002911380?, 0x1?}, {0x11cffe0?, 0x1393360?, 0x335?}, 0x0, {0x14c2c60, 0xc000307bc0}, ...)
	/usr/local/go/src/text/template/exec.go:781 +0x905
text/template.(*state).evalFunction(0xc0016a3bc0, {0x1284860?, 0xc002911380?, 0xc0001d4a00?}, 0x0?, {0x14c2c60, 0xc000307bc0}, {0xc000044920, 0x2, 0x2}, ...)
	/usr/local/go/src/text/template/exec.go:601 +0x209
text/template.(*state).evalCommand(0xc0016a3bc0, {0x1284860?, 0xc002911380?, 0x8?}, 0x0?, {0x121d640?, 0x1bce6c0?, 0xc55e85?})
	/usr/local/go/src/text/template/exec.go:488 +0xfd
text/template.(*state).evalPipeline(0xc0016a3bc0, {0x1284860?, 0xc002911380?, 0x11affc0?}, 0xc0001128a0)
	/usr/local/go/src/text/template/exec.go:457 +0x145
text/template.(*state).walk(0xc0016a3bc0, {0x1284860?, 0xc002911380?, 0x0?}, {0x14c2b40?, 0xc000307c50})
	/usr/local/go/src/text/template/exec.go:261 +0x116
text/template.(*state).walk(0xc0016a3bc0, {0x1284860?, 0xc002911380?, 0xc000151b60?}, {0x14c2e58?, 0xc000307b60?})
	/usr/local/go/src/text/template/exec.go:274 +0x412
text/template.(*Template).execute(0xc000037680, {0x14b8ea0?, 0xc0001a2008?}, {0x1284860?, 0xc002911380?})
	/usr/local/go/src/text/template/exec.go:218 +0x2c5
text/template.(*Template).Execute(...)
	/usr/local/go/src/text/template/exec.go:201
main.setupOutput.func1({0x14b8ea0, 0xc0001a2008}, 0xc35a46?, 0xc0003a8b40?)
	/tmp/tracee-master/cmd/tracee-rules/output.go:62 +0x1ae
created by main.setupOutput
	/tmp/tracee-master/cmd/tracee-rules/output.go:58 +0x2ca

goroutine 1 [select]:
github.com/aquasecurity/tracee/pkg/signatures/engine.(*Engine).consumeSources(0xc001022000, {0x14c0018, 0xc000036e80})
	/tmp/tracee-master/pkg/signatures/engine/engine.go:205 +0xe9
github.com/aquasecurity/tracee/pkg/signatures/engine.(*Engine).Start(0xc001022000, {0x14c0018, 0xc000036e80})
	/tmp/tracee-master/pkg/signatures/engine/engine.go:121 +0x205
main.main.func1(0xc000036d80)
	/tmp/tracee-master/cmd/tracee-rules/main.go:175 +0xbdb
github.com/urfave/cli/v2.(*App).RunContext(0xc000007d40, {0x14bf678?, 0xc0001aa158}, {0xc0001a6000, 0x4, 0x4})
	/root/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:322 +0x97c
github.com/urfave/cli/v2.(*App).Run(...)
	/root/go/pkg/mod/github.com/urfave/cli/v2@v2.3.0/app.go:224
main.main()
	/tmp/tracee-master/cmd/tracee-rules/main.go:266 +0x97a

goroutine 6 [runnable]:
reflect.Value.SetString({0x11affc0?, 0xc002af17a8?, 0x198?}, {0xc00208a0c0, 0x17})
	/usr/local/go/src/reflect/value.go:2360 +0x113
encoding/json.(*decodeState).literalStore(0xc00246c090, {0xc000f56e92, 0x19, 0x116e}, {0x11affc0?, 0xc002af17a8?, 0x2?}, 0x0)
	/usr/local/go/src/encoding/json/decode.go:967 +0xa5b
encoding/json.(*decodeState).value(0xc00246c090, {0x11affc0?, 0xc002af17a8?, 0x9?})
	/usr/local/go/src/encoding/json/decode.go:388 +0x126
encoding/json.(*decodeState).object(0xc00246c090, {0x12217c0?, 0xc002af1680?, 0x0?})
	/usr/local/go/src/encoding/json/decode.go:775 +0xd45
encoding/json.(*decodeState).value(0xc00246c090, {0x12217c0?, 0xc002af1680?, 0x90?})
	/usr/local/go/src/encoding/json/decode.go:374 +0x45
encoding/json.(*decodeState).unmarshal(0xc00246c090, {0x12217c0?, 0xc002af1680?})
	/usr/local/go/src/encoding/json/decode.go:181 +0x1de
encoding/json.Unmarshal({0xc000f56cc9, 0x449, 0x1337}, {0x12217c0, 0xc002af1680})
	/usr/local/go/src/encoding/json/decode.go:108 +0x125
main.setupTraceeJSONInputSource.func1()
	/tmp/tracee-master/cmd/tracee-rules/input.go:119 +0xae
created by main.setupTraceeJSONInputSource
	/tmp/tracee-master/cmd/tracee-rules/input.go:115 +0xf8

goroutine 11 [syscall]:
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x2f
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/local/go/src/os/signal/signal.go:151 +0x2a

goroutine 38 [select]:
os/signal.NotifyContext.func1()
	/usr/local/go/src/os/signal/signal.go:288 +0x70
created by os/signal.NotifyContext
	/usr/local/go/src/os/signal/signal.go:287 +0x16b

goroutine 39 [chan receive]:
github.com/aquasecurity/tracee/pkg/signatures/engine.signatureStart({0x7fa43ad82238, 0x7fa43afbb1d8}, 0x0?, 0x0?)
	/tmp/tracee-master/pkg/signatures/engine/engine.go:79 +0x90
created by github.com/aquasecurity/tracee/pkg/signatures/engine.(*Engine).Start
	/tmp/tracee-master/pkg/signatures/engine/engine.go:118 +0xec

goroutine 40 [chan receive]:
github.com/aquasecurity/tracee/pkg/signatures/engine.signatureStart({0x7fa43ad81308, 0x7fa43afbc860}, 0x0?, 0x0?)
	/tmp/tracee-master/pkg/signatures/engine/engine.go:79 +0x90
created by github.com/aquasecurity/tracee/pkg/signatures/engine.(*Engine).Start
	/tmp/tracee-master/pkg/signatures/engine/engine.go:118 +0xec

@NDStrahilevitz
Copy link
Collaborator Author

I've seen yours before, and I think it's actually internal to golang's template generation.

@denisgersh

This comment was marked as off-topic.

@rafaeldtinoco rafaeldtinoco self-assigned this Dec 18, 2023
@rafaeldtinoco
Copy link
Contributor

rafaeldtinoco commented Dec 19, 2023

It is even worse, because we do FindingToEvent, and that iterates
Finding.Data, and it gets its length, etc.

Ok,

I think I know what is going on @NDStrahilevitz, @AlonZivony and @denisgersh.

The struct:

type Finding struct {
  Data        map[string]interface{}
  Event       protocol.Event // Event is the causal event of the Finding
  SigMetadata SignatureMetadata
}

Has Data, which is a map, right ? All the signatures return a Data map, through the callback, once they are triggered (to signalize a finding containing some data).

As you already probably figured out, maps aren't thread safe (read or write), and we have couple (maybe even 3 threads) touching the object that contains those returned maps.

  1. The output channel from the engine (once its initialized). This is where the the code from reflect package runs (as we can see in the stack traces). This thread does walk the Finding->Data map, looking for types to match needed templates, and so.

  2. We have couple, maybe 3, threads from the regular engineEvents pipeline stage (enqueuing and dequeueing). Right out of engine there is a copy from Finding to a new Event, that copy does iterate over the Data map.

  3. And we have the signatures ("our" internal sigs helper have a logic to aggregate data into Finding->Data before returning the map from the engine, for example).

The way I see, this is how to proceed here:

  • Forcing Data not to be a reference is the most appropriate one IMO. IF Data is a reference, there will always be the possibility of races when pipeline accesses the Data map (no synchronization between multiple parts).

  • I think this does not happen very often because of the read only nature of all threads accessing the map (though reflect might do something that might cause the race orelse we wouldn't have 2 stack traces when inside there).

@NDStrahilevitz
Copy link
Collaborator Author

@rafaeldtinoco Thanks for the analysis.

I believe what precisely is going wrong is due to how we "return" (emphasis on the quotes) findings.
Recall that signatures only return errors. Findings are pushed through the signature callback. This means that custom signatures may yet modify a finding object (and thus it's Data map) post sending it, especially if there is some shared data map (a signature sending multiple findings).

The solutions suggested are still relevant, but I would like to propose another one:
Signatures should explicitly return findings (multiple or one). This would increase similarity to the derived events mechanism, making the future combination of the two easier, and ensure that a finding cannot be modified both in the tracee pipeline and in a signature concurrently (since it is guaranteed to be returned and out of scope of the signature).

@NDStrahilevitz
Copy link
Collaborator Author

For now, guarding the Data field through accessors is likely the easiest way to guarantee safety, without debugging who introduced the bug (since as I explained, custom signatures may be the cause here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants