tetragon: fix hang on error in tetragonExecute
There has been a longstanding bug where, if Tetragon encounters an error inside
tetragonExecute, the process hangs instead of exiting as expected. The goroutine stack
trace dump that the runtime prints on SIGABRT shows the problem immediately: the main
goroutine is stuck on a channel send inside observer.RemoveSensors(). Further
investigation reveals that nothing ever receives on that channel, because the sensor
manager started by InitSensorManager() is still waiting for waitChan (sensorMgWait in
tetragonExecute) to be closed, which does not happen until the base sensor has been
loaded.

To fix this, we move the deferred call to observer.RemoveSensors() so that it is only
registered after sensorMgWait has been closed, i.e. after InitSensorManager() has been
cleared to run. This patch does exactly that. Since no BPF programs are loaded before the
base sensor anyway, skipping RemoveSensors() on an earlier error should be safe.

Signed-off-by: William Findlay <will@isovalent.com>
willfindlay authored and jrfastab committed Nov 17, 2023
1 parent 017446b commit 87ccfc6
1 changed file with 3 additions and 3 deletions: cmd/tetragon/main.go
@@ -324,9 +324,6 @@ func tetragonExecute() error {
 	if err := obs.InitSensorManager(sensorMgWait); err != nil {
 		return err
 	}
-	defer func() {
-		observer.RemoveSensors(ctx)
-	}()
 
 	/* Remove any stale programs, otherwise feature set change can cause
 	 * old programs to linger resulting in undefined behavior. And because
@@ -472,6 +469,9 @@ func tetragonExecute() error {
 	close(sensorMgWait)
 	sensorMgWait = nil
 	observer.GetSensorManager().LogSensorsAndProbes(ctx)
+	defer func() {
+		observer.RemoveSensors(ctx)
+	}()
 
 	err = loadTpFromDir(ctx, option.Config.TracingPolicyDir)
 	if err != nil {
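The effect of moving the defer can be modeled with a small sketch. It is a hypothetical
reduction of tetragonExecute, not the real function: startManager, removeSensors,
execute and errEarly are illustrative names, and the 100 ms timeout in removeSensors is
demo-only (the real send would block forever). With deferEarly set, execute uses the
old ordering (defer registered before close); otherwise it uses the patched ordering.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errEarly simulates a failure returned before the base sensor is loaded.
var errEarly = errors.New("simulated early failure")

// startManager is a hypothetical stand-in for InitSensorManager: the
// returned channel is only drained after waitChan has been closed.
func startManager(waitChan <-chan struct{}) chan<- string {
	ops := make(chan string) // unbuffered
	go func() {
		<-waitChan
		for range ops {
			// a real manager would handle the request here
		}
	}()
	return ops
}

// removeSensors is a hypothetical stand-in for observer.RemoveSensors.
// The timeout only lets the demo report a hang instead of blocking forever.
func removeSensors(ops chan<- string) error {
	select {
	case ops <- "remove":
		return nil
	case <-time.After(100 * time.Millisecond):
		return errors.New("hang: RemoveSensors send never received")
	}
}

// execute models the relevant part of tetragonExecute. cleanupErr is set by
// the deferred cleanup if it runs; it stays nil if no cleanup was deferred.
func execute(deferEarly, failEarly bool) (runErr, cleanupErr error) {
	wait := make(chan struct{})
	ops := startManager(wait)
	cleanup := func() { cleanupErr = removeSensors(ops) }

	if deferEarly {
		defer cleanup() // old ordering: registered before close(wait)
	}
	if failEarly {
		return errEarly, nil // early error: close(wait) is never reached
	}
	close(wait)
	if !deferEarly {
		defer cleanup() // patched ordering: registered after close(wait)
	}
	return nil, nil
}

func main() {
	for _, deferEarly := range []bool{true, false} {
		runErr, cleanupErr := execute(deferEarly, true)
		fmt.Printf("deferEarly=%v: runErr=%v cleanupErr=%v\n", deferEarly, runErr, cleanupErr)
	}
}
```

On an early failure, the old ordering reports the hang from the deferred cleanup, while
the patched ordering returns the error without running the cleanup at all. That is the
trade-off the commit message argues is safe, because no BPF programs have been loaded
at that point.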
