DO NOT MERGE: Pr/mtardy/user stacktrace hangfix #2286
Closed
policystatemetrics needs a reference to the sensor manager so that it can collect metrics. Currently, this reference is obtained with observer.GetSensorManager() at initialization time. The observer tests do not restart the metrics (see [1]), so if we create a new observer, the metrics still reference the old sensor manager.

Fix this by having policystatemetrics call observer.GetSensorManager() to get the latest version of the sensor manager.

[1] https://github.com/cilium/tetragon/blob/22eb995b19207ac0ced2dd83950ec8e8aedd122d/pkg/observer/observertesthelper/observer_test_helper.go#L272-L276

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
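The difference between capturing the manager once and looking it up on each use can be sketched as follows. This is a minimal illustration, not the tetragon code: `SensorManager`, `SetSensorManager`, `GetSensorManager`, `staleCollector`, and `freshCollector` are hypothetical stand-ins defined only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// SensorManager is a hypothetical stand-in for the real sensor manager.
type SensorManager struct{ name string }

var (
	mu      sync.Mutex
	current *SensorManager
)

// SetSensorManager replaces the process-wide manager (hypothetical helper).
func SetSensorManager(m *SensorManager) {
	mu.Lock()
	defer mu.Unlock()
	current = m
}

// GetSensorManager returns the manager that is current right now.
func GetSensorManager() *SensorManager {
	mu.Lock()
	defer mu.Unlock()
	return current
}

// staleCollector captures the manager once, at construction time.
type staleCollector struct{ manager *SensorManager }

func (c *staleCollector) Collect() string { return c.manager.name }

// freshCollector stores the getter and calls it on every collection,
// so it always sees the latest manager.
type freshCollector struct{ getManager func() *SensorManager }

func (c *freshCollector) Collect() string {
	m := c.getManager()
	if m == nil {
		return "" // no manager registered yet
	}
	return m.name
}

func main() {
	SetSensorManager(&SensorManager{name: "old"})
	stale := &staleCollector{manager: GetSensorManager()}
	fresh := &freshCollector{getManager: GetSensorManager}

	// Simulate a test that creates a new observer and manager
	// without re-initializing the metrics.
	SetSensorManager(&SensorManager{name: "new"})

	fmt.Println("stale collector sees:", stale.Collect())
	fmt.Println("fresh collector sees:", fresh.Collect())
}
```

In this sketch the stale collector keeps reporting from the first manager after the swap, while the fresh collector follows it, which is the behavior the commit message asks for.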
We should also do the same in the other operations, but we leave that as a follow-up.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
This patch adds a timeout for ListTracingPolicies. It can be the case that the sensor manager is stuck or misbehaving. This patch (combined with the previous one) ensures that metrics will continue after a timeout.

Tested manually using:

```diff
diff --git a/pkg/metrics/policystatemetrics/policystatemetrics_test.go b/pkg/metrics/policystatemetrics/policystatemetrics_test.go
index 227306b65..fd581392b 100644
--- a/pkg/metrics/policystatemetrics/policystatemetrics_test.go
+++ b/pkg/metrics/policystatemetrics/policystatemetrics_test.go
@@ -9,6 +9,7 @@ import (
 	"io"
 	"strings"
 	"testing"
+	"time"
 
 	"github.com/cilium/tetragon/pkg/observer"
 	tus "github.com/cilium/tetragon/pkg/testutils/sensors"
@@ -57,3 +58,22 @@ tetragon_tracingpolicy_loaded{state="load_error"} %d
 	err = testutil.CollectAndCompare(collector, expectedMetrics(1, 0, 0, 0))
 	assert.NoError(t, err)
 }
+
+func TestTimeout(t *testing.T) {
+	reg := prometheus.NewRegistry()
+
+	manager := tus.GetTestSensorManager(context.TODO(), t).Manager
+	observer.SetSensorManager(manager)
+	t.Cleanup(observer.ResetSensorManager)
+
+	collector := newPolicyStateCollector()
+	reg.Register(collector)
+
+	go func() {
+		err := manager.SleepForTesting(context.TODO(), t, 1*time.Second)
+		assert.NoError(t, err)
+	}()
+
+	err := testutil.CollectAndCompare(collector, strings.NewReader(""))
+	assert.NoError(t, err)
+}
diff --git a/pkg/sensors/manager.go b/pkg/sensors/manager.go
index eaf908340..291a58c8f 100644
--- a/pkg/sensors/manager.go
+++ b/pkg/sensors/manager.go
@@ -8,6 +8,8 @@ import (
 	"errors"
 	"fmt"
 	"strings"
+	"testing"
+	"time"
 
 	"github.com/cilium/tetragon/api/v1/tetragon"
 	"github.com/cilium/tetragon/pkg/k8s/apis/cilium.io/v1alpha1"
@@ -96,6 +98,13 @@ func startSensorManager(
 				logger.GetLogger().Debugf("stopping sensor controller...")
 				done = true
 				err = nil
+
+			// NB(kkourt): for testing
+			case *sensorManagerSleep:
+				time.Sleep(op.d)
+				err = nil
+
 			default:
 				err = fmt.Errorf("unknown sensorOp: %v", op)
 			}
@@ -421,6 +430,13 @@ type sensorCtlStop struct {
 	retChan chan error
 }
 
+// sensorManagerSleep just sleeps. Intended only for testing.
+type sensorManagerSleep struct {
+	ctx     context.Context
+	retChan chan error
+	d       time.Duration
+}
+
 type LoadArg struct{}
 type UnloadArg = LoadArg
 
@@ -436,5 +452,18 @@ func (s *sensorEnable) sensorOpDone(e error) { s.retChan <- e }
 func (s *sensorDisable) sensorOpDone(e error) { s.retChan <- e }
 func (s *sensorList) sensorOpDone(e error) { s.retChan <- e }
 func (s *sensorCtlStop) sensorOpDone(e error) { s.retChan <- e }
+func (s *sensorManagerSleep) sensorOpDone(e error) { s.retChan <- e }
 
 type sensorCtlHandle = chan<- sensorOp
+
+func (h *Manager) SleepForTesting(ctx context.Context, t *testing.T, d time.Duration) error {
+	retc := make(chan error)
+	op := &sensorManagerSleep{
+		ctx:     ctx,
+		retChan: retc,
+		d:       d,
+	}
+
+	h.sensorCtl <- op
+	return <-retc
+}
```

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Andrey Fedotov <anfedotoff@yandex-team.ru>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
mtardy added the release-note/misc label (This PR makes changes that have no direct user impact.) on Apr 2, 2024
✅ Deploy Preview for tetragon ready!
No description provided.