
tetragon: Check final size for data event #1224

Merged: 3 commits into cilium:main, Jul 24, 2023
Conversation

@olsajiri (Contributor) commented Jul 13, 2023

Adding a size check on the receiving side of data events to make
sure we won't use incomplete data.

Also adding stats for data events to keep track of what's
happening in there.
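A minimal sketch of the receiving-side idea in Go; the type, field, and helper names below are illustrative placeholders, not the actual Tetragon code:

```go
package observer

import "fmt"

// dataEvent is a hypothetical holder for a reassembled data event.
type dataEvent struct {
	expectedSize uint32 // final size announced by the kernel side
	payload      []byte // bytes collected from the data event messages
}

// finalize refuses to hand out incomplete data and records what happened.
func (d *dataEvent) finalize() ([]byte, error) {
	if uint32(len(d.payload)) != d.expectedSize {
		dataEventStatsInc("bad") // incomplete data event
		return nil, fmt.Errorf("incomplete data event: got %d bytes, want %d",
			len(d.payload), d.expectedSize)
	}
	dataEventStatsInc("ok")
	return d.payload, nil
}

// dataEventStatsInc stands in for the counter added in this PR,
// e.g. DataEventStats.WithLabelValues(kind).Inc().
func dataEventStatsInc(kind string) {}
```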

@olsajiri olsajiri force-pushed the data_stats branch 3 times, most recently from 9fe8180 to b96a4ef on July 14, 2023 10:10
@@ -13,6 +13,13 @@ var (
Help: "Data event statistics. For internal use only.",
ConstLabels: nil,
}, []string{"event"})

DataEventSizeHist = promauto.NewHistogramVec(prometheus.HistogramOpts{
Name: consts.MetricNamePrefix + "data_event_size_histogram",
Contributor:

I switched all metrics to use a Namespace field instead of concatenating the prefix in #1228, can you change this one too?
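For context, the requested change means setting the Namespace field in the metric options instead of concatenating the prefix into Name, so the Prometheus client composes the full metric name. A sketch under that assumption (the literal "tetragon" namespace and the help text are placeholders for the real constants):

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// With Namespace set, the exported metric name becomes
// "tetragon_data_event_size_histogram"; no manual prefix concatenation.
var DataEventSizeHist = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "tetragon",
	Name:      "data_event_size_histogram",
	Help:      "Data event size. For internal use only.",
}, []string{"event"})
```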

@olsajiri olsajiri force-pushed the data_stats branch 5 times, most recently from 29fb310 to 6f8fd75 on July 17, 2023 12:43
@netlify bot commented Jul 17, 2023

Deploy Preview for tetragon ready!

Latest commit: 6a81fa5
Latest deploy log: https://app.netlify.com/sites/tetragon/deploys/64b8e1b09613db0008a0defc
Deploy Preview: https://deploy-preview-1224--tetragon.netlify.app

@olsajiri olsajiri changed the title from "Data stats" to "tetragon: Check final size for data event" on Jul 17, 2023
@olsajiri olsajiri marked this pull request as ready for review July 17, 2023 14:14
@olsajiri olsajiri requested a review from a team as a code owner July 17, 2023 14:14
@olsajiri olsajiri requested review from jrfastab and kkourt and removed request for jrfastab July 17, 2023 14:14
@lambdanis (Contributor) left a comment:

Looks good, I left some comments/suggestions for the userspace part.

@@ -77,6 +77,8 @@ const (
keyEnablePidSetFilter = "enable-pid-set-filter"

keyEnableMsgHandlingLatency = "enable-msg-handling-latency"

keyEnableDataEventsSizeMetric = "enable-data-events-size-metric"
Contributor:

Is this flag needed?

From what I see, currently only LatencyStats of all the metrics has to be explicitly enabled with a flag. Having a way to enable/disable metrics is useful, but having a separate flag for each of them doesn't seem scalable. If it's not necessary right now, then I would rather develop a more generic way to enable/disable metrics in the near future.

@olsajiri (Author) commented Jul 18, 2023:

Any idea how bad/slow the histogram observe call is? I got the impression it's better if it's disabled by default.

Maybe we could have some generic --enable-metric-hist=<hist1,hist2,..> option, or something like that.

Contributor:

"Any idea how bad/slow the histogram observe call is?"

The histogram observe call shouldn't be too bad, I think; summaries are slow, but histograms are generally ok. The cardinality also shouldn't be too bad in this case, since all possible label values are known.
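As a rough illustration of why the cardinality stays bounded here: the label only ever takes a small, fixed set of values, and the children can be resolved once so the per-event cost is just the Observe call. A sketch with assumed names (the "event" label and "ok"/"bad" values are placeholders):

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var dataEventSize = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "tetragon",
	Name:      "data_event_size",
	Help:      "The size of data events.",
}, []string{"event"})

// Only two label values ever exist, so only two child histograms are created.
var (
	okSize  = dataEventSize.WithLabelValues("ok")
	badSize = dataEventSize.WithLabelValues("bad")
)

// observeDataEventSize is called once per received data event; Observe on a
// pre-resolved child is a bucket lookup plus atomic increments.
func observeDataEventSize(complete bool, size int) {
	if complete {
		okSize.Observe(float64(size))
	} else {
		badSize.Observe(float64(size))
	}
}
```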

Contributor:

I'm ok with this; the main problem with some of the metrics was the cardinality of the labels. This doesn't look bad from that side.

@olsajiri (Author):

ok, I removed the option

pkg/observer/data_stats.go (conversation resolved)
// Define a counter metric for data event statistics
DataEventStats = promauto.NewCounterVec(prometheus.CounterOpts{
Namespace: consts.MetricsNamespace,
Name: "data_event_stats_total",
Contributor:

Suggested change:
- Name: "data_event_stats_total",
+ Name: "data_events_total",

@olsajiri (Author):

ok

DataEventStats = promauto.NewCounterVec(prometheus.CounterOpts{
Namespace: consts.MetricsNamespace,
Name: "data_event_stats_total",
Help: "Data event statistics. For internal use only.",
Contributor:

Suggested change:
- Help: "Data event statistics. For internal use only.",
+ Help: "The number of data events by type. For internal use only.",

@olsajiri (Author):

ok, thanks

@kkourt (Contributor) left a comment:

Thanks!

pkg/observer/data.go (conversation resolved)
Adding a size check on the receiving side of data events to make
sure we won't use incomplete data.

There was a bug keeping the loop in do_str ineffective, because
rd_bytes was never incremented. Clang seemed to skip the loop
completely, so now that it's fixed we can't have 10 iterations,
but only 2, in order not to exceed the verifier's complexity limit.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Adding stats for data events to keep track of what's
happening in there.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
@olsajiri olsajiri force-pushed the data_stats branch 2 times, most recently from 6a81fa5 to 393ed1b on July 20, 2023 09:24
@@ -75,6 +75,8 @@ type config struct {
EnablePidSetFilter bool

EnableMsgHandlingLatency bool

EnableDataEventsSizeMetric bool
Contributor:

This seems unused now


DataEventSizeHist = promauto.NewHistogramVec(prometheus.HistogramOpts{
Namespace: consts.MetricsNamespace,
Name: "data_event_stats_size",
Contributor:

One last nit, I promise: can it be just data_event_size? Having "stats" in the metric name might be a bit confusing about what is actually measured, IMO :)

Suggested change:
- Name: "data_event_stats_size",
+ Name: "data_event_size",

@olsajiri (Author):

sure, np ;-)

Adding good/bad histograms to keep track of data event sizes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
@lambdanis (Contributor) left a comment:

looks good

@kkourt kkourt merged commit 9b39962 into cilium:main Jul 24, 2023
20 checks passed