
tetragon: Check final size for data event #1224

Merged: 3 commits into cilium:main, Jul 24, 2023
Conversation

@olsajiri (Contributor) commented Jul 13, 2023

Adding a size check on the receiving side of data events to make
sure we won't use incomplete data.

Also adding stats for data events to keep track of what's
happening in there.
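A minimal sketch of the receiving-side idea in Go; the type, field, and helper names below are illustrative placeholders, not the actual Tetragon code:

```go
package observer

import "fmt"

// dataEvent is a hypothetical holder for a reassembled data event.
type dataEvent struct {
	expectedSize uint32 // final size announced by the kernel side
	payload      []byte // bytes collected from the data event messages
}

// finalize refuses to hand out incomplete data and records what happened.
func (d *dataEvent) finalize() ([]byte, error) {
	if uint32(len(d.payload)) != d.expectedSize {
		dataEventStatsInc("bad") // incomplete data event
		return nil, fmt.Errorf("incomplete data event: got %d bytes, want %d",
			len(d.payload), d.expectedSize)
	}
	dataEventStatsInc("ok")
	return d.payload, nil
}

// dataEventStatsInc stands in for the counter added in this PR,
// e.g. DataEventStats.WithLabelValues(kind).Inc().
func dataEventStatsInc(kind string) {}
```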

@olsajiri olsajiri force-pushed the data_stats branch 3 times, most recently from 9fe8180 to b96a4ef on July 14, 2023 10:10
@@ -13,6 +13,13 @@ var (
Help: "Data event statistics. For internal use only.",
ConstLabels: nil,
}, []string{"event"})

DataEventSizeHist = promauto.NewHistogramVec(prometheus.HistogramOpts{
Name: consts.MetricNamePrefix + "data_event_size_histogram",
Contributor:

I switched all metrics to use a Namespace field instead of concatenating the prefix in #1228, can you change this one too?
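For context, the requested change means setting the Namespace field in the metric options instead of concatenating the prefix into Name, so the Prometheus client composes the full metric name. A sketch under that assumption (the literal "tetragon" namespace and the help text are placeholders for the real constants):

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// With Namespace set, the exported metric name becomes
// "tetragon_data_event_size_histogram"; no manual prefix concatenation.
var DataEventSizeHist = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "tetragon",
	Name:      "data_event_size_histogram",
	Help:      "Data event size. For internal use only.",
}, []string{"event"})
```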

@olsajiri olsajiri force-pushed the data_stats branch 5 times, most recently from 29fb310 to 6f8fd75 on July 17, 2023 12:43
@netlify bot commented Jul 17, 2023

Deploy Preview for tetragon ready!

Latest commit: 6a81fa5
Latest deploy log: https://app.netlify.com/sites/tetragon/deploys/64b8e1b09613db0008a0defc
Deploy Preview: https://deploy-preview-1224--tetragon.netlify.app

@olsajiri olsajiri changed the title from "Data stats" to "tetragon: Check final size for data event" on Jul 17, 2023
@olsajiri olsajiri marked this pull request as ready for review July 17, 2023 14:14
@olsajiri olsajiri requested a review from a team as a code owner July 17, 2023 14:14
@olsajiri olsajiri requested review from jrfastab and kkourt and removed request for jrfastab July 17, 2023 14:14
@lambdanis (Contributor) left a comment:

Looks good, I left some comments/suggestions for the userspace part.

@@ -77,6 +77,8 @@ const (
keyEnablePidSetFilter = "enable-pid-set-filter"

keyEnableMsgHandlingLatency = "enable-msg-handling-latency"

keyEnableDataEventsSizeMetric = "enable-data-events-size-metric"
Contributor:

Is this flag needed?

From what I see, currently only LatencyStats of all the metrics has to be explicitly enabled with a flag. Having a way to enable/disable metrics is useful, but having a separate flag for each of them doesn't seem scalable. If it's not necessary right now, then I would rather develop a more generic way to enable/disable metrics in the near future.

@olsajiri (Author) commented Jul 18, 2023:

Any idea how bad/slow the histogram observe call is? I got the impression it's better if it's disabled by default.

Maybe we could have some generic --enable-metric-hist=<hist1,hist2,..> option, or something like that.

Contributor:

"Any idea how bad/slow the histogram observe call is?"

The histogram observe call shouldn't be too bad, I think; summaries are slow, but histograms are generally ok. The cardinality also shouldn't be too bad in this case, since all possible label values are known.
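As a rough illustration of why the cardinality stays bounded here: the label only ever takes a small, fixed set of values, and the children can be resolved once so the per-event cost is just the Observe call. A sketch with assumed names (the "event" label and "ok"/"bad" values are placeholders):

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var dataEventSize = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "tetragon",
	Name:      "data_event_size",
	Help:      "The size of data events.",
}, []string{"event"})

// Only two label values ever exist, so only two child histograms are created.
var (
	okSize  = dataEventSize.WithLabelValues("ok")
	badSize = dataEventSize.WithLabelValues("bad")
)

// observeDataEventSize is called once per received data event; Observe on a
// pre-resolved child is a bucket lookup plus atomic increments.
func observeDataEventSize(complete bool, size int) {
	if complete {
		okSize.Observe(float64(size))
	} else {
		badSize.Observe(float64(size))
	}
}
```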

Contributor:

I'm ok with this; the main problem with some of the metrics was the cardinality of the labels. This doesn't look bad from that side.

@olsajiri (Author):

ok, I removed the option

pkg/observer/data_stats.go (conversation resolved)
// Define a counter metric for data event statistics
DataEventStats = promauto.NewCounterVec(prometheus.CounterOpts{
Namespace: consts.MetricsNamespace,
Name: "data_event_stats_total",
Contributor:

Suggested change:
- Name: "data_event_stats_total",
+ Name: "data_events_total",

@olsajiri (Author):

ok

DataEventStats = promauto.NewCounterVec(prometheus.CounterOpts{
Namespace: consts.MetricsNamespace,
Name: "data_event_stats_total",
Help: "Data event statistics. For internal use only.",
Contributor:

Suggested change:
- Help: "Data event statistics. For internal use only.",
+ Help: "The number of data events by type. For internal use only.",

@olsajiri (Author):

ok, thanks

@kkourt (Contributor) left a comment:

Thanks!

pkg/observer/data.go (conversation resolved)
Adding a size check on the receiving side of data events to make
sure we won't use incomplete data.

There was a bug keeping the loop in do_str ineffective, because
rd_bytes was never incremented. Clang seemed to skip the loop
completely, so now that it's fixed we can't have 10 iterations,
but only 2, in order not to exceed the verifier's complexity limit.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Adding stats for data events to keep track of what's
happening in there.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
@olsajiri olsajiri force-pushed the data_stats branch 2 times, most recently from 6a81fa5 to 393ed1b on July 20, 2023 09:24
@@ -75,6 +75,8 @@ type config struct {
EnablePidSetFilter bool

EnableMsgHandlingLatency bool

EnableDataEventsSizeMetric bool
Contributor:

This seems unused now


DataEventSizeHist = promauto.NewHistogramVec(prometheus.HistogramOpts{
Namespace: consts.MetricsNamespace,
Name: "data_event_stats_size",
Contributor:

One last nit, I promise: can it be just data_event_size? Having "stats" in the metric name might be a bit confusing about what is actually measured, IMO :)

Suggested change:
- Name: "data_event_stats_size",
+ Name: "data_event_size",

@olsajiri (Author):

sure, np ;-)

Adding good/bad histograms to keep track of data event sizes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
@lambdanis (Contributor) left a comment:

looks good

@kkourt kkourt merged commit 9b39962 into cilium:main Jul 24, 2023
20 checks passed