
Streaming of stdout for metric emission in external probe #691

Closed
v-pratap opened this issue Mar 7, 2024 · 11 comments
Labels: enhancement (New feature or request)

v-pratap (Contributor) commented Mar 7, 2024

Describe the feature you'd like and the problem it will solve
Currently, cloudprober runs the external probe and extracts all the metrics from its stdout at once. This means that if our probe runs for 60 seconds, emitting one metric ("request_count 1") every 10 seconds, cloudprober will not extract these metrics every 10 seconds. Instead, it will extract them at the end of the 60 seconds and emit "request_count 1" six times, all together.

I'm using the stackdriver surfacer. With this configuration, if all six metrics are emitted at the end, the surfacer will override all six metrics into one, leading to data loss.

To resolve this issue, we can extract the metrics in a streaming way. This means extracting data from stdout at a fixed interval, which can eliminate the data loss problem.
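
For illustration, here is a minimal sketch (in Go; the probe structure is an assumption, not the actual probe) of the pattern described above: the process runs for about 60 seconds and prints one metric line every 10 seconds, but with batch extraction all six lines are only picked up when the process exits.

```go
// Hypothetical external probe illustrating the issue: it runs for ~60 seconds,
// printing "request_count 1" every 10 seconds. With batch extraction,
// cloudprober sees all six lines only after the process exits.
package main

import (
	"fmt"
	"time"
)

func main() {
	for i := 0; i < 6; i++ {
		fmt.Println("request_count 1") // one metric line per interval
		time.Sleep(10 * time.Second)
	}
}
```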

Related bug for this feature request: #689

manugarg (Contributor) commented Mar 9, 2024

See #689 (comment) for possible implementation of this feature.
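
The linked comment isn't reproduced here, but the general shape of streaming extraction might look roughly like the sketch below (a hedged illustration, not cloudprober's actual code; streamOutput and handleMetricLine are made-up names): read the child process's stdout line by line and hand each line off as soon as it arrives, instead of waiting for the process to exit.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
)

// streamOutput starts cmd and invokes handleMetricLine for each stdout line
// as soon as it is available, rather than collecting all output at exit.
func streamOutput(cmd *exec.Cmd, handleMetricLine func(line string)) error {
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		handleMetricLine(scanner.Text()) // parse/emit the metric right away
	}
	if err := scanner.Err(); err != nil {
		return fmt.Errorf("reading stdout: %w", err)
	}
	return cmd.Wait()
}

func main() {
	cmd := exec.Command("/path/to/external-probe") // hypothetical probe binary
	if err := streamOutput(cmd, func(line string) {
		fmt.Println("metric line:", line)
	}); err != nil {
		fmt.Println("probe error:", err)
	}
}
```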

manugarg (Contributor) commented:

@BrennaEpp, IIRC, you were also looking for something similar a few months back. Can you please confirm if this will help your project as well? Thanks!

manugarg added this to the v0.13.4 milestone Mar 29, 2024
manugarg added a commit that referenced this issue Apr 2, 2024
- Generate metrics from the external probe's stdout as soon as stdout becomes available. This helps in situations where the external probe process runs for a while but keeps outputting metrics much more frequently, e.g. a 1-minute run interval with output every 10s (see #691 for one such request, but it has been asked for before as well, at least once more).

- With this change, stderr will also be read and logged as soon as it's available.

Add and improve testing:
- Don't rely on timeouts for testing; that makes tests unreliable on CI.
- Reduce external probe process runs for the TestProbeOnceMode test.
- Remove the wait for the command's exit.
  - The change in #547 appears wrong, as it caused wait to be called only on non-Windows platforms, while the issue was on Windows. It seems the issue was temporary and fixed itself.
manugarg (Contributor) commented Apr 2, 2024

Fix for this has been submitted (#708).

manugarg closed this as completed Apr 2, 2024
BrennaEpp commented:

Thank you @manugarg! I'm pretty sure this will resolve a big issue that we've encountered (data loss). I will test it out at some point and get back to you if it doesn't completely resolve it.

v-pratap (Contributor, Author) commented Apr 9, 2024

@manugarg, a big thanks to you for implementing this solution. It seems to be working fine for most cases, but I got an error while emitting a distribution metric. I emit the latency_ms_dist metric like the following:

latency_ms_dist{op=READ} 1,3,4,5,5,6,63,3,5,3,56,5,675,6,45,4

I think the continuous streaming of stdout is breaking the line. I am getting this error:
[screenshot of the error output]

But once I disable this new feature, I get no errors and everything seems fine.
Can you provide some insights here?

Also, in the test file external_test.go, you use all the environment variables that have the "GO_TEST" prefix, which caused the tests to fail in our environment. In our test environment there is one extra variable, GO_TEST_CHATTY_OUTPUT, and it will always be there. For now, I have tweaked my code, but someone else may face the same issue later.
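
A hypothetical sketch of the prefix-matching pitfall described above (this is not the actual external_test.go code): selecting environment variables purely by the "GO_TEST" prefix also matches unrelated variables such as GO_TEST_CHATTY_OUTPUT.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// Matching on the bare "GO_TEST" prefix picks up any variable that happens
	// to start with it, e.g. GO_TEST_CHATTY_OUTPUT set by some CI environments.
	for _, kv := range os.Environ() {
		if strings.HasPrefix(kv, "GO_TEST") {
			fmt.Println("matched:", kv)
		}
	}
}
```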

manugarg (Contributor) commented Apr 9, 2024

This is likely a result of default buffer limits. It appears that your distribution metric lines are too long -- the default limit is 64kB. Does that sound right? Approximately how many numbers do you expect on these lines?

Update:
I did some testing, and it seems that the scanner will not break the line if the buffer is too small for it, so it must be something else. In any case, before we do further debugging, it would be good to see your program's output -- what does that line look like?

manugarg (Contributor) commented Apr 9, 2024

I've just now fixed an issue (#712), but the symptoms of that issue shouldn't be what you're noticing. It may be worth trying, though.

As I said earlier, looking at your external program's output will help debug this issue further.

manugarg reopened this Apr 9, 2024
v-pratap (Contributor, Author) commented:

Hi @manugarg,

I have checked the output closely; it is actually parsing the line partially:
[screenshot of the partially parsed line]

In actual production, the metric line would have 50,000+ values; previously it was working fine.

v-pratap (Contributor, Author) commented:

I have patched in the fix from #712, and now it seems to be working fine. Thanks @manugarg for all of this.

manugarg (Contributor) commented:

Cool. I'd definitely recommend not making these lines too long (more than 64kB), though. Instead, run multiple probes if you need to increase the frequency.

manugarg (Contributor) commented:

To give you enough room, I am increasing the max token size to 256kB (4 times the default): #722.
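
For reference, a minimal sketch of what raising the limit looks like with Go's bufio.Scanner, whose default maximum token size (bufio.MaxScanTokenSize) is 64kB; the 256kB figure matches #722, but the scanLines helper and how this is wired into cloudprober are assumptions here.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// scanLines reads r line by line with a raised maximum line length of 256kB.
// With the default 64kB limit, a longer line makes Scan stop and Err return
// bufio.ErrTooLong instead of silently splitting the line.
func scanLines(r io.Reader, handle func(string)) error {
	scanner := bufio.NewScanner(r)
	scanner.Buffer(make([]byte, 0, 64*1024), 256*1024) // initial buffer, max token size
	for scanner.Scan() {
		handle(scanner.Text())
	}
	return scanner.Err()
}

func main() {
	input := strings.NewReader("latency_ms_dist{op=READ} 1,3,4,5\nrequest_count 1\n")
	if err := scanLines(input, func(line string) { fmt.Println(line) }); err != nil {
		fmt.Println("scan error:", err)
	}
}
```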
