[PROF-4260] Add jitter to profile flushes #1807

ivoanjo · 2021-12-06T14:48:39Z

In Ruby applications that use multiple processes (such as webservers like puma), we start a profiler instance for each individual process.

If those applications fork all their processes at the same time (usually at start-up time), we get N processes that start profiling at the same time, and that will report profiles every minute at exactly the same time.

To spread out the impact of this reporting (both on the reporting app and on the profiling backend), this PR adds a very small random sleep (at most 3 seconds) before each report is made.

Note that because we sleep AFTER we collect the events from the recorder, we still report exactly the same data as before -- we just may report it ever-so-slightly later.
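To illustrate the ordering described above, here is a minimal sketch of "collect, then sleep, then report". `SketchScheduler`, its callable collaborators, and the injectable `sleeper` are hypothetical names made up for this example; they are not the actual dd-trace-rb classes, and the real scheduler may be structured differently.

```ruby
# Illustrative sketch only: SketchScheduler and its collaborators are
# hypothetical stand-ins, not the real dd-trace-rb classes.
class SketchScheduler
  FLUSH_JITTER_MAXIMUM_SECONDS = 3

  # recorder: callable returning the events collected so far
  # exporter: callable that reports a batch of events
  # sleeper:  callable that pauses; injectable so the sketch can be
  #           exercised without actually waiting
  def initialize(recorder:, exporter:, sleeper: ->(seconds) { sleep(seconds) })
    @recorder = recorder
    @exporter = exporter
    @sleeper = sleeper
  end

  def flush_events
    # 1. Collect first, so the reported data does not depend on the jitter.
    events = @recorder.call

    # 2. Kernel#rand without arguments returns a float in 0.0...1.0,
    #    so the delay falls in 0.0...FLUSH_JITTER_MAXIMUM_SECONDS.
    jitter_seconds = rand * FLUSH_JITTER_MAXIMUM_SECONDS
    @sleeper.call(jitter_seconds)

    # 3. Report the already-collected events, just slightly later.
    @exporter.call(events)

    jitter_seconds
  end
end
```

Because the events are captured before the sleep, the jitter only shifts when the report is sent, not what it contains.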

AlexJF · 2021-12-09T12:29:41Z

lib/ddtrace/profiling/scheduler.rb

+        if run_loop?
+          jitter_seconds = rand * DEFAULT_FLUSH_JITTER_MAXIMUM_SECONDS # floating point number between (0.0...maximum)
+          sleep(jitter_seconds)
+        end


Nit: It feels weird to me to have this happen here and also to have it happen at every exporting attempt? The presence of the run_loop? check also seems to hint that maybe this waiting should be in the loop itself rather than deep down in the flushing function?

I'm probably oblivious to a lot of technical details here but wouldn't just waiting on the first loop iteration, outside of the loop_wait_time calculation logic allow us to just sleep once to the same effect?

ivoanjo · 2021-12-09

> I'm probably oblivious to a lot of technical details here but wouldn't just waiting on the first loop iteration, outside of the loop_wait_time calculation logic allow us to just sleep once to the same effect?

Not quite. Sleeping before starting the loop would extend the first reported profile to cover 60 seconds plus the time slept (at most 63 seconds, in this case). I think that would be somewhat confusing for users so I'd prefer to avoid it, but maybe it's not that big of a deal anyway (there are other things I can think of that can cause a profile to be < 60 or > 60 seconds).
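The arithmetic behind this can be sketched as follows. All names here are hypothetical, and the second strategy rests on an assumption that is not confirmed in this thread: that the loop's wait-time calculation compensates for time spent flushing, so events are still collected once per period.

```ruby
PROFILE_PERIOD_SECONDS = 60.0

# Strategy A: sleep once before the loop starts, then collect every period.
# Every collection is shifted by the start-up jitter.
def collection_times_with_startup_sleep(startup_jitter, flushes)
  (1..flushes).map { |i| startup_jitter + i * PROFILE_PERIOD_SECONDS }
end

# Strategy B: collect every period, then sleep before reporting.
# ASSUMPTION: the loop subtracts the time spent flushing from its next wait,
# so the per-flush sleep does not move the collection times.
def collection_times_with_per_flush_sleep(flushes)
  (1..flushes).map { |i| i * PROFILE_PERIOD_SECONDS }
end

# Each profile covers the gap between consecutive collections,
# with profiling assumed to start at time 0.
def profile_durations(collection_times)
  ([0.0] + collection_times).each_cons(2).map { |from, to| to - from }
end

profile_durations(collection_times_with_startup_sleep(2.5, 3))
# => [62.5, 60.0, 60.0]
profile_durations(collection_times_with_per_flush_sleep(3))
# => [60.0, 60.0, 60.0]
```

Under these assumptions, only the first profile is stretched by the start-up sleep, while the per-flush sleep leaves every profile at 60 seconds.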

> The presence of the run_loop? check also seems to hint that maybe this waiting should be in the loop itself rather than deep down in the flushing function?

Yeah... It's a good point. My thinking behind putting it here is that the loop is in a generic module that is used in a bunch of places in dd-trace-rb, and this is quite a specific behavior, so it seems weird to just have the generic module be a "bag of features that may be used only once" rather than just the common parts.

> and also to have it happen at every exporting attempt?

Do you mean because we pick a different random value every time, or because it's inside the loop? My first version actually picked the random value only once and stored it in an instance variable, but at some point I figured that just calling rand again on each flush (once per minute) is fine and makes the change even more self-contained.
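For comparison, the two variants mentioned here could look roughly like this. `FixedJitter` and `PerFlushJitter` are hypothetical names for this sketch only; neither is code from the PR.

```ruby
# Variant 1: pick the jitter once and keep it in an instance variable.
class FixedJitter
  def initialize(maximum_seconds, random: Random.new)
    # Random#rand without arguments returns a float in 0.0...1.0.
    @jitter_seconds = random.rand * maximum_seconds
  end

  # Always returns the value chosen at construction time.
  def next_jitter_seconds
    @jitter_seconds
  end
end

# Variant 2: draw a fresh jitter on every flush; no per-instance jitter value to store.
class PerFlushJitter
  def initialize(maximum_seconds, random: Random.new)
    @maximum_seconds = maximum_seconds
    @random = random
  end

  # Returns a new value in 0.0...maximum_seconds on each call.
  def next_jitter_seconds
    @random.rand * @maximum_seconds
  end
end
```

Both variants keep every delay below the maximum; the second just avoids carrying a stored jitter value around between flushes.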

AlexJF · 2021-12-09

Ah I see. Yeah, most of my comments were related to sleeping on each flush, but I guess I see your point. I don't have a strong opinion between keeping it as you have it here and allowing a duration of up to 63 seconds, so let's go with whatever is easier to maintain.

I'd maybe suggest adding a variation of

> Sleeping before starting the loop would extend the first profile reported to contain 60+slept time seconds (at most 63 seconds, in this case). I think that would be somewhat confusing for users so I'd prefer to avoid it

to the main if statement in this PR to remind the next person why we have it there.

ivoanjo · 2021-12-10

👍 Added a note in ddd66e4.

ivoanjo requested review from AlexJF and a team December 6, 2021 14:48

marcotc approved these changes Dec 6, 2021

AlexJF reviewed Dec 9, 2021

AlexJF approved these changes Dec 9, 2021

Add note about scheduler sleep alternative approach

ddd66e4

ivoanjo merged commit 0aeb038 into master Dec 10, 2021

ivoanjo deleted the ivoanjo/prof-4260-add-jitter-to-profile-flush branch December 10, 2021 10:48

github-actions bot added this to the 0.55.0 milestone Dec 10, 2021
