Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-4260] Add jitter to profile flushes #1807

Merged
merged 2 commits into from
Dec 10, 2021

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Dec 6, 2021

In Ruby applications that use multiple processes (such as webservers like puma), we start a profiler instance for each individual process.

If those applications fork all their processes at the same time (usually at start-up time), we get N processes that start profiling at the same time, and that will report profiles every minute at exacty the same time.

To spread out the impact of this reporting (both on the reporting app, as well as on the profiling backend), this PR adds a very small random sleep (at most 3 seconds) before each report is made.

Note that because we sleep AFTER we collect the events from the recorder, we still report exactly the same data as before -- we just may report it ever-so-slightly later.

In Ruby applications that use multiple processes (such as webservers
like puma), we start a profiler instance for each individual process.

If those applications fork all their processes at the same time
(usually at start-up time), we get N processes that start profiling
at the same time, and that will report profiles every minute
at exacty the same time.

To spread out the impact of this reporting (both on the reporting app,
as well as on the profiling backend), this PR adds a very small
random sleep (at most 3 seconds) before each report is made.

Note that because we sleep AFTER we collect the events from the
recorder, we still report exactly the same data as before -- we just
may report it ever-so-slightly later.
@ivoanjo ivoanjo requested review from AlexJF and a team December 6, 2021 14:48
Comment on lines +123 to +126
if run_loop?
jitter_seconds = rand * DEFAULT_FLUSH_JITTER_MAXIMUM_SECONDS # floating point number between (0.0...maximum)
sleep(jitter_seconds)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It feels weird to me to have this happen here and also to have it happen at every exporting attempt? The presence of the run_loop? check also seems to hint that maybe this waiting should be in the loop itself rather than deep down in the flushing function?

I'm probably oblivious to a lot of technical details here but wouldn't just waiting on the first loop iteration, outside of the loop_wait_time calculation logic allow us to just sleep once to the same effect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably oblivious to a lot of technical details here but wouldn't just waiting on the first loop iteration, outside of the loop_wait_time calculation logic allow us to just sleep once to the same effect?

Not quite. Sleeping before starting the loop would extend the first profile reported to contain 60+slept time seconds (at most 63 seconds, in this case). I think that would be somewhat confusing for users so I'd prefer to avoid it, but maybe it's not that big of a deal anyway (there's other things that can cause the profile to be < 60 or > 60 seconds that I can think of).

The presence of the run_loop? check also seems to hint that maybe this waiting should be in the loop itself rather than deep down in the flushing function?

Yeah... It's a good point. My thinking behind putting it here is that the loop is in a generic module that is used in a bunch of places in dd-trace-rb, and this is quite a specific behavior, so it seems weird to just have the generic module be a "bag of features that may be used only once" rather than just the common parts.

and also to have it happen at every exporting attempt?

Do you mean because we pick a different random every time, or because it's inside the loop? My first version actually picked the random only once and stored it as an instance variable, but at some point I was thinking that just calling random again once per second is fine and makes the change even more self-contained.

Copy link
Contributor

@AlexJF AlexJF Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Yeah most of my comments were related to sleeping on each flush but I guess I see your point. I don't have a strong opinion between keeping it as you have it here or having up to 63 second duration so lets go with whatever is easier to maintain.

I'd maybe suggest adding a variation of

Sleeping before starting the loop would extend the first profile reported to contain 60+slept time seconds (at most 63 seconds, in this case). I think that would be somewhat confusing for users so I'd prefer to avoid it

to the main if statement in this PR to remind the next person on why we have it there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 added note in ddd66e4 .

@ivoanjo ivoanjo merged commit 0aeb038 into master Dec 10, 2021
@ivoanjo ivoanjo deleted the ivoanjo/prof-4260-add-jitter-to-profile-flush branch December 10, 2021 10:48
@github-actions github-actions bot added this to the 0.55.0 milestone Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants