Skip to content

PROF-1037: Capture thread CPU time elapsed in scope#1358

Merged
jbachorik merged 18 commits into
masterfrom
jb/PROF-1307_Scope_event_cpu_time
Apr 17, 2020
Merged

PROF-1037: Capture thread CPU time elapsed in scope#1358
jbachorik merged 18 commits into
masterfrom
jb/PROF-1307_Scope_event_cpu_time

Conversation

@jbachorik
Copy link
Copy Markdown
Contributor

@jbachorik jbachorik commented Apr 9, 2020

Add thread CPU time elapsed while the scope was active to the Scope event.

Thread CPU time is comparatively cheap to obtain (~300ns per inocation) so we should be able to enhance the Scope event with the amount of CPU time spent in each recorded scope.

@jbachorik jbachorik changed the title Initial implementation of thread cpu time per scope PROF-1037: Capture thread CPU time elapsed in scope Apr 15, 2020
@jbachorik jbachorik marked this pull request as ready for review April 15, 2020 16:37
@jbachorik jbachorik requested a review from a team as a code owner April 15, 2020 16:37
Copy link
Copy Markdown
Contributor

@devinsba devinsba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. We might want to check that JFR is supported before making the initialize call to save a class load


@Label("Thread CPU Time")
@Timespan
private volatile long cpuTime = 0L;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is that scopes are locked to given thread - is there any point in this being volatile?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is guaranteed that the event is created and committed from the same thread then this does not need to be volatile. I just wasn't sure there was such a guarantee.

Comment thread dd-java-agent/agent-bootstrap/src/main/java/datadog/trace/bootstrap/Agent.java Outdated
Copy link
Copy Markdown
Contributor

@mar-kolya mar-kolya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be really great to have new code tested.

In addition to requests already mentioned in the comments I think it would be possible to have integration test for profiling improved to have some verification that we are getting reasonable data in new event.

@jbachorik
Copy link
Copy Markdown
Contributor Author

jbachorik commented Apr 16, 2020

@devinsba The check would be quite tricky without reworking the initialization logic in Agent/ProfilingAgent.

In the current implementation we first fire up ProfilingAgent initialization and only then we run (possibly asynchronous) JMX etc. initialization blocks.
The ThreadCpuTime.initialize() routine needs to run within JMX initialization block to make sure we are not triggering JMX initialization at wrong time. But we should not be attempting to touch JFR classes from Agent itself since all interaction with JFR should be done in ProfilingAgent.
However, ProfilingAgent has no idea about JMX etc. initialization. It is kind of chicken&egg problem.

I would really like to avoid major rewrite of initialization protocol for Agent in this PR. If there is a simple way to achieve the JFR check which would not possibly screw up JMX initialization I am not seeing, please advice.

EDIT: Actually, scratch this. I was able to come up with something simple with @mar-kolya help so we might be good to go regarding not loading this class unnecessarily.

@jbachorik jbachorik requested a review from a team as a code owner April 16, 2020 14:32
* Initialize the thread cpu time provider only if profiling is enabled.
* ATTENTION! If ever is the thread cpu time provider used outside of profiler this check should be revisited.
*/
if (Config.get().isProfilingEnabled()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not use Config here - and IIRC there are awkward implications for doing that. Could you please move this check into datadog.trace.common.util.ThreadCpuTimeAccess#enableJmx?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I thought the idea was to avoid loading that class at all :)

attribute = k
}
}
return attribute
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would def accessor = Attribute.attr("cpuTime", UnitLookup.TIMESPAN).getAccessor(scopeEventIterable.type) work instead of this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh ... Yes, it works.

Comment thread dd-trace-ot/src/main/java/datadog/trace/common/util/ThreadCpuTimeProvider.java Outdated
Comment thread dd-trace-ot/src/main/java/datadog/trace/common/util/ThreadCpuTimeAccess.java Outdated
@mar-kolya
Copy link
Copy Markdown
Contributor

LGTM assuming it builds

@devinsba devinsba self-requested a review April 17, 2020 15:56
@jbachorik jbachorik merged commit 185ad18 into master Apr 17, 2020
@jbachorik jbachorik deleted the jb/PROF-1307_Scope_event_cpu_time branch April 17, 2020 16:14
end();
if (shouldCommit()) {
if (cpuTime > 0) {
cpuTime = ThreadCpuTimeAccess.getCurrentThreadCpuTime() - cpuTime;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any risk of finish() being called more than once?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is called solely from ContinuableScope.close() so as long as that scope is not closed multiple times we are fine.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that if this happens things will not blow up but data we produce would be off. But arguably this is the best we can do in this situation anyway.

@tylerbenson tylerbenson added this to the 0.49.0 milestone Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants