Go Execution Traces: Collection Frequency #2099

Open
felixge opened this issue Jul 6, 2023 · 3 comments

@felixge
Member

felixge commented Jul 6, 2023

This issue is used to gather feedback on the data collection frequency of the new execution tracer (aka code hotspots timeline) feature of the profiler during the beta.

By default, a trace is collected every 15 minutes. If you'd like to collect this data more frequently, please upvote this issue and comment with more details on your use case.

During the beta you can change this frequency as shown below:

os.Setenv("DD_PROFILING_EXECUTION_TRACE_PERIOD", "15m")
felixge self-assigned this Jul 6, 2023
@pmaseberg

I enabled this feature (which is awesome), and with the default of 15m it "seemed" like it did not work. It is possible that I was not patient enough. I ended up setting it to 1 minute. While I was troubleshooting a slow network issue in real time, I still had to wait longer than I would like to see the data load. Anyway, all that to say that in a troubleshooting setting, 15m seems way too long.

What is the cost of setting this to a lower value?

@nsrip-dd
Contributor

nsrip-dd commented Aug 2, 2023

Hi @pmaseberg, thanks for your feedback! I have a few questions:

  1. Just to double check: the default period as of dd-trace-go version 1.52 was 90 minutes if you only set DD_PROFILING_EXECUTION_TRACE_ENABLED=true and nothing else. The default will be 15 minutes starting from the just-released 1.53. Did you set the period to 15 minutes first, or did you enable the feature with no other change and then switch to 1 minute? If it was the latter, that might explain why you didn't see data as soon as you expected.
  2. Was the slow network issue you were investigating temporary, or persistent? i.e. did it only appear for a few seconds/minutes, or did it last longer or reoccur frequently?

To answer your question, here's the overhead we expect from a single execution trace collection:

  • Up to 1-2 CPU-seconds
  • Up to 1-2 seconds of cumulative latency impact (execution tracing adds a little latency each time a goroutine starts running, blocks, gets preempted, etc.)
  • A stop-the-world pause that scales with the number of goroutines. The pause should be similar in length to a typical GC stop-the-world pause.
  • Up to ~5MiB of bandwidth to upload the trace, due to our default execution trace size limit of 5MiB.

So, setting the period to a smaller value will result in more traces, and more of all the above overhead. The CPU and latency impact should be reduced significantly with Go 1.21, which will be released soon, but you'll still see the pause and the bandwidth usage increase. The bandwidth in particular is something to look out for depending on what you normally pay for data transfer.

@pmaseberg

@nsrip-dd

  1. I was using 1.52 and thought the default was 15 minutes. I then set DD_PROFILING_EXECUTION_TRACE_PERIOD to 1m afterward. Now it makes sense why I was not seeing data!
  2. The slow networking issues have been ongoing. We have been working to track them down and pinpoint the cause. Seeing the network time in the Datadog logs that link to a given request has been invaluable in this process!

Thanks for the info! I just read about Go 1.21 yesterday. We are going to wait for that before trying to enable this in prod, just to be safe, and will decide on the period then.
