Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(profiling) move upscaling to libdatadog #1984

Merged
merged 12 commits into from
Apr 18, 2023

Conversation

realFlowControl
Copy link
Collaborator

@realFlowControl realFlowControl commented Mar 21, 2023

Description

This will remove upscaling for alloc-size and alloc-samples profile types from the library and make use of the upscaling capabilities in libdatadog.
Doing this solves the rounding issues with the current implementation due to float -> integer conversion, therefore allocation profiles will be more correct and it offloads upscaling from the main thread to the serialization step.
Also bumps libdatadog to 2.1.0

Readiness checklist

  • (only for Members) Changelog has been added to the release document.
  • Tests added for this feature/bug.

Reviewer checklist

  • Appropriate labels assigned.
  • Milestone is set.
  • Changelog has been added to the release document. For community contributors the reviewer is in charge of this task.

This will remove upscaling for `alloc-size` and `alloc-samples` profile
types from the library and make use of the upscaling capabilities in
`libdatadog`.
Doing this solves the rounding issues with the current implementation
due to float -> integer conversion, therefore allocation profiles will
be more correct and it offloads upscaling from the main thread to the
serialization step.
@realFlowControl realFlowControl requested a review from a team as a code owner March 21, 2023 14:07
@realFlowControl realFlowControl marked this pull request as draft March 21, 2023 14:07
@realFlowControl realFlowControl added 🏆 enhancement A new feature or improvement profiling Relates to the Continuous Profiler labels Mar 21, 2023
@realFlowControl realFlowControl added this to the 0.87.0 milestone Mar 28, 2023
@realFlowControl realFlowControl marked this pull request as ready for review March 28, 2023 17:27
@realFlowControl realFlowControl self-assigned this Apr 4, 2023
Copy link
Collaborator

@morrisonlevi morrisonlevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an understanding of how this changes performance?

@realFlowControl
Copy link
Collaborator Author

Did a benchmark on CLI using a script that does a lot of allocations (therefor a lot of samples in allocation profiling)

$ hyperfine --runs 20 --warmup 2 "php -d extension=datadog-profiling-old.dylib test.php" "php -d extension=datadog-profiling.dylib test.php" 
Benchmark 1: php -d extension=datadog-profiling-old.dylib test.php
  Time (mean ± σ):     420.1 ms ±   5.1 ms    [User: 461.9 ms, System: 29.1 ms]
  Range (min … max):   416.6 ms … 436.3 ms    20 runs

Benchmark 2: php -d extension=datadog-profiling.dylib test.php
  Time (mean ± σ):     416.2 ms ±   2.4 ms    [User: 460.5 ms, System: 26.3 ms]
  Range (min … max):   414.7 ms … 425.5 ms    20 runs

Summary
  'php -d extension=datadog-profiling.dylib test.php' ran
    1.01 ± 0.01 times faster than 'php -d extension=datadog-profiling-old.dylib test.php'

The -old-version is with upscaling in the library, the other version is with upscaling in libdatadog. This is tuned to write the pprof file to disk and not send it to an agent, but this involves also the profiler startup and shutdown time, which includes also the serialisation (and therefor upscaling) in both cases.
I'll try and make another check with siege against a webserver to see those numbers as well.

Depending on the machine the test runs on, it could be possible that we
either take a sample or not. We don't care for this test, but we
actually tested that we do not take a sample, which might fail this
test in case we do
@realFlowControl
Copy link
Collaborator Author

The old version with upscaling in the library:

Transactions:		         58 hits
Availability:		     100.00 %
Elapsed time:		      10.16 secs
Data transferred:	       1.50 MB
Response time:		       0.17 secs
Transaction rate:	       5.71 trans/sec
Throughput:		       0.15 MB/sec
Concurrency:		       0.99
Successful transactions:          58
Failed transactions:	          0
Longest transaction:	       0.31
Shortest transaction:	       0.14

The new version with upscaling in libdatadog

Transactions:		         63 hits
Availability:		     100.00 %
Elapsed time:		      10.87 secs
Data transferred:	       1.63 MB
Response time:		       0.17 secs
Transaction rate:	       5.80 trans/sec
Throughput:		       0.15 MB/sec
Concurrency:		       0.99
Successful transactions:          63
Failed transactions:	          0
Longest transaction:	       0.30
Shortest transaction:	       0.14

The difference in the siege benchmarks is pretty slim.

Note in the benchmarks above that the "new" version was running 0.71 seconds longer, which in worst case allows for 2 additional requests, in best case for 5 additional requests. I ran these tests multiple times and they where all close, but "leaning" towards the "new" version performing slightly better.

Copy link
Collaborator

@morrisonlevi morrisonlevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you think the upscaled numbers we are seeing make sense, then I'm good to merge this!

@realFlowControl realFlowControl merged commit 3f26e56 into master Apr 18, 2023
@realFlowControl realFlowControl deleted the florian/libdatadog-upscaling branch April 18, 2023 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏆 enhancement A new feature or improvement profiling Relates to the Continuous Profiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants