Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-3425] Bootstrap profiling native extension #1584

Merged
merged 1 commit into from
Jul 13, 2021

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Jul 9, 2021

My current plan is to use the profiling native extension to:

  • Enable use of libddprof, a (native) shared library which allows datadog profilers to share common code.

    One of the main advantages of this library is that it will include its own profile encoding implementation, which will enable us to drop profiler's dependency on the google-protobuf gem.

    Right now, we need to tell customers to manually it when onboarding, see https://docs.datadoghq.com/tracing/profiler/enabling/?code-lang=ruby, which is annoying.

  • Call Ruby VM APIs that are otherwise unavailable or costly when called from Ruby code.

But in this commit, I decided to scale it way, way, way back to just the basics: add an empty native extension, and the
scaffolding to load and test it.

Thus, I hope that by releasing the next version of dd-trace-rb with the empty native extension we can to validate the approach, or otherwise root out corner cases that we may have missed.

Furthermore, I'll point out that if our plans of a "big gem" go forward, having this kind of non-trivial addition on the gem is supposed to be the norm, not the exception ;)


EVEN SO, because this is a non-trivial change, here's my notes on possible concerns, in Q&A form:

Q1: Will requiring customers to compile a native extension during gem install ddtrace cause issues?

A1: No, because of our dependencies. dd-trace-rb currently has two dependencies: ffi and msgpack. Both of those gems rely on native components, and neither of them (as of this writing) ships pre-compiled extensions (*except on Windows), as can be seen on rubygems.org:

This fortunate state of affairs means that customers already need to have a working setup for building native extensions, and so our addition of a native extension does not make it any harder for them to onboard.

Q2: Will this cause problems for Windows users?
A2: The ffi and msgpack gem ship precompiled binaries for Windows, so the reasoning from A1 doesn't apply to these users.

For Windows, it's possible that customers that previously were getting by without needing an environment to build Ruby native extensions will no longer be able to install dd-trace-rb.
But:

  • gem install rails on Windows pulls at least one native extension that needs to be compiled, so if you can't build dd-trace-rb, you can't install rails either
  • Recent versions of msgpack (since 1.4.2, from 2021-02-01) don't provide binaries either. This means that, out of the box, even before this change, gem install ddtrace fails on Windows if you don't have a build environment, because rubygems tries to download the latest version of msgpack. (Rather than picking an older version that ships a precompiled build.)

So my assertion is, I don't believe we'll have any customers that A) run on Windows and B) don't have a setup for building native extensions.

BUT, if this assertion turns out to be wrong, we have the option of doing the same thing that ffi and msgpack do: ship prebuilt versions of ddtrace for Windows users.

Q3: Should we provide precompiled versions of the gem right now instead?
A3: Precompiled versions of the gem introduce complexity into our release process (now we have several artifacts, that may need to be generated on multiple machines); it also may pose compatibility issues, given our Ruby 2.1 to 3.0 support Matrix.

So, given the fortunate state we're in (see A1), I think we should avoid them as much as possible.

Q4: Why write the extension in C and not Rust?
A4: The Ruby VM APIs and headers are written in C. They cannot be directly used from Rust (e.g. a few things are actually implemented as preprocessor macros), and thus, to write an extension using Rust, we'd need to rely on community-built bindings that translate the Ruby VM headers into Rust.

I've investigated the state of these bindings, and the only two that are still maintained are:

Unfortunately, there don't seem to be a lot of gems using them and support for older Rubies, 32-bit OSs and Windows seems spotty. So... not in a great state at the moment for our usage.

The second issue is that using Rust pushes us into needing to provide binary builds through rubygems.org -- we definitely can't assume that customers will have a working Rust compiler around.

We plan on implementing libddprof (the profiling shared library) using Rust, but because it doesn't need to talk to Ruby VM APIs (we'll wrap them with some C code in this profiling native extension), we don't need to worry about bindings, and also we get a bit more flexibility on binary builds, since we don't need to link to the Ruby VM from Rust code.

Q5: Can you use dd-trace-rb without the native extension?
A5: Yes...ish. The extension must get built during gem install, but we handle any issues that may happen while loading it.
So, if you're working on the gem, or the extension gets built but doesn't load properly, there should be no impact to the rest of the library; only the profiler will refuse to work.

Q6: Will this impact JRuby users?
A6: No. We'll skip trying to compile and load the native extension on JRuby. (Profiling is anyway not supported on JRuby).

@ivoanjo ivoanjo requested a review from a team July 9, 2021 11:12
Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general question: how do we move l know we are not leaking memory on native calls?
Is there anything like Valgrind or whatever the cool kids use theses days to assert the correctness of native code execution?

# This makes it easier for development (avoids "oops I forgot to rebuild when I switched my Ruby") and ensures that
# the wrong library is never loaded.
# When requiring, we need to use the exact same string, including the version and the platform.
create_makefile "ddtrace_profiling_native_extension.#{RUBY_VERSION}_#{RUBY_PLATFORM}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to also add the OS, given we use MacOS+Docker for development in the same shared folder?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RUBY_PLATFORM already includes the OS:

  • x86_64-darwin19 this is what I get on my dev machine
  • x86_64-linux-musl this is docker + alpine linux
  • x86_64-linux this is docker + regular linux

(Also the compiled library is going to have a .bundle extension on macos, while linux uses .so, so no collision there).

The one thing not included here is RUBY_ENGINE, which may or may not be relevant for TruffleRuby but... I'm not sure it's worth it.

@ivoanjo
Copy link
Member Author

ivoanjo commented Jul 12, 2021

As a general question: how do we move l know we are not leaking memory on native calls?
Is there anything like Valgrind or whatever the cool kids use theses days to assert the correctness of native code execution?

This is a great question!

Generically, the profiler team is looking into quite a few of these tools (not only for memory leaks) since we're doing quite a bit of native development. The plan is to use them to validate the shared libddprof library to make sure that we catch those issues. (And we also plan to use Rust to build libddprof, to make use of its great approach to resource management and make it harder for us to mess it up).

For this Ruby extension in particular, my plan is to not do any direct heap allocation. If we need to allocate, my plan would be to either:

  • ask Ruby for it (and thus the object would be managed by the Ruby GC, as usual)
  • have libddprof manage it internally (so that we don't need to manage it from the extension code)
  • allocate it on the stack (which would go away once a call to the native extension finishes) (Semi-related: this is the best source I've ever found about how the Ruby GC actually works, including how the native stack is scanned to find references).

For instance, in https://github.com/DataDog/dd-trace-rb/blob/ivoanjo/prototype-libddprof-integration/ext/ddtrace_native_extension/ddtrace.c#L14 I have a very rough prototype of using libddprof to report profiles (replacing the ruby-based implementation we have today). There's no heap allocation there -- we allocate a few things on the stack, and otherwise hand over the data to libddprof that either reports the data synchronously, or will need to do its own copy and manage it.

Of course, this is the current plan, so if/when later it changes, we will need to revisit testing and validation of the extension.

Let me know if of any thoughts/concerns :)

My current plan is to use the profiling native extension to:

* Enable use of libddprof, a (native) shared library which allows
  datadog profilers to share common code.

  One of the main advantages of this library is that it will include
  its own profile encoding implementation, which will enable us to
  drop profiler's dependency on the `google-protobuf` gem.
  Right now, we need to tell customers to manually it when onboarding,
  see <https://docs.datadoghq.com/tracing/profiler/enabling/?code-lang=ruby>,
  which is annoying.

* Call Ruby VM APIs that are otherwise unavailable or costly when
  called from Ruby code.

But in this commit, I decided to scale it way, way, way back to
just the basics: add an empty native extension, and the
scaffolding to load and test it.

Thus, I hope that by releasing the next version of dd-trace-rb
with the empty native extension we can to validate the approach,
or otherwise root out corner cases that we may have missed.

Furthermore, I'll point out that if our plans of a "big gem" go
forward, having this kind of non-trivial addition on the gem
is supposed to be the norm, not the exception ;)

---

EVEN SO, because this is a non-trivial change, here's my notes on
possible concerns, in Q&A form:

**Q1**: Will requiring customers to compile a native extension during
        `gem install ddtrace` cause issues?

**A1**: No, because of our dependencies. dd-trace-rb currently has
two dependencies: `ffi` and `msgpack`. Both of those gems rely on
native components, and neither of them (as of this writing) ships
pre-compiled extensions (*except on Windows), as can be seen on
rubygems.org:

* <https://rubygems.org/gems/ffi/versions>
* <https://rubygems.org/gems/msgpack/versions>

This fortunate state of affairs means that customers already need
to have a working setup for building native extensions, and so our
addition of a native extension does not make it any harder for them
to onboard.

**Q2**: Will this cause problems for Windows users?
**A2**: The `ffi` and `msgpack` gem ship precompiled binaries for
Windows, so the reasoning from A1 doesn't apply to these users.

For Windows, it's possible that customers that previously
were getting by without needing an environment to build Ruby native
extensions will no longer be able to install dd-trace-rb.
But:

* `gem install rails` on Windows pulls at least one native
  extension that needs to be compiled, so if you can't build
  dd-trace-rb, you can't install `rails` either
* Recent versions of `msgpack` (since 1.4.2, from 2021-02-01)
  don't provide binaries either. This means that, out of the
  box, even before this change, `gem install ddtrace` fails
  on Windows if you don't have a build environment, because
  rubygems tries to download the latest version of `msgpack`.
  (Rather than picking an older version that ships a precompiled
  build.)

So my assertion is, I don't believe we'll have any customers
that A) run on Windows and B) don't have a setup for building
native extensions.

BUT, if this assertion turns out to be wrong, we have the option
of doing the same thing that `ffi` and `msgpack` do: ship
prebuilt versions of `ddtrace` for Windows users.

**Q3**: Should we provide precompiled versions of the gem right now instead?
**A3**: Precompiled versions of the gem introduce complexity into
our release process (now we have several artifacts, that may
need to be generated on multiple machines); it also may pose
compatibility issues, given our Ruby 2.1 to 3.0 support Matrix.

So, given the fortunate state we're in (see A1), I think we should
avoid them as much as possible.

**Q4**: Why write the extension in C and not Rust?
**A4**: The Ruby VM APIs and headers are written in C. They cannot be
directly used from Rust (e.g. a few things are actually implemented
as preprocessor macros), and thus, to write an extension using Rust,
we'd need to rely on community-built bindings that translate the
Ruby VM headers into Rust.

I've investigated the state of these bindings, and the only two
that are still maintained are:

* https://crates.io/crates/rosy
* https://crates.io/crates/rutie

Unfortunately, there don't seem to be a lot of gems using them and
support for older Rubies, 32-bit OSs and Windows seems spotty.
So... not in a great state at the moment for our usage.

The second issue is that using Rust pushes us into needing to
provide binary builds through rubygems.org -- we definitely can't
assume that customers will have a working Rust compiler around.

We plan on implementing libddprof (the profiling shared library)
using Rust, but because it doesn't need to talk to Ruby VM APIs
(we'll wrap them with some C code in this profiling native extension),
we don't need to worry about bindings, and also we get a bit more
flexibility on binary builds, since we don't need to link to the
Ruby VM from Rust code.

**Q5**: Can you use dd-trace-rb without the native extension?
**A5**: Yes...ish. The extension must get built during `gem install`,
but we handle any issues that may happen while loading it.
So, if you're working on the gem, or the extension gets built
but doesn't load properly, there should be no impact to the rest
of the library; only the profiler will refuse to work.

**Q6**: Will this impact JRuby users?
**A6**: No. We'll skip trying to compile and load the native
extension on JRuby. (Profiling is anyway not supported on JRuby).
@ivoanjo ivoanjo force-pushed the ivoanjo/add-profiling-native-extension branch from 84947ca to 3fef0d3 Compare July 12, 2021 09:30
@ivoanjo
Copy link
Member Author

ivoanjo commented Jul 12, 2021

Note: Just did a quick rebase + force push to fix a trivial conflict with master.

@codecov-commenter
Copy link

Codecov Report

Merging #1584 (3fef0d3) into master (01eea6a) will increase coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1584      +/-   ##
==========================================
+ Coverage   98.22%   98.26%   +0.04%     
==========================================
  Files         862      896      +34     
  Lines       41704    42951    +1247     
==========================================
+ Hits        40962    42205    +1243     
- Misses        742      746       +4     
Impacted Files Coverage Δ
lib/ddtrace/contrib/rails/log_injection.rb 76.92% <0.00%> (-7.46%) ⬇️
lib/ddtrace/ext/environment.rb 93.33% <0.00%> (-6.67%) ⬇️
lib/ddtrace/profiling/scheduler.rb 90.90% <0.00%> (-6.66%) ⬇️
lib/ddtrace/workers/polling.rb 96.42% <0.00%> (-3.58%) ⬇️
lib/ddtrace/configuration/components.rb 98.19% <0.00%> (-1.81%) ⬇️
spec/support/synchronization_helpers.rb 83.33% <0.00%> (-0.54%) ⬇️
lib/ddtrace/contrib/qless/patcher.rb 94.11% <0.00%> (-0.33%) ⬇️
spec/ddtrace_integration_spec.rb 96.20% <0.00%> (-0.10%) ⬇️
spec/ddtrace/contrib/rails/support/rails3.rb 94.11% <0.00%> (-0.07%) ⬇️
lib/ddtrace/configuration/base.rb 97.61% <0.00%> (-0.06%) ⬇️
... and 174 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 55ffdfe...3fef0d3. Read the comment docs.

@marcotc
Copy link
Member

marcotc commented Jul 12, 2021

my plan is to not do any direct heap allocation

🙏. I still feel like having some sort of guarantee around this (an assertion in tests maybe?) would be good to have. Not something to lose sleep on at this moment.

Just like any other functional requirement, I just want to make sure that "don't leak memory" is something we have automated checks for in some way.

@ivoanjo
Copy link
Member Author

ivoanjo commented Jul 13, 2021

my plan is to not do any direct heap allocation

🙏. I still feel like having some sort of guarantee around this (an assertion in tests maybe?) would be good to have. Not something to lose sleep on at this moment.

Just like any other functional requirement, I just want to make sure that "don't leak memory" is something we have automated checks for in some way.

Definitely. As I start to use this, I'll need to up the testing game, to make sure that e.g. we don't break compilation on some platforms.

Also, just out of curiosity, the existing "stub" method I added follows the rules I laid above -- it allocates a few Ruby modules (if they don't exist already), but they are allocated by the VM in the Ruby managed heap:

  VALUE datadog_module = rb_define_module("Datadog");
  VALUE profiling_module = rb_define_module_under(datadog_module, "Profiling");
  VALUE native_extension_module = rb_define_module_under(profiling_module, "NativeExtension");

Thanks and merging away. Still really interested in feedback/issues :)

@ivoanjo ivoanjo merged commit a54750f into master Jul 13, 2021
@ivoanjo ivoanjo deleted the ivoanjo/add-profiling-native-extension branch July 13, 2021 08:30
@github-actions github-actions bot added this to the 0.52.0 milestone Jul 13, 2021
ivoanjo added a commit that referenced this pull request Jul 14, 2021
In #1584 I identified that users of dd-trace-rb on Linux/macOS
needed to have a working build environment to compile msgpack and ffi,
and so installing our gem would be no different.

I've since identified a possible corner case: in the reliability
environment, we use the GitLab omnibus distribution
(<https://docs.gitlab.com/omnibus/>), which already ships
pre-compiled versions of msgpack and ffi, but does not include a
compiler nor the Ruby headers required for compiling extensions.

Thus, up until now it was trivial to add dd-trace-rb (and profiling)
to these images, but the addition of the native extension complicates
things a lot.

Since the native extension is still in the "we're evaluating and
learning about the downsides/sharp edges of having it" phase, I
decided to add a `DD_PROFILING_NO_EXTENSION` environment variable,
that for now can be used to disable building and usage of the
extension.

Because the extension is currently empty, doing this is is harmless,
but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead
to the profiler being disabled as well, once the profiler code
starts requiring the native extension to be available. For that
reason, I documented this toggle as experimental, and added a
warning whenever the app is loaded with it.

For users reading this (if any!): if you needed to use this option,
please tell us why on
<https://github.com/DataDog/dd-trace-rb/issues/new> as we want to
acommodate your use cases.
ivoanjo added a commit that referenced this pull request Jul 14, 2021
In #1584 I identified that users of dd-trace-rb on Linux/macOS
needed to have a working build environment to compile msgpack and ffi,
and so installing our gem would be no different.

I've since identified a possible corner case: in the reliability
environment, we use the GitLab omnibus distribution
(<https://docs.gitlab.com/omnibus/>), which already ships
pre-compiled versions of msgpack and ffi, but does not include a
compiler nor the Ruby headers required for compiling extensions.

Thus, up until now it was trivial to add dd-trace-rb (and profiling)
to these images, but the addition of the native extension complicates
things a lot.

Since the native extension is still in the "we're evaluating and
learning about the downsides/sharp edges of having it" phase, I
decided to add a `DD_PROFILING_NO_EXTENSION` environment variable,
that for now can be used to disable building and usage of the
extension.

Because the extension is currently empty, doing this is is harmless,
but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead
to the profiler being disabled as well, once the profiler code
starts requiring the native extension to be available. For that
reason, I documented this toggle as experimental, and added a
warning whenever the app is loaded with it.

For users reading this (if any!): if you needed to use this option,
please tell us why on
<https://github.com/DataDog/dd-trace-rb/issues/new> as we want to
acommodate your use cases.
ivoanjo added a commit that referenced this pull request Jul 14, 2021
In #1584 I identified that users of dd-trace-rb on Linux/macOS
needed to have a working build environment to compile msgpack and ffi,
and so installing our gem would be no different.

I've since identified a possible corner case: in the reliability
environment, we use the GitLab omnibus distribution
(<https://docs.gitlab.com/omnibus/>), which already ships
pre-compiled versions of msgpack and ffi, but does not include a
compiler nor the Ruby headers required for compiling extensions.

Thus, up until now it was trivial to add dd-trace-rb (and profiling)
to these images, but the addition of the native extension complicates
things a lot.

Since the native extension is still in the "we're evaluating and
learning about the downsides/sharp edges of having it" phase, I
decided to add a `DD_PROFILING_NO_EXTENSION` environment variable,
that for now can be used to disable building and usage of the
extension.

Because the extension is currently empty, doing this is is harmless,
but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead
to the profiler being disabled as well, once the profiler code
starts requiring the native extension to be available. For that
reason, I documented this toggle as experimental, and added a
warning whenever the app is loaded with it.

For users reading this (if any!): if you needed to use this option,
please tell us why on
<https://github.com/DataDog/dd-trace-rb/issues/new> as we want to
acommodate your use cases.
ivoanjo added a commit that referenced this pull request Jul 14, 2021
In #1584 I identified that users of dd-trace-rb on Linux/macOS
needed to have a working build environment to compile msgpack and ffi,
and so installing our gem would be no different.

I've since identified a possible corner case: in the reliability
environment, we use the GitLab omnibus distribution
(<https://docs.gitlab.com/omnibus/>), which already ships
pre-compiled versions of msgpack and ffi, but does not include a
compiler nor the Ruby headers required for compiling extensions.

Thus, up until now it was trivial to add dd-trace-rb (and profiling)
to these images, but the addition of the native extension complicates
things a lot.

Since the native extension is still in the "we're evaluating and
learning about the downsides/sharp edges of having it" phase, I
decided to add a `DD_PROFILING_NO_EXTENSION` environment variable,
that for now can be used to disable building and usage of the
extension.

Because the extension is currently empty, doing this is is harmless,
but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead
to the profiler being disabled as well, once the profiler code
starts requiring the native extension to be available. For that
reason, I documented this toggle as experimental, and added a
warning whenever the app is loaded with it.

For users reading this (if any!): if you needed to use this option,
please tell us why on
<https://github.com/DataDog/dd-trace-rb/issues/new> as we want to
acommodate your use cases.
ivoanjo added a commit that referenced this pull request Jul 14, 2021
In #1584 I identified that users of dd-trace-rb on Linux/macOS
needed to have a working build environment to compile msgpack and ffi,
and so installing our gem would be no different.

I've since identified a possible corner case: in the reliability
environment, we use the GitLab omnibus distribution
(<https://docs.gitlab.com/omnibus/>), which already ships
pre-compiled versions of msgpack and ffi, but does not include a
compiler nor the Ruby headers required for compiling extensions.

Thus, up until now it was trivial to add dd-trace-rb (and profiling)
to these images, but the addition of the native extension complicates
things a lot.

Since the native extension is still in the "we're evaluating and
learning about the downsides/sharp edges of having it" phase, I
decided to add a `DD_PROFILING_NO_EXTENSION` environment variable,
that for now can be used to disable building and usage of the
extension.

Because the extension is currently empty, doing this is is harmless,
but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead
to the profiler being disabled as well, once the profiler code
starts requiring the native extension to be available. For that
reason, I documented this toggle as experimental, and added a
warning whenever the app is loaded with it.

For users reading this (if any!): if you needed to use this option,
please tell us why on
<https://github.com/DataDog/dd-trace-rb/issues/new> as we want to
acommodate your use cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants