-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-3425] Bootstrap profiling native extension #1584
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general question: how do we move l know we are not leaking memory on native calls?
Is there anything like Valgrind or whatever the cool kids use theses days to assert the correctness of native code execution?
# This makes it easier for development (avoids "oops I forgot to rebuild when I switched my Ruby") and ensures that | ||
# the wrong library is never loaded. | ||
# When requiring, we need to use the exact same string, including the version and the platform. | ||
create_makefile "ddtrace_profiling_native_extension.#{RUBY_VERSION}_#{RUBY_PLATFORM}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it makes sense to also add the OS, given we use MacOS+Docker for development in the same shared folder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUBY_PLATFORM
already includes the OS:
x86_64-darwin19
this is what I get on my dev machinex86_64-linux-musl
this is docker + alpine linuxx86_64-linux
this is docker + regular linux
(Also the compiled library is going to have a .bundle
extension on macos, while linux uses .so
, so no collision there).
The one thing not included here is RUBY_ENGINE
, which may or may not be relevant for TruffleRuby but... I'm not sure it's worth it.
This is a great question! Generically, the profiler team is looking into quite a few of these tools (not only for memory leaks) since we're doing quite a bit of native development. The plan is to use them to validate the shared libddprof library to make sure that we catch those issues. (And we also plan to use Rust to build libddprof, to make use of its great approach to resource management and make it harder for us to mess it up). For this Ruby extension in particular, my plan is to not do any direct heap allocation. If we need to allocate, my plan would be to either:
For instance, in https://github.com/DataDog/dd-trace-rb/blob/ivoanjo/prototype-libddprof-integration/ext/ddtrace_native_extension/ddtrace.c#L14 I have a very rough prototype of using libddprof to report profiles (replacing the ruby-based implementation we have today). There's no heap allocation there -- we allocate a few things on the stack, and otherwise hand over the data to libddprof that either reports the data synchronously, or will need to do its own copy and manage it. Of course, this is the current plan, so if/when later it changes, we will need to revisit testing and validation of the extension. Let me know if of any thoughts/concerns :) |
My current plan is to use the profiling native extension to: * Enable use of libddprof, a (native) shared library which allows datadog profilers to share common code. One of the main advantages of this library is that it will include its own profile encoding implementation, which will enable us to drop profiler's dependency on the `google-protobuf` gem. Right now, we need to tell customers to manually it when onboarding, see <https://docs.datadoghq.com/tracing/profiler/enabling/?code-lang=ruby>, which is annoying. * Call Ruby VM APIs that are otherwise unavailable or costly when called from Ruby code. But in this commit, I decided to scale it way, way, way back to just the basics: add an empty native extension, and the scaffolding to load and test it. Thus, I hope that by releasing the next version of dd-trace-rb with the empty native extension we can to validate the approach, or otherwise root out corner cases that we may have missed. Furthermore, I'll point out that if our plans of a "big gem" go forward, having this kind of non-trivial addition on the gem is supposed to be the norm, not the exception ;) --- EVEN SO, because this is a non-trivial change, here's my notes on possible concerns, in Q&A form: **Q1**: Will requiring customers to compile a native extension during `gem install ddtrace` cause issues? **A1**: No, because of our dependencies. dd-trace-rb currently has two dependencies: `ffi` and `msgpack`. Both of those gems rely on native components, and neither of them (as of this writing) ships pre-compiled extensions (*except on Windows), as can be seen on rubygems.org: * <https://rubygems.org/gems/ffi/versions> * <https://rubygems.org/gems/msgpack/versions> This fortunate state of affairs means that customers already need to have a working setup for building native extensions, and so our addition of a native extension does not make it any harder for them to onboard. **Q2**: Will this cause problems for Windows users? **A2**: The `ffi` and `msgpack` gem ship precompiled binaries for Windows, so the reasoning from A1 doesn't apply to these users. For Windows, it's possible that customers that previously were getting by without needing an environment to build Ruby native extensions will no longer be able to install dd-trace-rb. But: * `gem install rails` on Windows pulls at least one native extension that needs to be compiled, so if you can't build dd-trace-rb, you can't install `rails` either * Recent versions of `msgpack` (since 1.4.2, from 2021-02-01) don't provide binaries either. This means that, out of the box, even before this change, `gem install ddtrace` fails on Windows if you don't have a build environment, because rubygems tries to download the latest version of `msgpack`. (Rather than picking an older version that ships a precompiled build.) So my assertion is, I don't believe we'll have any customers that A) run on Windows and B) don't have a setup for building native extensions. BUT, if this assertion turns out to be wrong, we have the option of doing the same thing that `ffi` and `msgpack` do: ship prebuilt versions of `ddtrace` for Windows users. **Q3**: Should we provide precompiled versions of the gem right now instead? **A3**: Precompiled versions of the gem introduce complexity into our release process (now we have several artifacts, that may need to be generated on multiple machines); it also may pose compatibility issues, given our Ruby 2.1 to 3.0 support Matrix. So, given the fortunate state we're in (see A1), I think we should avoid them as much as possible. **Q4**: Why write the extension in C and not Rust? **A4**: The Ruby VM APIs and headers are written in C. They cannot be directly used from Rust (e.g. a few things are actually implemented as preprocessor macros), and thus, to write an extension using Rust, we'd need to rely on community-built bindings that translate the Ruby VM headers into Rust. I've investigated the state of these bindings, and the only two that are still maintained are: * https://crates.io/crates/rosy * https://crates.io/crates/rutie Unfortunately, there don't seem to be a lot of gems using them and support for older Rubies, 32-bit OSs and Windows seems spotty. So... not in a great state at the moment for our usage. The second issue is that using Rust pushes us into needing to provide binary builds through rubygems.org -- we definitely can't assume that customers will have a working Rust compiler around. We plan on implementing libddprof (the profiling shared library) using Rust, but because it doesn't need to talk to Ruby VM APIs (we'll wrap them with some C code in this profiling native extension), we don't need to worry about bindings, and also we get a bit more flexibility on binary builds, since we don't need to link to the Ruby VM from Rust code. **Q5**: Can you use dd-trace-rb without the native extension? **A5**: Yes...ish. The extension must get built during `gem install`, but we handle any issues that may happen while loading it. So, if you're working on the gem, or the extension gets built but doesn't load properly, there should be no impact to the rest of the library; only the profiler will refuse to work. **Q6**: Will this impact JRuby users? **A6**: No. We'll skip trying to compile and load the native extension on JRuby. (Profiling is anyway not supported on JRuby).
84947ca
to
3fef0d3
Compare
Note: Just did a quick rebase + force push to fix a trivial conflict with master. |
Codecov Report
@@ Coverage Diff @@
## master #1584 +/- ##
==========================================
+ Coverage 98.22% 98.26% +0.04%
==========================================
Files 862 896 +34
Lines 41704 42951 +1247
==========================================
+ Hits 40962 42205 +1243
- Misses 742 746 +4
Continue to review full report at Codecov.
|
🙏. I still feel like having some sort of guarantee around this (an assertion in tests maybe?) would be good to have. Not something to lose sleep on at this moment. Just like any other functional requirement, I just want to make sure that "don't leak memory" is something we have automated checks for in some way. |
Definitely. As I start to use this, I'll need to up the testing game, to make sure that e.g. we don't break compilation on some platforms. Also, just out of curiosity, the existing "stub" method I added follows the rules I laid above -- it allocates a few Ruby modules (if they don't exist already), but they are allocated by the VM in the Ruby managed heap: VALUE datadog_module = rb_define_module("Datadog");
VALUE profiling_module = rb_define_module_under(datadog_module, "Profiling");
VALUE native_extension_module = rb_define_module_under(profiling_module, "NativeExtension"); Thanks and merging away. Still really interested in feedback/issues :) |
In #1584 I identified that users of dd-trace-rb on Linux/macOS needed to have a working build environment to compile msgpack and ffi, and so installing our gem would be no different. I've since identified a possible corner case: in the reliability environment, we use the GitLab omnibus distribution (<https://docs.gitlab.com/omnibus/>), which already ships pre-compiled versions of msgpack and ffi, but does not include a compiler nor the Ruby headers required for compiling extensions. Thus, up until now it was trivial to add dd-trace-rb (and profiling) to these images, but the addition of the native extension complicates things a lot. Since the native extension is still in the "we're evaluating and learning about the downsides/sharp edges of having it" phase, I decided to add a `DD_PROFILING_NO_EXTENSION` environment variable, that for now can be used to disable building and usage of the extension. Because the extension is currently empty, doing this is is harmless, but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead to the profiler being disabled as well, once the profiler code starts requiring the native extension to be available. For that reason, I documented this toggle as experimental, and added a warning whenever the app is loaded with it. For users reading this (if any!): if you needed to use this option, please tell us why on <https://github.com/DataDog/dd-trace-rb/issues/new> as we want to acommodate your use cases.
In #1584 I identified that users of dd-trace-rb on Linux/macOS needed to have a working build environment to compile msgpack and ffi, and so installing our gem would be no different. I've since identified a possible corner case: in the reliability environment, we use the GitLab omnibus distribution (<https://docs.gitlab.com/omnibus/>), which already ships pre-compiled versions of msgpack and ffi, but does not include a compiler nor the Ruby headers required for compiling extensions. Thus, up until now it was trivial to add dd-trace-rb (and profiling) to these images, but the addition of the native extension complicates things a lot. Since the native extension is still in the "we're evaluating and learning about the downsides/sharp edges of having it" phase, I decided to add a `DD_PROFILING_NO_EXTENSION` environment variable, that for now can be used to disable building and usage of the extension. Because the extension is currently empty, doing this is is harmless, but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead to the profiler being disabled as well, once the profiler code starts requiring the native extension to be available. For that reason, I documented this toggle as experimental, and added a warning whenever the app is loaded with it. For users reading this (if any!): if you needed to use this option, please tell us why on <https://github.com/DataDog/dd-trace-rb/issues/new> as we want to acommodate your use cases.
In #1584 I identified that users of dd-trace-rb on Linux/macOS needed to have a working build environment to compile msgpack and ffi, and so installing our gem would be no different. I've since identified a possible corner case: in the reliability environment, we use the GitLab omnibus distribution (<https://docs.gitlab.com/omnibus/>), which already ships pre-compiled versions of msgpack and ffi, but does not include a compiler nor the Ruby headers required for compiling extensions. Thus, up until now it was trivial to add dd-trace-rb (and profiling) to these images, but the addition of the native extension complicates things a lot. Since the native extension is still in the "we're evaluating and learning about the downsides/sharp edges of having it" phase, I decided to add a `DD_PROFILING_NO_EXTENSION` environment variable, that for now can be used to disable building and usage of the extension. Because the extension is currently empty, doing this is is harmless, but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead to the profiler being disabled as well, once the profiler code starts requiring the native extension to be available. For that reason, I documented this toggle as experimental, and added a warning whenever the app is loaded with it. For users reading this (if any!): if you needed to use this option, please tell us why on <https://github.com/DataDog/dd-trace-rb/issues/new> as we want to acommodate your use cases.
In #1584 I identified that users of dd-trace-rb on Linux/macOS needed to have a working build environment to compile msgpack and ffi, and so installing our gem would be no different. I've since identified a possible corner case: in the reliability environment, we use the GitLab omnibus distribution (<https://docs.gitlab.com/omnibus/>), which already ships pre-compiled versions of msgpack and ffi, but does not include a compiler nor the Ruby headers required for compiling extensions. Thus, up until now it was trivial to add dd-trace-rb (and profiling) to these images, but the addition of the native extension complicates things a lot. Since the native extension is still in the "we're evaluating and learning about the downsides/sharp edges of having it" phase, I decided to add a `DD_PROFILING_NO_EXTENSION` environment variable, that for now can be used to disable building and usage of the extension. Because the extension is currently empty, doing this is is harmless, but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead to the profiler being disabled as well, once the profiler code starts requiring the native extension to be available. For that reason, I documented this toggle as experimental, and added a warning whenever the app is loaded with it. For users reading this (if any!): if you needed to use this option, please tell us why on <https://github.com/DataDog/dd-trace-rb/issues/new> as we want to acommodate your use cases.
In #1584 I identified that users of dd-trace-rb on Linux/macOS needed to have a working build environment to compile msgpack and ffi, and so installing our gem would be no different. I've since identified a possible corner case: in the reliability environment, we use the GitLab omnibus distribution (<https://docs.gitlab.com/omnibus/>), which already ships pre-compiled versions of msgpack and ffi, but does not include a compiler nor the Ruby headers required for compiling extensions. Thus, up until now it was trivial to add dd-trace-rb (and profiling) to these images, but the addition of the native extension complicates things a lot. Since the native extension is still in the "we're evaluating and learning about the downsides/sharp edges of having it" phase, I decided to add a `DD_PROFILING_NO_EXTENSION` environment variable, that for now can be used to disable building and usage of the extension. Because the extension is currently empty, doing this is is harmless, but in the future, the use of `DD_PROFILING_NO_EXTENSION` will lead to the profiler being disabled as well, once the profiler code starts requiring the native extension to be available. For that reason, I documented this toggle as experimental, and added a warning whenever the app is loaded with it. For users reading this (if any!): if you needed to use this option, please tell us why on <https://github.com/DataDog/dd-trace-rb/issues/new> as we want to acommodate your use cases.
My current plan is to use the profiling native extension to:
Enable use of libddprof, a (native) shared library which allows datadog profilers to share common code.
One of the main advantages of this library is that it will include its own profile encoding implementation, which will enable us to drop profiler's dependency on the
google-protobuf
gem.Right now, we need to tell customers to manually it when onboarding, see https://docs.datadoghq.com/tracing/profiler/enabling/?code-lang=ruby, which is annoying.
Call Ruby VM APIs that are otherwise unavailable or costly when called from Ruby code.
But in this commit, I decided to scale it way, way, way back to just the basics: add an empty native extension, and the
scaffolding to load and test it.
Thus, I hope that by releasing the next version of dd-trace-rb with the empty native extension we can to validate the approach, or otherwise root out corner cases that we may have missed.
Furthermore, I'll point out that if our plans of a "big gem" go forward, having this kind of non-trivial addition on the gem is supposed to be the norm, not the exception ;)
EVEN SO, because this is a non-trivial change, here's my notes on possible concerns, in Q&A form:
Q1: Will requiring customers to compile a native extension during
gem install ddtrace
cause issues?A1: No, because of our dependencies. dd-trace-rb currently has two dependencies:
ffi
andmsgpack
. Both of those gems rely on native components, and neither of them (as of this writing) ships pre-compiled extensions (*except on Windows), as can be seen on rubygems.org:This fortunate state of affairs means that customers already need to have a working setup for building native extensions, and so our addition of a native extension does not make it any harder for them to onboard.
Q2: Will this cause problems for Windows users?
A2: The
ffi
andmsgpack
gem ship precompiled binaries for Windows, so the reasoning from A1 doesn't apply to these users.For Windows, it's possible that customers that previously were getting by without needing an environment to build Ruby native extensions will no longer be able to install dd-trace-rb.
But:
gem install rails
on Windows pulls at least one native extension that needs to be compiled, so if you can't build dd-trace-rb, you can't installrails
eithermsgpack
(since 1.4.2, from 2021-02-01) don't provide binaries either. This means that, out of the box, even before this change,gem install ddtrace
fails on Windows if you don't have a build environment, because rubygems tries to download the latest version ofmsgpack
. (Rather than picking an older version that ships a precompiled build.)So my assertion is, I don't believe we'll have any customers that A) run on Windows and B) don't have a setup for building native extensions.
BUT, if this assertion turns out to be wrong, we have the option of doing the same thing that
ffi
andmsgpack
do: ship prebuilt versions ofddtrace
for Windows users.Q3: Should we provide precompiled versions of the gem right now instead?
A3: Precompiled versions of the gem introduce complexity into our release process (now we have several artifacts, that may need to be generated on multiple machines); it also may pose compatibility issues, given our Ruby 2.1 to 3.0 support Matrix.
So, given the fortunate state we're in (see A1), I think we should avoid them as much as possible.
Q4: Why write the extension in C and not Rust?
A4: The Ruby VM APIs and headers are written in C. They cannot be directly used from Rust (e.g. a few things are actually implemented as preprocessor macros), and thus, to write an extension using Rust, we'd need to rely on community-built bindings that translate the Ruby VM headers into Rust.
I've investigated the state of these bindings, and the only two that are still maintained are:
Unfortunately, there don't seem to be a lot of gems using them and support for older Rubies, 32-bit OSs and Windows seems spotty. So... not in a great state at the moment for our usage.
The second issue is that using Rust pushes us into needing to provide binary builds through rubygems.org -- we definitely can't assume that customers will have a working Rust compiler around.
We plan on implementing libddprof (the profiling shared library) using Rust, but because it doesn't need to talk to Ruby VM APIs (we'll wrap them with some C code in this profiling native extension), we don't need to worry about bindings, and also we get a bit more flexibility on binary builds, since we don't need to link to the Ruby VM from Rust code.
Q5: Can you use dd-trace-rb without the native extension?
A5: Yes...ish. The extension must get built during
gem install
, but we handle any issues that may happen while loading it.So, if you're working on the gem, or the extension gets built but doesn't load properly, there should be no impact to the rest of the library; only the profiler will refuse to work.
Q6: Will this impact JRuby users?
A6: No. We'll skip trying to compile and load the native extension on JRuby. (Profiling is anyway not supported on JRuby).