Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SCP-184] Fix incompatible library check when there's no libruby.so #2250

Merged
merged 1 commit into from
Aug 31, 2022

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Aug 31, 2022

What does this PR do?:

Add a missing check for a NULL result of dlsym that happens when Ruby is not built with libruby.so.

Motivation:

The logic in ddtrace_profiling_loader.c follows Ruby's dln.c; see the big comment at the top of the former for why it needs to exist.

The incompatible_library sanity check exists to try to catch situations where the library being loaded is linked to a different libruby.so (or equivalent in other OSs) than the one that is doing the loading.

BUT when we implemented incompatible_library we missed an check that upstream Ruby does.

From https://github.com/ruby/ruby/blob/v3_1_2/dln.c#L295:

static bool
dln_incompatible_library_p(void *handle)
{
    void *ex = dlsym(handle, EXTERNAL_PREFIX"ruby_xmalloc");
    void *const fp = (void *)ruby_xmalloc;
    return ex && ex != fp; // ⬅️ HERE!
}

Ruby actually checks that the symbol found is not NULL (e.g. with the ex && check). This is important, because there's a valid situation in which the symbol is expected to be NULL: when Ruby is built with --disable-shared at compilation time.

What --disable-shared does is that it builds the Ruby VM logic inside the ruby executable itself INSTEAD OF splitting the logic between the ruby executable and the libruby.so shared library.

To properly support this configuration, we should, like upstream Ruby does, consider a library compatible if the symbol lookup returns NULL.

Additional Notes:

Here's a few ways of telling if there's a libruby:

  • RbConfig::CONFIG['configure_args'] will contain --disable-shared
  • RbConfig::CONFIG['LIBRUBY'] will be libruby-static.a instead of, for instance libruby.so.3.1.1
  • Ruby will not be linked to libruby:
    # with libruby
    $ ldd `which ruby` | grep libruby
      libruby.so.3.1 => /usr/local/lib/libruby.so.3.1 (0x00007fb37e260000)
    
    # without libruby
    $ ldd `which ruby` | grep libruby
    (no output)
    

How to test the change?:

I couldn't find a prebuilt docker image that had Ruby compiled with --disable-shared.

To reproduce the issue, I downloaded the latest Ruby version, configured it with ./configure --disable-shared, compiled and installed it, and then installed dd-trace-rb, and tried to start the profiler.

Without this fix, profiling did not work, I would get:

W, [2022-08-31T10:41:11.727931 #67946]  WARN -- ddtrace: [ddtrace] Profiling was requested
but is not supported, profiling disabled: There was an error loading the profiling native
extension due to 'RuntimeError Failure to load 
ddtrace_profiling_native_extension.3.1.2_x86_64-linux due to library was compiled and linked 
to a different Ruby version' at
'/working/lib/datadog/profiling/load_native_extension.rb:22:in `<top (required)>''

With the fix, the profiler starts up successfully.

The logic in `ddtrace_profiling_loader.c` follows Ruby's `dln.c`;
see the big comment at the top of the former for why it needs to exist.

The `incompatible_library` sanity check exists to try to catch
situations where the library being loaded is linked to a different
`libruby.so` (or equivalent in other OSs) than the one that is doing
the loading.

BUT when we implemented `incompatible_library` we missed an check
that upstream Ruby does.

From https://github.com/ruby/ruby/blob/v3_1_2/dln.c#L295:

```c
static bool
dln_incompatible_library_p(void *handle)
{
    void *ex = dlsym(handle, EXTERNAL_PREFIX"ruby_xmalloc");
    void *const fp = (void *)ruby_xmalloc;
    return ex && ex != fp;
}
```

Ruby actually checks that the symbol found is **not NULL**
(e.g. with the `ex && ` check). This is important, because there's
a valid situation in which the symbol is expected to be NULL:
when Ruby is built with `--disable-shared` at compilation time.

What `--disable-shared` does is that it builds the Ruby VM logic
inside the `ruby` executable itself INSTEAD OF splitting the logic
between the `ruby` executable and the `libruby.so` shared library.

To properly support this configuration, we should, like upstream
Ruby does, consider a library compatible if the symbol lookup
returns `NULL`.

Extra note -- here's a few ways of telling if there's a `libruby`:

* `RbConfig::CONFIG['configure_args']` will contain `--disable-shared`
* `RbConfig::CONFIG['LIBRUBY']` will be `libruby-static.a` instead
   of, for instance `libruby.so.3.1.1`
* Ruby will not be linked to libruby:
  ```
  # with libruby
  $ ldd `which ruby` | grep libruby
    libruby.so.3.1 => /usr/local/lib/libruby.so.3.1 (0x00007fb37e260000)

  # without libruby
  $ ldd `which ruby` | grep libruby
  (no output)
  ```
@ivoanjo ivoanjo requested a review from a team August 31, 2022 10:56
Copy link

@r1viollet r1viollet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
If upstream ruby relies on this, it sounds good to me
Nice reproducing and checking the behaviour !

@ivoanjo ivoanjo merged commit ac39d91 into master Aug 31, 2022
@ivoanjo ivoanjo deleted the ivoanjo/scp-184-handle-ruby-without-shared-object branch August 31, 2022 12:21
@github-actions github-actions bot added this to the 1.5.0 milestone Aug 31, 2022
@marcotc marcotc modified the milestones: 1.5.0, 1.4.1 Sep 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants