
lldb-3.9 cannot load libsosplugin.so #8292

Closed
micli opened this issue Jun 6, 2017 · 30 comments
Labels
area-Diagnostics-coreclr os-linux Linux OS (any supported distro)

Comments

@micli

micli commented Jun 6, 2017

Environment

dotnet --info
.NET Command Line Tools (2.0.0-preview1-005867)

Product Information:
Version: 2.0.0-preview1-005867
Commit SHA-1 hash: 37d826d763

Runtime Environment:
OS Name: debian
OS Version: 8
OS Platform: Linux
RID: debian.8-x64
Base Path: /opt/dotnet/sdk/2.0.0-preview1-005867/

Microsoft .NET Core Shared Framework Host
Version : 2.0.0-preview1-002091-00
Build : fa62d01

lldb-3.9, installed from apt.llvm.org

Repro steps

Using the official .NET Core 2.0 Preview:

cd /opt/dotnet/shared/Microsoft.NETCore.App/2.0.0-preview1-002091-00
lldb-3.9
(lldb) plugin load libsosplugin.so
error: this file does not represent a loadable dylib

Using a custom build of .NET Core 2.0 from source cloned from github.com/dotnet/coreclr:

cd /home/micl/dotnet/coreclr/bin/Product/Linux.x64.Debug
lldb-3.9
(lldb) plugin load libsosplugin.so
Segmentation fault

In neither case was the plugin loaded correctly.
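lldb's `plugin load` only reports a generic "this file does not represent a loadable dylib" message, which hides the dynamic loader's actual reason. A minimal C sketch, assuming a Linux host, that calls `dlopen` on the plugin directly and captures `dlerror()` instead; `probe_plugin` is a hypothetical helper, not part of lldb or SOS. Loading the plugin this way would typically surface a missing dependency such as an unresolved liblldb SONAME (it will not explain the segfault case).

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: dlopen a plugin the way a host process would and keep
 * the loader's own error text. Returns 0 if the library loaded, -1 otherwise
 * (in which case `err` receives the dlerror() message). */
static int probe_plugin(const char *path, char *err, size_t errlen)
{
    assert(path != NULL && err != NULL && errlen > 0);
    err[0] = '\0';

    /* RTLD_NOW resolves every undefined symbol immediately, so unresolved
     * dependencies are reported here rather than at first use. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(err, errlen, "%s", msg != NULL ? msg : "unknown dlopen error");
        return -1;
    }
    dlclose(handle);
    return 0;
}
```

For example, calling `probe_plugin("./libsosplugin.so", buf, sizeof buf)` from a small driver and printing `buf` on failure shows the message lldb discarded. Older glibc versions may need `-ldl` at link time.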

@janvorli
Member

janvorli commented Jun 6, 2017

@micli that's expected. The lldb plugin in the package is for lldb 3.6 and cannot work with other versions, since it is linked against the lldb library of that specific version. You'll need to build coreclr yourself with the lldb-3.9 dev package installed to get a plugin that works with it.

@stephentoub
Member

@janvorli, I actually get the same error with lldb-3.6 on my Ubuntu 16.10 VM.

@janvorli
Member

janvorli commented Jun 6, 2017

I have looked into that, and the issue is that the libsosplugin.so in our portable package is not itself portable, because the SONAME of liblldb.so differs between Fedora-based distros and others. Since our portable build is built on a Fedora-based distro, the plugin cannot resolve the dependency.
I am trying to figure out if we can make it portable.
CC: @gkhanna79, @Petermarcu

@gkhanna79
Member

Assigning to @mikem8361 who owns SOS.

CC @lt72

@janvorli
Member

janvorli commented Jun 6, 2017

It seems it is actually even more complicated. The SONAME of liblldb on Fedora 25 is liblldb.so.3.9.1, while on CentOS 7 it is just liblldb.so and on Ubuntu 14.04 it is liblldb-3.9.so.1, so there are possibly many variants of the SONAME.

@stephentoub
Member

FYI, @ellismg and @CesarBS, this is the same issue you were hitting.

@janvorli
Member

janvorli commented Jun 6, 2017

I actually think the best solution would be to stop packaging libsosplugin.so in the .NET Core runtime / SDK and instead provide it in a separate package or tarball for each [DISTRO, LLDB_VERSION] pair. This plugin is independent of the coreclr version, so we could basically build it once, publish it and forget about it. The plugin would not have to be built on our build machines: any time a new distro / lldb version needs to be supported, we could do a one-off build in a docker container for that distro and publish the result.
There may be cases where we decide to add a new SOS command that requires some extra glue in libsosplugin.so, but that should be rare. @mikem8361 should know better than me though.

@mikem8361
Member

It is true that libsosplugin.so doesn't really change that much, and the interface between it and sos should always be backward/forward compatible. But putting it in a separate distro-specific package/zip/etc. will make using SOS a lot harder. I understand that the "portable" build will require us to do something like this, but it just adds to the overhead of getting SOS working.

@mikem8361
Member

I thought the portable builds were built on RHEL.

On Ubuntu 14.04, the lldb 3.9 install does create links from "liblldb.so" to "liblldb-3.9.so.1", etc. I'll have to investigate the other distros to see whether they all have "liblldb.so". If so, maybe we can force the portable build to build libsosplugin.so against the "liblldb.so" SONAME instead of "liblldb-3.9.so.1".

Jan, can you look at your Fedora and any other distro you have handy? Building with liblldb.so might just be a matter of getting the CMakeLists.txt logic right.

@janvorli
Member

janvorli commented Jun 6, 2017

@mikem8361 the loader doesn't care about the filenames. It uses the SONAME stored in the libraries. So for example, if you have libssl.so.1.0.2 which has SONAME libssl.so.1.0.0 and your binary references libssl.so.1.0.0, then it loads the file libssl.so.1.0.2.
The libraries with just .so are usually only in the development packages so that the build can refer to a non-versioned name. They are not installed without these dev packages.
So for example on my Ubuntu 14.04, I have liblldb.so:
/usr/lib/llvm-3.6/lib/liblldb.so
But as you can see below, the dynamic loader doesn't care about this one, since it is a link to /usr/lib/llvm-3.6/lib/liblldb.so.1, whose SONAME is:

 objdump -p /usr/lib/llvm-3.6/lib/liblldb.so.1 | grep SONAME
  SONAME               liblldb-3.6.so
ldd libsosplugin.so
        linux-vdso.so.1 =>  (0x00007fff2b3c3000)
        liblldb.so => not found
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f2c467be000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2c464b7000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2c462a1000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2c45edc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2c46cef000)
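To see what the loader actually matches on, the SONAME can be read straight from the ELF dynamic section, which is what the `objdump -p ... | grep SONAME` output above shows. A minimal C sketch, assuming a native-endian 64-bit ELF file; `read_soname` is a hypothetical helper that walks the section headers and looks up `DT_SONAME` in the string table linked from `.dynamic`.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read DT_SONAME from a native-endian ELF64 file.
 * Returns 1 and fills `out` if a SONAME is present, 0 if the file parses
 * but has none (typical for executables), and -1 on I/O or format errors. */
static int read_soname(const char *path, char *out, size_t outlen)
{
    assert(path != NULL && out != NULL && outlen > 0);
    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *shdrs = NULL;
    size_t shnum = 0;

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* ELF header: check magic and class, then load all section headers. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shnum == 0)
        goto done;
    shnum = eh.e_shnum;
    shdrs = calloc(shnum, sizeof *shdrs);
    if (shdrs == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(shdrs, sizeof *shdrs, shnum, f) != shnum)
        goto done;

    result = 0; /* parsed; no SONAME seen yet */
    for (size_t i = 0; i < shnum; i++) {
        /* The .dynamic section's sh_link names its string table (.dynstr). */
        if (shdrs[i].sh_type != SHT_DYNAMIC || shdrs[i].sh_link >= shnum)
            continue;
        const Elf64_Shdr *dynstr = &shdrs[shdrs[i].sh_link];
        size_t count = shdrs[i].sh_size / sizeof(Elf64_Dyn);
        for (size_t j = 0; j < count; j++) {
            Elf64_Dyn dyn;
            if (fseek(f, (long)(shdrs[i].sh_offset + j * sizeof dyn), SEEK_SET) != 0 ||
                fread(&dyn, sizeof dyn, 1, f) != 1) {
                result = -1;
                goto done;
            }
            if (dyn.d_tag == DT_NULL)
                break;
            if (dyn.d_tag != DT_SONAME)
                continue;
            /* d_val is an offset into .dynstr; copy the NUL-terminated name. */
            if (fseek(f, (long)(dynstr->sh_offset + dyn.d_un.d_val), SEEK_SET) != 0) {
                result = -1;
                goto done;
            }
            size_t k = 0;
            int c;
            while (k + 1 < outlen && (c = fgetc(f)) != EOF && c != '\0')
                out[k++] = (char)c;
            out[k] = '\0';
            result = 1;
            goto done;
        }
    }

done:
    free(shdrs);
    fclose(f);
    return result;
}
```

Pointed at /usr/lib/llvm-3.6/lib/liblldb.so.1 on the Ubuntu machine above, it should report the same `liblldb-3.6.so` that objdump printed, regardless of the file's own name.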

@mikem8361
Member

Building for each distro and packaging libsosplugin separately may be the only choice, but it is going to be a big hassle. It would mean we still have to build the coreclr repo for each distro (the build system could be changed to build just the sos plugin). The sos plugin and sos depend on each other (through a COM-like interface), which would make it inconvenient to move the plugin to a separate repo.

And we want to try to do this for 2.0.0? I don't think so.

@janvorli, @lt72, @gkhanna79

@janvorli
Member

janvorli commented Jun 6, 2017

@mikem8361 I don't think we can just leave the plugin broken. Currently, it doesn't work on anything other than CentOS 7 / RHEL 7, where we build the portable build. So we need to do something about it.

I can see the following options:

  1. As I described above: remove it from the dotnet core package and make it available separately for each distro / lldb version. Actually, it may be that apart from RHEL 7 there are just two SONAME schemes; if so, we might end up with a smaller matrix of just three distro flavors: RHEL 7, Fedora-based distros newer than RHEL 7, and other distros.
  2. Try to make the plugin portable: load liblldb.so.xxx using dlopen, trying every SONAME scheme we find in use.
    The tricky part is how to refer to the methods of the LLDB objects in a way that would allow the sos plugin to be loaded before these methods are resolved.
    I am not sure whether the function-pointer approach I used for a similar problem with libssl and libicu would work here, because here we have class methods rather than plain functions. Constructors in particular would be tricky, if possible at all.
    One approach that might work would be an intermediate .so that first loads liblldb.so using dlopen and then loads the .so that is currently libsosplugin.so, but with its explicit dependency on liblldb.so removed. That way, the symbols loaded from liblldb.so would satisfy the symbols required by the plugin.

Actually, now that I think about it, the last described approach might give us a libsosplugin.so that would be independent of the lldb version, which would be really cool. But that really depends on the stability of the liblldb interface over the lldb versions.
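The intermediate .so idea from option 2 can be sketched in C as follows, assuming Linux `dlopen` semantics. The candidate list holds only the lldb 3.9 SONAMEs reported earlier in this thread, and `load_first_available` / `load_plugin_via_shim` are hypothetical names. The sketch shows only the load order (liblldb first with `RTLD_GLOBAL`, then the plugin built without a `DT_NEEDED` entry for liblldb); binding happens through the dynamic linker's normal symbol resolution, so no function-pointer tables are involved, but it still relies on the liblldb interface being stable across versions.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* lldb 3.9 SONAMEs reported in this thread (Fedora 25, Ubuntu 14.04,
 * CentOS 7). Illustrative only; other distros may use other names. */
static const char *const lldb_candidates[] = {
    "liblldb.so.3.9.1",
    "liblldb-3.9.so.1",
    "liblldb.so",
};

/* Try each name in order. RTLD_GLOBAL puts the library's symbols into the
 * global scope, so libraries dlopen'ed later can bind to them.
 * Returns the first handle that loads (index in *which), or NULL. */
static void *load_first_available(const char *const *names, size_t count, int *which)
{
    assert(names != NULL && which != NULL);
    *which = -1;
    for (size_t i = 0; i < count; i++) {
        void *h = dlopen(names[i], RTLD_NOW | RTLD_GLOBAL);
        if (h != NULL) {
            *which = (int)i;
            return h;
        }
    }
    return NULL;
}

/* Hypothetical shim: load liblldb first, then the real plugin (built with
 * no DT_NEEDED entry for liblldb) so its undefined symbols resolve against
 * the liblldb that was just made global. */
static void *load_plugin_via_shim(const char *plugin_path)
{
    int which;
    size_t n = sizeof lldb_candidates / sizeof lldb_candidates[0];
    void *lldb = load_first_available(lldb_candidates, n, &which);
    if (lldb == NULL) {
        fprintf(stderr, "no known liblldb SONAME could be loaded\n");
        return NULL;
    }
    void *plugin = dlopen(plugin_path, RTLD_NOW | RTLD_LOCAL);
    if (plugin == NULL) {
        fprintf(stderr, "plugin load failed: %s\n", dlerror());
        dlclose(lldb);
        return NULL;
    }
    /* On success the liblldb handle is intentionally kept open for the
     * lifetime of the process, since the plugin depends on its symbols. */
    fprintf(stderr, "loaded plugin against %s\n", lldb_candidates[which]);
    return plugin;
}
```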

@gkhanna79
Member

Currently, it doesn't work on anything else than the Centos 7 / RHEL 7 where we build the portable build.

I agree this needs to be fixed - and building it for various distros (other than RHEL7 where portable build happens) would require bringing back distro specific builds that we intentionally removed.

@mikem8361 Can you please look into the feasibility of building it portably and, if that isn't feasible, how this should be supported?

Unless the work required is minimal (I doubt it), this will need to be done post 2.0. For 2.0, I would suggest we build these manually for the key distros and make them available.

mikem8361 referenced this issue in mikem8361/coreclr Jun 7, 2017
Removing the explicit reference to liblldb. Since the lldb program has
already loaded this lib, ours will now load regardless of the distro
and version of lldb.

Issue #12098.
@mikem8361
Member

I need to do some more testing, but I have a simple fix that removes the explicit reference to the troublesome liblldb* (a simple CMakeLists.txt change). Because lldb itself already has the correct liblldb loaded, this works. So far it works on Ubuntu 14.04 with lldb 3.9 and 3.8 (the machine libsosplugin was built on) and on my CentOS VM under lldb 3.8.

mikem8361 referenced this issue in mikem8361/coreclr Jun 7, 2017
Removing the explicit reference to liblldb. Since the lldb program has
already loaded this lib, ours will now load regardless of the distro.

Issue #12098.
mikem8361 referenced this issue in dotnet/coreclr Jun 7, 2017
* Fix portable build sos plugin problems.

Removing the explicit reference to liblldb. Since the lldb program has
already loaded this lib, ours will now load regardless of the distro
and version of lldb.

Issue #12098.

* Fix OSX build.
Petermarcu referenced this issue in dotnet/coreclr Jun 7, 2017
Removing the explicit reference to liblldb. Since the lldb program has
already loaded this lib, ours will now load regardless of the distro.

Issue #12098.
@mikem8361
Member

Fixed in master and 2.0.0.

@raffaeler

From the user's perspective, how can I know which version of lldb must be installed on the system to be compatible with libsosplugin.so?
For example, using .NET Core 2.0.2:

  • Ubuntu 17.1: using ldd I can see it requires version 3.6. OK, it works.
  • CentOS: using ldd I can't see anything related to lldb, and if I use the latest version from yum (3.4), all the SOS commands make lldb crash (so presumably the wrong lldb version).

@micli
Author

micli commented Nov 27, 2017

@raffaeler
For .NET Core 1.x or 2.0.x, LLDB should be version 3.6.
For .NET Core 2.1.x, LLDB should be version 3.9.

CentOS really disappointed me: a lot of default packages, including LLDB and CMake, are out of date. One way to install LLDB 3.6 on CentOS is to download the source code and compile it yourself. I cannot understand how a rebuild of a commercial distribution like Red Hat can be like this.

Here are some commands to compile LLVM, Clang and LLDB 3.6/3.9 on CentOS.

Compile and install CMake 3.8 first.

[centos-linux-7 ~]$ wget https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz
[centos-linux-7 ~]$ tar -xvf cmake-3.8.2.tar.gz
[centos-linux-7 ~]$ cd cmake-3.8.2
[centos-linux-7 cmake-3.8.2]$ ./configure
[centos-linux-7 cmake-3.8.2]$ make -j2
[centos-linux-7 cmake-3.8.2]$ sudo make install

Get LLVM source code

[centos-linux-7 ~]$ wget http://llvm.org/releases/3.6.0/llvm-3.6.0.src.tar.xz
[centos-linux-7 ~]$ tar xvf llvm-3.6.0.src.tar.xz
[centos-linux-7 ~]$ mv llvm-3.6.0.src llvm

Install third-party support libraries

[centos-linux-7 ~]$ sudo yum install python-devel doxygen swig libxml2-devel ncurses-devel libedit-devel

Get Clang

[centos-linux-7 ~]$ cd llvm/tools
[centos-linux-7 tools]$ wget http://llvm.org/releases/3.6.0/cfe-3.6.0.src.tar.xz
[centos-linux-7 tools]$ tar xvf cfe-3.6.0.src.tar.xz
[centos-linux-7 tools]$ mv cfe-3.6.0.src clang

Get LLDB

[centos-linux-7 tools]$ wget http://releases.llvm.org/3.6.0/lldb-3.6.0.src.tar.xz
[centos-linux-7 tools]$ tar xvf lldb-3.6.0.src.tar.xz
[centos-linux-7 tools]$ mv lldb-3.6.0.src lldb

Compile and Install

[centos-linux-7 tools]$ cd ..
[centos-linux-7 llvm]$ mkdir llvmbld
[centos-linux-7 llvm]$ cd llvmbld
[centos-linux-7 llvmbld]$ cmake ../
[centos-linux-7 llvmbld]$ cmake --build .
[centos-linux-7 llvmbld]$ sudo /usr/local/bin/cmake --build . --target install

After a long wait (possibly several hours), you will have an LLDB 3.6 environment. Note that if you want to compile LLDB 3.9 the same way, you should have more than 4 GB of memory; otherwise you will get linker errors at around 64% and 78% of the build.

@raffaeler

@micli wow, that's a lot of work then.
Thank you very, very much for the info, which is definitely precious for the whole community.

@Happi-cat

Hi! I'm trying to use lldb to debug a netcoreapp2.0 app on an odroid-xu3 device (Linux, ARMv7). Which version of lldb should I use? When I tried 3.9 I got a segfault; when I tried 3.6 I got this message:
dlopen(./libsos.so) failed ./libsos.so: undefined symbol: g_diagnostics

@mikem8361
Member

If you are using version 2.0.x of coreclr/.NET Core, then the libsosplugin.so will only work/load with lldb 3.6.

If you are using/building 2.1 of coreclr (from the master branch) or one of the 2.1.0 previews, then you need to use lldb 3.9.

What command exactly are you using to load SOS? libsos.so is loaded by libsosplugin.so, which should itself be loaded with the "plugin load libsosplugin.so" command. If that is how you loaded SOS, which SOS command did you execute to get the above error? It is very strange and doesn't make a lot of sense.

@Happi-cat
Copy link

@mikem8361 I'm using netcore 2.0.2

Here is how I'm running lldb with a dump from my app:

> lldb-3.6 ./MyApp --core core
(lldb) target create "./MyApp" --core "core"
Core file '/home/odroid/MyApp/core' (arm) was loaded.
(lldb) plugin load libsosplugin.so
(lldb) setclrpath ./
Set load path for sos/dac/dbi to './'
(lldb) r
There is a running process, kill it and restart?: [Y/n] n
(lldb) thread list
error: Process must be launched.
(lldb) clrthreads
dlopen(./libsos.so) failed ./libsos.so: undefined symbol: g_diagnostics
(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 1880 launching
Process 1880 launched: './MyApp' (arm)
(lldb) Segmentation fault

@mikem8361
Member

mikem8361 commented Dec 2, 2017 via email

@Happi-cat

Happi-cat commented Dec 5, 2017

@mikem8361 here is my app: 7z archive on dropbox

@sherlock1982

A very easy way to reproduce this with .NET Core 2.0.3 on Debian Stretch; no application is needed:

> lldb-3.9
(lldb)  plugin load ./libsosplugin.so
(lldb) clru
(lldb)
Segmentation fault (core dumped)

@raffaeler

@sherlock1982 3.9? Version 3.9 only works from 2.1 onward.
I used 3.6 on Ubuntu 17.1 and it worked perfectly.

Compiling lldb 3.6 on CentOS failed, but that is another problem.

@micli
Author

micli commented Dec 6, 2017

Yes, as I mentioned before, the .NET Core libsosplugin.so supports LLDB 3.9 starting with 2.1.x, not 2.0.3.
With 2.0.3 you'd better use LLDB 3.6, or use LLDB 3.9 with a libsosplugin.so compiled from .NET Core 2.1.x.

@Happi-cat

@sherlock1982 in my case I'm trying to find out why my ASP.NET Core app fails with a segmentation fault on linux-arm (while a simple console app works OK) and why I can't use SOS commands. So it differs from your example.

@raffaeler

@Happi-cat I had segmentation faults in the past (unrelated to lldb or sos) on ARM because of wrong packages. Also verify the bitness (arm32 vs arm64).

@Happi-cat

@raffaeler thanks for the advice, but there are a few odd things I've run into:

  • I was able to run the app and got a segfault on the first request (for the index page);
  • I also ran other tests, like: start the app -> request an API method -> no segfault -> request the index page -> no segfault. Then I made a few more requests, and only after N attempts did I get a segfault. So I suspect there might be something wrong with libuv or with the code that calls it.
  • I was able to reproduce the segfault on a bare Razor template created with dotnet new razor on my ARM device.

@raffaeler

@Happi-cat you are welcome.
I see your problem goes beyond the lldb story....

In my experience, I had a lot of trouble on ARM cards that use an old-fashioned style development stack like Yocto. The libraries they use are typically old (ehm ... consolidated).
For example, I could run ASP.NET Core on Yocto, but I got random problems (segfaults and more) on certain requests.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
MichalStrehovsky pushed a commit to MichalStrehovsky/runtime that referenced this issue Oct 12, 2020
After allocation, objects in the large object heap need to be published for some cleanup related to the background GC. The constant for this limit (RH_LARGE_OBJECT_SIZE) was not loaded into the register correctly, which left the upper 32 bits of the register in an undefined state. As a result, the check for large objects practically always failed and the objects were never published, so the cleanup never happened and the background GC failed.
@ghost ghost locked as resolved and limited conversation to collaborators Dec 22, 2020

9 participants