Run integration tests via CUDA Execution Provider #41

Merged 3 commits into TensorStack-AI:master on Nov 25, 2023

Conversation

@james-s-tayler commented Nov 20, 2023

This was an absolute mission to get working correctly, but I finally have the integration tests running in Docker via the CUDA execution provider :)

Some notes on this:

Initially I couldn't run OnnxStack on my Linux development machine as it would fail citing "The ONNX Runtime extensions library was not found". I tried lots of things to get it working in my local environment and couldn't, so I decided to file a bug report with OnnxRuntime themselves, and to do that I put together a minimal reproduction of the issue in Docker. To my surprise it actually worked in Docker without that error, so I implemented the first round of tests using just the CPU execution provider.
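
For reference, the CPU-only setup boils down to something like the following — a minimal sketch assuming the standard dotnet/sdk base image and a plain dotnet test invocation (image tag and paths are illustrative, not the exact Dockerfile in this PR):

```dockerfile
# Minimal shape of the CPU-only repro (image tag and paths are illustrative)
FROM mcr.microsoft.com/dotnet/sdk:7.0

WORKDIR /src
COPY . .
RUN dotnet build

# The CPU execution provider needs no GPU, so the tests can run as a build step
RUN dotnet test --no-build
```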

When it came time to try to get the tests running inside the container and using the GPU, I was tossing up whether it would be better to use the dotnet base image and install the drivers into the container (not desirable IMO) or to use the nvidia/cuda base image and either install the dotnet SDK into that or build a standalone executable. Along the way I discovered that while I didn't get the "The ONNX Runtime extensions library was not found" error in the dotnet base image in Docker, I did get it in the nvidia/cuda one!
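
The nvidia/cuda approach looks roughly like this — a sketch assuming a cudnn runtime base image and the Microsoft package feed for the SDK (the exact tag and install steps are assumptions, not necessarily what this PR ends up shipping):

```dockerfile
# Rough sketch of the nvidia/cuda-based image (base image tag and SDK install
# method are assumptions, not necessarily what this PR ships)
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04

# Install the .NET 7 SDK from the Microsoft package feed
RUN apt-get update && apt-get install -y wget \
    && wget https://packages.microsoft.com/config/ubuntu/20.04/packages-microsoft-prod.deb \
    && dpkg -i packages-microsoft-prod.deb \
    && apt-get update && apt-get install -y dotnet-sdk-7.0

WORKDIR /src
COPY . .
RUN dotnet build

# The GPU is only visible at `docker run` time (via --gpus), not at build time,
# so the tests run as the container entrypoint rather than a RUN step
ENTRYPOINT ["dotnet", "test", "--no-build"]
```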

It was pretty tough to work out why, though. I tried every possible combination of shifting NuGet package references around and changing project settings in the .csproj, assuming it was simply a wrong setting or a package conflict somewhere. I knew it was a runtime issue and that it was failing to load the .so files, but after comparing the bin/Debug folders of the working version in the dotnet base image container with the two failing versions (my local dev environment and the nvidia/cuda base image container) I wasn't seeing any differences. So it had to be an environmental difference.

I'm not familiar with debugging issues calling into native code from .NET, so I asked GPT-4 for debugging strategies that could help with this vexing problem, and it recommended running ldd against the native binaries to reveal which shared libraries they depend on. Running that in my one working and two non-working environments revealed the following:

working dotnet/sdk based container

Step 17/18 : RUN ldd Tests/bin/Debug/net7.0/linux-x64/libortextensions.so
 ---> Running in b00a57c0bc62
    linux-vdso.so.1 (0x00007ffdb69d7000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff07a2d4000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff07a2b2000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff07a295000)
    libssl.so.1.1 => /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007ff07a202000)
    libcrypto.so.1.1 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007ff079f0e000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff079d3f000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff079bfb000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff079be1000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff079a0d000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff07a7c3000)

non-working local Pop_OS 23.04 development environment

me@pop-os:~/source/cuda-playground$ ldd Tests/bin/Debug/net7.0/linux-x64/libortextensions.so 
linux-vdso.so.1 (0x00007ffc025b4000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc01766e000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc017669000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc01764d000)
libssl.so.1.1 => not found
libcrypto.so.1.1 => not found
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc016c00000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc017564000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc017544000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc016800000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc017689000)

non-working nvidia/cuda based container

Step 17/18 : RUN ldd Tests/bin/Debug/net7.0/linux-x64/libortextensions.so
  ---> Running in a6d639d5ec48
    linux-vdso.so.1 (0x00007ffd577e2000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f014f36e000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f014f369000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f014f34d000)
    libssl.so.1.1 => not found
    libcrypto.so.1.1 => not found
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f014f11f000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f014f038000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f014f018000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f014edf0000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f014f85d000)

As you can see, the two non-working environments are missing the OpenSSL 1.1 libraries (libssl.so.1.1 and libcrypto.so.1.1)! Once I installed them into the container manually, the problem went away.
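
Concretely, the fix amounts to adding something like this to the nvidia/cuda image — a sketch assuming an Ubuntu 20.04-based image, where libssl1.1 is still available from the distro archive (newer bases need the package pulled in from elsewhere):

```dockerfile
# Install the OpenSSL 1.1 libraries that libortextensions.so links against
# (assumes an Ubuntu 20.04 base where libssl1.1 is still in the archive)
RUN apt-get update && apt-get install -y libssl1.1 \
    && rm -rf /var/lib/apt/lists/*

# Sanity check: libssl.so.1.1 and libcrypto.so.1.1 should now resolve
RUN ldd Tests/bin/Debug/net7.0/linux-x64/libortextensions.so
```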

Other things to note

  • I had to install nvidia-container-toolkit on my host system to get GPU passthrough to the containers working (see the sketch after this list).
  • When running watch -n1 nvidia-smi while the tests are running, nvidia-smi reports VRAM usage going as high as 23GB!
    • I'm wondering what the deal is with this?
    • It should be noted that all the tests run in sequence, not in parallel, and unloading the model at the end of each test doesn't seem to affect it.
  • I don't know if this runs on Windows, since nvidia-container-toolkit has to be installed on the host system, though presumably it works if you install it into WSL 2?
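
The GPU passthrough itself looks roughly like this once nvidia-container-toolkit is installed on the host (the image name is illustrative):

```bash
# Build the test image (name is illustrative)
docker build -t onnxstack-integration-tests .

# --gpus all passes the host GPU(s) through to the container
docker run --rm --gpus all onnxstack-integration-tests

# In another terminal on the host, watch VRAM usage while the tests run
watch -n1 nvidia-smi
```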

All in all the tests run quite a bit faster :)

@saddam213 (Member) commented Nov 20, 2023

So CUDA works on Windows if you install CUDA 11 and the toolkit; however, the VRAM usage is 2x what DirectML uses, which it looks like you have confirmed on Linux.

Using F16 models you can get the VRAM usage down to about 11GB, but the model load time was about 40-50 seconds on Windows, not 2-3 seconds like DirectML.

My initial tests suggested CUDA may be 10-20% faster; however, the VRAM usage and load delays make this meaningless IMO.

I did not investigate much further as it seems DOA to me

@saddam213 saddam213 merged commit 098b758 into TensorStack-AI:master Nov 25, 2023