Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic cuda wrapper #1065

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

wkpark
Copy link
Contributor

@wkpark wkpark commented Feb 15, 2024

This is a proof-of-concept, quick and dirty, CUDA version independent, dymamic CUDA library wrapper fix.

  • cuda_setup/main.py compatible with older libraries. (older libraries mean libs with cuda version tags)
  • all used cublas*, cusparse*, cublasLt* functions are wrapped using dlsym()/GetProcessAddress()
  • minimize cuda runtime functions that are linked with cudart_static lib.

ubuntu-20.04 + cuda11.8 environment.

$ ldd venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda.so
        linux-vdso.so.1 (0x00007ffc44cc9000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f858d075000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f858d052000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f858d04c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f858ce6a000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f858cd1b000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f858cd00000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f858cb0c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f858f3fc000)

windows 10 + cuda11.8 environment (mingw64 terminal)

ldd bitsandbytes/libbitsandbytes_cuda.dll
       	ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7ffd23ed0000)
       	KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7ffd23650000)
       	KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7ffd217c0000)
       	msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7ffd21f10000)
       	ucrtbase.dll => /c/WINDOWS/System32/ucrtbase.dll (0x7ffd21de0000)
       	VCRUNTIME140.dll => /c/WINDOWS/SYSTEM32/VCRUNTIME140.dll (0x7ffd0d890000)
       	MSVCP140.dll => /c/WINDOWS/SYSTEM32/MSVCP140.dll (0x7ffd0d8b0000)
       	ole32.dll => /c/WINDOWS/System32/ole32.dll (0x7ffd23710000)
       	RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7ffd22730000)
       	combase.dll => /c/WINDOWS/System32/combase.dll (0x7ffd23a20000)
       	GDI32.dll => /c/WINDOWS/System32/GDI32.dll (0x7ffd22a20000)
       	win32u.dll => /c/WINDOWS/System32/win32u.dll (0x7ffd21ee0000)
       	gdi32full.dll => /c/WINDOWS/System32/gdi32full.dll (0x7ffd21c20000)
       	msvcp_win.dll => /c/WINDOWS/System32/msvcp_win.dll (0x7ffd21640000)
       	vcruntime140_1.dll => /c/Windows/System32/vcruntime140_1.dll (0x180000)
       	USER32.dll => /c/WINDOWS/System32/USER32.dll (0x7ffd23370000)
       	VCRUNTIME140_1.dll => /c/WINDOWS/SYSTEM32/VCRUNTIME140_1.dll (0x7ffd0d880000)
       	IMM32.DLL => /c/WINDOWS/System32/IMM32.DLL (0x7ffd22650000)

 * fix cmake version: default cmake version of ubuntu20.04 is too low
Copy link

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@akx
Copy link
Contributor

akx commented Feb 15, 2024

Ah, this will definitely need to be rebased after #898 and #1041 get merged.

Comment on lines +171 to +179
if(NOT USE_CUDA_WRAPPER)
target_link_libraries(bitsandbytes PUBLIC CUDA::cublas CUDA::cusparse)
endif()
if(NO_CUBLASLT)
target_compile_definitions(bitsandbytes PUBLIC NO_CUBLASLT)
else()
target_link_libraries(bitsandbytes PUBLIC CUDA::cublasLt)
if(NOT USE_CUDA_WRAPPER)
target_link_libraries(bitsandbytes PUBLIC CUDA::cublasLt)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake has CMAKE_CUDA_RUNTIME_LIBRARY to control this; maybe would be better here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_RUNTIME_LIBRARY.html
docuement says CMAKE_CUDA_RUNTIME_LIBRARY only control cudart
(Currently, with or without this PR, cudart_static added by default)

@akx
Copy link
Contributor

akx commented Mar 4, 2024

I'm a little worried that the API or ABI could have changed between 11 and 12, causing very weird runtime errors if we just use symbols from v12 and call them like they were v11, or vice versa. 🤔

@Titus-von-Koeller
Copy link
Collaborator

Hey @wkpark @akx @matthewdouglas,

I'm hoping to merge this in the coming two weeks. One thing I feel we need to discuss is how we can test if this works as intended without introducing any issues:

Do you have any thoughts on that? I'm willing to do the work, but if you already have a few concrete things in mind, that would definitely help speed this up.

Thanks @wkpark on your initiative on this, really appreciated! Another thing needed would be to resolve the conflicts.

@Titus-von-Koeller
Copy link
Collaborator

P.S. What's your opinion on how this goes together with #1052? I don't fully grok that yet. From what I understand #1052 determines major CUDA version. Is #1052 dependent on this one here?

@matthewdouglas
Copy link
Contributor

matthewdouglas commented Apr 8, 2024

I'm not sure how I feel about this one yet. I will think about it a little more but my initial thought is that it seems to be quite hacky feeling. Maybe this is a common thing in C++ world, but I'm not used to seeing it.

Edit: This would definitely conflict with what I was thinking with #1126 too.

@akx
Copy link
Contributor

akx commented Apr 9, 2024

I also get a hacky/too-complex feeling here, but I'm not sure what the better option would be.

@Titus-von-Koeller
Copy link
Collaborator

Thanks for your valuable inputs ❤️ Really appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants