
Upload pip package to pypi with pre-compiled cuda kernels. #13

Closed
blooop opened this issue Nov 1, 2023 · 4 comments

blooop (Contributor) commented Nov 1, 2023

I made curobo into a pip package using:

python setup.py bdist_wheel

and uploaded it to a local pip registry so that I can now run:

pip install nvidia_curobo

in different repos without needing to compile it each time.
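
(As a minimal sketch of the upload step, which isn't shown above: one common way to push a wheel to a local index is twine; the index URL below is a placeholder, not the registry actually used here.)

# build the wheel, then push it to a local index (URL is a placeholder)
python3 -m pip install twine
python3 -m twine upload --repository-url https://pypi.example.internal/ dist/*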

However, when I use the pip-installed curobo wheel, I still need to JIT-compile the CUDA kernels every time. The output looks like:

kinematics_fused_cu not found, JIT compiling...
geom_cu binary not found, jit compiling...
lbfgs_step_cu not found, JIT compiling...
line_search_cu not found, JIT compiling...
tensor_step_cu not found, jit compiling...
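
(A minimal sketch for checking whether the JIT-built kernels are being cached between runs, assuming they are built via torch.utils.cpp_extension, which caches compiled extensions under TORCH_EXTENSIONS_DIR; the exact default path varies by PyTorch version and the target path below is illustrative.)

# default cache location on recent PyTorch versions (assumption)
ls ~/.cache/torch_extensions
# optionally pin the cache to a persistent path so new shells/containers reuse it
export TORCH_EXTENSIONS_DIR=/opt/curobo_kernels
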
  1. Do you know how to fix this? I have done some basic searching online but haven't found anything helpful.
  2. Can you publish curobo on PyPI to make it easier for anyone to use?

Thanks

balakumar-s (Collaborator) commented Nov 1, 2023

Here is how we create a pip package. Use either a Docker image with PyTorch or a Python environment with torch already installed, and run the commands below:

python3 -m pip install build
cd curobo && python3 -m build --no-isolation

This will create the .whl file that you can host in your pip registry.
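
(For example, the built wheel ends up in dist/ and can also be installed directly; the filename pattern below is illustrative and will differ by version and platform.)

# install the wheel produced by the build step (filename is illustrative)
python3 -m pip install dist/nvidia_curobo-*.whl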

Let me know if this works. You might have to install venv if it's not already installed.

We are looking into publishing it on PyPI.

blooop (Contributor, Author) commented Nov 2, 2023

Thanks. Those steps produce the wheel, but the JIT compile steps don't go away. It's not really that big of a deal, though; I can wait for an official pip package, as this temporary solution works well enough.

balakumar-s (Collaborator) commented

You can also reduce the compilation time by setting the environment variable TORCH_CUDA_ARCH_LIST="7.0+PTX" before building, so that kernels are compiled for only one architecture, with PTX included for forward compatibility.
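
(A minimal sketch of that workflow; the value "7.0+PTX" is just an example, and you would usually match the arch list to your GPU's compute capability.)

# check the compute capability of the local GPU, e.g. (8, 6) for an RTX 30-series card
python3 -c "import torch; print(torch.cuda.get_device_capability())"

# compile for a single architecture, with PTX for forward compatibility
export TORCH_CUDA_ARCH_LIST="7.0+PTX"
python3 -m build --no-isolation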

balakumar-s added the fix_in_progress label ("We are working on a fix and update once it's resolved") on Dec 22, 2023
balakumar-s removed the fix_in_progress label on Aug 19, 2024
balakumar-s (Collaborator) commented

We are deferring this to a later time.

balakumar-s closed this as not planned on Aug 19, 2024