
Installation runs into CUDA problem #212

Closed
cap-jmk opened this issue Feb 18, 2022 · 13 comments

Comments

@cap-jmk commented Feb 18, 2022

The bug reported at rusty1s/pytorch_sparse#180 and pyg-team/pytorch_geometric#4095 propagates to deeptime, too. Pinning PyTorch to an older version in the setup might help.

@clonker (Member) commented Feb 18, 2022

This seems to be a problem with incompatible pytorch and pytorch_sparse versions. Here we only depend on pytorch, and only weakly; there is no explicit dependency.
For that reason I am a bit uncomfortable pinning a version in the setup, as it would introduce a hard dependency.

@cap-jmk (Author) commented Feb 19, 2022

I experienced the error while installing deeptime in an isolated conda environment on the latest Ubuntu release. Since pip was pulling the default PyTorch build, the error occurred with plain PyTorch, too. It also occurs on Colab when using PyTorch. As far as I know, the error does not stem from any particular Python package but from the CUDA compilation, so it is independent of a specific Python package. Either way, deeptime is unusable in that case, so I recommend fixing the error.

@clonker (Member) commented Feb 19, 2022

That is very odd, deeptime is not supposed to pull pytorch at all. Can you try again in an isolated environment and paste the output here?
If you have a look here you can see that pytorch is only an "extras" dependency, so a mere pip install deeptime shouldn't pull it.
Here is what you can run to check the installed dependencies of a pip package (example output for a test installation of mine):

```
~  pip show deeptime
Name: deeptime
Version: 0.4.1
[...]
Requires: numpy, scikit-learn, scipy, threadpoolctl
Required-by:
```
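For illustration, an optional ("extras") dependency of this kind is declared roughly as follows. The extras group name and package lists below are illustrative assumptions, not deeptime's actual setup:

```python
# Sketch of an extras declaration: torch is only installed when the user
# explicitly requests the extra, e.g. `pip install deeptime[deep-learning]`.
# Group name and package lists are illustrative assumptions.
install_requires = ["numpy", "scikit-learn", "scipy", "threadpoolctl"]
extras_require = {"deep-learning": ["torch"]}

# A plain `pip install deeptime` resolves only install_requires,
# so torch is never pulled implicitly:
assert "torch" not in install_requires
assert "torch" in extras_require["deep-learning"]
```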

Please let me know what you find, thanks!

@cap-jmk (Author) commented Feb 21, 2022

However, torch is required to use deeptime's deep-learning functionality. Maybe you can try to reproduce the error by pulling the default torch on a machine with CUDA 11.1. While the bug is present, I think users will wonder why they can't use the full functionality of deeptime, or why the import of deeptime fails at all. They might conclude the library is faulty and skip it.

@clonker (Member) commented Feb 21, 2022

Ah, now I see what you mean - I think it's a good idea to catch such an import error. 🙂 Pinning the version in the setup doesn't seem very sensible to me though, as we do not depend on pytorch.
Here is what happens: deeptime checks whether pytorch is installed and, if so, imports certain deep learning submodules. I will add a check that torch can actually be imported successfully rather than just checking whether the namespace is available.
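Such a guard might look like the following sketch (the helper name and module layout are illustrative, not deeptime's actual code):

```python
import importlib


def try_import(name):
    """Return the named module if it imports cleanly, else None.

    Catching Exception rather than only ImportError also covers the case
    where the package is installed but raises at import time, e.g. due
    to a broken CUDA/toolchain setup.
    """
    try:
        return importlib.import_module(name)
    except Exception:
        return None


# Deep-learning submodules would only be exposed when torch really imports:
torch = try_import("torch")
HAS_TORCH = torch is not None
```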

@cap-jmk (Author) commented Feb 21, 2022

Great fix 🚀
Beneath the torch bug, I noticed another, similar one. When installing from pip, the numerical module does not always get the right C++ compilation; installing from conda works, though. The error looks similar to the other one:

```
undefined symbol: _ZNSt15__exception_ptr13exception_ptr10_M_releaseE
```

Ref: pybind/pybind11#3623

I am not sure if it is worth fixing at all. Just wanted to report in case there is some inconsistency in the distributions.
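For diagnosing errors of this kind, one can check at runtime whether the libstdc++ that the dynamic loader picks up actually exports the missing symbol. The sketch below uses ctypes and assumes a glibc/Linux system where ctypes.util can locate libstdc++:

```python
import ctypes
import ctypes.util

# The mangled symbol from the error message above
# (demangled: std::__exception_ptr::exception_ptr::_M_release()).
SYMBOL = "_ZNSt15__exception_ptr13exception_ptr10_M_releaseE"


def has_symbol(library, symbol):
    """Return True if the loaded shared library exports the given symbol."""
    return hasattr(library, symbol)


libname = ctypes.util.find_library("stdc++")
if libname is not None:
    libstdcpp = ctypes.CDLL(libname)
    print(f"{SYMBOL} present in {libname}: {has_symbol(libstdcpp, SYMBOL)}")
else:
    print("libstdc++ not found; likely not a glibc/Linux system")
```

If the symbol is missing from the system libstdc++ but the locally compiled extension references it, the toolchain that built the extension was newer than the runtime libraries, which matches the pybind11 issue linked above.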

@clonker (Member) commented Feb 21, 2022

Ah, thank you for bringing it to my attention! That is one of the drawbacks of using an sdist over a binary distribution with pip. On the other hand, I do like that it is compiled locally. Basically a toolchain setup problem... not sure how one would even go about fixing that, aside from using a binary distribution of course :)

@cap-jmk (Author) commented Feb 21, 2022

From the user's perspective, I think either works. But when building packages that have deeptime as a dependency, it would be useful to be able to pull it reliably from pip. Otherwise, distributing the new package via PyPI inherits the same problem, and the bug would propagate forever…
If the faulty behaviour is present, one could also redirect the user to the conda build or provide additional instructions. Maybe a simple test during the setup procedure would help decide what to do. What do you think?

Conda is not an option in every environment.

@clonker (Member) commented Mar 2, 2022

Hey @MQSchleich, I've been experimenting a bit with CMake as the primary build system; I'd imagine it is a bit more robust with respect to incompatible toolchains. Also, the initial pytorch issue should be fixed on the branch of PR #215 - if you'd like and have some time, I'd appreciate it if you could try it out and see whether the problem persists.

@cap-jmk (Author) commented Mar 7, 2022

@clonker, did you upload it to PyPI yet? I tried it out on the problematic machine, and the problem did indeed persist...

@clonker (Member) commented Mar 7, 2022

No, it's not on PyPI yet; you'll have to install directly from the repository:

```
pip install git+https://github.com/deeptime-ml/deeptime.git@main
```

@clonker (Member) commented Apr 12, 2022

Ping on this one: with the new version it should also work via a plain pip install deeptime.

@clonker (Member) commented Aug 18, 2022

I assume this is either no longer an issue or abandoned, please feel free to reopen otherwise. :)

@clonker closed this as completed Aug 18, 2022