build failures, undefined references when building, Docker #62

Open
mureva opened this issue Apr 8, 2022 · 2 comments

mureva commented Apr 8, 2022

I'm trying to build ADOP without Conda so that I can run it on a remote machine, the only one I have access to with a powerful enough GPU. On that machine I have to run inside a Docker container.

I have managed to build on my local machine, but no matter what settings I use on my trivial test dataset, it fails to allocate memory on that machine's "meagre" 8 GB GTX 1070.

Following the same procedure that worked locally, I believe I've installed all the relevant dependencies. The base container is a CUDA-enabled image based on Ubuntu 20.04, and I've installed cuda, cudnn8, pre-compiled libTorch with the modern ABI (building torch brings too many headaches of its own), MKL, libjpeg, libpng, protobuf, protobuf-compiler, python3-dev, ninja-build and cmake 3.19.5. I've also enabled the headless build.

When I use CUDA 11.3 (which would match the current libtorch release), ADOP fails to build; or rather, compilation stalls on PointRenderer.cu and stays on that step for more than 24 hours.

When I use CUDA 11.2 or 11.4, I can get all the way through compilation, but the linking stage produces undefined references to functions in your Saiga library, even though the Saiga libraries are included on the compile command.

I've attached a file with the first linker error, along with my Dockerfile in case it helps. Given that I do have one machine where the build succeeded, I suspect I'm just missing a dependency or have the wrong version of one, but I'm stuck on working out which. Any help is greatly appreciated.

ADOP-link-error.txt
Dockerfile-ADOP.txt

@Gatsby23

Hey, have you solved this problem? I'm running into the same issue.

mureva commented Jul 1, 2022

I've managed to get the build to complete using a Dockerfile posted by another user in a comment, with a couple of small adjustments. I haven't tested that the resulting build actually works yet, but at least it builds. See here for the original : I made two small changes. First, the 'FROM' line became FROM nvidia/cuda:11.4.2-devel-ubuntu20.04. Second, later in the file, I changed RUN ./install_pytorch.sh to RUN ./install_pytorch_source.sh
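
For anyone trying to reproduce this, the two changes would look roughly like the fragment below. This is only a sketch of the two modified lines: everything else in the other user's Dockerfile is assumed to stay unchanged and is not reproduced here, so the fragment is not a complete, buildable Dockerfile on its own.

```dockerfile
# Change 1: use the CUDA 11.4.2 devel image on Ubuntu 20.04 as the base.
FROM nvidia/cuda:11.4.2-devel-ubuntu20.04

# ... remaining steps from the other user's original Dockerfile, unchanged ...

# Change 2: this line replaces the original "RUN ./install_pytorch.sh".
RUN ./install_pytorch_source.sh
```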
