Problem with Docker Container #38
This is likely a duplicate of #29, which we have not yet been able to reproduce.
This is so sad :( And very strange too, because I ran two commands in WSL, `nvidia-smi` and `nvcc --version`, and both show info without any errors. So I really don't understand why `docker run` with `--gpus all` fails.
What's strange is that I tried another Docker image, from NVIDIA, and it worked! No warnings, no errors.
ChatGPT was kindly helping me with all of that, and I will just copy here the reply: "It looks like the chenhsuanlin/colmap:3.8 image still has an issue, as indicated by the error about the libnvidia-ml.so.1 file. This could be a compatibility problem with how the image was constructed, or it might have something to do with specific libraries it's trying to leverage that are clashing with the WSL setup. However, the successful run of the CUDA 12.2.0 container is a promising sign that your Docker setup with NVIDIA GPUs is functioning correctly. If you're mainly looking to utilize GPU power inside Docker, you might be able to proceed with other Docker images that are compatible with your setup."
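For reference, a quick way to sanity-check that GPU passthrough itself works under WSL2 is to run `nvidia-smi` inside a stock CUDA base image (the exact image tag below is just an example; any recent CUDA base image should behave the same):

```shell
# Sanity check: confirm the NVIDIA Container Toolkit can expose the GPU
# inside a plain CUDA base image (tag is an example, not prescriptive).
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```

If this prints the usual `nvidia-smi` table, Docker + WSL2 GPU support is working, and the failure is specific to the chenhsuanlin/colmap:3.8 image.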
WORKING!!
Basically, after properly installing everything related to CUDA, I cloned the git repo and, instead of pulling the Docker image, built it myself, and now it works without errors. Windows 11, WSL2 Ubuntu. Later I'll check how video processing works, hoping for the best :)
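The "build instead of pull" route described above can be sketched roughly as follows (the repo URL, Dockerfile path, and local tag are assumptions; adjust them to your clone):

```shell
# Sketch of building the COLMAP image locally instead of pulling
# chenhsuanlin/colmap:3.8 (paths and tags are assumptions).
git clone https://github.com/NVlabs/neuralangelo.git
cd neuralangelo
docker build -t colmap:local -f docker/Dockerfile-colmap .
# Then run the locally built image with GPU access:
docker run --gpus all -it colmap:local /bin/bash
```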
Thanks @iam-machine for helping look into this! Could you elaborate more on what you had to go through to get it working? This would help others with similar issues as well. Unfortunately we don't have Windows machines so cannot test with WSL, so your experience would be very helpful to us as well!
Also for additional context, what GPU models and driver versions did you have?
@chenhsuanlin Of course, but first I will test that the whole pipeline works properly, and if it does, I will describe what I've done to make it work :) Right now I've already successfully completed the Data Preparation instructions, installed Blender with the BlenderNeuralangelo add-on, and completed the steps there. Now, as I understand it, I need to run another Docker image (docker-neuralangelo), enter some code there from the guide, and do the isosurface extraction.
@chenhsuanlin Started training and stopped it around epoch 1500, checked whether I had any checkpoints saved, and found nothing in the neuralangelo folder. Nothing was saved. I checked config.yaml - it literally says that checkpoints must be created every 9999999999 epochs. Why is that the default? I changed this to 500, saved, and checkpoints really were saved, but after starting training something reverts the config.yaml file back. Hence, no matter how long I wait for training, I do not get checkpoint files. Strange.
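Based on the description above, the generated config contains something along these lines (the key name is an assumption inferred from this thread, not confirmed against the repo):

```yaml
# Fragment of the generated config.yaml (key name assumed from this thread).
# The default effectively disables periodic checkpointing:
checkpoint:
    save_epoch: 9999999999   # in practice: never
    # Lowering it produces regular checkpoints, e.g.:
    # save_epoch: 500
```

Since the config file appears to be regenerated when training starts, edits made to the file on disk may be overwritten; a persistent change would need to go into whatever source the launcher regenerates the config from.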
@iam-machine the checkpoints should be saved in the […]
@chenhsuanlin Thank you, it helped! Almost a week of hell, and I made it work on Windows 11. Tested the whole pipeline, to the point where I exported a .ply file from MeshLab to Maya as .obj. I wasn't able to export a textured mesh, but I noticed that I had used a slightly older version of the repo, so I updated it yesterday; the new training is still running. Windows + WSL2 + Ubuntu 22 (don't remember the exact version) + Docker. The amount of errors I solved to make it work is insane. I'm not even a programmer, I'm a 3D artist, so troubleshooting everything was really hell. I'm not able to describe my steps in detail because I was working on it for about a week, from morning to night. A huge amount of work, it's just crazy.
Thanks @iam-machine, great to know you finally got it working! Sorry you had to go through this trouble; I don't have a Windows + WSL setup to develop with, so I couldn't really put together a formal doc.
@chenhsuanlin By the way, when I exported the mesh I noticed it has too many polygons; if it had half as many, there would be no difference in detail at all. Maybe there is some kind of parameter that regulates this? I don't mind doing auto-retopo in Maya, but I thought that using some parameter to lower the polygon count might speed up some processes. Though the mesh extraction process is quick and the training is slow... And this exact second I realized that polygons have nothing to do with training speed 😂
Yes, the triangle count is controlled by the marching cubes resolution (argument […])
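As a rough sketch of what lowering the marching-cubes resolution at extraction time might look like (the script path, flag names, and file names below are assumptions; check the repo's mesh-extraction instructions for the exact invocation):

```shell
# Hypothetical mesh extraction at a lower marching-cubes resolution
# (script path and flags are assumptions, not confirmed against the repo).
torchrun --nproc_per_node=1 projects/neuralangelo/scripts/extract_mesh.py \
    --config=logs/example/config.yaml \
    --checkpoint=logs/example/latest.pt \
    --output_file=mesh.ply \
    --resolution=1024
```

Since marching cubes extracts a 2D surface from a 3D grid, halving the grid resolution roughly quarters the triangle count.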
@chenhsuanlin Got it, thanks :) What about the video resolution - are there any rules regarding that? I mean, I thought that if I shoot video in at least 2K, maybe I would get better mesh detail, and get it at earlier checkpoint stages than when training on a low-resolution video. No? Or maybe it's exactly the opposite - the bigger the video resolution, the slower the training and the slower the detail reconstruction?
Hi @iam-machine and @chenhsuanlin. I noticed this thread was open and I had a similar question, so I decided to post it here. Thank you :)
@iam-machine in general it would be the latter -- the larger the video resolution, the more time it would take to recover the fine details. If you want to get more details at earlier stages, you can try increasing the hyperparameter […]
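One practical way to act on this is to downscale the input video before data preparation, so training sees lower-resolution frames (an ffmpeg sketch; the file names are examples):

```shell
# Downscale a video to 1080p width before data preparation.
# "-2" keeps the aspect ratio while forcing an even height.
ffmpeg -i input_4k.mp4 -vf "scale=1920:-2" -c:a copy input_1080p.mp4
```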
@smandava98 thanks for the kind words! You can adjust the hyperparameters below correspondingly to optimize with fewer iterations:
Please also see #4 and the FAQ for details.
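As a hedged sketch of what such an adjustment might look like on the command line (the launcher invocation and the override key are assumptions; the repo FAQ referenced above has the exact hyperparameter names):

```shell
# Hypothetical training launch with a reduced iteration budget
# (flag names are assumptions; consult the repo FAQ for exact keys).
torchrun --nproc_per_node=1 train.py \
    --config=projects/neuralangelo/configs/custom/example.yaml \
    --max_iter=20000
```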
Thanks @chenhsuanlin! Just to clarify: no matter the range of hyperparameters I give (because I want to optimize for speed in my case), will the results at the very minimum be the same quality as Instant NGP? From the paper it seems that is what is being used, with optimization on top of it (please correct me if I'm wrong).
@smandava98 this is not guaranteed (and we did not do a systematic study on this). Despite the backbone being the same, there is still a difference between the 3D representations (NeRF vs. neural SDF). Also, the […]
Closing due to inactivity, please feel free to reopen if the issue persists. |
Hi! Your project is so amazing that even a person who knows nothing about coding (yes, that's me) decided to try it :) Expectedly, I had some problems getting everything to work. I use WSL2 on Windows 11, and when I run the script:
```
docker run -it chenhsuanlin/colmap:3.8 /bin/bash
```
I get the following warning about the NVIDIA driver:
```
==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
  Use the NVIDIA Container Toolkit to start this container with GPU support; see
  https://docs.nvidia.com/datacenter/cloud-native/ .
```
But when I run the same command with `--gpus all`:
```
docker run --gpus all -it chenhsuanlin/colmap:3.8 /bin/bash
```
I get this:
```
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/1140653493db5ca0f2b71b42c2194624b1e9e50bd0f9f72121bf836058a77900/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0000] error waiting for container:
```
What am I doing wrong?