Question about the training time on DTU #4

Open
leonwu0108 opened this issue Apr 13, 2023 · 2 comments

@leonwu0108

Hello,
First of all, thanks for your excellent work and code release!
While trying to reproduce the experiments with the provided docker environment on my local workstation (a single RTX 3090), I noticed that training on a single DTU scan (dtu_scan24) with the default hyper-parameters (200k iterations) took about 90 minutes to complete, which is much longer than the training time reported in the conclusion of the paper ($\approx$ 30 minutes). Is this normal? Could it be that the default parameters in the code are not the same as the ones you used to measure the training time, or is there some other reason? I'd appreciate any clarification.

@RaduAlexandru
Owner

Hi there!

Thanks a lot for mentioning this issue! 90min is definitely way too long and indicates an issue somewhere.

There are a few things that may have contributed to the long training time:

  1. Please check that the apex package was found by the training script. This shows up during training as a message saying has_apex True. Apex significantly speeds up training; it should be picked up automatically with the provided docker, but it's best to double-check.
  2. For the fastest training it is also best to disable the viewer with --no_viewer. The 3D viewer hooks into the training loop and renders a frame at every iteration, which can also slow things down.
  3. Logging images to tensorboard via the with_tensorboard: true flag in train_permuto_sdf.cfg can also slow things down.
  4. Running with the default settings assumes that you have no masks for the images, so the background has to be modeled by a NeRF network. In the case of DTU you can enable the masked images with --with_mask (see the example command right after this list).
  5. Recently I found a bug that was causing the GPU to stall at every iteration due to an unnecessary number of copies to the CPU for logging purposes. This has now been fixed, so please pull the latest version.
  6. Another thing that causes a GPU-to-CPU synchronization is the creation of the foreground ray samples. On the master branch the creation of samples stalls the GPU at this line. This is fixed in the async_create_samples_cleaned branch, but it is not yet merged since the code is quite difficult to follow and I'm still trying to find an elegant way to refactor it. Note that pulling that branch also requires pulling the latest version of the permutohedral_encoding package. (A short PyTorch sketch further down illustrates the general pattern behind points 5 and 6.)

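Putting points 1, 2 and 4 together, a DTU run would be launched along these lines (I'm writing this from memory, so double-check the script path and keep whatever dataset/scene arguments you already use):

```bash
# Sketch only: adjust the script path to your checkout and keep your usual
# dataset/scene arguments in place of the placeholder.
python permuto_sdf_py/train_permuto_sdf.py <your usual dataset/scene args> \
    --with_mask \
    --no_viewer
# On startup, check that the log prints "has_apex True" (point 1).
```
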
I hope no other performance regressions have crept in from refactoring; I will keep this issue open until I have double-checked everything and also merged the async branch.
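
To make points 5 and 6 more concrete, here is a minimal, generic PyTorch sketch (not the actual permuto_sdf code) of why reading a GPU value on the CPU at every iteration stalls training, and what deferring that read looks like:

```python
import torch

# Generic illustration of the stall described in points 5 and 6,
# not code from this repository.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

losses = []  # keep the per-iteration losses on the GPU
for it in range(1000):
    x = torch.randn(4096, 128, device=device)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Slow pattern: loss.item() (or .cpu()) here forces the CPU to wait
    # for the GPU at every iteration, serializing the whole pipeline.
    # log_value = loss.item()

    # Faster pattern: keep the tensor on the GPU and only synchronize
    # when the numbers are actually needed, e.g. every 100 iterations.
    losses.append(loss.detach())
    if it % 100 == 0:
        print(f"iter {it}: mean loss {torch.stack(losses).mean().item():.4f}")
        losses.clear()
```

The same idea applies to the sample-creation stall: anything that makes the CPU wait on a GPU tensor in the middle of an iteration serializes the pipeline, which is what the async branch avoids.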

One tangential point: you can get significantly faster training by compressing the schedule using s_mult from here. Setting it to 0.5, for example, halves the training time with almost no loss in accuracy for the vast majority of objects.
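
Concretely, together with point 3 above, the relevant part of train_permuto_sdf.cfg would look something like this (I'm quoting from memory, so follow the link above for the exact name and location of s_mult):

```
with_tensorboard: false   # skip image logging (point 3)
s_mult: 0.5               # compress the schedule to roughly half the training time
```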

@Ice-Tear

Ice-Tear commented Dec 1, 2023

Hello, I checked the settings and they look right.
[screenshot of the settings]
On the master branch, it took 28 minutes to converge (on a single RTX 4090). I think a 4090 should be much faster than a 3090.
Do I need to switch from the master branch to the async_create_samples_cleaned branch? Is there a difference in reconstruction accuracy between these two branches?
