
Question about the training cost #5

Closed
ewrfcas opened this issue Oct 28, 2021 · 2 comments

Comments

ewrfcas commented Oct 28, 2021

Thanks for your great work!

I am retraining AA-RMVSNet on the DTU dataset with the default settings on a single V100 32GB GPU. However, it takes about 17GB of memory with batchsize=1, and batchsize=2 causes an OOM error.

This seems strange, because the paper reports only 20.16GB for batchsize=4. Besides, depth_num is set to 192 in the paper, while the default setting here is only 150.

Another question is that training is very slow: one step takes about 4.6s with batch=1.

Could you provide any advice on this?

QT-Zhu (Owner) commented Oct 28, 2021

Hello. As described in the paper, we use 4 GPUs for training, so the total batchsize of 4 corresponds to 1 sample per GPU. If you try to fit a larger batch on a single GPU, it is natural to get OOM. You can adjust the depth division according to the hint in train_dtu.sh by modifying both d and interval_scale.
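
For reference, here is a back-of-the-envelope sketch of that trade-off. The names d and interval_scale come from the comment above; the base interval value below is a made-up placeholder, so check train_dtu.sh and the camera files for the actual numbers.

```python
# Sketch of the depth-division trade-off (placeholder values, not the repo's exact settings).
# Cost-volume memory grows roughly linearly with d, while the covered depth range is
# roughly d * base_interval * interval_scale, so halving d and doubling interval_scale
# keeps the same range at about half the cost-volume memory.

def depth_range(d, base_interval, interval_scale):
    """Approximate depth range covered by d hypotheses spaced by base_interval * interval_scale."""
    return d * base_interval * interval_scale

base = 2.5  # mm, a hypothetical DTU depth interval
print(depth_range(192, base, 1.06))  # paper-like setting
print(depth_range(96,  base, 2.12))  # same range, roughly half the cost-volume memory
```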

As for training speed, since the RNN has to process the cost volume one depth slice at a time, your training speed looks right to me.
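
To illustrate the point for other readers, here is a minimal sketch (not the repo's actual network) of a recurrent regularizer that walks the cost volume one depth slice at a time, which is why the per-step time grows with depth_num rather than with batch size alone.

```python
import torch
import torch.nn as nn

class SliceBySliceRegularizer(nn.Module):
    """Toy recurrent cost-volume regularizer: one conv cell applied slice by slice."""
    def __init__(self, channels=8):
        super().__init__()
        # a single conv cell standing in for the real conv-GRU/LSTM stack
        self.cell = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, cost_volume):
        # cost_volume: (B, C, D, H, W), where D is the number of depth hypotheses
        b, c, d, h, w = cost_volume.shape
        state = torch.zeros(b, c, h, w, device=cost_volume.device)
        outputs = []
        for i in range(d):                        # sequential over depth -> runtime ~ O(D)
            x = torch.cat([cost_volume[:, :, i], state], dim=1)
            state = torch.tanh(self.cell(x))      # hidden state carried to the next slice
            outputs.append(state)
        return torch.stack(outputs, dim=2)        # (B, C, D, H, W)

reg = SliceBySliceRegularizer(channels=8)
vol = torch.randn(1, 8, 150, 64, 80)              # e.g. depth_num=150 as in the default setting
out = reg(vol)                                    # per-step time grows linearly with depth_num
```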


ewrfcas commented Oct 28, 2021

Thanks for your reply.

ewrfcas closed this as completed Oct 28, 2021