
Question about the training cost #5

Closed
ewrfcas opened this issue Oct 28, 2021 · 2 comments

Comments

ewrfcas commented Oct 28, 2021

Thanks for your great work!

I am retraining AA-RMVSNet on the DTU dataset with the default settings on a single V100 32GB GPU. However, it takes about 17GB of memory with batchsize=1, and batchsize=2 causes an OOM error.

This seems strange, because the paper reports only 20.16GB for batchsize=4. Besides, depth_num is set to 192 in the paper, while the default setting here is only 150.

Another question is that training is very slow: one step takes about 4.6s with batch=1.

Could you provide any advice on this?

QT-Zhu (Owner) commented Oct 28, 2021

Hello. As described in the paper, we use 4 GPUs for training, so the total batchsize of 4 corresponds to 1 sample per GPU. If you try to fit a larger batch on a single GPU, it is natural to get OOM. You can adjust the depth division according to the hint in train_dtu.sh by modifying both d and interval_scale.
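
For reference, here is a back-of-the-envelope sketch of that trade-off. The names d and interval_scale come from the comment above; the base interval value below is a made-up placeholder, so check train_dtu.sh and the camera files for the actual numbers.

```python
# Sketch of the depth-division trade-off (placeholder values, not the repo's exact settings).
# Cost-volume memory grows roughly linearly with d, while the covered depth range is
# roughly d * base_interval * interval_scale, so halving d and doubling interval_scale
# keeps the same range at about half the cost-volume memory.

def depth_range(d, base_interval, interval_scale):
    """Approximate depth range covered by d hypotheses spaced by base_interval * interval_scale."""
    return d * base_interval * interval_scale

base = 2.5  # mm, a hypothetical DTU depth interval
print(depth_range(192, base, 1.06))  # paper-like setting
print(depth_range(96,  base, 2.12))  # same range, roughly half the cost-volume memory
```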

As for training speed, since the RNN has to process the cost volume one depth slice at a time, your training speed looks right to me.
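
To illustrate the point for other readers, here is a minimal sketch (not the repo's actual network) of a recurrent regularizer that walks the cost volume one depth slice at a time, which is why the per-step time grows with depth_num rather than with batch size alone.

```python
import torch
import torch.nn as nn

class SliceBySliceRegularizer(nn.Module):
    """Toy recurrent cost-volume regularizer: one conv cell applied slice by slice."""
    def __init__(self, channels=8):
        super().__init__()
        # a single conv cell standing in for the real conv-GRU/LSTM stack
        self.cell = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, cost_volume):
        # cost_volume: (B, C, D, H, W), where D is the number of depth hypotheses
        b, c, d, h, w = cost_volume.shape
        state = torch.zeros(b, c, h, w, device=cost_volume.device)
        outputs = []
        for i in range(d):                        # sequential over depth -> runtime ~ O(D)
            x = torch.cat([cost_volume[:, :, i], state], dim=1)
            state = torch.tanh(self.cell(x))      # hidden state carried to the next slice
            outputs.append(state)
        return torch.stack(outputs, dim=2)        # (B, C, D, H, W)

reg = SliceBySliceRegularizer(channels=8)
vol = torch.randn(1, 8, 150, 64, 80)              # e.g. depth_num=150 as in the default setting
out = reg(vol)                                    # per-step time grows linearly with depth_num
```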


ewrfcas commented Oct 28, 2021

Thanks for your reply.

ewrfcas closed this as completed Oct 28, 2021