running time for prepare.sh #9

Closed
cailk opened this issue Apr 20, 2022 · 6 comments

cailk commented Apr 20, 2022

Hi, thanks for your great work!

I'm trying to run the first command in 'prepare.sh',
CUDA_VISIBLE_DEVICES=6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 2 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False totol_epochs=1
which is used to generate the CLIP embeddings for precomputed proposals.
However, this process will take about 30 days with eight 16 GB V100s. And in issue #4, you already claimed that it only takes one day. So I was wondering if I missed any details?

dyabel (Owner) commented Apr 20, 2022

> I'm trying to run the first command in 'prepare.sh' ... However, this process will take about 30 days with eight 16 GB V100s. And in issue #4, you already claimed that it only takes one day. So I was wondering if I missed any details?

Hi, did you modify the command to use 8 GPUs? And make sure the total schedule is 1 epoch, which should take about one day.

cailk (Author) commented Apr 20, 2022

Yes, I have already changed the GPU count to 8, and total_epochs is also reset to 1 in the command. But the estimated running time is still 30+ days after it prints some float tensors:

0.20857863751051303
0.19100091827364554
0.1633187772925764
0.12694300518134716
0.2342857142857143
0.23985572587917042
2022-04-20 09:35:50,993 - mmdet - INFO - Epoch [1][50/7665]     lr: 1.978e-03, eta: 36 days, 13:22:29, time: 20.610, data_time: 2.224, memory: 8942, loss_rpn_cls: 0.6659, loss_rpn_bbox: 0.1318, loss_bbox: 0.0274, text_cls_loss: 2.6543, kd_loss: 7.1762, loss_mask: 1.5504, loss: 12.2060
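
This ETA matches the full 20-epoch schedule of detpro_ens_20e.py rather than a 1-epoch run: at the logged ~20.6 s per iteration and 7665 iterations per epoch, 20 epochs come to 20.6 × 7665 × 20 ≈ 3.16 × 10^6 s ≈ 36.6 days, in line with "eta: 36 days, 13:22:29". A genuine 1-epoch run at the same per-iteration speed would take about 1.8 days, close to the "one day" figure from issue #4.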

@XiongweiWu

@cailk totol -> total
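
That is, the --cfg-options key is misspelled, so the override apparently gets merged as a new, unused config entry instead of shortening the schedule (no error appears in the log above). A corrected invocation, as a sketch that changes only the typo and the GPU count relative to the command in the thread, would be:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 8 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False total_epochs=1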

cailk (Author) commented Apr 21, 2022

> @cailk totol -> total

Alright, I'm an idiot. Thank you for the reminder~

cailk (Author) commented Apr 21, 2022

> @cailk totol -> total

BTW, I'm still wondering why this embedding-generation process produces training losses. Shouldn't only a forward computation be required for this?

XiongweiWu commented Apr 21, 2022

@cailk I guess it's an implementation issue. Personally speaking, I would prefer generating the features in test mode.
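
A test-mode run would presumably go through mmdetection's standard test entry point instead of the training loop, so no losses or optimizer steps would be involved. The line below is only a hypothetical sketch: it assumes the repo keeps mmdetection's tools/dist_test.sh, that the roi_head honors model.roi_head.load_feature=False at test time, and that workdirs/collect_data/latest.pth is a placeholder checkpoint path; none of this is confirmed in the thread.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/lvis/detpro_ens_20e.py workdirs/collect_data/latest.pth 8 --cfg-options model.roi_head.load_feature=False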

dyabel closed this as completed Apr 24, 2022