running time for prepare.sh #9

Closed
cailk opened this issue Apr 20, 2022 · 6 comments

cailk commented Apr 20, 2022

Hi, thanks for your great work!

I'm trying to run the first command in 'prepare.sh',
CUDA_VISIBLE_DEVICES=6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 2 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False totol_epochs=1
which is used to generate the CLIP embeddings for precomputed proposals.
However, this process will take about 30 days with eight 16 GB V100s. And in issue #4, you already claimed that it only takes one day. So I was wondering if I missed any details?

dyabel (Owner) commented Apr 20, 2022

> I'm trying to run the first command in 'prepare.sh' ... However, this process will take about 30 days with eight 16 GB V100s. And in issue #4, you already claimed that it only takes one day. So I was wondering if I missed any details?

Hi, did you modify the command to use 8 GPUs? And make sure the total schedule is 1 epoch, which should take about one day.

cailk (Author) commented Apr 20, 2022

Yes, I have already changed the GPU count to 8, and total_epochs is also reset to 1 in the command. But the estimated running time is still 30+ days after it prints some float tensors:

0.20857863751051303
0.19100091827364554
0.1633187772925764
0.12694300518134716
0.2342857142857143
0.23985572587917042
2022-04-20 09:35:50,993 - mmdet - INFO - Epoch [1][50/7665]     lr: 1.978e-03, eta: 36 days, 13:22:29, time: 20.610, data_time: 2.224, memory: 8942, loss_rpn_cls: 0.6659, loss_rpn_bbox: 0.1318, loss_bbox: 0.0274, text_cls_loss: 2.6543, kd_loss: 7.1762, loss_mask: 1.5504, loss: 12.2060
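
This ETA matches the full 20-epoch schedule of detpro_ens_20e.py rather than a 1-epoch run: at the logged ~20.6 s per iteration and 7665 iterations per epoch, 20 epochs come to 20.6 × 7665 × 20 ≈ 3.16 × 10^6 s ≈ 36.6 days, in line with "eta: 36 days, 13:22:29". A genuine 1-epoch run at the same per-iteration speed would take about 1.8 days, close to the "one day" figure from issue #4.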

@XiongweiWu

@cailk totol -> total
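
That is, the --cfg-options key is misspelled, so the override apparently gets merged as a new, unused config entry instead of shortening the schedule (no error appears in the log above). A corrected invocation, as a sketch that changes only the typo and the GPU count relative to the command in the thread, would be:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/lvis/detpro_ens_20e.py 8 --work-dir workdirs/collect_data --cfg-options model.roi_head.load_feature=False total_epochs=1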

cailk (Author) commented Apr 21, 2022

> @cailk totol -> total

Alright, I'm an idiot. Thank you for the reminder~

cailk (Author) commented Apr 21, 2022

> @cailk totol -> total

BTW, I'm still wondering why this embedding-generation process produces training losses. Shouldn't only a forward computation be required for this?

XiongweiWu commented Apr 21, 2022

@cailk I guess it's an implementation issue. Personally speaking, I would prefer generating the features in test mode.
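
A test-mode run would presumably go through mmdetection's standard test entry point instead of the training loop, so no losses or optimizer steps would be involved. The line below is only a hypothetical sketch: it assumes the repo keeps mmdetection's tools/dist_test.sh, that the roi_head honors model.roi_head.load_feature=False at test time, and that workdirs/collect_data/latest.pth is a placeholder checkpoint path; none of this is confirmed in the thread.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/lvis/detpro_ens_20e.py workdirs/collect_data/latest.pth 8 --cfg-options model.roi_head.load_feature=False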

dyabel closed this as completed Apr 24, 2022