
How many gpu days does the training procedure of AVQA take? #8

Closed
Rainlt opened this issue Mar 23, 2023 · 4 comments

Rainlt commented Mar 23, 2023

Hello, I'm interested in your nice work and am trying to reproduce the results, but I found that training for the AVQA task takes nearly 5 GPU days on a single 3090. Is this normal?
These are the times recorded during training:

feature Embed time:  0.0016405582427978516
time for posi encode:  0.26480627059936523
time for nega encode:  0.09544777870178223
time for grounding:  0.009487152099609375
time for result:  0.0050661563873291016

As the log shows, encoding one audio clip and its positive visual sample with the Swin Transformer plus adapters takes about 0.26 s, so it would take roughly 2 GPU days just to encode the positive features over 30 epochs.
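(Side note for anyone re-measuring these numbers: CUDA kernels run asynchronously, so wall-clock timings around a forward pass can be misleading without a device sync. Below is a minimal sketch of how a per-step timing could be taken, assuming PyTorch; `model` and `batch` are placeholders, not names from this repo.)

```python
import time
import torch

def timed_forward(model, batch, device="cuda"):
    """Time one forward pass, syncing so async CUDA kernels are fully counted."""
    model.eval()
    batch = batch.to(device)
    torch.cuda.synchronize()   # flush queued kernels before starting the clock
    start = time.time()
    with torch.no_grad():
        out = model(batch)
    torch.cuda.synchronize()   # wait until the forward pass really finishes
    return out, time.time() - start
```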

GenjiB (Owner) commented Mar 24, 2023

Actually, I got the best results at epoch 13. It took about 12 days with one A5000. You can also try not using positive and negative sampling (we did not study the effectiveness of the sampling, but I believe the results would be similar).

[attached image: training results]

GenjiB closed this as completed Mar 24, 2023
Rainlt (Author) commented Mar 24, 2023

You can also try not using positive and negative sampling (we did not study the effectiveness of the sampling, but I believe the results would be similar)

Thanks for your kind response. I can see your result reaches 77 at the end of epoch 13, but I can only reach 76.0 there. I have some ideas about the gap. Could you please check a few things for me:

  1. Did you change the random seed? (the default is 1)
  2. Did you modify the audio waveforms? Some of them are shorter than 60 seconds, so I padded them to 60 s as the original paper describes (see the sketch after this list).
  3. There are some bugs in the image and feature dimensions. I fixed them locally, but since I also added a lot of notes to the code and I'm not proficient with git, I didn't open a pull request. E.g., swin_v2 needs inputs of size [192, 192], so the Resize call on line 86 of dataloader_avst.py should be changed from [224, 224] to [192, 192]. Also, f_v should probably be assigned to visual_posi on line 375 of net_avst.py.
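For item 2, this is roughly what I mean by padding; a minimal sketch assuming the waveform is a 1-D numpy array with a known sample rate (the actual dataloader may pad or tile differently):

```python
import numpy as np

def pad_audio(wave: np.ndarray, sample_rate: int = 16000, target_sec: int = 60) -> np.ndarray:
    """Zero-pad (or trim) a 1-D waveform to exactly target_sec seconds."""
    target_len = sample_rate * target_sec
    if len(wave) < target_len:
        # zero-pad at the end; tiling/repeating the clip is another common choice
        wave = np.pad(wave, (0, target_len - len(wave)))
    return wave[:target_len]
```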

Rainlt (Author) commented Mar 24, 2023

Actually, I got the best results at epoch 13. It took about 12 days with one A5000. You can also try not using positive and negative sampling (we did not study the effectiveness of the sampling, but I believe the results would be similar).

Oh, one more thing: did you pretrain the grounding module as in the original code? I haven't done that. Since the backbone has been changed, I thought the pretrained parameters might be useless, so I commented out the loading code on line 227 of main_avst.py. I think this may be the main cause of the difference!
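For context, what I commented out is just the checkpoint loading; if the backbone changed, one could instead load only the keys that still match. A minimal sketch assuming a plain PyTorch state_dict checkpoint (the "grounding" prefix here is a guess, not the repo's actual naming):

```python
import torch

def load_matching_weights(model, ckpt_path, prefix="grounding"):
    """Load only checkpoint entries whose names and shapes match the current model."""
    state = torch.load(ckpt_path, map_location="cpu")
    model_state = model.state_dict()
    kept = {k: v for k, v in state.items()
            if k.startswith(prefix) and k in model_state and v.shape == model_state[k].shape}
    # strict=False leaves everything else (e.g. the new backbone) untouched
    model.load_state_dict(kept, strict=False)
    return sorted(kept)
```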

GenjiB (Owner) commented Mar 25, 2023

@Rainlt Thanks for pointing that out. I found we did use the pretrained grounding module. Gonna fix this bug soon.
