Inf in attached conv layers #8
Comments
It seems to be a float-precision problem. Maybe you can try to run the training in FP32 mode?
Thanks for the advice. I am trying to disable FP16 training by commenting out this line in the config file.
And then I remove the
Did I miss something?
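For context, in MMDetection-style configs mixed-precision training is usually switched on by a single `fp16` entry, so disabling it is a one-line change. A minimal sketch (the exact key and `loss_scale` value depend on the repo's config, so treat this as an illustration, not the actual SST config):

```python
# Hypothetical excerpt from an MMDetection3D-style config file.
# Commenting out the fp16 dict falls back to full FP32 training.
# fp16 = dict(loss_scale=32.0)

optimizer = dict(type='AdamW', lr=1e-3, weight_decay=0.01)
```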
Could you show me your model config?
I'm guessing at the reason why the Inf occurs during training. I checked the log: when the Inf happened, the LR was high (almost 1e-3). The large LR probably caused the weights to grow during backprop, and eventually the feature values overflowed in FP16.
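The headroom in FP16 is small: the largest finite value is 65504, so features only need to grow by a few binary orders of magnitude before they overflow. A minimal NumPy illustration of the threshold:

```python
import numpy as np

# FP16's largest finite value is 65504; anything beyond overflows to inf.
print(np.finfo(np.float16).max)      # 65504.0

x = np.float16(60000.0)
y = x * np.float16(2.0)              # exceeds the FP16 range
print(y, np.isinf(y))                # inf True
```

This is why a spike in the learning rate can turn healthy activations into Inf within a few layers of FP16 convolutions.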
There is a small difference between:
Thanks, I will try v2. BTW, have you tried training without FP16?
I did not try FP32 in these configs, but I tried FP32 with our customized model and did not see any problems.
@chyohoo Hi, I also tried to use SST to train on the nuScenes dataset. Could you please share your results on nuScenes? Thanks!
Hi, recently I used SST to train on the nuScenes dataset. Everything worked fine in the beginning, but after several epochs I got NaN in bbox_loss and dir_loss. I found that the NaN is caused by Inf output from the attached conv layer at the end of SSTv1: the output from recover_bev is FP32, while the intermediate feature maps from conv2d in attached_conv are FP16, and as training goes on those output values become Inf.
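One way to see how a healthy FP32 map can blow up inside an FP16 conv stack is to track where Inf first appears layer by layer. The sketch below is not from the SST repo; it uses NumPy multiplications as stand-ins for conv layers whose weight magnitudes have grown, with made-up scale factors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the FP32 output of recover_bev: modest-magnitude features.
feats = rng.normal(scale=100.0, size=(64, 32)).astype(np.float32)

# Hypothetical "attached conv" stack: each layer scales activation
# magnitude by its weight scale, mimicking weights grown by a large LR.
weight_scales = [4.0, 8.0, 16.0, 32.0]

x = feats.astype(np.float16)  # autocast runs these layers in FP16
for i, w in enumerate(weight_scales):
    x = x * np.float16(w)
    if np.isinf(x).any():
        print(f"first Inf appears after layer {i}")
        break
```

In a real PyTorch model the same check can be done with forward hooks that test each module's output for Inf, which pinpoints the first offending layer even when its weights still look normal.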
I printed the weights of the conv layer, and they look normal.
I tried to clamp the Inf values in the feature map, but Inf values then occur in the following layers. What could be wrong?
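That clamping fails is consistent with the large-weight explanation: clipping only repairs the activations, so the very next layer re-amplifies them past the FP16 limit. A minimal NumPy illustration (the factor of 2 is made up):

```python
import numpy as np

FP16_MAX = np.float16(65504.0)

x = np.float16(60000.0) * np.float16(2.0)  # overflows to inf
x = np.clip(x, -FP16_MAX, FP16_MAX)        # clamp back into the finite range
y = x * np.float16(2.0)                    # next layer re-amplifies -> inf again
print(np.isinf(y))                         # True
```

This suggests fixing the cause (lower the LR, add gradient clipping, or run the attached convs in FP32) rather than clamping the symptoms.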