loss nan problem #142

Open
pi1ing opened this issue Nov 5, 2021 · 0 comments
pi1ing commented Nov 5, 2021

Hello, thank you for your great work.
I tried to train on the DOTA dataset with the default cfgs (backbone: resnet_50), but the training results look like this:

************************************************************
2021-11-05 09:18:35: global_step:20  current_step:20
per_cost_time:4.518s
refine_cls_loss_stage3:0.000
cls_loss:1364.121
refine_reg_loss:0.000
refine_reg_loss_stage3:0.000
reg_loss:2.277
refine_cls_loss:741079.375
total_losses:742445.750

************************************************************
2021-11-05 09:18:44: global_step:40  current_step:40
per_cost_time:0.234s
refine_cls_loss_stage3:0.000
cls_loss:nan
refine_reg_loss:0.000
refine_reg_loss_stage3:0.000
reg_loss:nan
refine_cls_loss:nan
total_losses:nan

By the way, I have a single GeForce RTX 3080 Ti and the development environment uses the recommended Docker image, but the first
_, global_stepnp, summary_str = sess.run([train_op, global_step, summary_op])
took about 10 minutes to run. Is that normal?
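
In case it helps with debugging, below is a minimal toy sketch of how the loss could be wrapped in a numeric check so the session raises an error as soon as a NaN/Inf appears, instead of letting it propagate silently. This assumes the repo's TF 1.x session API; the tensor here is only a stand-in and none of the names come from the repo itself.

import numpy as np
import tensorflow as tf  # TF 1.x API, as used by the repo's training loop

# Stand-in for the repo's own loss tensor (hypothetical example).
x = tf.placeholder(tf.float32, shape=[None])
total_loss = tf.reduce_mean(tf.log(x))  # toy loss that can produce NaN
# check_numerics raises InvalidArgumentError the moment the value is NaN/Inf.
checked_loss = tf.debugging.check_numerics(total_loss, "total_loss has NaN/Inf")

with tf.Session() as sess:
    print(sess.run(checked_loss, feed_dict={x: np.array([1.0, 2.0])}))  # finite, runs fine
    try:
        sess.run(checked_loss, feed_dict={x: np.array([-1.0, 2.0])})    # log(-1) -> NaN
    except tf.errors.InvalidArgumentError as e:
        print("caught:", e.message)

Running something like this on the real loss tensors (cls_loss, reg_loss, refine_cls_loss, ...) would at least show which term goes NaN first.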
