Training issue #5
Hi, can you please check the format used to print the lr? Maybe the lr is just too small.
@Redaimao I have changed the following in your train_net.py file (changed according to the training details mentioned in the paper). See below:

parser.add_argument('--lr_init', type=float, default=0.001, help='learning rate for generator')

It is running now; let's see if that problem comes up again. But there is another problem: I am getting loss_bt = 0 from the very start of training. Why is that? Is the model overfitting, or is it something else?
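On loss_bt being exactly 0 from the start: I have not checked how loss_bt is defined in this repo, but one common, benign reason a loss term prints exactly 0.0000 is a hinge/margin-style formulation that clamps at zero once its constraint is already satisfied. A purely hypothetical sketch of that effect (the function name and the distances are made up, not taken from the repo):

```python
# Hypothetical hinge/margin-style loss: mean(max(0, margin + d_pos - d_neg)).
# When every pair already satisfies the margin, the clamp makes it exactly 0.
def margin_loss(d_pos, d_neg, margin=0.2):
    terms = [max(0.0, margin + p - n) for p, n in zip(d_pos, d_neg)]
    return sum(terms) / len(terms)

print(margin_loss([0.1, 0.2], [1.0, 1.5]))  # 0.0: zero loss, not necessarily a bug
print(margin_loss([0.9], [1.0]))            # positive when the margin is violated
```

If loss_bt is something like this, a constant 0 early in training can simply mean the constraint is trivially satisfied on your data; checking its definition in the repo would settle it.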
Hi,

The learning rate (lr) is printed as a hard-coded string with the value "0.001". See below:

"Training: Epoch[{:0>3}/{:0>3}] Iteration[{:0>3}/{:0>3}] lr:{} Loss: {:.4f} Loss_pair: {:.4f} Loss_bt: {:.4f} Loss_grads: {:.4f} Loss_ssim: {:.4f} ".format(epoch + 1, opt.max_epoch, i + 1, len(train_loader), '0.001', loss_avg,

Both before and after changing the configuration, loss_bt is consistently zero, but this time the lr does not become 0. See the screenshot below (the last epoch, with some of the last iterations):

Training: Epoch[020/020] Iteration[5370/5495] lr:1.330612450002547e-113 Loss: 0.3635 Loss_pair: 0.4194 Loss_bt: 0.0000 Loss_grads: 0.2198 Loss_ssim: 0.0601
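Rather than hard-coding the string '0.001', it is safer to print the lr the optimizer is actually using at each step; in PyTorch that is optimizer.param_groups[0]['lr']. A minimal sketch, where the FakeOptimizer stand-in and the shortened message are mine, not the repo's code:

```python
# Sketch: format the actual learning rate instead of a hard-coded string.
# A plain class mirrors PyTorch's optimizer.param_groups so this runs alone.
class FakeOptimizer:
    def __init__(self, lr):
        self.param_groups = [{'lr': lr}]

optimizer = FakeOptimizer(lr=0.001)
current_lr = optimizer.param_groups[0]['lr']  # what the training loop should read
msg = "Training: Epoch[{:0>3}/{:0>3}] lr:{:.6g} Loss: {:.4f}".format(
    1, 20, current_lr, 0.3635)
print(msg)  # Training: Epoch[001/020] lr:0.001 Loss: 0.3635
```

Printed this way, the log shows the true scheduled value, so a collapsing lr is visible immediately instead of being masked by the constant string.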
This is because your lr decays too fast, so it gets infinitely close to 0.
@Redaimao @Zhaohaojie4598 I am running the model for 20 epochs, but after a few iterations in the very first epoch I am getting loss_bt = 0, and I cannot understand the reason behind this. Please help. The second problem: I have set the step size of the learning-rate schedule to 300, since my batch size is 8. See above: at the 300th iteration, how does the lr become 0.00025? And immediately in the next iteration it is multiplied by 0.5 and gives 0.0005. Where does 0.00025 come from? Please reply; I am waiting for your response. Thank you.
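For what it's worth, the tiny lr in the other screenshot (lr:1.330612450002547e-113 at Epoch[020/020] Iteration[5370/5495]) is exactly what you get if the lr is halved once every 300 iterations rather than every 300 epochs: by that point it has been halved floor((19*5495 + 5370) / 300) = 365 times. This is inferred from the printed numbers, not from reading the repo's scheduler code:

```python
# Reproduce the lr printed at Epoch[020/020] Iteration[5370/5495],
# assuming lr_init = 0.001 is halved once every 300 *iterations*.
iters_per_epoch = 5495
total_iters = 19 * iters_per_epoch + 5370  # iterations completed so far
halvings = total_iters // 300              # 365 halvings
lr = 0.001 * 0.5 ** halvings
print(halvings, lr)  # lr is ~1.33e-113, matching the log line above
```

If the decay was meant to happen per epoch, the fix in PyTorch terms is to call scheduler.step() once per epoch instead of once per batch.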
I was training your model. I ran it for 20 epochs and set the training batch size to 5.
During training I noticed that in the 6th epoch, at iteration 4660 out of 5495, the learning rate becomes 0.0, and it remains 0.0 until training finishes, i.e., until the 20th epoch.
The last epoch's results are shown in the screenshot.
What is the reason behind this?
I have used all the default values; nothing was changed.
Any help will be appreciated.
Thanks
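If the lr really is being halved every 300 iterations (a guess based on the numbers posted in this thread), it will eventually underflow floating point entirely and print as a literal 0.0: float64 gives out after roughly 1065 halvings, and float32 after only about 140 halvings (about 42,000 iterations). Whether that is what happened in this run depends on how and in what precision the code stores and prints the lr; a sketch of the float64 case:

```python
# Count how many halvings it takes for a float64 lr of 0.001 to become
# a true zero (not just a very small number).
lr = 0.001
halvings = 0
while lr > 0.0:  # stops once the value underflows to exactly 0.0
    lr *= 0.5
    halvings += 1
print("lr underflowed to exactly 0.0 after", halvings, "halvings")
```

Once that happens, the optimizer makes no updates at all, so every epoch after the underflow is wasted compute.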