
How to get same result on multiple runs ? #21 (Closed)

Dongwoo-Im opened this issue Sep 25, 2022 · 5 comments

Comments


Dongwoo-Im commented Sep 25, 2022

Hi, thanks for sharing your work and contribution!

I tried to reproduce the same training loss on my custom dataset across multiple runs, but it didn't work.

So I wonder whether HAT can return exactly the same training loss on repeated runs.

Any help would be much appreciated, thanks.

My environment

  • windows 10
  • python : 3.7.13
  • pytorch : 1.12.1+cu113
  • torchvision : 0.13.1+cu113
  • cuda : 11.3
  • cudnn : 8.4.1
  • basicsr : both 1.3.4.9 and 1.4.2 (latest version)

Methods I tried

  • use_hflip = False
  • use_rot = False
  • use_shuffle = False
  • num_worker_per_gpu = 0
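
For reference, the PyTorch-level seeding and cuDNN settings that are usually needed for run-to-run reproducibility look roughly like this. This is just a sketch, not HAT's own code; the seed value is arbitrary, and I believe BasicSR also exposes a manual_seed option in the yaml:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Seed the RNGs that a typical PyTorch training run touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic cuDNN kernels (can slow training down).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(0)
```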
Dongwoo-Im (Author) commented Sep 25, 2022

When I set print_freq to 1, you can see that there is some negligible difference in the loss between runs:

[screenshot: training loss values]

Dongwoo-Im changed the title from "HAT can return the exact same result about training loss ?" to "HAT can return the exactly same result about training loss ?" on Sep 25, 2022
chxy95 (Member) commented Sep 26, 2022

I do not understand your question, but you can refer to the original training log of HAT for SRx4 on DF2K:
train_416_train_Final_Model_SRx4_scratch_DF2K_CR3W16P64_500k_B4G8_20220301_002049.log

Dongwoo-Im (Author) commented Sep 26, 2022

@chxy95 Thanks for the quick response.
When I run the same yaml file twice on my custom dataset, the loss is not the same. Maybe this is caused by some non-deterministic algorithm in HAT. Anyway, I want to reproduce the same result from the same yaml file, but I do not know how to do it.

Dongwoo-Im changed the title from "HAT can return the exactly same result about training loss ?" to "How to get same result on multiple runs ?" on Sep 26, 2022
chxy95 (Member) commented Sep 27, 2022

@Dongwoo-Im I'm not sure of the reason for this either. The only random factor in HAT appears to be drop_path_rate, and it should have no effect on performance. You can set it to 0 and observe the training process. If that does not work, there may be some random factors that the current version of BasicSR does not take into account.
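
For context, here is a minimal sketch of how stochastic depth (drop path) typically works in Swin-style blocks; it is only active during training and becomes the identity when the rate is 0, which is why setting drop_path_rate to 0 removes this source of randomness. This is the common formulation, not necessarily HAT's exact code:

```python
import torch


def drop_path(x: torch.Tensor, drop_prob: float, training: bool) -> torch.Tensor:
    """Stochastic depth: randomly skip the residual branch per sample during
    training; acts as the identity when drop_prob is 0 or in eval mode."""
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    # One random keep/drop decision per sample, broadcast over remaining dims.
    mask = x.new_empty((x.shape[0],) + (1,) * (x.ndim - 1)).bernoulli_(keep_prob)
    return x * mask / keep_prob
```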

Dongwoo-Im (Author) commented

@chxy95 Thanks for your advice. I haven't solved it yet, but I'll re-open this issue when I get some insights or find a solution.
