
Training of LR stage #15

Open
Mayongrui opened this issue Oct 4, 2022 · 5 comments

Comments

@Mayongrui

Hi there, I cannot train the network to converge to the numeric metric values reported in the manuscript for the SR stage. All settings and experiments below are for x4 SR.

Setting A. As you commented in #11, I changed the corresponding code and retrained the network for the pretraining stage. The network converged as expected, and the validation PSNR was around 24.5 dB on the DIV2K validation set, which seemed reasonable. I then further trained the network for the SR stage; however, I could not reproduce the results reported in the paper. The best PSNR/SSIM/LPIPS was 21.85/0.5813/0.3724, reached at 350K iterations.

Setting B. To locate the problem, I trained the network for the SR stage with the default options file and the HRP pretrained weights from this repo. However, it also converged to numbers very similar to Setting A.

Would you mind giving me any suggestions or guidance about this issue?

Some information may help:

  1. The code to generate synthetic testing images: https://drive.google.com/file/d/1k4063h7KHKf5x5firP9FFzG0nIzGTv6s/view?usp=sharing
  2. The generated testing images: https://drive.google.com/drive/folders/1UDodF_0BcnU3KeCd7UqTQP0fHiufbrff?usp=sharing

Setting A:

  1. pretrain stage files and results: https://drive.google.com/drive/folders/15ser2Fvk0DFx-V0mv33Dj9hur92TJxBD?usp=sharing
  2. SR stage files and results: https://drive.google.com/drive/folders/1_wvSpzwgG4cT3uczlsFJDGPOB0AOg2En?usp=sharing

Setting B:

  1. pretrain weights: https://drive.google.com/drive/folders/1g3NsDoEnUvKIzw-fZPHtu5o4G7Znmx8A?usp=sharing
  2. SR stage files and results: https://drive.google.com/drive/folders/1aayOT3xDUicuCM5eGvChryrfO_1AGtbg?usp=sharing

Fullfile: https://drive.google.com/drive/folders/1MLPoIYXvWODhevk8ICSAmPHj0DP0PF-k?usp=sharing

@chaofengc
Owner

chaofengc commented Oct 7, 2022

Hi, I retrained the model with Setting B over the past two days and it works fine. On our generated test images, it reaches its best PSNR/SSIM/LPIPS of 22.43/0.5909/0.3437 at 160k iterations, and it also achieves an LPIPS score of 0.3557 on your test images. It should reach performance similar to the paper with longer training. The training log has been uploaded to wandb for your reference. There might be several reasons for your problem:

  • Due to the different random processes, your generated test images are different from mine and seem to be more difficult. I tested your generated images with the provided best model, and it gives 0.342 LPIPS.
  • Please make sure that the pyiqa package you installed is >=0.1.4 (a quick check is sketched after this list). Although lower versions do not raise errors, they do not actually support backpropagation through the perceptual loss. Note: I made this mistake in 008_FeMaSR_HQ_stage, and the initialization fix in Training details #11 might not be necessary (it would be quite nice if you could help verify this).
  • Please make sure that your pretrained HRP gives reconstruction performance similar to (or better than) the provided one.
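
A minimal sketch for the pyiqa version check and for scoring LPIPS on a test image, assuming the standard pyiqa interface (pyiqa.__version__ and create_metric accepting image paths); the file paths below are placeholders:

import pyiqa
import torch

# The perceptual-loss backward issue only affects pyiqa < 0.1.4
print(pyiqa.__version__)  # should print 0.1.4 or higher

# Score a restored image against its ground truth with LPIPS (lower is better)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
lpips_metric = pyiqa.create_metric('lpips', device=device)
print(lpips_metric('results/0801_FeMaSR.png', 'datasets/DIV2K_valid_HR/0801.png'))  # placeholder paths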

@Mayongrui
Author

Thanks for the response.

  • I rechecked my environment, and the pyiqa version was v0.1.4;
  • To minimize the performance gap, would you mind releasing the degraded validation images and the code for generating the training set?
  • In my experiments, the correct initialization is crucial during HRP training. Without it, HRP does not work as expected, producing color shifts (over-yellow or over-red) and poor PSNR/SSIM/LPIPS. This issue no longer appeared after the initialization bug was fixed.

@chaofengc
Owner

chaofengc commented Oct 8, 2022

The training images are generated with degradation_bsrgan and the testing images with degradation_bsrgan_plus, using the provided script generate_dataset.py. We did not make any changes to this code. Please note that my retrained model also works fine on your test images, so data generation is unlikely to be the problem. If your model does not work well even on your own test images (i.e., reach performance similar to the released model, 0.342 LPIPS), it is unlikely to work on ours.

Another difference is that we generate the training dataset offline to speed up training. Since the degradation space of bsrgan is quite large, generating the images online and training the model with a small batch size may cause problems.

You may try to first synthesize the LR images offline, which would make model training easier; the dataset options then change as follows (a sketch of the offline synthesis is given after the snippet):

# type: BSRGANTrainDataset
# dataroot_gt: ../datasets/HQ_sub
type: PairedImageDataset
dataroot_gt: ../datasets/HQ_sub
dataroot_lq: ../datasets/LQ_sub_X4
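
For reference, a minimal sketch of such offline LQ synthesis. It assumes degradation_bsrgan can be imported from the repo's data utilities with the original BSRGAN signature (img, sf, lq_patchsize) returning (img_lq, img_hq); the import path, directories, and the assumption of square HQ patches are illustrative, and the provided generate_dataset.py remains the authoritative script:

import glob, os
import cv2
import numpy as np
from basicsr.data.bsrgan_util import degradation_bsrgan  # import path is an assumption

hq_dir = '../datasets/HQ_sub'     # cropped HQ patches
lq_dir = '../datasets/LQ_sub_X4'  # destination for the x4-degraded LQ patches
os.makedirs(lq_dir, exist_ok=True)

for path in sorted(glob.glob(os.path.join(hq_dir, '*.png'))):
    img = cv2.imread(path, cv2.IMREAD_COLOR).astype(np.float32) / 255.0
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # sf=4 for x4 SR; with lq_patchsize = H // sf the final random crop inside
    # degradation_bsrgan covers the whole (square) patch, so the saved LQ image
    # stays aligned with the original HQ file
    img_lq, _ = degradation_bsrgan(img, sf=4, lq_patchsize=img.shape[0] // 4)
    img_lq = cv2.cvtColor((img_lq * 255.0).round().astype(np.uint8), cv2.COLOR_RGB2BGR)
    cv2.imwrite(os.path.join(lq_dir, os.path.basename(path)), img_lq)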

@Mayongrui
Author

Does the offline preprocessing for the training set include any other enhancement, such as the 0.5~1.0 scaling before passing images to the degradation model described in the manuscript?

@chaofengc
Owner

chaofengc commented Oct 8, 2022

No. Resizing at the beginning would further enlarge the degradation space. This might also be part of the problem in the current online mode; you can try to set use_resize_crop to false when using the BSRGANTrainDataset:


use_resize_crop: true
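
That is, change it to the following in the train dataset section of the options file (the key name is taken from the comment above; its exact location may differ):

use_resize_crop: false  # skip the extra random rescaling so the degradation space stays smaller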

In fact, we did not verify whether such random scaling improves or degrades performance in offline mode either; we released the same settings as the paper to reproduce our results. Since random scaling is already performed inside degradation_bsrgan, further scaling may not be necessary. You may try to verify its influence if you have enough GPUs.

In a word, keeping a proper degradation space can ease the difficulty of model training. Otherwise, you may need much more computational resources, similar to the training of BSRGAN.
