issue about resume #34

Open
sujyQ opened this issue Aug 30, 2021 · 3 comments

sujyQ commented Aug 30, 2021

Hi.

There's a problem when resuming training.

I tried to resume DASR training with this command:

python main.py --dir_data='my/path' \
               --model='blindsr' \
               --scale='4' \
               --blur_type='aniso_gaussian' \
               --noise=25.0 \
               --lambda_min=0.2 \
               --lambda_max=4.0 \
               --start_epoch=157 \
               --resume=157

The problem is that the contrastive loss becomes much larger after resuming.
I think the parameters of the encoder for the degradation representation are not being loaded.

[Epoch 158]	Learning rate: 1.00e-4
Epoch: [0158][6400/31050]	Loss [SR loss: 9.753 | contrastive loss: 0.892 ]	Time [ 145.0 s]
Epoch: [0158][12800/31050]	Loss [SR loss: 9.747 | contrastive loss: 0.920 ]	Time [ 143.7 s]
Epoch: [0158][19200/31050]	Loss [SR loss: 9.722 | contrastive loss: 0.918 ]	Time [ 144.1 s]
[Epoch 158]	Learning rate: 1.00e-4
Epoch: [0158][6400/31050]	Loss [SR loss: 9.598 | contrastive loss: 7.457 ]	Time [ 145.2 s]
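
One way to test the suspicion that some encoder parameters are not loaded is to compare the parameter names the model expects with the names stored in the checkpoint. The following is a minimal sketch of that comparison on plain key lists; the helper name `diff_state_dict_keys` and the example key names are hypothetical and chosen only for illustration.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists.

    missing:    keys the model expects but the checkpoint does not contain
    unexpected: keys the checkpoint contains but the model does not expect
    """
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    missing = [k for k in model_keys if k not in checkpoint_set]
    unexpected = [k for k in checkpoint_keys if k not in model_set]
    return missing, unexpected


# Hypothetical key names, for illustration only.
model_keys = ["E.encoder_q.E.0.weight", "E.encoder_k.E.0.weight", "E.queue"]
checkpoint_keys = ["E.encoder_q.E.0.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With PyTorch, the two arguments would typically be `model.state_dict().keys()` and the keys of the loaded checkpoint, assuming the checkpoint file stores a plain state dict. In recent PyTorch versions, `load_state_dict(..., strict=False)` also returns an object with `missing_keys` and `unexpected_keys` fields that carries the same information.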

@LongguangWang (Member)

Hi @sujyQ, we will fix this bug in an upcoming update.

sujyQ commented Sep 21, 2021

Hi @LongguangWang, I think the problem is here.

When I set strict=True, the following error occurs:

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    model = model.Model(args, checkpoint)
  File "/home/hsj/d_drive/hsj/hsj/DASR_DDF/model/__init__.py", line 35, in __init__
    cpu=args.cpu
  File "/home/hsj/d_drive/hsj/hsj/DASR_DDF/model/__init__.py", line 104, in load
    strict=True
  File "/home/hsj/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BlindSR:
    Missing key(s) in state_dict: "E.queue", "E.queue_ptr", "E.encoder_k.E.0.weight", "E.encoder_k.E.0.bias", "E.encoder_k.E.1.weight", "E.encoder_k.E.1.bias", "E.encoder_k.E.1.running_mean", "E.encoder_k.E.1.running_var", "E.encoder_k.E.1.num_batches_tracked", "E.encoder_k.E.3.weight", "E.encoder_k.E.3.bias", "E.encoder_k.E.4.weight", "E.encoder_k.E.4.bias", "E.encoder_k.E.4.running_mean", "E.encoder_k.E.4.running_var", "E.encoder_k.E.4.num_batches_tracked", "E.encoder_k.E.6.weight", "E.encoder_k.E.6.bias", "E.encoder_k.E.7.weight", "E.encoder_k.E.7.bias", "E.encoder_k.E.7.running_mean", "E.encoder_k.E.7.running_var", "E.encoder_k.E.7.num_batches_tracked", "E.encoder_k.E.9.weight", "E.encoder_k.E.9.bias", "E.encoder_k.E.10.weight", "E.encoder_k.E.10.bias", "E.encoder_k.E.10.running_mean", "E.encoder_k.E.10.running_var", "E.encoder_k.E.10.num_batches_tracked", "E.encoder_k.E.12.weight", "E.encoder_k.E.12.bias", "E.encoder_k.E.13.weight", "E.encoder_k.E.13.bias", "E.encoder_k.E.13.running_mean", "E.encoder_k.E.13.running_var", "E.encoder_k.E.13.num_batches_tracked", "E.encoder_k.E.15.weight", "E.encoder_k.E.15.bias", "E.encoder_k.E.16.weight", "E.encoder_k.E.16.bias", "E.encoder_k.E.16.running_mean", "E.encoder_k.E.16.running_var", "E.encoder_k.E.16.num_batches_tracked", "E.encoder_k.mlp.0.weight", "E.encoder_k.mlp.0.bias", "E.encoder_k.mlp.2.weight", "E.encoder_k.mlp.2.bias".
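
The missing keys all belong to `E.encoder_k.*`, `E.queue`, and `E.queue_ptr`. One possible workaround, sketched below, is to fill the missing `E.encoder_k.*` entries from a matching query encoder before loading. This is only a sketch under assumptions: it assumes the checkpoint contains entries under an `E.encoder_q.` prefix (not confirmed in this thread), and the helper name `fill_key_encoder` is hypothetical. Copying the query encoder only approximates the key encoder state at save time, and the queue entries are still absent, so the contrastive loss may need some iterations to settle.

```python
def fill_key_encoder(state, q_prefix="E.encoder_q.", k_prefix="E.encoder_k."):
    """Return a copy of `state` with missing key-encoder entries filled in.

    For every entry whose name starts with `q_prefix`, an entry with the
    same suffix under `k_prefix` is added if it does not already exist.
    Values are reused as-is; with real tensors you would likely want to
    clone them so the two encoders do not share storage.
    """
    filled = dict(state)
    for name, value in state.items():
        if name.startswith(q_prefix):
            k_name = k_prefix + name[len(q_prefix):]
            filled.setdefault(k_name, value)
    return filled


# Hypothetical checkpoint contents, with lists standing in for tensors.
state = {
    "E.encoder_q.E.0.weight": [1.0, 2.0],
    "G.head.weight": [3.0],
}
filled = fill_key_encoder(state)
print(sorted(filled))
```

The result could then be loaded with `strict=False`, so that the still-missing `E.queue` and `E.queue_ptr` buffers keep their freshly initialized values.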

@tongchangD

How did you solve this problem?
