issue about resume #34

sujyQ · 2021-08-30T05:43:06Z

Hi.

There's a problem when resuming training.

I tried to restart training DASR with this command:

python main.py --dir_data='my/path' \
               --model='blindsr' \
               --scale='4' \
               --blur_type='aniso_gaussian' \
               --noise=25.0 \
               --lambda_min=0.2 \
               --lambda_max=4.0 \
               --start_epoch=157 \
               --resume=157

The problem is that the contrastive loss gets much bigger after resuming.
I think the parameters of the encoder for the degradation representation are not being loaded.

[Epoch 158]	Learning rate: 1.00e-4
Epoch: [0158][6400/31050]	Loss [SR loss: 9.753 | contrastive loss: 0.892 ]	Time [ 145.0 s]
Epoch: [0158][12800/31050]	Loss [SR loss: 9.747 | contrastive loss: 0.920 ]	Time [ 143.7 s]
Epoch: [0158][19200/31050]	Loss [SR loss: 9.722 | contrastive loss: 0.918 ]	Time [ 144.1 s]
[Epoch 158]	Learning rate: 1.00e-4
Epoch: [0158][6400/31050]	Loss [SR loss: 9.598 | contrastive loss: 7.457 ]	Time [ 145.2 s]


LongguangWang · 2021-09-02T08:39:20Z

Hi @sujyQ, we will fix this bug in an upcoming update.

sujyQ · 2021-09-21T05:38:36Z

Hi @LongguangWang, I think this is the problem.

When strict=True is set, the following error occurs:

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    model = model.Model(args, checkpoint)
  File "/home/hsj/d_drive/hsj/hsj/DASR_DDF/model/__init__.py", line 35, in __init__
    cpu=args.cpu
  File "/home/hsj/d_drive/hsj/hsj/DASR_DDF/model/__init__.py", line 104, in load
    strict=True
  File "/home/hsj/anaconda3/envs/pytorch36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BlindSR:
	Missing key(s) in state_dict: "E.queue", "E.queue_ptr", "E.encoder_k.E.0.weight", "E.encoder_k.E.0.bias", "E.encoder_k.E.1.weight", "E.encoder_k.E.1.bias", "E.encoder_k.E.1.running_mean", "E.encoder_k.E.1.running_var", "E.encoder_k.E.1.num_batches_tracked", "E.encoder_k.E.3.weight", "E.encoder_k.E.3.bias", "E.encoder_k.E.4.weight", "E.encoder_k.E.4.bias", "E.encoder_k.E.4.running_mean", "E.encoder_k.E.4.running_var", "E.encoder_k.E.4.num_batches_tracked", "E.encoder_k.E.6.weight", "E.encoder_k.E.6.bias", "E.encoder_k.E.7.weight", "E.encoder_k.E.7.bias", "E.encoder_k.E.7.running_mean", "E.encoder_k.E.7.running_var", "E.encoder_k.E.7.num_batches_tracked", "E.encoder_k.E.9.weight", "E.encoder_k.E.9.bias", "E.encoder_k.E.10.weight", "E.encoder_k.E.10.bias", "E.encoder_k.E.10.running_mean", "E.encoder_k.E.10.running_var", "E.encoder_k.E.10.num_batches_tracked", "E.encoder_k.E.12.weight", "E.encoder_k.E.12.bias", "E.encoder_k.E.13.weight", "E.encoder_k.E.13.bias", "E.encoder_k.E.13.running_mean", "E.encoder_k.E.13.running_var", "E.encoder_k.E.13.num_batches_tracked", "E.encoder_k.E.15.weight", "E.encoder_k.E.15.bias", "E.encoder_k.E.16.weight", "E.encoder_k.E.16.bias", "E.encoder_k.E.16.running_mean", "E.encoder_k.E.16.running_var", "E.encoder_k.E.16.num_batches_tracked", "E.encoder_k.mlp.0.weight", "E.encoder_k.mlp.0.bias", "E.encoder_k.mlp.2.weight", "E.encoder_k.mlp.2.bias".
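
For anyone trying to reproduce this, here is a minimal sketch of how the mismatch could be inspected without strict loading. It compares the key names a model expects against the key names stored in a checkpoint and groups the missing ones by prefix. The helper names `diff_state_dict_keys` and `group_by_prefix` are hypothetical, and the toy key lists (including `E.encoder_q.*` and `G.head.weight`) are illustrative assumptions, not the actual contents of a DASR checkpoint.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, similar to what strict loading reports."""
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    model_set = set(model_keys)
    ckpt_set = set(checkpoint_keys)
    # Keys the model expects but the checkpoint does not contain.
    missing = [k for k in model_keys if k not in ckpt_set]
    # Keys stored in the checkpoint that the model does not define.
    unexpected = [k for k in checkpoint_keys if k not in model_set]
    return missing, unexpected


def group_by_prefix(keys, depth=2):
    """Count keys per dotted prefix, e.g. 'E.encoder_k.E.0.weight' -> 'E.encoder_k'."""
    counts = {}
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts


# Toy stand-ins (assumed names) for the model's keys and the checkpoint's keys.
model_keys = [
    "E.queue", "E.queue_ptr",
    "E.encoder_q.E.0.weight",
    "E.encoder_k.E.0.weight", "E.encoder_k.E.0.bias",
    "G.head.weight",
]
checkpoint_keys = ["E.encoder_q.E.0.weight", "G.head.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
summary = group_by_prefix(missing)
print(summary)
```

With a real PyTorch model, the inputs would be `model.state_dict().keys()` and the keys of the loaded checkpoint, assuming the checkpoint file holds a plain state dict. Any key reported as missing is left at its freshly initialized value when loading with strict=False, which may be why the contrastive loss jumps after resuming.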

tongchangD · 2021-12-03T08:24:56Z

How did you solve this problem?
#34 (comment)
