
Training error: RuntimeError: For non-complex input tensors, argument alpha must not be a complex number. #18

Open
hosea7456 opened this issue Nov 2, 2021 · 7 comments


@hosea7456

hosea7456 commented Nov 2, 2021

Hi, thanks for your great work!
When I tried to train a model, I got the following error:


Traceback (most recent call last):
  File "so_run.py", line 51, in <module>
    main()
  File "so_run.py", line 43, in main
    trainer.train()
  File "/home/CCM/trainer/source_only_trainer.py", line 58, in train
    self.optim.step()
  File "/home/anaconda3/envs/torch1.9/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/anaconda3/envs/torch1.9/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/anaconda3/envs/torch1.9/lib/python3.8/site-packages/torch/optim/sgd.py", line 110, in step
    F.sgd(params_with_grad,
  File "/home/anaconda3/envs/torch1.9/lib/python3.8/site-packages/torch/optim/functional.py", line 180, in sgd
    param.add_(d_p, alpha=-lr)
RuntimeError: For non-complex input tensors, argument alpha must not be a complex number.


How should I fix this? Thank you.
The config I used for training is:


note: 'train'

# configs of data

model: 'deeplab'
train: True
multigpu: False
fixbn: True
fix_seed: True

# Optimizers

learning_rate: 7.5e-5
num_steps: 5000
epochs: 2
weight_decay: 0.0005
momentum: 0.9
power: 0.9
round: 6

# Logging

print_freq: 1
save_freq: 2000
tensorboard: False
neptune: False
screen: True
val: False
val_freq: 300

# Dataset

source: 'gta5'
target: 'cityscapes'
worker: 0
batch_size: 2

# Transforms
input_src: 720
input_tgt: 720
crop_src: 600
crop_tgt: 600
mirror: True
scale_min: 0.5
scale_max: 1.5
rec: False

# Model hypers

init_weight: './pretrained/DeepLab_resnet_pretrained_init-f81d91e8.pth'
restore_from: None

snapshot: './Data/snapshot/'
result: './miou_result/'
log: './log/'
plabel: './plabel'
gta5: {
data_dir: '/home/data/datasets/GTA5/',
data_list: './dataset/list/gta5_list.txt',
input_size: [1280, 720]
}
synthia: {
data_dir: '/home/guangrui/data/synthia/',
data_list: './dataset/list/synthia_list.txt',
input_size: [1280, 760]
}
cityscapes: {
data_dir: '/home/data/datasets/Cityscapes',
data_list: './dataset/list/cityscapes_train.txt',
input_size: [1024, 512]
}
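
For reference, I load this YAML roughly as in the sketch below (my own sketch, assuming PyYAML and easydict; the repo's actual loader and the config path may differ):

# Sketch of loading the config above (assumes PyYAML + easydict;
# the path './config/so_config.yaml' is a guess).
import yaml
from easydict import EasyDict

with open('./config/so_config.yaml') as f:
    args = EasyDict(yaml.safe_load(f))

print(args.learning_rate, args.epochs, args.gta5.data_dir)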

@Solacex
Owner

Solacex commented Nov 4, 2021

Hello,

Thanks for your interest in our work!
I tried to locate the problem you posted but could not reproduce it. I suspect the error is caused by the newer version of PyTorch, so using pytorch==1.7.0 may help.

Hope this helps.

@hosea7456
Author

Hello,

Thanks for your interest in our work! I tried to locate the problem you posted but could not reproduce it. I suspect the error is caused by the newer version of PyTorch, so using pytorch==1.7.0 may help.

Hope this helps.

Hi, thanks for your advice. I tried pytorch==1.7.0; the previous error disappeared, but another one appeared:

Traceback (most recent call last):
  File "so_run.py", line 51, in <module>
    main()
  File "so_run.py", line 43, in main
    trainer.train()
  File "/home/CCM/trainer/source_only_trainer.py", line 58, in train
    self.optim.step()
  File "/home/anaconda3/envs/torch1.7/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/anaconda3/envs/torch1.7/lib/python3.8/site-packages/torch/optim/sgd.py", line 112, in step
    p.add_(d_p, alpha=-group['lr'])
RuntimeError: value cannot be converted to type float without overflow: (2.10957e-06,-6.85442e-07)

I have no idea how to fix this one at all.

@Solacex
Owner

Solacex commented Nov 17, 2021

Hello,
As far as I can tell, it may be because the number of training steps exceeds the max steps of the learning-rate schedule.
You can check that.
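
A quick sketch of the suspected failure mode (assuming a DeepLab-style poly schedule; the repo's exact code may differ):

# Once i_iter > max_iter, (1 - i_iter / max_iter) is negative, and a
# negative base raised to a fractional power is complex in Python.
def lr_poly(base_lr, i_iter, max_iter, power):
    return base_lr * ((1 - i_iter / max_iter) ** power)

print(lr_poly(7.5e-5, 4999, 5000, 0.9))  # small positive float: fine
print(lr_poly(7.5e-5, 5200, 5000, 0.9))  # a complex number
# Assigning such a complex value to optimizer.param_groups[i]['lr'] makes
# SGD's param.add_(d_p, alpha=-lr) fail exactly as in the tracebacks above.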

@Jo-wang

Jo-wang commented Feb 9, 2022

Same error here, and I've tried increasing num_steps in so_config.yaml, but it didn't work. Could you provide the parameters you used to train the source-only model?
Thank you!

@Jo-wang

Jo-wang commented Feb 14, 2022

Hi, I solved this a few days ago. The error is caused by the fixed max number of steps used when adjusting the learning rate. You can check whether that is the case for you.
Cheers,
zx

@Hyx098130

Hi, I solved this a few days ago. The error is caused by the fixed max number of steps used when adjusting the learning rate. You can check whether that is the case for you. Cheers, zx

I also ran into this problem recently; could you elaborate on how to solve it? Thank you very much.

@Jo-wang

Jo-wang commented May 11, 2023

Hi, I solved this a few days ago. The error is caused by the fixed max number of steps used when adjusting the learning rate. You can check whether that is the case for you. Cheers, zx

I also ran into this problem recently; could you elaborate on how to solve it? Thank you very much.

Hi there,
Sorry for the late reply. The issue comes from an incorrect max step when adjusting the learning rate during optimization. Here is my version:

def adjust_learning_rate(optimizer, i_iter, len_loader, args):
    # Use the true training length (epochs * batches per epoch) as the max
    # step, so (1 - i_iter / max_iter) inside lr_poly never goes negative.
    lr = lr_poly(args.learning_rate, i_iter, args.epochs * len_loader, args.power)
    optimizer.param_groups[0]['lr'] = lr
    if len(optimizer.param_groups) > 1:
        # A second param group (e.g. the classifier head) runs at 10x lr.
        optimizer.param_groups[1]['lr'] = lr * 10
    return lr
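
In case it helps, this is roughly how I call it from the training loop (a sketch; the optimizer/loader/args names are placeholders, so adapt them to your trainer):

# Hypothetical call site for adjust_learning_rate above.
len_loader = len(train_loader)                   # batches per epoch
for epoch in range(args.epochs):
    for batch_idx, batch in enumerate(train_loader):
        i_iter = epoch * len_loader + batch_idx  # global training step
        adjust_learning_rate(optimizer, i_iter, len_loader, args)
        # ... forward pass, loss.backward(), optimizer.step() ...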

Hope this could help.

Zx
