meet train error in stage 2 #27

Closed
zswmr opened this issue Mar 7, 2019 · 0 comments
zswmr commented Mar 7, 2019

Great work!
But when I train on my own data, I hit an error in training stage 2:

`INFO:main: Val epoch: 219 Mean IoU: 1.000
INFO:main: Train epoch: 220 [0/44] Avg. Loss: 0.000 Avg. Time: 0.311
INFO:main: Train epoch: 220 [10/44] Avg. Loss: 0.000 Avg. Time: 0.267
INFO:main: Train epoch: 220 [20/44] Avg. Loss: 0.000 Avg. Time: 0.264
INFO:main: Train epoch: 220 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 220 [40/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 221 [0/44] Avg. Loss: 0.000 Avg. Time: 0.296
INFO:main: Train epoch: 221 [10/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 221 [20/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 221 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 221 [40/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 222 [0/44] Avg. Loss: 0.000 Avg. Time: 0.277
INFO:main: Train epoch: 222 [10/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 222 [20/44] Avg. Loss: 0.000 Avg. Time: 0.264
INFO:main: Train epoch: 222 [30/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 222 [40/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 223 [0/44] Avg. Loss: 0.000 Avg. Time: 0.303
INFO:main: Train epoch: 223 [10/44] Avg. Loss: 0.000 Avg. Time: 0.267
INFO:main: Train epoch: 223 [20/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 223 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 223 [40/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 224 [0/44] Avg. Loss: 0.000 Avg. Time: 0.288
INFO:main: Train epoch: 224 [10/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [20/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [30/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [40/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Val epoch: 224 [0/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [10/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [20/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [30/31] Mean IoU: 1.000
INFO:main: IoUs: [1. 1.]
INFO:main: Val epoch: 224 Mean IoU: 1.000

INFO:main: Train epoch: 225 [0/44] Avg. Loss: 0.000 Avg. Time: 0.316

Traceback (most recent call last):
  File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 429, in <module>
    main()
  File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 413, in main
    args.freeze_bn[task_idx])
  File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 276, in train_segmenter
    output = segmenter(input_var)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
    raise output
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vetec-tf/program/light-weight-refinenet/models/resnet.py", line 203, in forward
    l1 = self.layer1(x)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vetec-tf/program/light-weight-refinenet/models/resnet.py", line 135, in forward
    out += residual
RuntimeError: The expanded size of the tensor (1024) must match the existing size (256) at non-singleton dimension 1`

Stage 1 finished without errors:
`INFO:main: Val epoch: 199 [0/31] Mean IoU: 1.000
INFO:main: Val epoch: 199 [10/31] Mean IoU: 1.000
INFO:main: Val epoch: 199 [20/31] Mean IoU: 1.000
INFO:main: Val epoch: 199 [30/31] Mean IoU: 1.000
INFO:main: IoUs: [1. 1.]
INFO:main: Val epoch: 199 Mean IoU: 1.000
INFO:main:Stage 1 finished, time spent 23.135min
INFO:main: Created train set = 265 examples, val set = 31 examples
INFO:main: Training Stage 2`

Can you help me? Thank you!
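
For anyone reading the error message above: it is raised by the in-place add `out += residual`, where the residual has to be broadcast to the shape of `out`. Going by the numbers in the message, `out` appears to have 1024 channels at dimension 1 while the residual has 256, so the two branches of the block disagree on channel count. The snippet below is only a rough sketch of that shape rule in plain Python, not PyTorch's own code; `inplace_add_mismatch` is a hypothetical helper, and the batch and spatial sizes are made up for illustration.

```python
from typing import Optional, Sequence, Tuple


def inplace_add_mismatch(
    out_shape: Sequence[int], residual_shape: Sequence[int]
) -> Optional[Tuple[int, int, int]]:
    """Check whether `out += residual` is shape-compatible.

    For an in-place add, `out` keeps its shape, so `residual` must be
    broadcast to it: each residual dimension has to equal the matching
    `out` dimension or be 1.

    Returns (dim, out_size, residual_size) for the first offending
    dimension, or None if the shapes are compatible.
    """
    if len(residual_shape) > len(out_shape):
        raise ValueError("residual has more dimensions than out")

    # Right-align the residual against out, padding leading dims with 1.
    pad = (1,) * (len(out_shape) - len(residual_shape))
    padded = pad + tuple(residual_shape)

    for dim, (target, existing) in enumerate(zip(out_shape, padded)):
        if existing != target and existing != 1:
            return dim, target, existing
    return None


# Assumed shapes: only the 1024 and 256 come from the error message.
print(inplace_add_mismatch((2, 1024, 8, 8), (2, 256, 8, 8)))  # (1, 1024, 256)
print(inplace_add_mismatch((2, 256, 8, 8), (2, 256, 8, 8)))   # None
```

Printing `out.shape` and `residual.shape` just before the failing line in `models/resnet.py` would confirm which branch carries the unexpected channel count.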

zswmr closed this as completed Mar 12, 2019