Some questions about training #132

Closed
zhanghongyong123456 opened this issue Jan 14, 2022 · 5 comments

Comments

@zhanghongyong123456

1. How can I eliminate or reduce the edge-flickering problem? Can I increase the sequence length by setting --seq-length-lr, and does that actually help?
2. My composited images have no ground-truth foreground. Is it possible to remove foreground training and the foreground loss, or is there a better way?
3. How important is foreground prediction for matting?

Looking forward to your reply

@Asthestarsfalll

I also have the same question

@Jon-drugstore

> 1. How can I eliminate or reduce the edge-flickering problem? Can I increase the sequence length by setting --seq-length-lr, and does that actually help?
> 2. My composited images have no ground-truth foreground. Is it possible to remove foreground training and the foreground loss, or is there a better way?
> 3. How important is foreground prediction for matting?
>
> Looking forward to your reply
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1360/1360 [02:55<00:00, 7.75it/s]
[GPU0] Validation set average loss: 0.4140332430820255
[GPU0] Training epoch: 0
[GPU2] Training epoch: 0
[GPU1] Training epoch: 0
0%| | 0/29777 [00:00<?, ?it/s]
[GPU3] Training epoch: 0
0%| | 0/29777 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Python-3.8.6/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/Python-3.8.6/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Python-3.8.6/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/Python-3.8.6/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "train.py", line 502, in <module>
    mp.spawn(
  File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 130, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 2 terminated with signal SIGKILL
When training stage 1 with 4 GPUs, I get the error above.
Python 3.8.6, torch 1.9.0.

@Jon-drugstore


I have solved this issue by using torch.distributed.launch to start the multi-process training instead of mp.spawn.
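
For reference, here is a minimal sketch of a torch.distributed.launch-compatible entry point under torch 1.9, where the launcher passes each worker its rank via --local_rank. This is an assumption about how the script can be adapted, not the repository's actual code:

```python
# Run as: python -m torch.distributed.launch --nproc_per_node=4 train.py
# Hypothetical adaptation: torch 1.9's launcher spawns one process per GPU
# and passes each one a --local_rank argument instead of using mp.spawn.
import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

# Bind this process to its GPU and join the process group
# (MASTER_ADDR/MASTER_PORT are set by the launcher).
torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl")

# ... build the model, wrap it in DistributedDataParallel, and run the
# training loop as before ...
```

Because each worker is launched as its own interpreter, nothing has to be pickled across a spawn boundary, which avoids the parent-to-child pickling that produced the truncated-pickle error above.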

@PeterL1n
Owner

  1. I think increasing the sequence length would help, but you are trading that off against slower training and higher GPU memory consumption.
  2. You don't have to produce the foreground output, but you will have to modify the architecture and the training code to remove foreground training (see the loss sketch after this list).
  3. Foreground prediction helps the final result. Imagine a human subject waving their hands quickly: the alpha values in the motion-blurred regions are semi-transparent, and without foreground prediction the background leaks through.
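
As a rough illustration of point 2, here is a minimal sketch of an alpha-only loss, assuming predictions and targets shaped [B, T, 1, H, W] as in video matting. The function name is hypothetical, and this is not the repository's full matting loss (which contains further terms); the idea is simply that the foreground terms are dropped and the alpha terms kept:

```python
import torch
import torch.nn.functional as F

def alpha_only_loss(pred_pha, true_pha):
    """L1 on alpha plus a temporal-coherence term on frame differences.

    pred_pha, true_pha: [B, T, 1, H, W]. All foreground (fgr) loss terms
    from the original training code are simply omitted.
    """
    loss = F.l1_loss(pred_pha, true_pha)
    # Penalize flicker: match frame-to-frame changes of the prediction
    # to those of the ground truth.
    loss = loss + F.l1_loss(pred_pha[:, 1:] - pred_pha[:, :-1],
                            true_pha[:, 1:] - true_pha[:, :-1])
    return loss
```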

@zhanghongyong123456
Author


Hello. Regarding your second point about modifying the network architecture: I replaced that block of code as shown below, but the alpha I get is pure white. I believe the modification itself is correct and only the output format differs.
[screenshot: modified code]
For the low-resolution data (stages 1 and 2), I use self.project_seg = Projection(16, 1) to obtain the single-channel alpha directly.
For the high-resolution data (stage 3), I use Projection(16, 4) to compute a four-channel mat, feed that four-channel mat directly into the DGF module as the alpha to be refined, and finally apply a Projection(4, 1) to reduce it to a single-channel alpha output.
But the result I get when testing looks like this:
[screenshot: test result]
There is only a rough outline, and I don't know why. Crucially, the predictions I see in TensorBoard look very good (my predicted result is shown below), so I can't tell where the root of the problem is.
[screenshot: TensorBoard prediction]
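
To make the described change concrete, here is a hypothetical sketch of the stage-3 path above. The Projection(16, 4) and Projection(4, 1) sizes come from the comment; the module names and the DGF call signature are assumptions, not the repository's actual code:

```python
import torch
from torch import nn

class Projection(nn.Module):
    """A 1x1 convolution used as an output head."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.conv(x)

class AlphaOnlyStage3Head(nn.Module):
    def __init__(self, dgf):
        super().__init__()
        self.project_mat = Projection(16, 4)  # 4-channel intermediate "mat"
        self.project_out = Projection(4, 1)   # collapse to 1-channel alpha
        self.dgf = dgf                        # deep guided filter (assumed interface)

    def forward(self, hid, src_sm, src_hr):
        mat = self.project_mat(hid)
        # The 4-channel mat is refined by DGF as if it were the alpha input;
        # the original pipeline refines a (foreground, alpha) pair instead.
        refined = self.dgf(src_sm, mat, src_hr)
        return self.project_out(refined).clamp(0.0, 1.0)
```

One general thing worth checking with a symptom like this: inference and TensorBoard logging must go through exactly the same forward path and checkpoint keys; a mismatch between the two is a common source of "good in TensorBoard, bad at test time" results.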
