Some questions about training #132

Closed
zhanghongyong123456 opened this issue Jan 14, 2022 · 5 comments

Comments

@zhanghongyong123456

1. How can I eliminate or reduce the edge-flickering problem? Can I increase the sequence length by setting --seq-length-lr, and does that actually help?
2. My composited images have no ground-truth foreground. Is it possible to remove foreground training and the foreground loss, or is there a better way?
3. How important is foreground prediction for matting?

Looking forward to your reply

@Asthestarsfalll

I also have the same question

@Jon-drugstore

> 1. How can I eliminate or reduce the edge-flickering problem? Can I increase the sequence length by setting --seq-length-lr, and does that actually help?
> 2. My composited images have no ground-truth foreground. Is it possible to remove foreground training and the foreground loss, or is there a better way?
> 3. How important is foreground prediction for matting?
>
> Looking forward to your reply
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1360/1360 [02:55<00:00, 7.75it/s]
[GPU0] Validation set average loss: 0.4140332430820255
[GPU0] Training epoch: 0
[GPU2] Training epoch: 0
[GPU1] Training epoch: 0
0%| | 0/29777 [00:00<?, ?it/s]
[GPU3] Training epoch: 0
0%| | 0/29777 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Python-3.8.6/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/Python-3.8.6/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Python-3.8.6/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/Python-3.8.6/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "train.py", line 502, in <module>
    mp.spawn(
  File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 130, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 2 terminated with signal SIGKILL
When training stage 1 with 4 GPUs, I get the error above.
Python 3.8.6, torch 1.9.0.

@Jon-drugstore


I have solved this issue by using torch.distributed.launch to start the multi-process training instead of mp.spawn.
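
For reference, here is a minimal sketch of a torch.distributed.launch-compatible entry point under torch 1.9, where the launcher passes each worker its rank via --local_rank. This is an assumption about how the script can be adapted, not the repository's actual code:

```python
# Run as: python -m torch.distributed.launch --nproc_per_node=4 train.py
# Hypothetical adaptation: torch 1.9's launcher spawns one process per GPU
# and passes each one a --local_rank argument instead of using mp.spawn.
import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

# Bind this process to its GPU and join the process group
# (MASTER_ADDR/MASTER_PORT are set by the launcher).
torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl")

# ... build the model, wrap it in DistributedDataParallel, and run the
# training loop as before ...
```

Because each worker is launched as its own interpreter, nothing has to be pickled across a spawn boundary, which avoids the parent-to-child pickling that produced the truncated-pickle error above.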

@PeterL1n
Owner

  1. I think increasing the sequence length would help, but you are trading that off against slower training and higher GPU memory consumption.
  2. You don't have to produce the foreground output, but you will have to modify the architecture and the training code to remove foreground training (see the loss sketch after this list).
  3. Foreground prediction helps the final result. Imagine a human subject waving their hands quickly: the alpha values in the motion-blurred regions are semi-transparent, and without foreground prediction the background leaks through.
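
As a rough illustration of point 2, here is a minimal sketch of an alpha-only loss, assuming predictions and targets shaped [B, T, 1, H, W] as in video matting. The function name is hypothetical, and this is not the repository's full matting loss (which contains further terms); the idea is simply that the foreground terms are dropped and the alpha terms kept:

```python
import torch
import torch.nn.functional as F

def alpha_only_loss(pred_pha, true_pha):
    """L1 on alpha plus a temporal-coherence term on frame differences.

    pred_pha, true_pha: [B, T, 1, H, W]. All foreground (fgr) loss terms
    from the original training code are simply omitted.
    """
    loss = F.l1_loss(pred_pha, true_pha)
    # Penalize flicker: match frame-to-frame changes of the prediction
    # to those of the ground truth.
    loss = loss + F.l1_loss(pred_pha[:, 1:] - pred_pha[:, :-1],
                            true_pha[:, 1:] - true_pha[:, :-1])
    return loss
```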

@zhanghongyong123456
Author


Hello. Regarding your second point about modifying the network architecture: I replaced that block of code as shown below, but the alpha I get is pure white. I believe the modification itself is correct and only the output format differs.
[screenshot: modified code]
For the low-resolution data (stages 1 and 2), I use self.project_seg = Projection(16, 1) to obtain the single-channel alpha directly.
For the high-resolution data (stage 3), I use Projection(16, 4) to compute a four-channel mat, feed that four-channel mat directly into the DGF module as the alpha to be refined, and finally apply a Projection(4, 1) to reduce it to a single-channel alpha output.
But the result I get when testing looks like this:
[screenshot: test result]
There is only a rough outline, and I don't know why. Crucially, the predictions I see in TensorBoard look very good (my predicted result is shown below), so I can't tell where the root of the problem is.
[screenshot: TensorBoard prediction]
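
To make the described change concrete, here is a hypothetical sketch of the stage-3 path above. The Projection(16, 4) and Projection(4, 1) sizes come from the comment; the module names and the DGF call signature are assumptions, not the repository's actual code:

```python
import torch
from torch import nn

class Projection(nn.Module):
    """A 1x1 convolution used as an output head."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.conv(x)

class AlphaOnlyStage3Head(nn.Module):
    def __init__(self, dgf):
        super().__init__()
        self.project_mat = Projection(16, 4)  # 4-channel intermediate "mat"
        self.project_out = Projection(4, 1)   # collapse to 1-channel alpha
        self.dgf = dgf                        # deep guided filter (assumed interface)

    def forward(self, hid, src_sm, src_hr):
        mat = self.project_mat(hid)
        # The 4-channel mat is refined by DGF as if it were the alpha input;
        # the original pipeline refines a (foreground, alpha) pair instead.
        refined = self.dgf(src_sm, mat, src_hr)
        return self.project_out(refined).clamp(0.0, 1.0)
```

One general thing worth checking with a symptom like this: inference and TensorBoard logging must go through exactly the same forward path and checkpoint keys; a mismatch between the two is a common source of "good in TensorBoard, bad at test time" results.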
