Not able to reproduce the results listed in the paper with my trained model #7

LigZhong · 2022-05-03T12:29:43Z

I ran into mode collapse once the step count exceeded 300K, and with the final model I obtained I cannot reproduce the results shown in the paper. Could you share your loss curves? @Paper99

Paper99 · 2022-05-04T05:13:14Z

Training may collapse because of offset overflow in the deformable convolution (DCN).
This issue also occurs in other works that use DCN.
Please refer to this issue for more details.

However, we never encountered this problem when training the final model (which takes 500K iterations).
A collapsed run may produce a poor model whose results are far from those of our released model.
I suggest retraining, or resuming training from an earlier, safe checkpoint.
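As a rough illustration of picking an earlier checkpoint to resume from, the sketch below scans a loss log for the first non-finite or sharply spiking value and returns the latest checkpoint saved safely before it. Everything here is an assumption for illustration: the function names, the `window`, `spike_factor`, and `margin` parameters, and the sample numbers are hypothetical and do not come from this repository's training code or logs.

```python
import math
from statistics import median


def first_collapse_iter(loss_log, window=5, spike_factor=10.0):
    """Return the first iteration whose loss is non-finite or spikes, else None.

    loss_log: list of (iteration, loss) pairs sorted by iteration.
    A spike is a loss above spike_factor times the median of the
    previous `window` healthy losses (a hypothetical heuristic).
    """
    history = []
    for iteration, loss in loss_log:
        if not math.isfinite(loss):
            return iteration
        if len(history) >= window and loss > spike_factor * median(history[-window:]):
            return iteration
        history.append(loss)
    return None


def safe_checkpoint(checkpoint_iters, collapse_iter, margin=10_000):
    """Return the latest checkpoint saved more than `margin` iterations before the collapse."""
    if collapse_iter is None:
        return max(checkpoint_iters, default=None)
    candidates = [it for it in checkpoint_iters if it + margin < collapse_iter]
    return max(candidates, default=None)


# Made-up log values, only to show how the two helpers fit together.
loss_log = [
    (280_000, 0.031), (285_000, 0.030), (290_000, 0.029),
    (295_000, 0.030), (300_000, 0.029), (305_000, 0.92),
    (310_000, float("nan")),
]
collapse = first_collapse_iter(loss_log)
resume_from = safe_checkpoint([250_000, 275_000, 300_000], collapse)
print(collapse, resume_from)  # 305000 275000
```

The margin is there because the offsets may already be drifting some time before the loss visibly spikes, so the checkpoint immediately before the spike is not necessarily a healthy one.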

For convenience, we provide our loss curves as follows:

I hope these help.

sydney0zq · 2022-09-01T07:10:31Z

@Paper99 I also encounter this problem on my machine. Which GPU did you use for training? I use a V100 32G, and training collapses at about 300K iterations.

Paper99 · 2022-09-02T01:01:39Z

Hi, we used 8 V100 (16G) GPUs or 8 1080 Ti GPUs to train our model.

sydney0zq · 2022-09-02T08:31:57Z

@Paper99 How would you suggest solving the collapse problem? If we clip the DCN module's weights, I am not sure what range to use ... Is there another module we could substitute to avoid the issue?
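Since the collapse is attributed above to offset overflow rather than to the weights, one option sometimes used with deformable convolutions is to bound the predicted offsets themselves instead of clipping weights. The sketch below shows only the arithmetic, in plain Python, and is not something the authors confirm in this thread. `bound_offset`, `clip_offset`, and `max_magnitude` (a limit in pixels) are hypothetical names, and a suitable limit would have to be tuned for the feature-map size. In PyTorch the same expression would be applied to the offset tensor with `torch.tanh` before it is passed to the deformable convolution.

```python
import math


def bound_offset(raw_offset, max_magnitude=10.0):
    """Squash an unbounded offset into [-max_magnitude, max_magnitude] with tanh.

    Unlike a hard clip, tanh stays smooth, so the gradient shrinks
    gradually near the limit instead of dropping to zero at once.
    """
    return max_magnitude * math.tanh(raw_offset)


def clip_offset(raw_offset, max_magnitude=10.0):
    """Hard clip, for comparison: the gradient is zero outside the range."""
    return max(-max_magnitude, min(max_magnitude, raw_offset))


# Small raw values pass through roughly scaled; huge ones saturate at the limit.
for raw in (0.0, 0.5, 3.0, 1e6):
    print(raw, bound_offset(raw), clip_offset(raw))
```

Bounding the offsets keeps every sampling location within a fixed distance of its reference position, which avoids having to guess a range for the weights.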

sydney0zq · 2022-09-05T12:20:43Z

@Paper99 Hello, how did you select the final released checkpoint? Did you take the 500K-iteration checkpoint, or pick the best among several final instances?

Paper99 · 2022-09-06T01:35:43Z

Just choose the best.

MasterHow · 2022-11-06T04:38:06Z

same question.

jiahui1688 · 2023-02-27T11:15:16Z

Hi, is the problem solved? I have the same problem. Thank you.

Paper99 mentioned this issue May 4, 2022

Not able to reproduce the results listed in the paper with my trained model #6

Closed

Paper99 closed this as completed May 20, 2022

Paper99 mentioned this issue Sep 30, 2022

about the loss #38

Open
