
About comparison with other SOTA methods #15

Closed
Luffy03 opened this issue Apr 27, 2022 · 9 comments

Comments

@Luffy03

Luffy03 commented Apr 27, 2022

Sorry for bothering you again!
First, I have re-trained several models, i.e., CPS, U2PL, and AEL, but my results are far below their reported numbers. Among these SOTA methods with public code, your ST++ is the only one for which I can reproduce the expected results (e.g., 74.15 for 1/16 Pascal). I appreciate it!
I found several different settings among the earlier methods, i.e., a stronger backbone (resnet_stem, a deep-stem ResNet), auxiliary decoders, a larger crop size, the OHEM loss (a rough sketch of OHEM follows this comment), SyncBN, and several training tricks, which make comparisons extremely unfair.
So I wonder: did the reviewers ask you to compare with these methods? Is it necessary to re-train your ST++ under the same settings? Just out of curiosity...
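
For readers skimming this thread: a minimal sketch of the OHEM cross-entropy mentioned above, assuming per-pixel logits of shape (N, C, H, W); the `thresh` and `min_kept` values are illustrative, and the official repos tune them per dataset.

```python
import torch
import torch.nn.functional as F

def ohem_ce(logits, target, thresh=0.7, min_kept=100_000, ignore_index=255):
    """Average the loss over 'hard' pixels only: those whose ground-truth
    probability is below `thresh`, keeping at least `min_kept` pixels."""
    pixel_loss = F.cross_entropy(
        logits, target, ignore_index=ignore_index, reduction='none').flatten()
    prob = F.softmax(logits, dim=1)
    # probability the model assigns to the ground-truth class at each pixel
    gt = target.clamp(0, logits.shape[1] - 1).unsqueeze(1)
    gt_prob = prob.gather(1, gt).squeeze(1).flatten()
    valid = target.flatten() != ignore_index
    hard = valid & (gt_prob < thresh)
    if hard.sum() < min_kept and valid.any():
        # fall back to the min_kept lowest-confidence valid pixels
        k = min(min_kept, int(valid.sum()))
        idx = torch.topk(gt_prob.masked_fill(~valid, 1.0), k, largest=False).indices
        return pixel_loss[idx].mean()
    return pixel_loss[hard].mean()
```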

@LiheYoung
Owner

LiheYoung commented Apr 27, 2022

Hi, thanks a lot for your appreciation; it encourages me a lot.

Actually, I have also noticed that some recent methods adopt several extra techniques, just as you mentioned. Moreover, they also perform sliding-window evaluation on Cityscapes, which may boost the final performance by ~2%.
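
For context, this is roughly what sliding-window evaluation does; a minimal sketch, assuming the model returns logits already upsampled to the input resolution (the crop and stride values are illustrative, not copied from any particular repo):

```python
import torch

@torch.no_grad()
def slide_inference(model, image, num_classes, crop=769, stride=513):
    """Average logits over overlapping crops of a full-resolution image."""
    _, _, h, w = image.shape
    logits = image.new_zeros((1, num_classes, h, w))
    count = image.new_zeros((1, 1, h, w))
    h_grids = max(h - crop + stride - 1, 0) // stride + 1
    w_grids = max(w - crop + stride - 1, 0) // stride + 1
    for i in range(h_grids):
        for j in range(w_grids):
            top = min(i * stride, max(h - crop, 0))
            left = min(j * stride, max(w - crop, 0))
            patch = image[:, :, top:top + crop, left:left + crop]
            logits[:, :, top:top + crop, left:left + crop] += model(patch)
            count[:, :, top:top + crop, left:left + crop] += 1
    return logits / count
```

Unlike resizing the whole image, the overlapping crops see every region at the training resolution, which is presumably where the ~2% gain on Cityscapes comes from.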

Back to your questions: we were asked to compare with CPS and AEL during our submission. What we can do is report our results when incorporating some of these techniques. Besides, you could report your reproduced CPS/AEL/U2PL results in your work. I think you could also re-train ST++ with the same advanced techniques for a fair comparison.

@Luffy03
Author

Luffy03 commented Apr 28, 2022

OK, thanks very much!
By the way, I have a few more questions about the selective re-training. As shown in the attached figures, the ablation study for ST++ is only conducted on the Pascal dataset in the arXiv version of your paper. However, I found that this strategy brings only a limited gain on Cityscapes. I think that because Cityscapes images are so large, image-level selection is a weak signal there (a rough sketch of the image-level scoring follows the screenshots). I also found that blind two-stage re-training achieves results close to ST++ on Cityscapes, which is quite different from the situation on Pascal.
So, will an ablation study on Cityscapes be reported in an updated version of ST++? Or could you please provide more details?
[Screenshot 2022-04-28 093719]
[Screenshot 2022-04-28 093832]
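
For context, a minimal sketch of the image-level selection scoring under discussion, assuming pseudo masks have already been saved from several intermediate checkpoints plus the final one (the function names and the top-half split are illustrative):

```python
import numpy as np

def mean_iou(mask_a, mask_b, num_classes):
    """Mean IoU between two label maps, skipping classes absent from both."""
    ious = []
    for c in range(num_classes):
        a, b = mask_a == c, mask_b == c
        union = np.logical_or(a, b).sum()
        if union > 0:
            ious.append(np.logical_and(a, b).sum() / union)
    return float(np.mean(ious)) if ious else 1.0

def stability_score(ckpt_masks, final_mask, num_classes=21):
    """Agreement of intermediate-checkpoint masks with the final mask;
    a higher score suggests a more reliable unlabeled image."""
    return float(np.mean([mean_iou(m, final_mask, num_classes)
                          for m in ckpt_masks]))

# Rank all unlabeled images by stability_score and re-train first on the
# top-ranked half.
```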

@LiheYoung
Owner

Thanks for your results on Cityscapes!

Actually, we did not conduct ablations on Cityscapes, since the training process is too time-consuming and unaffordable for us. I guess that on Cityscapes, selective training based on grids may work better, e.g., dividing the whole image into 2x2 grids and selecting reliable grids rather than whole images.
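
A minimal sketch of that grid-level variant (not part of any released code; it reuses `mean_iou` from the sketch above and scores each cell of a 2x2 partition separately):

```python
import numpy as np

def grid_scores(ckpt_masks, final_mask, num_classes=19, grid=2):
    """Stability score per cell of a grid x grid partition of one image."""
    h, w = final_mask.shape
    gh, gw = h // grid, w // grid
    scores = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i * gh, (i + 1) * gh), slice(j * gw, (j + 1) * gw))
            scores[i, j] = np.mean([mean_iou(m[sl], final_mask[sl], num_classes)
                                    for m in ckpt_masks])
    return scores
```

Selection could then keep reliable cells rather than whole images, which may suit the large Cityscapes frames better.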

@Luffy03
Author

Luffy03 commented Apr 28, 2022

Thanks for your advice, I will give it a try.

@LiheYoung
Owner

LiheYoung commented Apr 29, 2022

Hi @Luffy03,

Could you tell me your reproduced results for these three methods? And did you adopt the same training settings as in their papers, such as the batch size and crop size? Thanks a lot.

@Luffy03
Author

Luffy03 commented Apr 30, 2022


Hi, limited by GPU memory, I reproduced these methods under the earlier settings (i.e., 321 and 721 crop sizes, the standard ResNet, no OHEM, and the same augmentations), and I added sliding-window evaluation for Cityscapes. All results are reported with DeepLab + ResNet-101, batch size 16 for Pascal and 12 for Cityscapes. For a fair comparison, the auxiliary decoder is used only for CPS, not for the others. Interestingly, I find the CPS loss itself is useless, while the two decoders and the CutMix loss are useful; I discard CutMix for fairness (a sketch of the CPS loss follows the screenshot). The code is copied from their official implementations and adapted into my framework, so there may be mistakes. I will spend more time on these experiments.
[Screenshot 2022-04-30 225734]
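
For context, a minimal sketch of the cross pseudo supervision loss discussed above, assuming two independently initialized networks that produce per-pixel logits on the same unlabeled batch (names are hypothetical):

```python
import torch.nn.functional as F

def cps_loss(logits1, logits2):
    """Each network is supervised by the other's hard pseudo labels."""
    pseudo1 = logits1.argmax(dim=1).detach()  # pseudo labels from net 1
    pseudo2 = logits2.argmax(dim=1).detach()  # pseudo labels from net 2
    return F.cross_entropy(logits1, pseudo2) + F.cross_entropy(logits2, pseudo1)
```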

@Luffy03
Author

Luffy03 commented Apr 30, 2022


I have found some mistakes in my reproduced AEL and U2PL code compared with the official implementations, so the results reported above may not be accurate. Maybe you can reproduce them yourself; I will try to fix mine in one or two weeks...

@LiheYoung
Owner

Thanks for sharing your detailed experiments! In my own experience, I also find the CPS loss fails to bring an obvious improvement.

After all, a fair comparison is necessary, and I am really looking forward to your future results~

@LiheYoung
Owner

LiheYoung commented May 1, 2022

By the way, apart from what you mentioned, AEL and U2PL also adopt the more advanced DeepLabv3+ decoder, and their ResNet output stride is set to 8 instead of 16 in their official code.
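
For reference, with torchvision's ResNet the output stride can be switched via `replace_stride_with_dilation`; a minimal sketch of the two configurations (usage here is illustrative, not copied from either repo):

```python
import torchvision

# Plain ResNet-101 has output stride 32. Dilating the last stage gives
# output stride 16; dilating the last two stages gives output stride 8,
# the setting used in the AEL/U2PL official code.
backbone_os16 = torchvision.models.resnet101(
    replace_stride_with_dilation=[False, False, True])
backbone_os8 = torchvision.models.resnet101(
    replace_stride_with_dilation=[False, True, True])
```

Output stride 8 keeps feature maps twice as large in each spatial dimension, which usually improves segmentation accuracy at a higher memory cost.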
