
Any tricks to reproduce the performance? #7

Closed
mssjtxwd opened this issue Feb 1, 2021 · 4 comments

Comments

@mssjtxwd

mssjtxwd commented Feb 1, 2021

Hello, I am trying to reproduce your work with the RibFrac data. Following your README, I put the data in the specified directory, ran train.py directly for training, and then ran predict.py + eval.py to get the performance. However, we could only achieve about 50% recall @ 8 FA on the validation set, which is significantly different from the figure claimed by the 3rd place in the competition (the results table in the PPT shows they achieved 90% recall @ 8 FA on the validation set). Of course, the performance can be affected by many factors, so I would like to ask what performance your method achieves when trained only on the RibFrac training set (results with post-processing and TTA are fine).

@duducheng
Member

Hi @mssjtxwd ,

This project is intended as a prototype and baseline method for the RibFrac Challenge. However, as we are the data providers for the challenge, we would like to avoid unintended data leakage. Therefore, we did not release the full details of the models, in either the training or the inference stage.

Basically, the FracNet in our EBioMedicine paper is a one-stage model without a false-positive-reduction stage. The performance in the main text is from a model trained on the RibFrac training set plus 300 in-house cases, but we also report the performance when trained on the public training set only in the supplementary materials (please refer to the paper). You may find it still works very well and could be a top-ranking solution.

As for the reproducibility issue, I can think of two possible causes:

  1. In the training stage, the settings in main.py were cleaned up for the open-source release and do not reflect the full training procedure of our actual implementation. We trained the model in multiple stages, i.e., training with one group of hyper-parameters and then fine-tuning with another. You should pay attention to the batch size, data sampling strategy, learning rate, etc. (a rough sketch follows this list).
  2. In the inference stage, we use extra post-processing steps. In the public release, the post-processing only removes the spine region; in our actual implementation, we also remove false positives that lie far from the rib cage surrounding the lung parenchyma. This is not difficult to implement, but it involves our in-house lung analysis code, and we are still considering whether to release that code separately (see the second sketch below).
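
For illustration, here is a minimal PyTorch sketch of what a two-stage schedule could look like. The placeholder network, dummy data, and hyper-parameter values are illustrative assumptions, not our actual settings:

```python
# Hypothetical two-stage training schedule; the exact hyper-parameters are
# not published, so the numbers below are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny placeholder network standing in for the repo's 3D UNet.
model = nn.Sequential(
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 1, 1), nn.Sigmoid(),
)

# Dummy 64^3 patches; in practice these come from the repo's patch sampler.
patches = torch.rand(16, 1, 64, 64, 64)
labels = (torch.rand(16, 1, 64, 64, 64) > 0.95).float()
dataset = TensorDataset(patches, labels)
criterion = nn.BCELoss()

def run_stage(batch_size, lr, epochs):
    """Train with one group of hyper-parameters (one 'stage')."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

# Stage 1: larger learning rate, possibly a more aggressive positive-sampling ratio.
run_stage(batch_size=4, lr=1e-3, epochs=1)
# Stage 2: fine-tune the same weights with a smaller learning rate
# (and, potentially, a different sampling strategy and batch size).
run_stage(batch_size=4, lr=1e-4, epochs=1)
```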
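
Similarly, a rough sketch of the extra false-positive filtering in point 2, assuming a binary lung mask is already available from some external lung-segmentation step (producing that mask is exactly the in-house part that is not released):

```python
# Hypothetical post-processing: drop predicted components whose centroid lies
# far from the lung field. `lung_mask` is assumed to come from an external
# lung-segmentation tool that is not part of this repository.
import numpy as np
from scipy import ndimage

def remove_far_from_lung(pred_mask, lung_mask, margin_voxels=10):
    """Keep only predicted components close to (a dilated) lung region."""
    # Dilate the lung mask so fractures on the rib cage, which sit just
    # outside the parenchyma, are not thrown away.
    dilated_lung = ndimage.binary_dilation(lung_mask, iterations=margin_voxels)

    labeled, num = ndimage.label(pred_mask)
    cleaned = np.zeros_like(pred_mask)
    for idx in range(1, num + 1):
        component = labeled == idx
        cz, cy, cx = (int(round(c)) for c in ndimage.center_of_mass(component))
        if dilated_lung[cz, cy, cx]:
            cleaned[component] = 1
    return cleaned

# Toy call with random arrays, just to show the intended usage.
pred = (np.random.rand(32, 64, 64) > 0.999).astype(np.uint8)
lung = np.zeros((32, 64, 64), dtype=np.uint8)
lung[8:24, 16:48, 16:48] = 1
cleaned = remove_far_from_lung(pred, lung)
```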

Sorry to disappoint you, but if you want to reproduce the performance in the paper, you will have to tune your model a little. However, the performance can certainly be reproduced with this one-stage model using a 3D UNet backbone.

Good luck!
Jiancheng

@kaimingkuang
Member

kaimingkuang commented Feb 2, 2021

Hi @mssjtxwd

As @duducheng said, the training configuration in our actual implementation is different from the one in the open-sourced code. There are a few details you may try:

  • Larger batch size. This should be very important, since we used a larger batch size on multiple GPUs during training. In this repo the batch size is set to 4 to fit on a single 11 GB GPU;
  • Fine-tuning. We use a multi-stage training strategy to take the performance up a notch;
  • Better patch assembling strategy. This repo uses simple averaging when handling overlaps between patches. Keeping only the center region of each patch should give better performance (see the sketch after this list).
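
On the last point, here is a minimal numpy sketch of center-crop assembly; the patch size, stride, and margin below are illustrative assumptions, not the settings we used:

```python
# Hypothetical center-crop assembly for overlapping sliding-window patches.
# Only the inner part of each patch prediction is written back to the volume,
# instead of averaging everything across overlaps.
import numpy as np

def assemble_center_crop(patch_preds, origins, volume_shape, margin=8):
    """Write only the central region of each patch back into the volume."""
    out = np.zeros(volume_shape, dtype=np.float32)
    for pred, origin in zip(patch_preds, origins):
        lo, hi = [], []
        for o, p, v in zip(origin, pred.shape, volume_shape):
            # Skip `margin` voxels on each side, except at the volume border,
            # so the whole volume is still covered.
            lo.append(0 if o == 0 else margin)
            hi.append(p if o + p == v else p - margin)
        out[origin[0] + lo[0]:origin[0] + hi[0],
            origin[1] + lo[1]:origin[1] + hi[1],
            origin[2] + lo[2]:origin[2] + hi[2]] = pred[lo[0]:hi[0],
                                                        lo[1]:hi[1],
                                                        lo[2]:hi[2]]
    return out

# Toy usage: two 64^3 patches overlapping by 32 voxels along the last axis.
p1 = np.random.rand(64, 64, 64).astype(np.float32)
p2 = np.random.rand(64, 64, 64).astype(np.float32)
fused = assemble_center_crop([p1, p2], [(0, 0, 0), (0, 0, 32)], (64, 64, 96))
```

Note that this only covers the whole volume when the sliding-window stride is no larger than the patch size minus twice the margin.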

@JXQI

JXQI commented Mar 27, 2021

Hello, I am trying to reproduce your work with the RibFrac data. Following your README, I put the data in the specified directory, ran train.py directly for training, and then ran predict.py + eval.py to get the performance. However, we could only achieve about 50% recall @ 8 FA on the validation set, which is significantly different from the figure claimed by the 3rd place in the competition (the results table in the PPT shows they achieved 90% recall @ 8 FA on the validation set). Of course, the performance can be affected by many factors, so I would like to ask what performance your method achieves when trained only on the RibFrac training set (results with post-processing and TTA are fine).

My result is the same as yours. Did you manage to resolve it?

@BelieferQAQ

Hi @mssjtxwd

As @duducheng said, the training configuration in our actual implementation is different from the one in the open-sourced code. There are a few details you may try:

  • Larger batch size. This should be very important, since we used a larger batch size on multiple GPUs during training. In this repo the batch size is set to 4 to fit on a single 11 GB GPU;
  • Fine-tuning. We use a multi-stage training strategy to take the performance up a notch;
  • Better patch assembling strategy. This repo uses simple averaging when handling overlaps between patches. Keeping only the center region of each patch should give better performance.

Excuse me, how do you use the multi-stage training strategy?
