
Any tricks to reproduce the performance? #7

Closed
mssjtxwd opened this issue Feb 1, 2021 · 4 comments

Comments

@mssjtxwd

mssjtxwd commented Feb 1, 2021

Hello, I am trying to reproduce your work with the RibFrac data. Following your README, I put the data in the specified directory, ran train.py directly for training, and then ran predict.py + eval.py to get the performance. However, we could only achieve about 50% recall @ 8 FA on the validation set, which is significantly different from the figure claimed by the 3rd place in the competition (the results table in the PPT shows they achieved 90% recall @ 8 FA on the validation set). Of course, the performance can be affected by many factors, so I would like to ask what performance your method achieves when trained only on the RibFrac training set (results with post-processing and TTA are fine).

@duducheng
Member

Hi @mssjtxwd ,

This project is intended as a prototype and baseline method for the RibFrac Challenge. However, as we are the data providers for the challenge, we would like to avoid unintended data leakage. Therefore, we did not release the full details of the models, in either the training or the inference stage.

Basically, the FracNet in our EBioMedicine paper is a one-stage model without a false-positive-reduction stage. The performance in the main text is from a model trained on the RibFrac training set plus 300 in-house cases, but we also report the performance when trained on the public training set only in the supplementary materials (please refer to the paper). You may find it still works very well and could be a top-ranking solution.

As for the reproducibility issue, I can think of two possible causes:

  1. In the training stage, the settings in main.py were cleaned up for the open-source release and do not reflect the full training procedure of our actual implementation. We trained the model in multiple stages, i.e., training with one group of hyper-parameters and then fine-tuning with another. You should pay attention to the batch size, data sampling strategy, learning rate, etc. (a rough sketch follows this list).
  2. In the inference stage, we use extra post-processing steps. In the public release, the post-processing only removes the spine region; in our actual implementation, we also remove false positives that lie far from the rib cage surrounding the lung parenchyma. This is not difficult to implement, but it involves our in-house lung analysis code, and we are still considering whether to release that code separately (see the second sketch below).
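
For illustration, here is a minimal PyTorch sketch of what a two-stage schedule could look like. The placeholder network, dummy data, and hyper-parameter values are illustrative assumptions, not our actual settings:

```python
# Hypothetical two-stage training schedule; the exact hyper-parameters are
# not published, so the numbers below are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Tiny placeholder network standing in for the repo's 3D UNet.
model = nn.Sequential(
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 1, 1), nn.Sigmoid(),
)

# Dummy 64^3 patches; in practice these come from the repo's patch sampler.
patches = torch.rand(16, 1, 64, 64, 64)
labels = (torch.rand(16, 1, 64, 64, 64) > 0.95).float()
dataset = TensorDataset(patches, labels)
criterion = nn.BCELoss()

def run_stage(batch_size, lr, epochs):
    """Train with one group of hyper-parameters (one 'stage')."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

# Stage 1: larger learning rate, possibly a more aggressive positive-sampling ratio.
run_stage(batch_size=4, lr=1e-3, epochs=1)
# Stage 2: fine-tune the same weights with a smaller learning rate
# (and, potentially, a different sampling strategy and batch size).
run_stage(batch_size=4, lr=1e-4, epochs=1)
```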
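
Similarly, a rough sketch of the extra false-positive filtering in point 2, assuming a binary lung mask is already available from some external lung-segmentation step (producing that mask is exactly the in-house part that is not released):

```python
# Hypothetical post-processing: drop predicted components whose centroid lies
# far from the lung field. `lung_mask` is assumed to come from an external
# lung-segmentation tool that is not part of this repository.
import numpy as np
from scipy import ndimage

def remove_far_from_lung(pred_mask, lung_mask, margin_voxels=10):
    """Keep only predicted components close to (a dilated) lung region."""
    # Dilate the lung mask so fractures on the rib cage, which sit just
    # outside the parenchyma, are not thrown away.
    dilated_lung = ndimage.binary_dilation(lung_mask, iterations=margin_voxels)

    labeled, num = ndimage.label(pred_mask)
    cleaned = np.zeros_like(pred_mask)
    for idx in range(1, num + 1):
        component = labeled == idx
        cz, cy, cx = (int(round(c)) for c in ndimage.center_of_mass(component))
        if dilated_lung[cz, cy, cx]:
            cleaned[component] = 1
    return cleaned

# Toy call with random arrays, just to show the intended usage.
pred = (np.random.rand(32, 64, 64) > 0.999).astype(np.uint8)
lung = np.zeros((32, 64, 64), dtype=np.uint8)
lung[8:24, 16:48, 16:48] = 1
cleaned = remove_far_from_lung(pred, lung)
```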

Sorry to disappoint you, but if you want to reproduce the performance in the paper, you will have to tune your model a little. However, the performance can certainly be reproduced with this one-stage model using a 3D UNet backbone.

Good luck!
Jiancheng

@kaimingkuang
Member

kaimingkuang commented Feb 2, 2021

Hi @mssjtxwd

As @duducheng said, the training configuration in our actual implementation is different from the one in the open-sourced code. There are a few details you may try:

  • Larger batch size. This should be very important, since we used a larger batch size on multiple GPUs during training. In this repo the batch size is set to 4 to fit on a single 11 GB GPU;
  • Fine-tuning. We use a multi-stage training strategy to take the performance up a notch;
  • Better patch assembling strategy. This repo uses simple averaging when handling overlaps between patches. Keeping only the center region of each patch should give better performance (see the sketch after this list).
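
On the last point, here is a minimal numpy sketch of center-crop assembly; the patch size, stride, and margin below are illustrative assumptions, not the settings we used:

```python
# Hypothetical center-crop assembly for overlapping sliding-window patches.
# Only the inner part of each patch prediction is written back to the volume,
# instead of averaging everything across overlaps.
import numpy as np

def assemble_center_crop(patch_preds, origins, volume_shape, margin=8):
    """Write only the central region of each patch back into the volume."""
    out = np.zeros(volume_shape, dtype=np.float32)
    for pred, origin in zip(patch_preds, origins):
        lo, hi = [], []
        for o, p, v in zip(origin, pred.shape, volume_shape):
            # Skip `margin` voxels on each side, except at the volume border,
            # so the whole volume is still covered.
            lo.append(0 if o == 0 else margin)
            hi.append(p if o + p == v else p - margin)
        out[origin[0] + lo[0]:origin[0] + hi[0],
            origin[1] + lo[1]:origin[1] + hi[1],
            origin[2] + lo[2]:origin[2] + hi[2]] = pred[lo[0]:hi[0],
                                                        lo[1]:hi[1],
                                                        lo[2]:hi[2]]
    return out

# Toy usage: two 64^3 patches overlapping by 32 voxels along the last axis.
p1 = np.random.rand(64, 64, 64).astype(np.float32)
p2 = np.random.rand(64, 64, 64).astype(np.float32)
fused = assemble_center_crop([p1, p2], [(0, 0, 0), (0, 0, 32)], (64, 64, 96))
```

Note that this only covers the whole volume when the sliding-window stride is no larger than the patch size minus twice the margin.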

@JXQI

JXQI commented Mar 27, 2021

Hello, I am trying to reproduce your work with the RibFrac data. Following your README, I put the data in the specified directory, ran train.py directly for training, and then ran predict.py + eval.py to get the performance. However, we could only achieve about 50% recall @ 8 FA on the validation set, which is significantly different from the figure claimed by the 3rd place in the competition (the results table in the PPT shows they achieved 90% recall @ 8 FA on the validation set). Of course, the performance can be affected by many factors, so I would like to ask what performance your method achieves when trained only on the RibFrac training set (results with post-processing and TTA are fine).

My result is the same as yours. Did you manage to resolve it?

@BelieferQAQ

Hi @mssjtxwd

As @duducheng said, the training configuration in our actual implementation is different from the one in the open-sourced code. There are a few details you may try:

  • Larger batch size. This should be very important, since we used a larger batch size on multiple GPUs during training. In this repo the batch size is set to 4 to fit on a single 11 GB GPU;
  • Fine-tuning. We use a multi-stage training strategy to take the performance up a notch;
  • Better patch assembling strategy. This repo uses simple averaging when handling overlaps between patches. Keeping only the center region of each patch should give better performance.

Excuse me, how do you use the multi-stage training strategy?
