cannot reproduce the reported results #3

Closed · licaizi opened this issue Dec 9, 2021 · 4 comments

licaizi commented Dec 9, 2021

There is a very large gap between the reported results and mine; did I miss any important tricks?
The details of my experiments are as follows:
1. Fix two bugs:
(screenshot of the two bug fixes)

2. Prepare the ACDC dataset via generate_acdc.py.

3. Prepare the running scripts:
(1) from scratch:
python train_supervised.py --device cuda:0 --batch_size 10 --epochs 200 --data_dir ./dir_for_labeled_data --lr 5e-4 --min_lr 5e-6 --dataset acdc --patch_size 352 352 --experiment_name supervised_acdc_random_sample_6_ --initial_filter_size 48 --classes 4 --enable_few_data --sampling_k 6;

(2) contrastive learning (an illustrative sketch of the loss these commands use follows the results below):
python train_contrast.py --device cuda:0 --batch_size 32 --epochs 300 --data_dir ./dir_for_unlabeled_data --lr 0.01 --do_contrast --dataset acdc --patch_size 352 352 --experiment_name contrast_acdc_pcl_temp01_thresh035_ --slice_threshold 0.35 --temp 0.1 --initial_filter_size 48 --classes 512 --contrastive_method pcl;

python train_contrast.py --device cuda:0 --batch_size 32 --epochs 300 --data_dir ./dir_for_unlabeled_data --lr 0.01 --do_contrast --dataset acdc --patch_size 352 352 --experiment_name contrast_acdc_gcl_temp01_thresh035_ --slice_threshold 0.35 --temp 0.1 --initial_filter_size 48 --classes 512 --contrastive_method gcl;

python train_contrast.py --device cuda:0 --batch_size 32 --epochs 300 --data_dir ./dir_for_unlabeled_data --lr 0.01 --do_contrast --dataset acdc --patch_size 352 352 --experiment_name contrast_acdc_simclr_temp01_thresh035_ --slice_threshold 0.35 --temp 0.1 --initial_filter_size 48 --classes 512 --contrastive_method simclr;

(3) finetuning:
python train_supervised.py --device cuda:0 --batch_size 10 --epochs 100 --data_dir ./dir_for_labeled_data --lr 5e-5 --min_lr 5e-6 --dataset acdc --patch_size 352 352 --experiment_name supervised_acdc_simclr_sample_6_ --initial_filter_size 48 --classes 4 --enable_few_data --sampling_k 6 --restart --pretrained_model_path ./results/contrast_acdc_simclr_temp01_thresh035_2021-12-05_09-43-38/model/latest.pth;

python train_supervised.py --device cuda:1 --batch_size 10 --epochs 100 --data_dir ./dir_for_labeled_data --lr 5e-5 --min_lr 5e-6 --dataset acdc --patch_size 352 352 --experiment_name supervised_acdc_gcl_sample_6_ --initial_filter_size 48 --classes 4 --enable_few_data --sampling_k 6 --restart --pretrained_model_path ./results/contrast_acdc_gcl_temp01_thresh035_2021-12-04_03-46-35/model/latest.pth;

python train_supervised.py --device cuda:1 --batch_size 10 --epochs 100 --data_dir ./dir_for_labeled_data --lr 5e-5 --min_lr 5e-6 --dataset acdc --patch_size 352 352 --experiment_name supervised_acdc_pcl_sample_6_ --initial_filter_size 48 --classes 4 --enable_few_data --sampling_k 6 --restart --pretrained_model_path ./results/contrast_acdc_pcl_temp01_thresh035_2021-12-02_21-48-13/model/latest.pth;

4. Experimental results (Ubuntu 16.04, PyTorch 1.9, 2 × NVIDIA 2080 Ti, Dice metric, taking sample_k=6 as an example):
(screenshot of the results table)
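
For reference, the --temp flag in the contrastive commands of step 3(2) sets the temperature of an InfoNCE-style loss. The sketch below shows only the generic SimCLR-style form, as an assumption about what the flag controls; it is not the repository's pcl/gcl implementation, which (judging from the --slice_threshold flag) also uses slice-position information to define positives:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """Minimal SimCLR-style (NT-Xent) contrastive loss sketch.

    z1, z2: [N, D] embeddings of two augmented views of the same N samples.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2N, D], unit-norm rows
    sim = torch.mm(z, z.t()) / temperature               # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # never treat a sample as its own pair
    # the i-th view in z1 is positive with the i-th view in z2, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# usage example with random embeddings
z1, z2 = torch.randn(32, 512), torch.randn(32, 512)
loss = nt_xent_loss(z1, z2, temperature=0.1)
```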

@dewenzeng (Owner)

@CaiziLee Thanks for pointing out the bugs. I guess the batchgenerators package has been updated and some of the functions have been moved to other places.
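
As a hedged illustration of that kind of fix, a small compatibility shim can cover the moved imports; the module paths below are assumptions about the batchgenerators layout, not taken from this repository:

```python
# Illustrative only: newer batchgenerators releases moved several transforms
# into submodules, so older flat imports may raise ImportError.
try:
    # layout assumed by older code (assumption)
    from batchgenerators.transforms import SpatialTransform, MirrorTransform
except ImportError:
    # layout in more recent batchgenerators versions
    from batchgenerators.transforms.spatial_transforms import SpatialTransform, MirrorTransform
```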
It looks like none of the pre-trained models are working in your case; have you checked the contrastive learning loss? My suggestion is to use a larger initial learning rate for contrastive learning, such as 0.1. Also, during fine-tuning, set the learning rate to the same value as training from scratch; starting from 5e-5 seems too small.
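
As a hypothetical illustration of that learning-rate advice, a fine-tuning optimizer could be configured like this; the cosine decay from --lr to --min_lr is an assumption based on the flags, and the model/optimizer choices are placeholders rather than the repository's actual setup:

```python
import torch

model = torch.nn.Conv2d(1, 4, kernel_size=3, padding=1)   # stand-in for the segmentation network
# same initial lr as training from scratch (5e-4), not the smaller 5e-5
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
# decay towards --min_lr over the fine-tuning epochs (cosine schedule is an assumption)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=5e-6)

for epoch in range(100):
    # ... one epoch of supervised fine-tuning on the labeled slices would go here ...
    scheduler.step()
```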
Here are a contrastive-learning loss curve and a fine-tuning result example of mine:
(screenshots of the loss curve and the fine-tuning results)


licaizi commented Dec 13, 2021

Hi, thanks for your reply. Resetting the fine-tuning learning rate works; my best mean Dice over 5 folds is 0.8421 (0.025), which I think is within normal variation. Thanks again for your wonderful work.


licaizi commented Dec 16, 2021

However, with only the labeled dataset (100 patients), the best mean Dice (sample_k=6) is 0.7858, which shows no improvement over the baseline (0.7883).
1. It seems that the dataset scale has a large impact on performance.
2. Considering that you used all the data, including the test data, isn't this data leakage to some extent? Even though you did not use the manual labels during pre-training, the test data has already been used in the pre-training stage with self-defined labels.

@dewenzeng (Owner)


@CaiziLee For your questions:

  1. Yes, contrastive learning does rely on the dataset scale. I remember seeing a small improvement over the vanilla baseline when using only the labeled ACDC data; maybe some hyperparameters are not the same. I will have to rerun some of the experiments to check.
  2. Yes, there is probably data leakage. I think a better way to evaluate this is to train CL on the training set and test only on the test set. Or, if using cross-validation, it is better to pre-train a CL model on each cross-validation partition (see the sketch below), although this makes things a little more complicated. But anyway, the baselines use the same data as PCL, so I guess the comparison still says something. Also, the transfer-learning results do not have this problem.
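
A minimal sketch of that fold-wise scheme is below; pretrain_contrastive and finetune_and_eval are hypothetical placeholders, not functions from this repository:

```python
from sklearn.model_selection import KFold
import numpy as np

# For each cross-validation split, contrastive pre-training sees only that
# fold's training patients, so the held-out patients never leak into pre-training.
patient_ids = np.arange(100)                      # 100 ACDC patients
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(kfold.split(patient_ids)):
    train_patients = patient_ids[train_idx]
    test_patients = patient_ids[test_idx]
    # 1) contrastive pre-training on unlabeled images from the training patients only
    # encoder = pretrain_contrastive(train_patients)
    # 2) fine-tune with the few labeled samples and evaluate on the held-out patients
    # dice = finetune_and_eval(encoder, train_patients, test_patients)
```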
