This repository has been archived by the owner on Jul 14, 2023. It is now read-only.

lower AUROCs than the author got #6

Open
yoheimatt opened this issue Sep 25, 2018 · 3 comments

@yoheimatt commented Sep 25, 2018

Thank you for sharing the code with the community.
I ran the Keras version of the code.
I was unable to get AUROCs close to the ones you reported:

| # | Pathology | AUROC |
|---:|---|---:|
| 0 | Atelectasis | 0.689804 |
| 1 | Cardiomegaly | 0.699429 |
| 2 | Effusion | 0.769636 |
| 3 | Infiltration | 0.655084 |
| 4 | Mass | 0.601279 |
| 5 | Nodule | 0.571633 |
| 6 | Pneumonia | 0.634000 |
| 7 | Pneumothorax | 0.677171 |
| 8 | Consolidation | 0.725847 |
| 9 | Edema | 0.817075 |
| 10 | Emphysema | 0.603675 |
| 11 | Fibrosis | 0.660121 |
| 12 | Pleural_Thickening | 0.650140 |
| 13 | Hernia | 0.647572 |

How many epochs do I need to run?
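For reference, per-class AUROCs like the ones in the table above are typically computed with scikit-learn's `roc_auc_score` over the test-set predictions. A minimal sketch, assuming `y_true` and `y_pred` are `(n_samples, 14)` arrays of binary labels and sigmoid outputs; the helper and variable names are illustrative, not the repo's actual code:

```python
from sklearn.metrics import roc_auc_score

# Label order of the NIH ChestX-ray14 pathologies, matching the table above.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]

def per_class_auroc(y_true, y_pred):
    """Per-pathology AUROC; y_true/y_pred are (n_samples, 14) arrays."""
    scores = {}
    for i, name in enumerate(PATHOLOGIES):
        try:
            scores[name] = roc_auc_score(y_true[:, i], y_pred[:, i])
        except ValueError:
            # Raised when a class has only one label value in y_true,
            # e.g. zero positive cases for that pathology.
            scores[name] = float("nan")
    return scores
```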

@georgeAccnt-GH (Contributor) commented:

First of all, thank you for trying the code. Please feel free to report any pain points; I am sure there were a few. We are also working on a streamlined version that will drop the deprecated Workbench and leverage the much more useful, recently released AML SDK.

About the classification performance issue: you should try around 200 epochs; the value used in the repo (1) is just for demo purposes. How many epochs did you use? If you are using an Azure DLVM for training, you could scale up its size to reduce training time. I think on an NC12 (2 GPUs) it will take days, at about 20 to 30 minutes per epoch.
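Not the repo's exact code, but a minimal sketch of the kind of CheXNet-style Keras setup this implies on a 2-GPU NC12; `build_model`, `train_gen`, and `val_gen` are illustrative names, and `multi_gpu_model` assumes Keras ≥ 2.0.9 with two visible GPUs:

```python
from keras.applications.densenet import DenseNet121
from keras.layers import Dense
from keras.models import Model
from keras.utils import multi_gpu_model

def build_model(num_classes=14):
    """DenseNet-121 backbone with a 14-way sigmoid head, CheXNet-style."""
    base = DenseNet121(weights="imagenet", include_top=False, pooling="avg")
    head = Dense(num_classes, activation="sigmoid")(base.output)
    return Model(inputs=base.input, outputs=head)

model = build_model()
# Replicate the model across the NC12's 2 GPUs.
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer="adam", loss="binary_crossentropy")

# Train for ~200 epochs instead of the repo's demo value of 1, e.g.:
# parallel_model.fit_generator(train_gen, validation_data=val_gen, epochs=200)
```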

@yoheimatt (Author) commented:

Thank you for your quick reply. I did find a small potential issue in https://github.com/Azure/AzureChestXRay/blob/master/AzureChestXRay_AMLWB/Code/src/azure_chestxray_utils.py:
I think there is an underscore missing in 'Pleural Thickening'. Without it, the label never matches the dataset's spelling, so the processing creates zero positive Pleural Thickening cases.

I will follow your suggestion and run 200 epochs. To be honest, I ran out of patience and stopped training at the 50th epoch after I didn't see much improvement. And you are right, it takes less than 30 minutes per epoch.
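A quick sanity check for this kind of label bug is to count positives per spelling directly from the NIH metadata CSV. A minimal sketch, assuming the standard `Data_Entry_2017.csv` with its pipe-delimited 'Finding Labels' column (the script itself is illustrative, not part of the repo):

```python
import pandas as pd

# NIH ChestX-ray14 metadata; 'Finding Labels' holds pipe-delimited
# pathology names, e.g. "Effusion|Pleural_Thickening".
df = pd.read_csv("Data_Entry_2017.csv")

for label in ("Pleural Thickening", "Pleural_Thickening"):
    n_pos = df["Finding Labels"].str.contains(label, regex=False).sum()
    print(f"{label!r}: {n_pos} positive cases")
# The space-separated spelling matches nothing; only the
# underscored spelling finds the actual positives.
```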

@Stexan commented Jan 19, 2019

Hello @georgeAccnt-GH, and thank you very much for your implementation of the study! Our team has also tried to replicate your results, and while we got better results than the original poster, we still didn't reach your AUC (you report a mean of 0.84; we get a mean of 0.81).

What happens is that around epoch 30-35 the algorithm starts overfitting, so further training becomes useless as performance on the validation/test sets just drops. We followed exactly the same steps you implemented.

Do you think the data splits have an impact and the difference might come from there? Or is there anything else you did specifically to keep the network from overfitting so fast? (We also tried random crops along with the augmentation techniques used in your implementation, but that didn't help much either.)
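One stock mitigation when validation loss turns around epoch 30-35 is to checkpoint the best weights and decay the learning rate on plateau. A minimal sketch with standard Keras callbacks (the filename and generator names are illustrative; `restore_best_weights` needs Keras ≥ 2.2.3):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

callbacks = [
    # Keep the weights from the best validation epoch, not the last one.
    ModelCheckpoint("best_model.h5", monitor="val_loss", save_best_only=True),
    # Halve the learning rate once validation loss stops improving.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, verbose=1),
    # Stop early after a long stretch without improvement.
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
]

# model.fit_generator(train_gen, validation_data=val_gen,
#                     epochs=200, callbacks=callbacks)
```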
