
Some issues about the reproducing results #12

Closed
yux94 opened this issue Jul 23, 2018 · 21 comments

@yux94

yux94 commented Jul 23, 2018

When I tried to reproduce your results with the code, I got less-than-perfect results.
For example, below is the raw test_001 tiff
tif_raw_convert_img
And after the whole training with ResNet18-CRF, I got the following test prob map:
probmap_convert_img
while the ground-truth mask looks like this (since the Camelyon16 organizers no longer provide the test GT in tiff format, I converted the raw test tiff file and xml file to a tiff mask manually with the ASAP software):
npy_mask_convert_img

I followed your test steps, evaluated the average FROC score over the whole test set, and got this:
froc_npy

However, the result is not at all satisfying.

Is there any other trick in your preprocessing, postprocessing, or training process?

Here is the prob map of Test_026:

probmap_convert_img_026

@yil8
Collaborator

yil8 commented Jul 23, 2018

@yux94 Thanks for trying to reproduce my results. I actually feel what you got is already roughly the same as mine. First, I did not use any tricks for preprocessing/postprocessing; everything is within the codebase. There are some important details when you sample the coordinates for training patches, but since I've already provided my sampled coordinates, it doesn't matter anyway. As for testing reproducibility, have you tried using the ckpt I provided within the codebase to generate the probability map before training your own? If you use my ckpt, you should be able to get a probability map of Test_001.tif like this:
screen shot 2018-07-23 at 11 12 26 am
I would highly recommend using cmap='jet' to plot the probability map, as opposed to the black/white colormap in your case, which does not clearly differentiate values around 0.5:

In [1]: import numpy as np

In [2]: from matplotlib import pyplot as plt

In [3]: probs_map = np.load('./Test_001.npy')

In [4]: plt.imshow(probs_map.transpose(), vmin=0, vmax=1, cmap='jet')
Out[4]: <matplotlib.image.AxesImage at 0x7f35e994ca58>

In [5]: plt.show()

And I would subjectively argue this figure matches the ground truth annotation pretty well. If you could reproduce this probability map and the corresponding FROC score of ~0.8 (as already achieved by one user), then at least there should be no problems in the postprocessing steps.

@yux94
Author

yux94 commented Jul 24, 2018

Thank you so much for your generous help and suggestions!
After plotting the prob map with cmap='jet', I got the probability map of Test_001.tif with my ckpt like this:
probmap_convert_img_001_plot

Maybe I should train again and check my whole process, thank you so much!

@yil8
Collaborator

yil8 commented Jul 24, 2018

@yux94 This one does look worse than my result. In addition to trying my ckpt, it would be helpful to also plot your training/validation curves so that I can compare them with mine.
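
For reference, a minimal sketch of plotting such curves, assuming the per-epoch accuracies have already been collected into plain Python lists (acc_train and acc_valid are hypothetical names; adapt to however your training script stores its logs):

    # Minimal sketch: plot per-epoch training/validation accuracy for comparison.
    # acc_train / acc_valid are placeholders standing in for your logged values.
    import matplotlib.pyplot as plt

    acc_train = [0.80, 0.85, 0.88, 0.90, 0.91]  # placeholder values
    acc_valid = [0.78, 0.84, 0.87, 0.89, 0.90]  # placeholder values

    epochs = range(1, len(acc_train) + 1)
    plt.plot(epochs, acc_train, label='train accuracy')
    plt.plot(epochs, acc_valid, label='valid accuracy')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend()
    plt.show()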

@yux94
Author

yux94 commented Jul 24, 2018

@yil8 Many thanks!

@yux94
Author

yux94 commented Jul 27, 2018

When I resampled the training patches randomly by myself and trained the network again, I got the prob map of Test_084 like this:
probmap_convert_img_084_plot_resample_jet
And below is the prob map with my previous reproduced ckpt:
probmap_convert_img_084_plot_reproduce_jet
And this is your result:
probmap_convert_img_084_plot_rawbaidu_jet
That's very confusing, since you said that one user has already achieved good performance. I am working on retraining the network again. Besides, it would be very nice if you could provide the detailed process of your sampling with hard mining (#14).
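
For what it's worth, a minimal sketch of what random patch-coordinate sampling can look like, assuming a binary tumor/tissue mask stored as a 2D numpy array; this is only an illustration, not the sampling procedure used in this repository:

    # Minimal sketch: sample random patch center coordinates from a binary mask.
    # mask is a 2D numpy array (1 = region of interest) at some fixed downsample
    # level; scaling the coordinates back to level 0 is up to the caller.
    import numpy as np

    def sample_coords(mask, n_samples, seed=0):
        rng = np.random.RandomState(seed)
        xs, ys = np.where(mask > 0)              # all candidate positions
        idx = rng.choice(len(xs), size=n_samples, replace=True)
        return list(zip(xs[idx], ys[idx]))       # (row, col) pairs in mask space

    # Example (hypothetical file name):
    # tumor_mask = np.load('Tumor_001_mask.npy')
    # coords = sample_coords(tumor_mask, 1000)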

@yil8
Collaborator

yil8 commented Jul 30, 2018

@yux94 When I said other users achieved good performance, I meant they used my provided ckpt and achieved a 0.8+ FROC score. Your last heatmap plot based on my ckpt also looks good, and I guess if you calculate the FROC score, it will probably be around 0.8 as well. For the training part, due to the non-determinism of GPU convolution, it's almost impossible to achieve numerically identical results when retraining. But I would still suggest you plot your training curve, so that I can get some rough idea. I'm currently traveling on a business trip, and will try to find some time to implement the hard-negative sampling part once I'm back in the US.
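
For readers looking for the general idea only: hard-negative mining usually means re-sampling normal-tissue coordinates where the current model's probability map is wrongly high. A rough sketch under that assumption (not the author's actual implementation):

    # Rough sketch of hard-negative mining: pick normal-tissue coordinates where
    # the current model predicts a high tumor probability (i.e. false positives).
    # Assumes probs_map and normal_mask are aligned 2D numpy arrays at the same level.
    import numpy as np

    def hard_negative_coords(probs_map, normal_mask, threshold=0.5, n_samples=1000, seed=0):
        rng = np.random.RandomState(seed)
        candidates = (probs_map > threshold) & (normal_mask > 0)   # false positives
        xs, ys = np.where(candidates)
        if len(xs) == 0:
            return []
        idx = rng.choice(len(xs), size=min(n_samples, len(xs)), replace=False)
        return list(zip(xs[idx], ys[idx]))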

@yux94
Author

yux94 commented Jul 30, 2018

Thank you so much for your patient and timely reply.
This is my training curve with 20 epochs. Should I train with more epochs until the curve is stable?
getimage

@yux94
Author

yux94 commented Jul 30, 2018

getimage 1
And this is my resampling training curve with 20 epochs.

@yil8
Collaborator

yil8 commented Jul 31, 2018

@yux94 Your first curve looks very similar to mine, which converges to ~0.92 valid accuracy. I guess your second curve does not include hard negative examples, and thus converges to higher accuracy. For the curve with hard negative examples, did you train your model using exactly the same config/command I provided in the README?

@yux94
Author

yux94 commented Aug 3, 2018

Yes... pretty sure. I will check again, many thanks!

@yil8
Collaborator

yil8 commented Aug 3, 2018

@yux94 sorry I couldn't help more on the training side. BTW, what's your FROC score for each case?

@yux94
Author

yux94 commented Aug 28, 2018

Sorry for bothering you again. We have tried using the ckpt you provided within the codebase to generate the probability maps, and the final FROC score is not satisfying either.

FPs per slide   0.25    0.5     1       2       4       8       Avg
NCRF Model      0.5265  0.6106  0.6681  0.7257  0.7743  0.8053  0.6851
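
(As a sanity check, the Avg column of the Camelyon16 FROC score is simply the mean sensitivity over the six false-positive rates:)

    # Average FROC score = mean sensitivity at the 6 predefined FP rates.
    sens = [0.5265, 0.6106, 0.6681, 0.7257, 0.7743, 0.8053]
    print(sum(sens) / len(sens))   # ~0.6851, matching the Avg column above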

So we checked the probability maps. First, we generated coordinates of the detected tumor regions with nms.py. Next, we picked out the normal cases (48 out of 129), whose detections are all false positives, and drew the histogram below.
getimage 3

According to this histogram, a good FROC score might be achieved only if the threshold is set to ~0.9.
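
A minimal sketch of how such a histogram can be drawn, assuming nms.py has written one CSV per test case with the detection probability in the first column (the directory name here is hypothetical, and the column index/delimiter should be adjusted to the actual output format):

    # Sketch: histogram of detection probabilities on normal (tumor-free) slides,
    # where every detection is by definition a false positive.
    import csv
    import glob
    import matplotlib.pyplot as plt

    fp_probs = []
    for path in glob.glob('./coords_normal/Test_*.csv'):   # hypothetical directory
        with open(path) as f:
            for row in csv.reader(f):
                fp_probs.append(float(row[0]))

    plt.hist(fp_probs, bins=50, range=(0, 1))
    plt.xlabel('detection probability (false positives on normal slides)')
    plt.ylabel('count')
    plt.show()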

@yux94
Author

yux94 commented Aug 28, 2018

Sorry, Test_049 and Test_114 were not excluded, which is why I got the bad result.

@yux94 yux94 closed this as completed Aug 28, 2018
@yil8
Collaborator

yil8 commented Aug 28, 2018

@yux94 Did you obtain a 0.80+ FROC score after excluding Test_049 and Test_114?

@Hukongtao

@yux94 "I transferred the raw tiff test file and xml file to the tiff mask with the ASAP software manually". How did you do that?I need the test GT,too. I just used the ASAP look the tif.But I don't know how to produce the GT.

@yux94
Author

yux94 commented Sep 8, 2018

@yil8 Yes, I got a 0.80+ FROC score with your provided ckpt after excluding Test_049 and Test_114, but the result from my own retrained model is still not satisfying.

@yux94
Author

yux94 commented Sep 8, 2018

@Hukongtao First, open the .tif file with ASAP. Next, load the .xml file and save it as a .araw file. Then open the .tif and .araw files together and save the result (if I remember correctly). Here is another solution of mine using cv2.fillPoly: https://github.com/yux94/Pathology/blob/master/bin/xml2mask_2.py
Maybe the second method is more convenient.
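
For readers who cannot use ASAP, a rough sketch of the cv2.fillPoly approach, assuming an ASAP-style XML layout (Annotation elements containing Coordinate elements with X/Y attributes at level 0); this is an illustration only, not the linked script, and exclusion/hole annotations are not handled:

    # Rough sketch: rasterize Camelyon16 XML annotations into a binary mask.
    # mask_shape is (height, width) of the target mask; downsample converts
    # level-0 coordinates to the mask's resolution.
    import xml.etree.ElementTree as ET
    import numpy as np
    import cv2

    def xml_to_mask(xml_path, mask_shape, downsample=1.0):
        root = ET.parse(xml_path).getroot()
        mask = np.zeros(mask_shape, dtype=np.uint8)
        for annotation in root.iter('Annotation'):
            polygon = []
            for coord in annotation.iter('Coordinate'):
                x = float(coord.get('X')) / downsample
                y = float(coord.get('Y')) / downsample
                polygon.append([int(x), int(y)])
            if polygon:
                pts = np.array(polygon, dtype=np.int32).reshape(-1, 1, 2)
                cv2.fillPoly(mask, [pts], 1)
        return mask

    # Example (hypothetical shapes/values):
    # mask = xml_to_mask('Test_001.xml', mask_shape=(height, width), downsample=32)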

@Hukongtao

@yux94 OK, the code works. When I convert the generated result into a black-and-white image, the image is very large but the foreground is only a very small patch, which is different from what you showed above.

@Hukongtao

@yux94 Would you mind leaving your QQ or WeChat? There are still some questions I'd like to ask you. Or you can add my QQ: 1821141394.

@yux94
Author

yux94 commented Sep 8, 2018

Excuse me, but I have one more question. Did you first train the ResNet18 and then fine-tune the model with the CRF?

@yil8
Collaborator

yil8 commented Sep 8, 2018

@yux94 Not quite sure what exactly you mean by your "reproduced result is not satisfying either". An FROC of 0.8+ is pretty good as far as I know. Do you have some specific examples? I trained ResNet18 together with the CRF from scratch, without fine-tuning.
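
To illustrate what "trained together from scratch" means in practice, a highly simplified PyTorch sketch of joint optimization over a backbone plus a CRF-style refinement layer; this is not the actual NCRF code, and the module/class names here are made up:

    # Simplified sketch of end-to-end training: one optimizer covers both the
    # ResNet18 backbone and the CRF-style layer, so neither part is pre-trained
    # or fine-tuned separately.
    import torch
    import torch.nn as nn
    import torchvision

    class PatchGridClassifier(nn.Module):
        def __init__(self, grid_size=3):
            super().__init__()
            resnet = torchvision.models.resnet18(pretrained=False)
            resnet.fc = nn.Linear(resnet.fc.in_features, 1)
            self.backbone = resnet                     # per-patch logit
            # A single learnable pairwise weight standing in for a real CRF layer.
            self.pairwise = nn.Parameter(torch.zeros(1))
            self.grid_size = grid_size

        def forward(self, patches):                    # patches: (B, G, 3, H, W)
            b, g = patches.shape[:2]
            logits = self.backbone(patches.flatten(0, 1)).view(b, g)
            # Crude "CRF-like" smoothing: mix each patch logit with the grid mean.
            return logits + self.pairwise * logits.mean(dim=1, keepdim=True)

    model = PatchGridClassifier()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.BCEWithLogitsLoss()
    # loss = criterion(model(batch_patches), batch_labels); loss.backward(); optimizer.step()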
