How to achieve the published results using the pre-trained model and binarize.py? #17

Closed
Ladyliu47 opened this issue Apr 16, 2021 · 3 comments

Comments

@Ladyliu47

Thank you for the great work in document image binarization~

Is "model_weights_dibco_6_256x256_s96_aug_m205_f64_k5_s2_se3_e200_b32_esp.h5" corresponds to the model trained on the dibco series database except h-dibco2016?

When I run binarize.py on the H-DIBCO 2016 dataset (http://vc.ee.duth.gr/h-dibco2016/benchmark/), the evaluation results are not as good as those reported in your publication. The average FM over the ten images is 86.39965. How can I reach the FM of 91.65 reported for the SAE method?

After loading the pre-trained model, do I need any other steps to reproduce the published results?

By the way, the following are the evaluation results on H-DIBCO 2016 for the other pre-trained models:
DIBCO2016_Dataset.csv
What puzzles me is that some of the evaluation results are better than the published ones and some are worse, even though I did not do any additional training; I only specified the pre-trained model in binarize.py. For the evaluation, I use os.popen() to call the command line and run BinEvalWeights.exe and DIBCO_metrics.exe. It is a very simple process. Did I overlook any step when using the pre-trained model?
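Roughly, the evaluation loop looks like the sketch below (the weight-file naming, the argument order of the DIBCO tools, and the output format being parsed are assumptions for illustration, not taken from the tools' documentation):

```python
import os
import re

GT_DIR = "gt"            # ground-truth images (assumed layout)
OUT_DIR = "binarized"    # output images written by binarize.py (assumed layout)

fmeasures = []
for name in sorted(os.listdir(GT_DIR)):
    gt = os.path.join(GT_DIR, name)
    result = os.path.join(OUT_DIR, name)
    base = os.path.splitext(gt)[0]
    # BinEvalWeights.exe produces the recall/precision weight files from the GT
    os.popen(f"BinEvalWeights.exe {gt}").read()
    # Argument order assumed: GT, binarized result, recall weights, precision weights
    out = os.popen(f"DIBCO_metrics.exe {gt} {result} {base}_RWeights.dat {base}_PWeights.dat").read()
    # Pull the F-measure out of the tool's text output (output format assumed)
    m = re.search(r"F-?[Mm]easure\s*[:=]?\s*([\d.]+)", out)
    if m:
        fmeasures.append(float(m.group(1)))

print("average FM over", len(fmeasures), "images:", sum(fmeasures) / len(fmeasures))
```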

Thank you again and I am looking forward to your help sincerely~

@ajgallego
Owner

ajgallego commented May 1, 2021

Dear @Ladyliu47, thank you for your interest in this project.

Yes, the model you are referring to is the one trained on the DIBCO series datasets except H-DIBCO 2016.

That difference in the results could have several causes. What threshold are you using? Try different thresholds to see whether the result improves. Are you sure the test partition is the same? Some DIBCO folders included both handwritten and printed images. Also, the evaluation tool I used was the one from H-DIBCO 2013.
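For example, a threshold sweep over the network's output could be scripted roughly as follows (a minimal sketch; the file name and the assumption that the output is saved as a grey-level probability map, with values near 0 meaning ink, are illustrative and not taken from binarize.py):

```python
import numpy as np
from PIL import Image

# Per-pixel output of the model for one test image, assumed to be saved as a
# grey-level map where values close to 0 mean foreground (ink)
probs = np.asarray(Image.open("prob_map.png").convert("L"), dtype=np.float32) / 255.0

for t in np.arange(0.1, 1.0, 0.1):
    # Pixels below the threshold become ink (0), the rest background (255);
    # flip the comparison if the map encodes the opposite polarity
    binary = np.where(probs < t, 0, 255).astype(np.uint8)
    Image.fromarray(binary).save(f"binarized_t{t:.1f}.png")
```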

If you still cannot improve the result, I would have to review the model. The process to use it is as simple as you say.

Kind regards

@Ladyliu47
Author

Dear @ajgallego, thanks for your response and suggestions.

The threshold I used for the evaluation was 0.5. Following your suggestion, the evaluation results for different thresholds are shown below. Although the results differ, none of them reaches the F-measure of 91.65.

The test partition is the 10 original images of H-DIBCO 2016 (http://vc.ee.duth.gr/h-dibco2016/benchmark/DIBCO2016_dataset-original.zip). I did not find any printed images for DIBCO 2016; it seems that the 2016 dataset contains only handwritten images. Did you use printed images from DIBCO 2016 during testing? If so, could you please tell me where to download them? Thank you very much.

The tool I used for the evaluation was the one from H-DIBCO 2016 (http://vc.ee.duth.gr/h-dibco2016/benchmark/DIBCO_metrics.zip). Due to MATLAB version differences, I cannot run the 2013 evaluation tool. By the way, I tested the 2013 sample images with the 2016 evaluation tool and obtained the same results as reported for 2013, so I think there is no difference between the evaluation tools from different years.

Last but not least, have you ever evaluated the images from DIBCO 2017 (https://vc.ee.duth.gr/dibco2017/benchmark/)? What were your results?

Thank you again for your help and I am looking forward to your reply.

| Threshold | 0 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F-measure | 86.87473 | 86.6786 | 86.66247 | 86.6474 | 86.63688 | 86.62508 | 86.61825 | 86.61082 | 86.59814 | 86.58317 |
| Fps | 90.39805 | 90.87379 | 90.89219 | 90.90235 | 90.91124 | 90.91526 | 90.92552 | 90.93636 | 90.9465 | 90.96493 |
| PSNR | 17.65381 | 17.63626 | 17.63381 | 17.63129 | 17.62909 | 17.62643 | 17.62559 | 17.62415 | 17.62146 | 17.61902 |
| DRD | 4.88334 | 4.86222 | 4.86404 | 4.86617 | 4.86759 | 4.86992 | 4.8703 | 4.87095 | 4.87247 | 4.87385 |
| Recall | 86.2434 | 84.87905 | 84.80306 | 84.74897 | 84.70734 | 84.66843 | 84.6339 | 84.59439 | 84.54435 | 84.4734 |
| Precision | 88.39292 | 89.42374 | 89.47259 | 89.50066 | 89.52428 | 89.54259 | 89.56579 | 89.59255 | 89.6205 | 89.66643 |
| Rps | 94.62979 | 94.15385 | 94.12175 | 94.10299 | 94.08874 | 94.07118 | 94.0588 | 94.04371 | 94.02554 | 94.00041 |
| Pps | 86.91282 | 88.13689 | 88.19566 | 88.22944 | 88.25749 | 88.27917 | 88.30724 | 88.33895 | 88.37199 | 88.42613 |

The attached file contains the results for each test image at the different thresholds:
DIBCO2016_threshold.csv
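If it helps, the per-threshold averages in the table above can be recomputed from this file with a few lines of pandas (a sketch that assumes a column named "threshold" and one column per metric; adjust it to the actual header of DIBCO2016_threshold.csv):

```python
import pandas as pd

df = pd.read_csv("DIBCO2016_threshold.csv")
# Mean of each metric over the ten test images, grouped by threshold
print(df.groupby("threshold").mean(numeric_only=True))
```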

@ajgallego
Owner

Dear @Ladyliu47, sorry for the delay. I did not receive notification of this comment.

I don't know why you don't get the same results. It may be due to some difference in the data used, or perhaps in the published weights file. During experimentation I ran hundreds of tests with different parameters, so there may be an error in the published weights file and it may not be the one that gave me the best result for that dataset.

I have not tested on DIBCO 2017. I have not continued working on this project, as I have focused on other research tasks; sorry.
