New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
print error - ICDAR2017_shared_task_workflows.ipynb #16
Comments
Thanks! The signature of wf.list_steps() changed, so, yes, you should do print(wf.list_steps()). Please note that the workflow is about preprocessing the vudnc data, this has nothing to do with the icdar 2017 shared task. Also, I do not recommend using the vudnc data, because it is very noisy. But if you do want to preprocess it anyway, you should do
|
You are correct. I meant that I was not able to run vudnc-preprocess-pack.cwl. For good results in english, do you recommend using the english monograph partition of ICDAR? I trained with both monograph and the periodical partitions in separated but the validation accuracy and loss were not good (and also the tests I made). I would like to help with some additional documentation to improve reproducibility, but I need a roadmap of how to get significant results (mainly for english documents). |
Unfortunately, ochre is not (yet) fit for training good ocr post-correction models. I plan to work on it in the future, but only as a hobby project. So no promises there! Generally speaking, the OCR post-correction datasets are small. That's why I'm making a list of them, so they can be used for generalization. I don't think that training on the English monograph data will give you a model that will work on other data, because OCR errors tend to depend on time period, font, the ocr software that was used, etc. |
Hi guys,
I suggest change print wf.list_steps() to print (wf.list_steps()) in the notebook ICDAR2017_shared_task_workflows.ipynb
Also, I would not able to run cwltool ochre/cwl/ICDAR2017_shared_task_workflows. That is what I have got:
ochre/cwl/vudnc-preprocess-pack.cwl: error: argument --archive is required
The text was updated successfully, but these errors were encountered: