Does not produce a valid workspace #60

Closed
mikegerber opened this issue Jul 29, 2020 · 4 comments

Comments

@mikegerber
Contributor

It seems ocrd-olena-binarize no longer produces a valid workspace, now that ocrd workspace validate checks pc:PcGts/@pcGtsId against mets:file/@ID:

<report valid="false">
  <warning>pc:PcGts/@pcGtsId differs from mets:file/@ID: "" !== "OCR-D-IMG-BINPAGE_00000024"</warning>
</report>
@mikegerber
Contributor Author

@kba Should the test target include a test for this, i.e. validating? (I believe it should, for every processor, but we should discuss it.)

mikegerber changed the title from "Does not oroduce a valid workspace" to "Does not produce a valid workspace" on Jul 29, 2020
@kba
Member

kba commented Jul 29, 2020

@kba Should the test target include a test for this, i.e. validating?

It probably should, yes.

At the very least, this particular test should be skippable. OCR-D/core#553

@mikegerber
Contributor Author

@kba Another idea: I routinely run validation after each processor to catch problems early. If there were a standard --validate option in core (supplemented by a config file that sets, e.g., --skip options), this pattern:

ocrd-olena-binarize --overwrite -I $INPUT_FILE_GRP -O OCR-D-IMG-BINPAGE,OCR-D-IMG-BIN -P impl sauvola-ms-split
ocrd workspace validate $validate_options

ocrd-sbb-textline-detector --overwrite -I OCR-D-IMG-BINPAGE -O OCR-D-SEG-LINE -P model /var/lib/textline_detection
ocrd workspace validate $validate_options

would simplify to:

ocrd-olena-binarize --validate --overwrite -I $INPUT_FILE_GRP -O OCR-D-IMG-BINPAGE,OCR-D-IMG-BIN -P impl sauvola-ms-split
ocrd-sbb-textline-detector --validate --overwrite -I OCR-D-IMG-BINPAGE -O OCR-D-SEG-LINE -P model /var/lib/textline_detection

I might be the only user who runs validation this often, so I am not sure it's really a good fit as a "core option".
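
Until such an option exists, the run-then-validate pattern can be approximated with a small shell wrapper. This is only a sketch: `run_validated` is a hypothetical helper name, not something provided by OCR-D core, and it assumes `validate_options` is set as in the commands above (it may also be empty).

```shell
# Hypothetical helper, NOT part of OCR-D core: run one processor,
# then validate the workspace only if the processor succeeded.
run_validated() {
    # Run the processor command exactly as given; on failure, stop
    # and pass its exit status back to the caller.
    "$@" || return
    # validate_options is assumed to hold zero or more options;
    # it is left unquoted on purpose so that it splits into words.
    # shellcheck disable=SC2086
    ocrd workspace validate ${validate_options:-}
}

# Usage (same invocation as above, prefixed with the helper):
# run_validated ocrd-olena-binarize --overwrite -I "$INPUT_FILE_GRP" \
#     -O OCR-D-IMG-BINPAGE,OCR-D-IMG-BIN -P impl sauvola-ms-split
```

The wrapper's exit status is that of the processor if it fails, and that of the validation otherwise, so a script running under `set -e` would stop at the first invalid workspace.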

@kba
Member

kba commented Jul 30, 2020

That's a good idea and would not be difficult to implement. Can you open an issue in OCR-D/core with that proposal?
