"Load Result" with parsererror #326

l0rn0r · 2022-11-08T15:42:36Z

Hello
I'm running the OCR4all Docker container on Ubuntu 20.04.
It works quite well, but I get an error when I try to load a PageXML file in LAREX.

I have a page in the LAREX editor that went through every OCR4all step up to recognition. When I try to load an already existing PageXML of this page (to check whether I could load ground truth text for training), I get the error message:
"Couldn't retrieve annotations from file."

And in the console it says
"request:/file/upload/annotations - fail 'parsererror'"
which points to
Larex/resources/js/viewer/communicator.js, line 17 (a failed POST request).
The write permissions on the data folder on the server should be fine (777).
The PageXML file is v2013-07-15.

Any hint for this problem?
Or any hint how to load ground truth from existing PageXMLs to train a new model?


bertsky · 2022-11-08T16:25:50Z

Don't remember anything about OCR4all integration (request API), but I often see this error with valid PAGE-XML files when

- a TextEquiv has no Unicode or PlainText element
- some @regionRef does not exist
- some OrderedGroup(Indexed) or UnorderedGroup(Indexed) is empty (has no child elements)
- some @points are negative or float (which is also invalid by schema)

(This is because the PRImA parser is not very robust and does not expose the internal cause of the error correctly.)
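
The four conditions listed above can be checked before uploading a file. The following is only a rough sketch using the Python standard library, not what LAREX or the PRImA parser actually do. The names `find_problems`, `local` and `SAMPLE` are made up for this example, the namespace in `SAMPLE` is a placeholder, the `points` check is a simplified pattern (non-negative integer `x,y` pairs) rather than the schema's own, and the first item is read as "TextEquiv without a Unicode child".

```python
import re
import xml.etree.ElementTree as ET

GROUP_TAGS = {"OrderedGroup", "OrderedGroupIndexed",
              "UnorderedGroup", "UnorderedGroupIndexed"}
# Simplified assumption: non-negative integer "x,y" pairs separated by spaces.
POINTS_RE = re.compile(r"\d+,\d+( \d+,\d+)*")


def local(tag):
    """Strip the '{namespace}' prefix so the check works for any PAGE version."""
    return tag.rsplit("}", 1)[-1]


def find_problems(xml_text):
    """Return a list of human-readable findings, in document order."""
    root = ET.fromstring(xml_text)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    problems = []
    for el in root.iter():
        name = local(el.tag)
        children = {local(child.tag) for child in el}
        if name == "TextEquiv" and "Unicode" not in children:
            problems.append("TextEquiv without Unicode")
        if name in GROUP_TAGS and not children:
            problems.append(f"empty {name}")
        ref = el.get("regionRef")
        if ref is not None and ref not in ids:
            problems.append(f"dangling regionRef {ref}")
        points = el.get("points")
        if points is not None and not POINTS_RE.fullmatch(points):
            problems.append(f"bad points {points!r}")
    return problems


# Hypothetical test document; the namespace is a placeholder, not the real one.
SAMPLE = """<PcGts xmlns="http://example.org/page">
  <Page>
    <ReadingOrder>
      <OrderedGroup id="g1">
        <RegionRefIndexed index="0" regionRef="r9"/>
        <OrderedGroupIndexed id="g2" index="1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1">
      <Coords points="10,10 20.5,10 20,-3"/>
      <TextEquiv/>
    </TextRegion>
  </Page>
</PcGts>"""

for problem in find_problems(SAMPLE):
    print(problem)
```

Running it on `SAMPLE` reports one finding per condition. A file that comes back clean here may still fail in LAREX for other reasons, since this only covers the cases named in the list.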

maxnth · 2023-02-20T13:47:12Z

Excuse the late reply, I somehow totally overlooked this issue.
As already mentioned above, this is most likely caused by a PAGE XML file which isn't valid according to the schema.
If you could upload the XML file which causes the error, I'll have a look at it.

bertsky · 2023-02-26T19:23:06Z

Except for the last point (@points format), these are all cases which do not violate the schema. It's only the PRImA parser that fails. This is reproducible with all PRImA tools (editor, converter, viewer, layout evaluation), too.

I don't have examples readily available, but it should be straightforward to construct some from your existing test cases.

maxnth · 2023-02-26T19:59:20Z

> Except for the last point (@points format), these are all cases which do not violate the schema.

I'm not an XML schema expert so the following train of thought might be flawed but I'd be interested to know why the above mentioned cases wouldn't make the XML invalid?

- @regionRef has IDREF as type and AFAIK this should always require the referenced ID to be present in the document according to the XML Schema Definition to make the document valid, doesn't it?
- OrderedGroup, for example, requires minOccurs="1" for either RegionRefIndexed / OrderedGroupIndexed / UnorderedGroupIndexed, so it being completely empty shouldn't be valid according to the schema
- As there isn't any minOccurs value explicitly set for Unicode elements in a TextEquiv, it defaults to minOccurs="1" and therefore should be mandatory to make the document valid
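
The minOccurs default mentioned in the last point can be illustrated with a small sketch. The XSD fragment below is a hypothetical, heavily simplified one written for this example and is not copied from the real PAGE schema; `effective_min_occurs` is likewise a made-up helper. It only reads local element declarations and ignores any minOccurs on the enclosing group.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, simplified fragment for illustration only (not the PAGE schema).
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="TextEquivType">
    <xs:sequence>
      <xs:element name="PlainText" type="xs:string" minOccurs="0"/>
      <xs:element name="Unicode" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def effective_min_occurs(xsd_text):
    """Map each named element declaration to its effective minOccurs."""
    root = ET.fromstring(xsd_text)
    # In XML Schema, an element particle without minOccurs counts as minOccurs="1".
    return {
        el.get("name"): int(el.get("minOccurs", "1"))
        for el in root.iter(XS + "element")
    }


print(effective_min_occurs(XSD))
```

In this fragment, `Unicode` carries no explicit minOccurs and therefore ends up as required, while `PlainText` is optional because it is declared with minOccurs="0".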

bertsky · 2023-02-26T20:33:42Z

> @regionRef has IDREF as type and AFAIK this should always require the referenced ID to be present in the document according to the XML Schema Definition to make the document valid, doesn't it?

You're right. A dangling IDREF should make the document invalid according to the XML specification. I had based my judgement on the behaviour of the libxml2 implementation, which does not check IDREF.

> e.g. OrderedGroup requires minOccurs="1" for either RegionRefIndexed / OrderedGroupIndexed / UnorderedGroupIndexed so it being completely empty shouldn't be valid according to the schema

Right again, my bad.

> As there isn't any minOccurs value explicitly set for Unicode elements in a TextEquiv it defaults to minOccurs="1" and therefore should be mandatory to make the document valid

Again, you're spot on. Sorry for my sloppy nonsense! (I carried this misconception with me for quite some time...)

maxnth · 2023-02-27T07:01:45Z

I'll close this for now, feel free to reopen this @l0rn0r if the issue still persists and isn't caused by invalid PAGE XML (or if the invalid PAGE XML is produced by OCR4all).

maxnth added the "Type: Bug" label Feb 20, 2023

maxnth closed this as completed Feb 27, 2023
