Creates no new document version on higher execution time #12

Jack12816 · 2023-06-27T17:09:29Z

Hey @aborroy - first off: thanks for the OCR transformer, it looks nice and lean!

I'm struggling with a migration from a hand-rolled OCR pipeline on Alfresco 5.0 (CE) to your OCR transformer on Alfresco 7.4 (CE). Integrating it directly as a folder rule would be much simpler. My setup works to the point that I can upload the quick.pdf from this repo and the OCR magic (a new document version) works as expected. That's great!

Here's my problem: when I upload a real PDF file (426 KB, one page, PDF version 1.4), no new document version is ever created. My first guess was resource limits. I've experimented with file size, and I think it's more related to execution time: a bigger file (508 KB, one page, PDF version 1.4) sometimes results in a new document version, but not always. I'm pretty sure it's not the file size, as the OCR transformer does not configure maxSourceSizeBytes, which defaults to -1 (no limit) according to the docs.

Here are some screenshots:

I searched for transformer timeouts and configured the following settings on the repository:

-Dtransformer.timeout.default=300
-Dtransformserver.transformationTimeout=300
-Dcontent.transformer.default.timeoutMs=300000

but this does not change the situation. Unfortunately, I was not able to figure out where transformOptions.get(TIMEOUT) comes from or how to set it properly.

While digging into this, I noticed that the new document version is created whenever the execution time is less than 5 seconds. I didn't find any defaults for the timeout in transformOptions.

Maybe you could give me a hint? :)

Jack12816 · 2023-06-27T17:25:17Z

Looks like I found something in repository.properties!

httpclient.config.transform.socketTimeout=5000
httpclient.config.transform.connectionRequestTimeout=5000
httpclient.config.transform.connectionTimeout=5000

When I set these values higher, it works as expected. So never mind! ✌️
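
For anyone landing here, a minimal sketch of such an override, assuming the values are in milliseconds (which matches the 5-second cutoff observed above) and that the override goes into alfresco-global.properties. The 120000 figure is only an illustrative choice, not a recommended or tested value; pick something above your slowest expected OCR run.

```properties
# alfresco-global.properties (assumed override location)
# The defaults in repository.properties are 5000, i.e. 5 seconds.
# 120000 (2 minutes) is an illustrative value only.
httpclient.config.transform.socketTimeout=120000
httpclient.config.transform.connectionRequestTimeout=120000
httpclient.config.transform.connectionTimeout=120000
```

If the repository runs in a container, the same keys can presumably be passed as -D system properties (for example -Dhttpclient.config.transform.socketTimeout=120000), in the same way as the settings tried earlier in this thread.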

Jack12816 · 2023-06-27T17:26:12Z

Maybe this is useful to someone else who runs into this issue.

aborroy · 2023-07-10T07:10:23Z

Thanks @Jack12816

Jack12816 closed this as completed Jun 27, 2023
