Creates no new document version on higher execution time #12

Closed
Jack12816 opened this issue Jun 27, 2023 · 3 comments

Comments

@Jack12816

Jack12816 commented Jun 27, 2023

Hey @aborroy - first off: thanks for the OCR transformer, it looks nice and lean!

I'm struggling with a migration from a hand-rolled OCR pipeline on Alfresco 5.0 (CE) to your OCR transformer on Alfresco 7.4 (CE). Integrating it directly as a folder rule would be much simpler. My setup works to the point that I can upload the quick.pdf from this repo and the OCR magic (a new document version) happens as expected. That's great!

Here's my problem: when I upload a real PDF file (426 KB, one page, PDF version 1.4), no new document version is ever created. My first guess was resource limits. I've experimented with file size, and I think it's more related to execution time: a bigger file (508 KB, one page, PDF version 1.4) sometimes gets a new document version, but not always. I'm pretty sure it's not the file size, as the OCR transformer does not configure maxSourceSizeBytes, which defaults to -1 (no limit) according to the docs.

I searched for transformer timeouts and set the following options on the repository:

-Dtransformer.timeout.default=300
-Dtransformserver.transformationTimeout=300
-Dcontent.transformer.default.timeoutMs=300000

but this does not change anything. Unfortunately, I was not able to figure out where the value behind transformOptions.get(TIMEOUT) comes from or how to set it properly.

While digging into this I noticed that when the execution time is under 5 seconds, the new document version is created. I didn't find any defaults for the timeout in the transformOptions.

Maybe you could give me a hint? :)

@Jack12816
Author

Looks like I found something in repository.properties!

httpclient.config.transform.socketTimeout=5000
httpclient.config.transform.connectionRequestTimeout=5000
httpclient.config.transform.connectionTimeout=5000

When I set these values higher, it works as expected. So never mind! ✌️
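
For anyone landing here with the same symptom, a minimal sketch of the override. It assumes the keys are overridden in alfresco-global.properties and that the values are milliseconds, which matches the 5000 default and the 5-second cutoff described above. The 300000 figure is only illustrative, not a recommendation; pick something above your slowest expected OCR run.

```properties
# Assumption: placed in alfresco-global.properties to override the
# repository.properties defaults (5000 each in this thread).
# Values are assumed to be milliseconds; 300000 is an illustrative choice.
httpclient.config.transform.socketTimeout=300000
httpclient.config.transform.connectionRequestTimeout=300000
httpclient.config.transform.connectionTimeout=300000
```

In a Docker-based setup, the same keys can presumably be passed as -D JVM options on the repository, like the other settings tried earlier in this thread. The thread only confirms that raising all three values helped; socketTimeout is probably the one that matters for a slow OCR run, since in Apache HttpClient terms it bounds the wait for response data, while the other two only cover opening a connection and obtaining one from the pool.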

@Jack12816
Author

Maybe this is useful to someone else who runs into this issue.

@aborroy
Owner

aborroy commented Jul 10, 2023

Thanks @Jack12816
