workflow not finishing #53

Closed
github-cli opened this issue Mar 14, 2021 · 7 comments

Comments

@github-cli

It seems this workflow does not finish on larger, higher-resolution files. If I manually run ocrmypdf --redo-ocr input.pdf output.pdf, it works fine, but running "sudo -u www-data php cron.php" only updates the smaller files. The job does seem to start, since cron takes quite some time when new large scans have been added, but those files are never updated.
Is there any way to debug this? Isn't this the exact same command the workflow uses?

@R0Wi
Contributor

R0Wi commented Mar 14, 2021

Hi @github-cli, yes basically this command is issued. To be precise it's ocrmypdf --redo-ocr -q - - | cat and then the output stream is captured.

The first thing I'd do is set a more verbose loglevel in your NC config and then paste the results here, if possible.
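For illustration, the streaming invocation described above can be sketched roughly as follows. This is a minimal Python sketch, not the app's actual PHP code: the helper name run_streaming is hypothetical, and the trailing "| cat" is left out because the subprocess module captures the output stream directly.

```python
import subprocess
from typing import Sequence, Tuple

# Assumption: same arguments as the command quoted in this comment.
# "-" "-" tells ocrmypdf to read the PDF from stdin and write it to stdout.
OCR_COMMAND = ["ocrmypdf", "--redo-ocr", "-q", "-", "-"]


def run_streaming(pdf_bytes: bytes,
                  command: Sequence[str] = OCR_COMMAND) -> Tuple[int, bytes, str]:
    """Feed pdf_bytes to the command's stdin and capture both output streams.

    Returns (exit_code, stdout_bytes, stderr_text). Hypothetical helper for
    illustration only.
    """
    proc = subprocess.run(
        list(command),
        input=pdf_bytes,       # PDF content goes to stdin
        capture_output=True,   # collect stdout (the new PDF) and stderr (messages)
        check=False,           # inspect the exit code ourselves
    )
    return proc.returncode, proc.stdout, proc.stderr.decode("utf-8", errors="replace")
```

Running this by hand against a problematic scan shows the exit code and the messages separately, which makes it easier to compare with what ends up in the Nextcloud log.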

@github-cli
Author

This is the relevant output. As for the error messages in lines 9 and 10, the same ones appear if I run ocrmypdf manually, but it still finishes and creates the file correctly.

{"reqId":"S1NK4jBJTnlJOXW26kaQ","level":4,"time":"2021-03-15T07:20:08+01:00","remoteAddr":"10.0.0.111","user":"user","app":"no app in context","method":"GET","url":"/ocs/v2.php/apps/files/api/v1/directEditing?format=json","message":{"Exception":"Error","Message":"Call to undefined method OC\\AllConfig::isGlobalScaleEnabled()","Code":0,"Trace":[{"file":"/var/www/nextcloud/apps/richdocuments/lib/AppInfo/Application.php","line":154,"function":"updateCSP","class":"OCA\\Richdocuments\\AppInfo\\Application","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Bootstrap/FunctionInjector.php","line":68,"function":"OCA\\Richdocuments\\AppInfo\\{closure}","class":"OCA\\Richdocuments\\AppInfo\\Application","type":"->","args":["*** sensitive parameters replaced ***"]},{"file":"/var/www/nextcloud/lib/private/AppFramework/Bootstrap/BootContext.php","line":52,"function":"injectFn","class":"OC\\AppFramework\\Bootstrap\\FunctionInjector","type":"->"},{"file":"/var/www/nextcloud/apps/richdocuments/lib/AppInfo/Application.php","line":156,"function":"injectFn","class":"OC\\AppFramework\\Bootstrap\\BootContext","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Bootstrap/Coordinator.php","line":176,"function":"boot","class":"OCA\\Richdocuments\\AppInfo\\Application","type":"->"},{"file":"/var/www/nextcloud/lib/private/legacy/OC_App.php","line":197,"function":"bootApp","class":"OC\\AppFramework\\Bootstrap\\Coordinator","type":"->"},{"file":"/var/www/nextcloud/lib/private/legacy/OC_App.php","line":137,"function":"loadApp","class":"OC_App","type":"::"},{"file":"/var/www/nextcloud/apps/dav/lib/AppInfo/Application.php","line":124,"function":"loadApps","class":"OC_App","type":"::"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Bootstrap/Coordinator.php","line":176,"function":"boot","class":"OCA\\DAV\\AppInfo\\Application","type":"->"},{"file":"/var/www/nextcloud/lib/private/legacy/OC_App.php","line":197,"function":"bootApp","class":"OC\\AppFramework\\Bootstrap\\Coordinator","type":"->"},{"file":"/var/www/nextcloud/lib/private/legacy/OC_App.php","line":137,"function":"loadApp","class":"OC_App","type":"::"},{"file":"/var/www/nextcloud/ocs/v1.php","line":57,"function":"loadApps","class":"OC_App","type":"::"},{"file":"/var/www/nextcloud/ocs/v2.php","line":24,"args":["/var/www/nextcloud/ocs/v1.php"],"function":"require_once"}],"File":"/var/www/nextcloud/apps/richdocuments/lib/AppInfo/Application.php","Line":225,"CustomMessage":"Could not boot richdocumentsCall to undefined method OC\\AllConfig::isGlobalScaleEnabled()"},"userAgent":"Mozilla/5.0 (Windows) mirall/3.1.3stable-Win64 (build 20210218) (Nextcloud)","version":"21.0.0.18"}
{"reqId":"SoUwYmU8rRPwg8YaAFYR","level":1,"time":"2021-03-15T07:23:33+01:00","remoteAddr":"10.0.0.111","user":"user","app":"workflowengine","method":"PUT","url":"/remote.php/dav/files/user/+NextCloud/+Scans/user/ScanFromPC_HQ_2021-03-09_131313.pdf","message":"Flow handling done for event \\OCP\\Files::postCreate","userAgent":"Mozilla/5.0 (Windows) mirall/3.1.3stable-Win64 (build 20210218) (Nextcloud)","version":"21.0.0.18"}
{"reqId":"SoUwYmU8rRPwg8YaAFYR","level":1,"time":"2021-03-15T07:23:33+01:00","remoteAddr":"10.0.0.111","user":"user","app":"workflowengine","method":"PUT","url":"/remote.php/dav/files/user/+NextCloud/+Scans/user/ScanFromPC_HQ_2021-03-09_131313.pdf","message":"Flow handling done for event \\OCP\\Files::postWrite","userAgent":"Mozilla/5.0 (Windows) mirall/3.1.3stable-Win64 (build 20210218) (Nextcloud)","version":"21.0.0.18"}
{"reqId":"GZQmPqLDc56rWRWcqdsL","level":1,"time":"2021-03-15T07:23:34+01:00","remoteAddr":"10.0.0.111","user":"user","app":"workflowengine","method":"PUT","url":"/remote.php/dav/files/user/+NextCloud/+Scans/user/2021-03-06_150232_somepdf_HQ_.pdf","message":"Flow handling done for event \\OCP\\Files::postCreate","userAgent":"Mozilla/5.0 (Windows) mirall/3.1.3stable-Win64 (build 20210218) (Nextcloud)","version":"21.0.0.18"}
{"reqId":"GZQmPqLDc56rWRWcqdsL","level":1,"time":"2021-03-15T07:23:34+01:00","remoteAddr":"10.0.0.111","user":"user","app":"workflowengine","method":"PUT","url":"/remote.php/dav/files/user/+NextCloud/+Scans/user/2021-03-06_150232_somepdf_HQ_.pdf","message":"Flow handling done for event \\OCP\\Files::postWrite","userAgent":"Mozilla/5.0 (Windows) mirall/3.1.3stable-Win64 (build 20210218) (Nextcloud)","version":"21.0.0.18"}
{"reqId":"BDLkqeUZmxqshGhBlDKZ","level":1,"time":"2021-03-15T07:23:34+01:00","remoteAddr":"10.0.0.111","user":"user","app":"workflowengine","method":"PUT","url":"/remote.php/dav/files/user/+NextCloud/+Scans/user/ScanFromPC_HQ_2021-03-06_154928.pdf","message":"Flow handling done for event \\OCP\\Files::postCreate","userAgent":"Mozilla/5.0 (Windows) mirall/3.1.3stable-Win64 (build 20210218) (Nextcloud)","version":"21.0.0.18"}
{"reqId":"BDLkqeUZmxqshGhBlDKZ","level":1,"time":"2021-03-15T07:23:35+01:00","remoteAddr":"10.0.0.111","user":"user","app":"workflowengine","method":"PUT","url":"/remote.php/dav/files/user/+NextCloud/+Scans/user/ScanFromPC_HQ_2021-03-06_154928.pdf","message":"Flow handling done for event \\OCP\\Files::postWrite","userAgent":"Mozilla/5.0 (Windows) mirall/3.1.3stable-Win64 (build 20210218) (Nextcloud)","version":"21.0.0.18"}
{"reqId":"R5jlrbUhHQAQ4prqTDLm","level":1,"time":"2021-03-15T07:23:37+01:00","remoteAddr":"","user":"--","app":"passwords","method":"","url":"--","message":"Passwords runs cron.php in global mode","userAgent":"--","version":"21.0.0.18"}
{"reqId":"R5jlrbUhHQAQ4prqTDLm","level":1,"time":"2021-03-15T07:24:05+01:00","remoteAddr":"","user":"user","app":"workflow_ocr","method":"","url":"--","message":"OCR for file /user/files/+NextCloud/+Scans/user/ScanFromPC_HQ_2021-03-09_131313.pdf not possible. Message: OCRmyPDF exited abnormally with exit-code 0. Message: 2 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 1 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 3 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 4 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 5 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.","userAgent":"--","version":"21.0.0.18"}
{"reqId":"R5jlrbUhHQAQ4prqTDLm","level":1,"time":"2021-03-15T07:24:48+01:00","remoteAddr":"","user":"user","app":"workflow_ocr","method":"","url":"--","message":"OCR for file /user/files/+NextCloud/+Scans/user/2021-03-06_150232_somepdf_HQ_.pdf not possible. Message: OCRmyPDF exited abnormally with exit-code 0. Message: 2 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 1 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 3 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 4 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 5 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 6 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 7 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n\n 8 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.","userAgent":"--","version":"21.0.0.18"}
{"reqId":"R5jlrbUhHQAQ4prqTDLm","level":1,"time":"2021-03-15T07:26:16+01:00","remoteAddr":"","user":"user","app":"workflowengine","method":"","url":"--","message":"Flow handling done for event \\OCP\\Files::postWrite","userAgent":"--","version":"21.0.0.18"}
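
Most entries in an excerpt like this are unrelated to OCR, so a small filter can help when hunting for the relevant ones. The following is a minimal Python sketch, assuming the Nextcloud log file holds one JSON object per line; the function name ocr_log_entries is hypothetical, and the "app" field value "workflow_ocr" is taken from the entries shown here.

```python
import json
from typing import Iterable, Iterator


def ocr_log_entries(lines: Iterable[str], app: str = "workflow_ocr") -> Iterator[dict]:
    """Yield parsed log entries whose "app" field matches the given app name.

    Hypothetical helper: blank lines and lines that are not valid JSON
    objects are skipped instead of raising an error.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated or wrapped line
        if isinstance(entry, dict) and entry.get("app") == app:
            yield entry
```

Passing an open log file to this function (the file location depends on the installation) and printing each entry's "message" field gives just the OCR failures and their OCRmyPDF messages.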

@R0Wi
Contributor

R0Wi commented Mar 15, 2021

Thanks for this. The relevant line is {"reqId":"R5jlrbUhHQAQ4prqTDLm","level":1,"time":"2021-03-15T07:24:48+01:00","remoteAddr":"","user":"user","app":"workflow_ocr","method":"","url":"--","message":"OCR for file /user/files/+NextCloud/+Scans/user/2021-03-06_150232_somepdf_HQ_.pdf not possible. Message: OCRmyPDF exited abnormally with exit-code 0. Message: 2 **** Error: stream operator isn't terminated by valid EOL.\n Output may be incorrect.\n [...] As you can see, OCRmyPDF is complaining about the file and the OCR process itself. As you said, the default behaviour of the command-line tool is to raise a warning but write the file nevertheless. In our case, we decided not to store any PDF file that might have been corrupted by the OCR process and to keep the original file instead.

If possible, please attach the PDF file in question here. Maybe @bahnwaerter could have a look at it and say what's wrong.
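
The decision described above, keeping the original file whenever OCRmyPDF reports problems, can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the app itself is written in PHP, the function name accept_ocr_result is hypothetical, and the rule that any error output counts as a failure is inferred from the log message "exited abnormally with exit-code 0".

```python
from typing import Optional


def accept_ocr_result(exit_code: int, stdout: bytes, stderr: str) -> Optional[bytes]:
    """Return the OCRed PDF bytes, or None if the original file should be kept.

    Assumed policy (not confirmed against the app's source): the result is
    rejected when the exit code is non-zero, when anything was written to
    the error stream, or when no output was produced at all.
    """
    if exit_code != 0:
        return None  # the tool itself reported a failure
    if stderr.strip():
        return None  # warnings present: the output may be corrupted
    if not stdout:
        return None  # nothing was produced
    return stdout
```

Under this policy, the run from the log (exit code 0, but with "stream operator isn't terminated by valid EOL" messages) yields None, so the uploaded file stays unchanged even though the command-line tool would have written an output file.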

@github-cli
Author

I have an example I can send. It's not exactly confidential, but can I still share it in private?
I scanned the exact same document at 100 ppi, which works fine, and again at 200 ppi, which doesn't work...

@R0Wi
Contributor

R0Wi commented Mar 15, 2021

Ok then it would be nice if you could send me an email with both files attached.

@bahnwaerter FYI

@bahnwaerter
Collaborator

Hey @github-cli, thanks for sharing your original PDF files with @R0Wi and me.

I've taken a look at those PDF files and analyzed them. The PDF file of the low quality scan is compliant with the PDF 1.7 standard, whereas the PDF file of the high quality scan is not syntactically well-formed. Therefore, the PDF file of the high quality scan does not conform to any of the available PDF standards. Furthermore, I noticed that both PDF files were created by the HP scan tool. This scan tool seems to create faulty PDF files as the analysis of issue #42 shows.

To solve this issue, you can repair your faulty PDF files before uploading them to your Nextcloud server. Please follow the solution described in #42 (comment).

I will close this issue as it is related to the HP scan tool. But feel free to reopen it, if we can help somehow.

@bahnwaerter
Collaborator

Duplicate of #42

@bahnwaerter bahnwaerter marked this as a duplicate of #42 Mar 15, 2021