Pipeline is slower for files which are combined (input files with same prefix) #36

Open
peterdekker opened this issue Jun 12, 2018 · 0 comments
peterdekker commented Jun 12, 2018

When the pipeline runs on input files that share the same prefix, those files are combined into one output file. Now that I am testing on many-core hardware, it has become apparent that combining files makes the pipeline much slower, especially in the OCR step. This is probably because the combined files are processed as a single job, so the work cannot be parallelized across multiple CPU cores.

This is not a problem in itself, but I think users should be told that files with the same prefix will be combined and will take longer to process. Alternatively, users could be given the choice to enable or disable combination. Currently, a small difference in the input (whether or not filenames share a prefix) can cause a large difference in processing time.
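To illustrate why combination limits parallelism, here is a minimal sketch in Python. It is not the pipeline's actual code: the function names (`group_inputs`, `ocr_job`, `run`), the `combine` flag, and the rule that the "prefix" is the part of the filename before the first `-` are all hypothetical assumptions. The point it shows is that the number of jobs caps how many cores can be busy: with combination on, all same-prefix files become one job handled by one worker, while with combination off each file is its own job.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def group_inputs(filenames, combine=True, sep="-"):
    """Split input files into jobs (hypothetical grouping rule).

    combine=True: files sharing a prefix (assumed: text before the first
    `sep` in the filename) form one job, processed sequentially.
    combine=False: every file is a separate job.
    """
    if not combine:
        return [[name] for name in sorted(filenames)]
    groups = defaultdict(list)
    for name in sorted(filenames):
        prefix = Path(name).name.split(sep, 1)[0]
        groups[prefix].append(name)
    return list(groups.values())


def ocr_job(files):
    # Hypothetical stand-in for the OCR step: one worker handles every
    # file in the job, one after another.
    return [f"{name}.ocr" for name in files]


def run(filenames, combine=True, workers=4):
    jobs = group_inputs(filenames, combine=combine)
    # At most len(jobs) workers can ever be busy at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ocr_job, jobs))
```

With `["book-p1.pdf", "book-p2.pdf", "book-p3.pdf", "letter.pdf"]`, combination yields two jobs (so at most two busy workers), while `combine=False` yields four. Threads are used only to keep the sketch simple; a real OCR step that is CPU-bound in Python would need processes or external OCR subprocesses to use several cores.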

EDIT: I saw there is a "Reassemble PDF" option in the web interface, but same-prefix files were still combined when I disabled this option.

proycon removed their assignment Sep 10, 2020