Pagewise Processing #2

krvoigt · 2021-12-07T12:41:37Z

Current situation

Processors iterate over the files in a workspace on their own. While it is possible to restrict processing to a single page or a list/range of pages, the API is designed for processors that derive the pages to process themselves. Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two: for example, when doing pagewise processing with a pageID restriction, the setup in process still happens for every call.

How it should be

The process method should be deprecated and replaced with a process_page method.

Processors should have a setup method that encapsulates all the post-initialization but pre-processing steps necessary for processing.
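A minimal sketch of what this split could look like. This is not the actual OCR-D/core API: the class name `SketchProcessor`, the helper `run_pages`, and the dummy model are assumptions made purely for illustration. Only the method names `setup` and `process_page` come from the proposal above.

```python
from typing import Dict, List


class SketchProcessor:
    """Hypothetical processor; not the actual OCR-D/core base class."""

    def __init__(self) -> None:
        # Cheap initialization only; nothing heavy is loaded here.
        self.model: Dict[str, str] = {}

    def setup(self) -> None:
        # Post-initialization, pre-processing work
        # (a dict stands in for loading a model into memory).
        self.model = {"name": "dummy-model"}

    def process_page(self, page_id: str) -> str:
        # Handles exactly one page; assumes setup() has already run.
        return f"{page_id} processed with {self.model['name']}"


def run_pages(processor: SketchProcessor, page_ids: List[str]) -> List[str]:
    # The caller, not the processor, decides which pages to iterate over.
    processor.setup()
    return [processor.process_page(page_id) for page_id in page_ids]
```

With this shape, the code that drives the processor owns the page loop, so it could in principle hand out pages one at a time or to several workers.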

Steps

Refactor processor code in OCR-D/core to provide entry points for process_page and setup
Deprecate process
Test
Change all the processors
Communicate change in Tech Call
Reflect changed API in documentation
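The "Deprecate process" step could be handled with a compatibility shim along these lines. This is only a sketch under assumptions: `LegacyCompatibleProcessor`, its `page_ids` attribute, and the `is_set_up` flag are hypothetical and do not reflect how OCR-D/core is actually structured.

```python
import warnings
from typing import List


class LegacyCompatibleProcessor:
    """Hypothetical class illustrating a deprecation shim; names are assumptions."""

    def __init__(self, page_ids: List[str]) -> None:
        self.page_ids = page_ids
        self.is_set_up = False

    def setup(self) -> None:
        # Placeholder for one-time work such as loading models.
        self.is_set_up = True

    def process_page(self, page_id: str) -> str:
        # New per-page entry point.
        return f"done:{page_id}"

    def process(self) -> List[str]:
        # Old entry point: warn, then delegate to the new per-page API.
        warnings.warn(
            "process() is deprecated; use setup() and process_page() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if not self.is_set_up:
            self.setup()
        return [self.process_page(page_id) for page_id in self.page_ids]
```

Existing callers of `process` would keep working during the transition while seeing a `DeprecationWarning` that points them to the new methods.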


paulpestov · 2022-01-17T08:47:07Z

Maybe we could describe in more detail what problem we are trying to solve and what users can expect after the implementation.

"Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two.."
E.g. why is it useful to make this separation?

PS: I think the purpose behind this feature would normally serve as the epic description (like "reduce processing time by X to meet metric Y"), and one of the actual user stories from that epic would be "as a processor dev, I want to process pages in parallel".

kba · 2022-01-17T13:49:34Z

"Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two.."
"E.g. why is it useful to make this separation?"

It improves performance because setting up the processor can be done just once instead of with every call to process.
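To make the performance argument concrete, here is a toy comparison that counts how often an expensive load would run in each style. Everything in it (`CountingModelLoader`, `legacy_style`, `pagewise_style`) is a made-up stand-in for illustration, not code from OCR-D/core or any processor.

```python
from typing import List


class CountingModelLoader:
    """Stand-in for an expensive model load; counts how often it runs."""

    def __init__(self) -> None:
        self.loads = 0

    def load(self) -> str:
        self.loads += 1
        return "model"


def legacy_style(loader: CountingModelLoader, page_ids: List[str]) -> int:
    # One call per page, each repeating the setup.
    for _page_id in page_ids:
        _model = loader.load()
    return loader.loads


def pagewise_style(loader: CountingModelLoader, page_ids: List[str]) -> int:
    # Setup happens once; every page reuses the loaded model.
    model = loader.load()
    for page_id in page_ids:
        _ = (model, page_id)  # placeholder for the per-page work
    return loader.loads
```

For three pages, the first function performs three loads and the second performs one; the gap grows with the number of pages and the cost of the setup.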

krvoigt added the Epic label Dec 7, 2021
