Current situation
Processors iterate over the files in a workspace on their own. While it is possible to restrict processing to a single page or a list/range of pages, the API is geared towards processors determining the pages to process on their own. Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two: when doing page-wise processing with a pageID restriction, the setup in process still happens for every call.
How it should be
The process method should be deprecated and replaced with a process_page method.
Processors should have a setup method that encapsulates all the post-initialization but pre-processing steps necessary for processing.
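To make the proposal concrete, here is a minimal sketch of what the split could look like. It is not the actual OCR-D/core `Processor` class: the names `PagewiseProcessor`, `run`, and `UppercaseProcessor` are hypothetical, and only `setup` and `process_page` come from the proposal above.

```python
from abc import ABC, abstractmethod


class PagewiseProcessor(ABC):
    """Hypothetical base class; NOT the actual OCR-D/core Processor."""

    def __init__(self, parameter=None):
        self.parameter = parameter or {}
        self._is_set_up = False

    def setup(self):
        """Load models or other expensive resources. Override as needed."""

    @abstractmethod
    def process_page(self, page_id):
        """Process exactly one page and return its result."""

    def run(self, page_ids):
        """Hypothetical driver: set up once, then go page by page."""
        if not self._is_set_up:
            self.setup()
            self._is_set_up = True
        return [self.process_page(page_id) for page_id in page_ids]


class UppercaseProcessor(PagewiseProcessor):
    """Toy processor whose 'model' is just a string method."""

    def setup(self):
        # Count invocations so the once-only behaviour can be checked.
        self.setup_calls = getattr(self, "setup_calls", 0) + 1
        self.model = str.upper  # stand-in for an expensive model load

    def process_page(self, page_id):
        return self.model(page_id)


processor = UppercaseProcessor()
results = processor.run(["phys_0001", "phys_0002"])
```

With this shape the iteration over pages lives in the framework rather than in each processor, so a caller can hand over a single page or many without triggering another `setup`.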
Steps
Refactor processor code in OCR-D/core to provide entry points for process_page and setup
Deprecate process
Test
Change all the processors
Communicate change in Tech Call
Reflect changed API in documentation
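One possible way to handle the "Deprecate process" step is a thin shim that warns and delegates to `process_page`. This is only a sketch under assumptions: the class name `LegacyCompatibleProcessor` and its `page_ids` attribute are hypothetical, and the real deprecation path in OCR-D/core may look different.

```python
import warnings


class LegacyCompatibleProcessor:
    """Hypothetical processor that keeps the old entry point alive."""

    def __init__(self, page_ids):
        self.page_ids = list(page_ids)
        self.processed = []

    def process_page(self, page_id):
        # New entry point: handles exactly one page.
        self.processed.append(page_id)

    def process(self):
        # Old entry point: warn, then delegate page by page.
        warnings.warn(
            "process() is deprecated; implement process_page() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        for page_id in self.page_ids:
            self.process_page(page_id)
```

Existing callers of `process` keep working during the transition, while the warning points processor developers at the new method.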
Maybe we could describe in more detail what problem we are trying to solve and what users can expect after the implementation.
Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two..
E.g. why is it useful to make this separation?
PS: I think the purpose behind this feature would normally serve as the epic description (like "reduce processing time by X to meet metric Y"), and one of the actual user stories from that epic would be "as a processor dev, I want to process pages in parallel".
Setup functionality (like loading models or other data into memory) is intertwined with processing, making it difficult to separate the two..
E.g. why is it useful to make this separation?
It improves performance because setting up the processor can be done just once instead of with every call to process.
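A toy illustration of that point, counting how often a stand-in "model load" runs in each style. Everything here (`load_model`, `process_intertwined`, the page IDs) is made up for the example and says nothing about real load times.

```python
def load_model(counter):
    """Stand-in for an expensive model load; only counts invocations."""
    counter["loads"] += 1
    return str.upper


def process_intertwined(page_ids, counter):
    """Old style: setup happens inside every call."""
    model = load_model(counter)
    return [model(page_id) for page_id in page_ids]


pages = ["p1", "p2", "p3"]

# Page-wise calls against the intertwined API: one load per page.
old = {"loads": 0}
for page in pages:
    process_intertwined([page], old)

# Separated API: one load up front, then one cheap call per page.
new = {"loads": 0}
model = load_model(new)
outputs = [model(page) for page in pages]
```

For three pages the intertwined variant loads the model three times and the separated variant once; the gap grows linearly with the number of page-wise calls.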