Description
Question
I am experiencing an issue with the processing time of OCR using Docling. I would like to better understand under what conditions OCR is activated and how I can significantly reduce the execution time.
Details of the Problem:
- Code Executed:
time docling ./test01.pdf --pdf-backend dlparse_v4 --no-enrich-formula --force-ocr --no-enrich-code --ocr --ocr-engine easyocr --ocr-lang en --device auto --num-threads 8 --to md
- Execution Time:
real 49m43.488s
user 71m3.873s
sys 40m28.598s
- Test File:
https://drive.google.com/file/d/1gX3say7OZ7danpJq9jAGIudf4zoFwYSN/view?usp=sharing
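To put the reported timing in perspective, here is a small sketch that converts the three `time` fields to seconds and derives two rough indicators: the average number of busy cores and the share of CPU time spent in the kernel. The helper name `parse_time_field` is made up for this illustration; only the three timing values come from the run above.

```python
import re


def parse_time_field(value):
    """Convert a shell `time` field such as '49m43.488s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time field: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the reported run.
real = parse_time_field("49m43.488s")
user = parse_time_field("71m3.873s")
sys_time = parse_time_field("40m28.598s")

# CPU time divided by wall-clock time approximates how many cores were busy.
avg_busy_cores = (user + sys_time) / real
# Fraction of the CPU time that was spent in kernel (sys) rather than user code.
sys_share = sys_time / (user + sys_time)

print(f"wall clock: {real:.1f} s")
print(f"average busy cores: {avg_busy_cores:.2f}")
print(f"kernel share of CPU time: {sys_share:.0%}")
```

By this arithmetic the run kept only about 2.2 cores busy on average despite `--num-threads 8`, and roughly 36% of the CPU time was kernel time. That may point to thread contention or memory pressure rather than pure OCR compute, though I have not confirmed the cause.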
Questions and Requests:
- OCR Activation Conditions: How can I determine when Docling will activate OCR? Are there specific settings that influence this decision?
- Average Execution Time: What is the expected average execution time for OCR in Docling? The time obtained in my test seems excessive.
- Processing Time Optimization: Are there ways to significantly reduce the processing time on CPU?
  - Parameter Optimization: Are there specific parameters that can be adjusted to improve performance?
  - System Configuration Improvement: Are there hardware or configuration changes that can improve OCR performance?
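One way to narrow down the parameter question would be to time the same command with and without `--force-ocr`, on the assumption (not verified here) that this flag makes Docling OCR every page in full even when the PDF already has a text layer. The sketch below is a hypothetical harness: `build_docling_command` and `time_command` are illustrative names, and the only CLI flags used are the ones from the command in this report.

```python
import subprocess
import time


def build_docling_command(pdf_path, force_ocr, num_threads=8):
    """Assemble the docling CLI call from this report, with --force-ocr optional."""
    cmd = [
        "docling", pdf_path,
        "--pdf-backend", "dlparse_v4",
        "--no-enrich-formula", "--no-enrich-code",
        "--ocr", "--ocr-engine", "easyocr", "--ocr-lang", "en",
        "--device", "auto",
        "--num-threads", str(num_threads),
        "--to", "md",
    ]
    if force_ocr:
        cmd.append("--force-ocr")
    return cmd


def time_command(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Example (requires docling to be installed; not executed here):
# for force in (True, False):
#     seconds = time_command(build_docling_command("./test01.pdf", force))
#     print(f"force_ocr={force}: {seconds:.1f} s")
```

Running both variants on a shorter excerpt of the test file first would keep the comparison quick. The same harness could be reused to vary `num_threads`.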
Version of Docling:
docling 2.28.0
docling-core 2.23.2
docling-ibm-models 3.4.1
docling-parse 4.0.0