Custom OCR #1502

pusapatiakhilraju · 2025-05-01T05:33:59Z

Question

Can I create my own custom OCR class and pass it to ocr_options? Is there any example code that could help me get started?
...

Will this work?

from collections.abc import Iterable
from pathlib import Path

from PIL import Image
from surya.detection import DetectionPredictor
from surya.recognition import RecognitionPredictor

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.models.base_model import BaseEnrichmentModel
from docling.pipeline.standard_pdf_pipeline import StandardPdfPipeline
from docling_core.types.doc import BoundingBox, TextItem


class SuryaOcrModel(BaseEnrichmentModel):
    def __init__(self, enabled: bool = True):
        self.enabled = enabled
        self.recognition_predictor = RecognitionPredictor()
        self.detection_predictor = DetectionPredictor()

    def is_processable(self, doc, element) -> bool:
        return self.enabled and element.type == "page_image"

    def __call__(self, doc, element_batch: Iterable) -> Iterable:
        for element in element_batch:
            image: Image.Image = element.get_image(doc).convert("RGB")

            # Run Surya OCR
            prediction = self.recognition_predictor([image], [None], self.detection_predictor)[0][0]

            for line in prediction.text_lines:
                text = line.text.strip()
                if not text:
                    continue

                l, t, r, b = line.bbox  # Already in LTRB
                bbox = BoundingBox.from_ltrb(l, t, r, b)
                doc.add_item(TextItem(text=text, bbox=bbox, page_no=element.page_no))

            yield element

class SuryaOcrPipeline(StandardPdfPipeline):
    def __init__(self, pipeline_options):
        super().__init__(pipeline_options)
        self.enrichment_pipe = []
        self.enrichment_pipe.append(SuryaOcrModel(enabled=True))

    @classmethod
    def get_default_options(cls):
        return PdfPipelineOptions(
            generate_page_images=True,
            images_scale=2.0,
            do_ocr=True
        )

Converting

input_pdf_path = Path("./img/test.png")
output_dir = Path("parsed-doc-advanced/test")
output_dir.mkdir(parents=True, exist_ok=True)

pipeline_options = SuryaOcrPipeline.get_default_options()

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=SuryaOcrPipeline,
            pipeline_options=pipeline_options
        )
    }
)

result = converter.convert(input_pdf_path)

Is this the right way to use a custom OCR engine? I create a class and pass it as pipeline_cls.
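One detail in the attempt above matters however the engine is wired in: with images_scale=2.0 the page image has twice the resolution of the page, so a box reported in image pixels has to be divided by the scale (and shifted by the crop origin, if only part of the page was OCR'd) before it describes page coordinates. A minimal sketch, assuming Surya reports boxes in the pixel space of the image it was given and that both coordinate systems use a top-left origin; ltrb_pixels_to_page and PageBox are hypothetical names, not docling or Surya APIs.

```python
from __future__ import annotations

from typing import NamedTuple


class PageBox(NamedTuple):
    """Hypothetical box type in page units, top-left origin."""
    l: float
    t: float
    r: float
    b: float


def ltrb_pixels_to_page(
    ltrb: tuple[float, float, float, float],
    scale: float,
    offset: tuple[float, float] = (0.0, 0.0),
) -> PageBox:
    """Map a pixel-space LTRB box from a scaled page image back to page units.

    `scale` is the factor the page was rendered at (e.g. images_scale=2.0);
    `offset` is the top-left corner, in page units, of the OCR'd region.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    l, t, r, b = ltrb
    dx, dy = offset
    return PageBox(l / scale + dx, t / scale + dy, r / scale + dx, b / scale + dy)


# A text line detected at pixels (100, 40)-(300, 60) on a page rendered at 2x.
print(ltrb_pixels_to_page((100, 40, 300, 60), scale=2.0))
# PageBox(l=50.0, t=20.0, r=150.0, b=30.0)
```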


dolfim-ibm · 2025-05-02T06:49:29Z

Note that we won't accept contributions that add dependencies with an incompatible license (Surya is licensed under the GPL).

This is why we have a plugin system that lets users contribute their own third-party integrations. You can read more in the plugin docs: https://docling-project.github.io/docling/concepts/plugins/

Bill-XU · 2025-05-08T09:34:53Z

Hi, @dolfim-ibm

I added the following configuration to the pyproject.toml in the root of my project, "fastapi_test":

[project.entry-points."docling"]
custom_ocr = "fastapi_test.docling_custom"

I also created a Python file, "docling_custom.py", in the project root and defined an ocr_engines function in it as follows:

def ocr_engines():
    return {
        "ocr_engines": [
            CustomOcrModel,
        ]
    }

But when I ran the converter, I got the following errors:

...

KeyError: <class 'docling_custom.CustomOcrOptions'>

During handling of the above exception, another exception occurred:

...

RuntimeError: No class found with the name 'custom_ocr', known classes are:
	'easyocr' => <class 'docling.models.easyocr_model.EasyOcrModel'>
	'ocrmac' => <class 'docling.models.ocr_mac_model.OcrMacModel'>
	'rapidocr' => <class 'docling.models.rapid_ocr_model.RapidOcrModel'>
	'tesserocr' => <class 'docling.models.tesseract_ocr_model.TesseractOcrModel'>
	'tesseract' => <class 'docling.models.tesseract_ocr_cli_model.TesseractOcrCliModel'>

I have set the ocr_options of pipeline_options to my custom OCR options, whose kind is "custom_ocr".
It seems that the OCR factory does not know about custom_ocr. How can I fix this?

Best regards,
Bill
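The two chained errors above read like a two-stage lookup: first by the options class (KeyError: <class 'docling_custom.CustomOcrOptions'>), then by the kind string (the RuntimeError listing the known engines). The sketch below is a deliberately simplified, hypothetical factory, not docling's implementation, that reproduces the same pattern: if the plugin's model class is never registered, neither lookup can succeed, however kind is set on the options.

```python
class ToyOcrFactory:
    """Hypothetical two-stage registry; NOT docling's real factory."""

    def __init__(self) -> None:
        self._by_options_type = {}  # options class -> model class
        self._by_kind = {}          # options.kind string -> model class

    def register(self, model_cls: type) -> None:
        # Each model advertises its options class; that class carries `kind`.
        options_type = model_cls.get_options_type()
        self._by_options_type[options_type] = model_cls
        self._by_kind[options_type.kind] = model_cls

    def get_model_class(self, options: object) -> type:
        try:
            # Stage 1: exact options class (KeyError if it was never registered).
            return self._by_options_type[type(options)]
        except KeyError:
            # Stage 2: fall back to the `kind` name. Raising inside `except`
            # chains onto the KeyError, as in the traceback above.
            kind = getattr(options, "kind", None)
            if kind in self._by_kind:
                return self._by_kind[kind]
            known = ", ".join(sorted(self._by_kind))
            raise RuntimeError(
                f"No class found with the name {kind!r}, known classes are: {known}"
            )


# Hypothetical stand-ins for a built-in engine and the custom options.
class StubEasyOcrOptions:
    kind = "easyocr"


class StubEasyOcrModel:
    @classmethod
    def get_options_type(cls) -> type:
        return StubEasyOcrOptions


class CustomOcrOptions:
    kind = "custom_ocr"


factory = ToyOcrFactory()
factory.register(StubEasyOcrModel)  # the custom model was never registered
try:
    factory.get_model_class(CustomOcrOptions())
except RuntimeError as err:
    print(err)  # No class found with the name 'custom_ocr', known classes are: easyocr
```

Under this reading, the fix lies not in the options object but in getting the plugin module loaded so that its model class is registered at all.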

Bill-XU · 2025-05-08T17:07:21Z

@dolfim-ibm @pusapatiakhilraju
Okay, I figured it out myself. Here is how I did it, for future reference.

1. To create a plugin and use it in your own project, you first need to create a separate project for the plugin. In my case, I created a new project called "custom_docling".
2. Next, prepare the folder structure of this new project so that it can be installed later. In my case, the project looks like this:

custom_docling
├───src
│   └───custom_docling
│       ├───plugins
│       │   ├───__init__.py
│       │   └───custom_ocr.py
│       └───__init__.py
└───pyproject.toml
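To reproduce this layout quickly, the skeleton can be created with a few lines of pathlib. This is only a convenience sketch: scaffold_plugin_project is a hypothetical helper that creates empty files, and the package name custom_docling is taken from this comment.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def scaffold_plugin_project(root: Path, package: str = "custom_docling") -> list[Path]:
    """Create the empty src-layout skeleton shown above and return the files."""
    pkg_dir = root / "src" / package
    files = [
        root / "pyproject.toml",
        pkg_dir / "__init__.py",
        pkg_dir / "plugins" / "__init__.py",
        pkg_dir / "plugins" / "custom_ocr.py",
    ]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # touch() never truncates an existing file
    return files


if __name__ == "__main__":
    # Demo in a throwaway directory; pass a real path to keep the files.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_plugin_project(Path(tmp) / "custom_docling"):
            print(path.relative_to(tmp))
```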

3. Choose a packaging tool and fill in the project's packaging information. I chose setuptools (following docling's plugin guide: https://docling-project.github.io/docling/concepts/plugins/), so my pyproject.toml looks like this:

[build-system]
requires = ["setuptools >= 65.0.0"]
build-backend = "setuptools.build_meta"

[project]
name = "custom_docling"
version = "0.0.1"
dependencies = [
  "docling>=2.30.0",
  "openai>=1.65.0",
]

[project.entry-points."docling"]
custom_ocr = "custom_docling.plugins.custom_ocr"

Note: the entry-point path should not start with "src". With the src layout, the package is imported as custom_docling, so "src" is not part of the module path.
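The entry-point value is a dotted import path resolved against the installed package, not a filesystem path relative to the project root. A small sketch for sanity-checking that mapping in a src layout before running pip install; entry_point_module_file is a hypothetical helper that only inspects files on disk and imports nothing.

```python
from __future__ import annotations

from pathlib import Path


def entry_point_module_file(value: str, src_dir: Path) -> Path:
    """Return the file a src-layout entry point such as
    "custom_docling.plugins.custom_ocr" should resolve to.

    An optional ":attribute" suffix is ignored. Raises ValueError for a
    "src."-prefixed path and FileNotFoundError if nothing matches on disk.
    """
    module = value.split(":", 1)[0].strip()
    if module == "src" or module.startswith("src."):
        raise ValueError(f"entry point {value!r} starts with 'src'; use the import name")
    parts = module.split(".")
    module_file = src_dir.joinpath(*parts[:-1], parts[-1] + ".py")
    package_init = src_dir.joinpath(*parts, "__init__.py")
    for candidate in (module_file, package_init):
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no module or package for {module!r} under {src_dir}")
```

Run from the plugin project's root, entry_point_module_file("custom_docling.plugins.custom_ocr", Path("src")) should return src/custom_docling/plugins/custom_ocr.py for the layout above.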
4. With all of the above in place, write the plugin itself, and don't forget to add an "ocr_engines" function to the plugin module. In my case, I added it to custom_ocr.py:

def ocr_engines():
    return {
        "ocr_engines": [
            CustomOcrModel,
        ]
    }
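According to this comment, docling picks up the model through this function, so it is worth checking that the module imports cleanly and that ocr_engines() returns the shape shown above. A sketch of such a check; check_ocr_plugin_module is a hypothetical helper that validates only the dictionary shape used in this thread and cannot tell whether the listed classes are valid docling models.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def check_ocr_plugin_module(module: ModuleType | str) -> list[type]:
    """Return the model classes advertised by a plugin module's ocr_engines().

    Accepts a module object or an importable dotted name.
    """
    if isinstance(module, str):
        module = importlib.import_module(module)
    hook = getattr(module, "ocr_engines", None)
    if not callable(hook):
        raise TypeError(f"{module.__name__} has no callable ocr_engines()")
    result = hook()
    engines = result.get("ocr_engines") if isinstance(result, dict) else None
    if not isinstance(engines, list) or not all(isinstance(c, type) for c in engines):
        raise TypeError('ocr_engines() must return {"ocr_engines": [ModelClass, ...]}')
    return engines


# Usage with an in-memory stand-in module (hypothetical names).
demo_plugin = ModuleType("demo_plugin")


class DemoOcrModel:
    pass


demo_plugin.ocr_engines = lambda: {"ocr_engines": [DemoOcrModel]}
print(check_ocr_plugin_module(demo_plugin))  # a one-element list holding DemoOcrModel
```

In the plugin project itself, check_ocr_plugin_module("custom_docling.plugins.custom_ocr") would import the real module by its dotted name.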

5. When writing the options class for the model, make sure "kind" is set correctly. In my case, I defined the options like this:

from typing import ClassVar

from docling.datamodel.pipeline_options import OcrOptions


class CustomOcrOptions(OcrOptions):
    kind: ClassVar[str] = "custom_ocr"
    ...

Some notes on implementing an OCR plugin:

- The options class should inherit from OcrOptions (from docling.datamodel.pipeline_options).
- The options class must declare a class variable "kind" and set it to the same value as configured in pyproject.toml.
- The model class should inherit from BaseOcrModel (from docling.models.base_ocr_model).
- The model class should implement the following methods:
  - __init__
  - __call__
  - get_options_type, a classmethod that returns the options class.
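The conventions in the list above can be checked structurally before wiring the plugin into docling. This sketch is hypothetical: check_ocr_model_class only looks for a __call__ method, a get_options_type classmethod, and a matching kind. It does not import docling, so it cannot verify the OcrOptions/BaseOcrModel base classes, and the DemoModel/DemoOptions stand-ins (including the __call__(conv_res, page_batch) signature) are assumptions for illustration.

```python
from __future__ import annotations

import inspect
from typing import ClassVar


def check_ocr_model_class(model_cls: type, expected_kind: str) -> type:
    """Structurally check an OCR model class and return its options class."""
    # __call__ must be defined by the class itself or one of its bases.
    if not any("__call__" in vars(klass) for klass in model_cls.__mro__):
        raise TypeError(f"{model_cls.__name__} must implement __call__")
    # getattr_static returns the raw descriptor, so a classmethod stays visible.
    raw = inspect.getattr_static(model_cls, "get_options_type", None)
    if not isinstance(raw, classmethod):
        raise TypeError(f"{model_cls.__name__}.get_options_type must be a classmethod")
    options_cls = model_cls.get_options_type()
    if not isinstance(options_cls, type):
        raise TypeError("get_options_type() must return the options class itself")
    kind = getattr(options_cls, "kind", None)
    if kind != expected_kind:
        raise ValueError(f"{options_cls.__name__}.kind is {kind!r}, expected {expected_kind!r}")
    return options_cls


# Hypothetical stand-ins for classes derived from OcrOptions / BaseOcrModel.
class DemoOptions:
    kind: ClassVar[str] = "custom_ocr"


class DemoModel:
    def __init__(self, *args, **kwargs) -> None:
        self.args, self.kwargs = args, kwargs

    def __call__(self, conv_res, page_batch):  # assumed signature
        yield from page_batch

    @classmethod
    def get_options_type(cls) -> type:
        return DemoOptions


print(check_ocr_model_class(DemoModel, "custom_ocr").__name__)  # DemoOptions
```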

Once everything is in place, run pip install -e xxx to install the plugin into the main project's environment.

That's all.
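After pip install -e, one quick way to confirm that the entry point is visible in the environment is to list the "docling" entry-point group with the standard library's importlib.metadata. This sketch does not use docling's own plugin loader; docling_entry_points is a hypothetical helper, and the check for "custom_ocr" assumes the entry-point name from the pyproject.toml above.

```python
from __future__ import annotations

import sys
from importlib.metadata import entry_points


def docling_entry_points() -> dict[str, str]:
    """Return {entry-point name: "module.path"} for the "docling" group."""
    if sys.version_info >= (3, 10):
        group = entry_points(group="docling")
    else:  # Python 3.8/3.9 return a dict of group name -> entry points
        group = entry_points().get("docling", ())
    return {ep.name: ep.value for ep in group}


if __name__ == "__main__":
    plugins = docling_entry_points()
    for name, value in sorted(plugins.items()):
        print(f"{name} = {value}")
    if "custom_ocr" not in plugins:
        print("custom_ocr is not registered; is the plugin installed in this environment?")
```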

pusapatiakhilraju · 2025-05-09T06:27:02Z

Thank you.

cau-git · 2025-05-23T08:43:25Z

@Bill-XU Thanks for outlining your findings and solution. I will close this issue as resolved.
@pusapatiakhilraju Feel free to re-open if you have further questions or feedback.

pusapatiakhilraju added the question label May 1, 2025

cau-git closed this as completed May 23, 2025
