I wanted to start a fresh discussion around the two PRs I made because I think it could help a lot of people using Docling in the most common way: take messy documents, convert them into text/structure, and feed that into AI systems.
Docling does that now - but it can take a long time. What if it yielded results as it goes along? Not only that, but you'd have a consistent client in 12+ languages. Finally, your documents would serialize with the same API years later.
Right now, a lot of Docling usage ends up looking like this:
send in a PDF or Office document;
get back Markdown, JSON, chunks, tables, pictures, metadata, etc.;
pass that output into another service for embedding, indexing, RAG, agents, search, storage, evaluation, or enrichment.
Perfect. That's why I love it.
But once Docling becomes part of a larger pipeline, the boundaries start to matter a lot. If every service is passing around large JSON payloads, every client has to know the shape of the response, every integration has to keep its own assumptions straight, and every non-Python consumer has to rebuild the contract by hand.
This PR is meant to make that boundary much stronger and faster.
tl;dr Summary - and there's a repo to try it out in multiple languages
gRPC - what is it?
I've introduced a non-invasive gRPC server. gRPC was already popular, and it is increasingly used as an API layer that handles streaming far better than REST because it supports true streaming calls.
A large number of servers are introducing gRPC endpoints - including a data loading API by OpenSearch. As a follow-up, that would be pretty easy to tack on to make your pipeline faster and cheaper.
Strongly typed responses
This gives Docling a strongly typed transfer format and gRPC service surface, while keeping the existing Python document model as the source of truth. If you just want to convert docs to text for AI, the benefit is not "protobuf for protobuf's sake". The benefit is that Docling output can move through production systems with clearer contracts, less glue code, and a better path toward streaming and high-throughput use cases. Not only that, you get generated client APIs in 12 languages - no more hand-parsing JSON and dealing with JSON library annoyances.
Client/server problems of pipelines
Here is the pain point in pipeline form. The document conversion step can be expensive for long PDFs, and once the output is JSON, each downstream hop tends to parse, inspect, reshape, and serialize it again:
flowchart LR
A[Long PDF] --> B[Docling conversion]
B --> C[Large JSON response]
C --> D[Parse in app]
D --> E[Chunk / enrich]
E --> F[Embed / index]
F --> G[RAG or agent]
B -. minutes for large docs .-> C
C -. repeated JSON encode/decode .-> F
The gRPC/protobuf path is meant to make that boundary cheaper and cleaner. Docling still does the hard conversion work, but the result can move through the rest of the pipeline as generated, typed messages instead of a hand-maintained JSON contract:
flowchart LR
A[PDF or Office doc] --> B[Docling gRPC service]
B --> C[Typed DoclingDocument protobuf]
C --> D[Streaming status / future partial results]
C --> E[Chunking service]
E --> F[Embedding / indexing]
F --> G[AI application]
C -. generated clients .-> H[Python / Go / Java / TS / Rust]
What you get for free
You don't wait minutes to see data
As a simple improvement from here, you could get each page live as it's parsed. Instead of waiting on 100 pages, hitting a timeout, and fiddling with timeout settings, you would get the output page by page without polling.
You get the security features, such as mTLS, that are built into the gRPC server
The gRPC server implementations for each language are fast as hell.
12 supported languages OOTB - and a governed schema that you can register and evolve
The Docling format is amazing - it's here to stay. The protobuf schema has 1:1 parity with its data representation, which makes it easy for both LLMs and coders to navigate.
Try it
If you have a document pipeline, I would really appreciate people trying this with one real document and telling me what feels wrong. Not a polished benchmark, just:
pick a PDF or Office document you already use with Docling;
run it through the gRPC path;
look at the generated client objects / protobuf output;
tell me where the contract feels awkward, missing, too strict, or too loose.
If you do try it, the most useful feedback would be concrete: language used, document type, whether the generated client felt natural, and what you would need before trusting this in a real pipeline.
I would really like feedback, questions, objections, or even "this does not solve my use case because..." comments. The point of this discussion is to figure out whether this is the right shape and, hopefully, to reach consensus on getting this feature in.
Why I think this matters:
I process a lot of text for a living. I contribute to tokenizers on Apache, help speed up search engines, and deal with garbage data all the time. I feel like Docling needs something like this to take its capabilities to the next level while still maintaining its current JSON posture.
For AI pipelines, Docling output often needs to move between multiple services, not just back to one Python caller.
A typed schema lets clients generate code instead of guessing what a JSON object contains.
gRPC avoids a lot of repeated JSON encode/decode overhead for systems that process many documents.
Strong types make it easier to catch integration mistakes early, especially in non-Python services.
This keeps the existing Docling document model in charge and adds conversion/validation around it instead of making the core model care about protobuf.
A few things I especially want feedback on:
Strong typing vs. flexibility
I tried to avoid falling back to JSON blobs except where the underlying model really is open-ended. Enums, oneofs, structured messages, typed lists, and typed references are used where possible. For places where Docling may add future enum values, the schema uses typed enum fields with *_raw companions so clients get strong typing today without breaking on newer values tomorrow.
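To make the enum-plus-raw-companion idea concrete, here is a minimal Python sketch of how an older client might consume such a field. It is not the generated code from the PR: `LabelKind`, `Item`, `label_raw`, and `decode_label` are hypothetical names, and it assumes the `*_raw` companion carries the original string value sent by the server.

```python
from dataclasses import dataclass
from enum import Enum


class LabelKind(Enum):
    """Hypothetical enum values known to this (older) client."""
    UNSPECIFIED = 0
    TEXT = 1
    TABLE = 2


@dataclass
class Item:
    """Stand-in for a generated message with an enum field and a *_raw companion."""
    label: LabelKind
    label_raw: str  # assumed to hold the original string value from the server


def decode_label(raw: str) -> Item:
    """Map a raw label to the typed enum, falling back to UNSPECIFIED if unknown."""
    try:
        kind = LabelKind[raw.upper()]
    except KeyError:
        kind = LabelKind.UNSPECIFIED
    return Item(label=kind, label_raw=raw)
```

A client built before a new label existed still gets a usable object: `decode_label("chart")` yields `LabelKind.UNSPECIFIED` for the typed field while `label_raw` keeps `"chart"`, so nothing is lost and nothing crashes.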
Schema validation
The conversion code includes validation intended to catch drift between the Pydantic model and the protobuf schema. The goal is that if DoclingDocument grows new fields or changes shape, the mismatch is visible instead of silently losing information.
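As a rough illustration of what a drift check can look like, the sketch below compares the field names of a model class against the field names a schema declares. It is an assumption-laden stand-in, not the validation code in the PR: it uses a stdlib dataclass in place of the Pydantic model, and `DocModel`, `PROTO_FIELDS`, and `find_drift` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class DocModel:
    """Stand-in for the Python document model (the real one is Pydantic)."""
    name: str
    texts: list
    tables: list
    pictures: list


# Hypothetical set of field names declared by the protobuf message.
PROTO_FIELDS = {"name", "texts", "tables"}


def find_drift(model_cls, proto_fields):
    """Return (missing_in_proto, missing_in_model) as sorted lists of field names."""
    model_fields = {f.name for f in fields(model_cls)}
    missing_in_proto = sorted(model_fields - proto_fields)
    missing_in_model = sorted(proto_fields - model_fields)
    return missing_in_proto, missing_in_model
```

Here the model has grown a `pictures` field that the schema does not know about, so `find_drift(DocModel, PROTO_FIELDS)` reports it instead of letting the conversion drop it silently. Running a check like this in CI is one way to make the mismatch visible.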
Non-invasive design
I tried hard not to make the Pydantic model "protobuf aware". The core model should not need protobuf annotations, custom wrappers, or wire-format assumptions. The conversion layer carries that responsibility.
Pipeline performance
For services that chain parsing, enrichment, chunking, indexing, and storage, a typed binary transport can cut down on repeated JSON encode/decode costs and reduce boundary friction. It also makes it easier for non-Python services to participate without hand-maintained JSON contracts.
Future streaming
The current PR is not trying to solve every streaming use case at once, but the gRPC boundary gives us a path toward it. Progress updates, partial results, page-level output, chunk streams, or future incremental conversion flows become much more natural once the service contract is typed and RPC-based.
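The difference between the two response shapes can be sketched with plain Python generators, which is also how server-streaming handlers are written in gRPC's Python API (the handler yields one response per message). This is only an illustration of the idea, not the service in the PR: `convert_page`, `convert_unary`, and `convert_streaming` are hypothetical names, and the sleep stands in for real conversion work.

```python
import time
from typing import Iterator, List


def convert_page(page_number: int) -> str:
    """Hypothetical per-page conversion; sleeps briefly to mimic real work."""
    time.sleep(0.01)
    return f"page {page_number} text"


def convert_unary(num_pages: int) -> List[str]:
    """Unary shape: nothing is returned until every page is done."""
    return [convert_page(n) for n in range(1, num_pages + 1)]


def convert_streaming(num_pages: int) -> Iterator[str]:
    """Server-streaming shape: each page is yielded as soon as it is ready."""
    for n in range(1, num_pages + 1):
        yield convert_page(n)
```

With the streaming shape, `next(convert_streaming(100))` returns the first page after one page's worth of work, while `convert_unary(100)` makes the caller wait for all 100. The final content is the same either way; only the time to first result changes.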
This is also split across the two places where I think it belongs: docling-core #546 owns the protobuf schema and conversion for the document model, while docling-serve #504 owns the gRPC service surface. The PR discussions are already tied together, and I also put together a small examples repo showing clients in Python, Go, Java, TypeScript/Node, and Rust: ai-pipestream/docling-grpc-examples. That part matters to me because the point is not only Python-to-Python performance; it is making Docling easier to plug into mixed-language AI systems.
The main thing I would like to get right is the shape of the contract before it goes much further. If the maintainers think this direction is useful, I would appreciate feedback on:
whether the protobuf package/message layout feels maintainable;
whether any fields are too close to implementation details rather than document semantics;
whether the drift validation is strict enough, or too strict;
whether the gRPC service surface should stay minimal for now or expose more REST parity immediately;
any naming or compatibility concerns that would make this harder to support long term.
I know this is a fairly large PR, but the intent is conservative: keep Docling's existing model as the system of record, add a strongly typed schema around it, and make it easier to run Docling as part of fast distributed document pipelines.
docling-project/docling-serve#504
docling-project/docling-core#546