# Agents & Docling MCP with Llama Stack

This notebook highlights some use cases for agents that want to leverage **Docling MCP** tools to complete tasks in document understanding.

We will use the Llama Stack framework. To get an introduction to Llama Stack capabilities, including its current builtin tools, you can refer to the
[Llama Stack Demos on OpenDataHub](https://github.com/opendatahub-io/llama-stack-demos) repository.

### Agent Example:

This notebook is a simple example of building a system that converts PDF files (or any other document format supported by Docling) and runs basic transformation operations on the converted `DoclingDocument`.

### Docling MCP Tools:

We will use the tools from the Docling MCP server, which allow executing tasks such as:
- converting a PDF file from a local or remote location into a unified document representation [DoclingDocument](https://docling-project.github.io/docling/concepts/docling_document/).
- exporting the `DoclingDocument` to a text format (markdown).
- adding new content to a `DoclingDocument`.
- saving a `DoclingDocument` to a file in markdown format.

## Pre-Requisites

Before starting this notebook, ensure that you have:
- Followed the instructions in the [README.md](./README.md) file to set up the following resources:
  - Inference model with Ollama
  - Llama Stack server with the Ollama template [distribution-starter](https://hub.docker.com/r/llamastack/distribution-starter)
  - Docling MCP server 

You may want to create a virtual environment to run this notebook, for instance with [uv](https://docs.astral.sh/uv/). Make sure to install the `llama-stack` optional dependencies as well as the `examples` dependency group:

```bash
uv venv
source .venv/bin/activate
uv sync --extra llama-stack --group examples
```


## Setting Up this Notebook

Rename or copy the [`.env.example`](./.env.example) file to create a new file called `.env`. Most environment variables already have default values suitable for running this notebook, aligned with the setup described in the prerequisites, such as the Llama Stack server and the Docling MCP endpoints.

```bash
cp .env.example .env
```

### Environment variables required for this notebook

- `BASE_URL`: the URL of the remote Llama Stack server. Defaults to `http://localhost:8321`.
- `INFERENCE_MODEL`: the generative AI model id. The `Settings` class in this notebook falls back to a Llama 3.2 3B Instruct model (`ollama/llama3.2:3b-instruct-fp16`).
- `TEMPERATURE` (optional): the temperature to use during inference. Defaults to 0.0.
- `TOP_P` (optional): the top_p parameter to use during inference. Defaults to 0.95.
- `MAX_TOKENS` (optional): the maximum number of tokens that can be generated in the completion. Defaults to 4096.
- `STREAM` (optional): set this to True to stream the output of the model/agent and False otherwise. The `Settings` class in this notebook defaults to False.
- `USE_PROMPT_CHAINING`: dictates whether the task is sent as a few separate prompts that isolate each step, or as a single turn. The `Settings` class in this notebook defaults to True.
- `DOCLING_MCP_URL`: the URL for the Docling MCP server. If the client does not find the tool registered to the llama-stack instance, it will use this URL to register it.
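
To make the mapping from variable names to typed values concrete, here is a minimal standard-library sketch. The helper `read_settings` is hypothetical and covers only a few of the variables listed above; the notebook itself relies on `pydantic-settings` for this.

```python
import os
from typing import Any, Dict, Mapping, Optional


def read_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: read some notebook variables with fallbacks."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("BASE_URL", "http://localhost:8321"),
        "temperature": float(env.get("TEMPERATURE", "0.0")),
        "top_p": float(env.get("TOP_P", "0.95")),
        "max_tokens": int(env.get("MAX_TOKENS", "4096")),
    }


# An explicit mapping (instead of os.environ) keeps the example reproducible.
example = read_settings({"TEMPERATURE": "0.7"})
print(example["temperature"], example["max_tokens"])
```

Values are always read as strings, so each one is converted explicitly; unset variables fall back to the defaults documented above.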

## Necessary Imports

In [1]:
import logging
import uuid

from llama_stack_client import Agent
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.lib.agents.react.agent import ReActAgent
from llama_stack_client.lib.agents.react.tool_parser import ReActOutput
from pydantic import NonNegativeFloat, AnyHttpUrl
from pydantic_settings import BaseSettings, SettingsConfigDict
from rich.pretty import pprint

from src.utils import step_printer, user_printer

# set the logger
logger = logging.getLogger(__name__)
if not logger.hasHandlers():
    logger.setLevel(logging.INFO)
    stream_handler = logging.StreamHandler()
    stream_handler.setLevel(logging.INFO)
    formatter = logging.Formatter("%(message)s")
    stream_handler.setFormatter(formatter)
    logger.addHandler(stream_handler)


# access the environment variables
class Settings(BaseSettings):
    base_url: AnyHttpUrl = "http://localhost:8321"
    inference_model: str = "ollama/llama3.2:3b-instruct-fp16"
    max_tokens: int = 4096
    temperature: NonNegativeFloat = 0.0
    top_p: float = 0.95
    stream: bool = False
    use_prompt_chaining: bool = True
    docling_mcp_url: AnyHttpUrl = "http://host.containers.internal:8000/mcp"

    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")


settings = Settings()
pprint(settings)

## Setting Up the Server Connection

Establish the connection to your Llama Stack server.

In [2]:
client = LlamaStackClient(base_url=str(settings.base_url))
print(f"Connected to Llama Stack server @ {client.base_url}")

Connected to Llama Stack server @ http://localhost:8321/


## Initializing the Inference Parameters

Fetch the inference-related parameters from the corresponding environment variables and convert them to the format Llama Stack expects.

In [3]:
if settings.temperature > 0.0:
    strategy = {
        "type": "top_p",
        "temperature": settings.temperature,
        "top_p": settings.top_p,
    }
else:
    strategy = {"type": "greedy"}

# sampling_params will later be used to pass the parameters to Llama Stack Agents/Inference APIs
sampling_params = {
    "strategy": strategy,
    "max_tokens": settings.max_tokens,
}

print(
    f"Inference Parameters:\n\tSampling Parameters: {sampling_params}\n\tstream: {settings.stream}"
)

Inference Parameters:
	Sampling Parameters: {'strategy': {'type': 'greedy'}, 'max_tokens': 4096}
	stream: False
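
The same selection logic can be wrapped in a small function so that it is easy to try with different settings. This is only a sketch: `build_sampling_params` is a hypothetical name, and the dictionary layout simply mirrors the cell above.

```python
from typing import Any, Dict


def build_sampling_params(
    temperature: float, top_p: float, max_tokens: int
) -> Dict[str, Any]:
    """Return sampling parameters in the same shape as the cell above."""
    if temperature > 0.0:
        # Sampling: both temperature and top_p are forwarded.
        strategy: Dict[str, Any] = {
            "type": "top_p",
            "temperature": temperature,
            "top_p": top_p,
        }
    else:
        # A temperature of 0.0 selects the greedy strategy instead.
        strategy = {"type": "greedy"}
    return {"strategy": strategy, "max_tokens": max_tokens}


print(build_sampling_params(0.0, 0.95, 4096))
print(build_sampling_params(0.7, 0.95, 4096))
```

With a temperature of 0.0, `top_p` is ignored and only the strategy type is sent.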


## Validate that the Docling MCP tools are available in the Llama Stack instance

When a Llama Stack instance is redeployed, its tools may need to be re-registered. Also, if a toolgroup is already registered with a Llama Stack instance, trying to register another one with the same `toolgroup_id` will raise an error.

For this reason, it is recommended to validate your tools and toolgroups. The following code checks that the `mcp::docling` tools are registered and, if they are not, attempts to register them using the configured endpoint.

In [4]:
registered_tools = client.tools.list()
registered_toolgroups = [t.toolgroup_id for t in registered_tools]
if "mcp::docling" not in registered_toolgroups:
    client.toolgroups.register(
        toolgroup_id="mcp::docling",
        provider_id="model-context-protocol",
        mcp_endpoint={"uri": str(settings.docling_mcp_url)},
    )

registered_tools = client.tools.list()
registered_toolgroups = [t.toolgroup_id for t in registered_tools]
print(
    f"Your Llama Stack server is registered with the following tool groups @ {set(registered_toolgroups)} \n"
)

INFO:httpx:HTTP Request: GET http://localhost:8321/v1/tools "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/tools "HTTP/1.1 200 OK"


Your Llama Stack server is registered with the following tool groups @ {'builtin::rag', 'builtin::websearch', 'mcp::docling'} 
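
The check above boils down to a set-membership test over the `toolgroup_id` attribute of each listed tool. Below is a minimal offline sketch that uses `SimpleNamespace` objects as stand-ins for the entries returned by `client.tools.list()`. The helper name `missing_toolgroups` and the `identifier` values in the sample data are assumptions made for illustration.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def missing_toolgroups(tools: Iterable[Any], required: Iterable[str]) -> List[str]:
    """Return the required toolgroup ids not present among the listed tools."""
    registered = {tool.toolgroup_id for tool in tools}
    return [group for group in required if group not in registered]


# Stand-ins for registered tools; only `toolgroup_id` matters here.
tools = [
    SimpleNamespace(identifier="tool_a", toolgroup_id="builtin::websearch"),
    SimpleNamespace(identifier="tool_b", toolgroup_id="builtin::rag"),
]

print(missing_toolgroups(tools, ["mcp::docling"]))
```

Only the groups returned by such a check would then need a call to `client.toolgroups.register`, which avoids the duplicate-registration error mentioned earlier.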



## Defining our Agent - Prompt Chaining

We define an agent provided with the **Docling MCP** tools. The agent should be able to accomplish the following tasks in a multi-step, multi-tool approach:

1. Converting a PDF file into the `DoclingDocument` format.
2. Exporting the result in a text format, such as markdown.
3. Generating a summary of the document content.
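
Prompt chaining works here because all turns share one session, so a later prompt can rely on what an earlier turn produced (for example, the document key returned by the conversion). The sketch below illustrates that idea offline. `FakeSession`, its canned replies, and the key `doc-001` are purely hypothetical and do not call Llama Stack or Docling.

```python
from typing import Dict, List, Tuple


class FakeSession:
    """Hypothetical stand-in for an agent session that remembers past turns."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}
        self.turns: List[Tuple[str, str]] = []

    def create_turn(self, prompt: str) -> str:
        text = prompt.lower()
        if "convert" in text:
            # Pretend a conversion tool returned a document key.
            self.memory["document_key"] = "doc-001"
            reply = "converted, key=doc-001"
        elif "export" in text:
            # A later turn can only succeed if an earlier one stored the key.
            key = self.memory.get("document_key")
            reply = f"exported {key}" if key else "no document to export"
        else:
            reply = "unsupported"
        self.turns.append((prompt, reply))
        return reply


prompts = [
    "Convert the PDF document to DoclingDocument.",
    "Export the document to markdown.",
]

chained = FakeSession()
replies = [chained.create_turn(p) for p in prompts]
isolated = FakeSession().create_turn(prompts[1])

print(replies)
print(isolated)
```

The second prompt succeeds in the chained session and fails in a fresh one, which is why the notebook creates a single session and reuses its `session_id` for every turn.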

In [5]:
model_prompt = """You are a helpful assistant. You have access to a number of tools.
Whenever a tool is called, be sure to return the Response in a friendly and helpful tone.
"""

In [6]:
# Create simple agent with tools
agent = Agent(
    client=client,
    model=settings.inference_model,  # model id taken from the INFERENCE_MODEL environment variable
    instructions=model_prompt,  # update system prompt based on the model you are using
    tools=["mcp::docling"],
    tool_config={"tool_choice": "auto"},
    sampling_params=sampling_params,
)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents "HTTP/1.1 200 OK"


INFO:httpx:HTTP Request: GET http://localhost:8321/v1/tools?toolgroup_id=mcp%3A%3Adocling "HTTP/1.1 200 OK"


In [7]:
user_prompts = [
    "Convert the PDF document on https://arxiv.org/pdf/2408.09869 to DoclingDocument.",
    "Export the document to markdown.",
    "Write a summary of the document content.",
]
session_id = agent.create_session(f"docling-session_{uuid.uuid4()}")

for prompt in user_prompts:
    user_printer(prompt)
    response = agent.create_turn(
        messages=[{"role": "user", "content": prompt}],
        session_id=session_id,
        stream=settings.stream,
    )
    if settings.stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        # print the steps of an agent's response in a formatted way
        step_printer(response.steps)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/a0bc4eb0-17d5-43f0-b50f-ef1089e81bca/session "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/a0bc4eb0-17d5-43f0-b50f-ef1089e81bca/session/4b65126a-e21c-4488-bf79-e7de5f5b01a2/turn "HTTP/1.1 200 OK"


👤 User Query:
Convert the PDF document on https://arxiv.org/pdf/2408.09869 to DoclingDocument.

---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: convert_pdf_document_into_docling_document, Arguments: {'source': 'https://arxiv.org/pdf/2408.09869'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/a0bc4eb0-17d5-43f0-b50f-ef1089e81bca/session/4b65126a-e21c-4488-bf79-e7de5f5b01a2/turn "HTTP/1.1 200 OK"



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
The conversion of the PDF document from the specified URL was not successful. The document key for this document is a66148ddcb5e4522d218e89ea3276db2. If you would like to convert another document or perform other operations on this document, please let me know!

👤 User Query:
Export the document to markdown.

---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: export_docling_document_to_markdown, Arguments: {'document_key': 'a66148ddcb5e4522d218e89ea3276db2'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/a0bc4eb0-17d5-43f0-b50f-ef1089e81bca/session/4b65126a-e21c-4488-bf79-e7de5f5b01a2/turn "HTTP/1.1 200 OK"



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
This is a research paper on a dataset called DocLayNet, which is a collection of annotated documents for natural language processing tasks. The paper describes the creation and characteristics of the dataset, including its size, annotation process, and quality control measures.

Here are some key points from the paper:

1. **Dataset size**: The DocLayNet dataset consists of 1,107,470 documents, with a total of 10,714,190 annotated instances.
2. **Annotation process**: The annotations were performed by a team of 40 dedicated annotators, who worked in three phases: data selection and preparation, annotation, and quality control.
3. **Labeling scheme**: The dataset includes 11 labels, including:
	* Caption
	* Footnote
	* Formula
	* List item
	* Page header
	* Picture
	* Section header
	* Table
	* Text
	* Title
	* Total
4. **Annotation guidelines**: Annotators were instructed to use a specific labeling scheme, whic

## Defining our Agent - ReAct

We can also define our agent to be more autonomous and perform the same task with a single prompt instead of a chain. To do this, we leverage Llama Stack's **ReAct agent**, which loops through *Reason then Act* iterations, thinking through the problem and then using tools until it determines that its task has been completed successfully.

Unlike prompt chaining, which follows fixed steps, ReAct dynamically breaks down tasks and adapts its approach based on the results of each step. This makes it more flexible and better suited to complex, real-world queries.
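
To make the loop concrete, here is a minimal offline sketch of a Reason-then-Act cycle. It is not Llama Stack's implementation: `scripted_model`, `TOOLS`, and `run_react` are hypothetical names, and the model is replaced by a scripted function so that the example runs without a server.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical tool registry: tool name -> plain Python callable.
TOOLS: Dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}


def scripted_model(observations: List[str]) -> Dict[str, Any]:
    """Stand-in for the LLM: act first, then answer once a result exists."""
    if not observations:
        return {
            "thought": "I need to add the two numbers.",
            "action": {"tool_name": "add", "tool_params": {"a": 2, "b": 3}},
            "answer": None,
        }
    return {
        "thought": "The tool returned the sum.",
        "action": None,
        "answer": observations[-1],
    }


def run_react(
    model: Callable[[List[str]], Dict[str, Any]], max_steps: int = 5
) -> Optional[str]:
    """Alternate between reasoning and tool calls until an answer appears."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = model(observations)
        if step["answer"] is not None:
            return step["answer"]
        action = step["action"]
        result = TOOLS[action["tool_name"]](**action["tool_params"])
        observations.append(result)
    return None  # step budget exhausted without a final answer


print(run_react(scripted_model))
```

The `max_steps` bound is a design choice of this sketch: it keeps the loop from running forever if the model never produces an answer.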

In this example, we will leverage the generation tools of **Docling MCP** to:
- create a new `DoclingDocument`
- add a title and paragraphs
- export the resulting document into a file in markdown format

The purpose of the code below is just to show the instructions sent to the LLM to complete the task. They incorporate the description of the Docling MCP tools.

In [8]:
# ReAct instructions
from llama_stack_client.lib.agents.agent import AgentUtils
from llama_stack_client.lib.agents.react.agent import get_default_react_instructions

client_tools = AgentUtils.get_client_tools(["mcp::docling"])
instructions = get_default_react_instructions(client, ["mcp::docling"], client_tools)
print(instructions)

INFO:httpx:HTTP Request: GET http://localhost:8321/v1/tools?toolgroup_id=mcp%3A%3Adocling "HTTP/1.1 200 OK"



You are an expert assistant who can solve any task using tool calls. You will be given a task to solve as best you can.
To do so, you have been given access to the following tools: is_document_in_local_cache, convert_pdf_document_into_docling_document, create_new_docling_document, export_docling_document_to_markdown, save_docling_document, add_title_to_docling_document, add_section_heading_to_docling_document, add_paragraph_to_docling_document, open_list_in_docling_document, close_list_in_docling_document, add_list_items_to_list_in_docling_document, add_table_in_html_format_to_docling_document

You must always respond in the following JSON format:
{
    "thought": $THOUGHT_PROCESS,
    "action": {
        "tool_name": $TOOL_NAME,
        "tool_params": $TOOL_PARAMS
    },
    "answer": $ANSWER
}

Specifically, this json should have a `thought` key, a `action` key and an `answer` key.

The `action` key should specify the $TOOL_NAME the name of the tool to use and the `tool_params` key 
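
In the run further below, the model emits `tool_params` as a list of `{"name": ..., "value": ...}` entries. The following sketch shows how such a reply could be parsed and turned into keyword arguments using only the standard library. The helper name `parse_react_response` is an assumption, and the sample reply is shortened from the notebook output.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_react_response(
    raw: str,
) -> Tuple[str, Optional[str], Dict[str, Any], Optional[str]]:
    """Split a ReAct-style JSON reply into thought, tool name, kwargs, answer."""
    data = json.loads(raw)
    action = data.get("action") or {}
    # Each parameter arrives as {"name": ..., "value": ...}; fold into kwargs.
    kwargs = {p["name"]: p["value"] for p in action.get("tool_params", [])}
    return data["thought"], action.get("tool_name"), kwargs, data.get("answer")


raw = (
    '{"thought": "I will add a title to the document.", '
    '"action": {"tool_name": "add_title_to_docling_document", '
    '"tool_params": [{"name": "title", "value": "Quantum Computing"}]}, '
    '"answer": null}'
)
thought, tool_name, kwargs, answer = parse_react_response(raw)
print(tool_name, kwargs, answer)
```

A reply whose `answer` is not null marks the end of the loop; until then, the tool name and keyword arguments describe the next call to make.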

In [12]:
agent = ReActAgent(
    client=client,
    model=settings.inference_model,
    tools=["mcp::docling"],
    response_format={
        "type": "json_schema",
        "json_schema": ReActOutput.model_json_schema(),
    },
    sampling_params=sampling_params,
)

user_prompts = [
    """I want you to create a new Docling document on Quantum Computing.
    To do this, you will create a document by invoking `create_new_docling_document` with Quantum Computing as prompt.
    Add a title and 3 paragraphs describing the topic, one by one, e.g. using the add_x_to_docling_document tools.
    At the end of the writing, you must save the document by calling `save_docling_document` and return me the filepath of
    the saved document in markdown format provided by the last tool.
    """
]

session_id = agent.create_session(f"docling-session_{uuid.uuid4()}")
for prompt in user_prompts:
    print("\n" + "=" * 50)
    user_printer(prompt)
    print("=" * 50)
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=settings.stream,
    )
    if settings.stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        # print the steps of an agent's response in a formatted way
        step_printer(response.steps)

INFO:httpx:HTTP Request: GET http://localhost:8321/v1/tools?toolgroup_id=mcp%3A%3Adocling "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET http://localhost:8321/v1/tools?toolgroup_id=mcp%3A%3Adocling "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/00b00d76-1313-46cd-af96-4d9139a47641/session "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/00b00d76-1313-46cd-af96-4d9139a47641/session/6268416b-0db6-431e-ab87-d31e9a10ea13/turn "HTTP/1.1 200 OK"



👤 User Query:
I want you to create a new Docling document on Quantum Computing.
    To do this, you will create a document by invoking `create_new_docling_document` with Quantum Computing as prompt.
    Add a title and 3 paragraphs describing the topic, one by one, e.g. using the add_x_to_docling_document tools.
    At the end of the writing, you must save the document by calling `save_docling_document` and return me the filepath of
    the saved document in markdown format provided by the last tool.


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/tool-runtime/invoke "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/00b00d76-1313-46cd-af96-4d9139a47641/session/6268416b-0db6-431e-ab87-d31e9a10ea13/turn/0c12d6df-314c-446d-ac68-8f1dd34475a8/resume "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/tool-runtime/invoke "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/00b00d76-1313-46cd-af96-4d9139a47641/session/6268416b-0db6-431e-ab87-d31e9a10ea13/turn/0c12d6df-314c-446d-ac68-8f1dd34475a8/resume "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/tool-runtime/invoke "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/agents/00b00d76-1313-46cd-af96-4d9139a47641/session/6268416b-0db6-431e-ab87-d31e9a10ea13/turn/0c12d6df-314c-446d-ac68-8f1dd34475a8/resume "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/tool-runtime/invoke "HTTP/1.1 200


---------- 📍 Step 1: InferenceStep ----------
🤖 Model Response:
{"thought": "I will create a new Docling document on Quantum Computing by using the create_new_docling_document tool with the prompt as the title of the document.", "action": {"tool_name": "create_new_docling_document", "tool_params": [{"name": "prompt", "value": "Quantum Computing"}]}, "answer": null}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
{"thought": "I will add a title to the document.", "action": {"tool_name": "add_title_to_docling_document", "tool_params": [{"name": "document_key", "value": "15299acfdad14d21b8a26de24e7c58b4"}, {"name": "title", "value": "Quantum Computing: An Introduction"}]}, "answer": null}

---------- 📍 Step 4: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 5: InferenceStep ----------
🤖 Model Response:
{"thought": "I will add the first paragraph to the document.", "action": {"tool_name": "add_paragraph_to_docling_document", "tool_params": [{"name": "document_key", "value": "15299acfdad14d21b8a26de24e7c58b4"}, {"name": "paragraph", "value": "Quantum Computing is a branch of physics that studies the behavior of matter and energy at an atomic and subatomic level. It relies on the principles of quantum mechanics to explain phenomena such as superposition, entanglement, and wave-particle duality."}]}, "answer": null}

---------- 📍 Step 6: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 7: InferenceStep ----------
🤖 Model Response:
{"thought": "I will add the second paragraph to the document.", "action": {"tool_name": "add_paragraph_to_docling_document", "tool_params": [{"name": "document_key", "value": "15299acfdad14d21b8a26de24e7c58b4"}, {"name": "paragraph", "value": "Quantum Computing has the potential to revolutionize many fields, including chemistry and materials science. It can be used to simulate complex systems and optimize processes, leading to breakthroughs in fields such as drug discovery and materials engineering."}]}, "answer": null}

---------- 📍 Step 8: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 9: InferenceStep ----------
🤖 Model Response:
{"thought": "I will add the third paragraph to the document.", "action": {"tool_name": "add_paragraph_to_docling_document", "tool_params": [{"name": "document_key", "value": "15299acfdad14d21b8a26de24e7c58b4"}, {"name": "paragraph", "value": "Quantum Computing has the potential to revolutionize many fields, including chemistry and materials science. It can be used to simulate complex systems and optimize processes, leading to breakthroughs in fields such as drug discovery and materials engineering."}]}, "answer": null}

---------- 📍 Step 10: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 11: InferenceStep ----------
🤖 Model Response:
{
  "thought": "I will save the document by calling `save_docling_document` and return me the filepath of the saved document in markdown format provided by the last tool.",
  "action": {
    "tool_name": "save_docling_document",
    "tool_params": [
      {"name": "document_key", "value": "15299acfdad14d21b8a26de24e7c58b4"}
    ]
  },
  "answer": null
}

---------- 📍 Step 12: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 13: InferenceStep ----------
🤖 Model Response:
{"thought": "I will save the document by calling `save_docling_document` and return me the filepath of the saved document in markdown format provided by the last tool.", "action": {"tool_name": "save_docling_document", "tool_params": [{"name": "document_key", "value": "15299acfdad14d21b8a26de24e7c58b4"}]}, "answer": "/Users/dol/codes/docling-mcp/_cache/15299acfdad14d21b8a26de24e7c58b4.md"}

