# Docling image parsing example
![alt text](images/Docling%20qs%20earch.png)

Based on the Docling example at https://docling-project.github.io/docling/examples/pictures_description_api/


### 0. Preparation

Install Docling and split the document into manageable sizes (for demo purposes only).

In [2]:
!pip install docling

Collecting docling
  Downloading docling-2.38.0-py3-none-any.whl.metadata (10 kB)
Collecting pydantic<3.0.0,>=2.0.0 (from docling)
  Downloading pydantic-2.11.7-py3-none-any.whl.metadata (67 kB)
Collecting docling-core<3.0.0,>=2.29.0 (from docling-core[chunking]<3.0.0,>=2.29.0->docling)
  Downloading docling_core-2.38.1-py3-none-any.whl.metadata (6.5 kB)
Collecting docling-ibm-models<4.0.0,>=3.4.4 (from docling)
  Downloading docling_ibm_models-3.6.0-py3-none-any.whl.metadata (6.7 kB)
Collecting docling-parse<5.0.0,>=4.0.0 (from docling)
  Downloading docling_parse-4.0.5-cp313-cp313-macosx_14_0_arm64.whl.metadata (9.6 kB)
Collecting filetype<2.0.0,>=1.2.0 (from docling)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting pypdfium2<5.0.0,>=4.30.0 (from docling)
  Downloading pypdfium2-4.30.1-py3-none-macosx_11_0_arm64.whl.metadata (48 kB)
Collecting pydantic-settings<3.0.0,>=2.3.0 (from docling)
  Downloading pydantic_settings-2.10.0-py3-none-any.whl.metadata 

In [28]:
# Split the PDF using a Python script.
# This snippet (pages 19-20) includes an interesting diagram.
!python pdf_splitter_tool.py ../../00_setup/01_raw_data/db2_data_mov_115.pdf 19 20 pdf_snippet_with_diagram_0.pdf

Pages 19-20 extracted to pdf_snippet_with_diagram_0.pdf
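
The script `pdf_splitter_tool.py` is not shown in this notebook. As a rough idea of what such a tool could look like, here is a minimal sketch that assumes the page arguments are 1-based and inclusive (as the log line above suggests) and that the `pypdf` package is available. The function names `page_indices` and `split_pdf` are hypothetical and not taken from the actual script.

```python
def page_indices(first_page: int, last_page: int, total_pages: int) -> list[int]:
    """Convert a 1-based inclusive page range to 0-based page indices."""
    if not 1 <= first_page <= last_page <= total_pages:
        raise ValueError(
            f"invalid range {first_page}-{last_page} for {total_pages} pages"
        )
    return list(range(first_page - 1, last_page))


def split_pdf(src: str, first_page: int, last_page: int, dst: str) -> None:
    """Copy pages first_page..last_page (1-based, inclusive) from src into dst."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported lazily
    # so the range helper above stays usable without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in page_indices(first_page, last_page, len(reader.pages)):
        writer.add_page(reader.pages[index])
    writer.write(dst)
    print(f"Pages {first_page}-{last_page} extracted to {dst}")
```

Keeping the range check in its own function makes the off-by-one conversion easy to test without touching a real PDF.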


### 1. Basic docling parsing

The default pipeline does not interpret images; each picture is replaced by an image placeholder marker (`<!-- image -->`) in the exported Markdown.

In [30]:
from docling.document_converter import DocumentConverter

source = "pdf_snippet_with_diagram_0.pdf"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
markdown = result.document.export_to_markdown()  # export once, reuse below
print(markdown)
# Store the Markdown next to the PDF as <pdf name>_1_basic_pipeline.md
with open(f"{source.rsplit('.', 1)[0]}_1_basic_pipeline.md", "w", encoding="utf-8") as f:
    f.write(markdown)



## Typed table export considerations

You can use the Db2 export utility can be used to move data out of typed tables for a later import. Export moves data from one hierarchical structure of typed tables to another by following a specific order and creating an intermediate flat file.

When working with typed tables, the export utility controls what is placed in the output file; specify only the target table name and, optionally, the WHERE clause. You can express subselect statements only by specifying the target table name and the WHERE clause. You cannot specify a fullselect or selectstatement when exporting a hierarchy.

## Preservation of hierarchies using traverse order

Typed tables can be in a hierarchy. There are several ways you can move data across hierarchies:

- · Movement from one hierarchy to an identical hierarchy
- · Movement from one hierarchy to a subsection of a larger hierarchy
- · Movement from a subsection of a large hierarchy to a separate hierarchy

Identificatio
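
To check how many pictures the basic pipeline skipped, one option is to count the placeholders in the exported Markdown. This is a minimal sketch that assumes the placeholder string is `<!-- image -->`; the helper name `count_image_placeholders` and the sample text are made up for illustration.

```python
# Assumption: this is the placeholder string written for each skipped picture.
IMAGE_PLACEHOLDER = "<!-- image -->"


def count_image_placeholders(markdown: str, placeholder: str = IMAGE_PLACEHOLDER) -> int:
    """Return how many picture placeholders appear in the exported Markdown."""
    return markdown.count(placeholder)


# Illustrative sample, not actual converter output.
sample = "## Heading\n\n<!-- image -->\n\nFigure 1. An example of a hierarchy\n"
print(count_image_placeholders(sample))
```

A non-zero count is a hint that the document contains figures worth sending through the vision-based pipeline in the next section.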

### 2. Setting up vision based parsing

This sends every detected image to a multimodal LLM hosted on watsonx, which returns a text description of the image.

In [34]:
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    PictureDescriptionApiOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption
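import os
import requests
from docling_core.types.doc import PictureItem
from dotenv import load_dotenv

In [35]:
def watsonx_vlm_options():
    # Read credentials from the environment (.env file); never hardcode them.
    # The variable names WX_API_KEY and WX_PROJECT_ID are assumed here.
    load_dotenv()
    api_key = os.environ.get("WX_API_KEY")
    project_id = os.environ.get("WX_PROJECT_ID")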

    def _get_iam_access_token(api_key: str) -> str:
        res = requests.post(
            url="https://iam.cloud.ibm.com/identity/token",
            headers={
                "Content-Type": "application/x-www-form-urlencoded",
            },
            data=f"grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey={api_key}",
        )
        res.raise_for_status()
        api_out = res.json()
        # Do not print api_out: it contains the bearer token.
        return api_out["access_token"]

    options = PictureDescriptionApiOptions(
        url="https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2023-05-29",
        params=dict(
            model_id="meta-llama/llama-3-2-90b-vision-instruct",
            project_id=project_id,
            parameters=dict(
                max_new_tokens=400,
            ),
        ),
        headers={
            "Authorization": "Bearer " + _get_iam_access_token(api_key=api_key),
        },
        prompt="""You are creating a description of the image to make it usable in a RAG pipeline.
                    Describe the image in a few sentences.
                    If it is a diagram, make sure to describe the relationships between the elements.
                    Be concise and accurate.""",
        timeout=60,
    )
    return options
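
The token request above builds its form body with an f-string, which works as long as the API key contains no characters that need escaping. A slightly more defensive variant, sketched here with the standard library only, URL-encodes the fields explicitly. The helper name `iam_token_form` is hypothetical.

```python
from urllib.parse import urlencode

# Grant type string taken from the token request in the cell above.
IAM_GRANT_TYPE = "urn:ibm:params:oauth:grant-type:apikey"


def iam_token_form(api_key: str) -> str:
    """Build the URL-encoded form body for the IAM token request."""
    return urlencode({"grant_type": IAM_GRANT_TYPE, "apikey": api_key})
```

The returned string can be passed as `data=` to `requests.post`. Passing the dictionary itself as `data=` should have the same effect, since `requests` form-encodes dictionaries.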

In [37]:
def main(source):

    pipeline_options = PdfPipelineOptions(
        enable_remote_services=True  # <-- this is required!
    )
    pipeline_options.do_picture_description = True
    pipeline_options.picture_description_options = watsonx_vlm_options()

    doc_converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
            )
        }
    )
    result = doc_converter.convert(source)
    

    for element, _level in result.document.iterate_items():
        if isinstance(element, PictureItem):
            print(
                f"Picture {element.self_ref}\n"
                f"Caption: {element.caption_text(doc=result.document)}\n"
                f"Annotations: {element.annotations}"
            )
    # Store the Markdown next to the PDF as <pdf name>_2_advanced_pipeline.md
    with open(f"{source.rsplit('.', 1)[0]}_2_advanced_pipeline.md", "w", encoding="utf-8") as f:
        f.write(result.document.export_to_markdown())
        
if __name__ == "__main__":
    main(source)
    




Picture #/pictures/0
Caption: Figure 1. An example of a hierarchy
Annotations: [DescriptionAnnotation(kind='description', text='The image presents a flowchart illustrating the relationships between various entities, including Person, Employee, Student, Manager, and Architect. The chart is divided into several sections, each representing a distinct entity or relationship.\n\n**Person Section:**\n\n*   The Person section is located at the top of the chart and contains the following information:\n    *   Person_id (OID, Name, Age)\n    *   Employee Employee_t (SerialNum, Salary, REF (Department_t))\n    *   Student Student_t (SerialNum, Marks)\n\n**Employee Section:**\n\n*   The Employee section is connected to the Person section and contains the following information:\n    *   Employee_id (SerialNum, Salary, REF (Department_t))\n\n**Student Section:**\n\n*   The Student section is also connected to the Person section and contains the following information:\n    *   Student_id (SerialNum,
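
For a RAG pipeline, usually only the description text is needed rather than the full annotation objects. The following is a minimal sketch that relies solely on the `kind` and `text` attributes visible in the output above. The helper name `description_texts` is hypothetical, and the stand-in objects are built with `SimpleNamespace` so the example runs without Docling.

```python
from types import SimpleNamespace


def description_texts(annotations) -> list[str]:
    """Collect the text of every annotation whose kind is 'description'."""
    return [
        annotation.text
        for annotation in annotations
        if getattr(annotation, "kind", None) == "description"
    ]


# Stand-in objects for illustration only; real items come from element.annotations.
fake_annotations = [
    SimpleNamespace(kind="description", text="A hierarchy of typed tables."),
    SimpleNamespace(kind="classification", text="ignored"),
]
print(description_texts(fake_annotations))
```

Inside the loop in `main`, the same helper could be called as `description_texts(element.annotations)` for each `PictureItem`.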