# Text and Table Extraction

In [1]:
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Docling Technical Report[...]"

Downloading detection model, please wait. This may take several minutes depending upon your network connection.
Downloading recognition model, please wait. This may take several minutes depending upon your network connection.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


<!-- image -->

## Docling Technical Report

Version 1.0

Christoph Auer Maksym Lysak Ahmed Nassar Michele Dolfi Nikolaos Livathinos Panos Vagenas Cesar Berrospi Ramis Matteo Omenetti Fabian Lindlbauer Kasper Dinkla Lokesh Mishra Yusik Kim Shubham Gupta Rafael Teixeira de Lima Valery Weber Lucas Morin Ingmar Meijer Viktor Kuropiatnyk Peter W. J. Staar

AI4K Group, IBM Research Rüschlikon, Switzerland

## Abstract

This technical report introduces Docling, an easy to use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware in a small resource budget. The code interface allows for easy extensibility and addition of new features and models.

## 1 Introduction

Converting PDF documents back into a machine-processable format has been a major challenge for decades due to their huge variabi
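
The Markdown export renders tables as pipe tables inside the returned string. A minimal standard-library sketch for pulling those tables back out as rows of cells is shown here. The function name `parse_markdown_tables` and the sample string are illustrative assumptions, not part of Docling, and escaped pipes inside cells are not handled.

```python
def parse_markdown_tables(markdown: str) -> list[list[list[str]]]:
    """Return every pipe table in `markdown` as a list of rows of cell strings."""
    tables, current = [], []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|"):
            cells = [c.strip() for c in stripped.strip("|").split("|")]
            # Skip the |---|---| separator row that follows the header.
            if all(c and set(c) <= set("-:") for c in cells):
                continue
            current.append(cells)
        elif current:
            # A non-table line ends the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


# Hand-written sample in the shape of a Markdown export (not real Docling output).
sample = """## CONTENTS

| Chapter | Page |
|---------|------|
| 1 Whetting Your Appetite | 3 |
| 2 Using the Python Interpreter | 5 |
"""

tables = parse_markdown_tables(sample)
for row in tables[0]:
    print(row)
```

In the notebook, the same function could be applied to the string returned by `result.document.export_to_markdown()`.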

# Docling Code Enrichment

### Checking without Code Enrichment (full document)

In [8]:
from docling.document_converter import DocumentConverter

source = "code.pdf"  # local path or URL of the document
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())  # output: "## Python Tutorial[...]"

## Python Tutorial

## Release 3.7.0

Guido van Rossum and the Python development team

September 02, 2018

Python Software Foundation Email: docs@python.org

## CONTENTS

| 1 Whetting Your Appetite             | 1 Whetting Your Appetite             | 1 Whetting Your Appetite                                             | 3   |
|--------------------------------------|--------------------------------------|----------------------------------------------------------------------|-----|
| 2 Using the Python Interpreter       | 2 Using the Python Interpreter       | 2 Using the Python Interpreter                                       | 5   |
| 2.1                                  | 2.1                                  | Invoking the Interpreter . . . . . . . . . . . . . . . . . . . . .   | 5   |
| 2.2                                  | 2.2                                  | The Interpreter and Its Environment . . . . . . . . . . . . .        | 6   |
| 3 An Informal Introduction to Python | 3 

### Checking without Code Enrichment (single page)

In [13]:
import fitz  # PyMuPDF

def extract_single_page(pdf_path, page_number, output_path):
    doc = fitz.open(pdf_path)
    single_page = fitz.open()  # new PDF
    single_page.insert_pdf(doc, from_page=page_number-1, to_page=page_number-1)
    single_page.save(output_path)
    single_page.close()
    doc.close()

# Step 1: Extract page 91 (page_number is 1-based; PyMuPDF indexes pages from 0)
extract_single_page("code.pdf", 91, "page91.pdf")

# Step 2: Feed to Docling
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("page91.pdf")
print(result.document.export_to_markdown())


```
>>> import random >>> random.choice(['apple', 'pear', 'banana']) 'apple' >>> random.sample(range(100), 10) # sampling without replacement [30, 83, 16, 4, 8, 81, 41, 50, 18, 33] >>> random.random() # random float 0.17970987693706186 >>> random.randrange(6) # random integer chosen from range(6) 4
```

The statistics module calculates basic statistical properties (the mean, median, variance, etc.) of numeric data:

```
>>> import statistics >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, >>> statistics.mean(data) 1.6071428571428572 >>> statistics.median(data) 1.25 >>> statistics.variance(data) 1.3720238095238095
```

```
3.5]
```

The SciPy project <https://scipy.org> has many other modules for numerical computations.

## 10.7 Internet Access

There are a number of modules for accessing the internet and processing internet protocols. Two of the simplest are urllib.request for retrieving data from URLs and smtplib for sending mail:

```
>>> from urllib.request import urlopen >>> w
```

### Checking with Code Enrichment

In [17]:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_code_enrichment = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("page91.pdf")
doc = result.document
print(doc.export_to_markdown())

```
    >>> import random
    >>> random.choice(['apple', 'pear', 'banana'])
    'apple'
    >>> random.sample(range(100), 10)    # sampling without replacement
    [30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
    >>> random.random()     # random float
    0.17970987693706186
    >>> random.randrange(6)     # random integer chosen from range(6)
    4

    ret.__.__.__.__ [...]
```
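
The visible difference between the two runs is that the plain pipeline flattens each REPL transcript onto one line, while the enriched run keeps one statement per line. A small standard-library sketch for checking this automatically on an exported Markdown string follows. The helper names `fenced_blocks` and `is_flattened` are hypothetical, and the two sample strings are shortened by hand from the outputs above.

```python
import re

# Matches a fenced block and captures its body (opening fence may carry a language tag).
FENCE = re.compile(r"^```[^\n]*\n(.*?)^```[ \t]*$", re.MULTILINE | re.DOTALL)


def fenced_blocks(markdown: str) -> list[str]:
    """Return the body of every fenced code block in `markdown`."""
    return [match.group(1) for match in FENCE.finditer(markdown)]


def is_flattened(block: str) -> bool:
    """True if any single line of the block holds more than one '>>>' prompt."""
    return any(line.count(">>>") > 1 for line in block.splitlines())


fence = "`" * 3  # built programmatically to keep this example easy to embed

# Shortened stand-ins for the two exports shown above.
without_enrichment = f"{fence}\n>>> import random >>> random.randrange(6) 4\n{fence}\n"
with_enrichment = f"{fence}\n>>> import random\n>>> random.randrange(6)\n4\n{fence}\n"

for label, text in [("without", without_enrichment), ("with", with_enrichment)]:
    flags = [is_flattened(block) for block in fenced_blocks(text)]
    print(label, flags)
```

A check like this only detects flattened transcripts; it does not catch degenerate output such as the repeated `ret.__` run at the end of the enriched result.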

# Formula Enrichment

### Checking without Formula Enrichment 

In [18]:
import fitz  # PyMuPDF
from docling.document_converter import DocumentConverter

# Step 1: Extract pages 1–4 from the original PDF
def extract_first_n_pages(input_pdf, output_pdf, n_pages=4):
    doc = fitz.open(input_pdf)
    new_doc = fitz.open()
    new_doc.insert_pdf(doc, from_page=0, to_page=n_pages - 1)
    new_doc.save(output_pdf)
    new_doc.close()
    doc.close()

# Extract first 4 pages to a new PDF
extract_first_n_pages("yolo.pdf", "yolo_first4.pdf")

# Step 2: Use Docling to parse the extracted PDF
converter = DocumentConverter()
result = converter.convert("yolo_first4.pdf")

# Print as Markdown
print(result.document.export_to_markdown())




## You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon ∗ , Santosh Divvala ∗† , Ross Girshick ¶ , Ali Farhadi

University of Washington ∗ , Allen Institute for AI † , Facebook AI Research

∗† ¶

http://pjreddie.com/yolo/

## Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the m

### Checking With Formula Enrichment

In [19]:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_formula_enrichment = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("yolo_first4.pdf")
doc = result.document



In [20]:
print(doc.export_to_markdown())

## You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon ∗ , Santosh Divvala ∗† , Ross Girshick ¶ , Ali Farhadi

University of Washington ∗ , Allen Institute for AI † , Facebook AI Research

∗† ¶

http://pjreddie.com/yolo/

## Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the m
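
The first page of the paper contains no equations, so the two truncated outputs look identical. One way to compare the full exports is to count formula markers. The sketch below assumes that decoded formulas appear as `$$...$$` blocks and that undecoded ones appear as the placeholder comment `<!-- formula-not-decoded -->`; both are assumptions about the Markdown export and should be checked against the installed Docling version. The function name `count_formulas` is hypothetical.

```python
import re


def count_formulas(markdown: str, placeholder: str = "<!-- formula-not-decoded -->") -> dict:
    """Count decoded ($$...$$) and undecoded (placeholder) formulas in `markdown`."""
    decoded = len(re.findall(r"\$\$.+?\$\$", markdown, flags=re.DOTALL))
    undecoded = markdown.count(placeholder)
    return {"decoded": decoded, "undecoded": undecoded}


# Hand-written sample, not real Docling output.
sample = "Loss function:\n\n$$a = b + c$$\n\nNext step:\n\n<!-- formula-not-decoded -->\n"
print(count_formulas(sample))
```

Running `count_formulas(doc.export_to_markdown())` after each of the two cells would show whether enrichment changed how many formulas were decoded.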

# Docling Image Classification

In [21]:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.generate_picture_images = True
pipeline_options.images_scale = 2
pipeline_options.do_picture_classification = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("yolo_first4.pdf")
doc = result.document
print(doc.export_to_markdown())  # the parentheses are required; without them Python prints the bound-method repr

## You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon ∗ , Santosh Divvala ∗† , Ross Girshick ¶ , Ali Farhadi

University of Washington ∗ , Allen Institute for AI † , Facebook AI Research

∗† ¶

http://pjreddie.com/yolo/

## Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the m
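
The Markdown export above looks the same as before because the predicted classes are not part of the exported text. They are stored on the picture items. The sketch below assumes the attribute names `pictures`, `annotations`, `predicted_classes`, `class_name`, and `confidence` from the docling-core data model; these should be verified against the installed version. It runs here against a stand-in object, with made-up class labels, so that it works without Docling installed.

```python
from types import SimpleNamespace


def top_picture_classes(doc):
    """Return (picture index, class name, confidence) for each classified picture."""
    rows = []
    for index, picture in enumerate(doc.pictures):
        for annotation in getattr(picture, "annotations", []):
            # Only classification annotations are assumed to carry predicted_classes.
            classes = getattr(annotation, "predicted_classes", None)
            if classes:
                best = max(classes, key=lambda c: c.confidence)
                rows.append((index, best.class_name, best.confidence))
    return rows


# Stand-in document with invented labels and scores, for illustration only.
fake_doc = SimpleNamespace(pictures=[
    SimpleNamespace(annotations=[SimpleNamespace(predicted_classes=[
        SimpleNamespace(class_name="bar_chart", confidence=0.12),
        SimpleNamespace(class_name="natural_image", confidence=0.81),
    ])]),
    SimpleNamespace(annotations=[]),  # a picture without any classification
])

print(top_picture_classes(fake_doc))
```

In the notebook, `top_picture_classes(doc)` would be called on the converted document instead of `fake_doc`.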

# Picture Description

In [23]:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("yolo_first4.pdf")
doc = result.document
print(doc.export_to_markdown())

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`

OSError: [WinError 1314] A required privilege is not held by the client: '..\\..\\blobs\\214a30563adf9598b26cdd3ad3bbafca9d7af45d' -> 'C:\\Users\\ibnes\\.cache\\huggingface\\hub\\models--HuggingFaceTB--SmolVLM-256M-Instruct\\snapshots\\7e3e67edbbed1bf9888184d9df282b700a323964\\chat_template.json'
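
The traceback shows the run failing while the Hugging Face cache was linking a SmolVLM-256M-Instruct file, and the warning above it names the cause: creating symlinks on Windows needs Developer Mode or an administrator session. A small standard-library check, sketched here under the assumption that the same Python process will run the conversion, reports whether the current session can create symlinks before the cell is retried. The function name `can_create_symlinks` is hypothetical.

```python
import os
import tempfile


def can_create_symlinks() -> bool:
    """Try to create a symlink in a temporary directory and report whether it worked."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as handle:
            handle.write("x")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # On Windows this is where WinError 1314 would surface.
            return False
        return os.path.islink(link)


print("symlinks available:", can_create_symlinks())
```

If this prints `False`, enabling Developer Mode or starting Jupyter as an administrator, as the warning suggests, is the likely next step.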

# Picture in Base64 format

In [24]:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.datamodel.base_models import InputFormat

pipeline_options = PdfPipelineOptions()
pipeline_options.generate_picture_images = True
pipeline_options.images_scale = 2
pipeline_options.generate_page_images = True

converter = DocumentConverter(format_options={
    InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
})

result = converter.convert("yolo_first4.pdf")
doc = result.document
print(doc.export_to_markdown())  # the parentheses are required; without them Python prints the bound-method repr

## You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon ∗ , Santosh Divvala ∗† , Ross Girshick ¶ , Ali Farhadi

University of Washington ∗ , Allen Institute for AI † , Facebook AI Research

∗† ¶

http://pjreddie.com/yolo/

## Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the m
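
With the default arguments, the Markdown export marks pictures with an `<!-- image -->` placeholder, as in the first output of this notebook, so no Base64 data appears above. As far as can be told from the docling-core API, `export_to_markdown` accepts an `image_mode` argument, and `ImageRefMode.EMBEDDED` from `docling_core.types.doc` embeds the pictures as Base64 data URIs; this is an assumption to verify against the installed version. The standard-library sketch below decodes such embedded images from a Markdown string. The function name `embedded_images` and the sample payload are illustrative only.

```python
import base64
import re

# Markdown image whose target is a base64 data URI: ![alt](data:image/png;base64,....)
DATA_URI = re.compile(
    r"!\[[^\]]*\]\(data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=]+)\)"
)


def embedded_images(markdown: str) -> list[tuple[str, bytes]]:
    """Return (image format, raw bytes) for each base64 image embedded in `markdown`."""
    return [
        (fmt, base64.b64decode(payload))
        for fmt, payload in DATA_URI.findall(markdown)
    ]


# Placeholder bytes stand in for real image data in this example.
payload = base64.b64encode(b"not a real image").decode("ascii")
sample = f"Figure 1\n\n![Image](data:image/png;base64,{payload})\n"

for fmt, data in embedded_images(sample):
    print(fmt, len(data), "bytes")
```

Each decoded byte string could then be written to a file with the matching extension.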