diff --git a/README.md b/README.md index d5035b2..6702f2d 100644 --- a/README.md +++ b/README.md @@ -166,9 +166,9 @@ For more information take a look at our [Getting Started with Parxy tutorial](./ | [**Pypdfium2**](https://github.com/pypdfium2-team/pypdfium2) | `pypdfium2` | ✅ | ✅ | Preview | | [**pdfplumber**](https://github.com/jsvine/pdfplumber) | `pdfplumber` | ✅ | ✅ | Preview | | [**PDFMiner**](https://github.com/pdfminer/pdfminer.six) | `pdfminer` | ✅ | ✅ | Preview | +| [**Docling**](https://docling-project.github.io/docling/) | `docling` | ✅ | ✅ | Preview | | [**Unstructured.io** cloud service](https://docs.unstructured.io/open-source/introduction/overview) | | | | Planned | | [**Chunkr**](https://www.chunkr.ai/) | | | | Planned | -| [**Docling**](https://docling-project.github.io/docling/) | | | | Planned | ...and more can be added via the [live extension](#live-extension)! diff --git a/docs/howto/configure_docling.md b/docs/howto/configure_docling.md new file mode 100644 index 0000000..56669b2 --- /dev/null +++ b/docs/howto/configure_docling.md @@ -0,0 +1,338 @@ +--- +title: Configure Docling +description: How to set up the Docling driver against a self-hosted or remote docling instance, configure OCR, PDF backend and table extraction, and override options on a per-document basis. +--- + +# How to Configure Docling + +This guide shows you how to configure the Docling driver for document processing using a [docling-serve](https://github.com/docling-project/docling-serve) instance. + +## Prerequisites + +- Parxy installed with Docling support: `pip install parxy[docling]` or, using uv, `uv add parxy[docling]` +- At least 10 GB of free disk space to run Docling locally, or access to a docling-serve instance running remotely + +## Quick Start + +### Step 1: Start Docling + +Parxy comes with a sample Docker Compose file to run Docling. 
Generate it in your current directory with: + +```bash +parxy docker +``` + +Then pull the image and start the service: + +```bash +docker compose pull docling && docker compose up -d docling +``` + +It may take some minutes to download and start Docling as the image (`ghcr.io/docling-project/docling-serve-cu128:v1.18.0`) is about 10 GB after download. + + +### Step 2: Parse a Document + +Parse using the command line or as a + + +```python +from parxy_core.facade.parxy import Parxy + +doc = Parxy.parse("document.pdf", driver_name="docling") +print(f"Processed {len(doc.pages)} pages") +``` + +No `.env` configuration is required when docling-serve is running on the default address (`http://localhost:5001`). + +## Configuration Options + +### Environment Variables + +All Docling configuration uses environment variables with the `PARXY_DOCLING_` prefix: + +#### Connection + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `PARXY_DOCLING_BASE_URL` | string | `http://localhost:5001` | Base URL of the docling-serve instance | +| `PARXY_DOCLING_API_KEY` | string | None | API key for authenticated docling-serve instances | +| `PARXY_DOCLING_TIMEOUT` | float | `240.0` | HTTP request timeout in seconds | + +#### Extraction + +| Variable | Type | Default | Description | +|----------|------|---------|-------------| +| `PARXY_DOCLING_DO_OCR` | bool | `false` | Enable OCR on bitmap content (slower but handles scanned PDFs) | +| `PARXY_DOCLING_DO_TABLE_STRUCTURE` | bool | `true` | Extract table structure | +| `PARXY_DOCLING_PDF_BACKEND` | string | `docling_parse` | PDF backend: `docling_parse`, `pypdfium2` | +| `PARXY_DOCLING_TABLE_MODE` | string | `accurate` | Table extraction mode: `fast` or `accurate` | +| `PARXY_DOCLING_INCLUDE_IMAGES` | bool | `false` | Include images in output | +| `PARXY_DOCLING_IMAGES_SCALE` | float | None | Scale factor for images (server default: 2.0) | +| `PARXY_DOCLING_DO_PICTURE_CLASSIFICATION` | bool | 
`false` | Classify pictures in documents | +| `PARXY_DOCLING_DO_PICTURE_DESCRIPTION` | bool | `false` | Generate descriptions for pictures (requires a VLM configured on the server) | + +### Example `.env` file + +```bash +PARXY_DOCLING_BASE_URL=http://docling-server:5001 +PARXY_DOCLING_API_KEY=your-secret-key +PARXY_DOCLING_DO_OCR=false +PARXY_DOCLING_PDF_BACKEND=docling_parse +PARXY_DOCLING_TABLE_MODE=accurate +PARXY_DOCLING_INCLUDE_IMAGES=false +``` + +## Supported Extraction Levels + +| Level | Description | +|-------|-------------| +| `page` | Page-level text only — text items are concatenated per page | +| `block` | Page + individual blocks (`TextBlock`, `TableBlock`, `ImageBlock`) with bounding boxes | + +```python +# Page-level extraction (default) +doc = Parxy.parse("document.pdf", driver_name="docling", level="page") + +# Block-level extraction +doc = Parxy.parse("document.pdf", driver_name="docling", level="block") +``` + +## Input Types + +The Docling driver accepts all standard Parxy input types. + +### Local Files + +The file is read and sent to docling-serve as a base64-encoded payload: + +```python +doc = Parxy.parse("/path/to/document.pdf", driver_name="docling") +``` + +### URLs + +The URL is passed directly to docling-serve, which downloads the document server-side: + +```python +doc = Parxy.parse("https://arxiv.org/pdf/2206.01062", driver_name="docling") +``` + +## Per-Call Configuration Overrides + +You can override any extraction option for a specific document by passing kwargs to `Parxy.parse()`. This is useful when most documents use the default configuration but some need different settings. 
+ +```python +from parxy_core.facade.parxy import Parxy + +# Default configuration +doc1 = Parxy.parse("digital-pdf.pdf", driver_name="docling") + +# Enable OCR for a scanned document +doc2 = Parxy.parse( + "scanned-invoice.pdf", + driver_name="docling", + do_ocr=True, +) + +# Faster table extraction for a document with simple tables +doc3 = Parxy.parse( + "report.pdf", + driver_name="docling", + table_mode="fast", +) + +# Include images in the output +doc4 = Parxy.parse( + "illustrated-manual.pdf", + driver_name="docling", + level="block", + include_images=True, +) +``` + +### Supported Per-Call Options + +| Option | Type | Description | +|--------|------|-------------| +| `do_ocr` | bool | Enable OCR on bitmap content | +| `pdf_backend` | string | PDF backend (`docling_parse`, `pypdfium2`) | +| `table_mode` | string | Table extraction mode (`fast` or `accurate`) | +| `include_images` | bool | Include images in output | +| `images_scale` | float | Scale factor for extracted images | +| `do_picture_classification` | bool | Classify pictures in documents | +| `do_picture_description` | bool | Generate descriptions for pictures | + +## Document Structure Roles + +Docling labels each extracted element with a semantic category. 
Parxy maps these to WAI-ARIA document structure roles: + +| Docling Label | WAI-ARIA Role | Description | +|---------------|---------------|-------------| +| `title` | `doc-title` | Document title | +| `section_header` | `heading` | Section headings | +| `paragraph` | `paragraph` | Main body text | +| `list_item` | `list` | List items | +| `code` | `generic` | Code blocks | +| `formula` | `generic` | Mathematical formulas | +| `caption` | `generic` | Figure and table captions | +| `footnote` | `doc-footnote` | Footnotes | +| `page_header` | `doc-pageheader` | Page headers | +| `page_footer` | `doc-pagefooter` | Page footers | +| `table` | `table` | Tables | +| `picture` | `figure` | Images and figures | +| `chart` | `figure` | Charts | + +Access roles in your code: + +```python +doc = Parxy.parse("document.pdf", driver_name="docling", level="block") + +for page in doc.pages: + for block in page.blocks: + print(f"Role: {block.role}, Category: {block.category}") + if block.role == "heading": + print(f" Heading level: {block.level}") +``` + +## Bounding Boxes + +Each block includes bounding box coordinates derived from the Docling JSON output: + +```python +doc = Parxy.parse("document.pdf", driver_name="docling", level="block") + +for page in doc.pages: + print(f"Page {page.number} dimensions: {page.width} x {page.height}") + if page.blocks: + for block in page.blocks: + if block.bbox: + print(f" Block at ({block.bbox.x0:.1f}, {block.bbox.y0:.1f}) " + f"to ({block.bbox.x1:.1f}, {block.bbox.y1:.1f})") +``` + +## Use Cases + +### Scanned Documents + +For documents that are image-based (scanned pages with no embedded text), enable OCR: + +```python +doc = Parxy.parse( + "scanned-contract.pdf", + driver_name="docling", + do_ocr=True, +) +``` + +> **Note**: OCR is significantly slower and more resource-intensive than text extraction. 
_EasyOCR_ is the default OCR engine in docling-serve. + +### Documents with Complex Tables + +Docling's `accurate` table mode uses TableFormer, a deep learning model for precise table structure extraction: + +```python +doc = Parxy.parse( + "financial-report.pdf", + driver_name="docling", + level="block", + table_mode="accurate", # default; use "fast" for simple tables +) + +# Tables are extracted as TableBlock with markdown text +for page in doc.pages: + if page.blocks: + for block in page.blocks: + if block.role == "table": + print(block.text) # Markdown table format +``` + +### Illustrated Documents + +To include images alongside text in the extracted output: + +```python +doc = Parxy.parse( + "illustrated-guide.pdf", + driver_name="docling", + level="block", + include_images=True, +) + +from parxy_core.models import ImageBlock + +for page in doc.pages: + if page.blocks: + for block in page.blocks: + if isinstance(block, ImageBlock): + print(f"Image on page {page.number}: {block.alt_text}") +``` + +### Filtering by Block Role + +Extract only main body text, skipping headers and footers: + +```python +doc = Parxy.parse("document.pdf", driver_name="docling", level="block") + +skip_roles = {"doc-pageheader", "doc-pagefooter", "doc-footnote"} +body_blocks = [ + block + for page in doc.pages + if page.blocks + for block in page.blocks + if block.role not in skip_roles +] +``` + +## Troubleshooting + +### Connection Errors + +If you see `Cannot connect to Docling server`: + +1. Verify docling-serve is running: `curl http://localhost:5001/health` +2. Check the `PARXY_DOCLING_BASE_URL` value matches the actual address +3. Ensure no firewall or network policy blocks the port + +### TaskNotFound Errors + +Parxy uses the async API to process documents. +With local workers, the task queue is held in memory by each Uvicorn worker process. With multiple `UVICORN_WORKERS`, a polling request may +reach a worker process other than the one that holds the task. 
+ +When using the `local` queue, ensure that `UVICORN_WORKERS` is set to `1`, as the workers do not share the queue. +If you need multiple workers, use Redis or another shared queue. + + +``` +UVICORN_WORKERS=1 +``` + + +### Authentication Errors + +If you see `AuthenticationException`: + +1. Verify `PARXY_DOCLING_API_KEY` matches the key set in `DOCLING_SERVE_API_KEY` on the server +2. Ensure the key is set in your `.env` file or environment before starting your application + +### Timeout Errors + +For large documents or slow hardware, the default 240-second timeout may not be enough: + +```bash +PARXY_DOCLING_TIMEOUT=300 +``` + +### Errors without a Message + +docling-serve returns a generic failure when a document has more pages than `DOCLING_SERVE_MAX_NUM_PAGES` (set to `100` in the sample Compose file). + + +## See Also + +- [docling-serve GitHub Repository](https://github.com/docling-project/docling-serve) +- [Docling Project](https://github.com/docling-project/docling) +- [Document Structure Roles](../explanation/document-roles.md) +- [Getting Started Tutorial](../tutorials/getting_started.md) diff --git a/docs/supported_services.md b/docs/supported_services.md index 202b302..c9c6fb0 100644 --- a/docs/supported_services.md +++ b/docs/supported_services.md @@ -19,6 +19,7 @@ Parxy supports the following document processing services and libraries. The **E | [**Pypdfium2**](https://github.com/pypdfium2-team/pypdfium2) | Preview | `pypdfium2` | ✅ | ✅ | | [**pdfplumber**](https://github.com/jsvine/pdfplumber) | Preview | `pdfplumber` | ✅ | ✅ | | [**PDFMiner**](https://github.com/pdfminer/pdfminer.six) | Preview | `pdfminer` | ✅ | ✅ | +| [**Docling**](https://docling-project.github.io/docling/) | Preview | `docling` | ✅ | ✅ | Status meanings: **Live** = stable; **Preview** = functional but the API may change. 
diff --git a/pyproject.toml b/pyproject.toml index 7f7f004..582affe 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -21,7 +21,7 @@ dependencies = [ "opentelemetry-exporter-otlp>=1.37.0", "opentelemetry-proto>=1.37.0", "opentelemetry-sdk>=1.37.0", - + "httpx>=0.28.0", ] [project.scripts] @@ -55,6 +55,9 @@ pdfplumber = [ pdfminer = [ "pdfminer.six>=20251230", ] +docling = [ + "docling-slim[service-client]>=2.93.0", +] all = [ "llama-cloud>=2.0.0", "llmwhisperer-client>=2.4.2", @@ -64,6 +67,7 @@ all = [ "pypdfium2>=5.7.1", "pdfplumber>=0.11.0", "pdfminer.six>=20251230", + "docling-slim[service-client]>=2.93.0", ] diff --git a/src/parxy_cli/compose.example.yaml b/src/parxy_cli/compose.example.yaml index d9d4fb1..5f0d63e 100644 --- a/src/parxy_cli/compose.example.yaml +++ b/src/parxy_cli/compose.example.yaml @@ -1,12 +1,67 @@ services: - ## PDFAct service - pdfact: - image: "ghcr.io/data-house/pdfact:main" - ports: - - "4567:4567" - networks: - - parxy + ## PDFAct service + pdfact: + image: "ghcr.io/data-house/pdfact:main" + ports: + - "4567:4567" + networks: + - parxy + + docling: + image: ghcr.io/docling-project/docling-serve-cu128:v1.18.0 + restart: unless-stopped + # TODO: add a volume where models are downloaded so the next start is faster + ports: + - "5001:5001" + environment: + # Configuration https://github.com/docling-project/docling-serve/blob/main/docs/configuration.md + # Server settings + # Force 1 worker when DOCLING_SERVE_ENG_KIND=local. 
Task queue is in memory and not shared between worker processes + - UVICORN_WORKERS=1 + - UVICORN_TIMEOUT_KEEP_ALIVE=1800 + - DOCLING_SERVE_LOG_LEVEL=debug + # Models + - DOCLING_SERVE_LOAD_MODELS_AT_BOOT=True + # UI + - DOCLING_SERVE_ENABLE_UI=1 + # timeouts + - DOCLING_SERVE_MAX_DOCUMENT_TIMEOUT=1800 + - DOCLING_SERVE_MAX_SYNC_WAIT=180 # 3 minutes + # GPU configuration + - DOCLING_DEVICE=cuda + - DOCLING_CUDA_USE_FLASH_ATTENTION2=true + # Worker settings (single GPU - local queue - avoid contention) + - DOCLING_SERVE_ENG_KIND=local + - DOCLING_SERVE_ENG_LOC_NUM_WORKERS=1 + - DOCLING_SERVE_ENG_LOC_SHARE_MODELS=true + # Batch sizes https://docling-project.github.io/docling/usage/gpu/ + - DOCLING_PERF_PAGE_BATCH_SIZE=4 + - DOCLING_PERF_ELEMENTS_BATCH_SIZE=8 + - DOCLING_SERVE_LAYOUT_BATCH_SIZE=16 + - DOCLING_SERVE_TABLE_BATCH_SIZE=4 + - DOCLING_SERVE_OCR_BATCH_SIZE=4 + - DOCLING_SERVE_MAX_NUM_PAGES=100 # maximum number of pages to process + # CPU threading + - DOCLING_NUM_THREADS=4 + deploy: + resources: + limits: + memory: 16g + reservations: + memory: 8g + devices: + - driver: nvidia + count: 1 + capabilities: [gpu] + healthcheck: + test: ["CMD-SHELL", "curl -f http://localhost:5001/health || exit 1"] + interval: 60s + timeout: 10s + retries: 5 + start_period: 180s # Models take time to load + + ## Open Telemetry collector ## To enable traces collection set PARXY_TRACING_ENABLE=True @@ -21,5 +76,5 @@ services: # command: ["--config=/etc/otelcol-contrib/config.yaml"] networks: - parxy: - driver: bridge + parxy: + driver: bridge diff --git a/src/parxy_core/drivers/__init__.py b/src/parxy_core/drivers/__init__.py index 812cf15..110555b 100644 --- a/src/parxy_core/drivers/__init__.py +++ b/src/parxy_core/drivers/__init__.py @@ -13,3 +13,4 @@ ) from parxy_core.drivers.pdfplumber import PDFPlumberDriver as PDFPlumberDriver from parxy_core.drivers.pdfminer import PDFMinerDriver as PDFMinerDriver +from parxy_core.drivers.docling import DoclingDriver as DoclingDriver diff 
--git a/src/parxy_core/drivers/abstract_driver.py b/src/parxy_core/drivers/abstract_driver.py index 38180c6..d9a61b6 100644 --- a/src/parxy_core/drivers/abstract_driver.py +++ b/src/parxy_core/drivers/abstract_driver.py @@ -139,9 +139,8 @@ def parse( return document except Exception as ex: - self._logger.error( + self._logger.debug( f'Error while parsing file {file if isinstance(file, str) else "stream"} using {self.__class__.__name__}', - exc_info=True, ) tracer.count( diff --git a/src/parxy_core/drivers/docling.py b/src/parxy_core/drivers/docling.py new file mode 100644 index 0000000..6b7977c --- /dev/null +++ b/src/parxy_core/drivers/docling.py @@ -0,0 +1,533 @@ +"""Docling Serve backend driver for parxy. + +IBM Docling Serve provides high-quality PDF to JSON conversion via a REST API. +Requires a running docling-serve instance. + +Documentation: https://github.com/docling-project/docling-serve + +Configuration: + Environment variables: PARXY_DOCLING_BASE_URL, PARXY_DOCLING_DO_OCR, etc. 
+ Config file: .env file with parxy_docling_ prefix + +Example .env configuration: + PARXY_DOCLING_BASE_URL=http://localhost:5001 + PARXY_DOCLING_API_KEY=secret + PARXY_DOCLING_DO_OCR=false + PARXY_DOCLING_PDF_BACKEND=docling_parse + PARXY_DOCLING_TABLE_MODE=accurate + PARXY_DOCLING_INCLUDE_IMAGES=false +""" + +import io +import time +from typing import Optional + +import validators + +from parxy_core.drivers import Driver +from parxy_core.exceptions import AuthenticationException, ParsingException +from parxy_core.models import ( + BoundingBox, + Document, + ImageBlock, + Page, + TableBlock, + TextBlock, +) + +try: + from docling.service_client import DoclingServiceClient + from docling.service_client.client import ExperimentalWarning + from docling.service_client import StatusWatcherKind + from docling.service_client.exceptions import ( + DoclingServiceClientError, + ServiceError, + TaskNotFoundError, + TaskTimeoutError, + ) +except ImportError: + DoclingServiceClient = None # type: ignore[assignment,misc] + ExperimentalWarning = None # type: ignore[assignment,misc] + DoclingServiceClientError = None # type: ignore[assignment,misc] + ServiceError = None # type: ignore[assignment,misc] + TaskNotFoundError = None # type: ignore[assignment,misc] + TaskTimeoutError = None # type: ignore[assignment,misc] + StatusWatcherKind = None # type: ignore[assignment,misc] + +# Grace window to absorb 404s while the server propagates the submitted task +_TASK_NOT_FOUND_GRACE = 10.0 + +_PER_CALL_OPTIONS = frozenset( + { + 'do_ocr', + 'pdf_backend', + 'table_mode', + 'include_images', + 'images_scale', + 'do_picture_classification', + 'do_picture_description', + } +) + +DOCLING_LABEL_TO_ROLE: dict[str, str] = { + 'title': 'doc-title', + 'section_header': 'heading', + 'paragraph': 'paragraph', + 'list_item': 'list', + 'code': 'generic', + 'formula': 'generic', + 'caption': 'generic', + 'footnote': 'doc-footnote', + 'page_header': 'doc-pageheader', + 'page_footer': 'doc-pagefooter', 
+ 'table': 'table', + 'picture': 'figure', + 'chart': 'figure', + 'document_index': 'generic', + 'checkbox_selected': 'generic', + 'checkbox_unselected': 'generic', +} + +DOCLING_LOCAL_URL = 'http://localhost:5001' + + +class DoclingDriver(Driver): + """PDF parser using IBM Docling Serve API. + + Calls a running docling-serve instance for document processing. + Docling uses deep learning models for document understanding. + + By default, OCR is DISABLED. Enable via: + - Environment: PARXY_DOCLING_DO_OCR=true + - Config: parxy_docling_do_ocr=true in .env + + Requires a running docling-serve instance. Quick start: + docker run -p 5001:5001 ghcr.io/docling-project/docling-serve-cu128:v1.18.0 + + Per-call options (passed as kwargs to parse()): + do_ocr, pdf_backend, table_mode, include_images, images_scale, + do_picture_classification, do_picture_description + """ + + supported_levels = ['page', 'block'] + + def _initialize_driver(self): + if DoclingServiceClient is None: + raise ImportError( + 'docling is required. 
Install with: pip install parxy[docling]' + ) + + if self._config: + self._base_url = getattr( + self._config, 'base_url', DOCLING_LOCAL_URL + ).rstrip('/') + self._api_key = getattr(self._config, 'api_key', None) + # Fallback matches the documented DoclingConfig default (240 s) + self._timeout = getattr(self._config, 'timeout', 240.0) + self._poll_wait = getattr(self._config, 'poll_wait', 5.0) + else: + self._base_url = DOCLING_LOCAL_URL + self._api_key = None + self._timeout = 240.0 + self._poll_wait = 5.0 + + return self + + def _get_opt(self, overrides: dict, key: str, default=None): + if key in overrides: + return overrides[key] + if self._config and hasattr(self._config, key): + val = getattr(self._config, key) + if val is not None: + return val + return default + + def _get_api_key_str(self) -> str: + if self._api_key is None: + return '' + if hasattr(self._api_key, 'get_secret_value'): + return self._api_key.get_secret_value() + return str(self._api_key) + + def _build_docling_options(self, overrides: dict): + from docling.datamodel.base_models import OutputFormat + from docling.datamodel.service.options import ConvertDocumentsOptions + + return ConvertDocumentsOptions( + to_formats=[OutputFormat.JSON], + do_ocr=self._get_opt(overrides, 'do_ocr', False), + # Read from config so PARXY_DOCLING_DO_TABLE_STRUCTURE is honoured + do_table_structure=self._get_opt(overrides, 'do_table_structure', True), + pdf_backend=self._get_opt(overrides, 'pdf_backend', 'docling_parse'), + table_mode=self._get_opt(overrides, 'table_mode', 'accurate'), + include_images=self._get_opt(overrides, 'include_images', False), + images_scale=self._get_opt(overrides, 'images_scale', 2.0), + abort_on_error=False, + do_picture_classification=self._get_opt( + overrides, 'do_picture_classification', False + ), + do_picture_description=self._get_opt( + overrides, 'do_picture_description', False + ), + ) + + def _handle( + self, file: str | io.BytesIO | bytes, level: str = 'page', **kwargs + ) -> Document: + """Parse a document via the Docling Serve API (blocking). 
+ + Uses the official docling-serve client SDK, which handles async job + submission, polling, and result retrieval transparently. + + Parameters + ---------- + file : str | io.BytesIO | bytes + Path, URL or stream of the file to parse. + level : str, optional + Desired extraction level. Default is "page". + **kwargs : dict + Per-call configuration overrides. Supported options: + + - do_ocr: Enable OCR on bitmap content + - pdf_backend: PDF backend (docling_parse, pypdfium2, etc.) + - table_mode: Table extraction mode ('fast' or 'accurate') + - include_images: Include images in output (default False) + - images_scale: Scale factor for images (default 2.0) + - do_picture_classification: Classify pictures + - do_picture_description: Generate picture descriptions + + Returns + ------- + Document + A parsed Document in unified format. + """ + import json + import warnings + from pathlib import Path + from docling_core.types.io import DocumentStream + from docling.datamodel.base_models import ConversionStatus + + overrides = {k: v for k, v in kwargs.items() if k in _PER_CALL_OPTIONS} + is_url = isinstance(file, str) and validators.url(file) is True + + if is_url: + filename = file + stream_for_trace = file.encode('utf-8') + source: str | DocumentStream = file + else: + filename, stream_for_trace = self.handle_file_input(file) + name = Path(filename).name if filename else 'document.pdf' + source = DocumentStream(name=name, stream=io.BytesIO(stream_for_trace)) + + with self._trace_parse(filename, stream_for_trace, **kwargs) as span: + options = self._build_docling_options(overrides) + + span.set_attribute('docling.do_ocr', options.do_ocr) + span.set_attribute('docling.pdf_backend', str(options.pdf_backend.value)) + span.set_attribute('docling.table_mode', str(options.table_mode.value)) + span.set_attribute('docling.include_images', options.include_images) + span.set_attribute('docling.images_scale', options.images_scale) + if options.do_picture_classification: + 
span.set_attribute('docling.do_picture_classification', True) + if options.do_picture_description: + span.set_attribute('docling.do_picture_description', True) + + try: + with warnings.catch_warnings(): + warnings.simplefilter('ignore', ExperimentalWarning) + client = DoclingServiceClient( + url=self._base_url, + api_key=self._get_api_key_str(), + poll_server_wait=self._poll_wait, + poll_client_interval=self._poll_wait, + job_timeout=self._timeout, + status_watcher=StatusWatcherKind.POLLING, + ) + + job = client.submit(source, options=options) + except ServiceError as e: + if e.status_code == 401: + raise AuthenticationException( + 'Authentication failed. Check PARXY_DOCLING_API_KEY.', + self.__class__, + ) from e + raise ParsingException( + f'Docling service error: {e}', + self.__class__, + ) from e + except DoclingServiceClientError as e: + raise ParsingException( + f'Network error communicating with Docling server at {self._base_url}: {e}', + self.__class__, + ) from e + + span.set_attribute('docling.task_id', job.task_id) + + # Drive polling manually so TaskNotFoundError during task propagation + # is absorbed within a grace window rather than surfacing immediately. 
+ deadline = time.monotonic() + self._timeout + not_found_grace = time.monotonic() + _TASK_NOT_FOUND_GRACE + + while True: + remaining = deadline - time.monotonic() + if remaining <= 0: + raise ParsingException( + f'Docling processing timed out after {self._timeout}s' + f' (task {job.task_id})', + self.__class__, + ) + + poll_started = time.monotonic() + try: + job.poll(wait=min(self._poll_wait, remaining)) + except TaskNotFoundError as e: + grace_remaining = not_found_grace - time.monotonic() + if grace_remaining > 0: + time.sleep(min(self._poll_wait, grace_remaining)) + continue + raise ParsingException( + f'Docling task not found (task {job.task_id})', + self.__class__, + ) from e + except DoclingServiceClientError as e: + raise ParsingException( + f'Docling client error: {e}', + self.__class__, + ) from e + + if job.done: + break + + # Client-side sleep to pace polling when server ignores wait param + poll_elapsed = time.monotonic() - poll_started + sleep_for = max(0.0, self._poll_wait - poll_elapsed) + if sleep_for > 0: + time.sleep(sleep_for) + + try: + result = job.result() + except DoclingServiceClientError as e: + raise ParsingException( + f'Docling client error fetching result: {e}', + self.__class__, + ) from e + + if result.status not in ( + ConversionStatus.SUCCESS, + ConversionStatus.PARTIAL_SUCCESS, + ): + errors = [err.error_message for err in result.errors] + raise ParsingException( + f'Docling processing failed: {errors}', self.__class__ + ) + + if result.document is None: + raise ParsingException( + 'Docling API returned no document', self.__class__ + ) + + doc_dict = json.loads(result.document.model_dump_json()) + document = _docling_json_to_document( + doc_dict, filename=filename, level=level + ) + span.set_attribute('output.pages', len(document.pages)) + + return document + + +def _docling_json_to_document( + json_content: dict, filename: str, level: str +) -> Document: + """Convert a Docling JSON document to a parxy Document.""" + 
items_by_ref: dict[str, tuple[str, dict]] = {} + for i, item in enumerate(json_content.get('texts', [])): + items_by_ref[item.get('self_ref', f'#/texts/{i}')] = ('text', item) + for i, item in enumerate(json_content.get('tables', [])): + items_by_ref[item.get('self_ref', f'#/tables/{i}')] = ('table', item) + for i, item in enumerate(json_content.get('pictures', [])): + items_by_ref[item.get('self_ref', f'#/pictures/{i}')] = ('picture', item) + + groups_by_ref: dict[str, dict] = { + g.get('self_ref', f'#/groups/{i}'): g + for i, g in enumerate(json_content.get('groups', [])) + } + + ordered: list[tuple[str, dict]] = [] + _traverse(json_content.get('body', {}), items_by_ref, groups_by_ref, ordered) + + # Fallback when body is missing or empty: sort by page then top-most bbox + # (descending t = top-to-bottom in Docling BOTTOMLEFT coordinates) + if not ordered: + all_items: list = [] + for _, (item_type, item_data) in items_by_ref.items(): + prov = item_data.get('prov', [{}]) + p = prov[0] if prov else {} + all_items.append( + ( + p.get('page_no', 1), + -p.get('bbox', {}).get('t', 0.0), + item_type, + item_data, + ) + ) + all_items.sort(key=lambda x: (x[0], x[1])) + ordered = [(t, d) for _, _, t, d in all_items] + + items_by_page: dict[int, list[tuple[str, dict]]] = {} + for item_type, item_data in ordered: + prov = item_data.get('prov', [{}]) + page_no = (prov[0] if prov else {}).get('page_no', 1) + items_by_page.setdefault(page_no, []).append((item_type, item_data)) + + raw_pages: dict[str, dict] = json_content.get('pages', {}) + if raw_pages: + page_nos = sorted(int(k) for k in raw_pages.keys()) + elif items_by_page: + page_nos = sorted(items_by_page.keys()) + else: + page_nos = [] + + do_blocks = level == 'block' + pages: list[Page] = [] + + for page_no in page_nos: + raw_page = raw_pages.get(str(page_no), {}) + size = raw_page.get('size', {}) + width: Optional[float] = size.get('width') + height: Optional[float] = size.get('height') + page_items = 
items_by_page.get(page_no, []) + + if do_blocks: + blocks: list = [] + text_parts: list[str] = [] + for item_type, item_data in page_items: + if item_type == 'text': + block = _make_text_block(item_data, page_no) + blocks.append(block) + if block.text: + text_parts.append(block.text) + elif item_type == 'table': + block = _make_table_block(item_data, page_no) + blocks.append(block) + if block.text: + text_parts.append(block.text) + elif item_type == 'picture': + blocks.append(_make_image_block(item_data, page_no)) + pages.append( + Page( + number=page_no, + width=width, + height=height, + text='\n'.join(text_parts), + blocks=blocks if blocks else None, + ) + ) + else: + text_parts = [] + for item_type, item_data in page_items: + if item_type == 'text': + text = item_data.get('text', '') or '' + if text: + text_parts.append(text) + elif item_type == 'table': + md = _table_to_markdown(item_data) + if md: + text_parts.append(md) + pages.append( + Page( + number=page_no, + width=width, + height=height, + text='\n'.join(text_parts), + blocks=None, + ) + ) + + return Document(filename=filename, pages=pages) + + +def _traverse( + node: dict, + items_by_ref: dict[str, tuple[str, dict]], + groups_by_ref: dict[str, dict], + result: list[tuple[str, dict]], +) -> None: + for child in node.get('children', []): + ref = child.get('$ref', '') + if ref in items_by_ref: + result.append(items_by_ref[ref]) + elif ref in groups_by_ref: + _traverse(groups_by_ref[ref], items_by_ref, groups_by_ref, result) + + +def _extract_bbox(prov_list: list) -> Optional[BoundingBox]: + if not prov_list: + return None + bbox = prov_list[0].get('bbox', {}) + if not bbox: + return None + # Docling BOTTOMLEFT coords: l=left, r=right, t=top-y, b=bottom-y + return BoundingBox( + x0=bbox.get('l', 0.0), + y0=bbox.get('b', 0.0), + x1=bbox.get('r', 0.0), + y1=bbox.get('t', 0.0), + ) + + +def _make_text_block(item: dict, page_no: int) -> TextBlock: + label = item.get('label', 'paragraph') + role = 
DOCLING_LABEL_TO_ROLE.get(label, 'paragraph') + return TextBlock( + type='text', + role=role, + category=label, + level=item.get('level'), + text=item.get('text', '') or '', + bbox=_extract_bbox(item.get('prov', [])), + page=page_no, + ) + + +def _make_table_block(item: dict, page_no: int) -> TableBlock: + label = item.get('label', 'table') + role = DOCLING_LABEL_TO_ROLE.get(label, 'table') + return TableBlock( + type='table', + role=role, + category=label, + text=_table_to_markdown(item), + bbox=_extract_bbox(item.get('prov', [])), + page=page_no, + ) + + +def _make_image_block(item: dict, page_no: int) -> ImageBlock: + label = item.get('label', 'picture') + role = DOCLING_LABEL_TO_ROLE.get(label, 'figure') + captions = item.get('captions', []) + alt_text = captions[0].get('text', '') if captions else None + return ImageBlock( + type='image', + role=role, + category=label, + alt_text=alt_text or None, + bbox=_extract_bbox(item.get('prov', [])), + page=page_no, + ) + + +def _table_to_markdown(table_item: dict) -> str: + grid = table_item.get('data', {}).get('grid', []) + if not grid: + return '' + + rows = [] + for row in grid: + cells = [cell.get('text', '') if cell else '' for cell in row] + rows.append('| ' + ' | '.join(cells) + ' |') + + if rows: + num_cols = len(grid[0]) if grid[0] else 0 + separator = '| ' + ' | '.join(['---'] * num_cols) + ' |' + rows.insert(1, separator) + + return '\n'.join(rows) diff --git a/src/parxy_core/drivers/factory.py b/src/parxy_core/drivers/factory.py index 1e48bc5..af05ed5 100644 --- a/src/parxy_core/drivers/factory.py +++ b/src/parxy_core/drivers/factory.py @@ -12,6 +12,7 @@ from parxy_core.drivers.pypdfium2 import PyPDFium2Driver from parxy_core.drivers.pdfplumber import PDFPlumberDriver from parxy_core.drivers.pdfminer import PDFMinerDriver +from parxy_core.drivers.docling import DoclingDriver from parxy_core.models import ( PdfActConfig, LandingAIConfig, @@ -19,6 +20,7 @@ LlmWhispererConfig, UnstructuredLocalConfig, 
     ParxyConfig,
+    DoclingConfig,
 )
 from parxy_core.logging import create_isolated_logger
 from parxy_core.tracing import tracer
@@ -221,6 +223,9 @@ def _create_pdfplumber_driver(self) -> PDFPlumberDriver:
     def _create_pdfminer_driver(self) -> PDFMinerDriver:
         return PDFMinerDriver(logger=self._logger)
 
+    def _create_docling_driver(self) -> DoclingDriver:
+        return DoclingDriver(config=DoclingConfig(), logger=self._logger)
+
     def _create_landingai_driver(self) -> LandingAIADEDriver:
         """Create a LandingAI ADE Driver instance.
 
@@ -298,6 +303,7 @@ def get_supported_drivers(self) -> List[str]:
             'pypdfium',
             'pdfplumber',
             'pdfminer',
+            'docling',
         ]
 
         return supported_drivers
diff --git a/src/parxy_core/logging/logger.py b/src/parxy_core/logging/logger.py
index 904b85a..2fded7e 100644
--- a/src/parxy_core/logging/logger.py
+++ b/src/parxy_core/logging/logger.py
@@ -3,6 +3,15 @@
 from datetime import datetime
 
 
+class ParxyLogger(logging.Logger):
+    """Logger that includes stack traces only when DEBUG level is active."""
+
+    def error(self, msg, *args, **kwargs):
+        if 'exc_info' not in kwargs:
+            kwargs['exc_info'] = self.isEnabledFor(logging.DEBUG)
+        super().error(msg, *args, **kwargs)
+
+
 def create_isolated_logger(
     name: str,
     level: int = logging.ERROR,
@@ -28,7 +37,7 @@ def create_isolated_logger(
         Configured logger instance
     """
 
-    logger = logging.getLogger(name)
+    logger = ParxyLogger(name)
     logger.setLevel(level)
     logger.propagate = propagate
     logger.handlers.clear()
diff --git a/src/parxy_core/models/__init__.py b/src/parxy_core/models/__init__.py
index 8d91e2e..60dc8f4 100644
--- a/src/parxy_core/models/__init__.py
+++ b/src/parxy_core/models/__init__.py
@@ -27,4 +27,5 @@
     LandingAIConfig as LandingAIConfig,
     LlmWhispererConfig as LlmWhispererConfig,
     UnstructuredLocalConfig as UnstructuredLocalConfig,
+    DoclingConfig as DoclingConfig,
 )
diff --git a/src/parxy_core/models/config.py b/src/parxy_core/models/config.py
index 905ebe3..1261a5a 100644
--- a/src/parxy_core/models/config.py
+++ b/src/parxy_core/models/config.py
@@ -221,3 +221,47 @@ class UnstructuredLocalConfig(BaseConfig):
     model_config = SettingsConfigDict(
         env_prefix='parxy_unstructured_local_', env_file='.env', extra='ignore'
     )
+
+
+class DoclingConfig(BaseConfig):
+    """Configuration values for Docling Serve. All env variables must start with `parxy_docling_`"""
+
+    base_url: str = 'http://localhost:5001'
+    """The base URL of the Docling Serve API."""
+
+    api_key: Optional[SecretStr] = Field(exclude=True, default=None)
+    """Optional API key for authenticated docling-serve instances."""
+
+    timeout: float = 240.0
+    """HTTP request timeout in seconds. Default 240."""
+
+    do_ocr: Optional[bool] = False
+    """Enable OCR on bitmap content. Default False."""
+
+    do_table_structure: Optional[bool] = True
+    """Enable table structure extraction. Default True."""
+
+    pdf_backend: Optional[str] = 'docling_parse'
+    """PDF backend to use. Options: docling_parse, pypdfium2. Default docling_parse."""
+
+    table_mode: Optional[str] = 'accurate'
+    """Table extraction mode. Options: fast, accurate. Default accurate."""
+
+    include_images: Optional[bool] = False
+    """Include images in output. Default False."""
+
+    images_scale: Optional[float] = None
+    """Scale factor for images. Default None (uses server default of 2.0)."""
+
+    do_picture_classification: Optional[bool] = False
+    """Classify pictures in documents. Default False."""
+
+    do_picture_description: Optional[bool] = False
+    """Generate descriptions for pictures. Default False."""
+
+    poll_wait: float = 10.0
+    """Server-side long-polling wait duration in seconds. 
Default 10.""" + + model_config = SettingsConfigDict( + env_prefix='parxy_docling_', env_file='.env', extra='ignore' + ) diff --git a/tests/drivers/test_docling.py b/tests/drivers/test_docling.py new file mode 100644 index 0000000..6fd591c --- /dev/null +++ b/tests/drivers/test_docling.py @@ -0,0 +1,937 @@ +import json as _json +import os + +import httpx +import pytest +from unittest.mock import Mock, patch, MagicMock + +from parxy_core.models import Page, TextBlock, TableBlock, ImageBlock + +from parxy_core.drivers import DoclingDriver +from parxy_core.exceptions import ( + FileNotFoundException, + AuthenticationException, + ParsingException, +) + +_DOCLING_URL = 'http://localhost:5001' + + +def _is_docling_available() -> bool: + try: + with httpx.Client(timeout=2.0) as client: + client.get(_DOCLING_URL) + return True + except Exception: + return False + + +docling_live = pytest.mark.skipif( + not _is_docling_available(), + reason='Docling Serve not available at localhost:5001', +) + + +def _docling_response(pages: dict, texts=None, tables=None, pictures=None, groups=None): + """Build a minimal DoclingDocument JSON dict (as returned by model_dump_json).""" + texts = texts or [] + tables = tables or [] + pictures = pictures or [] + groups = groups or [] + + body_children = ( + [{'$ref': t['self_ref']} for t in texts] + + [{'$ref': t['self_ref']} for t in tables] + + [{'$ref': p['self_ref']} for p in pictures] + ) + + return { + 'schema_name': 'DoclingDocument', + 'version': '1.0.0', + 'pages': pages, + 'texts': texts, + 'tables': tables, + 'pictures': pictures, + 'groups': groups, + 'body': { + 'self_ref': '#/body', + 'children': body_children, + 'label': 'unspecified', + 'name': 'body', + }, + } + + +def _mock_conversion_result(json_content: dict): + """Build a mock ConversionResult with SUCCESS status.""" + from docling.datamodel.base_models import ConversionStatus + + result = Mock() + result.status = ConversionStatus.SUCCESS + result.errors = [] + mock_doc = 
Mock() + mock_doc.model_dump_json.return_value = _json.dumps(json_content) + result.document = mock_doc + return result + + +def _mock_docling_client(json_content: dict, task_id: str = 'test-task-123'): + """Return a mock DoclingServiceClient instance (to assign to MockCls.return_value). + + The driver polls via job.poll() until job.done, then calls job.result(). + Mock: poll() succeeds immediately, done is truthy (Mock default), result() returns + the conversion result. + """ + mock_client = MagicMock() + mock_job = Mock() + mock_job.task_id = task_id + # poll() returns a truthy Mock (no side_effect → no exception) + # done is a Mock attribute → truthy → loop breaks after first poll + mock_job.result.return_value = _mock_conversion_result(json_content) + mock_client.submit.return_value = mock_job + return mock_client + + +class TestDoclingDriver: + def __fixture_path(self, file: str) -> str: + current_dir = os.path.dirname(os.path.abspath(__file__)) + fixtures_dir = os.path.join(os.path.dirname(current_dir), 'fixtures') + return os.path.join(fixtures_dir, file) + + # ── construction ────────────────────────────────────────────────────────── + + def test_docling_driver_can_be_created(self): + driver = DoclingDriver() + + assert driver.supported_levels == ['page', 'block'] + + # ── level validation ────────────────────────────────────────────────────── + + def test_docling_driver_unrecognized_level_handled(self): + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + + with pytest.raises(ValueError) as excinfo: + driver.parse(path, level='custom') + + assert 'not supported' in str(excinfo.value) + assert '[custom]' in str(excinfo.value) + + # ── file not found ──────────────────────────────────────────────────────── + + def test_docling_driver_handle_not_existing_file(self): + driver = DoclingDriver() + path = self.__fixture_path('non-existing-file.pdf') + + with pytest.raises(FileNotFoundException): + driver.parse(path) + + # ── page-level 
extraction ───────────────────────────────────────────────── + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_read_document_page_level(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + texts=[ + { + 'self_ref': '#/texts/0', + 'text': 'Hello world', + 'label': 'paragraph', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 720.0, + 'r': 540.0, + 'b': 700.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + } + ], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='page') + + assert document is not None + assert document.metadata is None + assert len(document.pages) == 1 + page = document.pages[0] + assert isinstance(page, Page) + assert page.number == 1 + assert page.blocks is None + assert page.text == 'Hello world' + assert page.width == 595.3 + assert page.height == 841.9 + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_read_empty_document_page_level( + self, MockDoclingServiceClient + ): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + texts=[], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='page') + + assert document is not None + assert len(document.pages) == 1 + assert document.pages[0].number == 1 + assert document.pages[0].text == '' + assert document.pages[0].blocks is None + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_keeps_empty_pages(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={ + '1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}, + '2': 
{'size': {'width': 595.3, 'height': 841.9}, 'page_no': 2}, + '3': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 3}, + }, + texts=[ + { + 'self_ref': '#/texts/0', + 'text': 'Page one content', + 'label': 'paragraph', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 720.0, + 'r': 540.0, + 'b': 700.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + }, + { + 'self_ref': '#/texts/1', + 'text': 'Page three content', + 'label': 'paragraph', + 'prov': [ + { + 'page_no': 3, + 'bbox': { + 'l': 72.0, + 't': 720.0, + 'r': 540.0, + 'b': 700.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + }, + ], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='page') + + assert len(document.pages) == 3 + assert document.pages[0].number == 1 + assert document.pages[0].text == 'Page one content' + assert document.pages[1].number == 2 + assert document.pages[1].text == '' + assert document.pages[2].number == 3 + assert document.pages[2].text == 'Page three content' + + # ── block-level extraction ──────────────────────────────────────────────── + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_read_document_block_level(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + texts=[ + { + 'self_ref': '#/texts/0', + 'text': 'Document title', + 'label': 'title', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 800.0, + 'r': 540.0, + 'b': 780.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + }, + { + 'self_ref': '#/texts/1', + 'text': 'Section heading', + 'label': 'section_header', + 'level': 1, + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 760.0, + 'r': 540.0, + 'b': 740.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + }, + { + 'self_ref': '#/texts/2', + 'text': 'A 
paragraph of text.', + 'label': 'paragraph', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 720.0, + 'r': 540.0, + 'b': 700.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + }, + ], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='block') + + assert document is not None + assert len(document.pages) == 1 + page = document.pages[0] + assert page.number == 1 + assert page.blocks is not None + assert len(page.blocks) == 3 + + title_block = page.blocks[0] + assert isinstance(title_block, TextBlock) + assert title_block.role == 'doc-title' + assert title_block.category == 'title' + assert title_block.text == 'Document title' + + heading_block = page.blocks[1] + assert isinstance(heading_block, TextBlock) + assert heading_block.role == 'heading' + assert heading_block.level == 1 + + para_block = page.blocks[2] + assert isinstance(para_block, TextBlock) + assert para_block.role == 'paragraph' + assert para_block.text == 'A paragraph of text.' 
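Reviewer note: the block-level assertions above imply a label-to-role lookup with a fallback for unmapped labels. The following is a minimal sketch of that behaviour, not the driver's code. `LABEL_TO_ROLE` and `role_for` are hypothetical names, and the table holds only the three pairs this test confirms; the driver's real `DOCLING_LABEL_TO_ROLE` is not shown in this part of the diff and may contain more entries.

```python
# Hypothetical subset of the mapping: only the label/role pairs that the
# block-level test above asserts. The real table may be larger.
LABEL_TO_ROLE: dict[str, str] = {
    'title': 'doc-title',
    'section_header': 'heading',
    'paragraph': 'paragraph',
}


def role_for(label: str, fallback: str = 'paragraph') -> str:
    """Return the role mapped to a Docling label, or `fallback` if unmapped."""
    return LABEL_TO_ROLE.get(label, fallback)


if __name__ == '__main__':
    # Known labels resolve through the table; unknown ones use the fallback.
    for label in ('title', 'section_header', 'paragraph', 'unknown_label'):
        print(label, '->', role_for(label))
```

Passing a different `fallback` per block type (for example `'table'` or `'figure'`) mirrors how the block builders in this diff supply their own defaults.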
+ + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_table_block(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + tables=[ + { + 'self_ref': '#/tables/0', + 'label': 'table', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 500.0, + 'r': 540.0, + 'b': 400.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + 'data': { + 'num_rows': 2, + 'num_cols': 2, + 'grid': [ + [{'text': 'Col A'}, {'text': 'Col B'}], + [{'text': 'Val 1'}, {'text': 'Val 2'}], + ], + }, + } + ], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='block') + + assert len(document.pages) == 1 + page = document.pages[0] + assert page.blocks is not None + assert len(page.blocks) == 1 + block = page.blocks[0] + assert isinstance(block, TableBlock) + assert block.role == 'table' + assert '| Col A | Col B |' in block.text + assert '| Val 1 | Val 2 |' in block.text + assert '| --- | --- |' in block.text + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_image_block(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + pictures=[ + { + 'self_ref': '#/pictures/0', + 'label': 'picture', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 100.0, + 't': 600.0, + 'r': 400.0, + 'b': 450.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + 'captions': [{'text': 'Figure 1: A diagram'}], + } + ], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='block') + + assert len(document.pages) == 1 + page = document.pages[0] + assert page.blocks is not None + assert 
len(page.blocks) == 1 + block = page.blocks[0] + assert isinstance(block, ImageBlock) + assert block.role == 'figure' + assert block.alt_text == 'Figure 1: A diagram' + + # ── bounding box ────────────────────────────────────────────────────────── + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_block_bbox(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + texts=[ + { + 'self_ref': '#/texts/0', + 'text': 'text', + 'label': 'paragraph', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 720.0, + 'r': 540.0, + 'b': 700.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + } + ], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='block') + + block = document.pages[0].blocks[0] + assert block.bbox is not None + assert block.bbox.x0 == 72.0 + assert block.bbox.x1 == 540.0 + + # ── API request structure ───────────────────────────────────────────────── + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_sends_json_format(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + driver.parse(path, level='page') + + mock_client = MockDoclingServiceClient.return_value + submit_args = mock_client.submit.call_args + source = submit_args[0][0] + options = submit_args[1]['options'] + + from docling_core.types.io import DocumentStream + + assert isinstance(source, DocumentStream) + assert any(f.value == 'json' for f in options.to_formats) + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def 
test_docling_driver_url_uses_http_sources(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + driver.parse('http://example.com/doc.pdf', level='page') + + mock_client = MockDoclingServiceClient.return_value + submit_args = mock_client.submit.call_args + source = submit_args[0][0] + + assert source == 'http://example.com/doc.pdf' + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_per_call_ocr_override(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + driver.parse(path, level='page', do_ocr=True) + + mock_client = MockDoclingServiceClient.return_value + options = mock_client.submit.call_args[1]['options'] + assert options.do_ocr is True + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_per_call_pdf_backend_override( + self, MockDoclingServiceClient + ): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + driver.parse(path, level='page', pdf_backend='pypdfium2', table_mode='fast') + + mock_client = MockDoclingServiceClient.return_value + options = mock_client.submit.call_args[1]['options'] + assert options.pdf_backend.value == 'pypdfium2' + assert options.table_mode.value == 'fast' + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_include_images_default_false( + self, MockDoclingServiceClient + ): + 
json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + driver.parse(path, level='page') + + mock_client = MockDoclingServiceClient.return_value + options = mock_client.submit.call_args[1]['options'] + assert options.include_images is False + + # ── error handling ──────────────────────────────────────────────────────── + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_auth_error(self, MockDoclingServiceClient): + from docling.service_client.exceptions import ServiceError + + mock_client = MagicMock() + mock_client.submit.side_effect = ServiceError( + message='Unauthorized', status_code=401 + ) + MockDoclingServiceClient.return_value = mock_client + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + + with pytest.raises(AuthenticationException): + driver.parse(path) + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_server_error(self, MockDoclingServiceClient): + from docling.service_client.exceptions import ServiceUnavailableError + + mock_client = MagicMock() + mock_client.submit.side_effect = ServiceUnavailableError( + message='Internal server error', status_code=500 + ) + MockDoclingServiceClient.return_value = mock_client + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + + with pytest.raises(ParsingException): + driver.parse(path) + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_failure_status(self, MockDoclingServiceClient): + from docling.datamodel.base_models import ConversionStatus + + mock_client = MagicMock() + mock_job = Mock() + mock_job.task_id = 'test-task-123' + mock_result = Mock() + mock_result.status = ConversionStatus.FAILURE + mock_result.errors = [] + 
mock_job.result.return_value = mock_result + mock_client.submit.return_value = mock_job + MockDoclingServiceClient.return_value = mock_client + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + + with pytest.raises(ParsingException): + driver.parse(path) + + @patch('parxy_core.drivers.docling._TASK_NOT_FOUND_GRACE', 0.0) + @patch('parxy_core.drivers.docling.time.sleep') + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_poll_task_not_found( + self, MockDoclingServiceClient, mock_sleep + ): + from docling.service_client.exceptions import TaskNotFoundError + + mock_client = MagicMock() + mock_job = Mock() + mock_job.task_id = 'test-task-123' + mock_job.poll.side_effect = TaskNotFoundError('Task not found') + mock_client.submit.return_value = mock_job + MockDoclingServiceClient.return_value = mock_client + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + + with pytest.raises(ParsingException, match='task not found'): + driver.parse(path) + + # Grace period is 0 → immediate failure on the first poll, no sleep + assert mock_job.poll.call_count == 1 + mock_sleep.assert_not_called() + + @patch('parxy_core.drivers.docling.time.sleep') + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_task_not_found_retries_then_succeeds( + self, MockDoclingServiceClient, mock_sleep + ): + from docling.service_client.exceptions import TaskNotFoundError + + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + ) + mock_client = MagicMock() + mock_job = Mock() + mock_job.task_id = 'test-task-123' + # First poll raises TaskNotFoundError; second poll succeeds (returns Mock) + mock_job.poll.side_effect = [TaskNotFoundError('Task not found'), Mock()] + mock_job.result.return_value = _mock_conversion_result(json_content) + mock_client.submit.return_value = mock_job + MockDoclingServiceClient.return_value = mock_client + + 
driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path) + + assert document is not None + # First poll failed → sleep during grace retry → second poll succeeds + assert mock_job.poll.call_count == 2 + assert mock_sleep.call_count == 1 + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_connect_error(self, MockDoclingServiceClient): + from docling.service_client.exceptions import ServiceUnavailableError + + mock_client = MagicMock() + mock_client.submit.side_effect = ServiceUnavailableError( + message='Connection refused', status_code=None + ) + MockDoclingServiceClient.return_value = mock_client + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + + with pytest.raises(ParsingException): + driver.parse(path) + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_remote_protocol_error(self, MockDoclingServiceClient): + from docling.service_client.exceptions import DoclingServiceClientError + + mock_client = MagicMock() + mock_client.submit.side_effect = DoclingServiceClientError( + 'Server disconnected without sending a response.' 
+ ) + MockDoclingServiceClient.return_value = mock_client + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + + with pytest.raises(ParsingException, match='Network error'): + driver.parse(path) + + # ── tracing ─────────────────────────────────────────────────────────────── + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + @patch('parxy_core.drivers.abstract_driver.tracer') + def test_docling_driver_tracing_span_created( + self, mock_tracer, MockDoclingServiceClient + ): + mock_span = MagicMock() + mock_span.__enter__ = Mock(return_value=mock_span) + mock_span.__exit__ = Mock(return_value=False) + mock_tracer.span = Mock(return_value=mock_span) + mock_tracer.count = Mock() + mock_tracer.info = Mock() + + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + driver.parse(path, level='page') + + mock_tracer.span.assert_called() + + span_calls = mock_tracer.span.call_args_list + doc_processing_call = [ + c for c in span_calls if c[0][0] == 'document-processing' + ][0] + + assert doc_processing_call[1]['driver'] == 'DoclingDriver' + assert doc_processing_call[1]['level'] == 'page' + + mock_tracer.count.assert_called_once() + count_call = mock_tracer.count.call_args + assert count_call[0][0] == 'documents.processed' + assert count_call[1]['driver'] == 'DoclingDriver' + + @patch('parxy_core.drivers.abstract_driver.tracer') + def test_docling_driver_tracing_exception_recorded(self, mock_tracer): + mock_span = MagicMock() + mock_span.__enter__ = Mock(return_value=mock_span) + mock_span.__exit__ = Mock(return_value=False) + mock_tracer.span = Mock(return_value=mock_span) + mock_tracer.count = Mock() + mock_tracer.error = Mock() + + driver = DoclingDriver() + path = self.__fixture_path('non-existing-file.pdf') + + with 
pytest.raises(FileNotFoundException): + driver.parse(path, level='page') + + mock_tracer.error.assert_called_once() + error_call = mock_tracer.error.call_args + assert error_call[0][0] == 'Parsing failed' + + mock_tracer.count.assert_called_once() + + # ── elapsed time ────────────────────────────────────────────────────────── + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_records_elapsed_time(self, MockDoclingServiceClient): + json_content = _docling_response( + pages={'1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}}, + texts=[ + { + 'self_ref': '#/texts/0', + 'text': 'content', + 'label': 'paragraph', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 720.0, + 'r': 540.0, + 'b': 700.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + } + ], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='page') + + assert document.parsing_metadata is not None + assert 'driver_elapsed_time' in document.parsing_metadata + assert isinstance(document.parsing_metadata['driver_elapsed_time'], float) + assert document.parsing_metadata['driver_elapsed_time'] > 0 + + # ── multi-page ──────────────────────────────────────────────────────────── + + @patch('parxy_core.drivers.docling.DoclingServiceClient') + def test_docling_driver_multi_page_page_numbers_start_at_1( + self, MockDoclingServiceClient + ): + json_content = _docling_response( + pages={ + '1': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 1}, + '2': {'size': {'width': 595.3, 'height': 841.9}, 'page_no': 2}, + }, + texts=[ + { + 'self_ref': '#/texts/0', + 'text': 'First page', + 'label': 'paragraph', + 'prov': [ + { + 'page_no': 1, + 'bbox': { + 'l': 72.0, + 't': 720.0, + 'r': 540.0, + 'b': 700.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + }, + { + 'self_ref': '#/texts/1', + 'text': 'Second page', + 
'label': 'paragraph', + 'prov': [ + { + 'page_no': 2, + 'bbox': { + 'l': 72.0, + 't': 720.0, + 'r': 540.0, + 'b': 700.0, + 'coord_origin': 'BOTTOMLEFT', + }, + } + ], + }, + ], + ) + MockDoclingServiceClient.return_value = _mock_docling_client(json_content) + + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='page') + + assert len(document.pages) == 2 + assert document.pages[0].number == 1 + assert document.pages[0].text == 'First page' + assert document.pages[1].number == 2 + assert document.pages[1].text == 'Second page' + + +@docling_live +class TestDoclingDriverLive: + def __fixture_path(self, file: str) -> str: + current_dir = os.path.dirname(os.path.abspath(__file__)) + fixtures_dir = os.path.join(os.path.dirname(current_dir), 'fixtures') + return os.path.join(fixtures_dir, file) + + def test_live_empty_doc_page_level(self): + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='page') + + assert document is not None + assert len(document.pages) == 1 + page = document.pages[0] + assert isinstance(page, Page) + assert page.number == 1 + assert page.blocks is None + assert page.text == '1' + + def test_live_empty_doc_block_level(self): + driver = DoclingDriver() + path = self.__fixture_path('empty-doc.pdf') + document = driver.parse(path, level='block') + + assert document is not None + assert len(document.pages) == 1 + page = document.pages[0] + assert isinstance(page, Page) + assert page.number == 1 + assert page.blocks is not None + assert len(page.blocks) == 1 + assert isinstance(page.blocks[0], TextBlock) + assert page.blocks[0].role == 'doc-pagefooter' + assert page.blocks[0].text == '1' + + def test_live_test_doc_page_level(self): + driver = DoclingDriver() + path = self.__fixture_path('test-doc.pdf') + document = driver.parse(path, level='page') + + assert document is not None + assert len(document.pages) == 1 + page = document.pages[0] 
+ assert isinstance(page, Page) + assert page.number == 1 + assert page.blocks is None + assert page.text == ( + 'This is the header\n' + 'This is a test PDF to be used as input in unit tests\n' + 'This is a heading 1\n' + 'This is a paragraph below heading 1\n' + '1' + ) + + def test_live_test_doc_block_level(self): + driver = DoclingDriver() + path = self.__fixture_path('test-doc.pdf') + document = driver.parse(path, level='block') + + assert document is not None + assert len(document.pages) == 1 + page = document.pages[0] + assert page.blocks is not None + assert len(page.blocks) == 5 + + assert isinstance(page.blocks[0], TextBlock) + assert page.blocks[0].role == 'doc-pageheader' + assert page.blocks[0].text == 'This is the header' + + assert isinstance(page.blocks[1], TextBlock) + assert page.blocks[1].role == 'heading' + assert ( + page.blocks[1].text + == 'This is a test PDF to be used as input in unit tests' + ) + + assert isinstance(page.blocks[2], TextBlock) + assert page.blocks[2].role == 'heading' + assert page.blocks[2].text == 'This is a heading 1' + + assert isinstance(page.blocks[3], TextBlock) + assert page.blocks[3].role == 'paragraph' + assert page.blocks[3].text == 'This is a paragraph below heading 1' + + assert isinstance(page.blocks[4], TextBlock) + assert page.blocks[4].role == 'doc-pagefooter' + assert page.blocks[4].text == '1' diff --git a/uv.lock b/uv.lock index 3d9d41a..ea58672 100644 --- a/uv.lock +++ b/uv.lock @@ -78,6 +78,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/da/42/e921fccf5015463e32a3cf6ee7f980a6ed0f395ceeaa45060b61d86486c2/anyio-4.13.0-py3-none-any.whl", hash = "sha256:08b310f9e24a9594186fd75b4f73f4a4152069e3853f1ed8bfbf58369f4ad708", size = 114353, upload-time = "2026-03-24T12:59:08.246Z" }, ] +[[package]] +name = "attrs" +version = "26.1.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/9a/8e/82a0fe20a541c03148528be8cac2408564a6c9a0cc7e9171802bc1d26985/attrs-26.1.0.tar.gz", hash = "sha256:d03ceb89cb322a8fd706d4fb91940737b6642aa36998fe130a9bc96c985eff32", size = 952055, upload-time = "2026-03-19T14:22:25.026Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/64/b4/17d4b0b2a2dc85a6df63d1157e028ed19f90d4cd97c36717afef2bc2f395/attrs-26.1.0-py3-none-any.whl", hash = "sha256:c647aa4a12dfbad9333ca4e71fe62ddc36f4e63b2d260a37a8b83d2f043ac309", size = 67548, upload-time = "2026-03-19T14:22:23.645Z" }, +] + [[package]] name = "beautifulsoup4" version = "4.14.3" @@ -509,6 +518,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/66/66/150e406a2db5535533aa3c946de58f0371f2e412e23f050c704588023e6e/cymem-2.0.13-cp314-cp314t-win_arm64.whl", hash = "sha256:e9027764dc5f1999fb4b4cabee1d0322c59e330c0a6485b436a68275f614277f", size = 39715, upload-time = "2025-11-14T14:58:24.773Z" }, ] +[[package]] +name = "defusedxml" +version = "0.7.1" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/0f/d5/c66da9b79e5bdb124974bfe172b4daf3c984ebd9c2a06e2b8a4dc7331c72/defusedxml-0.7.1.tar.gz", hash = "sha256:1bb3032db185915b62d7c6209c5a8792be6a32ab2fedacc84e01b52c51aa3e69", size = 75520, upload-time = "2021-03-08T10:59:26.269Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/07/6c/aa3f2f849e01cb6a001cd8554a88d4c77c5c1a31c95bdf1cf9301e6d9ef4/defusedxml-0.7.1-py2.py3-none-any.whl", hash = "sha256:a352e7e428770286cc899e2542b6cdaedb2b4953ff269a210103ec58f6198a61", size = 25604, upload-time = "2021-03-08T10:59:24.45Z" }, +] + [[package]] name = "deprecated" version = "1.3.1" @@ -530,6 +548,54 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/12/b3/231ffd4ab1fc9d679809f356cebee130ac7daa00d6d6f3206dd4fd137e9e/distro-1.9.0-py3-none-any.whl", hash = "sha256:7bffd925d65168f85027d8da9af6bddab658135b840670a223589bc0c8ef02b2", size = 20277, 
upload-time = "2023-12-24T09:54:30.421Z" }, ] +[[package]] +name = "docling-core" +version = "2.75.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "defusedxml" }, + { name = "jsonref" }, + { name = "jsonschema" }, + { name = "latex2mathml" }, + { name = "pandas" }, + { name = "pillow" }, + { name = "pydantic" }, + { name = "pydantic-settings" }, + { name = "pyyaml" }, + { name = "tabulate" }, + { name = "typer" }, + { name = "typing-extensions" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/c3/39/179e119794e65e2376ce3d2224693b0e25705a5f1b26bbccaf404c4ce902/docling_core-2.75.0.tar.gz", hash = "sha256:7961be3c3f58855324b081fce9e1231b892da7c61d6babbaf3d49c28387eb782", size = 320615, upload-time = "2026-05-12T14:55:04.153Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/33/f40d2faebda5bd40cd4e2803b710dd7e26dde5150d5395379c2269fc04e8/docling_core-2.75.0-py3-none-any.whl", hash = "sha256:60f7bc4025f6511ba82eeb0aa677e756e9d3bf069d6f207c6ef2fb8be3176f32", size = 279045, upload-time = "2026-05-12T14:55:02.371Z" }, +] + +[[package]] +name = "docling-slim" +version = "2.93.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "certifi" }, + { name = "docling-core" }, + { name = "filetype" }, + { name = "pluggy" }, + { name = "pydantic" }, + { name = "pydantic-settings" }, + { name = "requests" }, + { name = "tqdm" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/c4/51/2ee874abcd62b990f0a86abec2e3b87b4cc00731b675ed446fefff6199d9/docling_slim-2.93.0.tar.gz", hash = "sha256:2962f4fc5bdf9dd6d67d6f36f09334f7187039985c0fd4b2d4b1d375e4799157", size = 390036, upload-time = "2026-05-07T11:54:14.562Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/f3/3f/f2195f79a62fd6cd10c768c04c758c9d6bd0c804a46175d0f3f10a784fca/docling_slim-2.93.0-py3-none-any.whl", hash = "sha256:98e3db67f7976f051f132e6a6e04f73e7fcd4017877eddf9e849e0969c39fbe7", size = 506046, 
upload-time = "2026-05-07T11:54:10.884Z" }, +] + +[package.optional-dependencies] +service-client = [ + { name = "httpx" }, + { name = "websockets" }, +] + [[package]] name = "emoji" version = "2.15.0" @@ -895,6 +961,42 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" }, ] +[[package]] +name = "jsonref" +version = "1.1.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/aa/0d/c1f3277e90ccdb50d33ed5ba1ec5b3f0a242ed8c1b1a85d3afeb68464dca/jsonref-1.1.0.tar.gz", hash = "sha256:32fe8e1d85af0fdefbebce950af85590b22b60f9e95443176adbde4e1ecea552", size = 8814, upload-time = "2023-01-16T16:10:04.455Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0c/ec/e1db9922bceb168197a558a2b8c03a7963f1afe93517ddd3cf99f202f996/jsonref-1.1.0-py3-none-any.whl", hash = "sha256:590dc7773df6c21cbf948b5dac07a72a251db28b0238ceecce0a2abfa8ec30a9", size = 9425, upload-time = "2023-01-16T16:10:02.255Z" }, +] + +[[package]] +name = "jsonschema" +version = "4.26.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "attrs" }, + { name = "jsonschema-specifications" }, + { name = "referencing" }, + { name = "rpds-py" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b3/fc/e067678238fa451312d4c62bf6e6cf5ec56375422aee02f9cb5f909b3047/jsonschema-4.26.0.tar.gz", hash = "sha256:0c26707e2efad8aa1bfc5b7ce170f3fccc2e4918ff85989ba9ffa9facb2be326", size = 366583, upload-time = "2026-01-07T13:41:07.246Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/69/90/f63fb5873511e014207a475e2bb4e8b2e570d655b00ac19a9a0ca0a385ee/jsonschema-4.26.0-py3-none-any.whl", hash = 
"sha256:d489f15263b8d200f8387e64b4c3a75f06629559fb73deb8fdfb525f2dab50ce", size = 90630, upload-time = "2026-01-07T13:41:05.306Z" }, +] + +[[package]] +name = "jsonschema-specifications" +version = "2025.9.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "referencing" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/19/74/a633ee74eb36c44aa6d1095e7cc5569bebf04342ee146178e2d36600708b/jsonschema_specifications-2025.9.1.tar.gz", hash = "sha256:b540987f239e745613c7a9176f3edb72b832a4ac465cf02712288397832b5e8d", size = 32855, upload-time = "2025-09-08T01:34:59.186Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/41/45/1a4ed80516f02155c51f51e8cedb3c1902296743db0bbc66608a0db2814f/jsonschema_specifications-2025.9.1-py3-none-any.whl", hash = "sha256:98802fee3a11ee76ecaca44429fda8a41bff98b00a0f2838151b113f210cc6fe", size = 18437, upload-time = "2025-09-08T01:34:57.871Z" }, +] + [[package]] name = "kiwisolver" version = "1.5.0" @@ -1007,6 +1109,15 @@ dependencies = [ ] sdist = { url = "https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz", hash = "sha256:cbc1fef89f8d062739774bd51eda3da3274006b3661d199c2655f6b3f6d605a0", size = 981474, upload-time = "2021-05-07T07:54:13.562Z" } +[[package]] +name = "latex2mathml" +version = "3.81.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/3b/62/35bb816c5c19d4d0cde5bdfb82ebb996306243d5f94e03f201658c629960/latex2mathml-3.81.0.tar.gz", hash = "sha256:4b959cdc3cac8686bc0e3e5aece8127dfb1b81ca1241bed8e00ef31b82bb4022", size = 77584, upload-time = "2026-04-15T00:55:27.977Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/e8/b1/c488b530994c4f68e46efa99a4d6ca6741aaf158e35779fe6c4d8a9a427d/latex2mathml-3.81.0-py3-none-any.whl", hash = "sha256:d317710393fe20579aea39cfe8928fa2ad9b8780896e585326c75e89c1d1d1a4", size = 79185, 
upload-time = "2026-04-15T00:55:29.301Z" }, +] + [[package]] name = "linkify-it-py" version = "2.1.0" @@ -1849,9 +1960,9 @@ name = "pandas" version = "3.0.2" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "numpy", marker = "python_full_version < '3.13' or sys_platform != 'win32'" }, - { name = "python-dateutil", marker = "python_full_version < '3.13' or sys_platform != 'win32'" }, - { name = "tzdata", marker = "(python_full_version < '3.13' and sys_platform == 'win32') or sys_platform == 'emscripten'" }, + { name = "numpy" }, + { name = "python-dateutil" }, + { name = "tzdata", marker = "sys_platform == 'emscripten' or sys_platform == 'win32'" }, ] sdist = { url = "https://files.pythonhosted.org/packages/da/99/b342345300f13440fe9fe385c3c481e2d9a595ee3bab4d3219247ac94e9a/pandas-3.0.2.tar.gz", hash = "sha256:f4753e73e34c8d83221ba58f232433fca2748be8b18dbca02d242ed153945043", size = 4645855, upload-time = "2026-03-31T06:48:30.816Z" } wheels = [ @@ -1901,6 +2012,7 @@ name = "parxy" version = "0.12.1" source = { editable = "." 
} dependencies = [ + { name = "httpx" }, { name = "importlib-resources" }, { name = "opentelemetry-api" }, { name = "opentelemetry-exporter-otlp" }, @@ -1918,6 +2030,7 @@ dependencies = [ [package.optional-dependencies] all = [ + { name = "docling-slim", extra = ["service-client"] }, { name = "landingai-ade" }, { name = "llama-cloud" }, { name = "llmwhisperer-client" }, @@ -1927,6 +2040,9 @@ all = [ { name = "textual" }, { name = "unstructured", extra = ["pdf"] }, ] +docling = [ + { name = "docling-slim", extra = ["service-client"] }, +] landingai = [ { name = "landingai-ade" }, ] @@ -1960,6 +2076,9 @@ dev = [ [package.metadata] requires-dist = [ + { name = "docling-slim", extras = ["service-client"], marker = "extra == 'all'", specifier = ">=2.93.0" }, + { name = "docling-slim", extras = ["service-client"], marker = "extra == 'docling'", specifier = ">=2.93.0" }, + { name = "httpx", specifier = ">=0.28.0" }, { name = "importlib-resources", specifier = ">=6.1.3" }, { name = "landingai-ade", marker = "extra == 'all'", specifier = ">=0.15.1" }, { name = "landingai-ade", marker = "extra == 'landingai'", specifier = ">=0.15.1" }, @@ -1990,7 +2109,7 @@ requires-dist = [ { name = "unstructured", extras = ["pdf"], marker = "extra == 'unstructured-local'", specifier = ">=0.18.13" }, { name = "validators", specifier = ">=0.35.0" }, ] -provides-extras = ["llama", "llmwhisperer", "unstructured-local", "landingai", "tui", "pypdfium2", "pdfplumber", "pdfminer", "all"] +provides-extras = ["llama", "llmwhisperer", "unstructured-local", "landingai", "tui", "pypdfium2", "pdfplumber", "pdfminer", "docling", "all"] [package.metadata.requires-dev] dev = [ @@ -2524,7 +2643,7 @@ name = "python-dateutil" version = "2.9.0.post0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "six", marker = "python_full_version < '3.13' or sys_platform != 'win32'" }, + { name = "six" }, ] sdist = { url = 
"https://files.pythonhosted.org/packages/66/c0/0c8b6ad9f17a802ee498c46e004a0eb49bc148f2fd230864601a86dcf6db/python-dateutil-2.9.0.post0.tar.gz", hash = "sha256:37dd54208da7e1cd875388217d5e00ebd4179249f90fb72437e91a35459a0ad3", size = 342432, upload-time = "2024-03-01T18:36:20.211Z" } wheels = [ @@ -2681,6 +2800,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/70/a6/51fc1b0e61e3326e1c68a61cfd0c6b3c34c843681c4b1eefbf0596f59162/rapidfuzz-3.14.5-cp314-cp314t-win_arm64.whl", hash = "sha256:3e91dcd2549b8f8d843f98ba03a17e01f3d8b72ce942adbbb6761bc58ffce813", size = 855409, upload-time = "2026-04-07T11:16:15.787Z" }, ] +[[package]] +name = "referencing" +version = "0.37.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "attrs" }, + { name = "rpds-py" }, + { name = "typing-extensions", marker = "python_full_version < '3.13'" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/22/f5/df4e9027acead3ecc63e50fe1e36aca1523e1719559c499951bb4b53188f/referencing-0.37.0.tar.gz", hash = "sha256:44aefc3142c5b842538163acb373e24cce6632bd54bdb01b21ad5863489f50d8", size = 78036, upload-time = "2025-10-13T15:30:48.871Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/2c/58/ca301544e1fa93ed4f80d724bf5b194f6e4b945841c5bfd555878eea9fcb/referencing-0.37.0-py3-none-any.whl", hash = "sha256:381329a9f99628c9069361716891d34ad94af76e461dcb0335825aecc7692231", size = 26766, upload-time = "2025-10-13T15:30:47.625Z" }, +] + [[package]] name = "regex" version = "2026.4.4" @@ -2809,6 +2942,87 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/82/3b/64d4899d73f91ba49a8c18a8ff3f0ea8f1c1d75481760df8c68ef5235bf5/rich-15.0.0-py3-none-any.whl", hash = "sha256:33bd4ef74232fb73fe9279a257718407f169c09b78a87ad3d296f548e27de0bb", size = 310654, upload-time = "2026-04-12T08:24:02.83Z" }, ] +[[package]] +name = "rpds-py" +version = "0.30.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/20/af/3f2f423103f1113b36230496629986e0ef7e199d2aa8392452b484b38ced/rpds_py-0.30.0.tar.gz", hash = "sha256:dd8ff7cf90014af0c0f787eea34794ebf6415242ee1d6fa91eaba725cc441e84", size = 69469, upload-time = "2025-11-30T20:24:38.837Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/03/e7/98a2f4ac921d82f33e03f3835f5bf3a4a40aa1bfdc57975e74a97b2b4bdd/rpds_py-0.30.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:a161f20d9a43006833cd7068375a94d035714d73a172b681d8881820600abfad", size = 375086, upload-time = "2025-11-30T20:22:17.93Z" }, + { url = "https://files.pythonhosted.org/packages/4d/a1/bca7fd3d452b272e13335db8d6b0b3ecde0f90ad6f16f3328c6fb150c889/rpds_py-0.30.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:6abc8880d9d036ecaafe709079969f56e876fcf107f7a8e9920ba6d5a3878d05", size = 359053, upload-time = "2025-11-30T20:22:19.297Z" }, + { url = "https://files.pythonhosted.org/packages/65/1c/ae157e83a6357eceff62ba7e52113e3ec4834a84cfe07fa4b0757a7d105f/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ca28829ae5f5d569bb62a79512c842a03a12576375d5ece7d2cadf8abe96ec28", size = 390763, upload-time = "2025-11-30T20:22:21.661Z" }, + { url = "https://files.pythonhosted.org/packages/d4/36/eb2eb8515e2ad24c0bd43c3ee9cd74c33f7ca6430755ccdb240fd3144c44/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a1010ed9524c73b94d15919ca4d41d8780980e1765babf85f9a2f90d247153dd", size = 408951, upload-time = "2025-11-30T20:22:23.408Z" }, + { url = "https://files.pythonhosted.org/packages/d6/65/ad8dc1784a331fabbd740ef6f71ce2198c7ed0890dab595adb9ea2d775a1/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f8d1736cfb49381ba528cd5baa46f82fdc65c06e843dab24dd70b63d09121b3f", size = 514622, upload-time = "2025-11-30T20:22:25.16Z" }, + { url = 
"https://files.pythonhosted.org/packages/63/8e/0cfa7ae158e15e143fe03993b5bcd743a59f541f5952e1546b1ac1b5fd45/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d948b135c4693daff7bc2dcfc4ec57237a29bd37e60c2fabf5aff2bbacf3e2f1", size = 414492, upload-time = "2025-11-30T20:22:26.505Z" }, + { url = "https://files.pythonhosted.org/packages/60/1b/6f8f29f3f995c7ffdde46a626ddccd7c63aefc0efae881dc13b6e5d5bb16/rpds_py-0.30.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:47f236970bccb2233267d89173d3ad2703cd36a0e2a6e92d0560d333871a3d23", size = 394080, upload-time = "2025-11-30T20:22:27.934Z" }, + { url = "https://files.pythonhosted.org/packages/6d/d5/a266341051a7a3ca2f4b750a3aa4abc986378431fc2da508c5034d081b70/rpds_py-0.30.0-cp312-cp312-manylinux_2_31_riscv64.whl", hash = "sha256:2e6ecb5a5bcacf59c3f912155044479af1d0b6681280048b338b28e364aca1f6", size = 408680, upload-time = "2025-11-30T20:22:29.341Z" }, + { url = "https://files.pythonhosted.org/packages/10/3b/71b725851df9ab7a7a4e33cf36d241933da66040d195a84781f49c50490c/rpds_py-0.30.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a8fa71a2e078c527c3e9dc9fc5a98c9db40bcc8a92b4e8858e36d329f8684b51", size = 423589, upload-time = "2025-11-30T20:22:31.469Z" }, + { url = "https://files.pythonhosted.org/packages/00/2b/e59e58c544dc9bd8bd8384ecdb8ea91f6727f0e37a7131baeff8d6f51661/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:73c67f2db7bc334e518d097c6d1e6fed021bbc9b7d678d6cc433478365d1d5f5", size = 573289, upload-time = "2025-11-30T20:22:32.997Z" }, + { url = "https://files.pythonhosted.org/packages/da/3e/a18e6f5b460893172a7d6a680e86d3b6bc87a54c1f0b03446a3c8c7b588f/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:5ba103fb455be00f3b1c2076c9d4264bfcb037c976167a6047ed82f23153f02e", size = 599737, upload-time = "2025-11-30T20:22:34.419Z" }, + { url = 
"https://files.pythonhosted.org/packages/5c/e2/714694e4b87b85a18e2c243614974413c60aa107fd815b8cbc42b873d1d7/rpds_py-0.30.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7cee9c752c0364588353e627da8a7e808a66873672bcb5f52890c33fd965b394", size = 563120, upload-time = "2025-11-30T20:22:35.903Z" }, + { url = "https://files.pythonhosted.org/packages/6f/ab/d5d5e3bcedb0a77f4f613706b750e50a5a3ba1c15ccd3665ecc636c968fd/rpds_py-0.30.0-cp312-cp312-win32.whl", hash = "sha256:1ab5b83dbcf55acc8b08fc62b796ef672c457b17dbd7820a11d6c52c06839bdf", size = 223782, upload-time = "2025-11-30T20:22:37.271Z" }, + { url = "https://files.pythonhosted.org/packages/39/3b/f786af9957306fdc38a74cef405b7b93180f481fb48453a114bb6465744a/rpds_py-0.30.0-cp312-cp312-win_amd64.whl", hash = "sha256:a090322ca841abd453d43456ac34db46e8b05fd9b3b4ac0c78bcde8b089f959b", size = 240463, upload-time = "2025-11-30T20:22:39.021Z" }, + { url = "https://files.pythonhosted.org/packages/f3/d2/b91dc748126c1559042cfe41990deb92c4ee3e2b415f6b5234969ffaf0cc/rpds_py-0.30.0-cp312-cp312-win_arm64.whl", hash = "sha256:669b1805bd639dd2989b281be2cfd951c6121b65e729d9b843e9639ef1fd555e", size = 230868, upload-time = "2025-11-30T20:22:40.493Z" }, + { url = "https://files.pythonhosted.org/packages/ed/dc/d61221eb88ff410de3c49143407f6f3147acf2538c86f2ab7ce65ae7d5f9/rpds_py-0.30.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:f83424d738204d9770830d35290ff3273fbb02b41f919870479fab14b9d303b2", size = 374887, upload-time = "2025-11-30T20:22:41.812Z" }, + { url = "https://files.pythonhosted.org/packages/fd/32/55fb50ae104061dbc564ef15cc43c013dc4a9f4527a1f4d99baddf56fe5f/rpds_py-0.30.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e7536cd91353c5273434b4e003cbda89034d67e7710eab8761fd918ec6c69cf8", size = 358904, upload-time = "2025-11-30T20:22:43.479Z" }, + { url = 
"https://files.pythonhosted.org/packages/58/70/faed8186300e3b9bdd138d0273109784eea2396c68458ed580f885dfe7ad/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2771c6c15973347f50fece41fc447c054b7ac2ae0502388ce3b6738cd366e3d4", size = 389945, upload-time = "2025-11-30T20:22:44.819Z" }, + { url = "https://files.pythonhosted.org/packages/bd/a8/073cac3ed2c6387df38f71296d002ab43496a96b92c823e76f46b8af0543/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:0a59119fc6e3f460315fe9d08149f8102aa322299deaa5cab5b40092345c2136", size = 407783, upload-time = "2025-11-30T20:22:46.103Z" }, + { url = "https://files.pythonhosted.org/packages/77/57/5999eb8c58671f1c11eba084115e77a8899d6e694d2a18f69f0ba471ec8b/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:76fec018282b4ead0364022e3c54b60bf368b9d926877957a8624b58419169b7", size = 515021, upload-time = "2025-11-30T20:22:47.458Z" }, + { url = "https://files.pythonhosted.org/packages/e0/af/5ab4833eadc36c0a8ed2bc5c0de0493c04f6c06de223170bd0798ff98ced/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:692bef75a5525db97318e8cd061542b5a79812d711ea03dbc1f6f8dbb0c5f0d2", size = 414589, upload-time = "2025-11-30T20:22:48.872Z" }, + { url = "https://files.pythonhosted.org/packages/b7/de/f7192e12b21b9e9a68a6d0f249b4af3fdcdff8418be0767a627564afa1f1/rpds_py-0.30.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9027da1ce107104c50c81383cae773ef5c24d296dd11c99e2629dbd7967a20c6", size = 394025, upload-time = "2025-11-30T20:22:50.196Z" }, + { url = "https://files.pythonhosted.org/packages/91/c4/fc70cd0249496493500e7cc2de87504f5aa6509de1e88623431fec76d4b6/rpds_py-0.30.0-cp313-cp313-manylinux_2_31_riscv64.whl", hash = "sha256:9cf69cdda1f5968a30a359aba2f7f9aa648a9ce4b580d6826437f2b291cfc86e", size = 408895, upload-time = "2025-11-30T20:22:51.87Z" }, + { url = 
"https://files.pythonhosted.org/packages/58/95/d9275b05ab96556fefff73a385813eb66032e4c99f411d0795372d9abcea/rpds_py-0.30.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:a4796a717bf12b9da9d3ad002519a86063dcac8988b030e405704ef7d74d2d9d", size = 422799, upload-time = "2025-11-30T20:22:53.341Z" }, + { url = "https://files.pythonhosted.org/packages/06/c1/3088fc04b6624eb12a57eb814f0d4997a44b0d208d6cace713033ff1a6ba/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:5d4c2aa7c50ad4728a094ebd5eb46c452e9cb7edbfdb18f9e1221f597a73e1e7", size = 572731, upload-time = "2025-11-30T20:22:54.778Z" }, + { url = "https://files.pythonhosted.org/packages/d8/42/c612a833183b39774e8ac8fecae81263a68b9583ee343db33ab571a7ce55/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:ba81a9203d07805435eb06f536d95a266c21e5b2dfbf6517748ca40c98d19e31", size = 599027, upload-time = "2025-11-30T20:22:56.212Z" }, + { url = "https://files.pythonhosted.org/packages/5f/60/525a50f45b01d70005403ae0e25f43c0384369ad24ffe46e8d9068b50086/rpds_py-0.30.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:945dccface01af02675628334f7cf49c2af4c1c904748efc5cf7bbdf0b579f95", size = 563020, upload-time = "2025-11-30T20:22:58.2Z" }, + { url = "https://files.pythonhosted.org/packages/0b/5d/47c4655e9bcd5ca907148535c10e7d489044243cc9941c16ed7cd53be91d/rpds_py-0.30.0-cp313-cp313-win32.whl", hash = "sha256:b40fb160a2db369a194cb27943582b38f79fc4887291417685f3ad693c5a1d5d", size = 223139, upload-time = "2025-11-30T20:23:00.209Z" }, + { url = "https://files.pythonhosted.org/packages/f2/e1/485132437d20aa4d3e1d8b3fb5a5e65aa8139f1e097080c2a8443201742c/rpds_py-0.30.0-cp313-cp313-win_amd64.whl", hash = "sha256:806f36b1b605e2d6a72716f321f20036b9489d29c51c91f4dd29a3e3afb73b15", size = 240224, upload-time = "2025-11-30T20:23:02.008Z" }, + { url = 
"https://files.pythonhosted.org/packages/24/95/ffd128ed1146a153d928617b0ef673960130be0009c77d8fbf0abe306713/rpds_py-0.30.0-cp313-cp313-win_arm64.whl", hash = "sha256:d96c2086587c7c30d44f31f42eae4eac89b60dabbac18c7669be3700f13c3ce1", size = 230645, upload-time = "2025-11-30T20:23:03.43Z" }, + { url = "https://files.pythonhosted.org/packages/ff/1b/b10de890a0def2a319a2626334a7f0ae388215eb60914dbac8a3bae54435/rpds_py-0.30.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:eb0b93f2e5c2189ee831ee43f156ed34e2a89a78a66b98cadad955972548be5a", size = 364443, upload-time = "2025-11-30T20:23:04.878Z" }, + { url = "https://files.pythonhosted.org/packages/0d/bf/27e39f5971dc4f305a4fb9c672ca06f290f7c4e261c568f3dea16a410d47/rpds_py-0.30.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:922e10f31f303c7c920da8981051ff6d8c1a56207dbdf330d9047f6d30b70e5e", size = 353375, upload-time = "2025-11-30T20:23:06.342Z" }, + { url = "https://files.pythonhosted.org/packages/40/58/442ada3bba6e8e6615fc00483135c14a7538d2ffac30e2d933ccf6852232/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cdc62c8286ba9bf7f47befdcea13ea0e26bf294bda99758fd90535cbaf408000", size = 383850, upload-time = "2025-11-30T20:23:07.825Z" }, + { url = "https://files.pythonhosted.org/packages/14/14/f59b0127409a33c6ef6f5c1ebd5ad8e32d7861c9c7adfa9a624fc3889f6c/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:47f9a91efc418b54fb8190a6b4aa7813a23fb79c51f4bb84e418f5476c38b8db", size = 392812, upload-time = "2025-11-30T20:23:09.228Z" }, + { url = "https://files.pythonhosted.org/packages/b3/66/e0be3e162ac299b3a22527e8913767d869e6cc75c46bd844aa43fb81ab62/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1f3587eb9b17f3789ad50824084fa6f81921bbf9a795826570bda82cb3ed91f2", size = 517841, upload-time = "2025-11-30T20:23:11.186Z" }, + { url = 
"https://files.pythonhosted.org/packages/3d/55/fa3b9cf31d0c963ecf1ba777f7cf4b2a2c976795ac430d24a1f43d25a6ba/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:39c02563fc592411c2c61d26b6c5fe1e51eaa44a75aa2c8735ca88b0d9599daa", size = 408149, upload-time = "2025-11-30T20:23:12.864Z" }, + { url = "https://files.pythonhosted.org/packages/60/ca/780cf3b1a32b18c0f05c441958d3758f02544f1d613abf9488cd78876378/rpds_py-0.30.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:51a1234d8febafdfd33a42d97da7a43f5dcb120c1060e352a3fbc0c6d36e2083", size = 383843, upload-time = "2025-11-30T20:23:14.638Z" }, + { url = "https://files.pythonhosted.org/packages/82/86/d5f2e04f2aa6247c613da0c1dd87fcd08fa17107e858193566048a1e2f0a/rpds_py-0.30.0-cp313-cp313t-manylinux_2_31_riscv64.whl", hash = "sha256:eb2c4071ab598733724c08221091e8d80e89064cd472819285a9ab0f24bcedb9", size = 396507, upload-time = "2025-11-30T20:23:16.105Z" }, + { url = "https://files.pythonhosted.org/packages/4b/9a/453255d2f769fe44e07ea9785c8347edaf867f7026872e76c1ad9f7bed92/rpds_py-0.30.0-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:6bdfdb946967d816e6adf9a3d8201bfad269c67efe6cefd7093ef959683c8de0", size = 414949, upload-time = "2025-11-30T20:23:17.539Z" }, + { url = "https://files.pythonhosted.org/packages/a3/31/622a86cdc0c45d6df0e9ccb6becdba5074735e7033c20e401a6d9d0e2ca0/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:c77afbd5f5250bf27bf516c7c4a016813eb2d3e116139aed0096940c5982da94", size = 565790, upload-time = "2025-11-30T20:23:19.029Z" }, + { url = "https://files.pythonhosted.org/packages/1c/5d/15bbf0fb4a3f58a3b1c67855ec1efcc4ceaef4e86644665fff03e1b66d8d/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:61046904275472a76c8c90c9ccee9013d70a6d0f73eecefd38c1ae7c39045a08", size = 590217, upload-time = "2025-11-30T20:23:20.885Z" }, + { url = 
"https://files.pythonhosted.org/packages/6d/61/21b8c41f68e60c8cc3b2e25644f0e3681926020f11d06ab0b78e3c6bbff1/rpds_py-0.30.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:4c5f36a861bc4b7da6516dbdf302c55313afa09b81931e8280361a4f6c9a2d27", size = 555806, upload-time = "2025-11-30T20:23:22.488Z" }, + { url = "https://files.pythonhosted.org/packages/f9/39/7e067bb06c31de48de3eb200f9fc7c58982a4d3db44b07e73963e10d3be9/rpds_py-0.30.0-cp313-cp313t-win32.whl", hash = "sha256:3d4a69de7a3e50ffc214ae16d79d8fbb0922972da0356dcf4d0fdca2878559c6", size = 211341, upload-time = "2025-11-30T20:23:24.449Z" }, + { url = "https://files.pythonhosted.org/packages/0a/4d/222ef0b46443cf4cf46764d9c630f3fe4abaa7245be9417e56e9f52b8f65/rpds_py-0.30.0-cp313-cp313t-win_amd64.whl", hash = "sha256:f14fc5df50a716f7ece6a80b6c78bb35ea2ca47c499e422aa4463455dd96d56d", size = 225768, upload-time = "2025-11-30T20:23:25.908Z" }, + { url = "https://files.pythonhosted.org/packages/86/81/dad16382ebbd3d0e0328776d8fd7ca94220e4fa0798d1dc5e7da48cb3201/rpds_py-0.30.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:68f19c879420aa08f61203801423f6cd5ac5f0ac4ac82a2368a9fcd6a9a075e0", size = 362099, upload-time = "2025-11-30T20:23:27.316Z" }, + { url = "https://files.pythonhosted.org/packages/2b/60/19f7884db5d5603edf3c6bce35408f45ad3e97e10007df0e17dd57af18f8/rpds_py-0.30.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:ec7c4490c672c1a0389d319b3a9cfcd098dcdc4783991553c332a15acf7249be", size = 353192, upload-time = "2025-11-30T20:23:29.151Z" }, + { url = "https://files.pythonhosted.org/packages/bf/c4/76eb0e1e72d1a9c4703c69607cec123c29028bff28ce41588792417098ac/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f251c812357a3fed308d684a5079ddfb9d933860fc6de89f2b7ab00da481e65f", size = 384080, upload-time = "2025-11-30T20:23:30.785Z" }, + { url = 
"https://files.pythonhosted.org/packages/72/87/87ea665e92f3298d1b26d78814721dc39ed8d2c74b86e83348d6b48a6f31/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ac98b175585ecf4c0348fd7b29c3864bda53b805c773cbf7bfdaffc8070c976f", size = 394841, upload-time = "2025-11-30T20:23:32.209Z" }, + { url = "https://files.pythonhosted.org/packages/77/ad/7783a89ca0587c15dcbf139b4a8364a872a25f861bdb88ed99f9b0dec985/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:3e62880792319dbeb7eb866547f2e35973289e7d5696c6e295476448f5b63c87", size = 516670, upload-time = "2025-11-30T20:23:33.742Z" }, + { url = "https://files.pythonhosted.org/packages/5b/3c/2882bdac942bd2172f3da574eab16f309ae10a3925644e969536553cb4ee/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4e7fc54e0900ab35d041b0601431b0a0eb495f0851a0639b6ef90f7741b39a18", size = 408005, upload-time = "2025-11-30T20:23:35.253Z" }, + { url = "https://files.pythonhosted.org/packages/ce/81/9a91c0111ce1758c92516a3e44776920b579d9a7c09b2b06b642d4de3f0f/rpds_py-0.30.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:47e77dc9822d3ad616c3d5759ea5631a75e5809d5a28707744ef79d7a1bcfcad", size = 382112, upload-time = "2025-11-30T20:23:36.842Z" }, + { url = "https://files.pythonhosted.org/packages/cf/8e/1da49d4a107027e5fbc64daeab96a0706361a2918da10cb41769244b805d/rpds_py-0.30.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:b4dc1a6ff022ff85ecafef7979a2c6eb423430e05f1165d6688234e62ba99a07", size = 399049, upload-time = "2025-11-30T20:23:38.343Z" }, + { url = "https://files.pythonhosted.org/packages/df/5a/7ee239b1aa48a127570ec03becbb29c9d5a9eb092febbd1699d567cae859/rpds_py-0.30.0-cp314-cp314-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:4559c972db3a360808309e06a74628b95eaccbf961c335c8fe0d590cf587456f", size = 415661, upload-time = "2025-11-30T20:23:40.263Z" }, + { url = 
"https://files.pythonhosted.org/packages/70/ea/caa143cf6b772f823bc7929a45da1fa83569ee49b11d18d0ada7f5ee6fd6/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:0ed177ed9bded28f8deb6ab40c183cd1192aa0de40c12f38be4d59cd33cb5c65", size = 565606, upload-time = "2025-11-30T20:23:42.186Z" }, + { url = "https://files.pythonhosted.org/packages/64/91/ac20ba2d69303f961ad8cf55bf7dbdb4763f627291ba3d0d7d67333cced9/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:ad1fa8db769b76ea911cb4e10f049d80bf518c104f15b3edb2371cc65375c46f", size = 591126, upload-time = "2025-11-30T20:23:44.086Z" }, + { url = "https://files.pythonhosted.org/packages/21/20/7ff5f3c8b00c8a95f75985128c26ba44503fb35b8e0259d812766ea966c7/rpds_py-0.30.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:46e83c697b1f1c72b50e5ee5adb4353eef7406fb3f2043d64c33f20ad1c2fc53", size = 553371, upload-time = "2025-11-30T20:23:46.004Z" }, + { url = "https://files.pythonhosted.org/packages/72/c7/81dadd7b27c8ee391c132a6b192111ca58d866577ce2d9b0ca157552cce0/rpds_py-0.30.0-cp314-cp314-win32.whl", hash = "sha256:ee454b2a007d57363c2dfd5b6ca4a5d7e2c518938f8ed3b706e37e5d470801ed", size = 215298, upload-time = "2025-11-30T20:23:47.696Z" }, + { url = "https://files.pythonhosted.org/packages/3e/d2/1aaac33287e8cfb07aab2e6b8ac1deca62f6f65411344f1433c55e6f3eb8/rpds_py-0.30.0-cp314-cp314-win_amd64.whl", hash = "sha256:95f0802447ac2d10bcc69f6dc28fe95fdf17940367b21d34e34c737870758950", size = 228604, upload-time = "2025-11-30T20:23:49.501Z" }, + { url = "https://files.pythonhosted.org/packages/e8/95/ab005315818cc519ad074cb7784dae60d939163108bd2b394e60dc7b5461/rpds_py-0.30.0-cp314-cp314-win_arm64.whl", hash = "sha256:613aa4771c99f03346e54c3f038e4cc574ac09a3ddfb0e8878487335e96dead6", size = 222391, upload-time = "2025-11-30T20:23:50.96Z" }, + { url = 
"https://files.pythonhosted.org/packages/9e/68/154fe0194d83b973cdedcdcc88947a2752411165930182ae41d983dcefa6/rpds_py-0.30.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:7e6ecfcb62edfd632e56983964e6884851786443739dbfe3582947e87274f7cb", size = 364868, upload-time = "2025-11-30T20:23:52.494Z" }, + { url = "https://files.pythonhosted.org/packages/83/69/8bbc8b07ec854d92a8b75668c24d2abcb1719ebf890f5604c61c9369a16f/rpds_py-0.30.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:a1d0bc22a7cdc173fedebb73ef81e07faef93692b8c1ad3733b67e31e1b6e1b8", size = 353747, upload-time = "2025-11-30T20:23:54.036Z" }, + { url = "https://files.pythonhosted.org/packages/ab/00/ba2e50183dbd9abcce9497fa5149c62b4ff3e22d338a30d690f9af970561/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0d08f00679177226c4cb8c5265012eea897c8ca3b93f429e546600c971bcbae7", size = 383795, upload-time = "2025-11-30T20:23:55.556Z" }, + { url = "https://files.pythonhosted.org/packages/05/6f/86f0272b84926bcb0e4c972262f54223e8ecc556b3224d281e6598fc9268/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:5965af57d5848192c13534f90f9dd16464f3c37aaf166cc1da1cae1fd5a34898", size = 393330, upload-time = "2025-11-30T20:23:57.033Z" }, + { url = "https://files.pythonhosted.org/packages/cb/e9/0e02bb2e6dc63d212641da45df2b0bf29699d01715913e0d0f017ee29438/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9a4e86e34e9ab6b667c27f3211ca48f73dba7cd3d90f8d5b11be56e5dbc3fb4e", size = 518194, upload-time = "2025-11-30T20:23:58.637Z" }, + { url = "https://files.pythonhosted.org/packages/ee/ca/be7bca14cf21513bdf9c0606aba17d1f389ea2b6987035eb4f62bd923f25/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e5d3e6b26f2c785d65cc25ef1e5267ccbe1b069c5c21b8cc724efee290554419", size = 408340, upload-time = "2025-11-30T20:24:00.2Z" }, + { url = 
"https://files.pythonhosted.org/packages/c2/c7/736e00ebf39ed81d75544c0da6ef7b0998f8201b369acf842f9a90dc8fce/rpds_py-0.30.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:626a7433c34566535b6e56a1b39a7b17ba961e97ce3b80ec62e6f1312c025551", size = 383765, upload-time = "2025-11-30T20:24:01.759Z" }, + { url = "https://files.pythonhosted.org/packages/4a/3f/da50dfde9956aaf365c4adc9533b100008ed31aea635f2b8d7b627e25b49/rpds_py-0.30.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:acd7eb3f4471577b9b5a41baf02a978e8bdeb08b4b355273994f8b87032000a8", size = 396834, upload-time = "2025-11-30T20:24:03.687Z" }, + { url = "https://files.pythonhosted.org/packages/4e/00/34bcc2565b6020eab2623349efbdec810676ad571995911f1abdae62a3a0/rpds_py-0.30.0-cp314-cp314t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:fe5fa731a1fa8a0a56b0977413f8cacac1768dad38d16b3a296712709476fbd5", size = 415470, upload-time = "2025-11-30T20:24:05.232Z" }, + { url = "https://files.pythonhosted.org/packages/8c/28/882e72b5b3e6f718d5453bd4d0d9cf8df36fddeb4ddbbab17869d5868616/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:74a3243a411126362712ee1524dfc90c650a503502f135d54d1b352bd01f2404", size = 565630, upload-time = "2025-11-30T20:24:06.878Z" }, + { url = "https://files.pythonhosted.org/packages/3b/97/04a65539c17692de5b85c6e293520fd01317fd878ea1995f0367d4532fb1/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:3e8eeb0544f2eb0d2581774be4c3410356eba189529a6b3e36bbbf9696175856", size = 591148, upload-time = "2025-11-30T20:24:08.445Z" }, + { url = "https://files.pythonhosted.org/packages/85/70/92482ccffb96f5441aab93e26c4d66489eb599efdcf96fad90c14bbfb976/rpds_py-0.30.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:dbd936cde57abfee19ab3213cf9c26be06d60750e60a8e4dd85d1ab12c8b1f40", size = 556030, upload-time = "2025-11-30T20:24:10.956Z" }, + { url = 
"https://files.pythonhosted.org/packages/20/53/7c7e784abfa500a2b6b583b147ee4bb5a2b3747a9166bab52fec4b5b5e7d/rpds_py-0.30.0-cp314-cp314t-win32.whl", hash = "sha256:dc824125c72246d924f7f796b4f63c1e9dc810c7d9e2355864b3c3a73d59ade0", size = 211570, upload-time = "2025-11-30T20:24:12.735Z" }, + { url = "https://files.pythonhosted.org/packages/d0/02/fa464cdfbe6b26e0600b62c528b72d8608f5cc49f96b8d6e38c95d60c676/rpds_py-0.30.0-cp314-cp314t-win_amd64.whl", hash = "sha256:27f4b0e92de5bfbc6f86e43959e6edd1425c33b5e69aab0984a72047f2bcf1e3", size = 226532, upload-time = "2025-11-30T20:24:14.634Z" }, +] + [[package]] name = "ruff" version = "0.15.12" @@ -3090,6 +3304,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/a2/09/77d55d46fd61b4a135c444fc97158ef34a095e5681d0a6c10b75bf356191/sympy-1.14.0-py3-none-any.whl", hash = "sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5", size = 6299353, upload-time = "2025-04-27T18:04:59.103Z" }, ] +[[package]] +name = "tabulate" +version = "0.10.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/46/58/8c37dea7bbf769b20d58e7ace7e5edfe65b849442b00ffcdd56be88697c6/tabulate-0.10.0.tar.gz", hash = "sha256:e2cfde8f79420f6deeffdeda9aaec3b6bc5abce947655d17ac662b126e48a60d", size = 91754, upload-time = "2026-03-04T18:55:34.402Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/99/55/db07de81b5c630da5cbf5c7df646580ca26dfaefa593667fc6f2fe016d2e/tabulate-0.10.0-py3-none-any.whl", hash = "sha256:f0b0622e567335c8fabaaa659f1b33bcb6ddfe2e496071b743aa113f8774f2d3", size = 39814, upload-time = "2026-03-04T18:55:31.284Z" }, +] + [[package]] name = "tenacity" version = "9.1.4" @@ -3342,7 +3565,7 @@ wheels = [ [[package]] name = "typer" -version = "0.25.0" +version = "0.24.2" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "annotated-doc" }, @@ -3350,9 +3573,9 @@ dependencies = [ { name = "rich" }, { name = "shellingham" }, ] 
-sdist = { url = "https://files.pythonhosted.org/packages/7b/27/ede8cec7596e0041ba7e7b80b47d132562f56ff454313a16f6084e555c9f/typer-0.25.0.tar.gz", hash = "sha256:123eaf9f19bb40fd268310e12a542c0c6b4fab9c98d9d23342a01ff95e3ce930", size = 120150, upload-time = "2026-04-26T08:46:14.767Z" } +sdist = { url = "https://files.pythonhosted.org/packages/83/b8/9ebb531b6c2d377af08ac6746a5df3425b21853a5d2260876919b58a2a4a/typer-0.24.2.tar.gz", hash = "sha256:ec070dcfca1408e85ee203c6365001e818c3b7fffe686fd07ff2d68095ca0480", size = 119849, upload-time = "2026-04-22T17:45:34.413Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/9a/72/193d4e586ec5a4db834a36bbeb47641a62f951f114ffd0fe5b1b46e8d56f/typer-0.25.0-py3-none-any.whl", hash = "sha256:ac01b48823d3db9a83c9e164338057eadbb1c9957a2a6b4eeb486669c560b5dc", size = 55993, upload-time = "2026-04-26T08:46:15.889Z" }, + { url = "https://files.pythonhosted.org/packages/39/d1/9484b497e0a0410b901c12b8251c3e746e1e863f7d28419ffe06f7892fda/typer-0.24.2-py3-none-any.whl", hash = "sha256:b618bc3d721f9a8d30f3e05565be26416d06e9bcc29d49bc491dc26aba674fa8", size = 55977, upload-time = "2026-04-22T17:45:33.055Z" }, ] [[package]] @@ -3555,6 +3778,51 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/f4/24/2a3e3df732393fed8b3ebf2ec078f05546de641fe1b667ee316ec1dcf3b7/webencodings-0.5.1-py2.py3-none-any.whl", hash = "sha256:a0af1213f3c2226497a97e2b3aa01a7e4bee4f403f95be16fc9acd2947514a78", size = 11774, upload-time = "2017-04-05T20:21:32.581Z" }, ] +[[package]] +name = "websockets" +version = "16.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/04/24/4b2031d72e840ce4c1ccb255f693b15c334757fc50023e4db9537080b8c4/websockets-16.0.tar.gz", hash = "sha256:5f6261a5e56e8d5c42a4497b364ea24d94d9563e8fbd44e78ac40879c60179b5", size = 179346, upload-time = "2026-01-10T09:23:47.181Z" } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/84/7b/bac442e6b96c9d25092695578dda82403c77936104b5682307bd4deb1ad4/websockets-16.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:71c989cbf3254fbd5e84d3bff31e4da39c43f884e64f2551d14bb3c186230f00", size = 177365, upload-time = "2026-01-10T09:22:46.787Z" }, + { url = "https://files.pythonhosted.org/packages/b0/fe/136ccece61bd690d9c1f715baaeefd953bb2360134de73519d5df19d29ca/websockets-16.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8b6e209ffee39ff1b6d0fa7bfef6de950c60dfb91b8fcead17da4ee539121a79", size = 175038, upload-time = "2026-01-10T09:22:47.999Z" }, + { url = "https://files.pythonhosted.org/packages/40/1e/9771421ac2286eaab95b8575b0cb701ae3663abf8b5e1f64f1fd90d0a673/websockets-16.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:86890e837d61574c92a97496d590968b23c2ef0aeb8a9bc9421d174cd378ae39", size = 175328, upload-time = "2026-01-10T09:22:49.809Z" }, + { url = "https://files.pythonhosted.org/packages/18/29/71729b4671f21e1eaa5d6573031ab810ad2936c8175f03f97f3ff164c802/websockets-16.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:9b5aca38b67492ef518a8ab76851862488a478602229112c4b0d58d63a7a4d5c", size = 184915, upload-time = "2026-01-10T09:22:51.071Z" }, + { url = "https://files.pythonhosted.org/packages/97/bb/21c36b7dbbafc85d2d480cd65df02a1dc93bf76d97147605a8e27ff9409d/websockets-16.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e0334872c0a37b606418ac52f6ab9cfd17317ac26365f7f65e203e2d0d0d359f", size = 186152, upload-time = "2026-01-10T09:22:52.224Z" }, + { url = "https://files.pythonhosted.org/packages/4a/34/9bf8df0c0cf88fa7bfe36678dc7b02970c9a7d5e065a3099292db87b1be2/websockets-16.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:a0b31e0b424cc6b5a04b8838bbaec1688834b2383256688cf47eb97412531da1", size = 185583, upload-time = "2026-01-10T09:22:53.443Z" }, + { url = 
"https://files.pythonhosted.org/packages/47/88/4dd516068e1a3d6ab3c7c183288404cd424a9a02d585efbac226cb61ff2d/websockets-16.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:485c49116d0af10ac698623c513c1cc01c9446c058a4e61e3bf6c19dff7335a2", size = 184880, upload-time = "2026-01-10T09:22:55.033Z" }, + { url = "https://files.pythonhosted.org/packages/91/d6/7d4553ad4bf1c0421e1ebd4b18de5d9098383b5caa1d937b63df8d04b565/websockets-16.0-cp312-cp312-win32.whl", hash = "sha256:eaded469f5e5b7294e2bdca0ab06becb6756ea86894a47806456089298813c89", size = 178261, upload-time = "2026-01-10T09:22:56.251Z" }, + { url = "https://files.pythonhosted.org/packages/c3/f0/f3a17365441ed1c27f850a80b2bc680a0fa9505d733fe152fdf5e98c1c0b/websockets-16.0-cp312-cp312-win_amd64.whl", hash = "sha256:5569417dc80977fc8c2d43a86f78e0a5a22fee17565d78621b6bb264a115d4ea", size = 178693, upload-time = "2026-01-10T09:22:57.478Z" }, + { url = "https://files.pythonhosted.org/packages/cc/9c/baa8456050d1c1b08dd0ec7346026668cbc6f145ab4e314d707bb845bf0d/websockets-16.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:878b336ac47938b474c8f982ac2f7266a540adc3fa4ad74ae96fea9823a02cc9", size = 177364, upload-time = "2026-01-10T09:22:59.333Z" }, + { url = "https://files.pythonhosted.org/packages/7e/0c/8811fc53e9bcff68fe7de2bcbe75116a8d959ac699a3200f4847a8925210/websockets-16.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:52a0fec0e6c8d9a784c2c78276a48a2bdf099e4ccc2a4cad53b27718dbfd0230", size = 175039, upload-time = "2026-01-10T09:23:01.171Z" }, + { url = "https://files.pythonhosted.org/packages/aa/82/39a5f910cb99ec0b59e482971238c845af9220d3ab9fa76dd9162cda9d62/websockets-16.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e6578ed5b6981005df1860a56e3617f14a6c307e6a71b4fff8c48fdc50f3ed2c", size = 175323, upload-time = "2026-01-10T09:23:02.341Z" }, + { url = 
"https://files.pythonhosted.org/packages/bd/28/0a25ee5342eb5d5f297d992a77e56892ecb65e7854c7898fb7d35e9b33bd/websockets-16.0-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:95724e638f0f9c350bb1c2b0a7ad0e83d9cc0c9259f3ea94e40d7b02a2179ae5", size = 184975, upload-time = "2026-01-10T09:23:03.756Z" }, + { url = "https://files.pythonhosted.org/packages/f9/66/27ea52741752f5107c2e41fda05e8395a682a1e11c4e592a809a90c6a506/websockets-16.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c0204dc62a89dc9d50d682412c10b3542d748260d743500a85c13cd1ee4bde82", size = 186203, upload-time = "2026-01-10T09:23:05.01Z" }, + { url = "https://files.pythonhosted.org/packages/37/e5/8e32857371406a757816a2b471939d51c463509be73fa538216ea52b792a/websockets-16.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:52ac480f44d32970d66763115edea932f1c5b1312de36df06d6b219f6741eed8", size = 185653, upload-time = "2026-01-10T09:23:06.301Z" }, + { url = "https://files.pythonhosted.org/packages/9b/67/f926bac29882894669368dc73f4da900fcdf47955d0a0185d60103df5737/websockets-16.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:6e5a82b677f8f6f59e8dfc34ec06ca6b5b48bc4fcda346acd093694cc2c24d8f", size = 184920, upload-time = "2026-01-10T09:23:07.492Z" }, + { url = "https://files.pythonhosted.org/packages/3c/a1/3d6ccdcd125b0a42a311bcd15a7f705d688f73b2a22d8cf1c0875d35d34a/websockets-16.0-cp313-cp313-win32.whl", hash = "sha256:abf050a199613f64c886ea10f38b47770a65154dc37181bfaff70c160f45315a", size = 178255, upload-time = "2026-01-10T09:23:09.245Z" }, + { url = "https://files.pythonhosted.org/packages/6b/ae/90366304d7c2ce80f9b826096a9e9048b4bb760e44d3b873bb272cba696b/websockets-16.0-cp313-cp313-win_amd64.whl", hash = "sha256:3425ac5cf448801335d6fdc7ae1eb22072055417a96cc6b31b3861f455fbc156", size = 178689, upload-time = "2026-01-10T09:23:10.483Z" }, + { url = 
"https://files.pythonhosted.org/packages/f3/1d/e88022630271f5bd349ed82417136281931e558d628dd52c4d8621b4a0b2/websockets-16.0-cp314-cp314-macosx_10_15_universal2.whl", hash = "sha256:8cc451a50f2aee53042ac52d2d053d08bf89bcb31ae799cb4487587661c038a0", size = 177406, upload-time = "2026-01-10T09:23:12.178Z" }, + { url = "https://files.pythonhosted.org/packages/f2/78/e63be1bf0724eeb4616efb1ae1c9044f7c3953b7957799abb5915bffd38e/websockets-16.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:daa3b6ff70a9241cf6c7fc9e949d41232d9d7d26fd3522b1ad2b4d62487e9904", size = 175085, upload-time = "2026-01-10T09:23:13.511Z" }, + { url = "https://files.pythonhosted.org/packages/bb/f4/d3c9220d818ee955ae390cf319a7c7a467beceb24f05ee7aaaa2414345ba/websockets-16.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:fd3cb4adb94a2a6e2b7c0d8d05cb94e6f1c81a0cf9dc2694fb65c7e8d94c42e4", size = 175328, upload-time = "2026-01-10T09:23:14.727Z" }, + { url = "https://files.pythonhosted.org/packages/63/bc/d3e208028de777087e6fb2b122051a6ff7bbcca0d6df9d9c2bf1dd869ae9/websockets-16.0-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:781caf5e8eee67f663126490c2f96f40906594cb86b408a703630f95550a8c3e", size = 185044, upload-time = "2026-01-10T09:23:15.939Z" }, + { url = "https://files.pythonhosted.org/packages/ad/6e/9a0927ac24bd33a0a9af834d89e0abc7cfd8e13bed17a86407a66773cc0e/websockets-16.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:caab51a72c51973ca21fa8a18bd8165e1a0183f1ac7066a182ff27107b71e1a4", size = 186279, upload-time = "2026-01-10T09:23:17.148Z" }, + { url = "https://files.pythonhosted.org/packages/b9/ca/bf1c68440d7a868180e11be653c85959502efd3a709323230314fda6e0b3/websockets-16.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:19c4dc84098e523fd63711e563077d39e90ec6702aff4b5d9e344a60cb3c0cb1", size = 185711, upload-time = "2026-01-10T09:23:18.372Z" }, + { url = 
"https://files.pythonhosted.org/packages/c4/f8/fdc34643a989561f217bb477cbc47a3a07212cbda91c0e4389c43c296ebf/websockets-16.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:a5e18a238a2b2249c9a9235466b90e96ae4795672598a58772dd806edc7ac6d3", size = 184982, upload-time = "2026-01-10T09:23:19.652Z" }, + { url = "https://files.pythonhosted.org/packages/dd/d1/574fa27e233764dbac9c52730d63fcf2823b16f0856b3329fc6268d6ae4f/websockets-16.0-cp314-cp314-win32.whl", hash = "sha256:a069d734c4a043182729edd3e9f247c3b2a4035415a9172fd0f1b71658a320a8", size = 177915, upload-time = "2026-01-10T09:23:21.458Z" }, + { url = "https://files.pythonhosted.org/packages/8a/f1/ae6b937bf3126b5134ce1f482365fde31a357c784ac51852978768b5eff4/websockets-16.0-cp314-cp314-win_amd64.whl", hash = "sha256:c0ee0e63f23914732c6d7e0cce24915c48f3f1512ec1d079ed01fc629dab269d", size = 178381, upload-time = "2026-01-10T09:23:22.715Z" }, + { url = "https://files.pythonhosted.org/packages/06/9b/f791d1db48403e1f0a27577a6beb37afae94254a8c6f08be4a23e4930bc0/websockets-16.0-cp314-cp314t-macosx_10_15_universal2.whl", hash = "sha256:a35539cacc3febb22b8f4d4a99cc79b104226a756aa7400adc722e83b0d03244", size = 177737, upload-time = "2026-01-10T09:23:24.523Z" }, + { url = "https://files.pythonhosted.org/packages/bd/40/53ad02341fa33b3ce489023f635367a4ac98b73570102ad2cdd770dacc9a/websockets-16.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:b784ca5de850f4ce93ec85d3269d24d4c82f22b7212023c974c401d4980ebc5e", size = 175268, upload-time = "2026-01-10T09:23:25.781Z" }, + { url = "https://files.pythonhosted.org/packages/74/9b/6158d4e459b984f949dcbbb0c5d270154c7618e11c01029b9bbd1bb4c4f9/websockets-16.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:569d01a4e7fba956c5ae4fc988f0d4e187900f5497ce46339c996dbf24f17641", size = 175486, upload-time = "2026-01-10T09:23:27.033Z" }, + { url = 
"https://files.pythonhosted.org/packages/e5/2d/7583b30208b639c8090206f95073646c2c9ffd66f44df967981a64f849ad/websockets-16.0-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:50f23cdd8343b984957e4077839841146f67a3d31ab0d00e6b824e74c5b2f6e8", size = 185331, upload-time = "2026-01-10T09:23:28.259Z" }, + { url = "https://files.pythonhosted.org/packages/45/b0/cce3784eb519b7b5ad680d14b9673a31ab8dcb7aad8b64d81709d2430aa8/websockets-16.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:152284a83a00c59b759697b7f9e9cddf4e3c7861dd0d964b472b70f78f89e80e", size = 186501, upload-time = "2026-01-10T09:23:29.449Z" }, + { url = "https://files.pythonhosted.org/packages/19/60/b8ebe4c7e89fb5f6cdf080623c9d92789a53636950f7abacfc33fe2b3135/websockets-16.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:bc59589ab64b0022385f429b94697348a6a234e8ce22544e3681b2e9331b5944", size = 186062, upload-time = "2026-01-10T09:23:31.368Z" }, + { url = "https://files.pythonhosted.org/packages/88/a8/a080593f89b0138b6cba1b28f8df5673b5506f72879322288b031337c0b8/websockets-16.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:32da954ffa2814258030e5a57bc73a3635463238e797c7375dc8091327434206", size = 185356, upload-time = "2026-01-10T09:23:32.627Z" }, + { url = "https://files.pythonhosted.org/packages/c2/b6/b9afed2afadddaf5ebb2afa801abf4b0868f42f8539bfe4b071b5266c9fe/websockets-16.0-cp314-cp314t-win32.whl", hash = "sha256:5a4b4cc550cb665dd8a47f868c8d04c8230f857363ad3c9caf7a0c3bf8c61ca6", size = 178085, upload-time = "2026-01-10T09:23:33.816Z" }, + { url = "https://files.pythonhosted.org/packages/9f/3e/28135a24e384493fa804216b79a6a6759a38cc4ff59118787b9fb693df93/websockets-16.0-cp314-cp314t-win_amd64.whl", hash = "sha256:b14dc141ed6d2dde437cddb216004bcac6a1df0935d79656387bd41632ba0bbd", size = 178531, upload-time = "2026-01-10T09:23:35.016Z" }, + { url = 
"https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" }, +] + [[package]] name = "wrapt" version = "2.1.2"