OneOffTech · avvertix · Apr 15, 2026
diff --git a/.github/workflows/update-docs.yml b/.github/workflows/update-docs.yml
@@ -0,0 +1,38 @@
+name: Update reference docs
+
+on:
+  pull_request:
+    paths:
+      - "src/parxy_cli/commands/**"
+      - "src/parxy_cli/cli.py"
+      - "src/parxy_core/models/config.py"
+      - "scripts/generate_docs.py"
+
+jobs:
+  update-docs:
+    name: Regenerate reference docs
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+
+    steps:
+      - uses: actions/checkout@v6
+        with:
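+          ref: ${{ github.head_ref }}  # check out the PR branch (not the detached merge commit) so the commit step can push; fetch-depth already defaults to 1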
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7.3.1
+        with:
+          enable-cache: true
+
+      - name: Install dependencies
+        run: uv sync
+
+      - name: Generate reference docs
+        run: uv run python scripts/generate_docs.py
+
+      - name: Commit if changed
+        uses: stefanzweifel/git-auto-commit-action@v7.1.0
+        with:
+          commit_message: "docs: sync CLI and configuration reference"
+          file_pattern: "docs/reference/*.md"
diff --git a/docs/howto/add_new_parser.md b/docs/howto/add_new_parser.md
@@ -1,3 +1,8 @@
+---
+title: Add a new parser
+description: How to implement a custom driver, register it with Parxy at runtime, and make it available alongside the built-in parsers.
+---
+
 # How to Add a New Parser to Parxy
 
 Parxy is designed to be **extensible** — you can integrate new parsing backends (drivers) or create custom variants of existing ones directly from your Python code, without modifying the core library.

diff --git a/docs/howto/batch_processing.md b/docs/howto/batch_processing.md
@@ -1,3 +1,8 @@
+---
+title: Process multiple documents in parallel
+description: How to use Parxy's batch API to parse many documents concurrently, control worker count, handle per-file errors, and collect structured results.
+---
+
 # How to Process Multiple Documents in Parallel
 
 Parxy provides a `batch` method for processing multiple documents in parallel, with support for per-file configuration. This is useful when you need to parse many documents efficiently or when different documents require different parsing strategies.

diff --git a/docs/howto/configure_landingai.md b/docs/howto/configure_landingai.md
@@ -1,3 +1,8 @@
+---
+title: Configure LandingAI ADE
+description: How to set up the LandingAI Agentic Document Extraction driver, configure the API key and environment, and override parsing options per document.
+---
+
 # How to Configure LandingAI ADE
 
 This guide shows you how to configure the LandingAI ADE (Agentic Document Extraction) driver for document processing, including setting default options and overriding them on a per-document basis.

diff --git a/docs/howto/configure_llamaparse.md b/docs/howto/configure_llamaparse.md
@@ -1,3 +1,8 @@
+---
+title: Configure LlamaParse
+description: How to set up the LlamaParse driver, configure the API key and parsing mode, and override options on a per-document basis for better extraction results.
+---
+
 # How to Configure LlamaParse
 
 This guide shows you how to configure the LlamaParse driver for document processing, including setting default options and overriding them on a per-document basis.

diff --git a/docs/howto/configure_llmwhisperer.md b/docs/howto/configure_llmwhisperer.md
@@ -1,3 +1,8 @@
+---
+title: Configure LLMWhisperer
+description: How to set up the LLMWhisperer driver, configure the API key and parsing mode, and override options on a per-document basis for better extraction results.
+---
+
 # How to Configure LLMWhisperer
 
 This guide shows you how to configure the LLMWhisperer driver for document processing, including setting default options and overriding them on a per-document basis.

diff --git a/docs/howto/configure_observability.md b/docs/howto/configure_observability.md
@@ -1,3 +1,8 @@
+---
+title: Configure observability
+description: How to enable OpenTelemetry tracing and metrics in Parxy, connect to an OTLP collector, and monitor document processing operations in your observability stack.
+---
+
 # How to Configure Observability
 
 This guide shows you how to enable and configure OpenTelemetry-based observability in Parxy to monitor document processing operations.

diff --git a/docs/howto/configure_pdfact.md b/docs/howto/configure_pdfact.md
@@ -1,3 +1,8 @@
+---
+title: Configure PdfAct
+description: How to set up the PdfAct driver against a self-hosted or remote service instance, configure the base URL and API key, and run PdfAct locally with Docker.
+---
+
 # How to Configure PdfAct
 
 This guide shows you how to configure the PdfAct driver for document processing using a self-hosted or remote PdfAct service.

diff --git a/docs/howto/configure_pymupdf.md b/docs/howto/configure_pymupdf.md
@@ -1,3 +1,8 @@
+---
+title: Configure PyMuPDF
+description: How to use Parxy's default PyMuPDF driver, choose the right extraction level for your use case, and adjust the output when working with local PDF files.
+---
+
 # How to Configure PyMuPDF
 
 This guide shows you how to use the PyMuPDF driver for document processing. PyMuPDF is the default driver in Parxy and requires no external services or API keys.

diff --git a/docs/howto/configure_unstructured_local.md b/docs/howto/configure_unstructured_local.md
@@ -1,3 +1,8 @@
+---
+title: Configure Unstructured library
+description: How to install and configure the Unstructured local driver for offline PDF parsing without external APIs, including extraction levels and output options.
+---
+
 # How to Configure Unstructured Local
 
 This guide shows you how to configure the Unstructured Local driver for document processing. This driver uses the open-source `unstructured` library for local document parsing without requiring external services.

diff --git a/docs/howto/pdf_manipulation.md → docs/howto/merge_and_split_pdfs.md b/docs/howto/pdf_manipulation.md → docs/howto/merge_and_split_pdfs.md
@@ -1,3 +1,8 @@
+---
+title: Merge and split PDFs
+description: How to merge multiple PDFs and split a single PDF into pages or ranges from the command line using parxy pdf:merge and parxy pdf:split.
+---
+
 # How to Manipulate PDFs with Parxy
 
 Parxy provides powerful **PDF manipulation commands** that allow you to merge multiple PDF files into one or split a single PDF into multiple files — all from the command line.

diff --git a/docs/howto/pdf_attachments.md b/docs/howto/pdf_attachments.md
@@ -1,3 +1,8 @@
+---
+title: Work with PDF attachments
+description: How to add, list, extract, and remove file attachments embedded in a PDF using Parxy's CLI commands, with examples for common attachment workflows.
+---
+
 # How to Work with PDF Attachments
 
 Parxy provides comprehensive **PDF attachment commands** that allow you to add, list, extract, and remove file attachments in PDF documents — all from the command line.
@@ -473,6 +478,6 @@ parxy attach:remove --help
 
 ## Related Documentation
 
-- [PDF Manipulation](./pdf_manipulation.md) - Learn about merging and splitting PDFs
+- [Merge and split PDFs](./merge_and_split_pdfs.md)
 - [Getting Started Tutorial](../tutorials/getting_started.md) - General introduction to Parxy CLI
 - [Using the CLI](../tutorials/using_cli.md) - Basic CLI usage patterns
diff --git a/docs/installation_and_setup.md b/docs/installation_and_setup.md
@@ -0,0 +1,94 @@
+---
+title: Installation and setup
+description: Quick instructions to install Parxy via pip, uv, or uvx and configure it through environment variables.
+weight: 3
+---
+
+# Installation and Setup
+
+## Requirements
+
+- Python **3.12** or **3.13**
+
+## Installation
+
+Parxy can be installed via pip or uv, or run without installation using uvx.
+
+### Via pip
+
+```bash
+pip install parxy          # Basic installation (PyMuPDF and PdfAct drivers)
+pip install 'parxy[all]'   # All drivers included (quoted so shells such as zsh do not expand the brackets)
+```
+
+### Via uv
+
+```bash
+uv add parxy             # Basic installation
+uv add parxy --extra all # All drivers included
+```
+
+### Without installation (uvx)
+
+[`uvx`](https://docs.astral.sh/uv/guides/tools/) runs Parxy in an isolated environment without a permanent install:
+
+```bash
+# Basic drivers only
+uvx parxy --help
+```
+
+```bash
+# All drivers included
+uvx --from 'parxy[all]' parxy --help
+```
+
+### Installing specific drivers
+
+If you only need a particular driver, install its extra instead of `all`:
+
+```bash
+pip install 'parxy[llama]'               # LlamaParse
+pip install 'parxy[llmwhisperer]'        # LLMWhisperer
+pip install 'parxy[landingai]'           # LandingAI
+pip install 'parxy[unstructured_local]'  # Unstructured library
+```
+
+See [Supported Services](./supported_services.md) for the full list of drivers and their extras.
+
+## Environment variables and API keys
+
+Some drivers require an API key. Parxy reads these from environment variables, which can be set in a `.env` file in your project root.
+
+To generate a template `.env` file:
+
+```bash
+parxy env
+```
+
+Then fill in the keys for the services you use:
+
+```bash
+# LlamaParse
+PARXY_LLAMAPARSE_API_KEY=
+
+# Unstract LLMWhisperer
+PARXY_LLMWHISPERER_API_KEY=
+```
+
+### Core environment variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `PARXY_DEFAULT_DRIVER` | `pymupdf` | Driver used when none is specified |
+| `PARXY_LOGGING_LEVEL` | `INFO` | Logging verbosity |
+| `PARXY_LOGGING_FILE` | *(none)* | Path to write log output |
+
+### Self-hosted services
+
+Some drivers (such as PdfAct) can be run locally via Docker. To generate a Docker Compose configuration:
+
+```bash
+parxy docker
+```
+
+This produces a `compose.yaml` you can start with `docker compose up`.
diff --git a/docs/introduction.md b/docs/introduction.md
@@ -0,0 +1,86 @@
+---
+title: Introduction
+description: What Parxy is, how it works, and a quick look at the CLI commands and Python library API before you dive in.
+weight: 1
+---
+
+# Introduction
+
+Parxy is a document processing gateway that provides a unified interface to multiple document parsing services. Its common document model lets you swap providers without rewriting your application.
+
+- Single API across different providers (local libraries and remote APIs)
+- Supports PyMuPDF, Unstructured, LlamaParse, LLMWhisperer, PdfAct, and more
+- Custom drivers can be registered directly in your application code
+- Execution tracing to help debug parsing issues
+
+## Available as CLI and library
+
+Parxy works as a command line tool or as a Python library.
+
+The quickest way to try it out is via [`uvx`](https://docs.astral.sh/uv/concepts/tools/#execution-vs-installation):
+
+```bash
+uvx parxy --help
+```
+
+To include all supported drivers:
+
+```bash
+uvx --from 'parxy[all]' parxy --help
+```
+
+See [Installation and Setup](./installation_and_setup.md) for the full installation options.
+
+## CLI overview
+
+Once installed, `parxy` provides commands including the following:
+
+| Command | Description |
+|---------|-------------|
+| `parxy parse` | Extract text content from documents with customizable granularity levels and output formats |
+| `parxy markdown` | Convert documents into Markdown format, with optional combining of multiple documents |
+| `parxy drivers` | List available document processing drivers |
+| `parxy env` | Create a configuration file with default settings |
+| `parxy docker` | Generate a Docker Compose configuration for self-hosted services |
+| `parxy pdf:merge` | Merge multiple PDF files into one, with support for selecting specific page ranges |
+| `parxy pdf:split` | Split a PDF file into individual pages |
+
+```bash
+# Parse a PDF to markdown
+parxy parse --mode markdown document.pdf
+
+# Launch interactive TUI for parser comparison
+parxy tui ./documents
+
+# Merge multiple PDFs with page ranges
+parxy pdf:merge cover.pdf doc1.pdf[1:10] doc2.pdf -o merged.pdf
+```
+
+Run `parxy --help` for the full list of commands and options.
+
+## Library overview
+
+Parxy can also be used directly in Python. After installation, import the `Parxy` facade:
+
+```python
+from parxy_core.facade import Parxy
+
+# Parse a document using the default driver
+doc = Parxy.parse('path/to/document.pdf')
+
+print(f"Pages: {len(doc.pages)}")
+print(f"Title: {doc.metadata.title}")
+
+# Use a specific driver
+doc = Parxy.driver(Parxy.LLAMAPARSE).parse('path/to/document.pdf')
+```
+
+Every driver returns the same `Document` structure, so you can switch providers without changing how you process the output.
+
+For a step-by-step walkthrough, see the [Getting Started tutorial](./tutorials/getting_started.md).
+
+## Next steps
+
+- [Installation and first run](./installation_and_setup.md)
+- [Available drivers](./supported_services.md) and their installation
+- [Parse your first document](./tutorials/getting_started.md)