<div align="center">
<p align="center" style="width: 100%;">
    <img src="https://raw.githubusercontent.com/vlm-run/.github/refs/heads/main/profile/assets/vlm-black.svg" alt="VLM Run Logo" width="80" style="margin-bottom: -5px; color: #2e3138; vertical-align: middle; padding-right: 5px;"><br>
</p>
<p align="center">
  <a href="https://docs.vlm.run"><b>Website</b></a> | 
  <a href="https://docs.vlm.run/"><b>API Docs</b></a> | 
  <a href="https://docs.vlm.run/blog"><b>Blog</b></a> | 
  <a href="https://discord.gg/AMApC2UzVY"><b>Discord</b></a> | 
  <a href="https://chat.vlm.run"><b>Chat</b></a>
</p>
</div>

# VLM Run Orion - Book Proofreading from PDF


This cookbook demonstrates how [VLM Run Orion](https://vlm.run/orion) proofreads a book directly from a PDF manuscript. Using multimodal understanding and structured output, Orion can flag grammatical errors, stylistic inconsistencies, formatting issues, and factual discrepancies, each with a page reference.

In this notebook, we use the **VLM Run Agent Chat Completions API**, an OpenAI-compatible interface that supports image and data inputs alongside text, to proofread a PDF manuscript and return structured feedback.

We'll cover the following topics:
1. **Document Ingestion** – Upload a book manuscript (PDF) via URL
2. **Error Detection & Categorization** – Identify grammar, style, and formatting issues
3. **Structured Feedback** – Get machine-readable results with page numbers and context
4. **Multi-Pass Review** – Chain operations (e.g., “first check grammar, then verify character names”)

## Prerequisites

- Python 3.10+
- VLM Run API key (get one at [app.vlm.run](https://app.vlm.run))
- VLM Run Python Client with OpenAI extra: `pip install "vlmrun[openai]"`

## Setup

First, install the required packages and configure the environment.

In [1]:
# Install required packages
!pip install "vlmrun[openai]" --upgrade --quiet
!pip install pillow requests numpy cachetools --quiet




In [2]:
import os
import getpass
import json
from typing import List, Any
from functools import cached_property

import numpy as np
from PIL import Image
from pydantic import BaseModel, Field

VLMRUN_API_KEY = os.getenv("VLMRUN_API_KEY", None)
if VLMRUN_API_KEY is None:
    VLMRUN_API_KEY = getpass.getpass("Enter your VLM Run API key: ")

## Initialize the VLM Run Client

We use the OpenAI-compatible chat completions interface through the VLM Run SDK.

In [3]:
from vlmrun.client import VLMRun

client = VLMRun(
    api_key=VLMRUN_API_KEY, base_url="https://agent.vlm.run/v1", timeout=1000
)
print("VLM Run client initialized successfully!")
print("Base URL: https://agent.vlm.run/v1")
print("Model: vlmrun-orion-1")

VLM Run client initialized successfully!
Base URL: https://agent.vlm.run/v1
Model: vlmrun-orion-1


## Response Models (dtypes)

We define structured Pydantic models to capture proofreading feedback with full traceability.



In [4]:
class ProofreadingIssue(BaseModel):
    issue_type: str = Field(..., description="Category: 'grammar', 'spelling', 'style', 'consistency', 'formatting', 'factual'")
    page_number: int = Field(..., description="Page where the issue occurs")
    excerpt: str = Field(..., description="Short quoted text containing the issue")
    context: str = Field(..., description="Surrounding paragraph for clarity")
    suggestion: str = Field(..., description="Recommended correction or improvement")
    severity: str = Field(..., description="One of: 'low', 'medium', 'high'")

class ProofreadingReport(BaseModel):
    total_issues: int = Field(..., description="Total number of detected issues")
    issues_by_page: dict[int, int] = Field(..., description="Mapping of page → issue count")
    issues: List[ProofreadingIssue] = Field(..., description="Full list of annotated issues")
    summary: str = Field(..., description="High-level overview of manuscript quality")

    def __repr__(self):
        return f"ProofreadingReport(pages={len(self.issues_by_page)}, issues={self.total_issues}, severity_distribution={self._severity_dist()})"

    def _severity_dist(self):
        from collections import Counter
        counts = Counter(issue.severity for issue in self.issues)
        return dict(counts)


print("✅ Proofreading response models defined!")

✅ Proofreading response models defined!


## Helper Functions

We create helper functions to simplify making chat completion requests with structured outputs.

In [5]:
import hashlib
import cachetools
from typing import Type, TypeVar
from IPython.display import HTML
from vlmrun.common.image import encode_image


T = TypeVar('T', bound=BaseModel)


def display(images: Image.Image | list[Image.Image], texts: list[str] | None = None, width: int = 300):
    if isinstance(images, Image.Image):
        images = [images]
    if texts is None:
        texts = [None] * len(images)
    elif isinstance(texts, str):
        texts = [texts]
    elif len(texts) != len(images):
        raise ValueError("`texts` must be a list of the same length as `images`")

    imgs_html = ""
    for image, text in zip(images, texts):
        W, H = image.size
        if W > width:
            H = int(H * width / W)
            W = width
            image = image.resize((W, H))
        im_bytes = encode_image(image, format="JPEG")
        imgs_html += f"<div style='display:inline-block; margin:5px; text-align:center'>"
        imgs_html += f"<img src='{im_bytes}' style='width:{width}px; border-radius:6px'>"
        if text:
            imgs_html += f"<div style='font-size:12px; color:#666; margin-top:5px'>{text}</div>"
        imgs_html += f"</div>"
    return HTML(f"<div style='display:flex; flex-wrap:wrap'>{imgs_html}</div>")


def custom_key(prompt: str, images: list[Image.Image] | list[str] | None = None, doc: str | list[str] | None = None, response_model: Type[T] | None = None, model: str = "vlmrun-orion-1:auto"):
    """Custom key for caching chat_completion."""
    image_keys = []
    if images:
        for image in images:
            if isinstance(image, Image.Image):
                thumb = image.copy()
                thumb.thumbnail((128, 128))
                encoded = encode_image(thumb, format="JPEG")
                image_keys.append(encoded)
            elif isinstance(image, str):
                image_keys.append(image)

    doc_keys = []
    if doc:
        if isinstance(doc, str):
            doc_keys.append(doc)
        elif isinstance(doc, list):
            for d_url in doc:
                doc_keys.append(d_url)

    response_key = hashlib.sha256(json.dumps(response_model.model_json_schema(), sort_keys=True).encode()).hexdigest() if response_model else ""
    return (prompt, tuple(image_keys), tuple(doc_keys), response_key, model)


@cachetools.cached(cache=cachetools.TTLCache(maxsize=1000, ttl=3600), key=custom_key)
def chat_completion(
    prompt: str,
    images: list[Image.Image] | list[str] | None = None,
    doc: str | list[str] | None = None,
    response_model: Type[T] | None = None,
    model: str = "vlmrun-orion-1:auto"
) -> tuple[Any, Any]:
    """
    Make a chat completion request with optional images, documents, and structured output.

    Args:
        prompt: The text prompt/instruction
        images: Optional list of images to process (either PIL Images or URLs)
        doc: Optional document URL, or list of document URLs, to attach
        response_model: Optional Pydantic model for structured output
        model: Model to use (default: vlmrun-orion-1:auto)

    Returns:
        A (result, session_id) tuple, where result is the parsed response model
        if response_model is provided, else the raw response text
    """
    content = []
    content.append({"type": "text", "text": prompt})
    if doc:
        if isinstance(doc, str):
            content.append({
                    "type": "file_url",
                    "file_url": {"url": doc, "detail": "auto"}
                })
        elif isinstance(doc, list):
            for d_url in doc:
                assert isinstance(d_url, str) and d_url.startswith("http"), "Document URLs must be strings starting with http or https"
                content.append({
                    "type": "file_url",
                    "file_url": {"url": d_url, "detail": "auto"}
                })


    if images:
        for image in images:
            if isinstance(image, str):
                assert image.startswith("http"), "Image URLs must start with http or https"
                content.append({
                    "type": "image_url",
                    "image_url": {"url": image, "detail": "auto"}
                })
            elif isinstance(image, Image.Image):
                content.append({
                    "type": "image_url",
                    "image_url": {"url": encode_image(image, format="JPEG"), "detail": "auto"}
                })
            else:
                raise ValueError("Images must be either PIL Images or URLs")

    kwargs = {
        "model": model,
        "messages": [{"role": "user", "content": content}]
    }

    if response_model:
        kwargs["response_format"] = {
            "type": "json_schema",
            "schema": response_model.model_json_schema()
        }

    response = client.agent.completions.create(**kwargs)
    response_text = response.choices[0].message.content

    if response_model:
        return response_model.model_validate_json(response_text), response.session_id

    return response_text, response.session_id

print("Helper functions defined!")

Helper functions defined!
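
To see why `custom_key` hashes the JSON schema with `sort_keys=True`, here is a minimal standalone sketch using only the standard library. The `schema_fingerprint` name and the sample schemas are illustrative assumptions and are not part of the VLM Run SDK.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Return a stable SHA-256 hex digest for a JSON-schema dict.

    sort_keys=True makes the serialization independent of key order,
    so two logically identical schemas produce the same cache key.
    """
    payload = json.dumps(schema, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


# Hypothetical schemas: same content, different key order.
schema_a = {"type": "object", "properties": {"summary": {"type": "string"}}}
schema_b = {"properties": {"summary": {"type": "string"}}, "type": "object"}
# A schema with a different field, which should map to a different key.
schema_c = {"type": "object", "properties": {"total_issues": {"type": "integer"}}}

same_key = schema_fingerprint(schema_a) == schema_fingerprint(schema_b)
different_key = schema_fingerprint(schema_a) != schema_fingerprint(schema_c)
print(same_key, different_key)
```

Because the prompt, document URLs, and model name are also part of the key, changing any of them triggers a fresh request, while re-running a cell within the one-hour TTL returns the cached result.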


### 1. Basic Proofreading Pass


Upload a book manuscript and request a full proofreading review.

In [6]:
# BOOK_URL = "https://web.stanford.edu/~zwicky/mistakes.pdf" 
BOOK_URL = "https://www.oasisacademywoodview.org/uploaded/Woodview/home_learning/year4/week4/Tuesday_English.pdf"
report, session_id = chat_completion(
    prompt=(
        "Perform a professional proofreading pass on this book manuscript. "
        "Identify grammar, spelling, punctuation, and basic style issues. "
        "Return every issue with its page number, excerpt, context, and a suggested fix. "
        "Prioritize clarity and readability."
    ),
    doc=BOOK_URL,
    response_model=ProofreadingReport
)

print(f"📄 Manuscript analyzed across {len(report.issues_by_page)} pages")
print(f"❗ Total issues found: {report.total_issues}")
print(f"📊 Severity: {report._severity_dist()}")
print(f"\n📝 Summary:\n{report.summary}")

📄 Manuscript analyzed across 0 pages
❗ Total issues found: 6
📊 Severity: {'high': 2, 'medium': 2, 'low': 2}

📝 Summary:
The manuscript proofreading pass revealed critical consistency issues, primarily a mismatch between the worksheet content (Worksheet 6) and the provided answer key (Worksheet 1). There is also a factual error in the instructions, which refer to 'circled' words that are not actually circled. Minor grammar and formatting adjustments are recommended to improve pedagogical clarity and professional presentation.
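
In the run above, `issues_by_page` came back empty even though six issues were reported, which is why the page count printed as 0. Both aggregate fields are filled in by the model and can disagree with the `issues` list. A minimal sketch, assuming the report has been converted to a plain dict (for example with Pydantic's `model_dump()`), recomputes the aggregates locally. The `recount` helper and the sample data are hypothetical.

```python
from collections import Counter


def recount(report: dict) -> dict:
    """Return a copy of the report with aggregates derived from `issues`."""
    per_page = Counter(issue["page_number"] for issue in report["issues"])
    fixed = dict(report)
    fixed["total_issues"] = len(report["issues"])
    # Sort by page so the mapping reads in manuscript order.
    fixed["issues_by_page"] = dict(sorted(per_page.items()))
    return fixed


# Hypothetical report shaped like ProofreadingReport, with a stale mapping.
sample = {
    "total_issues": 3,
    "issues_by_page": {},
    "issues": [
        {"page_number": 2, "severity": "low"},
        {"page_number": 1, "severity": "high"},
        {"page_number": 2, "severity": "medium"},
    ],
    "summary": "Illustrative data only.",
}

fixed = recount(sample)
print(fixed["issues_by_page"])
```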


### 2. Style & Consistency Check



Now focus on authorial voice and narrative consistency.

In [7]:
consistency_report, _ = chat_completion(
    prompt=(
        "Review the manuscript for stylistic consistency: "
        "- Character name spellings (e.g., 'Jon' vs 'John') "
        "- Timeline contradictions "
        "- POV shifts or tense inconsistencies "
        "- Repetitive phrasing or overused words "
        "Only report high- or medium-severity issues with page numbers."
    ),
    doc=BOOK_URL,
    response_model=ProofreadingReport
)

print(f"🔍 Consistency issues: {consistency_report.total_issues}")
for issue in consistency_report.issues[:3]:  # Show top 3
    print(f"\nPage {issue.page_number} | {issue.issue_type.title()} ({issue.severity})")
    print(f"Excerpt: “{issue.excerpt}”")
    print(f"Suggestion: {issue.suggestion}")

🔍 Consistency issues: 4

Page 6 | Timeline (high)
Excerpt: “"Mistake (6)" (Page 6) vs. "Mistake (1) Answers" (Page 7)”
Suggestion: Reorder the manuscript so that the answer key immediately following "Mistake (6)" corresponds to the questions asked on that page. Ensure the numbering of the sets follows a logical numerical order.

Page 6 | Spelling (medium)
Excerpt: “"Mickey was poppuler at school." vs. "What hite is Dad compared to Mike?"”
Suggestion: Determine if "Mickey" and "Mike" are intended to be the same character; if so, standardize the spelling to one version throughout the manuscript.

Page 6 | Style (medium)
Excerpt: “"The spelling mistakes in these sentences have been circled..." / "Each sentence below has one word that is incorrect..."”
Suggestion: Use a single set of instructions at the top of the page or simplify subsequent instructions to reduce clutter.
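
The first issue above is typed `Timeline`, which is not among the categories listed in the `issue_type` field description. Because `issue_type` and `severity` are plain `str` fields, nothing rejects such values. One possible guard, sketched here with the standard library on plain dicts, flags labels outside the documented sets. The helper name `unexpected_labels` and the sample issues are assumptions for illustration.

```python
ALLOWED_TYPES = {"grammar", "spelling", "style", "consistency", "formatting", "factual"}
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def unexpected_labels(issues: list[dict]) -> list[tuple[int, str, str]]:
    """Return (index, field, value) for every label outside the documented sets."""
    problems = []
    for index, issue in enumerate(issues):
        if issue["issue_type"].lower() not in ALLOWED_TYPES:
            problems.append((index, "issue_type", issue["issue_type"]))
        if issue["severity"].lower() not in ALLOWED_SEVERITIES:
            problems.append((index, "severity", issue["severity"]))
    return problems


# Hypothetical issues mirroring the labels printed above.
sample_issues = [
    {"issue_type": "timeline", "severity": "high"},
    {"issue_type": "spelling", "severity": "medium"},
    {"issue_type": "style", "severity": "medium"},
]

flagged = unexpected_labels(sample_issues)
print(flagged)
```

A stricter alternative would be to declare both fields with `typing.Literal[...]` in the Pydantic models, so that validation fails on unknown values instead of passing them through.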


### 3. Formatting & Publishing Readiness

Check for typesetting and publishing standards.

In [8]:
format_report, _ = chat_completion(
    prompt=(
        "Evaluate the manuscript for publishing-ready formatting: "
        "- Missing chapter headings "
        "- Improper dialogue punctuation (e.g., missing em-dashes or quotes) "
        "- Inconsistent heading levels or spacing "
        "- Page breaks in awkward places (e.g., mid-paragraph) "
        "Assume this is for print-on-demand paperback."
    ),
    doc=BOOK_URL,
    response_model=ProofreadingReport
)

print(f"📐 Formatting issues: {format_report.total_issues}")

📐 Formatting issues: 4
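
The cell above prints only the count. To triage the findings, it can help to order them by severity and then by page. A minimal sketch on plain dicts (for example `[i.model_dump() for i in format_report.issues]`); the `triage` helper, the rank table, and the sample data are hypothetical.

```python
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


def triage(issues: list[dict]) -> list[dict]:
    """Sort issues by severity (high first), then by page number."""
    return sorted(
        issues,
        # Unknown severities sort last instead of raising a KeyError.
        key=lambda i: (SEVERITY_RANK.get(i["severity"], len(SEVERITY_RANK)), i["page_number"]),
    )


# Hypothetical formatting issues for illustration.
sample_issues = [
    {"page_number": 7, "severity": "low", "excerpt": "extra blank line"},
    {"page_number": 6, "severity": "high", "excerpt": "missing heading"},
    {"page_number": 2, "severity": "medium", "excerpt": "inconsistent spacing"},
    {"page_number": 1, "severity": "high", "excerpt": "awkward page break"},
]

ordered = triage(sample_issues)
for issue in ordered:
    print(f"p.{issue['page_number']} [{issue['severity']}] {issue['excerpt']}")
```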


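### 4. Generate an Editor’s Summary Dashboard

Request a single editorial dashboard that summarizes the manuscript across all issue categories. This is a fresh request against the PDF and does not reuse the reports returned by the earlier passes.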

In [9]:
class EditorialDashboard(BaseModel):
    manuscript_url: str
    total_pages: int
    grammar_issues: int
    consistency_issues: int
    formatting_issues: int
    critical_pages: List[int] = Field(..., description="Pages with ≥3 high-sev issues")
    executive_summary: str

dashboard, _ = chat_completion(
    prompt=(
        "Create an editor’s dashboard summarizing the manuscript’s proofreading status. "
        "Include total pages, issue counts by category, and list pages needing urgent attention. "
        "Write a 3-sentence executive summary for the author."
    ),
    doc=BOOK_URL,
    response_model=EditorialDashboard
)

print("📋 EDITORIAL DASHBOARD")
print(f"Pages: {dashboard.total_pages}")
print(f"Grammar: {dashboard.grammar_issues} | Consistency: {dashboard.consistency_issues} | Formatting: {dashboard.formatting_issues}")
print(f"🚨 Critical pages: {dashboard.critical_pages}")
print(f"\n{dashboard.executive_summary}")

📋 EDITORIAL DASHBOARD
Pages: 2
Grammar: 0 | Consistency: 3 | Formatting: 0
🚨 Critical pages: []

The manuscript displays a critical mismatch where Worksheet (6) is followed by an Answer Key for Worksheet (1). This error is compounded by a discrepancy in item counts between the exercise and the key, rendering the evaluation tool ineffective. While formatting and individual page structures are sound, the document requires immediate synchronization of content and keys to be fit for publication.
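
The counts printed above differ from the totals reported by the individual passes, since each call analyzes the PDF independently. If consistent numbers matter, one option is to derive the dashboard figures locally from issues already collected, including the "three or more high-severity issues" rule from the `critical_pages` description. A minimal sketch on plain dicts; the `local_dashboard` helper and the sample data are assumptions.

```python
from collections import Counter


def local_dashboard(passes: dict[str, list[dict]], threshold: int = 3) -> dict:
    """Summarize issue counts per pass and find pages with many high-severity issues."""
    high_per_page = Counter(
        issue["page_number"]
        for issues in passes.values()
        for issue in issues
        if issue["severity"] == "high"
    )
    return {
        "counts": {name: len(issues) for name, issues in passes.items()},
        "critical_pages": sorted(p for p, n in high_per_page.items() if n >= threshold),
    }


# Hypothetical results from three passes.
passes = {
    "grammar": [{"page_number": 6, "severity": "high"}, {"page_number": 7, "severity": "low"}],
    "consistency": [{"page_number": 6, "severity": "high"}],
    "formatting": [{"page_number": 6, "severity": "high"}, {"page_number": 7, "severity": "high"}],
}

summary = local_dashboard(passes)
print(summary)
```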


## Conclusion

This cookbook shows how VLM Run Orion turns raw PDF manuscripts into actionable editorial feedback, with no manual page-flipping required.


### Key Takeaways

1. Page-referenced feedback with contextual excerpts.

2. Categorized issues (grammar, style, formatting, etc.).

3. Chainable workflows: run multiple specialized passes over the same manuscript.

4. Structured JSON output for integration into editing tools or a CMS.

5. Zero setup: no local OCR, NLP pipelines, or rule engines.
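
As one illustration of the integration point in item 4, the structured issues can be flattened to CSV for a spreadsheet or tracking tool. A minimal sketch using the standard `csv` module; the `issues_to_csv` helper and the sample row are hypothetical, and the column names follow the `ProofreadingIssue` fields defined earlier.

```python
import csv
import io

FIELDS = ["issue_type", "page_number", "excerpt", "context", "suggestion", "severity"]


def issues_to_csv(issues: list[dict]) -> str:
    """Serialize issue dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(issues)
    return buffer.getvalue()


# Hypothetical issue for illustration.
sample_issues = [
    {
        "issue_type": "spelling",
        "page_number": 6,
        "excerpt": "poppuler",
        "context": "Mickey was poppuler at school.",
        "suggestion": "popular",
        "severity": "medium",
    }
]

csv_text = issues_to_csv(sample_issues)
print(csv_text)
```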


### Use Cases

- Self-publishing authors preparing for print
- Publishing houses automating first-pass edits
- Localization teams checking translated manuscripts
- Accessibility auditors ensuring readable structure

### Next Steps

- Explore the [VLM Run Documentation](https://docs.vlm.run) for more details
- Join our [Discord community](https://discord.gg/AMApC2UzVY) for support
- Check out more examples in the [VLM Run Cookbook](https://github.com/vlm-run/vlmrun-cookbook)
- Review domain-specific redaction agents for financial, healthcare, legal, and other industries

Happy building!