<div align="center">
<p align="center" style="width: 100%;">
    <img src="https://raw.githubusercontent.com/vlm-run/.github/refs/heads/main/profile/assets/vlm-black.svg" alt="VLM Run Logo" width="80" style="margin-bottom: -5px; color: #2e3138; vertical-align: middle; padding-right: 5px;"><br>
</p>
<p align="center"><a href="https://docs.vlm.run"><b>Website</b></a> | <a href="https://docs.vlm.run/"><b>API Docs</b></a> | <a href="https://docs.vlm.run/blog"><b>Blog</b></a> | <a href="https://discord.gg/AMApC2UzVY"><b>Discord</b></a> | <a href="https://chat.vlm.run"><b>Chat</b></a>
</p>
</div>

# VLM Run Orion - Handwritten Equation to LaTeX Conversion

This notebook demonstrates how to use [VLM Run Orion's](https://vlm.run/orion) vision capabilities to convert handwritten mathematical equations and derivations into well-formatted, compilable LaTeX code. This is particularly useful for transcribing complex multi-line derivations, such as the Navier-Stokes equations or custom loss functions.

For more details on the API, see the [Agent API docs](https://docs.vlm.run/agents/introduction).

## Prerequisites

- Python 3.10+
- VLM Run API key (get one at [app.vlm.run](https://app.vlm.run))
- VLM Run Python Client with OpenAI extra `vlmrun[openai]`


## Setup

First, install the required packages and configure the environment.


In [2]:
# Install required packages
%pip install vlmrun[openai] --upgrade --quiet
%pip install cachetools pillow requests numpy --quiet


Note: you may need to restart the kernel to use updated packages.




In [3]:
import os
import getpass

VLMRUN_API_KEY = os.getenv("VLMRUN_API_KEY", None)
if VLMRUN_API_KEY is None:
    VLMRUN_API_KEY = getpass.getpass("Enter your VLM Run API key: ")


## Initialize the VLM Run Client

We use the OpenAI-compatible chat completions interface through the VLM Run SDK.


In [4]:
from vlmrun.client import VLMRun

BASE_URL = os.getenv("VLMRUN_BASE_URL", "https://agent.vlm.run/v1")
client = VLMRun(api_key=VLMRUN_API_KEY, base_url=BASE_URL)
print("VLM Run client initialized successfully!")
print(f"Base URL: {BASE_URL}")


VLM Run client initialized successfully!
Base URL: https://agent.vlm.run/v1


## Response Models

We define a Pydantic model for structured output. The response contains a single field, `latex_code`, which holds the LaTeX code for the transcribed equation.


In [5]:
from pydantic import BaseModel, Field


class LaTeXResponse(BaseModel):
    """Response containing the LaTeX code for the transcribed equation."""
    latex_code: str = Field(..., description="The complete, compilable LaTeX code for the transcribed mathematical equation")

print("Response models defined successfully!")


Response models defined successfully!
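
As a quick check of what this model does, the sketch below prints the required fields from its JSON schema and validates a hand-written payload. The sample payload is made up for illustration and is not real API output; the model is redefined here only so the snippet runs on its own.

```python
from pydantic import BaseModel, Field, ValidationError


class LaTeXResponse(BaseModel):
    """Same shape as the model defined above."""
    latex_code: str = Field(..., description="Compilable LaTeX code")


# The JSON schema is what the helper later passes as the response format.
schema = LaTeXResponse.model_json_schema()
print(schema["required"])

# A made-up payload standing in for the model's JSON reply.
# Raw string: each "\\" is a JSON escape that decodes to one backslash.
sample = r'{"latex_code": "\\begin{equation*} a^2 + b^2 = c^2 \\end{equation*}"}'
parsed = LaTeXResponse.model_validate_json(sample)
print(parsed.latex_code)

# A reply that is missing the field fails validation instead of passing silently.
try:
    LaTeXResponse.model_validate_json("{}")
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True
print(missing_field_rejected)
```

Validating against the model means a malformed reply surfaces as a `ValidationError` at parse time rather than as a missing attribute later.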


## Helper Functions

We create helper functions to simplify making chat completion requests with structured outputs.


In [6]:
import hashlib
import json
from typing import Any, Type, TypeVar

import cachetools
from vlmrun.common.image import encode_image
from PIL import Image


T = TypeVar('T', bound=BaseModel)


def custom_key(prompt: str, image_path: str | None = None, response_model: Type[T] | None = None, model: str = "vlmrun-orion-1:auto"):
    """Custom key for caching chat_completion."""
    response_key = hashlib.sha256(json.dumps(response_model.model_json_schema(), sort_keys=True).encode()).hexdigest() if response_model else ""
    image_key = hashlib.sha256(image_path.encode()).hexdigest() if image_path else ""
    return (prompt, image_key, response_key, model)


@cachetools.cached(cache=cachetools.TTLCache(maxsize=100, ttl=3600), key=custom_key)
def chat_completion(
    prompt: str,
    image_path: str | None = None,
    response_model: Type[T] | None = None,
    model: str = "vlmrun-orion-1:auto"
) -> tuple[BaseModel | str, str]:
    """
    Make a chat completion request with structured output for LaTeX conversion.

    Args:
        prompt: The prompt describing the LaTeX conversion task
        image_path: Path to the image file containing the handwritten equation
        response_model: Pydantic model for structured output
        model: Model to use (default: vlmrun-orion-1:auto)

    Returns:
        Tuple of (parsed response model or text, session_id)
    """
    content = [{"type": "text", "text": prompt}]
    
    # Add image if provided
    if image_path:
        # Convert to RGB first: JPEG cannot store an alpha channel (e.g. RGBA PNG scans)
        image = Image.open(image_path).convert("RGB")
        image_data = encode_image(image, format="JPEG")
        content.append({"type": "image_url", "image_url": {"url": image_data}})

    kwargs = {
        "model": model,
        "messages": [{"role": "user", "content": content}]
    }

    if response_model:
        kwargs["response_format"] = {
            "type": "json_schema",
            "schema": response_model.model_json_schema()
        }

    response = client.agent.completions.create(**kwargs)
    response_text = response.choices[0].message.content
    session_id = response.session_id

    if response_model:
        result = response_model.model_validate_json(response_text)
        return result, session_id

    return response_text, session_id

print("Helper functions defined!")


Helper functions defined!
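
Note that `custom_key` hashes the image path string, not the file contents, so overwriting an image at the same path within the one-hour TTL returns the cached result. If that matters for your workflow, one option is to hash the file bytes instead. The helper name `file_digest` is hypothetical and not part of the VLM Run SDK; this is a minimal sketch using only the standard library.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a throwaway file: same path, different contents, different digest.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "equation.jpg"
    demo.write_bytes(b"first scan")
    before = file_digest(str(demo))
    demo.write_bytes(b"second scan")
    after = file_digest(str(demo))

print(before != after)
```

To apply it, `custom_key` could compute `image_key` as `file_digest(image_path)` in place of hashing `image_path.encode()`, at the cost of reading the file on every call.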


## Handwritten Equation to LaTeX Conversion

Convert a handwritten mathematical derivation into high-quality LaTeX code. Provide the path to an image file containing the handwritten equation.


In [7]:
# Combined prompt for LaTeX conversion
LATEX_CONVERSION_PROMPT = """
Task: Transcribe the handwritten mathematical derivation of the Navier-Stokes equations into high-quality LaTeX code.

Instructions:

Logical Flow: Maintain the exact step-by-step structure of the derivation. Use the amsmath environments such as \\begin{aligned} ... \\end{aligned} or \\begin{gather} ... \\end{gather} to ensure proper alignment of equality signs and multi-line expressions.

Symbolic Precision: Pay close attention to fluid dynamics notation. Specifically:
- Distinguish between partial derivatives ($\\partial$), material derivatives ($D/Dt$), and gradients ($\\nabla$).
- Correctly render Greek letters commonly used in this context, such as density ($\\rho$), dynamic viscosity ($\\mu$), and kinematic viscosity ($\\nu$).
- Ensure vector quantities are appropriately formatted (e.g., \\mathbf{u} or \\vec{u}).

Formatting: Use \\left( and \\right) for scaling delimiters around large fractions or grouped terms to ensure professional visual balance.

Output: Return the complete, compilable LaTeX code block.
"""

print("LaTeX conversion prompt prepared!")
print(f"\nPrompt length: {len(LATEX_CONVERSION_PROMPT)} characters")


LaTeX conversion prompt prepared!

Prompt length: 1018 characters
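
The prompt above is specific to the Navier-Stokes equations. To reuse the same section structure for other derivations, one option is a small template function. The name `build_latex_prompt`, its parameters, and the example topic are hypothetical and only illustrate the idea.

```python
def build_latex_prompt(topic: str, notation_hints: list[str]) -> str:
    """Assemble a transcription prompt with the same sections as the one above."""
    hints = "\n".join(f"- {hint}" for hint in notation_hints)
    return (
        f"Task: Transcribe the handwritten mathematical derivation of {topic} "
        "into high-quality LaTeX code.\n\n"
        "Instructions:\n\n"
        "Logical Flow: Maintain the exact step-by-step structure of the derivation. "
        "Use amsmath environments such as \\begin{aligned} ... \\end{aligned}.\n\n"
        f"Symbolic Precision:\n{hints}\n\n"
        "Output: Return the complete, compilable LaTeX code block.\n"
    )


# Example topic and hint chosen for illustration only.
prompt = build_latex_prompt(
    "the Euler-Lagrange equation",
    ["Distinguish total derivatives ($d/dt$) from partial derivatives ($\\partial$)."],
)
print(prompt)
```

The doubled backslashes are ordinary Python string escapes, as in `LATEX_CONVERSION_PROMPT`; the model receives single backslashes.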


In [None]:
image_path = "navier.jpg"


result, session_id = chat_completion(
    prompt=LATEX_CONVERSION_PROMPT,
    image_path=image_path,
    response_model=LaTeXResponse,
    model="vlmrun-orion-1:auto"
)

print(">> RESPONSE")
print(result)
print(f"\n>> SESSION ID: {session_id}")
print("\n>> GENERATED LATEX CODE")
print("=" * 80)
print(result.latex_code)
print("=" * 80)


>> RESPONSE
latex_code='\\documentclass{article}\n\\usepackage{amsmath}\n\n\\begin{document}\n\nIn Cartesian coordinates, the equations are:\n\\begin{gather*}\n\\text{Coordinates: } x_1 = x, \\quad x_2 = y, \\quad x_3 = z \\\\\n\\text{Velocities: } u_1 = U, \\quad u_2 = V, \\quad u_3 = W\n\\end{gather*}\n\nFor $i=1, j=1,2,3$ (with summation in $j$):\n\\begin{equation*}\n\\rho \\left[ \\frac{\\partial U}{\\partial t} + U \\frac{\\partial U}{\\partial x} + V \\frac{\\partial U}{\\partial y} + W \\frac{\\partial U}{\\partial z} \\right] = -\\frac{\\partial P}{\\partial x} + \\mu \\left[ \\frac{\\partial^2 U}{\\partial x^2} + \\frac{\\partial^2 U}{\\partial y^2} + \\frac{\\partial^2 U}{\\partial z^2} \\right] + \\rho F_{Bx}\n\\end{equation*}\n\nFor $i=2, j=1,2,3$ (with summation in $j$):\n\\begin{equation*}\n\\rho \\left[ \\frac{\\partial V}{\\partial t} + U \\frac{\\partial V}{\\partial x} + V \\frac{\\partial V}{\\partial y} + W \\frac{\\partial V}{\\partial z} \\right] = -\\frac{\\parti
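
Before reusing the output, it can help to check that it looks like a complete document and to write it to a `.tex` file. The sketch below is a rough text heuristic, not a LaTeX parser; `check_latex` and the sample string are hypothetical stand-ins for your own helper and for `result.latex_code`.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def check_latex(latex_code: str) -> list[str]:
    """Return a list of problems found by simple text checks (empty if none)."""
    problems = []
    if "\\documentclass" not in latex_code:
        problems.append("missing \\documentclass")
    # Every \begin{env} should have a matching \end{env}.
    begins = Counter(re.findall(r"\\begin\{([^}]+)\}", latex_code))
    ends = Counter(re.findall(r"\\end\{([^}]+)\}", latex_code))
    for env in sorted(set(begins) | set(ends)):
        if begins[env] != ends[env]:
            problems.append(f"unbalanced environment: {env}")
    return problems


# Made-up sample standing in for result.latex_code.
sample = (
    "\\documentclass{article}\n\\usepackage{amsmath}\n"
    "\\begin{document}\n\\begin{equation*}\nE = mc^2\n\\end{equation*}\n"
    "\\end{document}\n"
)
problems = check_latex(sample)
# Simulate a reply that was cut off before the document was closed.
truncated_problems = check_latex(sample.replace("\\end{document}\n", ""))
print(problems, truncated_problems)

# Write the code to a .tex file (a temporary directory is used for the demo).
with tempfile.TemporaryDirectory() as tmp:
    tex_path = Path(tmp) / "equation.tex"
    tex_path.write_text(sample, encoding="utf-8")
    written = tex_path.read_text(encoding="utf-8")
```

With a real response, pass `result.latex_code` instead of `sample`. Compiling the file with a LaTeX engine remains the definitive check.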

---

## Conclusion

This notebook demonstrated how to use **VLM Run Orion** to convert handwritten mathematical equations into well-formatted, compilable LaTeX code.

### Key Takeaways

1. **Structured Prompts**: Well-structured prompts with clear instructions for logical flow, symbolic precision, and formatting produce better results.
2. **Session Management**: Use `session_id` to track your conversion requests.
3. **Image Input**: Provide clear, high-quality images of handwritten equations for best results.
4. **LaTeX Output**: The prompt asks for a complete, compilable LaTeX document; compile the result once to confirm it before reusing it elsewhere.

### Next Steps

- Experiment with different types of mathematical equations (differential equations, integrals, matrices, etc.)
- Try different formatting styles and LaTeX environments
- Explore the [VLM Run Documentation](https://docs.vlm.run) for more capabilities
- Join our [Discord community](https://discord.gg/AMApC2UzVY) for support

Happy transcribing! 📐
