Model assets for MathCraft OCR, the ONNX-only OCR runtime used by LaTeXSnipper.
MathCraft OCR recognizes formulae, text, and mixed mathematical documents with a compact ONNX model set. This repository provides the model release assets and the source package for the PyPI package mathcraft-ocr used by LaTeXSnipper.
Current PyPI release: mathcraft-ocr 0.1.8.
Install the library and CLI without choosing an ONNX Runtime backend:
pip install mathcraft-ocr
mathcraft --help
Install exactly one ONNX Runtime backend before running OCR inference.
CPU:
pip install "mathcraft-ocr[cpu]"
GPU:
pip install "mathcraft-ocr[gpu]"
Use only one ONNX Runtime backend in the same environment. Do not install onnxruntime and onnxruntime-gpu together.
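The one-backend rule can be checked from Python before running inference. The sketch below is only an illustration and is not part of mathcraft-ocr: the helper names `installed_backends` and `backend_status` are hypothetical, and it looks only for the two distribution names mentioned above.

```python
from importlib import metadata

# The two PyPI distributions that must not be installed together.
BACKENDS = ("onnxruntime", "onnxruntime-gpu")


def installed_backends(names=None):
    """Return the ONNX Runtime distributions present in `names`.

    When `names` is None, the distributions installed in the current
    environment are inspected instead.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    # Normalize case and underscores so "ONNXRuntime_GPU" still matches.
    normalized = {str(n).lower().replace("_", "-") for n in names if n}
    return [b for b in BACKENDS if b in normalized]


def backend_status(names=None):
    """Summarize whether exactly one backend is installed."""
    found = installed_backends(names)
    if not found:
        return "missing: install exactly one backend"
    if len(found) > 1:
        return "conflict: " + " and ".join(found) + " are both installed"
    return "ok: " + found[0]


if __name__ == "__main__":
    print(backend_status())
```

A "conflict" result means one of the two packages should be uninstalled before OCR is run.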
LaTeXSnipper's dependency wizard selects the ONNX Runtime GPU wheel line from the detected CUDA toolkit. CUDA 11.x uses the ONNX Runtime CUDA 11 package feed, CUDA 12.x uses the stable PyPI GPU wheels, and CUDA 13.x uses the ONNX Runtime CUDA 13 nightly feed. Static mathcraft-ocr[gpu] package metadata cannot inspect the local CUDA toolkit, so CUDA 11.x users installing manually should use the CUDA 11 feed shown by the wizard.
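The CUDA-to-wheel mapping described above can be written as a small lookup. This is a sketch of the rule as stated in this README, not the wizard's actual code; the function name `gpu_wheel_line` is hypothetical, and the returned strings are descriptive labels rather than real feed URLs.

```python
def gpu_wheel_line(cuda_version: str) -> str:
    """Map a detected CUDA toolkit version (e.g. "12.4") to a wheel line label."""
    major = int(cuda_version.strip().split(".")[0])
    lines = {
        11: "ONNX Runtime CUDA 11 package feed",
        12: "stable PyPI GPU wheels",
        13: "ONNX Runtime CUDA 13 nightly feed",
    }
    if major not in lines:
        # The README only documents CUDA 11.x, 12.x, and 13.x.
        raise ValueError(f"no documented wheel line for CUDA {cuda_version}")
    return lines[major]


if __name__ == "__main__":
    for version in ("11.8", "12.4", "13.0"):
        print(version, "->", gpu_wheel_line(version))
```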
Upgrade the current release with a chosen backend:
pip install -U "mathcraft-ocr[gpu]"
mathcraft --help
Check the runtime:
mathcraft doctor --provider auto
mathcraft models check
mathcraft warmup --profile mixed --provider auto
Recognize an image:
mathcraft ocr "C:\path\to\formula.png" --profile formula --provider auto --json
Mixed OCR to Markdown:
mathcraft ocr "C:\path\to\page.png" --profile mixed --provider auto --output result.md
mathcraft ocr "C:\path\to\page.png" --profile mixed --provider auto --output-dir "D:\MathCraft\outputs"
When a file is written, the CLI prints the resolved output path:
[MATHCRAFT_OUTPUT] written to D:\MathCraft\outputs\page.md
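A script that wraps the CLI can pick the written path out of captured output by matching the `[MATHCRAFT_OUTPUT]` marker. The sketch below assumes the line format is exactly as shown above. The helper `written_paths` is hypothetical and not part of the package.

```python
import re
from pathlib import PureWindowsPath

# Matches lines such as:
#   [MATHCRAFT_OUTPUT] written to D:\MathCraft\outputs\page.md
_MARKER = re.compile(
    r"^\[MATHCRAFT_OUTPUT\] written to (?P<path>.+?)\s*$",
    re.MULTILINE,
)


def written_paths(cli_output: str):
    """Return every output path announced in captured CLI output."""
    return [PureWindowsPath(m.group("path")) for m in _MARKER.finditer(cli_output)]


if __name__ == "__main__":
    sample = "[MATHCRAFT_OUTPUT] written to D:\\MathCraft\\outputs\\page.md\n"
    for path in written_paths(sample):
        print(path.name, "in", path.parent)
```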
PowerShell custom model cache:
$env:MATHCRAFT_HOME="D:\MathCraft\models"
mathcraft doctor --provider auto
Persistent user-level cache path:
setx MATHCRAFT_HOME "D:\MathCraft\models"
Open a new terminal after setx.
Restore the default cache path:
[Environment]::SetEnvironmentVariable("MATHCRAFT_HOME", $null, "User")
Remove-Item Env:\MATHCRAFT_HOME -ErrorAction SilentlyContinue
mathcraft doctor --provider auto
Open a new terminal after removing the persistent variable. The default root is:
%APPDATA%\MathCraft\models
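The cache-root rule implied above (use MATHCRAFT_HOME when it is set, otherwise fall back to the default under %APPDATA%) can be sketched as follows. This is an illustration of the documented behavior, not the runtime's own resolver. The function name `model_root` is hypothetical, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import PureWindowsPath


def model_root(env=None):
    """Resolve the writable model root from an environment mapping."""
    env = os.environ if env is None else env
    custom = env.get("MATHCRAFT_HOME", "").strip()
    if custom:
        # An explicit MATHCRAFT_HOME overrides the default location.
        return PureWindowsPath(custom)
    appdata = env.get("APPDATA", "").strip()
    if not appdata:
        raise KeyError("APPDATA is not set; cannot build the default root")
    return PureWindowsPath(appdata) / "MathCraft" / "models"


if __name__ == "__main__":
    print(model_root({"MATHCRAFT_HOME": r"D:\MathCraft\models"}))
    print(model_root({"APPDATA": r"C:\Users\me\AppData\Roaming"}))
```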
from mathcraft_ocr import MathCraftRuntime
runtime = MathCraftRuntime(provider_preference="auto")
result = runtime.recognize_mixed(r"C:\path\to\page.png")
print(result.text)
for block in result.blocks:
    print(block.role, block.kind, block.text[:80])

| Profile | Use Case | Output |
|---|---|---|
| formula | Formula screenshots | LaTeX formula text |
| text | Plain text OCR | Text |
| mixed | Text + formula documents | Markdown-ready structured text |
Active release: v1.0.0
| Model ID | Runtime | Purpose |
|---|---|---|
| mathcraft-formula-det | ONNX | Mathematical formula region detection |
| mathcraft-formula-rec | ONNX | Formula-to-LaTeX recognition |
| mathcraft-text-det | ONNX | Fast multilingual text detection |
| mathcraft-text-rec | ONNX | Fast multilingual text recognition |
Release assets:
mathcraft-formula-det.zip
mathcraft-formula-rec.zip
mathcraft-text-det.zip
mathcraft-text-rec.zip
SHA256SUMS.txt
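Mirrored or manually downloaded assets can be checked against SHA256SUMS.txt. The sketch below assumes the file uses the common `sha256sum` layout (hex digest, whitespace, file name), which this README does not confirm. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_sums(text):
    """Parse 'digest  filename' lines into a {filename: digest} mapping."""
    sums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'.
        sums[name.lstrip("*")] = digest.lower()
    return sums


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(folder, sums_name="SHA256SUMS.txt"):
    """Return {filename: True/False} for every entry in the checksum file."""
    folder = Path(folder)
    expected = parse_sums((folder / sums_name).read_text(encoding="utf-8"))
    return {
        name: (folder / name).is_file() and sha256_of(folder / name) == digest
        for name, digest in expected.items()
    }
```

Calling `verify` on the folder that holds the four zip files and SHA256SUMS.txt reports `False` for any archive that is missing or does not match its digest.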
Default writable model root:
%APPDATA%\MathCraft\models
The runtime checks the manifest before initialization. Missing or incomplete model folders are repaired automatically by downloading only the affected model asset.
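The idea of repairing only the affected model can be illustrated with a minimal check. The README does not describe the manifest format, so the mapping below (model ID to a list of required relative file names) and the file name `model.onnx` are purely hypothetical. The real runtime may also compare sizes or hashes.

```python
from pathlib import Path


def models_needing_repair(root, manifest):
    """Return model IDs whose folder is missing or lacks a required file.

    `manifest` is a hypothetical {model_id: [relative file names]} mapping.
    """
    root = Path(root)
    broken = []
    for model_id, files in manifest.items():
        folder = root / model_id
        # A missing folder fails this check too, since no file inside it exists.
        if not all((folder / rel).is_file() for rel in files):
            broken.append(model_id)
    return broken


if __name__ == "__main__":
    demo_manifest = {
        "mathcraft-formula-det": ["model.onnx"],  # hypothetical file name
        "mathcraft-text-rec": ["model.onnx"],     # hypothetical file name
    }
    print(models_needing_repair(".", demo_manifest))
```

Only the IDs returned would need their archive downloaded again. Intact models are left untouched.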
Interrupted downloads are resumable. Partial archives are stored under the active writable model root:
<MATHCRAFT_HOME>\.downloads\<model_id>.zip.part
After a model archive is fully downloaded, verified, and extracted, the .part file is removed automatically.
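A resumable download typically continues from the size of the partial file. The sketch below builds the `.part` path from the layout shown above and derives an HTTP `Range` header from the bytes already on disk. Using Range requests is an assumption about how resumption might work, not a description of MathCraft's downloader, and both helper names are hypothetical.

```python
from pathlib import Path


def part_path(model_root, model_id):
    """Location of the partial archive: <root>/.downloads/<model_id>.zip.part"""
    return Path(model_root) / ".downloads" / f"{model_id}.zip.part"


def resume_headers(part):
    """Request headers that continue a download after the bytes already saved."""
    offset = part.stat().st_size if part.is_file() else 0
    # With nothing on disk, a plain request without a Range header is enough.
    return {"Range": f"bytes={offset}-"} if offset else {}


if __name__ == "__main__":
    part = part_path(".", "mathcraft-formula-rec")
    print(part, resume_headers(part))
```

A server that honors the header answers with status 206 and only the remaining bytes, which are appended to the `.part` file.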
The examples below are generated from MathCraft's structured block output. Boxes show detected roles, order, column metadata, score, and layout flags.
Formula-heavy English mathematical prose with dense inline and display formulae.
Formula-dominant journal page with display equations, anchors, labels, headers, and page numbers.
Chinese mathematical document page with mixed text and formula blocks.
Sparse title/cover-style page used to check layout stability.
Local block_layout_regression_v4 telemetry:
| Metric | Value |
|---|---|
| Pages | 10 |
| Total blocks | 495 |
| Text characters | 21,417 |
| Markdown lines | 304 |
| Mean page time | 8.34 s |
| Fastest page | 1.33 s |
| Slowest page | 18.53 s |
Environment:
Provider: CUDAExecutionProvider
Runtime: MathCraft OCR v1
Backend: ONNX Runtime
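A few derived figures make the telemetry table easier to compare across runs. The snippet below uses only the values reported above. The derived quantities are simple arithmetic, not additional measurements.

```python
# Values copied from the block_layout_regression_v4 telemetry table.
pages = 10
total_blocks = 495
text_chars = 21_417
mean_page_seconds = 8.34

# Total run time follows from the mean: mean * pages.
total_seconds = round(mean_page_seconds * pages, 1)   # 83.4
blocks_per_page = total_blocks / pages                # 49.5
chars_per_page = text_chars / pages                   # 2141.7
chars_per_second = round(text_chars / total_seconds, 1)

print(f"total time: {total_seconds} s")
print(f"blocks per page: {blocks_per_page}")
print(f"characters per page: {chars_per_page}")
print(f"characters per second: {chars_per_second}")
```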
- ONNX Runtime only, no active PyTorch inference dependency.
- Stable MathCraft-owned model IDs and folders.
- Manifest-based file checks and cache repair.
- Resumable model downloads for slow or interrupted networks.
- Formula detection before text OCR.
- Structured blocks for headings, paragraphs, display formulae, headers, page numbers, and columns.
LaTeXSnipper already integrates MathCraft OCR. Normal users do not need to install this package manually. Use this repository when you need standalone OCR, mirrored model assets, or an offline package.
Bundled offline model root:
<LaTeXSnipper>\_internal\MathCraft\models
Missing or repaired files are written to the user model root, not into the bundled read-only directory.