v0.1.0
First public release. Lexo (Local EXtraction and OCR) is a local-first desktop
document OCR tool that turns PDFs and images into clean, editable text, with
strong support for Burmese (Myanmar script) using free, high-accuracy Google
Docs OCR. Everything runs on your machine; the only network call is the optional
OCR step, which goes through your own Google account.
It is a complete, from-scratch rebuild of the old OCR Text Extractor (the legacy
Tkinter app is gone).
Highlights
- Smart OCR routing: digital PDF pages use their embedded text layer
  (instant, lossless); only scanned pages are OCR'd. --force-ocr overrides this.
- Free Burmese-first OCR: Google Docs OCR via the Drive API, on your own
  account, behind a pluggable provider port.
- PDF operations: extract page ranges, split, crop, rotate, merge, and
  split two-up spreads.
- Desktop GUI + CLI: a PySide6 app with a visual crop/split editor and a
  proofread pane, and a scriptable Typer CLI, both over the same engine.
- Burmese-aware text: NFC normalization, zero-width-space-safe cleaning,
  and a bundled Noto Sans Myanmar font.
- Exports: plain text (default), Markdown (YAML frontmatter), and JSONL.
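
The routing and text-cleaning behavior described above can be illustrated with a small sketch. This is not Lexo's actual code: `choose_route` and `clean_text` are hypothetical names, the "any non-whitespace text means digital" test is an assumption, and "zero-width-space-safe" is read here as "cleaning never drops U+200B", which Burmese text uses as an invisible word break.

```python
import re
import unicodedata
from typing import Optional

ZWSP = "\u200b"  # zero-width space; must survive cleaning


def choose_route(embedded_text: Optional[str], force_ocr: bool = False) -> str:
    """Return "text-layer" for digital pages and "ocr" for scanned ones."""
    if force_ocr:
        # Mirrors the idea of --force-ocr: skip the text layer entirely.
        return "ocr"
    # Assumption: a page is "digital" if its text layer has any visible content.
    if embedded_text and embedded_text.strip():
        return "text-layer"
    return "ocr"


def clean_text(text: str) -> str:
    """NFC-normalize and tidy whitespace without removing zero-width spaces."""
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of spaces and tabs only; U+200B is not matched here.
    text = re.sub(r"[ \t]+", " ", text)
    lines = [line.strip(" \t") for line in text.splitlines()]
    return "\n".join(lines).strip("\n")


if __name__ == "__main__":
    pages = ["Embedded text from a digital page", "", None]
    print([choose_route(p) for p in pages])
    print(repr(clean_text("a  \t b" + ZWSP + "c  \n")))
```

Stripping with an explicit `" \t"` set, rather than a broad "invisible characters" pattern, is what keeps the zero-width space intact in this sketch.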
Install
    uv tool install lexo

OCR needs a one-time Google Drive API setup (bring your own OAuth client). See the
README.
Full changelog: see CHANGELOG.md.