Ruby OCR on macOS via the Apple Vision framework.
mac_ocr is a small Ruby gem that wraps a bundled Swift helper which calls
VNRecognizeTextRequest.
It returns recognized text, per-line confidence, and normalized bounding
boxes — the same shape exposed by the Python package
ocrmac, so code written
against ocrmac ports over with minimal changes.
- macOS only (the gem targets universal-darwin, macOS 11+)
- Ruby 3.0+
- No runtime dependencies beyond the bundled helper
From RubyGems (once published):
# Gemfile
gem "mac_ocr"

Directly from GitHub:
# Gemfile
gem "mac_ocr", git: "https://github.com/TeamMilestone/mac_ocr"

The gem ships with a pre-built universal (arm64 + x86_64) helper binary at
bin/mac_ocr_helper, so no compilation is required at install time. If you
clone the repo to develop, build it once with rake helper:build (needs
Xcode Command Line Tools for swiftc and lipo).
require "mac_ocr"
ocr = MacOcr::OCR.new(
  "image.png",
  recognition_level: :accurate,            # :accurate (default) or :fast
  language_preference: ["ko-KR", "en-US"]  # optional, nil for default
)
ocr.recognize
# => [
# ["Hello, world!", 1.0, [0.041, 0.575, 0.400, 0.237]],
# ["안녕, 세상!", 1.0, [0.041, 0.171, 0.321, 0.282]]
# ]

Each row is [text, confidence, [x, y, width, height]]. Bounding box
values are in Vision's normalized coordinate space: floats in [0, 1],
origin at the bottom-left of the image. No coordinate conversion is
performed by the Ruby layer — this matches ocrmac's contract.
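
Because the boxes stay in Vision's normalized, bottom-left-origin space, callers that draw on or crop the image usually need pixel coordinates with a top-left origin. A minimal sketch of that conversion follows. The method name `bbox_to_pixels` is hypothetical and not part of the gem, and the image's pixel dimensions are assumed to be known to the caller.

```ruby
# Hypothetical helper, not part of mac_ocr.
# Converts a normalized [x, y, width, height] box (origin bottom-left)
# into integer pixel values [left, top, width, height] (origin top-left).
def bbox_to_pixels(bbox, image_width, image_height)
  x, y, w, h = bbox
  left = (x * image_width).round
  # Vision's y marks the bottom edge of the box, so flip around the top edge.
  top = ((1.0 - y - h) * image_height).round
  [left, top, (w * image_width).round, (h * image_height).round]
end

bbox_to_pixels([0.25, 0.5, 0.5, 0.25], 800, 400)
# => [200, 100, 400, 100]
```

Rounding to integers is a choice made for this sketch; keep floats if sub-pixel precision matters.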
The captcha-style helper from Python ports almost verbatim:
def run_ocr(image_path)
  items = MacOcr::OCR.new(
    image_path,
    recognition_level: :accurate,
    language_preference: ["ko-KR", "en-US"]
  ).recognize
  # Sort top-to-bottom: larger Vision y is higher on the page.
  items.sort_by { |_text, _conf, bbox| -bbox[1] }.map { |t, _c, _b| t }
end

| Exception | Raised when |
|---|---|
| MacOcr::InvalidArgumentError | Bad constructor args, missing image file, unsupported language code |
| MacOcr::HelperExecutionError | Vision request failed, helper produced unexpected output (carries code, exit_status, stderr) |
| MacOcr::HelperNotFoundError | Bundled binary missing or not executable (e.g. forgot rake helper:build in a checkout) |

All inherit from MacOcr::Error, so rescue MacOcr::Error catches them all.
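
The hierarchy above allows a specific rescue for the missing-helper case plus a catch-all for everything else. The sketch below shows one way to do that. The wrapper name `ocr_rows_or_empty` is hypothetical, and the stand-in classes at the top exist only so the snippet runs without the gem installed; they are skipped when mac_ocr is already loaded.

```ruby
# Stand-in definitions (assumption for illustration only); skipped when
# the real gem has already defined MacOcr.
unless defined?(MacOcr)
  module MacOcr
    class Error < StandardError; end
    class InvalidArgumentError < Error; end
    class HelperExecutionError < Error; end
    class HelperNotFoundError < Error; end
  end
end

# Hypothetical wrapper: runs the block and returns its rows, or logs the
# failure and returns [] when any MacOcr error is raised.
def ocr_rows_or_empty(log: $stderr)
  yield
rescue MacOcr::HelperNotFoundError => e
  # Most specific class first: the bundled binary is missing or not executable.
  log.puts "helper binary unavailable: #{e.message}"
  []
rescue MacOcr::Error => e
  # Catch-all for the remaining MacOcr errors.
  log.puts "OCR failed (#{e.class}): #{e.message}"
  []
end
```

With the gem loaded, usage might look like `ocr_rows_or_empty { MacOcr::OCR.new("image.png").recognize }`. Constructing the object inside the block means constructor-time errors are rescued as well.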
MacOcr::OCR shells out to bin/mac_ocr_helper, a Swift CLI that takes
an image path plus flags and emits a JSON document:
mac_ocr_helper <image_path> [--level accurate|fast]
[--languages ko-KR,en-US]
[--confidence 0.0]
{ "results": [ { "text": "...", "confidence": 0.5, "bbox": [x, y, w, h] } ] }

Process spawn overhead is a few milliseconds; Vision itself takes 50–200 ms per image, so the shell-out cost is negligible for typical workloads.
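
To make the relationship between the helper's JSON document and the Ruby-side rows concrete, here is a rough sketch of the mapping using only the standard library. It is not the gem's actual implementation; the method name `rows_from_helper_json` and the sample payload are assumptions for illustration.

```ruby
require "json"

# Hypothetical mapping from the helper's JSON document to
# [text, confidence, [x, y, width, height]] rows.
def rows_from_helper_json(json_text)
  doc = JSON.parse(json_text)
  # fetch raises KeyError if a key is missing, surfacing malformed output early.
  doc.fetch("results").map do |item|
    [
      item.fetch("text"),
      item.fetch("confidence").to_f,
      item.fetch("bbox").map(&:to_f)
    ]
  end
end

# Made-up sample payload in the documented shape.
sample = '{ "results": [ { "text": "Hello", "confidence": 0.5, "bbox": [0.1, 0.2, 0.3, 0.4] } ] }'
rows_from_helper_json(sample)
# => [["Hello", 0.5, [0.1, 0.2, 0.3, 0.4]]]
```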
See docs/HANDOFF.md for the design rationale (including why we chose a
Swift CLI over a Ruby/FFI binding or an Objective-C bridge gem).
bundle install
rake helper:build # rebuild bin/mac_ocr_helper
rake test # run minitest suite
rake release:prepare # clean+build+test+gem build

This gem is a Ruby port of the Vision-framework surface of the Python
package ocrmac by
Maximilian Strauss. The public API and coordinate conventions are
deliberately compatible. Many thanks to the ocrmac maintainers.
MIT — see LICENSE.