mac_ocr

Ruby OCR on macOS via the Apple Vision framework.

mac_ocr is a small Ruby gem that wraps a bundled Swift helper which calls VNRecognizeTextRequest. It returns recognized text, per-line confidence, and normalized bounding boxes — the same shape exposed by the Python package ocrmac, so code written against ocrmac ports over with minimal changes.

  • macOS only (the gem targets universal-darwin, macOS 11+)
  • Ruby 3.0+
  • No runtime dependencies beyond the bundled helper

Installation

From RubyGems (once published):

# Gemfile
gem "mac_ocr"

Directly from GitHub:

# Gemfile
gem "mac_ocr", git: "https://github.com/TeamMilestone/mac_ocr"

The gem ships with a pre-built universal (arm64 + x86_64) helper binary at bin/mac_ocr_helper, so no compilation is required at install time. If you clone the repo to develop, build it once with rake helper:build (needs Xcode Command Line Tools for swiftc and lipo).

Usage

require "mac_ocr"

ocr = MacOcr::OCR.new(
  "image.png",
  recognition_level: :accurate,            # :accurate (default) or :fast
  language_preference: ["ko-KR", "en-US"]  # optional, nil for default
)

ocr.recognize
# => [
#      ["Hello, world!", 1.0, [0.041, 0.575, 0.400, 0.237]],
#      ["안녕, 세상!",   1.0, [0.041, 0.171, 0.321, 0.282]]
#    ]

Each row is [text, confidence, [x, y, width, height]]. Bounding box values are in Vision's normalized coordinate space: floats in [0, 1], origin at the bottom-left of the image. No coordinate conversion is performed by the Ruby layer — this matches ocrmac's contract.
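Because the boxes come back untouched, drawing them on an image usually means converting them to pixel coordinates with a top-left origin. The sketch below shows one way to do that; to_pixel_rect is a hypothetical helper and not part of mac_ocr, and the image dimensions are assumed to be known to the caller.

```ruby
# Hypothetical helper (not part of mac_ocr): convert a Vision-style
# normalized bbox [x, y, width, height] with a bottom-left origin into
# integer pixel coordinates [left, top, width, height] with a top-left origin.
def to_pixel_rect(bbox, image_width, image_height)
  x, y, w, h = bbox
  left = x * image_width
  # Vision's y is the distance from the image's bottom edge to the box's
  # bottom edge, so the box's top edge sits at y + h. Flip it vertically.
  top = (1.0 - (y + h)) * image_height
  [left.round, top.round, (w * image_width).round, (h * image_height).round]
end

to_pixel_rect([0.25, 0.5, 0.5, 0.25], 800, 400)
# => [200, 100, 400, 100]
```

Rounding each value independently can shift an edge by a pixel; callers that need exact edges may prefer to keep the floats.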

Porting from ocrmac

The captcha-style helper from Python ports almost verbatim:

def run_ocr(image_path)
  items = MacOcr::OCR.new(
    image_path,
    recognition_level: :accurate,
    language_preference: ["ko-KR", "en-US"]
  ).recognize

  # Sort top-to-bottom: larger Vision y is higher on the page.
  items.sort_by { |_text, _conf, bbox| -bbox[1] }.map { |t, _c, _b| t }
end
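The same row shape makes other post-processing easy to write in plain Ruby. As a rough sketch, the hypothetical reading_order helper below (not part of mac_ocr) drops low-confidence rows and then orders the rest top-to-bottom and left-to-right. The min_confidence and line_tolerance defaults are illustrative assumptions, not values recommended by the gem.

```ruby
# Hypothetical helper (not part of mac_ocr). Takes rows shaped like
# [text, confidence, [x, y, width, height]] and returns the texts in
# approximate reading order.
def reading_order(rows, min_confidence: 0.5, line_tolerance: 0.02)
  rows
    .select { |_text, conf, _bbox| conf >= min_confidence }
    .sort_by do |_text, _conf, (x, y, _w, _h)|
      # Bucket y so boxes at nearly the same height count as one line;
      # negate the bucket because a larger Vision y is higher on the page.
      [-(y / line_tolerance).round, x]
    end
    .map(&:first)
end

rows = [
  ["world",  0.9, [0.5, 0.80,  0.2, 0.05]],
  ["Hello",  1.0, [0.1, 0.805, 0.2, 0.05]],
  ["noise",  0.2, [0.1, 0.50,  0.1, 0.05]],
  ["footer", 0.8, [0.1, 0.10,  0.3, 0.05]]
]
reading_order(rows)
# => ["Hello", "world", "footer"]
```

The bucketing is deliberately crude: two boxes on the same visual line can still land in adjacent buckets when their y values straddle a bucket boundary.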

Errors

Exception                      Raised when
MacOcr::InvalidArgumentError   Bad constructor args, missing image file, unsupported language code
MacOcr::HelperExecutionError   Vision request failed, helper produced unexpected output (carries code, exit_status, stderr)
MacOcr::HelperNotFoundError    Bundled binary missing or not executable (e.g. forgot rake helper:build in a checkout)

All inherit from MacOcr::Error, so rescue MacOcr::Error catches them all.
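One way to use this hierarchy is to rescue the specific classes first and fall back to MacOcr::Error last. The sketch below assumes only the class names listed above; try_ocr is a hypothetical wrapper, and the stand-in module exists solely so the example runs without the gem installed.

```ruby
# Stand-in definitions so this sketch runs without mac_ocr; they mirror
# only the documented class names. With the gem loaded, this is skipped.
unless defined?(MacOcr)
  module MacOcr
    class Error < StandardError; end
    class InvalidArgumentError < Error; end
    class HelperExecutionError < Error; end
    class HelperNotFoundError < Error; end
  end
end

# Hypothetical wrapper: runs the block and returns [:ok, result] on
# success or [:error, message] when a MacOcr error is raised.
def try_ocr
  [:ok, yield]
rescue MacOcr::HelperNotFoundError => e
  [:error, "helper binary unavailable, try rake helper:build: #{e.message}"]
rescue MacOcr::InvalidArgumentError => e
  [:error, "bad input: #{e.message}"]
rescue MacOcr::Error => e
  # Catch-all for the rest of the hierarchy, e.g. HelperExecutionError.
  [:error, "OCR failed: #{e.message}"]
end
```

With the gem loaded, a call might look like status, payload = try_ocr { MacOcr::OCR.new("image.png", recognition_level: :accurate).recognize }. Errors outside the MacOcr hierarchy still propagate, which is usually what you want for genuine bugs.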

Architecture

MacOcr::OCR shells out to bin/mac_ocr_helper, a Swift CLI that takes an image path plus optional flags:

mac_ocr_helper <image_path> [--level accurate|fast]
                            [--languages ko-KR,en-US]
                            [--confidence 0.0]

It emits a JSON document of this shape:

{ "results": [ { "text": "...", "confidence": 0.5, "bbox": [x, y, w, h] } ] }
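To make the contract concrete, here is a minimal sketch of the two halves of such a wrapper: building the argument list from the documented flags and turning the JSON document into [text, confidence, bbox] rows. helper_argv and rows_from_helper_json are hypothetical names, not the gem's internals, and the sketch assumes the JSON shape shown above.

```ruby
require "json"

# Hypothetical helper: build the helper's argument list from the
# documented flags. Optional flags are omitted when not given.
def helper_argv(image_path, level: "accurate", languages: nil, confidence: nil)
  argv = ["bin/mac_ocr_helper", image_path, "--level", level]
  argv += ["--languages", languages.join(",")] if languages
  argv += ["--confidence", confidence.to_s] if confidence
  argv
end

# Hypothetical helper: convert the helper's JSON document into rows.
# fetch raises KeyError if a documented key is missing.
def rows_from_helper_json(json)
  JSON.parse(json).fetch("results").map do |result|
    [result.fetch("text"), result.fetch("confidence"), result.fetch("bbox")]
  end
end

json = '{ "results": [ { "text": "Hi", "confidence": 0.5, "bbox": [0.1, 0.2, 0.3, 0.4] } ] }'
rows_from_helper_json(json)
# => [["Hi", 0.5, [0.1, 0.2, 0.3, 0.4]]]
```

Keeping the arguments as an array means they can be passed to something like Open3.capture3(*argv) without going through a shell, so image paths with spaces need no quoting.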

Process spawn overhead is a few milliseconds; Vision itself takes 50–200 ms per image, so the shell-out cost is negligible for typical workloads.

See docs/HANDOFF.md for the design rationale (including why we chose a Swift CLI over a Ruby/FFI binding or an Objective-C bridge gem).

Development

bundle install
rake helper:build     # rebuild bin/mac_ocr_helper
rake test             # run the minitest suite
rake release:prepare  # clean, build, test, then gem build

Credits

This gem is a Ruby port of the Vision-framework surface of the Python package ocrmac by Maximilian Strauss. The public API and coordinate conventions are deliberately compatible. Many thanks to the ocrmac maintainers.

License

MIT — see LICENSE.
