tesseract-lang 4.0.0 (new formula) #36988
Side note: I used |
Looking good. Please bump the revision number of the main formula, and add a caveats section so users are made aware that extra languages are available under another formula.
Finally (and I believe it will be ready to merge): please squash everything into two commits, one per formula ( |
Introduces a formula that contains all the language data files for tesseract. The tesseract formula is patched to include only English and needed packages.
adc70b8 to f006a07
Done. Anything else that should be done? |
Many thanks @albertosottile for taking care of this! |
For reference: the compressed |
Closes Homebrew#36988. Signed-off-by: FX Coudert <fxcoudert@gmail.com>
- `brew install --build-from-source <formula>`, where `<formula>` is the name of the formula you're submitting?
- `brew test <formula>`, where `<formula>` is the name of the formula you're submitting?
- `brew audit --strict <formula>` (after doing `brew install <formula>`)?

Following the indications in PR #36786, this PR includes a new `tesseract-lang` formula that depends on `tesseract` and installs all the language data files (besides English and osd, needed to run the software) in its `share` folder.

The original `tesseract` formula is also updated to embed only the essential data files, and amended to search for language data files in `#{HOMEBREW_PREFIX}/share`. Symlinks do the rest of the trick.

In this way, the installed size of `tesseract` drops to 30 MB, while the size of `tesseract-lang` is about 650 MB.

The `test` block of both formulas has also been updated to perform an actual OCR task on an example taken from the upstream test repository. The test is made using only English in `tesseract` and using also German in `tesseract-lang`.
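
To illustrate the symlink arrangement described above, here is a minimal Ruby sketch. It is not the code from either formula: the helper names `link_tessdata` and `demo_layout`, the simplified `Cellar/...` paths, and the `deu`/`fra` sample languages are all assumptions made for the example. It only simulates, in a temporary directory, how data files shipped by two separate packages can end up visible in one shared `share/tessdata` directory.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (not part of Homebrew): symlink every *.traineddata
# file found in the given package directories into one shared directory,
# roughly what linking a package's share/ files into the prefix achieves.
def link_tessdata(package_dirs, shared_dir)
  FileUtils.mkdir_p(shared_dir)
  package_dirs.each do |dir|
    Dir.glob(File.join(dir, "*.traineddata")).sort.each do |path|
      target = File.join(shared_dir, File.basename(path))
      # Skip files that are already linked so the helper can be re-run.
      next if File.exist?(target) || File.symlink?(target)

      FileUtils.ln_s(path, target)
    end
  end
  # Return the language codes now visible in the shared directory.
  Dir.children(shared_dir).map { |f| File.basename(f, ".traineddata") }.sort
end

# Build a throwaway layout: the core package carries only eng and osd,
# the language package carries everything else (two samples here).
def demo_layout
  Dir.mktmpdir do |root|
    core   = File.join(root, "Cellar/tesseract/share/tessdata")      # assumed path
    lang   = File.join(root, "Cellar/tesseract-lang/share/tessdata") # assumed path
    shared = File.join(root, "share/tessdata")
    FileUtils.mkdir_p([core, lang])
    %w[eng osd].each { |l| FileUtils.touch(File.join(core, "#{l}.traineddata")) }
    %w[deu fra].each { |l| FileUtils.touch(File.join(lang, "#{l}.traineddata")) }
    link_tessdata([core, lang], shared)
  end
end
```

Calling `demo_layout` returns `["deu", "eng", "fra", "osd"]`: a program that looks only in the shared directory sees the languages from both packages, while each package stays small or large on its own.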