tesseract-lang 4.0.0 (new formula) #36988

Closed
wants to merge 2 commits into from


albertosottile
Contributor

  • Have you followed the guidelines for contributing?
  • Have you checked that there aren't other open pull requests for the same formula update/change?
  • Have you built your formula locally with brew install --build-from-source <formula>, where <formula> is the name of the formula you're submitting?
  • Is your test running fine brew test <formula>, where <formula> is the name of the formula you're submitting?
  • Does your build pass brew audit --strict <formula> (after doing brew install <formula>)?

Following the suggestions in PR #36786, this PR adds a new tesseract-lang formula that depends on tesseract and installs all the language data files (except English and osd, which tesseract needs to run) in its share folder.

The original tesseract formula is also updated to embed only the essential data files, and amended to search for language data files in #{HOMEBREW_PREFIX}/share. Symlinks do the rest of the trick.
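To illustrate the symlink idea, here is a minimal plain-Ruby sketch, not the code from this PR. It assumes two hypothetical keg directories under a temporary root and links their `*.traineddata` files into one shared `share/tessdata` directory, which is roughly what the link step of a package manager does. The directory layout, the helper name `link_tessdata` and the language codes other than eng and osd are illustrative assumptions.

```ruby
require "fileutils"
require "tmpdir"

# Link every *.traineddata file from one keg's tessdata directory into the
# shared directory. Returns the file names that were linked.
def link_tessdata(keg_tessdata, shared_tessdata)
  FileUtils.mkdir_p(shared_tessdata)
  Dir.glob(File.join(keg_tessdata, "*.traineddata")).sort.map do |src|
    FileUtils.ln_sf(src, File.join(shared_tessdata, File.basename(src)))
    File.basename(src)
  end
end

# Build a throwaway layout (hypothetical paths) and list the shared directory.
def demo_shared_tessdata
  Dir.mktmpdir do |root|
    core = File.join(root, "Cellar/tesseract/4.0.0/share/tessdata")
    lang = File.join(root, "Cellar/tesseract-lang/4.0.0/share/tessdata")
    shared = File.join(root, "share/tessdata")
    FileUtils.mkdir_p([core, lang])
    %w[eng osd].each { |l| FileUtils.touch(File.join(core, "#{l}.traineddata")) }
    %w[deu fra].each { |l| FileUtils.touch(File.join(lang, "#{l}.traineddata")) }
    link_tessdata(core, shared)
    link_tessdata(lang, shared)
    Dir.children(shared).sort
  end
end

puts demo_shared_tessdata.inspect
```

Because both formulas link into the same shared directory, a tesseract binary that searches that one location would see the files from either formula.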

In this way, the installed size of tesseract drops to 30 MB, while the size of tesseract-lang is about 650 MB.

The test blocks of both formulas have also been updated to perform an actual OCR task on an example taken from the upstream test repository. The tesseract test uses only English, while the tesseract-lang test also uses German.
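As a rough sketch of how the two tests could differ, the helper below builds a tesseract command line that prints the recognised text to stdout and selects languages with `-l`. The helper name `ocr_command`, the image name `example.png` and the exact language combination are assumptions for illustration; the actual test blocks in the formulas may be written differently.

```ruby
# Build the argv for a tesseract run that writes recognised text to stdout.
# Multiple language codes are joined with "+" for the -l option.
def ocr_command(image, languages)
  ["tesseract", image, "stdout", "-l", languages.join("+")]
end

# English only, as the tesseract test might do (hypothetical image name).
puts ocr_command("example.png", %w[eng]).join(" ")

# English plus German, as the tesseract-lang test might do.
puts ocr_command("example.png", %w[eng deu]).join(" ")
```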

@albertosottile
Contributor Author

Side note: I used inreplace in the updated tesseract formula to show that only the makefiles related to tessdata are affected by this change, but of course the three replacements could be removed in favor of passing datarootdir=#{share} to make install.
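For readers unfamiliar with in-place replacement, the sketch below is a plain-Ruby stand-in for the idea, not Homebrew's own helper and not the replacements made in this PR. The makefile line and the `/usr/local` prefix are hypothetical; the real tessdata makefiles may look different.

```ruby
require "tempfile"

# Rewrite `before` to `after` inside the file and fail loudly when nothing
# matched, so a silently outdated replacement cannot go unnoticed.
def replace_in_file(path, before, after)
  text = File.read(path)
  updated = text.gsub(before, after)
  raise ArgumentError, "no match for #{before.inspect} in #{path}" if updated == text
  File.write(path, updated)
  updated
end

def demo_replace
  Tempfile.create("Makefile") do |f|
    # Hypothetical makefile line, used only for illustration.
    f.write("datadir = $(prefix)/share/tessdata\n")
    f.flush
    replace_in_file(f.path, "$(prefix)/share", "/usr/local/share")
  end
end

puts demo_replace
```

Passing a variable override to make install reaches a similar result without editing the makefiles, at the cost of making it less visible which files are affected.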

@fxcoudert added the "new formula" label (PR adds a new formula to Homebrew/homebrew-core) on Feb 14, 2019
Member

@fxcoudert left a comment

Looking good. Please bump the revision number of the main formula, and add a caveats section so users are made aware that extra languages are available under another formula.
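A caveats section in a formula is a `caveats` method that returns the text shown to the user after installation. The sketch below uses a minimal stand-in class so it runs on its own; a real formula would inherit from Homebrew's Formula class, and the wording here is a guess, not the text that was merged.

```ruby
# Minimal stand-in class; a real formula would inherit from Homebrew's Formula.
class TesseractSketch
  # Text shown to the user after installation (hypothetical wording).
  def caveats
    <<~EOS
      This formula contains only the "eng" and "osd" language data files.
      If you need other languages, install the tesseract-lang formula:
        brew install tesseract-lang
    EOS
  end
end

puts TesseractSketch.new.caveats
```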

@fxcoudert added the "almost there" label (PR is nearly ready to merge) on Feb 15, 2019
@fxcoudert
Member

Finally (and I believe it will be ready to merge): please squash everything into two commits, one per formula (tesseract-lang 4.0.0 (new formula) and tesseract: remove languages).

Introduces a formula that contains all the language data files for tesseract.
The tesseract formula is patched to include only English and needed packages.
@albertosottile
Contributor Author

Done. Anything else that should be done?

@fxcoudert
Member

Many thanks @albertosottile for taking care of this!

@fxcoudert closed this in 584ba36 on Feb 16, 2019
@fxcoudert
Member

For reference: the compressed tesseract bottles have gone back from 339.9 MB to 11.8 MB, while the new tesseract-lang bottles are 328.1 MB.
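The figures above can be checked with a few lines of arithmetic. This sketch uses only the three bottle sizes quoted in the comment; the helper name `bottle_summary` is made up for illustration.

```ruby
# Bottle sizes in MB, as quoted in the comment above.
OLD_TESSERACT_MB = 339.9
NEW_TESSERACT_MB = 11.8
TESSERACT_LANG_MB = 328.1

# Summarise the size change of the main bottle and the combined size of
# both new bottles, rounded to one decimal place.
def bottle_summary(old_mb, new_mb, lang_mb)
  {
    saved_mb: (old_mb - new_mb).round(1),
    reduction_percent: ((old_mb - new_mb) / old_mb * 100).round(1),
    combined_mb: (new_mb + lang_mb).round(1),
  }
end

puts bottle_summary(OLD_TESSERACT_MB, NEW_TESSERACT_MB, TESSERACT_LANG_MB)
```

The combined size of the two new bottles matches the old tesseract bottle, which is consistent with the language data having simply moved from one formula to the other.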

kaazoo pushed a commit to kaazoo/homebrew-core that referenced this pull request Feb 19, 2019
Closes Homebrew#36988.

Signed-off-by: FX Coudert <fxcoudert@gmail.com>
@lock (bot) added the "outdated" label (PR was locked due to age) on Mar 18, 2019
@lock (bot) locked as resolved and limited conversation to collaborators on Mar 18, 2019