This repository has been archived by the owner. It is now read-only.

Tesseract: add with-training-tools and with-opencl options #43223

Closed
5 participants
@ryanfb
Contributor

ryanfb commented Aug 24, 2015

No description provided.

depends_on "libtiff" => :recommended
depends_on "leptonica"
+ if build.with? "training-tools"

@DomT4

DomT4 Aug 24, 2015

Contributor

How widely used is this likely to be? I'm a little bleh about introducing so many optional dependencies here unless it's going to prove popular/widely useful/etc.

@ryanfb

ryanfb Aug 24, 2015

Contributor

If you want to use Tesseract with anything other than pre-built .traineddata etc. files, you need the training tools. I don't have a sense for the general need for that, but a lot of the discussion I see on the Tesseract-OCR Google Group is about custom OCR that needs the training tools.

The other issue may be that due to the way some of the training tools use Pango/Cairo font rendering there seem to be a lot of issues with them on OS X vs. Linux. Part of why I'd like to have this as an option is so that people on OS X who do need to do OCR training can start to report those issues upstream in Tesseract and start getting them fixed.

@DomT4

DomT4 Aug 24, 2015

Contributor

I see. I'm not super-opposed to the option, I guess, it's just the rather heavy dependencies involved.

Sadly, the way brew handles if build.with? dependencies is less than ideal and consequently we've been trying to get rid of if build.with? outside of def install. Essentially the option looks like it doesn't require anything extra under brew info tesseract:

==> Dependencies
Required: leptonica ✘
Recommended: libtiff ✔
==> Options
--all-languages
    Install recognition data for all languages
--with-opencl
    Enable OpenCL support
--with-training-tools
    Install OCR training tools

But obviously, there's actually a bunch of "silent" dependencies there when the option is invoked. You can pass --with-training-tools to brew info tesseract to get the "true" dependencies list (minus X11), but that's not particularly intuitive.

I'll let the other maintainers weigh in on this particular use case and see what the consensus is.
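A plain-Ruby sketch of why the option's dependencies stay "silent". It uses a hypothetical `MiniFormula` class (not Homebrew's real Formula API) that evaluates a formula body once for a given set of options and records each `depends_on` call; that is just enough to reproduce the `brew info` behaviour described above. The dependency names `pango` and `cairo` are assumptions based on the Pango/Cairo rendering mentioned in this thread, not the actual list from the diff.

```ruby
# Hypothetical mini-DSL for illustration only; NOT Homebrew's Formula API.
class MiniFormula
  attr_reader :deps

  # Evaluates a formula body for one set of enabled options and
  # returns the dependency names that body declared.
  def self.dependencies_for(enabled_options, &body)
    formula = new(enabled_options)
    formula.instance_eval(&body)
    formula.deps
  end

  def initialize(enabled_options)
    @enabled = enabled_options
    @deps = []
  end

  # Simplified stand-in for Homebrew's build.with?.
  def with?(name)
    @enabled.include?(name)
  end

  def depends_on(name)
    @deps << name
  end
end

# Assumed dependency names; the real diff's list is not shown here.
TESSERACT_BODY = proc do
  depends_on "leptonica"
  if with?("training-tools")
    depends_on "pango"
    depends_on "cairo"
  end
end

# Roughly what `brew info tesseract` reports: the extra deps are absent.
p MiniFormula.dependencies_for([], &TESSERACT_BODY)
# => ["leptonica"]

# With the option passed, the conditional deps are declared.
p MiniFormula.dependencies_for(["training-tools"], &TESSERACT_BODY)
# => ["leptonica", "pango", "cairo"]
```

Because the conditional `depends_on` calls only run when the option is enabled, any listing produced without the option simply never sees them.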

@ryanfb

ryanfb Aug 24, 2015

Contributor

Ah, ok. It wasn't clear to me from the Homebrew example formula if there was a better way of doing depends_on for options.

Library/Formula/tesseract.rb
@@ -17,6 +17,7 @@ class Tesseract < Formula
depends_on "automake" => :build
depends_on "libtool" => :build
depends_on "pkg-config" => :build
+ depends_on "cairo" => :build

@DomT4

DomT4 Aug 24, 2015

Contributor

Why is this necessary now, but not previously?

@ryanfb

ryanfb Aug 24, 2015

Contributor

There appears to be a change in Tesseract HEAD somewhere along the way (since 3.04.00) that introduced a warning about Cairo during the default ./configure step (I believe it's only used for training-tools though).

@tdsmith

tdsmith Aug 24, 2015

Contributor

Cairo is unlikely to be a :build dependency; the shared libraries are probably needed once the tools are installed.

@ryanfb

ryanfb Aug 24, 2015

Contributor

Removed and squashed/pushed. It seems the warning only appears because of a ./configure issue with the default clang + OpenMP on current Tesseract HEAD (see tesseract-ocr/tesseract#76).

Library/Formula/tesseract.rb
@@ -58,8 +70,16 @@ def install
ENV.cxx11
system "./autogen.sh" if build.head?
- system "./configure", "--disable-dependency-tracking", "--prefix=#{prefix}"
+ if build.with? "opencl"

@DomT4

DomT4 Aug 24, 2015

Contributor

Just use an args array here, rather than duplicating the whole block.

args = %W[
  --disable-dependency-tracking
  --prefix=#{prefix}
]

args << "--enable-opencl" if build.with? "opencl"

system "./configure", *args
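A standalone sketch of the same args-array pattern that runs outside Homebrew. `configure_args` and the `build_with` lambda are hypothetical stand-ins (the lambda plays the role of Homebrew's `build.with?`), and the prefix path is a made-up example; only the `--disable-dependency-tracking`, `--prefix` and `--enable-opencl` flags come from the diff.

```ruby
# Builds the ./configure arguments once and appends optional flags,
# instead of duplicating the whole system call for each option.
def configure_args(prefix, enabled_options)
  # Hypothetical stand-in for Homebrew's build.with?.
  build_with = ->(name) { enabled_options.include?(name) }

  # %W splits on whitespace and interpolates #{...}, as in the suggestion.
  args = %W[
    --disable-dependency-tracking
    --prefix=#{prefix}
  ]
  args << "--enable-opencl" if build_with.call("opencl")
  args
end

prefix = "/usr/local/Cellar/tesseract/example" # hypothetical path
p configure_args(prefix, [])
p configure_args(prefix, ["opencl"])
```

In the formula itself the resulting array is splatted into `system "./configure", *args`, so each new option only adds one `args <<` line.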

@ryanfb

ryanfb Aug 24, 2015

Contributor

Done and squashed/pushed.

Library/Formula/tesseract.rb
system "make", "install"
+ if build.with? "training-tools"
+ system "make training"

@DomT4

DomT4 Aug 24, 2015

Contributor

system "make", "training"
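The thread doesn't state why the separate-argument form is preferred; one general reason is that each word reaches the command as its own argv entry, so nothing is reinterpreted by a shell. This plain-Ruby sketch contrasts the two forms using `IO.popen` (chosen instead of `system` only so the output can be captured) and `echo`, both assumptions made for illustration.

```ruby
# Single string containing shell syntax: Ruby runs it through /bin/sh,
# so the ";" splits it into two separate commands.
shell_form = IO.popen("echo a; echo b", &:read)

# Array form: "echo" is executed directly and receives "a; echo b"
# as one literal argument; no shell is involved.
argv_form = IO.popen(["echo", "a; echo b"], &:read)

p shell_form # => "a\nb\n"
p argv_form  # => "a; echo b\n"
```

For a plain string like "make training" with no shell metacharacters, Ruby splits the words itself and skips the shell, so both forms behave alike in this particular case; the difference matters once shell syntax or paths containing spaces are involved.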

@ryanfb

ryanfb Aug 24, 2015

Contributor

Done and squashed/pushed.

Library/Formula/tesseract.rb
system "make", "install"
+ if build.with? "training-tools"
+ system "make training"
+ system "make training-install"

@DomT4

DomT4 Aug 24, 2015

Contributor

system "make", "training-install"
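With both review fixes applied, the install step runs `make training` and `make training-install` only when the option is enabled. This sketch records the make invocations as argv arrays instead of executing them; `install_steps` is a hypothetical helper for illustration, not part of the formula.

```ruby
# Returns the make invocations (as argv arrays) that the install step
# would run for a given set of enabled options, without executing them.
def install_steps(enabled_options)
  steps = [%w[make install]]
  if enabled_options.include?("training-tools")
    # The training tools are built and installed as separate make targets.
    steps << %w[make training]
    steps << %w[make training-install]
  end
  steps
end

install_steps([]).each { |argv| puts argv.join(" ") }
install_steps(["training-tools"]).each { |argv| puts argv.join(" ") }
```

In the formula, each entry corresponds to one `system "make", "<target>"` call in the separate-argument form requested above.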

@ryanfb

ryanfb Aug 24, 2015

Contributor

Done and squashed/pushed.

@tdsmith

tdsmith Aug 24, 2015

Contributor

OpenCL is supported on 10.6+ so let's turn it on by default.

@ryanfb

ryanfb Aug 24, 2015

Contributor

@tdsmith While OpenCL is supported in OS X, in my experience it's still rather experimental in Tesseract, so I'd rather keep it as an opt-in option than turn it on by default.

Tesseract: add with-training-tools and with-opencl options
Use an args array instead of duplicating the block

Fix arguments to system calls

Remove default cairo build dependency in HEAD
@MikeMcQuaid

MikeMcQuaid Aug 26, 2015

Member

Let's leave this open for a while to see if people 👍

@callmewhy

callmewhy Aug 30, 2015

--with-training-tools is really necessary if you want to train your own traineddata. I've been looking for this answer for a long time... Hope it can be merged soon :)

@MikeMcQuaid

MikeMcQuaid Sep 6, 2015

Member

Thanks for your contribution to Homebrew! Without people like you submitting PRs we couldn't run this project. You rock!

Homebrew locked and limited conversation to collaborators on Jul 10, 2016.
