You do not need to download the whole tessdata repository
zdenop committed Aug 31, 2016
1 parent 77af7cf commit b14f735
1 changed file: INSTALL.GIT.md (6 changes: 4 additions, 2 deletions)
@@ -22,12 +22,14 @@ So, the steps for making Tesseract are:
 $ sudo make training-install
 
 You need to install at least English language data file to TESSDATA_PREFIX
-directory. All language data files can be retrieved from git repository:
+directory. You can retrieve single file with tools like [wget](https://www.gnu.org/software/wget/), [curl](https://curl.haxx.se/), [GithubDownloader](https://github.com/intezer/GithubDownloader) or browser.
+
+All language data files can be retrieved from git repository (usefull only for packagers!):
 
 $ git clone https://github.com/tesseract-ocr/tessdata.git tesseract-ocr.tessdata
 
 (Repository is huge - more that 1.2 GB. You do not need to download
-all languages)
+all languages).
 
 To compile ScrollView.jar you need to download piccolo2d-core-3.0.jar
 and [piccolo2d-extras-3.0.jar](http://search.maven.org/#search|ga|1|g%3A%22org.piccolo2d%22) and place them to tesseract/java.
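The updated text suggests fetching single files with wget, curl or a browser instead of cloning the 1.2 GB repository. A minimal POSIX shell sketch of that, under stated assumptions: the files are served from the `main` branch of tesseract-ocr/tessdata (the default branch was `master` in 2016), and `TESSDATA_PREFIX` names the directory the `.traineddata` files should land in (Tesseract 3.x expected the parent of the `tessdata` directory instead, so adjust `dest` for that). `tessdata_url` and `fetch_tessdata` are hypothetical helper names, not part of Tesseract.

```shell
#!/bin/sh
# Sketch: fetch individual traineddata files instead of cloning all of tessdata.
# ASSUMPTIONS: files live on the "main" branch of github.com/tesseract-ocr/tessdata
# (override with TESSDATA_BRANCH), and TESSDATA_PREFIX is the target directory.

TESSDATA_BRANCH="${TESSDATA_BRANCH:-main}"

# Print the download URL for one language code, e.g. "eng" or "osd".
tessdata_url() {
    printf 'https://github.com/tesseract-ocr/tessdata/raw/%s/%s.traineddata\n' \
        "$TESSDATA_BRANCH" "$1"
}

# Download each requested language into $TESSDATA_PREFIX (default: ./tessdata).
fetch_tessdata() {
    dest="${TESSDATA_PREFIX:-./tessdata}"
    mkdir -p "$dest" || return 1
    for lang in "$@"; do
        url=$(tessdata_url "$lang")
        # -f: fail on HTTP errors; -L: follow GitHub's redirect to the raw file host
        curl -fL -o "$dest/$lang.traineddata" "$url" || return 1
    done
}
```

For example, after `export TESSDATA_PREFIX="$HOME/tessdata"`, running `fetch_tessdata eng osd` would fetch the English data plus the osd file raised in the first comment. Replacing the `curl` line with `wget -P "$dest" "$url"` should behave the same way.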

8 comments on commit b14f735

@amitdo
Collaborator

@amitdo amitdo commented on b14f735 Sep 1, 2016

@zdenop, users should also download the osd traineddata file.

@Shreeshrii
Collaborator

If users are required to download both the osd and eng traineddata, doesn't it make sense to include both of them in the base Tesseract package?

@zdenop
Contributor Author

@zdenop zdenop commented on b14f735 Sep 1, 2016

@Shreeshrii: Leptonica is also a must for Tesseract, and we do not include it in the Tesseract library. ;-)

@amitdo
Collaborator

@amitdo amitdo commented on b14f735 Sep 1, 2016

I agree with @Shreeshrii.

@Shreeshrii
Collaborator

@Shreeshrii Shreeshrii commented on b14f735 Sep 1, 2016 via email

@zdenop
Contributor Author

@zdenop zdenop commented on b14f735 Sep 1, 2016

This is a topic for the dev forum, not for here ;-)

@Shreeshrii
Collaborator

To me, "You need to install at least English language data file to TESSDATA_PREFIX directory" implies that the English traineddata files are MANDATORY / REQUIRED for Tesseract to work correctly.

IMHO, it would be much easier for users/developers who build from git if this file, along with osd.traineddata and any other required files, were included with the source.

Probably an easy way would be to add symlinks (?) from tesseract-ocr/tesseract/tessdata to tesseract-ocr/tessdata, so that the files only have to be updated in the tessdata repository.
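The symlink idea could look roughly like this. It is a hypothetical sketch, not something the project adopted: it assumes local checkouts of both the tesseract and tessdata repositories and links only eng and osd, and `link_tessdata` is an invented helper name.

```shell
#!/bin/sh
# Hypothetical sketch: make <tesseract>/tessdata/<lang>.traineddata point at the
# copy in a separate tessdata checkout, so data files are updated in one place.
# ASSUMPTIONS: both repositories are checked out locally; only eng and osd are linked.

link_tessdata() {
    # Resolve both roots to absolute paths so the links do not depend on
    # the current working directory.
    src_root=$(cd "$1" && pwd) || return 1
    data_root=$(cd "$2" && pwd) || return 1
    mkdir -p "$src_root/tessdata" || return 1
    for lang in eng osd; do
        # -f replaces an existing file or link with the same name
        ln -sf "$data_root/$lang.traineddata" \
            "$src_root/tessdata/$lang.traineddata" || return 1
    done
}
```

Usage would be `link_tessdata ./tesseract ./tessdata`. These links are absolute, so they only work on the machine that created them; links committed to the tesseract repository would have to be relative (for example `../../tessdata/eng.traineddata`) and would only resolve when both repositories are checked out side by side.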

@Shreeshrii
Collaborator

@zdenop OK.