Allows Tesseract PSM up to 13 #177

Closed

Conversation

rferreira

This also allows mode 0, which the regex previously did not match.
The exception wording now clearly references page segmentation.
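
As an illustration of the kind of check described above, here is a minimal sketch of a regex-based validator that accepts page segmentation modes 0 through 13 and rejects everything else. The class name `PsmValidator`, the method `validatePageSegMode`, the pattern, and the exception message are all hypothetical and are not taken from this PR's diff; the actual change may be written differently.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not the code from this PR.
class PsmValidator {
    // Accepts a single digit 0-9, or 10-13. Leading zeros ("07") are rejected.
    private static final Pattern PSM_PATTERN = Pattern.compile("1[0-3]|[0-9]");

    // Returns the mode unchanged if it is valid, otherwise throws.
    static String validatePageSegMode(String mode) {
        if (mode == null || !PSM_PATTERN.matcher(mode).matches()) {
            throw new IllegalArgumentException(
                "Invalid page segmentation mode: " + mode + " (expected 0-13)");
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println(validatePageSegMode("0"));   // mode 0 is accepted
        System.out.println(validatePageSegMode("11"));  // sparse text mode
        try {
            validatePageSegMode("14");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because `matches()` requires the whole string to match, no explicit anchors are needed, and a value such as `130` is rejected rather than partially matched.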

@rferreira
Author

rferreira commented May 4, 2017

For context on supported PSM (mode 11 works particularly well for us):

$ tesseract --version
tesseract 3.05.00
 leptonica-1.74.1
  libjpeg 8d : libpng 1.6.29 : libtiff 4.0.7 : zlib 1.2.8

$ tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
			bypassing hacks that are Tesseract-specific.

@rferreira
Author

Hey folks, any thoughts on this PR?

@tballison
Contributor

I'll put this in this morning. I want it in Tika 1.15. Thank you!

@dameikle
Member

dameikle commented May 8, 2017

Hi @tballison - was about to do this, shall I stop?

@tballison
Contributor

tballison commented May 8, 2017

Er, please do add it. Thank you!

@dameikle
Member

dameikle commented May 8, 2017

Merged in 0aaa121

Thanks Rafael!

@dameikle dameikle closed this May 8, 2017