'ptsize' is not respected when '--find_fonts' and '--render_per_font' are used #6

Closed
Shreeshrii opened this issue Mar 27, 2016 · 3 comments

Comments

@Shreeshrii
Contributor

It is possible that this is by design and that the find_fonts feature is just to be used as a sample of what the different text faces look like.

@Shreeshrii
Contributor Author

$ ./text2tif --fonts_dir= --text ../langdata/san/san.training_text --font Kokila  --outputbase san.Kokila.exp0 --ptsize=32

Rendered page 0 to file san.Kokila.exp0.tif
Rendered page 1 to file san.Kokila.exp0.tif
Rendered page 2 to file san.Kokila.exp0.tif
Rendered page 3 to file san.Kokila.exp0.tif
Rendered page 4 to file san.Kokila.exp0.tif

This creates the image with text at 32 pt.

[Image: san kokila exp-1]

But when --find_fonts and --render_per_font are used, --ptsize=32 has no effect, and only one page is output, at the default size of 12 pt.

$ ./text2tif --fonts_dir= --text ../langdata/san/san.training_text  --outputbase san.exp-1 --ptsize=32 --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --find_fonts --min_coverage=.9 --degrade_image=1 --underline_start_prob=.05 --underline_continuation_prob=.01 


Font AR JULIAN Medium failed with 12996 hits = 16.92%
Aksharyogini2 : 76815 hits = 100.00%, raw = 118 = 99.16%
Rendered page 0 to file san.exp-1.Aksharyogini2.tif
Aksharyogini2 Bold : 76815 hits = 100.00%, raw = 118 = 99.16%
Rendered page 1 to file san.exp-1.Aksharyogini2_Bold.tif
Aparajita : 76816 hits = 100.00%, raw = 119 = 100.00%
Rendered page 2 to file san.exp-1.Aparajita.tif
Aparajita Bold : 76816 hits = 100.00%, raw = 119 = 100.00%
Rendered page 3 to file san.exp-1.Aparajita_Bold.tif
Aparajita Bold Italic : 76816 hits = 100.00%, raw = 119 = 100.00%
Rendered page 4 to file san.exp-1.Aparajita_Bold_Italic.tif
Aparajita Italic : 76816 hits = 100.00%, raw = 119 = 100.00%
Rendered page 5 to file san.exp-1.Aparajita_Italic.tif
Font Arial failed with 12997 hits = 16.92%
Font Arial Bold failed with 12997 hits = 16.92%
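
To pull out just the fonts that met the coverage threshold from a log like the one above, a small filter can help. This is only a sketch: `passing_fonts` is a hypothetical helper name, and it assumes the log lines look exactly like the ones quoted in this issue ("<font name> : <N> hits = ..." for a pass, "Font <font name> failed with ..." for a failure).

```shell
#!/bin/sh
# Hypothetical helper: read a --find_fonts log on stdin and print the
# names of the fonts that passed the coverage check, one per line.
# Assumes the log format quoted in this issue:
#   pass: "<font name> : <N> hits = <pct>%, raw = ..."
#   fail: "Font <font name> failed with <N> hits = <pct>%"
passing_fonts() {
    # Only passing lines contain " : <digits> hits = "; the font name
    # is everything before the first " : ".
    awk -F' : ' '/ : [0-9]+ hits = / { print $1 }'
}

# Possible usage (2>&1 is an assumption, in case the log goes to stderr):
#   ./text2tif ... --find_fonts 2>&1 | passing_fonts
```

"Rendered page ..." and "Font ... failed" lines are ignored because they do not match the pattern.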

Sample output with --find_fonts is attached, converted to PNG.
Please note the first line in English (I think mingw64 is giving an error regarding this line).

[Images: san exp-1 aksharyogini2_bold, san exp-1 aksharyogini2, eng arial exp0, ara arial exp0]

@amitdo
Owner

amitdo commented Mar 27, 2016

> It is possible that this is by design and that the find_fonts feature is just to be used as a sample of what the different text faces look like.

Yes, I believe it's by design.

./text2tif --text=../langdata/eng/eng.training.txt --outputbase=../langdata/gen/eng --fonts_dir=/usr/share/fonts --find_fonts --min_coverage=1.0
ls ../langdata/gen | sed 's/^eng\.//' | sed 's/\.tif$//'

Or, even better:

./text2tif --text=../langdata/eng/eng.training.txt --outputbase=../langdata/gen/eng --fonts_dir=/usr/share/fonts --find_fonts --min_coverage=1.0 --render_per_font=false

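This will produce an 'eng.fontlist.txt' file.

I added the second command form to the TrainingTesseract wiki page (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract#automated-method-new-in-303).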
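
Since --ptsize is honored when a single --font is given (as in the first command in this issue), one possible workaround is to render each font separately. The sketch below only prints the commands (a dry run). `render_cmds` is a hypothetical helper, the one-font-name-per-line input is an assumption about what you feed it (the format of the fontlist file is not shown in this thread), and flags such as --fonts_dir would need to be added to match your own setup.

```shell
#!/bin/sh
# Hypothetical helper: print one text2tif command per font name read
# from stdin. Arguments: training text path, output base, point size.
# Dry run by design; pipe the output to `sh` to actually render.
# Assumption: font names contain no single quotes.
render_cmds() {
    text=$1 base=$2 ptsize=$3
    while IFS= read -r font; do
        [ -n "$font" ] || continue                  # skip blank lines
        tag=$(printf '%s' "$font" | tr ' ' '_')     # "Aparajita Bold" -> "Aparajita_Bold"
        printf "./text2tif --text '%s' --font '%s' --outputbase '%s.%s.exp0' --ptsize=%s\n" \
            "$text" "$font" "$base" "$tag" "$ptsize"
    done
}

# Possible usage:
#   render_cmds ../langdata/san/san.training_text san 32 < fonts.txt | sh
```

The output names follow the `san.Kokila.exp0` pattern used earlier in the thread, with spaces in font names replaced by underscores.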

@Shreeshrii
Contributor Author

This will be very useful. Thanks!

ShreeDevi


Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

@amitdo amitdo closed this as completed Mar 31, 2016