Single-pass DjVu encoder based on DjVu Libre, minidjvu and ImageMagick
Shell
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
NEWS
README
TODO
img2djvu

README

This is a single-pass DjVu encoder based on DjVu Libre and ImageMagick, optionally also minidjvu and ocrodjvu plus cuneiform

img2djvu is in a public domain. Many thanks to members of forum.ru-board.com for ideas, suggestions and corrections, especially to U235, anagnost96 and monday2000.

===

SIMPLE USE

> img2djvu out

(where "out" is a name of folder wich contains some images)

img2djvu will output some diagnostics: first, total number of files in folder, then will show processing status of every file. Skipped non-graphic files are labeled with dots (.), graphic files are labeled with angle brackets (<>). Black and white pages are labeled with B or M (if minidjvu used), color pages with C, F (if cpaldjvu used) or L (if layer separation used). OCR labeled as an additional layer (+1) or additional plus (+). Aggressivity is the number of enclosed brackets (e.g., <<<>>> for -a 2).

===

ADVANCED USE

> img2djvu -d 600 out

Will set resolution to 600 dpi (default is 300 dpi)

> img2djvu -t 17 out

Among color files, only files with color count < 17 will be sent to cpaldjvu DjVu coder

> img2djvu -t 0 out

Will skip ImageMagick color count (this is much faster!) and send all color files to c44 DjVu coder; cpaldjvu will not run

> img2djvu -m 10 out

For black and white pages, will use minidjvu DjVu coder (with 10 pages per dictionary) instead of default cjb2 coder

> img2djvu -l 1 out

For color pages, img2djvu will perform layer separation*** (assuming that the file is the result of Scan Tailor* mixed mode output**), then blur color part and start forced segmentation****; it is slow but usually produce compact output.

===

*    Scan Tailor: http://scantailor.sourceforge.net/

**   Mixed mode outputs image colors in [1...254] rank whereas text is pure black [0] and page background is pure white [255]

***  Layer separation after Scan Tailor: http://forum.ru-board.com/topic.cgi?forum=5&topic=32945 (in Russian), http://alexrey036.narod.ru/LayerTailor/LayerTailor.zip

**** Forced segmentation: http://djvu-soft.narod.ru/scan/back_glue.htm (in Russian)

> img2djvu -a 2 -m 20 out

Will produce more compact output; however, check the result carefully---artifacts could appear!

> img2djvu -l 4 out

Will use four-fold downsampling for color layers; result will be extremely compact. Coefficient should be an integer between 1 and 12; best choices are 2, 3, or 4.

> img2djvu -p "" out

Will NOT use blur and contrast for processing color layers

> img2djvu -l 1 -r rus -e cuneiform -j 2 -a 1 out

After creation of the final DjVu, will run two OCR jobs of cuneiform with "-rus" language option via ocrodjvu and insert text layer in place

> img2djvu -c 1 out

Will create temporary files in current directory

===

CAVEATS

Absolute paths are not allowed

It is expected that all images inside a folder have the same resolution

Folder and image names are very important; they, for example, determine the page sequence in future DjVu file. It is strongly recommended to rename files sequentially before im2djvu run. Also, please avoid using anything except [0-9a-z_-] in the file and folder names. No spaces and non-ASCII letters, please!

It is NOT recommended to use -l option with files which did not come from Scan Tailor output